Data Driven Class Blog

Monday, July 11, 2016

College Research: Initial Findings

We used a t-test to compare the mean Total Federal Aid with the mean non-Federal Aid during 1990-1995. We ran the same t-test for 2010-2015. In comparing these means, we were looking at the size of the gap between these two main sources of aid during each time period, and at how that gap changed from the beginning to the end of our data set's time interval.

Our results were:

1990-1995

data: ninetyFederal and ninetynonFederal

t = -5.7375, df = 5.6606, p-value = 0.001488

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-18925.763 -7493.037

sample estimates:

mean of x mean of y

17614.4 30823.8

2010-2015

data: TenFederal and TennonFederal

t = -11.471, df = 4.6212, p-value = 0.0001433

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-40681.45 -25481.75

sample estimates:

mean of x mean of y

70472.6 103554.2

This shows us that there is a significant difference between the means of Federal and non-Federal aid in both sets of years. Additionally, the 95% confidence interval for the mean difference covers much larger gaps in 2010-2015 (roughly 25,500 to 40,700) than in 1990-1995 (roughly 7,500 to 18,900), and the 2010-2015 p-value is approximately one tenth of the 1990-1995 p-value. The p-values listed above mean that, if Federal and non-Federal aid really had equal means, mean differences at least this large would be produced by random error only about 0.1488% and 0.01433% of the time, respectively. The smaller the p-value, the less plausible it is that the difference comes from random error alone. Overall, this data describes a trend of more non-Federal aid than Federal aid during each time period, with this disparity increasing between the time periods.
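The arithmetic behind that comparison can be checked with a short sketch. It only reuses the sample means and p-values printed in the two outputs above; the variable and function names are made up for illustration.

```python
# Summary values copied from the two t-test outputs above.
periods = {
    "1990-1995": {"federal": 17614.4, "non_federal": 30823.8, "p": 0.001488},
    "2010-2015": {"federal": 70472.6, "non_federal": 103554.2, "p": 0.0001433},
}

def mean_gap(period):
    """Federal mean minus non-Federal mean for one period."""
    values = periods[period]
    return values["federal"] - values["non_federal"]

gap_early = mean_gap("1990-1995")
gap_late = mean_gap("2010-2015")
gap_growth = gap_late / gap_early
p_ratio = periods["2010-2015"]["p"] / periods["1990-1995"]["p"]

for name in periods:
    p = periods[name]["p"]
    print(f"{name}: gap = {mean_gap(name):.1f}, p = {p:.4%}")
print(f"gap grew by a factor of {gap_growth:.2f}")
print(f"p-value ratio (late / early) = {p_ratio:.3f}")
```

Each gap is also the midpoint of the corresponding 95% confidence interval, which is a quick way to confirm the output was read correctly.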

Sunday, July 10, 2016

Initial Findings

Here is the data we used for our research:

State	Year	Population	Fatal motor vehicle crashes	Motor vehicle crash deaths	Rate
NH	2005	1309940	156	166	12.7
NH	2006	1314895	116	127	9.7
NH	2007	1315828	122	129	9.8
NH	2008	1315809	128	139	10.6
NH	2009	1324575	97	110	8.3
NH	2010	1316470	120	128	9.7
NH	2011	1318194	84	90	6.8
NH	2012	1320718	101	108	8.2
NH	2013	1323459	124	135	10.2
NH	2014	1326813	89	95	7.2
MA	2005	6398743	418	442	6.9
MA	2006	6437193	404	430	6.7
MA	2007	6449755	390	417	6.5
MA	2008	6497967	337	363	5.6
MA	2009	6593587	308	334	5.1
MA	2010	6547629	299	314	4.8
MA	2011	6587536	321	337	5.1
MA	2012	6646144	333	349	5.3
MA	2013	6692824	309	326	4.9
MA	2014	6745408	310	328	4.9
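The Rate column appears to be crash deaths per 100,000 residents. That definition is our assumption, not something stated in the data source, but a quick sketch on four rows of the table is consistent with it:

```python
# Rows copied from the table above: (state, year, population, deaths, listed rate).
rows = [
    ("NH", 2005, 1309940, 166, 12.7),
    ("NH", 2014, 1326813, 95, 7.2),
    ("MA", 2005, 6398743, 442, 6.9),
    ("MA", 2014, 6745408, 328, 4.9),
]

def deaths_per_100k(deaths, population):
    """Crash deaths per 100,000 residents (assumed definition of Rate)."""
    return deaths / population * 100_000

computed_rates = []
for state, year, population, deaths, listed in rows:
    computed = deaths_per_100k(deaths, population)
    computed_rates.append((computed, listed))
    print(f"{state} {year}: computed {computed:.1f}, listed {listed}")
```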

Using this data, we tried to show that NH is a safer state to drive in than MA. In the t-tests we performed, we hoped to see that fatal crashes and crash deaths were higher in MA than in NH.

Welch Two Sample t-test

data: car$Fatal.motor.vehicle.crashes[car$state == "NH"] and car$Fatal.motor.vehicle.crashes[car$state == "MA"]

t = -14.751, df = 13.021, p-value = 1.66e-09

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-262.761 -195.639

sample estimates:

mean of x mean of y

113.7 342.9

In our initial testing, we compared the number of fatal motor vehicle crashes from 2005-2014 in NH and MA to determine whether the difference between the two states' means was statistically significant. The p-value gives the probability of seeing a difference at least this large by random chance alone if the two states' true means were equal. A p-value under 5% indicates a significant difference between the two groups. Because the mean for NH (113.7) is much lower than the mean for MA (342.9) and the p-value is very low, the test shows that there are many more fatal crashes each year in MA than in NH.
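As a rough check on the output above, the t statistic and degrees of freedom of a Welch test can be recomputed by hand from the yearly counts. This is a minimal sketch using only the Python standard library, with a helper name (`welch_t`) we chose ourselves. It does not compute the p-value, which needs the t distribution.

```python
from math import sqrt
from statistics import mean, variance

# Fatal motor vehicle crashes per year, 2005-2014, copied from the table above.
nh_crashes = [156, 116, 122, 128, 97, 120, 84, 101, 124, 89]
ma_crashes = [418, 404, 390, 337, 308, 299, 321, 333, 309, 310]

def welch_t(x, y):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

t_stat, df = welch_t(nh_crashes, ma_crashes)
print(f"mean NH = {mean(nh_crashes):.1f}, mean MA = {mean(ma_crashes):.1f}")
print(f"t = {t_stat:.3f}, df = {df:.3f}")
```

The negative sign of t simply reflects that NH was entered first and has the smaller mean.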

Welch Two Sample t-test

data: car$Motor.vehicle.crash.deaths[car$state == "NH"] and car$Motor.vehicle.crash.deaths[car$state == "MA"]

t = -14.497, df = 12.866, p-value = 2.399e-09

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-277.2982 -205.3018

sample estimates:

mean of x mean of y

122.7 364.0

The second test compared crash deaths in NH and MA. The t-test shows a significant difference between the means of 122.7 crash deaths per year in NH and 364 in MA, with a p-value well under 1%. This means a difference this large is very unlikely to arise by random chance alone, showing that there are many more crash deaths in MA than in NH each year.

Looking at the test results, MA has more fatal crashes and crash deaths each year than NH, meaning that, by these raw counts, MA is worse to drive in than NH.
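Raw counts do not account for MA having roughly five times NH's population. As a hedged cross-check, the Rate column from the table (which we assume is crash deaths per 100,000 residents) can be averaged for each state. The per-capita picture can differ from the raw-count picture, so it is worth looking at both before drawing a final conclusion.

```python
from statistics import mean

# "Rate" column, 2005-2014, copied from the data table above
# (assumed to be crash deaths per 100,000 residents).
nh_rate = [12.7, 9.7, 9.8, 10.6, 8.3, 9.7, 6.8, 8.2, 10.2, 7.2]
ma_rate = [6.9, 6.7, 6.5, 5.6, 5.1, 4.8, 5.1, 5.3, 4.9, 4.9]

mean_nh = mean(nh_rate)
mean_ma = mean(ma_rate)
print(f"mean rate NH = {mean_nh:.2f}, mean rate MA = {mean_ma:.2f}")
print(f"NH minus MA = {mean_nh - mean_ma:.2f}")
```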

Interpreting Statistical Tests Gender/Alcohol Consumption (July 10)

July 10, 2016 Drug Research Update

Before running tests on the data, we were looking for an association between gender and drinking and smoking. After running a few statistical tests on a set of data provided by a government source, we concluded that there is a significant association between gender and both drinking and smoking cigarettes. The tests we ran were t-tests and chi-squared tests, both of which show a significant association between gender and alcohol and drug use.

	Drank Alcohol	Haven't Drank Alcohol Before
Male	7,424	4,909
Female	7,518	4,558

data: drank.table

X-squared = 10.814, df = 1, p-value = 0.001007

The p-value is approximately 0.001 for this data. This means that there is a significant association between gender and patterns of alcohol consumption.
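The reported statistic appears to match a chi-squared test with Yates' continuity correction, which R's `chisq.test` applies by default to 2x2 tables. The following is a minimal sketch of that calculation using only the Python standard library. The function name is ours, and it handles only the 2x2 case.

```python
from math import erfc, sqrt

def chi_squared_2x2(a, b, c, d):
    """Chi-squared test for the 2x2 table [[a, b], [c, d]] with Yates' correction.

    Returns (statistic, p-value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Yates' continuity correction shrinks |ad - bc| by n / 2 (never below 0).
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    statistic = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper tail probability is erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value

# Counts copied from the alcohol table above (rows: Male, Female).
stat, p = chi_squared_2x2(7424, 4909, 7518, 4558)
print(f"X-squared = {stat:.3f}, p-value = {p:.6f}")
```

The same function can be called with the counts from any other 2x2 table of yes/no answers by gender.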

	Smoked Before	Haven't Smoked Before
Male	6,787	20,080
Female	6,914	22,521

data: smoked.table

X-squared = 23.869, df = 1, p-value = 1.031e-06

The p-value from this test was 1.031 × 10^-6, indicating that there is a significant difference between genders in terms of prior smoking: about 25.3% of males versus 23.5% of females reported having smoked. For alcohol, the earlier table shows that females are significantly more likely than males to have consumed alcohol while underage (about 62.3% versus 60.2%).
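A chi-squared test says whether the genders differ, but not in which direction. To see the direction, the counts from both tables can be turned into row percentages. This is a small sketch whose dictionary layout and names are only illustrative.

```python
# Counts copied from the two tables above: (yes, no) for each gender.
tables = {
    "drank alcohol": {"Male": (7424, 4909), "Female": (7518, 4558)},
    "smoked": {"Male": (6787, 20080), "Female": (6914, 22521)},
}

def share_yes(yes, no):
    """Percentage of respondents in a row who answered yes."""
    return 100 * yes / (yes + no)

shares = {
    behavior: {gender: share_yes(*counts) for gender, counts in rows.items()}
    for behavior, rows in tables.items()
}

for behavior, by_gender in shares.items():
    for gender, pct in by_gender.items():
        print(f"{behavior:13s} {gender:6s} {pct:.1f}%")
```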

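The p-value is a measure of how surprising a given set of data would be if there were no real effect. It tells the researcher the probability of seeing data at least as extreme as what was observed, assuming the null hypothesis (here, that gender and the behavior are unrelated) is true. A small p-value, such as ours, means the data would be unlikely under that assumption, so we conclude that there is an association.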