Monday, July 11, 2016

College Research: Initial Findings


We used a t-test to compare the mean Total Federal Aid with the mean non-Federal Aid during 1990-1995, and a second t-test to make the same comparison for 2010-2015. In comparing these means, we were looking at the size of the difference between the two main sources of aid in each time period and at how that difference changed from the beginning to the end of our data set's time interval.

Our results were:

1990-1995
data:  ninetyFederal and ninetynonFederal
t = -5.7375, df = 5.6606, p-value = 0.001488
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-18925.763  -7493.037
sample estimates:
mean of x mean of y
 17614.4   30823.8

2010-2015

data:  TenFederal and TennonFederal
t = -11.471, df = 4.6212, p-value = 0.0001433
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-40681.45 -25481.75
sample estimates:
mean of x mean of y
 70472.6  103554.2

This shows us that there is a significant difference between the means of Federal and non-Federal aid for both sets of years. Additionally, the 95% confidence interval covers much larger mean differences in 2010-2015 (about 25,482 to 40,681) than in 1990-1995 (about 7,493 to 18,926), and the 2010-2015 p-value is approximately one tenth of the 1990-1995 p-value. The p-values listed above correspond to a respective 0.1488% and 0.01433% chance of seeing mean differences at least this large through random variation alone if Federal and non-Federal sources actually provided equal average amounts of aid. The smaller the p-value, the less plausible it is that the difference is due to chance. Overall, the data describe a trend of more non-Federal aid than Federal aid during each time period, with the disparity increasing between the two periods.
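The arithmetic behind "the disparity increasing" can be checked directly from the numbers printed in the two outputs. The following is a minimal Python sketch, not the code that produced the outputs above; the dictionary layout and the function names `mean_difference` and `ci_midpoint` are made up for illustration. It recomputes each mean difference, confirms that it sits at the center of its confidence interval, and compares the size of the gap across the two periods.

```python
# Sample means and 95% confidence intervals copied from the two outputs above.
periods = {
    "1990-1995": {"federal": 17614.4, "non_federal": 30823.8,
                  "ci": (-18925.763, -7493.037)},
    "2010-2015": {"federal": 70472.6, "non_federal": 103554.2,
                  "ci": (-40681.45, -25481.75)},
}

def mean_difference(period):
    """Federal mean minus non-Federal mean (negative when non-Federal is larger)."""
    return period["federal"] - period["non_federal"]

def ci_midpoint(period):
    """Center of the reported confidence interval for the difference in means."""
    low, high = period["ci"]
    return (low + high) / 2

for label, period in periods.items():
    print(label, round(mean_difference(period), 1), round(ci_midpoint(period), 1))

# How many times larger the gap is in the later period than in the earlier one.
gap_ratio = mean_difference(periods["2010-2015"]) / mean_difference(periods["1990-1995"])
print("gap ratio:", round(gap_ratio, 2))
```

By this calculation the gap between the two means is roughly 2.5 times larger in 2010-2015 than in 1990-1995.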

Sunday, July 10, 2016

Initial Findings

Here is the data we used for our research:    

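| State | Year | Population | Fatal motor vehicle crashes | Motor vehicle crash deaths | Rate (deaths per 100,000 population) |
|-------|------|------------|-----------------------------|----------------------------|--------------------------------------|
| NH | 2005 | 1309940 | 156 | 166 | 12.7 |
| NH | 2006 | 1314895 | 116 | 127 | 9.7 |
| NH | 2007 | 1315828 | 122 | 129 | 9.8 |
| NH | 2008 | 1315809 | 128 | 139 | 10.6 |
| NH | 2009 | 1324575 | 97 | 110 | 8.3 |
| NH | 2010 | 1316470 | 120 | 128 | 9.7 |
| NH | 2011 | 1318194 | 84 | 90 | 6.8 |
| NH | 2012 | 1320718 | 101 | 108 | 8.2 |
| NH | 2013 | 1323459 | 124 | 135 | 10.2 |
| NH | 2014 | 1326813 | 89 | 95 | 7.2 |
| MA | 2005 | 6398743 | 418 | 442 | 6.9 |
| MA | 2006 | 6437193 | 404 | 430 | 6.7 |
| MA | 2007 | 6449755 | 390 | 417 | 6.5 |
| MA | 2008 | 6497967 | 337 | 363 | 5.6 |
| MA | 2009 | 6593587 | 308 | 334 | 5.1 |
| MA | 2010 | 6547629 | 299 | 314 | 4.8 |
| MA | 2011 | 6587536 | 321 | 337 | 5.1 |
| MA | 2012 | 6646144 | 333 | 349 | 5.3 |
| MA | 2013 | 6692824 | 309 | 326 | 4.9 |
| MA | 2014 | 6745408 | 310 | 328 | 4.9 |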
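The Rate column is not defined in the data, but its values are consistent with crash deaths per 100,000 residents. The short Python sketch below checks that assumption against four rows of the table; the function name `death_rate_per_100k` is hypothetical and the choice of rows is arbitrary.

```python
def death_rate_per_100k(deaths, population):
    """Crash deaths per 100,000 residents, rounded to one decimal place.

    Assumption: this is how the Rate column was derived. The source data
    does not state the formula.
    """
    return round(deaths / population * 100_000, 1)

# (state, year, population, crash deaths, rate listed in the table)
rows = [
    ("NH", 2005, 1309940, 166, 12.7),
    ("NH", 2014, 1326813, 95, 7.2),
    ("MA", 2005, 6398743, 442, 6.9),
    ("MA", 2014, 6745408, 328, 4.9),
]

for state, year, population, deaths, listed in rows:
    computed = death_rate_per_100k(deaths, population)
    print(state, year, "computed:", computed, "listed:", listed)
```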

Using this data, we tried to show that NH is a safer state to drive in than MA. In the t-tests we performed, we expected to see that fatal crashes and crash deaths were higher in MA than in NH.

    Welch Two Sample t-test
data:  car$Fatal.motor.vehicle.crashes[car$state == "NH"] and car$Fatal.motor.vehicle.crashes[car$state == "MA"]
t = -14.751, df = 13.021, p-value = 1.66e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-262.761 -195.639
sample estimates:
mean of x mean of y
   113.7     342.9

In our initial testing, we compared the number of fatal motor vehicle crashes per year from 2005-2014 in NH and MA to determine whether the difference between the two states' means was statistically significant. The p-value gives the probability of seeing a difference at least this large by random chance alone if the two states actually had the same mean. A p-value under 5% is conventionally taken to show a significant difference. Because the mean for NH (113.7) is much lower than the mean for MA (342.9) and the p-value is very low (1.66e-09), the test shows that there are many more fatal crashes each year in MA than in NH.
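As a rough check on the output above, the t statistic and degrees of freedom of a Welch two-sample t-test can be recomputed from the yearly fatal crash counts in the data table. This is a minimal Python sketch using only the standard library, not the code that produced the output above, and the function name `welch_t` is made up for illustration. It stops short of the p-value, which needs the t distribution and is easier to get from a statistics package.

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard error contributed by each sample (sample variance / n).
    vx = variance(x) / len(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

# Fatal motor vehicle crashes per year, 2005-2014, from the table above.
nh_crashes = [156, 116, 122, 128, 97, 120, 84, 101, 124, 89]
ma_crashes = [418, 404, 390, 337, 308, 299, 321, 333, 309, 310]

t, df = welch_t(nh_crashes, ma_crashes)
print("t =", round(t, 3), "df =", round(df, 3))
```

The same function can be applied to the crash death counts by swapping in that column for each state.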
   
Welch Two Sample t-test
data:  car$Motor.vehicle.crash.deaths[car$state == "NH"] and car$Motor.vehicle.crash.deaths[car$state == "MA"]
t = -14.497, df = 12.866, p-value = 2.399e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-277.2982 -205.3018
sample estimates:
mean of x mean of y
   122.7     364.0

The second test compared crash deaths in NH and MA. The t-test shows a significant difference between the means of 122.7 crash deaths per year in NH and 364 crash deaths per year in MA, with a p-value well under 1%. This means the difference is very unlikely to be due to random chance alone, showing that there are many more crash deaths in MA than in NH each year.
    Looking at the test results, MA has more fatal crashes and crash deaths each year than NH, meaning that by these counts MA is worse to drive in than NH.

Interpreting Statistical Tests Gender/Alcohol Consumption (July 10)

July 10, 2016   Drug Research Update
Before running tests on the data, we were looking for an association between gender and drinking and smoking. After running a few statistical tests on a set of data provided by a government source, we concluded that there is a significant association between gender and both drinking alcohol and smoking cigarettes. The tests we ran were t-tests and chi-squared tests, both of which show a significant association between gender and alcohol and drug use.



| Gender | Drank Alcohol | Haven't Drank Alcohol Before |
|--------|---------------|------------------------------|
| Male | 7,424 | 4,909 |
| Female | 7,518 | 4,558 |

data:  drank.table
X-squared = 10.814, df = 1, p-value = 0.001007

The p-value for this data is approximately 0.001. This means that there is a significant association between gender and patterns of alcohol consumption.



| Gender | Smoked Before | Haven't Smoked Before |
|--------|---------------|-----------------------|
| Male | 6,787 | 20,080 |
| Female | 6,914 | 22,521 |

data:  smoked.table
X-squared = 23.869, df = 1, p-value = 1.031e-06

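The p-value from this test was 1.031 × 10^-6, indicating that there is a significant difference between genders in terms of prior smoking: about 25% of the males in the table had smoked before, compared with about 23% of the females. In the alcohol table above, the difference runs the other way, with females (about 62%) more likely than males (about 60%) to have consumed alcohol while underage.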
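Both chi-squared statistics above can be recomputed from the two 2x2 tables. The sketch below is a minimal Python version using only the standard library, not the code that produced the outputs above. It assumes the test applied Yates' continuity correction, because that is the variant whose results match the reported X-squared values. The function name `chi2_yates_2x2` is made up for illustration.

```python
import math

def chi2_yates_2x2(a, b, c, d):
    """Chi-squared test for the 2x2 table [[a, b], [c, d]].

    Returns (chi-squared statistic, p-value) with 1 degree of freedom,
    applying Yates' continuity correction (an assumption about the
    original test, chosen because it matches the reported values).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # The continuity correction shrinks |ad - bc| by n/2, but not below zero.
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows are Male, Female; columns are yes, no.
drank = chi2_yates_2x2(7424, 4909, 7518, 4558)
smoked = chi2_yates_2x2(6787, 20080, 6914, 22521)

print("drank:  X-squared =", round(drank[0], 3), "p =", drank[1])
print("smoked: X-squared =", round(smoked[0], 3), "p =", smoked[1])
```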

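The p-value is a measure of how strongly a set of data counts as evidence against the null hypothesis, which here is that gender is unrelated to drinking or smoking. It tells the researcher the probability of seeing data at least as extreme as the observed data if the null hypothesis were true. It is not the probability that the hypothesis is true or that the data show a correlation. A small p-value only means that the observed pattern would be unlikely if there were no association.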