Project Chi-Square

Problem I.  
The following data represent the starting gate positions and how the horses finished in terms of Win (first place), Place (second place), or Show (third place). The data (all races were run in 2008) were compiled at
  in Maywood, IL.

Gate              1      2      3      4      5      6      7      8      9
Win             215    200    181    147    138     80     64     35     47
Place           194    210    164    152    138     81     76     30     59
Show            141    177    159    151    151    119     96     48     66
Total Starts  1,105  1,105  1,105  1,105  1,103  1,095  1,067    799    419

A.  Show all four steps for each hypothesis test needed to test the claim.  At a 0.05 level
     of significance, test for the existence of a dependency between starting gate position and
     whether a horse finishes "win (first)" or "Not win (not first)".

Gate          1      2      3      4      5      6      7      8      9
Win         215    200    181    147    138     80     64     35     47
Not Win     890    905    924    958    965  1,015  1,003    764    372

Step 1:   H0:   Variables are independent 
               H1:    Variables are dependent, claim.

Step 2: 

Step 3:

Use MINITAB to determine the contingency table Chi-Square test statistic and its corresponding p-value for the example:

  1.  Enter the nine values from Row 1 into Row 1, Cols. 1 through 9.
  2.  Enter the nine values from Row 2 into Row 2, Cols. 1 through 9.
  3.  Click on “Stat” at the top of the screen.
  4.  Then select “TABLES” from the menu that appears.
  5.  Then select “Chi-Square Test” from the menu that appears.
  6.  Highlight "C1" and select "Select" to place "C1" into "Columns containing the table".
  7.  Repeat this selection process for the remaining columns.
  8.  Select “OK”.

Chi-Square Test: C1, C2, C3, C4, C5, C6, C7, C8, C9 
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

           C1      C2      C3      C4      C5      C6      C7      C8      C9
    1     215     200     181     147     138      80      64      35      47
       137.40  137.40  137.40  137.40  137.15  136.15  132.67   99.35   52.10
       43.833  28.525  13.838   0.671   0.005  23.159  35.544  41.678   0.499

    2     890     905     924     958     965    1015    1003     764     372
       967.60  967.60  967.60  967.60  965.85  958.85  934.33  699.65  366.90
        6.224   4.051   1.965   0.095   0.001   3.288   5.047   5.918   0.071

Total    1105    1105    1105    1105    1103    1095    1067     799     419
       Total
    1   1107
    2   7796
Total   8903

Chi-Sq = 214.413, DF = 8, P-Value = 0.000
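The MINITAB figures above can be cross-checked by hand or in a few lines of code. The following is a minimal Python sketch, not part of the original MINITAB procedure; the function name chi_square is made up for illustration. It assumes the usual contingency-table formula, in which each expected count is (row total x column total) / grand total and each cell contributes (observed - expected)^2 / expected.

```python
# Observed counts from the Win / Not Win table above (gates 1-9).
win = [215, 200, 181, 147, 138, 80, 64, 35, 47]
not_win = [890, 905, 924, 958, 965, 1015, 1003, 764, 372]


def chi_square(table):
    """Return (statistic, degrees of freedom) for a contingency table.

    `table` is a list of rows. Each expected count is
    row total * column total / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df


statistic, df = chi_square([win, not_win])
print(f"Chi-Sq = {statistic:.3f}, DF = {df}")
```

The printed statistic and degrees of freedom should agree with the MINITAB line above to within rounding.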

 

Step 4:
Reject the Null Hypothesis.
The data support the claim.  There appears to be a dependency between starting position and winning.
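MINITAB reports "P-Value = 0.000" because it rounds to three decimal places; the true p-value is tiny but not zero. As a rough illustration of the decision in Step 4, the sketch below computes the upper-tail probability directly. It relies on a closed form that holds only when the degrees of freedom are even (DF = 8 here), and the function name chi_square_upper_tail is hypothetical.

```python
import math


def chi_square_upper_tail(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!,
    which is valid only when df is even.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    series = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * series


alpha = 0.05
# Test statistic and DF taken from the MINITAB output above.
p_value = chi_square_upper_tail(214.413, 8)
print(f"p-value = {p_value:.3e}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The same function returns about 0.05 at x = 15.507, the tabled 0.05 critical value for 8 degrees of freedom, so comparing the statistic with that critical value leads to the same decision.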

B.  If a dependency exists between "winning (first)" and "starting position", find 95% confidence
      intervals for the percentage of wins for each gate.

Using the TI-83 to Calculate Confidence Intervals for Proportions.

1.  Press the "STAT" button.
2.  Use the right arrow to highlight "TESTS".
3.  Use the down arrow to select "A:1-PropZInt..." and press "ENTER".
4.  Enter 215 as the value for "x" and press "ENTER".
5.  Enter 1105 as the sample size, n, and press "ENTER".
6.  Enter 0.95 as the "C-Level" and press "ENTER".
7.  With "Calculate" highlighted, press "ENTER".
     The 95% confidence interval (1-PropZInt) is (.17123, .21791).
     Repeat for the remaining gates.
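For readers without a TI-83, the same interval can be sketched in Python. This assumes the calculator's 1-PropZInt uses the usual normal-approximation formula, p-hat plus or minus z times sqrt(p-hat(1 - p-hat)/n); the function name one_prop_z_interval is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_interval(x, n, confidence=0.95):
    """Normal-approximation interval for a proportion: p_hat +/- z * SE."""
    p_hat = x / n
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Gate 1: 215 wins in 1,105 starts.
lower, upper = one_prop_z_interval(215, 1105)
print(f"{lower:.5f} < P < {upper:.5f}")
```

For gate 1 this should reproduce the interval in step 7 to five decimal places. The other gates follow by substituting their wins and total starts.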

 

Gate 95% confidence interval 
1 0.17123 < P < 0.21791
2 0.15829 < P < 0.20370
3 0.14198 < P < 0.18562
4 0.11301 < P < 0.15306
5 0.10559 < P < 0.14464
6 0.05765 < P < 0.08847
7 0.04573 < P < 0.07423
8 0.02961 < P < 0.05800
9 0.08236 < P < 0.14306

C.  Which starting position is most probable to produce a winner?
     
Hint:  Which intervals overlap the interval with the largest lower limit?  Gates 1, 2 and 3.
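The rule in the hint can be sketched in Python as follows. The interval values are copied from the table in part B; the function name overlapping_with_best is hypothetical.

```python
# 95% confidence intervals for the proportion of wins, by gate (from part B).
intervals = {
    1: (0.17123, 0.21791),
    2: (0.15829, 0.20370),
    3: (0.14198, 0.18562),
    4: (0.11301, 0.15306),
    5: (0.10559, 0.14464),
    6: (0.05765, 0.08847),
    7: (0.04573, 0.07423),
    8: (0.02961, 0.05800),
    9: (0.08236, 0.14306),
}


def overlapping_with_best(intervals):
    """Gates whose interval overlaps the one with the largest lower limit."""
    best_gate = max(intervals, key=lambda gate: intervals[gate][0])
    best_lower = intervals[best_gate][0]
    # Every interval has a lower limit <= best_lower, so it overlaps the
    # best interval exactly when its upper limit reaches best_lower.
    return sorted(
        gate for gate, (lower, upper) in intervals.items() if upper >= best_lower
    )


candidates = overlapping_with_best(intervals)
print(candidates)
```

Gates whose intervals overlap cannot be separated at this confidence level, which is why the answer is a group of gates rather than a single gate.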

D.   Show all four steps for each hypothesis test needed to test the claim.  At a 0.05 level
       of significance, test for the existence of a dependency between starting gate position and
       whether a horse finishes "in the money" (win, place or show) or "out of the money".

Gate                1    2    3    4    5    6    7    8    9
In the Money      550  587  504  450  427  280  236  113  172
Not In the Money  555  518  601  655  676  815  831  686  247
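The two rows above follow from the original table: "In the Money" is Win + Place + Show for each gate, and "Not In the Money" is Total Starts minus that sum. A short Python sketch of the arithmetic (variable names are illustrative only):

```python
# Counts by gate (1-9) from the first table in Problem I.
win = [215, 200, 181, 147, 138, 80, 64, 35, 47]
place = [194, 210, 164, 152, 138, 81, 76, 30, 59]
show = [141, 177, 159, 151, 151, 119, 96, 48, 66]
starts = [1105, 1105, 1105, 1105, 1103, 1095, 1067, 799, 419]

# In the money = finished first, second or third.
in_money = [w + p + s for w, p, s in zip(win, place, show)]
# Out of the money = every other start from that gate.
out_of_money = [n - m for n, m in zip(starts, in_money)]

print("In the Money:    ", in_money)
print("Not In the Money:", out_of_money)
```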

Step 1:   H0:   Variables are independent 
               H1:    Variables are dependent, claim.

Step 2: 

 

Step 3:  Using the MINITAB instructions above:

Chi-Square Test: C1, C2, C3, C4, C5, C6, C7, C8, C9 
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

           C1      C2      C3      C4      C5      C6      C7       C8      C9
    1     550     587     504     450     427     280     236      113     172
       411.94  411.94  411.94  411.94  411.19  408.21  397.77   297.86  156.20
       46.271  74.395  20.574   3.517   0.608  40.269  65.793  114.732   1.598

     2     555     518     601     655     676     815     831      686     247
       693.06  693.06  693.06  693.06  691.81  686.79  669.23   501.14  262.80
       27.502  44.219  12.229   2.090   0.361  23.935  39.106   68.194   0.950
Total    1105    1105    1105    1105    1103    1095    1067      799     419

       Total
    1   3319
    2   5584
Total   8903


Chi-Sq = 586.341, DF = 8, P-Value = 0.000

Step 4:
Reject the Null Hypothesis.
The data support the claim.  There appears to be a dependency between starting position and finishing
"in the money".

E.  If a dependency exists between "finish in the money (win-place-show)" and "starting position",
     find 95% confidence intervals for the percentage of "finishing in the money" for each gate.

Gate 95% confidence interval 
1 0.46826 < P < 0.52722
2 0.50108 < P < 0.56064
3 0.42674 < P < 0.48548
4 0.37827 < P < 0.43621
5 0.35838 < P < 0.41587
6 0.22987 < P < 0.28155
7 0.19628 < P < 0.24608
8 0.11726 < P < 0.16559
9 0.36340 < P < 0.45760

F.  Which starting position is most probable to produce a horse "finishing in the money"?
      Hint:  Which intervals overlap the interval with the largest lower limit?   Gates 1 and 2

Problem II.  
The following data represent the starting gate positions and how the horses finished in terms of Win (first place), Place (second place), or Show (third place). The data (all races were run in 2008) were compiled at
  in Crete, IL.

Gate              1      2      3      4      5      6      7      8      9     10
Win             263    275    244    241    297    252    166    117     80     24
Place           288    278    223    253    281    243    163    124     75     24
Show            267    248    293    208    245    232    185    155     99     32
Total Starts  1,955  1,955  1,955  1,955  1,951  1,931  1,744  1,393    915    507

A.  Show all four steps for each hypothesis test needed to test the claim.  At a 0.05 level
     of significance, test for the existence of a dependency between starting gate position and
     whether a horse finishes "win (first)" or "Not win (not first)".

Gate          1      2      3      4      5      6      7      8      9     10
Win         263    275    244    241    297    252    166    117     80     24
Not Win   1,692  1,680  1,711  1,714  1,654  1,679  1,578  1,276    835    483

Step 1:   H0:   Variables are independent 
               H1:    Variables are dependent, claim.

Step 2: 

 

Step 3:  Using the MINITAB instructions above: 

Chi-Square Test: C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12 
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

             C1       C2       C3       C4       C5       C6       C7       C8
    1      263      275      244      241      297      252      166      117
        235.44   235.44   235.44   235.44   234.95   232.55   210.03   167.76
         3.227    6.648    0.311    0.131   16.384    1.627    9.229   15.357

     2     1692     1680     1711     1714     1654     1679     1578     1276
         1719.56  1719.56  1719.56  1719.56  1716.05  1698.45  1533.97  1225.24
          0.442    0.910    0.043    0.018    2.243    0.223    1.264    2.103
 Total     1955     1955     1955     1955     1951     1931     1744     1393

           C9     C10    C11    C12  Total
    1      80      24      0      0   1959
       110.19   61.06   0.48   0.24
        8.272  22.491  0.482  0.241

     2     835     483      4      2  14308
       804.81  445.94   3.52   1.76
        1.133   3.079  0.066  0.033
Total     915     507      4      2  16267

 Chi-Sq = 95.957, DF = 11

WARNING: 2 cells with expected counts less than 1. Chi-Square approximation
         probably invalid. 
4 cells with expected counts less than 5.

Combine gates 10, 11 and 12 to get an expected value of at least 5 in all cells.
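The pooling step can be sketched in Python as follows. The counts are taken from the MINITAB output above, including columns C11 and C12. The helper names expected_counts and pool_trailing_columns are hypothetical. The sketch merges only trailing columns, which is enough here because the sparse gates are the last ones.

```python
# Observed counts for columns C1-C12 of the MINITAB output above.
win = [263, 275, 244, 241, 297, 252, 166, 117, 80, 24, 0, 0]
not_win = [1692, 1680, 1711, 1714, 1654, 1679, 1578, 1276, 835, 483, 4, 2]


def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def pool_trailing_columns(table, minimum=5.0):
    """Merge the last two columns until the last column's expected counts reach `minimum`."""
    pooled = [list(row) for row in table]
    while len(pooled[0]) > 2 and min(
        row[-1] for row in expected_counts(pooled)
    ) < minimum:
        for row in pooled:
            last = row.pop()
            row[-1] += last
    return pooled


pooled = pool_trailing_columns([win, not_win])
expected = expected_counts(pooled)
statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(pooled, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(pooled) - 1) * (len(pooled[0]) - 1)
print(f"Columns after pooling: {len(pooled[0])}")
print(f"Chi-Sq = {statistic:.3f}, DF = {df}")
```

After pooling, the last column holds gates 10, 11 and 12 together, and the statistic and degrees of freedom should agree with the second MINITAB run to within rounding.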

Chi-Square Test: C1, C2, C3, C4, C5, C6, C7, C8, C9, C10
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

            C1       C2       C3       C4       C5       C6       C7       C8
    1      263      275      244      241      297      252      166      117
        235.44   235.44   235.44   235.44   234.95   232.55   210.03   167.76
         3.227    6.648    0.311    0.131   16.384    1.627    9.229   15.357

    2     1692     1680     1711     1714     1654     1679     1578     1276
       1719.56  1719.56  1719.56  1719.56  1716.05  1698.45  1533.97  1225.24
         0.442    0.910    0.043    0.018    2.243    0.223    1.264    2.103
Total     1955     1955     1955     1955     1951     1931     1744     1393

           C9     C10  Total
    1      80      24   1959
       110.19   61.78
        8.272  23.103

    2     835     489  14308
       804.81  451.22
        1.133   3.163
Total     915     513  16267

Chi-Sq = 95.832, DF = 9, P-Value = 0.000

Step 4:
Reject the Null Hypothesis.
The data support the claim.  There appears to be a dependency between starting position and winning.

B.  If a dependency exists between "winning (first)" and "starting position", find 95% confidence
     intervals for the percentage of wins for each gate.
     Hint:  See lesson on confidence intervals for proportions.

Gate 95% confidence interval 
1 0.11940 < P < 0.14965
2 0.12525 < P < 0.15608
3 0.11016 < P < 0.13946
4 0.10870 < P < 0.13785
5 0.13692 < P < 0.16817
6 0.11548 < P < 0.14553
7 0.08141 < P < 0.10896
8 0.06943 < P < 0.09856
9 0.06913 < P < 0.10573
10 0.03371 < P < 0.07280

C.  Which starting position is most probable to produce a winner?
      Hint:  Which intervals overlap the interval with the largest lower limit? 
      Gates 1, 2, 3, 4, 5 and 6.

D.   Show all four steps for each hypothesis test needed to test the claim.  At a 0.05 level
      of significance, test for the existence of a dependency between starting gate position and
      whether a horse finishes "in the money" (win, place or show) or "out of the money".

Gate                  1      2      3      4      5      6      7      8      9     10
In the Money        818    801    760    702    823    727    514    396    254     80
Not In the Money  1,137  1,154  1,195  1,253  1,128  1,204  1,230    997    663    427

Step 1:   H0:   Variables are independent 
               H1:    Variables are dependent, claim.

Step 2: 

 

Step 3:  Using the MINITAB instructions above:  

Chi-Square Test: C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12 
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

            C1       C2       C3       C4       C5       C6       C7      C8
    1      818      801      760      702      823      727      514     396
        706.21   706.21   706.21   706.21   704.77   697.54   629.99  503.20
        17.695   12.722    4.097    0.025   19.835    1.244   21.356  22.837

     2     1137     1154     1195     1253     1128     1204     1230     997
       1248.79  1248.79  1248.79  1248.79  1246.23  1233.46  1114.01  889.80
        10.007    7.195    2.317    0.014   11.217    0.703   12.077  12.915
Total     1955     1955     1955     1955     1951     1931     1744    1393

            C9     C10    C11    C12  Total
    1     254      80      2      1   5878
       331.25  183.15   2.17   1.08
       18.016  58.091  0.013  0.006

     2     663     427      4      2  10394
       585.75  323.85   3.83   1.92
       10.188  32.851  0.007  0.004
 Total     917     507      6      3  16272

 Chi-Sq = 275.433, DF = 11, P-Value = 0.000
 4 cells with expected counts less than 5.

Combine gates 10, 11 and 12 to get an expected value of at least 5 in all cells.

Chi-Square Test: C1, C2, C3, C4, C5, C6, C7, C8, C9, C10 
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

            C1       C2       C3       C4       C5       C6       C7      C8
    1      818      801      760      702      823      727      514     396
        706.21   706.21   706.21   706.21   704.77   697.54   629.99  503.20
        17.695   12.722    4.097    0.025   19.835    1.244   21.356  22.837

     2     1137     1154     1195     1253     1128     1204     1230     997
       1248.79  1248.79  1248.79  1248.79  1246.23  1233.46  1114.01  889.80
        10.007    7.195    2.317    0.014   11.217    0.703   12.077  12.915
 
Total     1955     1955     1955     1955     1951     1931     1744    1393

            C9     C10  Total
    1     254      83   5878
       331.25  186.40
       18.016  57.356

     2     663     433  10394
       585.75  329.60
       10.188  32.436
Total     917     516  16272

Chi-Sq = 274.252, DF = 9, P-Value = 0.000

Step 4:
Reject the Null Hypothesis.
The data support the claim.  There appears to be a dependency between starting position and finishing
"in the money".

E.  If a dependency exists between "finish in the money (win-place-show)" and "starting position",
      find 95% confidence intervals for the percentage of "finishing in the money" for each gate.

Gate 95% confidence interval 
1 0.39655 < P < 0.44028
2 0.38792 < P < 0.43152
3 0.36714 < P < 0.41036
4 0.33781 < P < 0.38034
5 0.39992 < P < 0.44375
6 0.35488 < P < 0.39810
7 0.27333 < P < 0.31612
8 0.26059 < P < 0.30797
9 0.24858 < P < 0.30661
10 0.12606 < P < 0.18952

F.  Which starting position is most probable to produce a horse "finishing in the money"?
      Hint:  Which intervals overlap the interval with the largest lower limit?   Gates 1, 2, 3 and 5.
