Homer White, Georgetown College

Always remember to make sure the necessary packages are loaded:

```
require(mosaic)
require(tigerstats)
```

- At Georgetown College, all students must acquire 48 “NEXUS” credits to graduate
- You get NEXUS credits by participating in “culturally enriching” events
- A certain NEXUS event in April drew 200 students
- The breakdown of attendees by class rank was as follows:

Class | Fresh | Soph | Jun | Sen |
---|---|---|---|---|
Observed Count | 62 | 27 | 33 | 80 |

At the time, the distribution of class rank for GC was known to be:

Class | Fresh | Soph | Jun | Sen |
---|---|---|---|---|
Percent | 30% | 25% | 25% | 20% |

…then one's chance of deciding to attend the event would not depend on class rank.

So in a group of 200 attendees we would expect to see about:

- \( 200 \times0.30=60 \) freshmen
- \( 200 \times 0.25 = 50 \) sophomores
- \( 200 \times 0.25 = 50 \) juniors
- \( 200 \times 0.20 = 40 \) seniors,

give or take some for chance variation in the process of deciding whether to attend.
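The expected counts listed above are just the event size times each population proportion. A minimal sketch in base R (the names `n_attendees` and `null_props` are illustrative, not from the original slides):

```r
# Illustrative names; the values come from the slides above.
n_attendees <- 200
null_props <- c(fresh = 0.30, soph = 0.25, jun = 0.25, sen = 0.20)

# Expected count for each class = total attendees * population proportion
expected <- n_attendees * null_props
expected
```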

Class | Fresh | Soph | Jun | Sen |
---|---|---|---|---|
Observed Count | 62 | 27 | 33 | 80 |
Expected Count | 60 | 50 | 50 | 40 |

Does this data provide strong evidence that the attendees were NOT like a random sample from the GC population?

- This time we KNOW the distribution of the factor variable **class rank**. The question is whether April NEXUS attendance is like random sampling.

\( H_0: \) Attendance at the event is like random sampling, as far as class rank is concerned

\( H_a: \) Decision to attend is based on class rank

Set up Null values and observed counts:

```
Nulls <- c(0.30,0.25,0.25,0.20)
obs <- c(fresh=62,soph=27,jun=33,sen=80)
```

The test:

```
chisqtestGC(obs,p=Nulls,verbose=FALSE)
```

```
Chi-Square Statistic = 55.8482
Degrees of Freedom of the table = 3
P-Value = 0
```
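The statistic in this output can be checked by hand as the sum of (observed - expected)^2 / expected over the four classes. The sketch below uses only base R and assumes the expected counts are `sum(obs)` times the null proportions, which is how base R's `chisq.test` computes them. Note that the observed counts as listed add up to 202 rather than 200, so these expected counts differ slightly from the rounded 60, 50, 50, 40.

```r
Nulls <- c(0.30, 0.25, 0.25, 0.20)
obs <- c(fresh = 62, soph = 27, jun = 33, sen = 80)

# Expected counts under the null, based on the total of the observed counts
expected <- sum(obs) * Nulls

# Chi-square statistic: sum of (observed - expected)^2 / expected
chisq_stat <- sum((obs - expected)^2 / expected)
round(chisq_stat, 4)

# Degrees of freedom: number of categories minus one
df <- length(obs) - 1

# Upper-tail probability: chance of a statistic at least this large under H_0
p_value <- pchisq(chisq_stat, df = df, lower.tail = FALSE)
p_value

# Cross-check against base R's built-in goodness-of-fit test
chisq.test(obs, p = Nulls)$statistic
```

The P-value computed this way is far below 0.05 but not literally zero; the test output rounds it.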

- \( P < 0.05 \), so we reject \( H_0 \).
- This data provides strong evidence that the decision to attend in April depends on class rank.
- (Graduating seniors are scrambling for NEXUS credits!)

- You are asked to roll a fair die 6000 times, and turn in a tally of the results
- No way are you going to do this
- You don't know that R could do it in a flash:

```
rmultinom(n=1,size=6000,
prob=rep(1/6,6))
```

- So you decide to make up the results

How to fill in the table?

Spots | One | Two | Three | Four | Five | Six |
---|---|---|---|---|---|---|
Made-Up Count | ? | ? | ? | ? | ? | ? |

- Counts must add to 6000
- Each count should be \( \approx 1000 \)
- give or take some for fake chance error!

You decide on:

Spots | One | Two | Three | Four | Five | Six |
---|---|---|---|---|---|---|
Made-Up Count | 1003 | 998 | 999 | 1002 | 1001 | 997 |

- You turn in these “results”
- Instructor thinks: “The observed counts agree with expected counts, very, very well.”

Instructor wonders:

*Is it reasonable to believe that the observed data are the result of 6000 actual rolls of a fair die?*

The hypotheses are:

\( H_0: \) Data are the result of 6000 real rolls

\( H_a: \) Data are faked

```
chisqtestGC(c(1003,998,999,1002,1001,997),
p=rep(1/6,6),
verbose=FALSE)
```

```
Chi-Square Statistic = 0.028
Degrees of Freedom of the table = 5
P-Value = 1
```

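- The \( \chi^2 \)-statistic really is \( 0.028 \)
- but the reported \( P \)-value is not the one we need. It is the chance of a statistic at least this large, which measures evidence of a poor fit. Here the worry is a fit that is too good, so we need the chance of a statistic this small or smaller: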

\[ P(\chi^2 \leq 0.028 \vert H_0 \text{ is true}) \]

This is \( 1 - P \), where \( P \) is the (unrounded) \( P \)-value behind the test output.

```
[1] 6.909e-06
```
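The lower-tail probability shown above can be reproduced in base R. The original command does not appear on the slide, so the following is only one plausible way to get it, using `pchisq`, which returns the lower-tail probability by default (the names `fake` and `lower_tail` are illustrative):

```r
fake <- c(1003, 998, 999, 1002, 1001, 997)
expected <- sum(fake) * rep(1/6, 6)   # 1000 per face

# Chi-square statistic: sum of (observed - expected)^2 / expected
chisq_stat <- sum((fake - expected)^2 / expected)
chisq_stat

# P(chi-square <= statistic) with 6 - 1 = 5 degrees of freedom
lower_tail <- pchisq(chisq_stat, df = 5)
signif(lower_tail, 4)
```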

- \( P \approx 0 \), so reject \( H_0 \).
- It seems the data was faked.
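
A simulation gives a feel for how unusual such close agreement is. The sketch below is an illustration rather than part of the original slides, and the seed and number of repetitions are arbitrary choices. It uses `rmultinom` to generate many tallies of 6000 genuine fair-die rolls and computes the \( \chi^2 \)-statistic for each.

```r
set.seed(1234)          # arbitrary seed, for reproducibility
n_sims <- 10000         # arbitrary number of simulated tallies

# Each column is one tally of 6000 rolls of a fair die
tallies <- rmultinom(n = n_sims, size = 6000, prob = rep(1/6, 6))

# Chi-square statistic for each simulated tally (expected count is 1000 per face)
sim_stats <- colSums((tallies - 1000)^2 / 1000)

summary(sim_stats)

# Proportion of real tallies that agree with expectation as well as the made-up one
mean(sim_stats <= 0.028)
```

Under \( H_0 \) the statistic averages about 5 (its degrees of freedom), so a value as small as \( 0.028 \) should almost never appear among the simulated tallies.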