Homer White, Georgetown College

Always remember to make sure the necessary packages are loaded:

```
require(mosaic)
require(tigerstats)
```

Are a majority of Georgetown College students female?

We'll investigate this question inferentially with the `m111survey` data.

Let

\( p = \) proportion of all GC students who are female

Our hypotheses are:

\( H_0: p=0.50 \)

\( H_a: p > 0.50 \) (one-sided, for variety!)

Note: 0.50 is \( p_0 \), the Null's belief about \( p \).

```
proptestGC(~sex,data=m111survey,
success="female",p=0.50,
alternative="greater",
graph=TRUE)
```

Note: We set the argument `p` to \( p_0 \), the Null's belief about \( p \).

```
female n estimated.prop
40 71 0.5634
```

- 40 successes \( \geq 10 \)
- 31 failures \( \geq 10 \)
- enough to trust the results of `proptestGC()`

Hopefully this was also a simple random sample.

So, we are safe to proceed.

```
Estimate of p: 0.5634
SE(p.hat): 0.05886
```

- Null expects \( \hat{p} \) to be about 0.50
- We actually got \( \hat{p} = 0.5634 \)
- This is 0.0634 more than what Null expected
- Not much more than one SE above what Null expected
- Our results don't seem so surprising, if \( H_0 \) is correct!
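The estimate and standard error above can be reproduced by hand. The following is a minimal base-R sketch, assuming the standard error is computed from the sample proportion as \( \sqrt{\hat{p}(1-\hat{p})/n} \), which agrees with the reported output; the variable names are illustrative and not part of `tigerstats`.

```r
# Counts taken from the proptestGC() output above
x  <- 40     # number of females (successes)
n  <- 71     # sample size
p0 <- 0.50   # the Null's belief about p

p_hat    <- x / n                             # sample proportion
se_p_hat <- sqrt(p_hat * (1 - p_hat) / n)     # assumed SE formula

# How many SEs is p_hat above what the Null expected?
ses_above <- (p_hat - p0) / se_p_hat

round(c(p_hat = p_hat, se = se_p_hat, ses_above = ses_above), 4)
```

The last value shows that the observed proportion sits only a little more than one SE above 0.50.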

```
Test Statistic: z = 0.9571
```

\[ z \approx \frac{\hat{p}-p_0}{SE(\hat{p})}=\frac{\textbf{Observed Difference}}{\textbf{Standard Error}} \]

- “\( z \)-score style” (yet again!)
- Tells you how many SEs \( \hat{p} \) is above or below what \( H_0 \) expected
- Well, *about* how many (“Continuity correction”)

```
P-value: P = 0.1692
```

- \( z \) is a random variable
- \( z \approx norm(0,1) \)

\( P \)-value is:

\[ P(z \geq 0.9571 \vert H_0 \text{ is true}) \]

```
[1] 0.1692
```
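As a rough check, the same area under the standard normal curve can be found with base R's `pnorm()`. This sketch simply plugs in the reported test statistic; it is not necessarily the call that produced the output above.

```r
# Test statistic reported by proptestGC()
z <- 0.9571

# Area under the norm(0,1) curve to the right of z,
# i.e. P(Z >= z) for a one-sided "greater" alternative
p_value <- pnorm(z, mean = 0, sd = 1, lower.tail = FALSE)

round(p_value, 4)
```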

If 50% of all GC students are female, then there is about a 16.92% chance of getting a test statistic at least as large as the one we actually got.

Our results are not so unlikely, if the Null is correct!

Since \( P > 0.05 \), we do not reject \( H_0 \).

This data did not provide strong evidence that a majority of Georgetown College students are female.

- Let \( X = \) count of successes.
- If \( H_0 \) is true, then \( X \sim binom(n,p_0) \)
- When \( n \) is “big enough”, \( X \approx \) normal
- So \( \hat{p}=X/n \) is also \( \approx \) normal

So then

\[ z \approx \frac{\hat{p}-p_0}{SE(\hat{p})} \]

is \( \approx norm(0,1) \). We get a \( P \)-value from this.
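To see how close the normal curve is to the binomial in this study, one can compare the two directly. This is an illustrative sketch in base R for \( n = 71 \) and \( p_0 = 0.5 \); the range of counts examined (25 to 46) is an arbitrary choice made for the example.

```r
n  <- 71
p0 <- 0.50

# Mean and SD of X ~ binom(n, p0) when the Null is true
mu    <- n * p0
sigma <- sqrt(n * p0 * (1 - p0))

# Exact binomial probabilities vs. heights of the matching normal curve
k             <- 25:46
exact         <- dbinom(k, size = n, prob = p0)
normal_approx <- dnorm(k, mean = mu, sd = sigma)

# Largest disagreement between the two over this range
max_gap <- max(abs(exact - normal_approx))

round(data.frame(k, exact, normal_approx), 4)
max_gap
```

With a sample this size the two columns agree closely, which is why a normal-based \( P \)-value is reasonable here.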

To “catch” all of the rectangle centered at 40, `proptestGC()` says there were 39.5 successes.

Then it uses \( 39.5/71 \) instead of \( 40/71 \) for \( \hat{p} \).

Without the correction, the test statistic would be:

\[ z=\frac{40/71-0.5}{SE(\hat{p})}=1.0768 \]

\( P \)-value would be:

\[ P(z \geq 1.0768 \vert H_0 \text{ is true})=0.1408 \]

With the correction, the test statistic is:

\[ z=\frac{39.5/71-0.5}{SE(\hat{p})}=0.9571 \]

\( P \)-value is:

\[ P(z \geq 0.9571 \vert H_0 \text{ is true})=0.1692 \]

- Without correction, \( P \)-value \( \approx 14.08\% \)
- With correction, \( P \)-value \( \approx 16.92\% \)
- The difference is nearly 3 percentage points!

The correction is important at smaller sample sizes.

(For very large \( n \), it does not make much difference.)
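The two versions of the calculation can be put side by side. This is a sketch in base R, assuming (as the reported numbers suggest) that the standard error is computed from the uncorrected \( \hat{p} = 40/71 \) in both cases; the variable names are illustrative.

```r
x  <- 40
n  <- 71
p0 <- 0.50

# SE based on the uncorrected sample proportion (assumption)
se <- sqrt((x / n) * (1 - x / n) / n)

# Test statistic without and with the continuity correction
z_plain     <- (x / n - p0) / se
z_corrected <- ((x - 0.5) / n - p0) / se

# One-sided ("greater") P-values from the standard normal curve
p_plain     <- pnorm(z_plain, lower.tail = FALSE)
p_corrected <- pnorm(z_corrected, lower.tail = FALSE)

round(c(z_plain = z_plain, p_plain = p_plain,
        z_corrected = z_corrected, p_corrected = p_corrected), 4)
```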

- Let \( X = \) count of successes.
- If \( H_0 \) is true, then \( X \sim binom(n,p_0) \), *exactly*!

So why not base the \( P \)-value directly on the \( binom(n,p_0) \) distribution?

Are a majority of Georgetown College students female?

Let

\( p = \) proportion of all GC students who are female

Our hypotheses are:

\( H_0: p=0.50 \)

\( H_a: p > 0.50 \)

```
binomtestGC(~sex,data=m111survey,
success="female",
p=0.5,alternative="greater")
```

- \( \hat{p}=40/71=0.5634 \)
- \( SE(\hat{p})=0.0589 \)
- If Null is correct, number of successes \( X \sim binom(71,0.5) \)
- No test statistic given!
- (But you can think of the 40 successes as the test statistic)
- \( P \)-value is \( P(X \geq 40 \vert p=0.5) \)

```
[1] 0.1712
```

- \( P = 0.1712 \)
- If only 50% of all GC students are female, then there is about a 17.12% chance of getting 40 or more females in a sample of 71, as we did in this study.
- Our results are not so unlikely, if \( H_0 \) is true.
- Since \( P > 0.05 \), we do not reject \( H_0 \).
- This study did not provide strong evidence that a majority of GC students are female.
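The exact \( P \)-value \( P(X \geq 40 \vert p=0.5) \) can also be computed directly from the binomial distribution. This is a minimal sketch using base R's `pbinom()` and `dbinom()`; it is offered as a cross-check, not as a description of how `binomtestGC()` works internally.

```r
n  <- 71
p0 <- 0.50
x  <- 40

# P(X >= 40) is the same as P(X > 39), so ask for the upper tail above 39
p_upper <- pbinom(x - 1, size = n, prob = p0, lower.tail = FALSE)

# Same probability, adding up the chances of 40, 41, ..., 71 successes
p_sum <- sum(dbinom(x:n, size = n, prob = p0))

round(c(p_upper = p_upper, p_sum = p_sum), 4)
```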

A random sample of 2500 registered voters shows that 1325 of them favor the Affordable Care Act.

Do a majority of all registered voters in the U.S. favor the Act?

Let

\( p = \) proportion of all registered voters in the U.S. who favor the Act

Our hypotheses are:

\( H_0: p=0.50 \)

\( H_a: p \neq 0.50 \) (two-sided)

We have summary data.

- argument `x` is set to the number of successes
- argument `n` is set to the sample size

```
binomtestGC(x=1325,n=2500,p=0.50)
```

- We had 1325 voters approve of the Act (this is like a test statistic).
- \( \hat{p}=1325/2500=0.53 \)
- \( SE(\hat{p}) \approx 0.01 \)
- Our results are about 3 SEs above what \( H_0 \) expected.
- \( P = 0.0029 \)
- Since \( P < 0.05 \), we reject \( H_0 \).
- We have strong evidence that a majority of registered voters in the U.S. favor the Act.
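The numbers in this list can be checked by hand from the summary data. The sketch below uses base R and makes two assumptions: the standard error is \( \sqrt{\hat{p}(1-\hat{p})/n} \), and the two-sided \( P \)-value is found by doubling the upper tail, which is valid here because \( binom(2500, 0.5) \) is symmetric.

```r
x  <- 1325   # voters in the sample who favor the Act
n  <- 2500   # sample size
p0 <- 0.50   # the Null's belief about p

p_hat     <- x / n
se        <- sqrt(p_hat * (1 - p_hat) / n)   # assumed SE formula
ses_above <- (p_hat - p0) / se               # roughly 3

# Two-sided exact P-value: double P(X >= 1325) under binom(2500, 0.5)
p_two_sided <- 2 * pbinom(x - 1, size = n, prob = p0, lower.tail = FALSE)

round(c(p_hat = p_hat, se = se, ses_above = ses_above,
        p_two_sided = p_two_sided), 4)
```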

```
xtabs(~cappun,data=gss02)
```

```
cappun
Favor Oppose
899 409
```

899 out of 1308 (68.7%) approved of the death penalty.

```
xtabs(~cappun,data=gss2012)
```

```
cappun
FAVOR OPPOSE
1183 641
```

1183 out of 1824 (64.8%) approved.

In the U.S. population, did support for capital punishment decline between 2002 and 2012?

**Notice**: This is a question about the relationship between two factor variables:

- Explanatory variable is the **year under study** (2002, 2012).
- Response variable is **opinion about capital punishment** (favor, oppose).
- Both variables have two values, so we are interested in \( p_1-p_2 \).

Let

\( p_1 \) = the proportion of all U.S. adults in 2002 who support capital punishment

\( p_2 \) = the proportion of all U.S. adults in 2012 who support capital punishment

Then our hypotheses are:

\( H_0 \): \( p_1 - p_2 = 0 \)

\( H_a \): \( p_1 - p_2 \neq 0 \)

We have summary data:

Year | \( x \) | \( n \) |
---|---|---|
year 2002 | 899 | 1308 |
year 2012 | 1183 | 1824 |

```
proptestGC(x=c(899,1183),
n=c(1308,1824),
p=0)
```

```
Estimate of p1-p2: 0.03873
SE(p1.hat - p2.hat): 0.01701
```

- \( H_0 \) expected \( \hat{p}_1-\hat{p}_2 \) to be about 0
- We actually got \( \hat{p}_1-\hat{p}_2 = 0.03873 \)
- This is a bit more than two SEs above what \( H_0 \) expected

95%-confidence interval for \( p_1-p_2 \) is

```
lower upper
0.005399 0.072069
```

Observe that 0 (Null's belief about \( p_1-p_2 \)) is below this interval.
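The estimate, standard error and interval can be reproduced from the summary data. This is a minimal base-R sketch, assuming the unpooled standard error \( \sqrt{\hat{p}_1(1-\hat{p}_1)/n_1+\hat{p}_2(1-\hat{p}_2)/n_2} \) and a normal multiplier for 95% confidence; both assumptions agree with the reported output.

```r
x <- c(899, 1183)    # number in favor: 2002, 2012
n <- c(1308, 1824)   # sample sizes:    2002, 2012

p_hat    <- x / n
diff_hat <- p_hat[1] - p_hat[2]                 # estimate of p1 - p2
se_diff  <- sqrt(sum(p_hat * (1 - p_hat) / n))  # unpooled SE (assumption)

# 95% confidence interval: estimate plus or minus about 1.96 SEs
z_star <- qnorm(0.975)
ci     <- diff_hat + c(lower = -1, upper = 1) * z_star * se_diff

round(c(estimate = diff_hat, se = se_diff), 5)
round(ci, 6)
```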

```
Test Statistic: z = 2.277
```

\[ z=\frac{\hat{p}_1-\hat{p}_2}{SE(\hat{p}_1-\hat{p}_2)}=\frac{\textbf{Observed Difference}}{\textbf{Standard Error}} \]

- “\( z \)-score style” (yet again!)
- Tells you how many SEs \( \hat{p}_1-\hat{p}_2 \) is above or below what \( H_0 \) expected

- As a random variable \( \hat{p}_1-\hat{p}_2 \approx \) normal
- If \( H_0 \) is right, then \( EV(\hat{p}_1-\hat{p}_2)=0 \)
- Therefore:

\[ z=\frac{\hat{p}_1-\hat{p}_2}{SE(\hat{p}_1-\hat{p}_2)} \approx \frac{\hat{p}_1-\hat{p}_2-0}{SD(\hat{p}_1-\hat{p}_2)} \approx norm(0,1) \]

We use this approximation for the \( P \)-value.

```
P-value: P = 0.0228
```

If support for the death penalty remained constant from 2002 to 2012, then there is only about a 2.28% chance of getting a test statistic at least as far from 0 as the one we actually got.

- Since \( P < 0.05 \), we will reject \( H_0 \).
- The GSS data provided strong evidence that support for the death penalty declined a bit between 2002 and 2012.
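The test statistic and two-sided \( P \)-value for the capital punishment data can be checked in base R. As before, this sketch assumes the unpooled standard error, which matches the reported \( z = 2.277 \); the variable names are illustrative.

```r
x <- c(899, 1183)    # number in favor: 2002, 2012
n <- c(1308, 1824)   # sample sizes:    2002, 2012

p_hat   <- x / n
se_diff <- sqrt(sum(p_hat * (1 - p_hat) / n))   # unpooled SE (assumption)

# Observed difference, minus the Null's value of 0, in SE units
z <- (p_hat[1] - p_hat[2] - 0) / se_diff

# Two-sided P-value: area in both tails of norm(0,1) beyond |z|
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

round(c(z = z, p_value = p_value), 4)
```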

At Georgetown College, who is more likely to believe in love at first sight: a female or a male?

```
SexLove <- xtabs(~sex+love_first,
data=m111survey)
SexLove
```

```
love_first
sex no yes
female 22 18
male 23 8
```

```
rowPerc(SexLove)
```

```
no yes Total
female 55.00 45.00 100
male 74.19 25.81 100
```

Let

\( p_1 \) = the proportion of all GC females who believe in love at first sight

\( p_2 \) = the proportion of all GC males who believe in love at first sight

Then our hypotheses are:

\( H_0 \): \( p_1 - p_2 = 0 \)

\( H_a \): \( p_1 - p_2 \neq 0 \)

```
proptestGC(~sex+love_first,data=m111survey,
success="yes",p=0)
```

Note same old formula for studying relationship between two factor variables:

\[ \sim Explanatory + Response \]

```
## yes n estimated.prop
## female 18 40 0.4500
## male 8 31 0.2581
```

Notice:

- fewer than 10 successes for the males.
- `proptestGC()` delivers a warning
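The "at least 10 successes and 10 failures" check can be carried out on the table of counts. This is an illustrative sketch in base R; the counts are typed in from the `SexLove` table shown earlier, and the threshold of 10 is the rule of thumb used in these slides.

```r
# Counts from the SexLove table (rows: sex, columns: love_first)
counts <- matrix(c(22, 18,
                   23,  8),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(sex = c("female", "male"),
                                 love_first = c("no", "yes")))

# Flag any cell (successes or failures, in either group) below 10
too_small <- counts < 10
any_small <- any(too_small)

too_small
any_small
```

Only the male "yes" cell falls short, which is the cell that triggers the warning.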

We are dealing with two factor variables, so could use:

```
chisqtestGC(~sex+love_first,data=m111survey,
simulate.p.value="random",
B=3000,graph=TRUE)
```

- this gives us a test of significance
- but (sadly) no confidence interval for \( p_1-p_2 \)

If you want to be sure that a certain group comes first, then use the argument `first`. For example:

```
proptestGC(~sex+love_first,data=m111survey,
first="male",
success="yes",p=0,
simulate.p.value="random",B=3000,
graph=TRUE)
```

To cut down on the printed output, just set `verbose` to `FALSE`:

```
proptestGC(~sex+love_first,data=m111survey,
first="male",
success="yes",p=0,
simulate.p.value="random",B=3000,
verbose=FALSE)
```