Homer White, Georgetown College

Always remember to make sure the necessary packages are loaded:

```
require(mosaic)
require(tigerstats)
```

Confidence intervals answer this question:

Given the data,

within what range of valuesmight the parameter reasonably lie?

Tests of significance answer this question:

Given the data, is it reasonable to believe that the parameter

is a particular given value?

```
data(m111survey)
View(m111survey)
help(m111survey)
```

Focus on variable **fastest**.

Define

\( \mu = \) mean fastest speed ever driven, for all GC students

A *one-sided* test:

\( H_0: \mu=100 \)

\( H_a: \mu >100 \)

\( \mu_0 \) is what the Null Hypothesis claims \( \mu \) is.

In our example,

\[ \mu_0=100. \]

Need to set

- argument
`mu`

to \( \mu_0 \) (100, this time) - argument
`alternative`

to “greater”

```
ttestGC(~fastest,data=m111survey,
mu=100,
alternative="greater")
```

```
variable mean sd n
fastest 105.9 20.88 71
```

We see:

- \( \bar{x}=105.9 \), higher than \( H_0 \) expected
- sample size \( n=71 \), big enough to trust
`ttestGC()`

```
95% Confidence Interval for mu:
lower upper
101.771329 Inf
```

- One-sided interval (goes with one-sided test)
- Hmm, interval does not contain \( \mu_0=100 \):
- looking bad for the Null!

```
Estimate of mu: 105.9
SE(x.bar): 2.478
```

Observe:

- Null expects \( \bar{x} \approx \mu_0=100 \), give or take some for chance error
- But we got \( \bar{x}=105.9 \)
- That more than two SEs above what \( H_0 \) expected:
- looking bad for \( H_0 \)!

```
Test Statistic: t = 2.382
```

The test statistic \( t \) is defined as:

\[ t=\frac{\bar{x}-\mu_0}{SE(\bar{x})}=\frac{\textbf{Observed Difference}}{\textbf{Standard Error}} \]

- “\( z \)-score style”
- says how many SEs \( \bar{x} \) is above or below what \( H_0 \) expected it to be!

```
Test Statistic: t = 2.382
```

This time it worked out to be:

\[ t=\frac{\bar{x}-\mu_0}{SE(\bar{x})}=\frac{105.9-100}{2.478}=2.382. \]

```
Degrees of Freedom: 70
P-value: P = 0.009974
```

- \( t \) is a random variable
- \( t \approx t(70) \)

\[ \textbf{P-value}=P(t \geq 2.382 \vert H_0 \text{ is true}) \approx 0.009974 \]

```
ptGC(bound=2.382,region="above",df=70,
graph=TRUE)
```

```
[1] 0.00997
```

```
ttestGC(~fastest,data=m111survey,
mu=100,
alternative="greater",
graph=TRUE)
```

\( P \approx 0.009974 \approx 0.01 = 1\% \), so we say:

“If the Null is correct, then there is only about 1% chance of getting a test statistic at least as big as the one we got in our study. ”

Either:

- \( H_0 \) is correct, and we got a rather unusual sample, or
- \( H_0 \) is not correct, and \( \mu > 100 \).

Second option seems more reasonable!

(We define the parameter of interest, and state our hypotheses.)

Let

\( \mu = \) mean fastest speed ever driven, for all GC students

Our hypotheses are:

\( H_0: \mu=100 \)

\( H_a: \mu >100 \)

We took a simple random sample from the population, and the sample size \( n \) was 71, which is large enough to trust the results of the test.

The test statistic was:

\[ t=2.382. \]

Our sample mean was 2.382 standard errors above what the Null Hypothesis expected it to be.

The P-value was 0.009974.

If the Null is correct, then there is only about 1% chance of getting a test statistic at least as big as the one we got in our study.

Since \( P < 0.05 \), we reject \( H_0 \).

This data provided strong evidence that the mean fastest speed driven at GC is more than 100 mph.

Research Question:The mean GPA of all UK students is known to be 3.3. Does the`mat111survey`

data provide strong evidence that the mean GPA of all GC students is lower?

Let

\( \mu \) = the mean GPA of all GC students.

Then our hypotheses are:

\( H_0 \): \( \mu=3.3 \)

\( H_a \): \( \mu < 3.3 \)

```
ttestGC(~GPA,data=m111survey,
mu=3.3,alternative="less",
graph=TRUE)
```

```
Test Statistic: t = -1.733
Degrees of Freedom: 69
P-value: P = 0.04382
```

This time,

\[ \textbf{P-value} = P(t \leq -1.733 \vert H_0 \text{ is true})=0.04382. \]

```
[1] 0.04378
```

If the Null is correct, then there is only about 4.38% chance of getting a test statistic less than or equal to the one we got in our study.

Let

\( \mu \) = the mean GPA of all GC students.

Then our hypotheses are:

\( H_0 \): \( \mu=3.3 \)

\( H_a \): \( \mu \neq 3.3 \)

```
ttestGC(~GPA,data=m111survey,
mu=3.3,alternative="two.sided",
graph=TRUE)
```

```
Test Statistic: t = -1.733
Degrees of Freedom: 69
P-value: P = 0.08763
```

*t* is just the same as for the one-sided test.

But this time, \( P \)-value is

\[ P(t \leq -1.733 \textbf{ or } t \geq 1.733 \vert H_0 \text{ is true}) \\ =0.08763, \]

twice as big as before!

```
[1] 0.08756
```

If the Null is correct, then there is about an 8.6% chance of getting a test statistic at least as far from 0 as the one we got in our study.

(Given our usual cut-off of \( \alpha=0.05 \), we would not reject \( H_0 \).)

- If you have
*prior*evidence that the \( \mu \) differs from \( \mu_0 \) in a particular direction, then you may set up a one-sided test for that direction. - If not, just perform a two-sided test.

Does the data provide strong evidence that GC males drive faster, on average, than GC females do?

Let

\( \mu_1 \) = the mean fastest speed of all GC females.

\( \mu_2 \) = the mean fastest speed of all GC males.

Hypotheses are:

\( H_0 \): \( \mu_1-\mu_2 = 0 \)

\( H_a \): \( \mu_1-\mu_2 \neq 0 \)

```
ttestGC(fastest~sex,data=m111survey,
mu=0,alternative="two.sided",
graph=TRUE)
```

Note:

- argument still
`mu`

(even though two means) - don't really need to set alternative=“two.sided”:
- two-sided is the default

```
group mean sd n
female 100.0 17.61 40
male 113.5 22.57 31
```

Both sample sizes are “big enough”, so safe to proceed.

```
Estimate of mu1-mu2: -13.4
SE(x1.bar - x2.bar): 4.918
```

\( \bar{x}_1-\bar{x}_2 \) is nearly three SEs below what \( H_0 \) expected it to be.

This looks bad for \( H_0 \).

The 95% confidence interval for \( \mu_1-\mu_2 \) is:

```
lower upper
-23.254640 -3.548586
```

This does not contain 0 (the Null's belief).

Again, looks bad for \( H_0 \).

```
Test Statistic: t = -2.725
```

The test statistic \( t \) is defined as:

\[ t=\frac{\bar{x}-\bar{x}_2}{SE(\bar{x}-\bar{x}_2)}=\frac{\textbf{Observed Difference}}{\textbf{Standard Error}} \]

- “\( z \)-score style”
- says how many SEs \( \bar{x}_1-\bar{x}_2 \) is above or below what \( H_0 \) expected it to be!

```
Test Statistic: t = -2.725
```

This time it worked out as:

\[ t=\frac{\bar{x}-\bar{x}_2}{SE(\bar{x}-\bar{x}_2)}=\frac{100-113.5}{4.918}=-2.725 \]

The observed difference between the sample means was 2.725 standard errors below what \( H_0 \) was expecting.

```
Degrees of Freedom: 55.49
P-value: P = 0.008579
```

- \( t \) is a random variable
- if \( H_0 \) is true, then \( t \approx t(55.49) \)

\[ \textbf{P-value}\\ =P(t \leq -2.725 \text{ or } t \geq 2.725\vert H_0 \text{ is true}) \\ \approx 0.008579 \]

```
[1] 0.008585
```

\( P < 0.05 \), so we will reject \( H_0 \).

This data provided strong evidence that GC males drive faster than GC females, on average.

What if you had defined:

\( \mu_1 \) = the mean fastest speed of all GC males.

\( \mu_2 \) = the mean fastest speed of all GC females.

For *you*, GC males are the “first” population.

Problem:

How do you get

`ttestGC()`

to recognize this?

Solution:

Set the

`first`

argument to “male”.

```
ttestGC(fastest~sex,data=m111survey,
mu=0,alternative="two.sided",
first="male",graph=TRUE)
```

```
group mean sd n
male 113.5 22.57 31
female 100.0 17.61 40
```

- Suppose the null is correct.
- Then getting a test statistic as big as the one we got is very unlikely.
- Therefore, it's not reasonable to believe the Null.

But if the \( P \)-value is not very small, then we can't make the move from Step 1 to Step 2.

- Suppose the null is correct.
- Then getting a test statistic as big as the one we got is not unlikely.
- Therefore … ??

- No definite conclusion can be drawn
- We don't have strong evidence against \( H_0 \)
- But we don't have evidence
*for*it, either.

The Null Hypothesis claims the parameter is a *specific value*, for example

\( H_0: \mu = 100 \)

- A low \( P \)-value provides evidence
*against*\( H_0 \)- but does not absolutely prove it is false

- A high \( P \)-value does NOT provide evidence
*for*\( H_0 \)- and it certainly does not prove that it is true!

```
data(labels)
View(labels)
help(labels)
```

Research Question:

Will people rate peanut butter more highly, on average, if it comes with a Jiff jar than if it comes in a Great Value jar?

Let

\( \mu_d = \) mean difference in rating (Jiff minus Great Value) for all Georgetown College students

The hypotheses are:

\( H_0: \mu_d = 0 \)

\( H_a: \mu_d \neq 0 \)

```
ttestGC(~jiffrating-greatvaluerating,
data=labels, mu=0)
```

- We hope that the 30 subjects were like a simple random sample from the population.
- Sample size \( n \) is 30. Let's go ahead and check the sample for:
- signs of strong skewness
- extreme outliers

```
diff <- labels$jiffrating - labels$greatvaluerating
favstats(diff)
```

```
min Q1 median Q3 max mean sd n missing
-5 1 2.5 4 8 2.367 2.81 30 0
```

```
histogram(~diff,
xlab="difference in ratings (jiff - gv)",
main="Jiff vs. Great Value",
type="count",
breaks=seq(from=-5.5,to=8.5,by=1))
```

- integer values (not continuous)
- some left-skewness
- maybe an outlier or two at smaller values

These could be a problem at smaller sample sizes, but probably OK here.

```
Estimate of mu-d: 2.367
SE(d.bar): 0.513
```

- \( H_0 \) expects \( \bar{d} \) to be about 0, give or take some for chance error
- We actually got \( \bar{d}=2.367 \).
- This is more than 4 SEs above what \( H_0 \) expected.
- Looks bad for \( H_0 \).

95%-confidence interval for \( \mu_d \) is:

```
lower upper
1.317442 3.415892
```

- Note that 0 (the Null's belief about \( \mu_d \)) is outside the interval
- Again: looking bad for \( H_0 \)!

```
Test Statistic: t = 4.613
```

\[ t=\frac{\bar{d}-0}{SE(\bar{d})}=\frac{\textbf{Observed Difference}}{\textbf{Standard Error}} \]

- “\( z \)-score style” (again!)
- says how many SEs \( \bar{d} \) is above or below what \( H_0 \) expected it to be!

```
Test Statistic: t = 4.613
```

\[ t=\frac{\bar{d}-0}{SE(\bar{d})}=\frac{2.387-0}{0.513}=4.613 \]

The sample mean of differences was 4.613 standard errors above what \( H_0 \) was expecting it to be!

```
Degrees of Freedom: 29
P-value: P = 7.421e-05
```

- \( t \) is a random variable
- if \( H_0 \) is true, then \( t \approx t(29) \)

\[ \textbf{P-value}\\ =P(t \leq -4.613 \text{ or } t \geq 4.613\vert H_0 \text{ is true}) \\ \approx 7.42 \times 10^{-5} \]

Quick Math Review:

- \( 10^{-5}=1/10^5 \)
- \( 10^5 = 100,000 \)
- So \( 10^{-5}=1/100,000 \), which is “one in 100,000”
- So \( 7.42 \times 10^{-5} \) is about 7 in 100,000

So we can say:

If labels make no difference, then there is only about a 7 in 100,000 chance of getting a test statistic that is at least as far from 0 as the one we actually got!

This looks really bad for \( H_0 \).

\( P < 0.05 \), so we will reject \( H_0 \).

This data provided strong evidence that, for GC students, jar-labels affect perception of the quality of peanut butter.

Sometimes you don't want quite so much output to the console. To limit output:

set argument

`verbose`

to`FALSE`

.

```
ttestGC(fastest~sex,data=m111survey,
mu=0,alternative="two.sided",
verbose=FALSE)
```