Homer White, Georgetown College

Always remember to make sure the necessary packages are loaded:

```
require(mosaic)
require(tigerstats)
```

You are not a god-like being. You are a mere mortal, a statistician, who has just taken a simple random sample of size \( n \) from a population.

The following is all true about you:

- You don't know population mean \( \mu \).
- You don't know population SD \( \sigma \).
- You DO know what \( \bar{x} \) is.
- You DO know that, BEFORE you took the sample, \( \bar{x} \) was liable to turn out to be within one SD or so of its EV \( \mu \).

So it is reasonable to suppose that, for the sample you have just taken, \( \bar{x} \) is within an SD or so of \( \mu \).

In other words:

The population mean \( \mu \) is liable to be within an SD or so of your sample mean \( \bar{x} \).
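A quick simulation can make this concrete. The sketch below uses only base R and assumes a hypothetical normal population with \( \mu = 60 \) and \( \sigma = 3 \). These values are chosen purely for illustration, since in real life you would not know them. It draws many samples of size 36 and records how often \( \bar{x} \) lands within one SD of \( \mu \). That is also how often \( \mu \) lands within one SD of \( \bar{x} \).

```r
# Hypothetical population values, for illustration only
mu <- 60
sigma <- 3
n <- 36
sd_xbar <- sigma / sqrt(n)   # SD of the sample mean: 3 / 6 = 0.5

set.seed(2024)
# Draw 10,000 samples of size n and compute each sample mean
xbars <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))

# Proportion of sample means within one SD of mu
# (equivalently, proportion of samples where mu is within one SD of xbar)
within_one_sd <- mean(abs(xbars - mu) < sd_xbar)
within_one_sd
```

The proportion should come out near 0.68. The exact figure depends on the seed.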

The only problem is that:

\[ SD(\bar{x})=\frac{\sigma}{\sqrt{n}} \]

- You know your sample size \( n \),
- but you do NOT know population SD \( \sigma \)!

You can *estimate* \( \sigma \) with \( s \), the SD of the *sample*.

This leads to the so-called *standard error* of \( \bar{x} \):

\[ SE(\bar{x})=\frac{s}{\sqrt{n}}. \]
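Here is a minimal sketch of this computation in base R. The function name `se_mean` is a hypothetical helper, not part of mosaic or tigerstats. It relies on base R's `sd()`, which divides by \( n-1 \) and so returns the sample SD \( s \). The data vector is made up for illustration.

```r
# Hypothetical helper: standard error of the sample mean, s / sqrt(n)
se_mean <- function(x) {
  x <- x[!is.na(x)]          # drop missing values before counting n
  sd(x) / sqrt(length(x))    # sd() gives the sample SD s (divisor n - 1)
}

# Made-up sample, for illustration only
x <- c(4, 8, 6, 5, 7)
se_mean(x)   # about 0.7071
```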

For example:

- You take a sample of size \( n=36 \) from a population with unknown mean \( \mu \).
- From your sample, you compute \( \bar{x}=60 \).
- From your sample, you compute \( s=3 \).

You can then estimate:

\[ SD(\bar{x}) \approx SE(\bar{x}) = \frac{3}{\sqrt{36}}=0.5. \]

So you figure that \( \mu \) is liable to be within 0.5 or so of 60, your sample mean.
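The same arithmetic in R, using only the numbers from this example:

```r
# Numbers from the example above
n <- 36
xbar <- 60
s <- 3

se <- s / sqrt(n)   # 3 / 6 = 0.5
se
```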

Your sample size was \( n=36 \ge 30 \), so \( n \) is “big enough” for the Central Limit Theorem to apply: before you took the sample, \( \bar{x} \), as a random variable, was approximately normal!
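The sketch below checks the “big enough” rule of thumb against a deliberately skewed population. It assumes an exponential population with mean 60, a choice made only for illustration. For an exponential distribution the SD equals the mean, so \( \sigma = 60 \) too. With \( n = 36 \), the share of sample means within one SD of \( \mu \) should still be close to the normal-theory 68%. The code uses base R only.

```r
# Hypothetical skewed population: exponential with mean 60 (so its SD is also 60)
mu <- 60
sigma <- 60
n <- 36
sd_xbar <- sigma / sqrt(n)   # 60 / 6 = 10

set.seed(1)
# 10,000 sample means, each from a sample of size 36
xbars <- replicate(10000, mean(rexp(n, rate = 1 / mu)))

# Despite the skewed population, this should be near 0.68
prop_within <- mean(abs(xbars - mu) < sd_xbar)
prop_within
```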

Since there was about a 68% chance \( \bar{x} \) would be within one SD of \( \mu \), you can go out on a limb and say:

“I am about 68%-confident that \( \mu \) is within one SE of my \( \bar{x} \).”

In other words:

“I am about 68%-confident that \( \mu \) is between 59.5 and 60.5.”

If the probability distribution of an estimator is approximately normal, then:

- we can be about 68%-confident that the parameter is within one SE of the estimator
- we can be about 95%-confident that the parameter is within two SEs of the estimator
- we can be about 99.7%-confident that the parameter is within three SEs of the estimator
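These three percentages come straight from the normal curve. Base R's `pnorm()` (the normal CDF) reproduces them. `qnorm()` shows that the exact multiplier for 95% is a bit less than 2.

```r
# Chance that a normal random variable lands within k SDs of its mean
k <- 1:3
round(pnorm(k) - pnorm(-k), 4)   # 0.6827 0.9545 0.9973

# Exact number of SDs that captures the middle 95%
round(qnorm(0.975), 2)           # 1.96
```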

Back to the previous example:

- You take a sample of size \( n=36 \) from a population with unknown mean \( \mu \).
- From your sample, you compute \( \bar{x}=60 \).
- From your sample, you compute \( s=3 \).
- You compute \( SE(\bar{x})=0.5 \).

You are about 68%-confident that

\[ \begin{aligned} \bar{x} - SE(\bar{x}) &< \mu < \bar{x} + SE(\bar{x}), \\ 59.5 &< \mu < 60.5 \end{aligned} \]

This is a rough 68%-confidence interval for \( \mu \).

You are about 95%-confident that

\[ \begin{aligned} \bar{x} - 2SE(\bar{x}) &< \mu < \bar{x} + 2SE(\bar{x}), \\ 59.0 &< \mu < 61.0 \end{aligned} \]

This is a rough 95%-confidence interval for \( \mu \).
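Because this interval plugs in \( s \) for \( \sigma \), it is fair to ask whether “about 95%” still holds. The sketch below assumes a hypothetical normal population with \( \mu = 60 \) and \( \sigma = 3 \), chosen only for illustration. For each of many samples of size 36 it builds the rough interval \( \bar{x} \pm 2\,SE(\bar{x}) \) and counts how often that interval contains \( \mu \). It uses base R only.

```r
# Hypothetical population, for illustration only
mu <- 60
sigma <- 3
n <- 36

set.seed(42)
covers <- replicate(10000, {
  x <- rnorm(n, mean = mu, sd = sigma)
  se <- sd(x) / sqrt(n)                  # SE uses the sample SD s, not sigma
  (mean(x) - 2 * se < mu) && (mu < mean(x) + 2 * se)
})

# Proportion of rough 95% intervals that contain mu
coverage <- mean(covers)
coverage
```

The coverage should come out close to 95%, which is what “about 95%-confident” promises.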

You are about 99.7%-confident that

\[ \begin{aligned} \bar{x} - 3SE(\bar{x}) &< \mu < \bar{x} + 3SE(\bar{x}), \\ 58.5 &< \mu < 61.5 \end{aligned} \]

This is a rough 99.7%-confidence interval for \( \mu \).
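All three rough intervals can be computed at once in base R from the example's numbers. The multiplier `k` is the number of SEs.

```r
# Example values from above
xbar <- 60
s <- 3
n <- 36
se <- s / sqrt(n)    # 0.5

k <- 1:3             # 1, 2, 3 SEs: about 68%, 95%, 99.7% confidence
intervals <- data.frame(
  confidence = c("about 68%", "about 95%", "about 99.7%"),
  lower = xbar - k * se,
  upper = xbar + k * se
)
intervals
```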

In general, to get the SE of an estimator, plug in estimates for the population parameters wherever they occur in the formula for its SD:

Estimator | SD | SE |
---|---|---|
\( \bar{x} \) | \( \frac{\sigma}{\sqrt{n}} \) | \( \frac{s}{\sqrt{n}} \) |
\( \bar{x}_1-\bar{x}_2 \) | \( \sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}} \) | \( \sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}} \) |
\( \hat{p} \) | \( \sqrt{\frac{p(1-p)}{n}} \) | \( \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \) |
\( \hat{p}_1-\hat{p}_2 \) | \( \sqrt{\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}} \) | \( \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \) |
\( \bar{d} \) | \( \frac{\sigma_d}{\sqrt{n}} \) | \( \frac{s_d}{\sqrt{n}} \) |
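The plug-in rule in the table translates directly into code. The function names below are hypothetical helpers written in base R for illustration. They are not functions from mosaic or tigerstats. Each takes summary statistics (sample SDs, sample proportions, sample sizes) and returns the SE from the matching row. The two-sample formulas assume the two samples are independent.

```r
# Hypothetical helpers: plug sample estimates into each SD formula

# SE of a sample mean
se_xbar <- function(s, n) s / sqrt(n)

# SE of a difference of two independent sample means
se_diff_xbar <- function(s1, n1, s2, n2) sqrt(s1^2 / n1 + s2^2 / n2)

# SE of a sample proportion
se_phat <- function(phat, n) sqrt(phat * (1 - phat) / n)

# SE of a difference of two independent sample proportions
se_diff_phat <- function(phat1, n1, phat2, n2) {
  sqrt(phat1 * (1 - phat1) / n1 + phat2 * (1 - phat2) / n2)
}

# SE of the mean of paired differences (n = number of pairs)
se_dbar <- function(s_d, n) s_d / sqrt(n)

# The earlier example: s = 3, n = 36
se_xbar(3, 36)        # 0.5
# Made-up proportion example: phat = 0.5, n = 100
se_phat(0.5, 100)     # 0.05
```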

In the next chapter we will learn to make confidence intervals that are less “rough”.