Homer White, Georgetown College
Always remember to make sure the necessary packages are loaded:
Remember that our ideal is:
the treatment groups should be as similar as possible with respect to every variable (except \( X \)) that might affect \( Y \).
Interestingly, we always use chance, at least in part, in order to approximate this ideal.
In a completely randomized design, subjects are divided into the available treatment groups solely by chance.
If a treatment group is to have size \( m \), then every set of \( m \) subjects has the same chance to be that treatment group.
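Here is a minimal base-R sketch of complete randomization. It is not the RandomExp() function used in these slides. It assumes 10 hypothetical subjects (IDs 1 to 10) and hypothetical group sizes of 4 and 6. Shuffling a list of group labels with sample() gives every set of 4 subjects the same chance of becoming the treatment group.

```r
# Complete randomization sketch (base R only; hypothetical subjects 1..10).
set.seed(2024)                            # fix the shuffle so it can be reproduced
n <- 10                                   # hypothetical number of subjects
sizes <- c(treatment = 4, control = 6)    # hypothetical group sizes; must sum to n

# Build the list of group labels, then shuffle it with sample().
# A uniform random shuffle makes every set of 4 subjects equally likely
# to become the treatment group.
grp_labels <- rep(names(sizes), times = sizes)
assignment <- data.frame(subject = 1:n, group = sample(grp_labels))
print(assignment)
print(table(assignment$group))
```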
For an experiment on whether aspirin reduces the risk of heart disease, suppose that the first 200 members of imagpop have agreed to participate:
AspHeartSubs <- imagpop[1:200,]
| Group | Treatment | Size |
|---|---|---|
| Aspirin Group | Aspirin Pills | 100 subjects |
| Placebo Group | Fake Pills | 100 subjects |
set.seed(12345)
Assignment <- RandomExp(AspHeartSubs, sizes=c(100,100), groups=c("placebo","aspirin"))
    sex    math income treat.grp
5   male   no   15500  placebo
6   male   no   49800  placebo

    sex    math income treat.grp
200 female no   88000  placebo
1   female no   40900  aspirin

    sex    math income treat.grp
198 male   no   13700  aspirin
199 female no   52700  aspirin
… with respect to sex?
          treat.grp
sex        placebo aspirin
  female        51      56
  male          49      44

        female male Total
placebo     51   49   100
aspirin     56   44   100
… with respect to income?
   .group  mean    sd
1 placebo 39215 26995
2 aspirin 39486 29352
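The same two balance checks (a group-by-sex table with totals, and income mean and SD per group) can be sketched in base R. This sketch uses made-up subjects generated on the spot, not the imagpop data, so its numbers will not match the output above.

```r
# Balance check sketch on HYPOTHETICAL subjects (not the imagpop data).
set.seed(2)
n <- 200
subs <- data.frame(
  sex    = sample(c("female", "male"), n, replace = TRUE),    # made-up sexes
  income = round(rlnorm(n, meanlog = 10.4, sdlog = 0.6), -2)  # made-up incomes
)
# Complete randomization into two groups of 100
subs$treat.grp <- sample(rep(c("placebo", "aspirin"), each = n / 2))

# Group by sex, with a total column for each group
sex_tab <- table(subs$treat.grp, subs$sex)
print(addmargins(sex_tab, margin = 2))

# Mean and standard deviation of income within each group
print(tapply(subs$income, subs$treat.grp, mean))
print(tapply(subs$income, subs$treat.grp, sd))
```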
An experiment is said to involve replication if each treatment group contains more than one subject.
The more replication, the better!
There are two reasons why replication is good.
The more subjects you have, the more likely it is that the randomization produces treatment groups that are similar to each other.
(And the more similar they are likely to be!)
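A small simulation can illustrate the first reason. It assumes hypothetical subject pools that are exactly half female. Each pool is randomized into two equal groups many times, and the sketch reports the average gap in the proportion female between the groups. The gap should shrink as the number of subjects grows.

```r
# Sketch: average gap in proportion female between two randomized groups.
# Hypothetical subjects: exactly half female, half male.
set.seed(1)
mean_gap <- function(n, reps = 1000) {
  sex <- rep(c("female", "male"), each = n / 2)
  gaps <- replicate(reps, {
    grp <- sample(rep(c("A", "B"), each = n / 2))   # complete randomization
    abs(mean(sex[grp == "A"] == "female") - mean(sex[grp == "B"] == "female"))
  })
  mean(gaps)   # average absolute difference in proportion female
}
sizes <- c(20, 200, 2000)
gap_by_n <- sapply(sizes, mean_gap)
names(gap_by_n) <- sizes
print(round(gap_by_n, 3))
```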
The second reason has to do with our notion of causality.
Roughly, we say that \( X \) causes \( Y \) if there exist two different values \( x_1 \) and \( x_2 \) of \( X \) such that, whenever two people A and B are alike in every way except that A has \( X = x_1 \) and B has \( X = x_2 \),
the distribution of \( Y \) for A is different from the distribution of \( Y \) for B.
We would say that aspirin causes reduced risk of heart disease IF we could show that:
if two people are alike in every way except that one takes aspirin and the other does not, then the one who takes aspirin has a lower chance of getting heart disease.
… we settle for two groups, instead. If the number of subjects is large and we randomize, then
If these estimated chances differ enough, then we can ascribe the difference to the aspirin.
What do you do if you can't have much replication?
data(SmallExp)
View(SmallExp)
help(SmallExp)
16 subjects for an experiment to compare two weight-lifting programs:
Try complete randomization several times:
Do the groups always look similar?
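We can estimate how often complete randomization gives lopsided groups with so few subjects. The sketch assumes, hypothetically, that 8 of the 16 subjects are male and 8 are female. The SmallExp data may be split differently. Under that assumption, the number of males in Program A follows a hypergeometric distribution. The sketch computes the chance of a 6-2 split or worse, exactly with phyper() and approximately by simulation.

```r
# HYPOTHETICAL assumption: 8 of the 16 subjects are male and 8 are female.
# Under complete randomization into two groups of 8, the number of males
# landing in Program A is hypergeometric (8 males, 8 females, 8 drawn).
p_lopsided <- phyper(2, m = 8, n = 8, k = 8) +                 # 0, 1 or 2 males in A
  phyper(5, m = 8, n = 8, k = 8, lower.tail = FALSE)           # 6, 7 or 8 males in A
print(p_lopsided)   # exact value 1698/12870, about 0.13

# Check by brute-force simulation of many complete randomizations
set.seed(7)
sex <- rep(c("male", "female"), each = 8)
males_in_A <- replicate(10000, sum(sex[sample(16, 8)] == "male"))
sim_p <- mean(males_in_A <= 2 | males_in_A >= 6)
print(sim_p)
```

Under this assumption, roughly 13% of complete randomizations would put at least 6 of the 8 males into one program.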
Step One: Create Blocks (groups of very similar subjects):
Step two: In each block, randomly assign subjects to the two groups:
set.seed(33192)
RandomExp(SmallExp,sizes=c(8,8), groups=c("Program.A","Program.B"), block=c("sex","athlete"))
Then at least sex and athlete won't be confounding factors!
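The two steps (form blocks, then randomize within each block) can be sketched in base R. This is not how RandomExp() works internally. It assumes a hypothetical set of 16 subjects split into 4 sex-by-athlete blocks of 4 subjects each.

```r
# HYPOTHETICAL 16 subjects: 4 blocks (sex x athlete) of 4 subjects each.
subjects <- expand.grid(rep = 1:4, athlete = c("yes", "no"), sex = c("male", "female"))
# Step one: each combination of sex and athlete is a block
subjects$block <- interaction(subjects$sex, subjects$athlete)

set.seed(33192)
# Step two: within each block, shuffle equal numbers of A and B labels.
# (If a block had an odd size, Program.A would get the extra subject.)
subjects$program <- NA_character_
for (b in levels(subjects$block)) {
  idx <- which(subjects$block == b)
  subjects$program[idx] <- sample(rep(c("Program.A", "Program.B"), length.out = length(idx)))
}
print(table(subjects$block, subjects$program))
```

Every block ends up split evenly between the two programs, so neither sex nor athlete status can differ between the groups.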
data(saltmarsh)
View(saltmarsh)
help(saltmarsh)
   salt biomass  block
1    10    11.8 Field1
2    15    21.3 Field1
3    20     8.8 Field1
4    25    10.4 Field1
5    30     2.2 Field1
6    35     8.4 Field1
7    10    15.1 Field2
8    15    22.3 Field2
9    20     8.1 Field2
10   25     8.5 Field2
Research Question: Does the level of salt in soil affect how much plant life grows there?
You can tell blocking was done:
    block
salt Field1 Field2 Field3 Field4
  10      1      1      1      1
  15      1      1      1      1
  20      1      1      1      1
  25      1      1      1      1
  30      1      1      1      1
  35      1      1      1      1
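A design that would produce a table like this can be sketched as follows. The layout is a hypothetical assumption: each field is split into 6 plots, and the 6 salt levels are shuffled onto those plots separately within each field. The saltmarsh help page is the authority on how the real data were collected.

```r
# HYPOTHETICAL layout: 4 fields (blocks), each split into 6 plots.
# Within each field, the 6 salt levels are randomly assigned to the plots.
set.seed(99)
salt_levels <- seq(10, 35, by = 5)
design <- do.call(rbind, lapply(paste0("Field", 1:4), function(f) {
  data.frame(block = f, plot = 1:6, salt = sample(salt_levels))
}))
# Every salt level appears exactly once in every field.
print(table(design$salt, design$block))
```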
Suppose you plan two treatment groups.
A matched pairs design is an extreme form of blocking in which each block contains exactly two subjects.
Research Question: Which shoe sole (A or B) wears out more quickly?
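One common way to run such a study treats each person's two feet as a block of two. This is an assumption here, since the slide does not describe the layout. Each of 10 hypothetical subjects wears sole A on one foot and sole B on the other, and a fair coin decides which foot gets sole A.

```r
# HYPOTHETICAL: 10 subjects, each wearing sole A on one foot and sole B on the other.
set.seed(10)
n_subj <- 10
foot_A <- sample(c("left", "right"), n_subj, replace = TRUE)   # coin flip per subject
shoes <- data.frame(subject = 1:n_subj,
                    sole_A  = foot_A,
                    sole_B  = ifelse(foot_A == "left", "right", "left"))
print(shoes)
```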
In a repeated measures design, each subject is measured two or more times under different conditions.
data(labels)
View(labels)
help(labels)
Compute the difference between the two ratings for each subject:
diff <- labels$jiffrating - labels$greatvaluerating
Numerical look at the differences:
 min Q1 median Q3 max mean   sd  n
  -5  1    2.5  4   8 2.37 2.81 30
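The same summary columns can be computed with base R. The sketch below uses made-up ratings for 5 hypothetical subjects, not the labels data. It relies on quantile() with its default method, so quartiles from other software may differ slightly.

```r
# HYPOTHETICAL ratings for 5 subjects (not the labels data)
jiff  <- c(7, 4, 9, 6, 8)
great <- c(4, 5, 5, 4, 3)
d <- jiff - great                      # one difference per subject: 3 -1 4 2 5

stats <- c(min(d), quantile(d, c(0.25, 0.5, 0.75)), max(d),
           mean(d), sd(d), length(d))
names(stats) <- c("min", "Q1", "median", "Q3", "max", "mean", "sd", "n")
print(round(stats, 2))
```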
If one group in an experiment is not treated in any special way, or is present for comparative purposes, then this group is often called the control group.
Example: In the aspirin-and-heart-disease experiment, the Control Group was the group that took the fake pill.
(The experiment asks whether aspirin works better than taking nothing at all.)
If the subjects in an experiment do not know which treatment group they are in, then the experiment is said to be single-blinded.
If the people who measure the response variable do not know which groups the subjects are in, then the experiment is also said to be single-blinded.
A placebo is an inert substance given to subjects in the control group.
The placebo resembles the substances given to subjects in the other treatment groups.
Purpose: to make the experiment single-blinded.
If neither the subjects nor the people responsible for measuring the response variable know the group assignments of the subjects, then the experiment is said to be double-blinded.
A double-dummy design is a procedure to blind an experiment, even when the treatments don't resemble each other at all.
Research Question: Which type of smoking-cessation treatment, nicotine gum or nicotine patch, is more effective in helping people quit smoking?
Suppose 50 smokers agree to participate.
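Problem: each subject KNOWS whether he/she is getting gum or a patch.
Solution: give every subject both a patch and a gum. The Patch group gets a real nicotine patch plus placebo gum; the Gum group gets real nicotine gum plus a placebo patch.
Then in neither group would subjects be able to tell which group (Patch or Gum) they are in.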
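The double-dummy assignment can be sketched as follows. The even 25/25 split of the 50 smokers is an assumption, since the slide does not give group sizes. Every smoker gets one patch and one gum, and exactly one of the two is active.

```r
# HYPOTHETICAL: 50 smokers split 25/25 (the split is an assumption).
set.seed(50)
grp <- sample(rep(c("Patch", "Gum"), each = 25))   # complete randomization
kits <- data.frame(
  smoker = 1:50,
  group  = grp,
  patch  = ifelse(grp == "Patch", "nicotine patch", "placebo patch"),
  gum    = ifelse(grp == "Gum",   "nicotine gum",   "placebo gum")
)
# Every smoker receives one patch and one gum, so all kits look alike.
print(head(kits))
print(table(kits$group))
```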
Research Question: Is length of sentence related to choice of major?
      .group  mean    sd
1 humanities 25.26 15.97
2   math.sci 30.49 15.65
3   pre.prof 25.02 15.05
4 social.sci 25.49 14.67
(The question is about chance in the gathering of the subjects!)
Research Question: Is length of sentence affected by suggested race of the defendant?
  .group  mean    sd
1  black 27.77 14.95
2  white 25.85 15.76
(The question is about chance in the assignment of subjects to treatment groups!)
… patterns in the data depend, at least in part, on chance.
So we have to step back and learn more about how chance processes work. This is the subject of probability.