Design of Studies (Pt. 1)

Homer White, Georgetown College

Load Packages

Always remember to make sure the necessary packages are loaded:


Observational Studies


Relationship Questions

Really interesting questions in science are about relationships between variables:

  • Usually one variable \( X \) is considered explanatory
  • The other variable \( Y \) is considered a response

The explanatory variable might

  • help explain
  • help predict
  • maybe even cause

the response variable.

Relationship Question

Does smoking cause lung cancer?

  • Explanatory variable \( X \) is smoke (Yes, No)
  • Response variable \( Y \) is lung cancer (Yes, No)

Relationship Questions

Learn about the attitudes data frame:


Relationship Questions

Who gives the longest sentences on average: math/science majors, humanities majors, social science majors, or pre-professional majors?

  • Explanatory variable \( X \) = major
  • Response variable \( Y \) = sentence

Quick Look (Numerical)

      .group  mean median
1 humanities 25.26     20
2   math.sci 30.49     30
3   pre.prof 25.02     25
4 social.sci 25.49     25
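A summary of this kind (mean and median of the response within each value of the explanatory variable) can be sketched in a few lines of Python. This is only an illustration: the records below are made-up toy values, not rows from the attitudes data frame, and the helper name `summarize_by_group` is hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

# Made-up toy records: (major, recommended sentence in years).
# These are NOT values from the attitudes data frame.
records = [
    ("humanities", 10), ("humanities", 20), ("humanities", 45),
    ("math.sci", 25), ("math.sci", 30), ("math.sci", 35),
    ("social.sci", 15), ("social.sci", 25), ("social.sci", 35),
]

def summarize_by_group(rows):
    """Return {group: (mean, median)} for an iterable of (group, value) pairs."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {g: (mean(v), median(v)) for g, v in sorted(groups.items())}

summary = summarize_by_group(records)
for group, (avg, med) in summary.items():
    # One row per group, laid out like the table above.
    print(f"{group:>10} {avg:6.2f} {med:6.1f}")
```

Comparing the rows tells us how the response varies across groups. It does not tell us why it varies.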

Quick Look (Graph)

[Figure: sentence by major]

Observational Study


In an observational study researchers simply observe or question the subjects. In particular, they measure the values of the explanatory variable \( X \) and measure the values of the response variable \( Y \), for each subject.


Causation Questions

One of the most common reasons to study the relationship between two variables is to see if one of them causes the other.

Speeding Causes Tickets

[Figure] (The arrow says: “Speeding causes tickets”)

A Cause Can Be Partial!

[Figure]

A Cause Can Be Indirect!

[Figure]

Accelerator pressure is still a causal factor in getting a ticket!

Rough Idea of Causation

Roughly, we say that \( X \) causes \( Y \) if there exist two different values \( x_1 \) and \( x_2 \) of \( X \) such that whenever two people A and B are alike in every way except that

  • A's value of \( X \) is \( x_1 \), and
  • B's value of \( X \) is \( x_2 \),

the distribution of \( Y \) for A is different from the distribution of \( Y \) for B.


For example, we say that speed causes tickets because we believe that:

if two people are alike in every way except that one drives faster than the other, then the one who drives faster has a greater chance of getting a ticket.

Causation Implies Association ...

[Figure]

... But Not Vice Versa!

[Figure]

Clearly an Association ...

[Figure]

... Due to a Common Cause!

[Figure]

Confounding Variables

An Observed Association

[Figure]

How might we explain this association?

Does X Cause Y?

[Figure]

Does Y Cause X?

[Figure]

Or is there a Third Variable?

[Figure]

\( Z \) is different from \( X \) and from \( Y \).

Associated With Explanatory

[Figure]

\( Z \) is associated with \( X \) (but is NOT caused by \( X \)).

Helps to Cause the Response

[Figure]

\( Z \) is also a cause of the response variable.

Confounding Variable

[Figure]

\( Z \) accounts for some (or all?) of the \( X \)-\( Y \) association.

Confounding Variable

Definition: In a study of the relationship between an explanatory variable \( X \) and a response variable \( Y \), the variable \( Z \) is called a confounding variable if:

  • \( Z \) is a third variable
  • \( Z \) is associated with \( X \) but is not caused by it
  • \( Z \) helps to cause \( Y \)

Confounding variables are often present in observational studies!


An observational study finds that brightly-colored cars are more likely to receive tickets than cars of more drab colors are.

Does this mean that the color of a car is a causal factor in whether or not its driver will get a ticket?

A Possible Confounder

[Figure]

A Possible Confounder

In words:

“The variable driving habits is a possible confounder. People who drive fast are more likely than others to prefer sports cars, which are often brightly colored. Their fast driving also causes them to get lots of tickets.”
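To see how a confounder alone can produce an association, here is a small simulation sketch in Python. Everything in it is an assumption made for illustration: the probabilities are invented, and the function names `simulate_drivers` and `ticket_rate` are hypothetical. By construction, car color has no effect on tickets. Only driving habits do.

```python
import random

def simulate_drivers(n, seed=0):
    """Simulate n drivers as (fast, bright, ticket) triples.

    All probabilities are invented for illustration. Color never enters
    the ticket step, so color has NO causal effect on tickets here.
    """
    rng = random.Random(seed)
    drivers = []
    for _ in range(n):
        fast = rng.random() < 0.3                        # Z: driving habits
        bright = rng.random() < (0.7 if fast else 0.2)   # X: associated with Z
        ticket = rng.random() < (0.5 if fast else 0.1)   # Y: depends on Z only
        drivers.append((fast, bright, ticket))
    return drivers

def ticket_rate(drivers, bright, fast=None):
    """Proportion ticketed among drivers with this color (and habit, if given)."""
    chosen = [t for f, b, t in drivers
              if b == bright and (fast is None or f == fast)]
    return sum(chosen) / len(chosen)

drivers = simulate_drivers(100_000)

# Ignoring driving habits: bright cars look much riskier.
overall_gap = ticket_rate(drivers, True) - ticket_rate(drivers, False)

# Comparing like with like (same driving habits): the gap nearly vanishes.
gap_fast = ticket_rate(drivers, True, fast=True) - ticket_rate(drivers, False, fast=True)
gap_slow = ticket_rate(drivers, True, fast=False) - ticket_rate(drivers, False, fast=False)

print(f"overall gap:        {overall_gap:.3f}")
print(f"gap (fast drivers): {gap_fast:.3f}")
print(f"gap (slow drivers): {gap_slow:.3f}")
```

In this sketch the overall gap in ticket rates is clearly positive, yet within each driving-habit group bright and drab cars are ticketed at nearly the same rate. The association comes entirely from the confounder.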


Definition of an Experiment


In an experiment, researchers manipulate something and observe the effects of the manipulation on a response variable.

Most commonly, the manipulation consists in assigning the values of an explanatory variable \( X \) to the subjects.

Experiment Terminology

The experimental units, or subjects, are the individuals who participate in the experiment.

(They need not be people.)

The treatments are the different values of the explanatory variable that researchers assign to subjects.

Subjects are divided into treatment groups. Members of the same treatment group all have the same treatment (value of \( X \)).

Attitudes Question

Crime: You are on a jury for a manslaughter case in Lewistown, PA. The defendant has been found guilty, and in Pennsylvania it is part of the job of the jury to recommend a sentence to the judge. The facts of the case are as follows. The defendant, Tyrone Marcus Watson, a 35-year-old native of Lewistown, was driving under the influence of alcohol on the evening of Tuesday July 17, 2001. At approximately 11:00 PM Watson drove through a red light, striking a pedestrian, Betsy Brockenheimer, a 20-year-old resident of Lewistown. Brockenheimer was taken unconscious to the hospital and died of her injuries about one hour later. Watson did not flee the scene, nor did he resist arrest.

Question (Continued)

The prior police record for Mr. Watson is as follows: two minor traffic violations, and one previous arrest, five years ago, for DUI. No one was hurt in that incident.

Watson has now been convicted of DUI and manslaughter. The minimum jail term for this combination of offenses is two years; the maximum term is fifty years. In the blank below, write a number from 2 to 50 as your recommended length of sentence for Tyrone Marcus Watson.

Different Versions!

On all survey forms the question was the same except for the names of the defendant and the victim.

Defendant Name              Victim Name
Tyrone Marcus Watson        Betsy Brockenheimer
Tyrone Marcus Watson        Latisha Dawes
William Shane Winchester    Betsy Brockenheimer
William Shane Winchester    Latisha Dawes

The four possible forms were distributed randomly to the subjects.
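Random distribution of the forms could be carried out along the following lines. This is a sketch under stated assumptions, not a description of how the original survey was actually administered: the function `assign_forms`, the subject labels, and the choice to keep the four groups equal in size are all illustrative.

```python
import random

# The four (defendant, victim) name combinations from the table above.
FORMS = [
    ("Tyrone Marcus Watson", "Betsy Brockenheimer"),
    ("Tyrone Marcus Watson", "Latisha Dawes"),
    ("William Shane Winchester", "Betsy Brockenheimer"),
    ("William Shane Winchester", "Latisha Dawes"),
]

def assign_forms(subjects, seed=None):
    """Give each subject one of the four forms, chosen by chance.

    Assumption: forms are dealt from a shuffled stack that holds the four
    versions in equal numbers, so group sizes differ by at most one.
    """
    rng = random.Random(seed)
    stack = [FORMS[i % len(FORMS)] for i in range(len(subjects))]
    rng.shuffle(stack)
    return dict(zip(subjects, stack))

# Hypothetical subject labels, for illustration only.
subjects = [f"subject_{i}" for i in range(1, 13)]
assignment = assign_forms(subjects, seed=1)

for subject, (defendant, victim) in assignment.items():
    print(f"{subject}: defendant={defendant}, victim={victim}")
```

The key point is that chance alone decides which form a subject receives. Neither the subject nor the researcher does.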

Some Research Questions

Does the suggested race of the defendant affect the length of sentence recommended?

  • Explanatory \( X \) = defrace
  • Response variable \( Y \) = sentence

Does the suggested race of the victim affect the length of sentence recommended?

  • Explanatory \( X \) = vicrace
  • Response variable \( Y \) = sentence


For each Research Question:

  • researchers gave the survey form to the subjects,
  • so researchers assigned the value of \( X \) to subjects.

So both Questions were studied by means of an experiment.

Think About This

Consider this Research Question:

Does one's sex affect one's risk of developing colon cancer?

Could this Research Question be studied by an experiment?

No! The explanatory variable is sex, and we cannot assign people their sex.

Think About This

Consider this Research Question:

Does taking aspirin help prevent heart disease?

Could this Research Question be studied by an experiment?

Yes. The explanatory variable is whether or not one takes aspirin. We can assign the values of this variable to subjects:

  • Select some subjects to take aspirin every day
  • The other subjects must not use aspirin

The Experimental Ideal

In an experiment, we try to assign values of \( X \) so that:

the treatment groups are as similar as possible with respect to every variable (except \( X \)) that might affect \( Y \).

Then any differences in \( Y \) between the groups can be ascribed to \( X \), and not to some confounding factor!
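One common way to approach this ideal is to let chance decide who gets which treatment. The simulation sketch below contrasts that with letting subjects choose for themselves. It is an illustration only: the lurking variable (weekly exercise hours), the selection rule, and all function names are assumptions invented for the example.

```python
import random
from statistics import mean

def make_subjects(n, rng):
    """Each subject gets a made-up lurking variable: exercise hours per week."""
    return [rng.uniform(0, 10) for _ in range(n)]

def self_selected(exercise, rng):
    """Assumed rule: the more a subject exercises, the likelier they opt in."""
    return [rng.random() < hours / 10 for hours in exercise]

def randomized(exercise, rng):
    """A coin flip decides treatment, ignoring everything about the subject."""
    return [rng.random() < 0.5 for _ in exercise]

def imbalance(exercise, treated):
    """Mean exercise hours in the treatment group minus the control group."""
    treatment = [h for h, flag in zip(exercise, treated) if flag]
    control = [h for h, flag in zip(exercise, treated) if not flag]
    return mean(treatment) - mean(control)

rng = random.Random(42)
exercise = make_subjects(10_000, rng)

gap_self = imbalance(exercise, self_selected(exercise, rng))
gap_random = imbalance(exercise, randomized(exercise, rng))

print(f"self-selected groups differ by {gap_self:.2f} hours")
print(f"randomized groups differ by    {gap_random:.2f} hours")
```

With self-selection the two groups differ sharply on the lurking variable, so it could confound any comparison of \( Y \). With coin-flip assignment the groups come out nearly alike on it, without the researcher ever having to measure it.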