Rebekah Robinson, Georgetown College

- Statistical Relationships (Parts 1 & 2)
- Cautions (Part 3)

Always remember to make sure the necessary packages are loaded:

```
require(mosaic)
require(tigerstats)
```

**Graphical**

- Histogram
- Density Plot
- Box/Violin Plot

**Numerical**

- Median/IQR
- Quantiles
- Mean/SD

We will work primarily with 3 datasets in this Chapter. Load them and take a look at them.

```
data(m111survey)
View(m111survey)
```

```
data(pennstate1)
View(pennstate1)
help(pennstate1)
```

```
data(ucdavis1)
View(ucdavis1)
help(ucdavis1)
```

- Scatterplots
- Correlation
- Regression Equation

There are 2 main types of relationships between 2 numerical variables.

**Deterministic** relationships allow you to *exactly* determine the value of one variable from the value of the other.

**Statistical** relationships allow you to *estimate* the typical value of one variable from the value of the other. There is variation from the average pattern.

Every temperature \( x \) in \( ^\circ F \) has *exactly* one corresponding temperature \( y \) in \( ^\circ C \).

\[ y=\dfrac{5}{9}(x-32) \]

This is a **deterministic** relationship because there is no variation in the pattern.
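As a minimal sketch of the difference, the base R code below applies the conversion formula and then builds a second, made-up variable that follows the same formula plus random noise. The function name `f_to_c`, the seed, and the noise level are illustrative assumptions, not part of the original notes.

```r
# Deterministic: the formula gives exactly one Celsius value per Fahrenheit value.
f_to_c <- function(x) (5 / 9) * (x - 32)

fahrenheit <- c(32, 50, 68, 86)
celsius <- f_to_c(fahrenheit)

# Statistical (illustrative only): the same average pattern plus random variation.
set.seed(1)  # arbitrary seed so the sketch is reproducible
noisy <- celsius + rnorm(length(fahrenheit), mean = 0, sd = 2)

# Deviations from the pattern: all zero for the deterministic variable,
# non-zero for the noisy one.
celsius - f_to_c(fahrenheit)
noisy - f_to_c(fahrenheit)
```

With the noisy variable, knowing the Fahrenheit value only lets you estimate the typical response, which is the situation the rest of this chapter deals with.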

We will use three tools to study **statistical** relationships.

- Scatterplots (Part 1)
- Correlation (Part 2)
- Regression Equation (Part 2)

Research Question: At Penn State, how is a student's right handspan related to his/her height?

- Question about the *relationship* between two variables.
- Explanatory variable: **RtSpan** (numerical)
- Response variable: **Height** (numerical)

A **scatterplot** graphically displays the relationship between 2 numerical variables, allowing us to visually identify

- overall patterns,
- directions,
- strength of association.

```
xyplot(Height~RtSpan,data=pennstate1,
xlab="Right Handspan (cm)",
ylab="Height (in)",
col="blue",pch=19)
```

We are using **response~explanatory** as our input.

Each point \( (x,y) \) represents one of the students in the survey.

The \( x \)-coordinate is their right handspan.

The \( y \)-coordinate is their height.
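To connect the plot back to the data, the sketch below prints the first few rows of the two plotted columns; each row supplies the coordinates of one point. It assumes the **tigerstats** package is installed and that `pennstate1` contains the `RtSpan` and `Height` columns used in the plotting call above. The name `pts` is just a placeholder.

```r
library(tigerstats)
data(pennstate1)

# Each row is one student: RtSpan gives the x-coordinate, Height the y-coordinate.
pts <- pennstate1[, c("RtSpan", "Height")]
head(pts, 3)

# Number of students with both values recorded
# (a row with a missing value cannot be drawn as a point).
sum(complete.cases(pts))
```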

Since males tend to have larger hands *and* be taller than females, perhaps the relationship observed between right handspan and height is simply a result of a student's sex.

We can investigate this using parallel scatterplots. We will *condition* on the factor variable **Sex**.

```
xyplot(Height~RtSpan|Sex,data=pennstate1,
xlab="Right Handspan (cm)",
ylab="Height (in)", pch=19)
```

The relationship observed in the original scatterplot seems to hold separately for both males and females.

Keeping the points on the same plot and coloring them by **Sex** makes the plot easier to interpret.

To overlay the groups on a single scatterplot, we make use of the `groups` argument.

```
xyplot(Height~RtSpan,
groups=Sex, data=pennstate1,
xlab="Right Handspan (cm)",
ylab="Height (in)", pch=19,
auto.key=TRUE)
```

If the observed pattern between 2 numerical variables seems to follow a linear trend, we can describe the **direction** as one of the following:

- positive linear association
    - High values of one variable tend to accompany high values of the other.
    - Low values of one variable tend to accompany low values of the other.
- negative linear association
    - High values of one variable tend to accompany low values of the other.
    - Low values of one variable tend to accompany high values of the other.
- no linear association
    - There is no apparent pattern between the two variables.
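To see what the three directions look like, here is a small sketch using simulated (made-up) data rather than any of the course datasets. The slopes, noise level, seed, and the names `sim` and `direction` are arbitrary choices for illustration only.

```r
library(lattice)  # provides xyplot()

set.seed(2)            # arbitrary seed for reproducibility
x <- runif(50, 0, 10)  # made-up explanatory values
sim <- data.frame(
  x = rep(x, 3),
  y = c( 2 * x + rnorm(50, sd = 2),   # positive: y tends to rise with x
        -2 * x + rnorm(50, sd = 2),   # negative: y tends to fall as x rises
        rnorm(50, sd = 2)),           # none: y is generated independently of x
  direction = factor(rep(c("positive", "negative", "none"), each = 50),
                     levels = c("positive", "negative", "none"))
)

# One panel per direction, conditioning on the factor with the | syntax.
p <- xyplot(y ~ x | direction, data = sim, pch = 19, layout = c(3, 1))
print(p)
```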

Data with nonlinear association certainly exist. **Curvilinear** data follows the trend of a curve.

Let's look at an example of this. Load and read about the **fuel** dataset.

```
data(fuel)
View(fuel)
help(fuel)
```

The data would not be well described by a line.

As speed increases up to about 60 kph, fuel efficiency decreases.

As speed increases beyond about 60 kph, fuel efficiency increases.
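A scatterplot along the following lines would show the curved pattern. This is only a sketch: it assumes **tigerstats** is installed and that the columns of `fuel` are named `speed` and `efficiency`, so check `names(fuel)` or `help(fuel)` before relying on it. The axis labels are likewise placeholders.

```r
library(lattice)     # provides xyplot()
library(tigerstats)  # provides the fuel dataset
data(fuel)

names(fuel)  # confirm the column names; speed and efficiency are assumed below

# type = c("p", "smooth") draws the points plus a loess curve,
# which makes the bend in the pattern easier to see.
p <- xyplot(efficiency ~ speed, data = fuel,
            xlab = "Speed (kph)",
            ylab = "Fuel efficiency",
            pch = 19, type = c("p", "smooth"))
print(p)
```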

Part 2 will begin with the second tool that we will use to study statistical relationships - **correlation**.