Rebekah Robinson, Georgetown College

- `R` is a statistical software program.
- RStudio is an *integrated development environment* (IDE) that facilitates the use of `R`.
- RStudio makes `R` easier and more fun to use!

The bottom left panel

- is called the **Console**.
- is the “brain” of `R`.

Anything entered in the **Console** will be executed by `R`.

Give it a try! Place your cursor in the **Console** panel, type the following, and then hit return.

```
4+9
```

The top right panel has two tabs:

**History** - stores all commands entered in the **Console**.

Take a look at your **History** tab. You should see `4+9`.

**Environment** - stores all objects (datasets, functions, etc.) that have been entered in the **Console**.

There isn't anything in your **Environment** tab because we have not created any objects.

Let's create an expression object! Type the following:

```
mysum <- 4+9
```

- You have stored the sum `4+9` in the object named `mysum`.
- There is nothing special about the name `mysum`. You could have named it anything!
- Look in the **Environment** tab. You should see this object, along with its value.
- Go back to the **Console** and type the following to return the value stored in `mysum`.

```
mysum
```
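Once a value is stored, the object can stand in for that value in later calculations. A small sketch (the name `doubled` is only an illustration and is not part of the exercise above):

```r
mysum <- 4 + 9        # store the sum in an object
doubled <- mysum * 2  # use the stored value in a new calculation
doubled               # print the new value
```

After running this, both `mysum` and `doubled` should appear in the **Environment** tab.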

The bottom right panel has four tabs:

**Files** - lists available files and allows you to upload new files

**Plots** - displays plots that you create in the **Console**

**Packages** - allows you to install and load necessary packages

**Help** - displays help files

There are 3 important packages to load when you work in `R`:

- `tigerstats`
- `mosaic`
- `manipulate`

You can load them one of two ways:

- Checking the appropriate box in the **Packages** tab.
- Typing the following into your **Console**:

```
require(mosaic)
require(tigerstats)
require(manipulate)
```

The top left panel

- is called the **Source**.
- is where you will do most of your typing.
- is where you will open, edit, and save files.

Let's open an R Script file. Select *File* > *New File* > *R Script*.

An R Script is a file to store code. This allows you to

- easily modify and edit long lines of code
- save your work
- share your work with others

Once a command is typed into an R script file, it should be run through the **Console**.

- Select the entire command. Then copy and paste it into the **Console**.
- With the cursor on the line you want to run, press the `Run` button at the top right of the **Source** window.

The most basic way to use `R` is as a calculator. Type the following expressions into an R Script and run them through the **Console**.

```
5+4
123-45
23*3.7
84/7
```

- use `+` for addition
- use `-` for subtraction
- use `*` for multiplication
- use `/` for division

When using `R` as a calculator, use parentheses to control the order of operations. The expression

```
7+5*2
```

```
[1] 17
```

is different from

```
(7+5)*2
```

```
[1] 24
```
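The same idea applies to the other operators. A brief sketch; the exponent operator `^` is not introduced above and is used here only for illustration:

```r
2 * 3^2       # the exponent is applied first, then the multiplication
(2 * 3)^2     # parentheses force the multiplication first
10 - 4 / 2    # division happens before subtraction
(10 - 4) / 2  # parentheses force the subtraction first
```

Run each line separately to see how the parentheses change the result.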

`R` has some built-in mathematical functions that should be familiar to you. For example:

```
sqrt(81) #square root function
```

```
[1] 9
```

```
cos(pi) #cosine of pi
```

```
[1] -1
```

Use the `#` sign to add a comment next to a line of code. This helps you remember what a particular line of code does!

What if you couldn't remember what the square root function was? You could access the `help` file on this function using either of the following methods.

- Type the following into the **Console**:

```
help(sqrt)
```

OR

- Type the first few letters of the function into the **Console** and press `Tab`. Select the appropriate function from the list. Press `F1` to open the help file in the **Help** tab.

There are several built-in functions to become familiar with:

To combine values into a list (which `R` calls a *vector*), use the concatenate function `c`:

```
c(1, 3, 5)
```

It is useful to store a list in an object. Name it whatever you like!

```
mylist <- c(1, 3, 5) #creates the object
mylist #calls the object
```

```
[1] 1 3 5
```

To combine words or letters into a list, type

```
mywordlist <- c("A", "B", "Cat")
```

To see the list, type:

```
mywordlist
```

```
[1] "A" "B" "Cat"
```

Letters and words must be put in quotation marks.

To create a list of numbers that are all the same, you can use the concatenate function.

```
c(2, 2, 2, 2, 2)
```

It is easier to use the `rep` function:

```
myreps <- rep(x=2,times=5)
myreps
```

```
[1] 2 2 2 2 2
```

The `rep` function requires two inputs:

- `x` is the value that you want to replicate
- `times` is the number of times you want to replicate `x`

You do not have to enter the names of the inputs

```
rep(2,5)
```

```
[1] 2 2 2 2 2
```

as long as you enter them in the correct order.

```
rep(5,2)
```

```
[1] 5 5
```
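If you do name the inputs, their order no longer matters. A short sketch; `identical` is a base `R` function not covered above, used here only to confirm that both calls give the same result:

```r
a <- rep(x = 2, times = 5)   # named inputs in the usual order
b <- rep(times = 5, x = 2)   # named inputs in the reverse order
identical(a, b)              # TRUE: both are 2 2 2 2 2
```

Naming the inputs takes more typing, but it protects you from mix-ups like `rep(5,2)`.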

You can also replicate letters or words.

```
rep("apple", 3)
```

```
[1] "apple" "apple" "apple"
```

The `seq` function comes in handy for making sequences of values.

To create the sequence `1, 2, 3, 4, 5`, type

```
seq(from=1,to=5,by=1)
```

```
[1] 1 2 3 4 5
```

This function requires three inputs:

- `from` is the starting point of the sequence,
- `to` is the ending point of the sequence,
- `by` is the increment.

This function is useful for other increments.

```
seq(from=1,to=5,by=0.5)
```

```
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
```

```
seq(0,1,0.1)
```

```
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
```
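The `seq` function can also count down, or produce a set number of evenly spaced values. A sketch; the `length.out` input is part of base `R` but is not used above, and the names `down` and `five` are illustrative:

```r
down <- seq(from = 10, to = 1, by = -3)        # a negative increment counts down: 10 7 4 1
five <- seq(from = 0, to = 1, length.out = 5)  # 5 evenly spaced values: 0.00 0.25 0.50 0.75 1.00
down
five
```

With `length.out`, `R` works out the increment for you.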

- A volunteer draws 10 cards from our classroom deck of cards, *with replacement*.
- The cards will be shuffled between each draw.
- The class wins a point for each red card drawn and the instructor wins a point for each black card drawn.

*With replacement* means that the volunteer will

- draw a card
- record the color
- put it back in the deck before drawing another card.

Questions to consider before we play:

What is the probability that the volunteer will pull out a red card?

Of the 10 cards drawn, how many do you *expect* to be red?

Do we think that the volunteer will draw exactly the hypothesized number of red cards?

Let's play!

Questions to consider about our results:

Do these results seem consistent with how many we expected to be red? Or do they seem strange?

Do you still believe your hypothesized probability of drawing a red card? In other words, do you still believe that we are playing with a standard deck?

We will investigate these questions by playing more games. Let's simulate 1000 games in `R`.

Let's start by making a deck of cards in `R`. Since we only care about the color, we can do this by:

```
mycards <- c(rep("Red",26),rep("Black",26))
```

Let's look at the deck:

```
mycards
```

You should see 26 Reds followed by 26 Blacks.

Now, we will randomly draw 10 cards from our virtual deck.

```
sample(mycards, size=10, replace=TRUE)
```

You can take a sample of ten cards and then count up the results:

```
table(sample(mycards, size=10, replace=TRUE))
```

```
Black   Red
    6     4
```
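Because `sample` draws at random, your counts will probably differ from the ones shown, and they will change each time you run the command. One way to make a draw repeatable is sketched below; `set.seed` is a base `R` function not used above, and the seed value 2024 is an arbitrary choice:

```r
mycards <- c(rep("Red", 26), rep("Black", 26))  # rebuild the deck so this runs on its own

set.seed(2024)   # fix the starting point of the random number generator
hand1 <- sample(mycards, size = 10, replace = TRUE)

set.seed(2024)   # same seed again, so the same hand is drawn
hand2 <- sample(mycards, size = 10, replace = TRUE)

identical(hand1, hand2)   # TRUE
table(hand1)              # counts of each color in the hand
```

Setting a seed is handy when you want a classmate to get exactly the same "random" hand that you did.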

Let's play 3 games:

```
# do() comes from the mosaic package, so load mosaic first
do(3)*table(sample(mycards, size=10, replace=TRUE))
```

```
  Black Red
1     7   3
2     4   6
3     2   8
```

- Game One: 7 black and 3 red cards
- Game Two: 4 black and 6 red cards
- Game Three: 2 black and 8 red cards

Let's display these results in terms of how many games resulted in a given number of red cards.

```
Red
 0  1  2  3  4  5  6  7  8  9 10
 0  0  0  1  0  0  1  0  1  0  0
```

Now that we know how to read the results, let's simulate 1000 games.

```
Red
  0   1   2   3   4   5   6   7   8   9  10
  0   5  41 119 205 249 209 121  42   8   0
```
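The command behind a tally like this is not shown. One way to produce such a tally in base `R` is sketched below; it uses `replicate` rather than the `do` function from `mosaic`, and the names `redcounts` and `tally` are illustrative. Your counts will differ because the draws are random.

```r
mycards <- c(rep("Red", 26), rep("Black", 26))  # the 52-card deck, by color

# Play 1000 games. In each game, draw 10 cards with replacement
# and count how many of them are red.
redcounts <- replicate(1000, sum(sample(mycards, size = 10, replace = TRUE) == "Red"))

# Tally the games by number of red cards, keeping every value from 0 to 10
# even if it never occurred.
tally <- table(Red = factor(redcounts, levels = 0:10))
tally
```

Converting `redcounts` to a factor with levels 0 through 10 keeps a column for outcomes that never happened, such as 0 or 10 red cards.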

For example,

- Five red cards were drawn in 249 of the 1000 games.
- Eight red cards were drawn in 42 of the 1000 games.
- Nine red cards were drawn in 8 of the 1000 games.
- Ten red cards were drawn in 0 of the 1000 games.

Another way to think about these numbers is:

- Five red cards were drawn in 24.9% of the games.

```
(249/1000)*100
```

- Eight red cards were drawn in 4.2% of the games.
- Nine red cards were drawn in 0.8% of the games.
- Ten red cards were drawn in 0% of the games.
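The conversion from counts to percentages can be done for the whole table at once. A sketch, with the counts typed in by hand from the table of simulated games (the names `counts` and `percents` are illustrative):

```r
# Number of games with 0, 1, ..., 10 red cards, copied from the table of simulated games
counts <- c(0, 5, 41, 119, 205, 249, 209, 121, 42, 8, 0)
names(counts) <- 0:10

percents <- counts / 1000 * 100   # each count as a percentage of 1000 games
percents["5"]   # 24.9
percents["8"]   # 4.2
percents["9"]   # 0.8
```

Naming the entries 0 through 10 lets you look up a percentage by the number of red cards rather than by position.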

How *likely* is it that our volunteer drew their original hand, based on our simulations?

A graphical representation of these percents is useful.

- Horizontal axis - the number of red cards in a hand of 10 cards from a standard deck.
- Vertical axis - the percentage of times (out of the 1000 simulated games) that a particular number of red cards was drawn.

Let's shade a bar in the histogram to mark the number of red cards our volunteer drew in the class game.

For example, suppose our volunteer drew 9 red cards.

This shaded region in the histogram represents the estimated chance of drawing 9 red cards from a standard deck.
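A chart like the one described could be drawn in base `R` roughly as follows. This is a sketch that assumes the counts from the 1000 simulated games and a volunteer who drew 9 red cards; the colors, labels, and the name `barcolors` are arbitrary choices:

```r
counts <- c(0, 5, 41, 119, 205, 249, 209, 121, 42, 8, 0)  # games with 0, 1, ..., 10 red cards
percents <- counts / 1000 * 100

# Shade only the bar for 9 red cards (the 10th bar, since the first bar is for 0)
barcolors <- rep("white", 11)
barcolors[10] <- "red"

barplot(percents, names.arg = 0:10, col = barcolors,
        xlab = "Number of red cards in 10 draws",
        ylab = "Percent of simulated games")
```

The plot appears in the **Plots** tab.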

The *likelihood* that our class game resulted in such a high number of red cards (or higher) if we were really drawing from a standard deck of playing cards is called a **P-value**. (This P-value is about 0.008, since 8 of the 1000 simulated games had 9 or more red cards.)
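A sketch of how this P-value could be computed from the simulated counts, typed in by hand, assuming the volunteer drew 9 red cards. For comparison it also computes the exact probability with `dbinom`, a base `R` function not used above; the names `p_sim` and `p_exact` are illustrative:

```r
# Number of games with 0, 1, ..., 10 red cards, from the 1000 simulated games
counts <- c(0, 5, 41, 119, 205, 249, 209, 121, 42, 8, 0)

# Simulated P-value: the share of games with 9 or more red cards.
# Positions 10 and 11 hold the counts for 9 and 10 red cards.
p_sim <- sum(counts[10:11]) / 1000
p_sim     # 0.008

# Exact probability of 9 or more reds in 10 draws,
# when each draw is red with probability 1/2
p_exact <- sum(dbinom(9:10, size = 10, prob = 0.5))
p_exact   # about 0.0107
```

The simulated and exact values are close but not equal, because the simulation only estimates the true probability.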

We wanted to test the hypothesis that we were playing with a standard deck.

Data was gathered from a real-world experiment to test our hypothesis.

We asked, “How likely was it to draw the hand that we did *if* we drew from a standard deck?”

We simulated 1000 games using `R`, and counted the number of games that gave us at least the result we got in the experiment.

We calculated a P-value, the probability of getting results as extreme as ours (or more so!) from a standard deck.

Finally, draw a conclusion.

If we assume that our volunteer drew 10 cards from a standard deck of cards, there is about a 0.8% chance of drawing 9 red cards.

- What does this suggest about our original hypothesis?
- Do you believe we were really playing with a standard deck?
- How convinced are you?

**Goal:** Translate data into knowledge and understanding of the world around us. Statistics is the art and science of learning from data!

The card game we played above is a perfect example of the three aspects of statistics.

**Design** - asking the right questions and collecting useful data.

**Description** - summarizing and analyzing data.

**Inference** - making decisions, generalizations, and turning data into new knowledge.