Chapter 3 Notes

Exercise

Can you spot the difference between a character string and a number? Here’s a test: Which of these are character strings and which are numbers? 1, “1”, “one”.

x <- 1
y <- "1"
z <- "one"
typeof(1)
[1] "double"
typeof(x)
[1] "double"
typeof("1")
[1] "character"
typeof(y)
[1] "character"
typeof("one")
[1] "character"
typeof(z)
[1] "character"
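
Why the distinction matters: arithmetic works on numbers but not on character strings that merely look like numbers. A minimal sketch, reusing y from the exercise above; as.numeric() is base R:

```r
y <- "1"

# y + 1 would stop with "non-numeric argument to binary operator",
# because y is a character string, not a number

# Convert explicitly before doing arithmetic
as.numeric(y) + 1                     # 2

# Strings that do not look like numbers become NA (with a warning)
suppressWarnings(as.numeric("one"))   # NA
```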

Exercise

Create an atomic vector that stores just the face names of the cards in a royal flush (ace through ten, all of one suit), for example, the ace of spades, king of spades, queen of spades, jack of spades, and ten of spades. The face name of the ace of spades would be “ace”, and “spades” is the suit. Which type of vector will you use to save the names?

FaceNames <- c("ace", "king", "queen", "jack", "ten")
FaceNames
[1] "ace"   "king"  "queen" "jack"  "ten"  
typeof(FaceNames)
[1] "character"

Exercise

Create the following matrix, which stores the name and suit of every card in a royal flush.

     [,1]    [,2]    
[1,] "ace"   "spades"
[2,] "king"  "spades"
[3,] "queen" "spades"
[4,] "jack"  "spades"
[5,] "ten"   "spades"

hand1 <- c("ace", "king", "queen", "jack", "ten", rep("spades", 5))
matrix(hand1, ncol = 2)
     [,1]    [,2]    
[1,] "ace"   "spades"
[2,] "king"  "spades"
[3,] "queen" "spades"
[4,] "jack"  "spades"
[5,] "ten"   "spades"
matrix(hand1, nrow = 5)
     [,1]    [,2]    
[1,] "ace"   "spades"
[2,] "king"  "spades"
[3,] "queen" "spades"
[4,] "jack"  "spades"
[5,] "ten"   "spades"
dim(hand1) <- c(5, 2)
hand1
     [,1]    [,2]    
[1,] "ace"   "spades"
[2,] "king"  "spades"
[3,] "queen" "spades"
[4,] "jack"  "spades"
[5,] "ten"   "spades"

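Note: R matrices are column major. Both matrix() and dim() fill the first column from top to bottom before moving on to the second, which is why the five face names land in column 1 and the five suits in column 2.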
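
To see the fill order directly, the sketch below builds two matrices from the same vector. byrow is a standard argument of matrix(); the object names cards, by_col, and by_row are made up for illustration.

```r
# Same ten values, filled two different ways
cards <- c("ace", "king", "queen", "jack", "ten", rep("spades", 5))

by_col <- matrix(cards, nrow = 5)                # default: fill down each column
by_row <- matrix(cards, nrow = 5, byrow = TRUE)  # fill across each row instead

by_col[1, ]  # "ace" "spades"
by_row[1, ]  # "ace" "king"
```

With byrow = TRUE the faces and suits get mixed within rows, so the default column-wise fill is the one that suits this exercise.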

Dates and Times

now <- Sys.time()
now
[1] "2016-02-15 14:47:38 EST"
typeof(now)
[1] "double"
class(now)
[1] "POSIXct" "POSIXt" 
SEC <- unclass(now)
SEC
[1] 1455565658

The number stored in SEC is the number of seconds that elapsed between 12:00 AM on January 1, 1970, Coordinated Universal Time (UTC), and the time stored in now.
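
The conversion also runs in the other direction. A minimal sketch, assuming the seconds value printed above; origin and tz are arguments of base R's as.POSIXct(), and secs and back are illustrative names:

```r
# Seconds since 1970-01-01 00:00:00 UTC, copied from the output above
secs <- 1455565658

# Rebuild the date-time from the raw count of seconds
back <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

format(back, "%Y-%m-%d %H:%M:%S")  # "2016-02-15 19:47:38" in UTC, i.e. 14:47:38 EST
```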

Factors

gender <- factor(c("male", "female", "female", "male"))
typeof(gender)
[1] "integer"
attributes(gender)
$levels
[1] "female" "male"  

$class
[1] "factor"
unclass(gender)
[1] 2 1 1 2
attr(,"levels")
[1] "female" "male"  
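
The integer codes follow the order of the levels, which R sorts alphabetically unless told otherwise. A small sketch; gender2 is a made-up name, and levels is a standard argument of factor():

```r
gender <- factor(c("male", "female", "female", "male"))

levels(gender)        # "female" "male": sorted alphabetically by default
as.integer(gender)    # 2 1 1 2: the integer codes stored underneath
as.character(gender)  # recovers the original labels

# Setting the levels explicitly changes which code each label gets
gender2 <- factor(c("male", "female", "female", "male"),
                  levels = c("male", "female"))
as.integer(gender2)   # 1 2 2 1
```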

Exercise

Many card games assign a numerical value to each card. For example, in blackjack, each face card is worth 10 points, each number card is worth between 2 and 10 points, and each ace is worth 1 or 11 points, depending on the final score. Make a virtual card by combining “ace”, “heart”, and 1 into a vector. What type of atomic vector will result? Prediction: character. Check if you are right.

card <- c("ace", "heart", 1)
typeof(card)
[1] "character"
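
The result is character because an atomic vector holds a single type, so R coerces mixed input to the most flexible type present. A short sketch of the coercion order (logical, then integer, then double, then character); the object name card_vec is illustrative:

```r
typeof(c(TRUE, 1L))     # "integer":   logical is promoted to integer
typeof(c(1L, 2.5))      # "double":    integer is promoted to double
typeof(c(2.5, "ace"))   # "character": double is promoted to character

card_vec <- c("ace", "heart", 1)
card_vec[3]             # "1": the number is now stored as a string
```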

Exercise

Use a list to store a single playing card, like the ace of hearts, which has a point value of one. The list should save the face of the card, the suit, and the point value in separate elements.

card <-  list(face = "ace", suit = "hearts", value = 1)
card
$face
[1] "ace"

$suit
[1] "hearts"

$value
[1] 1

Data Frames

df <- data.frame(face = c("ace", "two", "six"), suit = c("clubs", "clubs", "clubs"), value = c(1, 2, 3))
df
  face  suit value
1  ace clubs     1
2  two clubs     2
3  six clubs     3
typeof(df)
[1] "list"
class(df)
[1] "data.frame"
str(df)
'data.frame':   3 obs. of  3 variables:
 $ face : Factor w/ 3 levels "ace","six","two": 1 3 2
 $ suit : Factor w/ 1 level "clubs": 1 1 1
 $ value: num  1 2 3
df2 <- data.frame(face = c("ace", "two", "six"), suit = c("clubs", "clubs", "clubs"), value = c(1, 2, 3), stringsAsFactors = FALSE)
df2
  face  suit value
1  ace clubs     1
2  two clubs     2
3  six clubs     3
typeof(df2)
[1] "list"
class(df2)
[1] "data.frame"
str(df2)
'data.frame':   3 obs. of  3 variables:
 $ face : chr  "ace" "two" "six"
 $ suit : chr  "clubs" "clubs" "clubs"
 $ value: num  1 2 3
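
The Factor columns in the first str() output reflect versions of R before 4.0.0, where stringsAsFactors defaulted to TRUE; since R 4.0.0 the default is FALSE. A minimal sketch that sets the argument explicitly so the result does not depend on the R version (faces, as_factor, and as_char are illustrative names):

```r
faces <- c("ace", "two", "six")

# Ask for factors explicitly, then for plain character columns
as_factor <- data.frame(face = faces, stringsAsFactors = TRUE)
as_char   <- data.frame(face = faces, stringsAsFactors = FALSE)

class(as_factor$face)  # "factor"
class(as_char$face)    # "character"
```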

Creating a deck of cards with less typing

Face <- c("king", "queen","jack", "ten","nine","eight","seven","six","five","four","three","two","ace")
Suit <- c("spades","clubs", "diamonds", "hearts")
Value <- 13:1
deck <- data.frame(face = rep(Face, 4), suit = rep(Suit, each = 13), value = rep(Value, 4), stringsAsFactors = FALSE)
library(DT)
datatable(deck)
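
The deck construction relies on two behaviors of rep(): repeating a whole vector versus repeating each element. A sketch that shows the difference and then sanity-checks the result; deck_check is a made-up name so the deck object above is left alone:

```r
rep(c("a", "b"), 2)         # "a" "b" "a" "b": whole vector repeated
rep(c("a", "b"), each = 2)  # "a" "a" "b" "b": each element repeated

Face <- c("king", "queen", "jack", "ten", "nine", "eight", "seven",
          "six", "five", "four", "three", "two", "ace")
Suit <- c("spades", "clubs", "diamonds", "hearts")
deck_check <- data.frame(face = rep(Face, 4), suit = rep(Suit, each = 13),
                         value = rep(13:1, 4), stringsAsFactors = FALSE)

nrow(deck_check)                                # 52
anyDuplicated(deck_check[, c("face", "suit")])  # 0, so no card appears twice
```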

Reading a file from a secure web site using readr::read_csv() and repmis::source_data()

Note: In older versions of R, read.csv() could not read from https (Hypertext Transfer Protocol Secure) web sites; current versions can.

site <- "https://gist.githubusercontent.com/garrettgman/9629323/raw/ee5dfc039fd581cb467cc69c226ea2524913c3d8/deck.csv"
deck2 <- readr::read_csv(site)
head(deck2)
   face   suit value
1  king spades    13
2 queen spades    12
3  jack spades    11
4   ten spades    10
5  nine spades     9
6 eight spades     8
deck1 <- repmis::source_data(url = site, sep = ",", header = TRUE)
Downloading data from: https://gist.githubusercontent.com/garrettgman/9629323/raw/ee5dfc039fd581cb467cc69c226ea2524913c3d8/deck.csv 
SHA-1 hash of the downloaded data file is:
a1cdb425b6cd2b030f9538257b7c2a61c6e6c8b1
datatable(deck1)

Saving Data

write.csv(deck1, file = "cards.csv", row.names = FALSE)
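
A quick way to confirm that a saved file is intact is to read it back and compare. A minimal sketch on a made-up two-row data frame, written to a temporary file so nothing in the working directory is touched; small, path, and back are illustrative names:

```r
small <- data.frame(face = c("king", "queen"),
                    suit = c("spades", "spades"),
                    value = c(13L, 12L),
                    stringsAsFactors = FALSE)

path <- tempfile(fileext = ".csv")                 # throwaway file location
write.csv(small, file = path, row.names = FALSE)   # no extra row-name column

back <- read.csv(path, stringsAsFactors = FALSE)
names(back)                  # "face" "suit" "value"
all.equal(small, back)       # TRUE when the round trip preserved the data

unlink(path)                 # clean up the temporary file
```

Without row.names = FALSE, write.csv() adds a column of row names that shows up as an extra column when the file is read back.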

Downloading any file, secure or not

We download the deck.csv file from the GitHub URL stored in the character string site and save it as DFcards.csv in the same directory as the current document. Note that method = "curl" requires the curl command-line tool to be installed.

download.file(url = site, destfile = "./DFcards.csv", method = "curl")
list.files()
 [1] "cards.csv"            "Chapter7.Rmd"         "Chapters1and2.html"  
 [4] "Chapters1and2.Rmd"    "Chapters3and4.Rmd"    "Chapters5and6.Rmd"   
 [7] "DFcards.csv"          "PackageBuilding.html" "PackageBuilding.Rmd" 
[10] "PackagesUsed.bib"     "PNG"                 

Chapter 4 Notes

Selecting Values

head(deck)
   face   suit value
1  king spades    13
2 queen spades    12
3  jack spades    11
4   ten spades    10
5  nine spades     9
6 eight spades     8
deck[1, 1]
[1] "king"
deck[1, 1:3]
  face   suit value
1 king spades    13
deck[1:2, 1]  # returns a single column
[1] "king"  "queen"
deck[1:2, 1, drop = FALSE] # returns a data frame
   face
1  king
2 queen
deck[-(2:52), 1:3] # everything except rows 2-52 for cols 1-3
  face   suit value
1 king spades    13

Blank Spaces

You can use a blank space to tell R to extract every value in a dimension.

deck[1, ]  # same as deck[1, 1:3]
  face   suit value
1 king spades    13
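
The blank works in either position, and it combines with names and logical tests. A sketch on a made-up three-card data frame (mini is an illustrative name) so it runs on its own:

```r
mini <- data.frame(face = c("king", "queen", "jack"),
                   suit = c("spades", "spades", "spades"),
                   value = c(13, 12, 11),
                   stringsAsFactors = FALSE)

mini[ , "value"]                            # every row of one column: 13 12 11
mini[2, ]                                   # every column of one row
mini[mini$value > 11, c("face", "value")]   # logical test for rows, names for columns
```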

Logical Values

set.seed(4)
sims <- 10000
xbar <- numeric(sims)
for(i in 1:sims){
  xbar[i] <- mean(runif(50, 0, 10))
}
mean(xbar)
[1] 4.997238
sd(xbar)
[1] 0.4113463
library(ggplot2)
DF <- data.frame(xbar = xbar)
ggplot(data = DF, aes(x = xbar)) + 
  geom_density(fill = "lightblue") + 
  stat_function(fun = dnorm, args = list(mean = 5, sd = ((10)/sqrt(12))/sqrt(50)), color = "red") + 
  theme_bw()

What percentage of the values in xbar fall within one theoretical standard error of the mean, that is, between \(5 - (10/\sqrt{12})/\sqrt{50} = 4.5917517\) and \(5 + (10/\sqrt{12})/\sqrt{50} = 5.4082483\)?

LOG <- xbar >= (5 - ((10)/sqrt(12))/sqrt(50)) & xbar <= (5 + ((10)/sqrt(12))/sqrt(50))
head(LOG, n = 10)
 [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
head(which(LOG), n = 10)
 [1]  2  3  4  5  6  7  9 10 11 13
length(xbar[LOG])
[1] 6808
mean(LOG)
[1] 0.6808
# Compare to Normal
pnorm(1) - pnorm(-1)
[1] 0.6826895
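
The cutoffs used above come from the uniform distribution on [0, 10], which has mean 5 and standard deviation 10/sqrt(12); the mean of 50 draws therefore has standard error (10/sqrt(12))/sqrt(50). A sketch of that arithmetic, with a, b, n, mu, sigma, and se as illustrative names:

```r
a <- 0; b <- 10   # limits passed to runif()
n <- 50           # sample size behind each mean

mu    <- (a + b) / 2         # mean of Uniform(a, b): 5
sigma <- (b - a) / sqrt(12)  # sd of Uniform(a, b): about 2.8868
se    <- sigma / sqrt(n)     # standard error of the mean: about 0.4082

c(lower = mu - se, upper = mu + se)  # about 4.5918 and 5.4082
```

The simulated sd(xbar) of 0.4113463 is close to this theoretical value, and the observed proportion 0.6808 is close to the normal benchmark of 0.6827.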

Exercise

Use the preceding ideas to write a shuffle function. shuffle() should take a data frame and return a shuffled copy of the data frame.

shuffle <- function(cards){
  index <- sample(dim(cards)[1], size = dim(cards)[1], replace = FALSE)
  cards[index, ]
}
deck2 <- shuffle(cards = deck)
deck2[1:5, ]
    face     suit value
43   ten   hearts    10
38   two diamonds     2
24 three    clubs     3
10  four   spades     4
23  four    clubs     4
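
A shuffled copy should contain exactly the same rows as the input, and the input itself should be unchanged. A sketch that checks both on a made-up four-card data frame; it uses a slightly shorter variant of shuffle() that relies on sample(n) returning a random permutation of 1:n:

```r
# Small stand-in deck (made-up rows) so the sketch runs on its own
mini <- data.frame(face = c("king", "queen", "jack", "ten"),
                   value = c(13, 12, 11, 10),
                   stringsAsFactors = FALSE)

shuffle <- function(cards) {
  cards[sample(nrow(cards)), ]  # reorder the rows by a random permutation
}

set.seed(1)                     # fix the seed so the shuffle is reproducible
mixed <- shuffle(mini)

setequal(rownames(mixed), rownames(mini))                # TRUE: same rows
identical(mini$face, c("king", "queen", "jack", "ten"))  # TRUE: original untouched
```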

Dollar Signs and Double Brackets

deck$value
 [1] 13 12 11 10  9  8  7  6  5  4  3  2  1 13 12 11 10  9  8  7  6  5  4
[24]  3  2  1 13 12 11 10  9  8  7  6  5  4  3  2  1 13 12 11 10  9  8  7
[47]  6  5  4  3  2  1
mean(deck$value)
[1] 7
LST <- list(numbers = c(1, 2, 3, 4, 5), logical = c(TRUE, FALSE, TRUE), strings = c("dog", "cat", "horse", "car"))
LST
$numbers
[1] 1 2 3 4 5

$logical
[1]  TRUE FALSE  TRUE

$strings
[1] "dog"   "cat"   "horse" "car"  

Subsetting the first element:

LST[1]             # a list
$numbers
[1] 1 2 3 4 5
LST[[1]]           # values inside element
[1] 1 2 3 4 5
LST$numbers        # values inside element
[1] 1 2 3 4 5
LST[['numbers']]   # values inside element
[1] 1 2 3 4 5

If you subset a list with single-bracket notation, R returns a smaller list. If you subset a list with double-bracket notation, R returns the values stored inside the selected element.
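
The difference matters as soon as the result is passed to a function. A short sketch using the same LST as above:

```r
LST <- list(numbers = c(1, 2, 3, 4, 5),
            logical = c(TRUE, FALSE, TRUE),
            strings = c("dog", "cat", "horse", "car"))

class(LST[1])     # "list":    still wrapped in a list
class(LST[[1]])   # "numeric": the vector itself

sum(LST[[1]])     # 15
# sum(LST[1]) would stop with an error, because sum() cannot add up a list

length(LST[c("numbers", "strings")])  # 2: single brackets can keep several elements
```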