Directions

Recreate this document exactly using R Markdown. A great reference for creating technical documents with R Markdown is bookdown: Authoring Books and Technical Documents with R Markdown. Your YAML should look similar to:

---
title: "Writing Assignment"
author: "Leave This Blank"
bibliography: [packages.bib, ISLR.bib]
output: 
    bookdown::html_document2
date: 'Last compiled: `r format(Sys.time(), "%b %d, %Y")`'
---

1 From page 62 of ISLR (James et al. 2013)

Let \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1x_i\) be the prediction for \(Y\) based on the \(i^{\text{th}}\) value of \(X\). Then \(e_i = y_i - \hat{y}_i\) represents the \(i^{\text{th}}\) residual—this is the difference between the \(i^{\text{th}}\) observed response and the \(i^{\text{th}}\) response value that is predicted by our linear model. We define the residual sum of squares (RSS) as

\[\begin{equation*} \mathrm{RSS} = e_1^2 + e_2^2 + \cdots + e_n^2. \end{equation*}\]

or equivalently as

\[\begin{equation} \mathrm{RSS} = (y_1 - \hat{\beta}_0 - \hat{\beta}_1x_1 )^2 + (y_2 - \hat{\beta}_0 - \hat{\beta}_1x_2 )^2 + \cdots + (y_n - \hat{\beta}_0 - \hat{\beta}_1x_n )^2 \tag{1.1} \end{equation}\]

The least squares approach chooses \(\hat{\beta}_0\) and \(\hat{\beta}_1\) to minimize the RSS. Using some calculus, one can show that the minimizers are

\[\begin{equation} \begin{split} \hat{\beta}_1 &= \frac{\sum_{i=1}^n(x_i - \bar{x})(y_i - \bar{y}) }{\sum_{i=1}^n (x_i - \bar{x})^2},\\ \hat{\beta}_0 &= \bar{y} - \hat{\beta}_1\bar{x},\\ \end{split} \tag{1.2} \end{equation}\]

where \(\bar{y} \equiv \tfrac{1}{n}\sum_{i=1}^n y_i\) and \(\bar{x} \equiv \tfrac{1}{n}\sum_{i=1}^n x_i\) are the sample means.
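As a quick check of these formulas, the following sketch computes the estimates in (1.2) and the RSS in (1.1) by hand and compares them with `lm()`. The simulated data and the true coefficients (2 and 3) are illustrative assumptions, not values taken from ISLR.

```r
# Simulated data; the true coefficients (2 and 3) are illustrative assumptions.
set.seed(1)
x_sim <- runif(50, 0, 10)
y_sim <- 2 + 3 * x_sim + rnorm(50)

# Closed-form least squares estimates from equation (1.2).
x_mean <- mean(x_sim)
y_mean <- mean(y_sim)
beta1_hat <- sum((x_sim - x_mean) * (y_sim - y_mean)) / sum((x_sim - x_mean)^2)
beta0_hat <- y_mean - beta1_hat * x_mean

# Residual sum of squares from equation (1.1).
rss <- sum((y_sim - beta0_hat - beta1_hat * x_sim)^2)

# Compare with R's built-in linear model fit; both calls should return TRUE.
fit <- lm(y_sim ~ x_sim)
all.equal(unname(coef(fit)), c(beta0_hat, beta1_hat))
all.equal(sum(resid(fit)^2), rss)
```

Agreement with `lm()` up to floating-point tolerance is what one would expect, since `lm()` solves the same minimization problem.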

2 From page 63 of ISLR (James et al. 2013)

Recall that we assume that the true relationship between \(X\) and \(Y\) takes the form \(Y = f(X) + \epsilon\) for some unknown function \(f\), where \(\epsilon\) is a mean-zero random error term. If \(f\) is to be approximated by a linear function, then we write this relationship as

\[\begin{equation} Y = \beta_0 + \beta_1 X + \epsilon \tag{2.1} \end{equation}\]

Here \(\beta_0\) is the intercept term (the expected value of \(Y\) when \(X = 0\)) and \(\beta_1\) is the slope (the average increase in \(Y\) associated with a one-unit increase in \(X\)). The error term is a catch-all for what we miss with this simple model: the true relationship is probably not linear, there may be other variables that cause variation in \(Y\), and there may be measurement error. We typically assume that the error term is independent of \(X\).
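The interpretation of the intercept and slope can be illustrated with a small simulation. This is only a sketch: the values \(\beta_0 = 2\) and \(\beta_1 = 3\), the standard normal error, and the variable names are all assumptions chosen for the example.

```r
# Simulate from Y = beta_0 + beta_1 X + epsilon with illustrative values
# beta_0 = 2 and beta_1 = 3 (assumptions, not from ISLR).
set.seed(42)
n <- 5000
x_grp <- rep(c(0, 1), each = n)   # X takes only the values 0 and 1
eps <- rnorm(2 * n)               # mean-zero error, generated independently of X
y_grp <- 2 + 3 * x_grp + eps

# Average response when X = 0 (should be close to the intercept, 2).
intercept_est <- mean(y_grp[x_grp == 0])

# Average change in Y for a one-unit increase in X (should be close to 3).
slope_est <- mean(y_grp[x_grp == 1]) - intercept_est

c(intercept_est, slope_est)
```

Because the error term has mean zero, the two group means differ from \(\beta_0\) and \(\beta_0 + \beta_1\) only by sampling noise.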

3 From page 143 of ISLR (James et al. 2013)

To indicate that a \(p\)-dimensional random variable \(X\) has a multivariate Gaussian distribution, we write \(X \sim N(\mu, \mathbf{\Sigma})\). Here \(E(X) = \mu\) is the mean of \(X\) (a vector with \(p\) components), and \(\mathrm{Cov}(X) = \mathbf{\Sigma}\) is the \(p \times p\) covariance matrix of \(X\). Formally, the multivariate Gaussian density is defined as

\[\begin{equation} f(x) = \frac{1}{(2\pi)^{p/2}| \mathbf{\Sigma}|^{1/2}} \exp \left(-\frac{1}{2}(x - \mu)^T \mathbf{\Sigma}^{-1} (x - \mu) \right) \tag{3.1} \end{equation}\]
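The density in (3.1) translates almost directly into base R. The function name `mvn_density` and its arguments are hypothetical, introduced only for this sketch; they are not part of ISLR or of any package.

```r
# Multivariate Gaussian density from equation (3.1).
# x and mu are numeric vectors of length p; Sigma is a p x p covariance matrix.
mvn_density <- function(x, mu, Sigma) {
  p <- length(mu)
  d <- x - mu
  # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), computed without
  # explicitly inverting Sigma.
  quad <- drop(t(d) %*% solve(Sigma, d))
  exp(-quad / 2) / ((2 * pi)^(p / 2) * sqrt(det(Sigma)))
}

# With p = 1 the formula reduces to the univariate normal density,
# so these two calls should agree.
mvn_density(1.5, mu = 1, Sigma = matrix(4))
dnorm(1.5, mean = 1, sd = 2)
```

Using `solve(Sigma, d)` instead of `solve(Sigma) %*% d` is a design choice that avoids forming the inverse matrix.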

In the case of \(p > 1\) predictors, the LDA classifier assumes that the observations in the \(k^{\text{th}}\) class are drawn from a multivariate Gaussian distribution \(N(\mu_k, \mathbf{\Sigma})\), where \(\mu_k\) is a class-specific mean vector, and \(\mathbf{\Sigma}\) is the covariance matrix that is common to all \(K\) classes. Plugging the density function for the \(k^{\text{th}}\) class, \(f_{k}(X = x)\), as given by (3.1) with \(\mu = \mu_k\), into Bayes' theorem and performing a little bit of algebra reveals that the Bayes classifier assigns an observation \(X=x\) to the class for which

\[\begin{equation} \delta_k(x) = x^T \mathbf{\Sigma}^{-1} \mu_k -\frac{1}{2} \mu_k^T \mathbf{\Sigma}^{-1} \mu_k + \log \pi_k \tag{3.2} \end{equation}\]

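is the largest, where \(\pi_k\) is the prior probability that an observation belongs to the \(k^{\text{th}}\) class.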
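The classification rule based on (3.2) can be sketched in a few lines of base R. The function `lda_discriminant`, the two class means, the common covariance matrix, and the equal priors below are all hypothetical values chosen for illustration.

```r
# Linear discriminant function from equation (3.2).
lda_discriminant <- function(x, mu_k, Sigma, pi_k) {
  Sigma_inv_mu <- solve(Sigma, mu_k)   # Sigma^{-1} mu_k
  sum(x * Sigma_inv_mu) - 0.5 * sum(mu_k * Sigma_inv_mu) + log(pi_k)
}

# Hypothetical two-class example with p = 2 predictors.
Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)   # common covariance matrix
mus <- list(c(0, 0), c(2, 2))                  # class-specific mean vectors
priors <- c(0.5, 0.5)                          # prior class probabilities
x_new <- c(1.8, 1.6)                           # observation to classify

# Evaluate the discriminant for each class and pick the largest.
deltas <- mapply(
  function(mu_k, pi_k) lda_discriminant(x_new, mu_k, Sigma, pi_k),
  mus, priors
)
which.max(deltas)   # index of the class with the largest discriminant
```

In practice the means, covariance matrix, and priors are unknown and must be estimated from training data; this sketch simply plugs in assumed values.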

4 Inserting a graph

set.seed(123)
x <- rnorm(1000, 100, 15)
DF <- data.frame(x = x)
library(ggplot2)
ggplot(data = DF, aes(x = x)) + 
  geom_histogram(fill = "blue", color = "black", binwidth = 5) + 
  theme_bw()

Figure 4.1: Your descriptive caption here

xbar <- mean(x)
SD <- sd(x)
c(xbar, SD)

[1] 100.24192  14.87542

Figure 4.1 is unimodal with a mean of 100.2419 and a standard deviation of 14.8754.
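The two numbers in the sentence above need not be typed by hand. A minimal sketch, reusing the seed and sample from this section, computes them rounded to four decimal places; in an R Markdown source the same expressions could be placed in inline R code so that the text updates whenever the data change.

```r
# Recreate the sample used for the histogram.
set.seed(123)
x <- rnorm(1000, 100, 15)

# Mean and standard deviation rounded to four decimal places,
# matching the precision quoted in the text.
round(c(mean = mean(x), sd = sd(x)), 4)
```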

5 Automagically Creating References

Review your last assignment to create a file named packages.bib to cite the ggplot2 package used to create Figure 4.1. Figure 4.1 was created with ggplot2 by Wickham and Chang (2016). This document specifies the output as bookdown::html_document2. The function bookdown::html_document2 is from bookdown written by Xie (2016).
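One possible way to generate packages.bib is with `knitr::write_bib()`, as sketched below. This assumes the knitr, ggplot2, and bookdown packages are installed; the citation keys it produces take the form `R-<package>` (for example `R-ggplot2`), which may differ from the keys used in your previous assignment.

```r
# Write BibTeX entries for the packages cited in this document.
# Assumes knitr, ggplot2, and bookdown are installed.
pkgs <- c("ggplot2", "bookdown")
knitr::write_bib(pkgs, file = "packages.bib")

# The generated keys are "R-ggplot2" and "R-bookdown", so the packages
# can be cited in the text as @R-ggplot2 and @R-bookdown.
```

The year and author list in the generated entries depend on the installed package versions, so the rendered citations may not match the ones shown in this document.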

sessionInfo()

R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bookdown_0.3  ggplot2_2.2.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.9      assertthat_0.1   digest_0.6.12    rprojroot_1.2   
 [5] plyr_1.8.4       grid_3.3.2       gtable_0.2.0     backports_1.0.5 
 [9] magrittr_1.5     evaluate_0.10    scales_0.4.1     highr_0.6       
[13] stringi_1.1.2    lazyeval_0.2.0   rmarkdown_1.3    labeling_0.3    
[17] tools_3.3.2      stringr_1.1.0    munsell_0.4.3    yaml_2.1.14     
[21] colorspace_1.3-2 htmltools_0.3.5  knitr_1.15.1     tibble_1.2

References

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. 1st ed. 2013, Corr. 6th printing 2016 edition. New York: Springer.

Wickham, Hadley, and Winston Chang. 2016. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.

Xie, Yihui. 2016. Bookdown: Authoring Books and Technical Documents with R Markdown. https://CRAN.R-project.org/package=bookdown.

Writing Assignment

Leave This Blank

Last compiled: Feb 11, 2017