Tentative STT 3851 Course Schedule

General Notes:

Please bring a notebook and pencil to every class
The principal documents for this course are ModernDive: An Introduction to Statistical and Data Sciences via R (MD), Data Science with R (DSWR), and An Introduction to Statistical Learning with Applications in R (ISLR)

Grading Rubric for Assignments

Field	Excellent (3)	Competent (2)	Needs Work (1)
Reproducible	All graphs, code, and answers are created from text files. Answers are never hard-coded but instead are inserted using inline R code. An automatically generated references section with properly formatted citations (when appropriate) and `sessionInfo()` are provided at the end of the document.	All graphs, code, and answers are created from text files. Answers are hard-coded. No `sessionInfo()` is provided at the end of the document. References are present but not cited properly or not automatically generated.	Document uses copy and paste with graphs or code. Answers are hard-coded, and references, when appropriate, are hard-coded.
Statistical Understanding	Answers to questions demonstrate clear statistical understanding by comparing theoretical answers to simulated answers. When hypotheses are tested, classical methods are compared and contrasted with randomization methods. When confidence intervals are constructed, classical approaches are compared and contrasted with bootstrap procedures. The scope of inferential conclusions made is appropriate for the sampling method.	Theoretical and simulated answers are computed but no discussion is present comparing and contrasting the results. When hypotheses are tested, results for classical and randomization methods are presented but are not compared and contrasted. When confidence intervals are constructed, classical and bootstrap approaches are computed but the results are not compared and contrasted. The scope of inferential conclusions made is appropriate for the sampling method.	Theoretical and simulated answers are not computed correctly. No comparison between classical and randomization approaches is present when testing hypotheses. When confidence intervals are constructed, there is no comparison between classical and bootstrap confidence intervals.
Graphics	Graphs for categorical data (barplot, mosaic plot, etc.) have appropriately labeled axes and titles. Graphs for quantitative data (histograms, density plots, violin plots, etc.) have appropriately labeled axes and titles. Multivariate graphs use appropriate legends and labels. Computer variable names are replaced with descriptive variable names.	Appropriate graphs for the type of data are used. Not all axes have appropriate labels or computer variable names are used in the graphs.	Inappropriate graphs are used for the type of data. Axes are not labeled and computer variable names appear in the graphs.
Coding	Code (primarily R) produces correct answers. Non-standard or complex functions are commented. Code is formatted using a consistent standard.	Code produces correct answers. Commenting is not used with non-standard and complex functions. No consistent code formatting is used.	Code does not produce correct answers. Code has no comments and is not formatted.
Clarity	Few errors of grammar and usage; any minor errors do not interfere with meaning. Language style and word choice are highly effective and enhance meaning. Style and word choice are appropriate for the assignment.	Some errors of grammar and usage; errors do not interfere with meaning. Language style and word choice are, for the most part, effective and appropriate for the assignment.	Major errors of grammar and usage make meaning unclear. Language style and word choice are ineffective and/or inappropriate.
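
As an illustration of the "Statistical Understanding" row, the following is a minimal sketch of computing a classical confidence interval and a bootstrap percentile interval side by side so the two can be compared. The data set (mtcars, which ships with base R), the seed, and the number of resamples are assumptions chosen only for this example; they do not come from any course assignment.

```r
# Compare a classical t-interval with a bootstrap percentile interval
# for the mean of mtcars$mpg (example data chosen for illustration only).
set.seed(3851)                      # arbitrary seed so the resampling is reproducible
x <- mtcars$mpg
n <- length(x)

# Classical 95% confidence interval based on the t distribution
classical_ci <- t.test(x, conf.level = 0.95)$conf.int

# Bootstrap: resample with replacement and record the mean each time
B <- 5000                           # number of bootstrap resamples (assumed value)
boot_means <- replicate(B, mean(sample(x, size = n, replace = TRUE)))
bootstrap_ci <- quantile(boot_means, probs = c(0.025, 0.975))

# Put both intervals in one table for comparison
round(rbind(classical = as.numeric(classical_ci),
            bootstrap = as.numeric(bootstrap_ci)), 2)
```

In an R Markdown document, the interval endpoints stored in classical_ci and bootstrap_ci could then be reported with inline R code rather than typed by hand, which is what the "Reproducible" row asks for.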

Week 1: (Aug 17, 19)

Before the first class meeting, read chapter 1 (Getting Started with Data in R) of MD—pgs 1-20
Become familiar with the Appstate RStudio server. You will use your Appstate user name and password to log in to the server. You must be registered in the class to access the server.
Sign up for a free account on GitHub. When you register for a free individual GitHub account, request a student discount to obtain a few private repositories as well as unlimited public repositories. Please use something similar to FirstNameLastName as your username when you register with GitHub. For example, my username on GitHub is alanarnholt. If you have a popular name such as John Smith, you may need to provide some other distinguishing characteristic in your username.
Work through chapter 1 (Git and GitHub) of DSWR. Make sure RStudio is set up to communicate with Git by following the directions in HappyGitWithR for introducing yourself to Git—VIDEO of the setup process
Cache your credentials and set up a personal access token (PAT) by following the directions in HappyGitWithR.
Work through chapter 2 (Introduction to R) of DSWR
Start PS-01 due by 10:00 am Aug 24 — Final corrections due by 5:00 pm Aug 26 (assignment link in ASULEARN)—VIDEO of how to accept and clone the assignment

Optional

Introduction to R slides
Watch Paul the Octopus clip (61 seconds).
You may want to install Git, R, RStudio, Zotero, and optionally LaTeX on your personal computer. If you do, you will want to follow Jenny Bryan’s excellent advice for installing R and RStudio and installing Git. Jenny’s advice is also in chapters 6 and 7 of Happy Git and GitHub for the useR. Note: Git, R, RStudio, and LaTeX are installed on the Appstate RStudio server.
Watch the following videos as appropriate:
Install R on Mac (2 min)
Install R for Windows (3 min)
Install R and RStudio on Windows (5 min)

Week 2: (Aug 24, 26)

Complete PS-01 by 10:00 am Aug 24 — Final corrections due by 5:00 pm Aug 26 (assignment link in ASULEARN)
In class work on dplyr Ch1 Handout (assignment link in ASULEARN)
Before class read chapter 3 (Data Wrangling) of MD — pgs 65-96
Before class read chapter 4 (Data Importing and “Tidy” Data) of MD — pgs 99-117
Partial Lecture Slides
Work through chapter 3 (Starting with Data) of DSWR
Work through chapter 4 (Data Manipulation) of DSWR
Complete the Data Wrangling chapter of Introduction to the Tidyverse— DataCamp — Due NLT 5:00 pm Aug 24
Complete the Data Visualization chapter of Introduction to the Tidyverse— DataCamp — Due NLT 5:00 pm Aug 26
Start PS-03 due by 10:00 am Aug 31 — Final corrections due by 5:00 pm Sep 2 — (assignment link in ASULEARN)

Optional

Read Getting used to R, RStudio, and R Markdown

Week 3: (Aug 31, Sep 2)

Complete PS-03 by 10:00 am Aug 31 — Final corrections due by 5:00 pm Sep 2 — (assignment link in ASULEARN)
Before class read chapter 2 (Data Visualization) of MD — pgs 21-62
Work through chapter 5 (Using ggplot2) of DSWR
Partial Lecture Slides
Complete the Grouping and Summarizing chapter of Introduction to the Tidyverse— DataCamp — Due NLT 5:00 pm Aug 31
Complete the Types of Visualizations chapter of Introduction to the Tidyverse— DataCamp — Due NLT 5:00 pm Sep 2
Start PS-02 due by 10:00 am Sep 7 — Final corrections due by 5:00 pm Sep 9—(assignment link in ASULEARN)

Optional

Overplotting Examples and possible Solutions
Read Chapters 1-2 of bookdown: Authoring Books and Technical Documents with R Markdown
Complete Data Visualization with ggplot2 (Part 1) (DataCamp)
Nice Tidyverse Cheat Sheet

Week 4: (Sep 7, 9)

Complete PS-02 by 10:00 am Sep 7 — Final corrections due by 5:00 pm Sep 9—(assignment link in ASULEARN)
Before class read chapter 5 (Basic Regression) of MD — pgs 119-160
Go over this document in class
Complete the Introduction to Modeling chapter of Modeling with Data in the Tidyverse— DataCamp — Due NLT 5:00 pm Sep 7
Complete the Modeling with Basic Regression chapter of Modeling with Data in the Tidyverse— DataCamp — Due NLT 5:00 pm Sep 9

Optional

Add an avatar or picture on your GitHub account
Read the Git and GitHub chapter from Hadley Wickham’s book R Packages
Brian Caffo’s take on R IDEs

Week 5: (Sep 14, 16)

Before class read chapter 6 (Multiple Regression) of MD — pgs 161-191
Complete the Modeling with Multiple Regression chapter of Modeling with Data in the Tidyverse— DataCamp — Due NLT 5:00 pm Sep 14
Complete the Model Assessment and Selection chapter of Modeling with Data in the Tidyverse— DataCamp — Due NLT 5:00 pm Sep 16
Start working on Bookdown assignment Modeling with Data in the Tidyverse - Due NLT 5:00 pm Sep 23 — assignment link in ASULEARN

Optional

Complete Data Manipulation in R with dplyr (DataCamp)

Week 6: (Sep 21, 23)

Bookdown assignment Modeling with Data in the Tidyverse - Due NLT 5:00 pm Sep 23 — assignment link in ASULEARN
Before class read chapter 6 (Multiple Regression) of MD — pgs 161-191

Optional

Read through Misc Regression
Answer the questions at the end of Misc Regression for extra credit (Turn in before Oct 2)
Work on Is this Discrimination?
Some ideas for how to answer Is this Discrimination?

Week 7: (Sep 28, 30)

Read chapters 1 and 2 of ISLR
Lecture - Bias Variance Tradeoff - Take notes - Slides
Go over Bias Variance Graphs
Go over Chapter 2 Graphs
Discuss flexible models and when to use them
Complete the Visualizing two variables chapter of Correlation and Regression in R— DataCamp — Due NLT 5:00 pm Sep 28
Complete the Correlation chapter of Correlation and Regression in R— DataCamp — Due NLT 5:00 pm Sep 30
Start Lab1 due by 10:00 am Oct 5—Final corrections due by 5:00 pm Oct 7 — (assignment link in ASULEARN)
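
The bias-variance discussion for this week can be explored with a small simulation. The following is a minimal sketch, not taken from the lecture slides: the "true" function, the noise level, the sample size, and the polynomial degrees are all assumptions made for illustration. It fits models of increasing flexibility and records the training and test mean squared error for each.

```r
# Simulate data from an assumed true function plus noise
set.seed(3851)                       # arbitrary seed for reproducibility
f <- function(x) sin(2 * x)          # hypothetical "true" regression function
n <- 100
train <- data.frame(x = runif(n, 0, 3))
train$y <- f(train$x) + rnorm(n, sd = 0.3)
test <- data.frame(x = runif(n, 0, 3))
test$y <- f(test$x) + rnorm(n, sd = 0.3)

# Mean squared error of a fitted model on a given data set
mse <- function(model, data) mean((data$y - predict(model, newdata = data))^2)

# Fit polynomials of increasing degree (increasing flexibility)
degrees <- c(1, 3, 10)
results <- t(sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d), data = train)
  c(degree = d, train_mse = mse(fit, train), test_mse = mse(fit, test))
}))
results
```

Because the polynomial models are nested, the training error cannot increase as the degree grows; the test error need not follow the same pattern, which is the point of the comparison.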

Optional

Watch Opening Remarks and Examples (18:18)
Watch Supervised and Unsupervised Learning (12:12)
Watch Statistical Learning and Regression (11:41)
Watch Curse of Dimensionality and Parametric Models (11:40)
Watch Assessing Model Accuracy and Bias-Variance Trade-off (10:04)
Watch Classification Problems and K-Nearest Neighbors (15:37)
Read Section 3.4 of bookdown: Authoring Books and Technical Documents with R Markdown

Week 8: (Oct 5, 7)

Complete Lab1 by 10:00 am Oct 5—Final corrections due by 5:00 pm Oct 7 — (assignment link in ASULEARN)
Read chapter 3 of ISLR
Linear Regression (slides)
Go over chapter 3 material using R
Complete the Simple Linear Regression chapter of Correlation and Regression in R— DataCamp — Due NLT 5:00 pm Oct 5
Complete the Interpreting Regression Models chapter of Correlation and Regression in R— DataCamp — Due NLT 5:00 pm Oct 5
Complete the Model Fit chapter of Correlation and Regression in R— DataCamp — Due NLT 5:00 pm Oct 7

Optional

Watch Simple Linear Regression and Confidence Intervals (13:01)
Watch Hypothesis Testing (8:24)
Watch Multiple Linear Regression and Interpreting Regression Coefficients (15:38)

Week 9: (Oct 12—Fall Break, Oct 14)

Read chapter 3 of ISLR
Linear Regression (slides)
Go over chapter 3 material using R
Complete the Parallel slopes model chapter of Multiple and Logistic Regression in R— DataCamp — Due NLT 5:00 pm Oct 14
Complete the Evaluating and extending parallel slopes model chapter of Multiple and Logistic Regression in R— DataCamp — Due NLT 5:00 pm Oct 14
Complete the Multiple Regression chapter of Multiple and Logistic Regression in R— DataCamp — Due NLT 5:00 pm Oct 14
Start Writing Assignment - Lab2 due by 10:00 am Oct 19 — Final corrections due by 5:00 pm Oct 21

Optional

Watch Model Selection and Qualitative Predictors (14:51)
Watch Interactions and Nonlinearity (14:16)
Watch Lab: Linear Regression (22:10)
Code to help with the creation of Figure 6.1 in the DataCamp assignment.

Week 10: (Oct 19, 21)

Complete Writing Assignment - Lab2 by 10:00 am Oct 19 — Final corrections due by 5:00 pm Oct 21
Read chapter 3 of ISLR
Linear Regression (slides)
Go over chapter 3 material using R
Complete the Logistic Regression chapter of Multiple and Logistic Regression in R— DataCamp — Due NLT 5:00 pm Oct 19
Complete the Case Study chapter of Multiple and Logistic Regression in R— DataCamp — Due NLT 5:00 pm Oct 21
Start Lab3 due by 10:00 am Oct 26 — Final corrections due by 5:00 pm Oct 28 — (assignment link in ASULEARN)

Optional

Watch Model Selection and Qualitative Predictors (14:51)
Watch Interactions and Nonlinearity (14:16)
Watch Lab: Linear Regression (22:10)

Week 11: (Oct 26, 28)

Complete Lab3 by 10:00 am Oct 26 — Final corrections due by 5:00 pm Oct 28 — (assignment link in ASULEARN)
Read chapter 5 of ISLR
Resampling Methods (slides)
Discuss Cross-Validation Hand Out
Flow control example
Bootstrap example
Complete the Regression models: fitting them and evaluating their performance chapter of Machine Learning with caret in R— DataCamp — Due NLT 5:00 pm Oct 26
Complete the Classification models: fitting them and evaluating their performance chapter of Machine Learning with caret in R— DataCamp — Due NLT 5:00 pm Oct 28
Start Kaggle competition — Last submission Nov 25
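
As a companion to this week's resampling material, here is a minimal sketch of k-fold cross-validation written in base R. It is not the course handout: the data set (mtcars), the model formula, the seed, and the choice of k = 5 are assumptions made only to keep the example self-contained.

```r
# 5-fold cross-validation of a linear model, using base R only
set.seed(3851)                               # arbitrary seed for reproducibility
k <- 5
n <- nrow(mtcars)
folds <- sample(rep(1:k, length.out = n))    # random fold label for each row

# For each fold: fit on the other folds, then measure error on the held-out fold
fold_mse <- sapply(1:k, function(i) {
  train <- mtcars[folds != i, ]
  test  <- mtcars[folds == i, ]
  fit <- lm(mpg ~ wt + hp, data = train)     # example model, chosen for illustration
  mean((test$mpg - predict(fit, newdata = test))^2)
})

# Cross-validated estimate of test MSE (simple average over the folds)
cv_mse <- mean(fold_mse)
cv_mse
```

The caret package covered in the DataCamp course automates this kind of loop; writing it out by hand once makes it clearer what the package is doing.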

Optional

Watch Estimating Prediction Error and Validation Set Approach (14:01)
Watch K-fold Cross-Validation (13:33)
Watch Cross-Validation: The Right and Wrong Ways (10:07)
Read the article Modeling Home Prices Using Realtor Data.
The data set can be downloaded from http://www.amstat.org/publications/jse/datasets/homes76.dat.txt, while the code book is available from http://www.amstat.org/publications/jse/datasets/homes76.txt
Example Template Directory
Use the housedata.csv file - See King County Housing Data

Week 12: (Nov 2, 4)

Read Predictive Model Building
Read chapter 6 of ISLR
Linear Model Selection and Regularization (slides)
Complete the Tuning model parameters to improve performance chapter of Machine Learning with caret in R— DataCamp — Due NLT 5:00 pm Nov 2
Complete the Preprocessing your data chapter of Machine Learning with caret in R— DataCamp — Due NLT 5:00 pm Nov 2
Complete the Selecting models: a case study in churn prediction chapter of Machine Learning with caret in R— DataCamp — Due NLT 5:00 pm Nov 4
Work on Kaggle competition — Last submission Nov 25
Start Lab4 - Linear Modeling due by 10:00 am Nov 9 — Final corrections due by 5:00 pm Nov 11

Optional

Watch Linear Model Selection and Best Subset Selection (13:44)
Watch Forward Stepwise Selection (12:26)
Watch Backward Stepwise Selection (5:26)
Watch Estimating Test Error Using Mallow’s Cp, AIC, BIC, Adjusted R-squared (14:06)
Watch Estimating Test Error Using Cross-Validation (8:43)

Week 13: (Nov 9, 11)

Complete Lab4 - Linear and Nonlinear Modeling by 10:00 am Nov 9 — Final corrections due by 5:00 pm Nov 11
Read Predictive Model Building
Read chapter 6 of ISLR
Linear Model Selection and Regularization (slides)
Start reproduction of Machine Learning with caret in R — Due 10:00 am Nov 16 — Final corrections due by 5:00 pm Nov 18
Work on Kaggle competition — Last submission Nov 25

Optional

Watch Shrinkage Methods and Ridge Regression (12:37)
Watch The Lasso (15:21)
Watch Tuning Parameter Selection for Ridge Regression and Lasso (5:27)
Watch Dimension Reduction (4:45)
Watch Principal Components Regression and Partial Least Squares (15:48)
Watch Lab: Best Subset Selection (10:36)
Watch Lab: Forward Stepwise Selection and Model Selection Using Validation Set (10:32)
Watch Lab: Model Selection Using Cross-Validation (5:32)
Watch Lab: Ridge Regression and Lasso (16:34)

Week 14: (Nov 16, 18)

Complete reproduction of Machine Learning with caret in R — Due 10:00 am Nov 16 — Final corrections due by 5:00 pm Nov 18
Work on Kaggle competition — Last submission Nov 25

Week 15: (Nov 23)

Work on Kaggle competition — Last submission Nov 25
Accept Kaggle slides repository inside ASULEARN.

Week 16: (Nov 30)

Work on Kaggle assignment.

Final Exam: (Dec 7—11:00 am-1:30 pm)

Three-minute presentation on the Kaggle competition, limited to three slides. Accept the assignment inside ASULEARN.
- The training and test data sets should be stored inside your repository. All data munging and feature engineering should be included inside the presentation but not necessarily shown (use echo = FALSE). Your presentation should include a description of your best two or three models and lessons learned during the competition.

Tentative STT 3851 Course Schedule - Fall 2021

Alan T. Arnholt

Last Updated on: Nov 28, 2021 at 08:07:14 AM
