Create a directory named HomePricesProject inside your private class GitHub repository. Store all work for this project in this directory.
Read the article Modeling Home Prices Using Realtor Data.
Create an R Markdown document named Project1.Rmd inside the HomePricesProject directory. Complete all subsequent directions in this document.
Read the data from http://ww2.amstat.org/publications/jse/datasets/homes76.dat.txt into an R object named HP.
Remove columns 1, 7, 10, 15, 16, 17, 18, and 19 from HP and store the result back in HP.
Name the columns in HP price, size, lot, bath, bed, year, age, garage, status, active, and elem, respectively.
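The read, drop, and rename steps can be sketched on a small stand-in table. The inline text, its column names (id, Y, X1, junk), its values, and the object name toy are all made up for illustration and are not the homes76 data. read.table() also accepts a URL as its file argument in place of the text argument used here.

```r
# Hypothetical stand-in data: NOT the homes76 file.
raw <- "id Y X1 junk
1 100 1.5 9
2 200 2.0 9
3 300 2.5 9"

# header = TRUE treats the first line as column names.
toy <- read.table(text = raw, header = TRUE)

# Negative indices drop columns by position (here columns 1 and 4).
toy <- toy[, -c(1, 4)]

# Assign new names in left-to-right order.
names(toy) <- c("price", "size")

str(toy)
```

Storing the result back in the same object, as above, mirrors the instruction to overwrite HP after each step.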
Use the function datatable from the DT package to display the data from HP. Your data display should look similar to the one below.
Explore the data for variables that might help explain the price of a house.
What are the units for price and size?
Use the function stepAIC from the MASS package to create models using forward selection and backward elimination. Store the model from backward elimination in an object named mod.be and the model from forward selection in an object named mod.fs.
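As a rough sketch of how the two stepAIC searches are set up, the example below uses the built-in mtcars data as a stand-in. The predictors chosen and the object names full, null, be, and fs are assumptions made for illustration; the largest model you allow for the home price data is your own choice.

```r
library(MASS)  # provides stepAIC()

# Stand-in data: the built-in mtcars data set, not the home price data.
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)  # largest model considered
null <- lm(mpg ~ 1, data = mtcars)                      # intercept-only model

# Backward elimination: start from the full model and drop terms.
be <- stepAIC(full, direction = "backward", trace = FALSE)

# Forward selection: start from the null model and add terms,
# never going beyond the terms in the full model.
fs <- stepAIC(null,
              scope = list(lower = ~ 1, upper = formula(full)),
              direction = "forward", trace = FALSE)

formula(be)
formula(fs)
```

Forward selection needs the scope argument because the starting model alone does not say which terms are candidates for addition. The two searches are not guaranteed to end at the same model.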
Which model (mod.be or mod.fs) do you believe is better and why?
Create a model named mod1 that regresses price on all of the variables in HP except status and year. Produce a summary of mod1 and graph the residuals using residualPlots from the car package. Based on your residual plots, what changes might you make to mod1? Report the adjusted \(R^2\) value for mod1.
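One way to fit a model on all but a few columns, pull out the adjusted \(R^2\), and draw residual plots is sketched below on the built-in mtcars data. The excluded columns (cyl and gear) and the names fit1 and adj1 are stand-ins chosen for illustration. The call to residualPlots is guarded so that the sketch still runs when car is not installed.

```r
# Stand-in data: built-in mtcars, not the home price data.
# "mpg ~ . - cyl - gear" means: every other column except cyl and gear.
fit1 <- lm(mpg ~ . - cyl - gear, data = mtcars)

summary(fit1)                        # coefficient table and fit statistics
adj1 <- summary(fit1)$adj.r.squared  # adjusted R-squared as a single number
adj1

# residualPlots() plots residuals against each predictor and the fitted values;
# curvature in a panel suggests a missing nonlinear term.
if (requireNamespace("car", quietly = TRUE)) {
  car::residualPlots(fit1)
}
```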
Create a new model (mod2) by adding bath:bed and age\(^2\) to mod1. Report the adjusted \(R^2\) value for mod2.
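Adding an interaction and a squared term to an existing fit can be sketched with update(). The example again uses mtcars as a stand-in, and the names base and bigger are hypothetical.

```r
# Stand-in data: built-in mtcars, not the home price data.
base <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# update() keeps the existing terms (". ~ .") and appends new ones.
# wt:hp adds the interaction only; I(hp^2) protects ^ so that it means
# "hp squared" rather than formula crossing.
bigger <- update(base, . ~ . + wt:hp + I(hp^2))

names(coef(bigger))
c(base   = summary(base)$adj.r.squared,
  bigger = summary(bigger)$adj.r.squared)
```

Without I(), a term such as hp^2 is read by the formula parser as crossing hp with itself, which reduces to hp and adds nothing to the model.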
Create a new model (mod3) from mod2 by replacing elem with indicator variables for only the edison and harris levels of elem. Hint: use I(). Your estimated coefficients should agree with those in the article. Conduct a nested F-test (anova(mod3, mod2)). Does your p-value agree with the one presented in the article? Interpret this test. Report the adjusted \(R^2\) value for mod3.
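The idea behind the hint, keeping only selected levels of a factor through I() and then comparing the nested models, can be sketched as follows. The data are mtcars with cyl treated as a factor, and the names mt, big, small, and ft are made up for illustration.

```r
# Stand-in data: built-in mtcars, with cyl treated as a 3-level factor.
mt <- transform(mtcars, cyl = factor(cyl))

# Larger model: one dummy variable for each non-reference level of cyl.
big <- lm(mpg ~ wt + cyl, data = mt)

# Smaller model: keep a single level as an indicator built inline with I().
small <- lm(mpg ~ wt + I(cyl == "8"), data = mt)

# Nested F-test: the null hypothesis is that the extra terms in the
# larger model all have zero coefficients.
ft <- anova(small, big)
ft
ft[2, "Pr(>F)"]  # p-value of the test
```

The smaller model must be listed first in anova(). A large p-value indicates that the data give little evidence the dropped terms are needed.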
Compute the training mean square prediction error for all five of the models. Which model has the smallest training mean square prediction error? Do you think this model will also have the smallest test mean square prediction error?
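A minimal sketch of the computation, assuming the training mean square prediction error is defined as the average squared residual on the data used to fit the model (dividing by n rather than by the residual degrees of freedom). The helper train_mspe and the two stand-in models are hypothetical.

```r
# Stand-in data: built-in mtcars, not the home price data.
fits <- list(
  small = lm(mpg ~ wt, data = mtcars),
  large = lm(mpg ~ wt + hp + qsec, data = mtcars)
)

# Training MSPE: mean of squared residuals on the data used to fit the model.
train_mspe <- function(fit) mean(residuals(fit)^2)

mspe <- sapply(fits, train_mspe)
mspe
names(which.min(mspe))  # name of the model with the smallest value
```

Collecting the models in a named list lets one call to sapply() produce a labeled vector of errors that is easy to compare.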
Use mod3 to create a 95% prediction interval for a home with the following features: 1,879 square feet, lot size category 4, two and a half baths, three bedrooms, built in 1975, two-car garage, and near Parker Elementary School.
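The mechanics of a prediction interval for a single new case can be sketched with predict() on a stand-in model. The data are mtcars, and the names fit, new_car, and pi95, along with the new-case values, are made up for illustration.

```r
# Stand-in data: built-in mtcars, not the home price data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# One row per new case; column names must match the predictors in the model.
new_car <- data.frame(wt = 3.0, hp = 120)

# interval = "prediction" accounts for both the uncertainty in the estimated
# coefficients and the error variance of a single new observation.
pi95 <- predict(fit, newdata = new_car, interval = "prediction", level = 0.95)
pi95  # columns: fit, lwr, upr
```

The values in newdata need to use the same units and coding as the columns the model was fit on, so check how each variable in HP is recorded before filling in the new home's features.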
EXTRA CREDIT: Install the package effects and run the following code:
library(effects)
plot(allEffects(mod2))
plot(effect("bath*bed", mod2))
plot(effect("bath*bed", mod2, xlevels=list(bed=2:5)))
plot(effect("bath*bed", mod2, xlevels=list(bath=1:3)))
Explain what each set of graphs is showing.