SOA ASA Exam: Predictive Analytics (PA) – 3.2 Linear Models Case Study 2: Feature Selection and Regularization
Learning Objectives

After completing this case study, you should be able to:

- Fit a multiple linear regression model with both numeric and categorical (factor) predictors.
- Detect and accommodate interactions between predictors, which can be quantitative or qualitative.
- Perform explicit binarization of categorical predictors using the dummyVars() function from the caret package and understand why doing so may be beneficial.
- Perform stepwise selection using the stepAIC() function from the MASS package and be familiar with the different options allowed by this function.
- Generate and interpret diagnostic plots for a linear model.
- Implement regularized regression using the glmnet() and cv.glmnet() functions from the glmnet package.

Stage 1: Define the Business Problem

Objectives

Our goal here is to identify and interpret key factors that relate to a higher or lower Balance with the aid of appropriate linear models.

Stage 2: Data Collection

Data Design

Relevance

Read in data and remove irrelevant variables.

    # CHUNK 1
    library(ISLR)
    data(Credit)
    Credit$ID <- NULL

Data Description

The Credit dataset contains n = 400 observations and 11 variables. Numeric predictors are listed first, followed by categorical ones. The target variable is the last variable in the dataset, Balance, an integer-valued variable that ranges from …
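
The Data Description can be checked directly in R after running CHUNK 1. The following is a minimal sketch, not part of the original case study; it assumes the ISLR package is installed, and the helper names is_factor, numeric_vars, and factor_vars are introduced here only for illustration.

```r
# Sketch only: assumes the ISLR package is installed.
library(ISLR)
data(Credit)
Credit$ID <- NULL            # same step as CHUNK 1

dim(Credit)                  # number of observations and variables
str(Credit)                  # storage type of each variable
summary(Credit$Balance)      # distribution of the target variable

# Separate variable names by type (illustrative helper names)
is_factor    <- sapply(Credit, is.factor)
numeric_vars <- names(Credit)[!is_factor]   # includes the target, Balance
factor_vars  <- names(Credit)[is_factor]

numeric_vars
factor_vars
```

Listing the factor variables up front is useful because these are the predictors that dummyVars() would later binarize.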