
SOA ASA Exam: Predictive Analysis (PA) – 3.2 Linear Models Case Study 2: Feature Selection and Regularization

Learning Objectives

After completing this case study, you should be able to:

Fit a multiple linear regression model with both numeric and categorical (factor) predictors.
Detect and accommodate interactions between predictors, which can be quantitative or qualitative.
Perform explicit binarization of categorical predictors using the dummyVars() function from the caret package and understand why doing so may be beneficial.
Perform stepwise selection using the stepAIC() function from the MASS package and be familiar with the different options allowed by this function.
Generate and interpret diagnostic plots for a linear model.
Implement regularized regression using the glmnet() and cv.glmnet() functions from the glmnet package.

Stage 1: Define the Business Problem

Objectives: Our goal here is to identify and interpret key factors that relate to a higher or lower Balance with the aid of appropriate linear models.

Stage 2: Data Collection

Data Design

Relevance: Read in the data and remove irrelevant variables.

# CHUNK 1
library(ISLR)
data(Credit)
Credit$ID <- NULL

Data Description

The Credit dataset contains n = 400 observations and 11 variables. Numeric predictors are listed first, followed by categorical ones. The target variable is the last variable in the dataset, Balance, an integer-valued variable that ranges from …
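As a rough illustration of how these functions might fit together on the Credit data, here is a minimal sketch; the option choices (direction = "backward", fullRank = TRUE, alpha = 1) and the object names are assumptions rather than the case study's actual chunks.

# Illustrative sketch only, not the case study's exact code
library(ISLR)    # Credit data
library(MASS)    # stepAIC()
library(caret)   # dummyVars()
library(glmnet)  # glmnet(), cv.glmnet()

data(Credit)
Credit$ID <- NULL

# Backward stepwise selection by AIC, starting from the full model
full.lm <- lm(Balance ~ ., data = Credit)
step.lm <- stepAIC(full.lm, direction = "backward")
summary(step.lm)

# Regularized (lasso) regression; glmnet needs a numeric design matrix,
# so the categorical predictors are binarized first with dummyVars()
binarizer <- dummyVars(Balance ~ ., data = Credit, fullRank = TRUE)
X <- as.matrix(predict(binarizer, newdata = Credit))
y <- Credit$Balance

cv.fit <- cv.glmnet(X, y, alpha = 1)   # alpha = 1 gives the lasso
coef(cv.fit, s = "lambda.min")         # coefficients at the CV-optimal lambda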

Read more

SOA ASA Exam: Predictive Analysis (PA) – 3.1 Linear Models Case Study 1: Fitting Linear Models in R

Context

Suppose that we are statistical consultants hired by the company that offers the product. The company is interested in boosting sales of the product, but cannot directly do so (that is determined by market demand). Instead, it has the liberty to control the advertising expenditure in each of the three advertising media: TV, radio, and newspaper. If we can construct a linear model that accurately predicts sales (the target variable) on the basis of the budgets spent on the three advertising media (the predictors), then such a model can provide the basis for a profitable marketing plan that specifies how much should be spent on the three media to maximize sales, a business issue of great interest to the company.

Learning Objectives

After completing this case study, you should be able to:

Fit a multiple linear regression model using the lm() function and extract useful information from a fitted model using the summary() function.
Appreciate why variable significance may change as a result of correlations between variables.
Generate additional features such as interaction and polynomial terms in a linear model.
Partition the data into training and test sets using the createDataPartition() function from the caret package.
Generate predictions on …
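As a quick sketch of these objectives, the code below assumes the advertising data has already been read into a data frame named ad with columns TV, radio, newspaper, and sales; the variable names, seed, and model formula are illustrative assumptions, not the case study's exact code.

# Illustrative sketch only, assuming a data frame ad(TV, radio, newspaper, sales)
library(caret)

set.seed(2023)  # arbitrary seed for reproducibility
# createDataPartition() produces a stratified split on the target variable
idx <- createDataPartition(ad$sales, p = 0.7, list = FALSE)
ad.train <- ad[idx, ]
ad.test  <- ad[-idx, ]

# Multiple linear regression with an interaction and a polynomial term
fit <- lm(sales ~ TV * radio + poly(newspaper, 2), data = ad.train)
summary(fit)   # coefficient estimates, t-tests, R-squared

# Predictions on the test set and a simple RMSE
pred <- predict(fit, newdata = ad.test)
sqrt(mean((ad.test$sales - pred)^2))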

Read more

SOA ASA Exam: Predictive Analysis (PA) – 3. Linear Models

Basic Terminology

Classification of Variables

There are two ways to classify variables in a predictive analytic context: by their role in the study (intended use) or by their nature (characteristics).

By role

The variable that we are interested in predicting is called the target variable (or response variable, dependent variable, output variable). The variables that are used to predict the target variable go by different names, such as predictors, explanatory variables, input variables, or sometimes simply variables if no confusion arises. In an actuarial context, predictors are also known as risk factors or risk drivers.

By nature

Variables can also be classified as numeric variables or categorical variables. Such a classification has important implications for developing an effective predictive model that aligns with the character of the target variable and predictors to produce realistic output.

Numeric (a.k.a. quantitative) variables: Numeric variables take the form of numbers with an associated range. They can be further classified as discrete or continuous variables.

Categorical (a.k.a. qualitative, factor) variables: As their name implies, categorical variables take predefined values, called levels or classes, out of a countable collection of “categories”. When a categorical variable takes only two possible levels, it is called a …
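To make the classification by nature concrete, here is a small sketch with hypothetical data showing how numeric and categorical (factor) variables typically appear in an R data frame; the column names and values are made up for illustration.

# Hypothetical data frame illustrating variable types
policies <- data.frame(
  age    = c(25, 40, 33),                         # numeric (continuous)
  claims = c(0L, 2L, 1L),                         # numeric (discrete)
  region = factor(c("Urban", "Rural", "Urban")),  # categorical (factor)
  smoker = factor(c("Yes", "No", "No"))           # binary categorical
)
str(policies)  # shows which columns R treats as numeric vs. factor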

Read more

SOA ASA Exam: Predictive Analysis (PA) – 2. ggplot

Making ggplots

Basic Features

Load the library:

library(ggplot2)

ggplot Function

ggplot(data = <DATA>, mapping = aes(<AESTHETIC_1> = <VARIABLE_1>,
                                    <AESTHETIC_2> = <VARIABLE_2>,
                                    …)) +
    geom_<TYPE>(< … >) +
    geom_<TYPE>(< … >) +
    <OTHER_FUNCTIONS> +
    …

The ggplot() function initializes the plot, defines the source of data using the data argument (almost always a data frame), and specifies what variables in the data are mapped to visual elements in the plot by the mapping argument. Mappings in a ggplot are specified using the aes() function, with aes standing for “aesthetics”.

The geom functions: Subsequent to the ggplot() function, we put in geometric objects, or geoms for short, which include points, lines, bars, histograms, boxplots, and many other possibilities, by means of one or more geom functions. Placed layer by layer, these geoms determine what kind of plot is to be drawn and modify its visual characteristics, taking the data and aesthetic mappings specified in the ggplot() function as inputs. …
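For concreteness, a minimal sketch that fills in the template with the built-in mtcars data; the dataset and aesthetic choices are illustrative, not part of the original notes.

# Illustrative example of the template above
library(ggplot2)

ggplot(data = mtcars, mapping = aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +                            # first layer: scatterplot points
  geom_smooth(method = "lm", se = FALSE) +  # second layer: fitted lines per group
  labs(x = "Weight", y = "Miles per gallon", color = "Cylinders")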

Read more

SOA ASA Exam: Predictive Analysis (PA) – 1. Basics of R

Data Types

Create an integer by appending “L” to a number:

x <- 1L

Data Structures

Vectors

Create a vector with c(…):

a <- c(1:5)
b <- c(5:1)
c <- c("A", "B", "C")
d <- c(TRUE, FALSE, FALSE, TRUE, TRUE)

print(a)
[1] 1 2 3 4 5
print(b)
[1] 5 4 3 2 1
print(c)
[1] "A" "B" "C"
print(d)
[1] TRUE FALSE FALSE TRUE TRUE

Create a sequence of numbers with seq(from, to, by):

x <- seq(0, 5, 1)
x
[1] 0 1 2 3 4 5

Extract subsets of vectors with []:

# Using positive integers
a[2]
[1] 2
a[c(2, 4)]
[1] 2 4

# Using negative integers
b[-1]
[1] 4 3 2 1
b[-(2:4)]
[1] 5 1

# Using logical vectors
a[d]
[1] 1 4 5

Remark (unequal lengths): For two vectors of unequal length, the shorter vector is recycled by repeating its elements to match the longer vector.

> print(a + 1:3)
[1] 2 4 6 5 7

Factors

Create a factor with factor(…):

# define x as a vector
x <- c("M", "F", "M", "O", "F")
# factorize x and assign to x.factor
x.factor <- factor(x)
x.factor
[1] M F M O F
Levels: F …

Read more

SOA ASA Exam: Predictive Analysis (PA)

Linear Models

Classification of Variables

Intention (by their role in the study):
target/response/dependent/output variable
risk factors/drivers

Characteristics (by their nature):
numeric/quantitative variables
categorical/qualitative/factor variables

The Model Building Process

Stage 1: Define the Business Problem

Objectives: prediction-focused (accurate prediction) vs. interpretation-focused (relationship)

Descriptive Analytics: Focuses on insights from the past and answers the question, “What happened?”
Predictive Analytics: Focuses on the future and addresses, “What might happen next?”
Prescriptive Analytics: Suggests decision options; for example, “What would happen if I do this?” or “What is the best course of action?”

Constraints: Availability of data; implementation issues.

Stage 2: Data Collection

Data Design

Relevance: Source the data from the right population and time frame.

Sampling

Random sampling: Voluntary surveys may be vulnerable to respondent bias.
Stratified sampling: Divide the underlying population into a number of non-overlapping strata (see the sketch at the end of this excerpt).
Oversampling and undersampling: for unbalanced data.
Systematic sampling: Use a set pattern.

Random sampling in R:

set.seed(<n>)
data.full$random <- runif(nrow(data.full))
data.train <- data.full[data.full$random < 0.7, ]
data.test <- data.full[data.full$random >= 0.7, ]
# Proportion of observations in the training set
nrow(data.train) / nrow(data.full)

Granularity: How detailed the information contained by the variable is. The more detail a variable contains, the more granular it is.

Data Quality Issues

Reasonableness: …
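As referenced above, here is a minimal stratified-split sketch using caret's createDataPartition() as an alternative to the runif() split; the data frame data.full and its target column target are placeholders, not names from the original notes.

# Illustrative stratified 70/30 split (placeholder names)
library(caret)

set.seed(1)  # arbitrary seed for reproducibility
# createDataPartition() stratifies the split on the supplied target variable
idx <- createDataPartition(data.full$target, p = 0.7, list = FALSE)
data.train <- data.full[idx, ]
data.test  <- data.full[-idx, ]

# Check the proportion of observations in the training set
nrow(data.train) / nrow(data.full)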

Read more