
SOA ASA Exam: Predictive Analysis (PA) – Summary

Summary of Questions

Q: What are the modeling improvements?
A: Adding an interaction term, converting a variable to a factor, or using a tree-based model to capture a non-linear relationship.

Q: Describe / Explain … (how X is used).
A: Give the definition of X and explain one way X is used.

Q: Discuss …
A: Give the definition / effects and evaluate the influence on the subject. Examples:
- Discuss the plausibility of the outliers: What kind of outliers would be implausible? (range) Why are some extreme values plausible? (causes)
- Discuss the outliers with respect to the goal of reducing response time below 6 minutes for 90% of calls: state the goal, identify which outliers do not fit it (exceeded 6 minutes, …), and evaluate whether the outliers contribute to achieving the goal.
- Discuss the outliers with respect to fitting a GLM that predicts response time: describe the effects of the outliers on model fitting.

Q: Propose questions for … that will help clarify the business objective.
A:
- Insights: initial hypotheses or intuition that might explain variation in y.
- Data preparation: consulting specialists before performing the analysis.
- Data collection: whether there were unexpected changes.

Q: What are the reasons why bias may not always decrease with the additional degrees of freedom from adding a new predictor?
A: The new predictor has no predictive power, or it has substantial collinearity with existing predictors. …
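The modeling improvements listed in the first answer can be sketched in R; the formulas and variable names below are placeholders for illustration, not drawn from a specific case study.

```r
# Illustrative R sketches of the three modeling improvements
# (y, x1, x2, region, and dat are placeholder names)
fit1 <- glm(y ~ x1 * x2, data = dat)       # add an interaction term (x1:x2)
dat$region <- as.factor(dat$region)        # factorize a variable

library(rpart)
fit2 <- rpart(y ~ ., data = dat)           # tree-based model captures
                                           # non-linear relationships via splits
```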


SOA ASA Exam: Predictive Analysis (PA) – 6. Principal Components and Cluster Analyses

Principal Components and Cluster Analyses

LEARNING OBJECTIVES

The candidate will be able to apply cluster and principal components analysis to enhance supervised learning, and specifically to:

- Understand and apply K-means clustering.
- Understand and apply hierarchical clustering.
- Understand and apply principal component analysis.

Chapter Overview

As you can tell from its name, Exam PA is mostly concerned with developing models to “predict” a target variable of interest. In the final chapter of Part II of this study manual, we switch our attention from supervised learning methods to unsupervised learning methods, which ignore the target variable (if present) and look solely at the predictor variables in the dataset to extract their structural relationships. We will learn two unsupervised learning techniques, principal components analysis (PCA) and cluster analysis, which are advanced data exploration tools with the following merits:

Exploratory data analysis: These tools lend themselves to high-dimensional datasets, which are characterized by an overwhelming number of variables relative to observations. To make sense of these datasets, it is necessary to explore and visualize the relationships not only between pairs of variables, but also among a large group of variables on a holistic basis. For this purpose, traditional bivariate data …
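As a quick preview of the three techniques named in the learning objectives, here is a minimal R sketch using base R's prcomp(), kmeans(), and hclust(); the iris data and all settings are illustrative assumptions, not taken from the manual.

```r
# Illustrative sketch: PCA and clustering on the numeric columns of iris
X <- scale(iris[, 1:4])            # center and scale before PCA/clustering

# Principal components analysis
pca <- prcomp(X)
summary(pca)                       # proportion of variance explained per PC

# K-means clustering with 3 clusters (nstart > 1 for stabler results)
km <- kmeans(X, centers = 3, nstart = 20)
table(km$cluster)                  # cluster sizes

# Hierarchical clustering with complete linkage
hc <- hclust(dist(X), method = "complete")
plot(hc)                           # dendrogram; cut with cutree(hc, k = 3)
```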


SOA ASA Exam: Predictive Analysis (PA) – 5.3. Extended Case Study: Classification Trees

Extended Case Study: Classification Trees

LEARNING OBJECTIVES

The focus of this section is on constructing, evaluating, and interpreting base and ensemble trees. At the completion of this case study, you should be able to:

- Understand how decision trees form tree splits based on categorical predictors.
- Understand how decision trees deal with numeric predictors having a non-linear relationship with the target variable.
- Build base classification trees, control their complexity by pruning, and interpret their output.
- Build ensemble trees using the caret package and tune the model parameters for optimal performance.
- Quantify the prediction accuracy of the (base or ensemble) classification trees constructed.
- Recommend a decision tree taking both prediction accuracy and interpretability into account.

Problem Set-up and Preparatory Steps

Data Description

This case study revolves around the Wage dataset in the ISLR package. This dataset contains the income and demographic information (e.g., age, education level, marital status) collected through an income survey for a group of 3,000 male workers residing in the Mid-Atlantic region of the US. The data dictionary is shown in Table 5.2.

Variable | Description | Values
year | Calendar year that wage information was recorded | Integer from 2003 to 2009
age | Age of worker | Integer from 18 to 80
…
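A minimal sketch of the kind of base classification tree this case study builds on the Wage data. The binary target (wage above 100, in thousands of USD), the chosen predictors, and the control settings are all assumptions for illustration; the manual's own chunks may differ.

```r
# Illustrative sketch: a base classification tree on the Wage data
library(ISLR)         # provides the Wage dataset
library(rpart)
library(rpart.plot)

# A hypothetical binary target: whether wage exceeds 100 (thousand USD)
Wage$high <- factor(Wage$wage > 100)

fit <- rpart(high ~ age + education + maritl,
             data = Wage, method = "class",
             control = rpart.control(cp = 0.01, minbucket = 10))
rpart.plot(fit)

# Prune at the complexity parameter with the lowest cross-validated error
cp.opt <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned <- prune(fit, cp = cp.opt)
```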


SOA ASA Exam: Predictive Analysis (PA) – 5.2. Mini-Case Study: A Toy Decision Tree

Mini-Case Study: A Toy Decision Tree

LEARNING OBJECTIVES

In this section, we construct a toy decision tree on a small-scale dataset taken from a sample question of the Modern Actuarial Statistics II Exam of the Casualty Actuarial Society and displayed in Table 5.1. The small number of observations makes it possible for us to perform calculations by hand and replicate the R output that is inadequately explained in the PA e-learning modules and commonly misunderstood by many users (partly due to the somewhat confusing documentation of the package we will use).

X1 | X2 | Y
1 | 0 | 1.2
2 | 1 | 2.1
3 | 2 | 1.5
4 | 1 | 3.0
2 | 2 | 2.0
1 | 1 | 1.6

# CHUNK 1
X1 <- c(1, 2, 3, 4, 2, 1)
X2 <- c(0, 1, 2, 1, 2, 1)
Y <- c(1.2, 2.1, 1.5, 3.0, 2.0, 1.6)
dat <- data.frame(X1, X2, Y)

After completing this mini-case study, you should be able to:

- Fit a decision tree using the rpart() function from the rpart package.
- Understand how the control parameters of the rpart() function control tree complexity.
- Produce a graphical representation of a fitted decision tree using the rpart.plot() function.
- Interpret the output for a decision tree, …
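Continuing from CHUNK 1, a tree can be fitted and plotted as follows. The control settings shown are illustrative choices, not necessarily the manual's; with only six observations, the rpart() defaults (minsplit = 20, cp = 0.01) would produce a root-only tree, so they are relaxed here.

```r
library(rpart)
library(rpart.plot)

X1 <- c(1, 2, 3, 4, 2, 1)
X2 <- c(0, 1, 2, 1, 2, 1)
Y <- c(1.2, 2.1, 1.5, 3.0, 2.0, 1.6)
dat <- data.frame(X1, X2, Y)

# Relax the defaults so the tiny dataset can actually be split
fit <- rpart(Y ~ X1 + X2, data = dat,
             control = rpart.control(minsplit = 2, minbucket = 1, cp = 0))
fit                  # printed splits can be replicated by hand
rpart.plot(fit)      # graphical representation of the fitted tree
```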


SOA ASA Exam: Predictive Analysis (PA) – 5.1. Conceptual Foundations of Decision Trees

LEARNING OBJECTIVES

The candidate will be able to construct decision trees for both regression and classification, and specifically to:

- Understand the basic motivation behind decision trees.
- Construct regression and classification trees.
- Use bagging and random forests to improve accuracy.
- Use boosting to improve accuracy.
- Select appropriate hyperparameters for decision trees and related techniques.

EXAM NOTE

As pointed out in Subsection 3.1.1, there are only two supervised learning techniques in Exam PA, GLMs and decision trees. To assess your knowledge of the syllabus materials effectively, the SOA usually “marries” decision trees with GLMs to come up with a comprehensive exam project.

Chapter Overview

This chapter enriches our predictive analytics toolbox and introduces the second type of supervised learning technique in Exam PA, namely, decision trees. Just like GLMs, decision trees can be applied to tackle both regression and classification problems with both numeric and categorical predictors, but with a fundamentally different approach. While GLMs provide a prediction equation based on a linear combination of the predictors, decision trees divide the feature space (i.e., the space of all combinations of feature values) into a finite set of non-overlapping and exhaustive regions of relatively homogeneous observations more amenable to analysis and prediction. To predict a given observation, …
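The three ensemble ideas named in the objectives (bagging, random forests, boosting) can be sketched in R as follows. The data frame `dat`, the numeric target `Y`, and all tuning values are assumptions for illustration, not the manual's settings.

```r
# Illustrative sketch: ensemble tree methods on a generic data frame `dat`
# with numeric target Y (names and settings are assumptions)
library(randomForest)
library(gbm)

# Bagging = a random forest that considers all predictors at every split
bag <- randomForest(Y ~ ., data = dat,
                    mtry = ncol(dat) - 1, ntree = 500)

# Random forest: mtry below the number of predictors decorrelates the trees
rf  <- randomForest(Y ~ ., data = dat, ntree = 500)

# Boosting: many shallow trees fitted sequentially, each to the residuals
# of the current ensemble, with a small learning rate (shrinkage)
bst <- gbm(Y ~ ., data = dat, distribution = "gaussian",
           n.trees = 1000, interaction.depth = 2, shrinkage = 0.01)
```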


SOA ASA Exam: Predictive Analysis (PA) – 4.4. Generalized Linear Models Case Study 3

Case Study 3: GLMs for Count and Aggregate Loss Variables

Learning Objectives

- Select appropriate distributions and link functions for count and severity variables.
- Identify appropriate offsets and weights for count and severity variables.
- Implement GLMs for count and severity variables in R.
- Assess the quality of a Poisson GLM using the Pearson goodness-of-fit statistic.
- Combine the GLMs for count and severity variables to make predictions for an aggregate loss variable.

Background

Compiled by the Swedish Committee on the Analysis of Risk Premium in Motor Insurance, the dataset in this case study describes third-party automobile insurance claims for the year 1977 (third-party claims entail payments to someone other than the policyholder and the insurer). This dataset involves grouped observations, with each row of the dataset corresponding to a collection of (homogeneous) policyholders sharing the same set of predictor values rather than a single policyholder. The variables include various automobile and policyholder characteristics, such as the distance driven by a vehicle, geographic area, and recent driver claims experience (see Table 4.6 for the data dictionary). Unlike Section 4.3, where our interest is only the probability of making a claim, the outcomes of interest in this case study are twofold: The …
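A minimal frequency-severity sketch of the workflow the objectives describe. The column names (claims, payment, exposure, and the predictors) are placeholder assumptions, not the actual names in the Swedish dataset.

```r
# Illustrative frequency-severity sketch (column names are assumptions)

# Frequency: Poisson GLM with log link and a log-exposure offset
freq <- glm(claims ~ area + km.driven + bonus,
            family = poisson(link = "log"),
            offset = log(exposure), data = dat)

# Severity: average payment per claim, Gamma GLM with log link,
# weighted by the number of claims in each grouped row
sev <- glm(payment / claims ~ area + km.driven + bonus,
           family = Gamma(link = "log"),
           weights = claims,
           data = subset(dat, claims > 0))

# Pearson goodness-of-fit statistic for the Poisson model
pearson <- sum(residuals(freq, type = "pearson")^2)

# Predicted aggregate loss = predicted frequency * predicted severity
agg <- predict(freq, type = "response") *
       predict(sev, newdata = dat, type = "response")
```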


SOA ASA Exam: Predictive Analysis (PA) – 4.3. Generalized Linear Models Case Study 2

Case Study 2: GLMs for Binary Target Variables

Learning Objectives

Compared to GLMs for numeric target variables, GLM-based classifiers enjoy some subtly unique features, which will be revealed in the course of this case study. At the completion of this section, you should be able to:

- Combine factor levels to reduce the dimension of the data.
- Select appropriate link functions for binary target variables.
- Implement different kinds of GLMs for binary target variables in R.
- Incorporate an offset into a logistic regression model.
- Interpret the results of a fitted logistic regression model.

Background

In this case study, we will examine the dataCar dataset in the insuranceData package. This dataset is based on a total of n = 67,856 one-year vehicle insurance policies taken out in 2004 or 2005. The variables in this dataset pertain to different characteristics of the policyholders and their vehicles. The target variable is clm, a binary variable equal to 1 if a claim occurred over the policy period and 0 otherwise.

Stage 1: Define the Business Problem

Objective

Our objective here is to construct appropriate GLMs to identify key factors associated with claim occurrence. Such factors will provide insurance companies offering vehicle insurance …
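A minimal sketch of a logistic regression for clm on the dataCar data. The subset of predictors shown is an illustrative choice, not the case study's final model.

```r
# Illustrative sketch: logistic regression for claim occurrence
library(insuranceData)
data(dataCar)

logit <- glm(clm ~ veh_value + veh_age + agecat,
             family = binomial(link = "logit"),
             data = dataCar)
summary(logit)

# Coefficients on the logit scale exponentiate to odds ratios:
# exp(b) > 1 means the factor is associated with higher claim odds
exp(coef(logit))
```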


SOA ASA Exam: Predictive Analysis (PA) – 4.2. Generalized Linear Models Case Study 1

Case Study 1: GLMs for Continuous Target Variables

Learning Objectives

- Select appropriate distributions and link functions for a positive, continuous target variable with a right skew.
- Fit a GLM using the glm() function in R and specify the options of this function appropriately.
- Make predictions for GLMs using the predict() function and compare the predictive performance of different GLMs.
- Generate and interpret diagnostic plots for a GLM.

Preparatory Steps

Background

The persinj dataset contains information on n = 22,036 settled personal injury insurance claims, which were reported during the period from July 1989 to the end of 1999. Claims settled with zero payment were not included.

Objective

Our objective here is to build GLMs to predict the size of settled claims using related risk factors in the dataset, select the most promising GLM, and quantify its predictive accuracy. For claim size variables, which are continuous, positive-valued, and often highly skewed, common modeling options include:

- Apply a log transformation to claim size and fit a normal linear model to the log-transformed claim size.
- Build a GLM with the normal distribution and a link function such as the log link to ensure that the target mean is positive.
- Build a …
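The modeling options listed above can be sketched side by side in R. The target and predictor names (amt, inj, legrep) are assumptions for illustration and may not match the dataset's actual columns.

```r
# Illustrative sketches for a positive, right-skewed target (names assumed)

# Option 1: normal linear model on the log-transformed claim size
ols.log <- lm(log(amt) ~ inj + legrep, data = persinj)

# Option 2: normal GLM with log link (target untransformed; log link
# keeps the fitted mean positive)
glm.norm <- glm(amt ~ inj + legrep,
                family = gaussian(link = "log"), data = persinj)

# Option 3: gamma GLM with log link (variance grows with the mean,
# matching the right skew of claim sizes)
glm.gamma <- glm(amt ~ inj + legrep,
                 family = Gamma(link = "log"), data = persinj)

# Predictions on the response scale, for comparing test RMSE across models
pred <- predict(glm.gamma, newdata = persinj, type = "response")
```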


SOA ASA Exam: Predictive Analysis (PA) – 4.1. Generalized Linear Models

EXAM PA LEARNING OBJECTIVES

Learning Objectives

The Candidate will be able to describe and select a Generalized Linear Model (GLM) for a given data set and regression or classification problem.

Learning Outcomes

The Candidate will be able to:

- Understand the specifications of the GLM and the model assumptions.
- Create new features appropriate for GLMs.
- Interpret model coefficients, interaction terms, offsets, and weights.
- Select and validate a GLM appropriately.
- Explain the concepts of bias, variance, model complexity, and the bias-variance trade-off.

In Exam PA, there are often tasks that require you to describe, in high-level terms, what a GLM is and the pros and cons of a GLM relative to other predictive models, so the conceptual aspects of GLMs will be useful not only for understanding the practical implementations of GLMs in the next three sections, but also for tackling exam items. Because all of the feature generation techniques (e.g., binarization of categorical predictors, introduction of polynomial and interaction terms) and feature selection techniques (e.g., stepwise selection algorithms and regularization) for linear models generalize to GLMs in essentially the same way, and everything we learned about the bias-variance trade-off for linear models also applies here, our focus in this section is …


SOA ASA Exam: Predictive Analysis (PA) Case Studies

Regularization

What is regularization?

- Reduces model complexity: shrinks the magnitude of the coefficient estimates via a penalty term and serves to prevent overfitting.
- An alternative to stepwise selection for identifying useful features.

How does regularization work?

Variables with limited predictive power receive coefficient estimates that are small or, under the lasso, exactly zero, in which case they are effectively removed from the model.

α

If the objective is to identify key factors affecting the target variable, using α = 0 (ridge regression), which does not eliminate any variables, is not appropriate.

Interactions

Interpretation: There is a significant interaction between [A] and [B], meaning that the effect of [A] on [Y] varies … with and without [B].

3.2 Linear Models Case Study 2: Feature Selection and Regularization

Learning Objectives

After completing this case study, you should be able to:

- Fit a multiple linear regression model with both numeric and categorical (factor) predictors.
- Detect and accommodate interactions between predictors, which can be quantitative or qualitative.
- Perform explicit binarization of categorical predictors and understand why doing so may be beneficial. (library(caret), dummyVars())
- Perform stepwise selection and be familiar with the different options allowed by this function. (library(MASS), …)
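The regularization points above can be sketched with the glmnet package. The data frame `dat` and target `Y` are placeholder assumptions; the alpha values shown match the ridge/lasso distinction discussed above.

```r
# Illustrative sketch: regularized regression with glmnet (names assumed)
library(glmnet)

# glmnet requires a numeric design matrix, so factors are binarized
# explicitly; drop the intercept column from model.matrix()
X <- model.matrix(Y ~ ., data = dat)[, -1]
y <- dat$Y

# alpha = 1: lasso, which can shrink coefficients exactly to zero;
# alpha = 0: ridge, which shrinks but never eliminates variables
cv <- cv.glmnet(X, y, alpha = 1)

# Coefficients at the lambda minimizing cross-validated error;
# zero entries correspond to variables dropped from the model
coef(cv, s = "lambda.min")
```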
