
SOA ASA Exam: Predictive Analysis (PA) – Summary

Summary of Questions

Q: What are the modeling improvements?
A: Adding an interaction term, factorizing a variable, or using a tree-based model to capture a non-linear relationship.

Q: Describe / Explain … (how X is used).
A: Give the definition of X, then explain one way X is used.

Q: Discuss …
A: Give the definition (or effects) and evaluate the influence on the subject. Examples:
- Discuss the plausibility of the outliers: What kind of outliers would be implausible (range)? Evaluate why some extreme values are plausible (causes).
- Discuss the outliers with respect to the goal of reducing response time below 6 minutes for 90% of calls: state the definition of the goal; identify which outliers do not fit the goal (those exceeding 6 minutes); evaluate whether the outliers contribute to achieving the goal.
- Discuss the outliers with respect to fitting a GLM that predicts response time: describe the effects of the outliers on model fitting.

Q: Propose questions for … that will help clarify the business objective.
A: Insights: an initial hypothesis or intuition that might explain variation in y. Data preparation: consulting specialists before performing the analysis. Data collection: asking about unexpected changes.

Q: What are the reasons why bias may not always decrease with the additional degrees of freedom from adding a new predictor?
A: The new predictor has no predictive power, or there is substantial collinearity …
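The first improvement in the summary (adding an interaction term and factorizing a variable) can be sketched in R. The data below are simulated purely for illustration; the variable names hours and region are hypothetical, not from the summary above.

```r
# Simulated data for illustration only (hours and region are hypothetical).
set.seed(42)
n <- 200
hours  <- runif(n, 0, 10)
region <- sample(0:2, n, replace = TRUE)          # coded numerically at first
y      <- 1 + 0.5 * hours + 0.8 * (region == 2) +
          0.3 * hours * (region == 2) + rnorm(n)
dat <- data.frame(y, hours, region)

# Improvement 1: treat a numerically coded variable as a factor
dat$region <- factor(dat$region)

# Improvement 2: add an interaction term between hours and region
fit_main <- glm(y ~ hours + region, data = dat)
fit_int  <- glm(y ~ hours * region, data = dat)   # main effects + interaction

# The richer model cannot fit worse on the training data
c(deviance(fit_main), deviance(fit_int))
```

Comparing the deviances (or AIC) of fit_main and fit_int is one way to judge whether the interaction term is worth keeping.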

Read more

SOA ASA Exam: Predictive Analysis (PA) – 6. Principal Components and Cluster Analyses

Principal Components and Cluster Analyses

LEARNING OBJECTIVES

The candidate will be able to apply cluster and principal components analysis to enhance supervised learning. The Candidate will be able to:

- Understand and apply K-means clustering.
- Understand and apply hierarchical clustering.
- Understand and apply principal component analysis.

Chapter Overview

As you can tell from its name, Exam PA is mostly concerned with developing models to “predict” a target variable of interest. In the final chapter of Part II of this study manual, we switch our attention from supervised learning methods to unsupervised learning methods, which ignore the target variable (if present) and look solely at the predictor variables in the dataset to extract their structural relationships. We will learn two unsupervised learning techniques, principal components analysis (PCA) and cluster analysis, which are advanced data exploration tools with the following merits:

- Exploratory data analysis: These tools lend themselves to high-dimensional datasets, which are characterized by an overwhelming number of variables compared to observations. To make sense of these datasets, it is necessary to explore and visualize the relationships not only between pairs of variables, but also among a large group of variables on a holistic basis. For this purpose, traditional bivariate data …
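All three techniques named in the learning objectives can be run with base R alone. The sketch below uses the built-in iris data as a stand-in dataset (it is not part of this chapter), ignoring the Species column just as an unsupervised method ignores the target.

```r
# Minimal sketch of the three unsupervised tools on the built-in iris data,
# ignoring the Species column (the "target").
X <- scale(iris[, 1:4])            # center and scale the four numeric predictors

# Principal components analysis
pca <- prcomp(X)
summary(pca)                       # proportion of variance explained by each PC

# K-means clustering with 3 clusters, 20 random starts
set.seed(1)
km <- kmeans(X, centers = 3, nstart = 20)
table(km$cluster)

# Hierarchical clustering with complete linkage, cut into 3 clusters
hc <- hclust(dist(X), method = "complete")
table(cutree(hc, k = 3))
```

Scaling the predictors first matters for all three methods, since both the principal components and the distance-based clusters would otherwise be dominated by the variables with the largest raw variances.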

Read more

SOA ASA Exam: Predictive Analysis (PA) – 5.3. Extended Case Study: Classification Trees

Extended Case Study: Classification Trees

LEARNING OBJECTIVES

The focus of this section is on constructing, evaluating, and interpreting base and ensemble trees. At the completion of this case study, you should be able to:

- Understand how decision trees form tree splits based on categorical predictors.
- Understand how decision trees deal with numeric predictors having a non-linear relationship with the target variable.
- Build base classification trees, control their complexity by pruning, and interpret their output.
- Build ensemble trees using the caret package and tune the model parameters for optimal performance.
- Quantify the prediction accuracy of (base or ensemble) classification trees constructed.
- Recommend a decision tree taking both prediction accuracy and interpretability into account.

Problem Set-up and Preparatory Steps

Data Description

This case study revolves around the Wage dataset in the ISLR package. This dataset contains the income and demographic information (e.g., age, education level, marital status) collected through an income survey for a group of 3,000 male workers residing in the Mid-Atlantic region of the US. The data dictionary is shown in Table 5.2.

Variable  Description                                        Values
year      Calendar year that wage information was recorded   Integer from 2003 to 2009
age       Age of worker                                      Integer from 18 to 80
…
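The Wage data itself requires the ISLR package; as a self-contained stand-in, the sketch below builds and prunes a base classification tree on the built-in iris data using rpart. The cp and minsplit values are illustrative assumptions, not the manual's choices.

```r
# Stand-in example (not the Wage data): grow a deliberately large
# classification tree, then prune it back by complexity parameter.
library(rpart)

set.seed(1)
fit <- rpart(Species ~ ., data = iris, method = "class",
             control = rpart.control(cp = 0, minsplit = 5))

# Pick the cp value with the lowest cross-validated error and prune
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)

# Training accuracy of the pruned tree
pred <- predict(pruned, iris, type = "class")
mean(pred == iris$Species)
```

Growing a large tree first and pruning by cross-validated error is the standard two-step recipe for controlling complexity; the same idea carries over to the caret-based tuning described in the objectives.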

Read more

SOA ASA Exam: Predictive Analysis (PA) – 5.2. Mini-Case Study: A Toy Decision Tree

Mini-Case Study: A Toy Decision Tree

LEARNING OBJECTIVES

In this section, we construct a toy decision tree on a small-scale dataset taken from a sample question of the Modern Actuarial Statistics II Exam of the Casualty Actuarial Society and displayed in Table 5.1. The small number of observations makes it possible for us to perform calculations by hand and replicate the R output that is inadequately explained in the PA e-learning modules and commonly misunderstood by many users (partly due to the somewhat confusing documentation of the package we will use).

X1  X2  Y
1   0   1.2
2   1   2.1
3   2   1.5
4   1   3.0
2   2   2.0
1   1   1.6

# CHUNK 1
X1 <- c(1, 2, 3, 4, 2, 1)
X2 <- c(0, 1, 2, 1, 2, 1)
Y <- c(1.2, 2.1, 1.5, 3.0, 2.0, 1.6)
dat <- data.frame(X1, X2, Y)

After completing this mini-case study, you should be able to:

- Fit a decision tree using the rpart() function from the rpart package.
- Understand how the control parameters of the rpart() function control tree complexity.
- Produce a graphical representation of a fitted decision tree using the rpart.plot() function.
- Interpret the output for a decision tree, …
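CHUNK 1 only builds the data frame. A minimal sketch of fitting the tree itself follows; the relaxed control settings (minsplit = 2, minbucket = 1, cp = 0) are assumptions chosen so that a six-observation dataset can split at all, since the rpart() defaults (minsplit = 20) would produce no splits here.

```r
library(rpart)

# Rebuild the toy dataset from Table 5.1
X1 <- c(1, 2, 3, 4, 2, 1)
X2 <- c(0, 1, 2, 1, 2, 1)
Y  <- c(1.2, 2.1, 1.5, 3.0, 2.0, 1.6)
dat <- data.frame(X1, X2, Y)

# Relax the control parameters so the tiny tree can grow
fit <- rpart(Y ~ X1 + X2, data = dat,
             control = rpart.control(minsplit = 2, minbucket = 1,
                                     cp = 0, xval = 0))
fit   # printed splits; the root node prediction is mean(Y) = 1.9
```

Printing the fitted object shows each node's split rule, number of observations, deviance, and fitted value, which is what the by-hand calculations in this section replicate.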

Read more

SOA ASA Exam: Predictive Analysis (PA) – 5.1. Conceptual Foundations of Decision Trees

LEARNING OBJECTIVES

The candidate will be able to construct decision trees for both regression and classification:

- Understand the basic motivation behind decision trees.
- Construct regression and classification trees.
- Use bagging and random forests to improve accuracy.
- Use boosting to improve accuracy.
- Select appropriate hyperparameters for decision trees and related techniques.

EXAM NOTE

As pointed out in Subsection 3.1.1, there are only two supervised learning techniques in Exam PA, GLMs and decision trees. To assess your knowledge of the syllabus materials effectively, the SOA usually “marries” decision trees with GLMs to come up with a comprehensive exam project.

Chapter Overview

This chapter enriches our predictive analytics toolbox and introduces the second type of supervised learning technique in Exam PA, namely decision trees. Just like GLMs, decision trees can be applied to tackle both regression and classification problems with both numeric and categorical predictors, but with a fundamentally different approach. While GLMs provide a prediction equation based on a linear combination of the predictors, decision trees divide the feature space (i.e., the space of all combinations of feature values) into a finite set of non-overlapping and exhaustive regions of relatively homogeneous observations that are more amenable to analysis and prediction. To predict a given observation, …

Read more

SOA ASA Exam: Predictive Analysis (PA) – 4.4. Generalized Linear Models Case Study 3

Case Study 3: GLMs for Count and Aggregate Loss Variables

Learning Objectives

- Select appropriate distributions and link functions for count and severity variables.
- Identify appropriate offsets and weights for count and severity variables.
- Implement GLMs for count and severity variables in R.
- Assess the quality of a Poisson GLM using the Pearson goodness-of-fit statistic.
- Combine the GLMs for count and severity variables to make predictions for an aggregate loss variable.

Background

Compiled by the Swedish Committee on the Analysis of Risk Premium in Motor Insurance, the dataset in this case study describes third-party automobile insurance claims for the year 1977 (third-party claims entail payments to someone other than the policyholder and the insurer). This dataset involves grouped observations, with each row of the dataset corresponding to a collection of (homogeneous) policyholders sharing the same set of predictor values rather than a single policyholder. The variables include various automobile and policyholder characteristics such as the distance driven by a vehicle, geographic area, and recent driver claims experience (see Table 4.6 for the data dictionary). Unlike Section 4.3, where our interest is only the probability of making a claim, the outcomes of interest in this case study are twofold: The …
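The core mechanics of this case study (a log-link Poisson GLM for grouped counts, with log exposure as an offset, checked via the Pearson statistic) can be sketched in base R. The data below are simulated, not the Swedish dataset, and the variable names exposure and area are hypothetical.

```r
# Simulated grouped data (hypothetical stand-in for the Swedish dataset):
# each row is a group of policyholders with a total exposure, and the
# group's claim count is modeled with log(exposure) as an offset.
set.seed(7)
n <- 100
exposure <- runif(n, 50, 500)                      # policy-years per group
area     <- factor(sample(c("urban", "rural"), n, replace = TRUE))
lambda   <- exposure * ifelse(area == "urban", 0.10, 0.05)
claims   <- rpois(n, lambda)
grouped  <- data.frame(claims, exposure, area)

fit <- glm(claims ~ area, family = poisson(link = "log"),
           offset = log(exposure), data = grouped)

# Pearson goodness-of-fit statistic; compare to a chi-square with
# residual degrees of freedom if the counts are not too sparse
pearson <- sum(residuals(fit, type = "pearson")^2)
c(pearson = pearson, df = fit$df.residual)
```

The offset fixes the coefficient of log(exposure) at 1, so the fitted model estimates claim frequency per unit of exposure rather than raw counts.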

Read more

SOA ASA Exam: Predictive Analysis (PA) – 4.3. Generalized Linear Models Case Study 2

Case Study 2: GLMs for Binary Target Variables

Learning Objectives

Compared to GLMs for numeric target variables, GLM-based classifiers enjoy some subtly unique features, which will be revealed in the course of this case study. At the completion of this section, you should be able to:

- Combine factor levels to reduce the dimension of the data.
- Select appropriate link functions for binary target variables.
- Implement different kinds of GLMs for binary target variables in R.
- Incorporate an offset into a logistic regression model.
- Interpret the results of a fitted logistic regression model.

Background

In this case study, we will examine the dataCar dataset in the insuranceData package. This dataset is based on a total of n = 67,856 one-year vehicle insurance policies taken out in 2004 or 2005. The variables in this dataset pertain to different characteristics of the policyholders and their vehicles. The target variable is clm, a binary variable equal to 1 if a claim occurred over the policy period and 0 otherwise.

Stage 1: Define the Business Problem

Objective

Our objective here is to construct appropriate GLMs to identify key factors associated with claim occurrence. Such factors will provide insurance companies offering vehicle insurance …
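The basic workflow of this case study (fitting a logit-link GLM to a binary claim indicator and interpreting the coefficients) can be sketched with simulated data; the dataset below is not dataCar, and the rating factor veh_age is hypothetical.

```r
# Simulated stand-in for a binary-target GLM (not the dataCar dataset).
set.seed(123)
n <- 5000
veh_age  <- sample(1:4, n, replace = TRUE)         # hypothetical rating factor
p        <- plogis(-2 + 0.3 * veh_age)             # true claim probability
clm      <- rbinom(n, 1, p)                        # 1 = claim occurred
policies <- data.frame(clm, veh_age)

fit <- glm(clm ~ veh_age, family = binomial(link = "logit"),
           data = policies)

summary(fit)$coefficients
exp(coef(fit))    # odds ratios: multiplicative effect on the odds of a claim
```

With a logit link, exponentiating a coefficient gives the multiplicative change in claim odds per unit increase of the predictor, which is the standard way to read a fitted logistic regression model.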

Read more