SOA ASA Exam: Predictive Analytics (PA)
Linear Models

Classification of Variables
- Intention (by their role in the study)
  - target/response/dependent/output variable
  - risk factors/drivers
- Characteristics (by their nature)
  - numeric/quantitative variables
  - categorical/qualitative/factor variables

The Model Building Process

Stage 1: Define the Business Problem
- Objectives
  - Prediction-focused (accurate prediction) vs. interpretation-focused (relationship)
  - Descriptive Analytics: Focuses on insights from the past and answers the question, “What happened?”
  - Predictive Analytics: Focuses on the future and addresses, “What might happen next?”
  - Prescriptive Analytics: Suggests decision options; for example, “What would happen if I do this?” or “What is the best course of action?”
- Constraints
  - Availability of data
  - Implementation issues

Stage 2: Data Collection
- Data Design
  - Relevance: Source the data from the right population and time frame.
  - Sampling
    - Random Sampling: Voluntary surveys may be vulnerable to respondent bias.
    - Stratified Sampling: Divide the underlying population into a number of non-overlapping strata.
    - Oversampling and undersampling: For unbalanced data.
    - Systematic sampling: Use a set pattern.

    Random Sampling:

        # Replace <n> with an integer seed so that the split is reproducible
        set.seed(<n>)
        # Attach an independent Uniform(0, 1) draw to every row
        data.full$random <- runif(nrow(data.full))
        # Rows with a draw below 0.7 (roughly 70%) form the training set; the rest form the test set
        data.train <- data.full[data.full$random < 0.7, ]
        data.test <- data.full[data.full$random >= 0.7, ]
        # Proportion of the data that ended up in the training set
        nrow(data.train) / nrow(data.full)

  - Granularity: How detailed the information contained in the variable is. The more detail a variable contains, the more granular it is.
- Data Quality Issues
  - Reasonableness: …
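The notes give code for random sampling only, so here is a minimal sketch of how the stratified sampling listed under Stage 2 might look in base R. Everything in it is an assumption made for illustration: the toy data frame, the column names `age` and `claim`, the seed 42, the 70/30 split, and the choice to stratify on the target variable. It is one possible approach, not the method prescribed by the exam material.

```r
# Hypothetical toy data (not from the notes): 'claim' is an unbalanced binary target.
set.seed(42)  # arbitrary seed, used only for reproducibility
data.full <- data.frame(
  age   = round(runif(1000, 18, 80)),
  claim = factor(rep(c("No", "Yes"), times = c(900, 100)))
)

# Split the row indices into non-overlapping strata, one per level of 'claim'.
rows.by.stratum <- split(seq_len(nrow(data.full)), data.full$claim)

# Within each stratum, draw 70% of the rows at random for the training set.
train.idx <- unlist(lapply(rows.by.stratum, function(idx) {
  idx[sample.int(length(idx), size = round(0.7 * length(idx)))]
}))

data.train <- data.full[train.idx, ]
data.test  <- data.full[-train.idx, ]

# Compare the class proportions in the full, training, and test data.
prop.table(table(data.full$claim))
prop.table(table(data.train$claim))
prop.table(table(data.test$claim))
```

Because the draw is made separately inside each stratum, the rare "Yes" class keeps the same share in the training and test sets as in the full data. A purely random split only achieves this approximately.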