Claim Frequency Modeling in Insurance Pricing using GLM, Deep Learning, and Gradient Boosting
Abstract
What added value can machine learning methods offer for insurance pricing? To answer this question, we model claim frequencies using a large French auto liability insurance dataset and then compare the forecast results. In addition to the methods used in the first version of this case study—generalized linear models (GLM), deep neural networks, and decision tree-based model ensembles (eXtreme Gradient Boosting, "XGBoost")—we have included regularized generalized linear models (LASSO and Ridge), generalized additive models (GAM), and two other modern representatives from the class of decision tree-based model ensembles ("LightGBM" and "CatBoost").
We also incorporate the integration of classical models into neural networks as shown by Schelldorfer and Wüthrich (2019), along with a preceding dimensionality reduction. Additionally, we explore issues related to tariff structure and model stability, perform cross-validation, and address the interpretability of complex decision tree-based methods using SHAP.
The findings show that both deep neural networks and decision tree-based model ensembles can at least improve upon the classical models. Among the classical models, the generalized additive model proves superior but does not reach the predictive power of the decision tree-based model ensembles.
Moreover, the decision tree-based model ensembles "XGBoost" and "LightGBM" prove to be clearly superior predictive models on the examined dataset, even when the tariff structure is taken into account.
The analysis is publicly available as an R-Notebook. This report gives a brief overview of the key aspects and results; detailed insights, with numerous graphics, are provided in the analysis report (Jupyter Notebook), which is open to all interested parties for commenting, copying, modification, and further exploration:
https://www.kaggle.com/floser/glm-neural-nets-and-xgboost-for-insurance-pricing
A new addition is a Python notebook from which the optimized hyperparameters originate:
https://www.kaggle.com/code/floser/use-case-claim-frequency-modeling-python
The DAV (German Actuarial Association) is not responsible for the code and data linked with Kaggle repositories. These reflect the individual opinions of each Kaggle user.
0. Motivation and Sources
This study aims to demonstrate how selected machine learning (ML) methods can lead to more accurate predictions of the number of claims. It is partly based on R code by Mario Wüthrich (for GLM2, GLM4, and Deep Learning) and has been extended to include data visualization, generalized additive models, various decision tree-based model ensembles (XGBoost, LightGBM, and CatBoost), cross-validation, tariff structure, and interpretability using SHAP.
1. Dataset
The study uses a publicly available French auto liability insurance dataset with 678,031 policies at the individual contract level. The dataset contains nine descriptive features, including vehicle characteristics (four), regional characteristics (three), driver age, and bonus-malus classification, in addition to the number of claims and the exposure (insured duration). The dataset is enriched with the corresponding claim amounts and cleansed of claims without corresponding amounts, reducing it to 678,013 policies; an initially very noticeable, implausible claim segment is thereby eliminated. For the typically much more complex internal company datasets, machine learning methods have greater potential for improvement than can be shown with this "thin" dataset.
2. Models and Methodology
a) Generalized Linear Models
As representatives of classical actuarial claim frequency models, we use a simple generalized linear model and models extended by polynomial terms and interactions. Insights from comparisons with neural networks and interaction analyses from decision tree-based models are incorporated into the advanced GLMs.
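A claim frequency GLM of this kind can be sketched as follows. This is a minimal illustration on synthetic data, not the case-study dataset: the features, coefficients, and sample size are invented, and scikit-learn's `PoissonRegressor` stands in for the R `glm` fits used in the notebooks. Modeling the frequency (claims per unit of exposure) with exposure as a sample weight is one common way to handle unequal policy durations.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)
n = 5_000
# Hypothetical features, purely for illustration (not the real dataset)
driver_age = rng.uniform(18, 90, n)
bonus_malus = rng.uniform(50, 150, n)
exposure = rng.uniform(0.1, 1.0, n)               # policy duration in years
lam = exposure * np.exp(-4.0 + 0.01 * bonus_malus)
claims = rng.poisson(lam)                         # observed claim counts

X = np.column_stack([driver_age, bonus_malus])
# Poisson GLM with log link: model the frequency claims/exposure,
# weighting each policy by its exposure
glm = PoissonRegressor(alpha=0.0)
glm.fit(X, claims / exposure, sample_weight=exposure)
freq_hat = glm.predict(X)                         # predicted claims per year of exposure
```

Because of the log link, predicted frequencies are always positive, and the fitted coefficient on the bonus-malus feature recovers the positive effect built into the simulation.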
b) Regularized GLM (R-Notebook)
To address the risk of overfitting in GLMs, we apply two regularization approaches that add a "penalty term" for excessive model complexity to the loss function. L1 regularization ("LASSO") can set some coefficients exactly to zero, acting as a form of feature selection, while L2 regularization ("Ridge regression") shrinks the corresponding coefficients towards, but not exactly to, zero.
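The qualitative difference between the two penalties can be demonstrated on synthetic data. The sketch below uses Gaussian `Lasso` and `Ridge` from scikit-learn for simplicity (the notebooks apply the penalties within Poisson GLMs); the signal structure and penalty strengths are assumptions chosen to make the effect visible.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
n, p = 500, 10
X = rng.normal(size=(n, p))
# Only the first three features carry signal; the rest are pure noise
beta = np.array([2.0, -1.5, 1.0] + [0.0] * (p - 3))
y = X @ beta + rng.normal(scale=0.5, size=n)

lasso = Lasso(alpha=0.2).fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

n_zero_lasso = int(np.sum(lasso.coef_ == 0))  # L1 sets noise coefficients exactly to zero
n_zero_ridge = int(np.sum(ridge.coef_ == 0))  # L2 only shrinks them towards zero
```

The LASSO fit discards the noise features entirely (implicit feature selection), while the Ridge fit keeps all ten coefficients small but nonzero.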
c) Generalized Additive Models "GAM" (R-Notebook)
Generalized additive models (GAMs) extend GLMs to account for specific nonlinear relationships while retaining additivity. Consequently, GAMs are more flexible than GLMs.
d) Deep Neural Networks
Networks with two hidden layers are already considered deep. We focus on the networks of Schelldorfer and Wüthrich (2019), characterized by a preceding dimensionality reduction ("embeddings") for the features region and car brand (model "NNemb"). Classical actuarial models such as a GLM can be nested into a "Combined Actuarial Neural Net" (CANN), as in the model "NNGLM", which can reduce computation time and improve the classical model via residual analysis. These networks are implemented with three hidden layers of 10 to 20 neurons each and have a comparatively low number of parameters for networks of similar predictive power.
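The core of the CANN construction is a skip connection: the GLM prediction enters the network as a fixed offset on the log scale, and the network only learns a multiplicative correction. The following is a minimal NumPy sketch of that forward pass with random, untrained weights, purely to illustrate the structure; it is not the Keras implementation from the notebooks.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, h = 8, 5, 10
X = rng.normal(size=(n, p))
glm_pred = np.exp(X @ rng.normal(scale=0.1, size=p))  # stand-in GLM frequencies

# One hidden layer with tanh activation; weights untrained, illustration only
W1 = rng.normal(scale=0.1, size=(p, h))
b1 = np.zeros(h)
w2 = rng.normal(scale=0.1, size=h)

nn_correction = np.tanh(X @ W1 + b1) @ w2
# CANN: the GLM enters as an offset on the log scale (skip connection);
# if the output weights w2 are zero, the CANN reproduces the GLM exactly
cann_pred = np.exp(np.log(glm_pred) + nn_correction)
```

Initializing the correction near zero means training starts from the GLM's predictions and only has to learn what the GLM misses, which is why the nested model can converge faster than a network trained from scratch.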
e) Decision Tree-based Model Ensembles
Currently, decision tree-based model ensembles are the most powerful and relatively easy-to-use tools for machine learning on tabular data. This was demonstrated in the previous version of this case study using "XGBoost" models. Now, we also include the alternatives "LightGBM" and "CatBoost."
First, the full potential of an unrestricted boosting model is demonstrated; then a monotonically increasing constraint on the "BonusMalus" feature is imposed as an example of incorporating tariff-structure considerations.
The interpretability of these models is illustrated using SHAP, a method based on game theory, both globally and locally.
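The additivity property that makes SHAP attractive can be verified by hand in the one case where exact SHAP values have a closed form: a linear model, where the SHAP value of feature j at a point x is w_j (x_j − E[x_j]). This small NumPy check (with invented weights and data) demonstrates that the per-feature contributions plus the average prediction reconstruct each individual prediction exactly, which is the same decomposition SHAP provides locally for the boosting models.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 2))
w = np.array([1.5, -0.7])       # illustrative linear model weights
b = 0.3
f = X @ w + b                   # model predictions

# Exact SHAP values for a linear model: w_j * (x_j - mean(x_j));
# contributions sum to the prediction minus the average prediction
base = f.mean()
shap_vals = w * (X - X.mean(axis=0))
recon = base + shap_vals.sum(axis=1)   # equals f exactly
```

For tree ensembles, the `shap` package computes the analogous game-theoretic decomposition efficiently (TreeSHAP), both per policy ("local") and aggregated over the portfolio ("global").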
f) Null Model INT (Intercept only)
To assess the different levels of accuracy of the aforementioned models, a "null model" without any differentiation (forecast based on average claim frequency) is created as a benchmark.
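The null model reduces to a single number, the portfolio-wide claim frequency, scaled by each policy's exposure. A tiny sketch with invented numbers:

```python
import numpy as np

claims = np.array([0, 1, 0, 0, 2, 0])            # claim counts per policy
exposure = np.array([0.5, 1.0, 0.8, 1.0, 1.0, 0.2])  # years of exposure

# Null model: one global frequency, total claims per unit of total exposure
freq_null = claims.sum() / exposure.sum()
pred_null = freq_null * exposure                  # expected claims per policy
```

By construction the predicted total equals the observed total number of claims; any differentiated model must beat this benchmark to justify its complexity.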
All twelve models are fitted using 80% of the data, and the model performance is evaluated on the remaining 20%. The goodness-of-fit is measured using weighted deviance assuming a Poisson distribution ("Poisson Deviance").
As the chosen division into training and test samples can significantly influence the absolute levels of observed and predicted claim frequencies, the analysis is performed using 5-fold cross-validation.
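The evaluation scheme, out-of-sample Poisson deviance averaged over five folds, can be sketched with scikit-learn; the model and the simulated data below are placeholders for the twelve fitted models and the real dataset.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor
from sklearn.metrics import mean_poisson_deviance
from sklearn.model_selection import KFold

rng = np.random.default_rng(5)
n = 4_000
X = rng.normal(size=(n, 3))
claims = rng.poisson(np.exp(-2.0 + 0.3 * X[:, 0]))

scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = PoissonRegressor().fit(X[train_idx], claims[train_idx])
    pred = model.predict(X[test_idx])
    # Goodness-of-fit on the held-out fold: Poisson deviance (lower is better)
    scores.append(mean_poisson_deviance(claims[test_idx], pred))

cv_deviance = float(np.mean(scores))   # average out-of-sample Poisson deviance
```

Averaging over folds removes the dependence on one particular train/test split, so model rankings based on `cv_deviance` are more stable than those from a single 80/20 division.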
3. Results
The model comparisons indicate that the decision tree-based model ensembles achieve the highest predictive accuracy (lowest Poisson deviance). The variability observed across folds is largely driven by random differences between the samples and diminishes in relative terms.
4. Conclusion
In summary, both notebooks reveal that gradient boosting models emerge as the superior predictive models in the examined dataset, even when considering the tariff system. XGBoost continues to perform well, but LightGBM is slightly better and significantly faster. To ensure interpretability, SHAP, based on game theory, was used to explain the decisions of LightGBM both globally and locally.
While standard GLMs perform the worst (as do regularized GLMs), generalized additive models, which can be seen as extensions of GLMs, perform remarkably well in the R-Notebook. Neural networks with two-dimensional embeddings, which are easy to visualize, show similar predictive quality to GAM but are considerably more complex to implement.
References
Faraway, J. J. (2016), "Extending the Linear Model with R": https://julianfaraway.github.io/faraway/ELM/
Hastie, T., Tibshirani, R. (1984), "Generalized Additive Models", SLAC PUB-3531: https://www.slac.stanford.edu/pubs/slacpubs/3500/slac-pub-3531.pdf
James, G., et al. (2021–2023), "An Introduction to Statistical Learning", 2nd ed. Available in two versions with R or Python examples, see https://www.statlearning.com/
Mayer, M., Meier, D., Wüthrich, M. V. (2023), "SHAP for Actuaries: Explain any Model": https://github.com/actuarial-data-science/Tutorials/tree/master/14%20-%20SHAP
Schelldorfer, J., Wüthrich, M. V. (2019), "Nesting Classical Actuarial Models into Neural Networks", SSRN Preprint 3320525: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3320525
Wüthrich, M. V., Merz, M. (2023), "Statistical Foundations of Actuarial Learning and its Applications", open-access book: https://link.springer.com/book/10.1007/978-3-031-12409-9