Applications of Gaussian-Inverse Wishart Process Regression Models in Claims Reserving

Abstract

Gaussian processes are stochastic processes based on the normal distribution (i.e., collections of normal random variables indexed by a mathematical set). In probability theory and statistics, these processes are well-known, well-behaved objects that have been extensively studied and applied. Gaussian process regression (GPR), by contrast, is a less widely known procedure built on Gaussian processes that can be used in machine learning for both regression and classification problems.

GPR can be defined as a supervised, nonparametric machine learning technique rooted in Bayesian statistics. It is a relatively novel technique in machine learning and statistics, having been used mainly in geostatistics applications, where it is known as kriging. Its use in the actuarial sciences is nearly unknown.

We propose a novel procedure, based on the inverse Wishart distribution, that has not previously been explored in actuarial modeling. This approach allows us to explore the full correlation between claim amounts, both observed and expected, which is crucial information in many reserving and capital requirement analyses.

Additionally, our goal is to give actuarial practitioners an accessible explanation of how GPR works, together with a worked example and comparisons with traditional methodologies. In a world where the relevance of advanced analytics keeps growing, GPR models can be another powerful instrument in the actuary's toolkit.

A main feature of GPR is its ability to fit functions to a set of observations and to produce predictions with uncertainty intervals around them.
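The mechanics behind that feature can be sketched in a few lines. This is a minimal illustration, not the authors' model: it assumes a plain zero-mean Gaussian process with a squared-exponential kernel and fixed, hand-picked hyperparameters, and it leaves out the inverse Wishart component entirely. It is written in Python with NumPy rather than the R code the authors provide, and the helper names (`rbf_kernel`, `gp_predict`) and the synthetic development pattern are hypothetical.

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two 1-D input arrays (assumed kernel choice)."""
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / length_scale**2)

def gp_predict(x_train, y_train, x_test, length_scale=1.0, variance=1.0, noise=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP at x_test.

    `noise` is the observation-noise variance added to the diagonal of the
    training covariance; hyperparameters are fixed here, not estimated.
    """
    K = rbf_kernel(x_train, x_train, length_scale, variance) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, length_scale, variance)
    K_ss = rbf_kernel(x_test, x_test, length_scale, variance)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha                           # posterior mean
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v                           # posterior covariance
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std

# Hypothetical example: synthetic cumulative paid proportions at development ages 1..6.
x_obs = np.arange(1.0, 7.0)
y_obs = 1.0 - np.exp(-0.5 * x_obs)
y_mean = y_obs.mean()                              # center the data for the zero-mean prior

x_new = np.array([3.5, 8.0, 10.0])                 # one interior age, two extrapolated ages
mu, sd = gp_predict(x_obs, y_obs - y_mean, x_new,
                    length_scale=2.0, variance=0.1, noise=1e-4)
mu = mu + y_mean
lower, upper = mu - 1.96 * sd, mu + 1.96 * sd      # approximate 95% intervals

for x, m, lo, hi in zip(x_new, mu, lower, upper):
    print(f"age {x:4.1f}: mean {m:.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")
```

Conditioning on the observations keeps the interval narrow at age 3.5, which lies between observed points, while at age 10 the interval widens back toward the prior variance. In practice the hyperparameters would be estimated, for example by maximizing the marginal likelihood, rather than fixed by hand as above.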
Because GPR produces predictions together with uncertainty intervals, GPR techniques can be ideal candidates for extending traditional stochastic reserving techniques to meet the ever-growing need of practitioners, regulators, and rating agencies to quantify reserve variability.

We believe this work represents a starting point for further research while offering an initial understanding of the topic to anyone interested. Furthermore, we provide all the data and tools needed to replicate the results shown; for this reason, we use only open-source software and publicly available datasets. In particular, loss triangle and premium data are taken from the publicly available NAIC Schedule P dataset. The models are implemented in R, and the project code is available through a GitHub repository.

Volume: 18
Year: 2025
Keywords: Reserving
Publication: Variance
Authors: Marco De Virgilis, Giulio Carnevale