Synthesizing Property & Casualty Ratemaking Datasets using Generative Adversarial Networks

Abstract

Due to confidentiality issues, it can be difficult to access or share interesting datasets for methodological development in actuarial science or other fields where personal data are important. We show how to design three different types of generative adversarial networks (GANs) that can build a synthetic insurance dataset from a confidential original dataset. The goal is to obtain synthetic data that no longer contain sensitive information but still have the same structure as the original dataset and retain its multivariate relationships. To adequately model the specific characteristics of insurance data, we use GAN architectures adapted for multicategorical data: a Wasserstein GAN with gradient penalty (MC-WGAN-GP), a conditional tabular GAN (CTGAN), and a Mixed Numerical and Categorical Differentially Private GAN (MNCDP-GAN). For transparency, the approaches are illustrated using a public dataset, the French motor third-party liability data. We compare the three GANs on several aspects: ability to reproduce the original data structure and predictive models, privacy, and ease of use. We find that the MC-WGAN-GP synthesizes the best data, the CTGAN is the easiest to use, and the MNCDP-GAN guarantees differential privacy.

This work was supported by an Individual Grant from the Casualty Actuarial Society.

Volume: 18
Year: 2025
Keywords: Financial and Statistical Methods
Publications: Variance
Authors: Brian Hartman, Marie-Pier Cote, Olivier Mercier, Josh Meyers, Jared Cummings, Elijah Harmon
Formerly on syllabus: Off