Territory Analysis with Mixed Models and Clustering

Abstract
Motivation. Territory as it is currently implemented is not a causal rating variable. The actual causal forces that drive the geographical loss generating process (LGP) do so in a complicated manner. Both the loss cost gradient (LCG) and information density (largely driven by the geographical density of exposures and by loss frequency) can change rapidly, at different rates and in different directions. This makes the creation of credible, homogeneous territories difficult. Auxiliary information that reflects the causal forces at work on the geographical LGP can provide useful information to the practitioner. Furthermore, since the conditions that drive the geographical LGP tend to be similar among nearby areas, information from proximate geographical units can be helpful. To date, however, procedures for incorporating auxiliary information have relied on the subjective consideration of conditions, and the use of proximate experience as a complement is complicated by the complex patterns the LCG takes on in relation to information density. Spline and graduation methods implicitly incorporate this information, but they tend to be applied ad hoc to different regions. Incorporating a complement of credibility via proximate geographical units is formally discussed in only two papers and remains fairly undeveloped as a method. Another problem is determining the relative value of information obtained via proximity versus information provided by auxiliary variables. Separately, the implementation of territory as a categorical variable has prevented the integration of Territory Analysis with the parameterization of the remainder of the classification plan. In addition to these actuarial problems, territory's lack of causality creates acceptability problems. Lack of causality and increasingly complex territorial definitions have also reduced jurisdictional loss control incentives.
The newly promulgated Proposition 103 regulations in California provide a useful venue for investigating solutions to these problems.

Method. Using the same data that was employed to create the California Private Passenger Automobile Frequency and Severity Bands Manual under Proposition 103, we employ a Mixed Model approach that combines the local zip code indication, an arithmetic model of causal geographical variables, and a proximity complement to determine the ultimate frequency and severity indication for each zip code. We then use constrained cluster analysis to assign these atomic geographical units into objectively determined and optimally configured frequency and severity bands. The constrained cluster analysis involves formulating the problem in terms of Nonlinear Programming.
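
As a rough illustration of the two steps described above, the following Python sketch blends a local indication with a causal-model prediction and a proximity complement, then groups the blended values into bands. It is not the authors' implementation (which used R and a nonlinear programming solver). The names ZipUnit, blended_frequency, and assign_bands, the credibility constant k_local, the fixed weight w_causal, and the squared-error banding objective are all assumptions made for illustration. In particular, a fitted mixed model would estimate the weights from the data rather than fix them, and the dynamic program here is a simple stand-in for the constrained cluster analysis.

```python
from dataclasses import dataclass


@dataclass
class ZipUnit:
    zip_code: str
    exposures: float      # e.g. earned car-years (hypothetical unit)
    local_freq: float     # observed claim frequency in this zip code
    causal_freq: float    # prediction from a model of geographical variables
    neighbor_freq: float  # frequency observed in proximate zip codes


def blended_frequency(u, k_local=5000.0, w_causal=0.5):
    """Credibility-weight local experience against a two-part complement.

    Assumption: Z = n / (n + k) with a fixed k, and a fixed split between
    the causal model and the proximity complement.
    """
    z = u.exposures / (u.exposures + k_local)
    complement = w_causal * u.causal_freq + (1.0 - w_causal) * u.neighbor_freq
    return z * u.local_freq + (1.0 - z) * complement


def assign_bands(values, weights, n_bands):
    """Partition units into n_bands contiguous groups of sorted values.

    Minimizes the weighted within-band squared deviation by dynamic
    programming. Requires len(values) >= n_bands. Returns a band index
    (0 = lowest) for each unit, in the original order.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    v = [values[i] for i in order]
    w = [weights[i] for i in order]
    n = len(v)

    # Prefix sums of weight, weight*value, weight*value^2 for O(1) band cost.
    W, S, Q = [0.0], [0.0], [0.0]
    for x, wt in zip(v, w):
        W.append(W[-1] + wt)
        S.append(S[-1] + wt * x)
        Q.append(Q[-1] + wt * x * x)

    def cost(i, j):
        # Weighted sum of squared deviations for sorted items i..j-1.
        wt = W[j] - W[i]
        if wt <= 0:
            return 0.0
        s = S[j] - S[i]
        return (Q[j] - Q[i]) - s * s / wt

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(n_bands + 1)]
    cut = [[0] * (n + 1) for _ in range(n_bands + 1)]
    best[0][0] = 0.0
    for b in range(1, n_bands + 1):
        for j in range(b, n + 1):
            for i in range(b - 1, j):
                c = best[b - 1][i] + cost(i, j)
                if c < best[b][j]:
                    best[b][j] = c
                    cut[b][j] = i

    # Walk the stored cut points backwards to label each unit.
    bands = [0] * n
    j = n
    for b in range(n_bands, 0, -1):
        i = cut[b][j]
        for pos in range(i, j):
            bands[order[pos]] = b - 1
        j = i
    return bands
```

In this sketch, a zip code with few exposures leans mostly on the complement, while a large zip code is driven mostly by its own experience. Restricting bands to contiguous ranges of the sorted indications is one simple constraint; contiguity on the map or minimum band sizes would need a more general formulation.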

Results. In three out of four cases, our approach, which is a rudimentary implementation of the mixed models with clustering concept that we introduce here, outperforms the existing Proposition 103 Frequency and Severity Bands Manual in terms of mean absolute deviation.
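
For readers unfamiliar with the comparison metric, a mean absolute deviation between band-level values and observed zip code values could be computed along the following lines. This is a minimal sketch; the function name weighted_mad and the choice to weight by exposures are assumptions, since the abstract does not state how the deviation was weighted.

```python
def weighted_mad(actual, fitted, weights):
    """Weighted mean absolute deviation between actual and fitted values.

    Assumption: each unit's absolute error is weighted (e.g. by exposures);
    pass equal weights for an unweighted mean.
    """
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    total_error = sum(w * abs(a - f) for a, f, w in zip(actual, fitted, weights))
    return total_error / total_weight
```

A lower value for one banding scheme than for another, computed on the same zip codes, indicates that the first scheme's band values sit closer to the observed experience on average.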

Conclusions. A mixed model approach is objective and efficient, and can substantially improve accuracy. Applying constrained cluster analysis to the result furthers these ends. In addition, the development and analysis of the mixed model, particularly the arithmetic model of causal geographical variables, can lay the groundwork for the introduction of causal geographical rating variables. These variables, such as traffic density, could eliminate complaints about the lack of causality. Because they are typically continuous, they could also be incorporated directly into the parameterization of the remaining classification plan. In California, such variables could be introduced to progressively supplant relative frequency and severity, improving accuracy and furthering the goals of Proposition 103.

Availability. The R programming language was used in preparing the data and the mixed model. R is available free of charge at www.r-project.org. The constrained cluster analysis employed the Premium Solver(TM) and the KNITRO(TM) Solver Engine. This software is distributed by Frontline Systems, Inc. Order information, including free 15-day trials, is available at www.solver.com.

Keywords. Territory Analysis; Rate Regulation; Predictive Modeling; Credibility; Personal Automobile; Classification Plans.

Page
91-169
Year
2008
Keywords
predictive analytics
Categories
Actuarial Applications and Methodologies
Ratemaking
Trend and Loss Development
Territory Analysis
Actuarial Applications and Methodologies
Ratemaking
Classification Plans
Business Areas
Automobile
Personal
Financial and Statistical Methods
Statistical Models and Methods
Predictive Modeling
Actuarial Applications and Methodologies
Regulation and Law
Rate Regulation
Financial and Statistical Methods
Credibility
Publications
Casualty Actuarial Society Discussion Paper Program
Authors
J. Paul Walsh
Eric J. Weibel