27
Multilevel Factors and Smoothing
In our treatment of generalised linear models (GLMs) in Chapter 26, we have come across
two different types of rating factors:
a. Continuous variables such as the annual mileage or the driver’s age, which may or
may not be related monotonically to the modelled variable (e.g. the claim frequency)
b. Categorical variables (i.e. variables with a limited number of possible values) that
have in general no inherent order: examples are sex, occupation, car model and
geographical location
Categorical variables are straightforward to model when the number of levels is small,
as is the case with sex. However, they pose a particular challenge when the number of
levels is very large and there is no obvious way of ordering them, as is the case for car
models* or postcodes. In this case, the calibration of a GLM becomes problematic because,
for some of the levels (in our example, some of the car models), loss data may be scarce.
Also, the lack of a natural ordering means that there is no obvious way of merging
categories together (as we can do with ages) to obtain categories with more data.
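To see why scarce data per level is a problem, the following minimal sketch compares the estimated claim frequency and its approximate standard error across levels with very different exposure. It assumes Poisson-distributed claim counts, so that the standard error of the frequency is roughly the square root of the claim count divided by the exposure. The model codes and figures are made up for illustration and are not taken from any real portfolio.

```python
import math

# Hypothetical (claims, exposure in car-years) by car model code.
# All names and figures are illustrative assumptions.
portfolio = {
    "MODEL_A": (120, 1000.0),
    "MODEL_B": (3, 25.0),
    "MODEL_C": (1, 4.0),
}


def frequency_with_error(claims, exposure):
    """Return (claim frequency, approximate standard error).

    Assumes Poisson claim counts, so Var(claims) is estimated by the
    observed claim count itself.
    """
    freq = claims / exposure
    std_err = math.sqrt(claims) / exposure
    return freq, std_err


for model, (claims, exposure) in portfolio.items():
    freq, se = frequency_with_error(claims, exposure)
    print(f"{model}: frequency={freq:.3f}, std error={se:.3f}")
```

In this made-up example, MODEL_A and MODEL_B have the same estimated frequency, but the estimate for MODEL_B is far less certain because it rests on only three claims. A GLM with one parameter per car model would face the same issue for every thinly populated level.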
Following Ohlsson and Johansson (2010), we call these factors multilevel factors (MLFs),
and we describe them as nominal variables (i.e. with no particular ordering) that have too
many levels to be analysed by standard GLM techniques.
Traditional approaches to MLFs are as follows:
1. Reduce the number of categories by heuristically merging together categories that
are thought to be similar. For example, you could reduce the number of
car models drastically by considering only the make of the car (e.g. BMW, Vauxhall,
Fiat) rather than the full car model code, or use the Association of British Insurers’
(ABI) grouping, which classifies all vehicles into one of 50 groups with similar
characteristics (Institute and Faculty of Actuaries [IFoA] 2010). Analogously,
postcodes could be merged together at, say, town or borough level. This is of course
not ideal because a lot of useful information is discarded, and is not the modern
approach to MLFs. However, it has traditionally been used when sufficient
computing power and adequate statistical techniques were not available.
2. Find the underlying factors that explain the differences between the categories
captured by an MLF, and use these as inputs to the GLM instead
of the original MLF. For example, in the case of car models, one can replace the car
model code with factors such as the vehicle’s cubic capacity, year of manufacture,
fuel type, body type and so on, and calibrate the GLM based on these underlying
factors. The factors can be found by factor analysis (a statistical technique) or
by using the factors already available in vehicle classification systems such as that
* Ohlsson and Johansson (2010) estimate 2500 car model codes in Sweden, and this estimate can probably be
transferred to other countries as well.
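The first traditional approach, merging car models by make, can be sketched as follows. This is only an illustration under stated assumptions: the records are made up, and the helper `make_of` assumes, purely for the purposes of the example, that the make is the first word of the model code, which will not be true of real model code systems.

```python
from collections import defaultdict

# Hypothetical records: (car model code, claims, exposure in car-years).
# The figures are invented for illustration only.
records = [
    ("BMW 320d", 4, 30.0),
    ("BMW X5", 1, 12.0),
    ("Fiat 500", 6, 45.0),
    ("Fiat Panda", 2, 20.0),
    ("Vauxhall Corsa", 9, 80.0),
]


def make_of(model_code):
    """Assumed convention: the make is the first word of the model code."""
    return model_code.split()[0]


def aggregate_by_make(rows):
    """Sum claims and exposure over all car models sharing the same make."""
    totals = defaultdict(lambda: [0, 0.0])
    for model_code, claims, exposure in rows:
        key = make_of(model_code)
        totals[key][0] += claims
        totals[key][1] += exposure
    return {make: (c, e) for make, (c, e) in totals.items()}


by_make = aggregate_by_make(records)
for make, (claims, exposure) in sorted(by_make.items()):
    freq = claims / exposure
    print(f"{make}: claims={claims}, exposure={exposure}, frequency={freq:.3f}")
```

The merged levels carry more exposure each, which makes calibration easier, but any difference between models of the same make is lost. This is the loss of information that the text identifies as the drawback of heuristic merging.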
