What machine learning techniques can help actuaries?

Some of the most important questions insurance company CEOs ask their reserving actuaries are: “How much adverse development are we experiencing?” “What is driving these results?” and “Are any parts of the business heading into trouble?”

These are critical questions: adverse development in loss experience is a strategic concern to insurance leaders because it creates uncertainty in achieving financial targets.

Adverse development erodes reserve adequacy, and business leaders want to know what is driving it. Current actuarial techniques are good at quantifying results in the aggregate – but less so in the detail. One of the most powerful tools in the actuarial toolbox is the loss triangle – the primary structure actuaries use to organize claim data for a reserve analysis. It is called a loss triangle because a typical submission of claim data shows values by accident year and evaluation period, and since recent accident years have fewer evaluations, the data aggregates into a triangle. The loss triangle is fundamental to assessing reserve adequacy on a portfolio of P&C insurance products.
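As a minimal sketch of the idea (using pandas, with hypothetical column names), a loss triangle can be built from claim-level data by pivoting on accident year and development age; the empty lower-right corner is what gives the grid its triangular shape:

```python
import pandas as pd

# Hypothetical claim data: one row per accident year per evaluation point.
claims = pd.DataFrame({
    "accident_year": [2020, 2020, 2020, 2021, 2021, 2022],
    "dev_months":    [12,   24,   36,   12,   24,   12],
    "paid_loss":     [100.0, 150.0, 170.0, 120.0, 160.0, 90.0],
})

# Pivot into the triangle: rows are accident years, columns are
# development ages; the lower-right of the grid is empty because
# recent years have not yet been evaluated at later ages.
triangle = claims.pivot_table(index="accident_year", columns="dev_months",
                              values="paid_loss", aggfunc="sum")
print(triangle)
```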

The actuary uses several rigorous analytical techniques alongside the triangle to understand and measure results. While these methods are well suited to assessing results in the aggregate, they struggle when slicing and dicing into segments. In the aggregate the data set is larger, and so more credible; however, a larger data set tends to be very heterogeneous. Drilling into segments yields more homogeneous data, but the results are less credible.

One solution companies should consider is to incorporate machine learning techniques into their reserving processes. Pricing actuaries faced a similar tension between homogeneity and credibility and turned to sophisticated modeling techniques for help. Much like those pricing models, machine learning techniques address the problem because they explicitly balance homogeneity against credibility within the algorithms, automating the manual 'slice and dice' search. However, these methods are often very complicated and can have a 'black box' feel – so care and expertise are even more important.

A powerful tool
Much like operating a chainsaw, machine learning is a powerful tool that needs to be approached with caution and respect. In the hands of an expert, machine learning can bring tangible benefits. In the hands of an inexperienced user, that chainsaw can do a lot of damage. It is important to integrate machine learning alongside other reserving techniques, so that companies can not only quantify adverse development but also articulate the reasons for it.

What are you modeling?
The fundamental question companies must ask themselves in any modeling exercise is: What are you modeling?

For a line of business like auto insurance, where is the best place to start: the enterprise level, the portfolio level, the coverage level? It is usually best to align the machine learning process with the existing reserving splits. Since one of the machine learning outputs is 'better' segmentation, companies may migrate to newer splits as adoption matures.

Granular data
Machine learning techniques perform best when there is more data – specifically, more information about each claim. So another key ingredient is granular claims data. Companies should not only capture granular claim data but also track how it changes over time. The snapshot interval could be anything, but it should be consistent and aligned with the reserving cycle (e.g., if you set reserves quarterly, you want data about each claim at three months, six months, nine months, etc.).

There are many metrics around a claim – paid losses, case reserves, allocated loss adjustment expenses, etc. – and any of these could be the target of the machine learning model. The predictors are all the information a company can capture about the claim, from policy information to circumstance information to claimant information.
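A minimal sketch of what such a granular, snapshot-style data set might look like (pandas, with hypothetical claim fields):

```python
import pandas as pd

# Hypothetical claim snapshots: one row per claim per quarterly evaluation.
snapshots = pd.DataFrame({
    "claim_id":      [1, 1, 1, 2, 2, 3, 3, 4],
    "accident_year": [2021, 2021, 2021, 2021, 2021, 2021, 2021, 2022],
    "eval_months":   [3, 6, 9, 3, 6, 3, 6, 3],          # consistent interval
    "paid_loss":     [5000, 12000, 15000, 2000, 2500, 800, 1200, 400],
    "case_reserve":  [20000, 15000, 10000, 8000, 7500, 3000, 2600, 1000],
    "claimant_age":  [45, 45, 45, 31, 31, 62, 62, 28],  # predictor with order
    "acc_location":  ["NY", "NY", "NY", "TX", "TX", "NY", "NY", "TX"],  # categorical
})

# Any of the financial metrics could serve as the modeling target;
# here, incurred loss (paid plus case) at each evaluation point.
snapshots["incurred"] = snapshots["paid_loss"] + snapshots["case_reserve"]
```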

Prep the data
The first step in machine learning is to define the company's A/B sampling: setting aside data for validation is critical to assessing a model. One approach is to set aside a random X%, taking care that the held-out data does not share information with the modeling data. Another approach is to set aside a specific time period – which captures a truer measure of predictiveness – although the resulting model can go out of date more quickly.
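A sketch of both holdout approaches, assuming scikit-learn and the hypothetical `snapshots` frame above; splitting by claim keeps all snapshots of one claim on the same side of the split, so no claim leaks information into the validation set:

```python
from sklearn.model_selection import train_test_split

# Approach 1: hold out a random X% (here 20%) of claims.
claim_ids = snapshots["claim_id"].unique()
train_ids, valid_ids = train_test_split(claim_ids, test_size=0.20,
                                        random_state=42)
model_data = snapshots[snapshots["claim_id"].isin(train_ids)]
validation = snapshots[snapshots["claim_id"].isin(valid_ids)]

# Approach 2: hold out a specific time period -- a truer measure of
# predictiveness, though the model can go out of date more quickly.
cutoff = snapshots["accident_year"].max()
model_data = snapshots[snapshots["accident_year"] < cutoff]
validation = snapshots[snapshots["accident_year"] == cutoff]
```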

Once a company has set aside data for validation, the next step is to take the modeling data and set up cross-validation 'folds': the modeling data is split into groups and the machine learning method is run on different subsets (e.g., if you split the modeling data into four folds, the algorithm is built four separate times, each time excluding one of the folds, and the 'final' model is the combination of the four separate models).
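A four-fold version of that idea, sketched with scikit-learn (the two predictors are purely illustrative):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold

X = model_data[["claimant_age", "eval_months"]]  # illustrative predictors
y = model_data["incurred"]

# Build the model four times, each time excluding one fold; the "final"
# model is the combination (here, the average) of the four fits.
fold_models = []
for train_idx, _ in KFold(n_splits=4, shuffle=True, random_state=0).split(X):
    fold_models.append(
        GradientBoostingRegressor().fit(X.iloc[train_idx], y.iloc[train_idx]))

ensemble_pred = np.mean([m.predict(X) for m in fold_models], axis=0)
```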

Next, a company needs to consider every predictor in the model and specify whether it has a natural order (e.g., claimant age has a natural order, whereas accident location is more of a categorical construct).
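One way to encode that distinction, sketched with scikit-learn (column names are the hypothetical ones above): ordered numeric predictors pass through as-is, while an unordered categorical such as location is one-hot encoded.

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Claimant age and evaluation age have a natural order; accident
# location does not, so it is expanded into indicator columns rather
# than being treated as ordered numbers.
preprocess = ColumnTransformer([
    ("ordered", "passthrough", ["claimant_age", "eval_months"]),
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["acc_location"]),
])
X_encoded = preprocess.fit_transform(snapshots)
```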

A machine learning model is described by a set of 'hyperparameters' – metrics that define the shape and structure of the final algorithm. For example, a gradient boosting machine can be described by the depth of its trees (i.e., how many times the data is segmented); the number of iterations (how many trees are built in sequence); the learning rate; etc.

A big part of the machine learning process is finding these parameters, usually through some type of search. A standard approach is to generate a mixture of different parameter sets and see which one produces the most predictive result on the validation data. For example, 300 different sets of parameters could be simulated, with the selected set being the one that generates the lowest mean squared error. Once an optimal set of parameters has been identified, the resulting model needs to be interpreted.
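A sketch of such a random search with scikit-learn, reusing the hypothetical `X` and `y` above; 300 candidate parameter sets are scored by mean squared error under four-fold cross-validation:

```python
from scipy.stats import randint, uniform
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

search = RandomizedSearchCV(
    GradientBoostingRegressor(),
    param_distributions={
        "max_depth":     randint(2, 8),       # tree depth: how finely data is segmented
        "n_estimators":  randint(50, 500),    # iterations: how many trees to build
        "learning_rate": uniform(0.01, 0.2),  # contribution of each new tree
    },
    n_iter=300,                               # 300 simulated parameter sets
    scoring="neg_mean_squared_error",         # select the lowest MSE
    cv=4,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```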

Interpreting the results
A machine learning model is inherently hard to interpret – we can look at an individual tree, but keep in mind the algorithm builds a complex series of recursive trees. Thus, there are three common outputs:

The factor importance output – this identifies which factors are most influential in the model. Care must be taken when interpreting this result because it only says that a factor is important; it does not say whether that importance is associated with reserve excessiveness or inadequacy. Note, a proprietary algorithm that identifies the most important factors, and the most important combinations of factors, is critical to better understanding the underlying structure.
Segment importance output – a process in which the modeler articulates how likely a specific claims segment is to drive the adverse development. This differs from the first output because it develops a profile that can be described by a set of factors.
Partial dependence plots – a statistical tool that lets the interpreter translate a complex model into more basic statements; quite useful when trying to get a sense of what the model is saying (see the sketch after this list).
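A sketch of the first and third outputs using scikit-learn (segment importance has no off-the-shelf equivalent; `search` and `X` are the hypothetical objects above):

```python
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

best = search.best_estimator_

# Factor importance: which predictors the model leans on most. Note
# this says nothing about direction -- excessiveness or inadequacy.
for name, score in zip(X.columns, best.feature_importances_):
    print(f"{name}: {score:.3f}")

# Partial dependence: how the prediction moves as one factor varies,
# averaging over the rest of the data.
PartialDependenceDisplay.from_estimator(best, X, ["claimant_age"])
plt.show()
```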

Using these interpretation techniques along with the machine learning methods, a company can articulate which claims are likely to have adverse development (the machine learning output) and why the tool identified those claims (the key factors, the key profiles and how the model weighs the different factors within a profile).

Common pitfalls
One of the common pitfalls to watch out for is overfitting – when a model describes the experience data well but does a poor job of predicting future outcomes (i.e., overfitted models are 'stuck in the past'). The likelihood of overfitting is extremely high with machine learning models, and this is further complicated because it is common to add a layer of automation when updating model results.
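A minimal overfitting check under the same hypothetical setup: compare the error on the modeling data with the error on the held-out validation data.

```python
from sklearn.metrics import mean_squared_error

X_valid = validation[["claimant_age", "eval_months"]]

train_mse = mean_squared_error(y, best.predict(X))
valid_mse = mean_squared_error(validation["incurred"], best.predict(X_valid))

# A large gap between the two is the classic symptom of a model that
# is "stuck in the past."
print(f"modeling MSE: {train_mse:,.0f}  validation MSE: {valid_mse:,.0f}")
```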

This is why the A/B holdout is recommended and why the modeling data set is folded. It is also why subject matter expertise is so crucial: it helps the business user understand whether the model is doing the right thing in something as high stakes as claims.

Highly categorical variables are another area to consider. An example is accident location – there are many possible locations, and location can be a very important predictor. However, modeling location directly is difficult because it is a high-cardinality categorical variable. Employing spatial analysis techniques to properly incorporate this type of variable is vital: spatial techniques use ideas of adjacency and distance to recognize the inherent continuum in locations.
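One simple spatial treatment, sketched under the assumption that each location code can be mapped to coordinates (the lookup below is hypothetical): replacing the raw category with latitude/longitude, or with distance to a reference point, gives the model the adjacency and distance structure that a plain category lacks.

```python
import numpy as np

# Hypothetical coordinates for each accident location code.
coords = {"NY": (40.71, -74.01), "TX": (29.76, -95.37)}

snapshots["acc_lat"] = snapshots["acc_location"].map(lambda c: coords[c][0])
snapshots["acc_lon"] = snapshots["acc_location"].map(lambda c: coords[c][1])

# Distance to a reference point (e.g., an urban center) as a single
# spatial predictor that respects the continuum between locations.
ref_lat, ref_lon = 40.71, -74.01
snapshots["dist_to_ref"] = np.hypot(snapshots["acc_lat"] - ref_lat,
                                    snapshots["acc_lon"] - ref_lon)
```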

Machine learning can accelerate discovery and fill a gap in actuarial analysis. Blending expertise with science will always be a winning result, carving a path to success.