Ethical AI in life insurance, Hareem Naveed of Munich Re

I come from public policy; that was my training, so I tried to bring a lot of that here because I feel like it’s kind of similar. The biggest thing that we start with is that your scope has to be reasonable. Your scope has to make sense. When I started six years ago, people were like, ‘Oh, let’s use facial recognition to detect if you’re a smoker.’ We can’t do that. That doesn’t make any sense. So if someone is coming to you, make sure the data they’re using is appropriate and that you have the right permissions to use it.

We have a cross-disciplinary team that includes people from risk management, legal, the business product owner and data scientists, and they just ask the right questions. No one’s going to say, ‘You can’t do that because this is risky.’ They’re going to say, ‘Do you have the right controls? Did you review the documentation? Did you review the agreement?’ That’s the first place to start.

Then when it comes to bias detection and mitigation, one of the things that we do is think about the intervention. The reason I mentioned scope is that nobody should be building a model just for the sake of it, as an intellectual exercise, right?

If you have a preferred model that can be used to move people up a level, and you think about that in the framework of somebody applying for life insurance, that’s an assistive action. So the metric on that is different from the one for a model that may knock somebody down. Once you define the intervention, you treat it as part of performance testing: you’re looking at accuracy, precision and recall, and you also take the bias metric you defined and look at it across subgroups. We look at all demographic variables. With age, for example, we can bucket it, so we can say this is 18 to 35, 35 to 45, 45 to 55, and then we compute the metric for each bucket. Then you set a reference group, and the reference group is one that’s historically advantaged, or the one that’s the largest in your population.

So, for example, for us, maybe 45 to 55 is the largest group present. We set that as the denominator: we take the metric for each of the other subgroups and divide it by the reference group’s metric. Then we use the 80% rule from the Equal Employment Opportunity Commission. It’s just a yardstick; the threshold can be whatever you want it to be. As the data scientist, you have to tell us up front why you picked the metric, what your intervention is meant to be and how you’re assessing for it. If your model passes on the metric you’ve defined, then you’re good to go.

If it doesn’t, you have to mitigate and figure out what’s going on. Sometimes the answer is the population you apply your model to. For example, we had somebody who built a model that wasn’t doing very well on ages 60 and above. But that didn’t matter so much, because the model was only going to be used on applicants up to age 60. So those are the kinds of mitigation: you can update the intervention, or you can change the data, or maybe look at the labels. For me, bias testing is just as much a part of performance testing, because you don’t want to do worse on a subgroup. Right? That doesn’t make any sense.
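The subgroup check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's actual tooling: the bucket boundaries, the choice of recall as the bias metric, the function names and the 0.8 threshold as a lower bound are all hypothetical choices made for the example.

```python
from collections import defaultdict

# Hypothetical age buckets: lower bound inclusive, upper bound exclusive.
AGE_BUCKETS = [(18, 35), (35, 45), (45, 55), (55, 60)]


def bucket_for(age):
    """Return a label such as '45-55', or None if the age is out of scope."""
    for low, high in AGE_BUCKETS:
        if low <= age < high:
            return f"{low}-{high}"
    return None


def recall_by_group(records):
    """Compute recall per age bucket.

    records: iterable of (age, y_true, y_pred) tuples with binary labels.
    Recall is an assumed metric; the talk leaves the choice to the modeler.
    """
    true_positives = defaultdict(int)
    positives = defaultdict(int)
    for age, y_true, y_pred in records:
        group = bucket_for(age)
        if group is None or y_true != 1:
            continue  # skip out-of-scope ages and negative labels
        positives[group] += 1
        if y_pred == 1:
            true_positives[group] += 1
    return {g: true_positives[g] / positives[g] for g in positives}


def disparity_ratios(metric_by_group, reference_group):
    """Divide each subgroup's metric by the reference group's metric."""
    reference = metric_by_group[reference_group]
    return {g: m / reference for g, m in metric_by_group.items()}


def passes_threshold(ratios, threshold=0.8):
    """Flag each subgroup against the yardstick (0.8 for the 80% rule)."""
    return {g: ratio >= threshold for g, ratio in ratios.items()}
```

With "45-55" as the reference group, a call such as `passes_threshold(disparity_ratios(recall_by_group(records), "45-55"))` returns one pass or fail flag per bucket. Ages outside the listed buckets are dropped, which mirrors the point about a model that is only applied to applicants up to age 60.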

That’s the framework that we use. It’s really simple: it’s easy for legal to understand, and it’s easy for a data scientist to implement and to understand as a rule of thumb.

Every data scientist, when they have built a model that they’re ready to deploy or use, has to share these metrics and put them on, like, one sheet of paper: describe the data and all the features you used, what your final iteration was, give us your performance metrics, and give us your bias metrics. The bias metrics sit side by side with the performance analysis, to make sure all of that is there. That helps, because if the models are not going to pass, they’re not going to be put up for legal review or be ready to deploy. You put the onus on the developer to iterate. The biggest thing is to make sure we have a culture where this is important, a culture where it’s supported, so no one is going to be penalized, and a culture that is open to talking about the mitigations.