In this Quick Read, originally published in Insurance CIO Outlook, Tom Fletcher, PartnerRe’s Global Head of Data Science Consulting, discusses the need for valid, relevant, consistent and fair data – the key to successful model evaluation.
Many years ago, my boss said, referring to evaluating a vendor’s model: “It has to be the right fit for our company; but remember, although they may build the model differently than you would, that doesn’t make it wrong.”
When it came to evaluating predictive models, he believed in striving for balance between protecting the company and keeping an open mind to the potentially innovative ways in which these models could benefit the business.
As traditional sources of data for life insurance underwriting give way to additional data sources now being leveraged to accelerate the underwriting process, my ex-boss’s advice seems more relevant – and perhaps more challenging – than ever. When evaluating models, the current challenge for insurers lies in cutting through the noise to find what really works for their business. In other words, is the data provided by the model valid, relevant, consistent and fair?
The overriding priority is validity. Generally, a model is designed for a specific purpose, and there should be solid empirical evidence that the model fulfills that purpose and nothing else. This also entails evaluating the model’s compatibility with the other elements of the process and with the data and/or models already in use. The ideal model offers incremental validity above what is already there. In fact, a model’s usefulness in terms of added value matters more than its empirical strength: strong models may go unused because they offer no specific benefit beyond existing tools, while moderately strong models may be implemented because of the utility they bring.
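To make incremental validity concrete, here is a minimal sketch in Python using scikit-learn (the file, column names and outcome variable are all hypothetical) that compares a baseline model against the same model augmented with a vendor’s new score. If the out-of-sample AUC barely moves, the new source adds little value however strong it looks on its own.

```python
# Minimal sketch of an incremental-validity check (hypothetical file and
# column names). Compare out-of-sample AUC with and without a vendor score.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("underwriting_sample.csv")  # hypothetical data set

baseline_cols = ["age", "bmi", "cholesterol"]      # evidence already in use
augmented_cols = baseline_cols + ["vendor_score"]  # new source under review

X_train, X_test, y_train, y_test = train_test_split(
    df[augmented_cols], df["adverse_outcome"], test_size=0.3, random_state=0
)

for name, cols in [("baseline", baseline_cols), ("augmented", augmented_cols)]:
    model = LogisticRegression(max_iter=1000).fit(X_train[cols], y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test[cols])[:, 1])
    print(f"{name:9s} AUC: {auc:.3f}")
```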
Valid models come from relevant and consistent data. Imagine tracking some phenomenon using inconsistent units (e.g., imperial vs. metric): the same reading could mean something different from one day to the next. That is why it is crucial to know the lineage and reliability of the data when evaluating whether a new source (i.e., data, model or tool) adds value.
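A simple day-over-day screen can flag exactly that kind of unit drift. A minimal sketch, assuming a hypothetical daily feed with a weight column:

```python
# Minimal sketch of a unit-consistency screen on a daily feed (hypothetical
# file and column names). A sudden ~2.2x jump in the daily mean suggests a
# pounds/kilograms mix-up at the source.
import pandas as pd

feed = pd.read_csv("daily_feed.csv", parse_dates=["date"])
daily_mean = feed.groupby(feed["date"].dt.date)["weight"].mean()
ratio = daily_mean / daily_mean.shift(1)
print(ratio[(ratio > 2.0) | (ratio < 0.5)])  # days flagged for review
```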
The more you know about contextual factors such as poverty, family history, access to healthcare and so forth, the better. These background criteria lead to certain lifestyle characteristics associated with specific behaviors (e.g., exercise, eating habits) that in turn impact the body (e.g., BMI, cholesterol and blood sugar levels). It might be a long chain of events, but it’s imperative to work carefully through the logic to show relevance. It’s easy to claim that a correlation is valid, but careful consideration needs to be given to whether that correlation is driven by other factors, as the use of the data may have to be defended to a regulator.
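One simple way to probe whether a correlation is driven by another factor is a partial correlation: remove the linear effect of the suspected confounder from both variables and see what relationship remains. A minimal sketch, with a hypothetical file and variable names:

```python
# Minimal sketch of a confounding check via partial correlation (hypothetical
# data): does an activity score still correlate with cholesterol once income,
# a contextual factor, is held constant?
import numpy as np
import pandas as pd

df = pd.read_csv("applicant_sample.csv")  # hypothetical data set

def residualize(y, x):
    """Return y with the linear effect of x removed."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

raw = df["activity_score"].corr(df["cholesterol"])
partial = np.corrcoef(
    residualize(df["activity_score"], df["income"]),
    residualize(df["cholesterol"], df["income"]),
)[0, 1]
print(f"raw correlation: {raw:.2f}, controlling for income: {partial:.2f}")
```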
Consistent, relevant and valid data must also be fair: it’s key to ascertain the extent to which a model may introduce unfair discrimination. While insurance companies have historically not collected protected-class information, doing so is an emerging regulatory requirement.
When evaluating the possibility that a model could cause discrimination, insurers need to ask some critical questions.
Models that inadvertently introduce unfair discrimination into the underwriting process or that are perceived to be unfair can open a “Pandora’s box” of legal and regulatory issues.
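Where protected-class labels are available for a test population, one common first screen is an adverse-impact ratio comparing favorable-decision rates across groups. A minimal sketch with hypothetical data; the 0.8 cutoff echoes the common “four-fifths” screen, not a legal standard:

```python
# Minimal sketch of an adverse-impact screen (hypothetical file and columns).
# Compares approval rates across groups; ratios well below 1.0 versus the
# best-treated group warrant investigation.
import pandas as pd

df = pd.read_csv("decisions_sample.csv")
rates = df.groupby("group")["approved"].mean()  # favorable-decision rate
impact_ratio = rates / rates.max()
print(impact_ratio[impact_ratio < 0.8])         # groups flagged for review
```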
When applied in the right way, predictive modeling can be invaluable in establishing actuarially sound principles and accelerating the underwriting process, while simultaneously adding to the volume of information going into the risk evaluation.
The key to success lies in making sure the data can lead to reliable conclusions that demonstrably add value to current business processes.