AI trained on millions of life stories can predict risk of early death

A model trained on 6 million people’s health, employment and financial records can predict death more accurately than tools used by the insurance industry

By Matthew Sparkes

18 December 2023

Data covering the entire population of Denmark was used to train an AI to predict people’s life outcomes

Francis Joseph Dean/Dean Pictures / Alamy Stock Photo

An artificial intelligence trained on personal data covering the entire population of Denmark can predict people’s chances of dying more accurately than any existing model, even those used in the insurance industry. The researchers behind the technology say it could also have a positive impact in early prediction of social and health problems – but must be kept out of the hands of big business.

Sune Lehmann Jørgensen at the Technical University of Denmark and his colleagues used a rich dataset from Denmark that covers education, visits to doctors and hospitals, any resulting diagnoses, income and occupation for 6 million people from 2008 to 2020.

They converted this dataset into words that could be used to train a large language model, the same technology that powers AI apps such as ChatGPT. These models work by looking at a series of words and determining which word is statistically most likely to come next, based on vast amounts of examples. In a similar way, the researchers’ Life2vec model can look at a series of life events that form a person’s history and determine what is most likely to happen next.
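To make the "what comes next" idea concrete, here is a minimal sketch in Python. It is not the researchers' method: Life2vec is a large language model, whereas this toy simply counts which event most often follows another. The event names and histories are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical life histories written as sequences of event "words".
# The token names are made up; the real dataset's vocabulary is not shown here.
histories = [
    ["EDU_degree", "JOB_nurse", "INCOME_mid", "DOCTOR_visit"],
    ["EDU_degree", "JOB_nurse", "INCOME_mid", "HOSPITAL_visit"],
    ["EDU_school", "JOB_driver", "INCOME_low", "DOCTOR_visit"],
    ["EDU_degree", "JOB_nurse", "INCOME_mid", "DOCTOR_visit"],
]


def train_bigram(sequences):
    """Count how often each event is immediately followed by each other event."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, event):
    """Return the event seen most often after `event`, or None if it was never seen."""
    followers = counts.get(event)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


model = train_bigram(histories)
print(predict_next(model, "INCOME_mid"))
```

A language model replaces these raw counts with a learned function that takes the whole preceding sequence into account, not just the most recent event.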

In experiments, Life2vec was trained on all but the last four years of the data, which was held back for testing. The researchers took data on a group of people aged 35 to 65, half of whom died between 2016 and 2020, and asked Life2vec to predict who lived and who died. It was 11 per cent more accurate than any existing AI model or the actuarial life tables used to price life insurance policies in the finance industry.
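The shape of that test can be sketched in a few lines of Python. This is an illustration only, not the study's code: the exact cutoff date, the record layout and the function names are assumptions.

```python
from datetime import date

# Assumed cutoff: the article says only that the last four years were held back.
CUTOFF = date(2016, 1, 1)


def split_by_time(events, cutoff=CUTOFF):
    """Split (date, event) records into training and held-out sets by date."""
    train = [e for e in events if e[0] < cutoff]
    held_out = [e for e in events if e[0] >= cutoff]
    return train, held_out


def accuracy(predicted, actual):
    """Fraction of people whose outcome (True = died) was predicted correctly."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Invented records for illustration.
events = [
    (date(2010, 5, 1), "DOCTOR_visit"),
    (date(2015, 11, 3), "JOB_change"),
    (date(2018, 2, 9), "HOSPITAL_visit"),
]
train, held_out = split_by_time(events)

# In a group where half died, always guessing "survived" scores 0.5.
actual = [True, True, False, False]
always_survived = [False, False, False, False]
print(len(train), len(held_out), accuracy(always_survived, actual))
```

Because the test group was evenly split between people who died and people who survived, a model that guesses the same outcome for everyone is right only half the time, which is the floor any useful predictor has to beat.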

The model was also able to predict the results of a personality test in a subset of the population more accurately than AI models trained specifically to do the job.

Jørgensen believes that the model has consumed enough data that it is likely to be able to shed light on a wide range of health and social topics. This means it could be used to predict health issues and catch them early, or by governments to reduce inequality. But he stresses that it could also be used by companies in a harmful way.

“Clearly, our model should not be used by an insurance company, because the whole idea of insurance is that, by sharing the lack of knowledge of who is going to be the unlucky person struck by some incident, or death, or losing your backpack, we can kind of share this burden,” says Jørgensen.

But technologies like this are already out there, he says. “They’re likely being used on us already by big tech companies that have tonnes of data about us, and they’re using it to make predictions about us.”

Matthew Edwards at the Institute and Faculty of Actuaries, a professional body in the UK, says insurance companies are certainly interested in new predictive methods, but the bulk of decisions are made by a type of AI called generalised linear models, which are rudimentary compared with this research.

“If you look at what insurance companies have been doing for many, many tens or hundreds of years, it’s been taking what data they have and trying to predict life expectancy from that,” says Edwards. “But we’re deliberately conservative in aspects of adopting new methodology because if you’re writing a policy which might be in force for the next 20 or 30 years, then the last thing you want to make is a material mistake. Everything is open to change, but slow, because nobody wants to make a mistake.”

Journal reference:

Nature Computational Science DOI: 10.1038/s43588-023-00573-5
