AI as Cancer Oracle?

Using national health records, an AI learns to predict who is at high risk of developing pancreatic cancer.


Illustration by Matt Chinworth

Computational biologist Chris Sander and other scientists have developed a method for using artificial intelligence to identify people at high risk of pancreatic cancer, solely from national health records. The crucial insight, the research revealed, was not simply whether a risk factor appeared in a patient’s medical record of symptoms and diagnoses, but when it appeared in the patient’s history. “One of our innovations was to use time sequences of clinical health records rather than just what disease somebody had,” Sander said. The team took advantage of a design feature of the large language models behind generative AI, which predict the next word in a sequence. “We adapted that for a sequence of disease codes,” said Sander, “to see if the next disease in the sequence was cancer or not.”
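To make the idea of next-code prediction concrete, here is a minimal Python sketch. It is not the study’s model: it swaps in a simple bigram (first-order Markov) count model, and the diagnosis codes and patient histories are hypothetical. It estimates how often a given code is immediately followed by a cancer code, which captures the order of events but not the longer histories or timing that a real model would have to use.

```python
from collections import Counter, defaultdict


def fit_next_code_model(histories):
    """Count, for each code, how often each other code comes right after it."""
    transitions = defaultdict(Counter)
    for seq in histories:
        for prev, nxt in zip(seq, seq[1:]):
            transitions[prev][nxt] += 1
    return transitions


def prob_next(transitions, prev, target):
    """Estimated probability that `target` is the next code after `prev`."""
    counts = transitions.get(prev)  # .get avoids creating empty entries
    if not counts:
        return 0.0
    return counts[target] / sum(counts.values())


# Hypothetical, ordered patient histories of made-up diagnosis codes.
histories = [
    ["T2D", "WEIGHT_LOSS", "PANC_CANCER"],
    ["T2D", "WEIGHT_LOSS", "JAUNDICE"],
    ["T2D", "HYPERTENSION"],
    ["HYPERTENSION", "T2D", "WEIGHT_LOSS", "PANC_CANCER"],
]
model = fit_next_code_model(histories)
print(prob_next(model, "WEIGHT_LOSS", "PANC_CANCER"))
print(prob_next(model, "T2D", "PANC_CANCER"))
```

With these toy histories, a weight-loss record is followed by the cancer code in two of three cases (an estimate of about 0.67), while type 2 diabetes on its own is never directly followed by it. A large language model replaces the counting table with a neural network that conditions on the whole preceding sequence.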

Sander’s approach, detailed last summer in Nature Medicine, proved highly predictive. For example, the model identified a subset of patients whose risk of developing pancreatic cancer within three years was 59 times higher than average. Identifying the highest-risk patients matters because they can undergo regular screening by CT or MRI, so that if they do develop pancreatic cancer, it is caught as early as possible.
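A 59-fold risk is a relative risk: the rate of cancer among flagged patients divided by the average rate. The Python sketch below uses invented cohort sizes (they are not figures from the study) to show the arithmetic and why a high-risk subset makes screening practical.

```python
def relative_risk(cases_flagged, n_flagged, cases_all, n_all):
    """Cancer rate in the flagged group divided by the overall (average) rate.

    Written as one cross-multiplied fraction so that integer inputs
    avoid intermediate floating-point rounding.
    """
    return (cases_flagged * n_all) / (n_flagged * cases_all)


def screened_per_case(cases, n):
    """Average number of people screened to find one case."""
    return n / cases


# Hypothetical cohort: 1,000 cancers among 1,000,000 people (0.1% average rate),
# and 59 cancers among the 1,000 people the model flags (5.9%).
rr = relative_risk(cases_flagged=59, n_flagged=1_000, cases_all=1_000, n_all=1_000_000)
print(f"relative risk: {rr:.0f}x")
print(f"screened per case, everyone: {screened_per_case(1_000, 1_000_000):.0f}")
print(f"screened per case, flagged:  {screened_per_case(59, 1_000):.1f}")
```

Under these made-up numbers, screening only the flagged group finds a case for roughly every 17 people scanned instead of every 1,000.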

Sander, a senior lecturer in the systems biology department at Harvard Medical School, was not an early believer in the abilities of AI to solve such problems. But in 2018, he received an invitation from Nobel laureate Phil Sharp to attend a small meeting at MIT’s Koch Institute for Integrative Cancer Research, a meeting that changed the course of his research and potentially the future of cancer detection.

Sharp wanted to explore potential applications of artificial intelligence for treating pancreatic cancer, which claims the lives of about 50,000 Americans each year. Although it doesn’t rank among the 10 most commonly diagnosed cancers in the United States, it is one of the deadliest. Early detection is the most effective way to reduce cancer deaths, but unlike prostate or breast cancer, pancreatic cancer has no reliable and easily accessible screening tools. As a result, the disease is often detected only after it has reached an advanced stage, when the odds of surviving five years after diagnosis are less than 9 percent. Sharp wondered whether artificial intelligence tools could mine patient health records for early warning signs of pancreatic cancer.

Sander was skeptical: “I thought this was just too ambitious. There’s no way AI can just miraculously make a solution to pancreatic cancer.” His decades-long career working at the nexus of big data and biology had exposed him to the challenges associated with using mathematics to make predictions about human diseases—especially ones as complex as cancer. “To make this realistic, the AI would need a lot of medical data, which is a big challenge because medical data can be so unreliable,” Sander said. “But we knew that Denmark has an excellent computerized national health database, and being able to access that data was really the breakthrough.”

With a team of MIT and University of Copenhagen researchers, he accessed more than nine million patient records from Denmark’s national health system and the U.S. Veterans Affairs health system, and used these data to build machine learning models that could predict an individual’s risk of developing pancreatic cancer from their medical history.

The factors that Sander’s AI uses to make its remarkable predictions are not always clear. When a human physician suspects that a patient has or is at risk of pancreatic cancer, they are usually able to explain why. For example, patients with type 2 diabetes who have lost weight over the course of two years are known to have a heightened risk of pancreatic cancer. AI systems, by contrast, learn patterns from data and encode them across millions or even billions of parameters, and that complexity makes the resulting prediction model opaque.
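A physician’s reasoning, unlike the model’s, can be written down as an explicit, checkable rule. The Python sketch below encodes a hypothetical version of the diabetes-plus-weight-loss pattern; the code names, the 730-day window, and the requirement that the weight loss fall inside that window are assumptions for illustration, not a validated clinical criterion or anything taken from the study.

```python
from datetime import date, timedelta


def flag_by_rule(records, as_of, window_days=730):
    """Hypothetical transparent rule over (date, code) records.

    Flags a patient if a T2D code appears on or before `as_of` AND a
    WEIGHT_LOSS code appears within the `window_days` days up to `as_of`.
    """
    has_t2d = any(code == "T2D" and d <= as_of for d, code in records)
    window_start = as_of - timedelta(days=window_days)
    recent_loss = any(
        code == "WEIGHT_LOSS" and window_start <= d <= as_of
        for d, code in records
    )
    return has_t2d and recent_loss


# Two hypothetical patients evaluated on the same date.
recent = [(date(2020, 1, 1), "T2D"), (date(2022, 6, 1), "WEIGHT_LOSS")]
old = [(date(2015, 1, 1), "T2D"), (date(2019, 6, 1), "WEIGHT_LOSS")]
print(flag_by_rule(recent, as_of=date(2023, 1, 1)))
print(flag_by_rule(old, as_of=date(2023, 1, 1)))
```

Every flag this rule raises can be traced to two specific dated records, which is the kind of explanation a model spreading its learned patterns across millions of parameters cannot readily give.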

Sander said this shouldn’t be a roadblock to applying the technology in clinical settings. “The key thing is whether you can make a correct prediction right now,” he said. His team is working with researchers conducting two pancreatic cancer clinical trials to collect additional patient data to further validate the AI’s predictions. The system is also being validated by multiple hospitals for deployment as a decision support tool, and Sander is in early talks with a company about potentially commercializing the system. In the meantime, he and his colleagues are exploring ways to apply their prediction models to ovarian cancer, which is similarly difficult to detect.

“I think this has a chance of revolutionizing how we deal with pancreatic and other cancers by catching it much earlier,” Sander said. “We’re getting closer to creating actual benefits for patients, sooner than I ever expected.”

Read more articles by: Daniel Oberhaus
