It started with a cough that wouldn’t go away.
“I just kept coughing and coughing,” says Avi Yagil, a physicist at the University of California, San Diego. “I thought I had picked up a cold on the plane.”
It was December 2011, and Yagil had just arrived back in California for the holidays after spending most of the year at the international physics laboratory CERN in Geneva, Switzerland. The Large Hadron Collider had just completed its second year of data-taking, and he and his colleagues on the CMS experiment were hot on the trail of the elusive Higgs boson. The particle had been predicted in the 1960s but not yet definitively spotted by any experiment.
But before he could get back to work, Yagil had to take care of this cough. A general practitioner took a scan of his lungs and saw that they were full of water. After a few more tests, they saw that the problem was his heart: It was damaged, and they didn’t know why.
For the next four years, Yagil continued his physics research. He and his colleagues discovered the Higgs boson, an advance that earned two of the theorists who predicted its existence a Nobel Prize, and went on to study the Higgs’ properties in detail.
All the while, Yagil’s health continued to decline.
“In early 2016 I was in the clinic with my wife, and they told us that I have Stage D heart failure,” he says. “You Google it and realize that your life expectancy is very short; it’s more lethal than cancer.”
Yagil was hospitalized, and the heart transplant team at UCSD Sulpizio Cardiovascular Center struggled to keep him alive while they searched for a donor. After three months, one was identified, and they performed the surgery that saved Yagil’s life.
During his time in the hospital, Yagil grew to know and trust his doctors and nurses. “It was a second home for me,” he says.
“After I got out, I wanted to do something meaningful, beyond chocolate boxes and bottles of wine.”
Yagil suggested applying the techniques he uses as a particle physicist to medical data, to help doctors predict mortality risk for heart-failure patients. His doctors, Eric Adler and Barry Greenberg, were open to the idea.
“I’ve never met anyone like Avi before,” says Adler, the cardiologist who performed Yagil’s heart transplant. “He’s a force of nature, and I say that in the best way possible.”
Two different worlds
For hundreds of years, doctors have drawn upon their education and wisdom as the primary mechanisms to evaluate a patient. “A master clinician will examine a patient and base their evaluations on a gut feeling,” Adler says. “It’s how we’ve been making medical decisions for a long time.”
Today, medical evaluations don’t rely on expert intuition alone; doctors also use risk scores and other statistical tools in their decision-making. But those risk scores and tools are based on fairly rudimentary statistical methods and limited data.
“Every day I have a patient ask me: What is my prognosis?” says physician Sophia Airhart, a heart-failure expert at the University of Arizona who was not involved in Yagil and Adler’s work. “More accurate risk prediction tools can help me as a heart-failure provider to better treat the patient in front of me.”
Yagil saw an opportunity to do more. He told Adler he was shocked when he discovered that hospitals store electronic patient data for billing but rarely use this data for research, Adler says. “It was amazing to him that we did not take more advantage of the computing power he used in his daily work as a physicist,” he says. “I think his exact quote was, ‘You might as well write it down on papyrus.’”
For three years, Yagil, Adler and their collaborators worked together to put that data into something considerably more high-tech than ancient scrolls: a supervised machine-learning algorithm.
Supervised machine learning involves feeding examples to an algorithm to “teach” it how to evaluate new data—not totally unlike the training process undertaken by a master clinician, Adler says.
Master clinicians learn by treating hundreds of patients over several decades. While supervised machine learning cannot replace the know-how of a human, it can mimic this learning process by receiving data and then being told by humans what to make of it.
“It allows a computer to have the accuracy of someone who has been doing this for a really long time,” Adler says.
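To make the idea concrete, here is a minimal sketch of supervised learning in Python, using the scikit-learn library and made-up data. The measurements, labels and choice of model are purely illustrative, not the team’s actual setup:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Illustrative "past cases": each row is a subject described by a few
# numeric measurements; each label is a known outcome (0 or 1), here
# generated by a simple hidden rule.
X_train = rng.normal(size=(500, 8))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 1] > 0).astype(int)

# The "teaching" step: the algorithm learns how measurements relate
# to outcomes across hundreds of labeled examples.
model = GradientBoostingClassifier().fit(X_train, y_train)

# Given a new, unseen case, the trained model produces an evaluation.
new_case = rng.normal(size=(1, 8))
print(model.predict_proba(new_case))  # estimated probability of each outcome
```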
Yagil uses machine-learning algorithms to analyze particle collisions in the LHC, which happen about 600 million times a second. When searching for rare particles such as Higgs bosons, he and his colleagues need to separate the few collisions that might have a Higgs from the billions that don’t. It’s a needle-in-a-haystack problem and impossible to perform without some help from software.
“It’s the only way we can extract tiny signals from datasets dominated by many orders of magnitude larger backgrounds,” he says.
Luckily, the laws of physics governing these collisions are already very well understood. Scientists can train their particle-hunting algorithms by feeding them millions of examples generated by virtual versions of their experiments. From there, they can find and study rare processes in the real data and search for unexpected phenomena.
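A toy version of that workflow, again in Python with scikit-learn, might look like the following. The Gaussian “simulated” samples and the generic classifier are placeholders standing in for CMS’s far more sophisticated simulations and tools:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Stand-ins for simulated events: background dominates, while the rare
# "signal" is slightly shifted in its measured features.
background = rng.normal(loc=0.0, size=(100_000, 4))
signal = rng.normal(loc=0.8, size=(1_000, 4))
X_sim = np.vstack([background, signal])
y_sim = np.concatenate([np.zeros(len(background)), np.ones(len(signal))])

# Train on simulation, where every event's true identity is known...
clf = RandomForestClassifier(n_estimators=50).fit(X_sim, y_sim)

# ...then score the "real" data and keep only the most signal-like events.
real_events = rng.normal(loc=0.0, size=(10_000, 4))
scores = clf.predict_proba(real_events)[:, 1]
print(f"{(scores > 0.9).sum()} signal-like events out of {len(real_events)}")
```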
But the world of medical research doesn’t take place inside the controlled environment of the LHC. Unlike particle collisions, patients come with complicating factors like busy schedules, forgotten follow-up appointments, and rights to personal privacy. Because there are so many unknowns and only a limited number of patients whose data is available to study, it is difficult for doctors to build accurate models.
There is no database of virtual patients to draw from; Yagil and Adler needed actual patient information to train their algorithm. “This introduces a significant challenge,” Yagil says.
After much discussion and debate, Yagil and his colleagues determined which patient data was too unreliable or had too many unknowns to use in training their algorithm. Some patients, for example, had multiple tests performed within a few days of each other, while others had those same tests spread out over the course of weeks or months.
A person’s health is a little like the weather, Yagil says. “If you take the temperature, pressure, precipitation and wind speed of a single city on a single day, you have a good snapshot of that city’s weather. But if you take these measurements on different days, there is no way to see how those measurements are correlated and you cannot make accurate models or predictions.”
The researchers chose quality over quantity. But that meant reducing their already limited sample size, which presented its own risk.
Machine-learning algorithms search for correlations between different variables. The more variables there are, the more opportunities an algorithm has to find patterns. But if an algorithm is given too many variables and only a small sample size, it will find coincidental patterns between the subjects that don’t apply to larger populations—a problem called “over-training.”
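The effect is easy to demonstrate: train a flexible model on a small sample with many purely random variables, and it will score perfectly on the data it has seen while doing no better than a coin flip on new data. A minimal sketch, with made-up numbers:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)

# 40 "subjects" described by 100 purely random variables, with random
# outcomes: there is no real pattern here for a model to find.
X = rng.normal(size=(40, 100))
y = rng.integers(0, 2, size=40)

model = DecisionTreeClassifier().fit(X, y)
print("accuracy on training data:", model.score(X, y))  # 1.0: memorized noise

# The "patterns" it found are coincidences that vanish on new subjects.
X_new = rng.normal(size=(40, 100))
y_new = rng.integers(0, 2, size=40)
print("accuracy on new data:", model.score(X_new, y_new))  # ~0.5, chance level
```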
With this in mind, the collaborators identified eight simple variables related to patients’ blood work and blood pressure. They mapped these factors against the patients’ lifespans after their diagnosis.
They trained an algorithm using electronic health records from 5,822 patients within the health system at UC San Diego. They then tested its accuracy using data from the University of California, San Francisco and 11 European medical centers. The performance was the same across all the samples, indicating that it was not biased by over-training.
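In code, that train-here-test-there strategy looks roughly like the sketch below. The cohorts are synthetic stand-ins, the model is a generic classifier rather than MARKER-HF itself, and the AUC metric is assumed as one common way to score such a tool:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)

def make_cohort(n_patients):
    """Synthetic stand-in for one center's records: 8 variables + outcome."""
    X = rng.normal(size=(n_patients, 8))
    risk = 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))  # shared risk rule
    return X, (rng.random(n_patients) < risk).astype(int)

# Train on one institution's patients...
X_train, y_train = make_cohort(5822)
model = GradientBoostingClassifier().fit(X_train, y_train)

# ...then test on independent cohorts from other centers. Similar scores
# across cohorts suggest the model was not over-trained.
for center, n in [("external cohort A", 1500), ("external cohort B", 800)]:
    X_ext, y_ext = make_cohort(n)
    auc = roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1])
    print(f"{center}: AUC = {auc:.2f}")
```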
A promising partnership
Their work was recently published in the European Journal of Heart Failure. According to their paper, their machine-learning algorithm, called MARKER-HF, evaluates the mortality risk of patients diagnosed with heart failure with 88% accuracy.
Machine learning is entering medicine, and the larger medical community has taken note, Airhart says. “Machine learning has the potential to really be a game-changer and move the field forward, as the authors have shown,” she says. “The excellent discriminatory power of the MARKER-HF score to predict mortality is a testament to the power of interdisciplinary collaboration, and I applaud professors Yagil and Adler for their work. It is an exciting time for the field.”
Airhart adds that current risk prediction tools can fall short for members of underrepresented populations, who may respond differently to therapies. To account for this, the medical field needs a better way of predicting outcomes in patients of different genders and races so that doctors can create tailored and accurate prognoses. “Machine learning may help fill this gap,” she says.
MARKER-HF was created using data from a diverse group of patients and is agnostic to race and gender. When testing their tool, Yagil and his colleagues demonstrated that it had a similar performance for different genders and ethnicities, within statistical uncertainties.
Adler says that collaborating with Yagil had a profound effect on how he thinks about machine learning in medicine.
“When lay people think about AI and machine learning, we think that we can just drop in the data, and the computer will figure it out,” he says. “But we actually need to sit down, roll up our sleeves, and spend a lot of time thinking about our data. You can’t just throw a million patients into a supercomputer and see what comes out the other side.
“We spent hours every week going through the results to see if they made sense. The magic was in the collaboration: the doctors and the computer scientists discussing and tinkering.”
They hope their work will help patients and doctors. They also hope it will provide a roadmap for physicians and physicists interested in working together to bring cutting-edge analysis tools into medical research.
“There tend to be walls between different disciplines which hold firm for a long time,” Adler says. “Clearly, that’s changing.”