Why Validating Medical AI is Crucial to Improving Healthcare Outcomes

Understanding the value of the new MedPerf healthcare AI platform

By Tom Mangan November 28, 2023

Medical miracles typically take years of trial and error. What if that time frame could be shrunk to weeks or months without endangering sensitive patient data?

That’s the promise of MedPerf, an open-source platform designed to simplify and accelerate the delivery of AI/ML applications to researchers and clinicians. Released in July 2023, MedPerf helps clinicians and researchers test the effectiveness of the mathematical models that power AI/ML applications for the healthcare industry.

“MedPerf evaluates medical AI models on diverse, real-world data,” said Alexandros Karargyris, a machine learning researcher who helped pioneer the MedPerf platform, in an interview with The Forecast by Nutanix.

MedPerf users can robustly quantify the performance of medical AI models while protecting private data, Karargyris added.

The potential of MedPerf is immense. Validated applications could help clinicians deliver life-saving cures and life-changing therapies much sooner, said Karargyris, who was lead author of a research report in Nature Machine Intelligence announcing MedPerf. Advanced automation also can eliminate time-consuming manual processes, potentially making healthcare organizations more efficient and productive.

There’s a clear mandate to innovate: Aging populations and looming retirements mean more people to treat with a shrinking number of clinicians. The departures of retiring clinicians in the coming decades will only add to the burdens on those who remain.

How Validated AI Paves the Way to Better Medical Models

Researchers and entrepreneurs are creatively applying AI/ML models to a host of vexing healthcare riddles. For all their progress, they need help clearing up a critical question: Can their AI/ML models be trusted?

Karargyris shared this hypothetical: Cancer researchers in Indiana use patient data from local hospitals to train an AI model to detect a specific kind of tumor on an X-ray. It’s a potentially life-saving breakthrough, but there’s a catch:

“Your model trained on Indiana population data might be biased to that population,” Karargyris said.

The AI model could presumably help any clinician provide an early diagnosis to patients with the same tumors. But how can clinicians halfway around the world be sure the Indiana model produces results that can help their patients?

That’s where MedPerf comes in, enabling medical AI users to share knowledge with colleagues everywhere. Hospitals and clinics can import validated models into their private IT systems and apply them to local medical needs. This helps prevent accidental release of patient data.

“No data leaves the premises of the healthcare organization — it stays within their private network,” Karargyris explained.

Participating organizations share only high-level aggregated results from MedPerf validation testing.

“This is intentional because we aim for patient privacy first and foremost,” he added.
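The pattern Karargyris describes, in which the model travels to the data and only summary numbers travel back, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MedPerf's actual code or API: the `Site` class, the `aggregate` function, the toy `threshold_model` and the tiny made-up datasets are all hypothetical names and values invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only; none of these names come from MedPerf.
Record = Tuple[float, int]      # (feature value, true label: 0 or 1)
Model = Callable[[float], int]  # maps a feature value to a predicted label


@dataclass
class Site:
    """A hospital whose patient records never leave this object."""
    name: str
    _records: List[Record]

    def evaluate(self, model: Model) -> Dict[str, int]:
        """Run the model locally and return only aggregate counts."""
        counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
        for feature, label in self._records:
            pred = model(feature)
            if pred == 1 and label == 1:
                counts["tp"] += 1
            elif pred == 1 and label == 0:
                counts["fp"] += 1
            elif pred == 0 and label == 0:
                counts["tn"] += 1
            else:
                counts["fn"] += 1
        return counts


def aggregate(reports: List[Dict[str, int]]) -> Dict[str, float]:
    """Combine per-site counts into overall metrics without seeing any record."""
    total = {k: sum(r[k] for r in reports) for k in ("tp", "fp", "tn", "fn")}
    n = sum(total.values())
    pos = total["tp"] + total["fn"]
    neg = total["tn"] + total["fp"]
    return {
        "accuracy": (total["tp"] + total["tn"]) / n if n else 0.0,
        "sensitivity": total["tp"] / pos if pos else 0.0,
        "specificity": total["tn"] / neg if neg else 0.0,
    }


def threshold_model(x: float) -> int:
    """Toy stand-in for a trained model: flag anything at or above 0.5."""
    return 1 if x >= 0.5 else 0


# Made-up data: each site holds its own records privately.
sites = [
    Site("site_a", [(0.9, 1), (0.2, 0), (0.7, 1), (0.4, 0)]),
    Site("site_b", [(0.6, 0), (0.3, 1), (0.8, 1), (0.1, 0)]),
]

reports = [site.evaluate(threshold_model) for site in sites]
summary = aggregate(reports)
print(summary)
```

In this toy run the model is perfect at the first site and makes two mistakes at the second, which is the kind of gap between populations that a multi-site evaluation is meant to surface. Only the four counts per site cross the site boundary.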

Why Trust is Central to AI Modeling in Healthcare

AI/ML models use pattern-matching algorithms to analyze massive volumes of data and produce useful outcomes much faster than humans can. Models can be trained to analyze just about anything — if there’s enough accurate, unbiased data to produce statistically valid insights.

In medicine, an AI/ML model can comb through the pixels in a digital X-ray image and find evidence of tumors or other anomalies that humans might miss. Similarly, AI/ML can help doctors quickly confirm evidence of COVID-19 infections in lung X-rays.

The appeal is obvious: Faster, more accurate diagnoses help patients, doctors, hospital administrators, insurance companies, investors and so on. Secure medical AI models prevent expensive, embarrassing data breaches. MedPerf accomplishes all this with federated learning, which makes a global network of AI models available to hospitals, clinics and research organizations. MedPerf’s platform is an intermediary between AI model developers on one side and, on the other, local clinicians who can deploy models of their choosing to help diagnose their patients’ maladies.

MedPerf orchestrates and automates the application of AI/ML models, producing dramatically faster results because of federated learning.

“With MedPerf’s orchestration capabilities we can evaluate multiple AI models through the same collaborators in hours instead of months,” said Dr. Spyridon Bakas, assistant professor at the University of Pennsylvania’s Perelman School of Medicine. Bakas is vice chair for benchmarking of clinical translation for the MLCommons Medical Working Group.

Thanks to federated learning, the model could conceivably be trained on data from hundreds or even thousands of hospitals, greatly reducing the potential for bias and inaccuracy.

“The deep-learning models that we're talking about are very data hungry,” Karargyris said.

Experiments have shown that models get more accurate as they gather more data.
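To make the idea of training across hospitals concrete, here is a minimal sketch of federated averaging, a widely used federated learning technique in which each site improves a shared model on its own data and a coordinator averages the results, weighted by how many samples each site holds. The article does not say which algorithm any MedPerf collaborator uses, so treat this purely as an illustration. The function names, the one-variable linear model, the learning rate and the synthetic data (generated from y = 2x + 1) are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Weights = List[float]              # [slope, intercept] of a toy linear model
Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one site


def local_update(weights: Weights, data: Dataset, lr: float = 0.05) -> Weights:
    """One gradient-descent step on mean squared error, using local data only."""
    w, b = weights
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return [w - lr * grad_w, b - lr * grad_b]


def federated_average(updates: Sequence[Tuple[Weights, int]]) -> Weights:
    """Average the sites' weights, giving larger sites proportionally more say."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


# Synthetic data for two hypothetical hospitals; both follow y = 2x + 1.
hospital_data: List[Dataset] = [
    [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)],
    [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]

global_weights: Weights = [0.0, 0.0]
for _ in range(500):
    # Each site trains locally; only weights and sample counts are shared.
    updates = [(local_update(global_weights, d), len(d)) for d in hospital_data]
    global_weights = federated_average(updates)

print(global_weights)  # approaches slope 2 and intercept 1
```

The raw (x, y) pairs never leave `hospital_data`; the coordinator sees only each site's updated weights and its sample count. Real deployments add many safeguards this sketch omits, such as secure aggregation and handling of sites whose data distributions differ.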

“Once you train this model, you need to validate it on a diverse selection of data out there in the real world,” Karargyris added.

Validation happens through a benchmarking process that establishes standards for model accuracy.

MLCommons and the Birth of MedPerf

MedPerf is a project of MLCommons, an all-volunteer organization formed to create universal benchmarks for AI accuracy and performance. Founding members of the organization include technology titans like Microsoft, Google, Dell, Intel and many more private sector firms. Researchers and AI practitioners from around the world joined the effort, which establishes consistent statistics on the reliability, speed and efficiency of AI/ML solutions.

MLCommons formed its Medical Working Group to zero in on healthcare-related AI/ML models. As of July 2023, the working group had pulled together the efforts of nine hospitals plus dozens of companies and AI/ML experts, spanning 13 countries. The working group is also developing a project called GaNDLF, a low/no-code platform that lets clinicians build AI/ML applications without needing advanced programming expertise.

“We saw the tide of ML coming very soon, and we said, ‘okay, we need to evaluate performance in meaningful and repeatable ways, and we want to be a nonprofit so that people trust us,’” said Debojyoti “Debo” Dutta, one of the co-founders of MLCommons. Dutta is vice president of engineering (AI) at Nutanix, which helped support the development of MedPerf.

Nutanix joined with the working group in an experiment to test its results on glioblastoma, an aggressive tumor that often attacks the brain and spinal cord. Clinicians like what they are seeing.

“It has excited oncologists who were probably skeptical about AI,” Dutta said.

Putting MedPerf to Work in Clinical Settings

The age of cloud computing complicates matters for medical AI users. For all the flexibility and scale they gain from public cloud services, their compliance challenges often encourage keeping sensitive medical information in on-premises data centers.

Research cited in the Nutanix Cloud Index for Healthcare reflects these realities. Recent data suggested that healthcare organizations slightly trailed the global average among enterprises for adoption of multi-cloud IT infrastructures (53% vs. 60% worldwide). But that may soon change: Just under three-quarters (74%) of Cloud Index survey respondents said they planned to be using multi-cloud infrastructures in the next three years.

Dutta had deep experience with these kinds of complexities before coming to Nutanix in 2020, in the depths of the pandemic. He was energized by the potential of Nutanix’s multi-cloud platform to address the infrastructure challenges of medical AI.

“Nutanix’s platform can host a lot of amazing next-gen AI workloads that could change the world in ways that we don't even know yet,” Dutta said. This might include creating digital twins to model the entire process of creating new drugs or antibodies to treat diseases. “I'm really excited by medicine and AI and infrastructure — and what they could do for humankind.”

Editor’s note: Learn more about Nutanix GPT-in-a-Box, a full-stack software-defined AI-ready platform designed to simplify and jump-start your initiatives from edge to core. More details are available in the blog post “The AI-Ready Stack: Nutanix Simplifies Your AI Innovation Learning Curve” and in the Nutanix Bible.

Tom Mangan is a contributing writer. He is a veteran B2B technology writer and editor, specializing in cloud computing and digital transformation. Contact him on his website or LinkedIn.

Jason Lopez and Ken Kaplan contributed to this story. Lopez is executive producer of Tech Barometer, the podcast outlet for The Forecast. Kaplan is Editor in Chief for The Forecast by Nutanix.