Building Infrastructure to Unlock AI’s Next Big Leap

MLCommons and Lamini co-founder Greg Diamos discusses the path to even smarter generative AI, and the hurdles IT must overcome to get there.

By Jason Johnson | January 24, 2024

Like many in the booming field of artificial intelligence (AI) and machine learning (ML), Greg Diamos is all in on AI’s transformative prospects. He envisions a future in which machines compose groundbreaking code, diagnose diseases before they happen, and give superhuman intelligence to each and every one of us.

“Computing is a way to give people magical abilities,” said Diamos, the co-founder of MLCommons, an open platform for benchmarking AI technologies. “I’d like to live in a world where you could give those abilities to everyone. What if every single person had superpowers?”

The catch? In order for AI to truly be disruptive, data centers will need to evolve, and vastly expand, their existing infrastructure. Deep learning tasks are computationally expensive, requiring tremendous resources and IT carve-outs. As the generative AI race heats up, and more organizations onboard predictive tools, IT pros face an uphill battle to support power-hungry AI applications. 

“It must be simultaneously exciting and terrifying to be a data center manager right now,” Diamos told the Forecast. “You don't have enough compute in your data center, no matter who you are.”

Generative AI’s Insatiable Demands

Diamos speaks from experience. He has worked in AI for most of his career, scaling some of the first large language models (LLMs) and helping to architect NVIDIA’s influential Volta graphics processing unit (GPU). In 2022, he co-founded Lamini, which builds an enterprise system for fine-tuning LLMs, and he continues to build the infrastructure stacks that GPT-based software runs on.

During his tenure at the Chinese search giant Baidu, Diamos was part of a machine learning research team that authored an influential paper on AI scaling laws, making the case for training AI models on extremely large datasets. The paper showed that deep learning models become predictably more capable as they are fed more data and compute, with error rates falling along a power-law curve.
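
To illustrate the shape of that relationship, a power law of the form error = a * N^(-b), where N is the training set size, can be sketched in a few lines of Python. The constants below are illustrative assumptions, not coefficients from the Baidu paper:

```python
# A minimal sketch of power-law scaling: generalization error falls as
# a power of training set size. The constants a and b are illustrative
# assumptions, not values from the Baidu paper (they vary by task).

def predicted_error(n_examples: float, a: float = 10.0, b: float = 0.35) -> float:
    """Estimate error as a * N**(-b) for a training set of N examples."""
    return a * n_examples ** (-b)

# Every 10x increase in data cuts the error by a constant factor (~0.45 here).
for n in (1e6, 1e7, 1e8, 1e9):
    print(f"{n:>13,.0f} examples -> predicted error {predicted_error(n):.4f}")
```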

“If we project forward, AI is going to keep getting smarter, and we're going to see new abilities emerge out of it,” said Diamos. “But, in order to turn the crank on that intelligence, you need exponential compute.” 

This insatiable appetite for resources has IT professionals wringing their hands. According to the latest Nutanix Enterprise Cloud Index report, the vast majority of respondents (86%) identified running high-performance workloads, including data analytics, AI, and ML, as a challenge with their current IT infrastructure.

In fact, next-generation LLMs, such as OpenAI’s GPT-4 and Google’s PaLM 2, are reportedly up to 100 times the size of earlier AI models.

According to Diamos, the biggest bottleneck in AI deployments is obtaining high-powered GPUs. The runaway popularity of ChatGPT and other GPT-based apps has driven a surge in demand for the specialized processors that power AI and ML workloads.

“There's a huge supply and demand mismatch, even to run something relatively basic,” Diamos said. 

“The broader IT ecosystem has been isolated from this to some extent because the biggest hyperscalers – your Microsofts and Amazons and Googles – got in on deep learning early. They have stockpiles of hundreds of thousands of GPUs.”

While generative AI workloads typically run on public cloud infrastructure – where large public hyperscalers have access to giant server clusters and the latest and greatest GPUs – rising data center costs and low availability have left organizations scrambling for alternatives.

“If you want to process a reasonable amount of data for a reasonable amount of users … that’s actually pretty expensive right now. You need about an 8x A100 machine to run something basic like Llama 2. That would cost you hundreds of thousands of dollars per year on public cloud—if you can find it,” he estimated. 
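
A back-of-the-envelope calculation bears that out. Assuming an illustrative on-demand rate of about $32 per hour for an 8x A100 cloud instance (actual pricing varies by provider, region, and commitment level), running one around the clock lands in the range Diamos describes:

```python
# Back-of-the-envelope annual cost of an always-on 8x A100 cloud instance.
# The hourly rate is an assumed, illustrative figure, not a quoted price.

HOURLY_RATE_USD = 32.0       # assumed on-demand rate for one 8x A100 instance
HOURS_PER_YEAR = 24 * 365    # running continuously, no reserved-capacity discount

annual_cost = HOURLY_RATE_USD * HOURS_PER_YEAR
print(f"Estimated annual cost: ${annual_cost:,.0f}")  # ~$280,320
```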

AI-Optimized Infrastructure

One viable option for LLM deployments is the hybrid multicloud. As more businesses adopt newer AI and ML technologies, private and edge clouds have been on the upswing, says Diamos. 

“On-prem is probably a better place to be right now. It's easier to get GPU servers and slot them in yourself than to wait for your cloud provider to do it,” he said. “You may not have enough GPUs, but at least you have the capacity to get started, so you aren’t shut out of the market.”

For organizations looking to deploy AI locally, on the edge, or over a mix of public and private clouds, Nutanix GPT-in-a-Box is a game changer, according to Diamos. Designed to simplify running generative pre-trained transformers, GPT-in-a-Box helps IT departments size, procure, and configure AI-optimized infrastructure.

“Building out AI infrastructure for the first time from scratch can be a nightmare,” he said. “It’s a giant engineering investment. If you can just press a button and start building your own application, it really saves you a lot of time.”

“Nutanix GPT-in-a-Box makes a lot of sense,” he said. “It's just a much easier starting point. You don't have to reinvent the wheel of building out all that infrastructure. It gives you acceleration.”

The Path to Artificial General Intelligence

Diamos says the barrier to entry for generative AI has gotten lower, even as AI-ready servers remain scarce. Going forward, he forecasts, LLMs will have a profound impact across a wide range of industries and fields.

“We’ve figured out how to put deep learning in search,” he said, referring to the success of GPT-based apps like ChatGPT and GitHub Copilot. “But you should also be able to apply it to every major industry, like healthcare, manufacturing, logistics, and biotech.”

To his point, a 2023 survey by McKinsey found that 40 percent of organizations expect to invest more in AI as a result of the buzz around generative AI, citing marketing, product deployment, supply chain management, and manufacturing among chief enterprise use cases.

Looking further out, Diamos foresees LLMs with PhD-like intelligence serving as advanced research tools for experts in highly specialized fields. 

“English is general; technology like language models is very general. We haven't seen any industry so far that you can't apply it to,” he said.

LLMs fine-tuned for context-specific medical applications are already making headway in the burgeoning field of personalized medicine, a discipline that uses data analysis to tailor medical treatments to an individual patient’s profile.

“If you can get access to a patient’s medical history and give it to a language model, it can understand that data and make very detailed, precise recommendations about what to do,” he said. “It can do a much better job of understanding your risk of particular medical complications.”
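
As a minimal sketch of that pattern, here is what handing a synthetic patient history to a chat-style LLM could look like, using the OpenAI Python client as a stand-in for any hosted or fine-tuned model. The model name, prompt, and patient record are illustrative, and a real clinical deployment would require privacy and compliance safeguards far beyond this:

```python
# Minimal sketch: hand a synthetic patient history to a chat-style LLM and
# ask about complication risks. The client, model name, and record are
# illustrative stand-ins; a production system would require de-identified
# data, regulatory compliance, and clinician oversight.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

patient_history = """
Age: 58. Type 2 diabetes (10 years), hypertension.
Medications: metformin, lisinopril. Most recent HbA1c: 8.1%.
"""

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model choice
    messages=[
        {
            "role": "system",
            "content": "You are a clinical decision-support assistant. "
                       "Flag risks and suggest follow-ups; you do not replace a physician.",
        },
        {
            "role": "user",
            "content": "Given this history, which complications is this patient "
                       f"most at risk for?\n{patient_history}",
        },
    ],
)

print(response.choices[0].message.content)
```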

Taking another step back, Diamos predicts that, over the course of a single human lifespan, AI and ML could acquire many more human abilities, potentially even abilities that surpass our own.

“Technological development has been an underpinning over the last several hundred years. So what happens when that is greatly accelerated? What happens when you have machines participating in that and accelerating it?” 

Drawing a comparison to the advancements made in mechanical engineering since the Middle Ages, Diamos envisions forthcoming generations of AI-powered machines and software as highly advanced tools that will vastly amplify human capabilities. 

“The ability of the machines that humans have engineered greatly exceeds the human body's ability. I think we will see the same thing for intelligence. But just like in mechanical engineering, I'm not sure they will look like us. When I think of a backhoe or a pickax or a hammer, it doesn't really look like us.”

“It should be exciting. I just don't know what it's going to look like,” he said.

One thing is certain: the data center will play a crucial role in advancing generative AI, but getting there will require considerable computing power.

Jason Johnson is a contributing writer. He is a longtime content writer and copywriter for tech and tech-adjacent businesses. Find him on LinkedIn.
