AI Brings Deeper Understanding of Baseball Players’ Impact on Game

Penn State researchers use machine learning to extend sabermetrics beyond counting statistics, showing how each player affects the game.

September 18, 2025

Major League Baseball can sometimes seem as much an exercise in statistics as a sport. Batting averages, stolen bases, strikeouts and runs batted in… almost every aspect of the game can be recorded as a number. For over a century, teams and fans have used this huge body of data to measure player and team performance, and the modern practice of analyzing it in minute detail is known as sabermetrics.

Researchers at Penn State have developed a new method of analysis that draws on recent advances in machine learning to offer an even more accurate picture of how individual players impact the game. But the impact of AI in baseball goes way beyond simple number crunching.

What are sabermetrics?

Sabermetrics uses 121 kinds of statistics to quantify players’ success or failure in batting, pitching and fielding, and how many games teams win or lose as a result. Creatively crunched, these numbers can guide decisions that make the difference between winning and losing.

In 2002, the Oakland A’s took an innovative approach by focusing on on-base percentage, a measure of how often a player reaches base. As a result, they were able to acquire players who had been undervalued by traditional analysis and clinch that year’s American League West championship, a story told in the book and movie Moneyball.

Today, artificial intelligence (AI) is opening the next frontier in baseball, with the potential to transform the game much as the Oakland A’s Moneyball methods did.

How has AI changed sabermetrics?

AI is quickly becoming a pivotal part of sabermetrics because it can ingest, process and generate insights from enormous volumes and varieties of data. Now that massive datasets can be analyzed quickly, baseball teams can make faster and smarter decisions.

In traditional sabermetrics, a single is a single regardless of what else is going on, such as whether runners were on base or where the ball ended up. Recording games as a series of discrete events without any context doesn’t fully capture a player’s impact, according to Connor Heaton, a Ph.D. student at Penn State’s College of Information Sciences and Technology (IST).

“When you simply describe a game as counting statistics, you really lose a lot of information about how the game actually happened,” said Heaton.

Heaton’s model draws on natural language processing (NLP), specifically masked language modeling, a sequential modeling technique that helps computers infer the meaning of words from the surrounding context. In baseball, Heaton said, a similar process, which the researchers call Masked Gamestate Modeling, can be used to infer the meaning of game events based on their context and the impact they have on the game.
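
A minimal Python sketch of the masking step behind this kind of modeling, in the style of masked language modeling for text: some events in a sequence are hidden and become prediction targets. The event names, the [MASK] token and the masking rate are illustrative assumptions, not details of Heaton’s model, and the neural network that learns to fill in the blanks is left out.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token, borrowed from masked language modeling


def mask_sequence(tokens, mask_rate=0.15, seed=None):
    """Hide a random subset of tokens and return (masked_tokens, targets).

    targets maps each hidden position to the original token that a model
    would be trained to predict from the surrounding, unmasked context.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    targets = {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked[i] = MASK
            targets[i] = tok
    # Guarantee at least one prediction target per non-empty sequence.
    if not targets and tokens:
        i = rng.randrange(len(tokens))
        masked[i] = MASK
        targets[i] = tokens[i]
    return masked, targets


# Hypothetical event tokens for one plate appearance.
at_bat = ["ball", "called_strike", "foul", "ball", "single"]
masked, targets = mask_sequence(at_bat, mask_rate=0.3, seed=7)
print(masked)   # the sequence the model would see
print(targets)  # what it would be asked to reconstruct
```

During training, the model’s guesses for the hidden positions are compared with the stored targets, so it learns what each event means from the events around it.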

Heaton also drew on self-supervised contrastive learning, a family of methods used in computer vision to learn from unlabeled data. The idea is that two slightly altered views of the same image should produce similar outputs, distinct from the outputs for the other images in a batch.

“We adapted that to baseball, and said that the same player at two close points in time should have a similar impact on the game,” Heaton said.
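
The contrastive idea can be sketched with an InfoNCE-style loss, one common choice in self-supervised learning; whether the researchers used this exact loss is an assumption here. Each player’s embedding at one point in time is paired with the same player’s embedding shortly afterward, and the other players in the batch serve as negatives. The vectors and the temperature value are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(view_a, view_b, temperature=0.1):
    """InfoNCE-style contrastive loss averaged over a batch.

    view_a[i] and view_b[i] embed the same player at two nearby points in
    time (a positive pair); every view_b[j] with j != i is a negative.
    """
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the positive pair as the correct "class".
        total += log_denominator - logits[i]
    return total / len(view_a)


# Two hypothetical players, each observed at two nearby points in time.
early = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9]]
late = [[0.8, 0.2, 0.1], [0.1, 0.1, 1.0]]
print(info_nce_loss(early, late))
```

The loss is low when each player’s two embeddings are closer to each other than to any other player’s, which is the property the quote above describes.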

Heaton and his co-author, IST professor Prasenjit Mitra, trained their model on data from the Statcast system, which uses 12 high-speed cameras at every MLB stadium to record information on pitching, hitting and fielding. There were three kinds of data in all. First, they used the Python package pybaseball to collect pitch-by-pitch data for the 2015-2019 seasons and season-by-season data from 1995-2019, a total of 5,000 games and 4.6 million pitches.

Pitch-by-pitch data included game number, at-bat number and pitch number. A second type of data captured the result of each pitch in terms of changes to the “gamestate”: ball-strike count, base occupancy, number of outs and score. Combinations of these four components yield 325 possible gamestate changes.

The third type of input was records from old-school sabermetrics, describing each pitcher, each batter and their past encounters. The researchers ran the analysis on two A600 GPU workstations in Heaton’s office.
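
The gamestate changes described above can be turned into a sequence of discrete tokens, which is what a masked model consumes. This Python sketch is hypothetical: the field names, the way a change is encoded and the pitch dictionaries are illustrative assumptions, and it does not try to reproduce the researchers’ 325-category scheme, whose exact encoding the article doesn’t describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameState:
    """The four gamestate components named in the article (illustrative encoding)."""
    balls: int
    strikes: int
    bases: tuple  # occupancy of (first, second, third), e.g. (True, False, False)
    outs: int
    score: tuple  # (batting team runs, fielding team runs)


def state_change(before, after):
    """Describe what one pitch changed, as a hashable tuple."""
    return (
        after.balls - before.balls,
        after.strikes - before.strikes,
        tuple(int(b) for b in after.bases),  # resulting base occupancy
        after.outs - before.outs,
        after.score[0] - before.score[0],  # runs scored on the pitch
    )


class ChangeVocabulary:
    """Assigns a stable integer id to each distinct gamestate change seen."""

    def __init__(self):
        self.ids = {}

    def encode(self, change):
        if change not in self.ids:
            self.ids[change] = len(self.ids)
        return self.ids[change]


def encode_pitches(pitches, vocab):
    """Order pitches by (game, at-bat, pitch) number and map each to a token id."""
    ordered = sorted(
        pitches,
        key=lambda p: (p["game_number"], p["at_bat_number"], p["pitch_number"]),
    )
    return [vocab.encode(state_change(p["before"], p["after"])) for p in ordered]


# Hypothetical at-bat: a ball, then a single (next batter starts at 0-0).
s0 = GameState(0, 0, (False, False, False), 0, (0, 0))
s1 = GameState(1, 0, (False, False, False), 0, (0, 0))
s2 = GameState(0, 0, (True, False, False), 0, (0, 0))
pitches = [
    {"game_number": 1, "at_bat_number": 1, "pitch_number": 2, "before": s1, "after": s2},
    {"game_number": 1, "at_bat_number": 1, "pitch_number": 1, "before": s0, "after": s1},
]
vocab = ChangeVocabulary()
ids = encode_pitches(pitches, vocab)
print(ids)
```

Sorting on the three pitch identifiers restores the order in which events happened, which matters because the model learns from context.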

The result, described in a paper that was picked as a finalist at the MIT Sloan Sports Analytics Conference, was a measure of each player’s short-term impact on games called “player form.” Represented as a 64-element vector, a form describes a player’s skill as part of a larger sequence of events rather than as a collection of isolated events. Because each form is a compact, low-dimensional vector known as an embedding, “it provides much more nuance into the exact way in which the good players impact the game,” said Heaton.
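
Because a form is simply a vector, comparing two players reduces to comparing vectors. In this Python sketch, random 64-element vectors stand in for learned forms, the player names are placeholders, and cosine similarity is an assumed (though common) way to measure closeness between embeddings.

```python
import math
import random

FORM_DIM = 64  # the article describes each form as a 64-element vector


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar(query_id, forms, k=3):
    """Return the k player ids whose forms are closest to query_id's form."""
    query = forms[query_id]
    scores = [(cosine(query, vec), pid) for pid, vec in forms.items() if pid != query_id]
    scores.sort(reverse=True)  # highest similarity first
    return [pid for _, pid in scores[:k]]


# Placeholder forms: random vectors, plus a near-copy of player_0's form.
rng = random.Random(0)
forms = {f"player_{i}": [rng.gauss(0, 1) for _ in range(FORM_DIM)] for i in range(10)}
forms["player_0_twin"] = [x + 0.01 * rng.gauss(0, 1) for x in forms["player_0"]]
print(most_similar("player_0", forms, k=3))
```

With real learned forms, nearby vectors would indicate players who were affecting games in similar ways over the same stretch.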

Heaton and Mitra tested the technique on MLB games from 2015 through 2019, and their approach predicted the winner of a game with almost 60% accuracy.

Forms also seemed to do a better job of teasing apart exactly how good players impact the game. One statistic used to evaluate players’ value is “wins above replacement” (WAR), a measure of how much a player helps the team win compared with a hypothetical replacement player with more pedestrian skills (and a lower cost).

“In analyzing sabermetric-based embeddings, one could reasonably conclude that in order to have a high WAR rating, a player would need to hit a lot of home runs,” Heaton said. 

“The form-based embeddings, on the other hand, provide a much more holistic interpretation, suggesting a variety of ways in which players can bring high value to their team.”
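
To make the WAR comparison concrete: published WAR systems (FanGraphs and Baseball-Reference each maintain one) add up a player’s run value from components such as batting, baserunning, fielding and a positional adjustment, plus a replacement-level baseline, then divide by a runs-per-win factor. The Python sketch below shows only that final conversion, using the common rule of thumb of roughly 10 runs per win; the component names and numbers are hypothetical, and real systems derive their factors season by season.

```python
def wins_above_replacement(run_components, runs_per_win=10.0):
    """Convert run values above replacement level into wins.

    run_components: hypothetical mapping of component name -> runs.
    runs_per_win: assumed rule-of-thumb conversion factor.
    """
    if runs_per_win <= 0:
        raise ValueError("runs_per_win must be positive")
    total_runs = sum(run_components.values())
    return total_runs / runs_per_win


# Made-up season for an illustrative player.
example = {
    "batting": 30.0,
    "baserunning": 2.0,
    "fielding": 3.0,
    "positional": -2.5,
    "replacement": 22.5,
}
print(wins_above_replacement(example))
```

Because a single number like this sums many contributions, two players with the same WAR can create value in very different ways, which is the nuance the form embeddings aim to expose.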

The authors have made the code and data publicly available on GitHub. They hope to use the methodology to model how events within the same game relate to each other, and what impact other team members such as managers might have on game outcomes.

How AI makes baseball data more useful

But AI’s true impact may extend far beyond sabermetrics, with the potential to fundamentally change how the sport operates. For example, AI can help identify talent that traditional human scouting might overlook, potentially giving teams a crucial competitive advantage. By processing every kind of data, from biomechanics to video footage, machine learning algorithms can highlight players’ strengths and weaknesses, uncovering information the naked eye could easily miss.

Similarly, the data can be used to transform personnel management. Well-trained AI algorithms can power personalized training, aiding players’ long-term development by identifying shortcomings and creating individualized plans to address them. AI tools can also use high-speed camera footage to study players’ micro-movements and flag potential injuries before they become serious.

Even game-time decisions can be made more intelligently thanks to AI. Synthesized data can reach dugouts and managers in time to influence important decisions such as pitching changes and pinch-hitting, or to inform defensive alignments. Extensive analysis goes into each hitter’s tendencies, and defenses adjust their positioning for each at-bat to maximize the team’s chances of making a play. The Boston Red Sox are one team that has invested in AI early and heavily. Umpires, too, are now being helped by this technology.

Baseball AI algorithms can analyze novel types of metrics and potentially deliver more accurate projections than human evaluators or traditional statistics. By using decades of historical data to simulate and predict pitcher-hitter matchups, the technology can act as a kind of crystal ball for managers. The Houston Astros’ AI integrations are partially credited with their World Series win.
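
As a classical, pre-AI point of reference for matchup prediction, the Python sketch below combines Bill James’s log5 formula, which estimates how often an event (here, reaching base) happens when a given batter faces a given pitcher, with a simple Monte Carlo simulation of repeated plate appearances. The rates are made-up examples, and this is not how any team mentioned in this article builds its models; machine learning approaches would typically learn such relationships from many more inputs.

```python
import random


def log5(batter_rate, pitcher_rate, league_rate):
    """Log5 estimate of the chance an event happens in one batter-pitcher matchup.

    All three arguments are rates of the same event: the batter's rate,
    the rate the pitcher allows, and the league-wide rate.
    """
    num = batter_rate * pitcher_rate / league_rate
    den = num + (1 - batter_rate) * (1 - pitcher_rate) / (1 - league_rate)
    return num / den


def simulate_matchups(p_success, n_plate_appearances, seed=None):
    """Monte Carlo: count successes over n independent simulated plate appearances."""
    rng = random.Random(seed)
    return sum(rng.random() < p_success for _ in range(n_plate_appearances))


# Hypothetical on-base rates for one batter, one pitcher and the league.
p = log5(batter_rate=0.350, pitcher_rate=0.300, league_rate=0.320)
times_on_base = simulate_matchups(p, 10_000, seed=1)
print(round(p, 3), times_on_base / 10_000)
```

When the pitcher is exactly league-average, log5 returns the batter’s own rate, a quick sanity check on the formula.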

Because AI depends so heavily on information, it is creating a new wave of data demands that will require ever-larger cloud computing resources.

Perhaps that’s why Major League Baseball recently partnered with Google Cloud to create more personalized fan experiences. 

“For almost 25 years, MLB has been investing in technologies that capture the game with extraordinary detail—every player, every object on the field, the spin of the ball, you name it,” says Sean Curtis, senior vice president of technology and infrastructure operations at MLB. “MLB has been a pioneer in transforming real-time data into a better experience for our fans.”

No two viewers have the same interests, and this collaboration allows each viewer to feel better served.

“Technology is a bridge that connects people to their favorite teams and players as a daily part of their modern lives,” said Josh Frost, senior vice president of product baseball and content experience at MLB. “We’re generating new ways for people to interact with baseball that they hadn’t considered before, and Vertex AI will help us take this to the next level.”

Baseball’s investment in technology is also reflected in the stadiums themselves, with many of the newest venues employing screens and on-field and on-player sensors that also help improve the viewer experience.

Beyond Baseball

Stat-heavy baseball is an obvious starting point, Heaton says, but their approach could also be useful in other sports such as cricket, basketball or hockey. Baseball, America’s pastime, has long been steeped in tradition, and by embracing an emerging technology like AI, the sport is showing that other industries, even old-fashioned ones, can do the same. It also illustrates that the technology can be used to help humans, not replace them.

In the meantime, “it’s definitely fun that I can watch a baseball game and say it’s for research purposes,” Heaton said.

This is an updated version of the article originally published on June 29, 2022. Chase Guttman updated this story.

Julian Smith is a contributing writer. He is the executive editor of Atellan Media and the author of Aloha Rodeo and Smokejumper, published by HarperCollins. He writes about green tech, sustainability, adventure, culture and history.

© 2026 Nutanix, Inc. All rights reserved.
