We are living in a world of Big Data. Rapid advancements in technology mean that businesses today are collecting more data than ever before, and that data continues to grow at an exponential rate. But managing unstructured data (data that isn’t properly formatted and stored in a useful schema) is the Achilles’ heel of the ongoing data revolution.
It’s nearly impossible to wrap your head around the sheer amount of data the business world is generating. A study by IDC estimates that there will be 175 zettabytes (175 trillion gigabytes) of data worldwide by the end of 2025. If you were to store all that data on DVDs, the resulting stack would wrap around the Earth 222 times.
Adding to the complexity is the wide variety of storage technologies that allow data to be held, managed and retrieved by applications.
“Storage is kind of like Baskin-Robbins, except there might be more than 31 flavors,” said David Kanter, co-founder and board member of MLCommons and Head of MLPerf, in an interview with The Forecast.
Kanter explained that object storage, edge storage, direct-attached storage, network-attached storage and many more varieties have distinct performance characteristics. When training models process trillions of data points, subtle differences between storage options can have significant effects on performance.
Add to that ongoing innovation in how storage is accessed and managed. One example is how IT teams can use software-defined systems powered by Nutanix Cloud Platform (NCP) to manage both their hyperconverged storage and external storage. Dell PowerFlex, integrated in 2025, was among the first external storage platforms to work with NCP.
Much of this growth in data is due to the rise of enterprise data analytics and the explosive proliferation of AI/ML applications over the past several years. Big data is the lifeblood of these applications, allowing organizations to harness and analyze vast amounts of data to create value-adding insights, but that is only possible when the data is structured properly.
This historic change has caused unstructured data management to rise to the top of the CTO priority list.
“For today’s AI/ML applications (specifically Large Language Models used today), the more access to data there is to train these models, the better the results ultimately can be for the end users of these AI applications,” Alex Ameida, Senior Product Marketing Manager at Nutanix, told The Forecast.
As the industry pushes forward and focuses on improving the accuracy and capabilities of its AI models, data management is becoming ever more important.
“The larger the data pool fed into the models being trained, the better,” said Ameida.
With more data, enterprises gain access to more powerful AI models and data-driven insights that can deliver tangible business value. But these massive volumes of data come with challenges, too: around 80 percent of corporate enterprise data generated today is unstructured.
New data is being generated every second, and the responsibility of managing and storing all that data is huge, especially in the face of continuously evolving regulations. Today’s enterprises must look ahead to anticipate future data needs and embrace new technologies that are emerging to help them navigate this increasingly challenging landscape.
Unstructured data is raw data with no predefined format. Examples of unstructured and semi-structured data include social media posts, email messages, documents, images, videos, audio files, and other free-form files.
At the dawn of the third decade of the 21st century, we’re spending most of our days using some kind of internet-connected electronic device. Almost every action we take creates data, whether that’s opening a website, using social media, or simply walking down the road with a phone in our pocket.
It’s not just people creating this data either. The birth of the Internet of Things (IoT) means that smart devices, such as cars, televisions, and home electronics, are generating massive amounts of data, too. As the sheer size and scope of the IoT continues to grow each year, it’s easy to see why the world’s data is growing exponentially and so quickly.
Some of this data is structured, which means it has a specific format that can be quickly processed. A simple example of structured data is a customer database, where each customer’s name, address, email, and other personal details are stored in pre-defined places. To be defined as structured, data must use a strict format that allows for easy access and analysis.
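The contrast is easy to see in code. Below is a minimal sketch, with illustrative field names and values, comparing a structured customer record (direct, schema-based lookup) with the same fact buried in unstructured text (which must be parsed out, and only if the wording cooperates):

```python
import re

# Structured: every field lives in a pre-defined place,
# so access is a direct lookup by field name.
customer = {"name": "Ada Lopez", "email": "ada@example.com", "city": "Berlin"}
print(customer["email"])

# Unstructured: the same fact inside free text must be extracted,
# and this pattern only works when the text happens to match it.
note = "Spoke with Ada Lopez today; reach her at ada@example.com (Berlin office)."
match = re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", note)
email = match.group(0) if match else None
print(email)
```

Both lines print the same address, but only the first will keep working as the surrounding data changes shape.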
Unstructured data, by contrast, is challenging because it comes in many different formats. It can’t be searched quickly, and it’s difficult to separate valuable data from the unimportant. Left unsorted and unmanaged, unstructured data can quickly fill up storage infrastructure, causing enterprises to spend ever more for little business benefit.
Enterprises that are grappling with how to manage their current unstructured data and planning future data storage solutions must consider several key factors. First among these factors is the data storage strategy.
Object storage is one of the most popular approaches, managing cloud-based data as “blobs” or “objects.” Each object carries its own metadata, and this strategy allows users to store and retrieve large amounts of unstructured data.
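The core idea is that each object pairs an opaque blob with its own metadata under a flat key, with no directory hierarchy. A toy in-memory model illustrates the shape; the names and keys below are invented for this sketch and are not any vendor’s API:

```python
from dataclasses import dataclass, field

@dataclass
class StoredObject:
    data: bytes                                   # the opaque blob itself
    metadata: dict = field(default_factory=dict)  # per-object metadata

# A flat namespace of keys -> objects; the "/" in a key is just
# part of the name, not a real directory.
bucket = {}

def put(key, data, **metadata):
    bucket[key] = StoredObject(data, metadata)

def get(key):
    return bucket[key]

put("videos/demo.mp4", b"\x00\x01", content_type="video/mp4", owner="marketing")
obj = get("videos/demo.mp4")
print(obj.metadata["content_type"])
```

Because the metadata travels with each object, queries can filter on it without opening the blob itself.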
This is usually an upgrade from direct-attached storage, which connects a hard drive directly to the end user’s computer and limits the amount of accessible data.
Finally, edge storage is a suitable strategy for companies that collect large volumes of data at satellite locations and want to avoid transmitting huge amounts of raw data to centralized repositories. Companies can query and analyze data at edge computing centers before transmitting just the results to be batched with other edge analyses and processed further.
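The payoff shows up in the numbers: summarizing at the edge and shipping only the result can shrink the transmitted payload by orders of magnitude. A minimal sketch, using made-up sensor readings, compares the two options:

```python
import json
import statistics

# Raw readings collected at an edge site (illustrative values).
readings = [21.4 + 0.01 * i for i in range(10_000)]

# Option 1: ship every raw reading to the central repository.
raw_payload = json.dumps(readings)

# Option 2: analyze locally, ship only the summary.
summary = {
    "count": len(readings),
    "mean": statistics.fmean(readings),
    "max": max(readings),
}
summary_payload = json.dumps(summary)

print(len(raw_payload), "bytes raw vs", len(summary_payload), "bytes summarized")
```

The summary payload here is a few hundred times smaller than the raw one, which is exactly the trade edge storage is designed to exploit.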
When creating a data storage strategy, companies should consider the following questions:
Capacity and Scaling – How much storage space do you need, now and in the future?
Performance – How many simultaneous users does your system need to support and how fast must it process data?
Accessibility – Who needs daily access to your data? How can you enable fast and efficient access to data from geographically dispersed teams while maintaining data security and integrity?
Security and Compliance – How do you keep data safe from unauthorized access or accidental loss? Is certain data subject to stricter security protocols than other data? Is your industry subject to specific data protection legislation?
Ease of Deployment – Consider the process of building and deploying new architecture, what new hardware and software is needed, and how your business will be affected during the transition.
Backup and Recovery – Backups may double or triple your data storage needs, and they can be made at fixed intervals, such as hourly or daily, or continuously in real time. Anticipate potential data losses or breaches before defining your backup strategy.
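The capacity and backup questions above can be made concrete with a back-of-the-envelope estimate. Every figure in this sketch (starting footprint, growth rate, backup copies, horizon) is an assumption to replace with your own numbers:

```python
# Rough capacity planning: project growth, then add backup copies.
current_tb = 50        # assumed current footprint, in TB
annual_growth = 0.40   # assumed 40% growth per year
backup_copies = 2      # e.g. one daily + one weekly retained copy
years = 3

# Compound growth of the primary data set.
projected_tb = current_tb * (1 + annual_growth) ** years

# Each backup copy adds roughly another full copy of the data.
total_tb = projected_tb * (1 + backup_copies)

print(f"Projected primary data in {years} years: {projected_tb:.1f} TB")
print(f"Total with {backup_copies} backup copies: {total_tb:.1f} TB")
```

Even with modest inputs, the backup multiplier dominates: 50 TB today becomes roughly 137 TB of primary data in three years and over 400 TB once backup copies are counted.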
Data lakes are one of the most widely used solutions for storing unstructured data. This is a centralized repository where companies can store both structured and unstructured data at scale.
Once this data is consolidated into a data lake, Ameida said, it becomes available to a range of analytics engines.
“Tools like Spark, Presto, and Dremio can then be used to query and analyze this data,” he said. For example, he pointed to Nutanix Objects, which helps data lake users quickly filter objects for efficient querying and analysis.
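Engines like Spark and Presto run this at cluster scale, but the underlying pattern is simple: scan files wherever they land, filter, and aggregate. The sketch below illustrates that pattern with nothing but Python’s standard library; the directory layout and field names are invented for the example and are not part of any of those tools:

```python
import json
import pathlib
import tempfile

# Build a toy "lake": JSON-lines files dropped into a directory tree.
lake = pathlib.Path(tempfile.mkdtemp())
(lake / "events").mkdir()
(lake / "events" / "part-0.jsonl").write_text(
    '{"user": "a", "ms": 120}\n{"user": "b", "ms": 340}\n'
)
(lake / "events" / "part-1.jsonl").write_text('{"user": "a", "ms": 95}\n')

# "Query": scan every file, parse, filter -- the work an engine
# like Spark or Presto would parallelize across a cluster.
rows = [
    json.loads(line)
    for path in lake.rglob("*.jsonl")
    for line in path.read_text().splitlines()
]
slow = [r for r in rows if r["ms"] > 100]
print(len(rows), "rows scanned,", len(slow), "matched")
```

The point of the schema-on-read model is visible here: the files were written with no upfront schema, and structure is only imposed at query time.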
However, Ameida warns that data lakes are not the best option for every use case.
“In specific circumstances, for example, when performance requirements are paramount, data lakes may not be the best approach,” he said. “Use cases calling for heavy data analytics, cloud-native apps, and all-flash configurations are likely a better fit for data sets over an object-based approach like a data lake.”
The flexible nature of data lakes is a major reason why over half of enterprises have built a data lake and another 22% plan to establish a data lake within the next three years.
With something as big and challenging as handling terabytes of unstructured data, it can be tempting to drag your feet. But enterprises that procrastinate on data management run the risk of falling behind their competitors.
Organizations that make data management a priority will benefit from lower long-term storage costs and greater efficiency. The time and financial savings free up IT teams for higher-value tasks.
Additionally, unstructured data stores may contain critical information that can help business leaders make better decisions. But extracting this important information can be a challenge.
“This process requires careful attention from multiple roles within the IT data team,” said Ameida. “It’s important to save data that’s covered under data privacy laws or may have business value while getting rid of unnecessary data that’s simply taking up space.”
Technology solutions designed to help enterprises store and manage their unstructured data are the backbone of business intelligence tools. A proper unstructured data management strategy allows these tools to discover the nuggets of gold hidden within raw data and produce insights that help executives deliver business value with confidence.
Editor’s note: Learn about Nutanix Unified Storage, a software-defined approach that consolidates file, object and block storage while offering rich data services such as analytics, lifecycle management, cybersecurity, and strong data protection.
This article was originally published on May 29, 2020, and updated on September 15, 2022. This latest update was done by Marcus Taylor.
Michael Brenner is a keynote speaker, author and CEO of Marketing Insider Group. Michael has written hundreds of articles on sites such as Forbes, Entrepreneur Magazine, and The Guardian and he speaks at dozens of leadership conferences each year covering topics such as marketing, leadership, technology and business strategy. Follow him @BrennerMichael.
© 2025 Nutanix, Inc. All rights reserved. For additional information and important legal disclaimers, please go here.