By Andrea Osika, Sr. PMM, Sustainability, Nutanix
The AI wave is maturing as organizations move from experimental pilots to business-critical production. This shift has moved the focus from resource-heavy model training to the high-volume demands of AI inference, which is projected to account for roughly two-thirds of all enterprise AI compute in 2026. Because inference is persistent, distributed, and runs closer to the user, the sustainability challenge has fundamentally changed. The bar for success has moved from training a model efficiently in one-time runs to sustaining continuous, system-level efficiency across the entire hybrid multicloud stack.
In response, the industry is moving toward a more intentional, "AI-Smart" approach to infrastructure. This means treating AI not as an isolated side project, but as a core capability requiring the same resiliency, Day-2 operational discipline, and integrated security as any other mission-critical system. This transition is critical since siloed, legacy infrastructure was not engineered for the high power density or the persistent uptime requirements of modern, distributed AI logic. In fact, these constraints are pushing hyperscalers to redesign their sites.
As energy-intensive AI services scale, infrastructure efficiency is shifting from a secondary utility concern to a primary constraint. A rigid IT infrastructure design risks creating operational bottlenecks and an unpredictable total cost of ownership (TCO) as environments struggle to adapt to rising energy costs or power limitations. Efficiency should be factored into IT modernization alongside performance, sovereignty, and security. Sustainable IT infrastructure centers on a flexible, unified platform that can remain adaptable for the next generation of AI execution.
A sustainable IT architecture is the result of intentional design that reduces waste, improves utilization, and adapts as workloads evolve.
Poor utilization is the biggest driver of energy waste. Efficiently scaling AI won’t work if the legacy tech debt underneath it isn’t addressed. Sustainable AI starts with consolidation. By using Hyperconverged Infrastructure (HCI), enterprises can keep infrastructure “tightly packed” and scale resources only when demand rises. This can provide the resiliency needed to absorb constant hardware churn (GPUs/CPUs) without overprovisioning, and it can help eliminate the “stranded watts” found in legacy infrastructure, i.e., power allocated to underutilized or “zombie” servers.
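To make the idea of stranded watts concrete, here is a minimal Python sketch that totals the power allocated to servers running below a utilization threshold. The `Server` fields, the 10% threshold, and the sample fleet are illustrative assumptions, not Nutanix data or a Nutanix API.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    allocated_watts: float   # power budgeted for this server (assumed input)
    avg_utilization: float   # average utilization, from 0.0 to 1.0


def stranded_watts(servers, idle_threshold=0.10):
    """Sum the power allocated to servers whose utilization is below the threshold.

    The 10% default is an arbitrary illustrative cutoff for "underutilized".
    """
    return sum(
        s.allocated_watts for s in servers if s.avg_utilization < idle_threshold
    )


# Hypothetical fleet: one busy server and two near-idle "zombie" servers.
fleet = [
    Server("app-01", 500.0, 0.62),
    Server("batch-07", 450.0, 0.04),
    Server("legacy-12", 400.0, 0.01),
]

print(stranded_watts(fleet))  # 850.0
```

In practice the threshold and the measurement window would depend on the workload; the point is only that stranded power is measurable once allocation and utilization are tracked together.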
Evidence-Based Impact: On average, customers that shared their experiences using the Nutanix Cloud Infrastructure (NCI) solution in the Nutanix Cloud Platform (NCP) reported a 50% reduction in energy consumption, and approximately a 66% decrease in physical footprint after replacing legacy systems.*
* These space or energy savings claims are average results based on case studies of representative Nutanix customers that are publicly available on the Nutanix website as of December 10, 2025, and were initially published between Jan 1, 2023, and Dec 10, 2025. Because potential customer outcomes depend on a variety of factors including their use case, individual requirements, and operating environments, these accounts should not be construed to be a promise or obligation to deliver specific outcomes. We invite you to contact Nutanix here to discuss how we may be able to provide an optimal solution for your specific circumstances.
Running AI reliably on "Day 2" is where sustainability is won or lost. Using cloud-native practices like virtualization and containers to achieve enterprise efficiency allows for automated scaling designed to better align power usage with demand. This approach maximizes utilization by packing workloads onto fewer systems and autoscaling only when needed. The objective is less idle capacity, lower power and cooling demand, and faster scaling for AI workloads moving into production. This targets complexity and "AI Sprawl", where forgotten models continue to unnecessarily consume energy and create security gaps. For many workloads, running containers inside a platform can be an efficient course for the business because:
You cannot manage what you cannot see. Integrated observability provides real-time visibility into the relationship between performance and power draw, that is, performance-per-watt. This enables IT teams to measure and monitor energy demand dynamically, aligning with the World Economic Forum’s call for "energy-aware" compute. Because AI introduces new, distributed data surfaces, security and power monitoring should be built into the foundation of the platform. By maintaining real-time visibility into the entire stack, IT can intelligently allocate energy resources to mission-critical models, helping ensure that power is used where it has the highest business impact and the lowest risk.
A holistic workspace monitoring solution can help organizations understand their infrastructure environment – including storage, compute, and networking resources – with useful metrics like CPU utilization, memory usage, and storage IOPS as well as power metrics.
The Nutanix Prism dashboard allows organizations to monitor the power usage alongside key system metrics such as CPU usage, memory usage, disk I/O and more.
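As a rough illustration of the metric involved, the Python sketch below computes performance-per-watt from throughput and power readings and ranks nodes by it. The sample readings, node names, and the choice of requests per second as the performance measure are assumptions for illustration; this does not call Prism or any other monitoring API.

```python
def performance_per_watt(requests_per_second, watts):
    """Return useful work delivered per watt of power drawn."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return requests_per_second / watts


# Hypothetical readings exported from a monitoring tool.
samples = [
    {"node": "node-a", "rps": 1200.0, "watts": 400.0},
    {"node": "node-b", "rps": 300.0, "watts": 300.0},
]

# Sort ascending so the least efficient node comes first.
ranked = sorted(
    samples, key=lambda s: performance_per_watt(s["rps"], s["watts"])
)
least_efficient = ranked[0]["node"]

print(least_efficient)  # node-b
```

A node that ranks low here is a candidate for consolidation or right-sizing, assuming its workload can be moved.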
Additionally, features like Token-Based Guardrails, which implement rate limiting and granular cost controls, can help prevent "bill shock" and resource exhaustion. Implementing Max-Step Execution Caps at the infrastructure or gateway layer can also provide a mechanism to help manage the operational overhead of autonomous agents. By terminating workflows after a predefined number of steps, this feature can help mitigate the risk of recursive "hallucination cycles" and potentially reduce the energy waste often associated with unconstrained, non-terminating reasoning.
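The two controls described above can be sketched in a few lines of Python. This is a simplified illustration, not a product implementation: `TokenBudget` enforces a fixed token cap rather than time-windowed rate limiting, and `run_agent`, `step_fn`, and `looping_step` are hypothetical names invented for the example.

```python
class TokenBudget:
    """Tracks tokens spent against a fixed budget (hypothetical helper)."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens):
        """Record token usage, refusing any charge that would exceed the budget."""
        if self.used + tokens > self.max_tokens:
            raise RuntimeError("token budget exceeded")
        self.used += tokens


def run_agent(step_fn, budget, max_steps):
    """Run an agent loop until it finishes, hits the step cap, or exhausts the budget.

    step_fn(step_index) is assumed to return (tokens_used, done).
    Returns (reason, steps_executed).
    """
    for step in range(max_steps):
        tokens, done = step_fn(step)
        try:
            budget.charge(tokens)
        except RuntimeError:
            return ("budget_exceeded", step + 1)
        if done:
            return ("completed", step + 1)
    return ("step_cap_reached", max_steps)


def looping_step(step):
    """Simulates an agent that never reports completion."""
    return (100, False)


result = run_agent(looping_step, TokenBudget(max_tokens=10_000), max_steps=5)
print(result)  # ('step_cap_reached', 5)
```

Here the runaway agent is stopped after five steps even though it still has token budget left, which is the behavior a step cap is meant to guarantee.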
One of the most impactful ways to reduce operational energy consumption is to pair the right accelerator with the right task, matching the tool to the job across the enterprise IT infrastructure.
The Strategy: Use CPUs when you can, and GPUs only when you must, to maximize performance-per-watt. Leverage a software-defined infrastructure that uses automation to right-size and place workloads efficiently, which can help reduce waste.
| Feature | Traditional Hardware | Software-Defined Infrastructure |
| --- | --- | --- |
| GPU Usage | 1 GPU = 1 Model | Fractional GPU Sharing (Higher utilization) |
| Compute | Buy new GPUs for all AI | Uses existing CPUs |
| Waste | Provision for peak capacity | Right-Sizing (Match supply to demand) |
| Networking | CPU handles all traffic | DPU Offloading (Free CPU for core tasks) |
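The "CPUs when you can, GPUs only when you must" strategy can be sketched as a placement rule: among the devices that meet a workload's throughput target, pick the one that draws the least power. The `Device` fields and the throughput and wattage figures below are made-up illustrative values, not benchmarks of any real hardware.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    kind: str                 # "cpu" or "gpu"
    tokens_per_second: float  # assumed sustained throughput for this model
    watts: float              # assumed power draw under load


def place(required_tps, devices):
    """Pick the lowest-power device that still meets the throughput target.

    For a fixed amount of required work, minimizing watts maximizes the
    delivered performance-per-watt. Returns None if no device is fast enough.
    """
    eligible = [d for d in devices if d.tokens_per_second >= required_tps]
    if not eligible:
        return None
    return min(eligible, key=lambda d: d.watts)


# Hypothetical pool with illustrative numbers only.
pool = [
    Device("cpu-node", "cpu", tokens_per_second=60.0, watts=250.0),
    Device("gpu-node", "gpu", tokens_per_second=900.0, watts=700.0),
]

print(place(40.0, pool).kind)   # cpu
print(place(500.0, pool).kind)  # gpu
```

A small model with a modest throughput target lands on the CPU, while a demanding workload is escalated to the GPU only because nothing else meets the requirement.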
Recent research suggests that running AI in a low-carbon location can, in some scenarios, significantly reduce emissions, often a more impactful move than a hardware upgrade alone. A hybrid multicloud architecture can provide the necessary mobility to shift heavy compute to where the grid is cleanest.
By combining renewable-rich cloud regions with localized edge inference, enterprises can deliver AI execution that is both carbon-aware and high-performance.
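A minimal Python sketch of carbon-aware placement follows, assuming each candidate site exposes a grid carbon intensity and a network latency. The site names and figures are invented for illustration; real values would come from a grid-data provider and from measurement. Emissions are estimated as energy consumed multiplied by the grid's carbon intensity.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    grid_gco2_per_kwh: float  # assumed grid carbon intensity
    latency_ms: float         # assumed round-trip latency to users


def pick_site(sites, max_latency_ms):
    """Choose the lowest-carbon site that satisfies the latency limit."""
    eligible = [s for s in sites if s.latency_ms <= max_latency_ms]
    if not eligible:
        return None
    return min(eligible, key=lambda s: s.grid_gco2_per_kwh)


def emissions_kg(energy_kwh, site):
    """Estimate emissions in kg CO2 as energy times grid carbon intensity."""
    return energy_kwh * site.grid_gco2_per_kwh / 1000.0


# Hypothetical sites with illustrative values only.
sites = [
    Site("edge-local", grid_gco2_per_kwh=450.0, latency_ms=5.0),
    Site("region-mixed", grid_gco2_per_kwh=300.0, latency_ms=40.0),
    Site("region-hydro", grid_gco2_per_kwh=30.0, latency_ms=80.0),
]

batch_site = pick_site(sites, max_latency_ms=200.0)     # latency-tolerant job
inference_site = pick_site(sites, max_latency_ms=10.0)  # latency-sensitive job

print(batch_site.name, emissions_kg(100.0, batch_site))  # region-hydro 3.0
print(inference_site.name)                               # edge-local
```

Latency-tolerant heavy compute moves to the cleanest grid, while latency-sensitive inference stays at the edge, which mirrors the combination of renewable-rich regions and localized edge inference described above.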
1 https://www.mdpi.com/2079-8954/13/9/797
2 https://www.arm.com/markets/cloud-ai
3 https://www.rapidus.inc/en/tech/te0017/
Sustainability includes the embodied carbon of the hardware itself. This highlights the importance of choosing energy-efficient, EPEAT-registered systems and extending asset life through modular upgrades. One of the most overlooked sustainability levers is repurposing aging GPUs for lighter inference tasks rather than replacing them prematurely. Another lever is repurposing existing hardware such as servers and external storage arrays. This approach aligns with circular economy principles, reducing e-waste and conserving resources while preserving financial investments.
A sustainable way to run AI is through a unified, hybrid platform that treats energy as a finite resource. By moving from a fragmented "AI-first" mindset to a disciplined, AI-smart architecture, organizations can work toward reducing energy demand, maximizing utilization, and scaling responsibly.
Unlike model training, which is a localized, high-intensity batch process, inference is a continuous, real-time workload. It requires infrastructure that prioritizes resiliency and performance-per-watt to avoid unmanageable energy costs during 24/7 production.
Nutanix provides a unified platform that can reduce the physical footprint in a data center, which in turn can lead to savings in power and cooling.
Yes, depending on the use case. For many Small Language Models (SLMs), modern CPUs with built-in accelerators can offer high energy efficiency and lower TCO compared to dedicated GPUs. Rule of thumb: Use CPUs when appropriate and GPUs when you must.
Traditional networking often relies on a sprawl of dedicated hardware appliances that stay powered on 24/7. Virtualizing these functions, using solutions like Nutanix Flow Virtual Networking, creates an opportunity to eliminate redundant hardware switches and routers. This could lower data center energy consumption by reducing the "vampire power" draw from idle, standalone gear.