How to Build a Sustainable, Energy-Efficient IT Infrastructure for AI: 6 Core Principles

By Andrea Osika, Sr. PMM, Sustainability, Nutanix

Modernizing for the Inference Era: Balancing Performance and Energy Constraints

The AI wave is maturing as organizations transition from experimental pilots to business-critical production. This shift has moved the focus from resource-heavy model training to the high-volume demands of AI inference, which is projected to account for roughly two-thirds of all enterprise AI compute in 2026. Because inference is persistent, distributed, and resides closer to the user, the sustainability challenge has fundamentally changed. The bar for success has moved from training a model efficiently in one-time runs to sustaining continuous, system-level efficiency across the entire hybrid multicloud stack.

In response, the industry is moving toward a more intentional, "AI-Smart" approach to infrastructure. This means treating AI not as an isolated side project, but as a core capability requiring the same resiliency, Day-2 operational discipline, and integrated security as any other mission-critical system. This transition is critical because siloed, legacy infrastructure was not engineered for the high power density or the persistent uptime requirements of modern, distributed AI logic. In fact, these constraints are pushing hyperscalers to redesign their sites.

As energy-intensive AI services scale, infrastructure efficiency is shifting from a secondary utility concern to a primary constraint. A rigid IT infrastructure design risks creating operational bottlenecks and an unpredictable total cost of ownership (TCO) as environments struggle to adapt to rising energy costs or power limitations. Efficiency should be factored into IT modernization alongside performance, sovereignty, and security. Sustainable IT infrastructure centers on a flexible, unified platform that can remain adaptable for the next generation of AI execution.

The Six Core Principles of Sustainable, Energy-Efficient AI Infrastructure

A sustainable IT architecture is the result of intentional design that reduces waste, improves utilization, and adapts as workloads evolve. 

1. Consolidate via HCI to Minimize “Stranded Watts”

Poor utilization is the biggest driver of energy waste, and scaling AI efficiently won’t work if the legacy tech debt underneath it isn’t addressed. Sustainable AI starts with consolidation. By using Hyperconverged Infrastructure (HCI), enterprises can keep infrastructure “tightly packed” and scale resources only when demand rises. This can help eliminate the “stranded watts” found in legacy infrastructure, i.e., power allocated to underutilized or “zombie” servers, while providing the resiliency needed to absorb constant hardware churn (GPUs/CPUs) without overprovisioning.
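To make the idea of stranded watts concrete, here is a minimal sketch in Python. It assumes you already have a per-server power allocation and an average utilization figure. The Server type, the 5% "zombie" threshold, and the sample fleet are all hypothetical and are not taken from any Nutanix tool. The estimate is deliberately simple and ignores the fact that real servers draw substantial power even when idle.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    allocated_watts: float  # power budgeted for this server (assumed input)
    avg_utilization: float  # average utilization from 0.0 to 1.0 (assumed input)


def stranded_watts(servers, zombie_threshold=0.05):
    """Return (estimated stranded watts, names of likely zombie servers).

    Stranded watts are approximated as allocated power multiplied by the
    unused fraction of capacity. The zombie threshold is an illustrative
    default, not an industry standard.
    """
    total = 0.0
    zombies = []
    for server in servers:
        total += server.allocated_watts * (1.0 - server.avg_utilization)
        if server.avg_utilization < zombie_threshold:
            zombies.append(server.name)
    return total, zombies


# Hypothetical fleet used only for illustration.
fleet = [
    Server("app-01", 500.0, 0.60),
    Server("app-02", 500.0, 0.10),
    Server("old-db", 400.0, 0.02),
]
total, zombies = stranded_watts(fleet)
print(f"Estimated stranded power: {total:.0f} W; zombie candidates: {zombies}")
```

A report like this is only a starting point for consolidation planning. Measured power draw, where available, is a better input than allocated power.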

Evidence-Based Impact: On average, customers that shared their experiences using the Nutanix Cloud Infrastructure (NCI) solution in the Nutanix Cloud Platform (NCP) reported a 50% reduction in energy consumption, and approximately a 66% decrease in physical footprint after replacing legacy systems.* 

* These space or energy savings claims are average results based on case studies of representative Nutanix customers that are publicly available on the Nutanix website as of December 10, 2025, and were initially published between Jan 1, 2023, and Dec 10, 2025. Because potential customer outcomes depend on a variety of factors including their use case, individual requirements, and operating environments, these accounts should not be construed to be a promise or obligation to deliver specific outcomes. We invite you to contact Nutanix here to discuss how we may be able to provide an optimal solution for your specific circumstances.

2. Optimize Day-2 Operations with Kubernetes

Running AI reliably on "Day 2" is where sustainability is won or lost. Virtualization and cloud-native practices such as containers allow for automated scaling designed to better align power usage with demand. This approach maximizes utilization by packing workloads onto fewer systems and autoscaling only when needed. The objective is less idle capacity, lower power and cooling demand, and faster scaling for AI workloads moving into production. It also targets complexity and "AI sprawl", where forgotten models continue to consume energy unnecessarily and create security gaps. For many workloads, running containers inside a platform can be an efficient course for the business.
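As a rough illustration of demand-driven scaling, the sketch below applies the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas equal the current replica count multiplied by the ratio of the observed metric to the target metric, rounded up. The function name, the replica bounds, and the sample numbers are assumptions for this example. The real autoscaler adds behavior not modeled here, such as stabilization windows and handling of pods that are not yet ready.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the Horizontal Pod Autoscaler scaling rule.

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within the tolerance of 1.0, then clamped
    to the configured replica bounds.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target, do not scale
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical inference service targeting 60% average CPU utilization.
print(desired_replicas(4, current_metric=90, target_metric=60))  # busy: scale out
print(desired_replicas(4, current_metric=20, target_metric=60))  # quiet: scale in
```

Scaling in during quiet periods is where the energy benefit comes from: fewer replicas let workloads be packed onto fewer active systems.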

3. Operationalize Integrated Security and Energy Observability

You cannot manage what you cannot see. Integrated observability provides real-time visibility into the relationship between performance and power draw, i.e., performance-per-watt. This enables IT teams to measure and monitor energy demand dynamically, aligning with the World Economic Forum’s call for "energy-aware" compute. Because AI introduces new, distributed data surfaces, security and power monitoring should be built into the foundation of the platform. By maintaining real-time visibility into the entire stack, IT can intelligently allocate energy resources to mission-critical models, helping ensure that power is used where it has the highest business impact and the lowest risk.

A holistic monitoring solution can help organizations understand their infrastructure environment, including storage, compute, and networking resources, with useful metrics like CPU utilization, memory usage, and storage IOPS as well as power metrics.

A View of the Nutanix Prism Dashboard with Power Usage Highlighted.

The Nutanix Prism dashboard allows organizations to monitor power usage alongside key system metrics such as CPU usage, memory usage, disk I/O, and more.
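Once throughput and power readings are available side by side, performance-per-watt is simple arithmetic. The sketch below assumes a list of (inferences per second, watts) samples taken at equal intervals. The function name and sample values are hypothetical and are not read from Prism or any other monitoring API.

```python
def performance_per_watt(samples):
    """Return average throughput divided by average power draw.

    samples: list of (inferences_per_second, watts) tuples taken at equal
    intervals. Because one watt is one joule per second, the result can
    be read as inferences per joule.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    total_throughput = sum(rate for rate, _ in samples)
    total_watts = sum(watts for _, watts in samples)
    if total_watts <= 0:
        raise ValueError("power readings must sum to a positive value")
    return total_throughput / total_watts


# Hypothetical readings for one inference node.
readings = [(120.0, 400.0), (180.0, 500.0), (60.0, 300.0)]
print(f"{performance_per_watt(readings):.2f} inferences per joule")
```

Tracking this ratio over time, rather than power alone, helps distinguish a node that is busy from one that is merely drawing power.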

Additionally, features like Token-Based Guardrails, which implement rate limiting and granular cost controls, can help prevent "bill shock" and resource exhaustion. Max-Step Execution Caps implemented at the infrastructure or gateway layer can also help manage the operational overhead of autonomous agents. By terminating workflows after a predefined number of steps, this feature can help mitigate the risk of recursive "hallucination cycles" and potentially reduce the energy waste often associated with unconstrained, non-terminating reasoning.
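The two guardrails described above can be sketched in a few lines of Python. This is an illustrative model only. TokenBudget, run_agent, and the step callback are hypothetical names and do not represent any product's API. A production gateway would also reset budgets per time window and track them per tenant or model.

```python
class TokenBudget:
    """A fixed token allowance; a stand-in for a token-based guardrail."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.used = 0

    def try_consume(self, tokens):
        """Record usage and return True, or return False if it would exceed the budget."""
        if self.used + tokens > self.max_tokens:
            return False
        self.used += tokens
        return True


def run_agent(step_fn, budget, max_steps=5):
    """Run an agent loop with a max-step cap and a token budget.

    step_fn(step) must return (done, tokens_used). The loop stops when the
    agent finishes, the budget is exhausted, or max_steps is reached.
    """
    for step in range(1, max_steps + 1):
        done, tokens = step_fn(step)
        if not budget.try_consume(tokens):
            return ("budget_exhausted", step)
        if done:
            return ("completed", step)
    return ("step_cap_reached", max_steps)


# A hypothetical agent that never finishes and uses 100 tokens per step.
def looping_agent(step):
    return (False, 100)


print(run_agent(looping_agent, TokenBudget(1000), max_steps=5))
print(run_agent(looping_agent, TokenBudget(250), max_steps=5))
```

In both calls the runaway loop is cut off, either by the step cap or by the token budget, instead of consuming compute indefinitely.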

4. Right-Size Your Silicon: Use CPUs for Inference

One of the most impactful ways to reduce operational energy consumption is to pair the right processor with the right task, so that enterprise IT infrastructure always uses the right tool for the job.

  • CPUs/NPUs: Modern CPUs can offer a practical balance of energy efficiency and potentially lower TCO for a significant majority of enterprise inference tasks, such as embedding and reranking data in retrieval-augmented generation (RAG) pipelines.
  • GPUs/TPUs: Reserve high-wattage accelerators specifically for large-scale tuning and heavy-duty LLM inference, such as reasoning.

The Strategy: Use CPUs when you can, and GPUs only when you must, to maximize performance-per-watt. Leverage a software-defined infrastructure that uses automation to right-size and place workloads efficiently, which can help reduce waste.

Feature    | Traditional Hardware       | Software-Defined Infrastructure
-----------|----------------------------|---------------------------------------------
GPU Usage  | 1 GPU = 1 Model            | Fractional GPU Sharing (Higher utilization)
Compute    | Buy new GPUs for all AI    | Uses existing CPUs
Waste      | Provision for peak capacity | Right-Sizing (Match supply to demand)
Networking | CPU handles all traffic    | DPU Offloading (Free CPU for core tasks)

5. Deploy Carbon-Aware Workload Placement

Recent research suggests that running AI in a low-carbon location can, in some scenarios, significantly reduce emissions, often a more impactful move than a hardware upgrade alone. A hybrid multicloud architecture can provide the necessary mobility to shift heavy compute to where the grid is cleanest.

  • Cloud for Intensity: Hyperscalers are increasingly powered by renewable PPAs and high Carbon-Free Energy (CFE) percentages. Their optimized Power Usage Effectiveness (PUE) makes them the ideal default for energy-intensive training and high-volume LLM workloads.
  • Edge for Efficiency: When data must stay local for sovereignty or latency, adopt the "efficiency playbook" at the source. Running inference on the edge using NPUs [1], ARM architectures [2], or AI PCs reduces backhaul and can cut the total energy required per query [3].

By combining renewable-rich cloud regions with localized edge inference, enterprises can deliver AI execution that is both carbon-aware and high-performance.
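A carbon-aware placement decision can be approximated with one formula: estimated emissions equal IT energy multiplied by the site's PUE and by the grid's carbon intensity. The sketch below picks the lowest-emission site from a list, with an optional allow-list for workloads that must stay local. The site names, PUE values, and grid intensities are made-up illustrative numbers, not measurements, and the function names are hypothetical.

```python
def estimated_emissions_kg(it_energy_kwh, pue, grid_g_co2_per_kwh):
    """Estimate emissions in kg CO2e: IT energy x PUE x grid carbon intensity."""
    return it_energy_kwh * pue * grid_g_co2_per_kwh / 1000.0


def pick_site(sites, it_energy_kwh, allowed=None):
    """Return the permitted site with the lowest estimated emissions.

    allowed: optional set of site names, for example to honor data
    sovereignty or latency requirements.
    """
    candidates = [s for s in sites if allowed is None or s["name"] in allowed]
    if not candidates:
        raise ValueError("no permitted site available")
    return min(
        candidates,
        key=lambda s: estimated_emissions_kg(
            it_energy_kwh, s["pue"], s["g_co2_per_kwh"]
        ),
    )


# Illustrative values only.
sites = [
    {"name": "cloud-region-a", "pue": 1.1, "g_co2_per_kwh": 50.0},
    {"name": "cloud-region-b", "pue": 1.2, "g_co2_per_kwh": 400.0},
    {"name": "on-prem-edge", "pue": 1.6, "g_co2_per_kwh": 300.0},
]
print(pick_site(sites, it_energy_kwh=1000.0)["name"])
print(pick_site(sites, it_energy_kwh=1000.0, allowed={"on-prem-edge"})["name"])
```

In practice the grid intensity would come from a live or forecast data feed, and the choice would also weigh latency, cost, and data transfer energy, which this sketch leaves out.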

Nexus Image View of Nutanix Carbon and Power Estimator

[1] https://www.mdpi.com/2079-8954/13/9/797
[2] https://www.arm.com/markets/cloud-ai
[3] https://www.rapidus.inc/en/tech/te0017/

6. Embrace Circularity and Hardware Repurposing

Sustainability includes the embodied carbon of the hardware itself. This highlights the importance of choosing energy-efficient, EPEAT-registered systems and extending asset life through modular upgrades. One of the most overlooked sustainability levers is repurposing aging GPUs for lighter inference tasks rather than replacing them prematurely. Another lever is repurposing existing hardware such as servers and external storage arrays. This approach aligns with circular economy principles, reducing e-waste and conserving resources while preserving financial investments.
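The effect of extending asset life on embodied carbon is easy to see with a little arithmetic: the carbon emitted to manufacture a device is fixed, so spreading it over more years of service lowers the annualized figure. The sketch below uses a hypothetical 1,200 kg CO2e embodied footprint and hypothetical service lives. Actual figures vary by device and should come from the manufacturer's product carbon footprint data.

```python
def annualized_embodied_kg(embodied_kg, service_years):
    """Spread a device's fixed embodied carbon evenly over its service life."""
    if service_years <= 0:
        raise ValueError("service_years must be positive")
    return embodied_kg / service_years


# Hypothetical server: 1,200 kg CO2e embodied, replaced at 4 years
# versus repurposed and kept in service for 6 years.
baseline = annualized_embodied_kg(1200.0, 4)
extended = annualized_embodied_kg(1200.0, 6)
reduction = 1.0 - extended / baseline
print(f"{baseline:.0f} -> {extended:.0f} kg CO2e per year ({reduction:.0%} lower)")
```

This only covers embodied carbon. Older hardware may use more energy per unit of work, so the operational side should be weighed alongside it.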

A Sustainable Way to Run AI

A sustainable way to run AI is through a unified, hybrid platform that treats energy as a finite resource. By moving from a fragmented "AI-first" mindset to a disciplined, AI-smart architecture, organizations can work to reduce energy demand, maximize utilization, and scale responsibly.

Sustainable AI FAQ:

Unlike model training, which is a localized, high-intensity batch process, inference is a continuous, real-time workload. It requires infrastructure that prioritizes resiliency and performance-per-watt to avoid unmanageable energy costs during 24/7 production.

Nutanix provides a unified platform that can reduce the physical footprint in a data center, which in turn can lead to savings in power and cooling.

Yes, depending on the use case. For many Small Language Models (SLMs), modern CPUs with built-in accelerators can offer high energy efficiency and lower TCO compared to dedicated GPUs. Rule of thumb: Use CPUs when appropriate and GPUs when you must.

Traditional networking often relies on a sprawl of dedicated hardware appliances that stay powered on 24/7. By virtualizing these functions with solutions like Nutanix Flow Virtual Networking, there is an opportunity to eliminate redundant hardware switches and routers. This could lower data center energy consumption by reducing the "vampire power" draw from idle, standalone gear.

©2026 Nutanix, Inc. All rights reserved. Nutanix, the Nutanix logo and all Nutanix product and service names mentioned are registered trademarks or trademarks of Nutanix, Inc. in the United States and other countries. Kubernetes is a registered trademark of The Linux Foundation in the United States and other countries. All other brand names mentioned are for identification purposes only and may be the trademarks of their respective holder(s). Certain information contained in this content may link or refer to, or be based on, studies, publications, surveys, and other data obtained from third-party sources. While we believe these third-party studies, publications, surveys, and other third-party data are reliable as of the date of publication, they have not been independently verified unless specifically stated, and we make no representation as to the adequacy, fairness, accuracy, or completeness of any information obtained from a third party. Our decision to publish, link to or reference third-party data should not be considered an endorsement of any such content.