Nutanix Enterprise AI 2.6 is here - Orchestrating the Hybrid AI Frontier

By Ashwini Vasanth, Principal Product Manager, Nutanix

The rapid adoption of Artificial Intelligence has left many organizations at a crossroads. While the initial journey often begins in the public cloud to take advantage of inference-as-a-service credits, moving those proofs of concept to production often requires the governance, control, and data sovereignty found in private cloud or on-premises environments.

Today’s enterprises find themselves managing a hybrid AI approach, balancing a mix of hosted provider models (such as OpenAI and AWS Bedrock), open-source self-hosted models, and organization-specific fine-tuned models. When organizations scale AI, they don't use just one model; they use a hybrid ecosystem. Data scientists are fine-tuning specialized models for accuracy. Application developers are calling hosted model providers for rapid prototyping. IT operations teams are self-hosting models for data sovereignty. This fragmentation creates significant hurdles: inconsistent security, observability nightmares, and the challenge of managing costs across different clusters and providers.

Nutanix Enterprise AI (NAI) introduces the AI Gateway [Tech Preview]

Nutanix Enterprise AI (NAI) 2.6 addresses these challenges head-on by providing a centralized layer of management, control, and observability with the AI Gateway. The gateway provides a unified, secure endpoint that lets enterprises use cloud-hosted models (and token credits) alongside private LLMs with consistent authentication and observability.

Nutanix Enterprise AI (NAI) 2.6 dashboard

Unified API: Organizations can now access external provider models and self-hosted models through a single API, so developers write against one interface regardless of which provider, or which cluster, serves the request.
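As a rough sketch of what a single API for mixed backends looks like from the developer's side: the request shape stays constant and only the model identifier changes. This assumes the gateway exposes an OpenAI-style chat completions endpoint; the gateway URL, model identifiers, and API key below are hypothetical placeholders, not documented NAI values.

```python
# Sketch: one request shape for every backend, assuming an OpenAI-style
# chat completions endpoint. URL, key, and model ids are hypothetical.
import json
import urllib.request

GATEWAY_URL = "https://nai-gateway.example.com/v1/chat/completions"  # hypothetical

def chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build the same gateway request regardless of where the model runs."""
    body = json.dumps({
        "model": model,  # the gateway routes on this id: hosted or self-hosted
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        GATEWAY_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

# The only thing that changes between providers is the model id:
hosted = chat_request("openai/gpt-4o", "Summarize our Q3 report.", "key")
local = chat_request("self-hosted/llama-3-70b", "Summarize our Q3 report.", "key")
```

Because both requests target the same endpoint with the same authentication, swapping a hosted model for a self-hosted one becomes a one-string change rather than a new client integration.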

Granular Token-Based Rate Limiting: The AI Gateway provides granular cost and rate governance with token-based rate limiting per endpoint and per user, helping prevent "bill shock". You gain visibility into who is consuming what, enabling efficient cost allocation and resource optimization across the entire organization.

Fallback & High Availability: Reliability is critical for production AI. Through its unified endpoints, the AI Gateway supports endpoint fallback, automatically routing traffic to a healthy backup model endpoint if the primary fails.

Load Balancing across local and remote endpoints: A unified endpoint can balance load across its model endpoints, even when they are spread across multiple clusters or hosted providers.
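The fallback and load-balancing behaviors described in the two points above can be sketched together: rotate requests across endpoints, and skip to the next endpoint when one fails. This is a conceptual illustration with invented endpoint names, not NAI's routing logic.

```python
# Conceptual sketch of a unified endpoint that round-robins requests and
# falls back past unhealthy endpoints (not NAI code; names are invented).
from typing import Callable

class UnifiedEndpoint:
    """Rotates requests across endpoints; on failure, tries the next one."""

    def __init__(self, endpoints: list[str]):
        self.endpoints = endpoints
        self._i = 0  # rotating cursor for load balancing

    def send(self, call: Callable[[str], str]) -> str:
        errors = []
        for _ in range(len(self.endpoints)):
            ep = self.endpoints[self._i % len(self.endpoints)]
            self._i += 1  # advance even on success, spreading the load
            try:
                return call(ep)  # e.g., an HTTP request to that endpoint
            except ConnectionError as exc:
                errors.append(exc)  # unhealthy: fall back to the next one
        raise RuntimeError(f"all model endpoints failed: {errors}")

ue = UnifiedEndpoint(["cluster-a/llama3", "cluster-b/llama3", "bedrock/claude"])
```

From the caller's perspective there is still one endpoint; the rotation and the fallback retries are invisible, which is what makes this usable behind a single API.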

Empowering Agents with Remote MCP Server Access (Tech Preview)

With access controls for Model Context Protocol (MCP) servers, your agents can securely connect to enterprise tools and private data sources, turning static models into active participants in your business workflows within governed boundaries. The AI Gateway applies unified Role-Based Access Control (RBAC) and audit trails to every tool call made through a connected MCP server.
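Conceptually, RBAC plus an audit trail for tool calls means every invocation is checked against a policy and recorded, whether or not it is allowed. The sketch below illustrates that pattern; the roles, tool names, and log format are invented, and this is not how NAI implements it.

```python
# Conceptual sketch of RBAC + audit logging for MCP tool calls.
# Roles, tools, and the log schema are invented for illustration.
from datetime import datetime, timezone

POLICY = {  # role -> MCP tools that role may invoke (hypothetical)
    "analyst": {"query_sales_db", "search_docs"},
    "developer": {"search_docs"},
}
AUDIT_LOG: list[dict] = []

def call_tool(user: str, role: str, tool: str, args: dict) -> dict:
    allowed = tool in POLICY.get(role, set())
    AUDIT_LOG.append({  # every attempt is recorded, allowed or denied
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user, "role": role, "tool": tool, "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"role {role!r} may not call {tool!r}")
    return {"tool": tool, "args": args}  # here: forward to the MCP server
```

Keeping the policy check and the audit write in the gateway, rather than in each agent, is what gives a single enforcement point across every model and tool.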

Optimizing Performance for Efficiency

GPUs are expensive, and redundant processing in multi-turn conversations or large-context tasks can drive up costs significantly. NAI 2.6 introduces two major inference optimizations:

kvCache Aware Routing (Tech Preview): This strategy directs requests to the GPU workers that already hold the relevant key-value (KV) cache entries, avoiding wasteful recomputation of shared prompt prefixes, reducing "Time to First Token" (TTFT), and increasing system throughput.

vLLM Speculative Decoding: A lightweight draft mechanism proposes several future tokens, which the main model then verifies in a single forward pass. Each accepted token saves a full decode iteration, leading to faster responses for predictable outputs such as code generation or summarization.

Customization and New Capabilities

Beyond infrastructure and management, NAI 2.6 adds features that allow data scientists and AI developers to push the boundaries of their AI applications:

LoRA-Based Supervised Fine-Tuning (Tech Preview): Data scientists and AI developers can now fine-tune open-source models on organization-specific datasets directly within NAI, evolving them from generalists into specialists. LoRA (Low-Rank Adaptation) keeps this compute-efficient by training small low-rank adapter matrices instead of the full model weights.
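A back-of-the-envelope illustration of why LoRA is compute-efficient: for a d x d weight matrix, full fine-tuning updates d² parameters, while LoRA trains two factors B (d x r) and A (r x d) with r much smaller than d. The dimensions below are illustrative, not tied to any particular model NAI supports.

```python
# Why LoRA is cheap: the trainable update W' = W + B @ A uses two small
# low-rank factors while the base weights W stay frozen. Sizes are invented.
import numpy as np

d, r = 1024, 8                       # hidden size and LoRA rank (illustrative)
full_params = d * d                  # trainable params for full fine-tuning
lora_params = d * r + r * d          # trainable params for a LoRA adapter
print(f"LoRA trains {lora_params / full_params:.2%} of the layer's params")

rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))      # frozen pretrained weights
B = np.zeros((d, r))                 # B starts at zero, so W' == W initially
A = rng.standard_normal((r, d))
W_adapted = W + B @ A                # training moves only B and A
assert np.allclose(W_adapted, W)     # before training, behavior is unchanged
```

At rank 8 against a hidden size of 1024, the adapter is under 2% of the layer's parameters, which is why LoRA fine-tuning fits on far less GPU memory than full fine-tuning.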

Advanced vLLM Inference Sandbox: This sandbox allows customers to test and experiment with the latest community versions of vLLM and custom configuration parameters in a controlled environment.

Speech-to-Text Support: NAI now supports the NVIDIA NIM for the OpenAI Whisper Large v3 model, enabling organizations to integrate speech-to-text capabilities into their workflows.

With Nutanix Enterprise AI 2.6, organizations no longer have to deal with a fragmented experience across users, clusters, and model providers. By unifying these worlds behind a single, secure gateway, Nutanix is making the vision of true hybrid AI a reality.

*Tech Preview indicates these features should not be used in production environments.

©2026 Nutanix, Inc. All rights reserved. Nutanix, the Nutanix logo and all Nutanix product and service names mentioned are registered trademarks or trademarks of Nutanix, Inc. in the United States and other countries. All other brand names mentioned are for identification purposes only and may be the trademarks of their respective holder(s). This content may contain express and implied forward-looking statements, including related to Tech Preview releases and planned future general availability releases of NAI, which are not historical facts and are instead based on our current expectations, estimates, and beliefs. The accuracy of such statements involves risks and uncertainties and depends upon future events, including those that may be beyond our control, and actual results may differ materially and adversely from those anticipated or implied by such statements. Any forward-looking statements included speak only as of the date hereof and, except as required by law, we assume no obligation to update or otherwise revise any such forward-looking statements to reflect subsequent events or circumstances.