Introduction
At Nutanix .NEXT 2025, our annual user conference, we announced how the latest version of Nutanix Enterprise AI adds support for NVIDIA agentic AI workflows. In this blog, we'll dive into some of the other exciting new features in this release.
What is Nutanix Enterprise AI?
The Nutanix Enterprise AI solution is an enterprise-grade platform that simplifies the management of large language models (LLMs) and inference endpoints. It enables IT admins to create the resources needed to deliver inference-as-a-service to developers and data scientists for use in their AI applications, and it gives developers a place to experiment with LLMs, complete with endpoint-specific performance metrics.
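To make the developer experience concrete, here is a minimal sketch of calling an inference endpoint from Python, assuming the endpoint exposes an OpenAI-compatible API; the base URL, model name, and API key below are placeholders rather than values from this release.

```python
# Minimal sketch: calling a Nutanix Enterprise AI inference endpoint.
# Assumes the endpoint exposes an OpenAI-compatible API; the base URL,
# model name, and API key are placeholders for your own values.
from openai import OpenAI

client = OpenAI(
    base_url="https://nai.example.com/api/v1",  # hypothetical endpoint URL
    api_key="YOUR_NAI_API_KEY",                 # API key created for the endpoint
)

response = client.chat.completions.create(
    model="llama-3-8b-instruct",  # hypothetical endpoint/model name
    messages=[{"role": "user", "content": "Summarize what an inference endpoint is."}],
)
print(response.choices[0].message.content)
```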
New Features
New Models
Nutanix Enterprise AI makes it easy for users to import models from both the NVIDIA NGC Catalog (for NVIDIA NIMs) and Hugging Face.
The previous version of Nutanix Enterprise AI provided easy access to NVIDIA NIMs of different types, including text generation, embedding, reranker, and safety models. However, only text generation models were available for direct import from Hugging Face. With Nutanix Enterprise AI 2.3, you can now import multi-modal, vision, safety, and embedding models to support your varying AI use cases, such as retrieval-augmented generation (RAG) and agentic AI workflows. This includes Llama-4-Scout-17B-16E-Instruct, a recently released multi-modal model from Meta. For more information about running it on Nutanix Enterprise AI, check out this great post by our AI Engineering team: https://www.nutanix.dev/2025/05/09/llama-4-on-nutanix-enterprise-ai/
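As an illustration of what a multi-modal endpoint enables, here is a hedged sketch of sending an image alongside a text prompt, assuming the endpoint accepts the OpenAI-style multimodal message format; the endpoint name and URLs are placeholders.

```python
# Hedged sketch: an image plus a text prompt sent to a vision-capable
# endpoint (e.g. one serving Llama-4-Scout-17B-16E-Instruct), assuming
# OpenAI-style multimodal messages are accepted. Names are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://nai.example.com/api/v1", api_key="YOUR_NAI_API_KEY")

response = client.chat.completions.create(
    model="llama-4-scout-17b-16e-instruct",  # hypothetical endpoint name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```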
Five new validated NVIDIA NIMs have been added, including Llama-3.3-Nemotron-Super-49b-v1, a reasoning model. Reasoning models provide more accurate answers by breaking problems down into logical steps, and they typically share the “thinking” process transparently with the user. This makes them crucial in agentic workflows, where agents must perform complex, intelligent actions by simulating human-like thinking and decision-making.
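To show what a reasoning trace looks like in practice, here is a sketch of querying a Nemotron reasoning endpoint. NVIDIA's model card describes toggling reasoning with a "detailed thinking on" system prompt; treat that detail, and all the names below, as assumptions to verify against the model's documentation.

```python
# Hedged sketch: querying a reasoning model such as
# Llama-3.3-Nemotron-Super-49b-v1. The "detailed thinking on" system prompt
# follows NVIDIA's model card; verify against the model documentation.
# Endpoint name and URL are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://nai.example.com/api/v1", api_key="YOUR_NAI_API_KEY")

response = client.chat.completions.create(
    model="llama-3.3-nemotron-super-49b-v1",  # hypothetical endpoint name
    messages=[
        {"role": "system", "content": "detailed thinking on"},  # enables the reasoning trace
        {"role": "user", "content": "A train leaves at 9:40 and arrives at 11:15. How long is the trip?"},
    ],
)
# The reply typically contains the step-by-step "thinking" followed by the answer.
print(response.choices[0].message.content)
```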
For the full list of models in this release, check out the Pre-validated Models section in the Nutanix Enterprise AI documentation.
Broader Scope for Model Testing
In Nutanix Enterprise AI 2.3, in addition to testing text generation models, you can now test other model types directly from the UI, including vision and safety models.
For more information on testing model endpoints, please refer to the Testing an Endpoint section in the documentation.
Importing Hugging Face Models via URL
To import a Hugging Face model that is not in the current list of validated models, you can download the model yourself and bring it in using the manual import method.

For internet-connected sites, you can now skip the download-and-re-upload step entirely by providing the Hugging Face URL instead.
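For sites that still need the manual path (for example, dark sites), here is a sketch of the download step using the huggingface_hub library; the repo id and token below are examples, not requirements of this release.

```python
# Hedged sketch of the manual path: download the model files locally with
# huggingface_hub, then upload them via the manual import method in the UI.
# The repo id is only an example; gated models require an access token.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # example repo id
    local_dir="./llama-4-scout",  # where the model files land
    token="YOUR_HF_TOKEN",        # needed for gated models
)
print(f"Model downloaded to {local_path}")
```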
For more details on this feature, please refer to the documentation.
Hibernating and Resuming an Endpoint
This release lets you hibernate and resume endpoints, making it simple to free up GPU resources for spinning up additional endpoints and models (e.g., for testing purposes) without having to permanently delete existing endpoints.

Hibernating an endpoint quiesces its activity and releases its compute and GPU resources, but retains the endpoint's configuration so that it can easily be resumed once resources are available again.
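For illustration only, here is what hibernating and resuming might look like when scripted against a management API; the paths and payloads below are hypothetical, so rely on the product documentation (and the UI, which exposes these actions directly) for the real interface.

```python
# Purely illustrative sketch of hibernating and resuming an endpoint.
# The URL, paths, and auth scheme below are hypothetical placeholders;
# consult the Nutanix Enterprise AI documentation for the actual API.
import requests

NAI = "https://nai.example.com/api/v1"  # hypothetical management URL
HEADERS = {"Authorization": "Bearer YOUR_NAI_API_KEY"}

# Hibernate: quiesce the endpoint and release its compute/GPU resources.
requests.post(f"{NAI}/endpoints/my-endpoint/hibernate", headers=HEADERS, timeout=30)

# ... later, once resources are available again ...

# Resume: the retained configuration brings the endpoint back as it was.
requests.post(f"{NAI}/endpoints/my-endpoint/resume", headers=HEADERS, timeout=30)
```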
Please refer to the Endpoint Hibernation section in the documentation for more details.
Advanced Configuration for Endpoints
This release also lets you define the maximum number of tokens per endpoint, giving you the choice to pick the right context length for each use case and make the best use of available GPU memory.

For pre-validated Hugging Face models, a default token count is provided. The maximum token count is determined per model, with a limit of 1,048,576 tokens for GPU-enabled endpoints and 4,096 tokens for CPU-only endpoints.
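As a sketch of how the configured context length surfaces on the client side: the prompt plus the requested completion must fit within the endpoint's maximum token count, which you can manage per request with max_tokens. The names below are placeholders.

```python
# Hedged sketch: capping the completion size per request so that
# prompt + completion stay within the endpoint's configured context length.
from openai import OpenAI

client = OpenAI(base_url="https://nai.example.com/api/v1", api_key="YOUR_NAI_API_KEY")

response = client.chat.completions.create(
    model="llama-3-8b-instruct",  # hypothetical endpoint name
    messages=[{"role": "user", "content": "Explain context length in one paragraph."}],
    max_tokens=512,  # completion budget within the endpoint's token limit
)
print(response.choices[0].message.content)
```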
Please refer to the Creating an Endpoint section in the documentation for more details.
Other Features
- Exporting Metrics with OpenTelemetry - configure an OpenTelemetry collector to receive metrics directly from Nutanix Enterprise AI, enabling standardized telemetry collection, aggregation, and forwarding over the open OTLP protocol; see the sample collector configuration after this list.
- Intel AMX Support - on systems with Intel AMX, CPU-enabled endpoints leverage Intel AMX acceleration, improving inference performance without a GPU.
- Japanese language support - making Nutanix Enterprise AI even more accessible to global teams.
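To make the OpenTelemetry feature concrete, below is a minimal, generic collector configuration that receives metrics over OTLP and prints them for inspection; the Nutanix Enterprise AI side of the setup (collector address, authentication) is configured in the product and is not shown here.

```yaml
# Minimal OpenTelemetry Collector config sketch: receive metrics over OTLP
# and print them for inspection. In a real deployment, swap the "debug"
# exporter for Prometheus or your observability backend of choice.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [debug]
```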
Customers can check out the Release Notes for a full list of what’s new.
Nutanix Enterprise AI makes it easy to manage your LLMs and API endpoints, and these new features broaden the model types you can serve while keeping the experience simple.
Be sure to take a Test Drive of Nutanix Enterprise AI and check out our AI on Nutanix YouTube Playlist for more information!
©2025 Nutanix, Inc. All rights reserved. Nutanix, the Nutanix logo and all Nutanix product and service names mentioned herein are registered trademarks or trademarks of Nutanix, Inc. in the United States and other countries. All other brand names mentioned herein are for identification purposes only and may be the trademarks of their respective holder(s).
Our decision to link to or reference an external site should not be considered an endorsement of any content on such a site. Certain information contained in this post may relate to, or be based on, studies, publications, surveys and other data obtained from third-party sources and our own internal estimates and research. While we believe these third-party studies, publications, surveys and other data are reliable as of the date of this post, they have not been independently verified unless specifically stated, and we make no representation as to the adequacy, fairness, accuracy, or completeness of any information obtained from third-party sources.
All code samples are unofficial, are unsupported and will require extensive modification before use in a production environment. This content may reflect an experiment in a test environment. Results, benefits, savings, or other outcomes described depend on a variety of factors including use case, individual requirements, and operating environments, and this publication should not be construed as a promise or obligation to deliver specific outcomes.