NCI 7.5 Performance: AHV for G2K Scale

Introduction

The Nutanix Cloud Infrastructure (NCI) 7.5 release, and its accompanying Nutanix AHV hypervisor 11.0, introduce significant performance improvements across the platform, building on our commitment to continuously enhancing the hypervisor for enterprise scale, particularly for Global 2000 (G2K) environments.

This blog post explores key areas of performance enhancement in NCI 7.5, including substantial improvements to upgrade/migration operations and core API performance for VM operations.

Upgrade and Migration Performance

The NCI 7.5 release delivers notable improvements in the speed and efficiency of cluster operations, particularly impacting the time required for upgrades and live migrations.

For upgrades, AHV has introduced optimizations and new techniques that greatly reduce the time for the most intensive upgrade type: reimage upgrades. This type of upgrade uses a special reimager environment to reinstall all of AHV and then re-apply the configuration after the reimage. It is crucial for the reliability and stability of upgrades that affect a very large number of components on the system, but it has historically been slower than in-place upgrades.

In NCI 7.5 and AHV 11, major changes substantially reduce these reimage upgrade times. On an in-house test cluster with representative hardware, the total upgrade time was halved, making reimage upgrades comparable with in-place upgrades. The actual time needed to upgrade will vary depending on the size of the VMs that need to be migrated and the hardware capabilities.

[Figure: Relative Speed Upgrade]

Several key enhancements made this improvement in upgrade speed possible, including unification of the AHV and reimager environments and optimizations to the management of logical volumes. Most notably, the upgrade can now entirely avoid the two host reboots (one into the reimage environment and one out of it), which by itself can save over 10 minutes on hardware that takes a long time to power cycle.

Continuing the trend from recent releases, there have also been further improvements to reduce the time taken to migrate VMs, this time with the introduction of Zero-copy transfers. This optimization manages VM memory more efficiently during transfer, eliminating the need to duplicate the VM's memory content when interacting with the host's networking stack. The benefits of Zero-copy include:

Faster migration: Reducing the overall time required for a live migration.
Lower CPU consumption: Decreasing the host's CPU overhead during the migration process.
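
To illustrate the general idea behind zero-copy (not AHV's actual implementation, which this post does not describe), here is a minimal Python sketch. It contrasts slicing a buffer, which duplicates the bytes, with slicing a memoryview, which only creates a window onto the same memory. The guest_memory buffer and both helper functions are hypothetical names used purely for illustration.

```python
def chunk_copies(buf: bytearray, chunk: int) -> list:
    """Slice the buffer directly: every chunk is a fresh copy of the bytes."""
    return [buf[i:i + chunk] for i in range(0, len(buf), chunk)]


def chunk_views(buf: bytearray, chunk: int) -> list:
    """Slice a memoryview: every chunk is a window onto the same memory."""
    view = memoryview(buf)
    return [view[i:i + chunk] for i in range(0, len(buf), chunk)]


# Hypothetical stand-in for a region of guest memory.
guest_memory = bytearray(4096)

copies = chunk_copies(guest_memory, 1024)
views = chunk_views(guest_memory, 1024)

# Change the underlying memory after the chunks were created.
guest_memory[0] = 0xFF

# The copy is a separate object and does not see the change;
# the view shares the original memory and does.
print(copies[0][0], views[0][0])
```

The copied chunks cost extra memory bandwidth and CPU time in proportion to the buffer size, while the views cost almost nothing to create. That is the same trade-off that makes zero-copy transfers attractive for large VM memory footprints.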

Beyond these improvements, we’re also exploring other ways that we can reduce the time needed to upgrade a cluster, so watch this space!

API Performance Improvements

The NCI 7.5 development cycle also included an internal review of the core API interactions that govern the performance of virtual machine operations. This review identified optimizations that improve the majority of VM operations. The gains are visible both in large operations (such as API calls to power on many VMs) and in the general responsiveness of the APIs, which reduces overall wait times.

The performance uplift stems from several improvements:

Faster interactions with the Task Management service, including consolidation of calls to update these tasks.
Parallelizing operations made on the host, increasing the number of host operations that can run in parallel.
Reducing database interactions by restructuring the relatively higher-cost database interactions, including removing some entirely.
Scheduler changes to further increase parallelism in VM operations by reducing lock contention and batching some updates together.
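
As a rough illustration of what consolidating task updates can look like, the sketch below buffers several progress updates for the same task and sends them as one merged call. The TaskService and BatchedTaskUpdater classes are hypothetical and exist only for this example; they are not the Nutanix Task Management API.

```python
class TaskService:
    """Hypothetical stand-in for a task service; counts round trips."""

    def __init__(self):
        self.calls = 0
        self.state = {}

    def update(self, task_id, fields):
        # Each call here represents one round trip to the service.
        self.calls += 1
        self.state.setdefault(task_id, {}).update(fields)


class BatchedTaskUpdater:
    """Buffers updates per task and sends one merged update per task on flush."""

    def __init__(self, service):
        self.service = service
        self.pending = {}

    def update(self, task_id, **fields):
        # Later values for the same field overwrite earlier ones locally.
        self.pending.setdefault(task_id, {}).update(fields)

    def flush(self):
        for task_id, fields in self.pending.items():
            self.service.update(task_id, fields)
        self.pending.clear()


service = TaskService()
updater = BatchedTaskUpdater(service)

# Three logical updates to the same task...
updater.update("task-1", percent=10)
updater.update("task-1", percent=60)
updater.update("task-1", percent=100, status="succeeded")

# ...result in a single call to the service.
updater.flush()
print(service.calls, service.state["task-1"])
```

The trade-off in a design like this is that intermediate states are no longer visible to observers of the task, so it suits updates where only the latest state matters.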

Deep Dive: VM Power On Optimizations

Our initial internal analysis of the bulk VM power-on operation showed that on a 16-node cluster, 3,000 VMs took around 30 minutes for smaller VMs and longer for bulk power-on of VMs with extra devices.

Bulk VM power-on was chosen as an optimization target because it is repeatable and easy to isolate, and because it is representative of business-critical operations, such as keeping VDI boot storms responsive or minimizing VM recovery time during HA or DR events.

The main areas of performance improvement identified were around the control plane's efficiency. Putting the bulk power-on case aside, individual VM power-on showed the control plane consuming roughly 25% of the total time. The NCI 7.5 development cycle targeted the number of calls made between services for a single power-on and, by implementing the improvements outlined above, reduced it by 20%. In the specific case of power-on, this meant:

Consolidating calls to update tasks.
Optimizing lookups to the internal database for VM info.
Reducing the number of sequential calls made to the host (e.g., for port creation, bridge information, and ifconfig) by running tasks in parallel and consolidating network information gathering.
Removing redundant lookups which had been obsoleted and were no longer required.
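
To show why replacing sequential host calls with parallel ones helps, here is a small sketch using Python's asyncio. The host_call function and the call names are hypothetical placeholders, and the sleep stands in for a round trip to the host; this is an assumption-laden illustration rather than the AHV control plane's code.

```python
import asyncio
import time

# Hypothetical names for independent host-side lookups.
CALLS = ["create_port", "get_bridge_info", "get_interface_info"]


async def host_call(name: str, delay: float = 0.05) -> str:
    """Hypothetical host call; the sleep simulates a network round trip."""
    await asyncio.sleep(delay)
    return name


async def sequential() -> list:
    # Each call waits for the previous one to finish.
    return [await host_call(name) for name in CALLS]


async def parallel() -> list:
    # All calls are in flight at once; gather preserves the input order.
    return list(await asyncio.gather(*(host_call(name) for name in CALLS)))


def timed(coro_fn):
    """Run a coroutine function to completion and measure wall-clock time."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start


seq_result, seq_time = timed(sequential)
par_result, par_time = timed(parallel)

print(f"sequential: {seq_time:.3f}s, parallel: {par_time:.3f}s")
```

With independent calls, the sequential version takes roughly the sum of the individual latencies, while the parallel version takes roughly the longest single latency. This only applies when the calls do not depend on each other's results.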

With these optimizations, the time to bulk power-on VMs in our internal tests has been reduced by between 25% and 50%, depending on the VM configurations. These results depend on the use case, and results in customer environments may differ.

However, it’s worth noting that the design changes behind these improvements aren’t restricted to VM power-on. The holistic approach to optimization has resulted in faster VM operations across all API calls. For example, in our internal tests, VM power-off, which is also used in DR use cases, improved by between 25% and 33%.

DR Integration Improvements

In addition to the general API performance improvements, Disaster Recovery (DR) operations also received targeted performance improvements.

When Nutanix introduced the new v4 API family, it brought more opportunities for performance optimization. One example is the VM restore operation. As with the VM power-on example above, we’ve been able to substantially improve the performance of VM restore when using the v4 APIs: in our internal tests, restoring a single VM was over 4 times faster than with the v3 restore operation.

Even when restoring multiple VMs at the same time (which was already parallelized, so the potential for improvement was lower than in the single-VM case), the new v4 integration is 30% faster at the median and 90th percentile.

Moving to the v4 APIs doesn’t just bring performance improvements; it also simplifies the DR integration. For example, the v4 APIs introduce a method to obtain a VM’s configuration in an opaque form and apply it to another VM. With this simpler approach, consumers of the API can preserve new features carried in the configuration and benefit more easily from further performance improvements in the APIs.

Summary

The NCI 7.5 release represents another significant step in optimizing Nutanix AHV for G2K scale. By focusing on fundamental improvements to upgrade, migration, and core VM API performance, Nutanix continues to ensure that AHV is not only robust and highly available but also exceptionally fast and efficient for the most demanding enterprise workloads.

©2026 Nutanix, Inc. All rights reserved. Nutanix, the Nutanix logo and all Nutanix product and service names mentioned herein are registered trademarks or trademarks of Nutanix, Inc. in the United States and other countries. All other brand names mentioned herein are for identification purposes only and may be the trademarks of their respective holder(s). This content reflects an experiment in a test environment. Results, benefits, savings, or other outcomes described depend on a variety of factors including use case, individual requirements, and operating environments, and this publication should not be construed as a promise or obligation to deliver specific outcomes.
