
Radically Simple Hypervisor Upgrades

By Steve Kaplan

Overview

Many of the key product and feature decisions at Nutanix are driven by our mission to deliver uncompromising simplicity to the enterprise datacenter. With the NOS 4.0 release announced in May, Nutanix brought one-click automation to the non-disruptive upgrade process of the Nutanix Operating System (NOS). Automated non-disruptive NOS upgrades make the lives of IT admins easier by simplifying IT operations and enabling always-on operation. Our customers loved it and told the world about it.

In the next release of the operating system, NOS 4.1, we are going a step further and tackling another significant customer pain point: the hypervisor upgrade. Hypervisor upgrade challenges account for a sizable number of the customer cases that the Nutanix support team handles. The new hypervisor upgrade feature brings the same simplicity and functionality to hypervisor upgrades that one-click upgrade brought to the Nutanix Operating System. Moreover, this feature makes the process of upgrading hypervisors uniform across vendors.

At a high level, the hypervisor upgrade feature runs Nutanix-specific pre-upgrade checks to ensure compatibility and state readiness. If a pre-check fails, the system alerts the admin. Once the pre-upgrade checks complete successfully, NOS puts nodes in maintenance mode, restarts individual nodes sequentially, and applies post-upgrade configurations. Throughout this entire process, the cluster remains up and running.

This release will support upgrades of VMware ESXi and Microsoft Hyper-V, with KVM support coming soon. Hypervisor upgrade will be available in all editions of our software. Since the feature is implemented entirely in software, customers with Nutanix clusters can get this functionality simply by upgrading to NOS 4.1 (using one-click NOS upgrade!).

How It Works

Let’s start with a comparison of how hypervisor upgrades are handled on Nutanix clusters before and after NOS 4.1.

| Steps | Before 4.1 | With Nutanix hypervisor upgrade |
| --- | --- | --- |
| Upload hypervisor bits | Manual, one node at a time | Upload to all nodes via UI in the background |
| Prechecks | Manual, error-prone, and time-consuming (20 minutes per node) | Automatic |
| Upgrade process | Manual, susceptible to VUM errors | Automatically upgrades one node at a time |
| Post-upgrade configurations | Manual and repetitive, with the possibility of missing steps | Automated |
| Time spent per 4-node block watching the screen, uploading, and running checks | 3-4 hours | 30-45 minutes |

Pre-Upgrade Checks

During the pre-upgrade check phase, Nutanix runs a set of tests to verify the state of the cluster. Here’s a partial list of verifications:

  1. Nutanix cluster services are up on all nodes

  2. Hypervisor bits are not corrupt

  3. The source and target hypervisor versions are compatible with the Nutanix OS

  4. The cluster is in an optimal state (e.g., replication/ZooKeeper status)

  5. ESXi hosts are part of a VMware HA/DRS cluster (verified using vCenter credentials)

If admins want to stage hypervisor upgrades, they have the option of running the pre-upgrade checks without going through with the upgrade process. This helps to prepare in advance and address any failures.
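
To make the idea of staged pre-checks concrete, here is a minimal Python sketch of a check runner that reports every failure without starting an upgrade. It is not Nutanix's implementation: the `CheckResult` type, the `run_prechecks` and `ready_to_upgrade` functions, the check names, and the layout of the `cluster` dictionary are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


def run_prechecks(cluster: Dict, checks: Dict[str, Callable[[Dict], bool]]) -> List[CheckResult]:
    """Run every check and collect all results rather than stopping at the first failure."""
    results = []
    for name, check in checks.items():
        try:
            passed = bool(check(cluster))
            detail = "" if passed else "check returned false"
        except Exception as exc:  # a check that crashes is treated as a failure
            passed, detail = False, repr(exc)
        results.append(CheckResult(name, passed, detail))
    return results


def ready_to_upgrade(results: List[CheckResult]) -> bool:
    """The upgrade may proceed only if every pre-check passed."""
    return all(r.passed for r in results)


# Hypothetical checks loosely mirroring the list above; the dict keys are invented.
CHECKS = {
    "services_up": lambda c: all(n["services_up"] for n in c["nodes"]),
    "bits_not_corrupt": lambda c: c["bundle_checksum"] == c["expected_checksum"],
    "version_compatible": lambda c: c["target_version"] in c["compatible_versions"],
}
```

Because the runner only reads cluster state and returns a report, an admin could call it ahead of a maintenance window, fix whatever failed, and run it again before committing to the upgrade.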

Upgrade Process

During the upgrade phase, Nutanix puts nodes in maintenance mode one at a time. Guest VMs are migrated using the native HA/DRS capabilities provided by the hypervisor. After a node's hypervisor is upgraded, the node is taken out of maintenance mode, its hypervisor configuration is optimized for Nutanix, and the “token” to upgrade is passed to the next node.
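
The one-node-at-a-time flow can be sketched as a simple sequential loop. This is an illustrative Python sketch under stated assumptions, not the actual NOS logic: `rolling_upgrade` and the `upgrade_node` callback are hypothetical names, and halting on the first failure is an assumption the source does not spell out.

```python
from typing import Callable, List, Tuple


def rolling_upgrade(nodes: List[str], upgrade_node: Callable[[str], bool]) -> List[Tuple[str, str]]:
    """Upgrade nodes strictly one at a time and return an ordered event log.

    The loop position plays the role of the upgrade "token": a node is only
    touched after the previous node has left maintenance mode.
    """
    log: List[Tuple[str, str]] = []
    for node in nodes:
        log.append((node, "enter_maintenance"))  # guest VMs migrate away here
        if not upgrade_node(node):
            # Assumption: stop immediately so at most one node is out of service.
            log.append((node, "failed"))
            return log
        log.append((node, "upgraded"))
        log.append((node, "exit_maintenance"))  # the token moves to the next node
    return log
```

Serializing the work this way means at most one node is ever in maintenance mode, which is what allows the rest of the cluster to keep serving workloads during the upgrade.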

Post-Upgrade Checks

During the post-upgrade phase, Nutanix runs a set of post-checks and configurations. Here’s a partial list of steps:

  1. Update the Nutanix plugins (e.g., VAAI plugin)

  2. Configure compute, network, and storage parameters

  3. Update drivers if needed (e.g., Intel network drivers)

At this point, all the nodes in the cluster are updated and running the new hypervisor version.

What’s Next

We will continue to drive simplicity and ease of use in the upgrade process. Here’s a short list of feature improvements that we’re planning for hypervisor and NOS upgrades:

  1. Provide easier updates of minor patches

  2. Schedule upgrades to happen at a later time

  3. Provide support for mixed hypervisor clusters

If you have ideas for how to improve the capability further, we’d love to hear from you!