Nutanix Clusters on AWS - New Improvements to Hibernate & Resume Feature
An enhanced architecture now makes it faster and more cost-efficient to hibernate and resume your Nutanix cluster on public clouds like AWS
Nutanix delivers a Hybrid Multicloud Platform that helps you run your applications either on-premises or on public clouds, all managed together on the Nutanix® Cloud Platform. Customers can deploy Nutanix Clusters on AWS and seamlessly integrate them with their Nutanix on-prem clusters for a true hybrid, multicloud experience. Simply deploy the Nutanix software on AWS® EC2 bare-metal instances through the Nutanix Clusters deployment portal and use the Prism® management plane to manage your applications anywhere. Keep reading and we’ll show you how.
This blog assumes an intermediate level of familiarity with how Nutanix Clusters are deployed on EC2 bare-metal nodes. If you are not familiar with our hybrid multicloud platform, we encourage you to read this previous blog. Otherwise, let’s dive right in.
Figure 1: Nutanix delivers a Hybrid Multicloud solution
A key feature of the Nutanix Clusters™ on AWS (NCA) solution is its unique ability to preserve your data while the EC2 bare-metal nodes are shut down. You can achieve this using a feature we call “Hibernate and Resume”. Hibernation allows customers to save on their EC2 bare-metal spend when the Nutanix cluster is not in use, for example in test or development environments that aren’t used on weekends, or for disaster recovery when recovery point objectives (RPOs) are fairly long, on the order of a day or more. With just one click, you can send your cluster into hibernation mode, store your data in Amazon S3 buckets, and leave the EC2 nodes shut down when you don’t need them. Ready to start using the EC2 bare-metal nodes again? It takes just one click to resume the nodes, repopulate your data from the S3 bucket, and get you on your way. Here is a short, two-minute demo video showing the Hibernate feature in action.
The latest version of the Hibernate feature moves only a single copy of your data to S3, whereas the previous architecture moved both copies when the cluster used a replication factor of 2 (RF2). In this blog we discuss the architectural details of this extremely useful feature, which helps you better implement your hybrid cloud architecture.
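To make the difference concrete, here is a minimal Python sketch of how much data each approach would upload. It assumes, purely for illustration, that upload volume scales linearly with the number of copies moved. The function name and the 10 TB figure are hypothetical and are not Nutanix measurements.

```python
def data_uploaded_tb(logical_data_tb: float, copies_moved: int) -> float:
    """Return the data uploaded to S3, assuming volume scales with copies moved."""
    return logical_data_tb * copies_moved


logical_tb = 10.0  # hypothetical amount of unique data on the cluster

old = data_uploaded_tb(logical_tb, copies_moved=2)  # previous design: both RF2 copies
new = data_uploaded_tb(logical_tb, copies_moved=1)  # current design: a single copy

print(f"previous: {old} TB, current: {new} TB, reduction: {1 - new / old:.0%}")
```

Under that assumption, moving one copy instead of two halves the data sent to S3, which is consistent with the roughly 50% faster hibernate time reported later in this post.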
As a longtime Nutant, my nerd bliss is overflowing at the fact that a long-hidden feature is being used for hibernate. AOS has long had a storage tiering option that, until now, hasn't been used for external storage. If you run an nCLI command from one of the CVMs today, you will see a Cloud tier:
When invoked, the Hibernate feature uses the Cloud tier. While the Nutanix AWS cluster is still up and going through the hibernate process, you can see a new storage tier listed below. This new storage tier is simply AWS S3 object storage. A cloud disk is added for each CVM in the cluster so that the tier scales as the cluster grows. I think you can start to imagine a world where third-party on-prem products could get added here, but I digress.
Hibernate your Nutanix Cluster on AWS
Figure 2: Architecture diagram showing the Hibernation process for a Nutanix Cluster on AWS
The following steps give an overview of the hibernation process:
Important: Users should shut down all the VMs before hibernating their cluster.
- NCA runs prechecks against the cluster to verify that no VMs, upgrades, or other workflows (such as cluster expansion) are running.
- NCA creates and adds at least one S3 disk per node or CVM in the cluster. The S3 disks form a cloud storage tier used by the same internal process that tiers data between SSD and HDD for on-prem clusters.
- NCA puts all hosts into maintenance mode, which prevents UVM starts and stops all I/O operations. The Curator and Stargate services migrate extent store data to the cloud, maintaining replication factor 1 (a single copy), then migrate Cassandra metadata.
- NCA protects the cluster configuration and state information maintained on CVM boot disks during hibernation by snapshotting the respective EBS volumes.
- NCA releases the EC2 bare-metal nodes back to the AWS pool.
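The precheck-then-proceed flow described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the NCA implementation: `ClusterState`, `hibernate_precheck`, `hibernate`, and the step labels are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterState:
    """Hypothetical snapshot of the conditions the prechecks look at."""
    running_vms: int = 0
    upgrade_in_progress: bool = False
    expansion_in_progress: bool = False


# Ordered summary of the steps listed above (labels are illustrative only).
HIBERNATE_STEPS = [
    "add S3 cloud disks (at least one per node or CVM)",
    "enter maintenance mode and stop I/O",
    "migrate extent store data, then Cassandra metadata, to S3",
    "snapshot CVM boot-disk EBS volumes",
    "release EC2 bare-metal nodes",
]


def hibernate_precheck(state: ClusterState) -> List[str]:
    """Return a list of blockers; an empty list means the precheck passes."""
    blockers = []
    if state.running_vms > 0:
        blockers.append(f"{state.running_vms} VM(s) still running")
    if state.upgrade_in_progress:
        blockers.append("upgrade in progress")
    if state.expansion_in_progress:
        blockers.append("cluster expansion in progress")
    return blockers


def hibernate(state: ClusterState) -> List[str]:
    """Refuse to start if any blocker exists; otherwise return the step plan."""
    blockers = hibernate_precheck(state)
    if blockers:
        raise RuntimeError("precheck failed: " + "; ".join(blockers))
    return list(HIBERNATE_STEPS)


print(hibernate_precheck(ClusterState(running_vms=2)))
print(len(hibernate(ClusterState())))
```

The point of the sketch is the ordering: nothing is migrated or released until the prechecks come back clean, which is why shutting down your VMs first matters.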
If your cluster is hibernating, you can bring it back online with the resume operation. Resume deploys the nodes you need into the original AWS region and migrates your data back to them.
Resume your Nutanix Cluster on AWS
Figure 3: Architecture diagram showing the Resume process for a Nutanix Cluster on AWS
The following steps give an overview of the resume process:
- When you start the resume process, NCA deploys new EC2 bare-metal instances and attaches cloud disks to them to form the Nutanix AWS cluster.
- NCA restores the cluster boot disk’s Zeus configuration from the EBS volume snapshot.
- NCA restores cluster configuration from the EBS volume snapshot.
- Genesis starts the necessary cluster services, and the Cassandra dynamic ring changer restores metadata. You can read more about Zeus and Genesis in the Nutanix Bible.
- NCA creates and adds at least one S3 disk per node or CVM in the cluster. The S3 disks form a cloud storage tier.
- Curator restarts and registers the kSelectiveClusterHibernate scan for the restore operation. Curator schedules the selective hibernate scan to migrate extent store data from S3 to NVMe.
- Genesis transitions to kNormal mode once all the data is restored. NCA marks the hibernated disks as “to-remove” and removes them from the cluster configuration.
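As a rough mental model of the last two steps, the restore can be thought of as repeated scans that drain the cloud tier, with the cluster returning to kNormal only once nothing is left in S3. The Python sketch below is a simplification under that assumption. The function name, the per-scan batch size, and the returned dictionary are hypothetical; only the names kNormal and “to-remove” come from the steps above.

```python
def resume_restore(cloud_extents: int, extents_per_scan: int) -> dict:
    """Toy model: repeat restore scans until no extents remain on the cloud tier."""
    if extents_per_scan <= 0:
        raise ValueError("extents_per_scan must be positive")
    scans = 0
    while cloud_extents > 0:
        # Each scan moves a batch of extents from S3 back to local NVMe.
        cloud_extents -= min(extents_per_scan, cloud_extents)
        scans += 1
    # Only after everything is restored does the cluster return to normal
    # operation and the cloud disks get marked for removal.
    return {"mode": "kNormal", "scans": scans, "cloud_disks": "to-remove"}


result = resume_restore(cloud_extents=1000, extents_per_scan=300)
print(result)
```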
Moving data between the Nutanix AWS cluster and S3 occurs in four main stages:
- metadata migration
- data migration
- AOS processing
- Nutanix Clusters portal processing
We tested throughput on a three-node EC2 i3.metal cluster with VMs consuming 18 virtual disks with a total usable capacity of 9.67 TB and 4.74 GB of metadata. The following table shows the timing results for all phases.
Table: Hibernation throughput (three-node i3.metal cluster)

| Phase | Time |
| --- | --- |
| Metadata migration phase | |
| Data migration phase | |
| Total AOS processing time | 1 hour 38 minutes |
| Total Clusters portal processing time | 1 hour 50 minutes |
Compared with the previous version of this feature, the total processing time to hibernate a cluster is cut by about 50%, going from 3 hrs 41 mins in the old architecture to only 1 hr 50 mins with the new architecture. This is a huge improvement and gives our customers much more flexibility to send their nodes into hibernation when they don’t need them. For example, say you only need the EC2 bare-metal nodes running for 6 hrs per day to update your backup on S3. With just one click you can hibernate your cluster or resume it when needed. For the other 18 hrs you can leave the nodes hibernated. After accounting for the time to hibernate and resume, you will still be able to leave your nodes in hibernation for about 14 hrs per day, a cost savings of nearly 60% compared with running your nodes 24x7.
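The arithmetic behind those two percentages can be checked with a short Python sketch. It assumes that EC2 cost is simply proportional to the hours the nodes are running and ignores S3 and EBS storage charges during hibernation; the variable and function names are ours, while the input figures come from the paragraph above.

```python
def percent_reduction(old_minutes: int, new_minutes: int) -> float:
    """Percentage by which the new duration is shorter than the old one."""
    return (old_minutes - new_minutes) / old_minutes * 100


# Hibernate time: 3 hrs 41 mins before, 1 hr 50 mins now.
old_hibernate = 3 * 60 + 41
new_hibernate = 1 * 60 + 50
reduction = percent_reduction(old_hibernate, new_hibernate)

# Daily schedule from the example: 6 hrs of work, about 14 hrs hibernated.
hours_in_day = 24
working_hours = 6
hibernated_hours = 14
transition_hours = hours_in_day - working_hours - hibernated_hours

# Assumption: EC2 cost scales linearly with the hours the nodes are running.
savings = hibernated_hours / hours_in_day * 100

print(f"hibernate time reduction: {reduction:.1f}%")
print(f"hours spent hibernating and resuming: {transition_hours}")
print(f"EC2 savings vs. 24x7: {savings:.1f}%")
```

In this model the roughly 4 hours per day that are neither working time nor hibernated time are the combined overhead of hibernating and resuming, during which the nodes are still billed.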
This delivers huge cost savings and allows customers to more easily adopt a hybrid cloud architecture. You get to save all your data in S3 storage buckets without paying for EC2 bare-metal instances that run all the time. You can also read about how one of our customers, Penn National Insurance, has been using the hibernation feature.
As the Nutanix Clusters product continues to evolve, we expect these numbers to improve further. The Hibernate/Resume feature is currently in tech preview, and we encourage you to check it out with our free trial, available on the Nutanix Clusters website. You can also log in to your My.Nutanix account, select “Nutanix Clusters” from your dashboard, and start running Nutanix on your AWS bare-metal nodes.
Not ready to start using Nutanix Clusters on AWS? You can take a test drive instead to learn more about the solution with a self-guided tour of a pre-canned environment.
The future looks bright in the Cloud!
© 2021 Nutanix, Inc. All rights reserved. Nutanix, the Nutanix logo and all Nutanix product, feature and service names mentioned herein are registered trademarks or trademarks of Nutanix, Inc. in the United States and other countries. Other brand names mentioned herein are for identification purposes only and may be the trademarks of their respective holder(s). This post may contain links to external websites that are not part of Nutanix.com. Nutanix does not control these sites and disclaims all responsibility for the content or accuracy of any external site. Our decision to link to an external site should not be considered an endorsement of any content on such a site. Certain information contained in this post may relate to or be based on studies, publications, surveys and other data obtained from third-party sources and our own internal estimates and research. While we believe these third-party studies, publications, surveys and other data are reliable as of the date of this post, they have not been independently verified, and we make no representation as to the adequacy, fairness, accuracy, or completeness of any information obtained from third-party sources.
This post may contain express and implied forward-looking statements, which are not historical facts and are instead based on our current expectations, estimates and beliefs. The accuracy of such statements involves risks and uncertainties and depends upon future events, including those that may be beyond our control, and actual results may differ materially and adversely from those anticipated or implied by such statements. Any forward-looking statements included herein speak only as of the date hereof and, except as required by law, we assume no obligation to update or otherwise revise any of such forward-looking statements to reflect subsequent events or circumstances.
Product features or enhancements described in this post as early access may be subject to change at any time, without notice, and we provide no assurances, and assume no responsibility, that early access product features or enhancements will be introduced in the timeframe or manner presented or at all. No purchasing decisions should be made based upon reliance of early access features. We reserve the right at any time not to release a generally available version of the early access features or, if released, to alter prices, features, licensing terms, or other characteristics of the generally available release.