True Hybrid Cloud Offering: Nutanix Clusters on AWS


About five hundred million years ago, creatures of the ocean started venturing onto land. Evolution, through many iterations, gave a select few creatures the ability to live both on land and in the water. This ability to feel native in both places was a radical breakthrough and was the key to the success of these hybrid animals. Most fish, however, remain confined to the oceans today. No one would take a fish in an aquarium, place the aquarium in the middle of the marshes, and call it adaptation or migration. It’s an uncomfortable lift-and-shift! Life (and more importantly death) has taught nature to shed fragile contraptions and favor designs that have native resiliency and efficiency built into them. Through a series of explorations and mutations, the true crocodiles arrived only about 200 million years ago. They have since ruled the planet and proved to be the hardiest of the large animals.

"The eye of a caiman" by Another Seb is licensed under CC BY-NC-ND 2.0

In the tech industry, evolution also happens, but at a much more rapid clip. About a decade ago, enterprise IT infrastructure was presented with a challenge: move to the cloud, driven by the desire to reduce CapEx, increase agility, and expand businesses rapidly at a global scale. A serendipitous mutation called Hyperconverged Infrastructure (HCI) happened around the same time, bringing cloud-like features to enterprise infrastructure. The three-tier architecture, just like the fish, has not been able to survive in the cloud. Early attempts by search companies to use three-tier architectures with SAN and NAS appliances did not succeed. HCI ushered in a new cloud-like architecture in the enterprise, and on-premises data centers are now rapidly deploying HCI. While this first transition is maturing, some of the HCI architectures are “coming out of the ocean” and forming Hybrid Clouds, some more boldly than others.

Just like in nature, for any engineering architecture to succeed, it needs to be simple and efficient.

AWS is a leader in cloud computing and makes it simple to bring the Nutanix HCI software stack to the cloud by providing API-driven access to its bare metal servers. While there are many different ways of running an HCI stack on top of AWS bare metal, at Nutanix we have chosen the most native approach to the problem, with the conviction that such an AWS hybrid cloud architecture is the best solution for our joint customers.

Nutanix brings a first-of-its-kind hybrid cloud solution that delivers true hybridity and true elasticity. Let us take a deeper look:

True Hybridity

  1. Many customers have existing AWS accounts. True hybridity calls for using the existing AWS accounts, VPCs, VPNs, and Direct Connect links while bringing the private and public clouds together. With Nutanix Clusters on AWS (NCA), current customers of AWS can launch Nutanix Enterprise Cloud OS within their existing environments, without the need to create a new AWS account, new VPCs, or new WAN networking.
  2. True hybridity also brings cloud-native services together with the classic apps and containers running on Nutanix Enterprise Cloud OS, without the need to go through inefficient network gateways or VPC peering. With NCA, not only can the classic apps be on the same subnets as the cloud-native services and apps, but they can also get native network performance with minimal overhead. This also simplifies migrating apps from NCA to native AWS EC2 and vice versa, without IP address changes or any network reconfiguration.
  3. One of the key aspects of hybrid is being able to manage both the private and public sides of the infrastructure through the same console, without adding management overhead on the public cloud side. NCA simply brings up AOS nodes on AWS bare metal and manages them from the existing Prism Central, imposing no networking management VMs or networking gateway VMs.

True Elasticity

  1. Cloud infrastructure should allow quick burstability. AWS provides an elastic bare metal service in EC2. NCA allows customers to spin up clusters on demand and in minutes. The cloud infrastructure is available from AWS at an hourly granularity. As the capacity requirement of a cluster increases or decreases, nodes can be added or removed on demand.
  2. Cloud infrastructure should accommodate the sporadic nature of business without the need to recreate or migrate the assets each time. NCA allows for hibernating a running cluster, along with its VMs, into AWS S3 for any period of time, another industry-first feature. During hibernation, no compute costs are incurred. Whenever the workloads are required to run again, the NCA can be resumed and all the workloads are brought back to life. This allows for an elastic infrastructure for seasonal but stateful workloads.
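The elasticity behaviors described above (hourly capacity, adding and removing nodes, hibernate and resume) can be pictured with a small toy model. The sketch below is illustrative only: the `Cluster` class, its method names, and the transition table are hypothetical and are not Nutanix or AWS APIs. It assumes compute is metered in node-hours only while the cluster is running, and it borrows the three-node minimum that this post states for a cluster.

```python
from enum import Enum

# The post describes clusters of 3 or more bare metal instances.
MIN_NODES = 3


class State(Enum):
    RUNNING = "running"
    HIBERNATED = "hibernated"
    DELETED = "deleted"


# Allowed transitions in this toy model (an assumption for illustration,
# not the product's actual state machine).
_TRANSITIONS = {
    (State.RUNNING, "hibernate"): State.HIBERNATED,
    (State.HIBERNATED, "resume"): State.RUNNING,
    (State.RUNNING, "delete"): State.DELETED,
    (State.HIBERNATED, "delete"): State.DELETED,
}


class Cluster:
    """Hypothetical model of an elastic cluster billed per node-hour."""

    def __init__(self, nodes: int = MIN_NODES) -> None:
        if nodes < MIN_NODES:
            raise ValueError(f"a cluster needs at least {MIN_NODES} nodes")
        self.nodes = nodes
        self.state = State.RUNNING
        self.node_hours = 0  # compute consumed so far

    def apply(self, action: str) -> None:
        """Apply 'hibernate', 'resume', or 'delete' if the transition is valid."""
        try:
            self.state = _TRANSITIONS[(self.state, action)]
        except KeyError:
            raise ValueError(
                f"cannot {action} a {self.state.value} cluster"
            ) from None

    def resize(self, delta: int) -> None:
        """Add (delta > 0) or remove (delta < 0) nodes on a running cluster."""
        if self.state is not State.RUNNING:
            raise ValueError("only a running cluster can be resized")
        if self.nodes + delta < MIN_NODES:
            raise ValueError(f"a cluster needs at least {MIN_NODES} nodes")
        self.nodes += delta

    def tick(self, hours: int) -> None:
        """Advance time; compute accrues only while the cluster is running."""
        if self.state is State.RUNNING:
            self.node_hours += self.nodes * hours
```

In this model, calling `tick` on a hibernated cluster leaves `node_hours` unchanged, which mirrors the statement that no compute costs are incurred during hibernation. Storage costs for the hibernated state are deliberately left out of the sketch.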

NCA Design Choices

Account Management

We had a choice between creating a new AWS account for the customer or using an existing one to manage NCA. Using a new AWS account would have given us a clean working space and made it somewhat easier to build the product. However, from a customer’s perspective that would have been less than optimal, because customers could not have used their existing accounts and credits with AWS. Hence, we decided not to impose the burden of new AWS accounts on the customer. Their current accounts can be used. The customer is billed directly by AWS for the infrastructure spend and pays Nutanix only for the Nutanix software, for the duration the NCA are used.

Networking Design

We had a choice between deploying the Nutanix VMs on an overlay network (using VXLAN) on top of the AWS subnets, or deploying the Nutanix VMs directly on the AWS subnets. Deploying an overlay network provides for easier integration with the underlying cloud networking because nothing needs to change in the way the hypervisor does IP address management.

However, choosing an overlay presents many challenges:

  1. Running an overlay requires management VMs (at least a controller and a couple of Network Edge gateways). That overhead runs against our simple-and-efficient mantra.
  2. Encapsulating traffic adds non-trivial CPU overhead, and achieving bandwidths higher than 10 Gbit/s becomes hard.
  3. When IP addresses on the overlay talk to IP addresses in native AWS EC2, the traffic goes through the Network Edge gateways. That creates a performance bottleneck and, unless the gateways are scaled out (causing additional overhead), may lead to downtime during upgrades.
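To make the encapsulation cost in the list above a little more concrete, here is a minimal sketch of the per-packet byte overhead of VXLAN over IPv4. The header sizes are the standard ones; the function names and the MTU values are only illustrative. Note that this captures just the bytes lost to headers. It does not model the CPU cost of encapsulating and decapsulating each packet, which is the larger concern raised above.

```python
# Bytes that VXLAN-over-IPv4 adds inside each underlay IP packet.
OUTER_IPV4_HEADER = 20
OUTER_UDP_HEADER = 8
VXLAN_HEADER = 8
INNER_ETHERNET_HEADER = 14  # the inner frame's own L2 header rides as payload

VXLAN_OVERHEAD_BYTES = (
    OUTER_IPV4_HEADER + OUTER_UDP_HEADER + VXLAN_HEADER + INNER_ETHERNET_HEADER
)


def inner_mtu(underlay_mtu: int) -> int:
    """Largest inner IP packet that fits in one underlay packet without fragmenting."""
    if underlay_mtu <= VXLAN_OVERHEAD_BYTES:
        raise ValueError("underlay MTU is too small to carry VXLAN traffic")
    return underlay_mtu - VXLAN_OVERHEAD_BYTES


def payload_efficiency(underlay_mtu: int) -> float:
    """Fraction of each full-size underlay packet that carries inner IP bytes."""
    return inner_mtu(underlay_mtu) / underlay_mtu


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(mtu, inner_mtu(mtu), round(payload_efficiency(mtu), 4))
```

With a 1500-byte underlay MTU, the overlay leaves 1450 bytes for the inner IP packet. A design with no overlay avoids both this byte overhead and the associated per-packet processing.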

Hence, we decided to explore a more native integration with AWS EC2 networking. This new native networking model has the following features:

  1. No overlay is needed, hence there are no VMs acting as network controllers or network edge gateways. Zero management VMs are needed, which saves expensive resources in the cloud and also reduces management complexity.
  2. The VMs running on Nutanix AHV are assigned IP addresses that are provided by native AWS networking and recognized by the AWS switching fabric.
  3. When VMs talk to each other within the NCA or to native EC2 VMs, they do not have to go through any gateways but rather are directly switched by AWS. This allows user VMs to talk natively to cloud services without going through any translation of packets from overlay to underlay. This results in high performance and low latency networking.
  4. To achieve the above, AHV has been modified to add deep integration for AWS networking.

NCA Architecture

NCA are designed to look virtually the same as on-premises Nutanix clusters. These clusters run the complete Nutanix AOS and AHV stack with no change in CLI, UI or APIs. This allows existing IT processes or 3rd party integrations that work on-premises to continue to work with NCA.

With NCA, the complete Nutanix HCI stack runs directly on the AWS EC2 bare metal instances. The bare metal runs the AHV hypervisor and just like any on-premises deployment, runs a Controller Virtual Machine (CVM) with direct access to NVMe instance storage hardware. The Nutanix AOS software provides high-performance, low latency and highly available storage using these local NVMe disks. NCA can be managed by an existing on-premises Prism Central or from a Prism Central deployed on NCA.

Hybrid Cloud Deployment

Customers can deploy NCA from their account. They can perform day-to-day cluster management via Prism Central and use account for creation, hibernation, deletion, and billing of their clusters in AWS.

Hybrid Cloud Storage Architecture

NCA look similar to Nutanix clusters on-premises. A cluster can have 3 or more EC2 i3.metal bare metal instances. AHV runs directly on the bare metal and exposes the local NVMe storage to the CVMs. The CVMs on each instance cluster together and provide a single storage fabric across all nodes, with all the enterprise storage capabilities that enterprise apps need. The storage fabric in NCA can be connected to on-premises clusters using Nutanix AOS DR, backup, and replication capabilities, allowing seamless mobility of stateful applications from on-premises to AWS and back.

Not only can NCA natively interact with the on-premises clusters at the storage layer, they can also extend Nutanix hybrid cloud storage capabilities like Volumes, Files, and Buckets to native workloads running on AWS EC2. This allows for workloads that require DR or backup to on-premises but need to leverage compute from AWS EC2 for bursting.

Networking Architecture

AHV runs an efficient, embedded, distributed network controller that integrates user VM networking with AWS networking. The network controller does not create an overlay network. Instead, each user VM IP is assigned to the bare metal host where the VM currently runs. The AHV embedded network controller simply forwards packets from the host to the right VM on that host, or to wherever the VM might have migrated. IP address management is integrated with AWS VPC, hence all user VM IPs are allocated by AWS from the AWS subnets in the existing VPCs.
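The idea in the paragraph above (a VM's IP comes from the cloud subnet, is attached to whichever host currently runs the VM, and follows the VM when it migrates) can be sketched as a toy model. Everything here is hypothetical and for illustration only: `ToySubnet`, `ToyFabric`, and their methods are invented names, not AHV or AWS interfaces, and the allocator simply hands out addresses in order rather than imitating real VPC address management.

```python
import ipaddress


class ToySubnet:
    """Stands in for a cloud subnet that hands out addresses (toy allocator)."""

    def __init__(self, cidr: str) -> None:
        self._free = iter(ipaddress.ip_network(cidr).hosts())

    def allocate(self) -> str:
        return str(next(self._free))


class ToyFabric:
    """Tracks which host each VM IP is attached to; no overlay, no gateway."""

    def __init__(self, subnet: ToySubnet) -> None:
        self._subnet = subnet
        self._ip_to_host: dict[str, str] = {}
        self._ip_to_vm: dict[str, str] = {}

    def start_vm(self, vm: str, host: str) -> str:
        """Allocate a subnet IP for the VM and attach it to the hosting node."""
        ip = self._subnet.allocate()
        self._ip_to_host[ip] = host
        self._ip_to_vm[ip] = vm
        return ip

    def migrate(self, ip: str, new_host: str) -> None:
        """Move the VM: the IP is re-attached to the new host and never changes."""
        if ip not in self._ip_to_host:
            raise KeyError(ip)
        self._ip_to_host[ip] = new_host

    def deliver(self, dst_ip: str) -> tuple[str, str]:
        """Return (host, vm): the fabric switches to the host, the host hands off to the VM."""
        return self._ip_to_host[dst_ip], self._ip_to_vm[dst_ip]


if __name__ == "__main__":
    fabric = ToyFabric(ToySubnet("10.0.1.0/24"))
    ip = fabric.start_vm("vm-a", "host-1")
    print(fabric.deliver(ip))
    fabric.migrate(ip, "host-2")
    print(fabric.deliver(ip))
```

The point of the model is that the sender only ever needs the VM's subnet IP. Which host that IP is attached to is a detail the fabric tracks, so a migration changes the mapping but not the address.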

No additional configuration is required for AHV user VMs to access AWS services and other EC2 instances, or for EC2 instances to connect directly to services on AHV user VMs using their assigned IP addresses. The network supports near-native networking performance between AHV user VMs and EC2 instances.

As a result of the above cloud architecture, there is no need for network controller VMs, network edge gateways or any other management VMs to be run on the NCA. Microsegmentation is implemented via the embedded networking controller in AHV and the policies are managed from Prism Central which could be running on-premises or in the cloud on the NCA.

What remains the same compared to on-premises?

Almost all features, APIs, and the user experience remain the same. The clusters in AWS act as a true extension of the on-premises data center. Applications can move seamlessly from on-premises to the cloud and back without any changes.

What is different between Nutanix On-premises and NCA?


Six Use Cases of NCA


Successful enterprises routinely look to expand the presence of their business apps quickly into regions where they do not yet have a physical datacenter footprint. This needs to be done without any change to the apps. With NCA, enterprises can choose from the numerous AWS global regions to rapidly expand their presence worldwide.


Many enterprises have to deal with fragmented data center assets and would like to consolidate or shut down some underperforming ones. Having the option to consolidate into a nearby AWS availability zone with elastic capacity, without having to worry about the compatibility of apps with the new cloud environment, is ideal for such transitions. NCA offers the ability to run enterprise apps as-is on the AWS infrastructure while managing the applications from a single Prism console, which can be either on-premises or in the cloud.


Retail companies see steep seasonal peaks in infrastructure demand. Since adding new nodes on-premises is a long process, many enterprises are forced to maintain an infrastructure that is underutilized during most of the year. Using NCA, enterprises can now keep utilization high during non-peak seasons by reducing the overall infrastructure footprint and bursting into the cloud in the peak seasons. Since the NCA and on-premises clusters are on the same network and storage fabric, migration of applications back and forth is convenient.
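The bursting argument above comes down to simple sizing arithmetic, sketched below. This is a rough illustration only: the function name, the abstract "capacity units", and the example numbers are all hypothetical, and real sizing would account for CPU, memory, storage, and failure headroom separately. The three-node floor is taken from this post's statement that a cluster has 3 or more instances.

```python
import math

# The post describes clusters of 3 or more bare metal instances.
MIN_CLUSTER_NODES = 3


def burst_nodes(peak_demand: float, onprem_capacity: float,
                per_node_capacity: float) -> int:
    """Cloud nodes needed to absorb demand beyond on-premises capacity.

    All three arguments use the same abstract capacity unit (an assumption
    made for illustration). Returns 0 when on-premises capacity suffices.
    """
    if per_node_capacity <= 0:
        raise ValueError("per_node_capacity must be positive")
    shortfall = peak_demand - onprem_capacity
    if shortfall <= 0:
        return 0
    needed = math.ceil(shortfall / per_node_capacity)
    # A burst cluster cannot be smaller than the minimum cluster size.
    return max(needed, MIN_CLUSTER_NODES)


if __name__ == "__main__":
    # Hypothetical numbers: on-premises sized for 100 units, peak of 160.
    print(burst_nodes(peak_demand=160, onprem_capacity=100, per_node_capacity=10))
```

Sizing the on-premises footprint for typical demand and calling a function like this only for the peak is what lets the steady-state utilization stay high.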


Some classic apps require access to shared disks with SCSI-PR support. NCA enables these apps to run inside AWS. This flexibility allows a new class of applications to migrate to the public cloud.


High-end databases, big data apps, and other I/O-hungry applications may not be able to get enough storage IOPS in the cloud-native environment without changing their architecture or spending heavily on provisioned IOPS. NCA brings the high storage IOPS capability of AOS on top of AWS bare metal servers, allowing a new class of I/O-hungry applications to move to the cloud.


When migrating applications to the cloud, a common challenge faced by customers is that not all components of the app can be made cloud-ready at the same time. The app ends up split over a long distance between its on-premises components and its in-cloud components, which leads to latency issues. This frustrates many cloud migration efforts. With NCA, app components can now be moved to the cloud as-is, without any changes, so that they sit in close proximity to their cloud-native components. The unique integration NCA provides with native AWS networking enables high-bandwidth, low-latency streaming between the app components on the Nutanix infrastructure on AWS and those on native AWS VMs.


NCA is expected to be available in Technical Preview this summer.  If you are interested in participating in the Technical Preview of NCA, please visit here.

Forward Looking Disclaimers

This blog post includes forward-looking statements concerning our plans and expectations relating to new product features and technology that are under development, including NCA, the capabilities of such product features and technology and our plans to release product features and technology in future releases. These forward-looking statements are not historical facts, and instead are based on our current expectations, estimates, opinions and beliefs. The accuracy of such forward-looking statements depends upon future events, and involves risks, uncertainties and other factors beyond our control that may cause these statements to be inaccurate and cause our actual results, performance or achievements to differ materially and adversely from those anticipated or implied by such statements, including, among others: the introduction, or acceleration of adoption of, competing solutions, including public cloud infrastructure; a shift in industry or competitive dynamics or customer demand; and other risks detailed in our quarterly report on Form 10-Q for the fiscal quarter ended January 31, 2019, filed with the Securities and Exchange Commission. These forward-looking statements speak only as of the date of this press release and, except as required by law, we assume no obligation to update forward-looking statements to reflect actual results or subsequent events or circumstances.

© 2019 Nutanix, Inc.  All rights reserved. Nutanix, the Nutanix logo and the other Nutanix products and features mentioned herein are registered trademarks or trademarks of Nutanix, Inc. in the United States and other countries. All other brand names mentioned herein are for identification purposes only and may be the trademarks of their respective holder(s), and Nutanix may not be associated with, or sponsored or endorsed by such holder(s).  This document is provided for informational purposes only and is presented ‘as is’ with no warranties of any kind, whether implied, statutory or otherwise