
Disaster Recovery – Failover and Failback with Nutanix

Nutanix has a built-in VM-centric, multi-site, bi-directional, multi-topology disaster recovery and replication engine that also supports VMware Site Recovery Manager and Run Book Automation tools. Replication uses incremental, fine-grained byte-level data transfers with intelligent data compression, avoiding network and storage resource bottlenecks.
Administrators may fail over entire deployments to a secondary datacenter and later fail back, with all data created in the secondary datacenter already replicated back to the primary datacenter. As of today the replication is asynchronous, with one hour as the most granular recovery point (RPO/RTO), but this is likely to change in the near future in favor of a more granular approach.

Protection Domain – A protection domain is a group of VMs to be backed up locally on a cluster or replicated on the same schedule to one or more local or remote clusters. A protection domain on a cluster is in one of two modes:
Active: Manages live VMs; makes, replicates, and expires snapshots.
Inactive: Receives snapshots from a remote cluster.
Consistency Group – A consistency group is a subset of VMs in a protection domain. All VMs within a consistency group are snapshotted together in a crash-consistent manner: taking a snapshot of the group creates a single snapshot that covers every VM in it.
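
To make the relationship between these terms concrete, here is a minimal in-memory sketch in PowerShell. It is only an illustration of the definitions above, not the Nutanix data model: the consistency group names, the VM names and the New-GroupSnapshot helper are all hypothetical, and only the protection domain name bizdev01 is taken from the example later in this article.

```powershell
# Hypothetical model of a protection domain; field names are illustrative only.
$protectionDomain = [PSCustomObject]@{
    Name              = 'bizdev01'
    Mode              = 'Active'   # 'Active' manages live VMs, 'Inactive' only receives snapshots
    ConsistencyGroups = @(
        [PSCustomObject]@{ Name = 'cg-desktops'; VMs = @('xd-vm01', 'xd-vm02') },
        [PSCustomObject]@{ Name = 'cg-infra';    VMs = @('xd-broker01') }
    )
}

# Hypothetical helper: emits one snapshot record per consistency group,
# so every VM in a group shares the same snapshot and timestamp.
function New-GroupSnapshot {
    param(
        [Parameter(Mandatory)] $ProtectionDomain
    )
    if ($ProtectionDomain.Mode -ne 'Active') {
        throw "Protection domain '$($ProtectionDomain.Name)' is inactive; it only receives snapshots."
    }
    $takenAt = Get-Date
    foreach ($group in $ProtectionDomain.ConsistencyGroups) {
        [PSCustomObject]@{
            ProtectionDomain = $ProtectionDomain.Name
            ConsistencyGroup = $group.Name
            VMs              = $group.VMs
            TakenAt          = $takenAt
        }
    }
}

$snapshots = @(New-GroupSnapshot -ProtectionDomain $protectionDomain)
```

Each element of $snapshots covers a whole consistency group rather than a single VM, which is the property that keeps the VMs in a group crash-consistent with one another.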

I recorded the video below in my lab. It demonstrates the failover and failback of a Citrix XenDesktop machine, with changes being applied to the VM while it is running in the secondary site. It might just be me, but I have never experienced or seen a multi-site DR solution as simple as what Nutanix has built into their product. Moving forward I will also produce more technical articles that depict how Nutanix disaster recovery works under the covers.

In my article Nutanix Automation, Policies and the SDDC I discussed how Nutanix exposes all features and functions via a REST API, enabling programmatic access to datacenter services within mainstream enterprises, and in my last article I demonstrated How to Establish PowerShell Connection to Nutanix and Execute your 1st Query.

The script below demonstrates how to trigger disaster recovery for a protection domain using the REST API via PowerShell. The source protection domain is in the URI and the remote site (destination) is in the body of the REST request.

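# Migrate endpoint for the source protection domain (bizdev01)
$Uri = "https://10.20.18.10:9440/PrismGateway/services/rest/v1/protection_domains/bizdev01/migrate"

$username = "username"
$password = "password"

# HTTP Basic authentication header built from the credentials
$Header = @{"Authorization" = "Basic " + [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($username + ":" + $password))}

# Remote site (destination) for the protection domain
$Body = "POC07"

Invoke-RestMethod -Method Post -Uri $Uri -Headers $Header -Body $Body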
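
Since failover and failback differ only in which cluster receives the request and which remote site is named as the destination, the request above can be wrapped in a small helper. The sketch below rests on assumptions: New-MigrateRequest is a hypothetical function name, the secondary cluster address 10.20.19.10 and the remote site name POC01 are placeholders, and I am assuming that failback is triggered with the same migrate call sent to the secondary cluster. Verify this against your own environment before relying on it.

```powershell
# Hypothetical helper: builds the parameters for the migrate request without sending it.
function New-MigrateRequest {
    param(
        [Parameter(Mandatory)] [string] $ClusterAddress,    # cluster where the protection domain is currently active
        [Parameter(Mandatory)] [string] $ProtectionDomain,  # source protection domain (goes in the URI)
        [Parameter(Mandatory)] [string] $RemoteSite,        # destination remote site (goes in the body)
        [Parameter(Mandatory)] [string] $Username,
        [Parameter(Mandatory)] [string] $Password
    )
    # HTTP Basic authentication token: base64 of "username:password"
    $pair  = "${Username}:${Password}"
    $token = [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($pair))
    @{
        Method  = 'Post'
        Uri     = "https://${ClusterAddress}:9440/PrismGateway/services/rest/v1/protection_domains/$ProtectionDomain/migrate"
        Headers = @{ Authorization = "Basic $token" }
        Body    = $RemoteSite
    }
}

# Failover: request goes to the primary cluster, destination is the secondary site.
$failover = New-MigrateRequest -ClusterAddress '10.20.18.10' -ProtectionDomain 'bizdev01' `
    -RemoteSite 'POC07' -Username 'username' -Password 'password'

# Failback (assumption): request goes to the secondary cluster, destination is the primary site.
# '10.20.19.10' and 'POC01' are placeholders, not values from my lab.
$failback = New-MigrateRequest -ClusterAddress '10.20.19.10' -ProtectionDomain 'bizdev01' `
    -RemoteSite 'POC01' -Username 'username' -Password 'password'

# To send a request, splat the hashtable onto Invoke-RestMethod:
# Invoke-RestMethod @failover
```

Keeping the request construction separate from Invoke-RestMethod makes it easy to inspect the URI and body before anything is sent to a production cluster.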

Note: A stretched VLAN is required for seamless DR and failback capabilities. Optionally, VMware Site Recovery Manager (SRM) or another Run Book Automation tool can be used to manage IP addresses and the startup sequence in each DR datacenter.

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net.