Single File Restore – Fairy Tale Ending Going Down History Lane


By Dwayne Lessner

If I go back to my earliest sysadmin days, when I had to restore a file from a network share, I was happy just to get the file back. Where I worked we only had tape, and it was a crapshoot at the best of times. Luckily, 2007 brought me a SAN to play with.

The SAN certainly made it easier to go back in time, find that file, and pull it back from the clutches of death using hardware-based snapshots. Mounting the snapshot to the guest was no big deal, but fighting with the MS iSCSI initiator got pretty painful, partly because I had a complex password for the CHAP authentication, and partly because clean-up and logging out of the iSCSI session was problematic. I always had a ton of errors, both in the Windows guest and in the SAN console, which seemed to cause more grief than good.

Shortly after the SAN showed up, VMware entered my world. It was great that I didn’t have to mess with MS iSCSI initiators any more, but it really just moved my problem to the ESXi host. Now that VMware had the LUN with all my VMs, I had to worry about resignaturing the LUN so it wouldn’t conflict with the rest of the production VMs. This whole process was short-lived because we couldn’t afford all the space the snapshots were taking up. Since we had to use LUNs, we had to take snapshots of all the VMs even though only a handful really needed the extra protection. Before virtualization we were already reserving over 50% of the total LUN space because snapshots were backed by large block sizes and ate through space. Because we had to snapshot all of the VMs on the LUN, we had to change the snap reserve to 100%. We quickly ran out of space and turned off snapshots for our virtual environment.

When a snapshot is taken on Nutanix, we don’t copy data, nor do we copy the metadata. The metadata and data diverge on an as-needed basis; as new writes happen against the active parent snapshot we just track the changes. Changes are tracked at the byte level, which is a far cry from the 16 MB I had to live with in the past.
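To put that granularity difference in numbers, here is a rough back-of-the-envelope sketch. The 16 MB figure comes from the paragraph above. The 4 KB write size, the write count, and the worst-case assumption that every write lands in a different snapshot block are made up for illustration, and the function name is hypothetical.

```python
# Back-of-the-envelope: snapshot space consumed by small scattered writes.
# Assumption: every write lands in a different snapshot block (worst case).

MB = 1024 * 1024
KB = 1024


def snapshot_space_bytes(num_writes: int, write_size: int, granularity: int) -> int:
    """Space a snapshot must preserve when changes are tracked in
    units of `granularity` bytes and no two writes share a unit."""
    # Each write dirties at least one unit, rounded up to whole units.
    units_per_write = -(-write_size // granularity)  # ceiling division
    return num_writes * units_per_write * granularity


writes = 1000          # hypothetical number of scattered writes
write_size = 4 * KB    # hypothetical write size

coarse = snapshot_space_bytes(writes, write_size, 16 * MB)  # 16 MB blocks
fine = snapshot_space_bytes(writes, write_size, 1)          # byte-level tracking

print(f"16 MB granularity: {coarse / MB:.0f} MB")
print(f"byte granularity:  {fine / MB:.2f} MB")
```

Under these assumptions the coarse case pins roughly 16,000 MB for about 4 MB of changed data, which is the kind of ratio that pushes a snap reserve toward 100%.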

Given these hard-won lessons with LUN-based snapshots, I am very happy to show Nutanix customers the benefits of per-VM snapshots and how easy it is to restore a file.

To restore a file from a VM living on Nutanix you just need to make sure you have a protection domain set up with a proper RPO schedule. For this example, I created a protection domain called RPO-High. With Nutanix you could have 2,000 VMs all on one volume and still protect only the ones you choose: you just slide over the VMs you want to protect. In this example, I am protecting my FileServer. Note that you can have more than one protection domain if you want to assign different RPOs to different VMs: create a new protection domain and add one or more VMs based on the application grouping.

Once the protection domain has been set up and the schedule selected, your VMs will have a quick and easy way to roll back or grab deleted files. Nutanix performs a background copy block map operation after a snapshot is created, which effectively nullifies the effect of having a snapshot chain. This makes the inherited metadata available in the child’s own metadata space, instead of having to traverse the snapshot chain upward to find it. So there’s no overhead when doing IO on a child snapshot, whether it is at level 5 or level 10,000. Copy block map is done in the next maintenance scan, and its rate is controlled by curator/chronos, so it happens gradually.
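The effect of copying the block map can be shown with a toy model. This is only a sketch of the general idea, not Nutanix's actual implementation: the `Snapshot` class, its methods, and the "extent-A" location string are all hypothetical, and the real operation runs gradually in the background rather than in one call.

```python
from typing import Dict, Optional, Tuple


class Snapshot:
    """Toy snapshot: holds only the block mappings written at this level."""

    def __init__(self, parent: Optional["Snapshot"] = None) -> None:
        self.parent = parent
        self.block_map: Dict[int, str] = {}  # block number -> data location

    def lookup(self, block: int) -> Tuple[str, int]:
        """Return (location, hops), where hops counts parents visited."""
        node, hops = self, 0
        while node is not None:
            if block in node.block_map:
                return node.block_map[block], hops
            node, hops = node.parent, hops + 1
        raise KeyError(block)

    def copy_block_map(self) -> None:
        """Pull inherited mappings into this snapshot's own map."""
        node = self.parent
        while node is not None:
            for block, location in node.block_map.items():
                # Nearer ancestors win; never overwrite our own entries.
                self.block_map.setdefault(block, location)
            node = node.parent
        self.parent = None  # lookups no longer need the chain


# Build a chain 10,000 levels deep below a root that owns block 0.
root = Snapshot()
root.block_map[0] = "extent-A"
leaf = root
for _ in range(10_000):
    leaf = Snapshot(parent=leaf)

before = leaf.lookup(0)   # walks the whole chain to find the mapping
leaf.copy_block_map()
after = leaf.lookup(0)    # resolved from the leaf's own map
print(before, after)
```

In this model the lookup cost before the copy grows with chain depth, and after the copy it is the same at any depth, which is the property the paragraph above describes.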

So if you want to recover a deleted file, you can restore a copy of the VM to vCenter and then pull the missing file from the vmdk. Below is the process, using the REST API explorer from the management UI.

1) Get a listing of all the snapshots for the protection domain RPO-High. Once executed you can scroll to the bottom as the snapshots are listed in chronological order. Copy the snapshotID (129181) and you’ll be able to restore to that point in time.

2) To create a copy of FileServer in vCenter we use the rollback_vms REST-API. If you click on Model Schema it will load the code into the body field. Type in the name of the protection domain, add a prefix to put the VM file in a different location than the original, list the name of the VM, and paste in the snapshotID from step 1 and execute the command.

3) Add the recovered vmdk to the original VM (FileServer). You can see the effect of adding a pathPrefix during the restore process.

Add the vmdk from the restored VM to the original VM, and then you can copy the file across locally.

4) Once the vmdk has been added to the original, bring the restored drive online and copy the file to the original location.

5) Once the file has been restored you can remove the added drive from the original VM and delete the restored virtual machine.
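Steps 1 and 2 above can also be sketched in code. The snippet below makes no network calls; it only shows picking a snapshot ID out of a chronological listing and assembling a request body for rollback_vms. The key names `vmNames` and `snapshotID`, the listing shape, the extra snapshot entry, and the `/restored` prefix are assumptions for illustration. The authoritative schema is whatever Model Schema loads in the REST API explorer, and this sketch assumes the newest snapshot is the one you want and that the protection domain name is supplied separately, for example as part of the request path.

```python
import json
from typing import Dict, List


def latest_snapshot_id(snapshots: List[Dict]) -> str:
    """Step 1: the listing is chronological, so the last entry is the newest."""
    if not snapshots:
        raise ValueError("protection domain has no snapshots")
    return str(snapshots[-1]["snapshotID"])


def build_rollback_body(vm_names: List[str], snapshot_id: str, path_prefix: str) -> str:
    """Step 2: assemble a JSON body to paste into the rollback_vms call."""
    body = {
        "pathPrefix": path_prefix,   # restores the VM files to a separate location
        "vmNames": vm_names,         # hypothetical key name
        "snapshotID": snapshot_id,   # hypothetical key casing
    }
    return json.dumps(body, indent=2)


# Hypothetical listing, trimmed to the one field used here.
listing = [{"snapshotID": 129100}, {"snapshotID": 129181}]
snap = latest_snapshot_id(listing)
print(build_rollback_body(["FileServer"], snap, "/restored"))
```

Keeping the body construction in one function makes it easy to compare the generated JSON against the Model Schema before executing anything.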

Easy to restore, space efficient, and easy to architect when you operate at the VM level.