The Nutanix Distributed Filesystem (NDFS) is at the core of the Nutanix Platform. It manages all metadata and data, and enables core features. NDFS is the underpinning architectural element that connects the storage, compute resources, controller VM, and the hypervisor. It also provides full Information Lifecycle Management (ILM), including localizing data to the optimal node.
Metadata is distributed among all nodes in the cluster in order to eliminate any single point of failure and to allow performance and capacity to scale linearly with cluster growth. The metadata is partitioned using a consistent hashing scheme to minimize the redistribution of keys when the cluster is resized.
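The consistent-hashing scheme can be sketched as follows. This is a minimal illustration, not Nutanix's actual implementation; the class, the virtual-node count, and the MD5-based hash are all assumptions made for the example:

```python
import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes=16):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Each node owns several points on the ring, smoothing the load.
        for i in range(self.vnodes):
            self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    def lookup(self, key):
        # A key belongs to the first ring point clockwise from its hash.
        idx = bisect_right(self.ring, (self._hash(key), "")) % len(self.ring)
        return self.ring[idx][1]
```

Adding a node moves only the keys that now fall on the new node's ring points; every other key keeps its owner, which is what keeps key redistribution minimal during cluster resizing.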
The system enforces strong consistency through a distributed consensus algorithm. Quorum-based leadership election eliminates potential "split brain" scenarios, which ensures strong consistency of configuration data.
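The heart of quorum-based election is a strict-majority check, which is what rules out split brain: two disjoint partitions cannot both hold more than half the votes. A toy sketch (function names are hypothetical, and real consensus protocols involve terms, logs, and retries omitted here):

```python
from collections import Counter

def has_quorum(votes_received, cluster_size):
    """Strict majority: two disjoint groups cannot both exceed half."""
    return votes_received > cluster_size // 2

def elect_leader(votes, cluster_size):
    """votes maps voter -> candidate; return the majority winner, or None."""
    tally = Counter(votes.values())
    for candidate, count in tally.items():
        if has_quorum(count, cluster_size):
            return candidate
    return None  # no candidate reached quorum; no leader this round
```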
NDFS was designed from the ground up to be extremely fault-resilient. It ensures data availability in the event of a node, controller, or disk failure. NDFS uses a replication factor (RF) that keeps redundant copies of the data. Writes to the platform are logged in the PCIe SSD tier, which can be configured to replicate to another controller before the write is committed. If a failure occurs, NDFS automatically rebuilds data copies to maintain the highest level of availability.
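The write path described above amounts to "acknowledge only after the log entry is replicated." A toy sketch, with in-memory lists standing in for per-controller SSD write logs (names and signature are hypothetical):

```python
def replicated_write(entry, local_log, peer_logs, rf=2):
    """Append to the local write log plus rf-1 peer logs before acking.

    local_log and peer_logs stand in for per-controller SSD write logs;
    a real system would wait for durable, networked acknowledgments.
    """
    if len(peer_logs) < rf - 1:
        raise RuntimeError("not enough peers to satisfy the replication factor")
    local_log.append(entry)
    for peer in peer_logs[: rf - 1]:
        peer.append(entry)
    return "ack"  # committed only once all rf copies exist
```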
The platform is self-healing. Leveraging distributed MapReduce jobs, it proactively scrubs data to resolve disk or data errors. If a controller VM fails, all I/O requests are automatically forwarded to another controller VM until the local controller becomes available again. This Nutanix auto-pathing technology is completely transparent to the hypervisor, and guest VMs continue to run normally. In the case of a node failure, an HA event is automatically triggered and VMs fail over to other hosts within the cluster. Nutanix ILM localizes I/O operations by migrating data to the virtual machine's local controller VM. Simultaneously, data is re-replicated to maintain the configured replication factor and overall availability.
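The auto-pathing behavior reduces to "prefer the local controller, transparently fall back to a healthy remote one." A minimal sketch, assuming a toy `CVM` stub class that does not correspond to any real Nutanix API:

```python
class CVM:
    """Toy Controller VM stub (hypothetical, for illustration only)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def serve(self, request):
        return f"{self.name} served {request}"

def route_io(request, local_cvm, remote_cvms):
    """Auto-pathing sketch: use the local CVM if healthy, else fail over.

    The guest VM sees the same result either way, which is what makes
    the rerouting transparent.
    """
    for cvm in [local_cvm, *remote_cvms]:
        if cvm.healthy:
            return cvm.serve(request)
    raise RuntimeError("no healthy Controller VM available")
```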
NDFS provides built-in converged backup and disaster recovery (DR). The converged-backup capabilities leverage array-side snapshots and clones, which are performed using sub-block-level change-tracking at the VM and file level. The snapshots and clones are instantaneous, and thin provisioning maintains very low overhead. These capabilities also support hypervisor array-offload mechanisms, such as the VMware vStorage APIs for Array Integration (VAAI).
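What makes such snapshots and clones instantaneous and thin is that they copy metadata, not data: a new block map shares every unchanged block with its parent. A toy copy-on-write sketch (the `VDisk` class is invented for illustration and is not NDFS's on-disk format):

```python
class VDisk:
    """Toy copy-on-write virtual disk: block index -> block data."""

    def __init__(self, blocks=None):
        self.blocks = dict(blocks or {})

    def write(self, idx, data):
        self.blocks[idx] = data

    def read(self, idx):
        return self.blocks.get(idx)

    def snapshot(self):
        # Copies only the block map, never the block data: effectively
        # instantaneous, and thin because unchanged blocks stay shared.
        return VDisk(self.blocks)
```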
Snapshots can be configured on a standard schedule to align with recovery point and recovery time objectives (RPOs and RTOs), and can be replicated to remote sites using array-side replication. This replication is configurable at the VM level, and only the sub-block-level changes are shipped to the remote replication site.
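Shipping only the changed blocks means diffing the current block map against the last replicated snapshot. A minimal sketch of that delta computation (function name hypothetical; real change-tracking works from recorded write metadata rather than a full comparison):

```python
def changed_blocks(prev_snapshot, current):
    """Return only the blocks that differ from the previous snapshot.

    Both arguments are block maps (index -> data); the result is the
    sub-block delta that would be shipped to the remote site.
    """
    return {idx: data for idx, data in current.items()
            if prev_snapshot.get(idx) != data}
```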
A core design principle of the Nutanix platform is data localization. It keeps data proximate to the VM and allows write I/O operations to be localized on that same node. If a VM migrates to another host, for example through vMotion or a DRS-initiated migration, the data automatically follows the VM so it maintains the highest performance. After a certain number of read requests made by a VM to a controller that resides on another node, Nutanix ILM transparently moves the remote data to the local controller. The read I/O is then served locally instead of traversing the network.
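The "after a certain number of remote reads, migrate" rule can be sketched as a per-extent counter. The class name and the threshold value are assumptions for illustration; the source does not state the actual trigger count:

```python
class LocalityTracker:
    """Count remote reads per extent; signal migration past a threshold."""

    def __init__(self, threshold=3):  # threshold is hypothetical
        self.threshold = threshold
        self.remote_reads = {}

    def record_remote_read(self, extent):
        """Return True when the extent should migrate to the local controller."""
        count = self.remote_reads.get(extent, 0) + 1
        self.remote_reads[extent] = count
        return count >= self.threshold
```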
Nutanix incorporates heat-optimized tiering (HOT), which leverages multiple tiers of storage and optimally places data on the tier that provides the best performance. The architecture was built to support local disks attached to the controller VM (PCIe SSD, SSD, HDD) as well as remote (NAS) and cloud-based storage targets. The tiering logic is fully extensible, allowing new tiers to be dynamically added and extended. The Nutanix system continuously monitors data-access patterns to determine whether access is random, sequential, or a mixed workload. Random I/O workloads are maintained in an SSD tier to minimize seek times. Sequential workloads are automatically placed on the HDD tier, which preserves SSD endurance.
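The classify-then-place logic can be illustrated with two small functions. Both are simplifications of the rule stated in the text (random goes to SSD, sequential to HDD); the contiguity heuristic and function names are assumptions, and real ILM weighs many more signals:

```python
def classify_access(offsets):
    """Label an access stream sequential when most successive offsets
    are contiguous (a deliberately crude heuristic)."""
    if len(offsets) < 2:
        return "sequential"
    contiguous = sum(1 for a, b in zip(offsets, offsets[1:]) if b - a == 1)
    return "sequential" if contiguous >= len(offsets) // 2 else "random"

def place(io_pattern):
    """Simplified placement rule: random I/O -> SSD, sequential -> HDD."""
    return "ssd" if io_pattern == "random" else "hdd"
```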
The most frequently accessed data (hot data) resides on the highest performance tier (PCIe SSD). That tier is not just a cache; it is a truly persistent tier for both read and write operations. The next hottest data is placed on the SSD tier, which serves as spillover for the highest-performance tier (PCIe SSD), as well as QoS-controlled data. Cold data sits on hard disk drives, the highest-capacity, most economical tier.
NDFS array-side compression capabilities work in combination with Nutanix ILM. For sequential workloads, data is compressed during the write operation using in-line compression. For batch workloads, post-process compression adds significant value: data is compressed once it becomes idle and ILM has moved it down to the highest-capacity tier (HDD). Compression is configured at the container level but operates at a granular VM and file level. Decompression is done at the sub-block level to ensure precise granularity. The operations are monitored by the ILM process, which proactively moves frequently accessed, decompressed data up to a higher performance data tier.
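The compression policy above reduces to a small decision rule. This is a simplified restatement of the text, with a hypothetical function name; the real policy is container-level configuration evaluated by ILM:

```python
def compression_action(workload, idle, tier):
    """Pick a compression strategy per the stated policy.

    Sequential writes are compressed in-line; data that has gone idle
    and been demoted to HDD is compressed post-process; otherwise the
    data is left uncompressed for now.
    """
    if workload == "sequential":
        return "inline"
    if idle and tier == "hdd":
        return "post-process"
    return "none"
```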