Keeping data close to the compute has been one of the salient features of the Nutanix architecture and the Distributed Storage Fabric, or DSF. It provides several benefits, including inherent protection against workload interference, consistent performance, and efficient utilization of high-speed storage media such as SSD. As Nutanix has turned its focus towards the hypervisor, we have found opportunities to take advantage of DSF’s ability to quickly determine where any block of data resides at any given point in time. We chose to build a scheduler for AHV that has access to host telemetry and can communicate with the DSF to make placement decisions. The end result is that VMs running on AHV do not need their own resource management tools, and their owners can focus on running workloads.
Today at the .NEXT On-Tour event in Sydney, Australia, we are happy to announce plans for an improved scheduler, the Acropolis Dynamic Scheduler. Once released, it will extend traditional scheduling, which relies upon compute utilization (CPU/MEM) to make placement decisions, by using real-time compute and storage statistics to drive VM and volume (ABS) placement decisions.
There are two types of placement decisions:
- Initial placement – A VM’s or workload’s initial location on a host at power-on
- Runtime optimization – Movement of workloads based upon runtime metrics
Current versions of the Acropolis Scheduler make initial VM and workload placement decisions. The Acropolis Dynamic Scheduler, planned to debut in the Asterix release, will provide runtime resource optimization.
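To make the distinction between the two decision types concrete, here is a minimal sketch in Python. It is not the Acropolis implementation: the `Host` class, the `initial_placement` and `hosts_needing_optimization` functions, and the 0.85 threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    cpu_used: float  # fraction of CPU in use, 0.0 to 1.0
    mem_used: float  # fraction of memory in use, 0.0 to 1.0


def initial_placement(hosts: List[Host], cpu_need: float, mem_need: float) -> Optional[Host]:
    """Initial placement: at power-on, pick the host that stays least loaded after adding the VM."""
    fits = [h for h in hosts
            if h.cpu_used + cpu_need <= 1.0 and h.mem_used + mem_need <= 1.0]
    if not fits:
        return None  # no host has room for this VM
    return min(fits, key=lambda h: max(h.cpu_used + cpu_need, h.mem_used + mem_need))


def hosts_needing_optimization(hosts: List[Host], threshold: float = 0.85) -> List[Host]:
    """Runtime optimization trigger: hosts whose CPU or memory use exceeds the (assumed) threshold."""
    return [h for h in hosts if h.cpu_used > threshold or h.mem_used > threshold]


hosts = [Host("host-a", 0.95, 0.50), Host("host-b", 0.30, 0.40), Host("host-c", 0.60, 0.70)]
chosen = initial_placement(hosts, cpu_need=0.10, mem_need=0.10)
hot = hosts_needing_optimization(hosts)
print(chosen.name, [h.name for h in hot])
```

In this toy cluster the new VM lands on the least-loaded host, while the runtime check separately flags the overloaded host as a candidate for rebalancing.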
The figure below depicts a high-level view of the scheduler architecture:
A Closer Look at Placement Decisions
Several key factors can inform the decision to non-disruptively migrate a VM from one host to another. The following are some of the factors the Acropolis Dynamic Scheduler (ADS) may consider:
- Compute (CPU/MEM)
  - CPU utilization
  - Memory utilization
  - Resource contention
  - Thresholds and/or watermarks for compute metrics
- Storage performance
  - Stargate process utilization
  - vDisk ownership
  - Location of ABS volumes
- [Anti-]Affinity rules
  - User-defined policies for VM location
  - Grouping VMs together
  - Grouping VMs separately
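As a rough illustration of how factors like these might be combined, the sketch below filters candidate hosts by an anti-affinity rule and a compute watermark, then ranks the rest by compute load, storage process load, and data locality. Everything here is an assumption made for the example: the `HostStats` and `VM` classes, the `pick_target` function, the 0.85 watermark, and the 0.2 locality penalty are hypothetical and do not describe how ADS actually weighs its inputs.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class HostStats:
    name: str
    cpu_used: float      # fraction of CPU in use, 0.0 to 1.0
    mem_used: float      # fraction of memory in use, 0.0 to 1.0
    stargate_cpu: float  # hypothetical metric: CPU fraction used by the storage process
    vm_groups: Set[str] = field(default_factory=set)  # anti-affinity groups already on this host


@dataclass
class VM:
    name: str
    anti_affinity_group: Optional[str]  # VMs in the same group must run on separate hosts
    vdisk_owner: str                    # host that currently owns this VM's vDisks


def pick_target(vm: VM, hosts: List[HostStats], watermark: float = 0.85) -> Optional[HostStats]:
    """Return the lowest-cost host that satisfies the hard constraints, or None."""
    candidates = []
    for h in hosts:
        # Hard constraint: never co-locate VMs from the same anti-affinity group.
        if vm.anti_affinity_group and vm.anti_affinity_group in h.vm_groups:
            continue
        # Hard constraint: skip hosts already above the compute watermark.
        if h.cpu_used > watermark or h.mem_used > watermark:
            continue
        candidates.append(h)
    if not candidates:
        return None

    def cost(h: HostStats) -> float:
        # Soft preference: favor the host that already owns the VM's vDisks.
        locality_penalty = 0.0 if h.name == vm.vdisk_owner else 0.2
        return h.cpu_used + h.mem_used + h.stargate_cpu + locality_penalty

    return min(candidates, key=cost)


cluster = [
    HostStats("n1", 0.5, 0.5, 0.2, {"db"}),
    HostStats("n2", 0.4, 0.5, 0.1),
    HostStats("n3", 0.4, 0.4, 0.1),
]
target = pick_target(VM("db-2", "db", vdisk_owner="n2"), cluster)
print(target.name)
```

Here `n1` is ruled out because it already hosts a VM from the same anti-affinity group, and `n2` wins over the slightly less loaded `n3` because it already owns the VM's vDisks. Treating affinity and watermarks as hard filters and locality as a soft cost is one possible design choice among many.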
There are many factors to consider when moving a workload, and we are only scratching the surface of what’s possible. It is exciting work for our Engineering team because it is a step in the journey towards true commoditization of data center infrastructure. The goal is for IT administrators to treat all nodes equally and to expect nodes to behave the same way during upgrades, failures, and normal operations. Unbalanced workloads mean more work for the admin and more work for the application teams, who need to design ways to compensate for unbalanced infrastructure resources. With the Nutanix Enterprise Cloud powering your data center, you will have a system that works for you and your workloads. Please join us on the Next Community site and tell us what you think.
Special thanks to Steven Poitras, Nutanix Principal Architect, who created the diagram above and provided the technical details for our resource placement designs. If you are interested in taking a deep dive into the world of Nutanix Acropolis and Prism, please check out Steven’s Nutanix Bible.