In his keynote at our .Next conference in Anaheim a few weeks ago, Dheeraj highlighted the challenge of managing upgrades as our portfolio grows, while maintaining our hard-earned reputation for keeping operations simple.
"We were doing upgrades on two things - AOS and Prism - five or six years ago. This is the paradox of growth -- that growth creates complexity. Complexity kills growth."
"Look at upgrades in general, the growth of upgrades has gone from like being half-an-hour operations to at some of our customers' in a large cluster, four or five hours.
But this is the burden of responsibility. It's really taking care of all the servers, all the firmware, all the hypervisors - all doing it without downtime."
"This conference is about really getting all that feedback from all of you to say, 'What does it mean to be awesome on the most complex of things?' Because our canvas is no more the simple canvas that we had five years ago.
Our canvas is much more complicated and the bar for Nutanix has to be about bringing the same kind of delight from our existing customers and our new customers, who actually don't see this complexity in front of them, but to go and make this 10x faster -- like the way it was five years ago -- is what this conference should be about."
Back to the drawing board for 1-click
In early 2016, it was clear that we needed a new way to keep our promise of simple 1-click upgrades while our portfolio was expanding. It was becoming apparent that the trusted old design, where the upgrade code and logic were built into AOS, was not going to cut it. We could not claim to offer ‘1-click’ when every product or component had its own isolated ‘1-click’ that could only upgrade one component at a time. On top of that, as the portfolio expands, so does the dependency matrix between components.
We needed velocity, agility and the ability to bundle upgrades: all things that are difficult to provide when upgrade logic is tied to a particular version of AOS, or even the hypervisor kernel. Some customers do not upgrade their data or management plane often, yet urgent software and firmware fixes (think Spectre/Meltdown) and security patches were coming thick and fast. It is a different world security-wise these days; being slow is no longer acceptable.
There had to be a better way.
We took a leaf from the consumer design handbook and started to treat upgrades like an application, just as smartphones do. Imagine a world where you had to upgrade Apple iOS every time a developer released a new version of Google Chrome or Microsoft Outlook: it would not be tolerated. Yet this is what enterprise IT had become. Want the latest Spectre fixes? Upgrade your hypervisor kernel or data plane first so we can add the capability to apply them. No thank you. That is not our way at Nutanix.
LCM - the ‘new’ 1-click for Nutanix
Nutanix Life Cycle Manager (LCM) was created to be that ‘app’. Initially, in early 2017, we focused LCM on firmware upgrades for the NX platform, a gap in our offering at the time. Support for the OEM offerings from Dell (XC) and Lenovo (HX) has followed since, and the HPE DX series will be added when that platform goes GA in calendar Q3 2019. That means at least four hardware vendors’ offerings running Nutanix software will be able to ‘1-click’ firmware upgrades in 2019.
Looking back, we underestimated the difficulty of providing a true ‘1-click firmware’ button for multiple manufacturers: a “10x” problem in itself, and one that no other company has tackled in the name of customer choice. The ‘1-click’ experience should be as consistent as possible regardless of hardware make.
We could have taken the easy road and had LCM handle our own software upgrades first, something we already knew how to do, but the lessons learned in the fires of firmware upgrades will make the delivery of our software upgrades far better than we could have imagined.
Two years after the first version of LCM was released, we’ve covered the majority of our customers’ deployed hardware platforms for firmware upgrades, and development continues to cover all OEM partnerships.
The LCM journey has had bumps in the road, such as the quality of manufacturers’ boot device firmware and the annoying multiple reboots required by BIOS updates carrying Spectre/Meltdown fixes, which called the reputation of LCM into question. Some lament the time taken to apply such upgrades to their large clusters. Naturally, we took on the burden of responsibility for the mechanism that applies these manufacturers’ firmware, and took the blame regardless of the cause. When your iPhone hardware breaks, you take it back to Apple, regardless of which manufacturer Apple uses for the battery or the memory. Fair enough too!
Without these bumps, we would not have made the LCM Framework design changes that help overcome external factors outside our control, and the result is an improved design for the next stage of the LCM journey: software upgrades.
LCM is now ready for Software Upgrades
In 2019, LCM’s focus has shifted to the software upgrades that need to move off the old one-at-a-time 1-click built into AOS/Prism, specifically to tackle the complexity in our own product line that Dheeraj described. We want to mirror the cloud experience, where keeping clusters up to date is invisible: no reading release notes, no downtime, no problems. The only way to deliver that is to decouple the upgrade logic from the entities being upgraded: from the data and management planes, and from the hypervisor kernel.
Being an ‘app’, LCM can upgrade itself whenever necessary, regardless of the data plane (AOS), management plane (Prism), or Hypervisor version running on the customer cluster. This decoupling of the upgrade logic from those entities brings a lot of flexibility.
We already support software such as Calm and Karbon via LCM in Prism Central, and we will soon add Buckets as well. On the Prism Element side, we will start transitioning NCC and AHV upgrades to LCM in the coming weeks.
Introducing LCM 2.2
The latest version of LCM is the first to have its UI completely decoupled from Prism. This means we can update the UI, including messages and UX layout, at any time, refreshing it without disruption every time we upgrade the LCM ‘app’. You still access LCM via Prism, but LCM is now in control of its own appearance and function regardless of the Prism/AOS version on the cluster.
As such, in LCM 2.2 you will see a completely redesigned UI offering different views, an export function, better messaging on tasks, upgrade status and work plans.
One of the core tenets of LCM is to remove the need for IT admins to worry about reading release notes, so dependency handling, and how we display it, is improving as well.
Automatic Dependency Handling
The magic of LCM is in how we handle any component upgrade that has a requirement on another entity. For example, a BIOS upgrade from a particular hardware manufacturer may also require the associated BMC to be upgraded. LCM not only highlights this but also determines the correct order in which to apply the upgrades.
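The post does not say how LCM computes this order internally. As a minimal sketch, assuming each component simply lists the components that must be upgraded before it, the ordering problem is a topological sort, which Python’s standard `graphlib` module provides (Python 3.9+). The `REQUIRES` map, the `upgrade_order` helper and the component names below are hypothetical illustrations, not LCM entities or APIs.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each component lists the components that
# must be upgraded BEFORE it. Names are illustrative only.
REQUIRES = {
    "BIOS": {"BMC"},           # e.g. a BIOS update that needs a newer BMC first
    "BMC": set(),
    "NIC firmware": set(),
    "Disk firmware": {"BMC"},
}

def upgrade_order(selected, requires):
    """Return an upgrade order for the selected components plus prerequisites."""
    graph = {}
    pending = list(selected)
    # Pull in prerequisites transitively so nothing required is left out.
    while pending:
        name = pending.pop()
        if name in graph:
            continue
        deps = requires.get(name, set())
        graph[name] = deps
        pending.extend(deps)
    # static_order() yields each component only after all of its prerequisites.
    return list(TopologicalSorter(graph).static_order())

print(upgrade_order(["BIOS"], REQUIRES))  # → ['BMC', 'BIOS']
```

Selecting only the BIOS still pulls in the BMC and places it first. If the declared requirements ever form a cycle, `static_order()` raises `graphlib.CycleError`, which a tool like this would need to report rather than guess an order.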
This capability is demonstrated in the LCM 2.2 demo on YouTube.
In all cases, the creator of the component upgrade ‘module’ in LCM defines the requirements for its upgrade process, and LCM orchestrates it on behalf of the admin. This includes ‘bundling’ upgrades together, or combining multiple reboots into one if the module creator allows it.
OEMs can define their own ‘recipes’ for LCM, deciding the order of upgrades, whether individual upgrades are allowed, and other compliance rules.
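The post does not describe how module creators or OEMs express these rules, so the following is only a minimal sketch under assumed data structures. `Module`, `Recipe`, `plan` and the component names are hypothetical. It shows one way a recipe’s fixed order, an “individual upgrades allowed” rule, and a per-module permission to share reboots could combine into an upgrade plan with fewer reboots.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Module:
    """Hypothetical description a module creator supplies for one component."""
    name: str
    needs_reboot: bool = False
    allow_shared_reboot: bool = False  # creator allows its reboot to be combined

@dataclass(frozen=True)
class Recipe:
    """Hypothetical OEM recipe: a fixed upgrade order plus a compliance rule."""
    order: tuple[str, ...]
    allow_individual: bool = True  # False: all components must be upgraded together

def plan(selected, modules, recipe):
    """Return ordered steps: ("upgrade", name) or ("reboot", (names, ...))."""
    chosen = set(selected)
    unknown = chosen - set(recipe.order)
    if unknown:
        raise ValueError(f"not covered by this recipe: {sorted(unknown)}")
    if not recipe.allow_individual and chosen != set(recipe.order):
        raise ValueError("this recipe only allows upgrading all components together")
    steps, deferred = [], []  # deferred: upgrades waiting for a shared reboot
    for name in recipe.order:  # the recipe, not the admin, fixes the order
        if name not in chosen:
            continue
        module = modules[name]
        if module.needs_reboot and not module.allow_shared_reboot:
            # Settle deferred reboots first so this module's reboot stays solo.
            if deferred:
                steps.append(("reboot", tuple(deferred)))
                deferred = []
            steps.append(("upgrade", name))
            steps.append(("reboot", (name,)))
            continue
        steps.append(("upgrade", name))
        if module.needs_reboot:
            deferred.append(name)
    if deferred:
        steps.append(("reboot", tuple(deferred)))  # one reboot for the batch
    return steps

MODULES = {m.name: m for m in [
    Module("BMC"),
    Module("BIOS", needs_reboot=True, allow_shared_reboot=True),
    Module("HBA", needs_reboot=True, allow_shared_reboot=True),
    Module("SATADOM", needs_reboot=True),  # creator requires its own reboot
]}
RECIPE = Recipe(order=("BMC", "BIOS", "HBA", "SATADOM"))

for step in plan(["BIOS", "HBA", "BMC"], MODULES, RECIPE):
    print(step)
```

Here the BIOS and HBA upgrades both need a reboot, but because both modules permit sharing, the plan ends with a single reboot covering the two. With a recipe whose `allow_individual` is `False`, selecting only part of it raises an error instead of producing a plan.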
At the end of the day, the aim is to provide consistency and reliability of upgrades in a true 1-click fashion.
Still more to do
Until we complete the entire portfolio’s transition to LCM, the job is not done. Even then, LCM needs to grow to cover items such as multicluster upgrades, node movement across clusters, maintenance mode and other utility operations, shorter upgrade times, better dark site options, rebootless upgrades, and extending the auto-upgrade feature now available for the LCM Framework to other components. The 10x challenge continues! Perhaps one day we will open up LCM module development to third parties to expand 1-click beyond Nutanix infrastructure.
To achieve such improvements at scale, any distributed system must make the actions within it smaller: for example, features and payloads with a smaller footprint, deployed via containers. This is why Nutanix has started on the MSP/Kubernetes journey, and we will continue to explore ways to use such technologies to deliver upgrades with more velocity. AOS itself must get smaller; the intelligent storage services we provide via the data plane should themselves be just a ‘service’ on top of the CVM, remaining completely outside the hypervisor kernel and cutting upgrades and new storage feature deployments to seconds instead of minutes.
The days of 3-tier infrastructure upgrades taking months and costing tens of thousands of dollars must end. The days of bloated hypervisors and associated products, where admins are afraid to conduct upgrades because of dozens of dependency steps, must end. Those things are not cloud-like. Nutanix and LCM are striving to end all this pain.
As always, we value your feedback and suggestions, so please keep them coming. Ultimately, we want to make operations of the Nutanix suite, on-premises or off-premises, truly ‘1-click’. Please continue to keep us honest and help us get there. And thank you to our customers who’ve built LCM along with us.
Forward Looking Disclaimer
This blog post includes forward-looking statements concerning our plans and expectations relating to new product features and technology that are under development, the capabilities of such product features and technology and our plans to release product features and technology in future releases. These forward-looking statements are not historical facts, and instead are based on our current expectations, estimates, opinions and beliefs. The accuracy of such forward-looking statements depends upon future events, and involves risks, uncertainties and other factors beyond our control that may cause these statements to be inaccurate and cause our actual results, performance or achievements to differ materially and adversely from those anticipated or implied by such statements, including, among others: the introduction, or acceleration of adoption of, competing solutions, including public cloud infrastructure; a shift in industry or competitive dynamics or customer demand; and other risks detailed in our quarterly report on Form 10-Q filed with the Securities and Exchange Commission. These forward-looking statements speak only as of the date of this blog post and, except as required by law, we assume no obligation to update forward-looking statements to reflect actual results or subsequent events or circumstances.
© 2019 Nutanix, Inc. All rights reserved. Nutanix, the Nutanix logo and the other Nutanix products and features mentioned herein are registered trademarks or trademarks of Nutanix, Inc. in the United States and other countries. All other brand names mentioned herein are for identification purposes only and may be the trademarks of their respective holder(s).