UI Alerts and Statistics

One of the most common questions we hear about the management UI for a massively scalable, converged cluster is how we manage a very large and complex product that could have hundreds or thousands of entities. The Nutanix UI team has taken on this challenge with an easy-to-use, well-designed web console that lets users manage entities and view information about them in an organized manner.

We’ll be discussing two UI elements in this blog: Alerts and Statistics.

 

Alerts

Our design philosophy is UI alert by exception. We don’t display a “normal” status label because, in reality, we don’t expect the admin to look at the UI and check whether every entity is healthy. We only show an alert when one is needed. The alert system is easy to see and catches your attention right away. We make sure alerts are not so plentiful that they clutter the screen, and we try to keep them organized. Here’s a glimpse of our alert system:

Alerts appear at the top. When users click the alert icon, a drop-down appears. We included action buttons so users can click to indicate that a notification has been addressed.

Our object-oriented UI also shows the alerts on the icon and in the context tray menu. The example above shows that the disk is bad.

We also show alerts on the datagrid right away with specific details.
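
To make "alert by exception" concrete, here is a rough sketch of the idea in Python. It is not the console's actual code: the Entity class, the status values and the severity ranking are all hypothetical names chosen for illustration. The point is simply that healthy entities produce no output at all.

```python
from dataclasses import dataclass

# Hypothetical severity ranking (an assumption); lower number is shown first.
SEVERITY = {"critical": 0, "warning": 1}


@dataclass
class Entity:
    name: str
    kind: str    # e.g. "disk", "host", "container"
    status: str  # assumed values: "normal", "warning", "critical"


def alerts_by_exception(entities):
    """Return only the entities that need attention, most severe first."""
    flagged = [e for e in entities if e.status in SEVERITY]
    return sorted(flagged, key=lambda e: (SEVERITY[e.status], e.name))


# Example: two healthy entities are skipped, two unhealthy ones surface.
cluster = [
    Entity("disk-1", "disk", "normal"),
    Entity("disk-2", "disk", "critical"),
    Entity("host-1", "host", "warning"),
    Entity("host-2", "host", "normal"),
]
for entity in alerts_by_exception(cluster):
    print(f"{entity.status.upper()}: {entity.kind} {entity.name}")
```

Filtering before display is what keeps the screen uncluttered: a "normal" label never needs to be rendered.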

 

Statistics

Statistical graphs are an important part of any system for troubleshooting and diagnosing problems. We built charts for the metrics we thought would matter most for troubleshooting:

 

1. Usage Chart

We’ve made a Google Finance-like charting system that lets you slide through the storage usage history of hosts, disks, containers and storage pools. Users can select different time frames to look at, zooming in and out of the graph to the level of granularity needed for either diagnosing a problem or just tracking storage usage history.
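
One simple way to think about zooming is as re-bucketing: pick a time window, split it into a fixed number of buckets, and average the samples that fall in each. The sketch below illustrates that idea only. The function name, the (timestamp, value) sample format and the averaging choice are assumptions, not a description of how the console's charts are built.

```python
def zoom(samples, start, end, buckets):
    """Average (timestamp, value) samples in [start, end) into equal buckets.

    Returns a list of (bucket_start, average) pairs; a bucket with no
    samples gets an average of None.
    """
    width = (end - start) / buckets
    sums = [0.0] * buckets
    counts = [0] * buckets
    for ts, value in samples:
        if start <= ts < end:
            # Clamp guards against floating-point rounding at the right edge.
            i = min(int((ts - start) / width), buckets - 1)
            sums[i] += value
            counts[i] += 1
    return [
        (start + i * width, sums[i] / counts[i] if counts[i] else None)
        for i in range(buckets)
    ]


# Example: four usage samples collapsed into two coarser points.
usage = [(0, 10), (1, 20), (2, 30), (3, 40)]
print(zoom(usage, 0, 4, 2))
```

Zooming in means a narrower window with the same bucket count, so each point on the graph covers less time.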

2. ILM Chart (Information Lifecycle Management)

Our ILM charts break down IOPS by the different drive types (SSD and HDD) in the system.

3. Provisioned Capacity and Usage Beaker Chart

This one shows the provisioned capacity of the system as well as actual usage.

Hope you enjoyed this sneak peek into the new alerts and stats functionality coming in our next UI release, and we look forward to sharing more ways we’re trying to empower the virtualization admin!

Posted in UI Design | Leave a comment

In his own words…customer Josh O’Brien shares his experience

Josh O’Brien on Nutanix: “So far I love the platform.  It gave me what I needed in the price point I needed and offers huge scale out options considering it is based of the GFS files system that Google uses across their DC’s… Spinning up my first production ESXi Cluster of (4) Servers, Sphere and provisioning the storage was a bit nerve-racking.  Its a HUGE statement for all the vendors that it was actually pretty painless.  Maybe even a bit to much so.”

Josh O’Brien from Language Access Network wrote about installing Nutanix last week. Read about it here: http://www.staticnat.com/WP/2012/02/07/fake-it-till-ya-make-it/#more-516

What we, at Nutanix, would like to point out from this post is that Josh referred to us as partners. Building a long-term relationship with our customers and offering world-class customer support remains one of our biggest goals.

Look forward to a case study in the coming weeks on LAN and using Nutanix for building a private cloud!

Posted in Uncategorized
Partha Ramachandran

VDI Series: Part 4 – Manageability

In my previous blog post, I talked about how the Nutanix architecture is designed to rapidly scale while maintaining high performance, enabling organizations to grow their VDI deployments. In this post, I will discuss manageability, an often under-represented facet that is key in making a VDI deployment successful. With large VDI deployments, it is crucial to enable organizations to focus on managing virtual desktops, rather than having to worry about allocation of compute and storage infrastructure resources for these virtual desktops.

Steve Jobs has set a very high bar for manageability. I am a die-hard Apple fan, and firmly believe that the device (or appliance in this case) should do whatever the user wants easily and without fuss. It should be visible and available when needed, and simply get out of the way when not.

At Nutanix, we aspire to these same manageability goals. From the get-go, the Nutanix architecture has a distinct advantage in that our platform converges compute and storage. This means that the administrator doesn’t have to worry about multiple instances of storage arrays, and the monolithic pieces of management software that have to be installed on a single management station / desktop. Nutanix gives the admin a single pane of glass that provides visibility to one single system that grows over time to fit the needs of the organization.

At the same time, since the Nutanix architecture greatly simplifies the infrastructure for virtualizing by converging compute and storage together into a single tier, we’re able to streamline existing management processes.

Knowing our end user is a VMware admin these days, we’ve taken an approach that streamlines and simplifies their workflows. For example, VMware’s tools are where VMware admins spend their time, so rather than disrupt this workflow, we simply stay out of the way. Rather than reinventing the wheel around these VM management workflows, we defer to VMware on the front-end and instead integrate into the VMware stack on the backend.

Examples include:

- VMFS Support: VMFS datastores are created by default on the Nutanix Complete Cluster. Admins can thus create and manage their virtual machines using standard VMware tools.

- NFS Support: We are about to release support for NFS datastores on Nutanix. With this, VMware will see the Nutanix backend via a standard NFS datastore. See screenshots for how this integrates into the VMware client.

- VAAI: With NFS support, we will also release support for VAAI. This allows VMware admins to seamlessly leverage Nutanix snapshots from the VMware console.

- SRM Integration: We are working on Backup and Disaster Recovery integration into VMware SRM. (Data Protection is a whole different topic, which I will cover in a subsequent blog post.)

So how does this make life easier for VDI admins? VDI admins are accustomed to managing their deployments using a VDI management tool, the two market leaders being VMware View and Citrix XenDesktop. Nutanix integration into the VMware stack enables VDI admins to focus on virtual desktops rather than worrying about the underlying storage management as they would in traditional server and SAN infrastructure solutions.

Having said all of the above about being invisible when we have to be, the Nutanix Console is available when an admin needs it. The Console offers a rich set of functionality that the admin can access when needed:

- Analyzing System Bottlenecks
- Storage usage and forecasting
- Measuring the compute and storage footprint of various applications
- Alerts on various system events
- Managing failed drives
- System expansion and scale-out
- Call Home and Remote Support (More on this in a subsequent blog)

In summary, the key manageability benefits of the Nutanix Complete Cluster are a) seamless integration with the VMware front-end workflow and b) simplified configuration, management and troubleshooting for the backend. The Nutanix Complete Cluster removes the complexity in managing a virtualized datacenter, and allows an organization to focus on enabling end users with what they need to be productive, whether the application is delivered via virtual desktop or streamed.

Posted in Uncategorized

Nutanix Sales Kick Off and Mad Men Spoof

We had a great Sales Kickoff last week. In preparation for it, Marketing created a quick video for our sales team as a fun way to start off the training. Using Mad Men as our inspiration, we introduced Nutanix as the quick and easy way to set up your own private cloud. Check out our video and let us know what you think!

http://www.youtube.com/watch?v=p3Qq6QU1WOk

Posted in Private Cloud, Sales Kickoff

Convergence with Scott Adams

Converged architectures coming out of force-fit consortiums are just that — expensive toxic blobs. You pay through your nose, and what you get is glorified professional services and an overstretched one-throat-to-choke argument for support. The mid-market has rightfully rejected the gimmicks. Scott Adams, the God-of-all-McKinsey-strategists, saw this ahead of us when some Virtualization-Challenged-Eclecticists came to sell $1M equipment to Asok!

Today, everyone and their mother want to call their products “converged.” Moving beyond the hype will require some bolder measures: collapsing complex networks, consolidating application and storage servers, simplifying datacenters, and converging skill sets. When such convergence is prevalent, Scott will be bored enough to write about other toxic blobs such as all-flash-arrays.

Posted in Technology Trends
Partha Ramachandran

VDI Series: Part 3 – Incremental Scalability

In my previous blog post, I talked about how the Nutanix architecture, built ground-up for virtualization, is a perfect fit for the unique performance needs of VDI. In this post, I would like to cover another pain point of VDI – incremental scalability.  It’s commonly known that many VDI deployments fail when trying to scale beyond 200 desktops. In this post, I’d like to cover some of the reasons for this failure, and then go into how the Nutanix architecture is designed to rapidly scale while maintaining high performance, enabling organizations to grow their VDI deployments, as needed, one node at a time.

Let’s look at some of the reasons why VDI deployments fail to scale when using the standard server + SAN approach:

Many VDI deployments do not have a dedicated storage array. Typically, the SAN is a common resource to all virtualization initiatives in an organization. From the get-go, this approach can be a recipe for disaster. SAN resources are shared by VDI workloads as well as server workloads, which have different performance profile requirements. There is no effective way to guarantee any kind of QoS for the initial VDI deployment, let alone scale the VDI deployment effectively.

If a VDI deployment does indeed get a budget to have its own SAN, the SAN is sometimes under-provisioned because it’s based on the initial number of planned desktops. When the VDI deployment needs to scale, the current SAN performance and capacity are no longer sufficient, and another storage array needs to be wheeled in. Adding a storage array not only makes the CAPEX go through the roof, but also increases the management complexity of the VDI deployment.

In the most common case, SAN resources are over-provisioned. The idea here is that the SAN is a one-time purchase, and the organization can add servers to the fabric as the deployment scales. There are two common resulting problems with this approach:

- The organization is forced to take a huge upfront CAPEX hit, which then drives up the overall cost of the project and the time it takes to realize the ROI
- The interconnect becomes the bottleneck. Typically the array is on the other side of the core switch. The core switch is heavily oversubscribed, and therefore the interconnect between the servers and the storage becomes a huge bottleneck.

The real solution to delivering incremental scalability is to keep the compute right next to the storage, which is key to the Nutanix approach for achieving higher performance with greater simplicity. A single Nutanix block houses four nodes, and organizations can add single nodes to their Nutanix cluster to continue to grow their VDI by 50-100 desktops depending on the user profile (task/knowledge worker/power user).
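
As a back-of-the-envelope illustration of that node-at-a-time growth, the sketch below turns the figures quoted above (four nodes per block, 50-100 desktops per node depending on user profile) into a simple sizing calculation. The function names are made up for this example, and real sizing would depend on the actual workload.

```python
import math

NODES_PER_BLOCK = 4  # from the post: a single Nutanix block houses four nodes


def nodes_needed(desktops, desktops_per_node):
    """Nodes required for a desktop count at a given per-node density."""
    return math.ceil(desktops / desktops_per_node)


def blocks_needed(nodes):
    """Four-node blocks required to house the given number of nodes."""
    return math.ceil(nodes / NODES_PER_BLOCK)


# Example: 1,000 desktops at the low and high ends of the 50-100 range.
for density in (50, 100):
    nodes = nodes_needed(1000, density)
    print(f"{density} desktops/node: {nodes} nodes, {blocks_needed(nodes)} blocks")
```

Because capacity is added per node, growing from 1,000 to 1,001 desktops at the higher density costs one more node, not another storage array.
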
Some of the key attributes of the Nutanix architecture that deliver this incremental scalability are described below:

Shared Nothing Distributed Architecture: The Nutanix Cluster is a pure distributed system. This means that the compute gets its storage locally, and does not need to traverse the network. All the problems around network, interconnect and core switch bottlenecks are avoided completely. IT organizations can add single Nutanix nodes or blocks of four nodes, as needed to achieve linear scaling in performance and capacity.

Distributed Metadata : A big problem affecting scalability in a traditional SAN is metadata access. Most storage arrays have 1-4 controller heads and all metadata access needs to go through these heads. This causes contention and performance drops as more servers try to access the same storage array.

In the Nutanix Cluster, metadata is maintained in a truly distributed and decentralized fashion. Each of the nodes in the Nutanix Cluster maintains a part of the global metadata, which means that there is no single bottleneck for metadata access.
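
This post does not describe how the metadata is actually partitioned across nodes. Purely as an illustration of how each node can own a slice of a global keyspace, here is a minimal sketch of consistent hashing, a common technique for this kind of problem that is swapped in here as an assumption. The MetadataRing class, the node names and the key names are all hypothetical.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class MetadataRing:
    """Toy consistent-hash ring: each node owns many small key ranges."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted list of (ring position, node name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Several virtual points per node spread ownership more evenly.
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def owner(self, key):
        """Return the node owning the first ring point at or after the key."""
        i = bisect.bisect_left(self._points, (_hash(key), ""))
        if i == len(self._points):
            i = 0  # wrap around the ring
        return self._points[i][1]


ring = MetadataRing(["node-1", "node-2", "node-3"])
print(ring.owner("vdisk-42"))
```

The useful property for scale-out is that adding a node only takes keys over from existing owners; keys never shuffle between the old nodes, so no single node has to see all metadata traffic.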

Distributed Metadata Cache : Most storage arrays do a lousy job of maintaining metadata caches. In storage arrays that do have a metadata cache, the caches live on the limited number of controller heads. Access to the cache is limited by the bottlenecks around the network, interconnect and switch as discussed above.

In the Nutanix Cluster, metadata is cached on each of the controller VMs. Most metadata access is served up by cache lookups. Each controller VM maintains its own cache.  This means that however large the Nutanix Cluster grows as you continue to add nodes, the cost of metadata access stays the same.

Lock-free Concurrency Model :  The standard approach to ensuring correctness for metadata access is to use locking. Unfortunately, in a distributed system, locking can become tricky. Excessive locking ensures correctness, but causes performance to drop like a rock.

The Nutanix Cluster implements an innovative lock free concurrency model for metadata access. This model ensures the correctness of metadata but at the same time ensures high performance.

Distributed MapReduce for data/metadata consistency : For a large scale deployment, consistency checking for data and metadata becomes a challenge. The Nutanix Complete Cluster implements a fully distributed map-reduce algorithm to ensure data and metadata consistency. The distributed nature of the map-reduce ensures that there is no single bottleneck in the system. MapReduce has been shown by Google to scale to 1000s of nodes, and is a key ingredient in the incremental scalability of the Nutanix Cluster.
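
To show the general shape of a map-reduce consistency check, here is a toy, single-process sketch. It is not the Nutanix algorithm, which this post does not detail. The data layout (each node reporting a checksum per extent) and all function names are assumptions made for illustration. The map step emits one record per extent replica, and the reduce step flags extents whose replicas disagree.

```python
from collections import defaultdict


def map_phase(node_name, extents):
    """Emit (extent_id, (node, checksum)) for every extent a node holds."""
    for extent_id, checksum in extents.items():
        yield extent_id, (node_name, checksum)


def reduce_phase(extent_id, records):
    """An extent is consistent when all of its replicas share one checksum."""
    checksums = {checksum for _, checksum in records}
    return extent_id, len(checksums) == 1


def check_consistency(cluster):
    """Run map, group by extent id (the shuffle), then reduce."""
    grouped = defaultdict(list)
    for node_name, extents in cluster.items():
        for key, value in map_phase(node_name, extents):
            grouped[key].append(value)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Example: extent e2 has diverging replicas on the two nodes.
cluster = {
    "node-1": {"e1": "aa", "e2": "bb"},
    "node-2": {"e1": "aa", "e2": "cc"},
}
print(check_consistency(cluster))
```

In a real distributed run, each node would execute the map step over its own local data and the reduce work would be spread by key, which is what avoids a single bottleneck.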

Distributed Extent Cache : Caching is not a new concept to storage arrays. The challenge, however, is that caches are located on the limited number of storage controllers in a storage array. Not only does the limited number of storage controllers cause contention, but the fact that these caches live in the storage array means that access to cached data needs to traverse the core switch. This brings with it the latencies around the network, interconnect and switch that were discussed above.

In my previous blog post, I discussed the extent cache, which caches the data served up by the controller VM. The key aspect of this cache is that it lives on the controller VM. This means that the compute tier can access the cache locally, without having to hop across the network and the core switch. This approach allows the Nutanix Cluster to incrementally scale with ease, while maintaining high performance.

My next blog post will focus on manageability. With large VDI deployments, it is crucial to let organizations focus on managing virtual desktops, rather than have to worry about compute and storage. In a traditional server + SAN model, organizations need to have trained personnel to manage servers, storage arrays, switches, zones etc. The Nutanix Cluster removes the complexity in managing a virtualized datacenter, and allows an organization to focus on using VDI towards their core business.

Posted in Uncategorized, VDI
Shirish Sathaye

A New Building Block

As part of Khosla Ventures, I’m privileged to be involved with disruptive technologies very early on, sometimes at a stage when it may be hard to fully see the impact that a technology may have on the market. When I met with the founders of Nutanix, I saw very quickly that this was not one of those situations.

The penetration of virtualization has driven server consolidation and enabled enterprises to run diverse applications on the same physical server, and the resulting increased utilization of servers has put new, immense pressure on storage performance. Many startups are tackling this space, but it’s clear that incremental improvements to today’s storage technologies are inadequate in enabling enterprises to realize the full potential of virtualization.

A complete rethinking of the storage layer is needed.

Nutanix is the only company that has cracked the code on this problem while the incumbents are asleep at the wheel. In the late 1990s, I left FORE Systems, which built ATM switches, for Alteon Websystems, as Ethernet switches and Server-Load-Balancers became fundamental for the construction of the Internet, with TCP/IP eclipsing ATM. Today, Nutanix’s approach to converged compute and storage delivers a fundamentally new building block for datacenters after decades of bigger and bigger SANs in the enterprise. The impact is radical, and in ten years IT professionals will wonder how they were ever able to build a virtual computing environment with SANs.

As the newest member of the Nutanix board, I am very excited to be part of the revolution that Nutanix is bringing to the virtualized datacenter.

Posted in Technology Trends, Uncategorized
Partha Ramachandran

VDI Series: Part 2 – Addressing Performance

At VMworld back in August, Nutanix was honored with TechTarget’s Best of VMworld – Desktop Virtualization award.

With the show being our first public appearance after launching just two weeks prior, this honor surprised many people in the desktop virtualization and general virtualization community. In this post (continuing our VDI series), I would like to go into some detail on why we believe the Nutanix Complete Cluster is a great fit for desktop virtualization use cases. If you’re in a hurry and want the CliffsNotes version, you can take a detour to the VMworld video interview @ BrianMadden.com. For those with a longer attention span, let’s get into the nitty-gritty.

Three key areas that can make or break a successful VDI deployment are: Performance, Scale-out and Manageability. This post focuses on performance as it relates to VDI.  I will cover Scale-out and Manageability in subsequent posts.

Let’s start with a baseline.  This article from Citrix does a good job of describing the IO profile for virtual desktops. Even though the article is specific to Citrix XenDesktop, the analysis can also equally apply to VMware View or any other VDI management solution.  Here’s a summary of the key points from this article for our baseline:

Boot Storms - 300 IOPS, 90/10 R/W ratio
Steady State - 10/90 R/W ratio. IOPS numbers vary as follows:

Light: Single application, no web browsing – 6 IOPS

Normal: Few applications, minimal web browsing – 10 IOPS

Power: multiple applications concurrently, and heavy browsing – 25 IOPS

Heavy: Compiling code, Videos – 50 IOPS

Now that we know what VDI takes performance-wise, let’s examine how a typical SAN holds up under these circumstances. Let’s take a 1,000 desktop deployment as an example.

Random Steady State IO:  In steady state, desktop users open applications, send email, browse the web etc. Each of these translates to a small size IO request to the storage layer. Now, each desktop is independent of the other desktops. A SAN can’t tell one desktop from the other. Therefore all IO coming into a SAN at steady state is completely random.

What kills a SAN in steady state is that each random IO is a spindle head movement. With an average of 20 IOPS per desktop, the total random IO required of a SAN is 20,000 IOPS. This translates to 300 spindle disks without accounting for RAID. With RAID 5 or 6, the number of disks required is 600-800 just to support steady state random IO coming from these 1,000 virtual desktops.

Boot Storms: Booting a virtual desktop requires that the key OS bits be loaded from the SAN, separately for each desktop. There’s no simple way for a SAN to load the data more intelligently, since this intelligence has to be at an upper layer. Booting 1,000 users translates to 300,000 IOPS, which means for many deployments, overprovisioning storage is necessary to meet the performance requirements, driving the TCO way up.
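
The arithmetic behind these estimates can be sketched in a few lines of Python. The per-desktop figures (20 IOPS in steady state, 300 IOPS during boot) come from the text above. The per-spindle figure of 67 random IOPS is an assumption, back-derived from the estimate of roughly 300 disks for 20,000 IOPS, and the RAID factor of 2.0 is likewise an illustrative assumption chosen to land near the low end of the 600-800 range, not a measured RAID write penalty.

```python
import math

DESKTOPS = 1000
IOPS_PER_SPINDLE = 67  # assumption, roughly 20,000 IOPS / 300 disks


def total_iops(desktops, iops_per_desktop):
    """Aggregate IOPS the storage layer has to absorb."""
    return desktops * iops_per_desktop


def spindles_needed(iops, iops_per_spindle, raid_factor=1.0):
    """Disks required to serve the load; raid_factor scales for RAID overhead."""
    return math.ceil(iops * raid_factor / iops_per_spindle)


steady = total_iops(DESKTOPS, 20)   # steady-state random IO
boot = total_iops(DESKTOPS, 300)    # boot storm

print("steady-state IOPS:", steady)
print("boot-storm IOPS:", boot)
print("spindles, no RAID:", spindles_needed(steady, IOPS_PER_SPINDLE))
print("spindles, RAID factor 2.0:", spindles_needed(steady, IOPS_PER_SPINDLE, 2.0))
```

Changing IOPS_PER_SPINDLE or the RAID factor shows how sensitive the disk count is to those assumptions, which is exactly why spindle-bound sizing gets expensive.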

Now let’s get back to why we believe the Nutanix architecture provides a great alternative to traditional server + SAN approaches for VDI. How does the Nutanix Complete Cluster, with its converged compute + storage architecture, stack up in the face of these performance challenges?

Fast Random IO: All write IO in the Nutanix Complete Cluster goes to the HOT (Heat Optimized Tiering) Cache first. The cached data is written to the Fusion-io ioDrive on each node in the Cluster, and the write immediately returns to the OS. The Nutanix Scale-Out Converged Storage layer, which stitches the local storage from each node in the cluster into one global fabric, then provides persistent storage in the form of either the DiskStore (direct-attached SATA HDDs) or the FlashStore (PCI-e attached Fusion-io ioMemory), depending on the tiering policies in place. The fact that the IO is written to the Fusion-io ioDrive first means that there is no spindle disk in the picture here. All Fusion-io flash. This means microsecond latencies and high throughput.

No more Boot Storms: Each controller VM in the Nutanix architecture functions similarly to a traditional storage controller, except that there is one for every node instead of a limited number shared by a large SAN that may result in bottlenecks.  The controller VMs each have a cache called an “Extent Cache” that caches the data that has been served up by the controller. Frequently accessed data continues to live in the cache. This means that once cached, the OS bits are served from the cache, eliminating the need for disk/flash IO.
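
To illustrate why a per-controller cache takes the sting out of a boot storm, here is a small simulation. It is a sketch under stated assumptions: the post does not say which eviction policy the Extent Cache uses, so least-recently-used (LRU) is assumed here, and the ExtentCache class, its parameters and the extent names are hypothetical.

```python
from collections import OrderedDict


class ExtentCache:
    """Toy read cache with LRU eviction (the eviction policy is an assumption)."""

    def __init__(self, capacity, backing_read):
        self.capacity = capacity
        self._backing_read = backing_read  # called only on a cache miss
        self._data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, extent_id):
        if extent_id in self._data:
            self._data.move_to_end(extent_id)  # mark as most recently used
            self.hits += 1
            return self._data[extent_id]
        self.misses += 1
        value = self._backing_read(extent_id)
        self._data[extent_id] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used
        return value


# Simulated boot storm: 100 desktops each read the same 10 OS extents.
cache = ExtentCache(capacity=16, backing_read=lambda eid: f"data-for-{eid}")
for _desktop in range(100):
    for n in range(10):
        cache.read(f"os-extent-{n}")
print("disk/flash reads:", cache.misses, "cache hits:", cache.hits)
```

Only the first desktop pays for the reads from disk or flash; the others are served from memory on the same node.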

By rethinking how infrastructure should be built for virtualization, Nutanix’s approach inherently solves the 2 biggest performance pain points for VDI. Nutanix delivers fast random IO as well as high sequential bandwidth providing desktop users with a great user experience in steady state and in the face of boot storms.

In the next post, I’ll go into another key pain point for VDI: incremental scalability.  It’s commonly known that many VDI deployments fail when they try to scale beyond 200 desktops. We’ll look at how Nutanix is built from the ground up to rapidly scale while maintaining high performance, enabling IT organizations to grow their VDI deployments, as needed, one node at a time.

One last note to keep in mind: though the Nutanix solution is great for VDI, it was designed for virtualization use cases as a whole, from general server virtualization in the core datacenter to DR sites, to test and dev or branch offices. We’ll elaborate on more of these use cases in future blog posts, so stay tuned.

Posted in Technology Trends, Uncategorized, VDI
Tiffany To

Lessons Learned from Our First vmworld

Back from our first vmworld on the heels of our launch on 8/16, here are some lessons we learned and a highlights video of our experience. 

5.  Lock down your laptops or they will get stolen (We lost 3.)
4.  The NO SAN logo on our booth attracts SAN vendors.

Posted in Uncategorized
Partha Ramachandran

VDI Series Part 1: Moving Beyond the POC

Gartner released a report in mid-2010 predicting 50 million VDI desktops by 2013. Then there are the recent newsflashes surrounding VDI: Citrix buys Kaviza and RingCube. AppSense gets a $70 million round of funding from Goldman Sachs. The VMware View vs Citrix XenDesktop war rages on with new product releases this month. These outward signs point to a huge market, yet there is also a somewhat hidden undercurrent of chatter about how real VDI is, that is, how many customers are truly deploying VDI beyond a 50 seat POC.

Here at Nutanix, ironically, the director of product management comes from Citrix and the director of product marketing comes from VMware.  We (Partha Ramachandran from Citrix and Tiffany To from VMware) have a fun time making jabs at each other about Citrix XenDesktop and VMware View, but at the end of the day, we realize the real VDI roadblock is not about which user experience protocol is better, but rather, how can VDI be made radically simpler and cost-effective so it can move beyond the POC chasm of purchased but undeployed licenses.

At Nutanix, we did not aim to build a VDI product; we focused on the broader use case of enterprise virtualization. Still, as early customers such as global law firms have looked at our solution, it has become clear that there is a sweet spot with VDI. All of us who pay attention to VDI know what a tough critic Brian Madden can be, so we invited him to our offices a few weeks ago to take a look and drill into the architecture. After that visit, he remarked that we might have “just created the ultimate server/storage big data combo hardware for VDI.” See his full take @ http://tinyurl.com/44j8hy8.

Let’s look at the first part of the roadblock: cost. Why is VDI so expensive in the first place? Storage accounts for at least 60% of CAPEX for a VDI deployment. The upfront, as well as ongoing, storage costs of growing data and complex network storage management have been the key reasons why many mid to large sized VDI deployments have not moved beyond the POC stage. Neither Citrix nor VMware is a storage company, and their expertise is in the “upper half” of VDI: desktop brokering, user profiles, image management, etc. Both companies have tried their best to alleviate the storage pain for the SMB market. Citrix is working on integrating their Kaviza purchase into a simple VDI offering for SMBs. VMware will do something similar with their recent announcements around VSA and vSphere 5.

But these SMB solutions will only go so far.  Technologies like VMware View Composer seek to provide storage relief for larger deployments, but the caveat is that the complexity level goes up in configuring and managing a gold image and personas appropriately.

The reality is that when customers hit a pivot point of 250-300 desktops, the only way to make a VDI deployment work is to buy an expensive SAN. When that happens, the resulting TCO on a virtual desktop may end up looking greater than the cost of a good laptop, leaving CIOs scratching their heads on whether the benefits of VDI outweigh the costs.  Going from cheap desktop storage to SAN storage is a painful realization that requires significant business case justification. TCO presentations about VDI often focus on the long term OPEX benefits of simpler desktop management (Tiffany confesses to building many of these presos), but those benefits can never be fully realized if the VDI infrastructure itself can’t be managed at scale.

Well, Nutanix believes it’s time to get back to the basics and rethink what virtualized infrastructure should look like in the first place.

Nutanix Complete Cluster is built from the ground up to be a truly converged, scale out solution for enterprise virtualization. Each Nutanix 2U building block can support 200 virtual desktops (this number is being pushed as we speak – come check out our booth #212 at vmworld to see a loginvsi demo). Our scale-out architecture means that you can add blocks as you go, unlike with vBlocks or FlexPods, and still manage a single system without having to reconfigure or tune it.  That protects the customer’s investment and allows them to buy what they need when they need it.  Oh yeah, and there’s no SAN. The Nutanix Complete Cluster delivers a self-contained compute, storage, and networking infrastructure solution to run your datacenter the way Google, Facebook and other cloud-generation companies do. All this with the high-end data management features you’ve come to expect from a million dollar SAN.

Great! So, Nutanix is going to save customers a lot of money on VDI, but what about the performance?  What about boot storms?  How does Nutanix leverage those Fusion-io ioMemory cards for VDI? Stay tuned for part 2 in our VDI series.

Partha & Tiffany

Posted in Uncategorized