Convergence with Scott Adams

Converged architectures coming out of force-fit consortiums are just that — expensive toxic blobs. You pay through your nose, and what you get is glorified professional services and an overstretched one-throat-to-choke argument for support. The mid-market has rightfully rejected the gimmicks. Scott Adams, the God-of-all-McKinsey-strategists, saw this ahead of us when some Virtualization-Challenged-Eclecticists came to sell $1M equipment to Asok!

Today, everyone and their mother wants to call their products “converged.” Moving beyond the hype will require some bolder measures: collapsing complex networks, consolidating application and storage servers, simplifying datacenters, and converging skill sets. When such convergence is prevalent, Scott will be bored enough to write about other toxic blobs such as all-flash arrays.

Posted in Technology Trends
Partha Ramachandran

VDI Series: Part 3 – Incremental Scalability

In my previous blog post, I talked about how the Nutanix architecture, built from the ground up for virtualization, is a perfect fit for the unique performance needs of VDI. In this post, I would like to cover another pain point of VDI – incremental scalability. It’s commonly known that many VDI deployments fail when trying to scale beyond 200 desktops. I’d like to cover some of the reasons for this failure, and then go into how the Nutanix architecture is designed to rapidly scale while maintaining high performance, enabling organizations to grow their VDI deployments, as needed, one node at a time.

Let’s look at some of the reasons why VDI deployments fail to scale when using the standard server + SAN approach:

Many VDI deployments do not have a dedicated storage array. Typically, the SAN is a common resource to all virtualization initiatives in an organization. From the get-go, this approach can be a recipe for disaster. SAN resources are shared by VDI workloads as well as server workloads, which have different performance profile requirements. There is no effective way to guarantee any kind of QoS for the initial VDI deployment, let alone scale the VDI deployment effectively.

If a VDI deployment does indeed get a budget to have its own SAN, the SAN is sometimes under-provisioned because it’s sized for the initial number of planned desktops. When the VDI deployment needs to scale, the current SAN performance and capacity are no longer sufficient, and another storage array needs to be wheeled in. Adding a storage array not only makes the CAPEX go through the roof, but also increases the management complexity of the VDI deployment.

In the most common case, SAN resources are over-provisioned. The idea here is that the SAN is a one-time purchase, and the organization can add servers to the fabric as the deployment scales. There are two common resulting problems with this approach:

- The organization is forced to take a huge upfront CAPEX hit, which then drives up the overall cost of the project and the time it takes to realize the ROI
- The interconnect becomes the bottleneck. Typically the array is on the other side of the core switch. The core switch is heavily oversubscribed, and therefore the interconnect between the servers and the storage becomes a huge bottleneck.

The real solution to delivering incremental scalability is to keep the compute right next to the storage, which is key to the Nutanix approach for achieving higher performance with greater simplicity. A single Nutanix block houses four nodes, and organizations can add single nodes to their Nutanix cluster to continue to grow their VDI deployment by 50-100 desktops per node, depending on the user profile (task/knowledge worker/power user).
Some of the key attributes of the Nutanix architecture that deliver this incremental scalability are described below:

Shared Nothing Distributed Architecture: The Nutanix Cluster is a pure distributed system. This means that the compute gets its storage locally, and does not need to traverse the network. All the problems around network, interconnect and core switch bottlenecks are avoided completely. IT organizations can add single Nutanix nodes or blocks of four nodes, as needed to achieve linear scaling in performance and capacity.

Distributed Metadata: A big problem affecting scalability in a traditional SAN is metadata access. Most storage arrays have 1-4 controller heads and all metadata access needs to go through these heads. This causes contention and performance drops as more servers try to access the same storage array.

In the Nutanix Cluster, metadata is maintained in a truly distributed and decentralized fashion. Each of the nodes in the Nutanix Cluster maintains a part of the global metadata, which means that there is no single bottleneck for metadata access.
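
To make the idea of each node holding a slice of the global metadata more concrete, here is a minimal Python sketch. It assumes a consistent-hash ring maps each metadata key to an owning node; the post does not say how Nutanix actually partitions metadata, so the `MetadataRing` class, the `owner` method, the node labels and the key format are all hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (MD5 is used only to spread keys)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class MetadataRing:
    """Hypothetical consistent-hash ring: each node owns a slice of the key space."""

    def __init__(self, nodes, vnodes=64):
        self._vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owners = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Several virtual points per node smooth out the key distribution.
        for i in range(self._vnodes):
            point = _hash(f"{node}#{i}")
            self._owners[point] = node
            bisect.insort(self._points, point)

    def owner(self, key: str) -> str:
        # The first ring point clockwise from the key's hash owns the key.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owners[self._points[idx]]


ring = MetadataRing(["node-1", "node-2", "node-3", "node-4"])
keys = [f"vdisk-{i}/extent-{j}" for i in range(50) for j in range(20)]
before = {k: ring.owner(k) for k in keys}

ring.add_node("node-5")          # grow the cluster by one node
after = {k: ring.owner(k) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
```

In this sketch, adding `node-5` reassigns only the keys that land on the new node's ring points; every other key keeps its owner, which is the property that lets a lookup service grow one node at a time without a central directory.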

Distributed Metadata Cache: Most storage arrays do a lousy job of maintaining metadata caches. In storage arrays that do have a metadata cache, the caches live on the limited number of controller heads. Access to the cache is limited by the bottlenecks around the network, interconnect and switch as discussed above.

In the Nutanix Cluster, metadata is cached on each of the controller VMs. Most metadata access is served up by cache lookups. Each controller VM maintains its own cache.  This means that however large the Nutanix Cluster grows as you continue to add nodes, the cost of metadata access stays the same.

Lock-free Concurrency Model: The standard approach to ensuring correctness for metadata access is to use locking. Unfortunately, in a distributed system, locking can become tricky. Excessive locking ensures correctness, but causes performance to drop like a rock.

The Nutanix Cluster implements an innovative lock-free concurrency model for metadata access. This model ensures the correctness of metadata while maintaining high performance.
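
The post does not describe how the lock-free model works, so the following Python sketch only illustrates one common pattern from that family: an optimistic, versioned compare-and-set with retry. The `VersionedCell` class and `update` helper are hypothetical names. Python has no native atomic compare-and-set, so a small internal lock stands in for that primitive here; the point is that callers never hold a lock across their read-modify-write.

```python
import threading


class VersionedCell:
    """Hypothetical metadata cell updated via compare-and-set (CAS)."""

    def __init__(self, value):
        self._state = (0, value)          # (version, value)
        # Stand-in for an atomic CAS primitive, which Python lacks.
        self._atomic = threading.Lock()

    def read(self):
        return self._state                # returns a consistent (version, value) pair

    def compare_and_set(self, expected_version, new_value) -> bool:
        with self._atomic:
            version, _ = self._state
            if version != expected_version:
                return False              # another writer got there first
            self._state = (version + 1, new_value)
            return True


def update(cell: VersionedCell, fn) -> int:
    """Apply fn optimistically and retry on conflict. Returns the attempt count."""
    attempts = 0
    while True:
        attempts += 1
        version, value = cell.read()
        if cell.compare_and_set(version, fn(value)):
            return attempts


refcount = VersionedCell(0)


def worker():
    for _ in range(1000):
        update(refcount, lambda v: v + 1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_version, final_value = refcount.read()
```

A writer that loses a race simply re-reads and retries, so no update is lost and no writer blocks while another one is computing its new value.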

Distributed MapReduce for data/metadata consistency: For a large-scale deployment, consistency checking for data and metadata becomes a challenge. The Nutanix Complete Cluster implements a fully distributed MapReduce algorithm to ensure data and metadata consistency. The distributed nature of the MapReduce ensures that there is no single bottleneck in the system. MapReduce has been shown by Google to scale to thousands of nodes, and is a key ingredient in the incremental scalability of the Nutanix Cluster.
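
As a rough illustration of how a MapReduce-style consistency check can avoid a central bottleneck, the single-process Python sketch below has every node map over its own local extents and emit checksums, and a reduce step compare the replicas of each extent. The cluster layout, extent names, and the use of CRC32 checksums are assumptions made for the example, not details from Nutanix.

```python
from collections import defaultdict
import zlib

# Hypothetical per-node local state: extent id -> bytes stored on that node.
cluster = {
    "node-1": {"e1": b"alpha", "e2": b"bravo"},
    "node-2": {"e1": b"alpha", "e3": b"charlie"},
    "node-3": {"e2": b"brav0", "e3": b"charlie"},   # e2 replica is corrupted
}


def map_phase(node, extents):
    """Runs against one node's local data; emits (extent_id, (node, checksum))."""
    for extent_id, data in extents.items():
        yield extent_id, (node, zlib.crc32(data))


def shuffle(pairs):
    """Group mapper output by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(extent_id, replicas):
    """Report whether all replicas of an extent carry the same checksum."""
    checksums = {checksum for _, checksum in replicas}
    return extent_id, len(checksums) == 1


mapped = [pair for node, extents in cluster.items() for pair in map_phase(node, extents)]
report = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
inconsistent = sorted(k for k, ok in report.items() if not ok)
```

Because each mapper touches only the data on its own node, adding nodes adds mappers in the same proportion, which is the scaling behavior the paragraph above relies on.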

Distributed Extent Cache: Caching is not a new concept to storage arrays. The challenge, however, is that caches are located on the limited number of storage controllers in a storage array. Not only does the limited number of storage controllers cause contention, but the fact that these caches live in the storage array means that access to cached data needs to traverse the core switch. This brings with it the network, interconnect and switch latencies discussed above.

In my previous blog post, I discussed the extent cache, which caches the data served up by the controller VM. The key aspect of this cache is that it lives on the controller VM. This means that the compute tier can access the cache locally, without having to hop across the network and the core switch. This approach allows the Nutanix Cluster to incrementally scale with ease, while maintaining high performance.

My next blog post will focus on manageability. With large VDI deployments, it is crucial to let organizations focus on managing virtual desktops, rather than have to worry about compute and storage. In a traditional server + SAN model, organizations need to have trained personnel to manage servers, storage arrays, switches, zones etc. The Nutanix Cluster removes the complexity in managing a virtualized datacenter, and allows an organization to focus on using VDI towards their core business.

Posted in Uncategorized, VDI
Shirish Sathaye

A New Building Block

As part of Khosla Ventures, I’m privileged to be involved with disruptive technologies very early on, sometimes at a stage when it may be hard to fully see the impact that a technology may have on the market. When I met with the founders of Nutanix, I saw very quickly that this was not one of those situations.

The penetration of virtualization has driven server consolidation and enabled enterprises to run diverse applications on the same physical server, and the resulting increased utilization of servers has put new, immense pressure on storage performance. Many startups are tackling this space, but it’s clear that incremental improvements to today’s storage technologies are inadequate in enabling enterprises to realize the full potential of virtualization.

A complete rethinking of the storage layer is needed.

Nutanix is the only company that has cracked the code on this problem while the incumbents are asleep at the wheel. In the late 1990s, I left FORE Systems, which built ATM switches, for Alteon Websystems, as Ethernet switches and server load balancers became fundamental for the construction of the Internet, with TCP/IP eclipsing ATM. Today, Nutanix’s approach to converged compute and storage delivers a fundamentally new building block for datacenters after decades of bigger and bigger SANs in the enterprise. The impact is radical, and IT professionals will wonder in ten years how they were ever able to build a virtual computing environment with SANs.

As the newest member of the Nutanix board, I am very excited to be part of the revolution that Nutanix is bringing to the virtualized datacenter.

Posted in Technology Trends, Uncategorized
Partha Ramachandran

VDI Series: Part 2 – Addressing Performance

At VMworld back in August, Nutanix was honored with TechTarget’s Best of VMworld – Desktop Virtualization award.

With the show being our first public appearance after launching just two weeks prior, this honor surprised many people in the desktop virtualization and general virtualization community. In this post (continuing our VDI series), I would like to go into some details on why we believe the Nutanix Complete Cluster is a great fit for desktop virtualization use cases. If you’re in a hurry and want the CliffsNotes version, you can take a detour to the VMworld video interview @ BrianMadden.com. For those with a longer attention span, let’s get into the nitty-gritty.

Three key areas that can make or break a successful VDI deployment are: Performance, Scale-out and Manageability. This post focuses on performance as it relates to VDI.  I will cover Scale-out and Manageability in subsequent posts.

Let’s start with a baseline. This article from Citrix does a good job of describing the IO profile for virtual desktops. Even though the article is specific to Citrix XenDesktop, the analysis applies equally to VMware View or any other VDI management solution. Here’s a summary of the key points from this article for our baseline:

- Boot Storms: 300 IOPS, 90/10 R/W ratio
- Steady State: 10/90 R/W ratio. IOPS numbers vary as follows:
  - Light: Single application, no web browsing – 6 IOPS
  - Normal: Few applications, minimal web browsing – 10 IOPS
  - Power: Multiple applications concurrently, and heavy browsing – 25 IOPS
  - Heavy: Compiling code, videos – 50 IOPS

Now that we know what VDI demands performance-wise, let’s examine how a typical SAN holds up under these circumstances. Let’s take a 1,000 desktop deployment as an example.

Random Steady State IO:  In steady state, desktop users open applications, send email, browse the web etc. Each of these translates to a small size IO request to the storage layer. Now, each desktop is independent of the other desktops. A SAN can’t tell one desktop from the other. Therefore all IO coming into a SAN at steady state is completely random.

What kills a SAN in steady state is that each random IO is a spindle head movement. With an average of 20 IOPS per desktop, the total random IO required of a SAN is 20,000 IOPS. This translates to 300 spindle disks without accounting for RAID. With RAID 5 or 6, the number of disks required is 600-800 just to support steady state random IO coming from these 1,000 virtual desktops.

Boot Storms: Booting a virtual desktop requires that the key OS bits be loaded from the SAN, separately for each desktop. There’s no simple way for a SAN to load the data more intelligently, since this intelligence has to be at an upper layer. Booting 1,000 users translates to 300,000 IOPS, which means for many deployments, overprovisioning storage is necessary to meet the performance requirements, driving the TCO way up.
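
The sizing arithmetic behind the steady-state and boot-storm figures above can be sketched in a few lines of Python. The per-desktop numbers come from the text; the per-spindle figure of 67 random IOPS is an assumption, chosen because it is roughly what the "20,000 IOPS translates to 300 disks" statement implies.

```python
import math

# Per-desktop figures quoted in the text.
BOOT_IOPS_PER_DESKTOP = 300
STEADY_IOPS_PER_DESKTOP = 20

# Assumption (not stated in the text): about 67 random IOPS per spindle,
# which is what "20,000 IOPS -> 300 disks" works out to.
ASSUMED_IOPS_PER_SPINDLE = 67


def aggregate_iops(desktops: int, iops_per_desktop: int) -> int:
    """Total front-end IOPS the storage must absorb."""
    return desktops * iops_per_desktop


def spindles_needed(total_iops: int, iops_per_spindle: int) -> int:
    """Disks required to serve total_iops, ignoring RAID overhead."""
    return math.ceil(total_iops / iops_per_spindle)


desktops = 1000
steady = aggregate_iops(desktops, STEADY_IOPS_PER_DESKTOP)
boot = aggregate_iops(desktops, BOOT_IOPS_PER_DESKTOP)
steady_spindles = spindles_needed(steady, ASSUMED_IOPS_PER_SPINDLE)
print(steady, boot, steady_spindles)
```

Under that assumption the steady-state load needs roughly 300 spindles before RAID, and the 600-800 disk figure quoted for RAID 5 or 6 corresponds to a multiplier of about 2x to 2.7x on top of it. The boot-storm load is fifteen times the steady-state load, which is why sizing a SAN for it drives so much over-provisioning.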

Now let’s get back to why we believe the Nutanix architecture provides a great alternative to traditional server + SAN approaches for VDI. How does the Nutanix Complete Cluster, with its converged compute + storage architecture, stack up in the face of these performance challenges?

Fast Random IO: All write IO in the Nutanix Complete Cluster goes to the HOT (Heat Optimized Tiering) Cache first. The cached data is written to the Fusion-io ioDrive that is on each node in the Cluster, and the write is immediately acknowledged to the OS. The Nutanix Scale-Out Converged Storage layer, which stitches the local storage from each node in the cluster into one global fabric, then provides persistent storage in the form of either the DiskStore (direct-attached SATA HDDs) or the FlashStore (PCI-e attached Fusion-io ioMemory), depending on the tiering policies in place. Because the IO is written to the Fusion-io ioDrive first, there is no spindle disk in the write path: it is all Fusion-io flash. This means microsecond latencies and high throughput.

No more Boot Storms: Each controller VM in the Nutanix architecture functions similarly to a traditional storage controller, except that there is one for every node instead of a limited number shared by a large SAN that may result in bottlenecks.  The controller VMs each have a cache called an “Extent Cache” that caches the data that has been served up by the controller. Frequently accessed data continues to live in the cache. This means that once cached, the OS bits are served from the cache, eliminating the need for disk/flash IO.
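
A small Python sketch shows why a per-node read cache takes the sting out of a boot storm. The post does not state the eviction policy of the Extent Cache, so least-recently-used (LRU) eviction is an assumption here, and the `ExtentCache` class, the extent names and the `read_from_disk` helper are hypothetical.

```python
from collections import OrderedDict


class ExtentCache:
    """Hypothetical read cache with LRU eviction (the policy is an assumption)."""

    def __init__(self, capacity, backing_read):
        self._capacity = capacity
        self._backing_read = backing_read    # called only on a cache miss
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, extent_id):
        if extent_id in self._entries:
            self._entries.move_to_end(extent_id)   # mark as most recently used
            self.hits += 1
            return self._entries[extent_id]
        self.misses += 1
        data = self._backing_read(extent_id)
        self._entries[extent_id] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)      # evict least recently used
        return data


disk_reads = []


def read_from_disk(extent_id):
    disk_reads.append(extent_id)
    return f"data-for-{extent_id}"


cache = ExtentCache(capacity=8, backing_read=read_from_disk)

# 100 desktops boot from the same three OS extents.
for _ in range(100):
    for extent in ("os-kernel", "os-drivers", "os-shell"):
        cache.read(extent)
```

In this toy run, only the first desktop's three reads reach the backing store; the other 297 reads are served from memory on the node where the desktops run.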

By rethinking how infrastructure should be built for virtualization, Nutanix’s approach inherently solves the two biggest performance pain points for VDI. Nutanix delivers fast random IO as well as high sequential bandwidth, providing desktop users with a great user experience in steady state and in the face of boot storms.

In the next post, I’ll go into another key pain point for VDI: incremental scalability.  It’s commonly known that many VDI deployments fail when they try to scale beyond 200 desktops. We’ll look at how Nutanix is built from the ground up to rapidly scale while maintaining high performance, enabling IT organizations to grow their VDI deployments, as needed, one node at a time.

One last note: keep in mind that although the Nutanix solution is great for VDI, it was designed for virtualization use cases as a whole, from general server virtualization in the core datacenter to DR sites, test and dev, and branch offices. We’ll elaborate on more of these use cases in future blog posts, so stay tuned.

Posted in Technology Trends, Uncategorized, VDI
Tiffany To

Lessons Learned from Our First VMworld

Back from our first VMworld on the heels of our launch on 8/16, here are some lessons we learned and a highlights video of our experience.

5.  Lock down your laptops or they will get stolen (We lost 3.)
4.  The NO SAN logo on our booth attracts SAN vendors.

Posted in Uncategorized
Partha Ramachandran

VDI Series Part 1: Moving Beyond the POC

Gartner released a report in mid-2010 stating that they expected 50 million VDI desktops by 2013. Then there are the recent newsflashes surrounding VDI: Citrix buys Kaviza and RingCube. AppSense gets a $70 million round of funding from Goldman Sachs. The VMware View vs Citrix XenDesktop war rages on with new product releases this month. These outward signs point to a huge market, yet there is also a somewhat hidden undercurrent of chatter about how real VDI is, that is, how many customers are truly deploying VDI beyond a 50-seat POC.

Here at Nutanix, ironically, the director of product management comes from Citrix and the director of product marketing comes from VMware.  We (Partha Ramachandran from Citrix and Tiffany To from VMware) have a fun time making jabs at each other about Citrix XenDesktop and VMware View, but at the end of the day, we realize the real VDI roadblock is not about which user experience protocol is better, but rather, how can VDI be made radically simpler and cost-effective so it can move beyond the POC chasm of purchased but undeployed licenses.

At Nutanix, we did not aim to build a VDI product and focused instead on the broader use case of enterprise virtualization. Still, as early customers such as global law firms have been looking at our solution, it’s clear that there is a sweet spot with VDI. All of us who pay attention to VDI know what a tough critic Brian Madden can be, so we invited him to our offices a few weeks ago to take a look and drill into the architecture. After that visit, he remarked that we might have “just created the ultimate server/storage big data combo hardware for VDI.” See his full take @ http://tinyurl.com/44j8hy8.

Let’s look at the first part of the roadblock – cost. Why is VDI so expensive in the first place? Storage accounts for at least 60% of CAPEX for a VDI deployment. The upfront, as well as ongoing, storage costs of growing data and complex network storage management have been the key reasons why many mid to large sized VDI deployments have not moved beyond the POC stage. Neither Citrix nor VMware is a storage company, and their expertise is in the “upper half” of VDI – desktop brokering, user profiles, image management, etc. Both companies have tried their best to alleviate the storage pain for the SMB market. Citrix is working on integrating their Kaviza purchase into a simple VDI offering for SMBs. VMware will do something similar with their recent announcements around VSA and vSphere 5.

But these SMB solutions will only go so far.  Technologies like VMware View Composer seek to provide storage relief for larger deployments, but the caveat is that the complexity level goes up in configuring and managing a gold image and personas appropriately.

The reality is that when customers hit a pivot point of 250-300 desktops, the only way to make a VDI deployment work is to buy an expensive SAN. When that happens, the resulting TCO on a virtual desktop may end up looking greater than the cost of a good laptop, leaving CIOs scratching their heads on whether the benefits of VDI outweigh the costs.  Going from cheap desktop storage to SAN storage is a painful realization that requires significant business case justification. TCO presentations about VDI often focus on the long term OPEX benefits of simpler desktop management (Tiffany confesses to building many of these presos), but those benefits can never be fully realized if the VDI infrastructure itself can’t be managed at scale.

Well, Nutanix believes it’s time to get back to the basics and rethink what virtualized infrastructure should look like in the first place.

Nutanix Complete Cluster is built from the ground up to be a truly converged, scale-out solution for enterprise virtualization. Each Nutanix 2U building block can support 200 virtual desktops (this number is being pushed as we speak – come check out our booth #212 at VMworld to see a Login VSI demo). Our scale-out architecture means that you can add blocks as you go, unlike with Vblocks or FlexPods, and still manage a single system without having to reconfigure or tune it. That protects the customer’s investment and allows them to buy what they need when they need it. Oh yeah, and there’s no SAN. The Nutanix Complete Cluster delivers a self-contained compute, storage, and networking infrastructure solution to run your datacenter the way Google, Facebook and other cloud-generation companies do. All this with the high-end data management features you’ve come to expect from a million-dollar SAN.

Great! So, Nutanix is going to save customers a lot of money on VDI, but what about the performance?  What about boot storms?  How does Nutanix leverage those Fusion-io ioMemory cards for VDI? Stay tuned for part 2 in our VDI series.

Partha & Tiffany

Posted in Uncategorized

Unleashing Fusion-io in the Nutanix Architecture

Recently did a guest blog post on Fusion-io about how we make use of their ioDrives in the Nutanix architecture.  Take a look @ http://www.fusionio.com/blog/unleashing-fusion-io-from-the-san-traffic-jam-nutanix-complete-cluster-for-enterprise—class-virtualization/

Posted in Uncategorized

We are officially open for business

It’s official! We are open for business, after spending many tireless moments building this product and bringing it to market. We can’t help but gush at how beautiful the end product looks and feels and sounds! Best part — it’s low maintenance. How rarely have we seen something that is pretty and yet low maintenance?! It’s one of those.

Game-changers Stir Debates… and Markets

When storage upstarts brought iSCSI to the market, the fence-sitters, who had seen storage through the lens of the high-end market, were the biggest skeptics. Fibre Channel was king, and seeing SCSI traffic go over a commodity TCP/IP/Ethernet stack was unimaginable. To the purists, it was even heretical to think of another way of doing SANs. There was a massive debate, but the mid-market spoke, and spoke resoundingly. iSCSI is now king, and Fibre Channel is in decline.

When DataDomain brought a disk-based backup appliance to the market, the fence-sitters, who had again seen backup through the lens of the high-end market, were the biggest skeptics. Tape was king, and doing backup using disks was unimaginable. To the purists, it was even heretical to morph the backup data by applying deduplication. There was a massive debate on whether such morphed data would withstand the audit scrutiny of a court of law. The mid-market spoke, and resoundingly so. Disk-based backups rule now, and tape is in decline.

And now that Nutanix has collapsed compute and storage into one tier, we are stirring a massive debate on “who owns storage”. But the mid-market will speak, and resoundingly so. We tend to underestimate the power of the mid-market in creating paradigm shifts within IT. These early adopters are some of the first deployments of Nutanix. They and the channel partners serving them have spoken loud and clear about the need for a solution that disrupts the network storage market.

We have built a game-changer that dramatically simplifies and shrinks the datacenter. It will stir the datacenter market in this decade.

Reflecting on the foundation

Building the product was immensely hard, especially because we set such a high bar for ourselves in terms of enterprise- and market-readiness. The team has managed to pull off a nearly flawless execution in record time. The goals and deadlines set seemed laughably impossible, and yet the team was up for each and every one of them. The sleepless nights, the pizza dinners, the intractable bugs, the insatiable nit-picking of the (UI and website) designs, the hiring committee meetings, the countdown calendar, the dreaded DogFood* dashboard, the scotch-pouring during company meetings… each of these has helped forge the foundation of this company. As we scale this business, this camaraderie will be the single most important virtue to take us through the highs and lows of the future.

Quiet Confidence

Over the last several quarters, we’ve been heads-down developing the product, learning from the pilots, and building the engineering team. Now that the product is generally available, we will be firing on all cylinders, marketing and sales included. These two functions have their hands full selling this product. There is a quiet confidence within us stemming from the knowledge that we’ve built something that is of lasting value to end customers and channel partners. Our goal is for the channel to make more margins on a product that sells for less — this seems like magic, but convergence has been known to pull off even bigger surprises! The synergy of compute and storage never ceases to amaze us.

To our future channel partners, customers, and employees: welcome aboard!

*P.S.: DogFood and PuppyFood are our internal datacenters that run Nutanix-on-Nutanix.

Posted in Uncategorized

The Dinosaurs are Dying

Having spent almost a decade in IBM research inventing bleeding-edge technologies for storage systems, and proudly making them the biggest and baddest creatures in the storage-land, I now feel that the beginning of the end for SAN-like storage systems is hurtling towards us. The last few of the T. rex might be the most vicious but their roar will soon be forgotten as the Jurassic age of computing comes to an end.

Virtualization has ushered in a slew of fundamental changes into the computing landscape. With an ever increasing demand for IT processing, the datacenters have been witnessing unprecedented sprawl. Virtual machines have allowed the consolidation of 100s of traditional servers on a few powerful physical servers. From virtual desktops to software-as-a-service to anything cloud, virtual machines have become a fundamental tenet in the data center. This uber concentration of computing logic onto a handful of data center servers gives rise to an unprecedented demand for concentrated storage performance. The SAN vendors coined terms like ‘boot storms’ in a subliminal attempt to shift the blame back to the consolidated servers, while actually exposing the reality that SANs were never designed for virtualized servers. Most storage vendors today provide an ensemble of yesteryear’s technologies glued together with a serious amount of IT dollars. At IBM storage research, I realized that it was time, once again, to think outside the storage box.

Applications that deal with a large amount of data, like Google’s search, Facebook’s inbox, and Amazon’s services, have all developed their own storage architectures as even the scale-out SAN architectures were seriously inadequate. The fundamental problem is the bottleneck in the networking fabric that SAN (or NAS) architectures create due to the separation of the consolidated servers from the consolidated storage. The sequential bandwidth of hard disks, and the spectacular performance of recent SSDs, have encouraged the creation of data-hungry applications, and made the problem even more acute. To work around the networking meltdown, the Googles and Facebooks of the world hired a bunch of PhDs to rewrite the entire application layer such that logic is shipped to run on the hardware where the data resides (a.k.a. MapReduce). This is diametrically opposite to the assumptions of traditional SAN architectures. At IBM, I saw such non-SAN storage as complementary and non-threatening to the traditional SAN systems where my research was focused. But something else was brewing on the horizon.

The explosion of virtual machines and server consolidation in the data center has completely shaken the SAN world.  Most traditional applications (which constitute the majority of the software wealth of mankind) have been designed with the assumption that data is shipped to the application. From this contradiction arises the third option where logic and data are in fact co-located. Even in the human brain, which arguably is the most evolved form of computing, logic and data inextricably and sometimes indistinguishably co-reside. This emergence of what might be called NoSAN, seems both logical and inevitable. I believe this ‘third option’ is the species that will run rampant not only in the green fields of Big Data but will push the SAN dinosaurs to a corner in their home turf as well. Data will need to be primarily stored on the same hardware or close to where the applications reside for the best scale out performance, while being backed up elsewhere for reliability.

A couple of years ago, the founders of Nutanix had the vision and fortitude to embark on a mission to realize this third option that would deliver seamless mobility, scalability, performance and reliability in a harmonious marriage of storage and server technologies. The dawn of a new era in computing is now upon us as Nutanix unveils the data center of the future. This is not to say that SANs will cease to exist, but, just like RDBMS, they will play a supporting role in the path going forward. It is time to embrace Not-only-SAN, Not-only-SQL, and Not-only-what-used-to-be-cool.

Posted in Technology Trends, Uncategorized

Finding a Needle in the Infrastructure Haystack

A few weeks ago, we shared our thoughts on why and how we wanted to revolutionize the enterprise management of virtualized datacenters by creating new intuitive workflows for managing both compute and storage resources.  With this goal in mind: “Make virtualization simple”, we’ve designed our UI to be intuitive, functionally advanced and visually appealing from the ground up.  Speaking of “ground-up,” let us share more about one of the fundamental pieces of our UI design: search.

Spotlight

 

There’s no doubt that search has transformed how we use software today. Through the years, search has also become intelligent and instant. It anticipates what the user wants to do or search for. Just like Google Chrome’s tag line: “It’s faster than the speed of thought,” we’ve combined these awesome features in our search engine, borrowing from the philosophy behind Apple’s Spotlight function. As your virtualized datacenter scales out, with the potential of hundreds or thousands of compute and storage objects accumulating, we know it will be critical to quickly and easily find a specific object of interest.

Another cool feature of our search bar is that it can also aid in admin actions like creating, editing and listing resources across our product. It reduces the number of mouse clicks a user has to make in order to accomplish a single admin action. We’ve also integrated synonym-related actions so users are not stuck with a single action word. Let’s say a user wants to create a new entity. He or she is no longer limited to going to the entity’s page and clicking the create button. When the user types “create” or “add”, our search bar anticipates what he or she wants to do next.

Another admin action example is editing a specific resource directly in the search window. The user simply types “edit” or related words like “update” or “modify” and lets auto-fill do the rest.
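
The synonym handling described above can be sketched roughly as follows. This is an illustrative Python sketch, not the actual search implementation: the action words come from the examples in the text, while the `suggest` function, the `find` fallback action and the resource names are hypothetical.

```python
# Hypothetical synonym table; the words come from the examples in the text.
ACTION_SYNONYMS = {
    "create": "create",
    "add": "create",
    "edit": "edit",
    "update": "edit",
    "modify": "edit",
}

# Hypothetical resource names, used only for illustration.
RESOURCES = ["container", "storage pool", "vdisk", "virtual machine"]


def suggest(query: str):
    """Return (action, resource) suggestions for a partially typed query."""
    words = query.lower().split()
    if not words:
        return []
    action = ACTION_SYNONYMS.get(words[0])
    if action is None:
        # No action word: treat the whole query as a plain resource search.
        prefix = " ".join(words)
        return [("find", r) for r in RESOURCES if r.startswith(prefix)]
    prefix = " ".join(words[1:])
    return [(action, r) for r in RESOURCES if r.startswith(prefix)]
```

For example, `suggest("add v")` and `suggest("create v")` both resolve to the same `create` action and offer the resources starting with "v", so the user is not tied to one particular verb.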

A picture paints a thousand words, doesn’t it? These are just a few small examples of how we hope to transform the virtualized infrastructure management user experience. We’re getting great feedback from our early access customers and can’t wait to unveil our UI when we launch in the very near future.

 

Posted in UI Design, Uncategorized