As you may have read in my article New Adventure to Redefine Radically Simple Architecture for Business Critical Apps with Nutanix, I’ve started working at Nutanix. My primary role, as part of the Solutions and Performance Engineering team, is to help develop the platform, solutions reference architectures, and best practices for Business Critical Applications. One of the areas I’m working on right now is Oracle on Nutanix.
Nutanix really is the big red easy button for infrastructure. The Nutanix Virtual Computing Platform provides hyperconverged, Google-like, web-scale scale-out infrastructure for the masses. Even though it’s a scale-out architecture, the unit of scale is large enough for monster VMs, including monster Oracle databases. Nutanix is a Gold-level Oracle technology partner and considers Oracle a very important use case for the Nutanix Virtual Computing Platform. I’ve been working with a number of customers who are already running Oracle on Nutanix successfully, and as part of this work we’ll be documenting case studies as well as solution architectures that you can all benefit from. But right now I want to start a series of blog posts to document my thoughts on some of the considerations and benefits of running Oracle on the Nutanix Virtual Computing Platform.
If you want to know why you should be virtualizing your Oracle databases and migrating from Unix to a virtualized commodity x86 platform, I recommend you take a look at my Oracle Page, where I’ve listed a number of articles covering the benefits, design considerations, support, licensing and other aspects. This article is specifically about Oracle on Nutanix. I won’t be covering all aspects of the Nutanix Virtual Computing Platform in detail, as Steven Poitras has done a superb job of documenting it in The Nutanix Bible. Instead, I’ll give a high-level overview of Oracle on Nutanix and its benefits.
Oracle on a Scale-Out Hyperconverged Platform
The Nutanix Virtual Computing Platform is a scale-out hyperconverged platform. What this means in reality is that when you add a node to a Nutanix cluster you’re not only getting compute resources (CPU and RAM), you’re also getting storage resources. Not just storage capacity, but also storage performance. This is similar to the way web-scale companies such as Google, Facebook and LinkedIn build their infrastructures.
Nutanix is a No SAN solution. By that I mean you don’t need a SAN, you don’t need a monolithic storage array, and you don’t need a Fibre Channel network in order to use it. It’s all connected using standard Ethernet networks, either 1GbE or 10GbE (or both). The compute hardware is abstracted and pooled using an enterprise-class hypervisor, such as VMware vSphere, so you still get the same high availability, distributed load balancing and scheduling, and non-disruptive maintenance you’re used to. When you receive your Nutanix equipment and unbox it, you can be up and running in under an hour. It is radically simple yet very powerful.
The way the platform works, the more nodes you have the more aggregate performance you get, both storage and compute. However, a single VM can only use the compute resources of a single host; unless your application has some built-in mechanism to combine the resources of multiple VMs (such as Oracle RAC), you can’t configure a single VM with the capacity of two or more hosts (I’m not talking about a Fault Tolerant VM configuration here). Storage, however, works a little differently from compute.
Each node in a Nutanix cluster has local SSD and HDD. The most frequently accessed data is stored on SSD and kept local to the VM, reducing latency and improving your Oracle database performance. In addition, the SSD performance tier is distributed across the nodes in the Nutanix cluster, so your database also benefits from the available SSD on the other nodes. Less frequently accessed data is migrated to the HDD tier, which is likewise distributed across the cluster. This gives your VMs access to the aggregate capacity of the entire Nutanix cluster, and a lower cost of ownership. The storage capacity and performance available to your databases therefore scale out as the number of nodes grows; you’re not limited to the performance or capacity of a single node from a storage perspective. All this magic happens thanks to the Nutanix software and the Nutanix Distributed File System (NDFS) (see The Nutanix Bible and How Nutanix Works).
With NDFS using Replication Factor 2 (RF2), which is the default, all data blocks are written to two nodes before being acknowledged to the application, to ensure data protection. If a drive fails, the data is re-protected immediately. Because this doesn’t use RAID you don’t have the same write penalties, and you don’t have the same rebuild delays; the more nodes in the cluster, the faster the re-protection occurs. In addition to this data protection, Nutanix has built-in snapshots, asynchronous remote-site replication, data de-duplication and compression. You can still make use of Oracle RAC and Oracle Data Guard on top of the Nutanix foundation to build an even more rock-solid environment for your Oracle databases, or you could just as easily rely on VMware HA and DRS for those databases that only need 99.9% availability.
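To make the RF2 write path concrete, here’s a minimal Python sketch (my own illustration, not Nutanix code): a write is only acknowledged once two nodes hold the block, so any single drive or node failure still leaves a surviving copy to read from.

```python
class Rf2Cluster:
    """Toy model of Replication Factor 2 semantics."""

    def __init__(self, node_count):
        # Each node holds a dict of block_id -> data.
        self.nodes = [dict() for _ in range(node_count)]

    def write(self, block_id, data):
        # Place the block on its "local" node plus one peer
        # before acknowledging the write to the application.
        primary = hash(block_id) % len(self.nodes)
        replica = (primary + 1) % len(self.nodes)
        self.nodes[primary][block_id] = data
        self.nodes[replica][block_id] = data
        return "ack"  # only returned once both copies exist

    def read(self, block_id, failed_node=None):
        # Any surviving replica can serve the read after a failure.
        for i, node in enumerate(self.nodes):
            if i != failed_node and block_id in node:
                return node[block_id]
        raise KeyError(block_id)
```

The point of the sketch is simply that acknowledged data always exists in two places, which is why a single failure triggers re-protection rather than data loss.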
So how big is a node? Good question. You can take a look at the Nutanix Tech Specs. In terms of compute capability, a node can be as large as 20 Ivy Bridge cores @ 3GHz, 512GB RAM and 16TB or more of disk, or as small as 12 cores, 64GB RAM and 4TB of disk. You don’t have to have all the same nodes in a cluster; you can mix and match to a certain degree based on your requirements. You can start with 3 nodes and grow one node at a time from there. You get two or four independent Nutanix nodes in a 2U chassis known as a block, as in building block. This is a pay-as-you-grow model, keeping the up-front acquisition cost down compared to other converged architectures and allowing economical units of scalability. With a single node having 20 x 3GHz IVB cores and 512GB RAM, you can realistically virtualize very large Oracle databases, or a mixture of different-sized databases.
So how big can this get? Well, let’s take a Nutanix cluster of 32 nodes. This is not the biggest single Nutanix cluster currently in use (look at this 50 node cluster), but it’s a good example as 32 is the current limit for a single VMware vSphere cluster. With one of the Nutanix node options, this 32 node cluster could provide as much as 640 x 3.0GHz IVB cores, 16TB RAM, 25TB of distributed SSD storage and 256TB of usable distributed HDD storage, all in just 32U. In some cases a solution such as this would cost less than just the storage for a traditional storage array. The difference being you can run all of your Oracle Database VMs on Nutanix. How many VMs can you run directly on other storage arrays?
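The arithmetic behind those aggregate numbers is easy to check. Here’s a quick sizing sketch using the largest node option mentioned above; the per-node raw SSD and HDD figures are my assumptions (roughly 1.6TB SSD and 16TB HDD per node), with RF2 halving raw capacity to usable.

```python
def cluster_totals(nodes, cores_per_node, ram_gb_per_node,
                   ssd_tb_per_node, hdd_tb_per_node, replication_factor=2):
    """Aggregate compute and usable storage for a uniform cluster."""
    return {
        "cores": nodes * cores_per_node,
        "ram_tb": nodes * ram_gb_per_node / 1024,
        # RF2 keeps two copies of every block, so usable = raw / 2.
        "ssd_usable_tb": nodes * ssd_tb_per_node / replication_factor,
        "hdd_usable_tb": nodes * hdd_tb_per_node / replication_factor,
    }

totals = cluster_totals(nodes=32, cores_per_node=20, ram_gb_per_node=512,
                        ssd_tb_per_node=1.6, hdd_tb_per_node=16)
# -> 640 cores, 16TB RAM, ~25TB usable SSD, 256TB usable HDD
```

Under those assumptions the totals line up with the 640 cores, 16TB RAM, ~25TB SSD and 256TB HDD figures quoted above.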
Oracle on Nutanix as part of Your Software Defined Datacenter
With a software defined datacenter and the Nutanix Virtual Computing Platform hyperconverged architecture, you can provide the different SLAs required by your different applications on the same simple underlying infrastructure. You can deliver 99.999% or higher availability in one part of your environment while delivering 99.9% in another. All with the same foundation, the same tools, and a comprehensive yet simple management capability; all without the lock-in, and all with a better ROI and TCO compared to other options. You can meet the business requirements of isolation, SLAs and so on that each of your application teams has, without complicating change management or over-complicating the infrastructure.
There is no need to compromise when you virtualise. This is why my customers have achieved such success: I’ve helped them to virtualise without compromise. But you need to build a solid design on a solid foundation, one that gives you those options and is flexible enough to change when your business changes. With the Nutanix Virtual Computing Platform as the foundation, we can provide an abstracted, pooled and automated environment that self-tunes to the needs of the applications and the teams running them, and self-heals when it breaks, based on pre-defined policies, with far greater levels of efficiency.
A software defined datacenter isn’t just about being policy defined or a declarative architecture. It’s also about being able to grow and shrink as you need, and being able to upgrade non-disruptively. Nutanix allows you to add or remove nodes online, so you can grow and shrink your environment as your needs dictate. It also provides completely non-disruptive upgrades of your server firmware, your hypervisor, and your storage. You can upgrade your Nutanix OS version while all of your VMs are running and they won’t even know it’s happened. I’ve experienced this for myself a few times already.
By building a software defined datacenter you avoid single-vendor lock-in while maintaining flexibility, standardization and good economics. By building your software defined datacenter with Nutanix, you get a simple package of compute and storage that is scalable, flexible and easy to maintain. Although this all comes from one vendor and is completely supported by Nutanix, you are also able to migrate away from it easily at any time if you’re not happy with your solution. We know this, and we know how important it is that you have a good experience, so we will always do our best to make sure the solution meets your requirements. The power really is now in your hands as the customer.
Service and Support
Like any good enterprise vendor, Nutanix offers a range of service and support options, including 24/7/365 support with various response times. The important point is that Nutanix supports the entire solution: hardware (both compute and storage), the hypervisor, and through VMware support even the database. You can read my article Fight the FUD – Oracle Licensing and Support on VMware vSphere for more information regarding VMware’s expanded Oracle support options. Nutanix is also a member of TSANet, which is important when working across vendors’ support organisations. Rest assured, if you choose to run your databases on Nutanix, like other customers are doing, you will be fully supported.
Foundation for Database as a Service
The Nutanix Virtual Computing Platform provides a set of standardized and simple node options that come together to form a cluster. You can start with 3 nodes of a particular type, combine different nodes to balance the requirements of compute or storage capacity and performance across your different workloads, and grow one node at a time. You’ll need sufficient nodes to support your workload requirements and to allow for failures (N+1 minimum). NDFS gives you scalable and available shared storage, and VMware vSphere gives you everything else you need to run enterprise-class Oracle databases. When you add your cloud management platform, a manageable number of pre-defined Oracle database templates, and perhaps VMware Data Director, you can provide a simple, automated and highly scalable policy-driven Database as a Service environment to your DBAs and application users.
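As a rough illustration of the N+1 sizing point (my own rule-of-thumb helper, not a Nutanix tool), you can estimate the node count from whichever resource is the tighter constraint and then add a spare node for failures.

```python
import math

def nodes_required(workload_cores, workload_ram_gb,
                   cores_per_node, ram_gb_per_node, spares=1):
    """Minimum node count for a workload, plus N+1 spare capacity."""
    by_cpu = math.ceil(workload_cores / cores_per_node)
    by_ram = math.ceil(workload_ram_gb / ram_gb_per_node)
    # A Nutanix cluster starts at 3 nodes, so never size below that.
    return max(3, max(by_cpu, by_ram) + spares)

# Hypothetical workload: 100 vCPUs worth of cores and 2.6TB of RAM
# on 20-core / 512GB nodes. RAM is the tighter constraint here.
nodes_required(workload_cores=100, workload_ram_gb=2600,
               cores_per_node=20, ram_gb_per_node=512)  # -> 7
```

The spare node is what lets VMware HA restart your database VMs after a host failure without overcommitting the surviving hosts.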
My Experience with the Nutanix Platform so far and Oracle on Nutanix
I’ve done the training, I’ve had some hands-on time, and I’ve passed the Nutanix Platform Professional (NPP) certification. NPP is a cert I recommend, as it’s not just about virtualization but about a complete converged platform, just as the VMware VCP isn’t just about the hypervisor and compute but about all resources. These are some of the foundations of a Software Defined Datacenter education.
I’ve been setting up a Nutanix NX-3450 block in my lab environment as the foundation of my testing, preparing the VMware vSphere 5.5 environment and configuring it the way I want to best demonstrate the capabilities of the combined solution. I have Multi-NIC vMotion working, and during vMotion performance tests I’m getting over 2100MB/s between the nodes (2 x 10G links). I’m using a vSphere Distributed Switch with Network IO Control enabled. This proves my network infrastructure is fine (2 x Dell PowerConnect 24-port 10G switches) and that my hypervisor can support the live migration of very demanding workloads. My vMotion test VM had 192GB RAM and 8 vCPUs and was running at 100% utilization using the Prime95 64-bit Custom Torture Test at the time it was migrated.
I have also been conducting IO performance tests with a utility built into the Nutanix Virtual Computing Platform called diagnostics.py. If you have a new Nutanix cluster, this is a great way to ensure the components are working, including the physical infrastructure, and to see what your storage performance will likely be. Diagnostics.py creates one VM on each Nutanix node and runs a series of IO performance benchmarks, which can also include iPerf for the network, but the basic tests use FIO to drive storage IO. The results are then aggregated so you get the total performance of the entire cluster. The storage IO patterns use a 1MB IO size for sequential read and write and a 4KB IO size for random read and write. (Note: Oracle DBs predominantly use an 8KB IO size, so halve the IOPS at 4KB IO size for an estimate of DB IOPS performance.) If you have a Nutanix cluster you want to test, simply log into one of the CVMs and run the command “diagnostics/diagnostics.py --display_latency_stats --run_iperf run” from the Nutanix home folder. This will give you the latency and iPerf stats in addition to the storage IO numbers. After the test is complete, run the “diagnostics/diagnostics.py cleanup” command to clean up the test and shut down the test VMs. I wouldn’t recommend running this against an active production environment, as it will consume all available storage performance. Nutanix has successfully demonstrated up to 1 million IOPS using 40 Nutanix nodes (see Josh Odgers’ article – Scaling to 1 Million IOPS and beyond linearly!).
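The 4KB-to-8KB note above is just a bandwidth-preserving rule of thumb; this hypothetical little helper shows the conversion for working through your own diagnostics.py results.

```python
def estimate_db_iops(measured_4k_iops, db_block_kb=8, test_block_kb=4):
    """Rough DB IOPS estimate: same bandwidth at a larger block size
    means proportionally fewer IOPS (rule of thumb, not a Nutanix formula)."""
    return measured_4k_iops * test_block_kb // db_block_kb

# E.g. a 110K 4KB random-read result suggests roughly 55K 8KB IOPS.
estimate_db_iops(110_000)  # -> 55000
```

Real Oracle workloads mix block sizes (redo writes, full scans, index reads), so treat this only as a first-order estimate.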
For my NX-3450, which is a 4 node block (total of 64 IVB cores at 2.6GHz, 1TB RAM, 3200GB SSD, 16TB HDD), I’ve been able to generate aggregate results of 1.5GB/s sequential write, 3.3GB/s sequential read, 110K IOPS random read and 52K IOPS random write. In my environment the Nutanix Storage Controller VM is configured to use a single NIC and leverage VMware’s Load Based Teaming (other NICs are available for availability). The throughput between Controller VMs is over 1.1GB/s. I think this is a fairly good starting point for Oracle, don’t you?
My next step is to get Oracle RAC configured. I’ll be testing a 3 node Oracle RAC configuration with an OLTP database of 300GB – 500GB initially, using a TPC-C-like benchmark to drive load against the Oracle RAC environment and test the capability of the different components of the architecture. I expect this will allow me to demonstrate very clearly the power and simplicity of the Nutanix Virtual Computing Platform for Oracle databases.
I’ve only been at Nutanix for a short time, but I’ve already met a couple of customers running many Oracle workloads on Nutanix in production: from a customer that uses Oracle on Nutanix as part of a system to manage 600,000 exams and the marking of those exams (the whole solution in just 5RU), to financial institutions running many Oracle DBs (16 nodes, 320 x E5-2680v2 cores, 8TB RAM, 128TB usable storage). We also have a very good pipeline of customers that have purchased Nutanix Virtual Computing Platforms for the purpose of running large-scale Oracle workloads. We’ll be turning these and other examples into case studies, and I’ll be writing best practice white papers and reference architectures for you all to benefit from. Look out for these on the Nutanix web site, and here, as I’ll let you know when they are published.
I know I’ve only just scratched the surface of Oracle on Nutanix, and this has been a conceptual, high-level article rather than a detailed one. I will cover the details in future articles in this series as I work on the Oracle on Nutanix Best Practice and Reference Architecture papers. I hope this gives you a flavour of the vast opportunities to run your Oracle environments and deliver the required SLAs to your DBAs, while converging and greatly simplifying the architecture and improving ROI and TCO. As always, I appreciate your comments and feedback.
This article is also published on my personal blog at longwhiteclouds.com.