Dog Days of Disaster Recovery are Done

Why technologies like hyperconverged infrastructure make it much easier to control downtime and recover from IT disasters.

By Tom Mangan, May 13, 2021

Disaster recovery used to be as sluggish and excruciating as the Dog Days of summer.

But those days are fading, according to two experts on business continuity and disaster recovery (BCDR) who spoke with The Forecast by Nutanix. Ed Collins and Dan Angst are two of the leading BCDR experts at Nutanix, a pioneer in hyperconverged infrastructure (HCI), which virtualizes compute, storage and networking in a single software package.

In their talk with The Forecast, Collins and Angst noted that downtime is increasingly unacceptable because digital technologies are becoming so deeply integrated into businesses and organizations. At the same time, automation and cloud technologies are making it much simpler to create business continuity plans that help organizations fend off threats like ransomware and natural disasters.

In ancient times, stargazers noted how the sun aligned with Sirius, the brightest star in the constellation of Canis Major (or Greater Dog), every year in midsummer. Ancient observers thought the long hot weeks of July and August, which came to be called the Dog Days, brought disaster as well.

Though we live in far less superstitious times today, modern-day IT leaders know too well that the Dog Days of Disaster Recovery can last all year — especially in an increasingly complex BCDR landscape.

Angst and Collins bring deep experience to the challenge of business continuity and disaster recovery. In their conversation with The Forecast, they described four trends reshaping BCDR.

Downtime is getting expensive

Everything in BCDR comes down to dealing with downtime. The ability to limit it is a measure of an organization's health, said Collins, marketing lead for business continuity and disaster recovery solutions at Nutanix.

“Protecting your business begins with protecting your applications and data,” Collins said. “Our customers are becoming more and more reliant on IT in such a way that IT is really the lifeblood of most companies. The ability to recover after a disaster is almost a form of a health check.”

Collins and Angst, director of data center sales at Nutanix, reeled off a list of costly examples of IT system downtime:

2015: A 12-hour Apple Store outage cost the company $25 million.
2016: A five-hour outage cost Delta Air Lines $150 million and forced it to cancel 2,000 flights.
2019: A 14-hour power outage cost Facebook $90 million.
2021: Ransomware attacks alone will cost an estimated $20 billion worldwide.

Angst noted a widely quoted estimate from Gartner Inc. that IT downtime costs businesses an average of $300,000 per hour. “That's mind-boggling,” he said.
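
To put these figures on a common footing, the short Python sketch below divides each reported loss by the reported outage length to get an implied hourly cost, then compares it with the $300,000-per-hour average. The durations and dollar amounts come from the list above; the function and variable names are illustrative only, and the flat hourly rate is a simplifying assumption rather than a real cost model.

```python
# Reported outage figures from the list above: (duration in hours, cost in USD).
OUTAGES = {
    "Apple Store (2015)": (12, 25_000_000),
    "Delta Air Lines (2016)": (5, 150_000_000),
    "Facebook (2019)": (14, 90_000_000),
}

AVERAGE_COST_PER_HOUR = 300_000  # the widely quoted average cited above


def implied_hourly_cost(hours: float, total_cost: float) -> float:
    """Return total cost divided by outage length (a naive linear estimate)."""
    if hours <= 0:
        raise ValueError("outage duration must be positive")
    return total_cost / hours


def estimated_cost(hours: float, cost_per_hour: float = AVERAGE_COST_PER_HOUR) -> float:
    """Estimate the cost of an outage of the given length at a flat hourly rate."""
    return hours * cost_per_hour


if __name__ == "__main__":
    for name, (hours, cost) in OUTAGES.items():
        rate = implied_hourly_cost(hours, cost)
        multiple = rate / AVERAGE_COST_PER_HOUR
        print(f"{name}: ${rate:,.0f}/hour, about {multiple:.0f}x the average")
```

By this rough measure, the Delta outage works out to $30 million per hour, or 100 times the quoted average, which suggests how far high-profile incidents can sit from the typical case.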

Recovery objectives are getting tighter

The increasing reliance on IT systems gives companies an incentive to reduce or even eliminate downtime. To do that, they must estimate how much downtime they can afford without hurting their business or damaging brand trust.

Collins explained that two critical downtime benchmarks — recovery point objectives (RPOs) and recovery time objectives (RTOs) — are shrinking every day.

“You have to start thinking about ‘how much data can I lose?’ That's RPO, or recovery point objective,” Collins said. “What was tolerable five years ago isn't tolerable anymore.”

RPOs look backward, estimating how long companies can safely go between backups. Then there’s the other half of the downtime equation:

“How much time can I consume before I bring my systems back online so that I do recover in a way that's tolerable? That metric is typically expressed as RTO, or recovery time objective,” Collins added.

RTOs look forward because they estimate how long downtime can last before creating serious pain for businesses and customers. Again, RTOs that were acceptable five years ago are out of the question today.
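
The backward-looking and forward-looking halves of that equation can be sketched in a few lines of Python. This is a minimal illustration, assuming an incident can be described by three timestamps; the class names, field names and the sample objectives are hypothetical and do not represent any Nutanix tool or API.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rpo: timedelta  # maximum tolerable window of lost data
    rto: timedelta  # maximum tolerable time until systems are back online


@dataclass(frozen=True)
class Incident:
    last_good_backup: datetime  # most recent recoverable copy of the data
    failure: datetime           # moment the outage began
    restored: datetime          # moment service was back online


def data_loss_window(incident: Incident) -> timedelta:
    """Looks backward: changes made since the last backup are lost."""
    return incident.failure - incident.last_good_backup


def downtime(incident: Incident) -> timedelta:
    """Looks forward: time from the failure until service is restored."""
    return incident.restored - incident.failure


def meets_objectives(incident: Incident, objectives: RecoveryObjectives) -> tuple[bool, bool]:
    """Return (RPO met, RTO met) for a single incident."""
    return (
        data_loss_window(incident) <= objectives.rpo,
        downtime(incident) <= objectives.rto,
    )


if __name__ == "__main__":
    # Hypothetical objectives: lose at most 1 hour of data, be back within 4 hours.
    objectives = RecoveryObjectives(rpo=timedelta(hours=1), rto=timedelta(hours=4))
    incident = Incident(
        last_good_backup=datetime(2021, 5, 13, 2, 0),
        failure=datetime(2021, 5, 13, 2, 45),
        restored=datetime(2021, 5, 13, 5, 15),
    )
    print(meets_objectives(incident, objectives))
```

In this made-up incident, 45 minutes of data are lost and service is down for two and a half hours, so both objectives are met. Tightening either objective, as Collins describes, shrinks the window an incident has to fit inside.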

The latest generations of recovery tools are helping companies shrink their downtime windows. A Nutanix customer, for instance, realized a 24-to-1 improvement in recovery point objectives and a 2-to-1 boost in recovery time objectives.

“That's pretty amazing,” Angst concluded.

How is this possible? A lot of it has to do with building BCDR into the architectures of virtualized environments.

Recovery tools are getting smarter

With hyperconverged infrastructure, software virtualizes the operations of computers, storage arrays and network switches. This makes it incredibly easy to spin up environments that duplicate existing IT operations and enable rapid disaster recovery.

Nutanix’s BCDR tools include Xi Leap for disaster recovery and Mine for data protection. These tools were built from the ground up to work with a variety of hypervisors, including AHV, and with the company’s full suite of enterprise cloud management services and software.

“It's all native to the platform,” Angst explained. “Disaster recovery and business continuity are all in one and are a single click away.” Integrating everything into the platform means clients can back data up anywhere they want, from internal data centers to hyperscale public cloud services like AWS and Azure.

Collins added: “A lot of the complexity that the Dog Days of DR imposed on our customers are now gone because you can manage the whole thing through a single interface.”

Expanded use of automation makes it easy to create backup and recovery tiers customized to the needs of each business. The most critical apps can run with near-real-time failover, which is expensive but worth it. Less-critical apps and their data can have more generous, and less costly, recovery objectives.
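
One way to picture tiering is as a lookup from an application's tolerances to the least stringent tier that still satisfies them. The Python sketch below assumes three tiers with made-up recovery objectives; the tier names, numbers and the `cheapest_tier` helper are hypothetical and are not drawn from any Nutanix product.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ProtectionTier:
    name: str
    rpo: timedelta  # how much data the tier may lose
    rto: timedelta  # how long the tier may take to recover


# Hypothetical tiers, ordered from most stringent (and costly) to least.
TIERS = [
    ProtectionTier("critical", rpo=timedelta(minutes=1), rto=timedelta(minutes=15)),
    ProtectionTier("important", rpo=timedelta(hours=1), rto=timedelta(hours=4)),
    ProtectionTier("standard", rpo=timedelta(hours=24), rto=timedelta(hours=24)),
]


def cheapest_tier(max_data_loss: timedelta, max_downtime: timedelta) -> ProtectionTier:
    """Pick the least stringent tier that still meets the app's tolerances."""
    for tier in reversed(TIERS):  # try the least costly tier first
        if tier.rpo <= max_data_loss and tier.rto <= max_downtime:
            return tier
    raise ValueError("no tier is strict enough for these tolerances")


if __name__ == "__main__":
    # An app that can lose 2 hours of data and be down for 8 hours.
    tier = cheapest_tier(timedelta(hours=2), timedelta(hours=8))
    print(tier.name)
```

Searching from the least costly tier upward mirrors the trade-off described above: an application only lands in the expensive near-real-time tier when nothing cheaper would meet its tolerances.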

This helps companies optimize their total cost of ownership while creating sophisticated data protection and disaster recovery environments and reducing complexity. “Because it's all managed, configured and failed over from a single pane of glass, you could do it on your iPhone,” Collins said. “We've done all the heavy lifting for our customers.”

Testing is getting more practical

Historically, testing has been one of the biggest headaches of disaster recovery. Thorough, accurate testing is essential because companies need evidence that their backups will work in a disaster. Companies often brought their IT teams to work on the weekends to take their IT infrastructure down and then bring it back up without disrupting business operations.

Testing was so convoluted that companies might run tests only once or twice a year, hardly a secure stance when new cyber villains seem to appear every week. Nutanix and its competitors in the disaster-recovery-as-a-service (DRaaS) space let companies run sophisticated tests without the conventional complexity.

“You don't have to have those multiple silos of applications, data, and tools that make the testing intrinsically difficult,” Angst said. “I can test daily if I want to.”

Accurate testing requires a digital sandbox hosting the most current virtual machine snapshots of the production environment. Entire clusters or just specific virtual machines can all be easily replicated for testing. When the testing is over, the sandbox is just as easy to destroy, with one click.

“There have been great strides that mitigate so many of the pains that customers have dealt with in the past,” Collins concluded. “We have to jettison this legacy idea that DR and even testing DR has to be difficult.”

In other words, leave the Dog Days where they belong: in the imaginations of ancient astrologers.

Tom Mangan is a contributing writer. He is a veteran B2B technology writer and editor, specializing in cloud computing and digital transformation. Contact him on his website or LinkedIn.
