Advancing Genomics Medicine with Cloud Infrastructure

By Tapan Mehta

The combined power of computing technology, data storage, and gene sequencing will shape the future of personalized medicine.

The ability to rapidly sequence and interpret genomic information has quickly emerged as a fundamental driver of improved human health. Completing the first human genome sequence in 2003 took 13 years, an international effort, and billions of dollars in investment. Today, sequencing labs can churn out hundreds of completely sequenced individual human genomes in days, with costs as low as $299. Huge, ambitious endeavors such as the 100,000 Genomes Project, the NIH All of Us Research Program (1 million genomes), and other national and international efforts are the order of the day. These studies produce invaluable troves of information for scientists to explore and enable ever deeper levels of inquiry into the fundamental bases of development, disease, and various other traits. This rapid, ongoing innovation in the genomics field is driving a growing number of pharma and academic centers to embrace cloud infrastructure for its elastic and cost-efficient resources.

Producing the Sequence Is Just the Start

Much of the acceleration in this field has come from new sequencing technologies, such as those developed by Illumina, Oxford Nanopore Technologies, and PacBio. These advances have improved base call accuracy, enabled longer read lengths, and increased speed while decreasing costs.

On the output side, sequenced genomes can fill warehouses of storage at ~150 GB per genome. But really, that’s just the beginning. Mapping reads and making higher-level inferences from collections of sequences require robust and rapidly scalable compute, storage, transfer, retrieval, and analytical resources, and the numbers multiply quickly as million-genome-scale projects enter the fray. It is this scale, though, that holds the greatest promise for revealing subtle relationships among genes and phenotypes. Instead of simple associations between gene X and disease Y, multi-gene classifications can be stratified into specific population groups, providing more precise correlations and predictions. Human genome sequences alone are projected to occupy up to 40 exabytes by 2025, and that assumes just one sequence per person. It’s easy to see how this number will multiply drastically as transcriptional, epigenetic, single-cell, temporal, and other regulatory layers under the genomics umbrella are added.
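To make that scale concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the ~150 GB per genome figure quoted above and assumes raw data only, with no replication, compression, or derived analysis files, and decimal units (1 PB = 1,000,000 GB). The function name and cohort labels are illustrative and not taken from any particular project.

```python
# Rough storage estimate based on the ~150 GB per genome figure quoted above.
# Assumption: raw sequence data only (no replicas, compression, or derived files).
GB_PER_GENOME = 150
GB_PER_PB = 1_000_000  # decimal units


def cohort_storage_pb(num_genomes, gb_per_genome=GB_PER_GENOME):
    """Return the approximate raw storage for a cohort, in petabytes."""
    return num_genomes * gb_per_genome / GB_PER_PB


# Illustrative cohort sizes matching the project scales mentioned in the text.
cohorts = [
    ("100,000-genome project", 100_000),
    ("1-million-genome project", 1_000_000),
]

for label, size in cohorts:
    print(f"{label}: ~{cohort_storage_pb(size):,.0f} PB")
```

Under these assumptions, a 100,000-genome project needs roughly 15 PB of raw storage and a million-genome project roughly 150 PB, before any redundancy or downstream analysis data is counted.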

The value of these tools is evident from the recent worldwide scientific effort to understand and conquer COVID-19. The virus sequence was released less than a month after the disease gained international recognition, which led to a record-setting timeline for vaccine development and approval. This is the kind of work that hyperconverged infrastructure (HCI) and cloud resources can support. However, much remains to be understood, such as why certain individuals are susceptible while others go unscathed, and sequencing efforts hold great promise in answering such questions. Continued sequencing is also rapidly identifying new variants that may affect virulence and disease severity, informing better public health decisions and helping vaccine developers stay ahead of changes.


You Need the Right Data Infrastructure

To sustain the pace of progress in this field, data handling must have a VIP seat, and Nutanix® cloud software and HCI solutions are well suited to the changing needs of this data-rich environment. Nutanix software provides simplified access to and control of the hardware that stores and processes this data through a single interface, whether the data is on-prem, in a public or private cloud, or, as is increasingly common, in a hybrid cloud environment. Nutanix simplifies the technology so you can focus on your data, and we accomplish this using software-defined HCI. This allows the same tools to function on-prem or in hybrid cloud environments, and it’s built on web-scale architecture, designed for the scale at which Google, Amazon, and Facebook operate. It’s easy to see that this is the framework needed to support massive sequencing projects. With Nutanix solutions, storage and management are fully distributed across nodes to prevent a single point of failure. This distribution creates redundancy that helps safeguard data integrity, paves the way for the federated genomics analysis espoused by GA4GH, and allows data to remain in local storage and comply with jurisdictional constraints on data sharing.

At the scale of many of these projects, minor efficiency improvements can add up to major savings. Every step of data transfer and processing introduces latency that optimized data handling can reduce. Stacey Gabriel, Senior Director of the Genomics Platform at the Broad Institute, recently stated that “processing using optimized computing power lowered the time and cost of creating a sequence file by ~50%.” Nutanix software helps automatically identify bottlenecks in the processing and transfer of data so that workflows can be optimized. Furthermore, by incorporating cloud solutions, compute and storage can be scaled up and down quickly to meet the specific needs of a sequencing project, so you pay only for the capacity the project requires. This optimized access to essentially unlimited resources is ideal for harnessing the power of advanced analytics, machine learning, and artificial intelligence, all necessary elements of population-scale studies.

Given the data-intensive nature of genome sequencing and its expanding role in the future of medicine, having the right infrastructure to handle these data needs is crucial. The data requirements appear poised to only grow in the near term, and these technologies will certainly be involved in designing the pharmaceutical interventions of the future, with additional benefits that cannot yet be foreseen. To learn more about how Nutanix can help your organization, take a complimentary Test Drive. Test Drive lets you experience the simplicity and agility of public cloud, combined with the performance, security, and control of private cloud, via an easy-to-follow guided tour.

© 2021 Nutanix, Inc. All rights reserved. Nutanix, the Nutanix logo and all Nutanix product, feature and service names mentioned herein are registered trademarks or trademarks of Nutanix, Inc. in the United States and other countries. Other brand names mentioned herein are for identification purposes only and may be the trademarks of their respective holder(s). This post may contain links to external websites that are not part of Nutanix. Nutanix does not control these sites and disclaims all responsibility for the content or accuracy of any external site. Our decision to link to an external site should not be considered an endorsement of any content on such a site. Certain information contained in this post may relate to or be based on studies, publications, surveys and other data obtained from third-party sources and our own internal estimates and research. While we believe these third-party studies, publications, surveys and other data are reliable as of the date of this post, they have not been independently verified, and we make no representation as to the adequacy, fairness, accuracy, or completeness of any information obtained from third-party sources.

This post may contain express and implied forward-looking statements, which are not historical facts and are instead based on our current expectations, estimates and beliefs. The accuracy of such statements involves risks and uncertainties and depends upon future events, including those that may be beyond our control, and actual results may differ materially and adversely from those anticipated or implied by such statements. Any forward-looking statements included herein speak only as of the date hereof and, except as required by law, we assume no obligation to update or otherwise revise any of such forward-looking statements to reflect subsequent events or circumstances.