ACCELERATE: ACADEMIC RESEARCH - Researching the Genetic Basis of Behavior, Cognition and Affect, USC Needed a High Performance, Scalable Infrastructure to Support Next-Gen Genomics Sequencing

Company Size
1,000+
Region
  • America
Country
  • United States
Product
  • GRIDScaler® File Storage Appliance
  • Illumina® HiSeq2000 instruments
Tech Stack
  • GRIDScaler parallel file system
  • SFA10K-E
  • Burrows-Wheeler Aligner (BWA)
Implementation Scale
  • Enterprise-wide Deployment
Impact Metrics
  • Innovation Output
  • Productivity Improvements
Technology Category
  • Analytics & Modeling - Big Data Analytics
  • Infrastructure as a Service (IaaS) - Cloud Storage Services
Applicable Industries
  • Healthcare & Hospitals
  • Life Sciences
Applicable Functions
  • Product Research & Development
Use Cases
  • Predictive Maintenance
  • Process Control & Optimization
Services
  • Cloud Planning, Design & Implementation Services
  • System Integration
About The Customer
The Zilkha Neurogenetic Institute is an integral part of a broader USC neuroscience initiative that promotes collaboration between researchers from diverse disciplines. Scientists at the Institute adopt methods and techniques from other fields of study to find new approaches to examining nervous system function, with the goal of better understanding the underlying causes of neurological and psychiatric disorders. The Laboratory of Dr. James Knowles studies the genetic basis of behavior, cognition and affect. At present, most of the lab's efforts are directed at understanding the transcriptional program of brain development and the genetics of schizophrenia, bipolar disorder and obsessive-compulsive disorder. The lab uses high-throughput sequencing technology to look for genetic factors that have an etiological role in psychiatric illness. With this knowledge, the researchers aim to improve diagnostic methods and possibly develop therapies that improve quality of life for people living with these disorders.
The Challenge
The Laboratory of Dr. James Knowles at the Zilkha Neurogenetic Institute, Keck School of Medicine at the University of Southern California (USC), had a data storage performance problem. The lab relied on a legacy SAN storage server that was nearing capacity and could not keep up with data access requirements. Storage throughput was limited both by the network and by the performance of NFS. The lab needed to sequence 1,400 full human genomes to support its ongoing studies, work that would generate several terabytes of raw data per day, all of which had to be transferred, inspected, and aligned to the human genome. The legacy storage system could only deliver enough data to the CPU cluster to run a single instance of the Burrows-Wheeler Aligner (BWA) under the Pegasus MPI workflow. Uploads to that system ran at only 30-50 MB/second, nowhere near the 100 MB/second peak theoretical capacity of the GbE network. This bottleneck was more than an inconvenience: it was slowing the lab's time to discovery. The lab needed a new storage solution that could deliver more than a gigabyte per second of throughput and scale to petabytes in a single namespace.
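To make the scale of the bottleneck concrete, the following is a rough back-of-the-envelope sketch, not a measurement from the lab. The case study says only "several terabytes" per day, so the 3 TB daily volume below is a hypothetical figure chosen for illustration, and the function name is likewise illustrative. The rates are the ones quoted above: 30-50 MB/second observed, 100 MB/second as the stated GbE peak, and 1,000 MB/second as the gigabyte-per-second target.

```python
def transfer_hours(data_tb: float, rate_mb_s: float) -> float:
    """Hours needed to move data_tb terabytes at rate_mb_s megabytes per second.

    Uses decimal units: 1 TB = 1,000,000 MB.
    """
    megabytes = data_tb * 1_000_000
    return megabytes / rate_mb_s / 3600


# Hypothetical daily sequencing output; the source says only "several terabytes".
DAILY_OUTPUT_TB = 3.0

# Rates taken from the text: observed uploads, stated GbE peak, target throughput.
for rate_mb_s in (30, 50, 100, 1000):
    hours = transfer_hours(DAILY_OUTPUT_TB, rate_mb_s)
    print(f"{rate_mb_s:>5} MB/s -> {hours:6.2f} hours to move {DAILY_OUTPUT_TB} TB")
```

Under this assumed volume, the low end of the observed upload rate would need more than 24 hours to move a single day's output, so a backlog would grow every day. At the target rate the same transfer would take under an hour.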
The Solution
The Knowles Lab and USC’s HPCC team worked with DDN to identify a high-performance, scalable, cost-effective solution. USC selected a solution based on DDN’s Storage Fusion Architecture®, running an embedded image of the DDN GRIDScaler parallel file system on the SFA10K-E. This solution met both parties’ needs by supporting a high-performance parallel file system and NFS simultaneously. Other research groups at the Keck School of Medicine with large amounts of data learned about the impending storage deployment and added their resources to the GRIDScaler purchase, which doubled the raw system capacity to over 1PB. GRIDScaler is a large, high-performance storage appliance with a shared architecture, capable of delivering in excess of 800 million IOPS continuously to the USC HPCC cluster. This gave it a clear advantage over a collection of smaller, lower-performance units, which would also require more management effort from the HPCC team. The storage array is connected to the compute resources via multiple 10GbE links. A separate 10GbE connection goes to the head node, which runs an image of the GRIDScaler client software and acts as an NFS server. A caching server in the Knowles Lab provides the long-haul connection to the head node in the HPCC and acts as a data transfer gateway for the Windows/Linux terminals and instruments located in the lab.
Operational Impact
  • The solution simultaneously supports a high performance parallel file system and NFS.
  • GRIDScaler is a large, high-performance storage appliance with a shared architecture, capable of delivering in excess of 800 million IOPS continuously to the USC HPCC cluster.
  • GRIDScaler provides performance, economy and scale.
  • Every member of this group of diverse medical researchers has the use of high-performance storage, central management and a clear path to increase their data capacity.
Quantitative Benefit
  • The new schedule will generate several terabytes of raw data per day that needs to be transferred, inspected and aligned to the human genome.
  • The storage array is connected to the compute resources via multiple 10GbE links.
  • GRIDScaler is a large, high-performance storage appliance with a shared architecture, capable of delivering in excess of 800 million IOPS continuously to the USC HPCC cluster.
