ETH Zurich: Deciphering life with the largest-ever DNA search engine

Technology Category
  • Analytics & Modeling - Machine Learning
  • Infrastructure as a Service (IaaS) - Cloud Computing
Applicable Industries
  • Education
  • Life Sciences
Applicable Functions
  • Procurement
  • Product Research & Development
Use Cases
  • Construction Management
  • Infrastructure Inspection
Services
  • Cloud Planning, Design & Implementation Services
  • Data Science Services
About The Customer

ETH Zurich is a leading research institution that aims to find solutions for the defining challenges of our time while cultivating a team of innovative and critical researchers. Its Biomedical Informatics (BMI) Group combines machine learning, health informatics, and bioinformatics with clinical data science, bridging medicine and biology with computer science. The group models molecular processes and diseases, streamlines the analysis of large genomic and medical datasets, and works with medical collaborators to improve treatment options. It is building the world's largest-ever DNA search index by processing 4 petabytes of sequencing data, with the goal of making the world's genetic code more accessible for medical and scientific research.

The Challenge

Building a DNA search index from 4 petabytes of sequencing data confronted ETH Zurich's Biomedical Informatics (BMI) Group with two major obstacles: data processing and efficient data accessibility. The National Center for Biotechnology Information (NCBI) repository gave the team access to a vast amount of sequencing information, but existing methods did not allow for the most effective use of these datasets. Before the switch to Google Cloud, the BMI Group had to limit its operations to smaller sequencing datasets of several terabytes in size, just to keep download and processing times manageable.

The Solution

The solution came in the form of Google Cloud, which allowed the researchers to bring the algorithms to the data instead of the other way around. The BMI Group uses Cloud Storage to store sequencing information and Compute Engine VM instances to process the data. Having the sequencing data available in Google Cloud was a game changer: it removed the download bottleneck and fast-tracked data processing. The elasticity of cloud computing let the team parallelize work across many instances, increasing throughput. The team also built a custom server infrastructure in which one central server node distributes worker jobs across the available instances. The infrastructure includes checkpointing, which adds resilience to the group’s operations by minimizing the risk of losing progress due to technical failures or errors. To lower the overall compute cost, the ETH team used Compute Engine Preemptible VMs, which are cheaper because the provider can reclaim any compute node for other duties at any time.
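The pattern described above (a central node handing out jobs, workers that can be reclaimed at any time, and checkpoints that preserve progress) can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the BMI Group's actual implementation: the names `Coordinator`, `Job`, `FlakyWorker`, and `Preempted` are hypothetical, preemption is simulated with an exception, and a plain dictionary stands in for whatever durable checkpoint storage a real setup would use.

```python
from collections import deque
from dataclasses import dataclass, field


class Preempted(Exception):
    """Signals that a (simulated) preemptible worker was reclaimed mid-job."""


@dataclass
class Job:
    job_id: str
    chunks: list  # units of work, e.g. pieces of one sequencing dataset


@dataclass
class Coordinator:
    """Hypothetical central node: hands out jobs and requeues preempted ones."""
    pending: deque = field(default_factory=deque)
    checkpoints: dict = field(default_factory=dict)  # job_id -> (next_chunk, partial)
    results: dict = field(default_factory=dict)

    def submit(self, job):
        self.pending.append(job)

    def run(self, worker):
        while self.pending:
            job = self.pending.popleft()
            try:
                self.results[job.job_id] = worker(job, self.checkpoints)
                self.checkpoints.pop(job.job_id, None)  # job finished
            except Preempted:
                self.pending.append(job)  # retry later from the last checkpoint
        return self.results


class FlakyWorker:
    """Sums chunk lengths; is preempted once it has completed N chunks in total."""

    def __init__(self, preempt_after):
        self.preempt_after = set(preempt_after)
        self.chunks_done = 0
        self.log = []  # (job_id, chunk_index) for every chunk actually processed

    def __call__(self, job, checkpoints):
        start, total = checkpoints.get(job.job_id, (0, 0))  # resume if possible
        for i in range(start, len(job.chunks)):
            if self.chunks_done in self.preempt_after:
                self.preempt_after.discard(self.chunks_done)
                raise Preempted(job.job_id)
            total += len(job.chunks[i])
            self.chunks_done += 1
            self.log.append((job.job_id, i))
            checkpoints[job.job_id] = (i + 1, total)  # save progress per chunk
        return total


coordinator = Coordinator()
coordinator.submit(Job("A", ["ACGT", "AC", "G"]))
coordinator.submit(Job("B", ["TT", "GGG"]))
worker = FlakyWorker(preempt_after={2})
print(coordinator.run(worker))
print(worker.log)
```

In this run the worker is reclaimed after finishing two chunks of job A. The coordinator requeues A behind B, and when A is picked up again it resumes at its third chunk instead of starting over, so no chunk is processed twice. A real deployment would need checkpoints written to storage that survives the loss of a VM.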

Operational Impact
  • The switch to Google Cloud has significantly increased the efficiency of the BMI Group's operations. The team no longer has to limit itself to smaller sequencing datasets and can now process petabytes of data in a feasible time frame. Elastic cloud resources let the team parallelize its computation and increase throughput, while the custom server infrastructure adds resilience and minimizes the risk of losing progress due to technical failures or errors. Because the setup is cost-effective and can be readjusted to the group's needs, it has expanded the scope of future projects and created new opportunities. The success of this project could transform the field of bioinformatics, changing the way we engage with DNA.

Quantitative Benefit
  • The team now processes data more than 10 times faster.
  • At peak times, the team uses 4,000 CPUs and 15 terabytes of RAM to process all this data.
  • The ETH team cut the overall compute cost by 75%.
