Curing Advanced Data Ailments Using Data Virtualization to Aid Worldwide War on Cancer

Vendor
Denodo Technologies
Company Size
1,000+
Region
  • America
Country
  • United States
Products
  • Data Virtualization Platform
Tech Stack
  • XML
  • Oracle
  • MySQL DB
  • FTP
  • CSV
Implementation Scale
  • Enterprise-wide Deployment
Impact Metrics
  • Cost Savings
  • Productivity Improvements
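Technology Category
  • Application Infrastructure & Middleware - Data Exchange & Integration
  • Platform as a Service (PaaS) - Data Management Platforms
Applicable Industries
  • Healthcare & Hospitals
  • Life Sciences
Applicable Functions
  • Product Research & Development
  • Quality Assurance
Use Cases
  • Predictive Quality Analytics
Services
  • Data Science Services
  • System Integration
About The Customer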
The National Institutes of Health (NIH) is the nation’s medical research agency and a component of the U.S. Department of Health and Human Services. It includes 27 Institutes and Centers and is the primary federal agency conducting and supporting basic, clinical, and translational medical research. NIH investigates the causes, treatments, and cures for both common and rare diseases. Two of its 27 institutes, the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), recently joined forces to carry out a project known as The Cancer Genome Atlas (TCGA). The TCGA mission is to catalog the genetic mutations responsible for cancer using genome sequencing and bioinformatics.
The Challenge
The NIH faced significant obstacles in reliably and efficiently moving large volumes of cancer genome data from The Cancer Genome Atlas (TCGA) to the International Cancer Genome Consortium (ICGC). The process involved transforming the TCGA data to meet ICGC format requirements and then periodically uploading the data to the ICGC servers. The transformation was initially done with Perl scripts, and this approach had several problems: it did not scale, it was costly, and it was error-prone. Because the scripts had limited connectivity to the data sources, they relied on redundant copies of the data, which slowed processing and increased the chance of errors.
The Solution
The NIH used data virtualization to connect to the different sources of genome data, apply transformations, produce the final data sets, and periodically upload these data sets to the ICGC servers. The platform's connectors provided a normalized view of three sources: patient and donor data stored in XML files, sample test results stored in Oracle, and TCGA-to-ICGC mapping data stored in a MySQL database. The transformation process had three main steps: aggregating the patient and test data, converting the aggregated data into the ICGC format using the mapping information, and writing the final output files in CSV format. Finally, the platform's scheduler ran an FTP job once every quarter to upload the files to the ICGC servers.
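The three transformation steps (aggregate, convert using the mapping, write CSV) can be illustrated with a minimal Python sketch. This is not how the Denodo platform implements them, since the platform performs these steps through its own connectors and views. Here the XML files, the Oracle query result, and the MySQL mapping table are replaced by small hypothetical in-memory stand-ins, and every field name (including the ICGC-style column names) is an assumption chosen for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the patient/donor XML files.
PATIENT_XML = """<patients>
  <patient id="TCGA-01"><gender>female</gender><age>54</age></patient>
  <patient id="TCGA-02"><gender>male</gender><age>61</age></patient>
</patients>"""

# Hypothetical stand-in for sample test results held in Oracle:
# (patient_id, sample_id, result) rows, as a query might return them.
SAMPLE_ROWS = [
    ("TCGA-01", "S-100", "positive"),
    ("TCGA-02", "S-200", "negative"),
]

# Hypothetical stand-in for the TCGA-to-ICGC field mapping held in MySQL.
FIELD_MAP = {
    "patient_id": "icgc_donor_id",
    "gender": "donor_sex",
    "age": "donor_age_at_diagnosis",
    "sample_id": "icgc_sample_id",
    "result": "analysis_result",
}


def load_patients(xml_text):
    """Parse the patient XML into a dict keyed by patient id."""
    root = ET.fromstring(xml_text)
    return {
        p.get("id"): {"gender": p.findtext("gender"), "age": p.findtext("age")}
        for p in root.findall("patient")
    }


def aggregate(patients, samples):
    """Step 1: join each sample row with its patient record."""
    for patient_id, sample_id, result in samples:
        record = {"patient_id": patient_id, "sample_id": sample_id, "result": result}
        # Patients missing from the XML simply contribute no extra fields.
        record.update(patients.get(patient_id, {}))
        yield record


def to_icgc(record, field_map):
    """Step 2: rename TCGA fields to their ICGC equivalents; drop unmapped fields."""
    return {field_map[k]: v for k, v in record.items() if k in field_map}


def write_csv(rows, field_map):
    """Step 3: serialize the converted rows as CSV text (missing fields become empty)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(field_map.values()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


patients = load_patients(PATIENT_XML)
icgc_rows = [to_icgc(r, FIELD_MAP) for r in aggregate(patients, SAMPLE_ROWS)]
print(write_csv(icgc_rows, FIELD_MAP))
```

Keeping the mapping as data rather than hard-coding the renames mirrors the idea in the case study that the conversion is driven by the mapping source, so a change in the required output format means updating the mapping rather than rewriting the transformation code. The quarterly FTP upload is left out of this sketch.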
Operational Impact
  • Increased scalability: Larger genome data sets can be included, thanks to replicable generic workflows and the platform's advanced performance capabilities.
  • Increased efficiency: TCGA-to-ICGC transformation processes can be developed and modified faster, thanks to the platform's diverse connectivity and publishing capabilities.
  • Increased accuracy: Minimizing data replication and manual intervention ensured that the most current versions of data and processes were used to create the output files, improving the accuracy of the final data.