
Top Healthcare Provider Derives Actionable Medical Insights from Terabytes of Clinical Data Using Pachyderm’s Scalable, Data-Driven Machine Learning Pipelines

Customer Company Size
Large Corporate
Region
  • America
Country
  • United States
Product
  • Pachyderm
Tech Stack
  • Artificial Intelligence
  • Machine Learning
Implementation Scale
  • Enterprise-wide Deployment
Impact Metrics
  • Productivity Improvements
  • Cost Savings
Technology Category
  • Analytics & Modeling - Machine Learning
  • Analytics & Modeling - Big Data Analytics
Applicable Industries
  • Healthcare & Hospitals
Applicable Functions
  • Product Research & Development
Use Cases
  • Predictive Maintenance
  • Remote Asset Management
Services
  • Data Science Services
About The Customer
The customer is one of the top for-profit managed healthcare providers in the U.S., with affiliate plans covering one in eight Americans for medical care. Its mission is to be the most innovative, valuable, and inclusive partner in health benefits. The company has a dedicated AI team looking to leverage cutting-edge AI to harvest long-term insights and make far more detailed health predictions from claims and electronic health record data. The data store is massive, with more than 50 terabytes of data covering the company’s tens of millions of members across the U.S. The team is mining this data to determine treatment efficacy based on past outcomes for patients with particular characteristics.
The Challenge
One of the top for-profit managed healthcare providers in the U.S., with affiliate plans covering one in eight Americans for medical care, was looking to leverage artificial intelligence (AI) to harvest long-term insights and make far more detailed health predictions from claims and electronic health record data. The data store is massive, with more than 50 terabytes of data covering the company’s tens of millions of members across the U.S., and the team was mining it to determine treatment efficacy based on past outcomes for patients with particular characteristics. However, getting these potential insights into the hands of healthcare providers was a challenge. It is one thing to have small-scale implementations working in a lab; it is another to deliver machine learning at scale. When the engineering lead joined, the AI team had a complicated data delivery pipeline based on Apache Airflow. While it worked, it could not scale beyond a single pipeline or container instance at a time.
The Solution
The healthcare provider turned to Pachyderm, a data layer that allows machine learning teams to productionize and scale their ML lifecycle. With Pachyderm’s industry-leading data versioning, pipelines, and lineage, teams gain data-driven automation, petabyte scalability, and end-to-end reproducibility. Pachyderm delivered the parallelism and data handling required to efficiently scale the AI team’s ML processing. Importantly, while the company had millions of patient records, only a small subset was relevant at any given time, and Pachyderm’s incrementality saved significant time, money, and resources by processing only the data that had changed rather than the entire patient universe. With Pachyderm, the team was able to arbitrarily partition table data so that each partition captured the events for a single member – effectively creating individual member objects that encapsulate all of that member’s events. Pachyderm not only processed these records in parallel, it also automatically processed only those containing new information, increasing both scale and speed while reducing costs.
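The case study does not include the team’s actual pipeline code, but the per-member partitioning it describes maps naturally onto Pachyderm’s datum model. Below is a minimal, hypothetical sketch of the per-member processing step in Python. It assumes member event data arrives as one JSON file per member in an input repo (Pachyderm mounts input repos inside the worker container, conventionally under /pfs/<repo>, and collects output from /pfs/out), and that the pipeline’s glob pattern treats each file as its own datum so only new or changed members are reprocessed. The repo name "members", the file layout, and the score_member logic are illustrative only.

```python
# Hypothetical per-datum worker for a Pachyderm pipeline.
# Assumes the input repo "members" is mounted at /pfs/members and each
# member's claim/EHR events live in a single JSON file, so a glob pattern
# such as "/*" in the pipeline spec makes each member its own datum.
# Pachyderm schedules only new or changed datums to this script and
# versions whatever is written to /pfs/out as the pipeline's output commit.

import json
from pathlib import Path

INPUT_DIR = Path("/pfs/members")   # input repo mount point (Pachyderm convention)
OUTPUT_DIR = Path("/pfs/out")      # output repo mount point (Pachyderm convention)


def score_member(events: list[dict]) -> dict:
    """Placeholder analysis: summarize a member's events.

    A real pipeline would load a trained model here; this just counts
    events per treatment code so the example stays self-contained.
    """
    counts: dict[str, int] = {}
    for event in events:
        code = event.get("treatment_code", "unknown")
        counts[code] = counts.get(code, 0) + 1
    return {"event_count": len(events), "treatments": counts}


def main() -> None:
    # Pachyderm only mounts the datums assigned to this worker, so iterating
    # over the input directory processes exactly the members whose data is
    # new or changed in this job.
    for member_file in INPUT_DIR.glob("*.json"):
        events = json.loads(member_file.read_text())
        result = score_member(events)
        (OUTPUT_DIR / member_file.name).write_text(json.dumps(result))


if __name__ == "__main__":
    main()
```

Because each member file is an independent datum, parallelism and incrementality come from the pipeline configuration rather than the script itself: Pachyderm can fan datums out across many workers and skip any datum whose input has not changed since the last run.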
Operational Impact
  • Shrinks processing and storage requirements by 90% by processing only new or changed data
  • Increases scalability and speed by processing individual files in parallel
  • Simplifies reproducibility through data versioning and immutable data lineage
  • Abstracts away pipeline automation so AI teams only need to care about data inputs and outputs
Quantitative Benefit
  • Significant improvement in processing efficiency
  • Reduced costs due to efficient data processing
  • Increased speed of data processing
  • Enhanced scalability of machine learning processing
