DeepL’s Transformation Journey with ClickHouse: A Case Study

Technology Category
  • Application Infrastructure & Middleware - Middleware, SDKs & Libraries
  • Platform as a Service (PaaS) - Application Development Platforms
Applicable Industries
  • Buildings
  • E-Commerce
Applicable Functions
  • Maintenance
  • Product Research & Development
Use Cases
  • Building Automation & Control
  • Experimentation Automation
Services
  • System Integration
  • Training
About The Customer
DeepL is a language translation service that uses artificial intelligence to provide translations that are more accurate and natural-sounding than other services. The company was founded in 2017 and is based in Cologne, Germany. DeepL supports translation between several languages, including English, German, French, Spanish, Italian, Dutch, and Polish. The company is committed to protecting user privacy and does not share personal data with third parties. DeepL's mission is to break down language barriers and bring cultures closer together. The company's services are used by millions of people worldwide, including individuals, businesses, and organizations.
The Challenge
In 2020, DeepL set out to enhance its analytics capabilities in a privacy-friendly manner. The company wanted to self-host a solution that could handle large volumes of data with fast query times. They evaluated several options, including Hadoop-based stacks, but found those too maintenance-intensive and time-consuming to set up. DeepL also wanted to automate table-schema changes whenever frontend developers created new events, work that would otherwise have overwhelmed the team. The company needed a system that could handle complex events and queries to understand user interactions, something traditional tools like Google Analytics couldn't provide. Additionally, DeepL wanted to retain full control over the data while keeping user privacy in mind.
The Solution
DeepL chose ClickHouse as its central data warehouse; its single-binary deployment from an apt repository made it quick to set up a Minimum Viable Product (MVP). The MVP consisted of an API to which the user's browser sends events, Kafka as a message broker, a sink writing from Kafka to ClickHouse, ClickHouse itself, and Metabase to visualize the results. The company invested heavily in automation and decided on a single source of truth for all events and table schemas: when frontend developers want to create a new event, they define it in protobuf. This protobuf schema file serves three purposes: validating events, computing ClickHouse table schemas, and generating documentation for all events. Over time, DeepL expanded from a single-node setup to a cluster of 3 shards with 3 replicas, ingesting about half a billion raw rows per day. ClickHouse also plays a crucial role in DeepL's experimentation framework and in the ML infrastructure behind personalization.
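The "single source of truth" idea above can be sketched in a few lines: one event definition drives both the ClickHouse table schema and the documentation. The event name, fields, and type mapping below are illustrative assumptions, standing in for a parsed protobuf descriptor; DeepL's actual tooling is not public.

```python
# Hypothetical event definition, standing in for a parsed protobuf schema.
PAGE_VIEW_EVENT = {
    "name": "page_view",
    "fields": [
        ("user_id", "string"),
        ("url", "string"),
        ("timestamp_ms", "int64"),
        ("session_length_s", "int32"),
    ],
}

# Map protobuf scalar types to ClickHouse column types.
PROTO_TO_CLICKHOUSE = {
    "string": "String",
    "int64": "Int64",
    "int32": "Int32",
    "bool": "UInt8",
    "double": "Float64",
}

def clickhouse_ddl(event: dict) -> str:
    """Compute a CREATE TABLE statement from the event definition."""
    cols = ",\n    ".join(
        f"{name} {PROTO_TO_CLICKHOUSE[ptype]}" for name, ptype in event["fields"]
    )
    return (
        f"CREATE TABLE events.{event['name']} (\n    {cols}\n)\n"
        "ENGINE = MergeTree\n"
        "ORDER BY (user_id, timestamp_ms)"
    )

print(clickhouse_ddl(PAGE_VIEW_EVENT))
```

The same definition could feed an event validator and a documentation generator, which is what makes the schema file a single source of truth rather than one config among several.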
Operational Impact
  • The implementation of ClickHouse has brought significant operational benefits for DeepL. The system has enabled the company to create complex events and queries that provide a deeper understanding of how users interact with its services, something traditional tools like Google Analytics couldn't achieve. Automating table-schema changes has spared the team a great deal of toil, and the protobuf schema file has reduced errors and freed the team to focus on more important work. ClickHouse has also played a crucial role in DeepL's experimentation framework, enabling the company to rapidly iterate on frontend and algorithmic backend changes, which has contributed to a cultural shift within the company. Furthermore, ClickHouse has been instrumental in DeepL's ML infrastructure for personalization, providing excellent performance even when reading tens of millions of rows.
Quantitative Benefit
  • DeepL was able to set up an MVP in just a few weeks, proving that the system could easily handle the amount of data they were dealing with and that query times were excellent.
  • DeepL expanded from a single node setup to a cluster of 3 shards with 3 replicas after 16 months of usage.
  • Currently, DeepL's setup ingests about half a billion raw rows per day.
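Sustaining half a billion rows per day is feasible in ClickHouse largely because of batched inserts: a few large INSERTs perform far better than many small ones. A minimal sketch of a Kafka-to-ClickHouse sink like the one in DeepL's pipeline, with the message source and database client stubbed out (a real sink would use libraries such as kafka-python and clickhouse-connect; these names and the batch size are assumptions, not DeepL's actual stack):

```python
from typing import Callable, Iterable, List

def run_sink(
    messages: Iterable[dict],
    insert_batch: Callable[[List[dict]], None],
    batch_size: int = 10_000,
) -> int:
    """Buffer incoming events and flush them to ClickHouse in batches.

    Returns the number of flushes performed.
    """
    buffer: List[dict] = []
    flushes = 0
    for msg in messages:
        buffer.append(msg)
        if len(buffer) >= batch_size:
            insert_batch(buffer)   # one big INSERT into ClickHouse
            buffer.clear()
            flushes += 1
    if buffer:                     # flush the trailing partial batch
        insert_batch(buffer)
        flushes += 1
    return flushes

# Example: 25,000 events with a batch size of 10,000 → 3 inserts.
events = ({"event": "page_view", "n": i} for i in range(25_000))
print(run_sink(events, insert_batch=lambda rows: None, batch_size=10_000))
```

At DeepL's stated scale, a batch size in the thousands keeps the insert rate to a handful per second per shard, which is the access pattern ClickHouse's MergeTree engine is designed for.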
