Penske Media Corporation (PMC) Builds the Infrastructure They Need to Pursue Machine Learning and Data Science

Company Size

Large Corporate

Region

America

Country

United States

Products

Google BigQuery
Apache Airflow
Astronomer
Google Analytics

Tech Stack

Google BigQuery
Apache Airflow
Astronomer
Google Analytics

Implementation Scale

Enterprise-wide Deployment

Impact Metrics

Customer Satisfaction
Digital Expertise
Productivity Improvements

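Technology

Analytics & Modeling - Machine Learning
Analytics & Modeling - Predictive Analytics
Platform as a Service (PaaS) - Data Management Platforms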

Applicable Industries

Professional Service
Software

Applicable Functions

Business Operations
Product Research & Development

Services

Cloud Planning, Design & Implementation Services
Data Science Services
System Integration

About the Customer

PMC is a leading digital media and information services company whose award-winning content attracts a global audience of more than 180 million through brands like Rolling Stone, Variety, IndieWire and many more. When Andy Maguire came to PMC from Google in 2015, he was first tasked with building the foundations of PMC’s data infrastructure. A solid framework and approach at this level is crucial, because it makes it easier to prove the importance of data science in understanding PMC’s user base and content performance.

Andy had a deep understanding of the power of data and a plan to put data and machine learning at the heart of every PMC brand, powering things like recommendation engines, content pageview predictions, subscriber affinity modelling and much more. Success, however, required a rich data infrastructure and ecosystem, from the breadth and depth of the sources being used to the tools and technologies underlying it all.

Challenge

When Andy joined PMC, the data infrastructure was still quite young. Clickstream data from Google Analytics was flowing into Google BigQuery, but it was not being fully leveraged, enriched and made actionable for the business. Where possible, the decision was to leverage cloud tools and limit the “data ops” aspects of the infrastructure. Even so, before long a myriad of cron jobs, job servers and raw job log files had begun to eat away at the time Andy and his team had to extract insights from the data. “I was frustrated,” says Andy. “That wasn’t what I wanted to be doing.” A data scientist by trade, he wanted as little long-term overhead as possible when it came to data engineering.

When Andy discovered Apache Airflow, a platform for programmatically authoring, scheduling and monitoring workflows, he replaced his cron jobs and began to engineer his data more efficiently as directed acyclic graphs (DAGs) of tasks. But even Airflow required quite a bit of management. Andy found his engineering team was still stretched thin as they struggled to handle the robust data infrastructure required to build machine learning technology and run the in-depth analyses needed for insights. “I probably would have had to assign a full-time manager,” Andy explains. “It was too easy to make a change to a helper file and kill all the DAGs. If I broke something, I broke everything. There was no testing framework. It simply wasn’t efficient.”
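To make the DAG idea above concrete, here is a minimal sketch using only Python's standard library rather than Airflow itself. It is not PMC's pipeline: the task names, the `PIPELINE` mapping, and the `run_pipeline` and `is_acyclic` helpers are all hypothetical. It only illustrates what a scheduler gains over independent cron jobs: each task runs once, after everything it depends on, and a circular dependency is rejected up front.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names; each maps to the set of tasks it depends on.
PIPELINE = {
    "load_clickstream": set(),
    "load_content_metadata": set(),
    "enrich_pageviews": {"load_clickstream", "load_content_metadata"},
    "build_features": {"enrich_pageviews"},
    "score_recommendations": {"build_features"},
}


def is_acyclic(dependencies):
    """Return True if the dependency mapping forms a valid DAG."""
    try:
        TopologicalSorter(dependencies).prepare()
    except CycleError:
        return False
    return True


def run_pipeline(dependencies, run_task):
    """Run every task once, only after all of its upstream tasks have run."""
    completed = []
    # static_order() yields a valid execution order and raises CycleError
    # before yielding anything if the graph contains a cycle.
    for task in TopologicalSorter(dependencies).static_order():
        run_task(task)
        completed.append(task)
    return completed


if __name__ == "__main__":
    run_pipeline(PIPELINE, lambda task: print(f"running {task}"))
```

With cron, the same ordering has to be approximated by spacing out start times, which is one reason a failed or slow upstream job can silently corrupt everything downstream.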

Solution

As Andy and his team looked for alternatives that required less management to operate, they came across Astronomer’s managed Airflow option. It was exactly what they had been looking for, and more.

PMC’s use of Astronomer has also solved a tricky monitoring issue with Airflow. Natively, Airflow tells you if something is failing, but as long as a task is technically “green” and still processing, it is impossible to tell whether something is running successfully yet isn’t quite right. Now, Andy feeds events from each task into BigQuery, where they are passed into an anomaly detection system that detects minute behavioural changes in Airflow.

“Now I’m free to focus on the interesting data science stuff,” Andy says. With increasingly little effort, he gathers the content analytics, pageviews, social media, comments and other stats that drive the recommendation technology and lay the foundation for his progressive social media analysis. Andy also has time to pursue additional goals. “For one thing, we want to get smart with subscriptions by finding out how likely users are to subscribe in the first 60 days of engaging with our content,” he elaborates. This, and more, will be easy to do with the right data and with infrastructure improvements. For example, PMC’s data is currently delivered in batches, but Andy and his team will soon begin streaming data in real time. This will improve the accuracy of their foundational machine learning and unlock new analytics opportunities.
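The case study does not say how PMC's anomaly detection system works, so the following is only an illustrative stand-in for the task-event monitoring described above. It assumes the events boil down to a list of run durations (in seconds) for a single task and flags a run whose duration is far from the mean of the runs before it, using a simple z-score. The function name, the threshold, and the sample data are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalous_runs(durations, threshold=3.0, min_history=5):
    """Return indices of runs whose duration deviates strongly from earlier runs.

    durations: run durations in seconds for one task, oldest first (assumed input).
    threshold: how many standard deviations count as anomalous (assumed value).
    min_history: number of earlier runs required before a run is scored.
    """
    anomalies = []
    for i in range(min_history, len(durations)):
        history = durations[:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # All earlier runs were identical, so any change at all stands out.
            if durations[i] != mu:
                anomalies.append(i)
            continue
        if abs(durations[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical durations: every run "succeeded", but one took three times as long.
    sample = [60, 62, 59, 61, 60, 63, 61, 180, 60]
    print(find_anomalous_runs(sample))
```

A check like this catches the case the text describes: a task that still reports success but has started behaving differently. A production system would more likely run the comparison inside the warehouse and use a method that is robust to earlier outliers.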

Operational Impact

With Astronomer, PMC implemented the data ecosystem they need to pursue data science and machine learning to drive a truly personalized content experience for every user across their many brands.
Since Astronomer handles the data engineering and ensures the ecosystem is healthy, PMC’s engineers can focus on analytics and data science.
Andy is now free to focus on the interesting data science tasks, gathering content analytics, pageviews, social media, comments, and other stats that drive recommendation technology.
Andy and his team can now pursue additional goals, such as improving subscription models and streaming data in real time for more accurate machine learning and analytics.

Quantitative Benefit