Published on 01/10/2017 | Technology
The interest in leveraging big data, analytics and Moneyball in HR and recruiting is gaining significant steam.
Ever since my first article on the subject back in 2011, I’ve set up Google Alerts and Hootsuite streams set up to catch any mention of big data, analytics and/or Moneyball in conjunction with HR, sourcing or recruiting, and the volume of activity is bordering on surprisingly massive and overwhelming, and I’m not the only person to notice this.
Yes, it does seem like everyone is talking about big data in HR.
In 2012, “big data” was mentioned in 2.2M tweets by 980,000+ authors, at a peak rate of 3,070 times per hour!
However, as is often the case with relatively new and nebulous concepts, there is quite a bit of confusion surrounding big data and Moneyball and how they can be applied to HR and recruiting, as evidenced by the obviously incorrect usage of the terms in many cases. It’s also nearly impossible to stay on top of all of the content being generated on the subject (although I am trying my best!).
This is precisely why I’m going to take the opportunity to clear up any confusion by concisely explaining the concepts of big data, analytics, and Moneyball as it relates to HR and recruiting, as well as illustrate some obviously incorrect references to these concepts in recent articles, including those from the Wall Street Journal, Forbes, The Economist, The New York Times, and more.
Many articles use the term “big data” when they are really referring to analytics and data-based decision making.
For example, the recent New York Times article Big Data, Trying to Build Better Workers, The Wall Street Journal’s How Big Data Is Changing the Whole Equation for Business, and Forbes’ Big Data in Human Resources: Talent Analytics Comes of Age all use “Big Data” in their titles, but the data they refer to do not meet the criteria for “big data” (definitively defined in a moment). The same can be said for the very popular Robot Recruiters article in The Economist.
Although inaccurate with regard to referencing “big data,” what each of those articles does do a good job of is provide excellent and interesting examples of leveraging analytics for HR/recruiting.
For example, Forbes’ piece features a company who ran a statistical analysis of sales productivity and turnover against a variety of demographic factors to develop a new sourcing and screening process to identify and hire people based on factors that data proved to be highly correlated with success, increasing sales performance by $4M in 6 months!
Analytics refers to the discovery and communication of meaningful patterns in data, which can be achieved with any data set, “big” or small.
Using analytics in human resources, such as developing correlations between employee performance, retention, demographic and assessment data to make data-based decisions is certainly a best practice, but making data-based decisions doesn’t have anything to do with “big data” unless the data being analyzed meets certain criteria.
Let’s explore the criteria required for data to be classified as “big data.”
You may think you have a good grasp of the concept of big data, but if the misuse of the term by the writers for the New York Times, Forbes and many other respected publications is any indication, there is still quite a bit of confusion as to exactly what “big data” is and what it is not.
To clear up any confusion, I’ll guide you through the concept of big data, referencing industry pioneers and experts, as well as providing practical examples.
Wikipedia claims that “Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time,” and that “Big data sizes are a constantly moving target currently ranging from a few dozen terabytes to many petabytes of data in a single data set.” Other sources attempting to define big data include “the tools, processes and procedures allowing an organization to create, manipulate, and manage very large data sets…”
Regardless of how big data is defined or measured (terabytes, exabytes, etc.), the big data concept centers around relatively large amounts of data that are not only increasing in volume, but also in velocity and variety. Many big data pioneers and experts such as IBM agree that for something to be defined as “big data,” it should adhere to the “3 V’s:” Volume, Velocity, and Variety.
When it comes to volume, the line seems to be drawn as low as the level of 100’s of gigabytes, but more often at the multi-terabyte+ level.
The variety aspect of big data refers to the mix of data types and sources (e.g., tracking sensors on employees) and varying degrees of structure, from structured to completely unstructured (free text in the form of social network updates, recommendations, awards, endorsements, blog posts, comments, press releases, announcements, etc.).
The “velocity” of data is the speed at which new data is generated. Social media provides and excellent example of high data velocity, with Twitter serving as the poster child, with over 400,000,000 tweets/day (that’s 2.8 billion updates every week!).
It really isn’t the volume of data that poses the processing challenge and requires the specific technologies that Gartner suggests – companies have been processing large volumes of data for over 10 years now, with massive data warehouses (50-100TB+) powering business intelligence, reporting, and analytics.
As you might imagine, it’s really the variety and velocity aspects of big data that necessitate the use of specialized information processing solutions. and more specifically unstructured data that poses the technology challenge.
So, to recap, no matter how large the data set being used to power analytics and to develop insights, if it doesn’t fit the 3 V criteria, it isn’t “big data” – the velocity and variety factors must also be present.
Now that we’ve established that understanding, get ready for this revelation: regardless of volume, velocity and variety, the value of big data isn’t the data!
Using technology to crunch big data is great, but the latent power of data lies in the ability to draw actionable insights (aka – analytics).
Check out this interesting presentation from Daniel Tunkelang, who currently serves as the Head of Query Understanding at LinkedIn (how’s that for a cool job?). He observes that data scientists worry about volume, variety and velocity, but the real bottleneck isn’t technology or computational – it’s cognitive!
In other words, having a team of talented data scientists using all of the right technology isn’t enough to develop a competitive advantage.
Domain experts are necessary when building teams to develop big data insights and drive data-based decision making. When it comes to human resources and workforce science, the domain experts are HR professionals, sourcers, recruiters, and hiring managers – these are the people who should be able to ask the right questions that the data scientists can develop answers/solutions for.
I’ve marked up an excellent big data / big analytics graphic from Karmasphere to show examples of the types of relevant human capital data (structured and unstructured) that can be leveraged by various domain/functional experts – sourcers, recruiters, HRIS analysts, and hiring managers (refer to bottom picture).
Additionally, this recent big data article in the Financial Times recognizes that the challenge we face when it comes to leveraging big data isn’t technology, it’s skills.
While there are many articles and experts that recognize the current and looming significant future talent shortage of analytics/data science professionals, I don’t agree with the idea that data scientists need or should be “all-in-one” technical, functional, and business experts – it is highly unlikely that all of these skills will be present in any one individual.
In fact, I think it’s ridiculous to expect data scientists to be able to ask all of the right questions, and the HBR article comment by the Fortune 500 data scientist with a Ph.D in machine learning confirms it.
Data scientists should be leveraged to crunch, analyze and present data to enable teams to derive sensible answers and possible solutions/courses of action based on questions developed by the functional/domain/business experts.
In software development, most mature software engineering departments don’t expect their software engineers to be “all-in-one” technical, functional, and business experts. Instead, software engineers rely on business analysts who serve as functional experts to solicit requirements from domain experts and users and translate them into technical specs and design documents that they pass on to software engineers to develop the solution.
Similarly, I believe the ideal talent analytics team is a mix of data scientists, human capital data analysts (sourcers), and domain experts (hiring managers, HR and recruiters). In fact, it could be argued that hiring data scientists with little-to-no specific business domain experience might actually be more effective – the story behind Moneyball actually supports that argument (Paul DePodesta graduated from Harvard with a degree in economics).
The full article can be found on Glen Cathey's blog.