In this episode, we discuss the democratization of data analytics and trends in the future of data analytics related to 5G, big data, and the breaking down of data silos.
Rick Hall is a software entrepreneur focused on the analytics market. He has led the development of over a dozen software products and taken several companies from the early stage to an eventual sale. He has been working in analytics and software for 30 years and has been a part of the evolution of several generations of technology and practices. Currently, he is the CEO of Aginity Corporation ("Supercharge your SQL Experience with Analytics Management"), the only next-generation analytics management toolset designed specifically to empower analytics teams to take advantage of the top analytics platforms.
Transcript.
Erik: Welcome to the Industrial IoT Spotlight, your number one spot for insight from industrial IoT thought leaders who are transforming businesses today with your host, Erik Walenza.
Welcome back to the Industrial IoT Spotlight podcast. I'm your host, Erik Walenza, CEO of IoT ONE. And our guest today is Rick Hall, CEO of Aginity. Aginity is reinventing the data analytics engine room to simplify ingesting, cleansing, and reusing data. In this talk, we discussed the democratization of data analytics as non-IT professionals gain easy access to data analytics tools. We also explored trends in the future of data analytics related to 5G, big data, and the breaking down of data silos.
If you find these conversations valuable, please leave us a comment and a five-star review. And if you'd like to share your company's story or recommend a speaker, please email us at team@IoTone.com. Thank you. Rick, thank you so much for joining us today.
Rick: Erik, really appreciate it. I've been excited to talk to you and your audience.
Erik: So, Rick, this topic of analytics is critical; it's really at the heart of all value creation around IoT systems. But before we get into Aginity and the topic at hand, I'd like to learn a little bit more about you. You set up your first company back in 2001?
Rick: I've been in the industry a long time. I started doing what has become analytics right out of college, in the consulting world, building database systems for telcos and financial services. In 2001, I founded a company called G4 Analytics, and our objective was to productize what we were doing. We actually started on September 10th, 2001, so you can imagine what happened thereafter; it was an interesting time to be in business.
But our focus was broadly on industrializing and productizing analytics and introducing them into business processes. We ended up pretty focused on consumer goods and retail, we did a bunch of work, and we built these models that helped predict retail sales, and ultimately sold that company to Nielsen in 2012. I went on to take a CTO role at a big retail services company, and then I formed a new company, Kairn Corporation, and bought Aginity, where I am today.
Erik: And then you're also an advisor at a company called AdvisoryCloud. Is this also an analytics-related company, or is it more about seeking business advice?
Rick: I feel like I've been really fortunate; I had some great mentors along the way. AdvisoryCloud is really just a broad-based platform where people who are seeking experts can find them. I try to mentor mostly startup people where I can. So if people need my help or think I'm going to be valuable to them, they contact me and I give them as much advice as I can.
Erik: Let's have a look at Aginity. The headline on your homepage is “Reinventing the analytics engine room”. But maybe we can start with the value proposition. We're talking about analytics; I think everybody knows what analytics is, but not everybody knows what's under the hood. As a company, where do you differentiate in the analytics space?
Rick: When I say engine room, I mean the whole process of getting data: acquiring it, cleaning it, integrating it, and creating some basic calculations. If you talk to somebody who's doing data science and big models, they almost always start with "first you get the data," and then they go on to talk about modeling. But that process of getting the data and putting it together in a clean, integrated, cohesive way is not easy.
So we have an application designed to support analytic engineers, people who work on analytic systems, and business analysts in a collaborative way, to do the core hard work that goes into analytics: pulling the data together. We think of it as five steps: acquire the data, cleanse the data, integrate the data, calculate on the data, and provision the data. That journey is what we're supporting, both for engineers for whom that's their full-time job and for business analysts who need to do it because they're trying to perform some other job for which the analysis is key. So we have a free product and a couple of different levels of paid products to help people do that.
Erik: And is it pure product? So you're providing the tools to enable this, or is there any service built on top of that that you're also offering?
Rick: We're a pure-play software company with a SaaS subscription model. We sometimes help our customers, so there's a little bit of service there, but our business is product. That's really where our revenue comes from.
Erik: And who are you working with? Are you still in the CPG consumer space, or is this a fairly horizontal offering that serves a general audience?
Rick: Yeah, it really is. We have a really broad base of clients: healthcare, financial services, insurance, retail, consumer goods, and probably a little bit of everything else. And we're in 55 countries; we have customers all over the world using our technology, in a whole variety of industries. IoT is one of the many things creating this explosion of data. People want to use that data to optimize their business processes in almost every type of organization, and they all have this fundamental problem of: how do I get the data, clean it, and use it?
Erik: Could you maybe break down the different steps here? Let's say we're starting from a sensor and moving to processed data that's ready to show to an executive. What would the flow be?
Rick: Yeah. So, you have an IoT sensor producing data, and you've got to do something with that data. Maybe you have it attached to some stream that's going to pull it into a process. Now, it may be that you're accumulating that data for some analytic, or maybe you're processing it in real time. But in either case, you have to figure out what the structure of that data is and ingest it into the platform so you can use it.
And oftentimes there are things about the data that you need to address from a data quality standpoint. Are some devices out at certain hours, so there's a bias in your data that you need to address? That cleansing process is about making sure that the data you have represents the problem, the actual world, that you're trying to understand. So I think in the IoT space, a lot of it is: am I getting all the data from all the right devices at the right time?
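To make that cleansing check concrete, here is a minimal SQL sketch of the kind of query that flags devices going quiet at certain hours. The table and column names (sensor_readings, device_id, reading_ts) are hypothetical, as is the expected reading rate:

```sql
-- Hypothetical table: sensor_readings(device_id, reading_ts, temperature_c).
-- Count readings per device per hour; unusually low counts flag devices
-- that were offline or dropping data, i.e. a potential bias in the data.
SELECT
    device_id,
    DATE_TRUNC('hour', reading_ts) AS reading_hour,
    COUNT(*)                       AS readings_in_hour
FROM sensor_readings
GROUP BY device_id, DATE_TRUNC('hour', reading_ts)
HAVING COUNT(*) < 60  -- assumes roughly one reading per minute is expected
ORDER BY device_id, reading_hour;
```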
Oftentimes that data is useful by itself, but it's often that data in combination with other data that helps you make a decision. Maybe the sensor is telling you temperature (obviously, it could be millions of different things), but there's some other thing happening in the business that you're trying to correlate it with, so you have to bring that data together.
What our tools do is make it very simple to ingest the data. We can look at data in a variety of different formats and ingest it into really any one of the big analytic platforms. There are a bunch of them out there from the big providers: Amazon has its analytic databases like Redshift, Microsoft has Synapse, and there are platforms like Snowflake. We sit on top of all these platforms, we make it easy to ingest data in any format, and we have simple cleansing routines to help you address these problems.
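As a rough sketch of what that ingestion step can look like under the hood on a platform like Amazon Redshift: define a landing table and bulk-load files with the COPY command. The table, bucket, and IAM role names here are placeholders:

```sql
-- Landing table for raw device data (illustrative names).
CREATE TABLE raw_sensor_readings (
    device_id     VARCHAR(64),
    reading_ts    TIMESTAMP,
    temperature_c DOUBLE PRECISION
);

-- Bulk-load CSV files from S3; a wizard-driven tool might generate
-- something equivalent from the user's choices about format and types.
COPY raw_sensor_readings
FROM 's3://example-bucket/sensor-data/'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-load-role'
FORMAT AS CSV
TIMEFORMAT 'auto';
```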
Then on the integration step, we have the ability to map datasets so you can link them together. Maybe they have to be linked on a date and a time where things are relevant, or maybe there's a code for the device and you want to tie that device code to some product file that has deeper data in it about what the device is. The ability to map those datasets together and link them is what we think of as the integration step.
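A minimal sketch of that integration step in SQL, joining the device code in the readings to a hypothetical device_master product file that carries the deeper device attributes:

```sql
-- Enrich raw readings with device metadata by linking on the device code.
SELECT
    r.device_id,
    r.reading_ts,
    r.temperature_c,
    d.product_line,   -- hypothetical attributes from the product file
    d.install_site
FROM raw_sensor_readings AS r
JOIN device_master       AS d
  ON r.device_id = d.device_id;
```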
And then a lot of times your analysis needs a calculation. If your device is measuring temperature, maybe what you're really interested in is the variation in temperature away from the norm. So you need some process to calculate the norm and the variation from the norm. In a lot of cases, that calculation is a simple SQL operation; SQL is the language these data platforms speak.
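A sketch of that calculation step as a SQL window function: compute each device's norm (here, simply its average temperature) and each reading's deviation from it. Table and column names are again hypothetical:

```sql
-- Norm and deviation from the norm, per device.
SELECT
    device_id,
    reading_ts,
    temperature_c,
    AVG(temperature_c) OVER (PARTITION BY device_id) AS device_norm,
    temperature_c
        - AVG(temperature_c) OVER (PARTITION BY device_id) AS deviation_from_norm
FROM raw_sensor_readings;
```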
And then you're going to feed that into some analytic process. Maybe you're modeling the relationship between temperature, or temperature variation, and some other event. You've done the base calculation of the variation, and now you're trying to model the impact and the correlation, and you're going to do that in some data science platform. So we make the process of pulling all that together easy. And what we do that I think is really interesting at this point in time is make it easy for both the classic data engineer and the business person. Almost everybody wants to do some amount of analytics today, and not everybody's got a data science degree or an engineering degree. So we make tools that are very self-service for business teams to use directly.
Erik: I was talking yesterday with a very large chemical company. As is often the case, they have whatever, 30,000 employees, and then they have their five data scientists and a couple of back-end and UI/UX folks, and that's the team. And they were basically saying, we're actually pretty satisfied with our ability to develop machine learning algorithms and so forth; maybe five people is actually enough for their internal needs.
But the challenge they were facing is just getting clean data, so they were looking for a cost-effective solution to actually process the data. And they were looking at a human solution: an army of people sufficiently well-trained to do this task. How would you fit in there? You're not providing the people, but are you completely automating this process, or are you making the process more structured and standardized so that people can process or clean the data more efficiently?
Rick: Yeah, so we're making tools for those people who are going to do the processing, or the cleansing in that case, and letting them do it without being technical engineers. We often see a case a little bit similar to that customer: there's some central group that has a bunch of engineering and deep data science, and they can handle the biggest core problems of the enterprise. But there are hundreds or sometimes thousands of people across an organization who have their own variation on the problem.
And when I talk to Chief Analytics Officers or Chief Data Officers, they're like, look, I have X number of engineers, and sometimes, at a really big company, it's "I have a thousand engineers." But there are 30,000 people trying to do the work, and they all have some variation; I can't possibly do all the engineering for all these people, and they're not super technical. So our tooling supports those five steps and makes it easy for engineers and business people to collaborate. The engineers can do the really hard stuff and set things up for the business people, and the business people are given self-service tooling that helps them through the process.
Erik: Is it correct that the scope of Aginity's business ends at the point where somebody would build a machine learning algorithm, which would be built on one of the partner platforms you're working with? Or is that also integrated into your core product?
Rick: It used to be. I think in a lot of cases the actual data science work happens in these very specialized data science platforms, using notebook technologies. So in some cases, we're doing everything up to providing the data into those platforms. There's a data scientist, there's a data science platform, and they need a clean, integrated, base-calculated data set. They're going to prepare that in a big data platform like Redshift, or Snowflake, or one of these others, and we provide the tooling that manages the process through that big data platform.
So we're not the data platform. We're the tools for the engineer and analysts that help them use the platform and sit on top of it, and provide the workflow and the interface. And what's interesting now though, is that these data platforms like Snowflake, and Redshift, and Synapse, all of them are starting to build the capabilities of data science into their platforms. And as they do so, more of that work will move from a standalone platform back into the big data store, in which case, our tooling would then help manage that process.
But we specifically sit on top of these super scalable data platforms that are now really emerging to be the backbone of analytics. So I've mentioned a couple of them. So, all the big cloud providers have them. So, Amazon, Microsoft, Google, IBM, and of course, others have these data platforms. They have advanced hugely in the past few years in terms of what they're capable of. And we provide tools for the end user that helps them manage those platforms.
Erik: So we have a new type of data set. I mean, this data has existed for decades, but it's often been more or less locked on the edge. From your perspective, how does ingesting and processing IoT data differ from, for example, ingesting and processing financial data?
Rick: Yeah. So the first thing is size, right? IoT data is big data, because these devices are oftentimes generating data all the time: every second, or microsecond, or every time there's an action on the device. And a lot of companies' live installations have lots and lots of these devices. So the scale of data coming from the IoT world is generally big.
And the second thing is that it's oftentimes real time or very near real time. The device is acting in real time, and they want to be able to process that in real time, so that's also a big challenge. Those two challenges are probably a lot of what has limited things and left that data on the edge. But now these platforms have stream processing, which lets them process the data in real time or very close to real time, and the scale of data they can handle is much bigger.
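As one illustration of that stream processing capability: Amazon Redshift, for example, offers streaming ingestion, where a materialized view is defined directly over a Kinesis stream. The schema, role, and stream names below are placeholders:

```sql
-- Map a Kinesis stream into Redshift (streaming ingestion).
CREATE EXTERNAL SCHEMA kinesis_schema
FROM KINESIS
IAM_ROLE 'arn:aws:iam::123456789012:role/example-streaming-role';

-- Auto-refreshing materialized view over the stream; each record's
-- payload is parsed from the raw bytes into a queryable value.
CREATE MATERIALIZED VIEW live_sensor_readings AUTO REFRESH YES AS
SELECT
    approximate_arrival_timestamp,
    JSON_PARSE(FROM_VARBYTE(kinesis_data, 'utf-8')) AS payload
FROM kinesis_schema."sensor-stream";
```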
So a typical IoT record is not a big, complicated piece of data, but it's coming quickly. And up until this current generation of cloud technology, it was just too much to handle in a lot of cases. Now we can handle that data, and you can handle it in the cloud really well. I think that's the best place to handle big data, because you can scale up really quickly and you can drop the scale back down when you're not using it.
So maybe you have a set of devices that run at certain times of day at really big [inaudible 18:34] and scale it down when the devices are quiet. That elasticity is a really big benefit of these big data platforms. So there's a lot more data coming from those devices, and we've seen that almost every process can be optimized when you can actually look at what's happening in the process.
And IoT has created a revolution in our ability to actually see what's happening in a process. You can see everything that happened. You certainly couldn't process all that data before; now we can. As these platforms get bigger in scale, we can start to apply analytics to the data, and you can start to think about optimizing and using intelligence to manage the processes that those devices support.
Erik: And then we have this ongoing discussion in IoT around the role of cloud computing versus edge computing. With 5G, one line of argumentation is that 5G will enable more data to make its way to the cloud because of the additional bandwidth, lower latency, and so forth, so it actually makes it easier to do real time from the cloud.
On the other hand, you have companies building edge servers connected to a 5G hub or base station, so you can have a fairly powerful edge computing center connected to a local 5G base station that services, say, a factory or a neighborhood of a city. So you have these two different dynamics, and it feels like the consensus is that both are going to be quite important. But from the perspective of Aginity, are you primarily focused on moving data to the cloud and preparing it there, or do you also have a foot in this edge computing world?
Rick: Yeah. We don't have a big foot in the edge computing world. We do work with some really small data platforms that you can put on devices. But most of our work is really working on the big data platforms themselves, which are more in the cloud. And frankly, over time, perhaps as these platforms start to have good edge data calculation platforms, we’ll support that.
I think both are really important. Edge computing is really important where you can do it. I would say there's a general principle that if you don't have to move the data, that's a good thing. But the tradeoff is: what do you have to bring to that data to make it valuable at the edge? So there's not one answer there, which I think is what you're saying. Is it going to be cloud, is it going to be edge? It's probably going to be yes, both.
Erik: On the topic of 5G, there's a lot of hype, so I ask everybody I talk to: what do you see as the practical implications for you? Is this just the next step change in the volume of data? Is there any more fundamental or dramatic change you expect, maybe three years in the future, once 5G is widely deployed? How do you see this impacting your business?
Rick: I think it makes location less and less important. Because the bandwidth is available whether you're remote, or out moving around, or in a fixed physical spot; with 5G the world becomes more connected to you, with higher bandwidth. So broadly, it makes location less important when it comes to processing data. It means you can move the data, or you can take your process to the data, and you have a lot more flexibility associated with that.
I think the implication is that we have to worry less about remote data that we can't get to, because bandwidth will allow us to get to [inaudible 22:55], I suspect. In the IoT world, that's huge, because it makes it easier to get the data from things that are not sitting in a business center somewhere, that are out on a farm or out in the woods. 5G makes them feel closer.
Erik: Let's discuss a little more this role of democratizing the data, because I think this is actually quite critical for IoT. In a lot of these companies, you have thousands of mechanical and electrical engineers, people who have some level of technical capability and really concrete problems to solve. More and more of those problems can be solved, to some extent, with some type of analytics tool, but they of course don't have deep training there.
So where are we today in allowing those people to process data, and, I know this is not your core competency, to be on the algorithm development side as well: to actually develop those algorithms and deploy them without getting into a queue at IT or R&D and then waiting for six months?
Rick: First of all, I think that's a hugely important trend. I think of us as being in the early stages of the third phase of analytics; let's call it democratized analytics. The first phase was early computing, where analytics was no different from anything else. The second phase you might call centralized data warehousing: analytics grew up as a function of a central team, a priesthood of engineers, and tons of value was delivered by those big centralized systems, but only in the big processes of a company.
So your financial and billing processes, your supply chain. They weren't in every little problem everywhere that some engineer has because he's in charge of 100 devices. He, or she, probably got left out by IT in the past and was left alone, trying to manipulate the data in some local system, and not very easily.
So I think now a couple of things are happening. One is the explosion of data: there is more data out there, driven by IoT and a lot of other things, and it matters more because we want to address that engineer. In the second phase, just getting the big processes to use analytics was a huge win for the business, and we really weren't trying to focus on every other little process; now we are. So we want to support all these people.
They're important, but they're not software engineers. In the IoT world, they might be electrical engineers. But in the business world, and probably even when dealing with a lot of IoT data, they're not engineers at all: they're trying to support the problem that the IoT device is addressing. Now, the analytic platforms can process a lot more data. They don't require these predefined architectures with a certain scale; they can scale up and down very easily, so you have a lot more flexibility to support different problems.
So giving the non-software engineer a tool that addresses their journey makes it easier for them to perform analytics, all the steps including the analytic model itself. You have people like us generalizing the process. The five steps I articulated, ingest, cleanse, integrate, calculate, and provision (or analyze): it turns out every analytic problem has those five steps in it.
What we've been able to do, and I think what is also happening with the data science algorithms themselves, is to say: look, I understand this problem, now I can build an easy interface that allows somebody who's not technical to actually ingest the data. With a tool like ours, we give you a little wizard to point at the data, figure out the format, indicate if you want to change the data type, and so on. Without those kinds of tools, you have to write some program to do it. And if you're back in the centralized data warehouse of your example, you're waiting months for someone to do it.
In a lot of cases, you're standing in a queue because there are thousands of other people like you also in the queue, and there are only 100 people over in the analytics department; you can't wait that long. So making each of those five steps easy, generalizing them, and making tools that address the use case in a very wizard-driven approach is what we're doing, and we're making those easy to integrate with the software programming side so you don't have to do it all. Some stuff is going to be done in a wizard, some stuff is still going to be programmed; it's going to be both, and we're tying those things together.
So democratizing analytics, I think that's the phase we're in. As you and I look back at the world in 20 years, I think we're all going to be better off as analytics gets to be everywhere, in an empowering way. I don't think we see analytics taking over; we're not worried about the robot revolution. We see analytics everywhere empowering people, making it easier and better for a decision maker to do something or to get the data, instead of spending all their time trying to find the [inaudible 29:13] in the data that are missing a date format, or whatever. We make that all really easy to do so you can use the data in your process.
I think analytics is going to get into everything we do. We can pretty much see that we can make processes better with analytics. The ability to process large amounts of data used to be a problem; we now have capabilities for that. But not everybody's an engineer, and they're never going to become engineers, so we need to make tools that make it easier for the non-engineer. That's what we do.
IBM did a study in the States a couple of years ago predicting something like 2 million data scientist jobs in the US by 2020. And they said: but we only train 50,000 engineers in the US every year, so we need to train more of them. That was their answer. No, that's not the answer, because we're not going to train 2 million people in the US to become engineers. We have to make the tools so the non-engineer can solve the problem.
Erik: Well, we've got 3 million truckers; once we have autonomous driving, we can just convert them into software engineers.
Rick: Let's make tools for them so they don't have to go to school to be programmers.
Erik: Now, there's another related issue here. We see this breakdown of data silos between functions within companies, so you can start to aggregate data and do more interesting analytics, and then the democratization of the analytics process, so everything doesn't have to go through the IT or R&D bottleneck.
But you also have this dynamic, maybe particularly important with IoT, of sharing data between partners. Say you have a factory getting inventory from suppliers, and they'd like to know what's happening on their suppliers' production lines so they can anticipate quality issues. Or if they have a quality issue, they can say: hey, it looks like there was a process change at our supplier that might have impacted the material; now we have that insight, so we can better identify the root cause of the quality issue we're experiencing. And that's a big challenge, because it's not just a technical challenge; it's a legal and data ownership challenge. Do you get at all involved in this issue of how to share the data and make sure that rights are respected?
Rick: Yes, that's a hugely interesting problem. We don't directly address the legal framework, but outside of that: historically, it's been, I have my data platform, you have your data platform, and the two don't see each other. What's happening now is these data platforms are allowing sharing of data with other like platforms, or even platforms that don't look like theirs, so you can see that data as if it were your data.
One of our partners is Amazon; Redshift is their big analytic database, and they've just introduced some new features that allow Redshift to see data that's external to the Redshift system. It might be partner data, so they have some new data sharing features. And Snowflake has that kind of thing too: the ability to federate and attach rights to data.
We actually built that data sharing feature into our application. So somebody working with our app could see the Redshift data and the data in the partner platform, all through our app, because Redshift has enabled that. I think the data platforms are going to make it easier to see data in other places, and they're doing that; they're working on federated infrastructure to make data available and to attach rights to it.
We're not directly doing the federation or the attachment of rights, but we're making sure our application lights up those features as the platforms support them. That is a hugely important issue across the board, and certainly with IoT.
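To give a flavor of the Redshift data sharing features Rick mentions, here is roughly what producer-to-consumer sharing looks like in Redshift SQL; all object names and namespace IDs below are placeholders:

```sql
-- On the producer cluster: publish a schema and table as a datashare.
CREATE DATASHARE sensor_share;
ALTER DATASHARE sensor_share ADD SCHEMA analytics;
ALTER DATASHARE sensor_share ADD TABLE analytics.raw_sensor_readings;
GRANT USAGE ON DATASHARE sensor_share
    TO NAMESPACE '11111111-2222-3333-4444-555555555555';

-- On the consumer cluster: mount the share as a database and query it
-- in place, without copying the data.
CREATE DATABASE shared_sensors
    FROM DATASHARE sensor_share
    OF NAMESPACE 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee';

SELECT COUNT(*) FROM shared_sensors.analytics.raw_sensor_readings;
```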
Erik: Could you maybe walk us through an end-to-end example: a company that starts without sufficient processes for processing data, how they would deploy the system, and, to some extent, the impact on that company's operations?
Rick: I won't give you a company name, but I'll give you an example. Let's take the insurance sector; we have a bunch of insurance clients. If you think about how insurance works, every single one of their customers is providing them data, whether it's us as individuals or a big company being insured providing data about the company. And the structure of that data is oftentimes different. If it's an insurance company for individuals, you have to provide them the data the way they ask for it, but with companies the data exists in all kinds of formats.
So historically, you've had this bottleneck: every company's got data they're providing us; the data is not all the same; we have to try to enforce commonality in how they give us the data, and that creates a lot of cumbersome stuff. What we've seen happen is that insurance companies have been big adopters of these big data platforms. And as they've adopted those platforms, they've sought to empower the business teams who deal with each customer to ingest additional data and use that data in their operations.
Think of a big insurance company dealing with businesses, business-to-business insurance: they've got thousands of clients and thousands of people handling all those clients. In the previous generation of doing things, they'd all have to get the data in a certain standard format, which puts a burden on the customer and takes a lot of time. They had to have a central organization to process all that data, and they had to do that processing before they could get back to the customer.
But what has changed, and what's changing, is that with these big data platforms we can now have a data platform that supports all different types of data. That's a big change. And we can have a tool like Aginity's that allows all of these different client teams to ingest the data from their clients, regardless of format. So they're not going to force the client to convert every data set to look like every other data set, and wait for that to be done and check the errors.
They can use Aginity's tool and its wizard to ingest that data. Then they can take that data and simply add it to whatever calculation they're doing, maybe a risk calculation, in a much more self-service way. So we've taken a process that used to be very centralized and regimented, and required a bunch of preprocessing of data per customer, and made it much more democratized.
We have clients in the insurance space with literally thousands of seats licensed for our product, because there are thousands of people doing that function who, in the previous generation, would all be sitting in a queue waiting for IT, and the process with the customer would be burdensome in terms of how they'd get the data. This architecture has allowed them to free up those client teams to work with the clients' data on their own: less burden on IT, less preprocessing, easier for the customer, faster to get to answers. And it can only happen because the thousands of client teams are empowered. They're doing exactly that, and they're seeing massive benefits from it.
Erik: That's quite interesting because it's not just an internal operational improvement, but it's really a different value proposition to the customer.
Rick: Yeah, if you think about it, we all go through it. The more burden the companies we deal with put on us in how we have to interact with them, the less satisfied we are. And the more they can adjust to us, the happier we are about working with them. This kind of thing allows for that.
Erik: I feel like this is my insurance company's strategy: they disincentivize me from going to the doctor just because I then have to report the data back to them to get reimbursed.
Rick: Yeah, think about what kind of a hassle that has been. But it is getting easier, and it's partly easier because of the flexibility of the data architecture. You'll get a million people calling you up here, [inaudible 39:07] who's that guy who said that dealing with your insurance company is easy because it's hard. But they're getting better at it, and we play a small part in that.
I was on the phone with one of the largest insurance companies in the US just last week, and they were super happy because these [inaudible 39:27] could do all this stuff that they couldn't do before. That's really gratifying: you can help somebody do something they couldn't do before, or do it more easily. It's better for their customer, better for them, and everybody wins.
Erik: Well, Rick, I think we've covered a lot of ground today. Is there anything that we haven't touched on yet that's important?
Rick: Yes, I think we've covered a lot of space; we're pretty good. I would just say, I think what you're doing is super cool. IoT is a massive part of our world; devices are going to be everywhere, and they're providing data. And we want to be a small part of making it easier for the non-engineer to work with that data, to make their process, their job, and ultimately all of our lives better.
Erik: Well, that is a worthy mission, Rick. Thank you for taking the time to come on; I really appreciate it.
Rick: Yeah, Erik, thank you for having me. I really appreciate it, you have a great podcast.
Erik: Thanks for tuning in to another edition of the Industrial IoT Spotlight. Don't forget to follow us on Twitter at IotoneHQ, and to check out our database of case studies on IoTONE.com. If you have unique insight or a project deployment story to share, we'd love to feature you on a future edition. Write us at erik.walenza@IoTone.com.