What do stability and the S&P 500 index have in common? How can we use data to identify fire risks and find the best place to open a coffee shop in New York City?
From using regression to identify fire risks in New York City to managing complex process control, it all comes down to getting the right information to the right person at the right time to make a decision. Drew explains what stability means, and the tension between machines and people as it pertains to data.
Transcript.
Welcome to the Industrial IoT Spotlight, your number one spot for insight from industrial IoT thought leaders who are transforming businesses today with your host, Erik Walenza.
Erik: Welcome back to the industrial IoT spotlight. This is the first episode of our three-part discussion with Drew Conway. Drew is the founder and CEO of Alluvium. He's also a very active guy who's involved in a number of other companies, so we'll discuss some of those in more detail. We'll focus first on going a bit into Drew's background and some of the other projects he's involved in, and then have a deeper look at Alluvium from a business perspective.
For the second part of the podcast, we will be diving more into the technology, both Alluvium's technology and technology more generally, looking at how they differentiate from other companies in the market. And then we'll end with the third part, with some deep dives into specific case studies. Drew, thanks so much for taking the time to talk with us today.
Drew: Erik, it's great to be here. Thanks for having me.
Erik: So before we get into Alluvium, walk us through, maybe we can start back when you were advising the Mayor's Office of New York City on data analytics. How did you come into that role? And what does that actually entail, working with the city government?
Drew: So, maybe just to go a little bit before that, to get some context around it. When I came to New York City, I came as a graduate student; I was doing my PhD at NYU. And when I got there, just by luck of timing and force of will, I managed to meet and interact with a bunch of really interesting and ambitious folks in New York who were doing early data science work. I mean, data science as a discipline is still relatively new, but certainly in the early to mid-2000s, I would say it was a nascent discipline in practice, and folks were still trying to figure out what it was.
And one of the things that I became really interested in, and ultimately was able to do some organizing around, was how do we get folks from the data science community to interact with the social and public sectors? So I'd actually started an organization called Data Kind, but along the way I met a bunch of really interesting people who were doing work broadly within the social and public sectors. And I'd actually done an event in New York called Data Gotham, where we really wanted to showcase all the interesting work that people in New York City were doing around data. And I was really eager to find someone from the New York City government who could come talk about what the city was actually doing with respect to its data.
And so I met a gentleman named Mike Flowers. Mike Flowers was the very first Chief Data Officer for the City of New York, and he actually started the Mayor's Office of Data Analytics in New York City. I had originally met him through the organization of the first Data Gotham conference, where Mike was one of our keynote speakers. And he and I managed to continue a really good relationship, because I was particularly interested in the work that he was doing. Ultimately, he asked me if I'd be willing to come and help him and his team think through some of the more deeply technical work that they were doing, and I was happy to do it. So around 2012-2013, I started volunteering, spending a few hours a week with the team going over some of the work, and then had a chance to be involved with some really interesting projects there.
Erik: So what type of projects were they doing back then in 2012-13?
Drew: One of the projects that I didn't have much involvement in, but that really put them on the map, was a project that they did with the fire department in New York. The interesting story here is that the city of New York, like all large cities, has a resource allocation problem when it comes to building inspection. The fire department in New York is charged with going around and looking for any safety or fire code violations that may exist in different buildings.
And the way that's typically done is you get a list of buildings that you are going to go look at. That list goes out to the various firehouses and precincts, and then their inspectors go out and see if there's anything going on that needs to be fixed. If they find something, they issue a document stating that these things need to be fixed. The unfortunate thing is that this tends to be somewhat inefficient. These requests for inspection come in, and they're not ordered in any particular way. And so the inspectors go out, and they typically have a pretty low hit rate in terms of where they find issues.
So after a particularly tragic incident where firefighters went into a fire in a building that had a bunch of safety code violations, and some firefighters got hurt and some lost their lives, Mike Flowers and his team said, this needs to change. We need to know exactly what's going on in these buildings, and we need to get much better at finding where these violations are.
And so what Mike and his team said was, well, the city of New York actually knows a ton about what happens in its buildings and can actually build a model that may be able to assess the particular risk of a given building having fire code violations. So rather than saying, okay, we'll just take a list of buildings and go out into the city and try to find where problems are, let's order that list based on the likelihood that any given building may have an issue.
So they went on this mission, effectively, to collect and organize as much of the city's data as might be relevant to this question. Obviously, they had the fire department's data, so they could use past inspections, and they had the Department of Buildings data, which had a similar cross section of buildings. But there's a bunch of other data that might not be intuitively relevant but actually tells you a lot about the likelihood that a building might have an issue.
So for example, you have the Department of Taxation data, so you know if a building has a lien on it; and it turns out that if that's the case, it drastically increases the likelihood that there may be an issue there. You also have the Department of Sanitation data, so you know how much garbage is collected from any given building. And if a particular building has a garbage collection violation, which would mean that it was producing more garbage or having more waste coming out of the building, well, that too might be an indication of a violation, because it might mean that there are more people in that building than there are supposed to be.
And then you have things like 311 data, which in New York is the nonemergency city line where people can file things like noise complaints or garbage complaints, or report any kind of unsightly things that they see in the city. And all of that information was really useful too. So what the team ended up doing was actually not a particularly technically sophisticated approach. Really, what they did is they just created a big data table that had all this information, where every row was a building and every column was one of these features.
And they just built out a big logistic regression; that logistic regression would just say, well, what is the likelihood that any one of these buildings may have a violation in it? And they used that to create what's now known as the risk-based inspection system. That risk-based inspection system was really a game changer for the fire department. I don't know the exact statistic, but I know that they increased their hit rate, essentially their ability to find violations, by at least 10 times and maybe even higher than that.
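To make the shape of that approach concrete, here is a minimal sketch in Python: one row per building, one column per feature, and a logistic regression that ranks the inspection list by estimated risk. The column names (has_tax_lien, sanitation_violations, complaints_311) are hypothetical stand-ins for the city datasets Drew describes, and this illustrates the general technique, not MODA's actual pipeline.

```python
# Minimal sketch of a risk-based inspection model: a building-by-feature
# table and a logistic regression ranking buildings by violation risk.
# Column names are hypothetical stand-ins for the city datasets described.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Each row is a building; each column is a feature joined from a city dataset.
buildings = pd.DataFrame({
    "has_tax_lien":          [1, 0, 0, 1, 0],
    "sanitation_violations": [3, 0, 1, 4, 0],
    "complaints_311":        [12, 2, 5, 20, 1],
    "past_violation":        [1, 0, 0, 1, 0],  # label from past inspections
})

X = buildings.drop(columns="past_violation")
y = buildings["past_violation"]

model = LogisticRegression().fit(X, y)

# Score every building and sort the inspection list by estimated risk,
# so inspectors visit the likeliest violators first.
buildings["risk"] = model.predict_proba(X)[:, 1]
inspection_list = buildings.sort_values("risk", ascending=False)
print(inspection_list)
```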
After I joined, the team already had a pretty high profile within city government and, in fact, was gaining a lot of interest from all the different departments. So I helped them work through a handful of different projects that were really supporting economic development. We built a similar tool for the city that actually helps small businesses decide where they might want to start.
So, again, one of the things that New York City really loves is entrepreneurship and people starting businesses and not just software companies. We want more laundromats. And we want more drugstores. And we want more coffee shops.
But here's the challenge; let's use the coffee shop example. If you're a small business in New York City and you want to start a coffee shop, you don't have the resources of a Starbucks, which can send a team to count foot traffic and do a bunch of specific research to determine what's the best corner in Chelsea to start a new coffee shop. But the city of New York actually has a bunch of that data already, and it can make it publicly available to citizens to help them make those choices.
So in a similar vein to the work that was done with the fire department, one of the projects that I worked on when I was at MODA, the Mayor's Office of Data Analytics, was to collect that data and build essentially a mapping tool, where a potential entrepreneur or small business owner in New York City could say, okay, I want to start a coffee shop in this neighborhood; what might be the best place to do it, where is there an opportunity to rent, and what's the rent cost there? And that tool is still available. You can actually go use it; the city of New York has it available through its website.
And I helped them through a bunch of infrastructure questions; they were thinking about using some new databases and new analytics tools. I also helped with some training, so we did some basic statistical programming training with the team. That was under the Bloomberg administration, and now, under the de Blasio administration, the Mayor's Office of Data Analytics is still quite successful.
Erik: And I see that a couple of years before you worked with the Mayor's Office, you founded Data Kind. Just reading off a couple of these projects: forecasting water demand in California when every drop counts, desiloing data to help improve the lives of those suffering from mental illness, using open data to uncover potential corruption, creating safer streets through data science. So I see a lot of emphasis in your background on working to make urban environments better and to solve social problems. And now with your company Alluvium, you're addressing the manufacturing sector. How did you transition from this long-term interest in social issues and working with data in cities to focusing on the manufacturing sector?
Drew: If there is any consistent thread through my career, which, as you stated at the top, is sort of an interesting and winding road, it's that I've always been really interested in understanding and thinking about how to build software tools that help people make decisions from data, particularly people who have to make those decisions under some constraint.
At the very, very beginning of my career, I actually worked in the US intelligence community, building custom integrated tools and supporting methods for doing what we called all-source analysis; this was before the days of data science as a title in practice. And that was really my first introduction to thinking about building statistical analysis tools to help people who have to make really challenging choices from data and need to do it quickly, because they have a lot of constraints.
The problems that I was working on in that part of my career were mostly in support of deployed Special Ops teams in Iraq and Afghanistan. And they had questions like: if we go down this street, is it safe? If we go knock on this door, will we find the person we're looking for? And we were looking at a very broad spectrum of data: traditional types of data that you would expect to see in an intelligence scenario, like signals intelligence and telecommunications records, as well as unstructured text reports. We had to think about building tools that could work with all of that data to help make a decision maker in the field better equipped to make a good decision. And that experience really molded my whole perspective on wanting to build tools that help people in this way.
When I got to New York, I met and got to interact with lots of really interesting people, one of whom was a gentleman named Jake Porway, who had also recently come to New York and at the time was working in the New York Times R&D group. Jake and I quickly became friends, because we both had this itching desire to figure out a way to take all of these people we were getting to know, who were working on really challenging data science problems, but working on them in the context of things like ad tech, where you want to figure out, okay, how do you get someone to click on this particular ad? Or in a social media context, where you ask, okay, how do we make this particular post more popular?
And the undercurrent of that is that while that work is quite interesting, maybe it doesn't satisfy someone who wants to see how they can use their skills to help people. And so we actually started Data Kind mostly as an experiment to see whether there were a significant number of people in the world who felt the same way we did. Could we find enough smart data scientists, engineers, and designers who would be interested in volunteering their time to support social organizations? Ultimately, we did. And Data Kind has become an extremely successful organization, with chapters all over the world doing this work.
And so, when I was thinking about the company that I wanted to start with Alluvium, I was still very much thinking, okay, I want to build a company that has state-of-the-art technology, but ultimately that technology is built for someone who has to make decisions from data under constraint. And the reality is, the men and women who work in complex, continuous process control environments fit this mold precisely.
Here's a specific example. If you ever have an opportunity to, say, walk through an oil refinery, once you get over how hot it is, you will next be overwhelmed by the amount of data that those systems produce, and that the people who operate them have to think about: what is that data telling them, and how can it help inform their process?
The reality is, it's quickly becoming an impossible task for any one person, or even a team of people, to manage and develop against all of that data. And so what we wanted to do is say: can we build well-designed software tools, with state-of-the-art machine learning and AI at their core, to support people in those roles, so that they can actually leverage that data in a speedy and meaningful way, get back to making decisions from that data, and get back to the work that they really should be focused on?
So the transition for me was more about where there was an opportunity to begin applying this thinking. Once I started looking around, and quite frankly having had the experience of working in the intel community, understanding the challenge of building technology for data that sits outside what we typically think of as the consumer or enterprise web was something that was really appealing to me. And so I got to, again, speak to and meet a bunch of folks from the industrial side, who really told me this is a real gap in what exists in the market now. And that was really the spark for me to found the company.
Erik: Just a quick introduction to Alluvium; I'm reading this from your LinkedIn profile: “We use machine learning and artificial intelligence to turn massive data streams produced by industrial operations into insights that help you, the experts, focus on the anomalies that affect your team's safety, productivity, and bottom line.” So this is a bit of a horizontal statement; it could apply to a lot of sectors, I guess anybody who's using heavy equipment in industrial operations. Who are your customers today? What are the segments where you're really working with companies on the ground in 2018?
Drew: The manufacturing sector, broadly speaking, has many big subsections, and for us, we focus primarily on process manufacturing and process control. As examples of that, I already mentioned refineries, so things like downstream oil and gas; chemical processing more generally, not necessarily petrochemical but other chemical processing; things like materials manufacturing, so we've worked in cement and concrete and other sectors like that; and things like fertilizer production, which in some sense is a specific use case of chemical processing.
And really, the unifying theme of all of those kinds of industrial operations is that they are big, complex mechanical systems that operate in highly interdependent ways and are operating all the time. That's as opposed to discrete manufacturing or an assembly line, which may, in and of itself, be quite complicated and have many, many moving parts, but typically has more linear relationships. You can imagine a car assembly line; most of us will have seen some video of robotic arms welding pieces to a car as the frame moves down the line. Each of those stops is discrete, and those operations happen in an order. Whereas in an oil refinery, you have all of these chemicals moving together at once. And so we have tended to focus on process manufacturing, because that's where we feel our approach can have a much greater impact and, quite frankly, greater salience for the customer.
Erik: What does the typical project look like for you? Who are you working with in terms of the decision makers? Who are the people who are going to be using your system? How does a project start, and what does it look like as it moves through deployment?
Drew: So in terms of how we initially get connected with our customers, I would say that there are typically three different ways that we end up connecting with a company. Most of the companies that we work with are large manufacturers, either in the energy industry or chemicals or other more specific parts of process manufacturing. And for some of the larger companies, as I'm sure you know given the podcasts and the folks that you talk to, the notion of digital transformation or Industry 4.0 is very well trodden in the industrial space.
And so sometimes you have folks at these organizations whose specific role is to go out and connect these large, quite frankly older, organizations with more contemporary, newer technology companies. So someone with a title like Head of the Office of Innovation, or someone sitting inside a particular team looking to innovate. Many times we will connect with those folks, and from there we may be connected with a particular business unit that has a problem that fits well into our core use cases.
We're typically selling to the folks who manage those teams, whether it's the factory manager themselves, or the head of a business unit or team or even up to the CIO of an organization, depending on the size of the organization.
Erik: I was having dinner with the CEO of Forcam last week; they focus on tier one and tier two automotive and other discrete manufacturing. And their key word is OEE. Everything is: we measure OEE before we go in, we measure it at 3 months, at 6 months, at 12 months, and their value proposition is built exactly around OEE.
For you guys, it looks like it's stability. How do you measure this, and does this factor into your business model? If we can improve some stability metric, does that validate that we should move from a pilot up to scale, or that we have a success-based fee? Maybe that's a two-part question. One, how do you track stability? And two, how does this factor into your sales process, or your approach to working with customers and validating that your solution is making an impact?
Drew: So, stability, certainly as a word but also as an idea, is in some sense the core value proposition of Alluvium. We believe that if we can help our customers understand the stability of their operation, and of course the inverse, the instability of their operation, then we can help them perfect how they do their production.
And so stability for us, as an index and as a value, is the core part of what our technology does. I've been building these kinds of platforms and thinking about these problems for a really long time. And as a general rule, when I think about building a machine learning tool, or even just a decision support tool, there's always this natural tension between discovering novelty in data and reasoning about it. If you have lots of data, you typically talk to someone and they'll say, well, 80% of the work is just munging and exploring it and seeing where the interesting features are, and then 20% of the work is actually building a model or doing some visualization that conveys it.
And in the context of an industrial operation, given the volume of information and the ways in which that information is used, that's the hardest thing. Finding a way to pull out that core nugget from that sea of information is really, really challenging. That's just fundamentally a hard thing for a person to do, no matter how sophisticated their statistical knowledge is or how deep their particular substantive or industrial knowledge may be. But it's actually much easier for a machine to do.
Well-designed machine learning systems are really good at observing a bunch of data, building up an internal intuition and representation of what the model of that data may be, and then, as new observations come in, having a good sense of whether a value is what we would expect or whether it somehow deviates from what we would expect. And so for us, stability, and what we call our stability score, is really an index of the overall operation of some complex system.
So that complex system can be very macro in a sense. It can be, say, an entire oil refinery: we want to understand how stable that entire refinery is, so we want to build a single index, in the same way that you build the Dow Jones or the S&P 500 to understand market stability. We want to build an index that does that for an entire operation. And when that index starts to dip and become unstable, we want to be able to quickly help a customer identify the source of that instability and how it came about.
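To make the idea concrete, here is a minimal sketch of one way such an index could be computed, assuming a simple rolling z-score construction; Alluvium's actual scoring method isn't described here, and the tag names and synthetic data are hypothetical. Each sensor's deviation from its own recent behavior rolls up into one plant-wide number, much as many stocks roll up into the S&P 500.

```python
# Minimal sketch of a "stability index" under an assumed construction:
# per-sensor rolling z-scores combined into a single plant-wide score.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical telemetry: three tags from one unit, sampled once a minute.
sensors = pd.DataFrame(
    rng.normal(size=(500, 3)),
    columns=["pressure", "temperature", "flow_rate"],
)
sensors.loc[400:, "pressure"] += 4.0  # an injected process upset

window = 60  # one hour of history as the behavioral baseline
z = (sensors - sensors.rolling(window).mean()) / sensors.rolling(window).std()

# Stability index: near 100 when every tag behaves as expected,
# dipping as deviations grow.
stability = 100 - z.abs().mean(axis=1).clip(upper=10) * 10
print(stability.tail())

# When the index dips, the per-tag z-scores point at the likely source.
print(z.abs().iloc[-1].sort_values(ascending=False))
```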
And so for us, that is really core: we want to be able to draw that equilibrium, that tension, away from having to discover novelty in data toward the other side of that tension, which is reasoning about that data. Because just as a computer is particularly well suited to automating the discovery of novelty in data, it is particularly poorly suited to reasoning about it.
A computer doesn't know anything about why an oil refinery or a manufacturing line might start to malfunction, but a person is really good at that. Only the person who has 15, 20, 25 years of experience working in these systems can look at a set of values and say, I know what this is. And so for us, the stability score and this notion of stability are really a means of getting the right information to the right person's eyeballs at exactly the right time, so that they can quickly make a decision, or at least have situational awareness of what's happening in their facility, without having to dig through all this information and do that discovery themselves.
And then of course, the final piece of this is that if you build the system in a way where that interaction is positively reinforcing, when an operator looks at a screen or looks at an alert and says, okay, this is something that's important, this is something that I know I need to make a decision on, well, then the machine learning system can learn from that and say, okay, now I know a little bit more about what actually is an important set of interactions in this underlying data set, and I know to bring that to the top of my stack the next time my operator looks for stuff.
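Here is a minimal sketch of how that kind of feedback loop might work, under assumed mechanics; the alert structure, weighting scheme, and function names are illustrative, not Alluvium's implementation.

```python
# Minimal sketch of an operator-feedback loop: when an operator confirms
# an alert mattered, upweight the pattern that produced it so similar
# events rank higher next time. The weighting scheme is illustrative.
from collections import defaultdict

weights = defaultdict(lambda: 1.0)  # learned importance per alert pattern

def rank_alerts(alerts):
    """Order pending alerts by anomaly score scaled by learned weight."""
    return sorted(alerts, key=lambda a: a["score"] * weights[a["pattern"]],
                  reverse=True)

def record_feedback(alert, useful, lr=0.2):
    """Nudge the pattern's weight up or down based on operator feedback."""
    weights[alert["pattern"]] *= (1 + lr) if useful else (1 - lr)

alerts = [
    {"pattern": "pressure_drift", "score": 0.7},
    {"pattern": "flow_spike",     "score": 0.8},
]
top = rank_alerts(alerts)[0]
record_feedback(top, useful=True)   # operator confirms it mattered
print(rank_alerts(alerts))          # confirmed pattern now ranks higher
```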
Because one of the things that we recognize as a particularly unique challenge of building software for industrial operators is that software is not really central to their jobs. Folks who work inside a manufacturing facility, and even those who manage them, are thinking about the operational technology, not about being clever or [inaudible 26:38]. They're thinking about where do I turn my wrench, where do I point my flashlight, where are the things I need to be doing to make this physical system work? The software is just there to support that.
And so if we can get 5-10 minutes of someone's attention looking at a screen, we want to make sure that that's an extremely high-value interaction for them and that the system is always leveraging that to get smarter. All of that feeds into this idea of stability for us, both in terms of what the software does and in terms of how our operators are able to continue to do their work. Because the last thing that we want to do is build a system that fundamentally changes how they do their work. Because, A, we know that they're not going to use it, since it's not going to be particularly useful for them, and that's going to mean that our system is not going to get any smarter. And that's a problem for us.
Erik: Thanks for tuning in to another edition of the industrial IoT spotlight. Don't forget to follow us on Twitter at IotoneHQ, and to check out our database of case studies on IoTONE.com. If you have unique insight or a project deployment story to share, we'd love to feature you on a future edition. Write us at erik.walenza@IoTone.com.