Wrangling Your Dark Data


As we talk with clients about their challenges, the topic invariably turns to Industry 4.0. This is interesting to me because it’s a largely generalized concept, and while there are many reasons companies want to implement Industry 4.0, one main reason is FOMO. If you are under the age of 30, FOMO is probably a standard phrase; luckily, I have college-aged kids who happen to hang out with their parents on occasion, so I can understand their lingo. FOMO, or Fear of Missing Out, seems to be a recurring theme as we speak to executives about Industry 4.0: they see the news feeds on LinkedIn, read the trade magazines, attend happy hours and trade shows, and feel as though they need a strategy. When we ask what Industry 4.0 would look like in their organization, the answers are typically more buzzwords: digital transformation, artificial intelligence and machine learning. But when we ask specifically about how their operations can become more profitable, or how they can do more with less and respond quickly to customers, their answers are typically the same: we need to be able to be proactive and make data-driven decisions at all levels of the organization.

The buzz around Industry 4.0 has motivated leaders to adopt new technologies so that their organizations will not be left out; according to my kids, this is called FOMO. But there is a part of Industry 4.0 that leaders could use effectively, and it revolves around real-time, data-driven analytics and decisions, a pillar of Industry 4.0 initiatives. So yes, we know that customers want to implement Industry 4.0 technologies by leveraging more data-driven processes at all levels within their organization. Now that the OT (operational technology) side of the organization has begun to catch up with the IT side in networks, sensors and open applications, both sides of the house can potentially provide more real-time, relevant data.

The next question we ask these clients is how they are approaching this goal of more data-driven decision making. In most cases, the answer is by hiring data scientists and building a data science capability. This is good for data scientists but does not always lead to the goal. Realigning the organization from being reactive and driven by rigid processes to being more dynamic and fluid requires a top-down strategy. An effective strategy would be one that treats data as an asset and aligns the company to make data-driven decisions. A data science team would benefit the organization the same way a maintenance staff supports mechanical systems. But in order to be effective, the data assets must be trusted, easily accessible and timely, and must come from all sources within the organization. With the advent of advanced analytics, it is impossible to predict the value of each piece of data individually. So should all data be managed equally if it could provide valuable insights to someone at some time? Or is some data more valuable and some data disposable? The answer to both is probably yes, but who determines the relative value, and how do we make these data assets available to the right people at the right time?

After a couple of hours of web searches, I found some consistency in the articles written about data scientists: they state that the majority of a data scientist’s time is spent not on data science but on “data wrangling.” Since I now live in the West I like this term, which I found on a forum in a post from someone in retail data science. It conjures up images of the data scientist with a lasso in one hand and a keyboard in the other, trying to round up and take charge of a herd of text files which need to be organized into some meaningful insight. Move ’em out, rawhide. But seriously, the constant thread in these forums is that extracting useful information from volumes of raw data within a company only really happens when the data scientist needs it. Considering that most data scientists are paid six-figure salaries, this is a lot of wasted capital and brain power. Perhaps companies should invest in a data strategy that can effectively wrangle the useful data into a more accessible format, so that the data scientists can do their magic and the rest of us can also start moving forward with our own data-driven decisions on a daily basis.

As a data management and integration company, our lens will always be focused on the actual data, while other worldviews may be centered on processes or organizational structure for enacting change and digital transformation. These are all critical components, and data is at the heart of any organization. As we drilled even deeper into discussions about data-driven decision making, a new term emerged (at least new for us): the concept of DARK DATA. The first time I heard this phrase I had absolutely no idea what it was. After talking to several clients about their dark data, I took to Google again and, lo and behold, it was all over the internet. It even has its own Wikipedia page. But more importantly, it has a Gartner definition: https://www.gartner.com/it-glossary/dark-data

The interesting thing is that as we start talking about dark data, it becomes evident that all data can be valuable and that dark data is really just data which the status quo had no use for in the past. According to this Gartner definition, “Storing and securing data typically incurs more expense (and sometimes greater risk) than value.” But who is determining the value? With the rise of analytics tools, Industry 4.0, machine learning and AI, who decides what data is dark and what data is not? Perhaps one person’s dark data is another person’s “lit data”? (Not sure what the opposite of dark data could be; comment if you have an idea.)

What is evident is that data is the new capital in this age of machines, and the companies that harness (or wrangle) their data most effectively will succeed and move forward. Old strategies of data lakes and monolithic databases, protected by moats of complexity, hinder this new industrial revolution. The companies that can democratize their data and make it accessible to the greatest number of users, while also effectively managing data as an asset, will be set to take advantage of new advances without FOMO.