5 Common Mistakes in IoT Sensor Data Analytics

At this point I think it’s a given that everyone is excited about IoT. Consumers are excited because their 6-year-old daughters can accidentally order $170 dollhouses with Alexa, and businesses are excited because they can see results like a 92% increase in labor utilization.

That said, if we are being honest, most organizations, while hugely interested in IoT, are struggling with how to put it in place. They are hitting some common roadblocks when it comes to gaining actionable insights from their IoT sensor data.

1. Structure

The first problem is really driven by the awesome variety of IoT hardware.  By 2022 there could be as many as 43 billion IoT sensors deployed.  Let’s say you need something as simple as a temperature or humidity sensor for your project.  You have literally hundreds of options to choose from and you might even utilize multiple different types from multiple different manufacturers within a single project.  And, you might start utilizing sensors from one manufacturer and later switch to another.  

Wonderful, choice is awesome, right? Partially true, but each of those devices probably outputs a different data payload. Maybe they report very similar metrics with entirely different schemas, or maybe they report entirely different metrics. This makes it really hard to get the data into a single place, which is required to extract value.

How do you make sense of these radically different data schemas that are all reporting on the same thing?  Do you pick one vendor and potentially lose the capability you need from the other vendors?  Do you create different tables in your database for each sensor type?  Do you use what is quickly becoming an antiquated solution, a data lake?  Do you dump it into a flexible schema-less NoSQL database and hope to make sense of it later?  

We have seen organizations commonly choose each of these paths, but none of them is ideal. Multiple tables make it very hard to amalgamate the data into a single actionable view. Data lakes are expensive, complex, and slow. NoSQL databases are awesome because of their flexibility, but that same flexibility makes it impossible to gain actionable insights later. Vendor lock-in is frustrating, limiting, and risky.

The solution is not in the hardware, but rather in your IoT data management strategy. You need to pick a product that can ingest radically different schemas that are all saying the same thing into a single place, with real-time reporting capability and the simplicity to quickly switch between hardware types.
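As a rough illustration of getting different schemas that say the same thing into a single place, here is a minimal Python sketch. Everything in it is an assumption made up for the example: the two vendors, their payload field names, and the target field names (sensor_id, timestamp, temperature_c, humidity_pct). It is not tied to any particular product.

```python
from datetime import datetime, timezone

# Hypothetical payloads from two imaginary vendors reporting the same reading.
vendor_a_payload = {"dev": "a-17", "ts": 1700000000, "temp_f": 71.6, "hum": 40.0}
vendor_b_payload = {
    "sensorId": "b-03",
    "time": "2023-11-14T22:13:20+00:00",
    "readings": {"temperatureC": 22.0, "relativeHumidity": 0.40},
}


def normalize_vendor_a(payload):
    """Vendor A (assumed): epoch seconds, Fahrenheit, humidity in percent."""
    return {
        "sensor_id": payload["dev"],
        "timestamp": datetime.fromtimestamp(payload["ts"], tz=timezone.utc),
        "temperature_c": round((payload["temp_f"] - 32) * 5 / 9, 2),
        "humidity_pct": payload["hum"],
    }


def normalize_vendor_b(payload):
    """Vendor B (assumed): ISO 8601 time, Celsius, humidity as a 0-1 fraction."""
    readings = payload["readings"]
    return {
        "sensor_id": payload["sensorId"],
        "timestamp": datetime.fromisoformat(payload["time"]),
        "temperature_c": readings["temperatureC"],
        "humidity_pct": round(readings["relativeHumidity"] * 100, 2),
    }


# Switching hardware means adding one entry here, not redesigning storage.
NORMALIZERS = {"vendor_a": normalize_vendor_a, "vendor_b": normalize_vendor_b}


def normalize(vendor, payload):
    """Map a vendor-specific payload onto the shared record layout."""
    return NORMALIZERS[vendor](payload)


records = [
    normalize("vendor_a", vendor_a_payload),
    normalize("vendor_b", vendor_b_payload),
]
```

Once every reading has the same keys and units, the records can land in one table or collection and be queried together, regardless of which manufacturer produced them.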

2. Noise to Sound Ratio

The next challenge most organizations face is the overwhelming volume of IoT data. Imagine if each of the 43 billion sensors mentioned previously sent one data payload every second of every day. That is roughly 1.36e+18 payloads per year, which is a lot of data.
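The estimate above is easy to check with quick arithmetic. The sketch below assumes a 365-day year, and the 100-byte payload size is purely a hypothetical figure used to turn the count into a storage number.

```python
SENSORS = 43_000_000_000                # 43 billion sensors, from the estimate above
SECONDS_PER_YEAR = 60 * 60 * 24 * 365   # assumes a 365-day year
BYTES_PER_PAYLOAD = 100                 # hypothetical payload size, for illustration only

# One payload per sensor per second, all year long.
payloads_per_year = SENSORS * SECONDS_PER_YEAR

# Raw storage needed if nothing is summarized or deleted.
bytes_per_year = payloads_per_year * BYTES_PER_PAYLOAD
exabytes_per_year = bytes_per_year / 10**18

print(f"{payloads_per_year:.3e} payloads per year")
print(f"{exabytes_per_year:.1f} exabytes per year at {BYTES_PER_PAYLOAD} bytes each")
```

With these assumptions the count comes to 1,356,048,000,000,000,000 payloads per year, or about 1.36e+18.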

Most organizations adopt the mindset that they could never possibly make sense of that volume of data.  How could it possibly be useful?  It’s too expensive, too complex, too overwhelming to maintain.  As a result, they choose to just save summary level data and delete the rest.   

Meanwhile, their data scientists are slowly dying inside, as the data that is being deleted is their lifeblood.  With that data, they could make incredible predictions using machine learning and AI.   

It is not impossible to make sense of that data, and it doesn’t need to be expensive either. The problem is that solutions like DBaaS, in-memory computing, data lakes, and traditional RDBMS make it overwhelming. Big Data architectures that are highly cloud reliant were built for a different scale of data and were designed for SaaS solutions. IoT can have a SaaS component, but IoT is not SaaS.

If we look to the newer architectures that are specifically designed, built, and catered to IoT, the problems become less complex.  While IoT sensors and IoT gateways typically do not have an enormous amount of compute power, when you think about the horizontal scale capability of 43 billion sensors, that is a lot of compute.   

Organizations need to look for stand-alone, offline capable, self-sufficient, and horizontally scaled solutions to make sense of this level of data.    

3. Cloud Lock-in

As mentioned above, IoT is not SaaS, yet most organizations are applying to IoT the same architectural patterns that worked for SaaS. This simply doesn’t work. The more pioneering and cutting-edge organizations are realizing that they need to be cloud agnostic. Even Microsoft has launched a cloud-agnostic IoT solution.

This is important because a cloud-first IoT strategy can make connectivity, security, and hardware choices very challenging. Some cloud solutions only support certain hardware, and relying on the cloud for decision making can make your data storage and processing costs skyrocket. Security can be a major concern as well. There are a lot of amazing IoT data services provided by cloud vendors like AWS and Google; however, what happens if you are working in a highly secure environment? What if you need to manage your entire IoT lifecycle on site? Cloud lock-in can make that impossible.

As a result, companies really need to consider hybrid cloud, horizontally scaled, edge-first architectures. It is important to use the horizontal scale capability of edge hardware to make decisions, process data, and power IoT applications.

4. Connectivity

Connectivity is a major hot topic in IoT, and many companies are working on solving connectivity challenges. That said, will it ever be perfect? No matter how much bandwidth we can gain from different IoT connectivity solutions, will it be 100% reliable? Will it be affordable?

If you are building a mission-critical IoT application that has lifesaving use cases, do you want to rely on off-site decision-making capability? You might have the best connectivity in the world, but what happens if you’re offline for the three seconds when your application needs to make a lifesaving decision, help locate a first responder in need of assistance, or prevent an industrial disaster?

Again, companies are looking at mono-directional architectures that capture data at the edge and offload the decision making to the cloud. This is not ideal for mission-critical use cases. Instead, as mentioned above, companies should be looking at local, self-sufficient, distributed computing architectures that allow for offline decision-making capability.
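To make offline decision making concrete, here is a minimal Python sketch of an edge node that decides locally first and only forwards data to the cloud when a connection happens to be available. The EdgeNode class, the 80-degree threshold, the "shutdown" action, and the uplink callable are all hypothetical names invented for this illustration, not part of any specific platform.

```python
from collections import deque

TEMP_LIMIT_C = 80.0  # hypothetical safety threshold


class EdgeNode:
    """Decides locally; treats the cloud as an optional, unreliable sink."""

    def __init__(self, uplink, max_buffered=1000):
        # uplink: any callable that sends one record upstream and raises
        # ConnectionError when the network is down (an assumption of this sketch).
        self.uplink = uplink
        # Bounded buffer: once full, the oldest unsent records are dropped.
        self.buffer = deque(maxlen=max_buffered)

    def handle_reading(self, reading):
        # The decision is made here, on site, with no network round trip.
        action = "shutdown" if reading["temperature_c"] > TEMP_LIMIT_C else "ok"
        self.buffer.append({**reading, "action": action})
        self.flush()  # best effort; never blocks the local decision
        return action

    def flush(self):
        # Forward buffered records in order; stop quietly if still offline.
        while self.buffer:
            try:
                self.uplink(self.buffer[0])
            except ConnectionError:
                return
            self.buffer.popleft()


# Simulated flaky connection for the example.
state = {"online": False}
sent = []


def flaky_uplink(record):
    if not state["online"]:
        raise ConnectionError("no connectivity")
    sent.append(record)


node = EdgeNode(flaky_uplink)
action = node.handle_reading({"sensor_id": "boiler-1", "temperature_c": 95.0})
```

In this sketch the shutdown decision is returned even though the uplink is down, and the record simply waits in the buffer. Setting state["online"] to True and calling node.flush() later delivers it to the cloud for reporting.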

5. Security

In general, privacy is another major concern for most organizations regardless of IoT, but the stakes are simply higher in IoT.  Do you want your mission critical systems controlled by hostile parties?  

A lot of very smart people are looking at protocol security and north/south security.  This is a good thing and important as it would be overly dogmatic to suggest that people keep their IoT projects disconnected from the cloud and broader internet. 

That said, layering on more advanced north/south security capability, which is required, is bound to introduce latency.    

It is highly counterproductive to focus on building a mono-directional, cloud-based IoT architectural pattern while at the same time focusing on building heavy and highly secure north/south traffic protocols. We all know that security and usability are in a constant battle. To resolve this conflict and find the correct balance, we can take a cue from cloud environments, which have introduced concepts like VPCs.

The north/south traffic is highly secure, while the east/west traffic is less secure because those systems are operating within a secure bubble created by the VPC. Most decision making in the cloud occurs within the VPC/VPN environment, and connectivity outside that bubble is normally a response based on decisions made within the bubble.

We need to do the same thing in IoT. Think of your facility, location, or IoT system as within the VPC/VPN bubble. Think of the cloud or internet in general as outside the bubble. We need to make the bubble highly secure while ensuring low-latency, real-time decision-making capability within that bubble. The best way to do this is through distributed compute systems that are locally present and self-sufficient. This keeps your IoT data secure while allowing you to make real-time decisions.