There are a lot of things to consider when jumping into a new IoT project, such as security, connectivity, hardware, device management, battery life, sensor types, and many more. One thing that often becomes an afterthought is data management, yet it can have a major impact on many of those areas. The goal of this article is to look at some common decision points to consider when evaluating a database for an IoT project.
IoT Database vs Traditional Application Database
A traditional application like an ERP, CRM, or CMS will typically have a fair number of tables, maybe 15 or 20, sometimes even a few hundred. Those tables are typically wide and shallow, meaning they have lots of columns and relatively few records. "Relatively few" could still mean tens of thousands or even hundreds of millions of records. Conversely, an IoT database can have billions of records in narrow, deep tables. The data models and scale for IoT are very different from those of a traditional application database.
Transactional Velocity and Type
Often in IoT you can have sensors writing data more than once a second. The number of sensors could be in the hundreds, thousands, or even millions. Imagine a highway with multiple sensors per mile measuring weather, traffic, and temperature data. These sensors are most likely reporting every second. The velocity of these data writes is significantly higher than in traditional application use cases: at one write per second, a million sensors produce 3.6 billion records per hour.
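To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch. The function name and the sensor counts are illustrative assumptions, not figures from a real deployment.

```python
# Rough ingest estimate; all inputs are illustrative assumptions.
def writes_per_hour(sensor_count: int, writes_per_second: float) -> int:
    """Return the total number of records written per hour."""
    return int(sensor_count * writes_per_second * 3600)


# Compare a few fleet sizes, each sensor reporting once per second.
for sensors in (100, 10_000, 1_000_000):
    total = writes_per_hour(sensors, 1.0)
    print(f"{sensors:>9,} sensors -> {total:>13,} records/hour")
```

Even a modest fleet crosses into millions of records per hour, which is why ingest rate deserves attention early in the design.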
The types of these transactions are also very different. Often in an application use case you might update the same record many times. You might also want to roll back transactions in the event the situation changes or the user changes their mind.
While this can happen in IoT, it’s not as common. More frequently, these data writes represent events that don’t get updated again. They are points in time.
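The difference between the two write patterns can be sketched with Python's built-in sqlite3 module. The table and column names here are hypothetical, and SQLite is used only because it ships with Python, not as a recommendation for IoT workloads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE TABLE reading (sensor_id TEXT, ts INTEGER, value REAL)")
conn.execute("INSERT INTO customer VALUES (1, 'old@example.com')")
conn.commit()

# Application-style write: the same record is updated in place,
# and the change is rolled back when the user changes their mind.
conn.execute("UPDATE customer SET email = 'new@example.com' WHERE id = 1")
conn.rollback()
email = conn.execute("SELECT email FROM customer WHERE id = 1").fetchone()[0]

# IoT-style write: each reading is a new point-in-time row
# that is appended once and not updated afterwards.
readings = [("sensor-1", 1_700_000_000 + i, 20.0 + i * 0.1) for i in range(5)]
conn.executemany("INSERT INTO reading VALUES (?, ?, ?)", readings)
conn.commit()
count = conn.execute("SELECT COUNT(*) FROM reading").fetchone()[0]

print(email, count)
```

The application table stays small and mutable, while the readings table only ever grows.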
Many times, folks begin their project with a traditional SQL database because that is where they are comfortable. This can lead to problems, as traditional RDBMSs often cannot handle the ingestion scale of IoT, which is a large part of why NoSQL databases were born. Once folks have adopted a SQL database, they will often upgrade to an in-memory SQL database to try to solve the problem. However, the vertical scale required to manage an in-memory footprint at IoT scale is massive and extremely expensive.
Hardware and Network Constraints
Another thing to consider when evaluating databases for IoT projects is that your network topology and hardware constraints are going to be radically different from those of a typical cloud-based application. In the cloud, the only thing constraining your compute and throughput is your budget. In IoT, however, size, weight, battery life, and cost constraints make it a different story. While edge computing devices are becoming increasingly powerful, for a lot of edge projects those more powerful devices are outside the budget, as the devices may need to be disposable or deployed at a scale that makes their price point impossible.
Network connectivity may also be intermittent or limited. This is especially true of IoT devices that are affixed to mobile assets.
As a result, developers need to design a data strategy that takes into account limited edge resources and unreliable connectivity.
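One common way to cope with both constraints is a bounded store-and-forward buffer: readings are queued locally in a fixed amount of memory and sent in batches when the link is available. The sketch below is a minimal illustration under those assumptions; the class names, the `send` callback, and the `FakeUplink` stand-in are all hypothetical and not part of any particular product.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Reading = Dict[str, float]


class StoreAndForwardBuffer:
    """Bounded in-memory queue that forwards readings when the link is up."""

    def __init__(self, send: Callable[[List[Reading]], bool], max_records: int = 1000):
        self._send = send
        # A bounded deque caps memory use; when full, the oldest reading is dropped.
        self._pending: Deque[Reading] = deque(maxlen=max_records)

    def record(self, reading: Reading) -> None:
        self._pending.append(reading)

    def flush(self, batch_size: int = 100) -> int:
        """Try to send pending readings in batches; return how many were sent."""
        sent = 0
        while self._pending:
            size = min(batch_size, len(self._pending))
            batch = [self._pending[i] for i in range(size)]
            if not self._send(batch):
                break  # Link is down: keep the readings and retry later.
            for _ in range(size):
                self._pending.popleft()
            sent += size
        return sent

    def __len__(self) -> int:
        return len(self._pending)


class FakeUplink:
    """Hypothetical stand-in for a network client with an unreliable link."""

    def __init__(self) -> None:
        self.up = False
        self.received: List[Reading] = []

    def send(self, batch: List[Reading]) -> bool:
        if not self.up:
            return False
        self.received.extend(batch)
        return True


uplink = FakeUplink()
buffer = StoreAndForwardBuffer(uplink.send, max_records=3)
for i in range(5):
    buffer.record({"ts": float(i), "temp_c": 20.0 + i})

offline_sent = buffer.flush()  # Link is down, nothing leaves the device.
uplink.up = True
online_sent = buffer.flush()   # Link is back, the surviving readings are sent.
print(offline_sent, online_sent, len(buffer))
```

Dropping the oldest readings when the buffer fills is only one possible policy. A project that cares about every individual record would need durable local storage instead.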
There are a lot of great database products; however, those products often need fairly hefty servers to run optimally and typically run a fair number of background processes, which can eat up battery life. They also typically use a parent-child replication strategy, which can be challenging when network connectivity is intermittent.
Decision Making and Analysis - Data Value Chain
The most important question you need to answer before designing your data management strategy is what do you want to do with the data? Don’t just think about what you want to do with the data today, think about what you want to do with the data long term. Data architectures are critical parts of any project and while you can often swap out pieces like the individual products, it is really challenging to rip out and replace the entire design pattern once in production.
Do you want to make real-time decisions and responses from the data you collect? This is especially critical in life-saving scenarios or Industry 4.0 scenarios, where milliseconds of delay can lead to millions of dollars of losses in manufacturing.
Do you want to layer on machine learning and predictive analytics? Do you just want to look at historical data?
Do you care about the individual records or do you just care about the aggregates? It's critical that you think through your entire data value chain before you embark on your project. Does your data have value? When? How?
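The trade-off between individual records and aggregates can be seen in a small sketch. The readings and the `rollup_by_minute` helper below are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (seconds_since_start, temp_c) readings from one sensor.
readings = [(0, 20.0), (20, 21.0), (40, 35.0), (60, 20.5), (80, 20.5)]


def rollup_by_minute(rows):
    """Collapse individual readings into per-minute count, average, and max."""
    buckets = defaultdict(list)
    for ts, value in rows:
        buckets[ts // 60].append(value)
    return {
        minute: {"count": len(values), "avg": mean(values), "max": max(values)}
        for minute, values in sorted(buckets.items())
    }


summary = rollup_by_minute(readings)
for minute, stats in summary.items():
    print(minute, stats)
```

The rollup is far smaller than the raw data, but once the individual rows are discarded you can no longer tell when within the minute the 35.0 spike happened. If that detail matters to you later, the raw records need to be kept somewhere.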
If you mostly want to report on aggregate data and don’t need to take action on it, a time series database might be ideal. If you need to act on your data and interact with it, however, that might not be the right choice. You could look at a NoSQL product, but if you need complex analysis of your data, that might prove problematic down the road.
What is really needed in IoT is an HTAP (hybrid transactional/analytical processing) database that can handle high-scale ingestion and complex analytics and can be deployed directly on the edge. In his blog post "HTAP: What is it and Why Choose it as a Database", our CTO Kyle Bernhardy explains the concept and why we felt it was important for HarperDB to be an HTAP database.