We have written a lot about the technical need to combine NoSQL and SQL solutions for handling big data workloads. That said, what are the business impacts of combining NoSQL and SQL? When should you use an object model vs. a query model? What use cases require both? Today we look at best practices for each, as we gave this a lot of thought in designing HarperDB.
When to use SQL vs. NoSQL
Modern application development, for the most part, uses an object-oriented model. NoSQL databases store and return objects instead of the traditional column/row model of an RDBMS. It therefore makes sense to use a NoSQL database as the primary interface for your application. Otherwise, you constantly have to transform columns and rows into objects within your application, which adds latency and resource utilization that modern NoSQL databases make unnecessary. As a result, when we built HarperDB, we decided that regardless of how you query it, whether with a filter search or a SQL query, we return an object array.
This allows developers to quickly and easily wire HarperDB’s native REST API into their application, eliminating the need for tools like Mongoose or Hibernate. This not only saves on resource utilization, it also saves developers a lot of time in getting up and running.
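To make the row-to-object transformation concrete, here is a minimal sketch of the mapping step an application has to perform when its database returns columns and rows. It uses Python's built-in SQLite module purely as a stand-in for a traditional RDBMS; the `dog` table, its columns, and the `rows_to_objects` helper are hypothetical and are not part of HarperDB.

```python
import sqlite3

# SQLite stands in for a traditional row/column RDBMS (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dog (id INTEGER PRIMARY KEY, name TEXT, breed TEXT)")
conn.executemany(
    "INSERT INTO dog (id, name, breed) VALUES (?, ?, ?)",
    [(1, "Penny", "Mutt"), (2, "Harper", "Husky")],
)


def rows_to_objects(cursor):
    """Turn each result row (a tuple) into a dict keyed by column name."""
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


cursor = conn.execute("SELECT id, name, breed FROM dog ORDER BY id")
dogs = rows_to_objects(cursor)
print(dogs)
```

A database that already returns an object array removes this step, along with any mapping layer built around it, from the application.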
On the other hand, analytics, reporting, and data science all require the ability to slice and dice data in many different ways, and this can get pretty complex. Let’s say you are looking at sensor data from cell phone towers. You might have data that represents the towers. You will probably also have data that represents usage of the network, plus sensors on the towers measuring electrical usage, temperature, etc. Maybe you want to compare network usage to electrical usage to predict how much power you will need based on the time of day.
In a standard NoSQL database this becomes challenging. You can either store all your data in a single giant “collection” or store each type of data in its own “collection.” With the first option, it is really hard to search by all of the parameters in the data, like number of phones connected, power usage, temperature, etc. With the second option, it is really hard to compare the collections, for example comparing towers to power.
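As a rough illustration of the second option, the sketch below keeps network usage and power readings as two separate in-memory “collections” and compares them by hand. The documents, field names, and helper functions are all hypothetical; the point is only that the join logic ends up living in application code.

```python
from collections import defaultdict

# Hypothetical sample documents; field names are illustrative only.
network_usage = [
    {"tower_id": "T1", "hour": 8, "phones_connected": 120},
    {"tower_id": "T1", "hour": 20, "phones_connected": 300},
    {"tower_id": "T2", "hour": 8, "phones_connected": 80},
]
power_readings = [
    {"tower_id": "T1", "hour": 8, "kwh": 4.0},
    {"tower_id": "T1", "hour": 20, "kwh": 9.0},
    {"tower_id": "T2", "hour": 8, "kwh": 3.0},
]


def join_usage_to_power(usage_docs, power_docs):
    """Hand-written join of the two collections on (tower_id, hour)."""
    power_by_key = {(d["tower_id"], d["hour"]): d["kwh"] for d in power_docs}
    return [
        {**doc, "kwh": power_by_key[(doc["tower_id"], doc["hour"])]}
        for doc in usage_docs
        if (doc["tower_id"], doc["hour"]) in power_by_key
    ]


def totals_by_hour(joined_docs):
    """Sum connected phones and power draw for each hour of the day."""
    totals = defaultdict(lambda: {"phones_connected": 0, "kwh": 0.0})
    for doc in joined_docs:
        totals[doc["hour"]]["phones_connected"] += doc["phones_connected"]
        totals[doc["hour"]]["kwh"] += doc["kwh"]
    return dict(totals)


hourly = totals_by_hour(join_usage_to_power(network_usage, power_readings))
print(hourly)
```

Every additional comparison, such as adding temperature, means another hand-written join like this one.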
NoSQL and SQL together
Traditional RDBMS solutions do a much better job of making this comparison possible. NoSQL vendors have begun to realize this, which is why they have adopted the concept of multi-model to also accommodate SQL. You can read about why this is risky in our blog Multi-Model Databases - A Mistake. That said, one of the driving factors in the development of HarperDB was that we wanted to provide SQL capability on top of JSON/object data in real time. When we were designing our search functionality, we spent an enormous amount of time mapping out what a filtered search (NoSQL) with multiple joins (comparing different types of data) would look like. The answer? It looked AWFUL. We realized there was already a well-established, 40+ year-old way of asking databases complex questions: SQL. As a result, while you can still do complex searches in HarperDB without SQL, we document and support SQL as the best practice for answering complex questions in HarperDB.
So when building your application, focus on objects. When you need business intelligence and actionable decisions that drive business value, SQL is still king.
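For the cell tower comparison described earlier, the SQL version of the question might look like the sketch below. It runs against Python's built-in SQLite module only so that the example is self-contained; it is not HarperDB's API, and the table names, column names, and sample values are assumptions made for illustration.

```python
import sqlite3

# SQLite is used here only as a stand-in SQL engine (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each result row be converted to a dict
conn.executescript(
    """
    CREATE TABLE network_usage (tower_id TEXT, hour INTEGER, phones_connected INTEGER);
    CREATE TABLE power_reading (tower_id TEXT, hour INTEGER, kwh REAL);
    INSERT INTO network_usage VALUES ('T1', 8, 120), ('T1', 20, 300), ('T2', 8, 80);
    INSERT INTO power_reading VALUES ('T1', 8, 4.0), ('T1', 20, 9.0), ('T2', 8, 3.0);
    """
)

# One declarative query replaces the hand-written join and aggregation.
query = """
    SELECT u.hour                  AS hour,
           SUM(u.phones_connected) AS phones_connected,
           SUM(p.kwh)              AS kwh
    FROM network_usage AS u
    JOIN power_reading AS p
      ON p.tower_id = u.tower_id AND p.hour = u.hour
    GROUP BY u.hour
    ORDER BY u.hour
"""

# Return the result as an array of objects rather than raw rows.
report = [dict(row) for row in conn.execute(query)]
print(report)
```

The join and the grouping are expressed in the query itself, so changing the question means editing the SQL rather than rewriting application logic.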
How Designing your IoT Database can Drive Transformation
IoT solutions are probably one of the most important areas of digital transformation and one of the places where combining unstructured and structured data is most necessary. Most things in IoT report back objects. These could come from simple devices like a pressure sensor or a Wi-Fi router, or from something more complex like an autonomous vehicle.
Most folks that we speak to are simply collecting these objects and storing them. This is great for solving their IoT problem on the edge, but it won’t drive business transformation. Digital transformation comes from gaining actionable insights on data. As we discussed above, this makes it necessary to add SQL capability to your data value chain.
Implementing cool hardware on the edge is only the first step in mainstream IoT adoption. Gaining business value from the data is what will drive ROI from IoT.
Imagine if retail chains could begin to provide just-in-time inventory management using predictive analytics and machine learning with IoT sensors on the shelves where the sensors would track inventory levels. The data scientists could look at inventory levels compared to data, foot traffic, weather patterns and more. This is true digital transformation.
This type of complex business analysis will not be completed on object stores, as it requires too much comparison of data. As a result, when developing IoT applications, it is important to ensure that both NoSQL and SQL workloads are considered. Furthermore, it is key to do this in a simple, scalable fashion to help manage cost and continue to drive innovation.