We have written a lot on our blog about NoSQL vs. SQL, covering the pluses and minuses of both. You can read more about our perspective in Kyle’s post, NoSQL & SQL: Why Can't We Be Friends? To summarize: at HarperDB we feel that NoSQL is awesome for high-scale data ingestion and uptime, while SQL is still king for business analytics. Both have something incredibly valuable to contribute to the technology landscape. However, we didn’t feel that anyone was providing a solution for both in a way that makes sense.
That is why, when we built HarperDB, we wanted to support both interfaces and take advantage of the capabilities of both technologies. We wanted to do so without sacrificing the scale of NoSQL or the deep analytical capability of SQL.
We noticed that many NoSQL databases on the market have similarly acknowledged that SQL is still the tried-and-true method for running complex queries. They have done this by adopting a concept called multimodel. We don’t like this concept.
Under the hood, multimodel is basically like running two different databases. Anyone who currently supports a multi-tiered database architecture will tell you that, while sometimes necessary with existing technology, such architectures are complex, resource intensive, error prone, and subject to data integrity issues. It’s just basic logic: the more places we have to transport data to and from, the more can go wrong, and the more storage, servers, and bandwidth we need. Multimodel databases are really no different, except that a vendor has packaged this architecture together as a “single product” when in fact it is multiple solutions. You can read about the history of multimodel on TechTarget, where they allude to some of these issues: “…multimodel approach may limit the transactional integrity that relational database management systems use to maintain data accuracy and consistency.”
These databases store data using one primary NoSQL mechanism, such as a JSON/object store or a key-value store. When you want to perform SQL operations, that data has to be transformed, either on disk or in memory, into something resembling a column/row store. This is expensive in terms of resource utilization. It is also risky: your data is duplicated, so you might have integrity issues, and the result is certainly not ACID. Finally, it’s slow. You can’t perform complex SQL queries in real time because you need to wait for your data to be transformed from a NoSQL model into a SQL model.
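To make the cost of that transformation step concrete, here is a minimal sketch in Python. It is not taken from any particular multimodel product: the `doc_store` dictionary, the `dog` table, and the `transform_to_rows` helper are all hypothetical names chosen for illustration, with SQLite standing in for the SQL side.

```python
import json
import sqlite3

# Primary store: JSON documents keyed by id (stands in for the NoSQL side).
doc_store = {
    "1": json.dumps({"id": 1, "name": "Penny", "breed": "Lab", "age": 5}),
    "2": json.dumps({"id": 2, "name": "Kato", "breed": "Husky", "age": 3}),
}


def transform_to_rows(store):
    """Copy every document into a freshly built relational table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dog (id INTEGER PRIMARY KEY, name TEXT, breed TEXT, age INTEGER)"
    )
    for raw in store.values():
        doc = json.loads(raw)
        conn.execute(
            "INSERT INTO dog VALUES (?, ?, ?, ?)",
            (doc["id"], doc["name"], doc["breed"], doc["age"]),
        )
    return conn


# The SQL side is a second copy of the data, built by a full pass over the documents.
sql_copy = transform_to_rows(doc_store)

# A new write lands in the document store only...
doc_store["3"] = json.dumps({"id": 3, "name": "Harper", "breed": "Mutt", "age": 7})

# ...so the SQL copy is stale until the whole transform runs again.
stale_count = sql_copy.execute("SELECT COUNT(*) FROM dog").fetchone()[0]
fresh_count = transform_to_rows(doc_store).execute("SELECT COUNT(*) FROM dog").fetchone()[0]
print(stale_count, fresh_count)
```

In this toy version the data lives in two places, the SQL copy lags behind the latest insert, and catching up means paying for the transform again. Those are the duplication, integrity, and latency costs described above, at a scale small enough to read.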
This simply doesn’t seem like the best possible solution. It seems like a bolt-on, like a Rube Goldberg machine. That’s why we built HarperDB as a single-model database that accommodates both SQL and NoSQL in a single storage mechanism.
From the ground up, HarperDB is designed to handle both SQL and NoSQL use cases. When data is inserted into HarperDB, whether via SQL statements or NoSQL objects, HarperDB maps it to a single model. Data is never duplicated into a second model, and nothing needs to be transformed after insert. This has the added benefit of ensuring that every column in HarperDB is fully indexed without increasing the storage footprint or memory utilization. You can perform complex SQL queries in real time on NoSQL inserts. This gives developers the benefits of NoSQL scale while maintaining the granularity and capability of complex SQL.
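As a rough illustration of the single-model idea, the sketch below routes an object-style insert and a row-style insert into the same attribute-level structure, which doubles as the index. This is a toy written for this post's argument, not HarperDB's actual storage engine or API: the `SingleModelStore` class and its method names are assumptions made up for the example.

```python
from collections import defaultdict


class SingleModelStore:
    """Toy store: one attribute-level structure serves as both data and index."""

    def __init__(self):
        # attribute -> value -> set of record ids. This is the only copy of the data.
        self.index = defaultdict(lambda: defaultdict(set))

    def insert_object(self, record_id, obj):
        """NoSQL-style path: insert a whole object."""
        for attr, value in obj.items():
            self.index[attr][value].add(record_id)

    def insert_row(self, record_id, columns, values):
        """SQL-style path, as in INSERT INTO t (columns) VALUES (values)."""
        self.insert_object(record_id, dict(zip(columns, values)))

    def where_equals(self, attr, value):
        """Look up record ids by any attribute, since every attribute is indexed."""
        return set(self.index.get(attr, {}).get(value, set()))

    def get_object(self, record_id):
        """Reassemble a record from the attribute structure (linear scan, fine for a toy)."""
        return {
            attr: value
            for attr, values in self.index.items()
            for value, ids in values.items()
            if record_id in ids
        }


store = SingleModelStore()
store.insert_object(1, {"name": "Penny", "breed": "Lab", "age": 5})
store.insert_row(2, ["name", "breed", "age"], ["Kato", "Lab", 3])

labs = store.where_equals("breed", "Lab")
kato = store.get_object(2)
print(labs, kato)
```

Both inserts are queryable by any attribute as soon as they return, with no second copy and no transform step in between. A real engine would of course need persistence, typing, and far more efficient record reassembly than this scan.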
As an engineering team, our philosophy has always been that the simplest solution is the best. We designed the single-model concept because we felt that there had to be a simpler way.