Today, the choice of which database to use for a project can be overwhelming. You want a highly scalable, high-throughput, easy-to-implement product, so you choose a NoSQL database. You also want to relate your objects to one another for advanced dashboarding and reporting, so a relational database is thrown into the mix.
You also have data lakes of unstructured data that you need to analyze, so you start researching how to implement a high-scale MapReduce solution. Once the architecture is complete, your data chain looks like a game of Mouse Trap, and the potential cost and complexity make you think about packing it in and starting a goat farm.
It wasn't that long ago that we used just one database to meet our needs. Throughout much of modern computing, the database of choice has been the relational database. Even today, the top four databases according to DB-Engines are all RDBMSs.
Relational Databases
In 1970, E.F. Codd published a paper proposing the relational model of databases that we know and use today, with commercial products following, starting with the first release of Oracle in 1979. The relational model was a leap forward from the linked lists and pointers of older systems, both in performance and in the freedom to split data into separate tables. Developers no longer needed to rewrite links and pointers; the database simply evolved with the record sets. Codd also proposed a language for interfacing with the database, an idea that was developed further at IBM and became the SQL we know today. This language gives developers a standard way of interfacing with databases regardless of their underlying storage mechanism. Overall, the language is the same (SELECT, FROM, WHERE, etc.); developers just need to understand the nuances of how each database implements the SQL standard.
Benefits of Relational Databases:
Structure & Granularity: As stated above, relational databases allow developers to isolate their data in a natural structure via tables and link these tables together via joins. Adding to this flexibility is the ability to be granular with writes and reads (constraints and privileges permitting). Developers can write to specific columns in a table, and they can define exactly which columns they want returned and how they want the rows filtered.
ACID compliance: Atomicity, Consistency, Isolation & Durability are the backbone of relational databases. These guarantees are enforced so that you can trust the integrity of your data, and they provide safeguards in the event of failure, whether that is a power failure or just a bad record in a commit.
SQL: Structured Query Language gives developers an easy and powerful syntax to interact with their data. From creating a schema, provisioning users, and writing data, through simple and complex reads, to dropping the schema, the full life cycle of a database can be managed from this one straightforward language. Since the overall language is the same from database to database, developers can transition fairly easily between RDBMS implementations.
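To make these benefits a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables, columns, and names (owners, dogs, Penny, and so on) are invented purely for illustration. It shows a granular single-column write, a filtered read across a join, and a transaction that is rolled back as a unit when one of its records violates a constraint.

```python
import sqlite3

# In-memory database; table and column names are made up for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE owners (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE dogs ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " age INTEGER CHECK (age >= 0),"
    " owner_id INTEGER REFERENCES owners(id))"
)
conn.execute("INSERT INTO owners (id, name) VALUES (1, 'Kyle')")
conn.execute("INSERT INTO dogs (id, name, age, owner_id) VALUES (1, 'Penny', 5, 1)")
conn.commit()

# Granular write: touch a single column of a single row.
conn.execute("UPDATE dogs SET age = 6 WHERE id = 1")
conn.commit()

# Granular read: choose the columns, join the tables, filter the rows.
rows = conn.execute(
    "SELECT dogs.name, dogs.age, owners.name"
    " FROM dogs JOIN owners ON owners.id = dogs.owner_id"
    " WHERE dogs.age > 3"
).fetchall()

# Atomicity: the second insert violates the CHECK constraint, so the
# whole transaction, including the valid first insert, is rolled back.
try:
    with conn:  # commits on success, rolls back on an exception
        conn.execute("INSERT INTO dogs (id, name, age, owner_id) VALUES (2, 'Harper', 2, 1)")
        conn.execute("INSERT INTO dogs (id, name, age, owner_id) VALUES (3, 'Oops', -1, 1)")
except sqlite3.IntegrityError:
    pass

dog_count = conn.execute("SELECT COUNT(*) FROM dogs").fetchone()[0]
```

After this runs, rows holds the single joined record for Penny and dog_count is still 1, because the failed transaction left no partial write behind.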
That Sounds Great, but…
Relational databases have many advantages beyond the above capabilities, but they also have limitations, primarily in scalability, throughput, and adaptability.
Relational databases can scale, but typically this means scaling vertically, that is, moving to a larger server. This migration is expensive due to increased infrastructure costs, downtime during the migration & the cost of the people executing it. Once this server is outgrown, the whole process needs to start over. A monolithic server also creates a bottleneck in throughput, as even a very large server can only manage so many connections. When the volume of individual transactions grows to millions per second, relational databases again cannot keep up.
Unstructured data and large data sets are also a significant issue for relational databases. SQL databases do not easily handle data like JSON, XML, or raw text, and when they do, it is difficult to make sense of the information after it has been committed because of limited indexing or expensive text analysis.
The nature of unstructured data is that it can be elastic, meaning new data points can be introduced at any time and the consistency of the data varies. The fixed schema inherent to relational databases means that new data points can be lost unless a DBA gets involved, and row constraints or data types can cause documents to be rejected. Given all this, an alternative has been adopted to pick up the slack.
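The fixed-schema problem is easy to reproduce. The sketch below, again using Python's built-in sqlite3 module, defines a hypothetical readings table and a hypothetical insert_reading helper (neither comes from any real product). A reading that matches the schema is stored, one that carries a new attribute is rejected, and one that is missing a required value is rejected too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fixed schema: exactly two required columns.
conn.execute(
    "CREATE TABLE readings (sensor_id TEXT NOT NULL, temperature REAL NOT NULL)"
)

def insert_reading(conn, reading):
    """Try to store a dict as a row; report whether the schema accepted it.

    Column names are interpolated into the SQL only because this is a
    demonstration with trusted keys; never do this with untrusted input.
    """
    columns = ", ".join(reading)
    placeholders = ", ".join("?" for _ in reading)
    sql = f"INSERT INTO readings ({columns}) VALUES ({placeholders})"
    try:
        with conn:
            conn.execute(sql, tuple(reading.values()))
        return "stored"
    except (sqlite3.OperationalError, sqlite3.IntegrityError) as exc:
        # OperationalError: unknown column; IntegrityError: constraint failed.
        return f"rejected: {exc}"

matches_schema = insert_reading(conn, {"sensor_id": "a1", "temperature": 21.5})
new_attribute = insert_reading(
    conn, {"sensor_id": "a1", "temperature": 21.7, "humidity": 0.4}
)
missing_value = insert_reading(conn, {"sensor_id": "a2"})

stored_rows = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```

Only the first reading lands in the table. Getting the humidity value stored would take an ALTER TABLE, which is exactly the DBA round trip described above. Note that SQLite itself is lenient about data types, so a stricter database would reject even more.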
NoSQL Databases
NoSQL databases have been around since the 1960s, but it wasn't until the 2000s that we saw an explosion of NoSQL solutions. This proliferation of a new database paradigm grew out of the needs of big-data companies such as Facebook, Google & Amazon.
With a global reach and an influx of data points from social media, infrastructure logs, supply chain tracking, and people sharing their high scores on Flappy Bird, a vertically scaled, monolithic database was not going to keep up. The NoSQL solutions are diverse - graph, key-value & document stores being just a few - but they all share the ability to be highly available, horizontally scalable, and very flexible.
Benefits of NoSQL Databases:
Adaptability: NoSQL databases have elasticity built into their DNA. Need to add a new attribute to your data? No problem. You don't need to ask an admin to run an ALTER statement; your schema will adapt to whatever you send it. The same goes for data types & constraints. For the most part, the only requirement on your data is that a key or hash is provided.
Scale & Cost: NoSQL databases are built to distribute horizontally across a cluster of servers. When data storage and/or throughput expands beyond what is currently provisioned, a new server can be brought online in the cluster. On top of this, NoSQL databases can sit on relatively inexpensive hardware, as the compute needs are significantly lower.
Ease of Use: As NoSQL databases were created more recently, they adhere to more modern standards. API access is the norm, which makes connecting to the database less convoluted. Adding to this, NoSQL databases more consistently conform to object models, which fits modern object-oriented coding practices and hierarchical data patterns.
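As a rough illustration of the first two points, here is a toy, in-memory sketch of a schema-less store spread across several nodes. TinyDocumentCluster and its methods are hypothetical names made up for this post, not the API of any real NoSQL product. The only thing a document must bring is a key; the key's hash decides which node holds it.

```python
import hashlib

class TinyDocumentCluster:
    """Toy stand-in for a horizontally distributed, schema-less store."""

    def __init__(self, node_count):
        # Each "node" is just a dict here; in a real cluster it is a server.
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash of the key picks the node that owns the document.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key, document):
        # No schema check: any attributes and any value types are accepted.
        self._node_for(key)[key] = dict(document)

    def get(self, key):
        return self._node_for(key).get(key)

cluster = TinyDocumentCluster(node_count=3)
cluster.put("dog:1", {"name": "Penny"})
# A later document introduces new attributes without any ALTER statement.
cluster.put("dog:2", {"name": "Harper", "breed": "Husky", "tricks": ["sit", "roll"]})

first = cluster.get("dog:1")
second = cluster.get("dog:2")
total_documents = sum(len(node) for node in cluster.nodes)
```

One simplification to be aware of: with plain modulo hashing, adding a node changes where most keys live. Real systems generally use techniques such as consistent hashing so that bringing a new server online moves only a fraction of the data.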
So Why Can’t We Be Friends?
Given the unique benefits that NoSQL and SQL bring to the table, it is beyond frustrating that getting these paradigms to work together requires a convoluted architecture with many moving pieces that are subject to failure and data loss.
Even the most sophisticated solutions only move data from point A to B to C to the stars and beyond. The solutions that do offer both SQL and NoSQL are multi-model or rely on a complex indexing scheme, which is only a half measure. And this isn't even mentioning the emerging IoT market.
The Peacemaker
At HarperDB, we have felt this pain as we built & maintained our own big-data solutions. We stepped back and decided to find a way to give developers the best of all these solutions. You want a schema-less, high-scale database that can ingest your unstructured and structured data and report on it with multi-joins and conditions the moment it transacts. We made this solution for you, and we made it easy and portable.
HarperDB Is...
Easy: HarperDB installs in minutes and does not rely on heavy config or an army of engineers to maintain. We built it to be installed and maintained by an engineer fresh out of code school. HarperDB's interface is a straightforward REST API that is easy to understand and powerful. HarperDB also responds gracefully to your data: adding a new attribute does not require a DBA. We automatically add the new column to your schema and index it for optimized searches.
Portable: HarperDB was built from the ground up in Node.js. This gives HarperDB a small footprint that can grow with your hardware. HarperDB can fulfill IoT use cases, as it has a native ARM build, and it can expand all the way up to a high-performance computer. By leveraging Node's native CPU clustering capability, we grow with your compute. On top of vertical scaling, clustering is built into the Enterprise Edition of HarperDB. As your data needs evolve, HarperDB will replicate & scale along with you.
SQL & NoSQL Friendly: HarperDB was built from the ground up with a patent-pending data schema unique to the industry. Rather than saving your data as a document or in tabular/columnar format, HarperDB atomizes every record into its individual attributes and saves them discretely on disk.
This format allows us to index every attribute in your entire schema with no extra disk or compute overhead. The added benefit is we ingest and read unstructured and structured data exactly the same way from a single source. Having the entire schema indexed empowers developers to perform powerful searches with joins, conditions, aggregates, and lots of other features you expect from SQL.
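To give a feel for the general idea of storing attributes rather than whole records, here is a deliberately simplified toy in Python. It is not HarperDB's storage engine, file format, or API, all of which are beyond what this post describes. AttributeStore and its methods are hypothetical, and the sketch assumes flat records with hashable values. Each record is split into its attributes, each attribute keeps a map from value to record ids, and that same map serves as both the storage and the index.

```python
class AttributeStore:
    """Toy illustration only: records are split into per-attribute maps."""

    def __init__(self):
        # attribute name -> {value -> set of record ids holding that value}
        self.attributes = {}

    def insert(self, record_id, record):
        # Unknown attributes are created on first use; there is no fixed schema.
        for attribute, value in record.items():
            by_value = self.attributes.setdefault(attribute, {})
            by_value.setdefault(value, set()).add(record_id)

    def find(self, attribute, value):
        # Every attribute can be searched directly, since the data is the index.
        return set(self.attributes.get(attribute, {}).get(value, set()))

    def get(self, record_id):
        # Reassemble a record by collecting its value from each attribute.
        record = {}
        for attribute, by_value in self.attributes.items():
            for value, ids in by_value.items():
                if record_id in ids:
                    record[attribute] = value
        return record

store = AttributeStore()
store.insert(1, {"name": "Penny", "breed": "Lab"})
store.insert(2, {"name": "Harper", "breed": "Lab", "age": 5})  # new attribute

labs = store.find("breed", "Lab")
five_year_olds = store.find("age", 5)
harper = store.get(2)
```

In this toy, structured and unstructured records go through the same insert path, and a search on any attribute is a lookup rather than a scan. The trade-off is visible too: reassembling a whole record means visiting every attribute, which a real engine would have to handle far more cleverly than this sketch does.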
These benefits and other features of HarperDB allow developers to have a single database that is reliable, scalable, and adaptable. Say goodbye to the complex architecture and enjoy creating your project on a database that has your various workloads in mind. You might just forget about starting that goat farm. Give us a try today.