After coming up with the worst pun ever made in the database community, this title, to talk about one of if not the most powerful features in relational databases: the SQL join. I believe that the joins are a big reason the relational databases have stayed so relevant in a world where everyone seems to be talking Big Data. NoSQL databases cannot perform join operations. Sure there are a ton of articles out there describing work arounds, but at the end of the day they all come to the same conclusion and that is if you need to join, you need a relational database.
What is a SQL Join?
Before I get too ahead of myself here, let’s take a look at the basic principles of the SQL join. Don’t tell my English teachers, but I’m going to ahead and cite Wikipedia here:
"An SQL join clause combines columns from one or more tables in a relational database. It creates a set that can be saved as a table or used as it is. A JOIN is a means for combining columns from one (self-join) or more tables by using values common to each."
What this ultimately breaks down to is that SQL joins are used to pull data together from multiple tables; all with various join types to ensure the correct dataset is retrieved. Developers define how tables are connected within the join clause, allowing them to specify which data points link tables together. This is incredibly important when trying to gather, present, and analyze data that is dispersed. All of the leading Business Intelligence (BI) tools, like Tableau and Power BI, rely on joins to produce even the most simple analyses. The ability to coalesce data is so easy with SQL and relational databases that any step backwards is simply unacceptable.
NoSQL means NoJoins
There are multiple different types of NoSQL databases, all focused on handling large amounts of sometimes changing data at scale. These databases are great at ingesting high volumes of data, structured or unstructured. Relational databases are not designed to handle this and therefore are lacking in this area, as they require a strict structure of data to be know before data is inserted. The lack of structure in NoSQL databases is one of the contributing factors to why they can’t perform joins. In fact, the accepted practice in NoSQL databases is that if data is needed in multiple places then it should be repeated. It has always baffled me that this was ever considered acceptable, but that’s beside the point. NoSQL best practice says that joins are not necessary, but that hasn’t stopped developers from trying. Complex aggregations are required in order to attempt join-like behavior in MongoDB, as described here. No matter how many different ways developers attempt to create join-like behavior in NoSQL the conclusion is always to use a relational database for this functionality. This is how we end up with multi-model, Rube Goldberg-esque, database architectures.
The HarperDB Alternative
HarperDB was founded because of the all too common need to ingest and analyze any volume of structured, semi-structured, and unstructured data. The HarperDB Dynamic Schema adjusts to changing data structures as they are ingested. This allows the database to cope with a constantly changing data model, without the need for developer or DBA interaction, something that standard relational databases are incapable of handling.
The HarperDB difference is made possible by our unique storage algorithm which allows developers to ingest and access their data with either SQL or NoSQL. This means that high volumes of unstructured data can be ingested via NoSQL into multiple different tables without duplicating any data, but then queried via SQL using standard join operations. This was thought to be impossible in a single-model database until HarperDB was launched. HarperDB bridges the gap between relational and NoSQL databases, giving developers “the best of both worlds.” Something I’ve heard from countless HarperDB customers, partners, and users.