Many people learn or understand new things relative to things they already know. This makes sense; it’s probably a natural instinct. When it comes to products and technology, a lot of people ask “how are you different,” but different from what? You need some sort of baseline to start from, so you can say, “Similar to X, but different because of Y.” Because of this, comparisons, competitive analysis, and feature matrices are a great way to understand which technology solutions are right for you. So today let’s do a comparison of three different database systems: MongoDB, PostgreSQL, and HarperDB.
As stated in my Database Architectures & Use Cases article: In most cases, it’s not that one database is better than the other, it’s that one is a better fit for a specific use case due to numerous factors. The point of this article is not to determine which database is the best, but to help uncover the factors to consider when selecting a database for your specific project. With MongoDB and PostgreSQL being two of the most popular tools out there, you may already know that there are tons of resources comparing the two. However, with HarperDB being a net new database, I thought it might be helpful to throw it in the mix to provide further clarity.
Referring to my database architecture overview post again: It’s important to understand things such as data type / structure, data volume, consistency, write & read frequency, hosting, cost, security, and integration constraints. That article provides a great high-level explanation across all different types of databases, but today we’ll get a bit more specific.
These technologies are all similar in that they are used to store data, but that simple concept is where the similarities end. However, while MongoDB and PostgreSQL are actually quite different from one another, HarperDB lies somewhere in the middle.
As mentioned, there are numerous resources out there comparing MongoDB and PostgreSQL, which are both awesome databases. This article from Educative is one great place to start for understanding differences between them. Therefore, to avoid redundancy, in this post I will focus a bit more on HarperDB compared to the two.
MongoDB is classified as a NoSQL database. It is document-oriented, and uses JSON-like documents with optional schemas.
PostgreSQL is a traditional RDBMS (relational database management system). Mainly used for relational data, it is object-relational in nature.
HarperDB is a distributed database with a REST API and dynamic schema that supports both NoSQL and SQL, including joins. (For example, you can ingest data via NoSQL JSON, then immediately query it via SQL.)
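To make the "ingest via NoSQL JSON, then query via SQL" idea concrete, here is a minimal sketch of the two request bodies involved. It only builds and prints the JSON; it does not contact a server. The field names ("operation", "schema", "table", "records", "sql") follow HarperDB's operations API as I understand it, so check them against the current documentation. The "dev" schema, "dog" table, and the helper function names are hypothetical, chosen purely for illustration.

```python
import json


def build_insert(schema, table, records):
    """Build a NoSQL-style insert body (hypothetical helper, assumed field names)."""
    return {
        "operation": "insert",
        "schema": schema,
        "table": table,
        "records": records,
    }


def build_sql(statement):
    """Build a SQL query body (hypothetical helper, assumed field names)."""
    return {"operation": "sql", "sql": statement}


# Step 1: ingest a JSON record without declaring any columns first.
insert_body = build_insert(
    "dev", "dog", [{"id": 1, "name": "Penny", "breed": "Mutt"}]
)

# Step 2: query the same record with plain SQL.
query_body = build_sql("SELECT name, breed FROM dev.dog WHERE id = 1")

print(json.dumps(insert_body, indent=2))
print(json.dumps(query_body, indent=2))
```

In a real project, each body would be sent as an authenticated HTTP POST to your HarperDB instance; that part is left out here because the URL and credentials depend on your setup.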
MongoDB vs. PostgreSQL: PostgreSQL is a relational database that handles more complex procedures, designs, and integrations. MongoDB is a NoSQL database often used for simpler, more unstructured data, and is great for app development. Ultimately, PostgreSQL enforces a schema, whereas MongoDB does not by default (its schema validation is optional).
HarperDB vs. MongoDB: MongoDB is a document store which is great for unstructured data, whereas HarperDB offers full document store capability plus enterprise grade ACID SQL. Benchmark tests found that HarperDB is 37 times faster than Mongo at less than half the price. HarperDB also has a native REST API, supports SQL on JSON, and can be easier to use and manage. See the full benchmark here.
(Mongo is optimized for high scale writes, but not for reads. HarperDB’s data storage algorithm written on top of LMDB enables both high scale reads and writes, resulting in high performance overall.)
HarperDB vs. PostgreSQL: PostgreSQL is a great technology for complicated data or strict consistency, but HarperDB is more flexible and has simplified much of the work of installation, configuration, and administration. HarperDB allows developers from relational backgrounds to use their existing SQL knowledge with a database that also lets their team use NoSQL against the same data model.
MongoDB, PostgreSQL, and HarperDB can each run anywhere: in the cloud, locally, in a data center, etc. (There is no PostgreSQL Cloud equivalent to HarperDB Cloud or MongoDB Atlas, but cloud providers offer PostgreSQL-as-a-service.)
Under the hood
PostgreSQL ultimately employs SQL, a structured query language, to define, access and manipulate the database. PostgreSQL also has a JSON datatype.
HarperDB does not enforce data types. It currently stores all data attributes as strings, and data can be queried via SQL and/or NoSQL. (HarperDB is also working on allowing administrators to explicitly set attribute types, as numbers or strings, for performance tuning.)
Data Storage & Architecture
MongoDB stores data as individual documents without regard to attributes, PostgreSQL stores data in traditional tables and rows, and HarperDB stores data in tables and rows/objects with all top level attributes indexed by default.
HarperDB has a unique data storage algorithm running on top of LMDB, which enables it to ingest JSON documents and relational data in a single product. As data comes in, HarperDB maps it to the data model; it’s not a SQL engine or a NoSQL engine. (Regardless of how data is ingested, it is stored according to the HarperDB data model and can be queried via SQL or NoSQL.)
MongoDB and HarperDB are more distributed architectures, whereas PostgreSQL might be considered a monolithic architecture.
ACID Properties (atomic, consistent, isolated, & durable)
HarperDB and PostgreSQL both have enterprise-grade ACID SQL transactions, meaning a transaction either completes in full or leaves the data unchanged, so the stored data stays valid.
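As a rough illustration of the "atomic" part of ACID, the sketch below uses Python's built-in sqlite3 module as a stand-in, since it runs without a server. It is not PostgreSQL or HarperDB code, and the accounts table and transfer function are made up for the example. The point is only that when the second statement of a transaction fails, the first one is undone as well.

```python
import sqlite3

# SQLite in memory is a stand-in here, not PostgreSQL or HarperDB.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


# Alice only has 100, so the debit fails and the earlier credit is rolled back.
ok = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)
```

Both balances end up exactly where they started, which is the behavior you want in something like a payments system.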
NoSQL databases like MongoDB usually adopt eventual consistency instead of ACID properties. (A study from May 2020 identified a bug that disputes claims that Mongo is ACID compliant, as MongoDB’s transactions are not fully isolated.)
HarperDB and PostgreSQL both follow the ANSI SQL standard.
Schema & Tables
With both MongoDB and HarperDB, using JSON allows you to change your schema flexibly, without migrations. Documents in the same collection or table can vary in terms of key/value pairs.
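Here is a small sketch of what "documents can vary in terms of key/value pairs" looks like in practice. It uses plain Python dictionaries rather than a real MongoDB or HarperDB client, and the records and the project helper are invented for illustration.

```python
# Three records destined for the same table or collection, each with a
# different set of keys. No schema change is needed to add "age" or "owner".
documents = [
    {"id": 1, "name": "Penny", "breed": "Mutt"},
    {"id": 2, "name": "Harper", "age": 5},
    {"id": 3, "name": "Kato", "owner": {"name": "Sam"}},
]

# The effective "schema" is just the union of keys seen so far.
all_keys = sorted({key for doc in documents for key in doc})


def project(doc, keys):
    """Return a fixed-width view of a document, with None for missing keys."""
    return {key: doc.get(key) for key in keys}


rows = [project(doc, all_keys) for doc in documents]

print(all_keys)
for row in rows:
    print(row)
```

The trade-off is that the application, not the database, has to cope with missing attributes, which is why the projection fills the gaps with None.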
Both MongoDB and HarperDB scale horizontally, which allows for speed. HarperDB has bidirectional table-level data replication. It uses a simple pub-sub model: data is replicated by publishing it to different “chat rooms,” and the nodes that subscribe to those rooms receive it, which lets the data be distributed horizontally.
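The pub-sub idea can be sketched in a few lines. This is a toy in-memory model under my own assumptions, not HarperDB's actual implementation: the Node and Broker classes and the write function are hypothetical, and each table name doubles as the "chat room" that nodes subscribe to.

```python
from collections import defaultdict


class Node:
    """A toy database node holding tables of records keyed by id."""

    def __init__(self, name):
        self.name = name
        self.tables = defaultdict(dict)  # table name -> {id: record}

    def apply(self, table, record):
        self.tables[table][record["id"]] = record


class Broker:
    """Routes published records to every node subscribed to a channel."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # channel -> list of nodes

    def subscribe(self, channel, node):
        self.subscribers[channel].append(node)

    def publish(self, channel, record, origin):
        for node in self.subscribers[channel]:
            if node is not origin:  # the writer already has the record
                node.apply(channel, record)


def write(broker, node, table, record):
    """Write locally, then publish to the table's channel."""
    node.apply(table, record)
    broker.publish(table, record, origin=node)


broker = Broker()
a, b = Node("a"), Node("b")

# Both nodes subscribe to "dog" (bidirectional); only node a handles "cat".
broker.subscribe("dog", a)
broker.subscribe("dog", b)
broker.subscribe("cat", a)

write(broker, a, "dog", {"id": 1, "name": "Penny"})   # a -> b
write(broker, b, "dog", {"id": 2, "name": "Harper"})  # b -> a
write(broker, a, "cat", {"id": 1, "name": "Tom"})     # stays on a

print(sorted(a.tables["dog"]), sorted(b.tables["dog"]), "cat" in b.tables)
```

Because subscriptions are per table, each node only receives the data it asked for, which is what makes replication table-level rather than all-or-nothing.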
PostgreSQL scales vertically (as it gets bigger, more space or more memory is needed), so upgrading the server typically requires downtime.
With relational databases like PostgreSQL, any change to the structure of your data requires altering the table. The whole schema needs to be designed and configured up front. You can alter a table later on, but this may lead to database downtime and bugs in your application. PostgreSQL databases can use foreign keys, which explicitly link data between tables and are used to keep the data normalized.
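To show what a foreign key actually does, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in so it can run without a server; the constraint syntax shown is standard SQL that PostgreSQL also accepts, but the owners and dogs tables are hypothetical. Note that SQLite only enforces foreign keys after the PRAGMA below, whereas PostgreSQL enforces them by default.

```python
import sqlite3

# SQLite in memory is a stand-in for a relational database like PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific opt-in

conn.execute("CREATE TABLE owners (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE dogs ("
    "id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "owner_id INTEGER NOT NULL REFERENCES owners(id))"
)

conn.execute("INSERT INTO owners VALUES (1, 'Sam')")
conn.execute("INSERT INTO dogs VALUES (1, 'Penny', 1)")  # owner 1 exists

try:
    # Owner 99 does not exist, so the database refuses the row.
    conn.execute("INSERT INTO dogs VALUES (2, 'Kato', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The owner's name lives in one place and is joined in when needed.
rows = conn.execute(
    "SELECT dogs.name, owners.name FROM dogs "
    "JOIN owners ON owners.id = dogs.owner_id"
).fetchall()

print(orphan_rejected, rows)
```

The database itself guarantees that every dog points at a real owner, which is the kind of control over data that a schema-less store leaves to the application.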
Use Cases & Summary
As stated in this article, because transactions in PostgreSQL follow ACID properties, it’s a good choice for industries such as fintech. When you absolutely need to control the state of your data, use a relational database like Postgres. Alternatively, if you only have unstructured data, or are working with big data, it might be a good idea to use the horizontal scaling approach with a tool like MongoDB.
Use cases where HarperDB might be a better fit than existing systems include projects where you need both SQL and NoSQL, rapid application development, integration, edge computing, distributed computing, real-time analytics, and high transaction volumes. Our team will also be the first to tell you when HarperDB is not a good fit for your specific project. HarperDB is not recommended when you need full-text indexing, highly structured relational data, or strict consistency across systems, or for projects where developers are not trusted to constrain and maintain data.
Both MongoDB and PostgreSQL are really great tools for certain use cases. They are loved by many and the team at HarperDB is thankful that they have paved the way for technologies like ours to exist. HarperDB was created to expand and blend the capabilities of awesome SQL, NoSQL, and NewSQL products on the market to fill in the gaps and ensure that developers are continuously empowered to use the right tool for the job.