Recently, I was meeting with one of our partners, and we were speaking in depth about HarperDB’s architecture and roadmap. What struck me as really interesting during the conversation was his focus on HarperDB’s stability and scalability. This was what excited him most about our product.
I was really surprised by this. Don’t get me wrong, I think those are two really important parts of our product, but in our messaging and marketing they often take a back seat to ease of use, performance, and the simplification of your data pipeline.
That said, one of the most enjoyable things about talking to customers and partners is learning how they see and use HarperDB. Kyle, Zach, and I may have created HarperDB (now with a lot of help from others), but what it has become is far beyond our original idea.
I thought in this post I would explore the different elements that make HarperDB incredibly stable and scalable.
Node.js Database
While it might seem counterintuitive to folks who have not drunk the Node.js Kool-Aid, one of the primary reasons that HarperDB is so stable is our use of Node.js as a primary framework. I will touch on this more in the sections below, but our ability to run HarperDB as a stateless database, utilize a multi-threaded architecture, leverage horizontal scale, and much more, comes as a benefit of adopting Node.js.
Stateless Database
Because we chose Node.js as our framework, HarperDB is stateless. What this means is that there is really nothing to crash. Think about that for a second. There are no background processes. There is no in-memory component. HarperDB is doing essentially nothing unless you are interacting with it: at rest it uses zero CPU and very little RAM.
Most databases have some form of background process running constantly, whether to update index files, to better utilize in-memory caching, or for other housekeeping. The problem is that if these central processes crash, the entire database can lock up and become unresponsive. With HarperDB, because of our data model and architecture, these processes are not required. As a result, there is no single point of failure, and one bad operation will not crash your database.
Parallel Computing
It is a common misconception that Node.js is limited to a single thread. While it is single threaded out of the box, it’s pretty easy to implement Node.js in a multi-threaded architecture using the cluster module, as the sketch below shows.
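To make that concrete, here is a minimal sketch of the pattern using Node’s built-in cluster module. This is not HarperDB’s actual startup code; the port number and request handler are placeholders chosen just for illustration.

```typescript
import cluster from "node:cluster";
import { cpus } from "node:os";
import http from "node:http";

if (cluster.isPrimary) {
  // Fork one worker per available CPU core, similar to the thread counts
  // described below (4 cores => 4 workers, 16 cores => 16 workers).
  const workerCount = cpus().length;
  for (let i = 0; i < workerCount; i++) {
    cluster.fork();
  }
} else {
  // Each worker runs its own event loop and handles requests independently.
  // The port and handler here are placeholders, not HarperDB's actual API.
  http
    .createServer((req, res) => {
      res.end(`handled by worker ${process.pid}\n`);
    })
    .listen(8080);
}
```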
HarperDB utilizes a multi-threaded, multi-core architecture, which means that behind the scenes multiple instances of HarperDB are running independently. This enables HarperDB to handle vast numbers of concurrent clients, as the interactions are handled separately by the different threads. It also allows our database to scale to fit your hardware.
Our Community Edition typically runs 4 threads, depending on your hardware, whereas our Enterprise Edition runs 16. HarperDB will spin up threads based on the number of cores it can access.
It is true that a request can fail, but because we are using parallel computing, your next request will be routed to a different thread, which can handle that transaction entirely independently. Also, because Node.js is so lightweight, that failed thread can come back online in less than 1 ms.
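Again as a rough sketch rather than HarperDB’s actual code, this is how the cluster module lets a primary process replace a crashed worker almost immediately:

```typescript
import cluster from "node:cluster";

if (cluster.isPrimary) {
  cluster.fork();

  // If a worker dies, fork a replacement immediately so the pool stays at
  // full strength; only the request in flight on the dead worker is affected.
  cluster.on("exit", (worker, code, signal) => {
    console.log(`worker ${worker.process.pid} exited (${signal ?? code}); restarting`);
    cluster.fork();
  });
} else {
  // Worker code would go here (for example, the HTTP server from the earlier sketch).
}
```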
Horizontal Scale
Many databases, especially RDBMSs, utilize a master-slave paradigm for clustering and replication. These models can be deployed with horizontal scale, but doing so is pretty challenging, and they typically have a single point of failure. This often pushes teams toward vertical scale, to avoid failure points and reduce configuration complexity, which is expensive.
Furthermore, while capable of horizontal scale, most RDBMSs are more performant when deployed in a vertical scale model, as sharding makes horizontal scale challenging: it is difficult to distribute your data evenly, and the configuration can be very complex.
HarperDB is designed for horizontal scale on commodity hardware, leveraging a peer-to-peer clustering model. Every node in the cluster is aware of every other node and can act as a “master” for a particular transaction. This allows for true distributed computing on the edge or in the cloud. If a single node fails, the other nodes continue to handle transactions without any downtime and catch that node up when it returns to a ready state.
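To illustrate the catch-up idea conceptually, here is a hypothetical sketch. The names used here (Transaction, PeerNode, transactionsSince) are invented for this example and are not HarperDB’s clustering API; the point is simply that a peer that comes back online can ask the others for the transactions it missed and replay them.

```typescript
// Hypothetical illustration of peer-to-peer catch-up, not HarperDB's code.
interface Transaction {
  id: string;
  timestamp: number;   // when the transaction was committed
  operation: unknown;  // the write to replay
}

class PeerNode {
  private log: Transaction[] = [];

  // Apply a transaction locally and remember it for peers that are behind.
  apply(txn: Transaction): void {
    this.log.push(txn);
  }

  // A recovering peer reports the last transaction it saw; we hand back
  // everything newer so it can replay the writes and catch up.
  transactionsSince(lastSeen: number): Transaction[] {
    return this.log.filter((txn) => txn.timestamp > lastSeen);
  }
}

// Usage: node B was offline, so it asks node A for what it missed.
const nodeA = new PeerNode();
nodeA.apply({ id: "t1", timestamp: 1, operation: { insert: { name: "Harper" } } });
nodeA.apply({ id: "t2", timestamp: 2, operation: { insert: { name: "Penny" } } });

const missed = nodeA.transactionsSince(1); // node B had only seen t1
console.log(missed.map((t) => t.id));      // ["t2"]
```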
Because hardware costs increase exponentially rather than linearly, the peer-to-peer clustering model allows organizations to scale HarperDB on cheaper hardware while achieving high-scale, high-availability architectures that don’t break the bank.
Conclusion
In summary, a lot of things make HarperDB stable and scalable; these are just a few that I have touched on briefly. As we continue to drive innovation within HarperDB, we will continue to add features and functionality around its scalability. Some of the things we are looking at in the near future are serverless architectures, new sharding paradigms, clustering groups, and much more. At the beginning of the post, I mentioned that we have learned an enormous amount from the HarperDB community. We would love to hear your thoughts on this post. Do you agree with our design decisions? Do you think we are crazy? Do you have a suggestion? Let us know at hello@harperdb.io.