UPDATE: Since the initial publication of this article, Linode has been acquired and rebranded as Akamai Connected Cloud.
Intro
I’ve recently been looking into different databases, and I find it fun to see how easy it is to set up and configure something new. In this article, I walk through how to set up a Kubernetes cluster on Linode, then show how to install a service mesh, and finally a clustered HarperDB instance leveraging mTLS. I finish by briefly discussing my impressions of setting up and configuring HarperDB.
Before we start setting everything up, I would like to briefly describe what mutual-TLS (mTLS) is. Essentially, it uses the same encryption and certificates that are issued to websites to encrypt traffic to servers (i.e. HTTPS). With mTLS, though, both sides present certificates, so each end of the connection authenticates the other. This is essential when either side may initiate a connection, which is exactly the situation with a distributed database.
What did we test?
So I started reading through and playing with running HarperDB on Kubernetes. More specifically, I wanted to see how easy it would be to run a clustered database. To make my test applicable to potential edge-computing use cases, I wanted to be sure of secure communication between the nodes. I didn’t want any raw data exposed in transit, so I had to be sure I could fold it into a service-mesh, in this case Linkerd. (At this point, if you’re more interested in code than in the explanation, you can find it here. Clone the repository and follow along if you’d like.)
I spun up a Kubernetes cluster on Linode and found it to be a quick and painless procedure. In particular, the Terraform provider was high quality and I had no issues with the integration.
Running the test
The following section is a technical walk-through of creating multiple HarperDB instances in a Kubernetes cluster. If you follow the steps, you’ll end up with a HarperDB cluster running over a service-mesh. This section is a bit more technical; if you would like to view the code directly, it’s available on GitHub. Otherwise, skip to the Outcome & Impressions section for my thoughts on the process. After setting up an account at Linode, we can use the following Terraform code to get a basic cluster:
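As a minimal sketch, assuming the official linode/linode provider and an API token in the LINODE_TOKEN environment variable (the label, region, Kubernetes version, and node-pool sizing below are placeholders):

```hcl
terraform {
  required_providers {
    linode = {
      source  = "linode/linode"
      version = "~> 2.0"
    }
  }
}

# Reads the API token from the LINODE_TOKEN environment variable
provider "linode" {}

# A small three-node LKE cluster; label, region, version and sizing are illustrative
resource "linode_lke_cluster" "harperdb" {
  label       = "harperdb-demo"
  k8s_version = "1.27"
  region      = "eu-west"

  pool {
    type  = "g6-standard-2"
    count = 3
  }
}

# Base64-encoded kubeconfig for the new cluster
output "kubeconfig" {
  value     = linode_lke_cluster.harperdb.kubeconfig
  sensitive = true
}
```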
After a few minutes we have a fresh K8s cluster we can start playing with. First things first, let’s get a service-mesh installed on it.
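Before doing that, we need kubectl access to the fresh cluster. One quick way (a sketch, assuming the kubeconfig output defined above; LKE returns it base64-encoded) is:

```sh
# Decode the kubeconfig from the Terraform output and point kubectl at it
terraform output -raw kubeconfig | base64 -d > kubeconfig.yaml
export KUBECONFIG=$PWD/kubeconfig.yaml

# Sanity check: the LKE nodes should report Ready
kubectl get nodes
```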
We then need to set up a self-signed certificate authority which can be used by our service-mesh. This is a requirement for mutual-TLS, one of our core goals.
It’s a little opaque, but essentially we are setting up our own certificate authority (CA), then generating a number of client/server authentication certificates derived from it.
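One way to do this (and the approach the Linkerd documentation describes) is with the smallstep `step` CLI: a root certificate acting as the trust anchor, plus an intermediate issuer certificate and key that Linkerd uses to mint per-proxy certificates. A sketch, with file names matching what we feed into Helm below:

```sh
# Root CA: the trust anchor shared by every meshed workload
step certificate create root.linkerd.cluster.local ca.crt ca.key \
  --profile root-ca --no-password --insecure

# Intermediate issuer: Linkerd signs per-proxy certificates with this
step certificate create identity.linkerd.cluster.local issuer.crt issuer.key \
  --profile intermediate-ca --not-after 8760h --no-password --insecure \
  --ca ca.crt --ca-key ca.key
```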
We can then pipe these certificates into the Linkerd Helm charts, which take care of configuring our Kubernetes cluster; we simply leverage the publicly available charts to install Linkerd.
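As a sketch, for recent stable Linkerd releases the install is split into a CRDs chart and a control-plane chart, with the certificates from the previous step passed in via --set-file (older releases shipped a single linkerd2 chart, so adjust to your version):

```sh
helm repo add linkerd https://helm.linkerd.io/stable
helm repo update

# CRDs first, then the control plane, feeding in the certificates generated above
helm install linkerd-crds linkerd/linkerd-crds \
  --namespace linkerd --create-namespace

helm install linkerd-control-plane linkerd/linkerd-control-plane \
  --namespace linkerd \
  --set-file identityTrustAnchorsPEM=ca.crt \
  --set-file identity.issuer.tls.crtPEM=issuer.crt \
  --set-file identity.issuer.tls.keyPEM=issuer.key
```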
For more information on installing and configuring Linkerd, you can refer to their comprehensive documentation.
For the HarperDB installation, I referenced this article for the initial Kubernetes resources. Some changes were required, which I highlight below. I also wrapped the templates up into a Helm chart, which you can find here.
We first need to convert the Deployment into a StatefulSet, so that each instance gets a stable network identity and sibling instances can reference one another by name.
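A trimmed-down sketch of the StatefulSet and its headless Service (image tag, names, and replica count are placeholders; the headless Service is what gives each pod the stable DNS name, harperdb-0, harperdb-1, and so on, that the routing configuration relies on):

```yaml
# Headless Service: gives each pod a stable DNS name, e.g. harperdb-0.harperdb
apiVersion: v1
kind: Service
metadata:
  name: harperdb
spec:
  clusterIP: None
  selector:
    app: harperdb
  ports:
    - name: operations-api
      port: 9925               # HarperDB's default operations API port
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: harperdb
spec:
  serviceName: harperdb        # ties the pods to the headless Service above
  replicas: 3
  selector:
    matchLabels:
      app: harperdb
  template:
    metadata:
      labels:
        app: harperdb
    spec:
      containers:
        - name: harperdb
          image: harperdb/harperdb   # pin a specific tag in practice
          ports:
            - containerPort: 9925
```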
We then need to set a number of environment variables to fully enable clustering:
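A sketch of the container’s env block (values are illustrative; in a real deployment the password would come from a Secret, and the node-name variable, hub-server port 9932, and route host are assumptions based on HarperDB’s defaults and the headless Service above):

```yaml
env:
  - name: CLUSTERING_ENABLED
    value: "true"
  - name: CLUSTERING_USER
    value: "cluster_user"
  - name: CLUSTERING_PASSWORD
    value: "cluster_password"       # use a Secret in practice
  - name: CLUSTERING_NODENAME       # each node needs a unique, stable name
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: CLUSTERING_HUBSERVER_CLUSTER_NETWORK_ROUTES
    value: '[{"host": "harperdb-0.harperdb", "port": 9932}]'
```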
Let’s dive a little into what each of these means.
- CLUSTERING_ENABLED: this tells HarperDB that we would like to enable clustering on this node instance.
- CLUSTERING_USER: the username that all instances must share in order to sync and propagate data. If there is a mismatch, clustering will fail.
- CLUSTERING_PASSWORD: like the username, this must be shared between all nodes.
- CLUSTERING_HUBSERVER_CLUSTER_NETWORK_ROUTES: the most interesting of the lot. This defines the ‘neighbouring nodes’ for each node in the cluster. In other words, it builds the edges of the cluster graph, and it is the reason we need stable names for each node.
You can have a read of the docs on clustering if you would like to know more.
Routing is the most interesting element of the database configuration; below, I briefly describe how it works.
We have connected ‘harperdb-1’ and ‘harperdb-2’ both to ‘harperdb-0’; with this topology, if harperdb-1 wants to sync with harperdb-2, that traffic has to go via harperdb-0.
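Concretely, that hub-and-spoke layout means every spoke lists only the hub in its routes, and the hub itself needs none. An illustrative mapping of node to routes (host names from the headless Service, port assumed to be HarperDB’s default hub port):

```yaml
# Illustrative mapping of node -> CLUSTERING_HUBSERVER_CLUSTER_NETWORK_ROUTES
harperdb-0: []                                          # the hub has no outbound routes
harperdb-1: [{host: harperdb-0.harperdb, port: 9932}]   # spokes point at the hub
harperdb-2: [{host: harperdb-0.harperdb, port: 9932}]
```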
With the current configuration model, it is not easily possible to create an arbitrary routing configuration. I’ll discuss this in more detail in the Outcome & Impressions section.
HarperDB is now being routed via the service-mesh, and all of our inter-database traffic is protected by mTLS.
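For completeness, getting the HarperDB pods onto the mesh comes down to the standard Linkerd proxy-injection annotation on the pod template; a sketch of the relevant fragment of the StatefulSet:

```yaml
  template:
    metadata:
      labels:
        app: harperdb
      annotations:
        linkerd.io/inject: enabled   # Linkerd injects its sidecar proxy, which handles the mTLS
```

If you have the viz extension installed, something like `linkerd viz edges statefulset` should show the connections between the pods as secured.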
If you would like to connect to the database and play with it, have a look at this article on using HarperDB with Kubernetes.
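As a quick sketch, you can also port-forward the operations API (default port 9925) and ask a node for its view of the cluster; the admin credentials are whatever you configured for the instances:

```sh
# Forward HarperDB's operations API to localhost
kubectl port-forward harperdb-0 9925:9925

# Query the node's cluster state via the operations API (basic auth)
curl -u HDB_ADMIN:password -X POST http://localhost:9925 \
  -H 'Content-Type: application/json' \
  -d '{"operation": "cluster_status"}'
```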
Outcome & Impressions
Technically, it was a breeze to set up and configure everything, from spinning up the Kubernetes cluster through to installing and configuring the HarperDB instances.
The database was running successfully over mTLS, so our technical goal was achieved and HarperDB could communicate transparently and securely.
As with any innovative tech product, there is always room for improvement. As a next step, it would be great for HarperDB to build out more explicit Kubernetes support in the form of a Helm chart, ideally actively maintained so as to reduce the risk for prospective users evaluating adoption.
In this article we set up a clustered HarperDB instance with communication over mTLS. It was a pleasant experience with no large technical roadblocks. If you want to set up a secure clustered database, I would recommend taking HarperDB into consideration.