UPDATE: Since the initial publication of this article, Linode has been acquired and rebranded as Akamai Connected Cloud.
Intro
I’ve recently been looking into different databases, and I find it fun to see how easy it is to set up and configure something new. In this article, I walk through how to set up a Kubernetes cluster on Linode, then show how to install a service mesh, and finally a clustered HarperDB instance leveraging mTLS. I finish by briefly discussing my impressions of setting up and configuring HarperDB.
Before we start setting everything up, I would like to briefly describe what mutual-TLS (mTLS) is. Essentially, it uses the same encryption and certificates that are issued to websites to encrypt traffic to servers (i.e. HTTPS). With mTLS, though, both sides present certificates, so each end of the connection authenticates the other. This is essential when either side may initiate a connection, which is exactly the situation with a distributed database.
What did we test?
So I started reading through and playing with running HarperDB on Kubernetes. More specifically, I wanted to see how easy it would be to run a clustered database. To make my test applicable to potential edge-computing use cases, I wanted to be sure of secure communication between the nodes. I didn’t want any raw data exposed in transit, so I had to be sure I could fold it into a service-mesh, in this case Linkerd. (At this point, if you’re more interested in code than in the explanation, you can find it here. Clone the repository and follow along if you’d like.)
I spun up a Kubernetes cluster on Linode and found it to be a quick and painless procedure. In particular, the Terraform provider was high quality and I had no issues with the integration.
Running the test
The following section is a technical walk-through of creating multiple HarperDB instances in a Kubernetes cluster. If you follow the steps, you’ll end up with a HarperDB cluster running over a service-mesh. This section is a bit more technical; if you would like to view the code directly, it’s available on GitHub. Otherwise, skip to the Outcome & Impressions section for my thoughts on the process. After setting up an account at Linode, we can use the following Terraform code to get a basic cluster:
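As a minimal sketch, assuming the official linode/linode provider and an API token in the LINODE_TOKEN environment variable (the label, region, Kubernetes version, and node-pool sizing below are placeholders):

```hcl
terraform {
  required_providers {
    linode = {
      source  = "linode/linode"
      version = "~> 2.0"
    }
  }
}

# Reads the API token from the LINODE_TOKEN environment variable
provider "linode" {}

# A small three-node LKE cluster; label, region, version and sizing are illustrative
resource "linode_lke_cluster" "harperdb" {
  label       = "harperdb-demo"
  k8s_version = "1.27"
  region      = "eu-west"

  pool {
    type  = "g6-standard-2"
    count = 3
  }
}

# Base64-encoded kubeconfig for the new cluster
output "kubeconfig" {
  value     = linode_lke_cluster.harperdb.kubeconfig
  sensitive = true
}
```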
After a few minutes we have a fresh K8s cluster we can start playing with. First things first, let’s get a service-mesh installed on it.
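Before doing that, we need kubectl access to the fresh cluster. One quick way (a sketch, assuming the kubeconfig output defined above; LKE returns it base64-encoded) is:

```sh
# Decode the kubeconfig from the Terraform output and point kubectl at it
terraform output -raw kubeconfig | base64 -d > kubeconfig.yaml
export KUBECONFIG=$PWD/kubeconfig.yaml

# Sanity check: the LKE nodes should report Ready
kubectl get nodes
```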
We then need to set up a self-signed certificate authority which can be used by our service-mesh. This is a requirement for mutual-TLS, one of our core goals.
It’s a little opaque, but essentially we are setting up our own certificate authority (CA), then generating a number of client/server authentication certificates derived from it.
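One way to do this (and the approach the Linkerd documentation describes) is with the smallstep `step` CLI: a root certificate acting as the trust anchor, plus an intermediate issuer certificate and key that Linkerd uses to mint per-proxy certificates. A sketch, with file names matching what we feed into Helm below:

```sh
# Root CA: the trust anchor shared by every meshed workload
step certificate create root.linkerd.cluster.local ca.crt ca.key \
  --profile root-ca --no-password --insecure

# Intermediate issuer: Linkerd signs per-proxy certificates with this
step certificate create identity.linkerd.cluster.local issuer.crt issuer.key \
  --profile intermediate-ca --not-after 8760h --no-password --insecure \
  --ca ca.crt --ca-key ca.key
```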
We can then pipe these certificates into the Linkerd Helm charts, which take care of configuring our Kubernetes cluster; we simply leverage the publicly available charts to install Linkerd.
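As a sketch, for recent stable Linkerd releases the install is split into a CRDs chart and a control-plane chart, with the certificates from the previous step passed in via --set-file (older releases shipped a single linkerd2 chart, so adjust to your version):

```sh
helm repo add linkerd https://helm.linkerd.io/stable
helm repo update

# CRDs first, then the control plane, feeding in the certificates generated above
helm install linkerd-crds linkerd/linkerd-crds \
  --namespace linkerd --create-namespace

helm install linkerd-control-plane linkerd/linkerd-control-plane \
  --namespace linkerd \
  --set-file identityTrustAnchorsPEM=ca.crt \
  --set-file identity.issuer.tls.crtPEM=issuer.crt \
  --set-file identity.issuer.tls.keyPEM=issuer.key
```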
For more information on installing and configuring Linkerd, you can refer to their comprehensive documentation.
For the HarperDB installation, I referenced this article for the initial Kubernetes resources. Some changes were required, which I highlight below. I also wrapped the templates up into a Helm chart, which you can find here.
We first need to convert the Deployment into a StatefulSet, so that each instance gets a stable network identity and sibling instances can reference one another by name.
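A trimmed-down sketch of the StatefulSet and its headless Service (image tag, names, and replica count are placeholders; the headless Service is what gives each pod the stable DNS name, harperdb-0, harperdb-1, and so on, that the routing configuration relies on):

```yaml
# Headless Service: gives each pod a stable DNS name, e.g. harperdb-0.harperdb
apiVersion: v1
kind: Service
metadata:
  name: harperdb
spec:
  clusterIP: None
  selector:
    app: harperdb
  ports:
    - name: operations-api
      port: 9925               # HarperDB's default operations API port
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: harperdb
spec:
  serviceName: harperdb        # ties the pods to the headless Service above
  replicas: 3
  selector:
    matchLabels:
      app: harperdb
  template:
    metadata:
      labels:
        app: harperdb
    spec:
      containers:
        - name: harperdb
          image: harperdb/harperdb   # pin a specific tag in practice
          ports:
            - containerPort: 9925
```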
We then need to set a number of environment variables to fully enable clustering:
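A sketch of the container’s env block (values are illustrative; in a real deployment the password would come from a Secret, and the node-name variable, hub-server port 9932, and route host are assumptions based on HarperDB’s defaults and the headless Service above):

```yaml
env:
  - name: CLUSTERING_ENABLED
    value: "true"
  - name: CLUSTERING_USER
    value: "cluster_user"
  - name: CLUSTERING_PASSWORD
    value: "cluster_password"       # use a Secret in practice
  - name: CLUSTERING_NODENAME       # each node needs a unique, stable name
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: CLUSTERING_HUBSERVER_CLUSTER_NETWORK_ROUTES
    value: '[{"host": "harperdb-0.harperdb", "port": 9932}]'
```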
Let’s dive a little into what each of these means.
- CLUSTERING_ENABLED: this tells HarperDB that we would like to enable clustering on this node instance.
- CLUSTERING_USER: the username that all instances must share in order to sync and propagate data. If there is a mismatch, clustering will fail.
- CLUSTERING_PASSWORD: like the username, this must be shared between all nodes.
- CLUSTERING_HUBSERVER_CLUSTER_NETWORK_ROUTES: the most interesting of the lot. This defines the ‘neighbouring nodes’ for each node in the cluster. In other words, it builds the edges of the cluster graph, and it is the reason we need stable names for each node.
You can have a read of the docs on clustering if you would like to know more.
Routing is the most interesting element of the database configuration; below, I briefly describe how it works.
We have connected ‘harperdb-1’ and ‘harperdb-2’ both to ‘harperdb-0’; with this topology, if harperdb-1 wants to sync with harperdb-2, that traffic has to go via harperdb-0.
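Concretely, that hub-and-spoke layout means every spoke lists only the hub in its routes, and the hub itself needs none. An illustrative mapping of node to routes (host names from the headless Service, port assumed to be HarperDB’s default hub port):

```yaml
# Illustrative mapping of node -> CLUSTERING_HUBSERVER_CLUSTER_NETWORK_ROUTES
harperdb-0: []                                          # the hub has no outbound routes
harperdb-1: [{host: harperdb-0.harperdb, port: 9932}]   # spokes point at the hub
harperdb-2: [{host: harperdb-0.harperdb, port: 9932}]
```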
With the current configuration model, it is not easily possible to create an arbitrary routing configuration. I’ll discuss this in more detail in the Outcome & Impressions section.
HarperDB is now being routed via the service-mesh, and all of our inter-database traffic is protected by mTLS.
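For completeness, getting the HarperDB pods onto the mesh comes down to the standard Linkerd proxy-injection annotation on the pod template; a sketch of the relevant fragment of the StatefulSet:

```yaml
  template:
    metadata:
      labels:
        app: harperdb
      annotations:
        linkerd.io/inject: enabled   # Linkerd injects its sidecar proxy, which handles the mTLS
```

If you have the viz extension installed, something like `linkerd viz edges statefulset` should show the connections between the pods as secured.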
If you would like to connect to the database and play with it, have a look at this article on using HarperDB with Kubernetes.
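As a quick sketch, you can also port-forward the operations API (default port 9925) and ask a node for its view of the cluster; the admin credentials are whatever you configured for the instances:

```sh
# Forward HarperDB's operations API to localhost
kubectl port-forward harperdb-0 9925:9925

# Query the node's cluster state via the operations API (basic auth)
curl -u HDB_ADMIN:password -X POST http://localhost:9925 \
  -H 'Content-Type: application/json' \
  -d '{"operation": "cluster_status"}'
```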
Outcome & Impressions
Technically, it was a breeze to set up and configure everything, from spinning up the Kubernetes cluster through to installing and configuring the HarperDB instances.
The database was running successfully over mTLS, so our technical goal was achieved and HarperDB could communicate transparently and securely.
As with any innovative tech product, there is always room for improvement. As a next step, it would be great for HarperDB to build out more explicit Kubernetes support in the form of a Helm chart, ideally actively maintained so as to reduce the risk for prospective users evaluating adoption.
In this article we set up a clustered HarperDB instance with communication over mTLS. It was a pleasant experience with no large technical roadblocks. If you want to set up a secure clustered database, I would recommend taking HarperDB into consideration.