Apparently I’ve been living under a rock for the last 5 years, because I’d never heard of Docker until Zach started talking to me about his new HarperDB Docker container. I guess it’s because I’m not normally involved with DevOps. Regardless, my eyes have been opened to a whole new world of containerization and I see a lot of potential.
For those of you out there like me who have not heard of Docker, I’ll do my best to explain it. Docker is a software tool used to build and run containerized applications. A container is essentially a package containing all of the pieces necessary to run an application, including the application itself and its dependencies. Containers are similar to virtual machines, but instead of each one running a full operating system, containers share a single Linux kernel and carry only the application and what it needs. On a Linux host that shared kernel is the host’s own; on Mac and Windows, Docker runs a lightweight Linux virtual machine to provide it. ZDNet has an in-depth explanation and history here.
The container movement is the fastest technology adoption I’ve ever seen. By the time I’d heard of Docker it was already being used in production across the tech world. That’s not normally how things work. What could’ve possibly made Docker into the wave that it is? My guess is that it’s because you get most of the flexibility of a virtual machine (VM) without all of the excess overhead. You might even get a bit more flexibility, because Docker containers run consistently across operating systems: every container runs on a Linux kernel, whether that is the host’s own or one Docker provides in a lightweight VM on Mac and Windows. Think of it like the JVM. This means that I can develop on my Mac, send the code over to my colleague running Windows, then ship it off to a Linux server for production, and I can rest assured that the execution will be consistent everywhere. I like that.
One important thing to note about Docker containers is that they are not meant to persist data: anything written inside a container is lost when that container is removed. That means containers should be used for executable application code, not for storage. Fortunately for us, Docker has volumes that allow for persistent storage. One of the great things about Docker volumes is that Docker manages them, so you can rely on the same file system consistency you get with the containers. Another great volume feature is that multiple containers can share the same volume. Imagine a swarm of HarperDB containers all accessing a single volume, a few read-only, maybe a few specific to individual schemas. All of these application instances would be accessing the same managed volume; truly database as a microservice.
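To make the shared-volume idea a little more concrete, here is a minimal Python sketch that only builds the Docker CLI commands for creating a named volume and starting two containers against it, one of them read-only. It doesn’t run anything. The volume name, container names, image name, and mount path are all hypothetical placeholders, not HarperDB settings.

```python
from typing import List


def volume_create_cmd(volume: str) -> List[str]:
    """Build the command that creates a Docker-managed named volume."""
    return ["docker", "volume", "create", volume]


def run_with_volume_cmd(name: str, image: str, volume: str,
                        mount_path: str, read_only: bool = False) -> List[str]:
    """Build the command that starts a detached container with the volume mounted."""
    # Appending ":ro" mounts the volume read-only inside the container.
    mount = f"{volume}:{mount_path}" + (":ro" if read_only else "")
    return ["docker", "run", "-d", "--name", name, "-v", mount, image]


if __name__ == "__main__":
    # Hypothetical placeholder values, chosen only for illustration.
    volume = "shared_data"
    image = "example/db-image"
    path = "/data"

    commands = [
        volume_create_cmd(volume),
        run_with_volume_cmd("writer", image, volume, path),
        run_with_volume_cmd("reader", image, volume, path, read_only=True),
    ]
    for cmd in commands:
        print(" ".join(cmd))
```

Both containers mount the same named volume, so the data outlives either of them. Whether a given database is happy with several instances writing to one volume at the same time is a separate question that this sketch doesn’t answer.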
This is all well and good, but why did we put a database in a Docker container? If you Google “docker database” you’re going to get a bunch of conflicting opinions on whether or not a database belongs in a container. The argument I like best is that your application layer should be separate from your data layer. In general, I agree with that. However, because HarperDB stores data directly on the file system, our application and data are not explicitly tied together. That means the HarperDB executable can be considered part of the application layer, while the data volume is the only actual part of the data layer. In our case, containerizing HarperDB is completely acceptable and arguably a plus for those with DevOps teams who prefer containers to VMs or native installations.
HarperDB is all about Simplicity without Sacrifice. Our Docker container is easy to get up and running with minimal configuration. Users can pull our container from Docker Hub and run HarperDB in just a few steps. The only configuration necessary is to specify a volume and map the HarperDB ports when you start the container. From there, you’re good to go. Check out the HarperDB Docker container on Docker Hub: https://hub.docker.com/r/harperdb/hdb/
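For anyone who wants to see roughly what “pull, specify a volume, map the ports” looks like, here is a small Python sketch that assembles the two Docker commands as strings you could paste into a terminal. The image name comes from the Docker Hub URL above. The host directory, the data path inside the container, and the port number in the example are assumptions for illustration only, so check the Docker Hub page for the real values before using them.

```python
import shlex
from typing import Dict, List

IMAGE = "harperdb/hdb"  # image name taken from the Docker Hub URL


def pull_cmd(image: str = IMAGE) -> List[str]:
    """Build the command that downloads the image from Docker Hub."""
    return ["docker", "pull", image]


def run_cmd(host_dir: str, container_dir: str, ports: Dict[int, int],
            image: str = IMAGE) -> List[str]:
    """Build a detached `docker run` with one volume mapping and port mappings."""
    cmd = ["docker", "run", "-d", "-v", f"{host_dir}:{container_dir}"]
    for host_port, container_port in ports.items():
        # -p maps a port on the host to a port inside the container.
        cmd += ["-p", f"{host_port}:{container_port}"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # ASSUMPTIONS: the directories and the port below are illustrative guesses,
    # not values confirmed by this post.
    print(shlex.join(pull_cmd()))
    print(shlex.join(run_cmd("/srv/hdb", "/opt/harperdb/hdb", {9925: 9925})))
```

Keeping the paths and ports as parameters means the same helper works once you know the values your HarperDB version actually expects.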