LMDB Deep Dive: Interview with Kyle, HarperDB CTO





Recently, the HarperDB team invited the folks behind AlaSQL, a popular client-side in-memory SQL database, to a virtual Q&A. It was interesting to learn more about AlaSQL and how HarperDB uses it on the backend. This got me thinking about another tool in our tech foundation, LMDB. While we have not yet had a similar event with the creators of LMDB (hopefully in the future!), I was able to catch up with our CTO, Kyle Bernhardy, to learn more about how HarperDB incorporates LMDB and what it’s like to work with the open source key-value store. Kyle led the implementation of LMDB in HarperDB, so it was highly insightful to hear about his experience.
You can listen to the full 30-minute interview here.

Kaylan Stock: Well, thank you, Kyle, for doing this little interview with me today. I’m excited to learn more about how we implemented LMDB and all that good stuff.

Kyle Bernhardy: I am just here to share what I know.

Kaylan: Well, let’s dive in. So my first question. Pretty basic one. What is LMDB? For people that might not know.

Kyle: So LMDB, it’s a really fast, really lightweight key-value store. And the one differentiator with LMDB is that it’s an embedded datastore. That means it embeds in your code. It doesn’t run as a separate server. It actually acts as a library, and you just call functions on that library to do what you need. That keeps it really lightweight because there’s no extra process running on the side. It actually just runs in line with our code, inside the same process.
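
To make the "embedded" idea concrete, here is a minimal sketch. It does not use LMDB itself: it swaps in `dbm.dumb`, a simple file-backed key-value store from the Python standard library, purely as a stand-in so the example runs without extra packages. The function name, the key `dog:1`, and the stored value are made up for illustration.

```python
import dbm.dumb
import os
import tempfile


def embedded_store_demo() -> str:
    """Open a file-backed key-value store in-process, write a key, read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_store")
        # No server to start and no socket to connect to: opening the store
        # is an ordinary library call made inside this process.
        db = dbm.dumb.open(path, "c")
        try:
            db[b"dog:1"] = b'{"name": "Harper"}'  # hypothetical sample record
            return db[b"dog:1"].decode("utf-8")
        finally:
            db.close()
```

Calling `embedded_store_demo()` writes and reads the value through plain function calls. LMDB follows the same embedded pattern, though its API and performance characteristics are different from this stand-in.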

Kaylan: Awesome, and we love lightweight and compact here at HarperDB so it’s a good fit.

Kyle: Simple as possible.

Kaylan: Yes. All of that. So how does HarperDB use LMDB on the backend?

Kyle: Sure. So LMDB is our new data storage mechanism. When we started the company over three years ago, our initial data storage mechanism was something that we had created a patent around, and it was based on the file system. So when you inserted, let’s say, a record or an object, we would break it all apart by attributes and then store each element separately as a file.

And that had some real good benefits, but it also had some real big tradeoffs. One of the big tradeoffs was searches. There are also some issues on the file system with things called inodes (index nodes). And at scale, our old data mechanism sort of fell over on itself. So LMDB is our replacement data storage mechanism, and it still allows us to do very similar data modeling, auto indexing and all that, without breaking records / objects apart.
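
As a rough illustration of why a file-per-attribute model runs into inode pressure, the sketch below writes each attribute of a record to its own file and counts the results. The directory layout (`table/attribute/record_id`), the function names, and the sample data are assumptions made for this example. They are not HarperDB's actual patented design.

```python
import os
import tempfile


def write_exploded(root: str, table: str, record: dict, key: str = "id") -> int:
    """Write one file per attribute of `record`; return how many files were written."""
    record_id = str(record[key])
    written = 0
    for attribute, value in record.items():
        # Hypothetical layout: <root>/<table>/<attribute>/<record_id>
        attr_dir = os.path.join(root, table, attribute)
        os.makedirs(attr_dir, exist_ok=True)
        with open(os.path.join(attr_dir, record_id), "w", encoding="utf-8") as f:
            f.write(str(value))
        written += 1
    return written


def count_files(root: str) -> int:
    """Count every regular file under `root`; each one occupies an inode."""
    return sum(len(files) for _, _, files in os.walk(root))


def exploded_demo(num_records: int = 100) -> int:
    """Insert `num_records` three-attribute records and return the file count."""
    with tempfile.TemporaryDirectory() as root:
        for i in range(num_records):
            write_exploded(root, "dogs", {"id": i, "name": f"dog{i}", "age": i % 15})
        return count_files(root)
```

With three attributes per record, `exploded_demo(100)` creates 300 files. The file count grows as records times attributes, so a large table can use up a file system's inodes well before it uses up disk space.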

Kaylan: So that’s awesome. And doesn’t LMDB help with performance too or is that more the AlaSQL side of it?

Kyle: Well, we did do some performance improvements on the SQL side. Sam Johnson, one of our engineers, had done a lot of work on our SQL engine to improve that. But the lower level below that is the data itself. And we got significant performance improvements in CPU utilization, memory utilization, disk utilization, really across the board. We got significant improvements just in the hardware utilization of your computer, server, or whatever it is that you’re running.

Kaylan: Yeah. I think everyone here at HarperDB is a fan of LMDB for sure. It’s a really cool [product]… it’s open source, isn’t it?

Kyle: It is. And we’re using a Node library because LMDB is written in C and HarperDB is written in Node. The really nice thing with Node is you can import C code as native Node modules. And another open source contributor had created a great library for LMDB for Node. That’s a big game changer, and the implementation on the Node library side was very simple. But we’re essentially using two open source libraries, so it’s sort of like the bigger fish eating the smaller.