LMDB Deep Dive: Interview with Kyle, HarperDB CTO





Kaylan: Yes, definitely. I think you’ve kind of already touched on what your least favorite thing was about working with the tool. Was there anything you wanted to add?

Kyle: As far as specifics of LMDB itself, you know, nothing really comes to mind. I know there were things I struggled with, but nothing off the top of my head right now. I have no major complaints. A slight hurdle I had to figure out was handling various data types. I ended up leveraging the Binary data type, as it accommodates all kinds of data.
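
[Editor's sketch: one way to picture the "store everything as binary" approach Kyle describes is to prefix each value with a small type tag before writing the bytes to the store. The Python below is only an illustration under that assumption. The tag values and the helper names `encode_value` and `decode_value` are hypothetical and are not taken from HarperDB's code or from LMDB itself, which simply stores whatever bytes it is given.]

```python
import json
import struct

# Hypothetical one-byte type tags; not HarperDB's actual storage format.
TAG_NULL, TAG_BOOL, TAG_INT, TAG_FLOAT, TAG_STR, TAG_JSON = range(6)


def encode_value(value) -> bytes:
    """Serialize a Python value to bytes, prefixed with a one-byte type tag."""
    if value is None:
        return bytes([TAG_NULL])
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return bytes([TAG_BOOL, int(value)])
    if isinstance(value, int):
        # 64-bit signed, big-endian; larger integers raise struct.error.
        return bytes([TAG_INT]) + struct.pack(">q", value)
    if isinstance(value, float):
        return bytes([TAG_FLOAT]) + struct.pack(">d", value)
    if isinstance(value, str):
        return bytes([TAG_STR]) + value.encode("utf-8")
    # Fall back to JSON for lists, dicts, and other JSON-serializable values.
    return bytes([TAG_JSON]) + json.dumps(value).encode("utf-8")


def decode_value(raw: bytes):
    """Invert encode_value by dispatching on the leading type tag."""
    tag, body = raw[0], raw[1:]
    if tag == TAG_NULL:
        return None
    if tag == TAG_BOOL:
        return body[0] == 1
    if tag == TAG_INT:
        return struct.unpack(">q", body)[0]
    if tag == TAG_FLOAT:
        return struct.unpack(">d", body)[0]
    if tag == TAG_STR:
        return body.decode("utf-8")
    if tag == TAG_JSON:
        return json.loads(body.decode("utf-8"))
    raise ValueError(f"unknown type tag: {tag}")
```

[With a scheme like this, the store only ever sees bytes, and the application decides how to interpret them when reading a value back.]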

Kaylan: Do you have any tips for people that are looking into using LMDB or incorporating it into their product or project?

Kyle: Yes. Yeah, I think, you know, just going through a good process of understanding what you’re trying to achieve up front. I mean, it’s sort of more basic design principles, but specific to LMDB, you know, there are some quirks. Before you begin the transaction, make sure you initialize all of your key-value stores, because if you try to do it inside the transaction, it’ll blow up.

So there are some little tricks to how you need to initialize things. I’m just trying to think of some other kind of gotchas. Yeah, opening and closing environments: if you do them in the wrong order, you’ll end up in a weird state with your data or your process will hang. I experienced that one time when I was initially vetting the product. I almost gave up because all of a sudden it just started hanging.

Actually, to go back: another [favorite] aspect of LMDB for me is that one of the big communities that uses it is the data science community and Python. There are a lot of Stack Overflow posts that I could access to understand how to use LMDB. So while I’m using it in Node, it’s still the same tool. It’s just the language that is different; the way you interact with it is all the same. So I was able to get online help without having to reach out to the actual development team.

Kaylan: Yeah, that’s nice. How long has LMDB been around?

Kyle: Its initial release was 2011, so it’s been around for nine years. So the other reason for choosing it was that it’s a well-established project. You know, nine years is a long time to work out kinks and bugs and understand different architectures.

Kaylan Stock: Yeah, definitely. And maybe you don’t have anything you would change, because I know you’ve been speaking so highly of it. But if you could change anything about HarperDB’s implementation of LMDB, what would it be?

Kyle Bernhardy: I mean, right now, probably nothing. If you talk to me in like six months… but for now, I feel like it’s really solid. You know, we’re having users hit it through our managed service and by straight downloading the product, and we’ve not had any issues with data writes or data reads at a very low level. So the implementation now feels really solid. I think any issues would not be in LMDB itself; they would be in what we built on top of it, and I don’t see any right now. It’s more about what I want to do with it.

Kaylan: Yeah. And on that note, that’s a perfect lead-in to my final question. What do you want to do in the future with LMDB and HarperDB? Where do you see that going?

Kyle: Near term, we are leveraging LMDB to store transactions. This will allow us to show the history and audit trail of your data by time, user, and record ID. We will also use this as a replacement for our existing clustering catch-up data store, which is currently an append-only file log. Longer term, I want to allow users to predefine data types for specific attributes. The key benefits will be enabling constraints on data, which enhances data integrity, and improving search performance.
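
[Editor's sketch: to make the audit-trail idea concrete, the Python below shows one possible way to build keys for a time-ordered transaction log. It relies on the fact that LMDB keeps keys sorted as byte strings by default, so a fixed-width big-endian timestamp at the front of the key makes byte order match time order. The key layout, the `audit_key` helper, and the `AuditLog` class are hypothetical. An in-memory sorted list stands in for the ordered key space, and none of this reflects HarperDB's actual design.]

```python
import bisect
import struct


def audit_key(timestamp_ms: int, user: str, record_id: str) -> bytes:
    """Build a sortable key: 8-byte big-endian timestamp, then user, then record id."""
    return (
        struct.pack(">Q", timestamp_ms)  # fixed width, so byte order == time order
        + b"\x00" + user.encode("utf-8")
        + b"\x00" + record_id.encode("utf-8")
    )


class AuditLog:
    """In-memory stand-in for an ordered key-value store such as LMDB."""

    def __init__(self):
        self._keys = []    # kept sorted, like keys in an ordered store
        self._values = {}  # key -> serialized change

    def append(self, timestamp_ms: int, user: str, record_id: str, change: bytes):
        key = audit_key(timestamp_ms, user, record_id)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = change

    def between(self, start_ms: int, end_ms: int):
        """Return (key, change) pairs with start_ms <= timestamp < end_ms, oldest first."""
        lo = bisect.bisect_left(self._keys, struct.pack(">Q", start_ms))
        hi = bisect.bisect_left(self._keys, struct.pack(">Q", end_ms))
        return [(k, self._values[k]) for k in self._keys[lo:hi]]
```

[Looking up history by user or by record ID would need additional indexes keyed on those fields; this sketch covers only the by-time case.]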

Kaylan: Very cool. It sounds like you have a long list of things you want to get done with LMDB, so that’s awesome. And that actually was my last question. It’s just cool, you know, I’ve watched you implement it, but it’s kind of cool to just talk about the process and, like, how you came to [choose] LMDB. It’s very interesting.

Kyle: The more I researched and read, the more use cases I found that made me feel confident about the choice we made for this to be our default underlying datastore. It’s a great product. I hope someday we get to talk to Howard Chu, who created it, because he is super smart. I think [it would be] a cool conversation with him.

Kaylan: Yeah, I think I would love to send this to him. I’m sure he’d be stoked to hear all of your feedback. And it would definitely be cool to do a showcase similar to the one we did with the AlaSQL team. So yeah, that’s definitely something we should look into and maybe get on the books, depending on if he’s open to it.

Kyle: Yeah, yeah. I hope so too. I think he lives in Colorado.

Kaylan: Really? That’s cool.

Kyle: I think he might be on the Western Slope, but I’m not totally sure.

Kaylan: That would be super cool if he was living right next door to us.

Kyle: He’s just down the street.

Kaylan: All right, Kyle. Well, thank you so much for your time. And I am excited to write up some info on this awesome interview, and I’ll definitely share the recording out and yeah, I appreciate it.

Kyle: Thank you. This is great. Thanks so much.

You can listen to the full 30-minute interview here.