LMDB Deep Dive: Interview with Kyle, HarperDB CTO

Kaylan: That just doesn’t fit with our style here at HarperDB. And so I guess you may have already answered this in a sense. What’s your favorite feature or aspect of LMDB?

Kyle: I have to pick just one?

Kaylan: OK. You could give me five. Yeah. Give me your top five or whatever.

Kyle: Yeah. I think overall, from an architectural perspective, what I really love about the implementation is that they use something called a memory map. What that means is that when you go to access or insert data, it actually maps the byte address of an entry in that file into memory, so it acts as if the data I’m trying to fetch is in memory.

So the very first call to get that item of data is as slow as just pulling it off of your SSD. But on the second call, the byte addresses are already cached and mapped in memory, so it’s acting as if it’s an in-memory database. It has the speed of that, but it has the persistence of an on-disk database, so we’re getting the benefits of both. I’ve not seen that implementation in any other key-value store. It creates massive efficiencies. It also aligns with what we wanted when we started HarperDB: we wanted to leverage the file system. That’s exactly what Howard Chu (the creator of LMDB) and [LMDB’s] other engineers have done: they’re leveraging how file systems work and how virtual memory works on operating systems, and using existing technologies. It’s a really clever way to solve really complex problems. I think that’s overall my favorite thing. They’re just super smart and efficient about how they’re accessing data.
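To make the memory-map idea concrete, here is a minimal Python sketch using the standard-library `mmap` module. It is not LMDB's implementation or API. The demo file, its contents, and the byte offsets are hypothetical and chosen only for illustration. It shows a file being mapped into the process's address space and then read by slicing, as if it were an in-memory byte array, while the operating system decides which pages to load from disk and keep cached.

```python
import mmap
import os
import tempfile

# Hypothetical on-disk "database": two fixed-position records in one file.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"key1=alpha;key2=bravo;")

with open(demo_path, "rb") as f:
    # Map the whole file read-only. No explicit read() calls follow;
    # the OS pages data in on first touch and serves later touches
    # from memory it already holds.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        first_access = view[5:10]    # may fault the page in from disk
        second_access = view[5:10]   # same bytes, typically already resident
        other_record = view[16:21]

os.remove(demo_path)
```

The slices look like ordinary in-memory access, yet the bytes live in a file that persists after the process exits. That combination is the property Kyle describes.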

I think the other thing, too, is they’re using a B+ tree, which just creates really efficient searches, rather than a log-structured merge tree (LSM), which is used in other implementations like LevelDB. Also, LMDB is natively ACID, meaning, in the database world, it’s Atomic, Consistent, Isolated, and Durable, and that’s native to the datastore. So we’re able to lock a transaction, and anything that we’re doing during that transaction does not impede the readers. So there are no isolation concerns, and that data doesn’t show up until we actually commit it. And if something fails inside that transaction, it just rolls back. The readers never see it. So there’s this complete division between the readers and the writers. They don’t impact each other.
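The reader/writer separation described here can be sketched with a toy copy-on-write store in Python. This is an illustration under stated assumptions, not LMDB's API or code: the `ToySnapshotStore` class and its `snapshot` and `write_txn` methods are hypothetical names, and it copies a whole dictionary where LMDB copies pages of its B+ tree. It shows a single writer working on a private copy, readers seeing only committed data, and a failed transaction leaving no trace.

```python
import threading
from contextlib import contextmanager


class ToySnapshotStore:
    """Hypothetical copy-on-write key-value store (illustration only)."""

    def __init__(self):
        self._committed = {}                 # last committed version
        self._write_lock = threading.Lock()  # one writer at a time

    def snapshot(self):
        # Readers take the committed version without any lock.
        # Committed versions are never mutated in place.
        return self._committed

    @contextmanager
    def write_txn(self):
        with self._write_lock:
            working = dict(self._committed)  # private copy for the writer
            yield working
            # Reached only if the body did not raise: publish atomically.
            self._committed = working


store = ToySnapshotStore()
with store.write_txn() as txn:
    txn["a"] = 1

with store.write_txn() as txn:
    txn["b"] = 2
    mid_txn_view = store.snapshot()  # a reader during the open transaction

try:
    with store.write_txn() as txn:
        txn["c"] = 3
        raise RuntimeError("simulated failure")  # nothing gets published
except RuntimeError:
    pass

final_view = store.snapshot()
```

In this sketch, `mid_txn_view` contains only `"a"` because `"b"` had not been committed when the reader looked, and `final_view` never contains `"c"` because that transaction failed before publishing.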