• jimis 5 months ago

    The page writes that LMDB "is exceptionally fast for our kinds of load", and then links to an in-memory microbenchmark: http://www.lmdb.tech/bench/inmem/

    Aren't they interested in persistence of the key-value data? In my experience, once data is persisted to disk or SSD, LMDB is way slower from alternatives because it needs to operate in synchronous mode to avoid corruption (effectively flushing to disk after after every transaction committed). If operated in the non-default MDB_NOSYNC mode (which is the mode chosen in the above benchmarks), then there is a high probability to be left with an unreadable database file after a crash, thus losing all your data.

    It is not fair to compare with other databases in sync mode, since they might operate safe but faster in async mode. For example sqlite with PRAGMA journal_mode=WAL and PRAGMA synchronous=NORMAL can operate in semi-asynchronous mode (fsync()ing sporadically) without fear of corruption in case of crash, because it keeps a WAL journal and is able to properly roll-back after a crash. This should be much faster than LMDB's default-and-safe synchronous mode, that msync()s on every value written.

  • malkia 5 months ago

    Concidentally, I'm in the middle of testing out LMDB as replacement for SQLite - we've been running fine for a year with journal_mode=WAL, but synchronos=NORMAL without issues, and just added last week LMDB, but with MDB_NOSYNC... hmm... so far our real testing showed (for our use case) almost same results, except some 95-99% percentiles where SQLite goes much slower (but due to other factors). In synthetic benchmarking (suiting our needs) the difference is miniscule..

  • jimis 5 months ago

    If you are using MDB_NOSYNC in production, keep in mind that the trade-off for the boost in performance is that a system crash can lead to an unreadable database.

  • malkia 5 months ago

    Yup, we had SQLITE in the same fashion (synchrnous=OFF, and other non-safe settings, no WAL also) and had occasional crashes, then someone had to manually delete the database (in our case it's for a desktop app, a tool, so no server-like requirements, yet we want it to be stable).

    Would remove the MDB_NOSYNC then, and see how it goes...

  • hyc_symas 5 months ago

    We have only ever compared LMDB in synchronous mode to other DBs in synchronous mode, and LMDB in asynch mode to other DBs in asynch mode. Come on, that's too obvious.

    And LMDB beats the crap out of SQLite, in any mode. http://www.lmdb.tech/bench/microbench/

    Replacing SQLite's Btree engine with LMDB makes the SQLite footprint smaller, faster, and more reliable too. https://github.com/LMDB/sqlightning

  • jimis 5 months ago

    > And LMDB beats the crap out of SQLite, in any mode. http://www.lmdb.tech/bench/microbench/

    Alright, I went through this page to see why my experience is different. While it mentions that they enabled SQLite's WAL journal, it also mentions that the synchronous writes were performed with PRAGMA synchronous=FULL.

    I believe that if they set PRAGMA synchronous=NORMAL, they will get an ACI-reliable database that is way faster than LMDB with write-to-disk workloads.

    Most of the measurements on that page are not really useful to me; they do not persist the data (db on tmpfs) or they do not care about reliability, using dangerous settings. Only few measurements are both persisting the data and writing safely, but the SQLite configuration is sub-optimal for them.

  • jimis 5 months ago

    It's not that simple. What you mean by saying "(a)synchronous mode" is very different from database to database. See my above comment. Is SQLite synchronous or asynchronous if you configure it as journal_mode=WAL and PRAGMA synchronous=NORMAL?

    For my purposes as an application developer, I care about comparing databases operating in safe mode i.e. a system crash should never cause total data loss. According to my experience SQLite's safe mode is many times faster than LMDB's safe mode, with a write workload. While LMDB is thrashing the disk and achieves only a handful of write transactions per second.

  • hyc_symas 5 months ago

    SQLite's safe mode is not comparable to LMDB's; SQLite is vulnerable to silent data loss in a crash.


    LMDB is not.

  • jimis 5 months ago

    Isn't this from the paper "all filesystems are not created equal"? [1] If you search the tables for "sqlite-wal" you will see that it shows zero vulnerabilities.

    [1] https://www.usenix.org/system/files/conference/osdi14/osdi14...

  • hyc_symas 5 months ago

    Ah, the earlier result was for SQLite-rollback, their default mode. IMO any DB should default to its safest mode.

  • marlinsearch 5 months ago

    I have been comparing multiple key-value stores and lmdb has been the simplest and fastest to use with or without MDB_NOSYNC. It beats almost every kv store in writes.. and in reads it is untouchable by anything by a large magnitude.

  • jimis 5 months ago

    Agreed for the read workloads, I have the same experience. It was designed primarily for an LDAP directory after all, an application that is very read-heavy.

  • ovao 5 months ago

    Where on this page does it say that LMDB was run with MDB_NOSYNC? Tests were run with the DB on a tempfs and then again (lower down on the page) on ext4.

  • jimis 5 months ago

    Section 4:

    "using a 512GB Samsung 830 SSD and an ext4 partition.

    The actual drive characteristics should not matter because the test datasets still fit entirely in RAM and are all using asynchronous writes"

  • CardenB 5 months ago

    How does this compare to sstable?

  • nilsocket 5 months ago

    Didn't they review badger from dgraph.


    Here is the comparison back in 2017: https://blog.dgraph.io/post/badger-lmdb-boltdb/

    It supports concurrent ACID transactions with serializable snapshot isolation (SSI) guarantees.

  • drzaiusx11 5 months ago

    Hopeful this goes better than WebSQL. WebSQL ran into standardization problems because they started with an existing implementation (SQLite) instead of a spec as their base implementation--meaning anyone making a greenfield implementation would need to implement all the quirks of SQLite to be compatible with other implementations. Sadly it died as a standard and we were stuck with either local storage or indexdb.

  • Mossop 5 months ago

    There is no intent to standardize this or expose it to the web. This is purely for use internal to Firefox. Of course that may end up meaning that it is used as the internal storage for some web feature.

  • drzaiusx11 5 months ago

    The first sentence on that page says "We propose the standardization of a simple key-value storage...usable from JS, Java, Rust, Swift, and C++"

    I assumed this meant an API callable from webasm/js. Did I miss something?

  • drzaiusx11 5 months ago

    Ah, found my error:

    "Not-yet or never goals for this proposal are:

    Standardization via a standards body as a web API."

    So this is "internal" stuff I guess.

  • Mossop 5 months ago

    Poor wording perhaps. There are a bunch of places in Firefox that use key-value stores, this proposes standardising on one type of store.

  • lclarkmichalek 5 months ago

    I'm really surprised LSM trees didn't get more commentary. 'All are targeted at server workloads' - sure, but they're also incredibly popular and appear to be as close to the 'one-size-fits-all solution for storage' as we've found.

  • coleifer 5 months ago

    Also typically more complicated and require a separate compaction process. They're good for writing lots of data, but not so great for random reads.

  • mrjn 5 months ago

    Author of Badger here. Our design of separating keys and values has gotten us incredibly fast writes, while still keeping the read latencies neck-to-neck against B+ trees. Worth checking out: https://github.com/dgraph-io/badger

  • digikata 5 months ago

    Badger looks nicely done! Did you end up needing to change much in implementing Badger from what was described in the WiscKey paper?

  • dana321 5 months ago

    Badger is really good, i'm using it.

  • nekitamo 5 months ago

    This looks very interesting. Do you have any builtin compression or tips on compressing data?

  • guachesuedehack 5 months ago

    Nice, why the choice of go ?

  • thstart 5 months ago

    “Please note that Badger obtains a lock on the directories so multiple processes cannot open the same database at the same time.”

    With RocksDB there is not such limitation. Do you plan to lift it?

  • coleifer 5 months ago

    I mean...whatever, sure I'll take your word for it. I don't recall anyone talking about badgers though.

  • dang 5 months ago

    Please don't be a jerk in comments here.

    Edit: sadly, it looks like you've been posting quite a few uncivil comments to HN. We ban accounts that do that, so would you please review https://news.ycombinator.com/newsguidelines.html and follow the rules from now on?

    The idea here is: if you have a substantive point to make, make it respectfully and thoughtfully; if you don't, please don't comment until you do.

  • coleifer 5 months ago

    Hey, I know you care a lot about this forum. Sometimes its easy to forget what this place means to some folks and just do a drive-by shitpost. I hear you.

  • iplusplus 5 months ago

    In fact he cares so much about this forum that he won't let 12-year members participate fairly on it, and claims that mocking him when he acts like a dictator is "abusive to the community." He considers himself the community.

    Don't take this place too seriously. You may find that 12 years later, all of your hard work in writing quality comments is for nothing, just because of the whims and moods of Pilate.

    If you try to ask him "Why?" he will not explain the reasons (unless they are very clear-cut and obvious) and instead falls back on demanding respect and unquestioning obedience. It's a shame, because pg's reign was at least tactful.

    The worst part is, if you confront him about any of this, he will claim it's not true and ban you permanently if you persist. (Well, if you persist publicly, at least.)

    It's a good idea to suck up, though. If you do want to participate earnestly and in the long-term, I suggest getting used to that.

  • catwell 5 months ago

    In case someone involved in this reads this thread: the document does not specify which LMDB version you tested. I suggest you run your tests with the `mdb.master` branch, i.e. the work towards a future 1.0, and not the stable 0.9 branch. The answers to several of your interrogations will depend on that: with `mdb.master` you can use the VL32 mode which greatly improves usage on 32 bit platforms, and Windows support is much better.

    Regarding NFS: I have recently started testing LMDB on NFS v4 and had no issues so far, but with a single process using the database. AFAIK the warning at http://www.lmdb.tech/doc/ is only for multiple processes using the DB concurrently. I am still not entirely sure there won't be any mmap-related issues, but so far so good.

    Regarding "being careful": this is a very important point. The LMDB API does not hold your hand, it lets you do dangerous things which will result in corruption of your database, which you will discover too late. I suggest writing a wrapper around the API to ensure you are using it correctly. (I wish there was a compile flag like LUA_USE_APICHECK [1] for LMDB, which could help detect problems like this, but there isn't.)

    [1] https://www.lua.org/manual/5.3/manual.html#4

  • hyc_symas 5 months ago

    The main reason for that warning about NFS is that people will try it, see that it seems to work, and then get careless and try to use the same DB from two different hosts at once. It's inevitable when you're working on files living on networked filesystems, and it cannot work. NFS doesn't offer any cache coherency guarantees, and the mutexes used for synchronizing writers only work on the host that created them.

    Not sure what you're talking about re: the API letting you corrupt your database.

  • justin66 5 months ago

    If we're using lmdb for something that's largely or entirely read only, how viable is using it on NFS with (shudder) lock files or something like that?

  • hyc_symas 5 months ago

    If it's 100% read-only, you could probably use it safely. Make sure its filesystem on the NFS server is mounted read-only, and obviously all the clients' NFS mounts must also be read-only. As for "largely read only" - if there are any writers at all, all bets are off.

    LMDB automatically detects read-only filesystems, and turns off its locking in that case, so it should perform as well as anyone could expect NFS to perform.

  • catwell 5 months ago

    Things like closing DBIs, transactions and cursors in the wrong order, for instance.

  • hyc_symas 5 months ago

    That won't corrupt the DB. It might corrupt the heap of your current process, but the DB won't be harmed.

    C doesn't have destructors. If you free stuff in the wrong order, you lose. That's nothing to do with LMDB's API. Even if LMDB cleaned everything up internally, you'd still have dangling pointers in your app space. No language or API can prevent that.

    The only thing you can do in the API that might possibly corrupt the DB is to use MDB_RESERVE and then store a value bigger than the space you reserved. That generally causes a SEGV, and you'll discover very quickly that you have a bug in your code. LMDB will fail fast, every time, and every failure will be a bug in your own code. Makes debugging very quick.

  • catwell 5 months ago

    Glad to hear that.

    To be honest the only real corruption issues I had with LMDB in practice were dues to me using MDB_NOSYNC, and broken mmap behavior on some Android devices (on external storage).

  • adobeeee 5 months ago

    I have a layman question if somebody could please answer. I have never in my entire life seen databases fail. But db failures and issues seem to be brought up all the time. Now I understand that part if this maybe the cost function associated with them. But I'm sure there's also something that I have no clue about. So my questions are:

    1) what kind of problems do databases actually face.

    2) what kind of scenarios create those problems.

    3) how does a programmer go about testing them?

  • mbreese 5 months ago

    The easiest scenario to imagine is a hardware failure or power outage. The database was in the middle of doing something, and then was prevented by a hard drive dying or the lights going out. One way to test such a thing is to literally unplug the computer to see how it handles the failure.

    So, let's say you have a client/server application... the client is telling the server (database) to write some records to the database. In the middle of the write, you pull the plug. Some questions you'd want to know: what does the database look like when it restarts? Can we read it? What is the current state? Did any of the new data get written? What does the client think was written? If there was an uncommitted database transaction, was the database left unaltered?

    It's just as important to test the client in these scenarios. While the server may have crashed, what does the client think happened? Was it waiting for an ACK or "OK" message? Did it get the message? If the update failed, what does the client do in that situation?

    Things can get even more complicated if you're thinking of replication across different servers. If one of the servers fails, how does the replication work? Do sessions fail over to other servers? How many servers are required? If there was a corrupted record, did it propagate or was it scrubbed?

  • adobeeee 5 months ago

    Thank you for explaining so well and clearly!

    To you and others, are there any other scenarios too that happen in production?

  • philix001 5 months ago

    Enough material here to scare anyone about databases


  • marcosdumay 5 months ago

    Some things I have seen in production:

    The disk become inoperative during a write, this can be either silent or writes start to return errors. Again, how does the database look like after the problem is solved.

    A large operation exceeds the capacity of the server to deal with intermediary state. It runs out of memory, disk, or in some not great DBs it loses control of some locks and gets deadlocked. Can it recover with only the partial log data?

    Disks lie about data being written, what happens if one of the problems happen between the disk saying the data was written and it actually getting written?

    And, of course, when you move beyond a single server things get way more complex.

  • positr0n 5 months ago

    You'd be surprised by how often I've seen a database fail in prod simply because it ran out of disk space. In both cases monitoring software was running but misconfigured.

  • tfha 5 months ago

    We ship a consumer application with databases in it (multiple). Even the ACID ones fail all the time, maybe 1-3% of our userbase has had a corruption at some point.

  • adobeeee 5 months ago

    Can you talk about the reasons of the failure?

  • shekispeaks 5 months ago

    This doc says LevelDB has no transacion, thats not true, they have batch writes and LEVEL-DB is not implemented in Go. Its implemented in CPP

  • derefr 5 months ago

    > LEVEL-DB is not implemented in Go. Its implemented in CPP

    There is more than one implementation of LevelDB. This one (https://github.com/syndtr/goleveldb), in Go, is used in major projects such as https://github.com/ethereum/go-ethereum.

  • floatingatoll 5 months ago

    Transactions let you perform any series of SQL commands with various expectations around data safety and locking guarantees depending on isolation level.

    Batch writes provide a tiny subset of the full possibilities of transactions. While sufficient in many cases, that cannot be generalized to "LevelDB supports transactions".

  • romed 5 months ago

    That's just your pet definition of transaction. That is not universally accepted.

  • barrkel 5 months ago

    Atomicity and consistency. The classic case is balance transfer. Batch writes only cope with this to the degree that you can model your computation purely in database primitives that can execute server-side as part of the batch. If your computation involves anything else, such as external data or a computation that can't be modelled, then microbatches aren't good enough: you risk lost updates and inconsistent reads.

  • floatingatoll 5 months ago

    Universal acceptance is not a valid criterion for human beings at scale.

    Schizophrenic people don’t accept facts like “this wall exists”, yet we still reach agreement as a society that a wall exists and don’t try to walk through walls because they exist.

    You are of course welcome to hold any belief you wish, but believing something that puts you in direct contradiction to an entire industry significantly raises the bar of proof you must provide in order to convince others to listen to you.

    Your reply does not provide that proof, and thus your argument is not persuasive.

  • Twirrim 5 months ago

    I would love to see an example of someone actually knowledgeable about databases who has a different definition for transaction.

  • erpellan 5 months ago

    Atomicity, consistency, isolation, durability.


    But that's just like, the industry's opinion, man.

  • Twirrim 5 months ago

    Maybe I completely misunderstood the parents description of a transaction, but that was exactly what they were saying.

  • sidcool 5 months ago

    This is a great example of how to conduct a technical design review.

  • mamcx 5 months ago

    I see their have some questions (like how good is the windows and android support) that are not answered? or only internally? I think will be good to see what them found.

  • hyc_symas 5 months ago

    Windows and Android are fully supported, have been for years. As are iOS and MacOSX and all the BSDs.

  • mamcx 5 months ago

    Thats great to know, and I suspect that. But what I'm saying is that them ask some valid questions but where are the results of their findings?

    Because if I read this, I could conclude LMDB could have troubles in that areas...

  • jinqueeny 5 months ago

    Curious about how this compares to TiKV...

  • 5 months ago
  • AboutTheWhisles 5 months ago

    It really sounds like they should take a look at this:


    "One appealing aspect of LMDB is its relative ease of use from multiple processes, above and beyond its basic capabilities as yet-another-fast-key-value-store."

    simdb is only lock free and thread safe. While LMDB is benchmarked at around 10k writes, this should be able to do millions of mixed reads and writes with 4 modern cores. LMDB seems to use a separate lock file to sync multiple threads/processes. The only catch here is that the keys aren't sorted, which doesn't seem to be a requirement of theirs.

  • erichocean 5 months ago

    LMDB is awesome, can't tell how many times I've wished to have access to it from the browser...

  • stevewilhelm 5 months ago

    How does this compare to Redis?

  • detaro 5 months ago

    Redis isn't embeddable, but a standalone server application, and thus it's not really the same space.

  • bgilroy26 5 months ago

    Thank you for asking this question. I don't know a lot about databases, but I would have thought redis, as a major key-value store, would have been part of this evaluation. So your answers were helpful.

  • nosequel 5 months ago

    It actually stores data.

  • smacktoward 5 months ago

    Bring back Mork!


  • jcranmer 5 months ago

    Mork actually sucked even as a key-value store. It's only decent if your requirements are a) only lookup on a fixed, autoincrement integer ID, b) the only operation you're likely to do is load an entire record or store an entire record at once, and c) parallelism is not in your vocabulary.

    Disclaimer: I'm one of the last people to make functional changes to mork.

  • majewsky 5 months ago

    That's not a "Disclaimer". It's actually the polar opposite... a "Claimer"? ... I think "Source" is the best word to use there.

  • pvg 5 months ago
  • majewsky 5 months ago

    Hah! :) I've been meaning to call out other misuses of "Disclaimer" vs. "Disclosure" before, but I usually don't because it's just tedious.

  • espeed 5 months ago

    NB: NB works for me.

  • amelius 5 months ago

    Seems like an instance of reinventing the wheel ...

  • wvenable 5 months ago

    From the article:

    "We propose ‘buying’, not building, the core of such a solution, and wrapping it in idiomatic libraries that we can use on all platforms.

    We propose that LMDB is a suitable core (see Appendix A for options considered): it is compact (32KB of object code), well-tested, professionally maintained, reliable, portable, scales well, and is exceptionally fast for our kinds of load. We have engineers at Mozilla with prior experience with LMDB, and their feedback is entirely positive."

  • cpeterso 5 months ago

    How so? The project is building upon an existing key-value store (LMDB) to replace some custom data stores in Firefox using flat text files, JSON files, Kinto, and SQLite.

  • ska 5 months ago

    Curious comment. My reading of it is exactly the opposite of reinventing any wheels.

  • alexnewman 5 months ago

    Why do they make this proposal? What does that mean?

  • Mossop 5 months ago

    It's a proposal for making use of a new storage engine to store Firefox internal data.

  • acqq 5 months ago

    I thought that the issues like this would be blocking:


    Who knows how it should work on 32-bit systems?

    And isn't endianess also a problem? And doesn't SQLite solve both problems by default? And isn't also possible to configure SQLite to be very fast, if one know what he's doing?

    Btw I'd expected that the "notes here" https://docs.google.com/document/d/1bwbpqPb58a0GcEyB4W424pyf... are conclusions, but the document seems to be not publicly accessible.

  • mykmelez 5 months ago

    According to @hyc in https://monero.stackexchange.com/questions/2606/are-the-lmdb..., "As of v0.10.0, yes the LMDB files are cross-compatible between 32 and 64bit architectures. They have always been cross-compatible between OSs. They are still byte-order dependent but almost everyone uses little-endian CPUs these days so it's not much of an issue."

  • acqq 5 months ago

    So the endianness answer is, from your link: "They are still byte-order dependent but almost everyone uses little-endian CPUs these days so it's not much of an issue." Which just means "we solve the problem by ignoring it completely."

    And what about the limitations on the 32-bit system? Isn't there also needed to use memory space of RAM for the complete size of the database, that's how that database works if I understood? Which makes LMDB effectively unsuitable for 32-bit systems:


    "your user will need to either enable PAE on their system, or upgrade to 64-bit CPU. If neither of these is an option in your application, then you cannot use a memory mapped file larger than your available address space"

    In short it still looks that LMDB is designed effectively only for one-endianness and only for 64-bit systems, which is still limiting for many use cases.

    Again, it seems that for LMDB is again "solving" the problem by ignoring it. Which is OK if it fits your use case... But shouldn't Firefox work properly on 32-bit systems? Or did they completely decide that they don't want to target any 32-bit platform anymore?

    Moreover, there are use-case scenarios where the memory mapped file approach can suffer from problems of unnecessarily reading the page that will anyway be completely overwritten. My conclusion is... if LMDB "works for you" fine, but do properly your research first. I still believe SQLite covers much more use cases and is safer starting point for many of them, including Firefox use cases, until I read that they really decided to reduce these.

  • hyc_symas 5 months ago

    The endianness answer is "we thought about it, and it's a non-problem 99% of the time, and not worth further thought." You can't use a DB concurrently from multiple hosts, so it's unnecessary to support both endiannesses at once. If you need to copy a DB between different endian hosts, export it with mdb_dump and reimport it with mdb_load on the other machine. Don't be an ass.

    There are no limitations on 32bit systems using MDB_VL32 in the 1.0 branch, you're spewing crap without any idea what you're talking about. Whoever's writing on stackoverflow doesn't know what they're talking about either.

    > Moreover, there are use-case scenarios where the memory mapped file approach can suffer from problems of unnecessarily reading the page that will anyway be completely overwritten.

    That's only true if you use a writable mmap, and the DB is larger than RAM. The default for LMDB is not to use a writable mmap, so this isn't an issue that affects most LMDB users. It is also already documented, so you choose it at your own risk.

  • mykmelez 5 months ago

    > Again, it seems that for LMDB is again "solving" the problem by ignoring it. Which is OK if it fits your use case... But shouldn't Firefox work properly on 32-bit systems? Or did they completely decide that they don't want to target any 32-bit platform anymore?

    Yes and no, respectively. Yes, Firefox should work properly on 32-bit systems; and no, we didn't completely decide that we don't want to target 32-bit platforms anymore.

    Rather, Firefox plans to use LMDB where it fits one of its many use cases for persistent storage (and not use it where it doesn't).

    The StackOverflow thread you referenced describes a blockchain program that expects to use "a few GB of lmdb diskspace." Whereas Firefox's use cases for LMDB are sized in the range of a few KiB to a few MiB. Firefox has no plans to start storing a blockchain.

    Claimer: I'm the engineer integrating LMDB into Firefox.

  • jnwatson 5 months ago

    The last issue is merely one of performance, not capability.

    I’ve successfully used LMDB in 32-bit land, though it takes some effort. I had to scan available virtual memory for the largest contiguous chunk and use that.

    Bigger issues are growing the database size and performance in low memory. No transactions can be running to increase the map size, which is usually hard to coordinate.

    Also, in low memory situations, LMDB’s performance suffers tremendously. It can be 100x slower, and commits can take seconds. You won’t run into it unless you are really hammering it, and the developer usually won’t notice because they usually have lots of RAM.

  • hyc_symas 5 months ago

    Fwiw, we've benchmarked on Raspberry Pis with slow SDcards. You want to talk about low RAM situations and slow I/Os. The reality is still the same though - every other DB engine is many times slower under the same conditions.

    Even when the DB is 5x or 50x larger than RAM... http://www.lmdb.tech/bench/hyperdex/

  • hyc_symas 5 months ago

    re: 32bit land, You should look into MDB_VL32 in mdb.master as other posters have suggested.

    If any single commit takes multiple seconds, you've probably written too much data in a single transaction. Above about 512MB it starts preemptively flushing intermediate pages out, to keep the in-RAM footprint limited. Some of those intermediate pages will need to be read back in if your transactions are hitting all over the DB. This is where the main slowdown comes.