Once a key-value store gets serious write traffic, the bottleneck is often no longer hashing or indexing. It is the physical act of rewriting scattered pages on disk. LSM trees change the game by making the write path almost entirely sequential, then paying the cleanup bill later through compaction.
A B-tree is excellent when the storage medium tolerates random page updates cheaply. But a busy write workload turns every tiny logical change into a small storm of physical work: append to the write-ahead log, rewrite the leaf page, maybe rewrite an internal page, maybe split, and eventually flush all of that through the filesystem and device.
The pain is not that B-trees are bad. The pain is that updating in place creates write amplification. A 40-byte value update can force kilobytes of page traffic, plus random I/O that is much harder for storage hardware to batch efficiently than a single append stream.
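The arithmetic behind that claim is worth making concrete. The numbers below are illustrative assumptions (a 4 KiB page is a common default, but the text does not specify one):

```python
# Rough write-amplification arithmetic for an in-place page update.
# PAGE_SIZE is an assumed 4 KiB page; the 40-byte value comes from the text.
PAGE_SIZE = 4096
value_size = 40

# Updating 40 logical bytes rewrites at least one full leaf page;
# WAL records, internal-page rewrites, and splits only add to this.
min_physical_bytes = PAGE_SIZE
amplification = min_physical_bytes / value_size

print(round(amplification, 1))  # 102.4: >100x before counting the WAL or splits
```

Even this floor, with no splits and no internal-page traffic, is two orders of magnitude more physical work than the logical change.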
Every overwrite reaches into existing structure: even the fast path costs several physical writes, and an unlucky one triggers a page split that rewrites still more.
The alternative is to separate durability from placement: an append-only log provides durability, while an in-memory table absorbs updates until it is flushed as a new sorted run.
The log-structured merge tree flips the write path on purpose. New writes go to two places: an append-only log for durability, and an in-memory sorted structure usually called the MemTable. When the MemTable fills, it is frozen and flushed out as a new immutable sorted file: an SSTable.
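That write path can be sketched in a few lines. This is a deliberately minimal sketch, not the implementation referenced later: the class name `TinyLSM`, the JSON log format, and the flush threshold are all illustrative choices.

```python
import json

class TinyLSM:
    """Sketch of the LSM write path: one sequential WAL append per write,
    plus an in-memory MemTable that is flushed as an immutable sorted run."""

    def __init__(self, wal_path, flush_threshold=4):
        self.wal = open(wal_path, "a")      # append-only log for durability
        self.memtable = {}                  # key -> value; sorted only at flush
        self.flush_threshold = flush_threshold
        self.sstables = []                  # immutable sorted runs, newest last

    def put(self, key, value):
        # 1. Durability: a single sequential append to the log.
        self.wal.write(json.dumps([key, value]) + "\n")
        self.wal.flush()
        # 2. Visibility: update the MemTable in memory, no disk seeks.
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Freeze the MemTable and write it out as one sorted, immutable run.
        run = sorted(self.memtable.items())
        self.sstables.append(run)
        self.memtable = {}
```

Note that `put` never touches an existing file: the log only grows, and each SSTable is written once and never modified.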
Writes become cheap up front, but reads must now consult multiple places: the MemTable, maybe a recently frozen MemTable, and several SSTables on disk. The system accepts that temporary disorder because compaction will later merge those files into larger, cleaner levels.
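The newest-to-oldest lookup order can be sketched as a standalone function (omitting the frozen MemTable for brevity). Representing each SSTable as a sorted in-memory list of `(key, value)` pairs is an assumption for illustration; on disk it would be a file with an index.

```python
from bisect import bisect_left

def get(key, memtable, sstables):
    """Point lookup, newest to oldest: check the MemTable first, then each
    sorted run from most recent to oldest, so newer versions shadow older ones."""
    if key in memtable:
        return memtable[key]
    for run in reversed(sstables):           # sstables holds newest last
        keys = [k for k, _ in run]
        i = bisect_left(keys, key)           # binary search within the run
        if i < len(keys) and keys[i] == key:
            return run[i][1]                 # first (newest) hit wins
    return None                              # absent from every level

# A newer run shadows an older value for the same key:
old = [("a", 1), ("b", 2)]
new = [("b", 20)]
print(get("b", {}, [old, new]))  # 20
```

The absent-key case is the expensive one: `get("z", ...)` binary-searches every run just to return `None`, which is exactly the cost Bloom filters attack below.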
Immutable SSTables make the write path fast, but they create a new problem: multiple versions of the same key accumulate across files. Deletes become tombstones. Reads may have to search several runs. So the system keeps merging files down the tree.
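A compacting merge over in-memory runs can be sketched as follows. The `TOMBSTONE` sentinel and the full-merge assumption are illustrative; dropping tombstones is only safe here because every run is merged at once, so no older version of a deleted key can survive elsewhere.

```python
TOMBSTONE = object()  # delete marker; in a real SSTable this is a flag byte

def compact(runs):
    """Merge several sorted runs (given oldest first) into one clean run.
    Later runs win for duplicate keys; tombstones suppress older versions
    and are then dropped, which is valid when merging *all* runs at once."""
    merged = {}
    for run in runs:                 # oldest to newest, so later entries win
        for key, value in run:
            merged[key] = value
    return sorted(
        (k, v) for k, v in merged.items() if v is not TOMBSTONE
    )

old = [("a", 1), ("b", 2), ("c", 3)]
new = [("b", TOMBSTONE), ("c", 30)]
print(compact([old, new]))  # [('a', 1), ('c', 30)]
```

A partial compaction (merging only some levels) would instead have to keep the tombstone, since an even older version of `b` might still exist in a run outside the merge.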
This is why compaction is not an incidental optimization. It is the second half of the data structure. The write path borrows time from the future, and compaction is how that debt gets paid.
Point lookups are where a naive LSM can feel painful. If the key is absent, the system may have to ask file after file, "Do you have this key?" and pay disk work for repeated "no" answers. A Bloom filter changes the economics of that question.
Many LSM systems attach a compact probabilistic summary to each SSTable or data block. On lookup, the filter can say either definitely not here or maybe here. That means many negative lookups can skip opening the data file entirely. The only tolerated mistake is a false positive, never a false negative.
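A minimal Bloom filter showing both guarantees can be sketched as below. The sizes (`m_bits`, `k_hashes`) and the single-SHA-256 hashing scheme are illustrative choices, not tuned parameters; production systems typically use faster non-cryptographic hashes.

```python
import hashlib

class BloomFilter:
    """Sketch of a Bloom filter: k bit positions per key in an m-bit array.
    No false negatives are possible; false positives are."""

    def __init__(self, m_bits=1024, k_hashes=5):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, key):
        # Derive k positions from one digest; double hashing would also work.
        digest = hashlib.sha256(key.encode()).digest()
        for i in range(self.k):
            chunk = int.from_bytes(digest[4 * i:4 * i + 4], "big")
            yield chunk % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means definitely absent; True means only "maybe here".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```

On a negative lookup, a single `might_contain` returning `False` replaces the binary search and block reads for that SSTable, which is where the economics change.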
The implementation below is deliberately small, but it includes the essential pieces: a MemTable, immutable SSTables, tombstones for deletes, per-SSTable Bloom filters, point lookups from newest to oldest, and a compacting merge that rewrites several files into one cleaner run.