Note #3•4 min read

LSM Trees: The Log-Structured Revolution

Modern storage engines must balance durability, write performance, read efficiency, and predictable behavior under load. Traditional B-Tree architectures excel at indexed lookups but struggle under high write throughput or when operating on hardware with limited I/O bandwidth. Log-Structured Merge Trees (LSM Trees) emerged as an alternative design that rethinks the write path entirely.

Instead of modifying data structures in place, LSM-based engines embrace an append-only philosophy: writes are first accumulated in memory, then flushed to disk as immutable files. This simple shift enables engines like Bigtable, LevelDB, and RocksDB to sustain massive ingestion rates while remaining CPU-efficient and resilient across diverse environments.

This note examines the principles behind LSM Trees, their architecture, how they differ from B-Trees, and why they have become foundational to modern storage systems.

The Log-Structured Idea

At the heart of an LSM Tree is a surprisingly elegant principle: never overwrite data on disk.
In place of random writes, the engine performs sequential appends—an operation that hardware handles far more efficiently.

When a write arrives:


Write → Memtable (in-memory structure)
│
└─ flush → immutable SSTable on disk

The memtable acts as a fast buffer, indexed for quick access. When it fills up, it is flushed into an immutable file format known as an SSTable (Sorted String Table). Because SSTables are sorted and append-only, the engine avoids costly random writes and eliminates the risk of in-place corruption.

This architecture is inherently CPU-first: it leverages inexpensive memory and sequential disk bandwidth rather than relying on low-latency random I/O.

From Memtables to SSTables: Building the Storage Pyramid

An LSM engine organizes data across multiple layers:

        +----------------------+
        |       Memtable       |  (in-memory, fast writes)
        +----------------------+
                 │ flush
                 ▼
        +----------------------+
        |     SSTable Level 0  |  (small, recent files)
        +----------------------+
                 │ compaction
                 ▼
        +----------------------+
        |     SSTable Level 1  |
        +----------------------+
                 │ ...

Memtable

A balanced tree or skiplist storing recent writes. Fast, mutable, and easily flushed.

SSTables

Immutable, sorted disk files. They enable efficient merging, sequential reads, and compact storage layouts.
Each file contains:

key-value blocks,
indexes to locate blocks,
bloom filters to skip irrelevant files.

By keeping files immutable, the engine avoids complex concurrency issues and simplifies crash recovery: if the process dies, the WAL + flush logic reconstructs a valid state.

Compaction: The Hidden Engine Room

Without maintenance, SSTables would accumulate indefinitely, making reads inefficient. LSM engines solve this with compaction, a background process that merges files, removes outdated keys, and maintains sorted invariants.

A simplified view:


L0 SSTables
│
├─→ Compaction → merged SSTables in L1
│
└─→ obsolete files removed

Compaction ensures:

bounded read amplification,
garbage collection of overwritten/deleted keys,
predictable file sizes and disk layout.

Yet compaction is also the main trade-off of LSM systems: while it improves long-term efficiency, it consumes CPU and I/O, which must be carefully regulated.

LSM vs. B-Tree: Two Philosophies

Although both aim to index large datasets efficiently, they rely on opposing strategies:

Aspect	B-Tree	LSM Tree
Write path	Random I/O, update in place	Append-only, sequential
Read performance	Strong for point lookups	Good, but may require checking multiple levels
Space management	Requires splitting/merging nodes	Compaction reclaims space
Durability	WAL + page writes	WAL + immutable files
Strength	Predictable reads	Exceptional write throughput
Weakness	Slow on heavy writes	Background compaction cost

Neither architecture is “better”, they are optimized for different workloads and environments. But as systems increasingly require high ingestion rates, cloud-scale storage, or operation under constrained hardware, LSM Trees have become the natural fit.

Conclusion

LSM Trees represent a profound shift in storage-engine design. By embracing immutability and sequential writes, they enable high throughput, graceful scaling, and robustness across a wide range of hardware profiles. Engines like Bigtable, LevelDB, and RocksDB demonstrate that the log-structured idea is not a niche alternative—it is the backbone of many modern data systems.

Understanding LSM Trees provides essential context for interpreting performance characteristics, designing indexing strategies, and evaluating whether a storage engine is suited to a particular environment.

Recommended References

Google — Bigtable: A Distributed Storage System for Structured Data
O’Neil et al. — “The Log-Structured Merge-Tree (LSM-Tree)”
LevelDB Documentation — Design and Implementation
RocksDB Engineering — Compaction, SSTables, and Bloom Filters
Sears & Ramakrishnan — “bLSM: A General Purpose Log Structured Merge Tree”