Note #4•4 min read

SSTables: Immutable Storage, Indexing, and Compaction

Sorted String Tables, or SSTables, are one of the foundational components of log-structured storage engines. Although they may appear simple (just sorted key-value files on disk), their design enables high write throughput, predictable read performance, and efficient compaction across multiple storage levels. SSTables are central not only in systems like LevelDB, RocksDB, and Bigtable, but also in many distributed engines and modern analytical systems.

This note explores how SSTables are structured, how indexing and bloom filters minimize I/O, and why compaction strategies dramatically shape engine performance and latency.

What an SSTable Really Is

An SSTable is an immutable, sorted file storing key-value pairs. Once written, it is never modified in place. This immutability simplifies concurrency, crash recovery, and merging operations, while allowing the engine to store data in optimized layouts.

A conceptual structure looks like this:


+-------------------------------+
|  Data Blocks (sorted KV)      |
|  Data Blocks (sorted KV)      |
+-------------------------------+
|  Index Block (key → offset)   |
+-------------------------------+
|  Bloom Filter                 |
+-------------------------------+
|  Footer (offsets & metadata)  |
+-------------------------------+

Each block is typically compressed and self-contained. The index enables the engine to locate the right block without scanning the entire file. The bloom filter serves as a fast probabilistic check to determine whether a key is possibly in the file.

Together, these pieces allow SSTables to remain both compact and efficient, even at scale.

Minimizing I/O With Bloom Filters

Searching for a key across multiple SSTables could become expensive if the engine had to examine each file. Bloom filters solve this by storing hashed fingerprints of keys.

A bloom filter answers one crucial question:


Is this key definitely not in this SSTable?

If the filter responds no, the engine avoids costly disk reads entirely. If it responds maybe, the engine consults the index and relevant block.

This drastically reduces I/O amplification, especially in multi-level LSM structures where dozens of files may coexist in lower levels.

Bloom filters are probabilistic structures (false positives are allowed, false negatives are not). Making them perfect companions for immutable storage.

Compaction: Maintaining Order and Eliminating Redundancy

As LSM engines flush memtables into new SSTables, older versions of keys accumulate across levels. Compaction is responsible for merging files, discarding obsolete entries, and reorganizing data to maintain bounded read costs.

Two major compaction strategies dominate in modern engines:

Tiered Compaction

Tiered compaction (sometimes called size-tiered or level-0 style) groups SSTables of similar sizes and merges them when thresholds are reached.


Level 0:
[File A] [File B] [File C]
merge ↓
Level 1:
[Merged File]

Characteristics:

Fewer compactions overall
Heavy merges occurring less frequently
Larger write-amplification when merges eventually happen
Good for high write throughput

Tiered systems delay work—but when compaction occurs, it is substantial.

Leveled Compaction

Leveled compaction aims for predictable read performance by keeping each level’s total size bounded. Files are broken into smaller, non-overlapping ranges.

merge ↓
Level 1: [R1][R2][R3]   (non-overlapping ranges)

Characteristics:

Lower read amplification
Higher write amplification due to frequent, incremental merges
More predictable latency for point lookups
Popular in RocksDB due to balanced performance

Leveled systems transform compaction into a continuous background process, trading CPU and I/O for predictable search times.

Performance and Latency Trade-offs

The choice of compaction strategy profoundly affects performance:

Factor	Tiered	Leveled
Write throughput	Higher	Moderate
Read latency	Less predictable	More predictable
Write amplification	Low to moderate	High
Space amplification	Higher	Lower
CPU cost	Bursty	Continuous

SSTables themselves are efficient structures, but it is compaction that defines how they behave over time. Understanding these dynamics is crucial when selecting or tuning a storage engine.

Conclusion

SSTables embody the core principles of log-structured design: immutability, sorted data, and optimized access paths. With help from bloom filters and carefully designed compaction strategies, they allow storage engines to achieve high throughput while maintaining acceptable read performance.

While their simplicity is attractive, their long-term behavior is governed by compaction—an essential yet costly process that shapes performance, latency, and resource consumption. Mastering SSTables means understanding both their internal structure and the algorithms that keep them manageable.

Recommended References

Google — Bigtable: A Distributed Storage System for Structured Data
LevelDB — SSTable File Format
RocksDB — Leveled vs Tiered Compaction
O’Neil et al. — The Log-Structured Merge-Tree (LSM-Tree)
Sears & Ramakrishnan — bLSM: A General Purpose Log Structured Merge Tree