SSTables: Immutable Storage, Indexing, and Compaction
Sorted String Tables, or SSTables, are one of the foundational components of log-structured storage engines. Although they may appear simple—just sorted key-value files on disk—their design enables high write throughput, predictable read performance, and efficient compaction across multiple storage levels. SSTables are central not only in systems like LevelDB, RocksDB, and Bigtable, but also in many distributed engines and modern analytical systems.
This note explores how SSTables are structured, how indexing and bloom filters minimize I/O, and why compaction strategies dramatically shape engine performance and latency.
What an SSTable Really Is
An SSTable is an immutable, sorted file storing key-value pairs. Once written, it is never modified in place. This immutability simplifies concurrency, crash recovery, and merging operations, while allowing the engine to store data in optimized layouts.
A conceptual structure looks like this:
+-------------------------------+
| Data Blocks (sorted KV) |
| Data Blocks (sorted KV) |
+-------------------------------+
| Index Block (key → offset) |
+-------------------------------+
| Bloom Filter |
+-------------------------------+
| Footer (offsets & metadata) |
+-------------------------------+
Each block is typically compressed and self-contained. The index enables the engine to locate the right block without scanning the entire file. The bloom filter serves as a fast probabilistic check to determine whether a key is possibly in the file.
Together, these pieces allow SSTables to remain both compact and efficient, even at scale.
Minimizing I/O With Bloom Filters
Searching for a key across multiple SSTables could become expensive if the engine had to examine each file. Bloom filters solve this by storing hashed fingerprints of keys.
A bloom filter answers one crucial question:
Is this key definitely not in this SSTable?
If the filter responds no, the engine avoids costly disk reads entirely. If it responds maybe, the engine consults the index and relevant block.
This drastically reduces I/O amplification, especially in multi-level LSM structures where dozens of files may coexist in lower levels.
Bloom filters are probabilistic structures—false positives are allowed, false negatives are not—making them perfect companions for immutable storage.
Compaction: Maintaining Order and Eliminating Redundancy
As LSM engines flush memtables into new SSTables, older versions of keys accumulate across levels. Compaction is responsible for merging files, discarding obsolete entries, and reorganizing data to maintain bounded read costs.
Two major compaction strategies dominate in modern engines:
Tiered Compaction
Tiered compaction (sometimes called size-tiered or level-0 style) groups SSTables of similar sizes and merges them when thresholds are reached.
Level 0:
[File A] [File B] [File C]
merge ↓
Level 1:
[Merged File]
Characteristics:
- Fewer compactions overall
- Heavy merges occurring less frequently
- Larger write-amplification when merges eventually happen
- Good for high write throughput
Tiered systems delay work—but when compaction occurs, it is substantial.
Leveled Compaction
Leveled compaction aims for predictable read performance by keeping each level’s total size bounded. Files are broken into smaller, non-overlapping ranges.
merge ↓
Level 1: [R1][R2][R3] (non-overlapping ranges)Characteristics:
- Lower read amplification
- Higher write amplification due to frequent, incremental merges
- More predictable latency for point lookups
- Popular in RocksDB due to balanced performance
Leveled systems transform compaction into a continuous background process, trading CPU and I/O for predictable search times.
Performance and Latency Trade-offs
The choice of compaction strategy profoundly affects performance:
| Factor | Tiered | Leveled |
|---|---|---|
| Write throughput | Higher | Moderate |
| Read latency | Less predictable | More predictable |
| Write amplification | Low to moderate | High |
| Space amplification | Higher | Lower |
| CPU cost | Bursty | Continuous |
SSTables themselves are efficient structures, but it is compaction that defines how they behave over time. Understanding these dynamics is crucial when selecting or tuning a storage engine.
Conclusion
SSTables embody the core principles of log-structured design: immutability, sorted data, and optimized access paths. With help from bloom filters and carefully designed compaction strategies, they allow storage engines to achieve high throughput while maintaining acceptable read performance.
While their simplicity is attractive, their long-term behavior is governed by compaction—an essential yet costly process that shapes performance, latency, and resource consumption. Mastering SSTables means understanding both their internal structure and the algorithms that keep them manageable.
Recommended References
- Google — Bigtable: A Distributed Storage System for Structured Data
- LevelDB — SSTable File Format
- RocksDB — Leveled vs Tiered Compaction
- O’Neil et al. — The Log-Structured Merge-Tree (LSM-Tree)
- Sears & Ramakrishnan — bLSM: A General Purpose Log Structured Merge Tree