1. Overall Vision of V1
V1 establishes the foundations of CortexDB: a persistent, crash-safe key-value storage engine built around a minimal LSM-tree architecture.
Main objectives:
- Persistent key to value storage on disk
- Minimal API:
put(key, value),get(key),delete(key) - Single process, single file (or directory) on disk
- Tolerate simple crashes thanks to the WAL
- Minimal LSM-tree architecture
2. V1 Non-Goals
- No compaction
- No Bloom filters
- No vector search
- No range queries
- No replication
- No SQL
- No advanced performance optimization
- No compression
- No multi-threading
- No external clients (Python/Node), postponed to V3
3. V1 Architecture
3.1. Main Components
┌───────────┐
│ Client │
└─────┬─────┘
│ put(key, value)
▼
┌───────────────────┐
│ WAL │
│ (append-only log)│
└───────┬───────────┘
│ replay
▼
┌───────────────────┐
│ Memtable │
│ (HashMap<Vec<u8>, │
│ Vec<u8>>) │
└───────────────────┘
3.2. Modules
src/db.rs:DBstruct and public APIsrc/memtable.rs: in-memory structure (HashMap) for recent writessrc/wal.rs: append-only log file to replay the memtable state after a crashsrc/sstable.rs(optional in V1): can be postponed to V2 for simplicity
4. WAL (Write-Ahead Log)
4.1. Simple Format
Binary format of an entry:
+----------+-----------+------------+-----------+-----------+
| op (u8) | key_len | val_len | key bytes | val bytes |
| | (u32) | (u32) | | |
+----------+-----------+------------+-----------+-----------+
op = 1→ PUTop = 2→ DELETE
4.2. V1 Rules
- The WAL must be flushed after each append()
- The WAL must be replayed strictly in order at startup
- DELETE only stores the key (
val_len = 0) - Empty values are allowed
- If the WAL is corrupted, use a simple strategy (stop or partial skip)
5. Memtable
In-memory structure (HashMap) for recent writes.
- Stores key and value pairs in RAM
- A
Nonevalue represents a tombstone (deletion) - Unbounded size in V1, no automatic flush
6. External API V1
In pseudo-code:
db = DB.open(path="./data")
db.put("user:1", b"Alice")
value = db.get("user:1")
db.delete("user:1")7. V1 Invariants
- Every write must go through the WAL before reaching the Memtable
- The Memtable is always fully reconstructed from the WAL at startup
- A newer key always overrides an older one
- A tombstone overrides all previous versions of a key
- The engine must be able to restart after a crash without losing consistency
8. Engine State After Crash
- Full WAL replay
- Memtable reconstruction
- The engine is ready to serve reads
9. Local Server
Note: The local server and Python or Node clients are postponed to V3 in order to stabilize the API first. V1 focuses exclusively on the core engine.
10. Required Tests for V1
WAL:
- Simple replay
- Replay after reopen
- Empty keys or empty values
- Large values
- Delete
- Crash recovery
Memtable:
- Put, get, delete
- Tombstones
- Reconstruction from WAL
Integration:
- put → get → delete → get
- crash → reopen → get (must retrieve the data)
11. V1 Objective
At the end of V1, CortexDB is:
- Functional: basic API operational
- Persistent: data stored on disk
- Crash-safe: recovery after crash via WAL
- Simple: minimal and understandable architecture
This is the foundation on which all subsequent versions will be built.