CortexDB v1

1. Overall Vision of V1

V1 establishes the foundations of CortexDB: a persistent, crash-safe key-value storage engine built around a minimal LSM-tree architecture.

Main objectives:

Persistent key to value storage on disk
Minimal API: put(key, value), get(key), delete(key)
Single process, single file (or directory) on disk
Tolerate simple crashes thanks to the WAL
Minimal LSM-tree architecture

2. V1 Non-Goals

No compaction
No Bloom filters
No vector search
No range queries
No replication
No SQL
No advanced performance optimization
No compression
No multi-threading
No external clients (Python/Node), postponed to V3

3. V1 Architecture

3.1. Main Components

┌───────────┐
│   Client  │
└─────┬─────┘
      │  put(key, value)
      ▼
┌───────────────────┐
│       WAL         │
│  (append-only log)│
└───────┬───────────┘
        │ replay
        ▼
┌───────────────────┐
│     Memtable      │
│ (HashMap<Vec<u8>, │
│      Vec<u8>>)    │
└───────────────────┘

3.2. Modules

src/db.rs: DB struct and public API
src/memtable.rs: in-memory structure (HashMap) for recent writes
src/wal.rs: append-only log file to replay the memtable state after a crash
src/sstable.rs (optional in V1): can be postponed to V2 for simplicity

4. WAL (Write-Ahead Log)

4.1. Simple Format

Binary format of an entry:

+----------+-----------+------------+-----------+-----------+
| op (u8)  | key_len   | val_len    | key bytes | val bytes |
|          | (u32)     | (u32)      |           |           |
+----------+-----------+------------+-----------+-----------+

op = 1 → PUT
op = 2 → DELETE

4.2. V1 Rules

The WAL must be flushed after each append()
The WAL must be replayed strictly in order at startup
DELETE only stores the key (val_len = 0)
Empty values are allowed
If the WAL is corrupted, use a simple strategy (stop or partial skip)

5. Memtable

In-memory structure (HashMap) for recent writes.

Stores key and value pairs in RAM
A None value represents a tombstone (deletion)
Unbounded size in V1, no automatic flush

6. External API V1

In pseudo-code:

db = DB.open(path="./data")
 
db.put("user:1", b"Alice")
value = db.get("user:1")
db.delete("user:1")

7. V1 Invariants

Every write must go through the WAL before reaching the Memtable
The Memtable is always fully reconstructed from the WAL at startup
A newer key always overrides an older one
A tombstone overrides all previous versions of a key
The engine must be able to restart after a crash without losing consistency

8. Engine State After Crash

Full WAL replay
Memtable reconstruction
The engine is ready to serve reads

9. Local Server

Note: The local server and Python or Node clients are postponed to V3 in order to stabilize the API first. V1 focuses exclusively on the core engine.

10. Required Tests for V1

WAL:

Simple replay
Replay after reopen
Empty keys or empty values
Large values
Delete
Crash recovery

Memtable:

Put, get, delete
Tombstones
Reconstruction from WAL

Integration:

put → get → delete → get
crash → reopen → get (must retrieve the data)

11. V1 Objective

At the end of V1, CortexDB is:

Functional: basic API operational
Persistent: data stored on disk
Crash-safe: recovery after crash via WAL
Simple: minimal and understandable architecture

This is the foundation on which all subsequent versions will be built.