CortexDB is built as a layered system evolving through explicit architectural steps.
It does not attempt to solve every database problem at once.
It introduces complexity progressively, under constraint.
The architecture is structured around four pillars:
- A KV-first storage core
- A structured query surface (CortexQL)
- A CPU-first vector layer
- A pragmatic distributed model
Implementation Stack
CortexDB is implemented in Rust.
The choice is architectural rather than ideological.
Rust provides:
- Deterministic memory control
- Strong safety guarantees without a garbage collector
- Fine-grained concurrency control
- Predictable performance characteristics
For a storage engine that prioritizes crash safety, resource constraints, and explicit invariants, Rust aligns naturally with the design philosophy.
The stack is intentionally minimal and system-oriented.
1. KV-First Storage Core
At its foundation, CortexDB is a persistent LSM-based key-value engine.
Core components:
- Write-Ahead Log (crash safety)
- Memtable (in-memory writes)
- SSTables (immutable disk structures)
- Compaction (read/write amplification control)
The engine prioritizes:
- determinism over hidden heuristics
- predictable memory usage
- explicit performance trade-offs
Everything else builds on this layer.
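The write path implied by these components can be sketched as follows. This is a minimal illustration, not CortexDB's actual code: the names `Engine`, `WalEntry`, `recover`, and `flush` are hypothetical, and an in-memory `Vec` stands in for the on-disk log so the example stays self-contained.

```rust
use std::collections::BTreeMap;

/// Hypothetical log record: one entry per write.
#[derive(Debug, Clone, PartialEq)]
enum WalEntry {
    Put { key: Vec<u8>, value: Vec<u8> },
    Delete { key: Vec<u8> },
}

/// Toy engine. A real WAL would be an fsync'ed file; a Vec stands in here.
#[derive(Default)]
struct Engine {
    wal: Vec<WalEntry>,
    /// Sorted in-memory writes. `None` marks a tombstone (deleted key).
    memtable: BTreeMap<Vec<u8>, Option<Vec<u8>>>,
}

impl Engine {
    /// Log first, then apply to the memtable.
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.wal.push(WalEntry::Put { key: key.to_vec(), value: value.to_vec() });
        self.memtable.insert(key.to_vec(), Some(value.to_vec()));
    }

    /// Deletes are writes too: they append a tombstone.
    fn delete(&mut self, key: &[u8]) {
        self.wal.push(WalEntry::Delete { key: key.to_vec() });
        self.memtable.insert(key.to_vec(), None);
    }

    fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.memtable.get(key).and_then(|v| v.as_deref())
    }

    /// Crash recovery: replay the log in order to rebuild the memtable.
    fn recover(wal: Vec<WalEntry>) -> Self {
        let mut memtable = BTreeMap::new();
        for entry in &wal {
            match entry {
                WalEntry::Put { key, value } => {
                    memtable.insert(key.clone(), Some(value.clone()));
                }
                WalEntry::Delete { key } => {
                    memtable.insert(key.clone(), None);
                }
            }
        }
        Engine { wal, memtable }
    }

    /// Flush: drain the memtable into a sorted, immutable run
    /// (a stand-in for an SSTable) and truncate the log.
    fn flush(&mut self) -> Vec<(Vec<u8>, Option<Vec<u8>>)> {
        self.wal.clear();
        std::mem::take(&mut self.memtable).into_iter().collect()
    }
}
```

Because every write reaches the log before the memtable, replaying the log after a crash reproduces the same in-memory state. Tombstones are kept through the flush so that a later compaction can drop older values of deleted keys.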
2. Structured Query Surface
CortexDB introduces structure without becoming relational.
The system remains KV-dominant, but progressively adds:
- Prefix and range scans
- Scoped transactions
- TTL semantics
- JSON document conventions
- Secondary indexing (later stage)
CortexQL executes through a pipeline model:
SCAN → FILTER → PROJECT → AGGREGATE → LIMIT
There is no global SQL planner.
No JOIN engine.
No relational abstraction layer.
Each operator maps directly to internal primitives.
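To illustrate how each operator can map to a primitive, the sketch below expresses the pipeline as a chain of Rust iterator adapters over an ordered map. It is an assumption-laden example, not CortexQL itself: the `User` record, the `scan` helper, and `count_by_city` are hypothetical, and a `BTreeMap` stands in for the storage engine's ordered key space.

```rust
use std::collections::BTreeMap;

/// Hypothetical document stored under keys such as "user:1".
#[derive(Debug, Clone)]
struct User {
    age: u32,
    city: String,
}

/// SCAN: ordered iteration over all keys sharing a prefix.
fn scan<'a>(
    store: &'a BTreeMap<String, User>,
    prefix: &'a str,
) -> impl Iterator<Item = (&'a String, &'a User)> + 'a {
    store
        .range(prefix.to_string()..)
        .take_while(move |(k, _)| k.starts_with(prefix))
}

/// SCAN -> FILTER -> PROJECT -> AGGREGATE -> LIMIT, one adapter per stage.
fn count_by_city(
    store: &BTreeMap<String, User>,
    prefix: &str,
    min_age: u32,
    limit: usize,
) -> Vec<(String, usize)> {
    let mut groups: BTreeMap<String, usize> = BTreeMap::new();
    scan(store, prefix)                                   // SCAN
        .filter(|(_, u)| u.age >= min_age)                // FILTER
        .map(|(_, u)| u.city.as_str())                    // PROJECT
        .for_each(|city| {
            *groups.entry(city.to_string()).or_insert(0) += 1; // AGGREGATE
        });
    groups.into_iter().take(limit).collect()              // LIMIT
}
```

No stage needs a planner: the scan is a range read over sorted keys, and every later stage consumes the previous one lazily until the aggregate materializes its groups.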
3. CPU-First Vector Layer
Vector search is a first-class capability, not an external add-on.
CortexDB introduces:
- Dedicated vector collections
- Approximate nearest neighbor indexing (CPU-oriented)
- Quantization for memory reduction
- Hybrid and filtered vector search (progressively)
The design assumes:
- CPU baseline
- constrained memory environments
- persistence compatibility with the storage engine
It does not assume GPU infrastructure.
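As an illustration of quantization for memory reduction, the sketch below uses per-vector 8-bit scalar quantization. The source does not say which scheme CortexDB uses, so this choice is an assumption, and the names `Quantized`, `quantize`, and `dequantize` are hypothetical.

```rust
/// One vector stored as 8-bit codes plus the parameters needed to decode it.
struct Quantized {
    codes: Vec<u8>,
    min: f32,
    scale: f32,
}

/// Map each f32 component onto 0..=255 within the vector's own [min, max] range.
fn quantize(v: &[f32]) -> Quantized {
    let min = v.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = v.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    // A constant (or empty) vector has no range; any non-zero scale works.
    let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
    let codes = v
        .iter()
        .map(|x| ((x - min) / scale).round() as u8)
        .collect();
    Quantized { codes, min, scale }
}

/// Approximate reconstruction of the original components.
fn dequantize(q: &Quantized) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + c as f32 * q.scale).collect()
}
```

Each component shrinks from 4 bytes to 1, at the cost of a reconstruction error of at most about half a quantization step (`scale / 2`) per component. That trade fits a CPU baseline with constrained memory, since the codes can also be persisted as plain byte arrays.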
4. Replication & Offline Model
Replication is introduced pragmatically:
- Leader/follower model
- Asynchronous WAL-based replication
- Offline-first synchronization patterns
- Conflict resolution strategies for temporary divergence
The system favors convergence and operability over global coordination complexity.
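A minimal sketch of asynchronous, log-based leader/follower replication is shown below. It assumes a pull model in which the follower asks for entries after its last applied sequence number. The types `Leader`, `Follower`, and `LogEntry` are hypothetical, and the real wire protocol is not described in this document.

```rust
use std::collections::BTreeMap;

/// Hypothetical replicated log record, ordered by sequence number.
#[derive(Debug, Clone)]
struct LogEntry {
    seq: u64,
    key: String,
    value: String,
}

#[derive(Default)]
struct Leader {
    log: Vec<LogEntry>,
    state: BTreeMap<String, String>,
}

impl Leader {
    /// Append to the log, apply locally, and return without waiting for followers.
    fn write(&mut self, key: &str, value: &str) -> u64 {
        let seq = self.log.len() as u64 + 1;
        self.log.push(LogEntry { seq, key: key.to_string(), value: value.to_string() });
        self.state.insert(key.to_string(), value.to_string());
        seq
    }

    /// Entries with a sequence number strictly greater than `seq`.
    fn entries_after(&self, seq: u64) -> &[LogEntry] {
        let start = (seq as usize).min(self.log.len());
        &self.log[start..]
    }
}

#[derive(Default)]
struct Follower {
    applied: u64,
    state: BTreeMap<String, String>,
}

impl Follower {
    /// Apply entries in order. Duplicates are skipped and a gap stops the batch,
    /// so retrying a pull is always safe.
    fn apply(&mut self, entries: &[LogEntry]) {
        for e in entries {
            if e.seq <= self.applied {
                continue; // already applied
            }
            if e.seq != self.applied + 1 {
                break; // missing entries: wait for the next pull
            }
            self.state.insert(e.key.clone(), e.value.clone());
            self.applied = e.seq;
        }
    }
}
```

Between pulls the follower may serve stale data. That window is the temporary divergence an asynchronous model accepts, and it closes once the follower has applied the leader's full log.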
5. Observability & Operational Discipline
CortexDB evolves into an operable system through:
- Integrated metrics
- Structured logging
- Query tracing
- Explicit introspection commands
Operational visibility is considered part of the architecture, not an afterthought.
6. Distributed Model (Sharding)
Horizontal scalability is introduced late.
Design principles:
- A shard is an autonomous CortexDB instance
- Strong guarantees per shard
- No global 2PC
- No mandatory global consensus
A coordinator routes requests across shards:
Client → Coordinator → Shards
Distributed vector search follows scatter–gather patterns.
Consensus is optional and local (per shard), not global.
This avoids premature distributed complexity.
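The coordinator's two jobs, routing a key to one shard and fanning a vector query out to all of them, can be sketched as follows. Hash-based placement (FNV-1a here) and brute-force per-shard search are assumptions made to keep the example short. The names `shard_for`, `Shard`, `Hit`, and `scatter_gather` are hypothetical.

```rust
/// Route a key to a shard with a 64-bit FNV-1a hash (stable across runs).
fn shard_for(key: &[u8], shard_count: usize) -> usize {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in key {
        h ^= *b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    (h % shard_count as u64) as usize
}

#[derive(Debug, Clone, PartialEq)]
struct Hit {
    id: u64,
    distance: f32,
}

/// Each shard owns its vectors and answers queries on its own.
struct Shard {
    vectors: Vec<(u64, Vec<f32>)>,
}

fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

impl Shard {
    /// Local top-k by exact scan; a real shard would consult its ANN index.
    fn top_k(&self, query: &[f32], k: usize) -> Vec<Hit> {
        let mut hits: Vec<Hit> = self
            .vectors
            .iter()
            .map(|(id, v)| Hit { id: *id, distance: l2_squared(query, v) })
            .collect();
        hits.sort_by(|a, b| a.distance.total_cmp(&b.distance));
        hits.truncate(k);
        hits
    }
}

/// Scatter the query to every shard, gather the local top-k lists,
/// and keep the k globally closest hits.
fn scatter_gather(shards: &[Shard], query: &[f32], k: usize) -> Vec<Hit> {
    let mut all: Vec<Hit> = shards.iter().flat_map(|s| s.top_k(query, k)).collect();
    all.sort_by(|a, b| a.distance.total_cmp(&b.distance));
    all.truncate(k);
    all
}
```

Asking each shard for its own top k is enough for the merge to be correct with respect to the per-shard results, because no hit outside a shard's local top k can appear in the global top k. The shards never talk to each other, which is what keeps the coordinator free of global consensus.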
Concurrency Model
CortexDB introduces concurrency progressively.
Within a single instance:
- Writes are durably ordered through the WAL.
- Reads are optimized but remain aligned with deterministic write sequencing.
- Background tasks (compaction, indexing) operate independently.
Transactions are scoped and local.
There is no global MVCC layer and no distributed transaction model.
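One way to picture a scoped, local transaction is as a buffered write batch that commits under a single sequence number. The sketch below rests on two assumptions the source does not confirm: that "scoped" means restricted to a key prefix, and that a commit applies all buffered writes together. `Store`, `Txn`, and their methods are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Single-instance store; every committed value carries its commit sequence.
#[derive(Default)]
struct Store {
    next_seq: u64,
    data: BTreeMap<String, (u64, String)>,
}

/// Buffered writes restricted to one key prefix (the assumed "scope").
struct Txn {
    prefix: String,
    writes: Vec<(String, String)>,
}

impl Txn {
    fn new(prefix: &str) -> Self {
        Txn { prefix: prefix.to_string(), writes: Vec::new() }
    }

    /// Reject keys outside the scope; nothing touches the store yet.
    fn put(&mut self, key: &str, value: &str) -> Result<(), String> {
        if !key.starts_with(self.prefix.as_str()) {
            return Err(format!("key {key} is outside scope {}", self.prefix));
        }
        self.writes.push((key.to_string(), value.to_string()));
        Ok(())
    }
}

impl Store {
    /// Apply the whole batch under one sequence number.
    /// Dropping a `Txn` without committing leaves the store untouched.
    fn commit(&mut self, txn: Txn) -> u64 {
        self.next_seq += 1;
        let seq = self.next_seq;
        for (k, v) in txn.writes {
            self.data.insert(k, (seq, v));
        }
        seq
    }
}
```

Because the batch stays within one instance and one sequence number, it needs neither a version chain per key nor coordination with other nodes.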
In sharded deployments:
- Guarantees apply per shard.
- There is no global 2PC.
- No mandatory global consensus.
Concurrency expands layer by layer:
Local determinism → Scoped transactions → Replication → Shards.
Architectural Stance
CortexDB is defined less by features than by decisions:
- KV remains dominant
- Documents are conventions, not a separate engine
- Vector search is integrated, not bolted on
- Distribution is progressive, not ideological
- Non-objectives are explicit
The system evolves from V1 to V7 through controlled expansion, never by collapsing layers into abstraction.
For detailed iteration design, see Versions & Roadmap.