Mohamed KEITA

Architecture

CortexDB is built as a layered system evolving through explicit architectural steps.

It does not attempt to solve every database problem at once.
It introduces complexity progressively, under constraint.

The architecture is structured around four pillars:

  1. A KV-first storage core
  2. A structured query surface (CortexQL)
  3. A CPU-first vector layer
  4. A pragmatic distributed model

Implementation Stack

CortexDB is implemented in Rust.

The choice is architectural rather than ideological.

Rust provides:

  • Deterministic memory control
  • Strong safety guarantees without a garbage collector
  • Fine-grained concurrency control
  • Predictable performance characteristics

For a storage engine that prioritizes crash safety, resource constraints, and explicit invariants, Rust aligns naturally with the design philosophy.

The stack is intentionally minimal and system-oriented.


1. KV-First Storage Core

At its foundation, CortexDB is a persistent LSM-based key-value engine.

Core components:

  • Write-Ahead Log (crash safety)
  • Memtable (in-memory writes)
  • SSTables (immutable disk structures)
  • Compaction (read/write amplification control)

The engine prioritizes:

  • Determinism over hidden heuristics
  • Predictable memory usage
  • Explicit performance trade-offs

Everything else builds on this layer.
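As an illustration, the interaction between these components can be sketched in a few lines of Rust. This is a simplified model, not CortexDB's actual code: the names (`WriteAheadLog`, `Memtable`, `SsTable`, `Engine`) are assumptions for readability, and the in-memory WAL stands in for what is really an fsync'd file.

```rust
use std::collections::BTreeMap;

// Illustrative sketch of the LSM write path: WAL append first, then memtable
// insert, with a flush to an immutable SSTable past a size threshold.
struct WriteAheadLog {
    records: Vec<(Vec<u8>, Vec<u8>)>, // a real engine appends to an fsync'd file
}

impl WriteAheadLog {
    fn append(&mut self, key: &[u8], value: &[u8]) {
        self.records.push((key.to_vec(), value.to_vec()));
    }
}

struct Memtable {
    entries: BTreeMap<Vec<u8>, Vec<u8>>, // kept sorted for cheap SSTable flushes
}

struct SsTable {
    entries: Vec<(Vec<u8>, Vec<u8>)>, // immutable and sorted once written
}

struct Engine {
    wal: WriteAheadLog,
    memtable: Memtable,
    sstables: Vec<SsTable>,
    flush_threshold: usize,
}

impl Engine {
    fn new(flush_threshold: usize) -> Self {
        Engine {
            wal: WriteAheadLog { records: Vec::new() },
            memtable: Memtable { entries: BTreeMap::new() },
            sstables: Vec::new(),
            flush_threshold,
        }
    }

    // Durability first: the WAL is appended before the memtable is touched.
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.wal.append(key, value);
        self.memtable.entries.insert(key.to_vec(), value.to_vec());
        if self.memtable.entries.len() >= self.flush_threshold {
            self.flush();
        }
    }

    // Freeze the memtable into an immutable, sorted SSTable.
    fn flush(&mut self) {
        let frozen = std::mem::take(&mut self.memtable.entries);
        self.sstables.push(SsTable { entries: frozen.into_iter().collect() });
    }

    // Reads check the memtable first, then SSTables newest-to-oldest.
    fn get(&self, key: &[u8]) -> Option<&[u8]> {
        if let Some(v) = self.memtable.entries.get(key) {
            return Some(v.as_slice());
        }
        for table in self.sstables.iter().rev() {
            if let Ok(i) = table
                .entries
                .binary_search_by(|(k, _)| k.as_slice().cmp(key))
            {
                return Some(table.entries[i].1.as_slice());
            }
        }
        None
    }
}
```

Compaction is deliberately omitted here; in the real engine it merges overlapping SSTables in the background to keep read and write amplification bounded.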


2. Structured Query Surface

CortexDB introduces structure without becoming relational.

The system remains KV-dominant, but progressively adds:

  • Prefix and range scans
  • Scoped transactions
  • TTL semantics
  • JSON document conventions
  • Secondary indexing (later stage)

CortexQL executes through a pipeline model:


SCAN → FILTER → PROJECT → AGGREGATE → LIMIT

There is no global SQL planner.
No JOIN engine.
No relational abstraction layer.

Each operator maps directly to internal primitives.
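Because each operator maps to a primitive, a CortexQL query can be pictured as plain iterator composition. The Rust below is a hypothetical illustration, not the actual operator code; `Row`, `names_over`, and `count_over` are invented names:

```rust
// Each pipeline stage corresponds to one iterator adapter; there is no
// planner choosing among alternative strategies.
struct Row {
    user: String,
    age: u32,
}

// SCAN → FILTER → PROJECT → LIMIT, for a row-returning query.
fn names_over(rows: &[Row], min_age: u32, limit: usize) -> Vec<String> {
    rows.iter()                           // SCAN
        .filter(|r| r.age >= min_age)     // FILTER
        .map(|r| r.user.clone())          // PROJECT
        .take(limit)                      // LIMIT
        .collect()
}

// SCAN → FILTER → AGGREGATE, for a scalar-returning query.
fn count_over(rows: &[Row], min_age: u32) -> usize {
    rows.iter()
        .filter(|r| r.age >= min_age)
        .count()                          // AGGREGATE
}
```

The absence of a planner is the point: the cost of a query is readable directly from the pipeline it compiles to.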


3. CPU-First Vector Layer

Vector search is a first-class capability, not an external add-on.

CortexDB introduces:

  • Dedicated vector collections
  • Approximate nearest neighbor indexing (CPU-oriented)
  • Quantization for memory reduction
  • Hybrid and filtered vector search (progressively)

The design assumes:

  • A CPU baseline
  • Constrained memory environments
  • Persistence compatibility with the storage engine

It does not assume GPU infrastructure.
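A minimal sketch of the CPU-first baseline, assuming an exact (non-approximate) scan: an ANN index such as a graph or cluster structure would replace the linear pass, but the interface shape stays the same. `cosine` and `top_k` are illustrative names, not CortexDB API:

```rust
// Exact nearest-neighbour search by cosine similarity over an in-memory
// collection. Pure CPU: no SIMD intrinsics, no accelerator assumptions.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

// Score every vector, sort by similarity, keep the k best ids.
fn top_k(query: &[f32], vectors: &[(u64, Vec<f32>)], k: usize) -> Vec<u64> {
    let mut scored: Vec<(u64, f32)> = vectors
        .iter()
        .map(|(id, v)| (*id, cosine(query, v)))
        .collect();
    // Highest similarity first; total_cmp gives a total order over floats.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(id, _)| id).collect()
}
```

Quantization fits the same shape: storing compressed codes instead of `Vec<f32>` trades recall for memory, which matters under the constrained-memory assumption above.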


4. Replication & Offline Model

Replication is introduced pragmatically:

  • Leader/follower model
  • Asynchronous WAL-based replication
  • Offline-first synchronization patterns
  • Conflict resolution strategies for temporary divergence

The system favors convergence and operability over global coordination complexity.
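Asynchronous WAL-based replication can be pictured as a pull loop: the follower requests records past its applied offset and replays them in order. The names here (`Leader`, `Follower`, `WalRecord`) are illustrative assumptions, not the actual protocol:

```rust
// Sketch of asynchronous WAL shipping. The follower tracks how far it has
// applied and pulls only the delta; replay is ordered by sequence number,
// which is what makes re-syncing after an offline period safe.
#[derive(Clone)]
struct WalRecord {
    seq: u64,
    key: String,
    value: String,
}

struct Leader {
    wal: Vec<WalRecord>,
}

impl Leader {
    fn records_after(&self, offset: u64) -> Vec<WalRecord> {
        self.wal.iter().filter(|r| r.seq > offset).cloned().collect()
    }
}

struct Follower {
    applied_offset: u64,
    store: std::collections::HashMap<String, String>,
}

impl Follower {
    // Apply any records past our offset; calling this repeatedly converges
    // the follower toward the leader without global coordination.
    fn sync_from(&mut self, leader: &Leader) {
        for rec in leader.records_after(self.applied_offset) {
            self.applied_offset = rec.seq;
            self.store.insert(rec.key, rec.value);
        }
    }
}
```

The offline-first pattern falls out naturally: a disconnected follower simply has a stale offset, and the next sync applies exactly the records it missed.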


5. Observability & Operational Discipline

CortexDB evolves into an operable system through:

  • Integrated metrics
  • Structured logging
  • Query tracing
  • Explicit introspection commands

Operational visibility is considered part of the architecture. Not an afterthought.
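As a sketch of what metrics-as-architecture can mean in practice, the counters below are exposed through an explicit snapshot call rather than log scraping. The struct and field names are assumptions, not CortexDB's actual metric set:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// A handful of engine counters, updated lock-free from the hot path and
// read through one explicit introspection call.
#[derive(Default)]
struct Metrics {
    puts: AtomicU64,
    gets: AtomicU64,
    compactions: AtomicU64,
}

impl Metrics {
    fn record_put(&self) {
        self.puts.fetch_add(1, Ordering::Relaxed);
    }

    fn record_get(&self) {
        self.gets.fetch_add(1, Ordering::Relaxed);
    }

    // An introspection command returns a stable snapshot: (puts, gets,
    // compactions), so operators query state rather than parse logs.
    fn snapshot(&self) -> (u64, u64, u64) {
        (
            self.puts.load(Ordering::Relaxed),
            self.gets.load(Ordering::Relaxed),
            self.compactions.load(Ordering::Relaxed),
        )
    }
}
```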


6. Distributed Model (Sharding)

Horizontal scalability is introduced late.

Design principles:

  • A shard is an autonomous CortexDB instance
  • Strong guarantees per shard
  • No global 2PC
  • No mandatory global consensus

A coordinator routes requests across shards:


Client → Coordinator → Shards

Distributed vector search follows scatter–gather patterns.

Consensus is optional and local (per shard), not global.

This avoids premature distributed complexity.
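The coordinator's two jobs, routing single-key requests and scatter-gathering multi-shard queries, can be sketched as follows. The FNV-1a hash and the merge strategy are illustrative assumptions, not the actual routing scheme:

```rust
// Route a key to a shard by hashing. FNV-1a is used here only for
// illustration; range or rendezvous routing would slot in the same way.
fn shard_for(key: &str, shard_count: usize) -> usize {
    let h = key.bytes().fold(0xcbf29ce484222325u64, |h, b| {
        (h ^ b as u64).wrapping_mul(0x100000001b3)
    });
    (h % shard_count as u64) as usize
}

// Scatter-gather for distributed vector search: each shard returns its
// local top-k of (id, score); the coordinator merges them into a global
// top-k. No cross-shard coordination beyond the final merge.
fn gather_top_k(per_shard: Vec<Vec<(u64, f32)>>, k: usize) -> Vec<u64> {
    let mut all: Vec<(u64, f32)> = per_shard.into_iter().flatten().collect();
    all.sort_by(|a, b| b.1.total_cmp(&a.1)); // best score first
    all.into_iter().take(k).map(|(id, _)| id).collect()
}
```

Because each shard is an autonomous instance with strong local guarantees, the coordinator never needs a transaction across shards, only routing and merging.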


Concurrency Model

CortexDB introduces concurrency progressively.

Within a single instance:

  • Writes are durably ordered through the WAL.
  • Reads are optimized independently, but always observe the deterministic write order established by the WAL.
  • Background tasks (compaction, indexing) operate independently.

Transactions are scoped and local.
There is no global MVCC layer and no distributed transaction model.

In sharded deployments:

  • Guarantees apply per shard.
  • There is no global 2PC.
  • No mandatory global consensus.

Concurrency expands layer by layer:

Local determinism → Scoped transactions → Replication → Shards.
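Within one instance, durable write ordering can be pictured as a single WAL lock handing out sequence numbers. This is a deliberately naive sketch, not the engine's actual concurrency design, which is finer-grained:

```rust
use std::sync::Mutex;

// One mutex protects the WAL, so sequence assignment and the append happen
// atomically: WAL order and sequence order can never diverge, regardless of
// how many threads write concurrently.
struct WalState {
    next_seq: u64,
    records: Vec<(u64, String)>,
}

struct Db {
    wal: Mutex<WalState>,
}

impl Db {
    fn write(&self, payload: &str) -> u64 {
        let mut wal = self.wal.lock().unwrap();
        let seq = wal.next_seq;
        wal.next_seq += 1;
        wal.records.push((seq, payload.to_string()));
        seq
    }
}
```

Background tasks such as compaction operate on immutable SSTables and therefore never need this lock, which is what lets them run independently of the write path.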


Architectural Stance

CortexDB is defined less by features than by decisions:

  • KV remains dominant
  • Documents are conventions, not a separate engine
  • Vector search is integrated, not bolted on
  • Distribution is progressive, not ideological
  • Non-objectives are explicit

The system evolves from V1 to V7 through controlled expansion, never by collapsing layers into abstraction.

For detailed iteration design, see Versions & Roadmap.