Mohamed KEITA

Architecture

CortexDB is built as a layered system evolving through explicit architectural steps.

It does not attempt to solve every database problem at once.
It introduces complexity progressively, under constraint.

The architecture is structured around four pillars:

  1. A KV-first storage core
  2. A structured query surface (CortexQL)
  3. A CPU-first vector layer
  4. A pragmatic distributed model

Implementation Stack

CortexDB is implemented in Rust.

The choice is architectural rather than ideological.

Rust provides:

  • Deterministic memory control
  • Strong safety guarantees without a garbage collector
  • Fine-grained concurrency control
  • Predictable performance characteristics

For a storage engine that prioritizes crash safety, resource constraints, and explicit invariants, Rust aligns naturally with the design philosophy.

The stack is intentionally minimal and system-oriented.


1. KV-First Storage Core

At its foundation, CortexDB is a persistent LSM-based key-value engine.

Core components:

  • Write-Ahead Log (crash safety)
  • Memtable (in-memory writes)
  • SSTables (immutable disk structures)
  • Compaction (read/write amplification control)

The engine prioritizes:

  • Determinism over hidden heuristics
  • Predictable memory usage
  • Explicit performance trade-offs

Everything else builds on this layer.
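As an illustration, the interaction between these components can be sketched in a few lines of Rust. This is a simplified model, not CortexDB's actual code: the names (`WriteAheadLog`, `Memtable`, `SsTable`, `Engine`) are assumptions for readability, and the in-memory WAL stands in for what is really an fsync'd file.

```rust
use std::collections::BTreeMap;

// Illustrative sketch of the LSM write path: WAL append first, then memtable
// insert, with a flush to an immutable SSTable past a size threshold.
struct WriteAheadLog {
    records: Vec<(Vec<u8>, Vec<u8>)>, // a real engine appends to an fsync'd file
}

impl WriteAheadLog {
    fn append(&mut self, key: &[u8], value: &[u8]) {
        self.records.push((key.to_vec(), value.to_vec()));
    }
}

struct Memtable {
    entries: BTreeMap<Vec<u8>, Vec<u8>>, // kept sorted for cheap SSTable flushes
}

struct SsTable {
    entries: Vec<(Vec<u8>, Vec<u8>)>, // immutable and sorted once written
}

struct Engine {
    wal: WriteAheadLog,
    memtable: Memtable,
    sstables: Vec<SsTable>,
    flush_threshold: usize,
}

impl Engine {
    fn new(flush_threshold: usize) -> Self {
        Engine {
            wal: WriteAheadLog { records: Vec::new() },
            memtable: Memtable { entries: BTreeMap::new() },
            sstables: Vec::new(),
            flush_threshold,
        }
    }

    // Durability first: the WAL is appended before the memtable is touched.
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.wal.append(key, value);
        self.memtable.entries.insert(key.to_vec(), value.to_vec());
        if self.memtable.entries.len() >= self.flush_threshold {
            self.flush();
        }
    }

    // Freeze the memtable into an immutable, sorted SSTable.
    fn flush(&mut self) {
        let frozen = std::mem::take(&mut self.memtable.entries);
        self.sstables.push(SsTable { entries: frozen.into_iter().collect() });
    }

    // Reads check the memtable first, then SSTables newest-to-oldest.
    fn get(&self, key: &[u8]) -> Option<&[u8]> {
        if let Some(v) = self.memtable.entries.get(key) {
            return Some(v.as_slice());
        }
        for table in self.sstables.iter().rev() {
            if let Ok(i) = table
                .entries
                .binary_search_by(|(k, _)| k.as_slice().cmp(key))
            {
                return Some(table.entries[i].1.as_slice());
            }
        }
        None
    }
}
```

Compaction is deliberately omitted here; in the real engine it merges overlapping SSTables in the background to keep read and write amplification bounded.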


2. Structured Query Surface

CortexDB introduces structure without becoming relational.

The system remains KV-dominant, but progressively adds:

  • Prefix and range scans
  • Scoped transactions
  • TTL semantics
  • JSON document conventions
  • Secondary indexing (later stage)

CortexQL executes through a pipeline model:


SCAN → FILTER → PROJECT → AGGREGATE → LIMIT

There is no global SQL planner.
No JOIN engine.
No relational abstraction layer.

Each operator maps directly to internal primitives.
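Because each operator maps to a primitive, a CortexQL query can be pictured as plain iterator composition. The Rust below is a hypothetical illustration, not the actual operator code; `Row`, `names_over`, and `count_over` are invented names:

```rust
// Each pipeline stage corresponds to one iterator adapter; there is no
// planner choosing among alternative strategies.
struct Row {
    user: String,
    age: u32,
}

// SCAN → FILTER → PROJECT → LIMIT, for a row-returning query.
fn names_over(rows: &[Row], min_age: u32, limit: usize) -> Vec<String> {
    rows.iter()                           // SCAN
        .filter(|r| r.age >= min_age)     // FILTER
        .map(|r| r.user.clone())          // PROJECT
        .take(limit)                      // LIMIT
        .collect()
}

// SCAN → FILTER → AGGREGATE, for a scalar-returning query.
fn count_over(rows: &[Row], min_age: u32) -> usize {
    rows.iter()
        .filter(|r| r.age >= min_age)
        .count()                          // AGGREGATE
}
```

The absence of a planner is the point: the cost of a query is readable directly from the pipeline it compiles to.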


3. CPU-First Vector Layer

Vector search is a first-class capability, not an external add-on.

CortexDB introduces:

  • Dedicated vector collections
  • Approximate nearest neighbor indexing (CPU-oriented)
  • Quantization for memory reduction
  • Hybrid and filtered vector search (progressively)

The design assumes:

  • A CPU baseline
  • Constrained memory environments
  • Persistence compatibility with the storage engine

It does not assume GPU infrastructure.
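A minimal sketch of the CPU-first baseline, assuming an exact (non-approximate) scan: an ANN index such as a graph or cluster structure would replace the linear pass, but the interface shape stays the same. `cosine` and `top_k` are illustrative names, not CortexDB API:

```rust
// Exact nearest-neighbour search by cosine similarity over an in-memory
// collection. Pure CPU: no SIMD intrinsics, no accelerator assumptions.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

// Score every vector, sort by similarity, keep the k best ids.
fn top_k(query: &[f32], vectors: &[(u64, Vec<f32>)], k: usize) -> Vec<u64> {
    let mut scored: Vec<(u64, f32)> = vectors
        .iter()
        .map(|(id, v)| (*id, cosine(query, v)))
        .collect();
    // Highest similarity first; total_cmp gives a total order over floats.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(id, _)| id).collect()
}
```

Quantization fits the same shape: storing compressed codes instead of `Vec<f32>` trades recall for memory, which matters under the constrained-memory assumption above.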


4. Replication & Offline Model

Replication is introduced pragmatically:

  • Leader/follower model
  • Asynchronous WAL-based replication
  • Offline-first synchronization patterns
  • Conflict resolution strategies for temporary divergence

The system favors convergence and operability over global coordination complexity.
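Asynchronous WAL-based replication can be pictured as a pull loop: the follower requests records past its applied offset and replays them in order. The names here (`Leader`, `Follower`, `WalRecord`) are illustrative assumptions, not the actual protocol:

```rust
// Sketch of asynchronous WAL shipping. The follower tracks how far it has
// applied and pulls only the delta; replay is ordered by sequence number,
// which is what makes re-syncing after an offline period safe.
#[derive(Clone)]
struct WalRecord {
    seq: u64,
    key: String,
    value: String,
}

struct Leader {
    wal: Vec<WalRecord>,
}

impl Leader {
    fn records_after(&self, offset: u64) -> Vec<WalRecord> {
        self.wal.iter().filter(|r| r.seq > offset).cloned().collect()
    }
}

struct Follower {
    applied_offset: u64,
    store: std::collections::HashMap<String, String>,
}

impl Follower {
    // Apply any records past our offset; calling this repeatedly converges
    // the follower toward the leader without global coordination.
    fn sync_from(&mut self, leader: &Leader) {
        for rec in leader.records_after(self.applied_offset) {
            self.applied_offset = rec.seq;
            self.store.insert(rec.key, rec.value);
        }
    }
}
```

The offline-first pattern falls out naturally: a disconnected follower simply has a stale offset, and the next sync applies exactly the records it missed.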


5. Observability & Operational Discipline

CortexDB evolves into an operable system through:

  • Integrated metrics
  • Structured logging
  • Query tracing
  • Explicit introspection commands

Operational visibility is considered part of the architecture. Not an afterthought.
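As a sketch of what metrics-as-architecture can mean in practice, the counters below are exposed through an explicit snapshot call rather than log scraping. The struct and field names are assumptions, not CortexDB's actual metric set:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// A handful of engine counters, updated lock-free from the hot path and
// read through one explicit introspection call.
#[derive(Default)]
struct Metrics {
    puts: AtomicU64,
    gets: AtomicU64,
    compactions: AtomicU64,
}

impl Metrics {
    fn record_put(&self) {
        self.puts.fetch_add(1, Ordering::Relaxed);
    }

    fn record_get(&self) {
        self.gets.fetch_add(1, Ordering::Relaxed);
    }

    // An introspection command returns a stable snapshot: (puts, gets,
    // compactions), so operators query state rather than parse logs.
    fn snapshot(&self) -> (u64, u64, u64) {
        (
            self.puts.load(Ordering::Relaxed),
            self.gets.load(Ordering::Relaxed),
            self.compactions.load(Ordering::Relaxed),
        )
    }
}
```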


6. Distributed Model (Sharding)

Horizontal scalability is introduced late.

Design principles:

  • A shard is an autonomous CortexDB instance
  • Strong guarantees per shard
  • No global 2PC
  • No mandatory global consensus

A coordinator routes requests across shards:


Client → Coordinator → Shards

Distributed vector search follows scatter–gather patterns.

Consensus is optional and local (per shard), not global.

This avoids premature distributed complexity.
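The coordinator's two jobs, routing single-key requests and scatter-gathering multi-shard queries, can be sketched as follows. The FNV-1a hash and the merge strategy are illustrative assumptions, not the actual routing scheme:

```rust
// Route a key to a shard by hashing. FNV-1a is used here only for
// illustration; range or rendezvous routing would slot in the same way.
fn shard_for(key: &str, shard_count: usize) -> usize {
    let h = key.bytes().fold(0xcbf29ce484222325u64, |h, b| {
        (h ^ b as u64).wrapping_mul(0x100000001b3)
    });
    (h % shard_count as u64) as usize
}

// Scatter-gather for distributed vector search: each shard returns its
// local top-k of (id, score); the coordinator merges them into a global
// top-k. No cross-shard coordination beyond the final merge.
fn gather_top_k(per_shard: Vec<Vec<(u64, f32)>>, k: usize) -> Vec<u64> {
    let mut all: Vec<(u64, f32)> = per_shard.into_iter().flatten().collect();
    all.sort_by(|a, b| b.1.total_cmp(&a.1)); // best score first
    all.into_iter().take(k).map(|(id, _)| id).collect()
}
```

Because each shard is an autonomous instance with strong local guarantees, the coordinator never needs a transaction across shards, only routing and merging.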


Concurrency Model

CortexDB introduces concurrency progressively.

Within a single instance:

  • Writes are durably ordered through the WAL.
  • Reads are optimized independently, but always observe the deterministic write order established by the WAL.
  • Background tasks (compaction, indexing) operate independently.

Transactions are scoped and local.
There is no global MVCC layer and no distributed transaction model.

In sharded deployments:

  • Guarantees apply per shard.
  • There is no global 2PC.
  • No mandatory global consensus.

Concurrency expands layer by layer:

Local determinism → Scoped transactions → Replication → Shards.
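Within one instance, durable write ordering can be pictured as a single WAL lock handing out sequence numbers. This is a deliberately naive sketch, not the engine's actual concurrency design, which is finer-grained:

```rust
use std::sync::Mutex;

// One mutex protects the WAL, so sequence assignment and the append happen
// atomically: WAL order and sequence order can never diverge, regardless of
// how many threads write concurrently.
struct WalState {
    next_seq: u64,
    records: Vec<(u64, String)>,
}

struct Db {
    wal: Mutex<WalState>,
}

impl Db {
    fn write(&self, payload: &str) -> u64 {
        let mut wal = self.wal.lock().unwrap();
        let seq = wal.next_seq;
        wal.next_seq += 1;
        wal.records.push((seq, payload.to_string()));
        seq
    }
}
```

Background tasks such as compaction operate on immutable SSTables and therefore never need this lock, which is what lets them run independently of the write path.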


Architectural Stance

CortexDB is defined less by features than by decisions:

  • KV remains dominant
  • Documents are conventions, not a separate engine
  • Vector search is integrated, not bolted on
  • Distribution is progressive, not ideological
  • Non-objectives are explicit

The system evolves from V1 to V7 through controlled expansion, never by collapsing layers into abstraction.

For detailed iteration design, see Versions & Roadmap.