Mohamed KEITA

CortexDB v7

Horizontal scalability through sharding with progressive coordination.

1. Overall Vision of V7

V7 introduces horizontal scalability through sharding, with strong consistency at the shard level and progressive distributed coordination. It is divided into three sub-versions:

  • V7.1: Sharded CortexDB, pragmatic sharding without global consensus
  • V7.2: Distributed Vector Search
  • V7.3: Coordinated Shards with optional Raft-lite per shard

Main objectives:

  • Horizontal scalability
  • High availability per shard
  • Distributed vector search
  • Progressive coordination with optional Raft-lite

2. V7 Structure

V7.1 — Sharded CortexDB

Features:

  • Sharding using hash or range
  • Coordinator or router
  • One shard equals one autonomous CortexDB V6 instance
  • Replication inherited from V5, no Raft
  • No multi-shard transactions
  • No automatic rebalancing, manual or semi-automatic only

V7.2 — Distributed Vector Search

Features:

  • Scatter-gather vector search
  • Top-k merge
  • Search across multiple shards
  • No need for strong global consensus

V7.3 — Coordinated Shards

Features:

  • Raft-lite per shard
  • Shard leader election
  • Intra-shard replication
  • Still no global two-phase commit

3. V7 Non-Goals

  • No distributed multi-shard transactions, possibly V8
  • No global two-phase commit, very late if ever
  • No globally strong consistency, never promised
  • No global consensus
  • No full automatic rebalancing

4. V7 Architecture Overview

┌─────────────┐
│   Client    │
└──────┬──────┘


┌─────────────┐
│ Coordinator │
│   or Router │
└──────┬──────┘

       ├──────────┬──────────┐
       ▼          ▼          ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│  Shard 1 │ │  Shard 2 │ │  Shard 3 │
│   (V6)   │ │   (V6)   │ │   (V6)   │
└──────────┘ └──────────┘ └──────────┘

5. Consistency Model

5.1. Principle

Consistency is guaranteed at the shard level, not globally.

5.2. Rules

  • Shard-level consistency: strong guarantees within a shard
  • Multi-shard operations: best effort or application-managed
  • No two-phase commit
  • Convergence: the system converges after network failures

5.3. Advantages

  • Simplicity: no global coordination complexity
  • Performance: no global consensus latency
  • Pragmatic: early-sharded MongoDB style, not Spanner

6. V7 Objective

At the end of V7 (V7.1 + V7.2 + V7.3), CortexDB becomes:

  • Horizontally scalable: operational sharding
  • Reliable per shard: optional Raft-lite
  • Performant: distributed vector search
  • Pragmatic: no unnecessary complexity

This is the version that makes CortexDB capable of horizontal scalability in a pragmatic and progressive way.


7. V7 Invariants

  • All V6 invariants remain valid
  • Consistency is guaranteed at the shard level
  • Multi-shard operations are best effort or application-managed
  • Distributed vector search returns globally relevant results
  • The system converges after network failures
  • The cluster remains operable without global consensus

8. Relationship with Previous Versions

V7 builds on the foundations of V1 to V6:

  • V1 and V2: core engine and performance
  • V3: query primitives and developer experience
  • V4: vector layer
  • V5: replication and sync
  • V6: advanced CortexQL, secondary indexes, observability

V7 completes the ecosystem with:

  • Sharding for horizontal scalability
  • Distributed vector search for AI at scale
  • Coordinated shards for increased reliability