Mohamed KEITA

CortexDB v6

Advanced CortexQL, Secondary Indexes, Observability, Server Maturity.

1. Overall Vision of V6

V6 consolidates CortexDB as a mature and production-ready data system by strengthening:

  • Advanced CortexQL (non-SQL): a declarative query language oriented toward KV, documents, and vectors
  • Secondary indexes to accelerate non-key-based filters
  • Native observability, including metrics, logs, and traces
  • A robust server mode for long-running deployments

V6 does not aim to become a relational engine, but to provide high expressiveness while remaining consistent with the KV, document, and vector model.


2. V6 Non-Goals

  • No relational SQL
  • No generic relational JOINs
  • No imposed relational schema
  • No full distribution (V7)
  • No distributed transactions (V7)
  • No sharding (V7)

3. CortexQL Positioning in V6

In V6, CortexQL becomes the primary query language, with:

  • A declarative but non-relational syntax
  • Semantics aligned with:
    • KV scans
    • Optional JSON documents
    • Vector collections
  • A direct execution model without a complex SQL planner

CortexQL is not an alternative SQL dialect. It is a structured operation and filtering language optimized for LSM storage, offline and edge deployments, and vector search.


4. Advanced CortexQL in V6

4.1. Added Capabilities

Compared to CortexQL v0 and v1, V6 adds:

  • Richer JSON value filters
  • Partial projections using PROJECT
  • Simple aggregations such as COUNT, MIN, MAX
  • Native pagination using CURSOR
  • Introspection commands such as DESCRIBE and STATS

4.2. CortexQL V6 Examples

SCAN PREFIX "user:"
| WHERE JSON("$.age") > 25
| PROJECT key, JSON("$.name"), JSON("$.age")
| LIMIT 50

SCAN RANGE "order:2024:" "order:2025:"
| AGGREGATE COUNT()

VECTOR SEARCH embeddings
WITH VECTOR([…])
TOPK 10
FILTER META.category == "tech"

4.3. CortexQL V6 Commands

New commands:

  • PROJECT field1, field2, ... for partial projection
  • AGGREGATE COUNT() for simple aggregation
  • AGGREGATE MIN(field) and AGGREGATE MAX(field)
  • CURSOR ... for cursor-based pagination
  • DESCRIBE collection for introspection
  • STATS collection for statistics
  • EXPLAIN query for query explanation

Enhanced existing commands:

  • WHERE JSON(...) for richer JSON filtering
  • SCAN ... for optimized scans with index usage
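
To make the cursor-based pagination behind CURSOR more concrete, here is a minimal Python sketch. It is not the CortexDB implementation: the function name scan_page, the in-memory dict standing in for the store, and the base64 cursor encoding are all assumptions chosen for illustration. The idea shown is that a cursor records the last key returned, so the next page resumes strictly after it.

```python
import base64
from bisect import bisect_right


def scan_page(store, prefix, limit, cursor=None):
    """Return (items, next_cursor) for keys starting with `prefix`, in key order.

    Hypothetical helper: `store` is a plain dict standing in for the KV engine.
    """
    keys = sorted(k for k in store if k.startswith(prefix))
    start = 0
    if cursor is not None:
        # The cursor is an opaque token that encodes the last key already returned.
        last_key = base64.urlsafe_b64decode(cursor).decode("utf-8")
        start = bisect_right(keys, last_key)  # resume strictly after that key
    page = keys[start:start + limit]
    items = [(k, store[k]) for k in page]
    has_more = start + limit < len(keys)
    next_cursor = None
    if page and has_more:
        next_cursor = base64.urlsafe_b64encode(page[-1].encode("utf-8")).decode("ascii")
    return items, next_cursor


store = {f"user:{i:03d}": {"age": 20 + i} for i in range(5)}
store["order:1"] = {"total": 12}

page1, cur1 = scan_page(store, "user:", 2)
page2, cur2 = scan_page(store, "user:", 2, cur1)
page3, cur3 = scan_page(store, "user:", 2, cur2)
```

Because the cursor depends only on the last key, it remains usable even if keys are inserted or deleted between two page requests.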

5. Secondary Indexes

5.1. Principle

Secondary indexes accelerate:

  • JSON filters
  • Frequent field-based lookups
  • Certain CortexQL operations

They are optional, explicit, and declarative.

5.2. Index Types

  • JSON path index (B-Tree): for comparison filters such as greater than, less than, and equal
  • Equality index (hash): for frequent equality filters
  • Vector metadata index: maps keys to IDs for vector collections
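
The difference between the first two index types can be sketched in Python as follows. This is an illustration under assumptions, not the engine's data structures: a sorted list with binary search stands in for the B-Tree, a dict stands in for the hash index, and the class names SortedIndex and HashIndex are hypothetical.

```python
from bisect import bisect_left, bisect_right


class SortedIndex:
    """Stand-in for a B-Tree index: values are kept in sorted order."""

    def __init__(self):
        self.values = []  # indexed field values, sorted
        self.keys = []    # record keys, parallel to self.values

    def add(self, value, key):
        i = bisect_right(self.values, value)
        self.values.insert(i, value)
        self.keys.insert(i, key)

    def greater_than(self, value):
        return self.keys[bisect_right(self.values, value):]

    def less_than(self, value):
        return self.keys[:bisect_left(self.values, value)]

    def equal(self, value):
        lo, hi = bisect_left(self.values, value), bisect_right(self.values, value)
        return self.keys[lo:hi]


class HashIndex:
    """Stand-in for a hash index: equality lookups only, no ordering."""

    def __init__(self):
        self.buckets = {}

    def add(self, value, key):
        self.buckets.setdefault(value, set()).add(key)

    def equal(self, value):
        return sorted(self.buckets.get(value, ()))


docs = {
    "user:1": {"age": 31, "email": "a@example.com"},
    "user:2": {"age": 22, "email": "b@example.com"},
    "user:3": {"age": 40, "email": "c@example.com"},
}
age_idx, email_idx = SortedIndex(), HashIndex()
for key, doc in docs.items():
    age_idx.add(doc["age"], key)
    email_idx.add(doc["email"], key)

older = age_idx.greater_than(25)
by_email = email_idx.equal("b@example.com")
```

The ordered structure answers range and equality filters, while the hash structure answers equality only, which is why the two types are declared separately.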

5.3. Declaration via CortexQL

CREATE INDEX users_age_idx
ON PREFIX "user:"
PATH JSON("$.age")
TYPE BTREE

CREATE INDEX users_email_idx
ON PREFIX "user:"
PATH JSON("$.email")
TYPE HASH

Index usage is:

  • Automatic: the engine selects the appropriate index
  • Transparent: the user does not specify the index explicitly
  • Observable: the choice is visible via EXPLAIN
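
Automatic index selection could look like the following Python sketch. The selection rule is an assumption made for illustration (prefer a hash index for equality, fall back to a B-Tree index, and use only a B-Tree index for range comparisons); the IndexDef and Filter types and the choose_index function are hypothetical names, not part of CortexDB.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexDef:
    name: str
    prefix: str
    path: str
    kind: str  # "BTREE" or "HASH"


@dataclass(frozen=True)
class Filter:
    path: str
    op: str  # "==", "<", ">", ...
    value: object


def choose_index(indexes, scan_prefix, flt) -> Optional[IndexDef]:
    """Pick an index covering the scanned prefix and the filtered path, or None."""
    candidates = [
        ix for ix in indexes
        if ix.path == flt.path and scan_prefix.startswith(ix.prefix)
    ]
    # Assumed preference order: hash first for equality, B-Tree for everything else.
    preferred = ["HASH", "BTREE"] if flt.op == "==" else ["BTREE"]
    for kind in preferred:
        for ix in candidates:
            if ix.kind == kind:
                return ix
    return None  # no usable index: the engine falls back to a plain scan


indexes = [
    IndexDef("users_age_idx", "user:", "$.age", "BTREE"),
    IndexDef("users_email_idx", "user:", "$.email", "HASH"),
]
age_choice = choose_index(indexes, "user:", Filter("$.age", ">", 25))
email_choice = choose_index(indexes, "user:", Filter("$.email", "==", "a@example.com"))
range_on_hash = choose_index(indexes, "user:", Filter("$.email", ">", "a"))
```

A None result corresponds to the case where EXPLAIN would report that no index was used.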

5.4. Index Maintenance

  • Automatic updates during writes
  • Optional rebuild if needed
  • Compaction-aware and compatible with LSM compaction
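
The maintenance rules above can be sketched in Python as follows. This is a simplified model under assumptions: the IndexedStore class is hypothetical, the store and index are in-memory dicts, and the old value is read before each write in order to remove its stale index entry. A real LSM engine may defer that cleanup to compaction instead.

```python
class IndexedStore:
    """Toy KV store with one secondary index on a single document field."""

    def __init__(self, field):
        self.field = field
        self.data = {}   # key -> document
        self.index = {}  # field value -> set of keys

    def _unindex(self, key):
        old = self.data.get(key)
        if old is None or self.field not in old:
            return
        keys = self.index.get(old[self.field])
        if keys is not None:
            keys.discard(key)
            if not keys:
                del self.index[old[self.field]]

    def put(self, key, doc):
        # Automatic update: drop the stale entry, then index the new value.
        self._unindex(key)
        self.data[key] = doc
        if self.field in doc:
            self.index.setdefault(doc[self.field], set()).add(key)

    def delete(self, key):
        self._unindex(key)
        self.data.pop(key, None)

    def rebuild(self):
        # Optional rebuild: recompute the index from the primary data.
        self.index = {}
        for key, doc in self.data.items():
            if self.field in doc:
                self.index.setdefault(doc[self.field], set()).add(key)


store = IndexedStore("age")
store.put("user:1", {"age": 30})
store.put("user:2", {"age": 30})
store.put("user:1", {"age": 31})  # overwrite moves user:1 to a new index entry
snapshot = {value: set(keys) for value, keys in store.index.items()}
store.rebuild()
```

Each put touches both the primary data and the index, which is the source of the write amplification that comes with secondary indexes.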

5.5. V6 Limitations

  • No composite indexes: one field per index
  • No indexes on vector collections except metadata
  • Write amplification due to index updates

6. CortexQL Execution Engine Without SQL Planner

6.1. Principles

  • No global SQL planner
  • Pipeline-based execution
  • Each CortexQL operator maps to an internal primitive
  • Local optimizations only, such as filter ordering and index usage

6.2. Execution Pipeline

SCAN → FILTER → PROJECT → AGGREGATE → LIMIT

Each stage:

  • Consumes data from the previous stage
  • Produces data for the next stage
  • May use indexes if available
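
The stage-by-stage model can be illustrated with Python generators, where each stage pulls rows from the previous one. This is a sketch under assumptions, not the engine's code: the function names and the dict-based store are hypothetical, and index usage is left out for brevity.

```python
from itertools import islice


def scan_prefix(store, prefix):
    """SCAN: yield (key, document) pairs in key order for a given prefix."""
    for key in sorted(store):
        if key.startswith(prefix):
            yield key, store[key]


def where(rows, predicate):
    """FILTER: keep only rows whose document satisfies the predicate."""
    return ((key, doc) for key, doc in rows if predicate(doc))


def project(rows, fields):
    """PROJECT: keep the key plus the requested fields."""
    for key, doc in rows:
        yield {"key": key, **{f: doc.get(f) for f in fields}}


def count(rows):
    """AGGREGATE COUNT(): consume the stream and return the number of rows."""
    return sum(1 for _ in rows)


def limit(rows, n):
    """LIMIT: stop pulling from upstream after n rows."""
    return islice(rows, n)


store = {
    "user:1": {"name": "Ada", "age": 36, "city": "Paris"},
    "user:2": {"name": "Bo", "age": 19},
    "user:3": {"name": "Cy", "age": 41},
    "order:1": {"total": 5},
}


def is_adult_over_25(doc):
    return doc.get("age", 0) > 25


# SCAN PREFIX "user:" | WHERE age > 25 | PROJECT key, name, age | LIMIT 50
rows = list(limit(project(where(scan_prefix(store, "user:"), is_adult_over_25), ["name", "age"]), 50))

# SCAN PREFIX "user:" | WHERE age > 25 | AGGREGATE COUNT()
total = count(where(scan_prefix(store, "user:"), is_adult_over_25))
```

Because every stage is lazy, no intermediate result set is materialized between stages.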

6.3. Local Optimizations

  • Filter ordering with most selective filters first
  • Automatic index usage when available
  • Early termination when LIMIT is reached
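
Two of these local optimizations, filter ordering and early termination, can be sketched in Python. The selectivity estimates are supplied by hand here and the run function is a hypothetical helper; how CortexDB actually estimates selectivity is not specified in this document.

```python
from itertools import islice


def run(rows, filters, limit):
    """Apply filters (most selective first) and stop once `limit` rows match.

    `filters` is a list of (estimated_pass_fraction, predicate) pairs; a lower
    fraction means the filter rejects more rows. Returns (matches, rows_scanned).
    """
    ordered = [pred for _, pred in sorted(filters, key=lambda f: f[0])]
    scanned = 0

    def source():
        nonlocal scanned
        for row in rows:
            scanned += 1
            yield row

    # all() short-circuits, so cheaper rejection happens on the first predicate.
    matches = (row for row in source() if all(pred(row) for pred in ordered))
    # islice stops pulling from the source as soon as `limit` rows are produced.
    return list(islice(matches, limit)), scanned


rows = [{"age": a} for a in range(100)]
filters = [
    (0.5, lambda r: r["age"] % 2 == 0),   # passes about half of the rows
    (0.1, lambda r: r["age"] % 10 == 0),  # passes about a tenth: evaluated first
]
result, scanned = run(rows, filters, limit=3)
```

In this run only the first 21 of the 100 rows are read, because the third match (age 20) satisfies the limit.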

6.4. Query Explanation

EXPLAIN
SCAN PREFIX "user:"
| WHERE JSON("$.age") > 25

Returns:

  • Scan type used
  • Index used or not
  • Estimated cost
  • Number of items scanned
  • Applied optimizations

7. V6 Invariants

  • All V5 invariants remain valid
  • CortexQL is the single source of truth for queries
  • Every CortexQL query maps to existing internal primitives
  • Secondary indexes remain consistent with the data
  • Observability reflects the real engine state
  • The server never alters data semantics

8. V6 Objective

At the end of V6, CortexDB becomes:

  • Expressive without SQL, with advanced and stable CortexQL
  • Performant, thanks to usable secondary indexes
  • Observable, with integrated metrics, logs, and traces
  • Production-grade, with a robust and monitorable server

This is the version where CortexDB stops being just a powerful engine and becomes a coherent, readable, and large-scale data platform.


9. Relationship with Previous Versions

V6 builds on the foundations of V1 to V5:

  • V1 and V2: core engine and performance
  • V3: query primitives and developer experience, CortexQL v0 or v1
  • V4: vector layer, CortexQL v1 for vector
  • V5: replication and sync, CortexQL v2 for replication

V6 completes the ecosystem with:

  • Advanced CortexQL for expressiveness
  • Secondary indexes for performance
  • Observability for production readiness
  • A mature server for robustness