CortexDB v6

1. Overall Vision of V6

V6 consolidates CortexDB as a mature and production-ready data system by strengthening:

Advanced CortexQL: a declarative, non-SQL query language oriented toward KV, documents, and vectors
Secondary indexes to accelerate non-key-based filters
Native observability, including metrics, logs, and traces
A robust server mode for long-running deployments

V6 does not aim to become a relational engine, but to provide high expressiveness while remaining consistent with the KV, document, and vector model.

2. V6 Non-Goals

No relational SQL
No generic relational JOINs
No imposed relational schema
No full distribution (V7)
No distributed transactions (V7)
No sharding (V7)

3. CortexQL Positioning in V6

In V6, CortexQL becomes the primary query language, with:

A declarative but non-relational syntax
Semantics aligned with:
- KV scans
- Optional JSON documents
- Vector collections
A direct execution model without a complex SQL planner

CortexQL is not a SQL dialect or a SQL replacement. It is a structured operation and filtering language optimized for LSM storage, offline and edge deployments, and vector search.

4. Advanced CortexQL in V6

4.1. Added Capabilities

Compared to CortexQL v0 and v1, V6 adds:

Richer JSON value filters
Partial projections using PROJECT
Simple aggregations such as COUNT, MIN, MAX
Native pagination using CURSOR
Introspection commands such as DESCRIBE and STATS
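
Cursor-based pagination over a sorted key space can be sketched as follows. This is a minimal Python illustration, not CortexDB's implementation: the `scan_page` function, the in-memory sorted key list, and the use of the last returned key as the cursor are all assumptions, since the actual CURSOR token format is not specified here.

```python
import bisect
from typing import Optional

def scan_page(sorted_keys: list[str], prefix: str, limit: int,
              cursor: Optional[str] = None) -> tuple[list[str], Optional[str]]:
    """Return up to `limit` keys with `prefix`, resuming strictly after `cursor`."""
    # Start at the first key >= prefix, or just after the cursor key.
    if cursor is None:
        start = bisect.bisect_left(sorted_keys, prefix)
    else:
        start = bisect.bisect_right(sorted_keys, cursor)
    page = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # left the prefix range; sorted order means no more matches
        page.append(key)
        if len(page) == limit:
            break
    # A full page may have more results, so hand back the last key as cursor.
    next_cursor = page[-1] if len(page) == limit else None
    return page, next_cursor

keys = sorted(["order:1", "user:1", "user:2", "user:3", "video:9"])
first, cur = scan_page(keys, "user:", limit=2)
second, end = scan_page(keys, "user:", limit=2, cursor=cur)
```

Resuming from a key rather than from an offset keeps each page independent of how many rows precede it, which suits a sorted LSM key space.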

4.2. CortexQL V6 Examples

SCAN PREFIX "user:"
| WHERE JSON("$.age") > 25
| PROJECT key, JSON("$.name"), JSON("$.age")
| LIMIT 50

SCAN RANGE "order:2024:" "order:2025:"
| AGGREGATE COUNT()
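
The range scan with aggregation above can be pictured with a short Python sketch. It assumes the range is half-open (start included, end excluded), which the example does not state, and the helper names `scan_range` and `aggregate` are hypothetical.

```python
import bisect

def scan_range(sorted_keys, start, end):
    """Return keys in the half-open interval [start, end) (assumed semantics)."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, end)
    return sorted_keys[lo:hi]

def aggregate(values):
    """Single-pass COUNT, MIN, and MAX over an iterable of values."""
    count, lo, hi = 0, None, None
    for v in values:
        count += 1
        lo = v if lo is None or v < lo else lo
        hi = v if hi is None or v > hi else hi
    return {"count": count, "min": lo, "max": hi}

# Toy data: key -> order total.
orders = {
    "order:2023:9": 12.0,
    "order:2024:1": 40.0,
    "order:2024:2": 15.5,
    "order:2025:1": 99.0,
}
keys = scan_range(sorted(orders), "order:2024:", "order:2025:")
stats = aggregate(orders[k] for k in keys)
```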

VECTOR SEARCH embeddings
WITH VECTOR([…])
TOPK 10
FILTER META.category == "tech"
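
As an illustration of what the vector query above computes, here is a brute-force Python sketch. It assumes cosine similarity and assumes the metadata filter is applied before ranking; neither choice is stated in this document, and the function names and the `(id, vector, meta)` layout are hypothetical.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def vector_search(collection, query, topk, category):
    """collection: iterable of (id, vector, meta) tuples."""
    # Filter on metadata first, then score only the surviving candidates.
    candidates = (
        (cosine(vec, query), item_id)
        for item_id, vec, meta in collection
        if meta.get("category") == category
    )
    return [item_id for _, item_id in heapq.nlargest(topk, candidates)]

embeddings = [
    ("doc:1", [1.0, 0.0], {"category": "tech"}),
    ("doc:2", [0.0, 1.0], {"category": "tech"}),
    ("doc:3", [1.0, 0.1], {"category": "sports"}),
    ("doc:4", [0.9, 0.1], {"category": "tech"}),
]
result = vector_search(embeddings, [1.0, 0.0], topk=2, category="tech")
```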

4.3. CortexQL V6 Commands

New commands:

PROJECT field1, field2, ... for partial projection
AGGREGATE COUNT() for simple aggregation
AGGREGATE MIN(field) and AGGREGATE MAX(field)
CURSOR ... for cursor-based pagination
DESCRIBE collection for introspection
STATS collection for statistics
EXPLAIN query for query explanation

Enhanced existing commands:

WHERE JSON(...) for richer JSON filtering
SCAN ... for optimized scans with index usage

5. Secondary Indexes

5.1. Principle

Secondary indexes accelerate:

JSON filters
Frequent field-based lookups
Certain CortexQL operations

They are optional, explicit, and declarative.

5.2. Index Types

JSON path index using B-Tree for comparison filters such as greater than, less than, equal
Equality index using hash for frequent equality filters
Vector metadata index mapping keys to IDs for vector collections
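
The difference between the first two index types can be sketched in Python. A sorted list with binary search stands in for the B-Tree, and a dictionary stands in for the hash index; the class names and the in-memory layout are assumptions made for illustration only.

```python
import bisect
from collections import defaultdict

class SortedIndex:
    """Stand-in for a BTREE index: values kept sorted, so range lookups work."""
    def __init__(self, docs, field):
        pairs = sorted((doc[field], key) for key, doc in docs.items() if field in doc)
        self.values = [v for v, _ in pairs]
        self.keys = [k for _, k in pairs]

    def greater_than(self, bound):
        # Binary search for the first value strictly greater than `bound`.
        start = bisect.bisect_right(self.values, bound)
        return self.keys[start:]

class HashIndex:
    """Stand-in for a HASH index: constant-time equality, no ordering."""
    def __init__(self, docs, field):
        self.buckets = defaultdict(list)
        for key, doc in docs.items():
            if field in doc:
                self.buckets[doc[field]].append(key)

    def equal_to(self, value):
        return list(self.buckets.get(value, []))

users = {
    "user:1": {"age": 31, "email": "a@example.com"},
    "user:2": {"age": 22, "email": "b@example.com"},
    "user:3": {"age": 40, "email": "c@example.com"},
}
older = SortedIndex(users, "age").greater_than(25)
match = HashIndex(users, "email").equal_to("b@example.com")
```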

5.3. Declaration via CortexQL

CREATE INDEX users_age_idx
ON PREFIX "user:"
PATH JSON("$.age")
TYPE BTREE

CREATE INDEX users_email_idx
ON PREFIX "user:"
PATH JSON("$.email")
TYPE HASH

Index usage is:

Automatic: the engine selects the appropriate index
Transparent: the user does not specify the index explicitly
Observable via EXPLAIN

5.4. Index Maintenance

Automatic updates during writes
Optional rebuild if needed
Compaction-aware and compatible with LSM compaction
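
A minimal Python sketch of write-time index maintenance, under stated assumptions: a single equality index on one field, everything held in memory, and a hypothetical `IndexedStore` class. LSM compaction is not modeled here.

```python
class IndexedStore:
    """Toy KV store keeping one equality index on a single document field."""
    def __init__(self, field):
        self.field = field
        self.data = {}    # key -> document
        self.index = {}   # field value -> set of keys

    def _unindex(self, key):
        old = self.data.get(key)
        if old is not None and self.field in old:
            keys = self.index[old[self.field]]
            keys.discard(key)
            if not keys:
                del self.index[old[self.field]]

    def put(self, key, doc):
        self._unindex(key)  # drop the stale index entry before overwriting
        self.data[key] = doc
        if self.field in doc:
            self.index.setdefault(doc[self.field], set()).add(key)

    def delete(self, key):
        self._unindex(key)
        self.data.pop(key, None)

    def rebuild(self):
        """Recreate the index from the data, e.g. after suspected drift."""
        self.index = {}
        for key, doc in self.data.items():
            if self.field in doc:
                self.index.setdefault(doc[self.field], set()).add(key)

store = IndexedStore("email")
store.put("user:1", {"email": "a@example.com"})
store.put("user:1", {"email": "new@example.com"})  # overwrite moves the entry
store.put("user:2", {"email": "b@example.com"})
store.delete("user:2")
```

Every `put` touches the index as well as the data, which is the source of the write amplification listed among the V6 limitations.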

5.5. V6 Limitations

No composite indexes: one field per index
No indexes on vector collections except metadata
Write amplification due to index updates

6. CortexQL Execution Engine Without SQL Planner

6.1. Principles

No global SQL planner
Pipeline-based execution
Each CortexQL operator maps to an internal primitive
Local optimizations only, such as filter ordering and index usage

6.2. Execution Pipeline

SCAN → FILTER → PROJECT → AGGREGATE → LIMIT

Each stage:

Consumes data from the previous stage
Produces data for the next stage
May use indexes if available
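
One way to picture this stage-by-stage model is a chain of Python generators, where each stage pulls rows from the previous one. This is only a sketch: the function names are hypothetical, the store is a plain dictionary, and the AGGREGATE stage and index usage are left out.

```python
from itertools import islice

def scan_prefix(store, prefix):
    """SCAN: yield (key, document) pairs in key order for a prefix."""
    for key in sorted(store):
        if key.startswith(prefix):
            yield key, store[key]

def where(rows, predicate):
    """FILTER: keep rows whose document satisfies the predicate."""
    return ((k, doc) for k, doc in rows if predicate(doc))

def project(rows, fields):
    """PROJECT: keep the key plus the requested document fields."""
    for key, doc in rows:
        yield {"key": key, **{f: doc.get(f) for f in fields}}

def limit(rows, n):
    """LIMIT: stop pulling from upstream after n rows."""
    return islice(rows, n)

store = {
    "user:1": {"name": "Ada", "age": 36},
    "user:2": {"name": "Bob", "age": 19},
    "user:3": {"name": "Cy", "age": 52},
    "order:1": {"total": 10},
}
rows = list(
    limit(
        project(
            where(scan_prefix(store, "user:"), lambda d: d.get("age", 0) > 25),
            ["name", "age"],
        ),
        50,
    )
)
```

Because every stage is lazy, no stage materializes the full result of the one before it.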

6.3. Local Optimizations

Filter ordering with most selective filters first
Automatic index usage when available
Early termination when LIMIT is reached
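
Filter ordering and early termination can be sketched together in Python. The selectivity figures below are supplied by hand as estimated pass rates; how the engine would obtain such estimates is not described in this document, and the `order_filters` and `run` helpers are hypothetical.

```python
from itertools import islice

def order_filters(filters):
    """filters: list of (estimated_pass_rate, predicate); most selective first."""
    return [pred for _, pred in sorted(filters, key=lambda f: f[0])]

def run(rows, filters, limit):
    ordered = order_filters(filters)
    scanned = 0

    def matching():
        nonlocal scanned
        for row in rows:
            scanned += 1
            # all() short-circuits, so later filters only run on survivors.
            if all(pred(row) for pred in ordered):
                yield row

    # islice stops pulling rows as soon as `limit` matches were produced.
    out = list(islice(matching(), limit))
    return out, scanned

rows = [{"age": a, "country": "FR" if a % 10 == 0 else "US"} for a in range(100)]
filters = [
    (0.74, lambda r: r["age"] > 25),         # passes most rows
    (0.10, lambda r: r["country"] == "FR"),  # passes few rows, so it runs first
]
result, scanned = run(rows, filters, limit=2)
```

In this toy run only 41 of the 100 rows are read before the limit of 2 is satisfied.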

6.4. Query Explanation

EXPLAIN
SCAN PREFIX "user:"
| WHERE JSON("$.age") > 25

Returns:

Scan type used
Index used or not
Estimated cost
Number of items scanned
Applied optimizations
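
The shape of such a report might look like the following Python sketch. Everything here is an assumption made for illustration: the field names, the `INDEX_SCAN` and `PREFIX_SCAN` labels, the cost unit (one unit per item read), and the rule that a HASH index serves only equality while a BTREE index also serves range comparisons. The scanned-item figure is treated as an estimate supplied by the caller.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ExplainReport:
    scan_type: str
    index_used: Optional[str]
    estimated_cost: float
    items_scanned: int
    optimizations: list = field(default_factory=list)

def explain(prefix, path, op, indexes, prefix_size, est_matches):
    """indexes: dict mapping (prefix, path) -> (index_name, index_type)."""
    name, kind = indexes.get((prefix, path), (None, None))
    # Assumed rule: HASH serves equality only; BTREE serves equality and ranges.
    usable = kind == "BTREE" or (kind == "HASH" and op == "==")
    if usable:
        return ExplainReport("INDEX_SCAN", name, float(est_matches),
                             est_matches, ["index_usage"])
    # No usable index: fall back to reading every item under the prefix.
    return ExplainReport("PREFIX_SCAN", None, float(prefix_size), prefix_size)

indexes = {
    ("user:", "$.age"): ("users_age_idx", "BTREE"),
    ("user:", "$.email"): ("users_email_idx", "HASH"),
}
report = explain("user:", "$.age", ">", indexes,
                 prefix_size=10_000, est_matches=1_200)
fallback = explain("user:", "$.email", ">", indexes,
                   prefix_size=10_000, est_matches=1_200)
```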

7. V6 Invariants

All V5 invariants remain valid
CortexQL is the single source of truth for queries
Every CortexQL query maps to existing internal primitives
Secondary indexes remain consistent with the data
Observability reflects the real engine state
The server never alters data semantics

8. V6 Objective

At the end of V6, CortexDB becomes:

Expressive without SQL, with advanced and stable CortexQL
Performant, thanks to usable secondary indexes
Observable, with integrated metrics, logs, and traces
Production-grade, with a robust and monitorable server

This is the version where CortexDB stops being just a powerful engine and becomes a coherent, readable, and large-scale data platform.

9. Relationship with Previous Versions

V6 builds on the foundations of V1 to V5:

V1 and V2: core engine and performance
V3: query primitives and developer experience (CortexQL v0 and v1)
V4: vector layer (CortexQL v1 for vectors)
V5: replication and sync (CortexQL v2 for replication)

V6 completes the ecosystem with:

Advanced CortexQL for expressiveness
Secondary indexes for performance
Observability for production readiness
A mature server for robustness