1. Overall Vision of V6
V6 consolidates CortexDB as a mature and production-ready data system by strengthening:
- Advanced, non-SQL CortexQL: a declarative query language oriented toward KV, documents, and vectors
- Secondary indexes to accelerate non-key-based filters
- Native observability, including metrics, logs, and traces
- A robust server mode for long-running deployments
V6 does not aim to become a relational engine. It aims to provide high expressiveness while remaining consistent with the KV, document, and vector model.
2. V6 Non-Goals
- No relational SQL
- No generic relational JOINs
- No imposed relational schema
- No full distribution (V7)
- No distributed transactions (V7)
- No sharding (V7)
3. CortexQL Positioning in V6
In V6, CortexQL becomes the primary query language, with:
- A declarative but non-relational syntax
- Semantics aligned with:
  - KV scans
  - Optional JSON documents
  - Vector collections
- A direct execution model without a complex SQL planner
CortexQL is not an alternative SQL. It is a structured operation and filtering language optimized for LSM, offline, edge, and vector search.
4. Advanced CortexQL in V6
4.1. Added Capabilities
Compared to CortexQL v0 or v1:
- Richer JSON value filters
- Partial projections using PROJECT
- Simple aggregations such as COUNT, MIN, MAX
- Native pagination using CURSOR
- Introspection commands such as DESCRIBE and STATS
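Cursor-based pagination over a sorted key space can be sketched as follows. This is a minimal Python illustration, not CortexDB's API: the `scan_page` helper, the in-memory sorted key list, and the convention that the cursor is simply the last key of the previous page are all assumptions made for the example.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple


def scan_page(
    sorted_keys: List[str],
    prefix: str,
    limit: int,
    cursor: Optional[str] = None,
) -> Tuple[List[str], Optional[str]]:
    """Return up to `limit` keys starting with `prefix`, resuming after `cursor`.

    Hypothetical helper: the cursor is the last key returned by the previous page.
    """
    if cursor is None:
        # First page: jump to the first key that is >= the prefix.
        start = bisect_left(sorted_keys, prefix)
    else:
        # Later pages: resume strictly after the cursor key.
        start = bisect_right(sorted_keys, cursor)

    page: List[str] = []
    for key in sorted_keys[start:]:
        # Keys are sorted, so the first non-matching key ends the prefix range.
        if len(page) == limit or not key.startswith(prefix):
            break
        page.append(key)

    # A full page may have a successor; a short page is always the last one.
    next_cursor = page[-1] if page and len(page) == limit else None
    return page, next_cursor


keys = sorted(["user:1", "user:2", "user:3", "order:1"])
first_page, cursor = scan_page(keys, "user:", limit=2)
second_page, end_cursor = scan_page(keys, "user:", limit=2, cursor=cursor)
```

Because the cursor is a key rather than an offset, resuming costs one binary search however deep the client has paged.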
4.2. CortexQL V6 Examples
SCAN PREFIX "user:"
| WHERE JSON("$.age") > 25
| PROJECT key, JSON("$.name"), JSON("$.age")
| LIMIT 50

SCAN RANGE "order:2024:" "order:2025:"
| AGGREGATE COUNT()

VECTOR SEARCH embeddings
WITH VECTOR([…])
TOPK 10
FILTER META.category == "tech"
4.3. CortexQL V6 Commands
New commands:
- PROJECT field1, field2, ... for partial projection
- AGGREGATE COUNT() for simple aggregation
- AGGREGATE MIN(field) and AGGREGATE MAX(field)
- CURSOR ... for cursor-based pagination
- DESCRIBE collection for introspection
- STATS collection for statistics
- EXPLAIN query for query explanation
Enhanced existing commands:
- WHERE JSON(...) for richer JSON filtering
- SCAN ... for optimized scans with index usage
5. Secondary Indexes
5.1. Principle
Secondary indexes accelerate:
- JSON filters
- Frequent field-based lookups
- Certain CortexQL operations
They are optional, explicit, and declarative.
5.2. Index Types
- JSON path index using B-Tree for comparison filters such as greater than, less than, equal
- Equality index using hash for frequent equality filters
- Vector metadata index mapping keys to IDs for vector collections
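The difference between the first two index types can be illustrated with a small Python sketch. It is not CortexDB's implementation: a sorted list searched with `bisect` stands in for the B-Tree, a plain dictionary stands in for the hash index, and the class and method names are hypothetical.

```python
from bisect import bisect_right
from typing import Any, Dict, List, Set


class OrderedIndex:
    """Stand-in for a B-Tree index: values are kept sorted, so range filters are cheap."""

    def __init__(self) -> None:
        self._values: List[Any] = []  # indexed field values, sorted
        self._keys: List[str] = []    # primary keys, parallel to _values

    def add(self, value: Any, key: str) -> None:
        pos = bisect_right(self._values, value)
        self._values.insert(pos, value)
        self._keys.insert(pos, key)

    def greater_than(self, value: Any) -> List[str]:
        # Everything to the right of the last entry <= value matches.
        return self._keys[bisect_right(self._values, value):]


class HashIndex:
    """Stand-in for a hash index: constant-time equality lookups, no ordering."""

    def __init__(self) -> None:
        self._buckets: Dict[Any, Set[str]] = {}

    def add(self, value: Any, key: str) -> None:
        self._buckets.setdefault(value, set()).add(key)

    def lookup(self, value: Any) -> Set[str]:
        return set(self._buckets.get(value, set()))


age_idx = OrderedIndex()
for key, age in [("user:1", 30), ("user:2", 20), ("user:3", 41)]:
    age_idx.add(age, key)

email_idx = HashIndex()
email_idx.add("ada@example.org", "user:1")
```

The ordered structure answers `> 25` by locating one boundary and returning a contiguous slice, which a hash structure cannot do. The hash structure, in turn, answers equality without any search.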
5.3. Declaration via CortexQL
CREATE INDEX users_age_idx
ON PREFIX "user:"
PATH JSON("$.age")
TYPE BTREE

CREATE INDEX users_email_idx
ON PREFIX "user:"
PATH JSON("$.email")
TYPE HASH

Index usage is:
- Automatic: the engine selects the appropriate index
- Transparent: the user does not specify the index explicitly
- Observable via EXPLAIN
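Automatic index selection could look like the following Python sketch. The `IndexDef` record and the `choose_index` function are hypothetical. So is the selection rule, which assumes that a HASH index is preferred for equality when both types exist and that only a BTREE index can serve range comparisons.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IndexDef:
    name: str
    prefix: str
    path: str
    kind: str  # "BTREE" or "HASH"


def choose_index(
    indexes: List[IndexDef], prefix: str, path: str, op: str
) -> Optional[IndexDef]:
    """Pick an index for a filter `JSON(path) <op> value` over a prefix scan."""
    candidates = [i for i in indexes if i.prefix == prefix and i.path == path]
    if op == "==":
        # Assumed preference: hash first for equality, then B-Tree.
        preferred_kinds = ("HASH", "BTREE")
    elif op in ("<", "<=", ">", ">="):
        # Range comparisons need an ordered structure.
        preferred_kinds = ("BTREE",)
    else:
        return None  # unsupported operator: fall back to a plain scan
    for kind in preferred_kinds:
        for index in candidates:
            if index.kind == kind:
                return index
    return None


indexes = [
    IndexDef("users_age_idx", "user:", "$.age", "BTREE"),
    IndexDef("users_email_idx", "user:", "$.email", "HASH"),
]
chosen = choose_index(indexes, "user:", "$.age", ">")
```

Returning `None` corresponds to the unindexed case, where the query still runs as a plain scan followed by a filter.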
5.4. Index Maintenance
- Automatic updates during writes
- Optional rebuild if needed
- Compaction-aware and compatible with LSM compaction
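Keeping an index in step with writes can be sketched in Python as below. This is an in-memory illustration under simplifying assumptions, not the engine's write path: `IndexedStore` is a hypothetical class, it indexes a single top-level field, and it ignores durability and LSM compaction entirely.

```python
from typing import Any, Dict, Set


class IndexedStore:
    """Toy KV store that maintains one equality index on a document field."""

    def __init__(self, field: str) -> None:
        self.field = field
        self.data: Dict[str, Dict[str, Any]] = {}
        self.index: Dict[Any, Set[str]] = {}

    def _unindex(self, key: str) -> None:
        # Remove the entry that points at the previous version of the document.
        old = self.data.get(key)
        if old is None or self.field not in old:
            return
        bucket = self.index.get(old[self.field])
        if bucket is not None:
            bucket.discard(key)
            if not bucket:
                del self.index[old[self.field]]

    def _index(self, key: str, doc: Dict[str, Any]) -> None:
        if self.field in doc:
            self.index.setdefault(doc[self.field], set()).add(key)

    def put(self, key: str, doc: Dict[str, Any]) -> None:
        # One logical write becomes up to three physical updates:
        # drop the stale index entry, store the document, add the new entry.
        self._unindex(key)
        self.data[key] = doc
        self._index(key, doc)

    def delete(self, key: str) -> None:
        self._unindex(key)
        self.data.pop(key, None)

    def rebuild(self) -> None:
        # Optional rebuild: recompute the index from the primary data.
        self.index = {}
        for key, doc in self.data.items():
            self._index(key, doc)


store = IndexedStore("age")
store.put("user:1", {"age": 30})
store.put("user:1", {"age": 31})  # the stale entry for 30 is removed
store.put("user:2", {"age": 31})
```

The extra work inside `put` is the write amplification that index updates introduce.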
5.5. V6 Limitations
- No composite indexes, one field per index
- No indexes on vector collections except metadata
- Write amplification due to index updates
6. CortexQL Execution Engine Without SQL Planner
6.1. Principles
- No global SQL planner
- Pipeline-based execution
- Each CortexQL operator maps to an internal primitive
- Local optimizations only, such as filter ordering and index usage
6.2. Execution Pipeline
SCAN → FILTER → PROJECT → AGGREGATE → LIMIT
Each stage:
- Consumes data from the previous stage
- Produces data for the next stage
- May use indexes if available
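The stage-by-stage pipeline can be approximated with Python generators, where each stage pulls rows from the previous one. This is only a sketch of the idea: the function names, the `Row` shape, and the dictionary standing in for the KV store are assumptions, and no index is involved.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, Dict[str, Any]]  # (key, decoded JSON document)


def scan_prefix(store: Dict[str, Dict[str, Any]], prefix: str) -> Iterator[Row]:
    """SCAN PREFIX: yield rows in key order whose key starts with `prefix`."""
    for key in sorted(store):
        if key.startswith(prefix):
            yield key, store[key]


def where(rows: Iterable[Row], pred: Callable[[Dict[str, Any]], bool]) -> Iterator[Row]:
    """WHERE: keep rows whose document satisfies the predicate."""
    return (row for row in rows if pred(row[1]))


def project(rows: Iterable[Row], fields: List[str]) -> Iterator[Row]:
    """PROJECT: keep only the requested document fields."""
    for key, doc in rows:
        yield key, {f: doc.get(f) for f in fields}


def count(rows: Iterable[Row]) -> int:
    """AGGREGATE COUNT(): consume the stream and count its rows."""
    return sum(1 for _ in rows)


def limit(rows: Iterable[Row], n: int) -> Iterator[Row]:
    """LIMIT: stop pulling from upstream after n rows."""
    return islice(rows, n)


store = {
    "user:1": {"name": "Ada", "age": 36},
    "user:2": {"name": "Bob", "age": 19},
    "order:1": {"total": 12},
}
adults = where(scan_prefix(store, "user:"), lambda d: d.get("age", 0) > 25)
result = list(limit(project(adults, ["name", "age"]), 50))
print(result)  # [('user:1', {'name': 'Ada', 'age': 36})]
```

Because every stage is lazy, nothing is read from the store until the final `list(...)` pulls rows through the chain.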
6.3. Local Optimizations
- Filter ordering with most selective filters first
- Automatic index usage when available
- Early termination when LIMIT is reached
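Two of these local optimizations, filter ordering and early termination, can be shown in a short Python sketch. The `run_filters` function is hypothetical, and the selectivity estimates are supplied by the caller here. How the engine would actually estimate selectivity is not specified in this document.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Doc = Dict[str, Any]
# (estimated fraction of documents that pass, predicate)
Filter = Tuple[float, Callable[[Doc], bool]]


def run_filters(
    docs: Iterable[Doc], filters: List[Filter], limit: int
) -> Tuple[List[Doc], int]:
    """Apply filters most-selective-first and stop once `limit` matches are found.

    Returns the matches and the number of documents examined.
    """
    # Lowest pass rate first, so most documents are rejected by the first check.
    ordered = sorted(filters, key=lambda f: f[0])
    matches: List[Doc] = []
    examined = 0
    for doc in docs:
        examined += 1
        # all() short-circuits on the first failing predicate.
        if all(pred(doc) for _, pred in ordered):
            matches.append(doc)
            if len(matches) == limit:
                break  # early termination: LIMIT reached
    return matches, examined


docs = [{"age": a, "country": "FR" if a % 2 else "US"} for a in range(100)]
filters = [
    (0.5, lambda d: d["country"] == "FR"),
    (0.1, lambda d: d["age"] >= 90),
]
matches, examined = run_filters(docs, filters, limit=2)
```

In this run the scan stops after 94 of the 100 documents, as soon as the second match (age 93) is found.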
6.4. Query Explanation
EXPLAIN
SCAN PREFIX "user:"
| WHERE JSON("$.age") > 25

Returns:
- Scan type used
- Index used or not
- Estimated cost
- Number of items scanned
- Applied optimizations
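One way to picture the result is as a small record with one field per item in the list above. The Python sketch below is purely illustrative: the `ExplainResult` type, its field names, and the sample values are all invented for the example and do not describe real EXPLAIN output.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExplainResult:
    """Hypothetical container mirroring the items an EXPLAIN is said to return."""

    scan_type: str
    index_used: Optional[str]  # None when the query runs without an index
    estimated_cost: float      # unit is not specified in this document
    items_scanned: int
    optimizations: List[str] = field(default_factory=list)

    def render(self) -> str:
        lines = [
            f"scan type: {self.scan_type}",
            f"index: {self.index_used or 'none'}",
            f"estimated cost: {self.estimated_cost}",
            f"items scanned: {self.items_scanned}",
            f"optimizations: {', '.join(self.optimizations) or 'none'}",
        ]
        return "\n".join(lines)


# Made-up values, for illustration only.
plan = ExplainResult(
    scan_type="PREFIX",
    index_used="users_age_idx",
    estimated_cost=12.5,
    items_scanned=40,
    optimizations=["index usage"],
)
report = plan.render()
```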
7. V6 Invariants
- All V5 invariants remain valid
- CortexQL is the single source of truth for queries
- Every CortexQL query maps to existing internal primitives
- Secondary indexes remain consistent with the data
- Observability reflects the real engine state
- The server never alters data semantics
8. V6 Objective
At the end of V6, CortexDB becomes:
- Expressive without SQL, with advanced and stable CortexQL
- Performant, thanks to usable secondary indexes
- Observable, with integrated metrics, logs, and traces
- Production-grade, with a robust and monitorable server
This is the version where CortexDB stops being just a powerful engine and becomes a coherent, readable, and large-scale data platform.
9. Relationship with Previous Versions
V6 builds on the foundations of V1 to V5:
- V1 and V2: core engine and performance
- V3: query primitives and developer experience, CortexQL v0 or v1
- V4: vector layer, CortexQL v1 for vector
- V5: replication and sync, CortexQL v2 for replication
V6 completes the ecosystem with:
- Advanced CortexQL for expressiveness
- Secondary indexes for performance
- Observability for production readiness
- A mature server for robustness