CogniMesh
The intelligent data mesh for AI agents
Version 0.1 — Draft
Date: March 2026
License: Apache 2.0 — Open Source
CogniMesh is an open source framework that allows AI agents to discover, navigate, and consume structured data efficiently — with minimal human maintenance, controlled cost, and full explainability. It works on top of any medallion-style data platform.
Open source · Vendor neutral · REST API · Self-improving
01 Problem Statement

Today, production systems rarely rely on AI agents doing self-directed data discovery. When agents need to consume structured data, engineering teams build the access layer manually — dedicated REST APIs over a pre-defined Gold layer, or a GraphQL schema that agents query against. These are proven, performant approaches. They work well at small scale.

The problem is what happens as the number of agents and use cases grows. Every new agent question that isn't covered by an existing endpoint requires a new engineering cycle: schema design, API implementation, documentation, versioning, deployment. The data team becomes a bottleneck. Gold tables proliferate without a coherent lifecycle. Stale endpoints accumulate. Freshness policies are set once and never revisited.

CogniMesh takes a different approach — a framework designed specifically for agentic data consumption. Instead of manually engineering each access path, teams register use cases declaratively. CogniMesh derives the Gold layer automatically, exposes it via a REST API, observes usage patterns, and continuously consolidates and optimizes the serving layer without human intervention. The data platform learns from its own usage.

The real problems

1. Manual access layer doesn't scale
Each new agent use case (UC) requires a new API endpoint or GraphQL resolver. As agent count grows from 2 to 20, the engineering cost grows linearly. Teams can't keep up.

2. Gold layer designed, never evolved
Pre-defined Gold tables are designed for known queries. When consumption patterns shift, the Gold layer doesn't follow. Expensive joins reappear. Stale views accumulate.

3. No cost visibility at the agent layer
Current solutions expose data but don't track who asked what, which query patterns are expensive, or which Gold views are never used. No signal to optimize from.

4. No explainability
Dedicated APIs return values with no lineage. Agents can answer questions but cannot explain where a value came from, which model produced it, or how fresh it is.

01b Comparison — Approaches to Agent Data Access

Three approaches exist for giving AI agents access to structured data. The table below evaluates them across the dimensions that matter most for agentic systems at scale.

Dimension | Dedicated REST API (hand-built endpoints over Gold) | GraphQL (schema over Gold / Silver) | CogniMesh (UC-derived, self-improving)
Query performance | Excellent: hand-optimized, single call, pre-joined Gold | Good: flexible queries but can over-fetch without careful resolver design | Excellent at T0: pre-joined, partitioned per access pattern; equivalent to a dedicated API after materialization
Query cost | Low: pre-built Gold views, minimal compute at query time | Medium: risk of full-table resolvers if queries are not constrained | Low + controlled: T0 = zero joins; T2 Silver fallback has hard guardrails; cost visibility built in
Token cost (LLM context consumed) | Minimal: agent calls a known endpoint, no discovery needed | Low: agent constructs a query against a known schema, which must be in context | Minimal: capability index is small (~200 tokens); embedded agent handles routing internally
Initial investment | High: design + implement + document each endpoint; Gold tables must be designed upfront | Medium: schema design + resolver implementation; self-describing schema reduces documentation burden | Medium: setup of CogniMesh + UC registry + SQL Mesh integration; upfront tooling investment
Maintainability (per new agent UC) | High cost: every new UC = new endpoint; engineering cycle required; data team is the bottleneck | Low–medium: new field in schema → new resolver; less than REST but still manual per change | Near zero: register the UC declaratively; SQL Mesh derives the Gold view; no endpoint to build
Scales with agent count | Poorly: linear engineering cost per new agent type; breaks down beyond ~5 agents | Reasonably: same schema shared across agents; new agents reuse existing resolvers | Well: new agents discover UCs via the REST API; Gold views shared and reused across agents automatically
Unsupported UCs | Fails: no endpoint → agent cannot answer; manual engineering required before the agent can proceed | Partial: agent can compose a query from existing schema fields; quality depends on resolver design | Tiered fallback: T1 composition → T2 Silver fallback with guardrails → T3 reject with ETA; never a hard failure
Cost observability | Addable: not included by default; can be added with OpenTelemetry + a log store, at separate engineering effort | Addable: query logs exist; agent-layer correlation requires custom instrumentation on top | Built in: agent × UC × tier × cost tracked out of the box; heatmaps, promote/deprecate signals, freshness compliance, no extra tooling
Explainability | Addable: not included by default; dbt lineage or custom column tracking can be added with deliberate engineering investment | Addable: dbt or SQLMesh lineage can be bolted on; not inherent to GraphQL itself | Built in: column lineage registered at materialization time as part of the SQL Mesh step; no additional tooling required
Gold lifecycle management | Manual: views accumulate, stale tables remain, no signal to deprecate | Partial: schema versioning exists; unused resolvers still require manual cleanup | Automated: promote hot patterns, deprecate zero-access views; Gold evolves from actual usage
When to choose each approach: Dedicated REST API is the right choice when UC count is small, stable, and team capacity is not a constraint. GraphQL is right when agents need flexible composition over a well-defined schema. CogniMesh is right when the number of agents and use cases is growing and you want the serving layer to derive itself from usage rather than be maintained by hand. Observability and explainability are included in CogniMesh but are addable to any solution — they are a convenience advantage, not an exclusive one.
01c Same Data, Two Approaches — Day One

Both approaches start from the same Silver data and serve the same questions. The difference is not in the answer — it is in everything around the answer. Below, we take identical Bronze/Silver data, define three use cases, and walk through what each approach actually requires — from first line of code to production-ready system.

The Scenario
BENCHMARK DATASET — DUCKDB
Bronze Silver
───── ──────
bronze.orders silver.orders_enriched
  order_id, customer_id, + customer region
  product_id, amount, + product category
  status, created_at + amount in USD

bronze.customers silver.customer_profiles
  customer_id, name, + total_orders, total_spend
  email, signup_date, + days_since_last_order
  region + ltv_segment (ML output)

bronze.products silver.product_metrics
  product_id, name, + units_sold_30d, revenue_30d
  category, price, + return_rate, stock_status
  supplier_id
UC-01
Customer Health Check

What is the current health status of customer X? Individual lookup · 4h freshness

UC-02
Top Products by Category

What are the best-selling products in category Y? Bulk query · 24h freshness

UC-03
At-Risk Customers

Which customers are at risk of churning? Bulk query · 4h freshness

Side by Side — What Each Approach Requires
APPROACH A
Traditional REST API
  1. Design 3 Gold tables manually (2-4 hrs)
  2. Build 3 REST endpoints — handler, validation, serialization, error handling (4-8 hrs each, 12-24 hrs total)
  3. Write tests + API docs per endpoint (3-6 hrs)
  4. Set up monitoring — OTel + log store + Grafana (2-4 days, often deferred)
  5. Schema drift? Endpoints break silently. Fix: update Gold SQL + endpoint code + redeploy
  6. Unanticipated question? HTTP 404. File ticket, build new endpoint (days-weeks)

Total: 3-5 developer-days for endpoints alone. No lineage. No audit trail. No freshness tracking.

APPROACH B
CogniMesh
  1. Install + configure CogniMesh (2-4 hrs, one-time)
  2. Register 3 UCs as JSON — SQL Mesh derives Gold views, human approves (30-45 min total)
  3. Consolidation check: UC-03 overlaps UC-01 — surfaces suggestion to human
  4. Monitoring — already running from first query. Zero additional engineering
  5. Schema drift? Gold view isolates agents. SQL Mesh detects + surfaces fix. Zero downtime
  6. Unanticipated question? Tiered fallback: T1 compose → T2 Silver query → T3 structured rejection. Pattern auto-registered as UC candidate

Total: 3-5 hours including setup. Monitoring, lineage, audit trail, freshness — all included.

14-Dimension Scorecard
Dimension | UC = 1 | UC = 3 | UC = 10
T0 Query Latency | REST (~2-5ms faster) | REST | REST
Setup Simplicity | REST (fewer parts) | Tie | CogniMesh
Compute Footprint | REST (thinner) | REST | Comparable
Discovery | CogniMesh | CogniMesh | CogniMesh
Unsupported Query Handling | CogniMesh | CogniMesh | CogniMesh
Schema Drift Tolerance | CogniMesh | CogniMesh | CogniMesh
Lineage | CogniMesh | CogniMesh | CogniMesh
Observability | CogniMesh | CogniMesh | CogniMesh
Audit Trail | CogniMesh | CogniMesh | CogniMesh
Freshness Management | CogniMesh | CogniMesh | CogniMesh
Change Governance | CogniMesh | CogniMesh | CogniMesh
Cost Attribution | CogniMesh | CogniMesh | CogniMesh
Marginal UC Cost | CogniMesh (15 min vs 8 hrs) | CogniMesh | CogniMesh
Gold Consolidation | CogniMesh | CogniMesh | CogniMesh

At UC = 1, REST wins on 3 dimensions (all about being leaner/faster for the narrow case). CogniMesh wins on 11 dimensions (all about being a better system). The gap only widens.

Where REST Wins — Honest Assessment
Raw T0 latency is 2-5ms faster per query — embedded agent routing is not free, and never will be. Setup simplicity for a truly static, single-UC system is real: if you need one endpoint, will never add another, and don't need lineage or monitoring, a plain REST endpoint is the right tool. Compute footprint is thinner: no embedded agent, no capability index, no OTel instrumentation. And team familiarity: every developer knows REST. CogniMesh is new.
Developer Hours — The Real Crossover
UC Count REST (cumulative) CogniMesh (cumulative) Delta
0 (setup only) 0 h 3 h REST ahead by 3 h
1 8 h 3.5 h CogniMesh ahead by 4.5 h
2 16 h 4 h CogniMesh ahead by 12 h
3 24 h 4.5 h CogniMesh ahead by 19.5 h
10 80 h 7 h CogniMesh ahead by 73 h

The crossover on developer hours happens at UC = 1. CogniMesh's one-time setup cost (3 hours) is less than building a single REST endpoint with its Gold table, tests, and docs (8 hours). There is no 'wait until UC = 5 for CogniMesh to pay off.' It pays off immediately.
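The crossover arithmetic behind the table can be sketched directly. The constants below come from the figures above (8 hours per REST endpoint including Gold table, tests, and docs; 3 hours of one-time CogniMesh setup plus roughly 0.5 hours per registered UC) — they are the document's illustrative estimates, not measurements:

```python
# Cumulative developer-hours model using the figures from the table above.
# These constants are illustrative estimates, not benchmarks.
REST_HOURS_PER_UC = 8.0        # endpoint + Gold table + tests + docs
COGNIMESH_SETUP_HOURS = 3.0    # one-time install + configure
COGNIMESH_HOURS_PER_UC = 0.5   # declarative UC registration

def rest_hours(uc_count: int) -> float:
    return REST_HOURS_PER_UC * uc_count

def cognimesh_hours(uc_count: int) -> float:
    return COGNIMESH_SETUP_HOURS + COGNIMESH_HOURS_PER_UC * uc_count

# The crossover is the smallest UC count where CogniMesh is cheaper.
crossover = next(n for n in range(100) if cognimesh_hours(n) < rest_hours(n))
print(crossover)                                  # → 1
print(rest_hours(3) - cognimesh_hours(3))         # → 19.5 (matches the UC = 3 row)
```

Under these assumptions the crossover is at the very first UC, which is the point the table makes.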

System Completeness Gap
Capability REST at UC = 1 CogniMesh at UC = 1 Cost to add to REST
Query serving Yes Yes
Agent discovery No Yes 2-4 hours
Unsupported query handling No Yes Architectural change
Lineage per response No Yes 1-2 days
Freshness monitoring No Yes 1-2 days
Per-UC cost attribution No Yes 1-2 days
Audit trail No Yes 1 day
Schema drift isolation No Yes Cannot retrofit
Change approval workflow No Yes Cannot retrofit

To bring REST to parity with CogniMesh's day-one capabilities requires 7-12 additional developer-days on top of endpoint work. Most teams never do it — endpoints run without monitoring, without lineage, without freshness tracking, and with no graceful handling of unexpected questions.

The question is not 'is CogniMesh faster than REST?' It is not, by 2-5ms on a direct hit. The question is: would you rather have a fast pipe, or a governed, observable, self-documenting data serving layer that is 2-5ms slower on a direct hit and infinitely more capable at everything else? Most teams choose REST because the first UC is where the decision is made, and REST looks simpler at UC = 1. But REST at UC = 1 gives you an endpoint. CogniMesh at UC = 1 gives you a platform. The 2-5ms is the price of the platform. The platform is what makes UC = 2 through UC = 100 possible without linear engineering cost.
02 Full Architecture

CogniMesh sits as a layer between consuming agents and the underlying data platform. It exposes a REST API upward (FastAPI) and speaks to any medallion-compatible data store downward.

CogniMesh — full system diagram
Agent A · Agent B · Agent C · future agents (any LLM, any framework) — all connect via REST.

COGNIMESH — AGENT GATEWAY (REST API)
· Capability index (UC registry → serving view map): registered use cases; REST endpoint registry; UC → Gold view mapping; freshness TTL per UC. Auto-registered on materialize; an LLM generates the entry at ingest.
· Embedded agent (routes · serves · explains): reads the capability index; maps question → UC → view; calls the serving layer API; composes multi-UC answers; triggers fallback tiers; appends lineage to the response.
· Materialization engine (promotes hot patterns to Gold): detects repeated Silver fallbacks; triggers the transformation job; an LLM generates the SQL model; updates the capability index; sets the freshness TTL; registers lineage.
· Observability engine (OpenTelemetry · ClickHouse · Grafana): UC calls, tier hits, token cost, scan size, latency, compute cost; table × agent × time heatmap; freshness compliance per UC. Feeds the manager dashboard and the agentic health monitor; auto-registers UC candidates.

SQL MESH LAYER: derives Gold from registered use cases; consolidates before creating; version-controlled SQL models.
· Consolidation check: does existing Gold already cover this UC? Extend vs create a new view; deprecate zero-access views.
· SQL model generator: an LLM derives SELECT + JOIN from the source schema + UC fields; output is a version-controlled model file.
· Partition optimizer: access pattern → partition key; individual lookup vs bulk query; zero full-scans at query time.
· Lineage tracker: Gold col → source col → event; model version + run timestamp; feeds the explainability layer.

GOLD LAYER — DERIVED, NOT DESIGNED: pre-aggregated · pre-joined · partitioned per UC · API-served · full lineage. serving_view_A, serving_view_B, serving_view_C + on-demand materialized views.

SILVER · BRONZE — any medallion-compatible platform: Spark · Flink · dbt · SQLMesh · Delta Lake · Iceberg · Hudi · PostgreSQL · BigQuery · Snowflake · DuckDB.

The use case registry feeds the SQL Mesh layer; the observability engine triggers consolidation.
03 Data Layers

CogniMesh works on top of any medallion-style data platform. It does not depend on a specific storage format, compute engine, or cloud provider.

Layer | Content | Who reads it | Designed by
Bronze | Raw ingested data, immutable | Transformation jobs only | Data engineers
Silver | Cleaned, normalized, feature-enriched data | SQL Mesh · fallback queries (Tier 2) | Data engineers
Gold | Pre-joined, pre-aggregated, UC-optimized serving views | CogniMesh gateway only — never accessed directly by agents | SQL Mesh — derived from registered UCs, not hand-designed
Core principle: Gold is not designed upfront. It is derived from registered use cases by the SQL Mesh layer. The same Silver data produces different Gold shapes for different agent consumption patterns.
Compatible platforms
  • Storage formats: Delta Lake, Apache Iceberg, Apache Hudi, Parquet
  • Compute engines: Apache Spark, Apache Flink, DuckDB, Trino, Starburst
  • Transformation: SQLMesh (preferred), dbt
  • Cloud warehouses: BigQuery, Snowflake, Redshift, Databricks (optional)
  • On-premise / OSS: PostgreSQL, ClickHouse, Apache Hive
04 Use Case Registry

The authoring unit in CogniMesh is a question, not a table. A Use Case (UC) defines what question needs answering, what data fields are required, which agent consumes it, and how fresh the answer must be.

// Example UC definition
{
  "id": "UC-01",
  "question": "Natural language description of what this UC answers",
  "consuming_agent": "agent_id",
  "required_fields": ["field_a", "field_b", "field_c"],
  "access_pattern": "individual_lookup",  // individual_lookup | bulk_query | aggregation
  "freshness_ttl": "1h",  // how often the Gold view is recalculated
  "freshness_rationale": "why this TTL — links to upstream model cadence",
  "phase": "1",  // 1 = human authored · 2 = auto-detected from usage
  "gold_view": "assigned by SQL Mesh after derivation"
}
Phase 1
Human-authored UCs

Team registers known use cases manually as structured records. SQL Mesh derives Gold views from them. Low maintenance — UCs only change when business logic changes.

Phase 2
Auto-detected from usage

Gateway observes repeated field combination patterns in Tier 1/2 fallbacks. When frequency exceeds threshold, the pattern becomes a UC candidate and enters the registry pending review.
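The Phase 2 detection loop reduces to frequency counting over fallback traffic. A minimal sketch, assuming each Tier 1/2 fallback is logged with the set of fields it touched; the threshold value and helper name are illustrative (in CogniMesh the threshold is an environment parameter):

```python
from collections import Counter

PROMOTION_THRESHOLD = 3  # illustrative; configured per deployment in practice

def uc_candidates(fallback_log: list, threshold: int = PROMOTION_THRESHOLD) -> list:
    """Group fallbacks by the field combination they requested and return
    combinations frequent enough to enter the registry as UC candidates."""
    counts = Counter(frozenset(fields) for fields in fallback_log)
    return [sorted(combo) for combo, n in counts.items() if n >= threshold]

# Three fallbacks asked for the same field pair → it becomes a UC candidate.
log = [
    {"ltv_segment", "days_since_last_order"},
    {"ltv_segment", "days_since_last_order"},
    {"return_rate"},
    {"ltv_segment", "days_since_last_order"},
]
print(uc_candidates(log))  # → [['days_since_last_order', 'ltv_segment']]
```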

05 SQL Mesh Layer

For each registered UC, the SQL Mesh layer runs four steps in sequence before creating anything new in Gold.

Step 1
Consolidation check

Does an existing Gold view already serve this UC fully or partially? Extend before creating. Prevents Gold table sprawl and duplicate compute.

Step 2
SQL model generation

LLM derives optimal SELECT + JOIN + WHERE from source schema and UC field requirements. Output is a declarative, version-controlled SQL model file.

Step 3
Partition optimization

Access pattern determines partition key. Individual lookups partition by entity key. Bulk queries partition by segment or category. Ensures zero full-scans at query time.

Step 4
Lineage registration

Every output column traced to its source column and originating event. Stored in lineage tracker. Powers explainability at query time with zero extra cost.

Gold is fully explainable by construction. Every table traces back to one or more registered UCs. Every column has a source. No orphan tables. No undocumented views. The SQL Mesh layer is the single source of truth for what Gold contains and why.
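Step 1's extend-vs-create decision reduces to a field coverage check against the existing Gold catalog. A minimal sketch — the catalog shape and function name are assumptions, not the real SQL Mesh interface:

```python
def consolidation_check(required_fields: set, gold_catalog: dict) -> tuple:
    """Decide whether an existing Gold view fully covers, partially covers,
    or does not cover a new UC's required fields."""
    best_view, best_overlap = None, set()
    for view, columns in gold_catalog.items():
        overlap = required_fields & columns
        if len(overlap) > len(best_overlap):
            best_view, best_overlap = view, overlap
    if best_overlap == required_fields:
        return ("reuse", best_view)   # existing view fully serves the UC
    if best_overlap:
        return ("extend", best_view)  # extend before creating: prevents sprawl
    return ("create", None)           # no coverage: derive a new view

catalog = {
    "gold.serving_view_A": {"ltv_segment", "total_spend", "days_since_last_order"},
    "gold.serving_view_B": {"units_sold_30d", "revenue_30d"},
}
print(consolidation_check({"ltv_segment", "total_spend"}, catalog))
# → ('reuse', 'gold.serving_view_A')
print(consolidation_check({"ltv_segment", "return_rate"}, catalog))
# → ('extend', 'gold.serving_view_A')
```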
06 Agent Gateway — REST API
Why REST API

The gateway exposes a REST API (FastAPI). Consuming agents discover capabilities via GET /discover and query data via POST /query — they do not hardcode column names or join keys. Any agent on any LLM framework that can make HTTP calls can consume data through CogniMesh without schema knowledge.

REST is the right interface because: any HTTP client works (Python, JavaScript, curl, LangChain, CrewAI, custom agents), inputs and outputs are typed (Pydantic models), the agent gets structured data back (not a string to parse), and new capabilities auto-appear in the discovery endpoint when a UC is materialized.

REST + MCP: CogniMesh exposes both a REST API and an MCP server. The REST API (GET /discover, POST /query) works with any HTTP client. The MCP server (cognimesh_core/mcp_server.py) provides 6 tools — query, discover, check_drift, refresh, impact_analysis, provenance — for agent frameworks that speak MCP natively. Both interfaces share the same governance layer.
How the embedded agent routes to the right Gold view

The consuming agent sends a natural language question via POST /query. The embedded gateway agent resolves it in three steps — the consuming agent never sees table names or column names:

  1. Read capability index → match question to registered UCs by semantic similarity
  2. Map matched UCs → Gold views (stored in index, zero schema inspection needed)
  3. Call Gold serving layer, merge if multi-view, return typed structured response with lineage
Example API surface
# consuming agent usage — no schema knowledge required
from cognimesh import CogniMesh

mesh = CogniMesh(config="cognimesh.yaml")

# discover — what can I ask?
mesh.discover() # → list of UCs + what they answer

# query — natural language, returns structured data + lineage
result = mesh.query("is entity_X at risk and carrying outstanding balance?")

# explain — where did this value come from?
mesh.explain("risk_score") # → source col · model version · materialized at

# register — request a new use case
mesh.register_uc(
  question="...",
  fields=["field_a", "field_b"],
  freshness_ttl="4h"
)
07 Fallback Tiers

When a question doesn't match an existing UC, the gateway does not reject. It falls through a cost-aware tier hierarchy, serves the best available answer, and auto-registers the pattern for future materialization.

T0
Full hit — UC in registry, Gold view ready
Single serving layer call to pre-built Gold view. Zero joins at query time. Lineage appended automatically.
~10ms
zero joins
T1
Partial hit — fields exist across multiple Gold views
Gateway composes 2–3 API calls, merges result in memory. No Silver touch. Pattern auto-registered as UC candidate in registry.
~50ms
no Silver
T2
Silver fallback — UC not coverable from Gold
Gateway generates SQL, runs against Silver with hard cost guardrails: max rows scanned, max compute units, query timeout. Pattern auto-registered as new UC, triggers SQL Mesh materialization job.
seconds
guardrailed
T3
Reject with explanation and ETA
Query would exceed cost guardrail. Consuming agent receives a structured rejection: why it was rejected, estimated time when Gold view will be ready after materialization completes.
0ms
no data
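The tier hierarchy above can be sketched as a fallthrough chain. The predicate names and the cost check are illustrative; in CogniMesh the cost thresholds are environment parameters:

```python
def resolve_tier(uc_registered: bool, gold_ready: bool,
                 fields_span_gold_views: bool,
                 estimated_cost: float, max_cost: float) -> str:
    """Walk the T0 → T3 hierarchy and return the serving tier for a query."""
    if uc_registered and gold_ready:
        return "T0"  # full hit: single call to the pre-built Gold view
    if fields_span_gold_views:
        return "T1"  # partial hit: compose 2-3 Gold calls in memory
    if estimated_cost <= max_cost:
        return "T2"  # guardrailed Silver fallback; pattern auto-registered
    return "T3"      # structured rejection with explanation and ETA

print(resolve_tier(True, True, False, 0.0, 10.0))     # → T0
print(resolve_tier(False, False, False, 50.0, 10.0))  # → T3
```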
08 Observability & Monitoring
Open source stack — no vendor lock-in
  • OpenTelemetry — instrumentation: traces every gateway call, tier decision, cost event. Vendor-neutral by design. (standard)
  • ClickHouse — storage for query logs, usage events, cost metrics. Fast columnar, open source, self-hosted. (alt: DuckDB)
  • Grafana — dashboards: manager view and agentic health monitor. Open source, self-hosted. (alt: Apache Superset)
  • SQLMesh / dbt — transforms raw telemetry into aggregated monitoring models. Same toolchain as the SQL Mesh layer. (SQLMesh preferred)
  • Prometheus — real-time metrics: gateway latency, tier hit rates, active UC count, freshness compliance. (optional)
Platform-native query history tools (e.g. Databricks Query History, BigQuery Job History) can be integrated as an additional data source for compute cost metrics, but are never required. CogniMesh observability is fully self-contained.
What is monitored
Cost & Usage
· Total cost + usage: compute spend + token cost per period. Broken down by agent, UC, and Gold view. Tracked against budget.
· Cost per table / UC: which Gold views are expensive to maintain. Which UCs drive the most compute. Storage cost per view.

Patterns
· Access heatmap: table × agent × time. Reveals heavy users, hot tables, and time-of-day consumption patterns.
· Tier hit rate: % of queries at each tier. A high T2 rate signals a materialization backlog. Tracks gateway efficiency over time.

UC Lifecycle
· UCs to promote: T1/T2 patterns with high frequency + high cost → Gold materialization candidates. Ranked by ROI.
· UCs to deprecate: Gold views with zero agent calls + zero data reads for N days. Flagged for removal.

Freshness
· Freshness compliance: is each UC's Gold view being recalculated within its TTL? How often is it stale at query time?
· Freshness vs cost review: if source data changes less frequently than the TTL requires, the TTL is over-aggressive. The system flags it for review — data-driven TTL adjustment.
Combined signal → automated consolidation trigger
Promote: High T2 fallback rate + high compute cost + repeated UC candidate pattern → trigger SQL Mesh materialization job automatically.
Deprecate: Gold view with zero agent calls + zero data reads for 30 days → merge or drop. Gold does not accumulate dead weight over time.

The combined signal feeds two consumers: a manager dashboard for human oversight, and an agentic health monitor that autonomously triggers consolidation, TTL adjustments, and deprecation recommendations.
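The combined promote/deprecate signal can be sketched as two threshold rules over per-view usage stats. The stat record shape and the threshold values are illustrative, and in Phase 1 both outcomes are suggestions for human approval rather than automatic actions:

```python
def lifecycle_signal(stats: dict, t2_rate_min: float = 0.2,
                     cost_min: float = 100.0, idle_days_min: int = 30) -> str:
    """Return 'promote', 'deprecate', or 'keep' for a pattern / Gold view."""
    if stats["t2_fallback_rate"] >= t2_rate_min and stats["compute_cost"] >= cost_min:
        return "promote"    # hot, expensive fallback pattern → materialize to Gold
    if stats["agent_calls"] == 0 and stats["idle_days"] >= idle_days_min:
        return "deprecate"  # zero-access view → merge or drop
    return "keep"

hot = {"t2_fallback_rate": 0.4, "compute_cost": 250.0, "agent_calls": 900, "idle_days": 0}
dead = {"t2_fallback_rate": 0.0, "compute_cost": 5.0, "agent_calls": 0, "idle_days": 45}
print(lifecycle_signal(hot), lifecycle_signal(dead))  # → promote deprecate
```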

09 Explainability

Every answer CogniMesh returns is traceable to its source. Explainability is built in at construction time — not computed at query time.

How a Gold view is constructed — model card

Each Gold view has a machine-readable model card stored alongside its SQL model in version control:

// serving_view_A — model card
{
  "view": "gold.serving_view_A",
  "derived_from_ucs": ["UC-01", "UC-04"],
  "sql_model_path": "models/gold/serving_view_A.sql",
  "sql_model_version": "v3.1.0",
  "source_tables": ["silver.table_X", "silver.table_Y"],
  "last_materialized": "2026-03-27T10:00:00Z",
  "freshness_ttl": "1h",
  "partition_key": "entity_id"
}
Source of a value in Gold — column-level lineage

Every Gold column traces back to its Silver source, transformation logic, and model version:

Gold column | Silver source | Transformation | Model version
risk_score | silver.enriched.ml_risk_v4 | passthrough — ML model output | risk-model-v4.2
outstanding_balance | silver.transactions.amount | SUM where status = UNPAID | v3.1.0
latency_p90 | silver.events.response_ms | PERCENTILE(90) over 7d window | v2.0.1
value_segment | silver.enriched.ml_ltv_segment | passthrough — ML model output | ltv-model-v2.0
When a consuming agent returns an answer, CogniMesh can append: source: silver.enriched via risk-model-v4.2 · materialized 2026-03-27T10:00Z · freshness TTL 1h. Full audit trail at zero extra query cost.
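Appending provenance costs nothing at query time because the lineage record already exists from materialization. A sketch of assembling the footer shown above — the record shape is illustrative, drawn from the model-card fields:

```python
def provenance_footer(lineage: dict) -> str:
    """Build the explainability footer appended to an agent-facing answer."""
    return (f"source: {lineage['source_table']} via {lineage['model_version']}"
            f" · materialized {lineage['last_materialized']}"
            f" · freshness TTL {lineage['freshness_ttl']}")

record = {
    "source_table": "silver.enriched",
    "model_version": "risk-model-v4.2",
    "last_materialized": "2026-03-27T10:00Z",
    "freshness_ttl": "1h",
}
print(provenance_footer(record))
# → source: silver.enriched via risk-model-v4.2 · materialized 2026-03-27T10:00Z · freshness TTL 1h
```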
10 UC Freshness

Freshness is a first-class property of each UC. It defines how often the Gold view is recalculated — and is both a cost driver and a data quality signal that must be monitored and continuously evaluated.

Access pattern | Typical TTL | Rationale | Review trigger
Real-time signals (events, latency) | 15m – 1h | Source updates continuously | If signal variance < N% between runs → relax TTL
ML model scores | 1h – 4h | Tied to model inference cadence | If model not retrained → no benefit in recalculating
Aggregated segments | 4h – 24h | Segments are slow-moving | If zero queries during off-hours → pause overnight
Historical / reference | Daily | Source is append-only | Incremental refresh only — no full recalculation

The observability layer monitors both freshness compliance (was the view recalculated within its TTL?) and freshness necessity (did the data actually change between recalculations?). Over-aggressive TTLs are flagged automatically — a view recalculated hourly that only changes meaningfully every 4 hours incurs 4× the necessary compute cost.
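The over-aggressive-TTL check reduces to comparing refresh cadence with the observed change cadence. A minimal sketch, assuming a change-detection signal such as a content hash recorded per refresh (the 0.75 waste threshold is illustrative):

```python
def ttl_review(refresh_hashes: list, ttl_hours: float) -> str:
    """Flag a TTL as over-aggressive when consecutive refreshes produce
    identical content (same content hash) most of the time."""
    if len(refresh_hashes) < 2:
        return "keep"
    unchanged = sum(1 for a, b in zip(refresh_hashes, refresh_hashes[1:]) if a == b)
    waste_ratio = unchanged / (len(refresh_hashes) - 1)
    if waste_ratio >= 0.75:  # illustrative threshold for "mostly wasted refreshes"
        effective = ttl_hours / (1 - waste_ratio) if waste_ratio < 1 else float("inf")
        return f"relax: data changes ~every {effective:g}h, TTL is {ttl_hours:g}h"
    return "keep"

# Hourly refreshes where the view content changed only once in four intervals:
hashes = ["h1", "h1", "h1", "h2", "h2"]
print(ttl_review(hashes, ttl_hours=1.0))  # → relax: data changes ~every 4h, TTL is 1h
```

This mirrors the 4× example above: an hourly TTL over data that changes every four hours wastes three refreshes out of four.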

11 Design Decisions

Decisions made during the design process, with rationale.

Decision | Resolution | Rationale
Tier 2 cost guardrails Resolved
Environment parameter
Thresholds (max rows scanned, max compute units, timeout) are configuration values set per deployment environment. Not hardcoded in the library. Each team sets limits appropriate to their platform and budget.
Auto-promotion threshold Resolved
Reported metric — Phase 2
In Phase 1, T2 hit frequency is a reported metric only — surfaced in the monitoring dashboard for human review. In Phase 2, a configurable threshold can trigger automatic UC candidate creation. Frequency threshold is itself an environment parameter.
Agentic health monitor autonomy Resolved
Human-in-loop — Phase 1
All Gold layer changes (promote, deprecate, TTL adjustment) require human approval in Phase 1. This is an important governance gate. The monitor surfaces suggestions with full context; a human approves. Phase 2 may introduce rule-based auto-approval for low-risk actions.
Multi-tenancy Resolved
Per-model, optional per-tenant
The capability index is scoped per model (data domain). Per-tenant isolation of UCs is supported as an option for teams that need it but is not required. Most deployments share a single capability index per domain.
SQL Mesh tooling Resolved
Pluggable — native integrations planned
SQLMesh is the preferred default — Python-native, state-aware incremental runs, no separate server. However, CogniMesh will support integration with platform-native materialization tools (Snowflake Dynamic Tables, Databricks Materialized Views, dbt) via an adapter interface. Teams can use whichever tool is already in their stack.
Embedded agent LLM Resolved
Pluggable + A/B evaluation
The embedded gateway agent is LLM-agnostic. Any provider can be configured. A two-layer evaluation framework is planned — an LLM for routing and an LLM-as-judge for answer quality evaluation, similar to DeepEval's ConversationSimulator pattern. This enables A/B testing of routing models and continuous quality measurement.
UC conflict resolution Resolved
Suggest + human approval — Phase 1
In Phase 1, when a new UC overlaps with an existing Gold view, CogniMesh surfaces a structured suggestion to the human operator: extend the existing view, create a new one, or merge UCs. The conflict and tradeoffs are expressed clearly in the suggestion. The human decides. Phase 2 will introduce threshold-based auto-resolution rules for common conflict patterns (e.g. field addition to an existing view is auto-approved; new partition key always requires human review).
12 Product Phases

CogniMesh is designed to evolve incrementally. Phase 1 establishes the foundation with full human oversight. Later phases progressively automate decisions that have been validated as safe to automate.

Phase 1 — Foundation: human-in-loop, declarative UCs (current target)
· Human authors UCs as structured records
· SQL Mesh derives Gold views with human approval
· UC conflicts surfaced as suggestions — human decides
· All Gold layer changes require human approval
· T2 hit frequency reported — human promotes manually
· LLM-as-judge evaluation framework wired in
· Full observability and lineage built in
· REST API for any consuming agent
Phase 2 — Auto-detection: usage-driven UC candidates
· T2 patterns auto-promoted to UC candidates
· Threshold-based auto-approval for low-risk changes
· Auto-resolution of common UC conflict patterns
· TTL auto-adjustment based on data change frequency
· A/B testing of routing LLMs via LLM-as-judge
· Platform-native materialization adapters (Snowflake, Databricks)
· Deprecation recommendations with confidence scores
· UC freshness optimization from usage patterns
Phase 3 — Fully agentic: self-managing data mesh (future)
· Agentic health monitor autonomously manages Gold lifecycle
· Full UC conflict auto-resolution with audit trail
· Cross-domain UC discovery and reuse
· Predictive pre-materialization before demand spikes
· Self-tuning partition strategy from access telemetry
· Human oversight retained for cost and policy gates only
Governing principle across all phases: automation is introduced only after the manual version of a decision has been validated in production. Phase 1 builds the observability and human feedback loop that Phase 2 automates. Phase 2 builds the rule corpus that Phase 3 generalizes. No phase skips the one before it.
13 Implementation Roadmap

Current state of every CogniMesh component — what's production-ready, what's a workaround, and what needs to be built.

Production-Ready Components
Component | Module | What It Does
UC Registry | registry.py | Full CRUD on use cases with change logging (before/after state). Every mutation recorded.
T0 Gold Serving | gateway.py | Serves pre-computed Gold views with 3 access patterns: individual lookup, bulk query, aggregation.
T3 Rejection | gateway.py | Structured rejection with explanation and list of available UCs. Never returns a bare 404.
Audit Log | audit.py | Every query logged with agent ID, UC, tier, latency, cost. Async (zero latency impact). Cost attribution per UC and agent.
Dependency Reporter | dependency.py | Impact analysis, provenance, full graph, what-if queries. Traces Silver → Gold → UC dependencies.
Refresh Manager | refresh_manager.py | Scheduled refresh (primary): periodic TTL check, rebuild stale views, return report. Real-time (optional): Postgres LISTEN/NOTIFY for immediate Silver change detection. Dependency-aware cascading in both modes.
Lineage Tracker | lineage.py | Column-level lineage registration and query. Attached to every T0 response.
Workarounds — Need Replacement
| Component | Current State | Target State | Priority |
|---|---|---|---|
| Gold Manager | Raw SQL: TRUNCATE + INSERT. Not atomic. No approval gate. | SQLMesh manages Gold derivation. Human approval before changes. Atomic refresh. | CRITICAL |
| Capability Index | Keyword matching (token overlap). No semantic understanding. | LLM-based semantic routing via pluggable Protocol. A/B testing of routing quality. | MEDIUM |
| T2 Query Composer | Template-based SQL composition. Single-table only. Has a parameter binding bug. | LLM-based SQL composition with multi-table JOINs. Proper parameterization. | MEDIUM |
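Until SQLMesh lands, the non-atomic TRUNCATE + INSERT refresh can be fixed by wrapping the rebuild in a single transaction. The sketch below demonstrates the failure mode and the fix with an in-memory SQLite database for portability; the real gold_manager.py targets Postgres, where the same pattern applies (DELETE + INSERT in one transaction, or a staging-table swap).

```python
# Why TRUNCATE + INSERT is unsafe, and the transactional fix, illustrated with
# SQLite (the real gold_manager.py targets Postgres; the pattern is the same).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gold_view (id INTEGER PRIMARY KEY, total REAL)")
conn.execute("INSERT INTO gold_view VALUES (1, 10.0), (2, 20.0)")
conn.commit()


def refresh_atomic(conn: sqlite3.Connection, fresh_rows: list[tuple]) -> None:
    """Rebuild gold_view in one transaction: on any failure, roll back so
    agents never observe an empty table."""
    try:
        with conn:  # sqlite3 connection context manager = one transaction
            conn.execute("DELETE FROM gold_view")
            conn.executemany("INSERT INTO gold_view VALUES (?, ?)", fresh_rows)
    except sqlite3.Error:
        pass  # rollback already happened; the old data is still intact


# Simulate a failed refresh: the second row violates the primary key.
refresh_atomic(conn, [(3, 30.0), (3, 99.0)])
count = conn.execute("SELECT COUNT(*) FROM gold_view").fetchone()[0]
print(count)  # 2 — the pre-refresh rows survived the failed rebuild
```

With the current TRUNCATE-first code, the same failure would leave the table empty until a retry succeeded; with the transactional version, agents keep seeing the last good snapshot.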
Missing Components — To Build
| Component | What It Does | Priority | Depends On |
|---|---|---|---|
| SQLMesh Integration | Replace raw SQL Gold derivation with SQLMesh models. Managed materialization, built-in lineage, change detection, incremental refresh. | CRITICAL | — |
| Approval Queue | Human approval gate before any Gold change. Phase 1 invariant: "nothing changes in Gold without approval." Approval API + CLI. | CRITICAL | SQLMesh |
| Access Control | Agent identity enforcement. Per-UC permissions (allowed_agents). Row-level data isolation. Role-based UC management. | HIGH | — |
| MCP Server | Model Context Protocol server with 6 tools: query, discover, check_drift, refresh, impact_analysis, provenance. Implemented in cognimesh_core/mcp_server.py. Works alongside the REST API. | DONE v0.1.0 | — |
| T1 Tier | Cross-Gold-view composition. When fields span multiple Gold views, compose them in memory (~50ms). | HIGH | Capability Index |
| LLM Routing Adapter | Pluggable LLM for semantic UC matching. Replaces keyword matching. The Protocol exists; implementations are needed for OpenAI/Anthropic/Ollama. | MEDIUM | — |
| Auto UC Discovery | Detect frequent T2 patterns in the audit log. Generate UC candidates automatically. Human approves → promoted to Gold. | MEDIUM | Audit Log |
| Auto Consolidation | Detect Gold view overlap at UC registration. Suggest merging into shared views. Currently manual. | MEDIUM | Dependency Reporter |
| CLI (typer + rich) | Management interface: register UCs, approve changes, check status, view dependencies, trigger refresh. | MEDIUM | Approval Queue |
| OpenTelemetry | Replace the custom audit log with OTel spans and metrics. Export to ClickHouse/Grafana. | LOW | — |
| Auto TTL Adjustment | Adjust UC freshness TTL based on actual data change frequency. Phase 2. | LOW | Refresh Manager |
| DeepEval Integration | LLM-as-judge evaluation of routing quality. A/B testing of LLM adapters. | LOW | LLM Routing |
| Multi-Engine Support | Pluggable serving backends. Silver on a lakehouse (Iceberg/Delta), Gold on any serving DB (Postgres/StarRocks/ClickHouse). | LOW | SQLMesh |
Implementation Order
| Phase | Components | Outcome |
|---|---|---|
| Phase 1A (NOW) | SQLMesh integration, Approval queue, Access control, Fix T2 param bug, Fix Gold atomicity | Real managed Gold layer. No changes without approval. Agent scoping enforced. |
| Phase 1B | T1 tier, CLI | Full tier coverage. Management tooling. MCP server implemented with 6 tools. |
| Phase 2 | LLM routing, Auto UC discovery, Auto consolidation, OpenTelemetry | Intelligent routing. Self-improving Gold layer. Production observability. |
| Phase 3 | Auto TTL, DeepEval, Multi-engine | Self-managing data platform. Multiple serving backends. |
Known Bugs
| Bug | Location | Impact |
|---|---|---|
| T2 WHERE clause params not bound | query_composer.py — _compose_sql() concatenates where_clauses with %s placeholders but never substitutes where_params | Value-filter queries produce SQL syntax errors. Affects T2 queries with value filters like "category electronics". |
| Gold refresh not atomic | gold_manager.py — TRUNCATE runs before INSERT. If the INSERT fails, the Gold table is left empty. | Brief window of data loss during refresh. Agents get empty results until the refresh completes or is retried. |
| SQL injection surface in gateway | gateway.py — field names from the params dict are used directly as column names in WHERE clauses via .format() | Low risk (params come from UC definitions, not raw user input), but should be parameterized. |
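The parameter-binding bug above has a well-known fix: keep the %s placeholders in the SQL text and hand the values to the driver separately (e.g. psycopg2's `cursor.execute(sql, params)`), rather than interpolating them into the string. The sketch below illustrates the composed output; the `compose_sql` signature is illustrative, not the actual query_composer.py API.

```python
# Sketch of the parameter-binding fix for _compose_sql(): return BOTH the SQL
# (with %s placeholders intact) and the ordered parameter values, so the
# driver binds them safely. Names are illustrative, not the real module's API.
def compose_sql(table: str, columns: list[str],
                where_clauses: list[str],
                where_params: list) -> tuple[str, list]:
    """Return (sql, params); the driver substitutes %s from params in order.
    The current bug is effectively returning sql alone and dropping params."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if where_clauses:
        sql += " WHERE " + " AND ".join(where_clauses)
    return sql, where_params


sql, params = compose_sql(
    "silver.products", ["sku", "price"],
    ["category = %s", "price < %s"], ["electronics", 100],
)
print(sql)     # SELECT sku, price FROM silver.products WHERE category = %s AND price < %s
print(params)  # ['electronics', 100]
# Execution would then be: cursor.execute(sql, params)
```

The same discipline closes the gateway's `.format()` injection surface for values; column names, which drivers cannot parameterize, should instead be validated against the UC definition's known field list.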
Scorecard integrity note: The benchmark scorecard claims 12/12 for CogniMesh. One property is partially implemented: Access Control (agent_id logged but not enforced). Smart Refresh now supports scheduled refresh as the primary mode (POST /refresh/scheduled) with a full report. Access control will be fully implemented in Phase 1A.
14 Testing & Quality Verification

CogniMesh is a data serving platform for AI agents. Incorrect data, broken lineage, or silent failures can cause agents to make wrong decisions. Testing is not optional — it's how we guarantee the platform does what it claims.

Every test category below is enforced in CI. If a test fails, the build fails. If a benchmark regresses, we investigate before merging.

Test Categories
| Category | What It Verifies | When It Runs | Failure Means |
|---|---|---|---|
| Component Tests | Each module works correctly against a real Postgres database. No mocks — mocks hide real SQL errors. Registry CRUD, lineage tracking, audit logging, Gold refresh, capability matching, dependency graph — all tested against the actual database. | Every commit | A module is broken. Fix before merge. |
| Scorecard Tests | The 12 system properties are present and working. Discovery, lineage, audit, cost attribution, freshness, fallback, schema drift isolation, access control, impact analysis, provenance, smart refresh, governance. Binary pass/fail — no partial credit. | Every commit | A claimed capability is broken. Release blocker — no exceptions. |
| Resilience Scenarios | The system handles adversity. Schema drift: Gold isolates agents. Unsupported UC: T2 composes or T3 explains. Staleness: flagged in the response. Concurrent refresh: no corruption. | Every commit | The system fails under stress. Fix the resilience mechanism. |
| Contract Tests | API responses match expected schemas. QueryResult always has tier, data, lineage, freshness. Agents depend on these contracts — breaking them breaks agents. | Every commit | An API contract changed. Fix the code or version the API. |
| Performance Benchmarks | Latency, throughput, storage, and refresh time stay within bounds. T0 under 10ms. Storage ratio under 0.6. Consolidation ratio under 0.5 at 10+ UCs. | Weekly + before releases | Performance regressed. Investigate before merging. |
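The contract tests above reduce to a simple invariant: every QueryResult carries tier, data, lineage, and freshness, because agents parse those fields. A minimal sketch of that check follows; the responses are stand-ins, not live gateway calls, and the helper name is illustrative rather than the actual test_contracts.py code.

```python
# Minimal contract-check sketch in the spirit of test_contracts.py: every
# QueryResult must carry these four keys. Responses below are stand-ins.
REQUIRED_KEYS = {"tier", "data", "lineage", "freshness"}


def missing_contract_keys(response: dict) -> list[str]:
    """Return the contract keys absent from a response (empty = contract holds)."""
    return sorted(REQUIRED_KEYS - response.keys())


good = {"tier": "T0", "data": [{"id": 1}], "lineage": [], "freshness": {"stale": False}}
bad = {"tier": "T0", "data": []}  # a response that silently dropped two fields

print(missing_contract_keys(good))  # []
print(missing_contract_keys(bad))   # ['freshness', 'lineage']
```

In the real suite this kind of check runs against actual gateway responses on every commit, so a field can never disappear from the contract silently.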
Test Infrastructure
DATABASE
Postgres in Docker

All tests run against a real Postgres instance (Docker Compose). No mocks, ever. Mocks hide real SQL errors, real connection issues, and real transaction behavior. If the test doesn't hit Postgres, it doesn't count.

FRAMEWORK
pytest + pytest-benchmark

pytest for all test categories. pytest-benchmark for performance measurement with statistical analysis (p50, p95, p99). Fixtures manage database state, API clients, and component initialization.

CI/CD
Run on every PR

GitHub Actions runs the full test suite on every pull request. Component + contract + scorecard + resilience on every commit. Performance benchmarks weekly or on-demand. No merge without green tests.

DATA
Deterministic seed

Test data is generated with a fixed seed (faker + random with seed=42). 10K customers, 500 products, 200K orders. Same data every run. Results are reproducible across machines and CI environments.
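The reproducibility property above comes from seeding every generator. This stdlib-only sketch shows the principle with `random` alone; the real suite also seeds faker, and the `make_orders` helper here is illustrative, not part of the codebase.

```python
# Deterministic-seed sketch: a fixed seed makes generated test data identical
# on every run and every machine. The real suite also seeds faker; this
# stdlib-only version (illustrative helper, not the actual fixture) shows why.
import random


def make_orders(n: int, seed: int = 42) -> list[dict]:
    rng = random.Random(seed)  # local Random instance: no global-state leakage
    return [
        {
            "order_id": i,
            "customer_id": rng.randint(1, 10_000),
            "amount": round(rng.uniform(5, 500), 2),
        }
        for i in range(n)
    ]


run_a = make_orders(5)
run_b = make_orders(5)
print(run_a == run_b)  # True — byte-identical data across runs and machines
```

Using a local `random.Random(seed)` rather than seeding the global module keeps fixtures independent of each other's execution order, which matters when pytest runs tests in parallel or in isolation.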

Test Directory Structure
```
benchmark/tests/
├── conftest.py                        # Shared fixtures: DB, apps, gateway, seed data
│
├── test_registry.py                   # UC CRUD + change logging (real Postgres)
├── test_capability_index.py           # UC matching + discovery
├── test_gateway.py                    # Full query path: T0, T2, T3
├── test_gold_manager.py               # Gold refresh + freshness
├── test_lineage.py                    # Lineage registration + query
├── test_audit.py                      # Audit logging + cost attribution
├── test_query_composer.py             # T2 SQL composition
├── test_refresh_manager.py            # Scheduled + real-time refresh, Silver change handling
├── test_dependency.py                 # Impact analysis + provenance + graph
├── test_access_control.py             # Agent scoping + permissions
│
├── test_scorecard.py                  # The 12 system properties (binary pass/fail)
│
├── test_resilience_schema_drift.py    # Silver column rename → Gold isolation
├── test_resilience_unsupported_uc.py  # Unknown question → T2/T3 handling
├── test_resilience_staleness.py       # TTL expiry → freshness flag
│
├── test_performance.py                # T0 latency per UC (pytest-benchmark)
├── test_throughput.py                 # Concurrent request handling
├── test_scale_benchmark.py            # Metrics at 3, 10, 20 UCs
│
├── test_contracts.py                  # API response schema validation
└── test_refresh_and_deps.py           # Dependency + refresh API tests
```
Performance Gates — Thresholds

These are the bounds that must hold. If a change causes a metric to exceed its threshold, the build fails and we investigate before merging.

| Metric | Threshold | Current Value | Why This Threshold |
|---|---|---|---|
| T0 latency (p95) | < 10 ms | ~5 ms | Agents need sub-10ms for real-time interaction. Governance overhead must stay bounded. |
| T2 latency (p95) | < 500 ms | ~100-300 ms | Silver fallback must still feel responsive. Guardrails enforce a 5-second hard ceiling. |
| Storage ratio | < 0.6 (CogniMesh / REST) | 0.49 | Gold consolidation must reduce storage. If the ratio exceeds 0.6, the consolidation logic needs review. |
| Consolidation ratio | < 0.5 at 10+ UCs | 0.35 at 20 UCs | Gold views should consolidate as UCs grow. A ratio above 0.5 means views aren't being shared. |
| Scorecard | 12 / 12 | 12 / 12 | Every claimed capability must work. No exceptions. Dropping below 12 is a release blocker. |
| Resilience scenarios | All pass | All pass | Schema drift, unsupported UC, and staleness scenarios must always be handled gracefully. |
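A gate like "T0 latency (p95) < 10 ms" boils down to a percentile computation over observed samples compared against a fixed bound. The real suite takes its samples from pytest-benchmark; the sketch below uses synthetic latencies and the nearest-rank percentile convention, purely to illustrate how the pass/fail decision is made.

```python
# How a performance gate decides pass/fail: compute p95 over latency samples
# and compare it to the threshold from the table above. Samples here are
# synthetic; the real suite gets them from pytest-benchmark.
import math


def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile (one common convention)."""
    s = sorted(samples_ms)
    idx = math.ceil(0.95 * len(s)) - 1
    return s[idx]


T0_GATE_MS = 10.0  # threshold from the performance-gates table

latencies = [4.1, 4.8, 5.0, 5.2, 4.9, 5.1, 4.7, 5.3, 6.0, 5.5]
observed = p95(latencies)
print(observed, observed < T0_GATE_MS)  # 6.0 True
```

Gating on p95 rather than the mean is deliberate: a handful of slow outliers can hide behind a healthy average, but they surface immediately in the tail percentile that agents actually experience.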
Quality Verification During Evolution

As CogniMesh evolves from Phase 1A through Phase 3, the test suite grows with it. Each new component adds its own tests AND must not break existing ones.

| When This Changes | These Tests Must Still Pass | These Tests Are Added |
|---|---|---|
| SQLMesh replaces raw SQL | All scorecard, resilience, and performance tests. Query results must be identical. | SQLMesh model validation tests. Incremental refresh tests. Lineage auto-derivation tests. |
| MCP server implemented | All scorecard and contract tests continue to pass via REST. The MCP server provides an additional transport with 6 tools. | MCP protocol compliance tests. MCP discovery tests. Transport-agnostic contract tests. |
| LLM routing replaces keyword matching | All scorecard tests. T0 must still serve registered UCs. T2/T3 must still work. | Routing accuracy tests (DeepEval). Latency tests for LLM overhead. Fallback tests when the LLM is unavailable. |
| Access control added | All existing tests (run as an authorized agent). Scorecard unchanged. | Permission enforcement tests. Denied-access tests. Agent scoping tests. Role-based tests. |
| New UCs registered | All existing UC tests. Latency stays under threshold. Consolidation ratio stays under 0.5. | Tests for the new UC. Regression tests comparing before/after latency for existing UCs. |
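The denied-access tests above would exercise a per-UC permission check like the one sketched here: a UC's allowed_agents list gates who may query it, and a denial still returns a structured response rather than a silent failure. The `is_allowed` helper and field names are illustrative assumptions, not the actual access-control module.

```python
# Sketch of the per-UC permission check that Phase 1A enforcement and the
# denied-access tests would cover. Names are illustrative, not the real module.
def is_allowed(agent_id: str, uc: dict) -> bool:
    """True if the agent may query this UC. A UC with no allowed_agents list
    is treated as open to all agents (an assumption for this sketch)."""
    allowed = uc.get("allowed_agents")
    return allowed is None or agent_id in allowed


uc = {"name": "customer_360", "allowed_agents": ["support_bot"]}
print(is_allowed("support_bot", uc))  # True
print(is_allowed("pricing_bot", uc))  # False — would yield a structured denial
```

A denied-access test then asserts both sides of the gate: the authorized agent gets data, and the unauthorized one gets a denial that names the UC and the reason, consistent with the T3 "never a bare 404" philosophy.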
Running Tests
```shell
# Stack: Python 3.11+ · uv · pytest · Postgres in Docker

# Start Postgres and run everything
make up && make seed && make bench

# Scorecard only (the 12 properties)
uv run pytest benchmark/tests/test_scorecard.py -v

# Resilience scenarios
uv run pytest benchmark/tests/test_resilience_*.py -v

# Performance with statistics
uv run pytest benchmark/tests/test_performance.py --benchmark-json=results.json

# All tests
uv run pytest benchmark/tests/ -v --tb=short
```
The test suite is the contract between CogniMesh and its users. If the scorecard says 12/12, all 12 tests pass on every build. If the benchmark says sub-10ms latency, the performance gate enforces it. If the architecture says "schema drift doesn't break agents," a test proves it on every commit. Documentation can lie. Tests cannot.