Today, no production system relies on AI agents doing self-directed data discovery. When agents need to consume structured data, engineering teams build the access layer manually — dedicated REST APIs over a pre-defined Gold layer, or a GraphQL schema that agents query against. These are proven, performant approaches. They work well at small scale.
The problem is what happens as the number of agents and use cases grows. Every new agent question that isn't covered by an existing endpoint requires a new engineering cycle: schema design, API implementation, documentation, versioning, deployment. The data team becomes a bottleneck. Gold tables proliferate without a coherent lifecycle. Stale endpoints accumulate. Freshness policies are set once and never revisited.
CogniMesh is a new approach — the first framework designed specifically for agentic data consumption. Instead of manually engineering each access path, teams register use cases declaratively. CogniMesh derives the Gold layer automatically, exposes it via a REST API, observes usage patterns, and continuously consolidates and optimizes the serving layer without human intervention. The data platform learns from its own usage.
- **Linear engineering cost.** Each new agent use case (UC) requires a new API endpoint or GraphQL resolver. As agent count grows from 2 to 20, the engineering cost grows linearly. Teams can't keep up.
- **Static Gold layer.** Pre-defined Gold tables are designed for known queries. When consumption patterns shift, the Gold layer doesn't follow. Expensive joins reappear. Stale views accumulate.
- **No usage signal.** Current solutions expose data but don't track who asked what, which query patterns are expensive, or which Gold views are never used. No signal to optimize from.
- **No lineage.** Dedicated APIs return values with no lineage. Agents can answer questions but cannot explain where a value came from, which model produced it, or how fresh it is.
Three approaches exist for giving AI agents access to structured data. The table below evaluates them across the dimensions that matter most for agentic systems at scale.
| Dimension | Dedicated REST API (hand-built endpoints over Gold) | GraphQL (schema over Gold/Silver) | CogniMesh (UC-derived, self-improving) |
|---|---|---|---|
| Query performance | **Excellent:** hand-optimized, single call, pre-joined Gold | **Good:** flexible queries, but can over-fetch without careful resolver design | **Excellent at T0:** pre-joined, partitioned per access pattern. Equivalent to a dedicated API after materialization. |
| Query cost | **Low:** pre-built Gold views, minimal compute at query time | **Medium:** risk of full-table resolvers if queries are not constrained | **Low + controlled:** T0 = zero joins. T2 Silver fallback has hard guardrails. Cost visibility built in. |
| Token cost (LLM context consumed) | **Minimal:** agent calls a known endpoint. No discovery needed. | **Low:** agent constructs a query against a known schema. Schema must be in context. | **Minimal:** capability index is small (~200 tokens). Embedded agent handles routing internally. |
| Initial investment | **High:** design + implement + document each endpoint. Gold tables must be designed upfront. | **Medium:** schema design + resolver implementation. Self-describing schema reduces documentation burden. | **Medium:** setup of CogniMesh + UC registry + SQLMesh integration. Upfront tooling investment. |
| Maintainability per new agent UC | **High cost:** every new UC = new endpoint. Engineering cycle required. Data team is the bottleneck. | **Low–medium:** new field in schema → new resolver. Less than REST but still manual per change. | **Near zero:** register the UC declaratively. SQLMesh derives the Gold view. No endpoint to build. |
| Scales with agent count | **Poorly:** linear engineering cost per new agent type. Breaks down beyond ~5 agents. | **Reasonably:** same schema shared across agents. New agents reuse existing resolvers. | **Well:** new agents discover UCs via the REST API. Gold views shared and reused across agents automatically. |
| Unsupported UCs | **Fails:** no endpoint → agent cannot answer. Manual engineering required before the agent can proceed. | **Partial:** agent can compose a query from existing schema fields. Quality depends on resolver design. | **Tiered fallback:** T1 composition → T2 Silver fallback with guardrails → T3 reject with ETA. Never a hard failure. |
| Cost observability | **Addable:** not included by default. Can be added with OpenTelemetry + a log store. Requires separate engineering effort. | **Addable:** query logs exist. Agent-layer correlation requires custom instrumentation on top. | **Built in:** agent × UC × tier × cost tracked out of the box. Heatmaps, promote/deprecate signals, freshness compliance. No extra tooling. |
| Explainability | **Addable:** not included by default. dbt lineage or custom column tracking can be added. Requires deliberate engineering investment. | **Addable:** dbt or SQLMesh lineage can be bolted on. Not inherent to GraphQL itself. | **Built in:** column lineage registered at materialization time as part of the SQLMesh step. No additional tooling required. |
| Gold lifecycle mgmt | **Manual:** views accumulate. Stale tables remain. No signal to deprecate. | **Partial:** schema versioning exists. Unused resolvers still require manual cleanup. | **Automated:** promote hot patterns. Deprecate zero-access views. Gold evolves from actual usage. |
The hand-built REST approach and CogniMesh both start from the same Silver data and serve the same questions. The difference is not in the answer — it is in everything around the answer. Below, we take identical Bronze/Silver data, define three use cases, and walk through what each approach actually requires, from the first line of code to a production-ready system.
- **UC 1:** "What is the current health status of customer X?" (individual lookup, 4h freshness)
- **UC 2:** "What are the best-selling products in category Y?" (bulk query, 24h freshness)
- **UC 3:** "Which customers are at risk of churning?" (bulk query, 4h freshness)
REST total: 3-5 developer-days for the endpoints alone. No lineage. No audit trail. No freshness tracking.
CogniMesh total: 3-5 hours including setup. Monitoring, lineage, audit trail, freshness: all included.
| Dimension | UC = 1 | UC = 3 | UC = 10 |
|---|---|---|---|
| T0 Query Latency | REST ~2-5ms faster | REST | REST |
| Setup Simplicity | REST (fewer parts) | Tie | CogniMesh |
| Compute Footprint | REST (thinner) | REST | Comparable |
| Discovery | CogniMesh | CogniMesh | CogniMesh |
| Unsupported Query Handling | CogniMesh | CogniMesh | CogniMesh |
| Schema Drift Tolerance | CogniMesh | CogniMesh | CogniMesh |
| Lineage | CogniMesh | CogniMesh | CogniMesh |
| Observability | CogniMesh | CogniMesh | CogniMesh |
| Audit Trail | CogniMesh | CogniMesh | CogniMesh |
| Freshness Management | CogniMesh | CogniMesh | CogniMesh |
| Change Governance | CogniMesh | CogniMesh | CogniMesh |
| Cost Attribution | CogniMesh | CogniMesh | CogniMesh |
| Marginal UC Cost | — | CogniMesh (15 min vs 8 hrs) | CogniMesh |
| Gold Consolidation | — | CogniMesh | CogniMesh |
At UC = 1, REST wins on 3 dimensions (all about being leaner/faster for the narrow case). CogniMesh wins on 11 dimensions (all about being a better system). The gap only widens.
| UC Count | REST (cumulative) | CogniMesh (cumulative) | Delta |
|---|---|---|---|
| 0 (setup only) | 0 h | 3 h | REST ahead by 3 h |
| 1 | 8 h | 3.5 h | CogniMesh ahead by 4.5 h |
| 2 | 16 h | 4 h | CogniMesh ahead by 12 h |
| 3 | 24 h | 4.5 h | CogniMesh ahead by 19.5 h |
| 10 | 80 h | 7 h | CogniMesh ahead by 73 h |
The crossover on developer hours happens at UC = 1. CogniMesh's one-time setup cost (3 hours) is less than building a single REST endpoint with its Gold table, tests, and docs (8 hours). There is no 'wait until UC = 5 for CogniMesh to pay off.' It pays off immediately.
| Capability | REST at UC = 1 | CogniMesh at UC = 1 | Cost to add to REST |
|---|---|---|---|
| Query serving | Yes | Yes | — |
| Agent discovery | No | Yes | 2-4 hours |
| Unsupported query handling | No | Yes | Architectural change |
| Lineage per response | No | Yes | 1-2 days |
| Freshness monitoring | No | Yes | 1-2 days |
| Per-UC cost attribution | No | Yes | 1-2 days |
| Audit trail | No | Yes | 1 day |
| Schema drift isolation | No | Yes | Cannot retrofit |
| Change approval workflow | No | Yes | Cannot retrofit |
Bringing REST to parity with CogniMesh's day-one capabilities requires 7-12 additional developer-days on top of the endpoint work. Most teams never do it — endpoints run without monitoring, without lineage, without freshness tracking, and with no graceful handling of unexpected questions.
CogniMesh sits as a layer between consuming agents and the underlying data platform. It exposes a REST API upward (FastAPI) and speaks to any medallion-compatible data store downward.
CogniMesh works on top of any medallion-style data platform. It does not depend on a specific storage format, compute engine, or cloud provider.
| Layer | Content | Who reads it | Designed by |
|---|---|---|---|
| Bronze | Raw ingested data, immutable | Transformation jobs only | Data engineers |
| Silver | Cleaned, normalized, feature-enriched data | SQLMesh · Tier 2 fallback queries | Data engineers |
| Gold | Pre-joined, pre-aggregated, UC-optimized serving views | CogniMesh gateway only — never accessed directly by agents | SQLMesh, derived from registered UCs, not hand-designed |
The authoring unit in CogniMesh is a question, not a table. A Use Case (UC) defines what question needs answering, what data fields are required, which agent consumes it, and how fresh the answer must be.
Team registers known use cases manually as structured records. SQL Mesh derives Gold views from them. Low maintenance — UCs only change when business logic changes.
Gateway observes repeated field combination patterns in Tier 1/2 fallbacks. When frequency exceeds threshold, the pattern becomes a UC candidate and enters the registry pending review.
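A declarative UC registration might look like the following sketch. The field names and the in-memory registry here are illustrative assumptions, not the actual CogniMesh registry schema; they reflect the UC properties described above (question, required fields, consuming agent, freshness).

```python
from dataclasses import dataclass

# Illustrative sketch only -- field names are assumptions, not the
# actual CogniMesh registry schema.
@dataclass(frozen=True)
class UseCase:
    uc_id: str
    question: str             # the business question this UC answers
    required_fields: tuple    # Silver fields the answer needs
    consumer_agent: str       # which agent asks this question
    access_pattern: str       # "individual" or "bulk"
    freshness_ttl: str        # how stale the answer may be, e.g. "4h"

class UCRegistry:
    """Minimal in-memory stand-in for the declarative UC registry."""
    def __init__(self):
        self._ucs = {}
        self.change_log = []  # every mutation recorded (before/after state)

    def register(self, uc: UseCase):
        before = self._ucs.get(uc.uc_id)
        self._ucs[uc.uc_id] = uc
        self.change_log.append(("register", uc.uc_id, before, uc))

    def get(self, uc_id: str) -> UseCase:
        return self._ucs[uc_id]

registry = UCRegistry()
registry.register(UseCase(
    uc_id="customer_health",
    question="What is the current health status of customer X?",
    required_fields=("customer_id", "risk_score", "outstanding_balance"),
    consumer_agent="support-agent",
    access_pattern="individual",
    freshness_ttl="4h",
))
```

The point of the sketch: the team writes down the question and its requirements, nothing else — no endpoint, no table design.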
For each registered UC, the SQLMesh layer runs four steps in sequence before creating anything new in Gold:

1. **Reuse check.** Does an existing Gold view already serve this UC fully or partially? Extend before creating. Prevents Gold table sprawl and duplicate compute.
2. **SQL derivation.** An LLM derives the optimal SELECT + JOIN + WHERE from the source schema and the UC's field requirements. Output is a declarative, version-controlled SQL model file.
3. **Partitioning.** The access pattern determines the partition key. Individual lookups partition by entity key; bulk queries partition by segment or category. Ensures zero full scans at query time.
4. **Lineage registration.** Every output column is traced to its source column and originating event, and stored in the lineage tracker. Powers explainability at query time with zero extra cost.
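The reuse check (step 1 above) is essentially a field-coverage test. A minimal sketch, assuming Gold views are described by the set of columns they serve — the view names and decision labels are illustrative, not CogniMesh's actual API:

```python
def reuse_check(required_fields, gold_views):
    """Step 1 sketch: prefer extending an existing Gold view over creating
    a new one. `gold_views` maps view name -> set of columns it serves.
    Decision labels ("reuse"/"extend"/"create") are illustrative."""
    required = set(required_fields)
    best, best_overlap = None, 0
    for name, cols in gold_views.items():
        if required <= cols:
            return ("reuse", name)       # full coverage: no new view needed
        overlap = len(required & cols)
        if overlap > best_overlap:
            best, best_overlap = name, overlap
    if best_overlap > 0:
        return ("extend", best)          # partial coverage: add columns
    return ("create", None)              # nothing overlaps: derive a new view

views = {
    "gold.customer_360": {"customer_id", "risk_score", "value_segment"},
    "gold.product_sales": {"product_id", "category", "units_sold"},
}
```

Extending a partially matching view instead of creating a new one is what keeps the consolidation ratio down as UCs accumulate.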
The gateway exposes a REST API (FastAPI). Consuming agents discover capabilities via GET /discover and query data via POST /query — they do not hardcode column names or join keys. Any agent on any LLM framework that can make HTTP calls can consume data through CogniMesh without schema knowledge.
REST is the right interface because: any HTTP client works (Python, JavaScript, curl, LangChain, CrewAI, custom agents), inputs and outputs are typed (Pydantic models), the agent gets structured data back (not a string to parse), and new capabilities auto-appear in the discovery endpoint when a UC is materialized.
The REST API (GET /discover, POST /query) works with any HTTP client. The MCP server (cognimesh_core/mcp_server.py) provides 6 tools — query, discover, check_drift, refresh, impact_analysis, provenance — for agent frameworks that speak MCP natively. Both interfaces share the same governance layer.
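A GET /discover response might look like the following sketch. The exact response schema is an assumption based on the capabilities described above, not the documented contract — the key property is what's absent: no table names, no column names, no join keys.

```python
import json

# Illustrative /discover payload -- the concrete keys are assumptions,
# not the documented CogniMesh response schema.
discover_response = json.dumps({
    "capabilities": [
        {
            "uc_id": "customer_health",
            "question": "What is the current health status of customer X?",
            "access_pattern": "individual",
            "freshness_ttl": "4h",
            "parameters": ["customer_id"],
        },
        {
            "uc_id": "top_products",
            "question": "What are the best-selling products in category Y?",
            "access_pattern": "bulk",
            "freshness_ttl": "24h",
            "parameters": ["category"],
        },
    ]
})

caps = json.loads(discover_response)["capabilities"]
# The agent sees only questions and parameters -- never schema internals.
questions = [c["question"] for c in caps]
```

An agent puts only this small payload in context, which is why the token cost stays minimal regardless of how large the underlying schema is.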
The consuming agent sends a natural language question via POST /query. The embedded gateway agent resolves it in three steps; the consuming agent never sees table names or column names.
When a question doesn't match an existing UC, the gateway does not reject. It falls through a cost-aware tier hierarchy, serves the best available answer, and auto-registers the pattern for future materialization.
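The tier cascade can be sketched as follows. The function shape, names, and thresholds are illustrative assumptions, not the gateway's actual API — but the routing order matches the hierarchy described above:

```python
def route(question_fields, gold_views, silver_cost_estimate, guardrail_max_cost):
    """Sketch of the cost-aware tier cascade (names are assumptions).
    T0: a single Gold view covers every required field.
    T1: fields span several Gold views -> compose in memory.
    T2: fall back to Silver, but only within cost guardrails.
    T3: structured rejection with an ETA -- never a bare failure."""
    required = set(question_fields)
    full = [name for name, cols in gold_views.items() if required <= cols]
    if full:
        return {"tier": "T0", "views": full[:1]}
    union = set().union(*gold_views.values()) if gold_views else set()
    if required <= union:
        partial = [n for n, c in gold_views.items() if required & c]
        return {"tier": "T1", "views": partial}
    if silver_cost_estimate <= guardrail_max_cost:
        # Served from Silver; the pattern is also logged as a UC candidate.
        return {"tier": "T2", "views": ["silver"]}
    return {"tier": "T3", "eta": "after next materialization cycle"}

views = {
    "gold.customer_health": {"customer_id", "risk_score"},
    "gold.product_sales": {"category", "units_sold"},
}
```

Note that T2 and T3 are where the self-improvement signal comes from: every fallback is a recorded candidate for future materialization.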
The combined signal feeds two consumers: a manager dashboard for human oversight, and an agentic health monitor that autonomously triggers consolidation, TTL adjustments, and deprecation recommendations.
Every answer CogniMesh returns is traceable to its source. Explainability is built in at construction time — not computed at query time.
Each Gold view has a machine-readable model card stored alongside its SQL model in version control.
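A hedged sketch of what such a model card might contain — the concrete keys are assumptions derived from the lineage table in this section, not a documented CogniMesh schema:

```python
# Illustrative model card -- keys are assumptions based on the lineage
# table below, not a documented CogniMesh schema.
model_card = {
    "gold_view": "gold.customer_health",
    "serves_ucs": ["customer_health"],
    "derived_from": ["silver.enriched", "silver.transactions"],
    "freshness_ttl": "4h",
    "partition_key": "customer_id",
    "columns": {
        "risk_score": {
            "source": "silver.enriched.ml_risk_v4",
            "transformation": "passthrough -- ML model output",
            "model_version": "risk-model-v4.2",
        },
        "outstanding_balance": {
            "source": "silver.transactions.amount",
            "transformation": "SUM where status=UNPAID",
            "model_version": "v3.1.0",
        },
    },
}
```

Because the card lives next to the SQL model in version control, any change to the view and its documentation are reviewed together.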
Every Gold column traces back to its Silver source, transformation logic, and model version:
| Gold column | Silver source | Transformation | Model version |
|---|---|---|---|
risk_score | silver.enriched.ml_risk_v4 | passthrough — ML model output | risk-model-v4.2 |
outstanding_balance | silver.transactions.amount | SUM where status=UNPAID | v3.1.0 |
latency_p90 | silver.events.response_ms | PERCENTILE(90) OVER 7d window | v2.0.1 |
value_segment | silver.enriched.ml_ltv_segment | passthrough — ML model output | ltv-model-v2.0 |
Freshness is a first-class property of each UC. It defines how often the Gold view is recalculated — and is both a cost driver and a data quality signal that must be monitored and continuously evaluated.
| Access pattern | Typical TTL | Rationale | Review trigger |
|---|---|---|---|
| Real-time signals (events, latency) | 15m – 1h | Source updates continuously | If signal variance < N% between runs → relax TTL |
| ML model scores | 1h – 4h | Tied to model inference cadence | If model not retrained → no benefit in recalculating |
| Aggregated segments | 4h – 24h | Segments are slow-moving | If zero queries during off-hours → pause overnight |
| Historical / reference | Daily | Source is append-only | Incremental refresh only — no full recalculation |
The observability layer monitors both freshness compliance (was the view recalculated within TTL?) and freshness necessity (did the data actually change between recalculations?). Over-aggressive TTLs are flagged automatically — a view recalculated hourly that only changes meaningfully every 4 hours is a 4× unnecessary compute cost.
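One way to detect freshness over-provisioning is to fingerprint the materialized rows at each refresh and compare consecutive runs. A minimal sketch — the hashing mechanism and the change-ratio threshold are assumptions, not CogniMesh's actual implementation:

```python
import hashlib

def content_fingerprint(rows):
    """Hash the materialized rows so successive refreshes can be compared."""
    h = hashlib.sha256()
    for row in sorted(map(repr, rows)):
        h.update(row.encode())
    return h.hexdigest()

def ttl_overprovisioned(fingerprints, min_change_ratio=0.5):
    """Flag a view whose data changed in fewer than `min_change_ratio` of
    refreshes -- e.g. hourly recalculation of data that only moves every
    4 hours. The threshold is an illustrative assumption."""
    if len(fingerprints) < 2:
        return False
    changes = sum(1 for a, b in zip(fingerprints, fingerprints[1:]) if a != b)
    return changes / (len(fingerprints) - 1) < min_change_ratio

# Four hourly refreshes, data changed only once -> candidate for a longer TTL.
runs = [content_fingerprint(r)
        for r in ([(1, "a")], [(1, "a")], [(1, "a")], [(1, "b")])]
```

A flagged view is exactly the "4× unnecessary compute cost" case described above: the refresh ran, but the data hadn't moved.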
Decisions made during the design process, with rationale.
| Decision | Resolution | Rationale |
|---|---|---|
| Tier 2 cost guardrails | Resolved: environment parameter | Thresholds (max rows scanned, max compute units, timeout) are configuration values set per deployment environment, not hardcoded in the library. Each team sets limits appropriate to its platform and budget. |
| Auto-promotion threshold | Resolved: reported metric (Phase 2) | In Phase 1, T2 hit frequency is a reported metric only, surfaced in the monitoring dashboard for human review. In Phase 2, a configurable threshold can trigger automatic UC candidate creation. The frequency threshold is itself an environment parameter. |
| Agentic health monitor autonomy | Resolved: human-in-loop (Phase 1) | All Gold layer changes (promote, deprecate, TTL adjustment) require human approval in Phase 1 — an important governance gate. The monitor surfaces suggestions with full context; a human approves. Phase 2 may introduce rule-based auto-approval for low-risk actions. |
| Multi-tenancy | Resolved: per-model, optional per-tenant | The capability index is scoped per model (data domain). Per-tenant isolation of UCs is supported as an option for teams that need it but is not required. Most deployments share a single capability index per domain. |
| SQLMesh tooling | Resolved: pluggable, native integrations planned | SQLMesh is the preferred default: Python-native, state-aware incremental runs, no separate server. However, CogniMesh will support integration with platform-native materialization tools (Snowflake Dynamic Tables, Databricks Materialized Views, dbt) via an adapter interface. Teams can use whichever tool is already in their stack. |
| Embedded agent LLM | Resolved: pluggable + A/B evaluation | The embedded gateway agent is LLM-agnostic; any provider can be configured. A two-layer evaluation framework is planned — an LLM for routing and an LLM-as-judge for answer quality evaluation, similar to DeepEval's ConversationSimulator pattern. This enables A/B testing of routing models and continuous quality measurement. |
| UC conflict resolution | Resolved: suggest + human approval (Phase 1) | In Phase 1, when a new UC overlaps with an existing Gold view, CogniMesh surfaces a structured suggestion to the human operator: extend the existing view, create a new one, or merge UCs. The conflict and tradeoffs are expressed clearly in the suggestion; the human decides. Phase 2 will introduce threshold-based auto-resolution rules for common conflict patterns (e.g. adding a field to an existing view is auto-approved; a new partition key always requires human review). |
CogniMesh is designed to evolve incrementally. Phase 1 establishes the foundation with full human oversight. Later phases progressively automate decisions that have been validated as safe to automate.
Current state of every CogniMesh component — what's production-ready, what's a workaround, and what needs to be built.
| Component | Module | What It Does |
|---|---|---|
| UC Registry | registry.py | Full CRUD on use cases with change logging (before/after state). Every mutation recorded. |
| T0 Gold Serving | gateway.py | Serves pre-computed Gold views with 3 access patterns: individual lookup, bulk query, aggregation. |
| T3 Rejection | gateway.py | Structured rejection with explanation and list of available UCs. Never returns a bare 404. |
| Audit Log | audit.py | Every query logged with agent ID, UC, tier, latency, cost. Async (zero latency impact). Cost attribution per UC and agent. |
| Dependency Reporter | dependency.py | Impact analysis, provenance, full graph, what-if queries. Traces Silver → Gold → UC dependencies. |
| Refresh Manager | refresh_manager.py | Scheduled refresh (primary): periodic TTL check, rebuild stale views, return report. Real-time (optional): Postgres LISTEN/NOTIFY for immediate Silver change detection. Dependency-aware cascading in both modes. |
| Lineage Tracker | lineage.py | Column-level lineage registration and query. Attached to every T0 response. |
| Component | Current State | Target State | Priority |
|---|---|---|---|
| Gold Manager | Raw SQL: TRUNCATE + INSERT. Not atomic. No approval gate. | SQLMesh manages Gold derivation. Human approval before changes. Atomic refresh. | CRITICAL |
| Capability Index | Keyword matching (token overlap). No semantic understanding. | LLM-based semantic routing via pluggable Protocol. A/B testing of routing quality. | MEDIUM |
| T2 Query Composer | Template-based SQL composition. Single-table only. Has parameter binding bug. | LLM-based SQL composition with multi-table JOINs. Proper parameterization. | MEDIUM |
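The Gold Manager's target state ("atomic refresh") can be sketched with a staging-table swap: build the new data next to the old, then swap in a single transaction. SQLite is used here so the example runs standalone; the production backend is Postgres, where the same pattern applies (or a TRUNCATE + INSERT inside one transaction, since Postgres TRUNCATE is transactional). Function name and table layout are illustrative assumptions.

```python
import sqlite3

def atomic_refresh(conn, table, rows):
    """Sketch of the target behaviour: the shipped code TRUNCATEs before
    INSERTing, leaving a window where agents read an empty table. Here the
    live table is untouched until the final atomic swap."""
    staging = f"{table}__staging"
    conn.execute(f"DROP TABLE IF EXISTS {staging}")
    conn.execute(f"CREATE TABLE {staging} (id INTEGER, val TEXT)")
    conn.executemany(f"INSERT INTO {staging} VALUES (?, ?)", rows)
    # If the INSERT above fails, the live table is still intact.
    # Swap old and new in one transaction:
    conn.execute("BEGIN")
    conn.execute(f"ALTER TABLE {table} RENAME TO {table}__old")
    conn.execute(f"ALTER TABLE {staging} RENAME TO {table}")
    conn.execute(f"DROP TABLE {table}__old")
    conn.execute("COMMIT")

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
conn.execute("CREATE TABLE gold_view (id INTEGER, val TEXT)")
conn.execute("INSERT INTO gold_view VALUES (1, 'stale')")
atomic_refresh(conn, "gold_view", [(1, "fresh"), (2, "fresh")])
```

With SQLMesh in place, managed materialization replaces this hand-rolled swap, but the invariant is the same: readers never observe a partially refreshed view.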
| Component | What It Does | Priority | Depends On |
|---|---|---|---|
| SQLMesh Integration | Replace raw SQL Gold derivation with SQLMesh models. Managed materialization, built-in lineage, change detection, incremental refresh. | CRITICAL | — |
| Approval Queue | Human approval gate before any Gold change. Phase 1 invariant: "nothing changes in Gold without approval." Approval API + CLI. | CRITICAL | SQLMesh |
| Access Control | Agent identity enforcement. Per-UC permissions (allowed_agents). Row-level data isolation. Role-based UC management. | HIGH | — |
| MCP Server | Model Context Protocol server with 6 tools: query, discover, check_drift, refresh, impact_analysis, provenance. Implemented in cognimesh_core/mcp_server.py. Works alongside the REST API. | DONE | v0.1.0 |
| T1 Tier | Cross-Gold-view composition. When fields span multiple Gold views, compose them in memory (~50ms). | HIGH | Capability Index |
| LLM Routing Adapter | Pluggable LLM for semantic UC matching. Replace keyword matching. Protocol exists, need implementation for OpenAI/Anthropic/Ollama. | MEDIUM | — |
| Auto UC Discovery | Detect frequent T2 patterns in audit log. Generate UC candidates automatically. Human approves → promoted to Gold. | MEDIUM | Audit Log |
| Auto Consolidation | Detect Gold view overlap at UC registration. Suggest merging into shared views. Currently manual. | MEDIUM | Dependency Reporter |
| CLI (typer + rich) | Management interface: register UCs, approve changes, check status, view dependencies, trigger refresh. | MEDIUM | Approval Queue |
| OpenTelemetry | Replace custom audit log with OTel spans and metrics. Export to ClickHouse/Grafana. | LOW | — |
| Auto TTL Adjustment | Adjust UC freshness TTL based on actual data change frequency. Phase 2. | LOW | Refresh Manager |
| DeepEval Integration | LLM-as-judge evaluation of routing quality. A/B testing of LLM adapters. | LOW | LLM Routing |
| Multi-Engine Support | Pluggable serving backends. Silver on lakehouse (Iceberg/Delta), Gold on any serving DB (Postgres/StarRocks/ClickHouse). | LOW | SQLMesh |
| Phase | Components | Outcome |
|---|---|---|
| Phase 1A (now) | SQLMesh integration, approval queue, access control, fix T2 param bug, fix Gold atomicity | Real managed Gold layer. No changes without approval. Agent scoping enforced. |
| Phase 1B | T1 tier, CLI | Full tier coverage. Management tooling. MCP server implemented with 6 tools. |
| Phase 2 | LLM routing, Auto UC discovery, Auto consolidation, OpenTelemetry | Intelligent routing. Self-improving Gold layer. Production observability. |
| Phase 3 | Auto TTL, DeepEval, Multi-engine | Self-managing data platform. Multiple serving backends. |
| Bug | Location | Impact |
|---|---|---|
| T2 WHERE clause params not bound | query_composer.py: _compose_sql() concatenates where_clauses with %s placeholders but never substitutes where_params | Value filter queries produce SQL syntax errors. Affects T2 queries with value filters like "category electronics". |
| Gold refresh not atomic | gold_manager.py: TRUNCATE runs before INSERT. If the INSERT fails, the Gold table is left empty. | Brief window of data loss during refresh. Agents get empty results until the refresh completes or is retried. |
| SQL injection surface in gateway | gateway.py: field names from the params dict are used directly as column names in WHERE clauses via .format() | Low risk (params come from UC definitions, not raw user input) but should be parameterized. |
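The fix for the parameter-binding bug is to build the WHERE clause with placeholders and pass the values separately, letting the driver substitute them. A sketch, assuming nothing about the actual _compose_sql() shape — SQLite's `?` placeholder is shown so the example runs standalone; psycopg uses `%s`:

```python
import sqlite3

def compose_where(filters):
    """Illustrative fix: return the clause with placeholders plus the
    parameter values, never interpolating values into the SQL string.
    Column names cannot be bound as parameters, so in production they
    must be validated against the UC definition (the injection surface
    noted above)."""
    clauses, params = [], []
    for column, value in filters.items():
        clauses.append(f"{column} = ?")   # placeholder, never the raw value
        params.append(value)
    return " AND ".join(clauses), params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (category TEXT, units INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [("electronics", 10), ("toys", 3)])
where, params = compose_where({"category": "electronics"})
rows = conn.execute(f"SELECT units FROM products WHERE {where}", params).fetchall()
```

Binding values through the driver fixes the syntax errors and closes most of the injection surface in one change; the remaining work is allow-listing column names.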
CogniMesh is a data serving platform for AI agents. Incorrect data, broken lineage, or silent failures can cause agents to make wrong decisions. Testing is not optional — it's how we guarantee the platform does what it claims.
Every test category below runs on every build. No exceptions. If a test fails, the build fails. If a benchmark regresses, we investigate before merging.
| Category | What It Verifies | When It Runs | Failure Means |
|---|---|---|---|
| Component Tests | Each module works correctly against a real Postgres database. No mocks — mocks hide real SQL errors. Registry CRUD, lineage tracking, audit logging, Gold refresh, capability matching, dependency graph — all tested against the actual database. | Every commit | A module is broken. Fix before merge. |
| Scorecard Tests | The 12 system properties are present and working. Discovery, lineage, audit, cost attribution, freshness, fallback, schema drift isolation, access control, impact analysis, provenance, smart refresh, governance. Binary pass/fail — no partial credit. | Every commit | A claimed capability is broken. Release blocker — no exceptions. |
| Resilience Scenarios | The system handles adversity. Schema drift: Gold isolates agents. Unsupported UC: T2 composes or T3 explains. Staleness: flagged in response. Concurrent refresh: no corruption. | Every commit | The system fails under stress. Fix the resilience mechanism. |
| Contract Tests | API responses match expected schemas. QueryResult always has tier, data, lineage, freshness. Agents depend on these contracts — breaking them breaks agents. | Every commit | An API contract changed. Fix the code or version the API. |
| Performance Benchmarks | Latency, throughput, storage, and refresh time stay within bounds. T0 under 10ms. Storage ratio under 0.6. Consolidation ratio under 0.5 at 10+ UCs. | Weekly + before releases | Performance regressed. Investigate before merging. |
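A contract test in the spirit described above reduces to checking that every QueryResult carries its required keys. The response dicts here are illustrative; the real suite would call the live gateway:

```python
# Required keys are taken from the contract stated above: every
# QueryResult has tier, data, lineage, freshness.
REQUIRED_KEYS = {"tier", "data", "lineage", "freshness"}

def check_contract(response: dict) -> list:
    """Return the sorted list of missing keys (empty list -> contract holds)."""
    return sorted(REQUIRED_KEYS - response.keys())

# Illustrative responses -- field contents are assumptions.
good = {"tier": "T0",
        "data": [{"risk_score": 0.8}],
        "lineage": {"risk_score": "silver.enriched.ml_risk_v4"},
        "freshness": {"ttl": "4h"}}
bad = {"tier": "T0", "data": []}
```

In the actual suite this becomes a pytest assertion per endpoint, so a missing key fails the build before it can break a consuming agent.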
All tests run against a real Postgres instance (Docker Compose). No mocks, ever. Mocks hide real SQL errors, real connection issues, and real transaction behavior. If the test doesn't hit Postgres, it doesn't count.
pytest for all test categories. pytest-benchmark for performance measurement with statistical analysis (p50, p95, p99). Fixtures manage database state, API clients, and component initialization.
GitHub Actions runs the full test suite on every pull request. Component + contract + scorecard + resilience on every commit. Performance benchmarks weekly or on-demand. No merge without green tests.
Test data is generated with a fixed seed (faker + random with seed=42). 10K customers, 500 products, 200K orders. Same data every run. Results are reproducible across machines and CI environments.
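The seeded-generation idea can be sketched with the standard library alone — the real suite uses faker seeded the same way; plain `random` keeps this example dependency-free, and the record shape is illustrative:

```python
import random

def generate_customers(n, seed=42):
    """Deterministic synthetic data: the same seed yields the same rows
    on every run and every machine, so test results are reproducible."""
    rng = random.Random(seed)  # local RNG: no global-state interference
    return [
        {"customer_id": i,
         "risk_score": round(rng.random(), 4),
         "segment": rng.choice(["bronze", "silver", "gold"])}
        for i in range(n)
    ]

run_a = generate_customers(100)
run_b = generate_customers(100)
```

Using a local `random.Random(seed)` rather than the module-level functions keeps fixtures independent: no other test can perturb the sequence.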
These are the bounds that must hold. If a change causes a metric to exceed its threshold, the build fails and we investigate before merging.
| Metric | Threshold | Current Value | Why This Threshold |
|---|---|---|---|
| T0 latency (p95) | < 10 ms | ~5 ms | Agents need sub-10ms for real-time interaction. Governance overhead must stay bounded. |
| T2 latency (p95) | < 500 ms | ~100-300 ms | Silver fallback must still feel responsive. Guardrails enforce a 5-second hard ceiling. |
| Storage ratio | < 0.6 (CogniMesh / REST) | 0.49 | Gold consolidation must reduce storage. If ratio exceeds 0.6, consolidation logic needs review. |
| Consolidation ratio | < 0.5 at 10+ UCs | 0.35 at 20 UCs | Gold views should consolidate as UCs grow. Ratio above 0.5 means views aren't being shared. |
| Scorecard | 12 / 12 | 12 / 12 | Every claimed capability must work. No exceptions. Dropping below 12 is a release blocker. |
| Resilience scenarios | All pass | All pass | Schema drift, unsupported UC, and staleness scenarios must always be handled gracefully. |
As CogniMesh evolves from Phase 1A through Phase 3, the test suite grows with it. Each new component adds its own tests AND must not break existing ones.
| When This Changes | These Tests Must Still Pass | These Tests Are Added |
|---|---|---|
| SQLMesh replaces raw SQL | All scorecard, resilience, and performance tests. Query results must be identical. | SQLMesh model validation tests. Incremental refresh tests. Lineage auto-derivation tests. |
| MCP server implemented | All scorecard and contract tests continue to pass via REST. MCP server provides an additional transport with 6 tools. | MCP protocol compliance tests. MCP discovery tests. Transport-agnostic contract tests. |
| LLM routing replaces keyword matching | All scorecard tests. T0 must still serve registered UCs. T2/T3 must still work. | Routing accuracy tests (DeepEval). Latency tests for LLM overhead. Fallback tests when LLM is unavailable. |
| Access control added | All existing tests (run as authorized agent). Scorecard unchanged. | Permission enforcement tests. Denied access tests. Agent scoping tests. Role-based tests. |
| New UCs registered | All existing UC tests. Latency stays under threshold. Consolidation ratio stays under 0.5. | Tests for the new UC. Regression tests comparing before/after latency for existing UCs. |