Today, no production system relies on AI agents doing self-directed data discovery. When agents need to consume structured data, engineering teams build the access layer manually — dedicated REST APIs over a pre-defined Gold layer, or a GraphQL schema that agents query against. These are proven, performant approaches. They work well at small scale.
The problem is what happens as the number of agents and use cases grows. Every new agent question that isn't covered by an existing endpoint requires a new engineering cycle: schema design, API implementation, documentation, versioning, deployment. The data team becomes a bottleneck. Gold tables proliferate without a coherent lifecycle. Stale endpoints accumulate. Freshness policies are set once and never revisited.
CogniMesh is a new approach — the first framework designed specifically for agentic data consumption. Instead of manually engineering each access path, teams register use cases declaratively. CogniMesh derives the Gold layer automatically, exposes it via a REST API, observes usage patterns, and continuously consolidates and optimizes the serving layer without human intervention. The data platform learns from its own usage.
- **Linear engineering cost.** Each new agent use case (UC) requires a new API endpoint or GraphQL resolver. As agent count grows from 2 to 20, the engineering cost grows linearly. Teams can't keep up.
- **Static Gold layer.** Pre-defined Gold tables are designed for known queries. When consumption patterns shift, the Gold layer doesn't follow. Expensive joins reappear. Stale views accumulate.
- **No usage signal.** Current solutions expose data but don't track who asked what, which query patterns are expensive, or which Gold views are never used. No signal to optimize from.
- **No lineage.** Dedicated APIs return values with no lineage. Agents can answer questions but cannot explain where a value came from, which model produced it, or how fresh it is.
Three approaches exist for giving AI agents access to structured data. The table below evaluates them across the dimensions that matter most for agentic systems at scale.
| Dimension | Dedicated REST API (hand-built endpoints over Gold) | GraphQL (schema over Gold/Silver) | CogniMesh (UC-derived, self-improving) |
|---|---|---|---|
| Query performance | **Excellent:** hand-optimized, single call, pre-joined Gold | **Good:** flexible queries, but can over-fetch without careful resolver design | **Excellent at T0:** pre-joined, partitioned per access pattern. Equivalent to a dedicated API after materialization. |
| Query cost | **Low:** pre-built Gold views, minimal compute at query time | **Medium:** risk of full-table resolvers if queries are not constrained | **Low + controlled:** T0 = zero joins. T2 Silver fallback has hard guardrails. Cost visibility built in. |
| Token cost (LLM context consumed) | **Minimal:** agent calls a known endpoint. No discovery needed. | **Low:** agent constructs a query against a known schema. Schema must be in context. | **Minimal:** capability index is small (~200 tokens). Embedded agent handles routing internally. |
| Initial investment | **High:** design + implement + document each endpoint. Gold tables must be designed upfront. | **Medium:** schema design + resolver implementation. Self-describing schema reduces documentation burden. | **Medium:** setup of CogniMesh + UC registry + SQLMesh integration. Upfront tooling investment. |
| Maintainability per new agent UC | **High cost:** every new UC = new endpoint. Engineering cycle required. Data team is the bottleneck. | **Low–medium:** new field in schema → new resolver. Less than REST but still manual per change. | **Near zero:** register the UC declaratively. SQLMesh derives the Gold view. No endpoint to build. |
| Scales with agent count | **Poorly:** linear engineering cost per new agent type. Breaks down beyond ~5 agents. | **Reasonably:** same schema shared across agents. New agents reuse existing resolvers. | **Well:** new agents discover UCs via the REST API. Gold views shared and reused across agents automatically. |
| Unsupported UCs | **Fails:** no endpoint → agent cannot answer. Manual engineering required before the agent can proceed. | **Partial:** agent can compose a query from existing schema fields. Quality depends on resolver design. | **Tiered fallback:** T1 composition → T2 Silver fallback with guardrails → T3 reject with ETA. Never a hard failure. |
| Cost observability | **Addable:** not included by default. Can be added with OpenTelemetry + a log store. Requires separate engineering effort. | **Addable:** query logs exist. Agent-layer correlation requires custom instrumentation on top. | **Built in:** agent × UC × tier × cost tracked out of the box. Heatmaps, promote/deprecate signals, freshness compliance. No extra tooling. |
| Explainability | **Addable:** not included by default. dbt lineage or custom column tracking can be added. Requires deliberate engineering investment. | **Addable:** dbt or SQLMesh lineage can be bolted on. Not inherent to GraphQL itself. | **Built in:** column lineage registered at materialization time as part of the SQLMesh step. No additional tooling required. |
| Gold lifecycle mgmt | **Manual:** views accumulate. Stale tables remain. No signal to deprecate. | **Partial:** schema versioning exists. Unused resolvers still require manual cleanup. | **Automated:** promote hot patterns. Deprecate zero-access views. Gold evolves from actual usage. |
The hand-built REST approach and CogniMesh both start from the same Silver data and serve the same questions. The difference is not in the answer — it is in everything around the answer. Below, we take identical Bronze/Silver data, define three use cases, and walk through what each approach actually requires, from the first line of code to a production-ready system.
- **UC 1:** "What is the current health status of customer X?" (individual lookup, 4h freshness)
- **UC 2:** "What are the best-selling products in category Y?" (bulk query, 24h freshness)
- **UC 3:** "Which customers are at risk of churning?" (bulk query, 4h freshness)
REST total: 3-5 developer-days for the endpoints alone. No lineage. No audit trail. No freshness tracking.
CogniMesh total: 3-5 hours including setup. Monitoring, lineage, audit trail, freshness: all included.
| Dimension | UC = 1 | UC = 3 | UC = 10 |
|---|---|---|---|
| T0 Query Latency | REST ~2-5ms faster | REST | REST |
| Setup Simplicity | REST (fewer parts) | Tie | CogniMesh |
| Compute Footprint | REST (thinner) | REST | Comparable |
| Discovery | CogniMesh | CogniMesh | CogniMesh |
| Unsupported Query Handling | CogniMesh | CogniMesh | CogniMesh |
| Schema Drift Tolerance | CogniMesh | CogniMesh | CogniMesh |
| Lineage | CogniMesh | CogniMesh | CogniMesh |
| Observability | CogniMesh | CogniMesh | CogniMesh |
| Audit Trail | CogniMesh | CogniMesh | CogniMesh |
| Freshness Management | CogniMesh | CogniMesh | CogniMesh |
| Change Governance | CogniMesh | CogniMesh | CogniMesh |
| Cost Attribution | CogniMesh | CogniMesh | CogniMesh |
| Marginal UC Cost | — | CogniMesh (15 min vs 8 hrs) | CogniMesh |
| Gold Consolidation | — | CogniMesh | CogniMesh |
At UC = 1, REST wins on 3 dimensions (all about being leaner/faster for the narrow case). CogniMesh wins on 11 dimensions (all about being a better system). The gap only widens.
| UC Count | REST (cumulative) | CogniMesh (cumulative) | Delta |
|---|---|---|---|
| 0 (setup only) | 0 h | 3 h | REST ahead by 3 h |
| 1 | 8 h | 3.5 h | CogniMesh ahead by 4.5 h |
| 2 | 16 h | 4 h | CogniMesh ahead by 12 h |
| 3 | 24 h | 4.5 h | CogniMesh ahead by 19.5 h |
| 10 | 80 h | 7 h | CogniMesh ahead by 73 h |
The crossover on developer hours happens at UC = 1. CogniMesh's one-time setup cost (3 hours) is less than building a single REST endpoint with its Gold table, tests, and docs (8 hours). There is no 'wait until UC = 5 for CogniMesh to pay off.' It pays off immediately.
| Capability | REST at UC = 1 | CogniMesh at UC = 1 | Cost to add to REST |
|---|---|---|---|
| Query serving | Yes | Yes | — |
| Agent discovery | No | Yes | 2-4 hours |
| Unsupported query handling | No | Yes | Architectural change |
| Lineage per response | No | Yes | 1-2 days |
| Freshness monitoring | No | Yes | 1-2 days |
| Per-UC cost attribution | No | Yes | 1-2 days |
| Audit trail | No | Yes | 1 day |
| Schema drift isolation | No | Yes | Cannot retrofit |
| Change approval workflow | No | Yes | Cannot retrofit |
Bringing REST to parity with CogniMesh's day-one capabilities requires 7-12 additional developer-days on top of the endpoint work. Most teams never do it — endpoints run without monitoring, without lineage, without freshness tracking, and with no graceful handling of unexpected questions.
CogniMesh sits as a layer between consuming agents and the underlying data platform. It exposes a REST API upward (FastAPI) and speaks to any medallion-compatible data store downward.
CogniMesh works on top of any medallion-style data platform. It does not depend on a specific storage format, compute engine, or cloud provider.
| Layer | Content | Who reads it | Designed by |
|---|---|---|---|
| Bronze | Raw ingested data, immutable | Transformation jobs only | Data engineers |
| Silver | Cleaned, normalized, feature-enriched data | SQLMesh · Tier 2 fallback queries | Data engineers |
| Gold | Pre-joined, pre-aggregated, UC-optimized serving views | CogniMesh gateway only — never accessed directly by agents | SQLMesh, derived from registered UCs, not hand-designed |
The authoring unit in CogniMesh is a question, not a table. A Use Case (UC) defines what question needs answering, what data fields are required, which agent consumes it, and how fresh the answer must be.
Team registers known use cases manually as structured records. SQL Mesh derives Gold views from them. Low maintenance — UCs only change when business logic changes.
Gateway observes repeated field combination patterns in Tier 1/2 fallbacks. When frequency exceeds threshold, the pattern becomes a UC candidate and enters the registry pending review.
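A declarative UC registration might look like the following sketch. The field names and the in-memory registry here are illustrative assumptions, not the actual CogniMesh registry schema; they reflect the UC properties described above (question, required fields, consuming agent, freshness).

```python
from dataclasses import dataclass

# Illustrative sketch only -- field names are assumptions, not the
# actual CogniMesh registry schema.
@dataclass(frozen=True)
class UseCase:
    uc_id: str
    question: str             # the business question this UC answers
    required_fields: tuple    # Silver fields the answer needs
    consumer_agent: str       # which agent asks this question
    access_pattern: str       # "individual" or "bulk"
    freshness_ttl: str        # how stale the answer may be, e.g. "4h"

class UCRegistry:
    """Minimal in-memory stand-in for the declarative UC registry."""
    def __init__(self):
        self._ucs = {}
        self.change_log = []  # every mutation recorded (before/after state)

    def register(self, uc: UseCase):
        before = self._ucs.get(uc.uc_id)
        self._ucs[uc.uc_id] = uc
        self.change_log.append(("register", uc.uc_id, before, uc))

    def get(self, uc_id: str) -> UseCase:
        return self._ucs[uc_id]

registry = UCRegistry()
registry.register(UseCase(
    uc_id="customer_health",
    question="What is the current health status of customer X?",
    required_fields=("customer_id", "risk_score", "outstanding_balance"),
    consumer_agent="support-agent",
    access_pattern="individual",
    freshness_ttl="4h",
))
```

The point of the sketch: the team writes down the question and its requirements, nothing else — no endpoint, no table design.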
For each registered UC, the SQLMesh layer runs four steps in sequence before creating anything new in Gold:

1. **Reuse check.** Does an existing Gold view already serve this UC fully or partially? Extend before creating. Prevents Gold table sprawl and duplicate compute.
2. **SQL derivation.** An LLM derives the optimal SELECT + JOIN + WHERE from the source schema and the UC's field requirements. Output is a declarative, version-controlled SQL model file.
3. **Partitioning.** The access pattern determines the partition key. Individual lookups partition by entity key; bulk queries partition by segment or category. Ensures zero full scans at query time.
4. **Lineage registration.** Every output column is traced to its source column and originating event, and stored in the lineage tracker. Powers explainability at query time with zero extra cost.
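The reuse check (step 1 above) is essentially a field-coverage test. A minimal sketch, assuming Gold views are described by the set of columns they serve — the view names and decision labels are illustrative, not CogniMesh's actual API:

```python
def reuse_check(required_fields, gold_views):
    """Step 1 sketch: prefer extending an existing Gold view over creating
    a new one. `gold_views` maps view name -> set of columns it serves.
    Decision labels ("reuse"/"extend"/"create") are illustrative."""
    required = set(required_fields)
    best, best_overlap = None, 0
    for name, cols in gold_views.items():
        if required <= cols:
            return ("reuse", name)       # full coverage: no new view needed
        overlap = len(required & cols)
        if overlap > best_overlap:
            best, best_overlap = name, overlap
    if best_overlap > 0:
        return ("extend", best)          # partial coverage: add columns
    return ("create", None)              # nothing overlaps: derive a new view

views = {
    "gold.customer_360": {"customer_id", "risk_score", "value_segment"},
    "gold.product_sales": {"product_id", "category", "units_sold"},
}
```

Extending a partially matching view instead of creating a new one is what keeps the consolidation ratio down as UCs accumulate.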
The gateway exposes a REST API (FastAPI). Consuming agents discover capabilities via GET /discover and query data via POST /query — they do not hardcode column names or join keys. Any agent on any LLM framework that can make HTTP calls can consume data through CogniMesh without schema knowledge.
REST is the right interface because: any HTTP client works (Python, JavaScript, curl, LangChain, CrewAI, custom agents), inputs and outputs are typed (Pydantic models), the agent gets structured data back (not a string to parse), and new capabilities auto-appear in the discovery endpoint when a UC is materialized.
The REST API (GET /discover, POST /query) works with any HTTP client. The MCP server (cognimesh_core/mcp_server.py) provides 6 tools — query, discover, check_drift, refresh, impact_analysis, provenance — for agent frameworks that speak MCP natively. Both interfaces share the same governance layer.
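A GET /discover response might look like the following sketch. The exact response schema is an assumption based on the capabilities described above, not the documented contract — the key property is what's absent: no table names, no column names, no join keys.

```python
import json

# Illustrative /discover payload -- the concrete keys are assumptions,
# not the documented CogniMesh response schema.
discover_response = json.dumps({
    "capabilities": [
        {
            "uc_id": "customer_health",
            "question": "What is the current health status of customer X?",
            "access_pattern": "individual",
            "freshness_ttl": "4h",
            "parameters": ["customer_id"],
        },
        {
            "uc_id": "top_products",
            "question": "What are the best-selling products in category Y?",
            "access_pattern": "bulk",
            "freshness_ttl": "24h",
            "parameters": ["category"],
        },
    ]
})

caps = json.loads(discover_response)["capabilities"]
# The agent sees only questions and parameters -- never schema internals.
questions = [c["question"] for c in caps]
```

An agent puts only this small payload in context, which is why the token cost stays minimal regardless of how large the underlying schema is.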
The consuming agent sends a natural language question via POST /query. The embedded gateway agent resolves it in three steps; the consuming agent never sees table names or column names.
When a question doesn't match an existing UC, the gateway does not reject. It falls through a cost-aware tier hierarchy, serves the best available answer, and auto-registers the pattern for future materialization.
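The tier cascade can be sketched as follows. The function shape, names, and thresholds are illustrative assumptions, not the gateway's actual API — but the routing order matches the hierarchy described above:

```python
def route(question_fields, gold_views, silver_cost_estimate, guardrail_max_cost):
    """Sketch of the cost-aware tier cascade (names are assumptions).
    T0: a single Gold view covers every required field.
    T1: fields span several Gold views -> compose in memory.
    T2: fall back to Silver, but only within cost guardrails.
    T3: structured rejection with an ETA -- never a bare failure."""
    required = set(question_fields)
    full = [name for name, cols in gold_views.items() if required <= cols]
    if full:
        return {"tier": "T0", "views": full[:1]}
    union = set().union(*gold_views.values()) if gold_views else set()
    if required <= union:
        partial = [n for n, c in gold_views.items() if required & c]
        return {"tier": "T1", "views": partial}
    if silver_cost_estimate <= guardrail_max_cost:
        # Served from Silver; the pattern is also logged as a UC candidate.
        return {"tier": "T2", "views": ["silver"]}
    return {"tier": "T3", "eta": "after next materialization cycle"}

views = {
    "gold.customer_health": {"customer_id", "risk_score"},
    "gold.product_sales": {"category", "units_sold"},
}
```

Note that T2 and T3 are where the self-improvement signal comes from: every fallback is a recorded candidate for future materialization.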
The combined signal feeds two consumers: a manager dashboard for human oversight, and an agentic health monitor that autonomously triggers consolidation, TTL adjustments, and deprecation recommendations.
Every answer CogniMesh returns is traceable to its source. Explainability is built in at construction time — not computed at query time.
Each Gold view has a machine-readable model card stored alongside its SQL model in version control.
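A hedged sketch of what such a model card might contain — the concrete keys are assumptions derived from the lineage table in this section, not a documented CogniMesh schema:

```python
# Illustrative model card -- keys are assumptions based on the lineage
# table below, not a documented CogniMesh schema.
model_card = {
    "gold_view": "gold.customer_health",
    "serves_ucs": ["customer_health"],
    "derived_from": ["silver.enriched", "silver.transactions"],
    "freshness_ttl": "4h",
    "partition_key": "customer_id",
    "columns": {
        "risk_score": {
            "source": "silver.enriched.ml_risk_v4",
            "transformation": "passthrough -- ML model output",
            "model_version": "risk-model-v4.2",
        },
        "outstanding_balance": {
            "source": "silver.transactions.amount",
            "transformation": "SUM where status=UNPAID",
            "model_version": "v3.1.0",
        },
    },
}
```

Because the card lives next to the SQL model in version control, any change to the view and its documentation are reviewed together.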
Every Gold column traces back to its Silver source, transformation logic, and model version:
| Gold column | Silver source | Transformation | Model version |
|---|---|---|---|
risk_score | silver.enriched.ml_risk_v4 | passthrough — ML model output | risk-model-v4.2 |
outstanding_balance | silver.transactions.amount | SUM where status=UNPAID | v3.1.0 |
latency_p90 | silver.events.response_ms | PERCENTILE(90) OVER 7d window | v2.0.1 |
value_segment | silver.enriched.ml_ltv_segment | passthrough — ML model output | ltv-model-v2.0 |
Freshness is a first-class property of each UC. It defines how often the Gold view is recalculated — and is both a cost driver and a data quality signal that must be monitored and continuously evaluated.
| Access pattern | Typical TTL | Rationale | Review trigger |
|---|---|---|---|
| Real-time signals (events, latency) | 15m – 1h | Source updates continuously | If signal variance < N% between runs → relax TTL |
| ML model scores | 1h – 4h | Tied to model inference cadence | If model not retrained → no benefit in recalculating |
| Aggregated segments | 4h – 24h | Segments are slow-moving | If zero queries during off-hours → pause overnight |
| Historical / reference | Daily | Source is append-only | Incremental refresh only — no full recalculation |
The observability layer monitors both freshness compliance (was the view recalculated within TTL?) and freshness necessity (did the data actually change between recalculations?). Over-aggressive TTLs are flagged automatically — a view recalculated hourly that only changes meaningfully every 4 hours is a 4× unnecessary compute cost.
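One way to detect freshness over-provisioning is to fingerprint the materialized rows at each refresh and compare consecutive runs. A minimal sketch — the hashing mechanism and the change-ratio threshold are assumptions, not CogniMesh's actual implementation:

```python
import hashlib

def content_fingerprint(rows):
    """Hash the materialized rows so successive refreshes can be compared."""
    h = hashlib.sha256()
    for row in sorted(map(repr, rows)):
        h.update(row.encode())
    return h.hexdigest()

def ttl_overprovisioned(fingerprints, min_change_ratio=0.5):
    """Flag a view whose data changed in fewer than `min_change_ratio` of
    refreshes -- e.g. hourly recalculation of data that only moves every
    4 hours. The threshold is an illustrative assumption."""
    if len(fingerprints) < 2:
        return False
    changes = sum(1 for a, b in zip(fingerprints, fingerprints[1:]) if a != b)
    return changes / (len(fingerprints) - 1) < min_change_ratio

# Four hourly refreshes, data changed only once -> candidate for a longer TTL.
runs = [content_fingerprint(r)
        for r in ([(1, "a")], [(1, "a")], [(1, "a")], [(1, "b")])]
```

A flagged view is exactly the "4× unnecessary compute cost" case described above: the refresh ran, but the data hadn't moved.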
Decisions made during the design process, with rationale.
| Decision | Resolution | Rationale |
|---|---|---|
| Tier 2 cost guardrails | Resolved: environment parameter | Thresholds (max rows scanned, max compute units, timeout) are configuration values set per deployment environment, not hardcoded in the library. Each team sets limits appropriate to its platform and budget. |
| Auto-promotion threshold | Resolved: reported metric (Phase 2) | In Phase 1, T2 hit frequency is a reported metric only, surfaced in the monitoring dashboard for human review. In Phase 2, a configurable threshold can trigger automatic UC candidate creation. The frequency threshold is itself an environment parameter. |
| Agentic health monitor autonomy | Resolved: human-in-loop (Phase 1) | All Gold layer changes (promote, deprecate, TTL adjustment) require human approval in Phase 1 — an important governance gate. The monitor surfaces suggestions with full context; a human approves. Phase 2 may introduce rule-based auto-approval for low-risk actions. |
| Multi-tenancy | Resolved: per-model, optional per-tenant | The capability index is scoped per model (data domain). Per-tenant isolation of UCs is supported as an option for teams that need it but is not required. Most deployments share a single capability index per domain. |
| SQLMesh tooling | Resolved: pluggable, native integrations planned | SQLMesh is the preferred default: Python-native, state-aware incremental runs, no separate server. However, CogniMesh will support integration with platform-native materialization tools (Snowflake Dynamic Tables, Databricks Materialized Views, dbt) via an adapter interface. Teams can use whichever tool is already in their stack. |
| Embedded agent LLM | Resolved: pluggable + A/B evaluation | The embedded gateway agent is LLM-agnostic; any provider can be configured. A two-layer evaluation framework is planned — an LLM for routing and an LLM-as-judge for answer quality evaluation, similar to DeepEval's ConversationSimulator pattern. This enables A/B testing of routing models and continuous quality measurement. |
| UC conflict resolution | Resolved: suggest + human approval (Phase 1) | In Phase 1, when a new UC overlaps with an existing Gold view, CogniMesh surfaces a structured suggestion to the human operator: extend the existing view, create a new one, or merge UCs. The conflict and tradeoffs are expressed clearly in the suggestion; the human decides. Phase 2 will introduce threshold-based auto-resolution rules for common conflict patterns (e.g. adding a field to an existing view is auto-approved; a new partition key always requires human review). |
CogniMesh is designed to evolve incrementally. Phase 1 establishes the foundation with full human oversight. Later phases progressively automate decisions that have been validated as safe to automate.
Current state of every CogniMesh component — what's production-ready, what's a workaround, and what needs to be built.
| Component | Module | What It Does |
|---|---|---|
| UC Registry | registry.py | Full CRUD on use cases with change logging (before/after state). Every mutation recorded. |
| T0 Gold Serving | gateway.py | Serves pre-computed Gold views with 3 access patterns: individual lookup, bulk query, aggregation. |
| T3 Rejection | gateway.py | Structured rejection with explanation and list of available UCs. Never returns a bare 404. |
| Audit Log | audit.py | Every query logged with agent ID, UC, tier, latency, cost. Async (zero latency impact). Cost attribution per UC and agent. |
| Dependency Reporter | dependency.py | Impact analysis, provenance, full graph, what-if queries. Traces Silver → Gold → UC dependencies. |
| Refresh Manager | refresh_manager.py | Scheduled refresh (primary): periodic TTL check, rebuild stale views, return report. Real-time (optional): Postgres LISTEN/NOTIFY for immediate Silver change detection. Dependency-aware cascading in both modes. |
| Lineage Tracker | lineage.py | Column-level lineage registration and query. Attached to every T0 response. |
| Component | Current State | Target State | Priority |
|---|---|---|---|
| Gold Manager | Raw SQL: TRUNCATE + INSERT. Not atomic. No approval gate. | SQLMesh manages Gold derivation. Human approval before changes. Atomic refresh. | CRITICAL |
| Capability Index | Keyword matching (token overlap). No semantic understanding. | LLM-based semantic routing via pluggable Protocol. A/B testing of routing quality. | MEDIUM |
| T2 Query Composer | Template-based SQL composition. Single-table only. Has parameter binding bug. | LLM-based SQL composition with multi-table JOINs. Proper parameterization. | MEDIUM |
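The Gold Manager's target state ("atomic refresh") can be sketched with a staging-table swap: build the new data next to the old, then swap in a single transaction. SQLite is used here so the example runs standalone; the production backend is Postgres, where the same pattern applies (or a TRUNCATE + INSERT inside one transaction, since Postgres TRUNCATE is transactional). Function name and table layout are illustrative assumptions.

```python
import sqlite3

def atomic_refresh(conn, table, rows):
    """Sketch of the target behaviour: the shipped code TRUNCATEs before
    INSERTing, leaving a window where agents read an empty table. Here the
    live table is untouched until the final atomic swap."""
    staging = f"{table}__staging"
    conn.execute(f"DROP TABLE IF EXISTS {staging}")
    conn.execute(f"CREATE TABLE {staging} (id INTEGER, val TEXT)")
    conn.executemany(f"INSERT INTO {staging} VALUES (?, ?)", rows)
    # If the INSERT above fails, the live table is still intact.
    # Swap old and new in one transaction:
    conn.execute("BEGIN")
    conn.execute(f"ALTER TABLE {table} RENAME TO {table}__old")
    conn.execute(f"ALTER TABLE {staging} RENAME TO {table}")
    conn.execute(f"DROP TABLE {table}__old")
    conn.execute("COMMIT")

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
conn.execute("CREATE TABLE gold_view (id INTEGER, val TEXT)")
conn.execute("INSERT INTO gold_view VALUES (1, 'stale')")
atomic_refresh(conn, "gold_view", [(1, "fresh"), (2, "fresh")])
```

With SQLMesh in place, managed materialization replaces this hand-rolled swap, but the invariant is the same: readers never observe a partially refreshed view.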
| Component | What It Does | Priority | Depends On |
|---|---|---|---|
| SQLMesh Integration | Replace raw SQL Gold derivation with SQLMesh models. Managed materialization, built-in lineage, change detection, incremental refresh. | CRITICAL | — |
| Approval Queue | Human approval gate before any Gold change. Phase 1 invariant: "nothing changes in Gold without approval." Approval API + CLI. | CRITICAL | SQLMesh |
| Access Control | Agent identity enforcement. Per-UC permissions (allowed_agents). Row-level data isolation. Role-based UC management. | HIGH | — |
| MCP Server | Model Context Protocol server with 6 tools: query, discover, check_drift, refresh, impact_analysis, provenance. Implemented in cognimesh_core/mcp_server.py. Works alongside the REST API. | DONE | v0.1.0 |
| T1 Tier | Cross-Gold-view composition. When fields span multiple Gold views, compose them in memory (~50ms). | HIGH | Capability Index |
| LLM Routing Adapter | Pluggable LLM for semantic UC matching. Replace keyword matching. Protocol exists, need implementation for OpenAI/Anthropic/Ollama. | MEDIUM | — |
| Auto UC Discovery | Detect frequent T2 patterns in audit log. Generate UC candidates automatically. Human approves → promoted to Gold. | MEDIUM | Audit Log |
| Auto Consolidation | Detect Gold view overlap at UC registration. Suggest merging into shared views. Currently manual. | MEDIUM | Dependency Reporter |
| CLI (typer + rich) | Management interface: register UCs, approve changes, check status, view dependencies, trigger refresh. | MEDIUM | Approval Queue |
| OpenTelemetry | Replace custom audit log with OTel spans and metrics. Export to ClickHouse/Grafana. | LOW | — |
| Auto TTL Adjustment | Adjust UC freshness TTL based on actual data change frequency. Phase 2. | LOW | Refresh Manager |
| DeepEval Integration | LLM-as-judge evaluation of routing quality. A/B testing of LLM adapters. | LOW | LLM Routing |
| Multi-Engine Support | Pluggable serving backends. Silver on lakehouse (Iceberg/Delta), Gold on any serving DB (Postgres/StarRocks/ClickHouse). | LOW | SQLMesh |
| Phase | Components | Outcome |
|---|---|---|
| Phase 1A (now) | SQLMesh integration, approval queue, access control, fix T2 param bug, fix Gold atomicity | Real managed Gold layer. No changes without approval. Agent scoping enforced. |
| Phase 1B | T1 tier, CLI | Full tier coverage. Management tooling. MCP server implemented with 6 tools. |
| Phase 2 | LLM routing, Auto UC discovery, Auto consolidation, OpenTelemetry | Intelligent routing. Self-improving Gold layer. Production observability. |
| Phase 3 | Auto TTL, DeepEval, Multi-engine | Self-managing data platform. Multiple serving backends. |
| Bug | Location | Impact |
|---|---|---|
| T2 WHERE clause params not bound | query_composer.py: _compose_sql() concatenates where_clauses with %s placeholders but never substitutes where_params | Value filter queries produce SQL syntax errors. Affects T2 queries with value filters like "category electronics". |
| Gold refresh not atomic | gold_manager.py: TRUNCATE runs before INSERT. If the INSERT fails, the Gold table is left empty. | Brief window of data loss during refresh. Agents get empty results until the refresh completes or is retried. |
| SQL injection surface in gateway | gateway.py: field names from the params dict are used directly as column names in WHERE clauses via .format() | Low risk (params come from UC definitions, not raw user input) but should be parameterized. |
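The fix for the parameter-binding bug is to build the WHERE clause with placeholders and pass the values separately, letting the driver substitute them. A sketch, assuming nothing about the actual _compose_sql() shape — SQLite's `?` placeholder is shown so the example runs standalone; psycopg uses `%s`:

```python
import sqlite3

def compose_where(filters):
    """Illustrative fix: return the clause with placeholders plus the
    parameter values, never interpolating values into the SQL string.
    Column names cannot be bound as parameters, so in production they
    must be validated against the UC definition (the injection surface
    noted above)."""
    clauses, params = [], []
    for column, value in filters.items():
        clauses.append(f"{column} = ?")   # placeholder, never the raw value
        params.append(value)
    return " AND ".join(clauses), params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (category TEXT, units INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [("electronics", 10), ("toys", 3)])
where, params = compose_where({"category": "electronics"})
rows = conn.execute(f"SELECT units FROM products WHERE {where}", params).fetchall()
```

Binding values through the driver fixes the syntax errors and closes most of the injection surface in one change; the remaining work is allow-listing column names.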
CogniMesh is a data serving platform for AI agents. Incorrect data, broken lineage, or silent failures can cause agents to make wrong decisions. Testing is not optional — it's how we guarantee the platform does what it claims.
Every test category below runs on every build. No exceptions. If a test fails, the build fails. If a benchmark regresses, we investigate before merging.
| Category | What It Verifies | When It Runs | Failure Means |
|---|---|---|---|
| Component Tests | Each module works correctly against a real Postgres database. No mocks — mocks hide real SQL errors. Registry CRUD, lineage tracking, audit logging, Gold refresh, capability matching, dependency graph — all tested against the actual database. | Every commit | A module is broken. Fix before merge. |
| Scorecard Tests | The 12 system properties are present and working. Discovery, lineage, audit, cost attribution, freshness, fallback, schema drift isolation, access control, impact analysis, provenance, smart refresh, governance. Binary pass/fail — no partial credit. | Every commit | A claimed capability is broken. Release blocker — no exceptions. |
| Resilience Scenarios | The system handles adversity. Schema drift: Gold isolates agents. Unsupported UC: T2 composes or T3 explains. Staleness: flagged in response. Concurrent refresh: no corruption. | Every commit | The system fails under stress. Fix the resilience mechanism. |
| Contract Tests | API responses match expected schemas. QueryResult always has tier, data, lineage, freshness. Agents depend on these contracts — breaking them breaks agents. | Every commit | An API contract changed. Fix the code or version the API. |
| Performance Benchmarks | Latency, throughput, storage, and refresh time stay within bounds. T0 under 10ms. Storage ratio under 0.6. Consolidation ratio under 0.5 at 10+ UCs. | Weekly + before releases | Performance regressed. Investigate before merging. |
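A contract test in the spirit described above reduces to checking that every QueryResult carries its required keys. The response dicts here are illustrative; the real suite would call the live gateway:

```python
# Required keys are taken from the contract stated above: every
# QueryResult has tier, data, lineage, freshness.
REQUIRED_KEYS = {"tier", "data", "lineage", "freshness"}

def check_contract(response: dict) -> list:
    """Return the sorted list of missing keys (empty list -> contract holds)."""
    return sorted(REQUIRED_KEYS - response.keys())

# Illustrative responses -- field contents are assumptions.
good = {"tier": "T0",
        "data": [{"risk_score": 0.8}],
        "lineage": {"risk_score": "silver.enriched.ml_risk_v4"},
        "freshness": {"ttl": "4h"}}
bad = {"tier": "T0", "data": []}
```

In the actual suite this becomes a pytest assertion per endpoint, so a missing key fails the build before it can break a consuming agent.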
All tests run against a real Postgres instance (Docker Compose). No mocks, ever. Mocks hide real SQL errors, real connection issues, and real transaction behavior. If the test doesn't hit Postgres, it doesn't count.
pytest for all test categories. pytest-benchmark for performance measurement with statistical analysis (p50, p95, p99). Fixtures manage database state, API clients, and component initialization.
GitHub Actions runs the full test suite on every pull request. Component + contract + scorecard + resilience on every commit. Performance benchmarks weekly or on-demand. No merge without green tests.
Test data is generated with a fixed seed (faker + random with seed=42). 10K customers, 500 products, 200K orders. Same data every run. Results are reproducible across machines and CI environments.
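The seeded-generation idea can be sketched with the standard library alone — the real suite uses faker seeded the same way; plain `random` keeps this example dependency-free, and the record shape is illustrative:

```python
import random

def generate_customers(n, seed=42):
    """Deterministic synthetic data: the same seed yields the same rows
    on every run and every machine, so test results are reproducible."""
    rng = random.Random(seed)  # local RNG: no global-state interference
    return [
        {"customer_id": i,
         "risk_score": round(rng.random(), 4),
         "segment": rng.choice(["bronze", "silver", "gold"])}
        for i in range(n)
    ]

run_a = generate_customers(100)
run_b = generate_customers(100)
```

Using a local `random.Random(seed)` rather than the module-level functions keeps fixtures independent: no other test can perturb the sequence.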
These are the bounds that must hold. If a change causes a metric to exceed its threshold, the build fails and we investigate before merging.
| Metric | Threshold | Current Value | Why This Threshold |
|---|---|---|---|
| T0 latency (p95) | < 10 ms | ~5 ms | Agents need sub-10ms for real-time interaction. Governance overhead must stay bounded. |
| T2 latency (p95) | < 500 ms | ~100-300 ms | Silver fallback must still feel responsive. Guardrails enforce a 5-second hard ceiling. |
| Storage ratio | < 0.6 (CogniMesh / REST) | 0.49 | Gold consolidation must reduce storage. If ratio exceeds 0.6, consolidation logic needs review. |
| Consolidation ratio | < 0.5 at 10+ UCs | 0.35 at 20 UCs | Gold views should consolidate as UCs grow. Ratio above 0.5 means views aren't being shared. |
| Scorecard | 12 / 12 | 12 / 12 | Every claimed capability must work. No exceptions. Dropping below 12 is a release blocker. |
| Resilience scenarios | All pass | All pass | Schema drift, unsupported UC, and staleness scenarios must always be handled gracefully. |
As CogniMesh evolves from Phase 1A through Phase 3, the test suite grows with it. Each new component adds its own tests AND must not break existing ones.
| When This Changes | These Tests Must Still Pass | These Tests Are Added |
|---|---|---|
| SQLMesh replaces raw SQL | All scorecard, resilience, and performance tests. Query results must be identical. | SQLMesh model validation tests. Incremental refresh tests. Lineage auto-derivation tests. |
| MCP server implemented | All scorecard and contract tests continue to pass via REST. MCP server provides an additional transport with 6 tools. | MCP protocol compliance tests. MCP discovery tests. Transport-agnostic contract tests. |
| LLM routing replaces keyword matching | All scorecard tests. T0 must still serve registered UCs. T2/T3 must still work. | Routing accuracy tests (DeepEval). Latency tests for LLM overhead. Fallback tests when LLM is unavailable. |
| Access control added | All existing tests (run as authorized agent). Scorecard unchanged. | Permission enforcement tests. Denied access tests. Agent scoping tests. Role-based tests. |
| New UCs registered | All existing UC tests. Latency stays under threshold. Consolidation ratio stays under 0.5. | Tests for the new UC. Regression tests comparing before/after latency for existing UCs. |