
Meet the Data Guardrail Agent: Code Review's Missing Half

The newest agent in the Data Workers swarm reviews what a change does to your data — and ends every review with a verdict that can act, not a comment that can't.

By The Data Workers Team

There is a failure mode every data team knows and almost no review process catches: the pipeline runs green, the tests pass, the PR looks clean — and the data is quietly wrong. A type cast that rounds differently. A join that silently drops 3% of rows. A refactor whose description says 'no behavior change' while the mean of a revenue column drifts 4%. Code review examines what the change *says*. Nothing examines what it *does* — to the actual rows, in the actual warehouse.

The Data Guardrail Agent — dw-review, internally — is the newest member of the Data Workers swarm, and it exists to close exactly that gap. Point it at a change — a PR diff, a list of tables, or a pair of dbt manifests — and it runs the same SQL against your base and current environments, diffs the results column by column, buckets the risk, and returns a verdict your other agents can act on.

Cheap evidence first

Data diffing has a reputation for being expensive, and done naively, it is — a row-level comparison across two environments of a wide table is real warehouse money. The agent's answer is a progressive review: it spends nothing before it spends something, and it stops descending the moment the verdict is clear.

[Figure: The progressive review funnel: metadata first, then row counts, then column profiles, then row-level proof; query cost rises at each stage. Caption: the review descends only as far as the verdict requires.]

Metadata comes first because it is free: lineage and schema diffs need no warehouse query at all. Row counts come next — one COUNT(*) per side catches the loud failures. Column profiles (null rates, distinct counts, mins, maxes, means, medians) catch distribution drift. Only when the evidence demands it does the review reach for row-level proof: a primary-key join that reports exactly which rows changed, appeared, or vanished — and even then, follow-ups are capped. A review is an investigation with a budget, not a fishing trip.
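The descent described above can be sketched in a few functions. This is a minimal illustration, not the agent's implementation: it uses an in-memory SQLite database as a stand-in for a warehouse, skips the metadata stage, and the names `row_count`, `profile`, `pk_diff`, and `progressive_review` are hypothetical. The stopping rule (stop when counts and profiles agree) is an assumption made for the example.

```python
import sqlite3

def row_count(conn, table):
    # Row-count stage: one COUNT(*) per side.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

def profile(conn, table, column):
    # Profile stage: null rate, distinct count, min, max, and mean of one column.
    sql = (
        f"SELECT AVG({column} IS NULL), COUNT(DISTINCT {column}), "
        f"MIN({column}), MAX({column}), AVG({column}) FROM {table}"
    )
    keys = ("null_rate", "distinct", "min", "max", "mean")
    return dict(zip(keys, conn.execute(sql).fetchone()))

def pk_diff(conn, base, current, pk, column):
    # Row-level stage: primary-key join listing changed, added, and removed keys.
    def keys(sql):
        return [row[0] for row in conn.execute(sql)]
    return {
        "changed": keys(
            f"SELECT b.{pk} FROM {base} b JOIN {current} c ON b.{pk} = c.{pk} "
            f"WHERE b.{column} IS NOT c.{column} ORDER BY b.{pk}"),
        "added": keys(
            f"SELECT c.{pk} FROM {current} c LEFT JOIN {base} b ON b.{pk} = c.{pk} "
            f"WHERE b.{pk} IS NULL ORDER BY c.{pk}"),
        "removed": keys(
            f"SELECT b.{pk} FROM {base} b LEFT JOIN {current} c ON b.{pk} = c.{pk} "
            f"WHERE c.{pk} IS NULL ORDER BY b.{pk}"),
    }

def progressive_review(conn, base, current, pk, column):
    evidence = {"row_counts": (row_count(conn, base), row_count(conn, current))}
    evidence["profiles"] = (profile(conn, base, column), profile(conn, current, column))
    counts_match = evidence["row_counts"][0] == evidence["row_counts"][1]
    profiles_match = evidence["profiles"][0] == evidence["profiles"][1]
    # Assumed stopping rule: skip the expensive join when the cheap stages agree.
    if counts_match and profiles_match:
        return evidence
    evidence["row_diff"] = pk_diff(conn, base, current, pk, column)
    return evidence

# Demo data: same row count on both sides, but one value changed and one key swapped.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE base_orders (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE cur_orders  (id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO base_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    INSERT INTO cur_orders  VALUES (1, 10.0), (2, 25.0), (4, 40.0);
""")
result = progressive_review(conn, "base_orders", "cur_orders", "id", "amount")
print(result["row_diff"])
```

In the demo the row counts match (three rows each), so the count stage alone would miss the change; the profile stage sees the mean move, and only then does the sketch pay for the primary-key join.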

A verdict that can act

Most data tooling stops at *surfacing* a problem: a dashboard, an alert, a PR comment. We think a review that ends in prose is a dashboard with extra steps. Every verdict the agent issues carries a machine-actionable handoff plan (structured tool calls aimed at the right specialist agents in the swarm), and the plan is simply empty when the change is safe.

[Figure: Anatomy of a Data Review Summary: impact, root cause, intent check, and a HIGH risk bucket on the left; structured handoffs to dw-schema, dw-incidents, and dw-governance on the right. Caption: the verdict on the left, the handoffs it generates on the right.]

  • A breaking schema change on a consumed column produces a handoff to the schema agent's migration generator — with the exact change object it needs to draft a reversible migration plan.
  • A confirmed regression — most rows changed, mean shifted — produces a handoff to the incident agent with the anomaly signals already extracted.
  • An intent mismatch — the PR says 'no behavior change,' the data disagrees — routes to the governance agent so a named human signs off on the discrepancy.
  • A clean change produces an empty handoff plan and a LOW verdict. No theater.
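To make the routing in the list above concrete, here is a rough sketch of a handoff plan as data. The agent names come from the figure; the tool names (`generate_migration`, `open_incident`, `request_signoff`), the `Handoff` type, and the shape of the `findings` dictionary are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    agent: str                      # target agent in the swarm
    tool: str                       # hypothetical tool name on that agent
    arguments: dict = field(default_factory=dict)

def build_handoffs(findings: dict) -> list:
    """Turn review findings into structured tool calls; empty when nothing is wrong."""
    plan = []
    change = findings.get("breaking_schema_change")
    if change:
        # The schema agent receives the exact change object.
        plan.append(Handoff("dw-schema", "generate_migration", {"change": change}))
    signals = findings.get("regression_signals")
    if signals:
        # The incident agent receives the anomaly signals already extracted.
        plan.append(Handoff("dw-incidents", "open_incident", {"signals": signals}))
    mismatch = findings.get("intent_mismatch")
    if mismatch:
        # The governance agent routes the discrepancy to a named human.
        plan.append(Handoff("dw-governance", "request_signoff", {"reason": mismatch}))
    return plan

clean_plan = build_handoffs({})
risky_plan = build_handoffs({
    "breaking_schema_change": {"table": "orders", "column": "amount", "kind": "type_change"},
    "intent_mismatch": "PR says 'no behavior change' but the mean of amount moved",
})
print([h.agent for h in risky_plan])
```

The point of the structure is that a downstream agent can consume the plan without parsing prose, and that a safe change yields an empty list rather than a reassuring paragraph.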

The risk buckets themselves are deliberately boring: HIGH for breaking schema changes, majority-row changes with a mean shift, or intent mismatches; MEDIUM for double-digit percentage drift in row counts or values; LOW when every delta is small and the structure held. One rule we hold to everywhere: missing evidence widens uncertainty; it never shrinks risk. A table with no primary key doesn't get a pass because the expensive check couldn't run; it gets a wider error bar and a sharper question.
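The bucketing rules can be written down as a small function. This is a sketch under stated assumptions, not the agent's actual logic: the exact thresholds (more than 50% of rows for "majority", 10% for "double-digit", a 1% tolerance for what counts as a mean shift) and the choice to floor the bucket at MEDIUM when a measurement is missing are guesses made for illustration, and `risk_bucket` is a hypothetical name.

```python
def risk_bucket(*, breaking_schema_change=False, intent_mismatch=False,
                changed_row_fraction=None, mean_shift=None, row_count_drift=None,
                mean_tolerance=0.01):
    """Return (bucket, unverified). A measurement of None means 'could not be checked'."""
    measurements = {
        "changed_row_fraction": changed_row_fraction,
        "mean_shift": mean_shift,
        "row_count_drift": row_count_drift,
    }
    unverified = sorted(name for name, value in measurements.items() if value is None)

    shift = abs(mean_shift) if mean_shift is not None else None
    drift = abs(row_count_drift) if row_count_drift is not None else None
    majority_with_shift = (
        changed_row_fraction is not None and changed_row_fraction > 0.5
        and shift is not None and shift > mean_tolerance
    )

    if breaking_schema_change or intent_mismatch or majority_with_shift:
        bucket = "HIGH"
    elif (drift is not None and drift >= 0.10) or (shift is not None and shift >= 0.10):
        bucket = "MEDIUM"
    elif unverified:
        # Assumption of this sketch: missing evidence is never treated as zero,
        # so an incomplete review cannot come back LOW.
        bucket = "MEDIUM"
    else:
        bucket = "LOW"
    return bucket, unverified

print(risk_bucket(changed_row_fraction=0.8, mean_shift=0.04, row_count_drift=0.0))
print(risk_bucket(changed_row_fraction=None, mean_shift=0.0, row_count_drift=0.0))
```

Returning the list of unverified measurements alongside the bucket keeps the "wider error bar" visible to whoever reads the verdict, instead of folding it silently into the label.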

Fourteen tools, four warehouses, no dbt required

The Data Guardrail Agent ships as a standard MCP agent: fourteen tools behind one stdio server, so it works anywhere the rest of the swarm works — Claude Code, Cursor, your CI runner, or another agent's tool call.

[Figure: The fourteen Data Guardrail Agent tools grouped by function: review_pr orchestration, free metadata diffs, base-vs-current data diffs, and re-runnable checklists. Caption: one hero tool, thirteen sharp primitives.]

Everything runs on plain SQL and INFORMATION_SCHEMA against Snowflake, BigQuery, PostgreSQL, and Databricks. If you have dbt manifests, they enrich the review — modified-model detection gets sharper, primary keys get inferred from your tests — but they never gate it. Plenty of valuable data lives in warehouses no dbt project has ever touched, and it deserves review too.
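As an illustration of what a metadata-only diff might look like, the sketch below compares two column listings of the kind a query against `information_schema.columns` (selecting `column_name` and `data_type`) would return for the base and current versions of a table. The function name `schema_diff` and the output shape are hypothetical; how the agent actually represents schema changes is not specified here.

```python
def schema_diff(base_cols, current_cols):
    """Compare two lists of (column_name, data_type) pairs."""
    base, cur = dict(base_cols), dict(current_cols)
    shared = sorted(base.keys() & cur.keys())
    return {
        "added": sorted(cur.keys() - base.keys()),
        "removed": sorted(base.keys() - cur.keys()),
        # Columns present on both sides whose declared type differs.
        "retyped": {c: (base[c], cur[c]) for c in shared if base[c] != cur[c]},
    }

# Example rows as they might come back from information_schema.columns.
base_cols = [("id", "integer"), ("amount", "numeric"), ("region", "text")]
current_cols = [("id", "integer"), ("amount", "double precision"), ("country", "text")]

diff = schema_diff(base_cols, current_cols)
print(diff)
```

A removed or retyped column is exactly the kind of finding that needs no data scan at all, which is why it sits at the top of the funnel.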

Findings worth keeping become checks: named, re-runnable validations attached to the review session. The third time you catch the same regression on the same table, you stop catching it manually — the check runs on every review of that table from then on.
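One way to picture a saved check is as a named query with an expected result, stored per table and re-run on demand. The sketch below is an assumption-laden illustration: `Check`, `CheckRegistry`, and their fields are hypothetical, and SQLite stands in for the warehouse.

```python
import sqlite3
from dataclasses import dataclass

@dataclass(frozen=True)
class Check:
    name: str
    table: str
    sql: str          # query expected to return a single value
    expected: float

class CheckRegistry:
    def __init__(self):
        self._checks = {}  # table name -> {check name -> Check}

    def save(self, check):
        self._checks.setdefault(check.table, {})[check.name] = check

    def run_for(self, conn, table):
        # Re-run every check saved for this table; True means the check passed.
        results = {}
        for name, check in self._checks.get(table, {}).items():
            actual = conn.execute(check.sql).fetchone()[0]
            results[name] = (actual == check.expected)
        return results

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO orders VALUES (1, 10.0), (2, NULL);
""")
registry = CheckRegistry()
registry.save(Check("no_null_amounts", "orders",
                    "SELECT COUNT(*) FROM orders WHERE amount IS NULL", 0))

first_run = registry.run_for(conn, "orders")    # the NULL amount is still present
conn.execute("UPDATE orders SET amount = 20.0 WHERE id = 2")
second_run = registry.run_for(conn, "orders")   # same check, re-run after the fix
print(first_run, second_run)
```

Keying checks by table is what lets a finding from one review apply automatically to the next review that touches the same table.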

Honest by construction

A review tool you can't trust is worse than no review tool, so the agent is built to refuse the easy lie. Every summary is stamped with the warehouse mode it ran against — including a visible warning when it ran on the bundled demo fixture instead of a real connection. A row count that couldn't be measured is reported as null with a note, never as a fabricated zero. A duplicate primary key fails loudly, with the probe SQL included so you can see exactly what was checked. And every verdict cites its evidence: which diffs ran, what they found, what couldn't be verified.
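Two of these behaviors are easy to show in miniature: reporting an unmeasurable row count as null with a note, and failing loudly on a duplicate primary key with the probe SQL attached. The following is a sketch only, with SQLite standing in for the warehouse; `measured_row_count`, `assert_unique_pk`, and `DuplicatePrimaryKey` are hypothetical names, not the agent's API.

```python
import sqlite3

def measured_row_count(conn, table):
    # Report None plus a note when the count cannot be measured; never a made-up zero.
    try:
        n = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        return {"row_count": n, "note": None}
    except sqlite3.Error as exc:
        return {"row_count": None, "note": f"could not measure row count: {exc}"}

class DuplicatePrimaryKey(Exception):
    pass

def assert_unique_pk(conn, table, pk):
    # The probe SQL is included in the error so the reader can see what was checked.
    probe_sql = f"SELECT {pk}, COUNT(*) FROM {table} GROUP BY {pk} HAVING COUNT(*) > 1"
    dupes = conn.execute(probe_sql).fetchall()
    if dupes:
        raise DuplicatePrimaryKey(
            f"{len(dupes)} duplicated key value(s) in {table}; probe: {probe_sql}"
        )

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER, payload TEXT);
    INSERT INTO events VALUES (1, 'a'), (1, 'b'), (2, 'c');
""")

missing = measured_row_count(conn, "no_such_table")
try:
    assert_unique_pk(conn, "events", "id")
    pk_error = None
except DuplicatePrimaryKey as exc:
    pk_error = str(exc)
print(missing["row_count"], pk_error)
```

The design choice worth noting is that both failure paths return or raise something a reader can audit, rather than a default value that looks like a measurement.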

Try it in five minutes

The agent ships with a seeded demo fixture, so the first run needs zero credentials: clone the swarm, start the agent, and call review_pr on the bundled scenario — you'll get a full Data Review Summary, risk bucket, handoffs and all, honestly stamped as the demo it is. Pointing it at a real warehouse is one environment variable and an inline environment mapping. From there, the CI recipe in the repo wires the same review into every pull request, with the summary posted as a comment and the verdict available as a gate.

The Data Guardrail Agent joins the swarm alongside the schema, quality, incident, lineage, and governance agents it hands off to. If your pipeline can pass while your data fails, you don't have a testing problem — you have a review problem. Now there's an agent for it.
