Meet the Data Guardrail Agent: Code Review's Missing Half
The newest agent in the Data Workers swarm reviews what a change does to your data — and ends every review with a verdict that can act, not a comment that can't.
By The Data Workers Team
There is a failure mode every data team knows and almost no review process catches: the pipeline runs green, the tests pass, the PR looks clean — and the data is quietly wrong. A type cast that rounds differently. A join that silently drops 3% of rows. A refactor whose description says 'no behavior change' while the mean of a revenue column drifts 4%. Code review examines what the change *says*. Nothing examines what it *does* — to the actual rows, in the actual warehouse.
The Data Guardrail Agent — dw-review, internally — is the newest member of the Data Workers swarm, and it exists to close exactly that gap. Point it at a change — a PR diff, a list of tables, or a pair of dbt manifests — and it runs the same SQL against your base and current environments, diffs the results column by column, buckets the risk, and returns a verdict your other agents can act on.
Cheap evidence first
Data diffing has a reputation for being expensive, and done naively, it is — a row-level comparison across two environments of a wide table is real warehouse money. The agent's answer is a progressive review: it spends nothing before it spends something, and it stops descending the moment the verdict is clear.
Metadata comes first because it is free: lineage and schema diffs need no warehouse query at all. Row counts come next — one COUNT(*) per side catches the loud failures. Column profiles (null rates, distinct counts, mins, maxes, means, medians) catch distribution drift. Only when the evidence demands it does the review reach for row-level proof: a primary-key join that reports exactly which rows changed, appeared, or vanished — and even then, follow-ups are capped. A review is an investigation with a budget, not a fishing trip.
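To make the ladder concrete, here is a minimal sketch of the escalation logic, using an in-memory SQLite database as a stand-in for a warehouse. The function names, the `cap` parameter, and the stopping rule (stop when counts and profiles both match) are our assumptions for illustration, not the agent's actual interface.

```python
import sqlite3

def row_count(conn, table):
    # Identifiers are interpolated directly: acceptable in a sketch, not with untrusted input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

def column_profile(conn, table, column):
    row = conn.execute(
        f"SELECT COUNT(*) - COUNT({column}), COUNT(DISTINCT {column}), "
        f"MIN({column}), MAX({column}), AVG({column}) FROM {table}"
    ).fetchone()
    return dict(zip(("nulls", "distinct", "min", "max", "mean"), row))

def row_diff(conn, base, current, key, column, cap):
    # Primary-key joins; each list of keys is capped so the follow-up stays bounded.
    def keys(sql):
        return [r[0] for r in conn.execute(f"{sql} ORDER BY 1 LIMIT {int(cap)}")]
    return {
        "changed": keys(f"SELECT b.{key} FROM {base} b JOIN {current} c "
                        f"ON b.{key} = c.{key} WHERE b.{column} IS NOT c.{column}"),
        "added": keys(f"SELECT c.{key} FROM {current} c LEFT JOIN {base} b "
                      f"ON b.{key} = c.{key} WHERE b.{key} IS NULL"),
        "removed": keys(f"SELECT b.{key} FROM {base} b LEFT JOIN {current} c "
                        f"ON b.{key} = c.{key} WHERE c.{key} IS NULL"),
    }

def progressive_review(conn, base, current, key, column, cap=100):
    evidence = {"row_counts": (row_count(conn, base), row_count(conn, current))}
    evidence["profiles"] = (column_profile(conn, base, column),
                            column_profile(conn, current, column))
    counts_match = evidence["row_counts"][0] == evidence["row_counts"][1]
    profiles_match = evidence["profiles"][0] == evidence["profiles"][1]
    if counts_match and profiles_match:
        # Cheap evidence is clean: stop before the expensive step.
        evidence["stopped_at"] = "profiles"
        return evidence
    evidence["row_diff"] = row_diff(conn, base, current, key, column, cap)
    evidence["stopped_at"] = "row_diff"
    return evidence

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE base_orders (id INTEGER, revenue REAL);
    CREATE TABLE cur_orders (id INTEGER, revenue REAL);
    INSERT INTO base_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    INSERT INTO cur_orders VALUES (1, 10.0), (2, 25.0), (4, 30.0);
""")
result = progressive_review(conn, "base_orders", "cur_orders", "id", "revenue")
print(result["stopped_at"], result["row_diff"])
```

In this toy case the row counts match (three rows on each side), so only the column profile reveals the drift and triggers the primary-key join. Matching profiles do not prove the rows are identical; the sketch simply treats them as enough evidence to stop.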
A verdict that can act
Most data tooling stops at *surfacing* a problem: a dashboard, an alert, a PR comment. We think a review that ends in prose is a dashboard with extra steps. So every verdict the agent issues carries a machine-actionable handoff plan (structured tool calls aimed at the right specialist agents in the swarm), and the plan is simply empty when the change is safe.
- A breaking schema change on a consumed column produces a handoff to the schema agent's migration generator, with the exact change object it needs to draft a reversible migration plan.
- A confirmed regression (most rows changed, mean shifted) produces a handoff to the incident agent with the anomaly signals already extracted.
- An intent mismatch (the PR says 'no behavior change,' the data disagrees) routes to the governance agent so a named human signs off on the discrepancy.
- A clean change produces an empty handoff plan and a LOW verdict. No theater.
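As a rough illustration of what "machine-actionable" could mean here, the sketch below maps findings to structured handoffs. Every name in it (the `Handoff` fields, the tool names such as `generate_migration`, the keys of the findings dictionary) is hypothetical and chosen for readability; the real handoff schema may look different.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Handoff:
    agent: str        # which specialist agent should act (hypothetical names)
    tool: str         # which of its tools to call (hypothetical names)
    arguments: dict   # the payload that tool needs

def plan_handoffs(findings):
    """Turn review findings into tool calls; an empty list means nothing to do."""
    plan = []
    change = findings.get("breaking_schema_change")
    if change:
        plan.append(Handoff("schema", "generate_migration", {"change": change}))
    signals = findings.get("regression_signals")
    if signals:
        plan.append(Handoff("incident", "open_incident", {"signals": signals}))
    mismatch = findings.get("intent_mismatch")
    if mismatch:
        plan.append(Handoff("governance", "request_signoff", {"discrepancy": mismatch}))
    return plan

breaking = {"breaking_schema_change": {"table": "orders", "column": "revenue", "change": "dropped"}}
print(json.dumps([asdict(h) for h in plan_handoffs(breaking)], indent=2))
print(plan_handoffs({}))  # a clean change yields an empty plan
```

The point of the shape is that a downstream agent can consume the plan without parsing prose: each entry names a target, a tool, and its arguments.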
The risk buckets themselves are deliberately boring: HIGH for breaking schema changes, majority-row changes with a mean shift, or intent mismatches; MEDIUM for double-digit percentage drift in row counts or values; LOW when every delta is small and the structure held. One rule we hold to everywhere: missing evidence widens uncertainty; it never shrinks risk. A table with no primary key doesn't get a pass because the expensive check couldn't run; it gets a wider error bar and a sharper question.
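The bucketing rules are simple enough to sketch in a few lines. The version below is an illustration under assumptions: the parameter names are ours, "double-digit" is read as an absolute drift of 10% or more, and any nonzero mean shift counts as "shifted". Unmeasured inputs are passed as `None` and reported back rather than treated as zero.

```python
def risk_bucket(breaking_schema, intent_mismatch,
                changed_fraction, mean_shift_pct, row_count_drift_pct):
    """Return a risk bucket plus the list of inputs that could not be measured."""
    measurements = (
        ("changed_fraction", changed_fraction),
        ("mean_shift_pct", mean_shift_pct),
        ("row_count_drift_pct", row_count_drift_pct),
    )
    # Missing evidence is recorded, never silently read as "no change".
    unverified = [name for name, value in measurements if value is None]

    majority_changed = changed_fraction is not None and changed_fraction > 0.5
    mean_shifted = mean_shift_pct is not None and mean_shift_pct != 0
    double_digit = any(
        value is not None and abs(value) >= 10
        for value in (mean_shift_pct, row_count_drift_pct)
    )

    if breaking_schema or intent_mismatch or (majority_changed and mean_shifted):
        risk = "HIGH"
    elif double_digit:
        risk = "MEDIUM"
    else:
        risk = "LOW"
    return {"risk": risk, "unverified": unverified}

print(risk_bucket(False, False, 0.8, 4.0, 0.0))
print(risk_bucket(False, False, None, 12.0, 0.0))
```

The second call models a table with no primary key: the row-level fraction is unknown, so it shows up under `unverified` alongside the bucket instead of quietly lowering it.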
Fourteen tools, four warehouses, no dbt required
The Data Guardrail Agent ships as a standard MCP agent: fourteen tools behind one stdio server, so it works anywhere the rest of the swarm works — Claude Code, Cursor, your CI runner, or another agent's tool call.
Everything runs on plain SQL and INFORMATION_SCHEMA against Snowflake, BigQuery, PostgreSQL, and Databricks. If you have dbt manifests, they enrich the review — modified-model detection gets sharper, primary keys get inferred from your tests — but they never gate it. Plenty of valuable data lives in warehouses no dbt project has ever touched, and it deserves review too.
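For a sense of how far plain metadata gets you, here is a small sketch of a schema diff over two column-to-type mappings, the kind of result you could read from `information_schema.columns` (its `column_name` and `data_type` columns) on each side. The function name and the output shape are our own for illustration; fetching the mappings is left out because connection details vary by warehouse.

```python
def schema_diff(base, current):
    """Compare two {column_name: data_type} mappings from base and current."""
    shared = base.keys() & current.keys()
    return {
        "added": sorted(current.keys() - base.keys()),
        "removed": sorted(base.keys() - current.keys()),
        # Columns present on both sides whose declared type changed.
        "retyped": {c: (base[c], current[c]) for c in sorted(shared) if base[c] != current[c]},
    }

base_columns = {"id": "integer", "revenue": "numeric", "region": "text"}
current_columns = {"id": "integer", "revenue": "double precision", "country": "text"}
diff = schema_diff(base_columns, current_columns)
print(diff)
```

A removed or retyped column that downstream models consume is the kind of finding that would count as a breaking schema change.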
Findings worth keeping become checks: named, re-runnable validations attached to the review session. The third time you catch the same regression on the same table, you stop catching it manually — the check runs on every review of that table from then on.
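One way to picture a check is as a named query that should return zero rows, kept on the session and replayed whenever its table is reviewed. The sketch below assumes exactly that; the `Check` and `ReviewSession` classes and the zero-rows convention are hypothetical, and SQLite stands in for the warehouse.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class Check:
    name: str
    table: str
    sql: str  # assumed convention: the check passes when this returns no rows

@dataclass
class ReviewSession:
    checks: list = field(default_factory=list)

    def keep(self, check):
        self.checks.append(check)

    def run_for(self, conn, table):
        # Re-run every saved check attached to this table; True means passed.
        return {
            c.name: len(conn.execute(c.sql).fetchall()) == 0
            for c in self.checks
            if c.table == table
        }

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, revenue REAL);
    INSERT INTO orders VALUES (1, 10.0), (2, -5.0);
""")
session = ReviewSession()
session.keep(Check("no_negative_revenue", "orders",
                   "SELECT id FROM orders WHERE revenue < 0"))
results = session.run_for(conn, "orders")
print(results)
```

Here the saved check fails because one order has negative revenue, which is the regression it was written to catch.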
Honest by construction
A review tool you can't trust is worse than no review tool, so the agent is built to refuse the easy lie. Every summary is stamped with the warehouse mode it ran against — including a visible warning when it ran on the bundled demo fixture instead of a real connection. A row count that couldn't be measured is reported as null with a note, never as a fabricated zero. A duplicate primary key fails loudly, with the probe SQL included so you can see exactly what was checked. And every verdict cites its evidence: which diffs ran, what they found, what couldn't be verified.
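Two of these rules are easy to show in miniature. The sketch below, again using SQLite as a stand-in, reports an unmeasurable row count as `None` with a note, and raises on duplicate primary keys with the probe SQL in the error message. The function and exception names are hypothetical; only the behavior mirrors what is described above.

```python
import sqlite3

class DuplicatePrimaryKey(Exception):
    pass

def measured_row_count(conn, table):
    # Identifiers are interpolated directly: acceptable in a sketch, not with untrusted input.
    try:
        count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        return {"row_count": count, "note": None}
    except sqlite3.Error as exc:
        # Unknown stays unknown: never substitute a zero.
        return {"row_count": None, "note": f"could not measure: {exc}"}

def assert_unique_key(conn, table, key):
    probe = f"SELECT {key}, COUNT(*) FROM {table} GROUP BY {key} HAVING COUNT(*) > 1"
    duplicates = conn.execute(probe).fetchall()
    if duplicates:
        raise DuplicatePrimaryKey(
            f"{len(duplicates)} duplicated key value(s) in {table}; probe SQL: {probe}"
        )

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, revenue REAL);
    INSERT INTO orders VALUES (1, 10.0), (1, 12.0), (2, 20.0);
""")
print(measured_row_count(conn, "orders"))
print(measured_row_count(conn, "missing_table"))
try:
    assert_unique_key(conn, "orders", "id")
except DuplicatePrimaryKey as exc:
    print(exc)
```

Including the probe in the error lets a reader rerun the exact query instead of trusting the summary.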
Try it in five minutes
The agent ships with a seeded demo fixture, so the first run needs zero credentials: clone the swarm, start the agent, and call review_pr on the bundled scenario — you'll get a full Data Review Summary, risk bucket, handoffs and all, honestly stamped as the demo it is. Pointing it at a real warehouse is one environment variable and an inline environment mapping. From there, the CI recipe in the repo wires the same review into every pull request, with the summary posted as a comment and the verdict available as a gate.
The Data Guardrail Agent joins the swarm alongside the schema, quality, incident, lineage, and governance agents it hands off to. If your pipeline can pass while your data fails, you don't have a testing problem — you have a review problem. Now there's an agent for it.