Guide · 15 min read

AI for Data Infra: The Complete 2026 Guide to Agents for Data Engineering

Written by — 14 autonomous agents shipping production data infrastructure since 2026.

Technically reviewed by the Data Workers engineering team.

AI for Data Infra is the engineering practice of running autonomous agents inside the data platform — pipelines, catalogs, quality, governance, incidents — so agents do the work a platform team used to do by hand. It replaces chat-with-your-data toys with production agent swarms built on Claude Code, MCP, and the modern lakehouse.

If you have tried LLMs for data and been burned by hallucinated SQL, broken pipelines, or agents that cannot see your catalog, this guide is the playbook for the next generation. It covers the 4-layer engineering system, the MCP protocol, Claude Code as the runtime, integrations with Snowflake and Databricks and dbt, compliance and governance, evaluation, and how Data Workers implements every piece of it as open source.

By the end you will know how to architect an AI-for-data-infra stack that ships to production without breaking it, how to evaluate vendor claims, and how to map the 14-agent Data Workers swarm onto your existing data platform. The 2,500-word walkthrough below is the canonical reference — every link points to a deeper dive in this resource set.

What is AI for Data Infra?

AI for Data Infra is the discipline of building and operating autonomous agents that manage the data platform itself: pipelines, warehouses, catalogs, quality checks, cost controls, migrations, and incident response. The category is distinct from chat-with-your-data BI bots because the agents act on infrastructure, not on dashboards — they open pull requests, run migrations, page humans, and roll back failed deploys.

The emergence of this category in 2025 and 2026 is driven by three converging forces. First, LLMs crossed the reliability threshold for code and SQL generation. Second, the Model Context Protocol standardized how agents access data systems. Third, Claude Code and similar coding agents gave platforms a durable runtime that remembers projects across sessions. Together they made agents for data infra not just possible but deployable.

The shorthand definition: AI for data infra means replacing tickets with agents. Every handoff that used to be a Jira card — add a column, backfill a dim table, investigate a freshness alert, approve a schema change — becomes a prompt to an agent that has the context, tools, and guardrails to do it safely. The agents do not replace the data team; they replace the queue the data team used to work through.

Why Early Chat-With-Your-Data Agents Failed

The first generation of data agents — Text2SQL demos, natural-language BI, chat-over-your-warehouse — shipped in 2023 and 2024 and almost all of them underperformed their demos. The failure modes were predictable once you looked at the architecture: they treated the warehouse as a flat schema, fed the LLM raw table names, and hoped it would guess the business meaning. It could not.

Three specific gaps killed the first generation. The context gap: agents had no persistent memory of how the business used its data. The tribal knowledge gap: the rules that made a query correct (which user_id to exclude, which revenue column is net of refunds, which date field is the source of truth) lived in Slack threads and senior engineers' heads, never in the schema. The canonical tables gap: most warehouses have three revenue tables and the correct one depends on the question — the LLM picked the first match and got it wrong.

  • No persistent project memory across sessions — every question started from zero
  • Flat schema prompts that ignored metrics definitions and business logic
  • No access to lineage, ownership, or the catalog's semantic layer
  • Text-to-SQL evaluated on Spider, not on dirty real-world warehouses
  • No write path — agents could suggest but never ship a fix

The lesson was not that LLMs are bad at data — it was that data work is context-heavy and the first wave did not invest in context. The second generation, which we call AI for data infra, starts from the context layer and works outward. See our autonomous data engineering primer at /resources/autonomous-data-engineering for the long version of this history.

The 4-Layer AI Engineering System for Data

A production AI-for-data-infra stack has four layers. Skip any of them and the agents break. The layers, from bottom to top, are: project memory (CLAUDE.md), reusable skills, enforcement hooks, and orchestrated agents. Each layer is a concrete artifact you commit to the repo, not a vendor feature.

| Layer | Artifact | Purpose | Example |
| --- | --- | --- | --- |
| 1. Memory | CLAUDE.md | Persistent project context across sessions | Warehouse DSN, canonical tables, brand voice, owners |
| 2. Skills | .claude/skills/*.md | Reusable playbooks agents can invoke | run-dbt, backfill-dim, on-call-triage |
| 3. Hooks | settings.json hooks | Guardrails and automation | Block prod writes without approval, run tests on save |
| 4. Agents | Subagents + MCP tools | Autonomous execution with tools | Pipeline agent, catalog agent, incident agent |

CLAUDE.md is the load-bearing file. It replaces the one-shot prompt with durable project memory — the model reads it at the start of every session, so you do not re-explain your warehouse on every conversation. The best CLAUDE.md files are 500 to 2,000 lines and read like an onboarding doc for a new hire: conventions, tables, SLAs, who owns what. See our CLAUDE.md guide for data teams at /resources/claude-md-data-engineering for the full template.
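A minimal sketch of what such a file might open with (every name below is a made-up example for illustration, not a required template):

```markdown
# CLAUDE.md: analytics warehouse

## Warehouse
- Snowflake database ANALYTICS; never query raw.* directly, use staging models
- Canonical revenue table: marts.fct_revenue_net (net of refunds)
- Date source of truth: dim_date.date_day

## Conventions
- All transformations live in dbt; no ad-hoc DDL in production
- Every model ships with tests and a catalog entry

## Ownership and SLAs
- marts.*: analytics-eng team (#analytics-eng), freshness SLA 6h
- staging.*: data-platform team (#data-platform), freshness SLA 1h
```

The point is that this reads like onboarding documentation, not a prompt: the agent loads it at session start and inherits the conventions without being re-told.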

Skills are the second layer because they package playbooks. A skill is a Markdown file that says how to do a recurring task — run a dbt build, backfill a dimension, triage an incident, rotate a credential. Agents invoke skills by name, which means the same playbook runs the same way whether a human or an agent is driving. That is what makes the output reproducible.
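For illustration, a hypothetical backfill-dim skill could look like the following (the frontmatter fields and steps are assumptions for this sketch, not the Data Workers playbooks):

```markdown
---
name: backfill-dim
description: Backfill a dimension table for a given date range, then verify
---

1. Confirm the target model exists: `dbt ls -s <model>`.
2. Run the backfill with explicit bounds:
   `dbt run -s <model> --vars '{"start_date": "...", "end_date": "..."}'`.
3. Run the model's tests: `dbt test -s <model>`.
4. Post the run summary to the owning team's channel and link the lineage diff.
```

Because the skill is a committed file, the playbook is reviewed like code and runs identically whether a human or an agent invokes it.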

Hooks are the third layer because they turn policy into code. A hook is a small script registered in settings.json that runs before or after an agent action: block a write to production unless there is a PR, run dbt tests on every save, post a Slack message when an agent touches a PII table. Hooks are how you trust an agent with production — not because the agent is perfect, but because the hooks catch the 1% of mistakes.
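A sketch of what that registration might look like in settings.json, using Claude Code's PreToolUse and PostToolUse hook events; the guard scripts themselves are hypothetical:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "python3 scripts/block_prod_writes.py" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          { "type": "command", "command": "dbt test --select state:modified" }
        ]
      }
    ]
  }
}
```

A non-zero exit from the PreToolUse command blocks the action, which is what turns "please do not write to prod" from a prompt into a policy.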

Agents are the top layer. Each agent is a Claude Code subagent with a scoped tool set: a pipeline agent has dbt, Airflow, and git; a catalog agent has OpenMetadata and DataHub; a cost agent has Snowflake query history and Databricks billing. Agents orchestrate by calling each other through MCP, so a pipeline failure triggers an incident agent, which pages the owner and files a ticket — no human in the middle.

Context Engineering: The Successor to Prompt Engineering

Prompt engineering treated the LLM like a black box you tricked into behaving. Context engineering treats it like a new hire you onboard properly. The shift is from clever sentences to durable artifacts — files the agent reads every session, not prompts you reinvent every time.

The context stack for data infra has five planes: the code plane (repo contents), the data plane (schemas, sample rows, lineage), the runbook plane (incidents, SLAs, ownership), the history plane (past decisions, why the previous migration failed), and the human plane (Slack threads, design docs, PRDs). An agent that sees only the code plane will write syntactically valid SQL that is semantically wrong. An agent that sees all five will write SQL that your senior engineer would ship.

  • Code plane — repo files, dbt models, Airflow DAGs, Terraform
  • Data plane — schemas, sample rows, lineage graphs, metric definitions
  • Runbook plane — incident history, SLAs, on-call rotations, escalation paths
  • History plane — decisions log, past migration notes, review comments
  • Human plane — Slack conversations, design docs, PRD archives

Data Workers builds the context stack automatically — the catalog agent crawls metadata, the observability agent builds lineage, the insights agent indexes decision history, and everything surfaces through MCP tools. See our context engineering playbook at /resources/context-engineering-for-data-agents for the full architecture.
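As a toy illustration of the idea (the class and field names are inventions for this sketch, not the Data Workers API), the planes can be modeled as sections of one rendered context the agent reads each session:

```python
from dataclasses import dataclass, field

@dataclass
class ContextStack:
    """Toy five-plane context stack rendered into one session context."""
    code: list[str] = field(default_factory=list)     # repo files, dbt models
    data: list[str] = field(default_factory=list)     # schemas, lineage, metrics
    runbook: list[str] = field(default_factory=list)  # SLAs, on-call, escalation
    history: list[str] = field(default_factory=list)  # past decisions, migrations
    human: list[str] = field(default_factory=list)    # Slack threads, design docs

    def render(self) -> str:
        sections = [
            ("CODE", self.code), ("DATA", self.data), ("RUNBOOK", self.runbook),
            ("HISTORY", self.history), ("HUMAN", self.human),
        ]
        # Only planes with content appear; empty planes are an explicit gap.
        return "\n\n".join(
            f"## {name}\n" + "\n".join(items) for name, items in sections if items
        )

stack = ContextStack(
    data=["fct_revenue_net: canonical revenue, net of refunds"],
    history=["2025-07 Iceberg migration failed on snapshot retention"],
)
print(stack.render())
```

The sketch makes the failure mode concrete: an agent given only the code plane renders a context with no DATA or HISTORY sections, and its SQL will reflect that gap.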

Multi-Agent Tech Department for Data

A single do-everything agent breaks at scale. The alternative — and the pattern that works in production — is a multi-agent tech department: specialized subagents that each own a slice of the work, coordinated by a planning agent. The pattern mirrors a real engineering org: an architect thinks about the change, a builder implements it, a reviewer checks it, and a release agent ships it.

For data work the three core roles are Architect, Builder, and Reviewer. The Architect agent plans the change (new column, new table, new pipeline) and writes the technical spec. The Builder agent implements the spec — opens a PR with the dbt model, the tests, the docs, and the catalog entry. The Reviewer agent runs the tests, checks lineage impact, flags SLA risks, and either approves or requests changes. Humans only step in when the Reviewer escalates.

| Subagent | Owns | Tools | Output |
| --- | --- | --- | --- |
| Architect | Planning and design | Catalog, lineage, decisions log | Technical spec doc |
| Builder | Implementation | Repo, dbt, git, MCP writers | Pull request with tests |
| Reviewer | Quality gate | Test runner, lineage diff, SLA check | Approve or request changes |
| Release | Production deploy | CI/CD, rollback, monitoring | Deploy or roll back |

The pattern maps directly onto the 14 Data Workers agents: pipeline, incidents, catalog, schema, quality, governance, cost, migration, insights, observability, streaming, orchestration, connectors, and usage-intelligence. Each can act as Architect, Builder, or Reviewer depending on the task. See our multi-agent orchestration patterns guide at /resources/multi-agent-orchestration-data for the full mapping.
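The Architect, Builder, Reviewer handoff can be sketched as a small loop; the stub agents and the three-round escalation cap below are illustrative assumptions, not the Data Workers implementation:

```python
def run_change(task: str, architect, builder, reviewer, max_rounds: int = 3):
    """Architect -> Builder -> Reviewer loop; escalate to a human if the
    Reviewer keeps requesting changes past the round cap."""
    spec = architect(task)                 # technical spec doc
    pr = builder(spec)                     # pull request with tests
    for _ in range(max_rounds):
        verdict, feedback = reviewer(pr)
        if verdict == "approve":
            return {"status": "approved", "pr": pr}
        pr = builder(spec + "\nReviewer feedback: " + feedback)
    return {"status": "escalate_to_human", "pr": pr}

# Toy stubs: the reviewer approves only once the PR carries tests.
architect = lambda task: f"Spec for: {task}"
builder = lambda spec: "PR with tests" if "feedback" in spec else "PR without tests"
reviewer = lambda pr: ("approve", "") if "with tests" in pr else ("request_changes", "add tests")

result = run_change("add net_revenue column", architect, builder, reviewer)
print(result["status"])  # approved
```

The structural point is the escalation path: the loop terminates either in an approved PR or in an explicit handoff to a human, never in a silent retry forever.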

The MCP Layer: Model Context Protocol for Data Domains

MCP — Model Context Protocol — is the interface that lets agents call data systems without vendor lock-in. Before MCP, every agent had its own connector for every system, which meant every new warehouse required a new integration. After MCP, a warehouse exposes a single MCP server and every agent that speaks MCP can use it. The analogy is obvious: MCP is to data what LSP is to IDEs.

For AI for data infra, MCP is the load-bearing protocol because it standardizes three things: the tool catalog (what can an agent do), the resource catalog (what data can an agent see), and the authorization model (what is this agent allowed to touch). Without MCP you end up with agents duct-taped to APIs; with MCP you get a portable, auditable, tier-gateable toolset.

  • Tools — executable operations (run query, open PR, trigger dbt)
  • Resources — readable context (schemas, docs, lineage, metrics)
  • Prompts — reusable templates the server provides to the agent
  • Sampling — server-initiated LLM calls for multi-step workflows
  • Authorization — OAuth 2.1 + tier gating (community, pro, enterprise)

Data Workers ships 212+ MCP tools across 14 agents and exposes them through a single Claude Code plugin. The tools are tier-gated at the framework level, so community users see one set and enterprise users see another, and PII middleware plus audit logging sit in front of every call. See our MCP for data engineers guide at /resources/mcp-data-engineering-guide and our MCP server comparison at /resources/mcp-server-comparison-data for deep dives.
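Tier gating at the framework level can be illustrated with a toy tool registry (this is a pattern sketch, not the Data Workers or MCP SDK API; tool and tier names are made up):

```python
TIERS = {"community": 0, "pro": 1, "enterprise": 2}
TOOLS = {}

def tool(name: str, min_tier: str = "community"):
    """Register a callable as an agent tool with a minimum tier."""
    def wrap(fn):
        TOOLS[name] = (TIERS[min_tier], fn)
        return fn
    return wrap

def call_tool(name: str, caller_tier: str, *args, **kwargs):
    min_tier, fn = TOOLS[name]
    if TIERS[caller_tier] < min_tier:
        raise PermissionError(f"{name} requires a higher tier")
    # In a real stack, PII middleware and audit logging would run here,
    # in front of every call, before the tool sees any input.
    return fn(*args, **kwargs)

@tool("run_query")
def run_query(sql: str) -> str:
    return f"ran: {sql}"

@tool("rotate_credentials", min_tier="enterprise")
def rotate_credentials(system: str) -> str:
    return f"rotated: {system}"

print(call_tool("run_query", "community", "select 1"))  # ran: select 1
```

Because gating lives in the dispatch path rather than in each tool, community and enterprise callers see different tool surfaces without any tool needing to know about tiers.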

Claude Code as the Orchestration Engine

Claude Code is the runtime that makes all of the above work in practice. It is a terminal-native agent that runs on the engineer's laptop or in CI, reads CLAUDE.md, loads skills, enforces hooks, and orchestrates subagents. The reason it is the default runtime for AI for data infra is durability — sessions resume, memory persists, and the same agent that shipped yesterday's change can ship today's.

The other candidates — bespoke LangChain scripts, custom agent frameworks, vendor-hosted UIs — all fail on the same axis. They treat each agent run as a fresh conversation. Data work is long-running: a migration takes weeks, a schema rollout takes days, an incident investigation can span multiple on-call shifts. You need a runtime that remembers. Claude Code remembers.

Data Workers is Claude Code native — the 14 agents install as a Claude Code plugin, the skills live in .claude/skills, the hooks live in settings.json, and CLAUDE.md is the project context. See our Claude Code for data engineers guide at /resources/claude-code-data-engineering and our Claude Code vs LangChain deep agents comparison at /resources/dataworkers-vs-langchain-deep-agents.

Integration with Snowflake, Databricks, and dbt

The three systems that matter most for AI for data infra are Snowflake, Databricks, and dbt — they are where the actual work lives. The agent layer has to integrate with all three without becoming a lock-in layer of its own. The right approach is MCP servers per system, with the agent framework acting as the client.

| System | MCP server | Agent capabilities | Guardrails |
| --- | --- | --- | --- |
| Snowflake | snowflake-mcp | Query, schema, cost, RBAC | Row-level access, cost budget, masking |
| Databricks | databricks-mcp | SQL warehouse, Unity Catalog, jobs | Cluster budget, Unity ACLs, PII scan |
| dbt | dbt-mcp | Models, tests, docs, lineage | CI enforcement, prod branch protection |
| BigQuery | bq-mcp | Query, INFORMATION_SCHEMA, billing | Slot budget, dataset ACLs |
| Iceberg | iceberg-mcp | Tables, snapshots, compaction | Branch protection, snapshot retention |

Data Workers ships first-party integrations with all five. See our Claude Code Snowflake integration guide at /resources/claude-code-snowflake-integration-guide, the Databricks Unity Catalog agent guide at /resources/databricks-unity-catalog-agent, the dbt Cloud AI agent guide at /resources/dbt-cloud-ai-agent, and our BigQuery autonomous agent guide at /resources/bigquery-autonomous-agent-guide for implementation specifics.

Compliance and Governance Agents

Any agent that touches production data will touch regulated data. That is not a bug you avoid — it is a design constraint you plan for. Compliance and governance in AI for data infra is not an afterthought layer; it is the middleware that sits between every agent call and every data system. PII detection, audit logging, policy enforcement, and region pinning all run before the agent sees a single row.

The four regimes that matter in 2026 are GDPR (EU), the EU AI Act (high-risk AI systems), BCBS 239 (banking risk aggregation), and the US state privacy patchwork (CCPA, CPRA, and friends). A compliant AI-for-data-infra stack has to answer three questions per request: what data did the agent see, what did it do, and who authorized it. If you cannot answer all three in an audit, you do not have compliance — you have a liability.

  • PII middleware — detect and mask sensitive fields before they reach the LLM
  • Tamper-evident audit log — SHA-256 hash chain of every agent action
  • OAuth 2.1 with JWT — machine-verifiable authorization per request
  • Region pinning — keep EU data in EU, US data in US
  • Policy as code — Open Policy Agent rules that agents cannot bypass
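The hash-chain audit log is simple enough to sketch in full; the entry fields below are assumptions about what such a log might record, not the Data Workers schema:

```python
import hashlib
import json

def append_entry(log: list, action: dict) -> None:
    """Append an action whose hash covers the previous entry's hash,
    so any retroactive edit breaks every later hash."""
    prev = log[-1]["hash"] if log else "genesis"
    payload = json.dumps({"prev": prev, "action": action}, sort_keys=True)
    log.append({"action": action, "prev": prev,
                "hash": hashlib.sha256(payload.encode()).hexdigest()})

def verify(log: list) -> bool:
    """Re-derive the chain from genesis; False means tampering."""
    prev = "genesis"
    for entry in log:
        payload = json.dumps({"prev": prev, "action": entry["action"]},
                             sort_keys=True)
        if entry["hash"] != hashlib.sha256(payload.encode()).hexdigest():
            return False
        prev = entry["hash"]
    return True

log = []
append_entry(log, {"agent": "pipeline", "tool": "run_query", "table": "fct_revenue"})
append_entry(log, {"agent": "catalog", "tool": "update_docs", "table": "dim_date"})
print(verify(log))   # True
log[0]["action"]["table"] = "tampered"
print(verify(log))   # False
```

This is what makes the log tamper-evident rather than merely append-only: editing any past action invalidates the hash of every entry after it.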

Data Workers ships all five as core/enterprise middleware wired into every MCP agent. See the AI governance for data agents guide at /resources/ai-governance-data-agents, our EU AI Act compliance playbook at /resources/eu-ai-act-data-agents, and the BCBS 239 with AI agents guide at /resources/bcbs-239-ai-agents for regime-specific detail.

Developer Productivity and Human-in-the-Loop

The productivity story for AI for data infra is not 10x. It is closer to 3x to 5x on the work the agents can autonomously complete, and 0x on the work they cannot — so the real leverage comes from knowing which is which and letting humans focus there. The insights agent is the piece most teams miss: it watches what the agents do, flags decisions that need human review, and compounds team knowledge instead of burning it.

Human-in-the-loop is the hinge. The four modes are: full autonomy (low-risk, reversible actions like doc updates), review-before-deploy (pull requests), approval-per-action (prod writes), and supervised (the human drives, the agent advises). A production stack uses all four, switching modes per task class. Forcing every action through the same gate makes agents either useless or dangerous.
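The four modes can be expressed as policy per action class; the class names and assignments below are illustrative defaults, not a prescription:

```python
from enum import Enum

class Mode(Enum):
    FULL_AUTONOMY = "full_autonomy"         # low-risk, reversible actions
    REVIEW_BEFORE_DEPLOY = "review_pr"      # ship via pull request
    APPROVAL_PER_ACTION = "approval"        # human approves each action
    SUPERVISED = "supervised"               # human drives, agent advises

# Illustrative policy table: tune per team and per environment.
POLICY = {
    "doc_update": Mode.FULL_AUTONOMY,
    "catalog_sync": Mode.FULL_AUTONOMY,
    "schema_change": Mode.REVIEW_BEFORE_DEPLOY,
    "prod_write": Mode.APPROVAL_PER_ACTION,
    "architecture_design": Mode.SUPERVISED,
}

def gate(action_class: str) -> Mode:
    # Anything unclassified falls back to the strictest automated mode.
    return POLICY.get(action_class, Mode.APPROVAL_PER_ACTION)

print(gate("doc_update").value)     # full_autonomy
print(gate("unknown_action").value) # approval
```

The default for unclassified actions is the design decision that matters: new action classes start gated and are relaxed deliberately, not the other way around.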

See our human-in-the-loop data agents guide at /resources/human-in-the-loop-data-agents, the developer productivity with AI agents report at /resources/developer-productivity-ai-agents, and the insights agent deep dive at /resources/insights-agent-deep-dive for the productivity math.

Evaluation: Agent-as-a-Judge and Decision-Tracing Context Graphs

Evaluating agents is harder than evaluating models. A model you score on a fixed benchmark; an agent you score on an open-ended task with many valid outcomes. The two approaches that work in production are Agent-as-a-Judge (a second agent scores the first) and decision-tracing context graphs (you record every decision the agent made and replay it later). Neither replaces humans, but together they catch regressions faster than any test suite.

  • Golden queries — 200 known-good prompts with expected outputs
  • Agent-as-a-Judge — a scoring agent reviews every production action
  • Decision traces — store every tool call, input, output, and rationale
  • Replay harness — re-run yesterday's incident with today's agent
  • Human spot checks — sample 1% of agent actions for manual review
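A minimal golden-query harness with a pluggable judge might look like the following (the pass threshold and the judge signature are assumptions for this sketch):

```python
def run_golden_suite(agent, judge, golden: list, threshold: float = 0.95):
    """Score an agent against known-good prompts; the judge callable
    decides whether each output matches the expected answer."""
    passed = 0
    failures = []
    for case in golden:
        output = agent(case["prompt"])
        if judge(case["prompt"], output, case["expected"]):
            passed += 1
        else:
            failures.append(case["prompt"])
    score = passed / len(golden)
    return {"score": score, "pass": score >= threshold, "failures": failures}

# Toy agent and an exact-match "judge"; in production the judge is
# itself an LLM agent scoring open-ended outputs.
agent = lambda p: p.upper()
judge = lambda prompt, out, expected: out == expected
golden = [
    {"prompt": "a", "expected": "A"},
    {"prompt": "b", "expected": "B"},
    {"prompt": "c", "expected": "X"},  # deliberate failure case
]
print(run_golden_suite(agent, judge, golden))
```

Swapping the exact-match judge for a scoring agent is the whole Agent-as-a-Judge move: the harness stays identical, only the judge callable changes.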

Data Workers ships a 200-golden-query eval suite for the catalog agent and Agent-as-a-Judge harnesses for pipeline, quality, and incident agents. See our agent evaluation playbook at /resources/agent-evaluation-data-engineering and the Agent-as-a-Judge guide at /resources/agent-as-a-judge-data for the methodology.

How Data Workers Implements AI for Data Infra

Data Workers is the open-source reference implementation of the stack in this guide. It ships 14 specialized agents, 212+ MCP tools, the 4-layer CLAUDE.md / Skills / Hooks / Agents architecture, enterprise PII and audit middleware, and first-party integrations with Snowflake, Databricks, dbt, BigQuery, and Iceberg. It runs on Claude Code, installs as a plugin, and costs zero for the community tier.

The 14 agents are: pipelines, incidents, catalog, schema, quality, governance, cost, migration, insights, observability, streaming, orchestration, connectors, and usage-intelligence. Every agent is tier-gated (community, pro, enterprise), every tool is audited, and every action is reversible. The repo is on GitHub, the docs are at dataworkers.io/docs, and the community is on Discord.

If you are evaluating vendors, the three questions to ask are: does it run on Claude Code, does it expose MCP, and is the agent code open source. Data Workers is the only stack that answers yes to all three. See the Data Workers architecture overview at /resources/dataworkers-architecture-overview, the 14 agents reference at /resources/data-workers-14-agents, and our open source data agents comparison at /resources/open-source-data-agents-comparison.

Frequently Asked Questions

What is the difference between AI for data infra and chat-with-your-data?

Chat-with-your-data lets a human ask a natural-language question against a dashboard. AI for data infra runs autonomous agents that manage the underlying platform — pipelines, catalogs, quality, incidents. The first is a BI feature; the second is an engineering practice. Chat bots read, agents write.

Is AI for data infra ready for production?

Yes, for the right tasks. Agents are production-ready for schema changes, backfills, doc generation, catalog sync, cost optimization, and incident triage. They are not yet ready for net-new architecture design or judgment calls on ambiguous business logic. Use the agents for the 80% of work that is repeatable and keep humans on the 20% that is not.

Do I need Claude Code to use AI for data infra?

Not strictly, but practically yes. Claude Code is the durable runtime that makes CLAUDE.md, skills, hooks, and subagents work as one system. Other runtimes exist but none match Claude Code on persistent memory and MCP integration. Data Workers is Claude Code native — if you pick another runtime you will reimplement the plumbing.

How is this different from LangChain or CrewAI?

LangChain and CrewAI are general-purpose agent frameworks. AI for data infra is a vertical — it comes with opinionated agents, data-specific MCP tools, and compliance middleware out of the box. You could build it on LangChain, but you would be rewriting what Data Workers already ships. See our Data Workers vs LangChain deep agents comparison at /resources/dataworkers-vs-langchain-deep-agents.

What does it cost?

Data Workers community tier is free and open source. Pro and enterprise tiers add hosted middleware, SSO, and SLAs. Agent inference cost depends on the model you choose — with prompt caching and Claude Code's session reuse, typical data teams spend 50 to 500 dollars per month on model calls for heavy daily use.

Can agents touch production?

Yes, with hooks. The pattern is: agent opens a PR, CI runs tests, a human (or a reviewer agent) approves, CI deploys. The agent never writes directly to prod without a gate. For lower-risk actions (doc updates, catalog sync) agents can run with full autonomy. Tune the gate per action class.

How do I evaluate vendor claims in this space?

Ask for the agent count, the MCP tool count, the test count, and the repo URL. If the vendor cannot show you any of those, it is marketing not engineering. Data Workers publishes all four: 14 agents, 212+ tools, 3,342+ tests, github.com/DhanushAShetty/data-workers.

Does it work with my warehouse?

If your warehouse is Snowflake, Databricks, BigQuery, Redshift, or an Iceberg lakehouse, yes. Data Workers also supports Postgres, MySQL, Trino, DuckDB, and 35+ enterprise connectors. See the connectors guide at /resources/data-workers-connectors for the full list.

How long does it take to deploy?

First MCP tool call in under 60 seconds from a fresh repo clone. A full production rollout with CLAUDE.md, skills, hooks, and the 14 agents takes a focused afternoon for a senior engineer. See our deployment guide at /resources/data-workers-deployment-guide for the step-by-step.

Is my data sent to Anthropic?

Only the data the agent needs for the task, and only with PII masking if you enable it. Data Workers ships PII middleware that runs before every LLM call, and region pinning keeps EU data in EU. See the LLM data disclosure at /resources/llm-data-disclosure for the policy.

This hero page is the entry point for the full AI for data infra resource set. The spoke articles below go deep on every section above.

  • See also: /resources/autonomous-data-engineering/
  • See also: /resources/dataworkers-vs-langchain-deep-agents/
  • See also: /resources/claude-md-data-engineering/
  • See also: /resources/context-engineering-for-data-agents/
  • See also: /resources/multi-agent-orchestration-data/
  • See also: /resources/mcp-data-engineering-guide/
  • See also: /resources/mcp-server-comparison-data/
  • See also: /resources/claude-code-data-engineering/
  • See also: /resources/claude-code-snowflake-integration-guide/
  • See also: /resources/databricks-unity-catalog-agent/
  • See also: /resources/dbt-cloud-ai-agent/
  • See also: /resources/bigquery-autonomous-agent-guide/
  • See also: /resources/ai-governance-data-agents/
  • See also: /resources/eu-ai-act-data-agents/
  • See also: /resources/bcbs-239-ai-agents/
  • See also: /resources/human-in-the-loop-data-agents/
  • See also: /resources/developer-productivity-ai-agents/
  • See also: /resources/insights-agent-deep-dive/
  • See also: /resources/agent-evaluation-data-engineering/
  • See also: /resources/agent-as-a-judge-data/
  • See also: /resources/dataworkers-architecture-overview/
  • See also: /resources/data-workers-14-agents/
  • See also: /resources/open-source-data-agents-comparison/
  • See also: /resources/data-workers-connectors/
  • See also: /resources/data-workers-deployment-guide/
  • See also: /resources/llm-data-disclosure/
  • See also: /resources/ai-for-data-pipelines/
  • See also: /resources/ai-for-data-catalogs/
  • See also: /resources/ai-for-data-quality/
  • See also: /resources/ai-for-data-incidents/
  • See also: /resources/ai-for-data-migration/
  • See also: /resources/ai-for-data-cost-optimization/
  • See also: /resources/ai-for-schema-evolution/
  • See also: /resources/ai-for-data-observability/
  • See also: /resources/ai-for-streaming-data/
  • See also: /resources/ai-for-orchestration/
  • See also: /resources/ai-for-data-governance/
  • See also: /resources/ai-for-data-lineage/
  • See also: /resources/ai-for-pii-detection/
  • See also: /resources/ai-for-data-contracts/
  • See also: /resources/ai-for-dbt-testing/
  • See also: /resources/ai-for-airflow-dags/
  • See also: /resources/ai-for-unity-catalog/
  • See also: /resources/ai-for-iceberg-tables/
  • See also: /resources/agents-for-data-infra-vs-agents-for-bi/

AI for data infra is the next decade of data engineering, and the playbook is already in production. Start with CLAUDE.md, layer on skills and hooks, deploy the 14 Data Workers agents, and let the swarm run the platform while your team focuses on the judgment calls agents cannot make. To see the full stack in action on your own warehouse, book a walkthrough at /book-demo.

See Data Workers in action

14 autonomous AI agents working across your entire data stack. MCP-native, open-source, deployed in minutes.

Book a Demo
