Engineering · 8 min read

Why We Bet on MCP (And What We're Still Figuring Out)

The Model Context Protocol is our integration backbone — but it is not magic

By The Data Workers Team

When we started building Data Workers, we had to make a foundational decision: how do our AI agents connect to the dozens of tools in a modern data stack? We could build custom integrations for each tool. We could use existing orchestration frameworks. Or we could bet on the Model Context Protocol (MCP).

We bet on MCP. Here is why, and what we are still figuring out.

What MCP Actually Is

MCP is an open protocol, originally developed by Anthropic, that standardizes how AI models interact with external tools and data sources. Think of it as a USB-C port for AI — a universal connector that lets an AI agent talk to any tool that implements the protocol.
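Under the hood, MCP messages are JSON-RPC 2.0. A minimal sketch of what a `tools/call` exchange looks like on the wire (the tool name and arguments here are invented for illustration):

```python
import json

# A hypothetical MCP tools/call request, as a client would send it over
# stdio or HTTP. MCP uses JSON-RPC 2.0 framing: a method, params, and an
# id that the response must echo back.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "run_query",  # a tool the server advertises via tools/list
        "arguments": {"sql": "SELECT count(*) FROM orders"},
    },
}
wire = json.dumps(request)

# The server replies with a result (or an error) carrying the same id.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "1042"}]},
}

print(json.loads(wire)["method"])
```

Any tool that speaks this framing is reachable by any client that speaks it — that uniformity is the whole value proposition.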

The ecosystem has exploded. There are now 12,230+ MCP servers available, covering everything from databases to CI/CD tools to cloud platforms. A year ago, this number was in the hundreds.

Why We Chose MCP Over Custom Integrations

The math is simple. Data Workers needs to connect to warehouses (Snowflake, Databricks, BigQuery, Redshift), orchestrators (Airflow, Dagster, Prefect), transformation tools (dbt, Spark), catalogs (Unity Catalog, Datahub, Hive Metastore), BI tools (Tableau, Looker, Power BI), and more.

Building and maintaining custom integrations for each of these would be a full-time job in itself for a team our size. With MCP, we get a standard interface: if a tool has an MCP server, our agents can connect to it. We are also building custom MCP servers for each agent in our swarm.

What Is Working

  • Rapid prototyping. Our Incident Debugging Agent prototype connected to Snowflake query logs, dbt manifests, and Airflow DAGs through MCP in days, not weeks.
  • Composability. Because each agent has its own MCP server, agents can share context through the protocol. When the Incident Debugging Agent identifies a data quality issue, it can invoke tools from the Quality Monitoring Agent's server.
  • Community leverage. We do not have to build an Airflow integration from scratch because community MCP servers for Airflow already exist.
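The composability point is easiest to see in code. Because every agent's server exposes tools through the same protocol, cross-agent calls are just uniform dispatch. A stripped-down sketch (all server and tool names here are hypothetical, and a "server" is reduced to a name-to-function mapping):

```python
# Illustrative only: each agent's MCP server is modeled as a dict of
# tools. Real servers speak JSON-RPC, but the dispatch shape is the same.
quality_server = {
    "check_freshness": lambda table: {"table": table, "stale": True},
}
debug_server = {
    "fetch_query_log": lambda query_id: {"query_id": query_id, "ms": 5300},
}

servers = {"quality": quality_server, "debug": debug_server}

def call(server: str, tool: str, **kwargs):
    # Uniform dispatch: the caller doesn't care which agent owns the tool.
    return servers[server][tool](**kwargs)

# The Incident Debugging Agent invoking a Quality Monitoring tool:
result = call("quality", "check_freshness", table="orders")
print(result)
```

Because the interface is identical everywhere, adding a new agent to the swarm means registering one more server, not wiring N new point-to-point integrations.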

What We're Still Figuring Out

  • Authentication at scale. Managing credentials across dozens of tools in an enterprise environment is complex. OAuth flows, service accounts, token rotation, least-privilege access.
  • Latency. Each MCP call adds network overhead. When an agent needs to make 15-20 tool calls to diagnose an incident, those round trips add up.
  • Server quality variance. The 12,230+ MCP servers vary wildly in quality. We have had to fork and fix community servers more than we expected.
  • Stateful workflows. MCP is fundamentally request-response. But data engineering workflows are stateful. We are building a context layer on top of MCP to handle this.
  • Security surface area. Every MCP connection is an attack surface. When an agent can execute queries against your warehouse, the security implications are serious.
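On the latency point above, one mitigation we lean on: when tool calls are independent, issue them concurrently instead of sequentially, so 15-20 round trips cost roughly the slowest call rather than the sum. A sketch with simulated round trips (tool names and timings are invented):

```python
import asyncio

async def mcp_call(tool: str, latency_s: float = 0.05) -> str:
    # Stand-in for a real MCP round trip; the sleep simulates network
    # overhead between the agent and the tool's server.
    await asyncio.sleep(latency_s)
    return f"{tool}: ok"

async def diagnose() -> list[str]:
    # Sequential awaits would cost the sum of the round trips.
    # gather() runs independent calls concurrently, so the batch costs
    # roughly one round trip instead of three.
    return await asyncio.gather(
        mcp_call("fetch_query_log"),
        mcp_call("read_dbt_manifest"),
        mcp_call("list_failed_dags"),
    )

results = asyncio.run(diagnose())
print(results)
```

This only helps when the calls don't depend on each other's results; a diagnosis chain that needs the query log before it can pick the next tool still pays the sequential cost.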

Our Honest Assessment

MCP is the right bet for us. The alternative — building custom integrations — would consume our entire engineering bandwidth. MCP lets a small team connect to a broad tool landscape.

But MCP is not a silver bullet. It solves the connector problem, not the intelligence problem. Our agents still need to know what queries to run, how to interpret results, and when to escalate to a human. MCP gives us the plumbing. We still have to build the logic.
