Architecture Overview
This document describes how Data Workers agents are structured, how they coordinate, and how they integrate with your data stack.
Core Concept
Data Workers is a swarm of specialized AI agents. Each agent is an MCP (Model Context Protocol) server focused on one domain — incidents, quality, schema, pipelines, governance, and so on.
Agents share context and coordinate through a shared context layer. They can operate independently (single agent) or as a coordinated swarm (multiple agents working together on a problem).
Shared Context Layer
The shared context layer is a distributed memory and event bus that enables coordination across agents. It serves three functions:
- Shared state: Agents read and write contextual information (e.g., an incident diagnosis, a quality score, a schema change plan) so that other agents can build on prior work without duplicating effort.
- Context passing: When one agent completes a task that feeds into another agent's domain, context is passed along automatically. For example, an incident diagnosis is passed to the pipeline agent for remediation.
- Event coordination: Agents publish and subscribe to events. When a quality anomaly is detected, relevant agents are notified and can respond based on their configured autonomy levels.
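The shared-state and event-coordination roles can be pictured with a minimal in-memory sketch. This is purely illustrative — `ContextBus`, the `quality.anomaly` topic, and the field names are assumptions, not the platform's actual API:

```python
# Illustrative sketch of the shared context layer: shared state plus a
# publish/subscribe event bus. Not the real implementation.
from collections import defaultdict

class ContextBus:
    def __init__(self):
        self.state = {}                       # shared state: key -> value
        self.subscribers = defaultdict(list)  # topic -> list of handlers

    def put(self, key, value):
        """Write shared state so other agents can build on prior work."""
        self.state[key] = value

    def subscribe(self, topic, handler):
        """Register an agent's handler for a topic."""
        self.subscribers[topic].append(handler)

    def publish(self, topic, payload):
        """Notify every agent subscribed to this topic."""
        for handler in self.subscribers[topic]:
            handler(payload)

bus = ContextBus()
notified = []
bus.subscribe("quality.anomaly", lambda event: notified.append(event["table"]))
bus.publish("quality.anomaly", {"table": "orders", "metric": "null_rate"})
bus.put("diagnosis", "late upstream load")
```

After the `publish` call, every subscriber to `quality.anomaly` has seen the event, and the diagnosis written via `put` is visible to any other agent that reads the shared state.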
The context layer is managed by the platform — you do not need to configure or maintain it. In SaaS deployments, the context layer is fully managed. In VPC and on-premise deployments, the context layer runs within your infrastructure and may require configuration.
The context layer is designed for sub-second context propagation across agents.
How Agents Work Together
When a data issue occurs, agents collaborate across domains. The typical flow is:
- Incident Debugging Agent detects and diagnoses the root cause
- Quality Monitoring Agent provides data quality context
- Schema Evolution Agent generates a schema fix if needed
- Pipeline Building Agent deploys the fix
- Catalog & Context Agent documents the change
Each agent handles its domain, passes context to the next, and the swarm resolves the issue end to end.
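The flow above can be sketched as a chain of steps that each read the shared context, do their domain's work, and pass the enriched context onward. The agent functions and field names here are placeholders, not the product's interfaces:

```python
# Illustrative sketch of the swarm flow: each agent enriches a shared
# context dict and hands it to the next. Bodies are stand-ins.

def incident_agent(ctx):
    ctx["diagnosis"] = "schema drift in upstream table"  # root cause
    return ctx

def quality_agent(ctx):
    ctx["quality_context"] = {"null_rate": 0.12}  # supporting evidence
    return ctx

def schema_agent(ctx):
    if "schema" in ctx["diagnosis"]:  # fix only if schema-related
        ctx["schema_fix"] = "ALTER TABLE orders ADD COLUMN region TEXT"
    return ctx

def pipeline_agent(ctx):
    ctx["deployed"] = "schema_fix" in ctx  # deploy the fix if one exists
    return ctx

def catalog_agent(ctx):
    ctx["documented"] = True  # record the change
    return ctx

ctx = {}
for agent in (incident_agent, quality_agent, schema_agent,
              pipeline_agent, catalog_agent):
    ctx = agent(ctx)
```

In the real system the handoffs run through the shared context layer rather than a local loop, but the shape is the same: each agent builds on the context the previous one produced.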
Key Design Principles
Each agent is independent. Enable or disable agents individually. Start with one, add more as trust builds. No agent depends on another to function.
Vendor-neutral. Agents connect to your tools via MCP. Snowflake, BigQuery, Databricks, Airflow, Dagster, dbt — if a tool has an MCP server, our agents connect to it. No vendor lock-in.
Human-in-the-loop. Every agent has configurable autonomy levels. Fully autonomous for routine operations, human approval for sensitive changes, advisory-only for new deployments. You decide per agent, per operation.
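Per-agent, per-operation autonomy could be expressed as a simple policy table plus a gate check. The levels, agent names, and operation names below are hypothetical, chosen only to mirror the three modes described above:

```python
# Hypothetical autonomy policy: per agent, per operation.
# Level names and the "*" wildcard are illustrative assumptions.
AUTONOMY = {
    "pipeline_building": {
        "retry_failed_task": "autonomous",    # routine: no approval needed
        "deploy_schema_change": "approval",   # sensitive: human sign-off
    },
    "schema_evolution": {
        "*": "advisory",                      # new deployment: suggest only
    },
}

def requires_human(agent, operation):
    """Return True unless the operation is configured fully autonomous."""
    levels = AUTONOMY.get(agent, {})
    level = levels.get(operation, levels.get("*", "advisory"))
    return level != "autonomous"
```

A defaults-to-advisory policy like this fails safe: any operation not explicitly marked autonomous waits for a human.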
Open core. The community edition is fully functional. Enterprise features (VPC deployment, SSO, compliance certifications, SLAs) are available for organizations that need them.
Integration Model
Agents connect to your existing tools through MCP — no custom integrations needed. MCP provides a standard protocol for agent-to-tool communication. If your tool exposes an MCP server, Data Workers agents can connect to it with minimal configuration — typically an endpoint URL and credentials.
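A connection configuration of this shape — endpoint plus credentials — might look like the following. The field names and URL are illustrative placeholders, not the product's actual configuration schema:

```python
# Hypothetical MCP connection config for a warehouse tool.
# Credentials are read from the environment rather than hard-coded.
import os

snowflake_connection = {
    "mcp_endpoint": "https://mcp.example.com/snowflake",  # placeholder URL
    "credentials": {
        "user": os.environ.get("SNOWFLAKE_USER", ""),
        "password": os.environ.get("SNOWFLAKE_PASSWORD", ""),
    },
}
```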
Supported tools include: Snowflake, BigQuery, Databricks, Redshift, Airflow, Dagster, Prefect, dbt, Kafka, DataHub, OpenMetadata, AWS Glue, Hive Metastore, Azure Purview, Google Dataplex, Apache Nessie, Grafana, PagerDuty, Slack, ServiceNow, Jira, Looker, Tableau, Great Expectations, Soda, Monte Carlo, New Relic, Opsgenie, and many more.