Agent Reference

Data Workers includes 15 specialized agents. Each is an independent MCP server that can be enabled, disabled, and configured individually.

The integrations listed for each agent are built-in: the Data Workers team maintains their MCP connectors. Many additional tools are supported through community-maintained or custom MCP connectors.

Incident Debugging Agent

claude mcp add data-workers-incident

Monitors data infrastructure for failures and anomalies, performs automated root cause analysis, and provides actionable remediation steps. Operates read-only by default.

  • Real-time failure detection across pipelines, queries, and data loads
  • Automated root cause analysis with cross-system correlation
  • Historical pattern matching against previously resolved incidents
  • Remediation recommendations with confidence scoring
  • Escalation to human operators when confidence is low

Integrations: Airflow, Dagster, Prefect, Snowflake, BigQuery, Databricks, Redshift, Grafana, PagerDuty, Opsgenie, New Relic, ServiceNow, Jira Service Management

Autonomy levels: Read-only diagnostics (default: autonomous), remediation execution (default: semi-autonomous)
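
The confidence-based escalation listed for this agent can be pictured with a small sketch. This is an illustration only, not the agent's actual logic: the `Recommendation` type, the `route` function, and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    """A hypothetical remediation suggestion with a confidence in [0, 1]."""
    action: str
    confidence: float


def route(rec: Recommendation, threshold: float = 0.8) -> str:
    """Return 'recommend' for confident suggestions, otherwise 'escalate'.

    The 0.8 default is an assumed value chosen for illustration.
    """
    if not 0.0 <= rec.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return "recommend" if rec.confidence >= threshold else "escalate"
```

A low-confidence suggestion such as `Recommendation("backfill partition", 0.41)` would be routed to a human operator rather than surfaced as a recommendation.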

Pipeline Building Agent

claude mcp add data-workers-pipeline

Generates, modifies, and deploys data pipelines based on natural language descriptions or detected requirements. Handles DAG construction, dependency management, and scheduling.

  • Generate new pipelines from natural language specifications
  • Modify existing DAGs to add sources, transformations, or destinations
  • Automated dependency resolution and scheduling optimization
  • Dry-run validation before deployment
  • Rollback support for failed deployments

Integrations: Airflow, Dagster, Prefect, dbt, Fivetran, Airbyte

Autonomy levels: Pipeline generation (default: autonomous), deployment to production (default: semi-autonomous)
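
Dependency resolution for a DAG usually comes down to a topological sort. The sketch below uses Python's standard `graphlib` module to show the idea. The task names and the `execution_order` helper are hypothetical, and the agent's own resolver may work differently.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return task names so that every task follows its upstream tasks.

    `dependencies` maps each task to the set of tasks it depends on.
    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical three-step pipeline: extract, then transform, then load.
order = execution_order({
    "transform": {"extract"},
    "load": {"transform"},
})
```

A cycle in the dependency map raises an error instead of producing an order, which is the kind of problem a dry-run validation step would catch before deployment.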

Quality Monitoring Agent

claude mcp add data-workers-quality

Continuously monitors data quality across your warehouse, detects anomalies and drift, and generates quality rules based on observed data patterns. Operates read-only by default.

  • Automated anomaly detection on freshness, volume, and distribution
  • Schema drift detection and alerting
  • Quality rule generation from historical data patterns
  • Impact analysis for quality issues (downstream dependencies)
  • Trend reporting and quality scorecards

Integrations: Snowflake, BigQuery, Databricks, Redshift, Monte Carlo, dbt, Great Expectations, Soda Cloud

Autonomy levels: Monitoring and alerting (default: autonomous), quality rule enforcement (default: semi-autonomous)
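
One common way to detect a volume anomaly is to compare today's row count with the historical mean, measured in standard deviations (a z-score). The sketch below assumes that approach. The function name and the threshold of 3.0 are illustrative, and the agent may use a different method.

```python
from statistics import mean, stdev


def is_volume_anomaly(history: list[int], today: int, z_threshold: float = 3.0) -> bool:
    """Flag today's row count if it lies more than z_threshold sample
    standard deviations away from the historical mean."""
    if len(history) < 2:
        raise ValueError("need at least two historical observations")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly constant history: any change at all is unusual.
        return today != mu
    return abs(today - mu) / sigma > z_threshold
```

With a history of roughly 1,000 rows per day, a load of 400 rows would be flagged while a load of 1,005 rows would not.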

Schema Evolution Agent

claude mcp add data-workers-schema

Manages schema changes across your data warehouse: detects breaking changes, generates migration scripts, and coordinates schema evolution across dependent systems.

  • Breaking change detection before deployment
  • Automated migration script generation
  • Impact analysis across downstream consumers
  • Version-controlled schema history
  • Cross-system schema synchronization

Integrations: Snowflake, BigQuery, Databricks, Redshift, dbt, Alation, DataHub

Autonomy levels: Impact analysis (default: autonomous), schema migration execution (default: semi-autonomous)
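
Breaking change detection can be approximated by diffing two versions of a table's schema. The sketch below assumes a deliberately simple rule set: dropped columns and changed types count as breaking, and added columns do not. The `breaking_changes` function and the `{column: type}` representation are hypothetical.

```python
def breaking_changes(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Compare two {column: type} mappings and list breaking changes.

    Dropped columns and changed types are treated as breaking; added
    columns are treated as safe. Real systems apply finer-grained rules
    (for example, some type widenings are compatible).
    """
    problems = []
    for column, old_type in old.items():
        if column not in new:
            problems.append(f"dropped column: {column}")
        elif new[column] != old_type:
            problems.append(f"type changed: {column} {old_type} -> {new[column]}")
    return problems
```

An empty result means the new schema only adds columns relative to the old one.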

Data Context & Catalog Agent

claude mcp add data-workers-context

Maintains a living data catalog by automatically documenting tables, columns, lineage, and usage patterns. Provides context to other agents and human users. Includes trust scoring, impact analysis, intent classification, and lineage visualization.

  • Automated documentation generation from data profiling
  • Column-level lineage tracking and lineage visualization
  • Trust scoring for data assets based on quality, freshness, and usage
  • Impact analysis for understanding downstream effects of changes
  • Intent classification for natural language catalog queries
  • Usage pattern analysis (who queries what, how often)
  • Business glossary management
  • Natural language search across your data catalog

Integrations: DataHub, Atlan, Alation, Snowflake, BigQuery, Databricks, Looker, Tableau

Autonomy levels: Documentation and cataloging (default: autonomous), metadata corrections (default: semi-autonomous)
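
Impact analysis over lineage is, at its core, a graph traversal: starting from the changed asset, follow the edges to every downstream consumer. The sketch below shows a breadth-first version. The lineage mapping and the asset names used with it are hypothetical.

```python
from collections import deque


def downstream_assets(lineage: dict[str, list[str]], changed: str) -> set[str]:
    """Return every asset reachable from `changed` by following lineage edges.

    `lineage` maps an asset to the assets that read from it directly.
    """
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for consumer in lineage.get(current, []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    seen.discard(changed)  # a cycle could otherwise re-add the start node
    return seen
```

For a chain such as `raw.orders` feeding `stg.orders`, which in turn feeds two marts, a change to `raw.orders` returns all three downstream assets.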

Governance & Security Agent

claude mcp add data-workers-governance

Enforces data governance policies, manages access controls, detects PII, and ensures compliance with organizational and regulatory requirements.

  • Automated PII detection and classification
  • Access control policy enforcement
  • Compliance monitoring and reporting
  • Data retention policy management
  • Audit trail generation for regulatory requirements

Integrations: Snowflake, BigQuery, Databricks, DataHub, Atlan, Alation, Collibra

Autonomy levels: Policy monitoring (default: autonomous), access revocation (default: semi-autonomous)
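
A very rough picture of pattern-based PII detection is shown below. The two regular expressions are intentionally simplistic and are assumptions made for this example. Production classifiers typically combine many more patterns with column-name heuristics and sampling.

```python
import re

# Deliberately simple patterns for illustration; they will miss many real
# cases and can produce false positives.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def classify_pii(values: list[str]) -> set[str]:
    """Return the name of every pattern that matches at least one sample value."""
    return {
        label
        for label, pattern in PII_PATTERNS.items()
        if any(pattern.search(v) for v in values)
    }
```

Running `classify_pii` over a sample of column values yields a set of labels that could then drive a classification tag or an access policy.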

Real-Time Streaming Agent

claude mcp add data-workers-streaming

Monitors and manages real-time data streams, detects processing delays and data loss, and optimizes streaming pipeline performance.

  • Stream health monitoring (lag, throughput, error rates)
  • Consumer group management and rebalancing
  • Dead letter queue analysis and replay
  • Stream processing optimization recommendations
  • Automated alerting on stream degradation

Integrations: Kafka, Kinesis, Pulsar, Flink, Spark Streaming

Autonomy levels: Monitoring and alerting (default: autonomous), stream reconfiguration (default: semi-autonomous)
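
In log-based systems such as Kafka, consumer lag for a partition is the difference between the latest offset in the log and the offset the consumer group has committed. The sketch below computes it from two plain dictionaries. The `consumer_lag` helper and its inputs are hypothetical and stand in for values you would fetch from your broker.

```python
def consumer_lag(end_offsets: dict[int, int], committed: dict[int, int]) -> dict[int, int]:
    """Per-partition lag: latest log offset minus the group's committed offset.

    Partitions with no committed offset are treated as lagging from offset 0,
    which is a simplifying assumption.
    """
    return {
        partition: max(end - committed.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }
```

Summing the values gives total lag for the consumer group, which is a typical input to a stream degradation alert.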

Swarm Orchestration Agent

claude mcp add data-workers-swarm

Coordinates multi-agent workflows, manages task delegation between agents, and ensures coherent end-to-end issue resolution across the swarm.

  • Cross-agent workflow orchestration
  • Task prioritization and delegation
  • Conflict resolution when agents have competing recommendations
  • End-to-end progress tracking for multi-agent operations
  • Agent health monitoring and failover

Integrations: All other Data Workers agents

Autonomy levels: Coordination and routing (default: autonomous), workflow execution (default: semi-autonomous per workflow)

Cost Savings & Cleanup Agent

claude mcp add data-workers-cost

Analyzes warehouse usage and spending, identifies unused tables, optimizes expensive queries, and recommends cost reduction strategies.

  • Unused table and view detection
  • Expensive query identification and optimization
  • Storage optimization recommendations
  • Warehouse sizing and scheduling analysis
  • Cost attribution by team, project, or pipeline

Integrations: Snowflake, BigQuery, Databricks, Redshift

Autonomy levels: Analysis and recommendations (default: autonomous), resource cleanup (default: semi-autonomous)
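
Unused table detection can be sketched as a filter over last-access timestamps. The example below assumes you already have a mapping from table name to the time it was last queried. The 90-day idle window and the `unused_tables` helper are illustrative choices, not documented defaults.

```python
from datetime import datetime, timedelta


def unused_tables(
    last_queried: dict[str, datetime],
    now: datetime,
    max_idle_days: int = 90,
) -> list[str]:
    """Return tables whose most recent query is older than max_idle_days, sorted by name."""
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(name for name, ts in last_queried.items() if ts < cutoff)
```

Because resource cleanup is semi-autonomous by default, a list like this would be presented for approval rather than acted on directly.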

Data Migration Agent

claude mcp add data-workers-migration

Plans and executes data migrations between warehouses, databases, or environments. Handles schema translation, data validation, and cutover coordination.

  • Cross-platform schema translation
  • Automated data validation and reconciliation
  • Incremental migration with progress tracking
  • Rollback planning and execution
  • Cutover coordination with minimal downtime

Integrations: Snowflake, BigQuery, Databricks, Redshift, PostgreSQL, MySQL

Autonomy levels: Migration planning (default: autonomous), migration execution (default: semi-autonomous)
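
Data validation after a migration often compares a row count and a checksum between source and target. The sketch below shows one order-independent way to do that in memory. It hashes each row's `repr`, which is a simplification: a real cross-platform comparison would first normalize types and encodings. Both function names are hypothetical.

```python
import hashlib


def table_fingerprint(rows: list[tuple]) -> tuple[int, str]:
    """Return (row count, order-independent SHA-256 digest) for a list of rows."""
    row_hashes = sorted(
        hashlib.sha256(repr(row).encode("utf-8")).hexdigest() for row in rows
    )
    digest = hashlib.sha256("".join(row_hashes).encode("utf-8")).hexdigest()
    return len(rows), digest


def reconcile(source_rows: list[tuple], target_rows: list[tuple]) -> bool:
    """True when both sides hold the same rows, regardless of order."""
    return table_fingerprint(source_rows) == table_fingerprint(target_rows)
```

Sorting the per-row hashes before combining them means the check does not depend on the order in which each system returns rows.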

Data Science & Insights Agent

claude mcp add data-workers-insights

Performs exploratory data analysis, generates statistical summaries, identifies trends, and provides data-driven insights to support decision making.

  • Automated exploratory data analysis
  • Statistical anomaly detection and trend identification
  • Natural language query interface for ad-hoc analysis
  • Automated report generation
  • Feature engineering suggestions for ML workflows

Integrations: Snowflake, BigQuery, Databricks, Jupyter, Looker, Tableau

Autonomy levels: Analysis and reporting (default: autonomous), data modifications (default: semi-autonomous)

Usage Intelligence Agent

claude mcp add data-workers-usage-intelligence

See how your data team actually works. Tracks practitioner usage patterns across every MCP tool, detects workflow patterns, measures adoption, and provides full agent observability with audit trails and drift detection.

  • Tool usage metrics: volume, unique users, and trends per tool and agent
  • Workflow pattern detection: common multi-agent tool sequences
  • Adoption dashboards: which agents are fully adopted vs. shelfware
  • Usage anomaly detection: drops, spikes, and behavior shifts
  • Session analytics: engagement depth and power user identification
  • Usage heatmaps: activity by hour, day, and agent
  • Agent health monitoring: SHA-256 audit trails, drift detection, health checks

Integrations: All other Data Workers agents, Grafana, Datadog

Autonomy levels: Monitoring and reporting (default: autonomous), agent reconfiguration (default: semi-autonomous)
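
One way a SHA-256 audit trail can be made tamper-evident is a hash chain, where each entry's hash covers both the event and the previous entry's hash. The sketch below assumes that design. The source does not specify how the agent builds its trail, and the entry layout and function names here are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(event: dict, prev_hash: str) -> str:
    """SHA-256 over the canonical JSON of the event plus the previous hash."""
    payload = json.dumps(event, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(trail: list[dict], event: dict) -> None:
    """Append an event, linking it to the hash of the entry before it."""
    prev_hash = trail[-1]["hash"] if trail else GENESIS
    trail.append({"event": event, "prev_hash": prev_hash, "hash": _digest(event, prev_hash)})


def verify(trail: list[dict]) -> bool:
    """Recompute every hash; an edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in trail:
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(entry["event"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True
```

Changing any recorded event after the fact causes `verify` to return `False`, because the stored hash no longer matches the recomputed one.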

Data Connectors Agent

claude mcp add data-workers-connectors

Provides unified access to 40+ data platforms and enterprise tools. Supports catalog discovery across Snowflake, BigQuery, Databricks, AWS Glue, Hive Metastore, OpenMetadata, DataHub, Azure Purview, Google Dataplex, Apache Nessie, and more, along with cross-catalog search and capability negotiation.

  • Connect to 40+ data platforms through a single interface
  • Cross-catalog search and discovery across multiple data sources
  • Capability negotiation: automatically adapts to each platform's features
  • Unified metadata access regardless of underlying platform
  • Automatic credential management and connection pooling

Integrations: Snowflake, BigQuery, Databricks, AWS Glue, Hive Metastore, OpenMetadata, DataHub, Azure Purview, Google Dataplex, Apache Nessie, and 30+ more

Autonomy levels: Discovery and search (default: autonomous), connection management (default: semi-autonomous)
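
Capability negotiation can be thought of as intersecting what a caller asks for with what a platform supports. The sketch below illustrates that idea with plain sets. The capability names and the `negotiate` function are hypothetical and do not reflect the agent's actual protocol.

```python
def negotiate(requested: set[str], supported: set[str]) -> tuple[set[str], set[str]]:
    """Split requested capabilities into (usable, unavailable) for one platform."""
    return requested & supported, requested - supported


# Hypothetical example: the platform offers search and column statistics,
# but not lineage.
usable, unavailable = negotiate(
    {"search", "lineage", "column_stats"},
    {"search", "column_stats", "tags"},
)
```

A caller can then proceed with the usable set and degrade gracefully for anything listed as unavailable.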

Observability Agent

claude mcp add data-workers-observability

Provides agent health monitoring, audit trails, drift detection, and performance metrics, giving full observability into agent behavior for enterprise compliance.

  • Agent health monitoring with real-time status dashboards
  • Immutable audit trails for every agent action and decision
  • Drift detection: alerts when agent behavior deviates from baselines
  • Performance metrics and latency tracking across all agents
  • Compliance reporting for enterprise governance requirements

Integrations: All other Data Workers agents, Grafana, Datadog, New Relic

Autonomy levels: Monitoring and alerting (default: autonomous), agent reconfiguration (default: semi-autonomous)

ML & Data Science Agent

claude mcp add data-workers-ml

Assists with machine learning workflows: feature engineering, model training pipelines, experiment tracking, and model deployment. Bridges the gap between data engineering and data science.

  • Automated feature engineering from warehouse tables
  • ML pipeline generation and orchestration
  • Experiment tracking and model versioning
  • Model performance monitoring and drift detection
  • Integration with feature stores and model registries

Integrations: Snowflake, BigQuery, Databricks, MLflow, Weights & Biases, SageMaker, Vertex AI

Autonomy levels: Analysis and recommendations (default: autonomous), model deployment (default: semi-autonomous)
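
A widely used statistic for feature or score drift is the Population Stability Index (PSI), which compares two binned distributions. The sketch below is offered as one possible approach. The source does not say which drift metric the agent uses, and the function name and the `eps` floor are assumptions.

```python
import math


def population_stability_index(
    expected: list[float],
    actual: list[float],
    eps: float = 1e-6,
) -> float:
    """PSI between two binned distributions given as proportions per bin.

    Computes the sum over bins of (actual - expected) * ln(actual / expected).
    """
    if len(expected) != len(actual):
        raise ValueError("both distributions must use the same bins")
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) on empty bins
        psi += (a - e) * math.log(a / e)
    return psi
```

Identical distributions give a PSI of 0, and larger values indicate a bigger shift between the training-time and current distributions.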