
FAQ

What is Data Workers?

A coordinated swarm of AI agents that autonomously manage data infrastructure — pipelines, quality, schema, governance, and more. Each agent is a specialized MCP server focused on one domain.

How is this different from a copilot?

Copilots assist within a single session, when prompted. Data Workers agents run continuously — monitoring 24/7, detecting issues, and resolving them at the level of autonomy you configure per agent and per operation.

What LLM powers the agents?

Data Workers uses large language models from leading providers for reasoning and code generation. PII is automatically scrubbed before any data reaches the LLM. In VPC deployments, you can configure which LLM providers are used. You do not need your own API keys — inference is included in the platform.
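To make the scrub-before-LLM step concrete, here is a minimal illustrative sketch. The two regex patterns and placeholder tokens are toy assumptions for illustration only; the platform's actual PII detection is broader than this.

```python
import re

# Toy patterns standing in for real PII detection (illustrative only).
PII_PATTERNS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "<EMAIL>"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "<SSN>"),
]

def scrub(text: str) -> str:
    """Replace detected PII with placeholder tokens before any LLM call."""
    for pattern, token in PII_PATTERNS:
        text = pattern.sub(token, text)
    return text
```

The key property is ordering: scrubbing happens before the text leaves your environment, so the LLM only ever sees placeholders.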

Where is my data processed?

In SaaS deployments, data is processed in the cloud region closest to your infrastructure. SaaS deployments are available in US and EU regions. In VPC/on-premise deployments, data never leaves your network. See Security & Compliance for details on data residency.

Do agents have access to my data?

Agents connect to your data tools via MCP. PII is automatically detected and scrubbed before any LLM processing. In VPC deployments, data never leaves your network.

Can I control what agents do?

Yes. Every agent has configurable autonomy levels:

  • Fully autonomous: Agent executes on its own
  • Semi-autonomous: Agent proposes an action, waits for human approval
  • Advisory only: Agent suggests, human decides and executes

These are configurable per agent, per operation. See Configuration for details.
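The per-agent, per-operation policy above can be sketched as a lookup table plus a dispatch rule. All agent, operation, and function names below are hypothetical, chosen only to illustrate the three levels; they are not the actual configuration API.

```python
from enum import Enum

class Autonomy(Enum):
    FULL = "fully_autonomous"   # execute without approval
    SEMI = "semi_autonomous"    # propose, then wait for approval
    ADVISORY = "advisory"       # suggest only, never execute

# Hypothetical per-(agent, operation) policy table.
POLICY = {
    ("quality_monitoring", "flag_anomaly"): Autonomy.FULL,
    ("quality_monitoring", "quarantine_table"): Autonomy.SEMI,
    ("schema_management", "drop_column"): Autonomy.ADVISORY,
}

def dispatch(agent: str, operation: str, approved: bool = False) -> str:
    """Decide what happens to a proposed action under the configured policy."""
    level = POLICY.get((agent, operation), Autonomy.ADVISORY)  # default to safest
    if level is Autonomy.FULL:
        return "executed"
    if level is Autonomy.SEMI:
        return "executed" if approved else "pending_approval"
    return "suggested"
```

Note the fallback: an operation with no explicit policy defaults to advisory, the most conservative level.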

Which agents should I start with?

Start with Incident Debugging and Quality Monitoring. Both operate in read-only mode by default — they observe and report without making changes. Low risk, high value. Expand to action-taking agents as trust builds.

What tools do you integrate with?

Snowflake, BigQuery, Databricks, Redshift, Airflow, Dagster, Prefect, dbt, Kafka, Fivetran, Airbyte, Monte Carlo, Great Expectations, Soda, Grafana, New Relic, DataHub, OpenMetadata, AWS Glue, Hive Metastore, Azure Purview, Google Dataplex, Apache Nessie, Atlan, Alation, Looker, Tableau, PagerDuty, Opsgenie, Slack, ServiceNow, Jira, and more. All integrations use MCP. If a tool has an MCP server, Data Workers agents can connect to it.

What platforms are supported?

Data Workers agents work with any MCP-compatible client: Claude Code (first-class support via claude mcp add), Cursor (add agents via .cursor/mcp.json or Cursor Settings), OpenClaw (connect via standard MCP stdio transport), and other MCP clients (Cline, Continue, custom implementations). Each agent is a standard MCP server. If your tool speaks MCP, it works with Data Workers.
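For example, registering an agent with an MCP client looks like the sketch below. The server name `quality-monitoring` and launch command `dataworkers-agent` are placeholders, not the actual package; consult your agent's install docs for the real command.

```shell
# Claude Code: register an agent as an MCP server
# (server name and launch command are placeholders).
claude mcp add quality-monitoring -- dataworkers-agent quality-monitoring

# Cursor: the equivalent entry in .cursor/mcp.json
# {
#   "mcpServers": {
#     "quality-monitoring": {
#       "command": "dataworkers-agent",
#       "args": ["quality-monitoring"]
#     }
#   }
# }
```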

Is there a free / open-source version?

Yes. The community edition includes all 15 agents and can be self-hosted at no cost. It is fully functional with no artificial limits. Enterprise features — VPC deployment, SSO, compliance certifications, SLAs — are available for organizations that need them.

The source code is available at github.com/DataWorkersProject/dataworkers-claw-community under the Apache 2.0 license.

How do I know the agents are making correct decisions?

The Usage Intelligence Agent monitors how practitioners and agents interact with the platform. It provides tool usage analytics, workflow pattern detection, adoption dashboards, and session analytics — plus full agent observability with decision audit trails, drift detection, and health monitoring. Every tool call and agent action is logged with full context.

What happens if an agent makes a mistake?

All action-taking operations support rollback. Agents create checkpoints before making changes, and rollback can be triggered manually or automatically if validation checks fail after execution.

For irreversible operations (e.g., dropping a table, sending an external notification), agents require explicit human approval by default regardless of autonomy level. The full decision audit trail is available for forensic review and post-incident analysis.
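The checkpoint-execute-validate-rollback pattern described above can be sketched in a few lines. Here `state` is a plain dict standing in for any mutable target (a pipeline spec, a config); real agents would checkpoint warehouse or orchestrator state instead, and all names are illustrative.

```python
import copy

def run_with_rollback(state: dict, change, validate):
    """Apply `change` to `state`; restore the checkpoint if validation fails."""
    checkpoint = copy.deepcopy(state)   # snapshot before acting
    change(state)                       # execute the proposed change
    if validate(state):
        return state, "committed"
    state.clear()
    state.update(checkpoint)            # automatic rollback on failed checks
    return state, "rolled_back"
```

The same hook supports manual rollback: a human can reject the change after the fact and the agent restores the checkpoint.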

How do agents coordinate with each other?

Agents share state through a common context layer. When multiple agents are active, the Swarm Orchestration Agent coordinates workflows — routing tasks to the right agent, resolving conflicts, and tracking end-to-end progress. See Architecture Overview for details.
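Conceptually, task routing can be sketched as a capability lookup. The agent names and capabilities below are illustrative, not the actual Data Workers registry.

```python
# Illustrative capability registry, not the real agent roster.
AGENT_CAPABILITIES = {
    "incident_debugging": {"diagnose_failure", "trace_lineage"},
    "quality_monitoring": {"detect_anomaly", "profile_table"},
    "schema_management": {"propose_migration"},
}

def route(task: str) -> str:
    """Return the first registered agent that advertises the capability."""
    for agent, caps in AGENT_CAPABILITIES.items():
        if task in caps:
            return agent
    return "swarm_orchestrator"  # unknown tasks fall back to the coordinator
```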