Dataworkers vs Amundsen: Broad Agent Platform vs Focused Catalog

Dataworkers vs Amundsen at a glance: Amundsen is an open-source metadata search and discovery engine originally built at Lyft, focused on helping data users find trustworthy tables. Dataworkers is an open-source MCP-native AI agent platform with 14 agents that automate data engineering work. Amundsen is a focused discovery tool; Dataworkers is a broad agent platform.

Amundsen was one of the earliest open-source data catalogs, released by Lyft in 2019, and is maintained under the Linux Foundation's LF AI & Data umbrella. According to Amundsen's public documentation, it is designed for metadata-powered search and discovery with Elasticsearch, a Neo4j graph backend, and a React frontend. Dataworkers approaches data engineering from the opposite angle: rather than a single-purpose catalog UI, we ship 14 autonomous agents through MCP.

Feature Comparison

Feature | Dataworkers | Amundsen
--- | --- | ---
License | Apache 2.0 | Apache 2.0
Primary focus | AI agent platform | Metadata discovery + search
Deployment | Docker, npm, Cloudflare | Docker, Kubernetes
AI agents | 14 autonomous agents | No AI agents documented
MCP support | Native | Not MCP-native
Search ranking | 4-signal RRF (recency, usage, authority, relevance) | PageRank-inspired popularity ranking
Lineage | Column-level | Table-level (per public docs)
Data quality | Quality agent | Not in scope
Governance | Governance agent + PII middleware | Not primary scope
Commercial backing | Data Workers (commercial support) | Community-maintained under LF AI & Data
Active development | Active (2026) | Slower cadence per public GitHub activity
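
The table lists Dataworkers' search ranking as a 4-signal RRF (Reciprocal Rank Fusion) but does not say how the signals are combined. The sketch below shows the standard RRF formula, in which each item scores the sum of 1 / (k + rank) across independent per-signal rankings. The constant k = 60, the function name, and the sample rankings are assumptions for illustration, not Dataworkers' actual implementation or data.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse per-signal rankings (item IDs, best first) with Reciprocal Rank Fusion.

    k=60 is the constant from the original RRF paper; it is an assumption here,
    not a documented Dataworkers setting.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, item_id in enumerate(ranked_ids, start=1):
            # Each signal adds 1 / (k + rank); an item missing from a list adds nothing.
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical per-signal rankings for three tables.
signals = {
    "recency": ["orders", "customers", "events"],
    "usage": ["customers", "orders", "events"],
    "authority": ["customers", "events", "orders"],
    "relevance": ["customers", "orders", "events"],
}

fused = reciprocal_rank_fusion(signals)
print([table for table, _ in fused])  # ['customers', 'orders', 'events']
```

Because RRF uses only rank positions, the signals never need to share a score scale, which is the usual reason to choose it over a weighted sum of raw scores.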

Scope Difference

Amundsen is narrow and focused — it does search and discovery well, and little else. Dataworkers is broad — it does catalog federation plus pipelines, quality, governance, cost, migration, insights, observability, streaming, and orchestration. If you only need search across your tables, Amundsen is simpler. If you need agents that act on metadata, Dataworkers is the only option in the open-source space.

Where Amundsen Wins

Amundsen wins in three cases: (1) your team already uses Amundsen and is happy, (2) you want a single-purpose search tool with minimal operational footprint, or (3) you prefer the Neo4j graph approach to metadata modeling. Amundsen's search UX is well regarded and its community is active.

Where Dataworkers Wins

Dataworkers wins when you need more than search. If you want AI agents that can migrate pipelines, detect schema drift, run quality checks, automate lineage, and respond to incidents, Dataworkers ships those out of the box as MCP tools. Dataworkers also has 50 connectors compared to Amundsen's smaller ingestion library.

Which Should You Choose?

If you only need search and discovery and are fine running a Neo4j + Elasticsearch stack, Amundsen is a valid lightweight choice. If you want a modern MCP-native agent platform that covers the full data engineering lifecycle, choose Dataworkers. Explore the product or book a demo to see the agents in action.

Development Velocity

Amundsen's GitHub activity has slowed compared to its peak years at Lyft. While it is still maintained under LF AI & Data, merge cadence is slower than OpenMetadata, DataHub, or Dataworkers. For teams that value active development — new features, bug fixes, security patches — this is a consideration. Dataworkers ships updates weekly to both the OSS community repo and the private build, with each release going through automated testing across 3,342+ tests in 155+ test files.

Architecture Modernity

Amundsen's architecture reflects its 2019 origins — Neo4j for graph, Elasticsearch for search, a separate Python metadata service, and a React frontend. This stack works but is heavier to operate than modern alternatives. Dataworkers' architecture is designed for 2026 — MCP-native from day one, with TypeScript and Python MCP servers that run in any MCP host (Claude Code, Cursor, ChatGPT), and deployable to modern infrastructure (Cloudflare Workers, Docker, Kubernetes, or as an npm package). For teams building on modern cloud infrastructure, Dataworkers fits more naturally.

When Amundsen Is Still the Right Choice

If you already run Amundsen and it works, there is no urgency to migrate. Our catalog agent can federate Amundsen through our connector, so you can add Dataworkers agents on top of your existing Amundsen deployment. If you are starting a new project and want a lightweight search-only catalog, Amundsen is still a reasonable choice — just be aware that its development cadence is slower than alternatives. For most greenfield projects in 2026, OpenMetadata or DataHub are the more active OSS options, and Dataworkers is the agent-first option.
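
Federation here means querying each connected catalog at request time instead of copying its metadata into a separate store. The sketch below illustrates that idea under stated assumptions: `CatalogAdapter`, `InMemoryAdapter`, `TableRef`, and `federated_search` are hypothetical names, not Dataworkers' connector API, and the in-memory adapter stands in for a connector that would call a live Amundsen deployment.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class TableRef:
    key: str     # table key as reported by a catalog (hypothetical format)
    source: str  # name of the catalog that reported it


class CatalogAdapter(Protocol):
    """Hypothetical interface each federated catalog connector implements."""

    name: str

    def search(self, query: str) -> list[str]: ...


class InMemoryAdapter:
    """Stand-in for a real connector, e.g. one calling Amundsen's search service."""

    def __init__(self, name: str, keys: list[str]) -> None:
        self.name = name
        self._keys = keys

    def search(self, query: str) -> list[str]:
        return [k for k in self._keys if query.lower() in k.lower()]


def federated_search(adapters: list[CatalogAdapter], query: str) -> list[TableRef]:
    """Query every catalog at request time and merge results; nothing is persisted."""
    seen: dict[str, TableRef] = {}
    for adapter in adapters:
        for key in adapter.search(query):
            # The first catalog to report a key wins; later duplicates are dropped.
            seen.setdefault(key, TableRef(key=key, source=adapter.name))
    return list(seen.values())


amundsen = InMemoryAdapter("amundsen", ["hive://gold.sales/orders", "hive://gold.sales/customers"])
warehouse = InMemoryAdapter("warehouse", ["hive://gold.sales/orders", "snowflake://prod.sales/orders_daily"])

results = federated_search([amundsen, warehouse], "orders")
```

The existing Amundsen deployment remains the system of record in this design; the federating layer only reads from it, which is what allows agents to be added without a migration.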

Migration Path From Amundsen

Teams migrating from Amundsen to Dataworkers (or another modern catalog) typically do so in one of two scenarios. First, when development cadence becomes a concern: if Amundsen is not getting the features or security patches you need, migration is justified. Second, when requirements expand beyond search and discovery to include quality, observability, governance, or cost, areas that Amundsen does not cover. The migration itself is straightforward because both Amundsen and Dataworkers are open source and accessible through APIs. The migration agent can inventory an Amundsen deployment and map its entities into Dataworkers' catalog registry, in most cases without manual work.
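
As a rough illustration of the mapping step, the sketch below parses Amundsen table keys into registry records and sets aside anything it cannot parse, which is where the "in most cases" caveat would surface as manual review. It assumes the `<database>://<cluster>.<schema>/<table>` key layout seen in Amundsen's databuilder sample data; `RegistryEntry` and `map_amundsen_keys` are hypothetical names, and the record shape is not Dataworkers' actual registry schema.

```python
import re
from dataclasses import dataclass

# Assumed Amundsen table-key layout: "<database>://<cluster>.<schema>/<table>",
# e.g. "hive://gold.test_schema/test_table1".
AMUNDSEN_TABLE_KEY = re.compile(
    r"^(?P<database>[^:/]+)://(?P<cluster>[^./]+)\.(?P<schema>[^/]+)/(?P<table>[^/]+)$"
)


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical catalog-registry record; not Dataworkers' real schema."""

    platform: str
    cluster: str
    schema: str
    table: str

    @property
    def qualified_name(self) -> str:
        return f"{self.schema}.{self.table}"


def map_amundsen_keys(keys: list[str]) -> tuple[list[RegistryEntry], list[str]]:
    """Split keys into mapped registry entries and keys that need manual review."""
    mapped: list[RegistryEntry] = []
    unmapped: list[str] = []
    for key in keys:
        match = AMUNDSEN_TABLE_KEY.match(key)
        if match is None:
            unmapped.append(key)  # unexpected layout: flag it for a human
            continue
        mapped.append(
            RegistryEntry(
                platform=match["database"],
                cluster=match["cluster"],
                schema=match["schema"],
                table=match["table"],
            )
        )
    return mapped, unmapped


entries, needs_review = map_amundsen_keys([
    "hive://gold.test_schema/test_table1",
    "postgres://analytics.public/orders",
    "not-a-table-key",
])
```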

Cost of Operation

Amundsen's operational cost is moderate — Neo4j, Elasticsearch, and a Python metadata service. For small deployments, a few containers suffice; for large deployments, Neo4j capacity planning becomes a specialized skill. Dataworkers' operational cost is lower because the core is lighter-weight and federates rather than storing metadata. For cost-sensitive teams, Dataworkers typically has a smaller infrastructure footprint than Amundsen for equivalent functionality. For feature-sensitive teams, the question is whether you need the features Amundsen provides but Dataworkers does not (primarily the polished catalog web UI).

Summary of the Choice

Amundsen is a focused, lightweight open-source catalog for search and discovery. Dataworkers is a broad, open-source MCP-native agent platform that includes catalog federation as one of 14 agents. For teams that need only search and discovery, Amundsen is simpler. For teams that need broader scope and AI-native workflows, Dataworkers is more powerful. Both are Apache 2.0, so there is no licensing argument either way. The right choice depends entirely on scope — if your scope is narrow (just search), Amundsen is fine; if your scope is broad (full data engineering lifecycle), Dataworkers is the only option. Many greenfield projects in 2026 pick Dataworkers because the broader scope is more valuable, but existing Amundsen deployments often stay put and add Dataworkers alongside.

Amundsen is a focused open-source catalog; Dataworkers is a broad open-source agent platform. Pick based on scope needs, not on the ranking-algorithm debate.

See Data Workers in action

14 autonomous AI agents working across your entire data stack. MCP-native, open-source, deployed in minutes.
