MCP Server Amundsen Metadata

Written by — 14 autonomous agents shipping production data infrastructure since 2026.

Technically reviewed by the Data Workers engineering team.

An Amundsen MCP server wraps the metadata service's search and details APIs, authenticated via a service account, so agents can discover datasets and read popularity signals without browsing the Amundsen UI. Amundsen's heuristic-driven search is distinctive — it surfaces datasets by usage, not just keyword match — which makes it an unusually useful agent backend.

Amundsen is Lyft's open-source data catalog, and it pioneered popularity-based search (surface the datasets people actually use, not just the ones with matching names). Exposing Amundsen through MCP gives agents a quality signal most catalogs lack. This guide covers the setup, tool design, and the search ranking story.

Why Popularity Signals Matter

Most catalogs rank search results by name match, which means the agent often finds a deprecated table before the canonical one. Amundsen ranks by usage — tables with more readers, more dashboards built on them, and more recent activity surface first. That is exactly the signal an agent needs to avoid steering users toward dead data.
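The difference between the two orderings can be sketched in a few lines. This is an illustration only, not Amundsen's actual ranking function: the TableHit fields, the name_match score, and the readers count are hypothetical stand-ins for whatever signals a catalog stores.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableHit:
    name: str
    name_match: float  # hypothetical 0..1 score for how well the name matches the query
    readers: int       # hypothetical usage count, e.g. distinct recent readers


def rank_by_name(hits: List[TableHit]) -> List[TableHit]:
    """Order results purely by how well the table name matches."""
    return sorted(hits, key=lambda h: h.name_match, reverse=True)


def rank_by_usage(hits: List[TableHit]) -> List[TableHit]:
    """Order results by usage first, using name match only as a tiebreaker."""
    return sorted(hits, key=lambda h: (h.readers, h.name_match), reverse=True)


hits = [
    TableHit("orders_v1_deprecated", name_match=0.95, readers=2),
    TableHit("fact_orders", name_match=0.80, readers=340),
]

print(rank_by_name(hits)[0].name)
print(rank_by_usage(hits)[0].name)
```

With name-only ranking the deprecated table comes first; with usage-first ranking the heavily read table does.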

Lyft built Amundsen to solve this problem for its own analysts: with tens of thousands of tables, keyword search was useless. Usage-based ranking cut the "which table is the right one?" problem down to a single click. MCP brings the same benefit to agents.

Service Account Auth

Amundsen's metadata service supports HTTP basic auth and OIDC. For MCP, create a dedicated service user with read-only access and load the credentials via environment variable. If you run Amundsen behind an SSO provider, register the MCP server as a service client and use client credentials flow.

  • Service user — named mcp-agent
  • Read-only — no ability to edit metadata
  • HTTPS to metadata service — not frontend
  • Index refresh awareness — Elasticsearch backing search
  • Rate limit — default Amundsen limits apply
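A minimal sketch of loading the service account from the environment, assuming HTTP basic auth. The environment variable names (AMUNDSEN_METADATA_URL, AMUNDSEN_MCP_USER, AMUNDSEN_MCP_PASSWORD) and the AmundsenAuth class are hypothetical and not part of Amundsen itself; adapt them to your deployment.

```python
import base64
import os
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class AmundsenAuth:
    base_url: str
    username: str
    password: str

    def headers(self) -> Dict[str, str]:
        """Build HTTP basic auth headers for the read-only service user."""
        raw = f"{self.username}:{self.password}".encode()
        token = base64.b64encode(raw).decode()
        return {"Authorization": f"Basic {token}", "Accept": "application/json"}


def load_auth(env: Mapping[str, str] = os.environ) -> AmundsenAuth:
    """Read credentials from environment variables (names are assumptions)."""
    base_url = env["AMUNDSEN_METADATA_URL"]
    # Refuse plain HTTP so credentials never travel unencrypted.
    if not base_url.startswith("https://"):
        raise ValueError("metadata service URL must use HTTPS")
    return AmundsenAuth(
        base_url=base_url.rstrip("/"),
        username=env.get("AMUNDSEN_MCP_USER", "mcp-agent"),
        password=env["AMUNDSEN_MCP_PASSWORD"],
    )
```

Passing the environment as an argument keeps the loader easy to test without touching real secrets. For the OIDC client credentials path, the headers method would return a bearer token instead.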

MCP Tools for Amundsen

Expose four core tools: searchTables, getTableDetail, getPopularTables, and getColumnDetail. Two optional helpers, getTags and getOwners, round out the set in the table below. Each tool wraps a metadata service endpoint. The searchTables tool should return both the result list and each result's popularity rank, so the agent can weight its recommendations.

Tool             | Amundsen Endpoint         | Purpose
searchTables     | /search/table             | Usage-ranked keyword search
getTableDetail   | /table/{key}              | Full metadata for one table
getPopularTables | /popular_tables           | Top N by usage
getColumnDetail  | /table/{key}/column/{col} | Column-level metadata
getTags          | /tags                     | Available tags for filtering
getOwners        | /table/{key}/owner        | Owner contact info
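One way to keep the tool-to-endpoint mapping in a single place is a small registry. The sketch below takes the paths from the table above as given; the build_url helper, the example host, and the choice to percent-encode path arguments are assumptions, so check them against how your Amundsen deployment routes table keys.

```python
from urllib.parse import quote

# Endpoint templates copied from the table above.
TOOL_ENDPOINTS = {
    "searchTables": "/search/table",
    "getTableDetail": "/table/{key}",
    "getPopularTables": "/popular_tables",
    "getColumnDetail": "/table/{key}/column/{col}",
}


def build_url(base_url: str, tool: str, **path_args: str) -> str:
    """Resolve a tool name to a full URL (hypothetical helper).

    Path arguments are percent-encoded so that keys containing
    '://' or '/' stay inside a single path segment.
    """
    template = TOOL_ENDPOINTS[tool]  # KeyError for unknown tools
    encoded = {name: quote(value, safe="") for name, value in path_args.items()}
    return base_url.rstrip("/") + template.format(**encoded)


print(build_url("https://amundsen.example.internal", "getPopularTables"))
```

An unknown tool name or a missing path argument raises KeyError, which the MCP layer can translate into a tool error rather than sending a malformed request.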

Search Ranking and Filters

Amundsen's search accepts filters for database, schema, and tag, which lets the agent scope queries narrowly. The MCP server should expose these as optional arguments on searchTables and default to the agent's primary warehouse. This prevents the agent from surfacing results from a sandbox database when the user wants production tables.
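The optional filters and the warehouse default might look like the following. The request shape (a dict with query and filters) and the DEFAULT_DATABASE value are hypothetical; the real search payload depends on your Amundsen version.

```python
from typing import Any, Dict, Optional

# Assumed primary warehouse; set this to whatever the agent should default to.
DEFAULT_DATABASE = "snowflake"


def build_search_request(
    query: str,
    database: Optional[str] = None,
    schema: Optional[str] = None,
    tag: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble searchTables arguments, scoping to the default database."""
    filters: Dict[str, str] = {"database": database or DEFAULT_DATABASE}
    # Only include filters the caller actually supplied.
    if schema:
        filters["schema"] = schema
    if tag:
        filters["tag"] = tag
    return {"query": query, "filters": filters}


print(build_search_request("orders", schema="sales"))
```

Because the database filter is always present, a query that omits it stays in the primary warehouse instead of spilling into a sandbox.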

Popularity Refresh

Amundsen's popularity data refreshes on a schedule (typically daily) by ingesting query logs from Snowflake, BigQuery, or Redshift. If the popularity scores feel stale, check the ingestion job — it is the source of truth. The MCP server does not need to worry about freshness, but the agent's recommendations are only as good as the last popularity refresh.

Data Workers on Amundsen

Data Workers' Amundsen connector handles service auth, exposes the popularity-aware search tools, and federates results with other catalogs through the unified interface. The catalog agent uses popularity as one of several signals when ranking results. See AI for data infrastructure or read MCP server DataHub metadata for a comparison with DataHub.

To see an Amundsen MCP server surfacing usage-ranked tables in an agent workflow, book a demo. We will walk through service auth, search tuning, and popularity scores.

Amundsen's approach to ownership deserves special mention. The platform distinguishes between technical owners (the team that operates the pipeline) and business owners (the domain experts who define the metrics), and exposes both. An MCP tool that returns both kinds of owner lets the agent route questions to the right person — a technical question to the platform team, a definition question to the business team.
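As a rough sketch, the routing could be a simple lookup. The owners dictionary shape, the question types, and the route_question helper are all hypothetical; they only illustrate the technical versus business split described above.

```python
from typing import Dict, List

# Hypothetical mapping from question type to the owner category that should answer it.
QUESTION_TO_OWNER_KIND = {
    "pipeline": "technical",
    "definition": "business",
}


def route_question(owners: Dict[str, List[str]], question_type: str) -> List[str]:
    """Pick the contacts for a question, falling back to any known owner."""
    kind = QUESTION_TO_OWNER_KIND.get(question_type, "technical")
    preferred = owners.get(kind, [])
    if preferred:
        return preferred
    # No owner of the preferred kind: return everyone we know about.
    return [contact for group in owners.values() for contact in group]


owners = {
    "technical": ["platform-team@example.com"],
    "business": ["finance-analytics@example.com"],
}

print(route_question(owners, "definition"))
```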

The dashboard integration in Amundsen is also valuable for MCP. Amundsen ingests metadata from Mode, Superset, Redash, and Looker, and can tell you which dashboards use a given dataset. An MCP tool that exposes this information lets the agent answer "which dashboards depend on this table?" without walking lineage manually. For incident response and schema evolution, this is a high-value capability.

For teams running older Amundsen deployments, consider upgrading to the latest version before wiring up MCP. Recent releases have improved the search API, added richer ownership models, and expanded the asset types the platform indexes. The MCP tool design benefits directly from these upgrades, and the upgrade itself is usually straightforward thanks to Amundsen's containerized deployment.

Amundsen brings a unique popularity-based ranking to catalog search, and MCP is the easiest way to give that ranking to agents. Four tools plus a service account is all it takes to expose usage-aware metadata to every agent query.
