MCP Server Amundsen Metadata
Written by The Data Workers Team — 14 autonomous agents shipping production data infrastructure since 2026.
Technically reviewed by the Data Workers engineering team.
An Amundsen MCP server wraps the metadata service's search and details APIs, authenticated via a service account, so agents can discover datasets and read popularity signals without browsing the Amundsen UI. Amundsen's heuristic-driven search is distinctive — it surfaces datasets by usage, not just keyword match — which makes it an unusually useful agent backend.
Amundsen is Lyft's open-source data catalog, and it pioneered popularity-based search (surface the datasets people actually use, not just the ones with matching names). Exposing Amundsen through MCP gives agents a quality signal most catalogs lack. This guide covers the setup, tool design, and the search ranking story.
Why Popularity Signals Matter
Most catalogs rank search results by name match, which means the agent often finds a deprecated table before the canonical one. Amundsen ranks by usage — tables with more readers, more dashboards built on them, and more recent activity surface first. That is exactly the signal an agent needs to avoid steering users toward dead data.
Lyft built Amundsen to solve this problem for their own analysts: with tens of thousands of tables, keyword search was useless. Usage-based ranking cut the "which table is the right one?" problem down to a single click. MCP brings the same benefit to agents.
Service Account Auth
Amundsen's metadata service supports HTTP basic auth and OIDC. For MCP, create a dedicated service user with read-only access and load the credentials via environment variable. If you run Amundsen behind an SSO provider, register the MCP server as a service client and use client credentials flow.
- Service user — named mcp-agent
- Read-only — no ability to edit metadata
- HTTPS to the metadata service — not the frontend
- Index refresh awareness — Elasticsearch backs search
- Rate limits — default Amundsen limits apply
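The basic-auth setup above can be sketched in a few lines. This is a minimal stdlib-only sketch: the environment variable names (AMUNDSEN_SVC_USER, AMUNDSEN_SVC_PASSWORD) and base URL are assumptions for illustration, not part of Amundsen's configuration.

```python
import base64
import json
import urllib.request


def basic_auth_header(user: str, password: str) -> dict:
    """Build the HTTP basic-auth header for the read-only service account."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}


def metadata_get(base_url: str, path: str, headers: dict) -> dict:
    """GET a metadata-service endpoint over HTTPS and parse the JSON body."""
    req = urllib.request.Request(f"{base_url.rstrip('/')}{path}", headers=headers)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


# Credentials come from the environment, never hard-coded:
# headers = basic_auth_header(os.environ["AMUNDSEN_SVC_USER"],
#                             os.environ["AMUNDSEN_SVC_PASSWORD"])
```

In production you would likely use a session-based HTTP client and retry logic, but the shape is the same: credentials from the environment, HTTPS to the metadata service.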
MCP Tools for Amundsen
Expose four core tools (searchTables, getTableDetail, getPopularTables, and getColumnDetail), plus getTags and getOwners for tag filtering and owner lookup. Each wraps a metadata service endpoint. The searchTables tool should return both the result list and the popularity rank, so the agent can weight its recommendations.
| Tool | Amundsen Endpoint | Purpose |
|---|---|---|
| searchTables | /search/table | Usage-ranked keyword search |
| getTableDetail | /table/{key} | Full metadata for one table |
| getPopularTables | /popular_tables | Top N by usage |
| getColumnDetail | /table/{key}/column/{col} | Column-level metadata |
| getTags | /tags | Available tags for filtering |
| getOwners | /table/{key}/owner | Owner contact info |
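The table above maps cleanly to a tool registry. A sketch of that mapping, using the endpoint paths from the table (the registry structure itself is an implementation choice, not an Amundsen API):

```python
# Tool name -> Amundsen metadata endpoint template, from the table above.
TOOL_ENDPOINTS = {
    "searchTables": "/search/table",
    "getTableDetail": "/table/{key}",
    "getPopularTables": "/popular_tables",
    "getColumnDetail": "/table/{key}/column/{col}",
    "getTags": "/tags",
    "getOwners": "/table/{key}/owner",
}


def resolve_endpoint(tool: str, **params: str) -> str:
    """Fill path parameters (table key, column name) into a tool's endpoint."""
    return TOOL_ENDPOINTS[tool].format(**params)
```

Each MCP tool handler then calls the shared HTTP helper with the resolved path and returns the JSON payload to the agent.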
Search Ranking and Filters
Amundsen's search accepts filters for database, schema, and tag, which lets the agent scope queries narrowly. The MCP server should expose these as optional arguments on searchTables and default to the agent's primary warehouse. This prevents the agent from surfacing results from a sandbox database when the user wants production tables.
Popularity Refresh
Amundsen's popularity data refreshes on a schedule (typically daily) by ingesting query logs from Snowflake, BigQuery, or Redshift. If the popularity scores feel stale, check the ingestion job — it is the source of truth. The MCP server does not need to worry about freshness, but the agent's recommendations are only as good as the last popularity refresh.
Data Workers on Amundsen
Data Workers' Amundsen connector handles service auth, exposes the popularity-aware search tools, and federates results with other catalogs through the unified interface. The catalog agent uses popularity as one of several signals when ranking results. See AI for data infrastructure or read MCP server DataHub metadata for a comparison with DataHub.
To see an Amundsen MCP server surfacing usage-ranked tables in an agent workflow, book a demo. We will walk through service auth, search tuning, and popularity scores.
Amundsen's approach to ownership deserves special mention. The platform distinguishes between technical owners (the team that operates the pipeline) and business owners (the domain experts who define the metrics), and exposes both. An MCP tool that returns both kinds of owner lets the agent route questions to the right person — a technical question to the platform team, a definition question to the business team.
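The routing idea can be sketched as a small helper. The owner record shape here (email plus owner_type) is a hypothetical normalization the MCP server would apply to Amundsen's owner payload, not Amundsen's native schema:

```python
def route_question(owners: list, question_type: str) -> list:
    """Return contacts for a question: technical owners for operational
    issues, business owners for metric-definition questions.

    Assumes each owner dict has 'email' and 'owner_type' fields
    (an illustrative shape the server would normalize to)."""
    wanted = "technical" if question_type == "operational" else "business"
    return [o["email"] for o in owners if o.get("owner_type") == wanted]
```

With this, an agent asked "why is this pipeline late?" contacts the platform team, while "what counts as an active user here?" goes to the domain experts.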
The dashboard integration in Amundsen is also valuable for MCP. Amundsen ingests metadata from Mode, Superset, Redash, and Looker, and can tell you which dashboards use a given dataset. An MCP tool that exposes this information lets the agent answer which dashboards depend on this table? without walking lineage manually. For incident response and schema evolution, this is a high-value capability.
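A dashboard-dependency tool could summarize its results like this. The entry shape (product and name fields) is assumed for illustration; the real payload depends on which BI integrations you have ingested:

```python
def summarize_dashboards(dashboards: list) -> dict:
    """Group dashboard names by BI product so the agent can answer
    'which dashboards depend on this table?' in one response.

    Assumes each entry has 'product' and 'name' (illustrative shape)."""
    by_product = {}
    for d in dashboards:
        by_product.setdefault(d["product"], []).append(d["name"])
    return by_product
```

During an incident, the agent can fold this summary into its impact report instead of walking lineage edge by edge.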
For teams running older Amundsen deployments, consider upgrading to the latest version before wiring up MCP. Recent releases have improved the search API, added richer ownership models, and expanded the asset types the platform indexes. The MCP tool design benefits directly from these upgrades, and the upgrade itself is usually straightforward thanks to Amundsen's containerized deployment.
Amundsen brings a unique popularity-based ranking to catalog search, and MCP is the easiest way to give that ranking to agents. Four tools plus a service account is all it takes to expose usage-aware metadata to every agent query.
See Data Workers in action
15 autonomous AI agents working across your entire data stack. MCP-native, open-source, deployed in minutes.
Book a Demo

Related Resources
- MCP Server DataHub Metadata
- MCP Server Collibra Metadata
- MCP Server Atlan Metadata
- MCP Server Alation Metadata
- MCP Server Unity Catalog Metadata
- Why AI Agents Need MCP Servers for Data Engineering — MCP servers give AI agents structured access to your data tools — Snowflake, BigQuery, dbt, Airflow, and more. Here is why MCP is the int…
- MCP Server Analytics: Understanding How Your AI Tools Are Actually Used — Your team uses dozens of MCP tools every day. MCP analytics tracks adoption, measures ROI, identifies unused tools, and provides the usag…
- How to Build an MCP Server for Your Data Warehouse (Tutorial) — MCP servers give AI agents structured access to your data warehouse. This tutorial walks through building one from scratch — TypeScript,…
- MCP Server Security: Authentication, Authorization, and Audit Trails — MCP servers expose powerful capabilities to AI agents. Securing them requires OAuth 2.1 authentication, scoped authorization, least-privi…
- MCP Server for Snowflake: Connect AI Agents to Your Data Warehouse — Snowflake's MCP server exposes Cortex Analyst, Cortex Search, and schema metadata to AI agents. Here's how to set it up and how Data Work…
- MCP Server for BigQuery: Give AI Agents Access to Your Analytics — BigQuery's MCP server gives AI agents access to schemas, query execution, and cost estimation. Here's how to connect it and use Data Work…
- MCP Server Tutorial: Build a Data Warehouse Integration in 30 Minutes (Python) — Build an MCP server for your data warehouse in 30 minutes with Python. Step-by-step tutorial covering schema exposure, query execution, a…
Explore Topic Clusters
- Data Governance: The Complete Guide — Policies, access controls, PII, and compliance at scale.
- Data Catalog: The Complete Guide — Discovery, metadata, lineage, and the modern catalog stack.
- Data Lineage: The Complete Guide — Column-level lineage, impact analysis, and observability.
- Data Quality: The Complete Guide — Tests, SLAs, anomaly detection, and data reliability engineering.
- AI Data Engineering: The Complete Guide — LLMs, agents, and autonomous workflows across the data stack.
- MCP for Data: The Complete Guide — Model Context Protocol servers, tools, and agent integration.
- Data Mesh & Data Fabric: The Complete Guide — Federated ownership, domain-oriented architecture, and interop.
- Open-Source Data Stack: The Complete Guide — dbt, Airflow, Iceberg, DuckDB, and the modern OSS toolkit.
- AI for Data Infra — The complete category for AI agents built specifically for data engineering, data governance, and data infrastructure work.