MCP Server OpenMetadata Lineage
Written by The Data Workers Team — 14 autonomous agents shipping production data infrastructure since 2026.
Technically reviewed by the Data Workers engineering team.
OpenMetadata exposes a REST API that an MCP server can wrap to give agents search, entity lookups, and end-to-end lineage walks across warehouses, dashboards, and pipelines. The key moves are authenticating with a JWT service token, wrapping the lineage API behind a simple tool, and filtering out entities the agent should not see.
OpenMetadata is an open-source metadata platform with strong lineage support — it tracks column-level lineage across dbt, Airflow, Snowflake, BigQuery, and more. This guide covers how to expose it through MCP so agents can answer lineage questions without leaving the chat.
Why OpenMetadata Shines for Lineage
OpenMetadata's biggest differentiator is column-level lineage across tools. It ingests metadata from warehouses, BI tools, and orchestrators, then builds a unified graph where you can trace a column from a dashboard back to the raw source table through every dbt transformation. Exposing that graph to agents via MCP is transformative for debugging and impact analysis.
The alternative is asking the agent to reconstruct lineage from code — slow, error-prone, and often impossible for no-code pipelines. OpenMetadata already did the work; MCP just makes it accessible.
JWT Authentication
OpenMetadata supports JWT-based service accounts. Create a bot user in the admin UI, generate a JWT, and load it into the MCP server as OPENMETADATA_TOKEN. The bot should have the ViewAll role — no edit or delete permissions. Rotate the JWT quarterly and keep it in a secrets manager.
- Bot user: not a human account
- ViewAll role: read-only across entities
- JWT in env var: loaded from a secrets manager
- HTTPS only: TLS to the OpenMetadata API
- Rate limited: honor API quotas
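As a minimal sketch of the checklist above, the helper below builds a read-only request that carries the bot's JWT as a bearer token and refuses any base URL that is not HTTPS. The function names and the `OPENMETADATA_URL` variable are hypothetical; only `OPENMETADATA_TOKEN` comes from the text. Rate limiting is left out to keep the example short.

```python
import os
import urllib.request
from urllib.parse import urlsplit


def build_request(base_url: str, path: str, token: str) -> urllib.request.Request:
    """Build a read-only GET request for the OpenMetadata API."""
    if urlsplit(base_url).scheme != "https":
        # Refuse plain HTTP so the JWT never travels unencrypted.
        raise ValueError("OpenMetadata base URL must use https")
    if not token:
        raise ValueError("OPENMETADATA_TOKEN is empty")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


def request_from_env(path: str) -> urllib.request.Request:
    # OPENMETADATA_URL is a hypothetical variable name chosen for this sketch.
    return build_request(
        os.environ["OPENMETADATA_URL"],
        path,
        os.environ.get("OPENMETADATA_TOKEN", ""),
    )
```

Keeping request construction separate from the network call makes the auth rules easy to unit-test without a running OpenMetadata instance.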
Core MCP Tools
A useful OpenMetadata MCP server exposes a handful of tools: searchEntities, getTable, getLineage, getColumnLineage, getGlossaryTerm, and getOwners. Each maps to an OpenMetadata REST endpoint and returns a trimmed response. The column lineage tool is the most distinctive — it returns the full upstream chain for a single column across tools.
| Tool | REST Endpoint | Purpose |
|---|---|---|
| searchEntities | /api/v1/search/query | Keyword search |
| getTable | /api/v1/tables/name/{fqn} | Full table metadata |
| getLineage | /api/v1/lineage/{entity}/name/{fqn} | Entity lineage graph |
| getColumnLineage | /api/v1/lineage/getLineageEdge | Column-level trace |
| getGlossaryTerm | /api/v1/glossaryTerms/{id} | Business definition |
| getOwners | /api/v1/tables/name/{fqn}?fields=owners | Contact info |
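The "trimmed response" idea can be sketched as a small projection step applied to the raw entity before it reaches the agent. The field selection below is an assumption made for illustration, not a list prescribed by OpenMetadata, and `trim_table` and `columnsTruncated` are hypothetical names.

```python
from typing import Any

# Fields kept when trimming a table entity; this selection is an assumption.
TABLE_FIELDS = ("fullyQualifiedName", "description", "tableType", "owners")


def trim_table(raw: dict[str, Any], max_columns: int = 50) -> dict[str, Any]:
    """Reduce a table entity to the fields an agent is likely to need."""
    trimmed = {key: raw[key] for key in TABLE_FIELDS if key in raw}
    columns = raw.get("columns", [])
    trimmed["columns"] = [
        {
            "name": col.get("name"),
            "dataType": col.get("dataType"),
            "description": col.get("description"),
        }
        for col in columns[:max_columns]
    ]
    # Flag truncation so the agent does not assume the column list is complete.
    trimmed["columnsTruncated"] = len(columns) > max_columns
    return trimmed
```

Capping the column list keeps very wide tables from crowding out the rest of the prompt; the flag tells the agent to request more detail if it needs a column that was cut.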
Column-Level Lineage Walks
Column-level lineage is the power feature. When an agent is asked "where does the total_revenue column in the exec dashboard come from?", the MCP server calls getColumnLineage and walks upstream through every dbt model and SQL transformation until it hits the source system. The response is a graph of nodes (columns) and edges (transformations) the agent can summarize for the user.
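The upstream walk can be sketched as a breadth-first search over column mappings. The example assumes each lineage edge carries a `lineageDetails.columnsLineage` list of `fromColumns` and `toColumn` entries; verify that shape against the API version you run. `walk_column_upstream` is a hypothetical helper name.

```python
from collections import deque
from typing import Any


def walk_column_upstream(
    edges: list[dict[str, Any]], column_fqn: str
) -> list[tuple[str, str]]:
    """Return (from_column, to_column) hops upstream of column_fqn, nearest first."""
    # Index every column-level mapping by its target column.
    parents: dict[str, list[str]] = {}
    for edge in edges:
        details = edge.get("lineageDetails") or {}
        for mapping in details.get("columnsLineage") or []:
            parents.setdefault(mapping["toColumn"], []).extend(
                mapping.get("fromColumns", [])
            )

    hops: list[tuple[str, str]] = []
    seen = {column_fqn}
    queue = deque([column_fqn])
    while queue:
        current = queue.popleft()
        for source in parents.get(current, []):
            hops.append((source, current))
            if source not in seen:  # guard against cycles in the graph
                seen.add(source)
                queue.append(source)
    return hops
```

Columns that never appear as a `toColumn` are the sources: the walk stops there, which corresponds to "until it hits the source system" in the description above.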
Filtering Sensitive Entities
OpenMetadata supports tags, and tags often encode PII or sensitivity levels. The MCP server should strip entities tagged PII.Sensitive from search results unless the agent is explicitly authorized. This keeps sensitive context out of the prompt and enforces governance at the MCP layer rather than downstream.
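A minimal sketch of that filter follows, assuming each entity (and each of its columns) carries a `tags` list whose items have a `tagFQN` key. Treating a table as sensitive when any one column is tagged is a design choice of this sketch, stricter than the entity-level rule described above; the function names are hypothetical.

```python
from typing import Any, Iterable

BLOCKED_TAGS = frozenset({"PII.Sensitive"})


def _tag_fqns(item: dict[str, Any]) -> set[str]:
    """Collect the tag names attached to an entity or column."""
    return {tag.get("tagFQN", "") for tag in item.get("tags") or []}


def is_sensitive(entity: dict[str, Any]) -> bool:
    """True when the entity or any of its columns carries a blocked tag."""
    if _tag_fqns(entity) & BLOCKED_TAGS:
        return True
    return any(_tag_fqns(col) & BLOCKED_TAGS for col in entity.get("columns") or [])


def filter_entities(
    entities: Iterable[dict[str, Any]], authorized: bool = False
) -> list[dict[str, Any]]:
    """Drop sensitive entities unless the caller is explicitly authorized."""
    if authorized:
        return list(entities)
    return [entity for entity in entities if not is_sensitive(entity)]
```

Defaulting `authorized` to `False` means a tool that forgets to pass the flag fails closed rather than leaking tagged entities.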
Observability
Log every MCP tool call with the bot user, the tool name, the arguments, and the response size. Join this with OpenMetadata's own audit log to reconstruct agent activity. A surprising amount of insight comes from noticing which entities the agent asks about — it reveals gaps in the catalog documentation.
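One way to capture those four fields is a decorator around each tool function, sketched below with the standard `logging` module. The logger name, the `audited` decorator, and the JSON field names are hypothetical; the sketch also assumes tools are called with keyword arguments only.

```python
import functools
import json
import logging
from typing import Any, Callable

logger = logging.getLogger("mcp.openmetadata")


def audited(bot_user: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Log bot user, tool name, arguments, and response size for each call."""

    def decorator(tool: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(tool)
        def wrapper(**arguments: Any) -> Any:
            result = tool(**arguments)
            # Response size is measured on the JSON-serialized result.
            size = len(json.dumps(result, default=str).encode("utf-8"))
            logger.info(
                json.dumps(
                    {
                        "bot_user": bot_user,
                        "tool": tool.__name__,
                        "arguments": arguments,
                        "response_bytes": size,
                    },
                    default=str,
                )
            )
            return result

        return wrapper

    return decorator
```

Emitting one JSON object per call keeps the log easy to join with another audit source on the bot user and a timestamp added by the logging handler.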
Data Workers on OpenMetadata
Data Workers' OpenMetadata connector handles JWT auth, exposes the column-lineage tool, and enforces tag-based filtering. It can federate with DataHub, Unity Catalog, and Atlan via the unified catalog interface. See AI for data infrastructure for the full agent stack, or read MCP server DataHub metadata for a comparison.
To see an OpenMetadata MCP server walking column lineage live, book a demo. We will show a column-to-source trace across dbt, Airflow, and a warehouse.
OpenMetadata's data quality features are another area where MCP adds value. The platform tracks test suite results, data profiler output, and quality scores at the table and column level. An MCP tool that surfaces these signals lets the agent reason about data quality before citing a table in an answer — a kind of automated sanity check that prevents the agent from confidently citing broken data.
The platform's conversational threads on entities are also underused. OpenMetadata lets users post comments, questions, and announcements on datasets, and those threads hold tribal knowledge that schemas and docs do not capture. Exposing threads via an MCP getDiscussion tool gives the agent access to the ongoing conversation about a dataset: the "this column is wrong on weekends" and "use the v2 table instead" notes that humans leave for each other.
OpenMetadata also supports entity versioning, so you can see how a table's schema and documentation have changed over time. For governance-sensitive use cases, the MCP server can expose a getHistory tool that returns the change history. This lets the agent answer questions such as "when did this column appear?" or "when was this definition updated?" that humans would otherwise have to dig through version control for.
OpenMetadata is the best open-source option for column-level lineage, and MCP is the right way to expose that lineage to agents. Bot user auth, six core tools, and tag-based filtering give you a production-grade metadata interface in an afternoon.