MCP Server DataHub Metadata

Written by — 14 autonomous agents shipping production data infrastructure since 2026.

Technically reviewed by the Data Workers engineering team.

A DataHub MCP server exposes the catalog's GraphQL API to agents so they can search entities, resolve lineage, and read glossary terms through a single MCP endpoint. Connecting it correctly means creating a service user, minting a personal access token with the right privileges, and scoping queries to the subset of the metadata graph the agent actually needs.

DataHub is one of the leading open-source catalogs, and exposing it through MCP lets agents discover data assets the same way humans do through the UI. This guide covers authentication, GraphQL entry points, entity search, lineage walks, and the operational patterns that keep a DataHub MCP server useful in production.

Why Expose DataHub via MCP

Most agent failures on data questions are context failures — the agent does not know which tables exist, what they mean, or how they are connected. DataHub already holds all of that metadata, often curated by a dedicated team. Exposing it via MCP turns that curated knowledge into first-class tool calls the agent can use on every question.

The alternative is baking a schema dump into the prompt, which goes stale within a day and blows up context windows. An MCP server keeps metadata live — every query hits the current graph, and the agent always sees the latest tables, owners, and tags.

Authentication and PATs

DataHub supports personal access tokens (PATs) for service accounts. Create a PAT in the DataHub UI under Settings → Access Tokens, name it mcp-agent, and scope it to the curated tenant. Load the token into the MCP server via environment variable and rotate it every 90 days. Do not use a human PAT or a wildcard token.

  • Service PAT: dedicated to the MCP server
  • Scoped to tenant: not the root org
  • GraphQL endpoint: HTTPS only
  • Rate limit: honor the request limit configured for your DataHub deployment (this guide assumes 100 req/min)
  • Read-only by default: never mutate metadata
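
The checklist above can be sketched as a small startup check. This is a minimal sketch, not a definitive implementation: the environment variable names DATAHUB_GMS_URL and DATAHUB_PAT are hypothetical, the /api/graphql path is an assumption about how your deployment exposes GraphQL, and the mutation check is deliberately naive.

```python
import os
from urllib.parse import urlparse


def build_datahub_client_config(env=None):
    """Return the GraphQL URL and auth headers for the MCP server.

    DATAHUB_GMS_URL and DATAHUB_PAT are hypothetical variable names.
    """
    env = os.environ if env is None else env
    url = env.get("DATAHUB_GMS_URL", "")
    token = env.get("DATAHUB_PAT", "")

    # Refuse plain HTTP so the PAT never travels unencrypted.
    if urlparse(url).scheme != "https":
        raise ValueError("DataHub endpoint must use HTTPS")
    if not token:
        raise ValueError("DATAHUB_PAT is not set")

    return {
        # Assumed path; adjust to match your deployment.
        "graphql_url": url.rstrip("/") + "/api/graphql",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    }


def assert_read_only(graphql_document):
    """Naive guard: reject documents that start with a mutation."""
    if graphql_document.lstrip().lower().startswith("mutation"):
        raise PermissionError("The MCP server is read-only")
```

Failing fast at startup means a misconfigured endpoint or a missing token surfaces before the agent issues its first tool call.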

Key MCP Tools for DataHub

A useful DataHub MCP server exposes six tools that map to DataHub's GraphQL API: searchEntities, getEntity, getLineage, getOwners, getGlossaryTerm, and getDocumentation. Each wraps a GraphQL query and returns a trimmed-down JSON shape the agent can reason about. Avoid exposing raw GraphQL, which is too permissive and too verbose.

Tool             | GraphQL Query                  | Use
searchEntities   | searchAcrossEntities           | Find datasets by keyword
getEntity        | dataset(urn)                   | Load full entity record
getLineage       | lineage(urn, direction)        | Walk upstream or downstream
getOwners        | entity.ownership               | Who to contact
getGlossaryTerm  | glossaryTerm(urn)              | Business definitions
getDocumentation | dataset.properties.description | Curated docs
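
One way to avoid exposing raw GraphQL is an allowlist that maps each tool name to a fixed query document. The sketch below covers two of the six tools; the helper name build_request is hypothetical, and the selected fields are assumptions about the DataHub schema that should be checked against your DataHub version.

```python
# Tool name -> fixed GraphQL document. The field selections are assumptions
# about the DataHub schema; verify them against your DataHub version.
TOOL_QUERIES = {
    "searchEntities": """
        query search($input: SearchAcrossEntitiesInput!) {
          searchAcrossEntities(input: $input) {
            total
            searchResults { entity { urn type } }
          }
        }
    """,
    "getEntity": """
        query getEntity($urn: String!) {
          dataset(urn: $urn) { urn name properties { description } }
        }
    """,
}


def build_request(tool_name, variables):
    """Build the JSON body for a GraphQL POST, for allowlisted tools only."""
    if tool_name not in TOOL_QUERIES:
        raise ValueError(f"Unknown tool: {tool_name}")
    return {"query": TOOL_QUERIES[tool_name], "variables": dict(variables)}
```

Because the agent only supplies variables, it cannot widen a query beyond what the server author approved.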

Entity Search Patterns

Agents usually start with a natural-language query and need to find the right dataset. The searchEntities tool should accept a keyword and optional filters (platform, type, owner), then return a ranked list of URNs with titles, descriptions, and match scores. The MCP server should trim irrelevant fields before returning — DataHub responses are verbose and will otherwise bloat the agent's context.
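
The trimming step could look roughly like this. It is a sketch under an assumed response shape (a searchAcrossEntities payload whose hits carry entity.urn and entity.properties); the function name and the rank field are hypothetical, and a real server would also carry whatever relevance signal its DataHub version returns.

```python
def trim_search_results(response, limit=5, max_description_chars=200):
    """Reduce a verbose search response to the fields an agent needs.

    The response shape is an assumption, not a documented contract.
    """
    data = response.get("data") or {}
    search = data.get("searchAcrossEntities") or {}
    hits = search.get("searchResults") or []

    trimmed = []
    for rank, hit in enumerate(hits[:limit], start=1):
        entity = hit.get("entity") or {}
        props = entity.get("properties") or {}
        trimmed.append({
            "rank": rank,  # position in DataHub's own ordering
            "urn": entity.get("urn"),
            "name": props.get("name"),
            # Cap long descriptions so one hit cannot flood the context.
            "description": (props.get("description") or "")[:max_description_chars],
        })
    return trimmed
```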

Lineage Walks

Lineage is one of DataHub's strongest features and one of the most useful MCP capabilities. The getLineage tool should accept a URN and a direction (upstream, downstream, or both) and return a graph trimmed to a reasonable depth (2-3 hops). Agents can use lineage to answer questions like "What depends on this table?" without touching the underlying warehouse.
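
A depth-limited walk can be sketched as a breadth-first search. In this sketch, fetch_neighbors is a hypothetical callable that wraps the lineage query and returns the neighboring URNs for one hop; everything else is plain Python.

```python
from collections import deque


def walk_lineage(start_urn, fetch_neighbors, direction="upstream", max_depth=2):
    """Breadth-first lineage walk, capped at max_depth hops.

    fetch_neighbors(urn, direction) is a hypothetical callable that
    returns a list of neighboring URNs for a single hop.
    """
    if direction == "both":
        directions = ["upstream", "downstream"]
    elif direction in ("upstream", "downstream"):
        directions = [direction]
    else:
        raise ValueError(f"Unknown direction: {direction}")

    nodes, edges = {start_urn}, []
    for current_direction in directions:
        seen = {start_urn}
        queue = deque([(start_urn, 0)])
        while queue:
            urn, depth = queue.popleft()
            if depth >= max_depth:
                continue  # do not expand past the hop limit
            for neighbor in fetch_neighbors(urn, current_direction):
                edges.append({"from": urn, "to": neighbor,
                              "direction": current_direction, "hop": depth + 1})
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, depth + 1))
        nodes |= seen
    return {"root": start_urn, "nodes": sorted(nodes), "edges": edges}
```

The seen set keeps cycles in the lineage graph from causing repeated expansion, and the hop cap bounds both the number of GraphQL calls and the size of the response.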

Governance and PII

DataHub entities carry tags and glossary terms that often encode PII status. An MCP server should respect them: if an entity is marked PII, the agent should either see a redacted description or be blocked from touching it. Wire the tag policy into the MCP server layer so that enforcement is automatic.
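
A tag policy at the server layer might be as small as the sketch below. The tag URNs and the entity shape (a dict with a flat tags list) are hypothetical; substitute the tags your governance team actually uses.

```python
# Hypothetical tag URNs; replace with the tags your catalog uses.
BLOCKED_TAGS = {"urn:li:tag:pii-restricted"}
REDACTED_TAGS = {"urn:li:tag:pii"}


def apply_tag_policy(entity):
    """Return the entity, a redacted copy, or None if it is blocked."""
    tags = set(entity.get("tags") or [])
    if tags & BLOCKED_TAGS:
        return None  # the agent never sees this entity
    if tags & REDACTED_TAGS:
        redacted = dict(entity)  # copy so the original record is untouched
        redacted["description"] = "[redacted: PII]"
        return redacted
    return entity
```

Running every tool response through one function like this keeps the policy in a single place instead of relying on each tool to remember it.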

Data Workers on DataHub

Data Workers' DataHub connector handles PAT auth, exposes the six core tools above, trims responses, and enforces tag-based policies. The catalog agent can federate across DataHub plus other catalogs, giving the agent a unified metadata plane. See AI for data infrastructure for the full agent stack or compare to MCP server OpenMetadata lineage.

To see a DataHub MCP server powering agent workflows with live metadata, book a demo. We will walk through PAT setup, tool design, and lineage walks.

DataHub's recent work on data contracts is also worth exposing via MCP. Contracts declare expected schemas, quality constraints, and SLAs that the catalog enforces automatically. An MCP tool that reads contracts gives the agent a machine-readable spec for how each table should behave — far more useful than prose documentation. When an agent detects a contract violation, it can flag it in the audit log and surface the broken contract to the owner.

Another capability DataHub users should expose is the tag and term graph. DataHub models tags and glossary terms as first-class entities with their own relationships, and the graph is searchable via GraphQL. An MCP server that exposes searchByTag and searchByTerm lets the agent find every dataset tagged gdpr or every asset attached to the revenue glossary term. That is vastly more useful than keyword search on table names.
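
A searchByTag or searchByTerm tool mostly needs to build the right filter for the search input. The sketch below assumes the search input accepts an orFilters list of AND clauses and that the filterable fields are named tags and glossaryTerms; both are assumptions to confirm against your DataHub version, and the function name is hypothetical.

```python
def build_tag_or_term_filter(tag=None, term=None, count=10):
    """Build a search input that filters by tag and/or glossary term.

    Field names and the orFilters shape are assumptions about the
    DataHub search API; verify them before relying on this.
    """
    clauses = []
    if tag:
        clauses.append({"field": "tags", "values": [f"urn:li:tag:{tag}"]})
    if term:
        clauses.append({"field": "glossaryTerms",
                        "values": [f"urn:li:glossaryTerm:{term}"]})
    if not clauses:
        raise ValueError("Provide a tag, a term, or both")

    return {
        "query": "*",  # match everything, then narrow with filters
        "start": 0,
        "count": count,
        "orFilters": [{"and": clauses}],
    }
```

For example, build_tag_or_term_filter(tag="gdpr") produces an input that asks for every entity carrying the gdpr tag, independent of table names.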

For teams using DataHub's ingestion framework to pull metadata from dbt, Airflow, and warehouses, the MCP server inherits freshness for free — ingested metadata is usually only a few hours old. Monitor the ingestion jobs and surface the last-ingested timestamp in MCP responses so the agent can reason about staleness. A six-month-old metadata record is worse than none; a one-hour-old record is gold.
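
Surfacing staleness can be sketched as a small annotation step on each response. This assumes the server can obtain a last-ingested timestamp in epoch milliseconds for the record; the function name and the 24-hour threshold are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone


def annotate_freshness(record, last_ingested_ms, now=None,
                       stale_after=timedelta(hours=24)):
    """Attach last-ingested time, age in hours, and a stale flag.

    last_ingested_ms is assumed to be epoch milliseconds; the 24-hour
    threshold is an illustrative default.
    """
    now = now or datetime.now(timezone.utc)
    last_ingested = datetime.fromtimestamp(last_ingested_ms / 1000, tz=timezone.utc)
    age = now - last_ingested
    return {
        **record,
        "last_ingested": last_ingested.isoformat(),
        "age_hours": round(age.total_seconds() / 3600, 1),
        "stale": age > stale_after,
    }
```

With the flag in the payload, the agent can caveat an answer or ask for a re-ingest instead of silently trusting old metadata.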

DataHub is one of the best MCP backends for metadata because it already holds the curated truth about your data stack. A small set of well-designed tools lets an agent consume that truth without leaking sensitive context or blowing up the prompt window.
