
MCP Server for Databricks Unity Catalog


Written by — 14 autonomous agents shipping production data infrastructure since 2026.

Technically reviewed by the Data Workers engineering team.


An MCP server on Databricks should route every query through Unity Catalog, authenticate with a service principal, and attach to a serverless SQL warehouse with auto-stop. That configuration gives an agent access to lakehouse data with the same governance humans see and the same cost envelope as a BI dashboard.

Databricks is the canonical lakehouse, and Unity Catalog is the governance layer that makes it safe for agents to touch. This guide covers authentication, Unity Catalog grants, warehouse selection, query policies, and the agent patterns that keep lakehouse MCP production-grade.

Unity Catalog Is the Foundation

Without Unity Catalog, an MCP server on Databricks is as dangerous as handing an intern the notebook URL. Unity Catalog provides three-level namespaces (catalog.schema.table), fine-grained grants, lineage, and audit logs. Every agent query should go through Unity Catalog so the same grants that govern humans also govern the agent.
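The minimum grant set can be sketched as a small helper that emits the DDL. The catalog, schema, and principal names here are illustrative, not prescriptive:

```python
# Sketch: minimal Unity Catalog grants for an MCP service principal.
# "mcp-agent-sp", "analytics", and "curated" are placeholder names.

def uc_grants(principal: str, catalog: str, schema: str) -> list[str]:
    """Build the smallest grant set that lets an agent read one curated schema."""
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{principal}`",
        # SELECT at schema level covers every table in the schema.
        f"GRANT SELECT ON SCHEMA {catalog}.{schema} TO `{principal}`",
    ]

for stmt in uc_grants("mcp-agent-sp", "analytics", "curated"):
    print(stmt)
```

Scoping SELECT at the schema level (rather than per table) keeps the grant list auditable; move to per-table grants only if the curated schema mixes sensitivity levels.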

If you have not turned on Unity Catalog yet, do that first. Running an MCP server on legacy Hive metastore is possible but leaves you without row filters, column masks, or centralized lineage — three things you will absolutely want when the security team asks how the agent got access to the customer PII table.

Service Principal Authentication

Create a Databricks service principal (mcp-agent-sp) and grant it the minimum catalog privileges it needs. Do not reuse a personal access token and do not run the agent as a human user. Store the OAuth token in a secrets manager and rotate it on a schedule; the Databricks SDK handles refresh automatically.

  • Service principal — never a human PAT
  • Unity Catalog grants — USE CATALOG, SELECT on curated schemas
  • OAuth token rotation — automated, not manual
  • Workspace binding — limit SP to one workspace
  • IP access list — restrict to agent runtime network
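A minimal sketch of the auth wiring, assuming the Databricks Python SDK's OAuth machine-to-machine flow. The three environment variables below are the SDK's standard credential variables; the secret should come from a secrets manager, never source control:

```python
import os

# Sketch: collect OAuth M2M credentials for the MCP service principal.
# DATABRICKS_HOST / DATABRICKS_CLIENT_ID / DATABRICKS_CLIENT_SECRET are the
# environment variables the Databricks SDK reads by convention.

REQUIRED = ("DATABRICKS_HOST", "DATABRICKS_CLIENT_ID", "DATABRICKS_CLIENT_SECRET")

def sp_client_kwargs(env: dict) -> dict:
    """Collect service-principal credentials, failing fast if any is missing."""
    missing = [k for k in REQUIRED if not env.get(k)]
    if missing:
        raise RuntimeError(f"missing credentials: {missing}")
    return {
        "host": env["DATABRICKS_HOST"],
        "client_id": env["DATABRICKS_CLIENT_ID"],
        "client_secret": env["DATABRICKS_CLIENT_SECRET"],
    }

# Usage (requires the databricks-sdk package):
#   from databricks.sdk import WorkspaceClient
#   w = WorkspaceClient(**sp_client_kwargs(dict(os.environ)))
# The SDK exchanges the client secret for an OAuth token and refreshes it
# automatically, so rotation only needs to replace the stored secret.
```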

Serverless SQL Warehouse

Use a serverless SQL warehouse for MCP. Serverless gives you sub-10-second cold starts and scales to zero when idle, which is ideal for bursty agent traffic. A classic (provisioned) warehouse works too but costs more during idle periods. Set auto-stop to 5 minutes and pick 2X-SMALL as the starting cluster size.

  • Warehouse type — Serverless (fast cold start, scale to zero)
  • Cluster size — 2X-Small (cheapest option; upgrade if needed)
  • Auto-stop — 5 minutes (balances cost vs. latency)
  • Photon — enabled (3-5x faster SQL at the same cost)
  • Query result cache — enabled (free cache hits on repeat queries)
  • Statement timeout — 120 seconds (kills runaway SQL)
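The recommendations above can be captured as a creation payload. Field names mirror the Databricks SQL Warehouses API; treat this as a starting point under those assumptions, not a definitive spec:

```python
# Sketch: the recommended settings as a warehouse-creation payload.

def mcp_warehouse_spec(name: str = "mcp-agent-wh") -> dict:
    """Recommended serverless warehouse settings for bursty agent traffic."""
    return {
        "name": name,
        "warehouse_type": "PRO",           # required for serverless compute
        "enable_serverless_compute": True,
        "cluster_size": "2X-Small",        # cheapest size; upgrade if queries queue
        "auto_stop_mins": 5,               # scale to zero after 5 idle minutes
        "enable_photon": True,             # faster SQL at the same cost
        "max_num_clusters": 1,             # no autoscaling until traffic demands it
    }
    # Note: the 120-second statement timeout is not part of warehouse creation;
    # it is set separately via the SQL configuration parameter STATEMENT_TIMEOUT.
```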

Fine-Grained Access Control

Unity Catalog supports row filters and column masks applied as SQL functions directly on a table (dynamic views remain an option for more complex logic). Apply a row filter to restrict the agent to current-year data, apply a column mask to hash PII fields, and grant the MCP service principal SELECT only on the governed object — never on an unfiltered copy. The agent sees governed data without any code change, and the security team has one place to audit.
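A sketch of that pattern as the DDL it generates. Function, table, and column names (order_date, email) are illustrative:

```python
# Sketch: native Unity Catalog row filter + column mask on one table.
# All object and column names here are placeholders.

def governed_table_ddl(catalog: str, schema: str, table: str) -> list[str]:
    """DDL that restricts rows to the current year and hashes a PII column."""
    fq = f"{catalog}.{schema}.{table}"
    fn = f"{catalog}.{schema}"
    return [
        # Row filter: only current-year rows are visible to readers.
        f"CREATE OR REPLACE FUNCTION {fn}.current_year(d DATE) "
        f"RETURN year(d) = year(current_date())",
        f"ALTER TABLE {fq} SET ROW FILTER {fn}.current_year ON (order_date)",
        # Column mask: expose a hash of the PII field, not the value.
        f"CREATE OR REPLACE FUNCTION {fn}.mask_email(e STRING) RETURN sha2(e, 256)",
        f"ALTER TABLE {fq} ALTER COLUMN email SET MASK {fn}.mask_email",
        # The agent reads only the governed table.
        f"GRANT SELECT ON TABLE {fq} TO `mcp-agent-sp`",
    ]
```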

Lineage and Audit

Unity Catalog captures lineage automatically for every query, so you can answer "which columns did the agent read today?" from the lineage system tables without instrumenting your MCP server. Combine that with the workspace audit log (system.access.audit) and you have a complete record of agent activity for compliance review.
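A sketch of the audit side of that query, assuming the service principal's application ID appears in the audit log's identity field:

```python
# Sketch: pull the last day of agent activity from the system audit table.
# The application ID passed in is a placeholder.

def agent_audit_sql(sp_application_id: str, days: int = 1) -> str:
    """Query system.access.audit for one principal's recent actions."""
    return f"""
    SELECT event_time, action_name, request_params
    FROM system.access.audit
    WHERE user_identity.email = '{sp_application_id}'
      AND event_time >= current_timestamp() - INTERVAL {days} DAYS
    ORDER BY event_time DESC
    """.strip()
```

Running this on a schedule and diffing against expected tool usage is a cheap anomaly detector for agent behavior.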

Data Workers on Databricks

Data Workers' Databricks connector authenticates via service principal, routes every query through Unity Catalog, and picks up lineage automatically. The catalog agent discovers tables, the governance agent enforces grants, and the cost agent watches warehouse spend. See AI for data infrastructure for the full agent stack or compare with MCP server Snowflake production setup.

To see Databricks MCP with Unity Catalog grants enforced on a real lakehouse, book a demo. We will walk through service principal setup, row filters, and lineage capture.

One pattern worth adopting is the separation of discovery and execution. Unity Catalog's information schema (and the Unity Catalog REST API) let the agent discover which tables exist and which columns they contain without scanning any data. The MCP server should expose these as lightweight metadata tools — metadata lookups resolve in milliseconds and, via the REST API, without touching SQL warehouse compute at all. Only when the agent needs actual data does it fall through to an execution tool that spins up the warehouse.
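The information-schema half of that split can be sketched as two query builders (catalog and schema names are placeholders; the REST-API route via the SDK's tables.list avoids the warehouse entirely):

```python
# Sketch: discovery tools backed by Unity Catalog's information schema.
# These read metadata only; the execution tool handles actual data.

def list_tables_sql(catalog: str, schema: str) -> str:
    """Which tables exist in one schema, with their comments."""
    return (
        f"SELECT table_name, table_type, comment "
        f"FROM {catalog}.information_schema.tables "
        f"WHERE table_schema = '{schema}'"
    )

def describe_table_sql(catalog: str, schema: str, table: str) -> str:
    """Which columns one table contains, in declared order."""
    return (
        f"SELECT column_name, data_type, comment "
        f"FROM {catalog}.information_schema.columns "
        f"WHERE table_schema = '{schema}' AND table_name = '{table}' "
        f"ORDER BY ordinal_position"
    )
```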

Databricks also exposes the system.billing.usage table, which tracks warehouse spend at the query level. The MCP server can join its own audit log to that table to attribute cost per agent, per session, and per user. Feeding this into a daily dashboard gives the platform team early warning if an agent starts burning credits faster than expected, and it makes the value of the MCP server legible to the finance team when budgeting season rolls around.
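A sketch of that join. The MCP server's own audit table (mcp.audit.query_log here) and its columns are assumptions, and joining on warehouse ID plus date gives approximate attribution — good enough for a daily dashboard, not an exact invoice:

```python
# Sketch: approximate per-agent DBU attribution by joining the MCP server's
# audit log (assumed table: mcp.audit.query_log) to system.billing.usage.
# Attribution by warehouse + day is coarse when agents share a warehouse.

def cost_per_agent_sql() -> str:
    """Daily DBU totals per agent, most expensive first."""
    return """
    SELECT q.agent_name,
           u.usage_date,
           SUM(u.usage_quantity) AS dbus
    FROM system.billing.usage AS u
    JOIN mcp.audit.query_log AS q
      ON u.usage_metadata.warehouse_id = q.warehouse_id
     AND u.usage_date = DATE(q.started_at)
    WHERE u.billing_origin_product = 'SQL'
    GROUP BY q.agent_name, u.usage_date
    ORDER BY dbus DESC
    """.strip()
```

Giving each agent (or each environment) its own warehouse makes this join exact instead of approximate.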

For teams running the Databricks Asset Bundles workflow, the MCP server can be deployed as a bundle target alongside existing jobs and notebooks. That keeps the agent runtime governed by the same CI/CD pipeline the rest of the data platform uses. Bundles also make it easy to run the MCP server in both dev and prod workspaces with different service principals, so developers can iterate without touching production data.
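A minimal databricks.yml sketch of that setup; bundle name, hosts, and the service principal name are placeholders:

```yaml
# Sketch: one bundle, two targets, separate identities per environment.
bundle:
  name: mcp-server

targets:
  dev:
    workspace:
      host: https://dev-workspace.cloud.databricks.com
    # dev deploys run as the developer's own identity by default

  prod:
    mode: production
    workspace:
      host: https://prod-workspace.cloud.databricks.com
    run_as:
      service_principal_name: mcp-agent-sp-prod
```

`databricks bundle deploy -t dev` and `-t prod` then reuse the same definition against the right workspace and identity.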

Databricks plus MCP plus Unity Catalog is the cleanest lakehouse agent story available. Use a service principal, scope grants through Unity Catalog, and run on a serverless warehouse, and you get a production agent on top of an already-governed platform.

See Data Workers in action

15 autonomous AI agents working across your entire data stack. MCP-native, open-source, deployed in minutes.

Book a Demo
