MCP Server for Databricks with Unity Catalog
Written by The Data Workers Team — 14 autonomous agents shipping production data infrastructure since 2026.
Technically reviewed by the Data Workers engineering team.
An MCP server on Databricks should route every query through Unity Catalog, authenticate with a service principal, and attach to a serverless SQL warehouse with auto-stop. That configuration gives an agent access to lakehouse data with the same governance humans see and the same cost envelope as a BI dashboard.
Databricks is the canonical lakehouse, and Unity Catalog is the governance layer that makes it safe for agents to touch. This guide covers authentication, Unity Catalog grants, warehouse selection, query policies, and the agent patterns that keep lakehouse MCP production-grade.
Unity Catalog Is the Foundation
Without Unity Catalog, an MCP server on Databricks is as dangerous as handing an intern the notebook URL. Unity Catalog provides three-level namespaces (catalog.schema.table), fine-grained grants, lineage, and audit logs. Every agent query should go through Unity Catalog so the same grants that govern humans also govern the agent.
If you have not turned on Unity Catalog yet, do that first. Running an MCP server on legacy Hive metastore is possible but leaves you without row filters, column masks, or centralized lineage — three things you will absolutely want when the security team asks how the agent got access to the customer PII table.
Service Principal Authentication
Create a Databricks service principal (mcp-agent-sp) and grant it the minimum catalog privileges it needs. Do not reuse a personal access token and do not run the agent as a human user. Store the OAuth client secret in a secrets manager and rotate it on a schedule; the Databricks SDK handles token refresh automatically. A connection sketch follows the checklist below.
- Service principal — never a human PAT
- Unity Catalog grants — USE CATALOG, SELECT on curated schemas
- OAuth token rotation — automated, not manual
- Workspace binding — limit the SP to one workspace
- IP access list — restrict to the agent runtime network
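A minimal connection sketch, assuming OAuth machine-to-machine credentials; the environment variable names are the Databricks SDK's standard ones, and the values should come from your secrets manager:

```python
# Authenticate as the service principal via OAuth M2M. The env var names
# are the Databricks SDK defaults; values come from your secrets manager.
import os

from databricks.sdk import WorkspaceClient

w = WorkspaceClient(
    host=os.environ["DATABRICKS_HOST"],                    # workspace URL
    client_id=os.environ["DATABRICKS_CLIENT_ID"],          # SP application ID
    client_secret=os.environ["DATABRICKS_CLIENT_SECRET"],  # rotated OAuth secret
)

# Sanity check: the SDK mints and refreshes tokens automatically.
print(w.current_user.me().user_name)  # should print the service principal
```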
Serverless SQL Warehouse
Use a serverless SQL warehouse for MCP. Serverless gives you sub-10-second cold starts and scales to zero when idle, which is ideal for bursty agent traffic. A classic (provisioned) warehouse works too but costs more during idle periods. Set auto-stop to 5 minutes and pick 2X-SMALL as the starting cluster size. A creation sketch using the SDK follows the table below.
| Setting | Recommended | Why |
|---|---|---|
| Warehouse type | Serverless | Fast cold start, scale to zero |
| Cluster size | 2X-SMALL | Cheapest option, upgrade if needed |
| Auto-stop | 5 minutes | Balances cost vs latency |
| Photon | Enabled | 3-5x faster SQL at same cost |
| Query result cache | Enabled | Free cache hits on repeat queries |
| Statement timeout | 120 seconds | Kills runaway SQL |
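A minimal sketch of creating a warehouse with these settings via the Databricks SDK; the warehouse name is illustrative, and the statement timeout is set separately as a SQL configuration parameter rather than at creation time:

```python
# Create a serverless SQL warehouse matching the settings above.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.sql import CreateWarehouseRequestWarehouseType

w = WorkspaceClient()  # credentials resolved from the environment

warehouse = w.warehouses.create(
    name="mcp-agent-wh",                                    # illustrative name
    cluster_size="2X-Small",
    auto_stop_mins=5,
    enable_photon=True,
    enable_serverless_compute=True,
    warehouse_type=CreateWarehouseRequestWarehouseType.PRO,  # serverless requires PRO
    max_num_clusters=1,
).result()  # create() is a long-running operation; wait for it to finish

print(warehouse.id)
```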
Fine-Grained Access Control
Unity Catalog supports row filters and column masks, applied either directly to tables or through dynamic views. Apply a row filter to restrict the agent to current-year data, apply a column mask to hash PII fields, and grant the MCP service principal SELECT only on the governed object — never on an unfiltered copy of the data. The agent sees governed data without any code change, and the security team sees one place to audit. A sketch follows.
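A minimal sketch of the native row-filter and column-mask path, run through the SDK's statement execution API. The catalog, schema, table, and function names are illustrative, and note that Unity Catalog grants to a service principal use its application ID (shown here as the SP name for readability):

```python
# Apply a row filter and column mask, then grant the SP the minimum it needs.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
WAREHOUSE_ID = "<your-warehouse-id>"  # assumption: an existing SQL warehouse

statements = [
    # Row filter: non-admins only see current-year rows.
    """CREATE OR REPLACE FUNCTION main.curated.current_year_filter(order_date DATE)
       RETURN is_account_group_member('admins') OR year(order_date) = year(current_date())""",
    "ALTER TABLE main.curated.orders SET ROW FILTER main.curated.current_year_filter ON (order_date)",
    # Column mask: hash email for everyone outside the admins group.
    """CREATE OR REPLACE FUNCTION main.curated.mask_email(email STRING)
       RETURN CASE WHEN is_account_group_member('admins') THEN email ELSE sha2(email, 256) END""",
    "ALTER TABLE main.curated.orders ALTER COLUMN email SET MASK main.curated.mask_email",
    # Minimum grants for the MCP service principal (use its application ID).
    "GRANT USE CATALOG ON CATALOG main TO `mcp-agent-sp`",
    "GRANT USE SCHEMA ON SCHEMA main.curated TO `mcp-agent-sp`",
    "GRANT SELECT ON TABLE main.curated.orders TO `mcp-agent-sp`",
]

for stmt in statements:
    w.statement_execution.execute_statement(statement=stmt, warehouse_id=WAREHOUSE_ID)
```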
Lineage and Audit
Unity Catalog captures lineage automatically for every query. That means you can answer the question "which columns did the agent read today?" from the lineage system tables without instrumenting your MCP server. Combine that with the workspace audit log (system.access.audit) and you have a complete record of agent activity, usable for compliance review. A sketch follows.
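A minimal sketch of that lineage question, assuming the system.access.column_lineage system table is enabled in your metastore; depending on setup, the created_by value for a service principal may be its application ID rather than its display name:

```python
# Which columns did the agent read today? Answered from the lineage
# system table, no MCP-server instrumentation required.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
WAREHOUSE_ID = "<your-warehouse-id>"  # assumption: an existing SQL warehouse

LINEAGE_SQL = """
SELECT source_table_full_name, source_column_name, event_time
FROM system.access.column_lineage
WHERE created_by = 'mcp-agent-sp'      -- may be the SP application ID
  AND event_date = current_date()
ORDER BY event_time DESC
"""

resp = w.statement_execution.execute_statement(
    statement=LINEAGE_SQL, warehouse_id=WAREHOUSE_ID, wait_timeout="30s"
)
for row in resp.result.data_array or []:
    print(row)
```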
Data Workers on Databricks
Data Workers' Databricks connector authenticates via service principal, routes every query through Unity Catalog, and picks up lineage automatically. The catalog agent discovers tables, the governance agent enforces grants, and the cost agent watches warehouse spend. See AI for data infrastructure for the full agent stack, or compare with the MCP server Snowflake production setup.
To see Databricks MCP with Unity Catalog grants enforced on a real lakehouse, book a demo. We will walk through service principal setup, row filters, and lineage capture.
One pattern worth adopting is the separation of discovery and execution. The agent can discover which tables exist and which columns they contain either through Unity Catalog's information_schema tables or through the Unity Catalog REST API; the API path returns metadata without spinning up a SQL warehouse at all. The MCP server should expose this as lightweight metadata tools that consume no warehouse compute. Only when the agent needs actual data does it fall through to an execution tool that wakes the warehouse. A sketch of the split follows.
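A minimal sketch of the split, assuming the official mcp Python package (FastMCP) alongside the Databricks SDK; the tool names and warehouse ID are illustrative:

```python
# Discovery via the Unity Catalog API (no warehouse), execution via SQL.
from databricks.sdk import WorkspaceClient
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("databricks-uc")
w = WorkspaceClient()
WAREHOUSE_ID = "<your-warehouse-id>"  # assumption: an existing SQL warehouse

@mcp.tool()
def list_tables(catalog: str, schema: str) -> list[str]:
    """Discovery: list tables via the Unity Catalog API -- no warehouse compute."""
    return [t.full_name for t in w.tables.list(catalog_name=catalog, schema_name=schema)]

@mcp.tool()
def run_query(sql: str) -> list[list[str]]:
    """Execution: run governed SQL on the serverless warehouse."""
    resp = w.statement_execution.execute_statement(
        statement=sql, warehouse_id=WAREHOUSE_ID, wait_timeout="30s"
    )
    return resp.result.data_array or []

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```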
Databricks also exposes the system.billing.usage table, which tracks warehouse DBU consumption hourly per warehouse. Joining the MCP server's own audit log — or the system.query.history table — to that usage data lets you approximate cost per agent, per session, and per user. Feeding this into a daily dashboard gives the platform team early warning if an agent starts burning credits faster than expected, and it makes the value of the MCP server legible to the finance team when budgeting season rolls around. A query sketch follows.
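A sketch of that attribution query, assuming the system.query.history table exposes the warehouse ID under its compute struct — verify the column paths against your workspace's system table schemas before relying on the numbers:

```python
# Approximate per-principal cost attribution: hourly warehouse DBUs from
# billing, joined to statements that ran in the overlapping hour. Billing
# granularity is hourly, so treat the result as an estimate.
COST_ATTRIBUTION_SQL = """
WITH usage AS (
  SELECT usage_metadata.warehouse_id AS warehouse_id,
         usage_start_time,
         usage_end_time,
         usage_quantity AS dbus
  FROM system.billing.usage
  WHERE usage_metadata.warehouse_id IS NOT NULL
    AND usage_date = current_date()
)
SELECT q.executed_by,
       count(*)                        AS statements,
       sum(q.total_duration_ms) / 1000 AS total_query_seconds,
       sum(u.dbus)                     AS dbus_in_overlapping_hours
FROM system.query.history q            -- column paths: verify in your workspace
JOIN usage u
  ON q.compute.warehouse_id = u.warehouse_id
 AND q.start_time BETWEEN u.usage_start_time AND u.usage_end_time
WHERE q.start_time >= current_date()
GROUP BY q.executed_by
ORDER BY dbus_in_overlapping_hours DESC
"""
```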
For teams running the Databricks Asset Bundles workflow, the MCP server can be deployed as a bundle target alongside existing jobs and notebooks. That keeps the agent runtime governed by the same CI/CD pipeline the rest of the data platform uses. Bundles also make it easy to run the MCP server in both dev and prod workspaces with different service principals, so developers can iterate without touching production data.
Databricks plus MCP plus Unity Catalog is the cleanest lakehouse agent story available. Use a service principal, scope grants through Unity Catalog, and run on a serverless warehouse, and you get a production agent on top of an already-governed platform.
See Data Workers in action
15 autonomous AI agents working across your entire data stack. MCP-native, open-source, deployed in minutes.
Book a Demo
Related Resources
- MCP Server Unity Catalog Metadata
- MCP Server for Databricks: AI Agents Meet the Lakehouse — Connect AI agents to Databricks via MCP. Access Unity Catalog metadata, SQL warehouses, Delta Lake time travel, and job management from a…
- Why AI Agents Need MCP Servers for Data Engineering — MCP servers give AI agents structured access to your data tools — Snowflake, BigQuery, dbt, Airflow, and more. Here is why MCP is the int…
- MCP Server Analytics: Understanding How Your AI Tools Are Actually Used — Your team uses dozens of MCP tools every day. MCP analytics tracks adoption, measures ROI, identifies unused tools, and provides the usag…
- How to Build an MCP Server for Your Data Warehouse (Tutorial) — MCP servers give AI agents structured access to your data warehouse. This tutorial walks through building one from scratch — TypeScript,…
- MCP Server Security: Authentication, Authorization, and Audit Trails — MCP servers expose powerful capabilities to AI agents. Securing them requires OAuth 2.1 authentication, scoped authorization, least-privi…
- MCP Server for Snowflake: Connect AI Agents to Your Data Warehouse — Snowflake's MCP server exposes Cortex Analyst, Cortex Search, and schema metadata to AI agents. Here's how to set it up and how Data Work…
- MCP Server for BigQuery: Give AI Agents Access to Your Analytics — BigQuery's MCP server gives AI agents access to schemas, query execution, and cost estimation. Here's how to connect it and use Data Work…
- MCP Server Tutorial: Build a Data Warehouse Integration in 30 Minutes (Python) — Build an MCP server for your data warehouse in 30 minutes with Python. Step-by-step tutorial covering schema exposure, query execution, a…
- MCP Server for Databases: Connect AI Agents to Postgres, BigQuery, and Snowflake — Connect AI agents to Postgres, BigQuery, and Snowflake via MCP servers. Database-specific patterns, schema exposure, and query execution.
- Remote MCP Servers: Deploy AI Tool Integrations to Production — Remote MCP servers move AI tool integrations from local development to production — with OAuth authentication, mTLS security, Kubernetes…
- MCP Server for Postgres: Connect AI Agents to Your Relational Database — Connect AI agents to PostgreSQL via MCP. Covers core query tools, advanced features (pgvector, TimescaleDB, PostGIS), and security best p…
Explore Topic Clusters
- Data Governance: The Complete Guide — Policies, access controls, PII, and compliance at scale.
- Data Catalog: The Complete Guide — Discovery, metadata, lineage, and the modern catalog stack.
- Data Lineage: The Complete Guide — Column-level lineage, impact analysis, and observability.
- Data Quality: The Complete Guide — Tests, SLAs, anomaly detection, and data reliability engineering.
- AI Data Engineering: The Complete Guide — LLMs, agents, and autonomous workflows across the data stack.
- MCP for Data: The Complete Guide — Model Context Protocol servers, tools, and agent integration.
- Data Mesh & Data Fabric: The Complete Guide — Federated ownership, domain-oriented architecture, and interop.
- Open-Source Data Stack: The Complete Guide — dbt, Airflow, Iceberg, DuckDB, and the modern OSS toolkit.
- AI for Data Infra — The complete category for AI agents built specifically for data engineering, data governance, and data infrastructure work.