
MCP Server for BigQuery: Give AI Agents Access to Your Analytics

Connect AI agents to BigQuery for cost-aware querying and schema context

An MCP server for BigQuery exposes BigQuery operations — running SQL, inspecting schemas, estimating bytes scanned, managing slots — as Model Context Protocol tools that AI agents can call. It lets Claude, Cursor, and ChatGPT query Google Cloud analytics safely with cost guardrails, audit logging, and IAM-scoped access.

An MCP server for BigQuery gives data teams on Google Cloud a new operational model: AI agents that can query your analytics warehouse, browse schemas, estimate costs, and execute governed SQL, all through the Model Context Protocol. BigQuery's serverless architecture and IAM-based security model make it well suited to MCP integration, but cost governance is critical. A single unguarded agent query can scan petabytes and generate thousands of dollars in charges. This guide covers how to deploy the BigQuery MCP server, connect it to Data Workers and other MCP clients, and implement cost-aware querying that prevents bill shock.

Google Cloud has embraced MCP aggressively. By April 2026, BigQuery, Vertex AI, Cloud SQL, and AlloyDB all have official MCP servers. The BigQuery server is the most mature and most widely deployed, reflecting BigQuery's position as the dominant analytics warehouse on GCP.

What the BigQuery MCP Server Provides

The BigQuery MCP server exposes a comprehensive set of tools that cover the full lifecycle of analytical querying:

| Tool | Description | Risk Level |
| --- | --- | --- |
| list_datasets | Enumerate all datasets in a project | Low |
| list_tables | List tables and views in a dataset | Low |
| get_table_schema | Return column names, types, and descriptions | Low |
| dry_run_query | Estimate bytes scanned without executing | Low |
| run_query | Execute SQL and return results | Medium |
| get_job_info | Check status and metadata of a query job | Low |
| list_models | Enumerate BigQuery ML models | Low |
| predict | Run a BigQuery ML prediction | Medium |

The dry_run_query tool is the most important for cost governance. It returns the estimated bytes that a query will scan without actually executing it. Agents should always dry-run a query before executing it, and the MCP server can enforce this pattern by rejecting run_query calls that were not preceded by a dry_run_query for the same SQL.
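One way a server could enforce the dry-run-first rule is to remember a fingerprint of every statement that has been dry-run and refuse to execute anything else. The sketch below is a minimal illustration under that assumption, not the server's actual implementation; the `DryRunGate` class and its method names are hypothetical.

```python
import hashlib


class DryRunGate:
    """Remembers which SQL statements have been dry-run in this session."""

    def __init__(self) -> None:
        # Maps a fingerprint of the SQL text to its estimated bytes scanned.
        self._estimates: dict[str, int] = {}

    @staticmethod
    def _fingerprint(sql: str) -> str:
        # Collapse whitespace so trivial reformatting does not defeat the check.
        normalized = " ".join(sql.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def record_dry_run(self, sql: str, estimated_bytes: int) -> None:
        """Call this from the dry_run_query handler."""
        self._estimates[self._fingerprint(sql)] = estimated_bytes

    def check_run(self, sql: str) -> int:
        """Call this from the run_query handler before executing anything.

        Returns the recorded estimate, or raises if no dry run preceded this SQL.
        """
        key = self._fingerprint(sql)
        if key not in self._estimates:
            raise PermissionError(
                "run_query rejected: call dry_run_query for this SQL first"
            )
        return self._estimates[key]
```

Keying on the statement text means an agent that edits its SQL after the dry run has to estimate again, which is the behavior the rule is meant to guarantee.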

Setting Up the BigQuery MCP Server

The BigQuery MCP server is available as a Python package (pip install bigquery-mcp-server) and as a Docker image. It authenticates using Google Cloud Application Default Credentials (ADC) or a service account key file.

For local development, configure the server in your MCP client. For Claude Desktop:

    {
      "mcpServers": {
        "bigquery": {
          "command": "python",
          "args": ["-m", "bigquery_mcp_server"],
          "env": {
            "GOOGLE_CLOUD_PROJECT": "your-project-id",
            "BIGQUERY_LOCATION": "US"
          }
        }
      }
    }

Ensure that gcloud auth application-default login has been run or that GOOGLE_APPLICATION_CREDENTIALS points to a service account key file.

For production, deploy the MCP server as a Cloud Run service with a dedicated service account. Assign the roles/bigquery.dataViewer and roles/bigquery.jobUser roles — dataViewer grants read access to table data and metadata, while jobUser allows query execution. Do not assign roles/bigquery.dataEditor or roles/bigquery.admin unless the agent specifically needs write access.

Cost-Aware Querying: Preventing BigQuery Bill Shock

BigQuery's on-demand pricing charges $6.25 per TB scanned (as of April 2026). A SELECT * against a 10 TB table therefore costs $62.50 for a single query. Without guardrails, an agent can run up thousands of dollars in charges during a single debugging session.
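The arithmetic behind these figures is easy to reproduce when turning a dry-run estimate into a dollar amount. The sketch below assumes the $6.25 rate quoted above and decimal terabytes (10^12 bytes); Google's price list is denominated in TiB, so treat the result as an approximation. The function name is illustrative.

```python
from decimal import Decimal

PRICE_PER_TB_USD = Decimal("6.25")  # on-demand rate quoted in this guide
BYTES_PER_TB = Decimal(10**12)      # assumption: decimal terabytes


def estimate_cost_usd(bytes_scanned: int) -> Decimal:
    """Convert a bytes-scanned estimate into an approximate on-demand cost."""
    return Decimal(bytes_scanned) * PRICE_PER_TB_USD / BYTES_PER_TB


full_scan = estimate_cost_usd(10 * 10**12)  # SELECT * on a 10 TB table
small_scan = estimate_cost_usd(10 * 10**9)  # a 10 GB query
print(f"10 TB scan: ${full_scan:.2f}")
print(f"10 GB scan: ${small_scan:.4f}")
```

Decimal is used instead of float so that small amounts do not pick up rounding noise.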

Implement these cost controls at the MCP server level:

  • `maximumBytesBilled` on every query. Set this parameter on every BigQuery job created by the MCP server. A sensible default is 10 GB ($0.0625). If a query would scan more than this limit, BigQuery rejects it with an error rather than executing it. The agent sees the error and can refine its query.
  • Mandatory dry runs. Configure the MCP server to require a dry_run_query before every run_query. If the estimated bytes exceed a configurable threshold, the server returns the estimate and asks the agent to optimize the query before proceeding.
  • Per-session cost tracking. Track cumulative bytes scanned per MCP session and enforce a session-level budget. When the budget is reached, the server refuses further queries and returns a message explaining the limit.
  • Partition and clustering hints. When the MCP server sees a query that does not filter on a partitioned column, it can inject a warning: 'This table is partitioned by date. Adding a date filter will reduce cost by approximately 90%.' This guidance helps agents write cost-efficient queries.
  • Reserved slots for agent workloads. For teams with heavy agent usage, BigQuery Editions with reserved slots provide predictable pricing. Assign agent workloads to a dedicated reservation to isolate costs from human analyst queries.
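The per-query threshold and per-session budget from the list above can be combined in one small guard that the server consults between the dry run and the real execution. This is a sketch under stated assumptions: the `SessionBudget` class, the `BudgetExceeded` exception, and the 100 GB session limit are all hypothetical. With the google-cloud-bigquery Python client, the per-query cap would additionally be passed to BigQuery itself through `QueryJobConfig(maximum_bytes_billed=...)`.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a query would break the per-query or per-session limit."""


@dataclass
class SessionBudget:
    per_query_limit_bytes: int = 10 * 10**9  # 10 GB default suggested above
    session_limit_bytes: int = 100 * 10**9   # hypothetical session budget
    used_bytes: int = 0

    def authorize(self, estimated_bytes: int) -> None:
        """Check a dry-run estimate against both limits before execution."""
        if estimated_bytes > self.per_query_limit_bytes:
            raise BudgetExceeded(
                f"Estimated scan of {estimated_bytes} bytes exceeds the per-query "
                f"limit of {self.per_query_limit_bytes}; narrow the query and retry."
            )
        remaining = self.session_limit_bytes - self.used_bytes
        if estimated_bytes > remaining:
            raise BudgetExceeded(
                f"Only {remaining} bytes remain in the session budget."
            )

    def record(self, billed_bytes: int) -> None:
        """Add the bytes actually billed once the job finishes."""
        self.used_bytes += billed_bytes
```

The exception messages are written for the agent to read: they state the limit and suggest the next step, so the agent can refine the query instead of retrying blindly.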

IAM Configuration for BigQuery MCP Access

BigQuery's IAM model provides fine-grained access control that maps well to MCP security requirements. Configure access at the dataset level, not the project level, to implement least-privilege access:

  • Create a dedicated service account for the MCP server: mcp-bigquery-agent@your-project.iam.gserviceaccount.com.
  • Grant dataset-level permissions. For each dataset the agent should access, grant roles/bigquery.dataViewer on that specific dataset. This is more secure than project-level grants.
  • Use authorized views for sensitive data. Instead of granting direct table access, create authorized views that filter or mask sensitive columns, and grant the MCP service account access only to those views.
  • Confirm BigQuery audit logs are retained. BigQuery writes Data Access audit logs by default, so the main task is to route them to a sink with the retention period your auditors expect. These logs record every query executed by the MCP server's service account, which supports the audit-trail evidence that SOC 2 and GDPR reviews commonly ask for.
  • Implement VPC Service Controls. For sensitive environments, wrap BigQuery in a VPC Service Perimeter that limits which services and IP ranges can execute queries. The MCP server's Cloud Run service must be inside the perimeter.
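Dataset-level grants from the steps above can be scripted so that the list of datasets an agent may read lives in one reviewable place. The sketch below generates BigQuery `GRANT ... ON SCHEMA` statements for the service account named earlier; the helper function and the dataset names in the usage example are hypothetical, and the statements should be reviewed before being run by an administrator.

```python
def dataset_grant_statements(
    project: str, datasets: list[str], service_account: str
) -> list[str]:
    """Build one dataViewer grant per dataset for the MCP service account."""
    member = f"serviceAccount:{service_account}"
    return [
        f"GRANT `roles/bigquery.dataViewer` "
        f"ON SCHEMA `{project}.{dataset}` "
        f'TO "{member}";'
        for dataset in datasets
    ]


statements = dataset_grant_statements(
    project="your-project",
    datasets=["sales", "marketing"],  # hypothetical dataset names
    service_account="mcp-bigquery-agent@your-project.iam.gserviceaccount.com",
)
for statement in statements:
    print(statement)
```

Any dataset missing from the list stays invisible to the agent, which is the least-privilege outcome the section describes.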

How Data Workers Connects to BigQuery via MCP

Data Workers connects to BigQuery as an MCP client, using the BigQuery MCP server alongside its own 15 specialized agents. The integration goes beyond basic querying:

The Schema Agent maps the entire BigQuery project — datasets, tables, views, routines, and ML models — into a unified schema graph. The Query Agent uses dry runs before every query execution and enforces the maximumBytesBilled parameter automatically. The Cost Agent analyzes BigQuery's INFORMATION_SCHEMA.JOBS table to identify expensive queries, recommend partition pruning opportunities, and flag tables that should be clustered.

The Governance Agent inspects IAM policies, authorized views, column-level security configurations, and data masking rules to maintain compliance. The Quality Agent runs data freshness and completeness checks across datasets, comparing actual refresh times against SLA targets defined in your data contracts.

This multi-agent approach means that when the Quality Agent detects stale data in a BigQuery table, the Pipeline Agent checks the upstream ETL status, the Incident Agent notifies the table owner, and the Cost Agent verifies that the remediation query will not exceed budget — all coordinated through MCP without human intervention.

BigQuery-Specific Use Cases for MCP Agents

  • Slot utilization optimization. Agents analyze INFORMATION_SCHEMA.JOBS_TIMELINE to identify periods of slot contention and recommend reservation adjustments or query scheduling changes.
  • Materialized view recommendations. By analyzing query patterns in INFORMATION_SCHEMA.JOBS, agents identify frequently executed aggregation queries and recommend materialized views that could reduce cost and latency by 10-100x.
  • Cross-region data transfer monitoring. Agents track data movement between BigQuery regions and flag expensive cross-region joins that should be restructured.
  • BigQuery ML model monitoring. Agents track model performance metrics over time, detect drift, and trigger retraining pipelines when accuracy degrades.
  • Storage optimization. Agents identify tables with long-tail partitions that could benefit from partition expiration policies, reducing storage costs without affecting query availability.
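Several of these use cases start from the same place: a ranked view of recent jobs in INFORMATION_SCHEMA.JOBS. The sketch below builds such a query as a string that an agent could pass to run_query. The function name and defaults are illustrative, and reading this view typically requires job-listing permissions beyond roles/bigquery.dataViewer and roles/bigquery.jobUser.

```python
def expensive_queries_sql(
    region: str = "region-us", days: int = 7, limit: int = 20
) -> str:
    """Return SQL listing the most expensive recent query jobs in a region."""
    return f"""
SELECT
  user_email,
  job_id,
  total_bytes_billed,
  total_slot_ms,
  query
FROM `{region}`.INFORMATION_SCHEMA.JOBS
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {int(days)} DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
ORDER BY total_bytes_billed DESC
LIMIT {int(limit)}
""".strip()


print(expensive_queries_sql(days=3, limit=5))
```

Filtering on creation_time keeps the scan of the jobs view itself small, so the monitoring query follows the same cost discipline it is checking for.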

Comparing BigQuery MCP with Snowflake MCP

| Dimension | BigQuery MCP | Snowflake MCP |
| --- | --- | --- |
| Semantic layer integration | Limited (no native equivalent to Cortex Analyst) | Deep (Cortex Analyst with YAML semantic model) |
| Cost governance | Excellent (maximumBytesBilled, dry runs) | Good (resource monitors, warehouse sizing) |
| Auth model | Google IAM (fine-grained) | Snowflake RBAC (role-based) |
| ML integration | BigQuery ML tools exposed | Cortex ML functions available |
| Serverless | Fully serverless (no warehouse to manage) | Requires warehouse provisioning |
| Multi-cloud | GCP only | AWS, Azure, GCP |

The biggest difference is semantic layer integration. Snowflake's Cortex Analyst provides a governed semantic model that significantly reduces agent hallucinations. BigQuery lacks a native equivalent, making it more important to pair the BigQuery MCP server with a separate semantic layer (dbt Semantic Layer, Looker LookML, or Cube.dev) when using it with AI agents.

Pairing BigQuery MCP with Looker for Semantic Grounding

Since BigQuery lacks a native semantic model equivalent to Snowflake's Cortex Analyst, the best practice for reducing agent hallucinations is to pair the BigQuery MCP server with a Looker LookML-based semantic layer. Looker defines metrics, dimensions, and relationships in LookML files that serve as governed definitions for your business data.

The workflow is straightforward: when an agent receives an analytical question, it first queries the Looker MCP server (or the dbt Semantic Layer MCP server) to retrieve the governed metric definition. It then uses that definition to generate SQL against BigQuery. This two-step approach ensures that 'revenue' always means the same thing — regardless of which agent is asking or which table the data lives in. Teams that implement this pattern report a 40-60% reduction in agent-generated query errors compared to direct BigQuery SQL generation.
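The two-step pattern can be sketched without any real semantic-layer client: first look up the governed definition, then build SQL from it rather than letting the agent guess at columns. Everything here is illustrative. The `GOVERNED_METRICS` dictionary stands in for the response of a Looker or dbt Semantic Layer MCP server, and the metric expression and table name are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    expression: str  # governed SQL expression for the metric
    table: str       # fully qualified source table


# Hypothetical stand-in for the response of a semantic-layer MCP server.
GOVERNED_METRICS = {
    "revenue": MetricDefinition(
        name="revenue",
        expression="SUM(order_total - refund_total)",
        table="`your-project.sales.orders`",
    ),
}


def fetch_metric(name: str) -> MetricDefinition:
    """Step 1: retrieve the governed definition, or fail loudly."""
    try:
        return GOVERNED_METRICS[name]
    except KeyError:
        raise LookupError(f"No governed definition for metric {name!r}") from None


def build_metric_sql(name: str, date_column: str) -> str:
    """Step 2: generate BigQuery SQL from the governed definition."""
    metric = fetch_metric(name)
    return (
        f"SELECT {metric.expression} AS {metric.name}\n"
        f"FROM {metric.table}\n"
        f"WHERE {date_column} >= @start_date"
    )


print(build_metric_sql("revenue", "order_date"))
```

Failing on an unknown metric, instead of falling back to free-form SQL, is the design choice that keeps every agent on the same definition. The `@start_date` placeholder is a BigQuery named query parameter, to be bound at execution time.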

Troubleshooting BigQuery MCP Deployments

Common issues when deploying the BigQuery MCP server in production include:

  • 403 Permission Denied on dry runs. Dry runs require bigquery.jobs.create permission, which comes from the roles/bigquery.jobUser role. The common mistake is granting only roles/bigquery.dataViewer, which allows reading data but not creating query jobs. Fix: ensure the MCP service account has both roles.
  • Slow schema introspection on large projects. Projects with thousands of tables can take minutes to enumerate all schemas. Fix: configure the MCP server to use INFORMATION_SCHEMA.TABLES with dataset filters rather than iterating all datasets. Restrict the MCP service account to specific datasets using IAM.
  • Query results exceeding context window limits. BigQuery can return millions of rows, but LLM context windows are limited. Fix: always set max_rows (default 100) and implement pagination. For large result sets, return summary statistics instead of raw rows.
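  • Cross-project access issues. If your data spans multiple GCP projects, the MCP service account needs read access in each of them. Fix: grant roles/bigquery.dataViewer to the same service account on the specific datasets in each project, and keep roles/bigquery.jobUser on the single project where query jobs run and are billed. Avoid organization-level grants, which undercut the dataset-level scoping described in the IAM section.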
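For the context-window issue in the list above, the server can cap what it returns and tell the agent whether more rows exist. The sketch below assumes rows arrive as an iterable of dictionaries; the `cap_rows` helper and the shape of its return value are hypothetical, with the default of 100 taken from the max_rows setting mentioned above.

```python
from itertools import islice
from typing import Any, Iterable

_NO_MORE = object()  # sentinel marking an exhausted iterator


def cap_rows(rows: Iterable[dict[str, Any]], max_rows: int = 100) -> dict[str, Any]:
    """Return at most max_rows rows plus a flag saying whether more exist."""
    iterator = iter(rows)
    page = list(islice(iterator, max_rows))
    # Pull one extra row to learn whether the result was cut off.
    truncated = next(iterator, _NO_MORE) is not _NO_MORE
    return {"rows": page, "row_count": len(page), "truncated": truncated}


result = cap_rows(({"n": i} for i in range(250)), max_rows=100)
print(result["row_count"], result["truncated"])
```

Because the helper consumes the iterator lazily, it never materializes the full result set, and the `truncated` flag gives the agent a cue to paginate or ask for an aggregate instead.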

The BigQuery MCP server gives AI agents governed access to Google Cloud's most powerful analytics engine. Cost governance is the critical differentiator — without maximumBytesBilled limits and mandatory dry runs, agent-driven querying can generate unexpected charges. Data Workers connects to BigQuery with these guardrails built in, coordinating 15 agents across your entire data stack. Book a demo to see cost-aware BigQuery agent workflows, or read the documentation for setup instructions.
