
Claude Code Trino Integration

Claude Code connects to Trino through an MCP server that exposes federated query execution across every connected catalog. The agent can join Hive, Iceberg, Delta, and JDBC catalogs in a single query and reason about the results in your terminal.

Trino's superpower is federated query, and Claude Code amplifies it. Instead of asking a data engineer to remember which catalog has which table, you ask the agent and it queries the Trino information schema, picks the right catalog, and runs the join. Cross-source exploration becomes a one-line prompt.

Why Trino Plus Claude Code

Data teams that adopt Trino usually do so because their data lives in too many places: an Iceberg lake, a Snowflake warehouse, a Postgres OLTP database, and a Kafka topic. Claude Code closes the loop because the agent can discover the topology (via system.metadata.catalogs), introspect schemas across every catalog, and write federated queries without human handholding.
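A sketch of that first discovery step, with a hypothetical catalog name for illustration:

```sql
-- List every catalog the coordinator knows about
SELECT catalog_name
FROM system.metadata.catalogs;

-- Then introspect a specific catalog's tables
-- ("iceberg" is an illustrative catalog name)
SELECT table_schema, table_name
FROM iceberg.information_schema.tables
WHERE table_schema <> 'information_schema';
```

The agent runs variants of these two queries to build its map before writing any federated SQL.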

Trino also has a clean REST API that MCP servers can wrap cheaply. Auth is via header-based tokens or password, connection pooling is straightforward, and the agent never has to manage long-lived state. It is one of the best-behaved data systems for agentic workflows.

MCP Server Installation

The Data Workers pipeline agent includes a Trino connector that supports all the standard auth modes. Configure it with a service-account token scoped to the catalogs Claude Code needs, then point the agent at your Trino coordinator URL. The whole setup takes under five minutes.

  • Use a service account — dedicated principal with claude_code user
  • Scope catalog access — only include catalogs the agent should see
  • Set query timeouts — query.max-execution-time=10m
  • Tag queries — via X-Trino-Client-Info header
  • Configure spill-to-disk — protect against OOM on large joins
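As a sketch, the timeout items above might land in the coordinator's config file like this (the values are illustrative, not a recommendation):

```properties
# config.properties on the coordinator -- illustrative values
query.max-execution-time=10m
query.max-memory-per-node=2GB
```

Query tagging works differently: X-Trino-Client-Info is an HTTP header the MCP client sends with each request, not a server-side property, so it is set in the connector configuration rather than here.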

Federated Query Patterns

The magic starts when you ask Claude Code to 'join the orders table in Snowflake with the product catalog in Iceberg and the customer tier in Postgres.' The agent inspects all three schemas, writes the federated SQL, and runs it. What used to require three separate queries, three different tools, and manual joins becomes one prompt.
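The generated SQL for a prompt like that might look roughly like this; the catalog, schema, and table names are hypothetical and would match your own topology:

```sql
-- Hypothetical catalog/schema/table names for illustration
SELECT o.order_id,
       p.product_name,
       c.tier
FROM snowflake.sales.orders AS o
JOIN iceberg.lake.products AS p
  ON o.product_id = p.product_id
JOIN postgres.public.customer_tiers AS c
  ON o.customer_id = c.customer_id
WHERE o.order_date >= DATE '2024-01-01';
```

Trino resolves each catalog prefix to a different connector, so one statement fans out to three systems.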

Trino's cost-based optimizer handles most of the cross-catalog work, but Claude Code can still help by ordering joins sensibly and writing predicates in forms the connectors can push down. The result is federated queries that often run several times faster than a naive first draft.
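A sketch of the kind of rewrite involved, again with hypothetical names:

```sql
-- Put selective predicates on connector columns so each source
-- filters locally instead of streaming full tables into Trino
SELECT o.order_id, c.tier
FROM postgres.public.customer_tiers AS c   -- small dimension side
JOIN snowflake.sales.orders AS o           -- large fact side
  ON o.customer_id = c.customer_id
WHERE c.tier = 'enterprise'                -- evaluated in Postgres
  AND o.order_date >= DATE '2024-01-01';   -- evaluated in Snowflake
```

Whether a given predicate actually pushes down depends on the connector and the expression, which is exactly the kind of detail the agent can check with EXPLAIN before running the full query.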

Schema Discovery

Schema discovery is the highest-leverage operation on Trino. The agent queries information_schema.tables across every catalog, builds an in-memory map of the data landscape, and can answer 'where does customer data live' in real time. This is transformative for analysts who previously spent days tracing data lineage by hand.
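A minimal version of that lookup, assuming an illustrative catalog named "hive" (information_schema is per-catalog, so the agent repeats this for each entry in SHOW CATALOGS):

```sql
-- Find every column that looks customer-related in one catalog
SELECT table_schema, table_name, column_name
FROM hive.information_schema.columns
WHERE column_name LIKE '%customer%';
```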

| Task | Manual | Claude Code + Trino |
| --- | --- | --- |
| Find a table across catalogs | 30 min | 10 sec |
| Federated join | 1 hour | 3 min |
| Schema drift audit | 2 hours | 5 min |
| Query optimization | 45 min | 5 min |
| Cross-source data contract | 1 day | 1 hour |

Iceberg and Delta Workflows

For teams running a lakehouse, Trino is the query engine and Claude Code is the interface. The agent can run Iceberg-specific operations (time travel, schema evolution, partition rewrites) through SQL, and it can trigger table maintenance jobs (compaction, snapshot expiration) via the system schema.
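These operations are all plain SQL in Trino's Iceberg connector; a sketch with a hypothetical table:

```sql
-- Time travel: query the table as of a past point in time
SELECT *
FROM iceberg.lake.orders
FOR TIMESTAMP AS OF TIMESTAMP '2024-06-01 00:00:00 UTC';

-- Maintenance via table procedures: compaction and snapshot expiration
ALTER TABLE iceberg.lake.orders
  EXECUTE optimize(file_size_threshold => '128MB');

ALTER TABLE iceberg.lake.orders
  EXECUTE expire_snapshots(retention_threshold => '7d');
```

Because these are ordinary statements, the same hooks that gate writes can gate maintenance procedures.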

See AI for data infra for how Trino fits into a broader agent ecosystem, or review autonomous data engineering for the operational patterns that keep the lakehouse healthy.

Resource Management and Safety

Trino's resource group API gives you fine-grained control over agent query cost. Put Claude Code queries in a dedicated resource group with tight memory and CPU limits, so a runaway prompt cannot starve the cluster. Combine with a pre-tool hook that blocks destructive operations on production catalogs, and the agent is safe to run 24/7.
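A minimal resource-groups sketch for that setup, assuming the agent connects as the claude_code user (the limits are illustrative):

```json
{
  "rootGroups": [
    {
      "name": "claude-code",
      "softMemoryLimit": "10%",
      "hardConcurrencyLimit": 5,
      "maxQueued": 20
    }
  ],
  "selectors": [
    { "user": "claude_code", "group": "claude-code" }
  ]
}
```

The selector routes every query from that principal into the capped group, so a runaway federated join queues instead of crowding out production workloads.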

A common gotcha: large broadcast joins can OOM a Trino worker. Enable spill-to-disk and let the agent fall back gracefully when memory pressure hits. Claude Code will detect the spill and warn you that a query could benefit from a rewrite or a partition filter.
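Enabling spill is a two-line change on each worker; the path is illustrative and should point at fast local disk:

```properties
# Illustrative spill settings; tune the path for your nodes
spill-enabled=true
spiller-spill-path=/var/trino/spill
```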

Production Rollout

Start with read-only access in a staging resource group, graduate to production read access, and only enable writes (INSERT INTO on Iceberg or Delta catalogs) once you have hook coverage. The most impactful workflows — federated exploration, schema discovery, cross-source data contracts — are all read-only.
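With Trino's file-based access control, the read-only stage can be expressed as catalog rules like these (catalog names are illustrative; rules match top to bottom):

```json
{
  "catalogs": [
    { "user": "claude_code", "catalog": "iceberg|snowflake|postgres", "allow": "read-only" },
    { "user": "claude_code", "catalog": ".*", "allow": "none" }
  ]
}
```

Graduating to writes then means flipping specific catalogs to "all" once hook coverage is in place, rather than changing anything about the agent itself.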

Book a demo to see the Data Workers Trino connector running against a multi-catalog environment with Iceberg, Snowflake, and Postgres joined in a single agent loop.

The teams that get the most value from this pairing treat it as a daily-driver rather than a novelty. Every morning starts with the agent pulling recent incidents, surfacing anomalies, and queuing up the highest-leverage work before a human sits down. By the time an engineer opens their laptop, the backlog is already triaged and the obvious fixes are sitting in draft PRs. The shift in cadence is subtle at first and enormous by month three.

Onboarding a new engineer to this workflow takes hours instead of weeks because the agent already knows the conventions documented in your CLAUDE.md. New hires pair with Claude Code on their first ticket, watch how it reasons about the codebase, and absorb the local patterns faster than any wiki could teach them. That accelerated ramp compounds across every hire you make after the agent is installed.

A surprising second-order effect is that documentation quality goes up across the board. Because the agent reads the catalog, CLAUDE.md, and PR descriptions to do its job, any gap or staleness in those artifacts produces visibly worse output. That feedback loop pressures the team to keep docs honest in ways that a quarterly audit never does. Teams report cleaner catalogs and richer docs within a month of rolling out Claude Code seriously.

Metrics matter for sustaining momentum past the honeymoon. Track a few numbers every week — PR throughput, time-to-resolution on incidents, warehouse spend per analyst, number of agent-opened PRs that merge without edits. These become the scoreboard that justifies continued investment and surfaces any regressions early. The teams that measure the impact keep the integration healthy; teams that just assume it is working drift into disrepair.

The final caveat is that the agent is only as good as the context it can reach. If your CLAUDE.md is stale, the tools are under-scoped, or the catalog is half-populated, the agent will produce mediocre output — and a lot of teams blame the model when the real problem is the surrounding environment. Treat the agent like a new hire: give it docs, give it tools, give it feedback, and it will perform. Skip any of those inputs and the output degrades accordingly.

Trino plus Claude Code is the best federated query experience available today. Install the MCP server, scope the service account, add resource groups, and the agent turns three days of manual data archaeology into three minutes of conversation.

See Data Workers in action

15 autonomous AI agents working across your entire data stack. MCP-native, open-source, deployed in minutes.

Book a Demo
