Insights Agent Query Optimization
Written by The Data Workers Team — 14 autonomous agents shipping production data infrastructure since 2026.
Technically reviewed by the Data Workers engineering team.
Data Workers' Insights Agent identifies and optimizes expensive queries across Snowflake, BigQuery, Redshift, and Databricks — analyzing execution profiles, suggesting index and clustering improvements, and generating rewritten SQL that reduces compute costs by 30-60% while maintaining query semantics. Query optimization is the highest-ROI activity in data platform management, but most teams lack the time and expertise to analyze the thousands of queries running daily.
This guide covers the Insights Agent's query analysis methodology, optimization strategies by warehouse platform, automated SQL rewriting capabilities, and strategies for building a query optimization culture that scales.
The Query Optimization Opportunity
In most data warehouses, 10% of queries consume 80% of compute resources. These expensive queries are often written by analysts who optimize for correctness (getting the right answer) rather than efficiency (getting it cheaply). Common patterns include: SELECT * when only three columns are needed, missing partition filters on date-partitioned tables, unnecessary DISTINCT on already-unique result sets, and cross joins hidden inside correlated subqueries.
The Insights Agent identifies these expensive queries automatically by analyzing warehouse query history. It ranks queries by total cost (frequency times per-execution cost), analyzes execution plans for optimization opportunities, and generates specific recommendations with estimated savings. The top-10 query optimizations typically deliver 30-50% of the total possible cost reduction.
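The ranking step described above can be sketched as a small script over exported query history. The record fields and numbers here are illustrative, not the agent's actual schema:

```python
# Rank queries by total cost (frequency x per-execution cost) and report
# how much of total daily spend each query accounts for cumulatively.
queries = [
    {"id": "daily_sales_rollup", "runs_per_day": 24, "cost_per_run": 4.10},
    {"id": "adhoc_full_scan",    "runs_per_day": 2,  "cost_per_run": 18.00},
    {"id": "dashboard_refresh",  "runs_per_day": 96, "cost_per_run": 0.35},
    {"id": "ml_feature_build",   "runs_per_day": 1,  "cost_per_run": 42.00},
]

for q in queries:
    q["total_cost"] = q["runs_per_day"] * q["cost_per_run"]

ranked = sorted(queries, key=lambda q: q["total_cost"], reverse=True)
total = sum(q["total_cost"] for q in ranked)

cumulative = 0.0
for q in ranked:
    cumulative += q["total_cost"]
    print(f'{q["id"]}: ${q["total_cost"]:.2f}/day '
          f'({cumulative / total:.0%} cumulative)')
```

Sorting by total cost rather than per-execution cost is what surfaces the cheap-but-frequent queries that quietly dominate spend.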
| Optimization Type | Typical Savings | Detection Method | Automation Level |
|---|---|---|---|
| Column pruning (eliminate SELECT *) | 20-40% per query | Parse SQL for unused columns | Fully automated rewrite |
| Partition filter addition | 50-90% per query | Compare scan volume to filter potential | Suggest filter with values |
| Join order optimization | 10-30% per query | Analyze table sizes and filter selectivity | Suggest reordered query |
| Materialization of repeated CTEs | Variable | Detect identical CTE patterns across queries | Suggest materialized view |
| Approximate aggregation | 30-50% per query | Identify exact counts on large datasets | Suggest APPROX_COUNT_DISTINCT |
| Clustering key optimization | 20-40% ongoing | Analyze filter patterns across queries | Recommend clustering keys |
Warehouse-Specific Analysis
Each data warehouse platform has unique optimization levers. The Insights Agent tailors its analysis to the specific platform. For Snowflake, it analyzes micro-partition pruning efficiency, warehouse sizing, and result cache hit rates. For BigQuery, it analyzes slot utilization, partition pruning, and clustering effectiveness. For Redshift, it analyzes distribution key selection, sort key usage, and workload management queue configuration.
Platform-specific analysis produces actionable recommendations. A Snowflake recommendation might say: 'Add clustering on order_date for the sales.orders table — queries filtering on order_date currently scan 4.2TB but would scan 0.3TB with clustering, saving an estimated $1,200/month.' A BigQuery recommendation might say: 'Add required partition filter on event_date to the analytics.events table — 37 queries scan the full table when they only need the last 7 days.'
- Snowflake — micro-partition pruning analysis, warehouse sizing, result cache optimization, clustering key recommendations
- BigQuery — partition and cluster analysis, slot utilization, BI Engine candidacy, materialized view recommendations
- Redshift — distribution and sort key analysis, WLM queue tuning, Spectrum pushdown optimization
- Databricks — photon engine eligibility, Delta cache analysis, Z-order recommendations, auto-optimize configuration
- Cross-platform — identifies queries that could benefit from moving between platforms based on cost profiles
- Cost modeling — estimates monthly savings for each recommendation based on query frequency and current cost
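The cost modeling in the last bullet reduces to simple arithmetic over scan volumes. A minimal sketch, assuming a flat per-TB scan price (the default rate here is illustrative, not a quoted vendor price):

```python
def estimated_monthly_savings(tb_before: float, tb_after: float,
                              runs_per_month: int,
                              price_per_tb: float = 5.0) -> float:
    """Estimate monthly savings from a scan-reduction recommendation.

    price_per_tb is an illustrative flat rate; real warehouses price
    compute differently (slots, credits, on-demand bytes scanned)."""
    return (tb_before - tb_after) * price_per_tb * runs_per_month

# E.g. a clustering recommendation that cuts a daily query's scan
# from 4.2 TB to 0.3 TB:
print(f"${estimated_monthly_savings(4.2, 0.3, 30):.2f}/month")
```

Attaching a dollar figure like this to every recommendation is what lets teams prioritize by impact rather than by effort.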
Automated SQL Rewriting
For optimization patterns that can be expressed as SQL transformations, the Insights Agent generates rewritten SQL that maintains semantic equivalence while improving performance. It replaces SELECT * with explicit column lists, adds partition filter predicates, converts correlated subqueries to joins, replaces DISTINCT with GROUP BY where appropriate, and substitutes approximate aggregation functions for exact counts on large datasets.
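One of those rules, substituting approximate aggregation for exact counts, can be shown as a toy transformation. A production rewriter operates on the parsed SQL AST rather than on regexes; this sketch only illustrates the shape of the rewrite:

```python
import re

def suggest_approx_count(sql: str) -> str:
    """Toy version of one rewrite rule: swap exact COUNT(DISTINCT col)
    for APPROX_COUNT_DISTINCT(col). Real rewriters work on the parsed
    AST and handle qualified names, expressions, and aliases."""
    return re.sub(
        r"COUNT\s*\(\s*DISTINCT\s+(\w+)\s*\)",
        r"APPROX_COUNT_DISTINCT(\1)",
        sql,
        flags=re.IGNORECASE,
    )

print(suggest_approx_count(
    "SELECT COUNT(DISTINCT user_id) FROM analytics.events"
))
# → SELECT APPROX_COUNT_DISTINCT(user_id) FROM analytics.events
```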
Rewritten queries are validated through semantic equivalence testing: the agent runs both the original and rewritten query on a sample of production data and verifies that the results match. This testing prevents false optimizations — cases where the rewrite changes query semantics in subtle ways. Only validated rewrites are recommended to users.
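The core of that validation loop is easy to sketch: run both queries against the same sample data and compare result sets. This toy version uses an in-memory SQLite database as the stand-in for sampled production data:

```python
import sqlite3

def results_match(original_sql: str, rewritten_sql: str,
                  setup_sql: list) -> bool:
    """Run both queries against the same sample data and compare
    sorted result sets. A sketch of semantic equivalence testing;
    production validation samples real warehouse data."""
    conn = sqlite3.connect(":memory:")
    for stmt in setup_sql:
        conn.execute(stmt)
    conn.commit()
    a = sorted(conn.execute(original_sql).fetchall())
    b = sorted(conn.execute(rewritten_sql).fetchall())
    conn.close()
    return a == b

setup = [
    "CREATE TABLE orders (id INTEGER, region TEXT)",
    "INSERT INTO orders VALUES (1, 'east'), (2, 'west'), (3, 'east')",
]
# DISTINCT -> GROUP BY is one of the rewrites mentioned above:
print(results_match(
    "SELECT DISTINCT region FROM orders",
    "SELECT region FROM orders GROUP BY region",
    setup,
))
# → True
```

Sorting before comparison matters because SQL result order is unspecified without ORDER BY; a naive row-by-row diff would flag false mismatches.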
Query Pattern Analysis
Beyond individual query optimization, the Insights Agent identifies query patterns across the platform. It detects duplicate queries run by different users (opportunities for shared materialized views), queries that scan the same base tables with different filters (opportunities for pre-aggregated tables), and queries that could be replaced by existing dashboards or reports (opportunities for self-service training).
Pattern analysis also identifies anti-patterns: teams that habitually use SELECT *, analysts who export full tables to CSV instead of using targeted queries, and scheduled queries that run hourly when the underlying data updates daily. These anti-patterns represent organizational optimization opportunities that go beyond individual query tuning.
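Detecting duplicate queries across users typically relies on fingerprinting: normalizing away case, whitespace, and literal values so near-identical queries hash to the same key. A simplified sketch (real fingerprinting also normalizes aliases and comment text):

```python
import hashlib
import re

def fingerprint(sql: str) -> str:
    """Hash a query after stripping the details that vary between
    otherwise-identical runs: case, whitespace, and literal values."""
    s = sql.strip().lower()
    s = re.sub(r"'[^']*'", "?", s)   # string literals -> placeholder
    s = re.sub(r"\b\d+\b", "?", s)   # numeric literals -> placeholder
    s = re.sub(r"\s+", " ", s)       # collapse whitespace
    return hashlib.sha256(s.encode()).hexdigest()[:12]

# Two analysts running the "same" query with different filter values:
q1 = "SELECT region, SUM(amount) FROM sales WHERE year = 2025 GROUP BY region"
q2 = "select region, sum(amount)  from sales where year = 2024 group by region"
print(fingerprint(q1) == fingerprint(q2))
# → True
```

Grouping query history by fingerprint and counting distinct users per group is what surfaces the shared-materialized-view candidates described above.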
Building a Query Optimization Culture
The Insights Agent supports cultural change by providing query cost visibility to the people who write queries. It can add cost annotations to query results ('This query scanned 2.3TB and cost $11.50'), send weekly cost summaries to heavy query users, and provide optimization suggestions in the SQL editor through IDE integrations. When people see the cost of their queries, they naturally optimize.
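An annotation like the one quoted above is straightforward to generate from scan volume. A minimal sketch, assuming a flat $5/TB illustrative rate (which happens to reproduce the example figures):

```python
def cost_annotation(tb_scanned: float, price_per_tb: float = 5.0) -> str:
    """Format a per-query cost annotation from bytes-scanned telemetry.
    price_per_tb is an illustrative flat rate, not a quoted vendor price."""
    return (f"This query scanned {tb_scanned:.1f}TB "
            f"and cost ${tb_scanned * price_per_tb:.2f}")

print(cost_annotation(2.3))
# → This query scanned 2.3TB and cost $11.50
```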
For teams building comprehensive platform intelligence, query optimization works alongside developer productivity metrics and Snowflake optimization for cost management. Book a demo to see query analysis on your data warehouse.
Query optimization delivers the highest ROI of any data platform improvement. The Insights Agent identifies the expensive queries, generates optimized rewrites, validates semantic equivalence, and provides cost visibility — turning query optimization from an occasional project into a continuous practice.
See Data Workers in action
15 autonomous AI agents working across your entire data stack. MCP-native, open-source, deployed in minutes.
Book a Demo
Related Resources
- Claude Code + Cost Optimization Agent: Cut Your Snowflake Bill from the Terminal — Ask 'which tables are wasting money?' in Claude Code. The Cost Optimization Agent scans your warehouse, identifies zombie tables, oversiz…
- Cost Agent Snowflake Optimization
- Insights Agent Developer Productivity
- Insights Agent Data Exploration
- Dataworkers Vs Weaviate Query Agent
- Why One AI Agent Isn't Enough: Coordinating Agent Swarms Across Your Data Stack — A single AI agent can handle one domain. But data engineering spans 10+ domains — quality, governance, pipelines, schema, streaming, cost…
- Why Every Data Team Needs an Agent Layer (Not Just Better Tooling) — The data stack has a tool for everything — catalogs, quality, orchestration, governance. What it lacks is a coordination layer. An agent…
- Why Your dbt Semantic Layer Needs an Agent Layer on Top — The dbt semantic layer is the best way to define metrics. But definitions alone don't prevent incidents or optimize queries. An agent lay…
- Agent-Native Architecture: Why Bolting Agents onto Legacy Pipelines Fails — Bolting AI agents onto legacy data infrastructure amplifies problems. Agent-native architecture designs for autonomous operation from day…
- Multi-Agent Coordination Layers: Orchestrating AI Agents Across Your Data Stack — Multi-agent coordination layers manage handoffs, shared context, and conflict resolution across multiple AI agents.
- Database as Agent Memory: The Persistent Coordination Layer for Multi-Agent Systems — Databases are evolving from storage for human queries to persistent memory and coordination for multi-agent AI systems.
- Sub-Agents and Multi-Agent Teams for Data Engineering with Claude — Claude Code spawns sub-agents in parallel — one explores schemas, another writes SQL, another validates. Multi-agent data engineering.
Explore Topic Clusters
- Data Governance: The Complete Guide — Policies, access controls, PII, and compliance at scale.
- Data Catalog: The Complete Guide — Discovery, metadata, lineage, and the modern catalog stack.
- Data Lineage: The Complete Guide — Column-level lineage, impact analysis, and observability.
- Data Quality: The Complete Guide — Tests, SLAs, anomaly detection, and data reliability engineering.
- AI Data Engineering: The Complete Guide — LLMs, agents, and autonomous workflows across the data stack.
- MCP for Data: The Complete Guide — Model Context Protocol servers, tools, and agent integration.
- Data Mesh & Data Fabric: The Complete Guide — Federated ownership, domain-oriented architecture, and interop.
- Open-Source Data Stack: The Complete Guide — dbt, Airflow, Iceberg, DuckDB, and the modern OSS toolkit.
- AI for Data Infra — The complete category for AI agents built specifically for data engineering, data governance, and data infrastructure work.