guide5 min read

Insights Agent Query Optimization

Insights Agent Query Optimization

Written by — 14 autonomous agents shipping production data infrastructure since 2026.

Technically reviewed by the Data Workers engineering team.

Last updated .

Data Workers' Insights Agent identifies and optimizes expensive queries across Snowflake, BigQuery, Redshift, and Databricks — analyzing execution profiles, suggesting index and clustering improvements, and generating rewritten SQL that reduces compute costs by 30-60% while maintaining query semantics. Query optimization is the highest-ROI activity in data platform management, but most teams lack the time and expertise to analyze the thousands of queries running daily.

This guide covers the Insights Agent's query analysis methodology, optimization strategies by warehouse platform, automated SQL rewriting capabilities, and strategies for building a query optimization culture that scales.

The Query Optimization Opportunity

In most data warehouses, 10% of queries consume 80% of compute resources. These expensive queries are often written by analysts who optimize for correctness (getting the right answer) rather than efficiency (getting it cheaply). Common patterns include: SELECT * when only three columns are needed, missing partition filters on date-partitioned tables, unnecessary DISTINCT on already-unique result sets, and cross joins hidden inside correlated subqueries.

The Insights Agent identifies these expensive queries automatically by analyzing warehouse query history. It ranks queries by total cost (frequency times per-execution cost), analyzes execution plans for optimization opportunities, and generates specific recommendations with estimated savings. The top-10 query optimizations typically deliver 30-50% of the total possible cost reduction.

Optimization TypeTypical SavingsDetection MethodAutomation Level
Column pruning (eliminate SELECT *)20-40% per queryParse SQL for unused columnsFully automated rewrite
Partition filter addition50-90% per queryCompare scan volume to filter potentialSuggest filter with values
Join order optimization10-30% per queryAnalyze table sizes and filter selectivitySuggest reordered query
Materialization of repeated CTEsVariableDetect identical CTE patterns across queriesSuggest materialized view
Approximate aggregation30-50% per queryIdentify exact counts on large datasetsSuggest APPROX_COUNT_DISTINCT
Clustering key optimization20-40% ongoingAnalyze filter patterns across queriesRecommend clustering keys

Warehouse-Specific Analysis

Each data warehouse platform has unique optimization levers. The Insights Agent tailors its analysis to the specific platform. For Snowflake, it analyzes micro-partition pruning efficiency, warehouse sizing, and result cache hit rates. For BigQuery, it analyzes slot utilization, partition pruning, and clustering effectiveness. For Redshift, it analyzes distribution key selection, sort key usage, and workload management queue configuration.

Platform-specific analysis produces actionable recommendations. A Snowflake recommendation might say: 'Add clustering on order_date for the sales.orders table — queries filtering on order_date currently scan 4.2TB but would scan 0.3TB with clustering, saving an estimated $1,200/month.' A BigQuery recommendation might say: 'Add required partition filter on event_date to the analytics.events table — 37 queries scan the full table when they only need the last 7 days.'

  • Snowflake — micro-partition pruning analysis, warehouse sizing, result cache optimization, clustering key recommendations
  • BigQuery — partition and cluster analysis, slot utilization, BI Engine candidacy, materialized view recommendations
  • Redshift — distribution and sort key analysis, WLM queue tuning, Spectrum pushdown optimization
  • Databricks — photon engine eligibility, Delta cache analysis, Z-order recommendations, auto-optimize configuration
  • Cross-platform — identifies queries that could benefit from moving between platforms based on cost profiles
  • Cost modeling — estimates monthly savings for each recommendation based on query frequency and current cost

Automated SQL Rewriting

For optimization patterns that can be expressed as SQL transformations, the Insights Agent generates rewritten SQL that maintains semantic equivalence while improving performance. It replaces SELECT * with explicit column lists, adds partition filter predicates, converts correlated subqueries to joins, replaces DISTINCT with GROUP BY where appropriate, and substitutes approximate aggregation functions for exact counts on large datasets.

Rewritten queries are validated through semantic equivalence testing: the agent runs both the original and rewritten query on a sample of production data and verifies that the results match. This testing prevents false optimizations — cases where the rewrite changes query semantics in subtle ways. Only validated rewrites are recommended to users.

Query Pattern Analysis

Beyond individual query optimization, the Insights Agent identifies query patterns across the platform. It detects duplicate queries run by different users (opportunities for shared materialized views), queries that scan the same base tables with different filters (opportunities for pre-aggregated tables), and queries that could be replaced by existing dashboards or reports (opportunities for self-service training).

Pattern analysis also identifies anti-patterns: teams that habitually use SELECT *, analysts who export full tables to CSV instead of using targeted queries, and scheduled queries that run hourly when the underlying data updates daily. These anti-patterns represent organizational optimization opportunities that go beyond individual query tuning.

Building a Query Optimization Culture

The Insights Agent supports cultural change by providing query cost visibility to the people who write queries. It can add cost annotations to query results ('This query scanned 2.3TB and cost $11.50'), send weekly cost summaries to heavy query users, and provide optimization suggestions in the SQL editor through IDE integrations. When people see the cost of their queries, they naturally optimize.

For teams building comprehensive platform intelligence, query optimization works alongside developer productivity metrics and Snowflake optimization for cost management. Book a demo to see query analysis on your data warehouse.

Query optimization delivers the highest ROI of any data platform improvement. The Insights Agent identifies the expensive queries, generates optimized rewrites, validates semantic equivalence, and provides cost visibility — turning query optimization from an occasional project into a continuous practice.

See Data Workers in action

15 autonomous AI agents working across your entire data stack. MCP-native, open-source, deployed in minutes.

Book a Demo

Related Resources

Explore Topic Clusters