Product · 10 min read

You Are Paying for 30-40% of Data That Nobody Uses

The unsexy cost savings agent that enterprises actually need — and why we built it

By The Data Workers Team

Here is a number that should bother you: 30-40% of enterprise data warehouse storage is wasted on data nobody queries. Not "underutilized." Not "occasionally accessed." Nobody. Zero queries in the last 90 days. Zero users. Just bytes sitting in cold storage, accumulating costs month after month.

For a mid-size company spending $1M-$2M annually on warehouse compute and storage, that waste translates to $300K-$800K per year. For larger enterprises spending $10M+, the number climbs to $3M-$6M annually. That is real money disappearing into data entropy — tables created for a one-time analysis three years ago, backup snapshots nobody remembers, legacy pipelines feeding dashboards that were decommissioned two quarters back.

Everyone knows this is a problem. Nobody is solving it well. We built an agent for it.

Why This Problem Persists

The issue is not that data teams are careless. It is that cost optimization requires understanding usage patterns across your entire data ecosystem — not just what exists, but who queries it, how often, what depends on it downstream, and what breaks if you remove it.

That last part is why people are afraid to clean up. Delete the wrong table and you discover at 2 AM that three production dashboards depended on it. Archive a "legacy" dataset and learn that the finance team runs a quarterly report against it every March. The risk of breaking something outweighs the reward of saving storage costs, so nothing gets cleaned up. Ever.

Point solutions exist. Snowflake shows you query costs in ACCOUNT_USAGE views. Acceldata does warehouse performance monitoring. Your warehouse's native console has storage breakdowns. But none of these tools answer the question that actually matters: "Can I safely remove this table, and how much will I save?" That question requires crossing the boundaries of cataloging, monitoring, lineage, and governance. It requires an agent, not a dashboard.

What Our Cost Agent Actually Does

The dw-cost agent is an MCP server that exposes four tools, each handling a specific piece of the cost optimization workflow.

find_unused_data profiles your warehouse usage and flags stale tables. It identifies tables with zero queries in the last 30 days or tables not accessed in 90+ days. Results come back sorted by staleness — oldest access first — so you see the worst offenders immediately. Each profile includes the table name, database, schema, last access timestamp, 30-day query count, unique user count, storage size, and row count.
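The staleness logic above can be sketched in a few lines. The field names and function signature here are illustrative, not the agent's actual schema:

```typescript
// Illustrative shape of a usage profile (field names are assumptions,
// not the agent's real output schema).
interface TableProfile {
  table: string;
  database: string;
  schema: string;
  lastAccessed: string | null; // ISO timestamp; null = never accessed
  queryCount30d: number;
  uniqueUsers: number;
  storageBytes: number;
  rowCount: number;
}

// Flag tables with zero queries in the last 30 days or no access in
// 90+ days, sorted oldest-access-first so the worst offenders lead.
function findUnusedData(profiles: TableProfile[], now: Date): TableProfile[] {
  const cutoff90 = now.getTime() - 90 * 24 * 60 * 60 * 1000;
  const lastMs = (p: TableProfile) =>
    p.lastAccessed ? Date.parse(p.lastAccessed) : -Infinity;
  return profiles
    .filter((p) => p.queryCount30d === 0 || lastMs(p) < cutoff90)
    .sort((a, b) => lastMs(a) - lastMs(b)); // oldest first
}
```

An actively queried table drops out of the result entirely; everything else comes back ranked by how long it has been ignored.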

estimate_savings calculates per-table cost savings using Snowflake's storage pricing model. Tables with zero queries get a 100% savings estimate. Tables with moderate usage get a 50% savings estimate — these could potentially be moved to a cheaper storage tier. Actively used tables get 0% savings. Results are sorted by potential savings descending.
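A minimal sketch of that estimate, assuming a flat $23/TB/month storage rate (Snowflake capacity pricing varies by region and contract, so treat the constant and the "moderate usage" cutoff as placeholders, not the agent's configured values):

```typescript
// Assumed flat rate; real Snowflake storage pricing depends on region
// and on-demand vs. capacity terms.
const STORAGE_USD_PER_TB_MONTH = 23;

interface SavingsEstimate {
  table: string;
  monthlyStorageCost: number;
  savingsFraction: number; // 1.0 unused, 0.5 moderate use, 0.0 active
  estimatedMonthlySavings: number;
}

function estimateSavings(
  tables: { table: string; storageBytes: number; queryCount30d: number }[],
  moderateUseThreshold = 10 // assumed cutoff for "moderate usage"
): SavingsEstimate[] {
  return tables
    .map((t) => {
      const cost = (t.storageBytes / 1e12) * STORAGE_USD_PER_TB_MONTH;
      const fraction =
        t.queryCount30d === 0 ? 1.0
        : t.queryCount30d < moderateUseThreshold ? 0.5
        : 0.0;
      return {
        table: t.table,
        monthlyStorageCost: cost,
        savingsFraction: fraction,
        estimatedMonthlySavings: cost * fraction,
      };
    })
    .sort((a, b) => b.estimatedMonthlySavings - a.estimatedMonthlySavings);
}
```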

recommend_archival is where the safety logic lives. This tool classifies every table into one of three tiers:

  • Tier 1 (safe): Unused for 90+ days, zero queries, no downstream dependencies, and under 100MB. Low-hanging fruit — small, isolated, forgotten tables.
  • Tier 2 (review): Unused for 30-90 days, or unused but with downstream dependencies, or unused but large. These need human review before deciding.
  • Tier 3 (risky): Recently used but expensive, or tables with many downstream dependencies. Flagged as optimization opportunities with explicit risk warnings.

The tier classification checks downstream dependencies by traversing the lineage graph. If a table feeds a Looker dashboard or a dbt model, the agent finds that relationship and factors it into the recommendation. A table with zero queries but a downstream dependency gets bumped to Tier 2 with a reason explaining the dependency.
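The tiering rules can be sketched as a small classifier. The thresholds (90 days, 100MB) come straight from the rules above; the function signature, field names, and reason strings are assumptions for illustration:

```typescript
type Tier = 1 | 2 | 3;

interface ArchivalRec {
  table: string;
  tier: Tier;
  reason: string;
  requiresApproval: true; // the agent never auto-archives
}

function recommendArchival(t: {
  table: string;
  daysSinceAccess: number;
  queryCount30d: number;
  storageBytes: number;
  downstreamDeps: string[]; // assumed to come from the lineage graph
}): ArchivalRec {
  const MB_100 = 100 * 1024 * 1024;
  const unused = t.queryCount30d === 0;
  // Tier 1: unused 90+ days, zero deps, small — safe low-hanging fruit.
  if (unused && t.daysSinceAccess >= 90 &&
      t.downstreamDeps.length === 0 && t.storageBytes < MB_100) {
    return { table: t.table, tier: 1,
      reason: "unused 90+ days, no dependencies, under 100MB",
      requiresApproval: true };
  }
  // Tier 2: unused but with a dependency, large, or only 30-90 days stale.
  if (unused) {
    const reason =
      t.downstreamDeps.length > 0
        ? `unused but feeds downstream: ${t.downstreamDeps.join(", ")}`
        : t.storageBytes >= MB_100
          ? "unused but large; verify before archiving"
          : "unused 30-90 days; confirm staleness";
    return { table: t.table, tier: 2, reason, requiresApproval: true };
  }
  // Tier 3: still in use — optimization opportunity with a risk warning.
  return { table: t.table, tier: 3,
    reason: "recently used; optimization opportunity, explicit risk",
    requiresApproval: true };
}
```

Note how a lineage hit alone is enough to demote a zero-query table from Tier 1 to Tier 2, matching the dependency-bump behavior described above.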

get_cost_dashboard aggregates costs across all profiled tables and breaks them down by team, pipeline, and dataset. It surfaces the top 10 most expensive tables, total unused table count, and total potential savings. This is the view that answers "where is our money going?" before you start making archival decisions.
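The aggregation is straightforward group-and-rank work. A rough sketch, with illustrative field names (grouping by team only, for brevity; the real tool also slices by pipeline and dataset):

```typescript
interface CostedTable {
  table: string;
  team: string;
  monthlyCost: number;
  unused: boolean;
  potentialSavings: number;
}

// Group cost by team, rank the top 10 most expensive tables, and
// total the unused-table count and potential savings.
function costDashboard(tables: CostedTable[]) {
  const byTeam = new Map<string, number>();
  for (const t of tables) {
    byTeam.set(t.team, (byTeam.get(t.team) ?? 0) + t.monthlyCost);
  }
  const topTables = [...tables]
    .sort((a, b) => b.monthlyCost - a.monthlyCost)
    .slice(0, 10);
  return {
    costByTeam: Object.fromEntries(byTeam),
    topTables,
    unusedCount: tables.filter((t) => t.unused).length,
    totalPotentialSavings: tables.reduce((s, t) => s + t.potentialSavings, 0),
  };
}
```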

The Safety Guarantee

We want to be explicit: the dw-cost agent never auto-archives anything. Every recommendation sets requiresApproval: true. There is no configuration to disable this. There is no override flag. The agent recommends. The human decides.

This is a deliberate design choice. Autonomous deletion of data assets is a trust-destroying action — even with 99% accuracy, the 1% wrong deletion erodes trust faster than the 99% correct actions build it. Trust is harder to rebuild than storage costs are to reduce.

Every recommendation includes the reason for the tier classification, the list of downstream dependencies (if any), and the estimated monthly savings — everything the human needs to make an informed decision in seconds rather than hours.
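Concretely, a Tier 2 recommendation might look something like this (the table name, values, and exact field names are illustrative, not the agent's actual output format):

```json
{
  "table": "analytics.tmp.campaign_backfill_2022",
  "tier": 2,
  "reason": "Zero queries in 30 days, but feeds 1 downstream asset",
  "downstreamDependencies": ["looker/dashboards/campaign_overview"],
  "estimatedMonthlySavings": 412.50,
  "requiresApproval": true
}
```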

Honest Limitations

We believe in being transparent about what this agent can and cannot do today.

  • The data is stubbed. Our current implementation uses in-memory stores seeded with 15 representative tables. The usage profiles, lineage relationships, and cost calculations all work correctly against this seed data. But we are not yet querying real Snowflake ACCOUNT_USAGE views.
  • We only model Snowflake pricing. Databricks, BigQuery, and Redshift have different pricing models. Supporting multiple warehouse pricing models is on our roadmap but not implemented.
  • No rollback mechanism yet. We describe a design where archived data moves to a _dw_archive schema and remains recoverable for 90 days. The recommendation logic is built. The actual archive workflow is not.
  • No real-time monitoring. The agent profiles usage at the time of the tool call. It does not continuously monitor access patterns or alert when a previously used table becomes stale.

Why This Is Where the Enterprise Money Is

We have 15 agents. The Incident Debugging Agent is more technically impressive. The Data Context Agent is more architecturally interesting. The Migration Agent has the flashier demo.

But the Cost Savings Agent is probably the easiest one to justify in a procurement meeting. It has a direct, quantifiable ROI. If the agent identifies $500K/year in wasted storage and compute, the value is self-evident. You do not need to calculate "time saved" or "incidents prevented." You saved $500K. Here is the receipt.

This is the unsexy reality of enterprise software: the product that saves money is the product that gets budget approval. We built the cost agent because every enterprise we talked to described the same problem: "We know we are wasting money on unused data. We do not have the tooling to systematically clean it up without breaking things." That is the problem. The agent is the solution.
