Product · 12 min read

Why Every Enterprise Migration Takes 6-18 Months (And How to Fix It)

The migration agent nobody is building — and why we think it is our best enterprise entry point

By The Data Workers Team

Every enterprise we have talked to has a migration project. Oracle to Snowflake. Teradata to Databricks. Redshift to BigQuery. Hadoop decommissioning. Legacy on-prem SQL Server to something cloud-native. The specifics vary. The pain does not.

These projects take 6-18 months. They cost $2M-$5M. They frequently overrun on both. And the work is overwhelmingly manual — engineers sitting in front of two SQL consoles, translating queries line by line, running row count comparisons in spreadsheets, and discovering at month four that the stored procedures are going to take longer than everything else combined.

We built a migration agent because we believe this is the highest-leverage problem in enterprise data engineering. Not the most technically glamorous. The one that every enterprise dreads and nobody is solving end-to-end with agents.

Why Migrations Are So Painful

If you have never done a warehouse migration, the scope might seem manageable. You have tables in System A. You need those same tables in System B. Move the data. Done. Here is why it actually takes a year.

Type translation is not straightforward. Oracle's DATE type stores both date and time components. Snowflake's DATE stores only the date. If your Oracle application relies on DATE columns having time precision, a naive type mapping breaks your data. Our agent maps Oracle DATE to Snowflake TIMESTAMP_NTZ specifically because of this semantic difference. VARCHAR2 maps to VARCHAR. CLOB maps to VARCHAR(16777216). Each mapping has a reason, and getting any of them wrong means data corruption.
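As a rough sketch, the mappings above can be expressed as a table that records not just the target type but the reason behind it (names like `TypeMapping` and `ORACLE_TO_SNOWFLAKE` are illustrative, not our actual implementation):

```python
from dataclasses import dataclass

@dataclass
class TypeMapping:
    target: str  # Snowflake type to emit
    reason: str  # why a naive 1:1 mapping would be wrong or lossy

ORACLE_TO_SNOWFLAKE = {
    # Oracle DATE carries a time component; Snowflake DATE does not,
    # so map to TIMESTAMP_NTZ to avoid silently dropping time precision.
    "DATE": TypeMapping("TIMESTAMP_NTZ", "Oracle DATE includes time-of-day"),
    "VARCHAR2": TypeMapping("VARCHAR", "direct equivalent"),
    "CLOB": TypeMapping("VARCHAR(16777216)", "Snowflake max VARCHAR length"),
}

def map_type(oracle_type: str) -> str:
    mapping = ORACLE_TO_SNOWFLAKE.get(oracle_type.upper())
    if mapping is None:
        raise ValueError(f"no known mapping for Oracle type {oracle_type}")
    return mapping.target
```

Recording the reason alongside the mapping matters: when a migration review asks "why is this column TIMESTAMP_NTZ?", the answer is in the table, not in someone's memory.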

Syntax translation ranges from trivial to nightmarish. NVL to COALESCE is easy. SYSDATE to CURRENT_TIMESTAMP() is easy. ROWNUM to ROW_NUMBER() OVER (ORDER BY 1) is tricky because it changes query semantics. Oracle's DECODE function needs restructuring into a CASE expression. And then there are patterns without clean rule-based translations — Oracle's CONNECT BY hierarchical queries need recursive CTEs. PL/SQL stored procedures with cursor loops and package-level state have no direct Snowflake equivalent.
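The easy end of that spectrum is pure pattern substitution. Here is a minimal sketch of how the trivial rules work (the regexes and rule list are illustrative, not our actual rule set):

```python
import re

# Ordered (pattern, replacement) pairs for Oracle -> Snowflake.
ORACLE_RULES = [
    (re.compile(r"\bNVL\s*\(", re.IGNORECASE), "COALESCE("),
    (re.compile(r"\bSYSDATE\b", re.IGNORECASE), "CURRENT_TIMESTAMP()"),
    # Semantics-changing: ROWNUM has no direct Snowflake equivalent.
    (re.compile(r"\bROWNUM\b", re.IGNORECASE), "ROW_NUMBER() OVER (ORDER BY 1)"),
]

def translate_oracle(sql: str) -> str:
    for pattern, replacement in ORACLE_RULES:
        sql = pattern.sub(replacement, sql)
    return sql
```

For example, `SELECT NVL(name, 'x'), SYSDATE FROM t` becomes `SELECT COALESCE(name, 'x'), CURRENT_TIMESTAMP() FROM t`. DECODE and CONNECT BY cannot be handled this way — restructuring into CASE expressions or recursive CTEs requires actually parsing the statement.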

Redshift and Teradata have their own pain. Redshift migrations require stripping DISTKEY, SORTKEY, DISTSTYLE, and ENCODE clauses — Redshift-specific storage directives with no Snowflake equivalent. Teradata has its own dialect quirks: SEL instead of SELECT, QUALIFY clauses, SAMPLE syntax, and TITLE annotations.
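The Redshift storage directives are a pure-removal case: since they have no Snowflake equivalent, the rule just strips them. A hedged sketch (the pattern below is illustrative and only covers the common clause shapes):

```python
import re

# DISTKEY(...), SORTKEY(...), DISTSTYLE <style>, ENCODE <codec> -- all
# Redshift-specific storage directives with no Snowflake equivalent.
REDSHIFT_STORAGE_CLAUSES = re.compile(
    r"\s*\b(DISTKEY\s*\([^)]*\)|SORTKEY\s*\([^)]*\)|DISTSTYLE\s+\w+|ENCODE\s+\w+)",
    re.IGNORECASE,
)

def strip_redshift_clauses(ddl: str) -> str:
    return REDSHIFT_STORAGE_CLAUSES.sub("", ddl)
```

So `CREATE TABLE t (id INT ENCODE az64) DISTSTYLE KEY DISTKEY(id) SORTKEY(id)` reduces to `CREATE TABLE t (id INT)`.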

Validation is a project unto itself. After migrating schema and translating queries, you need to prove the data matches. Row counts are the baseline — but they can match while actual data is wrong. Column-level statistics catch more issues. Sample hash comparisons catch the rest. For hundreds of tables, this is weeks of manual work.
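The three validation tiers are all query generation. A minimal sketch of what each tier emits (table and column names are placeholders, and the sample-hash SQL is one illustrative Snowflake-flavored approach, not our exact queries):

```python
def row_count_check(table: str) -> str:
    # Baseline: counts can match while the data is wrong.
    return f"SELECT COUNT(*) FROM {table}"

def column_stats_check(table: str, column: str) -> str:
    # Catches type and null-handling drift that row counts miss.
    return (
        f"SELECT COUNT(DISTINCT {column}) AS distinct_count, "
        f"SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) AS null_count "
        f"FROM {table}"
    )

def sample_hash_check(table: str, key: str, sample_pct: int = 1) -> str:
    # Hash a deterministic sample of rows; compare digests across systems.
    return (
        f"SELECT MD5(LISTAGG(row_hash, '') WITHIN GROUP (ORDER BY {key})) "
        f"FROM (SELECT {key}, MD5(TO_VARCHAR({key})) AS row_hash "
        f"FROM {table} SAMPLE ({sample_pct}))"
    )
```

Run the same check against source and target, diff the results, repeat for every table. The queries are mechanical; the volume is what makes it weeks of work by hand.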

What Our Migration Agent Does

The dw-migration agent exposes four tools through its MCP server, designed to work as a pipeline: assess, translate, validate, compare.

assess_migration is the starting point. Point it at a source system and it discovers every database object — tables, views, procedures, functions — and classifies each by migration complexity:

  • Simple: Tables with no PII, no timezone columns, no encoding issues, under 1M rows. Automatable with high confidence.
  • Medium: Views (always involve query translation), tables with one complicating factor (PII, timezones, encoding, or large row count).
  • Complex: Stored procedures with timezone handling, tables with all three complicating factors together. These are where migrations silently break.

The assessment generates an effort estimate — simple objects weighted at 0.5 weeks, medium at 1.5 weeks, complex at 3 weeks. It also surfaces all PII and timezone columns across the inventory, because these are the two categories that most commonly cause post-migration data issues.
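The effort math itself is simple weighting. In sketch form (the weights follow the numbers above; the function name is illustrative):

```python
# Weeks of effort per object, by complexity tier.
EFFORT_WEEKS = {"simple": 0.5, "medium": 1.5, "complex": 3.0}

def estimate_effort(inventory: dict[str, int]) -> float:
    """inventory maps complexity tier -> object count; returns total weeks."""
    return sum(EFFORT_WEEKS[tier] * count for tier, count in inventory.items())
```

An inventory of 120 simple, 40 medium, and 12 complex objects comes out to 156 estimated weeks — a number you can sanity-check and re-plan against, instead of a gut feel.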

translate_sql handles dialect conversion with a dual-mode architecture we are proud of. For known patterns, the agent applies deterministic rule-based transformations — hardcoded regex rules, not LLM-generated. Oracle rules handle NVL, SYSDATE, ROWNUM, TO_DATE, DECODE, and NVL2. Teradata rules handle QUALIFY, SAMPLE, SEL, and TITLE. Redshift rules handle DISTKEY, SORTKEY, DISTSTYLE, ENCODE, GETDATE, and LEN.

Each rule carries a complexity weight. Simple substitutions like NVL to COALESCE carry zero weight. Trickier conversions like DECODE carry more. The weights accumulate and determine a confidence score starting from 1.0 and decreasing with complexity. When confidence drops below 0.7, the agent falls back to an LLM for translation. Rule-based handles the deterministic 80%. The LLM handles the complex 20%. Translations are cached with a one-hour TTL.
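The fallback decision reduces to a few lines. A sketch of the confidence model described above (the specific weights here are illustrative, not our production values):

```python
# Per-rule complexity weights: trivial substitutions cost nothing,
# semantics-changing conversions erode confidence.
RULE_WEIGHTS = {"NVL": 0.0, "SYSDATE": 0.0, "ROWNUM": 0.15, "DECODE": 0.2}

FALLBACK_THRESHOLD = 0.7

def confidence(applied_rules: list[str]) -> float:
    # Start at 1.0; each applied rule subtracts its weight.
    return max(0.0, 1.0 - sum(RULE_WEIGHTS.get(r, 0.1) for r in applied_rules))

def needs_llm_fallback(applied_rules: list[str]) -> bool:
    return confidence(applied_rules) < FALLBACK_THRESHOLD
```

A query that only needed NVL and SYSDATE rewrites stays at confidence 1.0 and ships rule-based; one that needed both ROWNUM and DECODE drops below 0.7 and routes to the LLM.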

validate_migration generates validation checks comparing source and target — row count comparisons, column statistics (null percentages, distinct counts), and sample hash comparisons. The result includes pass count, fail count, and overall divergence percentage.
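Aggregating the checks into that result shape is straightforward. A sketch (the `Check` structure is illustrative, not the agent's actual schema):

```python
from dataclasses import dataclass

@dataclass
class Check:
    name: str
    passed: bool

def summarize(checks: list[Check]) -> dict:
    passed = sum(1 for c in checks if c.passed)
    failed = len(checks) - passed
    # Divergence: share of checks that failed, as a percentage.
    divergence = 100.0 * failed / len(checks) if checks else 0.0
    return {"passed": passed, "failed": failed, "divergence_pct": round(divergence, 1)}
```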

run_parallel_comparison simulates parallel source-versus-target comparisons across multiple objects simultaneously. It reports per-object match rates and an overall migration health score.
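The fan-out/roll-up shape of that tool can be sketched with a thread pool and a caller-supplied compare function (all names here are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel_comparison(objects, compare, max_workers=8):
    """compare(obj) -> match rate in [0, 1]. Returns per-object rates
    plus an overall health score (the mean match rate)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs rates correctly.
        rates = dict(zip(objects, pool.map(compare, objects)))
    health = sum(rates.values()) / len(rates) if rates else 0.0
    return {"per_object": rates, "health_score": round(health, 3)}
```

Comparisons are I/O-bound (they wait on two warehouses), so threads are the natural fit; the health score gives a single number to track as the migration converges.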

What This Is Not

We want to be clear about scope because overpromising on migration tooling is how projects end up six months behind schedule.

  • This is not an end-to-end migration executor. The agent does not move data. It assesses, translates, and validates. Data movement still happens through your existing tools.
  • SQL translation covers common patterns, not all patterns. It does not handle PL/SQL package bodies, Oracle materialized view refresh logic, Teradata BTEQ scripts, or Redshift UDFs. Complex procedural code still requires manual translation.
  • V1 scope is deliberately narrow. Three source dialects (Oracle, Teradata, Redshift) and one target (Snowflake). BigQuery and Databricks targets are on the roadmap.
  • The data is currently stubbed. The assessment logic, translation rules, and validation framework all work correctly against seed data, but the agent does not yet connect to real source systems.

The 80% Thesis

Our thesis is not that an agent can fully automate migrations. It cannot. The last 20% — complex stored procedures, business logic edge cases, institutional knowledge — requires human engineers.

Our thesis is that an agent can automate the first 80%: classifying complexity so you know what you are dealing with, translating the common SQL patterns that make up the bulk of the work, and generating validation queries so you do not spend weeks writing comparison scripts.

If a migration project is 60% mechanical translation and 40% judgment-intensive decisions, automating the 60% saves more than 60% of the time. Because the mechanical work blocks the judgment work. Engineers cannot make good decisions about complex cases until they have a clear picture of the full inventory — and building that inventory manually is the first three months of every migration.

The assess tool gives you the inventory in minutes. The translate tool handles the straightforward conversions. The validate tool generates the comparison framework. The engineer focuses on the hard problems from day one instead of month four.

If you have a migration project planned or in progress and want to evaluate whether agent-assisted assessment and translation could compress your timeline, we want to hear from you.
