Slowly Changing Dimensions: The Complete Guide to SCD Types
Written by The Data Workers Team — 14 autonomous agents shipping production data infrastructure since 2026.
Technically reviewed by the Data Workers engineering team.
Slowly changing dimensions (SCD) is the pattern Kimball designed for dimension tables whose attributes change over time — customer addresses, product prices, employee titles. The seven SCD types, numbered 0 through 6, define how to handle that change, from overwriting the old value (Type 1) to versioning history (Type 2) to hybrid approaches that preserve both current and historical state.
Getting SCD wrong is one of the most common causes of incorrect BI numbers. This guide walks through all seven types with concrete examples, when to use each, how modern tools like dbt snapshots automate the whole pattern, and the monitoring tests you should add so silent SCD bugs never reach production dashboards.
What Are Slowly Changing Dimensions?
A dimension table stores context like customer name, region, or product category. When those attributes change — a customer moves, a product is renamed — you have to decide how to represent the change. Slowly changing dimensions is the family of techniques for handling this gracefully so historical queries still return correct results and trend reports do not silently change meaning.
The term comes from Ralph Kimball's dimensional modeling framework. Kimball originally defined Types 1, 2, and 3, and the community later added Types 0, 4, 5, and 6 for more nuanced cases. Most real warehouses rely on Type 1 or Type 2 for the vast majority of dimension tables, reserving the hybrids for attributes that need both current and historical views on the same row.
The SCD Types
| Type | Name | Behavior | Example |
|---|---|---|---|
| 0 | Retain original | Never update; original value is fixed | Date of birth |
| 1 | Overwrite | Replace old value; no history kept | Typo correction |
| 2 | Add new row | Insert new version with start/end dates | Customer address change |
| 3 | Add new attribute | Store previous value in a second column | Current vs prior region |
| 4 | History table | Current row in dim, history in separate table | Employee title |
| 5 | Mini-dimension + Type 1 outrigger | Type 4 mini-dim plus a current-value reference on the base dim | Frequently changing demographics |
| 6 | Hybrid (1+2+3) | Combine overwrite, versioning, and prior column | Complex retail hierarchies |
Type 1: Overwrite
Type 1 is the simplest — you update the row in place and lose the old value. Use it when history does not matter or when the change is a data correction (fixing a typo). The big risk is silent data distortion: historical reports that grouped by the old value will now group by the new one, which can invalidate month-over-month comparisons without anyone noticing.
Rule of thumb: if a stakeholder asks 'what was this value on this date', Type 1 is wrong. If they only care about current state, Type 1 is fine and cheaper to maintain than any other type.
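To make the tradeoff concrete, here is a minimal Type 1 sketch using Python's built-in sqlite3. The table and column names are illustrative, not from any particular warehouse — the point is that after the UPDATE, nothing in the table can tell you what the old value was:

```python
import sqlite3

# In-memory sketch of a Type 1 overwrite. Table/column names are
# illustrative. The UPDATE replaces the value in place, so no query
# can recover what was stored before.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, city TEXT)")
conn.execute("INSERT INTO dim_customer VALUES (42, 'Berln')")  # typo to correct

# Type 1 overwrite -- appropriate here because this is a data correction,
# not a real-world change anyone needs to report on historically.
conn.execute("UPDATE dim_customer SET city = 'Berlin' WHERE customer_id = 42")

city = conn.execute(
    "SELECT city FROM dim_customer WHERE customer_id = 42"
).fetchone()[0]
```

The same single-row UPDATE applied to a genuine change (a customer actually moving) is exactly the silent-distortion risk described above.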
Type 2: Add New Row
Type 2 is the workhorse. When a dimension attribute changes, you close out the existing row (set end_date) and insert a new row with a new surrogate key. Facts link to the surrogate key, so historical facts keep pointing to the old version automatically. This is how 'who was the account manager when this deal closed' queries stay correct forever, without heroic analyst effort.
Implement Type 2 with start_date, end_date, and is_current columns. dbt snapshots automate the whole pattern with a single YAML config — see dbt snapshots explained for the full walkthrough. The main catch is row explosion on frequently changing attributes; switch to Type 5 when Type 2 bloats too fast.
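A minimal in-memory sketch of the close-and-insert step, in pure Python. The function name, column names, and surrogate-key handling are illustrative — in a real warehouse this would be a MERGE statement or a dbt snapshot doing the same two moves:

```python
from datetime import date

def apply_scd2_change(dim_rows, business_key, new_attrs, change_date, next_key):
    """Close the current version of `business_key`, then insert a new one.

    dim_rows is a list of dicts standing in for a Type 2 dimension table
    with surrogate_key, business_key, start_date, end_date, is_current.
    """
    for row in dim_rows:
        if row["business_key"] == business_key and row["is_current"]:
            row["end_date"] = change_date   # close out the old version
            row["is_current"] = False
    dim_rows.append({
        "surrogate_key": next_key,          # new surrogate key per version
        "business_key": business_key,
        **new_attrs,
        "start_date": change_date,
        "end_date": None,                   # open-ended current row
        "is_current": True,
    })
    return dim_rows

# Customer 42 moves from Berlin to Munich on 2024-03-01. Facts keyed on
# surrogate_key 1 keep pointing at the Berlin version forever.
dim = [{"surrogate_key": 1, "business_key": 42, "city": "Berlin",
        "start_date": date(2023, 1, 1), "end_date": None, "is_current": True}]
apply_scd2_change(dim, 42, {"city": "Munich"}, date(2024, 3, 1), next_key=2)
```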
Type 3, 4, 5, and 6
Type 3 adds a prior_value column instead of versioning rows. It works when you only care about the previous state and want to avoid the row explosion of Type 2 — useful for 'current region' and 'previous region' side-by-side analysis. Types 4, 5, and 6 are hybrids for edge cases.
Type 4 splits current state (main dim) from history (history table), which keeps current queries fast while preserving full audit. Type 5 uses a mini-dimension for rapidly changing attributes (demographics, loyalty tier) that would explode a Type 2 dim. Type 6 combines Types 1, 2, and 3 on the same row for maximum flexibility at maximum complexity — use only when you genuinely need current, historical, and prior-state columns together.
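A tiny sketch of the Type 3 shift-and-overwrite, with illustrative column names. Note what it deliberately does not do: keep more than one prior value.

```python
# Type 3 keeps exactly one level of history in a second column.
# Column names (region / prior_region) are illustrative.
row = {"customer_id": 7, "region": "EMEA", "prior_region": None}

def scd3_update(row, new_region):
    # Shift current into prior, then overwrite current. A second change
    # discards the oldest value -- that is the tradeoff versus Type 2's
    # full version history.
    row["prior_region"] = row["region"]
    row["region"] = new_region

scd3_update(row, "APAC")   # EMEA moves into prior_region
```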
Automating SCD With Modern Tools
Modern transformation tools handle SCD almost automatically. dbt snapshots implement Type 2 with one config block. SQLMesh supports snapshots and forward-only changes. Data Vault's satellite pattern bakes Type 2 into the modeling paradigm itself. Pick whichever fits your stack and let the tool do the tracking — hand-rolled merge logic is a bug farm that eventually catches up with every team.
Monitoring SCDs in Production
- Row count trend — Type 2 tables should grow monotonically; sudden drops mean a full refresh wiped history
- Current row test — exactly one is_current=true row per business key
- Gap test — no overlapping valid_from/valid_to ranges
- Orphan fact test — every fact row joins to a dim row in its effective date range
- Growth rate alert — flag when Type 2 row count spikes 10x (usually a bug)
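The current-row and gap tests translate directly to SQL. A sketch against an in-memory SQLite table, assuming business_key/valid_from/valid_to/is_current column names (adapt to whatever your dim actually uses):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE dim_customer (
    surrogate_key INTEGER, business_key INTEGER,
    valid_from TEXT, valid_to TEXT, is_current INTEGER)""")
# Two well-formed versions of business key 42: ranges chain, one current row.
conn.executemany("INSERT INTO dim_customer VALUES (?,?,?,?,?)", [
    (1, 42, "2023-01-01", "2024-03-01", 0),
    (2, 42, "2024-03-01", None,         1),
])

# Current row test: each business key must have exactly one current row.
dupes = conn.execute("""
    SELECT business_key FROM dim_customer
    WHERE is_current = 1
    GROUP BY business_key HAVING COUNT(*) <> 1""").fetchall()

# Gap/overlap test: no two versions of a key may overlap in time.
# Open-ended rows (valid_to IS NULL) are treated as extending to a
# far-future sentinel date.
overlaps = conn.execute("""
    SELECT DISTINCT a.business_key FROM dim_customer a
    JOIN dim_customer b
      ON a.business_key = b.business_key
     AND a.surrogate_key <> b.surrogate_key
     AND a.valid_from < COALESCE(b.valid_to, '9999-12-31')
     AND b.valid_from < COALESCE(a.valid_to, '9999-12-31')""").fetchall()
```

Both queries should return zero rows on a healthy Type 2 table; schedule them as dbt tests or equivalent so violations fail the build instead of reaching dashboards.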
SCDs in a Data Vault World
Data Vault's satellite pattern is essentially SCD Type 2 built into the modeling methodology. If your warehouse uses Data Vault for the integration layer, you get Type 2 history automatically in every satellite. Downstream Kimball marts can then apply Type 1 overwrites where current state is all you need, without losing the audit trail — Vault keeps the history, marts stay fast.
This layered pattern is the cleanest answer to the 'which SCD type should I use' question in practice. The raw vault answers 'what happened when', and the marts answer 'what is true now'. Analysts working in the marts get simple Type 1 tables; auditors working in the vault get full history. Both are correct for their workload.
Agents and SCDs
Data Workers' catalog and migration agents can detect when a new dimension attribute needs history and suggest the right SCD type automatically. See autonomous data engineering or book a demo to see agent-managed SCDs in action.
Slowly changing dimensions are how dimensional warehouses stay honest about the past. Use Type 1 for corrections, Type 2 for meaningful history, Type 5 for rapid change, and automate all of them with dbt snapshots or equivalents so your analysts never have to write merge logic by hand.
Related Resources
- Slowly Changing Dimensions Types: Types 0 Through 6 — Comprehensive walkthrough of SCD types 0 through 6 with examples, tradeoffs, and testing patterns.
- Data Quality Dimensions: The DAMA Framework Explained — Guide to the six DAMA data quality dimensions, how to measure each, and how autonomous agents automate the scoring.
- Why AI Agents Need MCP Servers for Data Engineering — MCP servers give AI agents structured access to your data tools — Snowflake, BigQuery, dbt, Airflow, and more. Here is why MCP is the int…
- The Complete Guide to Agentic Data Engineering with MCP — Agentic data engineering replaces manual pipeline management with autonomous AI agents. Here is how to implement it with MCP — without lo…
- How AI Agents Cut Snowflake Costs by 40% Without Manual Tuning — Most Snowflake environments waste 30-40% of compute on zombie tables, oversized warehouses, and unoptimized queries. AI agents find and f…
- RBAC for Data Engineering Teams: Why Manual Access Control Doesn't Scale — Manual RBAC breaks down at 50+ data assets. Policy drift, orphaned permissions, and PII exposure become inevitable. AI agents enforce gov…
- From Alert to Resolution in Minutes: How AI Agents Debug Data Pipeline Incidents — The average data pipeline incident takes 4-8 hours to resolve. AI agents that understand your full data context can auto-diagnose and res…
- Build Data Pipelines with AI: From Description to Deployment in Minutes — Building a data pipeline still takes 2-6 weeks of engineering time. AI agents that understand your data context can generate, test, and d…
- Why Your Data Catalog Is Always Out of Date (And How AI Agents Fix It) — 40-60% of data catalog entries are outdated at any given time. AI agents that continuously scan, classify, and update metadata make the s…
- MLOps in 2026: Why Teams Are Moving from Tools to AI Agents — The average ML team uses 5-7 MLOps tools. AI agents that manage the full ML lifecycle — from experiment tracking to model deployment — ar…
- Why Text-to-SQL Accuracy Drops from 85% to 20% in Production (And How to Fix It) — Text-to-SQL tools score 85% on benchmarks but drop to 10-20% accuracy on real enterprise schemas. The fix is not better models — it is a…
- Data Migration Automation: How AI Agents Reduce 18-Month Timelines to Weeks — Enterprise data migrations take 6-18 months because schema mapping, data validation, and downtime coordination are manual. AI agents comp…