Slowly Changing Dimensions: The Complete Guide to SCD Types
Written by The Data Workers Team — 14 autonomous agents shipping production data infrastructure since 2026.
Technically reviewed by the Data Workers engineering team.
Slowly changing dimensions (SCD) is the pattern Kimball designed for dimension tables whose attributes change over time — customer addresses, product prices, employee titles. The seven SCD types, numbered 0 through 6, define how to handle that change, from overwriting the old value (Type 1) to versioning history (Type 2) to hybrid approaches that preserve both current and historical state.
Getting SCD wrong is one of the most common causes of incorrect BI numbers. This guide walks through all seven types with concrete examples, when to use each, how modern tools like dbt snapshots automate the whole pattern, and the monitoring tests you should add so silent SCD bugs never reach production dashboards.
What Are Slowly Changing Dimensions?
A dimension table stores context like customer name, region, or product category. When those attributes change — a customer moves, a product is renamed — you have to decide how to represent the change. Slowly changing dimensions is the family of techniques for handling this gracefully so historical queries still return correct results and trend reports do not silently change meaning.
The term comes from Ralph Kimball's dimensional modeling framework. Kimball originally defined Types 1, 2, and 3, and the community later added Types 0, 4, 5, and 6 for more nuanced cases. Most real warehouses rely on Type 1 or Type 2 for the vast majority of dimension tables, reserving the hybrids for attributes that need both current and historical views on the same row.
The SCD Types
| Type | Name | Behavior | Example |
|---|---|---|---|
| 0 | Retain original | Never update; original value is fixed | Date of birth |
| 1 | Overwrite | Replace old value; no history kept | Typo correction |
| 2 | Add new row | Insert new version with start/end dates | Customer address change |
| 3 | Add new attribute | Store previous value in a second column | Current vs prior region |
| 4 | History table | Current row in dim, history in separate table | Employee title |
| 5 | Mini-dimension + Type 1 outrigger | Type 4 mini-dim plus a current-value reference on the base dim | Frequently changing demographics |
| 6 | Hybrid (1+2+3) | Combine overwrite, versioning, and prior column | Complex retail hierarchies |
Type 1: Overwrite
Type 1 is the simplest — you update the row in place and lose the old value. Use it when history does not matter or when the change is a data correction (fixing a typo). The big risk is silent data distortion: historical reports that grouped by the old value will now group by the new one, which can invalidate month-over-month comparisons without anyone noticing.
Rule of thumb: if a stakeholder asks 'what was this value on this date', Type 1 is wrong. If they only care about current state, Type 1 is fine and cheaper to maintain than any other type.
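To make the tradeoff concrete, here is a minimal Type 1 sketch using Python's built-in sqlite3. The table and column names are illustrative, not from any particular warehouse — the point is that after the UPDATE, nothing in the table can tell you what the old value was:

```python
import sqlite3

# In-memory sketch of a Type 1 overwrite. Table/column names are
# illustrative. The UPDATE replaces the value in place, so no query
# can recover what was stored before.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, city TEXT)")
conn.execute("INSERT INTO dim_customer VALUES (42, 'Berln')")  # typo to correct

# Type 1 overwrite -- appropriate here because this is a data correction,
# not a real-world change anyone needs to report on historically.
conn.execute("UPDATE dim_customer SET city = 'Berlin' WHERE customer_id = 42")

city = conn.execute(
    "SELECT city FROM dim_customer WHERE customer_id = 42"
).fetchone()[0]
```

The same single-row UPDATE applied to a genuine change (a customer actually moving) is exactly the silent-distortion risk described above.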
Type 2: Add New Row
Type 2 is the workhorse. When a dimension attribute changes, you close out the existing row (set end_date) and insert a new row with a new surrogate key. Facts link to the surrogate key, so historical facts keep pointing to the old version automatically. This is how 'who was the account manager when this deal closed' queries stay correct forever, without heroic analyst effort.
Implement Type 2 with start_date, end_date, and is_current columns. dbt snapshots automate the whole pattern with a single YAML config — see dbt snapshots explained for the full walkthrough. The main catch is row explosion on frequently changing attributes; switch to Type 5 when Type 2 bloats too fast.
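A minimal in-memory sketch of the close-and-insert step, in pure Python. The function name, column names, and surrogate-key handling are illustrative — in a real warehouse this would be a MERGE statement or a dbt snapshot doing the same two moves:

```python
from datetime import date

def apply_scd2_change(dim_rows, business_key, new_attrs, change_date, next_key):
    """Close the current version of `business_key`, then insert a new one.

    dim_rows is a list of dicts standing in for a Type 2 dimension table
    with surrogate_key, business_key, start_date, end_date, is_current.
    """
    for row in dim_rows:
        if row["business_key"] == business_key and row["is_current"]:
            row["end_date"] = change_date   # close out the old version
            row["is_current"] = False
    dim_rows.append({
        "surrogate_key": next_key,          # new surrogate key per version
        "business_key": business_key,
        **new_attrs,
        "start_date": change_date,
        "end_date": None,                   # open-ended current row
        "is_current": True,
    })
    return dim_rows

# Customer 42 moves from Berlin to Munich on 2024-03-01. Facts keyed on
# surrogate_key 1 keep pointing at the Berlin version forever.
dim = [{"surrogate_key": 1, "business_key": 42, "city": "Berlin",
        "start_date": date(2023, 1, 1), "end_date": None, "is_current": True}]
apply_scd2_change(dim, 42, {"city": "Munich"}, date(2024, 3, 1), next_key=2)
```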
Type 3, 4, 5, and 6
Type 3 adds a prior_value column instead of versioning rows. It works when you only care about the previous state and want to avoid the row explosion of Type 2 — useful for 'current region' and 'previous region' side-by-side analysis. Types 4, 5, and 6 are hybrids for edge cases.
Type 4 splits current state (main dim) from history (history table), which keeps current queries fast while preserving full audit. Type 5 uses a mini-dimension for rapidly changing attributes (demographics, loyalty tier) that would explode a Type 2 dim. Type 6 combines Types 1, 2, and 3 on the same row for maximum flexibility at maximum complexity — use only when you genuinely need current, historical, and prior-state columns together.
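A tiny sketch of the Type 3 shift-and-overwrite, with illustrative column names. Note what it deliberately does not do: keep more than one prior value.

```python
# Type 3 keeps exactly one level of history in a second column.
# Column names (region / prior_region) are illustrative.
row = {"customer_id": 7, "region": "EMEA", "prior_region": None}

def scd3_update(row, new_region):
    # Shift current into prior, then overwrite current. A second change
    # discards the oldest value -- that is the tradeoff versus Type 2's
    # full version history.
    row["prior_region"] = row["region"]
    row["region"] = new_region

scd3_update(row, "APAC")   # EMEA moves into prior_region
```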
Automating SCD With Modern Tools
Modern transformation tools handle SCD almost automatically. dbt snapshots implement Type 2 with one config block. SQLMesh supports snapshots and forward-only changes. Data Vault's satellite pattern bakes Type 2 into the modeling paradigm itself. Pick whichever fits your stack and let the tool do the tracking — hand-rolled merge logic is a bug farm that eventually catches up with every team.
Monitoring SCDs in Production
- Row count trend — Type 2 tables should grow monotonically; sudden drops mean a full refresh wiped history
- Current row test — exactly one is_current=true row per business key
- Gap test — no overlapping valid_from/valid_to ranges
- Orphan fact test — every fact row joins to a dim row in its effective date range
- Growth rate alert — flag when Type 2 row count spikes 10x (usually a bug)
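The current-row and gap tests translate directly to SQL. A sketch against an in-memory SQLite table, assuming business_key/valid_from/valid_to/is_current column names (adapt to whatever your dim actually uses):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE dim_customer (
    surrogate_key INTEGER, business_key INTEGER,
    valid_from TEXT, valid_to TEXT, is_current INTEGER)""")
# Two well-formed versions of business key 42: ranges chain, one current row.
conn.executemany("INSERT INTO dim_customer VALUES (?,?,?,?,?)", [
    (1, 42, "2023-01-01", "2024-03-01", 0),
    (2, 42, "2024-03-01", None,         1),
])

# Current row test: each business key must have exactly one current row.
dupes = conn.execute("""
    SELECT business_key FROM dim_customer
    WHERE is_current = 1
    GROUP BY business_key HAVING COUNT(*) <> 1""").fetchall()

# Gap/overlap test: no two versions of a key may overlap in time.
# Open-ended rows (valid_to IS NULL) are treated as extending to a
# far-future sentinel date.
overlaps = conn.execute("""
    SELECT DISTINCT a.business_key FROM dim_customer a
    JOIN dim_customer b
      ON a.business_key = b.business_key
     AND a.surrogate_key <> b.surrogate_key
     AND a.valid_from < COALESCE(b.valid_to, '9999-12-31')
     AND b.valid_from < COALESCE(a.valid_to, '9999-12-31')""").fetchall()
```

Both queries should return zero rows on a healthy Type 2 table; schedule them as dbt tests or equivalent so violations fail the build instead of reaching dashboards.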
SCDs in a Data Vault World
Data Vault's satellite pattern is essentially SCD Type 2 built into the modeling methodology. If your warehouse uses Data Vault for the integration layer, you get Type 2 history automatically in every satellite. Downstream Kimball marts can then apply Type 1 overwrites where current state is all you need, without losing the audit trail — Vault keeps the history, marts stay fast.
This layered pattern is the cleanest answer to the 'which SCD type should I use' question in practice. The raw vault answers 'what happened when', and the marts answer 'what is true now'. Analysts working in the marts get simple Type 1 tables; auditors working in the vault get full history. Both are correct for their workload.
Agents and SCDs
Data Workers' catalog and migration agents can detect when a new dimension attribute needs history and suggest the right SCD type automatically. See autonomous data engineering or book a demo to see agent-managed SCDs in action.
Slowly changing dimensions are how dimensional warehouses stay honest about the past. Use Type 1 for corrections, Type 2 for meaningful history, Type 5 for rapid change, and automate all of them with dbt snapshots or equivalents so your analysts never have to write merge logic by hand.
Related Resources
- Slowly Changing Dimensions Types: Types 0 Through 6 — Comprehensive walkthrough of SCD types 0 through 6 with examples, tradeoffs, and testing patterns.
- Data Quality Dimensions: The DAMA Framework Explained — Guide to the six DAMA data quality dimensions, how to measure each, and how autonomous agents automate the scoring.
- Why AI Agents Need MCP Servers for Data Engineering — MCP servers give AI agents structured access to your data tools — Snowflake, BigQuery, dbt, Airflow, and more. Here is why MCP is the int…
- The Complete Guide to Agentic Data Engineering with MCP — Agentic data engineering replaces manual pipeline management with autonomous AI agents. Here is how to implement it with MCP — without lo…
- How AI Agents Cut Snowflake Costs by 40% Without Manual Tuning — Most Snowflake environments waste 30-40% of compute on zombie tables, oversized warehouses, and unoptimized queries. AI agents find and f…
- RBAC for Data Engineering Teams: Why Manual Access Control Doesn't Scale — Manual RBAC breaks down at 50+ data assets. Policy drift, orphaned permissions, and PII exposure become inevitable. AI agents enforce gov…
- From Alert to Resolution in Minutes: How AI Agents Debug Data Pipeline Incidents — The average data pipeline incident takes 4-8 hours to resolve. AI agents that understand your full data context can auto-diagnose and res…
- Build Data Pipelines with AI: From Description to Deployment in Minutes — Building a data pipeline still takes 2-6 weeks of engineering time. AI agents that understand your data context can generate, test, and d…
- Why Your Data Catalog Is Always Out of Date (And How AI Agents Fix It) — 40-60% of data catalog entries are outdated at any given time. AI agents that continuously scan, classify, and update metadata make the s…
- MLOps in 2026: Why Teams Are Moving from Tools to AI Agents — The average ML team uses 5-7 MLOps tools. AI agents that manage the full ML lifecycle — from experiment tracking to model deployment — ar…
- Why Text-to-SQL Accuracy Drops from 85% to 20% in Production (And How to Fix It) — Text-to-SQL tools score 85% on benchmarks but drop to 10-20% accuracy on real enterprise schemas. The fix is not better models — it is a…
- Data Migration Automation: How AI Agents Reduce 18-Month Timelines to Weeks — Enterprise data migrations take 6-18 months because schema mapping, data validation, and downtime coordination are manual. AI agents comp…