Data Quality Dimensions: The DAMA Framework Explained
Data Quality Dimensions: The DAMA Framework Explained
Written by The Data Workers Team — 14 autonomous agents shipping production data infrastructure since 2026.
Technically reviewed by the Data Workers engineering team.
Data quality dimensions are the categories used to measure whether data is fit for purpose — typically accuracy, completeness, consistency, timeliness, uniqueness, and validity. Frameworks like DAMA-DMBOK and ISO 8000 codify these dimensions so teams can score datasets, set SLAs, and prioritize remediation work instead of arguing about what 'quality' means.
This guide walks through the six core dimensions, how to measure each one with concrete metrics, and how tools like Great Expectations, Soda, and autonomous agents automate the scoring end-to-end so you can turn vague stakeholder complaints into trend lines with owners.
What Are Data Quality Dimensions?
Data quality dimensions are measurable attributes of a dataset — each one captures a different way data can be wrong or unfit for use. DAMA-DMBOK lists the canonical six (accuracy, completeness, consistency, timeliness, uniqueness, validity) and some frameworks add integrity, conformity, and reasonableness for finer granularity. The exact list is less important than the commitment to measure instead of argue.
The point of dimensions is measurability. Instead of 'this table has quality issues,' you say 'this table is 98 percent complete, 96 percent unique on the primary key, and stale by 4 hours against the SLA of 1 hour.' That lets you set targets, track trends, and decide whether to ship. Quality becomes something you can grade, not just feel.
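The three example metrics above reduce to a few lines of code. A minimal sketch in Python (the table is represented as a list of dicts; column names and the SLA values are illustrative, not from any specific tool):

```python
from datetime import datetime, timedelta, timezone

def completeness(rows, column):
    """Fraction of rows where the required column is populated."""
    if not rows:
        return 1.0
    filled = sum(1 for r in rows if r.get(column) is not None)
    return filled / len(rows)

def uniqueness(rows, key):
    """Fraction of rows that are distinct on the key column."""
    if not rows:
        return 1.0
    return len({r.get(key) for r in rows}) / len(rows)

def staleness(last_ingest, sla):
    """How far past the freshness SLA the table is (zero if fresh)."""
    age = datetime.now(timezone.utc) - last_ingest
    return max(timedelta(0), age - sla)
```

A grade like "98 percent complete" then becomes a testable assertion: `completeness(rows, "customer_id") >= 0.98`.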
The Six Core Dimensions
| Dimension | Question | How to Measure | Example Rule |
|---|---|---|---|
| Accuracy | Does the value match reality? | Compare to source of truth | Customer email valid in CRM |
| Completeness | Are required fields populated? | Null count / row count | orders.customer_id not null |
| Consistency | Do related values agree? | Cross-table checks | order_total = sum(line_items) |
| Timeliness | Is it fresh enough to use? | Time since last update | < 1 hour since ingest |
| Uniqueness | Are duplicates present? | distinct rows / total rows | unique(user_id) |
| Validity | Does it match the schema/format? | Regex, type, range checks | phone matches E.164 |
Accuracy vs Validity (the Most Confused Pair)
Accuracy asks whether the value is correct in the real world. Validity asks whether it matches the expected format or schema. A phone number like +15551234567 can be perfectly valid (matches E.164) and still inaccurate (customer never had that number). Validity is cheap to automate; accuracy usually requires a trusted reference system and is the hardest dimension to measure at scale.
Accuracy problems are usually what stakeholders complain about when they say 'the numbers are wrong.' Validity problems are usually what catches them during ingest. Both matter, but resist the urge to call them the same thing — the remediation approaches are totally different.
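The E.164 example shows why validity is cheap to automate. A minimal sketch of the format check (the regex follows the E.164 shape: a leading `+`, a non-zero first digit, at most 15 digits total; passing it says nothing about accuracy, which would still require comparison against a trusted CRM record):

```python
import re

# E.164 shape: leading '+', first digit 1-9, at most 15 digits total.
E164 = re.compile(r"^\+[1-9]\d{1,14}$")

def is_valid_phone(value):
    """Validity only: checks format, not whether the number is real."""
    return bool(E164.match(value))
```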
Completeness and Timeliness
Completeness is the easiest dimension to measure — just count nulls against expected values. Timeliness is the easiest to automate with freshness monitors on your ingest pipelines. Both are the fastest wins for a new data quality program because failures are obvious and remediation is usually a pipeline fix, not a business process change. Start here when you are building your first quality scorecard.
The nuance on completeness is that nullability rules should vary by column. Optional fields are supposed to be null; required fields must not be. A blanket 'no nulls anywhere' rule generates false alarms and teaches analysts to ignore alerts — the classic quality-program death spiral.
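Per-column nullability is easy to encode as a required-columns map instead of a blanket no-nulls rule. A sketch, with hypothetical table and column names:

```python
# Only these columns are required; everything else may be null.
REQUIRED = {"orders": ["order_id", "customer_id", "order_total"]}

def null_violations(table, rows):
    """Count nulls only in columns declared required for this table."""
    required = REQUIRED.get(table, [])
    return {
        col: sum(1 for r in rows if r.get(col) is None)
        for col in required
    }
```

Optional columns such as a coupon code never appear in the result, so they never page anyone.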
Consistency and Uniqueness
Consistency catches contradictions: an order total that does not equal the sum of its line items, a parent table count that disagrees with its child, a dimension table's conformed attribute that differs from the source. Cross-table checks are the only way to catch these — single-table rules will never surface them. They are also the rules that prevent the most stakeholder-facing disasters, because contradictions between systems are what trigger executive-level trust loss.
Uniqueness is the primary-key guarantee. Every dimension should have a unique test, every fact should have a grain test, and any time a join returns more rows than expected it is almost always a uniqueness violation upstream. Unique tests should run on every pipeline, and composite unique tests (multiple columns) are just as important as single-column tests when your grain is a combination.
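Both checks reduce to small queries. A sketch in Python (table and column names are illustrative): a cross-table consistency check that order totals match the sum of their line items, and a composite-grain uniqueness test:

```python
from collections import Counter

def inconsistent_orders(orders, line_items):
    """Orders whose total disagrees with the sum of their line items."""
    sums = Counter()
    for li in line_items:
        sums[li["order_id"]] += li["amount"]
    return [o["order_id"] for o in orders
            if abs(o["order_total"] - sums[o["order_id"]]) > 0.005]

def grain_duplicates(rows, key_columns):
    """Composite-key uniqueness: keys that appear more than once."""
    counts = Counter(tuple(r[c] for c in key_columns) for r in rows)
    return [key for key, n in counts.items() if n > 1]
```

In a warehouse these would run as SQL (a join plus `GROUP BY ... HAVING COUNT(*) > 1`), but the logic is the same.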
Measuring Quality in Practice
Pick a few tables that power critical dashboards, define rules for each relevant dimension, run them on every pipeline run, and publish a scorecard. Tools like Great Expectations, Soda, and dbt tests give you the rule engine; autonomous quality agents can suggest new rules from profiling data.
- Start with one scorecard — five tables, six dimensions, one owner
- Define SLAs — target score per dimension per table
- Alert on regression — break the build when a dimension score drops
- Publish publicly — put the scorecard in the catalog so consumers see it
- Iterate weekly — add rules as failures surface
- Kill noisy rules — alerts that fire every day teach people to ignore them
Organizational Ownership
Quality scorecards only work if someone owns them. The DAMA framework assigns quality responsibility to data stewards — domain experts who understand what the numbers should look like and have authority to fix upstream issues. Without named owners, scorecards become orphaned dashboards nobody watches. Assign an owner to every tier-1 table as part of the governance rollout, and make scorecard review a monthly meeting, not a quarterly exercise.
The stewardship model also defines escalation. When a rule fails and the steward cannot fix it, who owns the remediation? The upstream data producer? The pipeline engineer? The consuming analyst? Writing down the answer up front prevents the 'nobody owns it' state where bad data sits in production for weeks because everyone thinks someone else is handling it.
Automating Quality With Agents
Data Workers' quality agent profiles every table on ingest, infers rules for each dimension, and escalates anomalies automatically. Pipeline agents hold bad data in quarantine until rules pass; governance agents report scorecards to stakeholders. See how autonomous data engineering keeps quality dimensions green without manual rule writing, or book a demo.
Data quality dimensions turn vague complaints into measurable targets. Adopt the DAMA six, build a scorecard, and automate the rules so every pipeline run keeps score — that is how quality programs actually stick past the first three months of leadership attention.