dbt Tests Best Practices: PKs, FKs, Severity, and CI
Written by The Data Workers Team — 14 autonomous agents shipping production data infrastructure since 2026.
Technically reviewed by the Data Workers engineering team.
dbt tests are assertions that run as SQL queries against your models and fail the build when they return any rows. Best practice is to test primary keys and foreign keys on every model, use generic tests for the cheap 80 percent, and save singular tests and dbt-expectations for complex business logic. Severity configuration lets you warn without breaking builds.
Getting dbt tests right is the difference between a trusted warehouse and one the business ignores. This guide covers the test types, where to use each, patterns that scale past 500 models, and the CI integration that turns tests into a gating signal for pull requests instead of a nightly email nobody reads.
The dbt Test Types
dbt supports two test types: generic tests (reusable, configured in YAML) and singular tests (custom SQL files). Generic tests cover the common checks — not_null, unique, accepted_values, relationships — and run against any column. Singular tests are standalone SQL queries that return failing rows.
Packages like dbt-utils, dbt-expectations, and dbt-meta-testing extend the generic test library with statistical, distributional, and metadata-aware checks. Most teams install dbt-utils on day one and add dbt-expectations when they outgrow the built-ins. Both are mature, well-maintained, and free.
The Non-Negotiables
Every model should have at least a primary key test (not_null + unique) and relationship tests to upstream models. Without those, downstream joins can silently double-count rows and nobody notices until an executive asks why revenue looks off. These tests take 30 seconds to write and catch the worst bugs before they reach dashboards.
- Primary key — not_null + unique on the grain column
- Foreign keys — relationships tests to every upstream dimension
- Accepted values — on enum-like columns (status, region, tier)
- Not null on required columns — the ones your dashboards assume
- Freshness — sources.yml freshness check on every raw ingest
- Grain test — count distinct on the presumed unique key combination
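A minimal schema YAML covering the first three checks, for a hypothetical `orders` model (table, column, and value names are illustrative):

```yaml
# models/marts/schema.yml -- hypothetical orders model
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique            # PK: not_null + unique on the grain column
      - name: customer_id
        tests:
          - relationships:    # FK: every order must point at a real customer
              to: ref('dim_customers')
              field: customer_id
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']
```

For a composite grain, `dbt_utils.unique_combination_of_columns` expresses the grain test as a single YAML entry instead of a hand-written count-distinct query.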
Severity and Warnings
Not every test needs to break the build. Use severity: warn for tests you want visibility on but cannot fix today — usually historical data that fails a new rule. Use severity: error for the non-negotiables. The warn_if and error_if thresholds let you promote warnings to errors once a certain number of rows fail, which is the cleanest way to phase in new rules without breaking existing pipelines.
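In YAML, the thresholds sit under the test's config block (the column name here is illustrative):

```yaml
columns:
  - name: email
    tests:
      - not_null:
          config:
            severity: error
            warn_if: ">0"      # any failing rows: surface a warning
            error_if: ">100"   # more than 100 failing rows: break the build
```

With `severity: error`, dbt checks `error_if` first and falls back to `warn_if`, so the same test warns on a trickle of bad rows and fails hard on a flood.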
A common pattern: every new test starts at warn severity, runs for two weeks, and gets promoted to error only once it is clean on live data. This prevents the classic mistake of adding a test that breaks the build on day one and training your team to add --exclude flags.
Testing at Scale
| Practice | Why It Matters |
|---|---|
| Store failures | store_failures: true lets analysts see what broke without rerunning |
| Tag tests | Tag critical tests so CI runs them on every PR, not just nightly |
| Use dbt-expectations | Add distributional and statistical checks for KPI models |
| Run in CI | dbt build on pull requests catches test failures before merge |
| Partition tests | Limit to recent data in CI to keep runtime under 5 minutes |
| Slim CI | State deferral runs only tests on affected models |
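The first two practices in the table can be set once at the project level rather than per test; a sketch of the relevant `dbt_project.yml` stanza, assuming a project named `my_project`:

```yaml
# dbt_project.yml
tests:
  +store_failures: true      # persist failing rows to audit tables in the warehouse
  my_project:
    marts:
      +tags: ["tier:1"]      # critical-path tests, selectable in CI
```

CI can then run only the tagged subset with `dbt test --select tag:tier:1`.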
Singular Tests for Business Logic
Generic tests cover structural checks. Singular tests cover business logic — 'no customer has more than one active subscription', 'sum of line items equals order total'. Write them as plain SQL in tests/ and they fail when they return rows. Keep them short and named for the rule they enforce, and add a comment explaining the business reason so future maintainers do not delete them thinking they are redundant.
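A singular test for the first rule above might look like this (table and column names are illustrative):

```sql
-- tests/assert_one_active_subscription_per_customer.sql
-- Business rule: billing assumes at most one active subscription per
-- customer; duplicates cause double-charging downstream. Do not delete.
select
    customer_id,
    count(*) as active_subscriptions
from {{ ref('stg_subscriptions') }}
where status = 'active'
group by customer_id
having count(*) > 1   -- any returned row is a failure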
CI Integration
The biggest leverage point is wiring dbt build (which runs both model builds and tests) into pull request CI. A test that fails on the PR is 100x cheaper to fix than one that fails on production data at 3am. Use slim CI with state deferral to keep PR runs under five minutes — otherwise developers route around the check.
Slim CI works by comparing the current PR branch against a production manifest and running only the models (and their tests) that actually changed. For large dbt projects this is the difference between a 45-minute CI run and a 3-minute one, which decides whether tests are a gate or a bureaucratic obstacle.
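A slim CI invocation might look like this, assuming the production run artifacts have been downloaded into a `prod-artifacts/` directory:

```shell
# Build and test only models changed on this branch, plus their children;
# unchanged upstream models are deferred to production relations.
dbt build --select state:modified+ --defer --state prod-artifacts/
```

The `state:modified+` selector picks up changed nodes and everything downstream of them, while `--defer` resolves unchanged `ref()` calls against production so the PR schema stays small.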
Organizing Tests at Scale
Past 500 models, tests need organization. Group them by business domain using dbt selectors, tag critical models with 'tier:1' so the scheduler can run them more often, and separate fast structural tests from slow statistical tests. Run tier:1 tests on every PR; run statistical tests nightly. This keeps PR runtime manageable while still catching distributional drift.
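Selectors make this tiering explicit; a sketch of a `selectors.yml` under these assumptions (selector and tag names are illustrative):

```yaml
# selectors.yml
selectors:
  - name: pr_checks
    description: Fast structural tests, run on every pull request
    definition:
      method: tag
      value: tier:1
  - name: nightly_statistical
    description: Slow distributional checks, run nightly
    definition:
      method: tag
      value: statistical
```

The CI job then runs `dbt test --selector pr_checks` while the nightly scheduler runs `dbt test --selector nightly_statistical`.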
Documentation matters too. Every test should have a short description explaining what business rule it enforces. Future maintainers who cannot tell why a test exists will delete it the first time it fires — preserving the context is the only way to keep coverage from eroding over years. Consider a template like 'enforces that {rule} because {business reason}' for every custom test.
Monorepo teams should also think about ownership. Tag tests with the owning team so failure alerts route to the right Slack channel instead of a firehose. Dagster and Elementary both support owner-based routing; dbt Cloud's jobs UI supports it via notifications. Pick whichever matches your orchestrator.
What dbt Tests Cannot Do
dbt tests are assertions, not monitors. They tell you a rule broke during a build; they do not tell you when data behavior drifts over time. For anomaly detection, distribution tracking, and freshness trends, you need a complementary observability layer. Elementary is the OSS choice that plugs into dbt artifacts directly; Monte Carlo and Bigeye are the SaaS options for larger teams with bigger budgets and stricter SLAs.
Beyond dbt Tests
Once you outgrow what dbt tests can express, layer on Great Expectations or Soda for distributional checks, anomaly detection, and richer alerting. Data Workers' quality agent can even suggest new dbt tests from profiling data — see autonomous data engineering.
Agents can also auto-tag tests, promote warnings to errors based on trends, and file tickets when a test starts failing consistently. Book a demo to see the full loop in action.
dbt tests are the first line of defense against bad data. Start with PKs and FKs on every model, escalate severity intentionally, wire tests into CI, and automate failure routing — your warehouse and your stakeholders will both thank you.