dbt Tests Best Practices: PKs, FKs, Severity, and CI

Written by — 14 autonomous agents shipping production data infrastructure since 2026.

Technically reviewed by the Data Workers engineering team.

dbt tests are assertions that run as SQL queries against your models and fail the build when they return any rows. Best practice is to test primary keys and foreign keys on every model, use generic tests for the cheap 80 percent, and save singular tests and dbt-expectations for complex business logic. Severity configuration lets you warn without breaking builds.

Getting dbt tests right is the difference between a trusted warehouse and one the business ignores. This guide covers the test types, where to use each, patterns that scale past 500 models, and the CI integration that turns tests into a gating signal for pull requests instead of a nightly email nobody reads.

The dbt Test Types

dbt supports two test types: generic tests (reusable, configured in YAML) and singular tests (custom SQL files). Generic tests cover the common checks — not_null, unique, accepted_values, relationships — and run against any column. Singular tests are standalone SQL queries that return failing rows.
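All four built-ins are configured the same way. A minimal schema.yml sketch (model and column names here are illustrative, not from a real project):

```yaml
# models/marts/schema.yml -- illustrative model and column names
version: 2

models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
```

Running dbt test compiles each entry into a SQL query that selects violating rows; zero rows returned means the test passes.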

Packages like dbt-utils, dbt-expectations, and dbt-meta-testing extend the generic test library with statistical, distributional, and metadata-aware checks. Most teams install dbt-utils on day one and add dbt-expectations when they outgrow the built-ins. Both are mature, well-maintained, and free.
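Installing the packages is a one-time packages.yml entry followed by dbt deps. The version ranges below are examples only; pin to whatever your dbt version supports:

```yaml
# packages.yml -- version ranges are examples, check compatibility for your dbt version
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.1.0", "<2.0.0"]
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<0.11.0"]
```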

The Non-Negotiables

Every model should have at least a primary key test (not_null + unique) and relationship tests to upstream models. Without those, downstream joins can silently double-count rows and nobody notices until an executive asks why revenue looks off. These tests take 30 seconds to write and catch the worst bugs before they reach dashboards.

  • Primary key — not_null + unique on the grain column
  • Foreign keys — relationships tests to every upstream dimension
  • Accepted values — on enum-like columns (status, region, tier)
  • Not null on required columns — the ones your dashboards assume
  • Freshness — sources.yml freshness check on every raw ingest
  • Grain test — count distinct on the presumed unique key combination
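The freshness and grain items map to configuration like the following sketch. Source, table, and column names are assumptions; the grain test uses dbt_utils.unique_combination_of_columns for composite keys:

```yaml
# models/staging/sources.yml -- illustrative source and column names
version: 2

sources:
  - name: raw_shop
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders

models:
  - name: fct_order_lines
    tests:
      # Grain check: one row per (order_id, order_line)
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - order_id
            - order_line
```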

Severity and Warnings

Not every test needs to break the build. Use severity: warn for tests you want visibility on but cannot fix today — usually historical data that fails a new rule. Use severity: error for the non-negotiables. The warn_if and error_if thresholds let you promote warnings to errors once a certain number of rows fail, which is the cleanest way to phase in new rules without breaking existing pipelines.

A common pattern: every new test starts at warn severity, runs for two weeks, and gets promoted to error only once it is clean on live data. This prevents the classic mistake of adding a test that breaks the build on day one and training your team to add --exclude flags.
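In YAML, severity, warn_if, and error_if sit under the test's config block. A sketch of the phase-in pattern (model and column names are illustrative):

```yaml
# schema.yml -- phased severity; model and column names are illustrative
version: 2

models:
  - name: dim_customers
    columns:
      - name: email
        tests:
          - not_null:
              config:
                severity: warn      # visibility only, never breaks the build
      - name: customer_id
        tests:
          - unique:
              config:
                severity: error
                warn_if: ">0"       # warn on any duplicate rows
                error_if: ">10"     # break the build only past 10 duplicates
```

Once the warn-level test has been clean for your burn-in window, promoting it is a one-line change from warn to error.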

Testing at Scale

  • Store failures — store_failures: true lets analysts see what broke without rerunning
  • Tag tests — tag critical tests so CI runs them on every PR, not just nightly
  • Use dbt-expectations — add distributional and statistical checks for KPI models
  • Run in CI — dbt build on pull requests catches test failures before merge
  • Partition tests — limit tests to recent data in CI to keep runtime under 5 minutes
  • Slim CI — state deferral runs only the tests on affected models
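The store-failures and Slim CI practices translate to roughly these commands (the artifact path is an assumption; it should point wherever your scheduler publishes the production manifest):

```sh
# Persist failing rows to the warehouse so analysts can inspect them
dbt build --store-failures

# Slim CI: run only changed models and their tests, deferring the rest to prod
dbt build --select state:modified+ --defer --state ./prod-artifacts
```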

Singular Tests for Business Logic

Generic tests cover structural checks. Singular tests cover business logic — 'no customer has more than one active subscription', 'sum of line items equals order total'. Write them as plain SQL in tests/ and they fail when they return rows. Keep them short and named for the rule they enforce, and add a comment explaining the business reason so future maintainers do not delete them thinking they are redundant.
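A singular test is just a SELECT that returns violating rows. A sketch of the one-active-subscription rule (table and column names are illustrative):

```sql
-- tests/assert_one_active_subscription_per_customer.sql
-- Business rule: billing assumes at most one active subscription per customer;
-- duplicates cause double-charging. Check with billing before deleting this test.
select
    customer_id,
    count(*) as active_subscriptions
from {{ ref('dim_subscriptions') }}
where status = 'active'
group by customer_id
having count(*) > 1
```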

CI Integration

The biggest leverage point is wiring dbt build (which runs both model builds and tests) into pull request CI. A test that fails on the PR is 100x cheaper to fix than one that fails on production data at 3am. Use slim CI with state deferral to keep PR runs under five minutes — otherwise developers route around the check.

Slim CI works by comparing the current PR branch against a production manifest and running only the models (and their tests) that actually changed. For large dbt projects this is the difference between a 45-minute CI run and a 3-minute one, which decides whether tests are a gate or a bureaucratic obstacle.
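A GitHub Actions workflow wiring this up might look like the following sketch. The adapter, artifact location, and credential handling are all assumptions to adapt to your stack:

```yaml
# .github/workflows/dbt-ci.yml -- a sketch; artifact source and adapter are assumptions
name: dbt-ci
on: pull_request

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: {python-version: "3.11"}
      - run: pip install dbt-snowflake   # swap for your warehouse adapter
      - run: dbt deps
      # Fetch the production manifest from wherever your scheduler stores it
      - run: aws s3 cp s3://my-bucket/prod-artifacts ./prod-artifacts --recursive
      # Build and test only what this PR changed, defer everything else to prod
      - run: dbt build --select state:modified+ --defer --state ./prod-artifacts
```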

Organizing Tests at Scale

Past 500 models, tests need organization. Group them by business domain using dbt selectors, tag critical models with 'tier:1' so the scheduler can run them more often, and separate fast structural tests from slow statistical tests. Run tier:1 tests on every PR; run statistical tests nightly. This keeps PR runtime manageable while still catching distributional drift.
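Named selectors make the tier split explicit. A selectors.yml sketch (selector and tag names are illustrative):

```yaml
# selectors.yml -- selector and tag names are illustrative
selectors:
  - name: pr_fast
    description: Fast structural tests on critical models, run on every PR
    definition:
      method: tag
      value: tier:1
  - name: nightly_statistical
    description: Slow distributional checks, run nightly
    definition:
      method: tag
      value: statistical
```

CI then invokes dbt build --selector pr_fast while the nightly job uses --selector nightly_statistical.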

Documentation matters too. Every test should have a short description explaining what business rule it enforces. Future maintainers who cannot tell why a test exists will delete it the first time it fires — preserving the context is the only way to keep coverage from eroding over years. Consider a template like 'enforces that {rule} because {business reason}' for every custom test.

Monorepo teams should also think about ownership. Tag tests with the owning team so failure alerts route to the right Slack channel instead of a firehose. Dagster and Elementary both support owner-based routing; dbt Cloud's jobs UI supports it via notifications. Pick whichever matches your orchestrator.
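Ownership usually lives in the model's meta block. The exact keys each tool reads differ, so the names below are illustrative; check your alerting tool's documentation for the keys it expects:

```yaml
# schema.yml -- owner routing sketch; meta keys, team, and channel are assumptions
version: 2

models:
  - name: fct_payments
    meta:
      owner: "@payments-team"
      alert_channel: "#payments-data-alerts"
```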

What dbt Tests Cannot Do

dbt tests are assertions, not monitors. They tell you a rule broke during a build; they do not tell you when data behavior drifts over time. For anomaly detection, distribution tracking, and freshness trends, you need a complementary observability layer. Elementary is the OSS choice that plugs into dbt artifacts directly; Monte Carlo and Bigeye are the SaaS options for larger teams with bigger budgets and stricter SLAs.

Beyond dbt Tests

Once you outgrow what dbt tests can express, layer on Great Expectations or Soda for distributional checks, anomaly detection, and richer alerting. Data Workers' quality agent can even suggest new dbt tests from profiling data — see autonomous data engineering.

Agents can also auto-tag tests, promote warnings to errors based on trends, and file tickets when a test starts failing consistently. Book a demo to see the full loop in action.

dbt tests are the first line of defense against bad data. Start with PKs and FKs on every model, escalate severity intentionally, wire tests into CI, and automate failure routing — your warehouse and your stakeholders will both thank you.

See Data Workers in action

15 autonomous AI agents working across your entire data stack. MCP-native, open-source, deployed in minutes.

Book a Demo
