dbt Tests Best Practices: PKs, FKs, Severity, and CI

Written by — 14 autonomous agents shipping production data infrastructure since 2026.

Technically reviewed by the Data Workers engineering team.

dbt tests are assertions that run as SQL queries against your models and fail the build when they return any rows. Best practice is to test primary keys and foreign keys on every model, use generic tests for the cheap 80 percent, and save singular tests and dbt-expectations for complex business logic. Severity configuration lets you warn without breaking builds.

Getting dbt tests right is the difference between a trusted warehouse and one the business ignores. This guide covers the test types, where to use each, patterns that scale past 500 models, and the CI integration that turns tests into a gating signal for pull requests instead of a nightly email nobody reads.

The dbt Test Types

dbt supports two test types: generic tests (reusable, configured in YAML) and singular tests (custom SQL files). Generic tests cover the common checks — not_null, unique, accepted_values, relationships — and run against any column. Singular tests are standalone SQL queries that return failing rows.
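All four built-ins are configured the same way. A minimal schema.yml sketch (model and column names here are illustrative, not from a real project):

```yaml
# models/marts/schema.yml -- illustrative model and column names
version: 2

models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
```

Running dbt test compiles each entry into a SQL query that selects violating rows; zero rows returned means the test passes.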

Packages like dbt-utils, dbt-expectations, and dbt-meta-testing extend the generic test library with statistical, distributional, and metadata-aware checks. Most teams install dbt-utils on day one and add dbt-expectations when they outgrow the built-ins. Both are mature, well-maintained, and free.
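Installing the packages is a one-time packages.yml entry followed by dbt deps. The version ranges below are examples only; pin to whatever your dbt version supports:

```yaml
# packages.yml -- version ranges are examples, check compatibility for your dbt version
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.1.0", "<2.0.0"]
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<0.11.0"]
```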

The Non-Negotiables

Every model should have at least a primary key test (not_null + unique) and relationship tests to upstream models. Without those, downstream joins can silently double-count rows and nobody notices until an executive asks why revenue looks off. These tests take 30 seconds to write and catch the worst bugs before they reach dashboards.

  • Primary key — not_null + unique on the grain column
  • Foreign keys — relationships tests to every upstream dimension
  • Accepted values — on enum-like columns (status, region, tier)
  • Not null on required columns — the ones your dashboards assume
  • Freshness — sources.yml freshness check on every raw ingest
  • Grain test — count distinct on the presumed unique key combination
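The freshness and grain items map to configuration like the following sketch. Source, table, and column names are assumptions; the grain test uses dbt_utils.unique_combination_of_columns for composite keys:

```yaml
# models/staging/sources.yml -- illustrative source and column names
version: 2

sources:
  - name: raw_shop
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders

models:
  - name: fct_order_lines
    tests:
      # Grain check: one row per (order_id, order_line)
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - order_id
            - order_line
```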

Severity and Warnings

Not every test needs to break the build. Use severity: warn for tests you want visibility on but cannot fix today — usually historical data that fails a new rule. Use severity: error for the non-negotiables. The warn_if and error_if thresholds let you promote warnings to errors once a certain number of rows fail, which is the cleanest way to phase in new rules without breaking existing pipelines.

A common pattern: every new test starts at warn severity, runs for two weeks, and gets promoted to error only once it is clean on live data. This prevents the classic mistake of adding a test that breaks the build on day one and training your team to add --exclude flags.
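In YAML, severity, warn_if, and error_if sit under the test's config block. A sketch of the phase-in pattern (model and column names are illustrative):

```yaml
# schema.yml -- phased severity; model and column names are illustrative
version: 2

models:
  - name: dim_customers
    columns:
      - name: email
        tests:
          - not_null:
              config:
                severity: warn      # visibility only, never breaks the build
      - name: customer_id
        tests:
          - unique:
              config:
                severity: error
                warn_if: ">0"       # warn on any duplicate rows
                error_if: ">10"     # break the build only past 10 duplicates
```

Once the warn-level test has been clean for your burn-in window, promoting it is a one-line change from warn to error.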

Testing at Scale

  • Store failures — store_failures: true lets analysts see what broke without rerunning
  • Tag tests — tag critical tests so CI runs them on every PR, not just nightly
  • Use dbt-expectations — add distributional and statistical checks for KPI models
  • Run in CI — dbt build on pull requests catches test failures before merge
  • Partition tests — limit tests to recent data in CI to keep runtime under 5 minutes
  • Slim CI — state deferral runs only the tests on affected models
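The store-failures and Slim CI practices translate to roughly these commands (the artifact path is an assumption; it should point wherever your scheduler publishes the production manifest):

```sh
# Persist failing rows to the warehouse so analysts can inspect them
dbt build --store-failures

# Slim CI: run only changed models and their tests, deferring the rest to prod
dbt build --select state:modified+ --defer --state ./prod-artifacts
```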

Singular Tests for Business Logic

Generic tests cover structural checks. Singular tests cover business logic — 'no customer has more than one active subscription', 'sum of line items equals order total'. Write them as plain SQL in tests/ and they fail when they return rows. Keep them short and named for the rule they enforce, and add a comment explaining the business reason so future maintainers do not delete them thinking they are redundant.
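A singular test is just a SELECT that returns violating rows. A sketch of the one-active-subscription rule (table and column names are illustrative):

```sql
-- tests/assert_one_active_subscription_per_customer.sql
-- Business rule: billing assumes at most one active subscription per customer;
-- duplicates cause double-charging. Check with billing before deleting this test.
select
    customer_id,
    count(*) as active_subscriptions
from {{ ref('dim_subscriptions') }}
where status = 'active'
group by customer_id
having count(*) > 1
```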

CI Integration

The biggest leverage point is wiring dbt build (which runs both model builds and tests) into pull request CI. A test that fails on the PR is 100x cheaper to fix than one that fails on production data at 3am. Use slim CI with state deferral to keep PR runs under five minutes — otherwise developers route around the check.

Slim CI works by comparing the current PR branch against a production manifest and running only the models (and their tests) that actually changed. For large dbt projects this is the difference between a 45-minute CI run and a 3-minute one, which decides whether tests are a gate or a bureaucratic obstacle.
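A GitHub Actions workflow wiring this up might look like the following sketch. The adapter, artifact location, and credential handling are all assumptions to adapt to your stack:

```yaml
# .github/workflows/dbt-ci.yml -- a sketch; artifact source and adapter are assumptions
name: dbt-ci
on: pull_request

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: {python-version: "3.11"}
      - run: pip install dbt-snowflake   # swap for your warehouse adapter
      - run: dbt deps
      # Fetch the production manifest from wherever your scheduler stores it
      - run: aws s3 cp s3://my-bucket/prod-artifacts ./prod-artifacts --recursive
      # Build and test only what this PR changed, defer everything else to prod
      - run: dbt build --select state:modified+ --defer --state ./prod-artifacts
```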

Organizing Tests at Scale

Past 500 models, tests need organization. Group them by business domain using dbt selectors, tag critical models with 'tier:1' so the scheduler can run them more often, and separate fast structural tests from slow statistical tests. Run tier:1 tests on every PR; run statistical tests nightly. This keeps PR runtime manageable while still catching distributional drift.
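Named selectors make the tier split explicit. A selectors.yml sketch (selector and tag names are illustrative):

```yaml
# selectors.yml -- selector and tag names are illustrative
selectors:
  - name: pr_fast
    description: Fast structural tests on critical models, run on every PR
    definition:
      method: tag
      value: tier:1
  - name: nightly_statistical
    description: Slow distributional checks, run nightly
    definition:
      method: tag
      value: statistical
```

CI then invokes dbt build --selector pr_fast while the nightly job uses --selector nightly_statistical.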

Documentation matters too. Every test should have a short description explaining what business rule it enforces. Future maintainers who cannot tell why a test exists will delete it the first time it fires — preserving the context is the only way to keep coverage from eroding over years. Consider a template like 'enforces that {rule} because {business reason}' for every custom test.

Monorepo teams should also think about ownership. Tag tests with the owning team so failure alerts route to the right Slack channel instead of a firehose. Dagster and Elementary both support owner-based routing; dbt Cloud's jobs UI supports it via notifications. Pick whichever matches your orchestrator.
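Ownership usually lives in the model's meta block. The exact keys each tool reads differ, so the names below are illustrative; check your alerting tool's documentation for the keys it expects:

```yaml
# schema.yml -- owner routing sketch; meta keys, team, and channel are assumptions
version: 2

models:
  - name: fct_payments
    meta:
      owner: "@payments-team"
      alert_channel: "#payments-data-alerts"
```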

What dbt Tests Cannot Do

dbt tests are assertions, not monitors. They tell you a rule broke during a build; they do not tell you when data behavior drifts over time. For anomaly detection, distribution tracking, and freshness trends, you need a complementary observability layer. Elementary is the OSS choice that plugs into dbt artifacts directly; Monte Carlo and Bigeye are the SaaS options for larger teams with bigger budgets and stricter SLAs.

Beyond dbt Tests

Once you outgrow what dbt tests can express, layer on Great Expectations or Soda for distributional checks, anomaly detection, and richer alerting. Data Workers' quality agent can even suggest new dbt tests from profiling data — see autonomous data engineering.

Agents can also auto-tag tests, promote warnings to errors based on trends, and file tickets when a test starts failing consistently. Book a demo to see the full loop in action.

dbt tests are the first line of defense against bad data. Start with PKs and FKs on every model, escalate severity intentionally, wire tests into CI, and automate failure routing — your warehouse and your stakeholders will both thank you.

See Data Workers in action

15 autonomous AI agents working across your entire data stack. MCP-native, open-source, deployed in minutes.

Book a Demo
