
Data Engineering with dbt: The Modern Workflow


Written by the Data Workers team: 14 autonomous agents shipping production data infrastructure since 2026.

Technically reviewed by the Data Workers engineering team.


Data engineering with dbt means writing SQL transformations as version-controlled models with tests, documentation, and lineage — run against any modern cloud warehouse. dbt transformed analytics engineering by making SQL models look like software engineering: pull requests, CI, testing, and observability are all first-class.

dbt is the most important data engineering tool of the last decade. This guide walks through what dbt does, how it fits into a modern stack, and the patterns that work in production teams.

What dbt Does

dbt (data build tool) compiles a project of SQL files into a DAG of models, runs them against your warehouse, and runs tests on the results. Every model is a SELECT statement that dbt wraps in CREATE TABLE or CREATE VIEW. Jinja templating adds variables, macros, and reusable logic. The whole project lives in git.
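For concreteness, here is a minimal sketch of a model file. The model and column names (`fct_orders`, `stg_orders`) are hypothetical:

```sql
-- models/marts/fct_orders.sql
-- A model is just a SELECT; dbt wraps it in CREATE TABLE or CREATE VIEW.
{{ config(materialized='table') }}

select
    order_id,
    customer_id,
    order_total
from {{ ref('stg_orders') }}  -- ref() resolves to the upstream model and records lineage
```

Because every model declares its upstream dependencies through `ref()`, dbt can infer the DAG and run models in the correct order.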

Crucially, dbt pushes transformation logic into the warehouse rather than pulling data into a separate processing engine. This ELT-over-ETL approach takes full advantage of cloud warehouse scale and avoids the operational complexity of managing Spark clusters or Python environments for every transformation. The simplicity is why dbt spread so quickly across the analytics engineering community.

| Feature   | What It Provides                     |
|-----------|--------------------------------------|
| Models    | SQL + CREATE TABLE/VIEW wrappers     |
| Tests     | Schema and data quality assertions   |
| Docs      | Auto-generated docs site with lineage|
| Macros    | Reusable SQL snippets                |
| Packages  | Community libraries like dbt_utils   |
| Snapshots | SCD Type 2 automation                |
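Snapshots deserve a closer look, since "SCD Type 2 automation" is terse: dbt captures row-level history by comparing each run against the previous state. A hedged sketch, with table and column names assumed:

```sql
-- snapshots/customers_snapshot.sql
{% snapshot customers_snapshot %}
{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',     -- detect changes via an updated_at column
        updated_at='updated_at'
    )
}}
select * from {{ source('app', 'customers') }}
{% endsnapshot %}
```

Each `dbt snapshot` run inserts new versions of changed rows and closes out old ones with validity timestamps, giving you Type 2 history without hand-written merge logic.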

Why dbt Took Over

Before dbt, SQL transformations lived in stored procedures, Airflow operators, and ad-hoc scripts. They were hard to test, review, or reason about. dbt turned transformations into version-controlled code with tests and documentation — suddenly analytics engineers could use the same workflow as software engineers. That shift made the modern analytics engineering discipline possible.

dbt also benefitted from the rise of cloud warehouses. Snowflake, BigQuery, and Redshift made it trivial to spin up isolated environments, pay per query, and handle scale automatically. dbt took advantage of all three — isolated schemas for dev, pay-per-use CI runs, and automatic scaling for production models. It rode the cloud-warehouse wave and became the default SQL transformation tool.

The dbt Project Structure

  • Staging models — light cleanup of raw source data
  • Intermediate models — joins and business logic
  • Mart models — curated facts and dimensions
  • Tests — schema and data quality per model
  • Docs — descriptions for every model and column
  • Sources — freshness and documentation for raw tables

The staging, intermediate, and mart layering is the single most important dbt convention. Teams that skip it end up with sprawling models that reference raw tables directly, making refactors dangerous and onboarding slow. Even small projects benefit from the layering because it forces a clear separation between raw data, internal transformations, and consumer-facing marts.
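A staging model is typically the only layer that references raw sources via `source()`. A sketch of the convention, with names assumed:

```sql
-- models/staging/stg_orders.sql
-- Staging: rename, cast, light cleanup. No joins, no business logic.
select
    id as order_id,
    customer as customer_id,
    cast(total as numeric) as order_total,
    created_at
from {{ source('app', 'orders') }}
```

Downstream intermediate and mart models then reference `stg_orders` via `ref()`, never the raw table, so a raw-schema change only requires fixing one file.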

dbt Core vs dbt Cloud

dbt Core is open source and free — run it from the CLI or any scheduler. dbt Cloud is the managed SaaS from dbt Labs: IDE, scheduling, CI, lineage UI, and semantic layer. Teams that want zero ops go with Cloud; teams that already have scheduling infrastructure run Core alongside Airflow or GitHub Actions.

The pricing gap matters as projects grow. Large dbt Cloud deployments can cost six figures per year, which pushes mature teams toward Core on Airflow or Dagster for new projects. Smaller teams benefit from dbt Cloud's IDE and one-click deploys until they outgrow them. Neither choice is permanent — migrating between Core and Cloud is straightforward compared to rewriting all your models.
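Running Core from an existing scheduler can be as simple as a CI workflow. A minimal GitHub Actions sketch — the adapter, target name, and secret names are assumptions you would adapt to your warehouse:

```yaml
# .github/workflows/dbt_ci.yml
name: dbt CI
on: pull_request
jobs:
  dbt:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-snowflake   # swap in the adapter for your warehouse
      - run: dbt deps                    # install packages like dbt_utils
      - run: dbt build --target ci       # run models and tests in an isolated CI schema
        env:
          SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
```

`dbt build` runs models, tests, snapshots, and seeds in DAG order, so a single command gives you the full pay-per-use CI run described above.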

dbt Best Practices

Use staging models as the boundary between raw and internal logic. Name models with layer prefixes (stg_, int_, fct_, dim_). Write tests for every mart table. Use ref(), never hardcoded table names. Review every PR. Use environment-specific profiles. These patterns are boring, but they are what separate reliable dbt projects from chaotic ones.
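Tests live in YAML alongside the models. A sketch of a mart's schema file — model and column names are carried over from the hypothetical example above:

```yaml
# models/marts/schema.yml
version: 2
models:
  - name: fct_orders
    description: One row per order.
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - not_null
          - relationships:          # every order must point at a known customer
              to: ref('dim_customers')
              field: customer_id
```

`dbt test` (or `dbt build`) compiles each assertion into a query and fails the run if any rows violate it.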

Implementation Roadmap

A pragmatic dbt adoption starts with a single analytics engineer porting one mart model from an existing stored procedure or notebook. Prove the test and CI workflow on that one model, then expand. Organizations that try to rewrite everything at once stall for months. Incremental adoption, with the old and new paths running in parallel, is the proven path.

Common Pitfalls

Common dbt pitfalls are too many incremental models (hard to reason about and backfill), macro abuse (code becomes unreadable), ignoring test failures in CI, and letting the DAG grow without domain boundaries. Establish domain folders early, keep macros simple, and treat red tests the same way you treat red builds — not optional, not deferred.
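Incremental models are worth their complexity only when a table is too large to rebuild on every run. A hedged sketch of the standard pattern, with names assumed:

```sql
-- models/marts/fct_events.sql
{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    user_id,
    event_ts
from {{ ref('stg_events') }}
{% if is_incremental() %}
  -- On incremental runs, only process rows newer than what is already loaded.
  where event_ts > (select max(event_ts) from {{ this }})
{% endif %}
```

The `is_incremental()` branch is why these models are hard to reason about: the model behaves differently on full refreshes versus incremental runs, and backfills require `dbt run --full-refresh`. Reach for this pattern sparingly.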

Real-World Examples

Companies like JetBlue, HubSpot, and GitLab have publicly shared dbt implementations with thousands of models and hundreds of contributors. The common thread in successful deployments is strong conventions, disciplined PR review, and automated test execution in CI. Size is not the challenge — drift from agreed conventions is.

Smaller teams get outsized value from dbt because one analytics engineer can own a production data stack without a dedicated platform team. A 200-model dbt project with tests and CI is comfortably maintained by two analytics engineers, while the same scope in stored procedures would need an ops team. This multiplier effect is why dbt adoption spreads fastest in fast-moving startups.

ROI Considerations

dbt ROI compounds over time as the test and documentation investment pays back. Teams commonly report substantially fewer data incidents than comparable pre-dbt workflows, faster onboarding for new analytics engineers, and better cross-team collaboration because everything lives in git. The license cost of dbt Cloud is usually small relative to the engineering time saved.

For related reading see dbt vs dataform, how to build a data pipeline, and how to design a data warehouse.

Automating dbt Operations

Data Workers pipeline agents run dbt projects autonomously — monitoring runs, diagnosing test failures, writing fix PRs, and enforcing contracts. The result is dbt operations that scale past a single analytics engineer per project. Book a demo to see autonomous dbt operations.

dbt is the standard for SQL transformations in modern warehouses. Use it for staging, intermediate, and mart models, write tests for every table, and treat your dbt project like software. The teams that invest in dbt discipline early ship faster and trust their data more.
