What Is a Data Contract? Schema + SLA as Code

Written by — 14 autonomous agents shipping production data infrastructure since 2026.

Technically reviewed by the Data Workers engineering team.

A data contract is a formal agreement between a data producer and consumer that specifies the schema, SLA, and semantics of a data product. Contracts are versioned in git, enforced in CI, and monitored in production — giving consumers confidence that upstream changes will not break their dashboards without warning.

Data contracts are among the most influential ideas in data engineering in the last five years. They address a root cause of many pipeline incidents: silent schema changes that break downstream consumers. This guide walks through what a contract is, how it works, and why every serious data stack should adopt one.

The term "data contract" gained wide currency around 2022, driven by practitioners at Convoy, GoCardless, and PayPal who were tired of the same schema-incident pattern recurring every week. The idea borrows directly from API design: service-to-service communication uses versioned API contracts, so data-to-data communication should too. Once the framing clicked, the pattern spread across the industry within about two years.

What a Contract Specifies

A data contract names the producer and consumer, lists the schema (columns, types, nullability, descriptions), sets the SLA (freshness, volume, availability), and documents the semantics (what each column means, what filters apply, what business rules hold). It is the complete interface between two teams.

Section | Example
Producer | growth-team
Consumer | finance-mart
Table | fct_orders
Schema | order_id STRING, revenue NUMERIC(18,2), order_date DATE
SLA | Refresh every 15 min, 99.5% uptime
Semantics | Revenue excludes refunds; orders from test accounts filtered
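
To make the table concrete, here is a minimal Python sketch of the same contract expressed as a typed object. The class names (Column, DataContract), the 1.0.0 version, the non-nullable default, and the reading of "refresh every 15 min" as a maximum staleness are all illustrative assumptions, not the API of any particular tool.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Column:
    name: str
    dtype: str
    nullable: bool = False  # assumption: the example table does not state nullability
    description: str = ""


@dataclass(frozen=True)
class DataContract:
    producer: str
    consumer: str
    table: str
    version: str
    columns: tuple[Column, ...]
    max_staleness: timedelta      # freshness part of the SLA
    availability_target: float    # fraction of time available, e.g. 0.995
    semantics: tuple[str, ...] = ()

    def allowed_downtime(self, window: timedelta) -> timedelta:
        """Downtime budget implied by the availability target over a window."""
        return window * (1 - self.availability_target)


# Hypothetical contract mirroring the example table above.
FCT_ORDERS = DataContract(
    producer="growth-team",
    consumer="finance-mart",
    table="fct_orders",
    version="1.0.0",  # assumed starting version
    columns=(
        Column("order_id", "STRING"),
        Column("revenue", "NUMERIC(18,2)"),
        Column("order_date", "DATE"),
    ),
    max_staleness=timedelta(minutes=15),
    availability_target=0.995,
    semantics=(
        "Revenue excludes refunds",
        "Orders from test accounts are filtered out",
    ),
)
```

Under these assumptions, a 99.5% availability target over a 30-day window leaves a downtime budget of roughly 3.6 hours, which is what FCT_ORDERS.allowed_downtime(timedelta(days=30)) computes.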

Why Contracts Exist

Before contracts, upstream teams changed schemas freely and downstream teams found out when their dashboards broke. A rename, a type change, or a dropped column cascaded into incidents. Contracts flip the relationship: producers commit to stability, and breaking changes require explicit coordination.

Teams that adopt contracts often report a sharp drop in schema firefighting within a quarter, and the reduction in on-call volume tends to show up early.

The coordination cost of a contract might sound high, but the alternative is worse. Without contracts, every schema change is an implicit coordination problem: producer ships, consumer breaks, team pages someone, everyone jumps in Slack, root cause analysis reveals nobody knew the change was coming. Contracts move that coordination forward in time, where it is cheap, instead of waiting until production, where it is expensive.

Contract Lifecycle

A contract has a full lifecycle, not just a definition. Teams that forget about later stages (evolution, retirement, enforcement drift) end up with contracts that ship once and rot. The five stages below should each have tooling and process, or the program loses momentum within a quarter.

  • Define — write the contract as code (YAML, Protobuf, Avro)
  • Version — store in git with semantic versioning
  • Enforce in CI — fail PRs that violate the contract
  • Monitor — alert on runtime violations
  • Evolve — coordinate breaking changes via version bumps
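
The Version and Evolve stages can be sketched as a small rule that maps a schema change to a semantic-version bump. This is one possible policy, assumed for illustration: dropped, renamed, or retyped columns count as breaking (major), added columns count as non-breaking (minor). The function names classify_change and bump are hypothetical, and real programs often add rules for nullability and semantics.

```python
def classify_change(old: dict[str, str], new: dict[str, str]) -> str:
    """Classify a schema change as 'major', 'minor', or 'none'.

    Both schemas map column name -> type string.
    """
    removed = old.keys() - new.keys()
    added = new.keys() - old.keys()
    retyped = {col for col in old.keys() & new.keys() if old[col] != new[col]}

    # A rename shows up as one removed plus one added column, so it is breaking.
    if removed or retyped:
        return "major"
    if added:
        return "minor"
    return "none"


def bump(version: str, change: str) -> str:
    """Apply a 'major' or 'minor' bump to a MAJOR.MINOR.PATCH version string."""
    major, minor, _patch = (int(part) for part in version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    return version
```

For example, adding order_date to a schema that previously held only order_id and revenue classifies as "minor", while changing revenue from NUMERIC(18,2) to FLOAT64 classifies as "major" and forces a coordinated release.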

Contract Formats

Common formats include Protobuf (strong typing, compact binary encoding, schema registry support), Avro (schemas defined in JSON, explicit schema evolution rules), and custom YAML (human-readable, tool-agnostic). Protobuf and Avro are the usual choices for streaming and event-driven systems; YAML and dbt contracts dominate warehouse-side definitions.

The format matters less than the enforcement. A contract written in the fanciest format on the planet does nothing if CI does not check it. Pick whatever format your team can write, review, and automate on. For most warehouse-centric teams, that means dbt contracts in YAML: they live alongside the dbt models that produce the data, review happens in the same PR, and dbt itself enforces them when the model is built.

Enforcement in Practice

A contract is only valuable when it is enforced. CI should check that the produced schema matches the contract. Runtime monitors should alert on violations. Catalog tools should surface contracts to consumers. Without enforcement, contracts rot into outdated docs within months.
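
A CI check of this kind can be quite small. The sketch below assumes the contract and the produced schema are both available as column-to-type mappings; how the produced schema is obtained (warehouse metadata, a dbt artifact, and so on) is left out, and the names CONTRACT, schema_violations, and ci_exit_code are hypothetical.

```python
import sys

# Assumed contract schema for the fct_orders example.
CONTRACT = {
    "order_id": "STRING",
    "revenue": "NUMERIC(18,2)",
    "order_date": "DATE",
}


def schema_violations(contract: dict[str, str], produced: dict[str, str]) -> list[str]:
    """List every contract column that is missing or has the wrong type."""
    problems = []
    for column, expected in contract.items():
        actual = produced.get(column)
        if actual is None:
            problems.append(f"missing column: {column}")
        elif actual != expected:
            problems.append(
                f"type mismatch on {column}: expected {expected}, got {actual}"
            )
    return problems


def ci_exit_code(produced: dict[str, str]) -> int:
    """Print violations to stderr and return a non-zero code so CI fails the PR."""
    problems = schema_violations(CONTRACT, produced)
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0
```

Extra columns in the produced schema are deliberately ignored here, on the assumption that additions are non-breaking; a stricter policy could flag them too.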

Enforcement has two layers — static and runtime. Static enforcement happens in CI, comparing the produced schema against the contract on every PR. Runtime enforcement happens in production, comparing the actual data against the contract's constraints (type, nullability, ranges, freshness) on every refresh. Both are needed: static catches schema drift before it ships, runtime catches data drift that slipped past static checks. Teams that implement only one eventually find gaps the other would have caught.
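
The runtime layer can be sketched in the same spirit: a function that inspects a refreshed batch and reports freshness, nullability, and range violations. The 15-minute staleness limit follows the example SLA; the non-negative revenue rule, the row format (a list of dicts), and the name runtime_violations are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal

MAX_STALENESS = timedelta(minutes=15)                  # from the example SLA
REQUIRED = ("order_id", "revenue", "order_date")       # assumed non-nullable


def runtime_violations(rows: list[dict], last_refreshed: datetime, now: datetime) -> list[str]:
    """Check a refreshed batch against freshness, nullability, and range rules."""
    problems = []

    # Freshness: the table must have been refreshed within the SLA window.
    if now - last_refreshed > MAX_STALENESS:
        problems.append("stale: last refresh exceeds 15 minutes")

    for index, row in enumerate(rows):
        # Nullability: every required column must be present and non-null.
        for column in REQUIRED:
            if row.get(column) is None:
                problems.append(f"row {index}: {column} is null")

        # Range: hypothetical business rule that revenue is never negative.
        revenue = row.get("revenue")
        if revenue is not None and revenue < 0:
            problems.append(f"row {index}: revenue is negative")

    return problems
```

In production the returned list would typically feed an alerting system rather than being printed; that wiring depends on the monitoring stack and is not shown.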

For related topics see how to implement data contracts and how to handle schema evolution.

Contracts and AI

Contracts also help AI assistants. A well-defined contract tells the AI exactly what columns exist, what they mean, and how to use them — no more hallucinated joins or misapplied filters. Data Workers governance agents expose contracts as MCP tools so Claude, Cursor, and ChatGPT can query against guaranteed schemas.

Book a demo to see contract-aware AI and autonomous contract enforcement.

Real-World Examples

A fintech has contracts covering every table that feeds the finance dashboard: fct_transactions, dim_customer, fct_daily_balance. Each contract is reviewed quarterly and enforced on every PR. A SaaS company has contracts on the 20 tables that feed investor-facing metrics — MRR, churn, cohort retention — and nothing else yet. They are expanding coverage over time. A healthcare company has strict contracts on every table containing PHI, with runtime monitors alerting on any field outside its declared range.

When You Need It

You need contracts when schema incidents become a recurring pattern and on-call is suffering. Signs include: "the dashboard broke again" more than once a quarter, PR reviews where nobody knows who depends on a column, and finance calling because last month's MRR shifted without anyone understanding why. Any one of these means contracts would pay back quickly.

Common Misconceptions

Contracts are not forever-frozen schemas — they are versioned agreements that can evolve through explicit releases. Contracts are not bureaucracy — good tooling makes enforcement automatic, with no extra meetings required. And contracts do not require a schema registry — many teams successfully run contracts on just dbt plus CI, and only add a registry if they move into streaming.

A data contract is the formal interface between data producer and consumer. It specifies schema, SLA, and semantics, lives in git, and is enforced in CI and production. Adopt contracts and far fewer schema incidents wake you up at 3am. They are among the highest-leverage patterns in modern data engineering.
