Engineering | 8 min read

What W. Edwards Deming's Plan-Do-Study-Act Taught Our Data Quality Agent

The statistician who rebuilt postwar Japanese manufacturing had one core message: most quality failures are system failures, not people failures. His PDSA cycle remains one of the clearest frameworks for fixing the system, not the symptom.

By The Data Workers Team

W. Edwards Deming did not become famous in his own country first. He spent the 1950s teaching statistical quality control to Japanese manufacturers at a moment when the country was rebuilding from the ground up, and the results — Sony, Toyota, the transformation of Japanese industry into a global export powerhouse — eventually became impossible to ignore. American executives flew to Japan in the late 1970s to understand what had happened. Most of them were expecting to find better machinery. They found a different philosophy of management.

Deming was a statistician and a management theorist, and his central argument was not complicated: most quality problems are not caused by workers making mistakes. They are caused by the system those workers operate inside. In Out of the Crisis (1982) he wrote: 'Quality comes not from inspection, but from improvement of the production process.' That sentence is still the right correction to how most data teams respond to quality failures — which is to add more monitoring, more alerts, more inspection, until the alert volume becomes its own problem.

The method he is most closely associated with is the PDSA cycle: Plan, Do, Study, Act. Four steps. One loop. Run it until the system improves. We built our data quality agent's plan-do-study-act skill around it. This post explains why it is the right structure for a quality agent, and what we had to work out to make it executable.

What Is Actually Worth Learning

Deming's framework contains more than the PDSA cycle, but three ideas do the real work for data quality. All three come directly from his published books and from the System of Profound Knowledge he articulated in The New Economics (1993).

The first is the distinction between common cause and special cause variation. Common causes are the natural outputs of a system — the variation built into the process design itself. Late upstream arrivals, schema drift, model staleness: if these happen on a schedule or in a pattern, they are common causes, and they require process redesign. Special causes are abnormal events that lie outside the system: a one-off backfill job that corrupts a partition, a deploy that breaks a connector, a single DAG run that stalled. Special causes need targeted fixes. The critical insight — and the one that data teams most often violate — is that treating a common cause as if it were special makes things worse. Deming called this tampering: patching a structural problem with a one-off fix adds variation to the system rather than removing it. The first decision a quality agent must make is which type of variation it is looking at.
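One common way to make the common-versus-special distinction concrete is a Shewhart-style individuals (XmR) control chart: estimate natural process limits from a baseline, then flag points that fall outside them as special-cause signals. The sketch below is a minimal illustration, not the skill's actual implementation. The function names and the sample scores are hypothetical, and the 2.66 factor is the standard XmR constant applied to the average moving range.

```python
from statistics import mean


def xmr_limits(baseline):
    """Return (lower, center, upper) natural process limits for an XmR chart.

    `baseline` is a list of per-run quality scores from a period assumed
    to be representative of the system as designed.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline points")
    center = mean(baseline)
    # Moving range: absolute difference between consecutive runs.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    spread = 2.66 * mean(moving_ranges)
    return center - spread, center, center + spread


def classify_point(value, limits):
    """Label one run: outside the limits is a special-cause signal."""
    lower, _, upper = limits
    return "special" if value < lower or value > upper else "common"


# Hypothetical completeness scores for eight baseline runs.
baseline = [0.95, 0.96, 0.94, 0.95, 0.97, 0.95, 0.96, 0.94]
limits = xmr_limits(baseline)
print(classify_point(0.95, limits))  # inside the limits
print(classify_point(0.70, limits))  # far below the lower limit
```

A point inside the limits is not proof of a common cause; it only means the run is indistinguishable from the system's routine variation, which is the case where a one-off patch would amount to tampering.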

The second principle is theory before action. Deming wrote in The New Economics: 'Experience by itself teaches nothing... Without theory, experience has no meaning.' Applied to data quality: chasing every anomaly without a hypothesis is not investigation, it is noise production. The PDSA cycle starts with Plan — and the Plan step requires writing down a falsifiable prediction before any change is made. If you cannot state what you expect to improve and by how much, you are not running an experiment. You are running a hope.

The third principle is the S in PDSA, which stands for Study, not Check. After a change, you do not ask 'is the latest run green?' You ask whether the trend has shifted in a statistically meaningful way. A single green run after a persistent red streak is not evidence of improvement. It may be regression to the mean, or a transient fix, or a coincidence. The study step requires comparing the post-change trend against the pre-change baseline across multiple runs, which is a different operation from reading the most recent quality score.
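To illustrate why one green run is weak evidence, here is a minimal sketch of an exact one-sided permutation test on the difference in mean score between baseline and post-change runs. The choice of test is our assumption for illustration (the method itself does not prescribe one), and the function name and scores are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(baseline, post):
    """Exact one-sided permutation test for an upward shift in the mean.

    Returns the fraction of ways to relabel the pooled runs that produce a
    mean difference at least as large as the observed one.
    """
    observed = mean(post) - mean(baseline)
    pooled = list(baseline) + list(post)
    n_post = len(post)
    n_base = len(pooled) - n_post
    total = sum(pooled)
    hits = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_post):
        post_sum = sum(pooled[i] for i in idx)
        diff = post_sum / n_post - (total - post_sum) / n_base
        # Small tolerance guards against floating-point noise on ties.
        if diff >= observed - 1e-12:
            hits += 1
        count += 1
    return hits / count


baseline = [0.80, 0.82, 0.79, 0.81, 0.80, 0.83, 0.78]
print(permutation_p_value(baseline, [0.90]))                          # one green run
print(permutation_p_value(baseline, [0.90, 0.91, 0.89, 0.92, 0.90]))  # five runs
```

With seven baseline runs, a single post-change run can never yield a p-value below 1/8, however good it looks. Five consistently better runs can, which is the practical argument for studying the trend rather than the latest result.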

  • Common cause vs. special cause: classify the variation type before acting — every time, without exception.
  • Theory before action: write a falsifiable hypothesis (dimension, direction, magnitude) before making any change.
  • Study the trend, not the run: evaluate improvement against a multi-run baseline, not the most recent result.
  • Iterate, do not accumulate: if the hypothesis was wrong, record the finding and revise — do not add compensating tests on top of unresolved root causes.

How a Method Becomes a Skill

The plan-do-study-act skill in our quality agent runs seven steps. The first scopes the quality signal using get_quality_summary — collecting the dimensional breakdown across freshness, completeness, uniqueness, referential integrity, and distribution to identify which dimensions are degraded and since when.

The second step classifies the variation type using get_anomalies with historical lookback. Is the degradation recurring on a predictable schedule, or correlated with a system condition? Then it is common cause. Did it appear after a discrete event — a deploy, a schema change, an upstream outage? Then it is special cause. The classification is recorded explicitly before any further action. This is the gate that prevents tampering.

Steps three and four are the Plan phase: state the hypothesis in one sentence (the failing dimension, the suspected cause, the expected measurable improvement), then design the smallest reversible change that would falsify or confirm it. The skill uses create_quality_tests_for_pipeline to draft the targeted check. Step five is Do: apply the change at limited scope — one table, one environment — via set_sla or a test adjustment.
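The one-sentence hypothesis can be captured as a small record that refuses vague predictions. This is a sketch under our own assumptions: the class name, field names, and default of five runs are hypothetical and are not taken from the skill's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """A falsifiable prediction written down before any change is made."""

    dimension: str        # failing quality dimension, e.g. "freshness"
    suspected_cause: str  # what we believe drives the degradation
    direction: str        # "increase" or "decrease" in the score
    magnitude: float      # minimum expected change, in score units
    runs: int = 5         # how many post-change runs to observe

    def __post_init__(self):
        if self.direction not in ("increase", "decrease"):
            raise ValueError("direction must be 'increase' or 'decrease'")
        if self.magnitude <= 0:
            raise ValueError("magnitude must be positive to be falsifiable")
        if self.runs < 2:
            raise ValueError("a single run cannot show a trend")

    def sentence(self) -> str:
        """Render the hypothesis as the single sentence the Plan step asks for."""
        return (
            f"If {self.suspected_cause} is the cause, then fixing it will "
            f"{self.direction} {self.dimension} by at least {self.magnitude:g} "
            f"over {self.runs} runs."
        )


h = Hypothesis("freshness", "late upstream delivery", "increase", 0.1)
print(h.sentence())
```

Rejecting a zero or missing magnitude at construction time is the point of the design: a prediction of "some improvement" cannot be contradicted by the Study step.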

Step six is Study: execute run_quality_check, call get_quality_score, and compare the trend over the last five to seven runs against the pre-change baseline. A score that improved in the targeted dimension and did not regress elsewhere is a confirmation. A score that moved in the wrong direction, moved in a dimension other than the targeted one, or moved by less than predicted is a new data point for a revised hypothesis.
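The Study comparison can be sketched as a function over per-dimension score histories. The example below assumes that higher scores are better and that the scores have already been collected into plain lists; the function name, the dictionary layout, and the regression tolerance are all hypothetical and do not reflect the actual return shape of get_quality_score.

```python
from statistics import mean


def study(baseline, post, target, predicted_gain, regression_tolerance=0.01):
    """Compare mean scores per dimension before and after a change.

    `baseline` and `post` map dimension name -> list of per-run scores.
    The verdict is "confirmed" only if the targeted dimension improved by
    at least `predicted_gain` and no other dimension dropped by more than
    `regression_tolerance`.
    """
    deltas = {dim: mean(post[dim]) - mean(baseline[dim]) for dim in baseline}
    regressed = sorted(
        dim
        for dim, delta in deltas.items()
        if dim != target and delta < -regression_tolerance
    )
    confirmed = deltas[target] >= predicted_gain and not regressed
    return {
        "verdict": "confirmed" if confirmed else "revise",
        "deltas": deltas,
        "regressed": regressed,
    }


baseline = {
    "freshness": [0.70, 0.72, 0.71, 0.69, 0.70],
    "completeness": [0.99, 0.99, 0.99, 0.99, 0.99],
}
post = {
    "freshness": [0.90, 0.91, 0.89, 0.92, 0.90],
    "completeness": [0.99, 0.99, 0.99, 0.99, 0.99],
}
result = study(baseline, post, target="freshness", predicted_gain=0.15)
print(result["verdict"], result["regressed"])
```

Returning the deltas alongside the verdict keeps the evidence available for the next Plan step when the hypothesis has to be revised.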

Step seven is Act: if the trend confirms the hypothesis, adopt the change across all affected pipelines. If not, iterate — revise the hypothesis and return to Plan. If the root cause is a common-cause structural issue that cannot be fixed at the quality layer, the skill escalates to dw-pipelines (for late-arrival handling or idempotency redesign) or dw-schema (for contract drift). The loop terminates only when the trend is confirmed or the problem is handed off with full classification and evidence.
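The Act branch described above reduces to a three-way decision. The sketch below is illustrative only: the enum, the function signature, and the cause labels used as dictionary keys are hypothetical, while dw-pipelines and dw-schema are the escalation targets named in this post.

```python
from enum import Enum


class Action(Enum):
    ADOPT = "adopt"        # roll the change out to all affected pipelines
    ITERATE = "iterate"    # revise the hypothesis and return to Plan
    ESCALATE = "escalate"  # hand off with classification and evidence


# Cause labels (keys) are hypothetical; agent names come from the post.
ESCALATION_TARGETS = {
    "late_arrival": "dw-pipelines",
    "idempotency": "dw-pipelines",
    "contract_drift": "dw-schema",
}


def act(trend_confirmed, variation, fixable_at_quality_layer, cause=None):
    """Return (action, escalation_target) for one pass through the loop."""
    if trend_confirmed:
        return Action.ADOPT, None
    if variation == "common" and not fixable_at_quality_layer:
        # Structural issue outside the quality layer: hand it off.
        return Action.ESCALATE, ESCALATION_TARGETS.get(cause)
    return Action.ITERATE, None


print(act(True, "special", True))
print(act(False, "common", False, cause="contract_drift"))
print(act(False, "special", True))
```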

One of More Than 400

This skill is one of more than 400 skills we have authored across 19 specialized agents — covering connectors, catalog and context, cost, governance, incidents, analytics, migration, ML, observability, orchestration, pipelines, quality, schema, search, streaming, and usage intelligence. Some are built from first principles. Some, like this one, are distilled from the public work of practitioners who articulated a method precisely enough to make it executable. All of them are version-controlled, validated against the tools they call, and composable with the rest of the swarm.

Deming's PDSA cycle is a good candidate for this treatment because the method is unusually precise. The four-step loop is not just a mnemonic. Each step has a specific epistemological requirement: a falsifiable prediction, a minimum-scope intervention, a trend-level evaluation, a branch that either adopts or revises. That precision is what makes it translatable from a quality management philosophy into a decision procedure an agent can actually run. The goal is the same in both domains: stop fixing symptoms and start improving the system.

Primary sources: deming.org/explore/sopk/ | deming.org/explore/fourteen-points/ | Out of the Crisis (MIT Press, 1982) | The New Economics for Industry, Government, Education (MIT Press, 1993)

A note on this post: This is independent commentary and homage. It distills publicly available writing and talks by W. Edwards Deming to illustrate a working method, and every quote is drawn from and verified against the primary sources linked above. The skill it describes is named for the method, not the person, and contains no marketing claims attributed to him. Data Workers is not affiliated with, sponsored by, or endorsed by W. Edwards Deming or his estate. If you represent his estate and would like anything adjusted or removed, email hello@dataworkers.io and we will respond promptly.
