Engineering · June 7, 2026 · 8 min read

What Charity Majors's Wide-Event Method Taught Our Observability Agent


How one of observability's sharpest thinkers reshaped the way our dw-observability agent chases unknown-unknowns in production.

By The Data Workers Team

If you have spent any time debugging a distributed system in production, you have probably encountered the specific misery of staring at a dashboard and realizing it tells you nothing useful. The alert fired. The metrics are elevated. But the question you actually need to answer — why did this particular request, from this particular user, at this particular moment, fail in this particular way — is unanswerable from the information you have. You are missing context. And the context is gone forever because it was aggregated out at write time.

Charity Majors, co-founder and CTO of Honeycomb, has spent the better part of a decade writing and speaking about why this happens and what to do instead. She is a co-author of Observability Engineering (O'Reilly, 2022) and the force behind a body of writing on her blog charity.wtf that has shifted how a generation of engineers thinks about production systems. Her method (emit arbitrarily-wide structured events, preserve raw data, ask questions from outside) is one of the clearest articulations of a working debugging practice we have encountered.

We built a skill for our dw-observability agent around it. This post explains what the method actually says, where it gets interesting, and how it maps to an agent that has to debug things it was never told to expect.

The Problem Is Unknown-Unknowns, Not Monitoring Gaps

The distinction Majors draws most consistently is between known-unknowns and unknown-unknowns. Monitoring — dashboards, alerts, threshold checks — is built for known-unknowns: the failure modes you predicted in advance. It works well for those. But most production failures are not the failures you predicted. They are novel states the system arrived at through a combination of factors nobody modeled.

Her formulation from Observability: The 5-Year Retrospective is precise: "Observability lets you find answers to application issues that are unknown-unknowns. You have observability if you can ask any question of your system from the outside, to understand any state the system has gotten into, no matter how bizarre or novel, without shipping any new custom code to get answers."

This is a demanding definition. It rules out anything that requires you to know the question in advance. It rules out dashboards that only show what you instrumented for. It makes the test simple: can you ask a new question right now, without a deploy, and get an answer? If not, you do not have observability — you have monitoring.

The Architecture That Makes It Possible

The technical argument follows from the definition. If you need to ask arbitrary new questions, you need raw data — unfiltered, unschematized, unaggregated. And that means a specific data structure: one arbitrarily-wide structured event per request per service hop, capturing everything you know about that request at the time it happened.

From Live Your Best Life With Structured Events (charity.wtf, 2022): "if you aren't rolling out a solution based on arbitrarily wide, structured raw events that are unique and ordered and trace-aware and without any aggregation at write time, you are going to regret it."

The width matters because questions you have not thought of yet require dimensions you have not indexed yet. "The maturely instrumented datasets that we see are often 200-500 dimensions wide," Majors notes in the same post. Events "should collect context like sticky buns collect dust" — metadata, timing, database calls, infrastructure details, unique identifiers, application variables, all in one record.
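To make the shape of such an event concrete, here is a minimal sketch in Python. It is our own illustration, not Honeycomb's SDK or any library's API: the `WideEvent` class, its methods, and every field name are hypothetical. The idea it shows is that context accumulates freely during the request and exactly one record is written at the end.

```python
import json
import time
import uuid


class WideEvent:
    """Hypothetical helper: collects context for one request at one service hop."""

    def __init__(self, service, trace_id=None):
        self.fields = {
            "service": service,
            "trace_id": trace_id or uuid.uuid4().hex,  # shared across hops
            "span_id": uuid.uuid4().hex,               # unique to this hop
            "timestamp": time.time(),
        }
        self._start = time.perf_counter()

    def add(self, **context):
        # Any key is accepted; there is no fixed schema to migrate.
        self.fields.update(context)

    def emit(self):
        self.fields["duration_ms"] = (time.perf_counter() - self._start) * 1000.0
        # One JSON line per event; nothing is aggregated before it is written.
        return json.dumps(self.fields, sort_keys=True)


# Illustrative field names only; a mature dataset would carry far more.
event = WideEvent(service="checkout")
event.add(user_id="u-123", plan="pro", region="eu-west-1")
event.add(db_calls=3, db_ms=41.7, cache_hit=False, build_sha="abc1234")
line = event.emit()
```

Because `add` accepts arbitrary keys, adding a dimension is a one-line change at the call site, which is what keeps width cheap to grow.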

And the aggregation prohibition is non-negotiable: "Aggregation is a one-way trip. You can always derive your pretty metrics…and you can never go in reverse." Aggregate at write time and you permanently destroy the connective tissue between events that makes novel questions answerable.
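A small sketch shows why the trip only goes one way. The events and field names here are made up for illustration. From the raw records we can derive a per-endpoint average and, later, a per-build average nobody planned for. If only the per-endpoint averages had been stored, the second question could not be answered.

```python
from collections import defaultdict

# Hypothetical raw events; in practice each would be far wider.
raw_events = [
    {"endpoint": "/pay", "user_id": "u-1", "build": "a1", "duration_ms": 40},
    {"endpoint": "/pay", "user_id": "u-2", "build": "a1", "duration_ms": 45},
    {"endpoint": "/pay", "user_id": "u-3", "build": "b2", "duration_ms": 900},
    {"endpoint": "/cart", "user_id": "u-1", "build": "b2", "duration_ms": 30},
]


def mean_by(events, dimension, measure="duration_ms"):
    """Derive an average of `measure` grouped by any captured dimension."""
    totals = defaultdict(lambda: [0.0, 0])  # dimension value -> [sum, count]
    for e in events:
        bucket = totals[e[dimension]]
        bucket[0] += e[measure]
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# The metric we planned for, derived at read time.
by_endpoint = mean_by(raw_events, "endpoint")

# A question nobody planned for. It needs the raw rows: the per-endpoint
# averages above no longer know which build served which request.
by_build = mean_by(raw_events, "build")
```

In this toy data the per-endpoint view shows only that `/pay` is slow on average, while the per-build view isolates build `b2`.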

Three Principles That Do the Real Work

• Emit wide, store raw, slice later. One structured event per action, capturing full context. No pre-aggregation. The cost of width is nearly zero; the cost of missing a dimension when you need it is the entire incident.
• The test is unknown-unknowns. If you can only answer questions you predicted in advance, you have monitoring, not observability. The right question to ask of any telemetry system: can I ask something new right now, without shipping code?
• Observability-driven development, not test-driven. From LLMs Demand Observability-Driven Development (charity.wtf, 2023): "the only way to write good software at scale is by looping in production via observability — not by test-driven development, but observability-driven development." Instrument as you write. Deploy. Observe. Iterate. Deploying to production is the beginning of understanding, not the end of it.

How a Method Becomes a Skill

The dw-observability agent's job is to understand what is happening inside the Data Workers agent swarm — health, drift, quality regressions, silent failures — without being told in advance what to look for. That is exactly the unknown-unknowns problem Majors describes.

The skill we authored, wide-event-driven-debugging, encodes the method in nine steps. The key moves: start by confirming the failure is novel (not a known alert). Pull the raw audit trail via get_audit_trail before doing anything else — raw preservation is the prerequisite. Then ask the first question from outside: slice on a suspected dimension without writing new code. If you cannot form that slice because the dimension was not captured, that is the instrumentation gap to file, not a finding.
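The first slice, and the instrumentation-gap case, can be sketched as follows. This is a simplified illustration and not the skill's actual code. `get_audit_trail_stub` stands in for the real `get_audit_trail` tool, whose signature and return shape are not shown in this post, and the record fields are hypothetical.

```python
from collections import Counter


def get_audit_trail_stub():
    """Stand-in for get_audit_trail; the real tool's interface is assumed, not known."""
    return [
        {"agent": "agent-a", "status": "ok", "model": "m-1", "tenant": "acme"},
        {"agent": "agent-a", "status": "error", "model": "m-2", "tenant": "acme"},
        {"agent": "agent-b", "status": "error", "model": "m-2", "tenant": "globex"},
        {"agent": "agent-b", "status": "ok", "model": "m-1", "tenant": "globex"},
    ]


def slice_failures(events, dimension):
    """Count failed events per value of `dimension`, or report that it was never captured."""
    captured = [e for e in events if dimension in e]
    if not captured:
        # Nothing to slice on: this is a gap to file, not a finding.
        return {"kind": "instrumentation_gap", "dimension": dimension}
    failures = Counter(e[dimension] for e in captured if e["status"] == "error")
    return {"kind": "slice", "dimension": dimension, "failures": dict(failures)}


events = get_audit_trail_stub()
by_model = slice_failures(events, "model")                    # answerable today
by_prompt_version = slice_failures(events, "prompt_version")  # never recorded
```

Returning the gap as a distinct result kind, rather than an empty slice, keeps "no failures on this dimension" from being confused with "this dimension was never recorded".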

Steps 4 through 6 separate the three failure modes Majors's method implies: input drift (the questions changed), output drift (same inputs, different behavior), and operational drift (latency moved without behavior changing). Each points to a different cause. The agent uses detect_drift, get_agent_metrics, and get_evaluation_report in sequence to triangulate.
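One way to picture the separation is a small classifier over a baseline window and a current window. This is a hedged sketch: the `WindowStats` fields, the thresholds, and the `classify_drift` function are all assumptions made for illustration, and the code does not call or reproduce `detect_drift`, `get_agent_metrics`, or `get_evaluation_report`.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Hypothetical summary comparing the current window with a baseline."""
    input_shift: float    # distance between input distributions, 0 = identical
    output_shift: float   # behavior change on matched inputs, 0 = identical
    latency_ratio: float  # current p95 latency divided by baseline p95 latency


def classify_drift(stats, shift_threshold=0.2, latency_threshold=1.5):
    """Label which drift modes are present; thresholds are arbitrary placeholders."""
    findings = []
    if stats.input_shift > shift_threshold:
        findings.append("input drift")        # the questions changed
    if stats.output_shift > shift_threshold:
        findings.append("output drift")       # same inputs, different behavior
    latency_moved = stats.latency_ratio > latency_threshold
    behavior_stable = stats.output_shift <= shift_threshold
    if latency_moved and behavior_stable:
        findings.append("operational drift")  # latency moved, behavior did not
    return findings or ["no drift detected"]


slow_but_correct = classify_drift(WindowStats(0.05, 0.05, 2.0))
```

Keeping the three checks independent means a window can show input and output drift at once, which is itself a useful clue about cause.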

The final step — naming the root cause only when the evidence converges — is the discipline the method enforces. The temptation in an incident is to name a cause as soon as one hypothesis fits. Majors's method holds you to iteration: every answer should generate the next question until only one explanation remains, and you can point to the raw records that prove it.
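The convergence rule can be sketched as hypothesis elimination over raw records. This is an illustrative simplification under our own assumptions, not the skill's implementation: a hypothesis is modeled as a predicate that should hold for every failing event and for no passing one, and a root cause is named only when exactly one hypothesis survives.

```python
def explains(predicate, events):
    """True if the predicate matches every failing event and no passing event."""
    return all(predicate(e) == (e["status"] == "error") for e in events)


def converge(hypotheses, events):
    """Name a root cause only when a single hypothesis fits all the raw records."""
    surviving = sorted(name for name, pred in hypotheses.items() if explains(pred, events))
    if len(surviving) == 1:
        evidence = [e for e in events if e["status"] == "error"]
        return {"root_cause": surviving[0], "evidence": evidence}
    # Zero or several candidates left: keep asking questions.
    return {"root_cause": None, "still_open": surviving}


# Hypothetical records and hypotheses.
events = [
    {"status": "error", "model": "m-2", "region": "eu", "build": "b2"},
    {"status": "error", "model": "m-2", "region": "us", "build": "b2"},
    {"status": "ok", "model": "m-1", "region": "eu", "build": "a1"},
    {"status": "ok", "model": "m-1", "region": "us", "build": "a1"},
]
hypotheses = {
    "model m-2 regression": lambda e: e["model"] == "m-2",
    "eu region outage": lambda e: e["region"] == "eu",
}
result = converge(hypotheses, events)
```

If a second hypothesis also fits every record (for example, one keyed on `build`), `converge` returns no root cause and lists both as still open, which is the signal to form the next slice.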

One of More Than 400

Wide-event-driven-debugging is one of more than 400 method-named skills across 19 agents in the Data Workers swarm. Each skill is named for the method, not the person. Each one cites its sources and carries a provenance block with a non-affiliation disclaimer. The goal is to encode working practices precisely enough that an agent can follow them on its own, and honestly enough that the humans whose methods we distilled would recognize what we did with their ideas.

A note on this post: This is independent commentary and homage. It distills publicly available writing and talks by Charity Majors to illustrate a working method, and every quote is drawn from and verified against the primary sources linked above. The skill it describes is named for the method, not the person, and contains no marketing claims attributed to them. Data Workers is not affiliated with, sponsored by, or endorsed by Charity Majors. If you are Charity Majors and would like anything adjusted or removed, email hello@dataworkers.io and we will respond promptly.

