Guide · 5 min read

Business Definitions for AI Agents

Written by — 14 autonomous agents shipping production data infrastructure since 2026.

Technically reviewed by the Data Workers engineering team.

AI agents need explicit business definitions because terms like revenue, churn, and active user have multiple valid meanings inside one company. Without a glossary mapping each term to an owned SQL template, agents pick a definition, produce a number, and nobody can tell why it does not match the dashboard.

The single biggest source of wrong answers from data agents is ambiguous definitions. A finance team and a product team can both be right about revenue and still disagree by 12 percent. An agent with no glossary will pick whichever definition it saw first and confidently serve both teams the wrong number. This guide shows how to structure definitions so agents produce consistent answers. Related: fiscal vs calendar quarter for AI agents and AI for data infrastructure.

Why Definitions Matter More Than Models

Teams spend months comparing LLMs and almost no time on definitions. In practice the wrong definition hurts accuracy more than any model upgrade. A GPT-4 agent with a bad definition underperforms a GPT-3.5 agent with a good one. Definitions are the easier and more impactful lever, and they are almost always neglected.

The fix is a business glossary — a canonical list of terms, each with a definition, an owner, a SQL template, and a version history. The glossary is code, not Confluence. It ships in the same repo as your dbt models and gets tested the same way.

Anatomy of a Good Definition

Every business definition needs six fields. Without all six, agents will still misuse the term.

  • Term — the natural-language name (e.g., monthly recurring revenue)
  • Owner — the human accountable for the definition
  • SQL template — parameterized query producing the canonical number
  • Inputs — the source tables and columns the template depends on
  • Assumptions — timezone, fiscal calendar, refund treatment
  • Version — monotonically increasing, with changelog
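
In code, an entry can be a small structured record checked into the same repo as the models. Here is a minimal sketch in Python, with a hypothetical entry for monthly recurring revenue; the field names mirror the list above, and the table and column names are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class GlossaryEntry:
    term: str                    # natural-language name
    owner: str                   # human accountable for the definition
    sql_template: str            # parameterized query producing the canonical number
    inputs: list[str]            # source tables and columns the template depends on
    assumptions: dict[str, str]  # timezone, fiscal calendar, refund treatment
    version: int                 # monotonically increasing, changelog kept alongside

# Hypothetical entry; table and column names are illustrative.
mrr = GlossaryEntry(
    term="monthly recurring revenue",
    owner="jane@finance.example.com",
    sql_template="""
        SELECT date_trunc('month', invoice_date) AS month,
               SUM(amount_net) AS mrr
        FROM {{ invoices_table }}
        WHERE status = 'recognized'
        GROUP BY 1
    """,
    inputs=["billing.invoices.invoice_date", "billing.invoices.amount_net"],
    assumptions={"timezone": "UTC", "refunds": "excluded", "calendar": "calendar year"},
    version=3,
)
```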

Ambiguity Patterns You Will Hit

Every data team discovers the same ambiguity patterns the hard way. Revenue can be gross or net, booked or recognized, invoiced or collected. Churn can be customer count or revenue, annual or monthly, voluntary or involuntary. Active user can mean logged in, performed a key action, or showed up in product analytics. The glossary has to enumerate these variants and give each one its own entry.

The point of enumerating variants is not to pick winners — it is to let agents pick the right variant for the asker. A finance team asking about churn wants revenue churn on a cohorted basis. A product team asking about churn wants user churn on a rolling 30-day basis. Both are correct and both should resolve cleanly from the glossary.
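
One way to encode those variants is a registry that maps each natural-language term to its named variants per owning team. The names below are hypothetical, but the shape is the point: every variant is a first-class glossary entry.

```python
# Hypothetical variant registry: one ambiguous term maps to several
# owned glossary entries, keyed by the team that owns each variant.
GLOSSARY_VARIANTS = {
    "churn": {
        "finance": "churn (revenue, monthly cohort)",
        "product": "churn (users, rolling 30 days)",
    },
    "revenue": {
        "finance": "revenue (recognized, net of refunds)",
        "sales": "revenue (booked, gross)",
    },
    "active user": {
        "product": "active user (performed key action, rolling 30 days)",
        "marketing": "active user (logged in, calendar month)",
    },
}
```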

How an Agent Loads Definitions

At query time the agent parses the user question, extracts business terms, and looks them up in the glossary. Each matched term expands into its SQL template, which gets inlined into the generated query. If two templates are plausible (finance vs product revenue), the agent asks the user to pick or defaults to the one owned by the asker's team.

This pattern makes the agent feel smarter because it asks the right clarifying question. A user asking about churn gets back: do you mean revenue churn or user churn? Without a glossary the agent just guesses, and the user does not know to question the answer.
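
A minimal resolution loop might look like the sketch below, assuming the hypothetical GLOSSARY_VARIANTS registry and GlossaryEntry records from earlier; a real agent would extract terms with an LLM rather than exact string matching.

```python
def resolve_term(term: str, asker_team: str,
                 entries: dict[str, GlossaryEntry]) -> str:
    """Return the SQL template to inline for a business term.

    Prefers the variant owned by the asker's team; otherwise surfaces a
    clarifying question instead of guessing.
    """
    variants = GLOSSARY_VARIANTS.get(term)
    if variants is None:
        raise KeyError(f"'{term}' is not in the glossary; refusing to guess.")
    if asker_team in variants:
        # Default to the variant owned by the asker's team.
        return entries[variants[asker_team]].sql_template
    options = " or ".join(sorted(variants.values()))
    raise ValueError(f"Ambiguous term '{term}'. Did you mean {options}?")
```

With this shape, a churn question from a finance analyst resolves silently to the revenue variant, while the same question from an unrecognized team produces exactly the clarifying question described above.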

Ownership and Drift

Definitions drift. Finance updates its revenue recognition rules every few quarters; product redefines active user after every relaunch. If the glossary does not have a named owner and a review cadence, it goes stale fast. Data Workers treats every definition as an owned artifact with a version history, so drift becomes visible and fixable.

Testing Definitions

Every definition gets a test: run the SQL template against known inputs and verify the output matches an expected number. When someone updates the template, the test catches regressions before the new definition reaches an agent. This is standard dbt practice applied to the glossary layer.
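
As a sketch of what such a test could look like, using pytest conventions and DuckDB as a stand-in warehouse (both assumptions; a dbt unit test would serve the same purpose) and reusing the hypothetical `mrr` entry from earlier:

```python
import duckdb

def test_mrr_excludes_refunds():
    con = duckdb.connect()
    con.execute("CREATE TABLE invoices (invoice_date DATE, amount_net DOUBLE, status TEXT)")
    con.execute("""
        INSERT INTO invoices VALUES
            ('2024-01-05', 100.0, 'recognized'),
            ('2024-01-20', 250.0, 'recognized'),
            ('2024-01-22', -50.0, 'refunded')
    """)
    # Render the canonical template against the fixture table
    # (plain substitution here; Jinja rendering via dbt in practice).
    sql = mrr.sql_template.replace("{{ invoices_table }}", "invoices")
    rows = con.execute(sql).fetchall()
    # The definition excludes refunds, so January MRR must be exactly 350.
    assert len(rows) == 1
    assert rows[0][1] == 350.0
```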

Common Mistakes

The biggest mistake is letting definitions live in Confluence instead of code. The second is not having a single owner per term. The third is not testing the SQL templates. The fourth is treating the glossary as a one-time project instead of a living artifact.

Data Workers ships a glossary agent that builds, versions, tests, and serves business definitions to every downstream text-to-SQL or insights agent. To see it running against your warehouse, book a demo.

Rolling Out a Glossary Without Revolt

Introducing a glossary to a team that has always defined terms ad hoc is politically sensitive. People have informal definitions that already serve them, and formalizing those definitions feels like bureaucracy. The rollout pattern that sticks is to start with the definitions that are already causing disputes and codify those first. Everyone benefits from resolving the dispute, so the glossary arrives as a solution rather than an imposition.

The next expansion is to the definitions that agents currently get wrong. Users have already been frustrated by inconsistent answers, so an official definition that fixes the inconsistency is welcomed. Over a few months the glossary grows to cover the most impactful terms, and the culture shifts from ad-hoc to explicit.

The final step is enforcement. Once the glossary exists, new dbt models that implement glossary terms must reference the entry. Pull request reviewers check this. Over time every term in the warehouse has a canonical definition and agents stop producing contradictory numbers.
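
One hedged way to automate that review step is a CI script that scans dbt schema files and fails when a model tagged as a metric lacks a glossary reference. The `glossary_term` meta key and the `metric` tag below are assumptions for illustration, not a dbt convention.

```python
import sys
from pathlib import Path

import yaml  # PyYAML

def check_models(models_dir: str = "models") -> int:
    """Return a nonzero exit code if a metric model lacks a glossary reference."""
    missing = []
    for schema_file in Path(models_dir).rglob("*.yml"):
        doc = yaml.safe_load(schema_file.read_text()) or {}
        for model in doc.get("models", []):
            tags = model.get("tags") or []
            meta = model.get("meta") or {}
            if "metric" in tags and "glossary_term" not in meta:
                missing.append(model.get("name", str(schema_file)))
    if missing:
        print("Models missing a glossary_term reference:", ", ".join(missing))
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(check_models())
```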

Scaling to Hundreds of Definitions

A glossary with ten definitions is easy. A glossary with a thousand is a different beast. Scaling requires structure: definitions are organized by domain, tagged by scope, cross-referenced when they relate, and searchable in natural language. Without structure, discovery becomes impossible and users give up.

Structure also enables governance. Who owns this definition? Who approved the last change? Which downstream agents depend on it? A mature glossary answers all of these through its own metadata, and each definition becomes an auditable artifact with a full history. That audit trail is what makes the glossary safe to rely on for finance and compliance use cases.
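
A sketch of the governance metadata this implies, layered onto the entry structure from earlier; every field name here is an assumption, not a prescribed schema:

```python
# Hypothetical governance record kept alongside each glossary entry.
GOVERNANCE = {
    "monthly recurring revenue": {
        "domain": "finance",
        "owner": "jane@finance.example.com",
        "last_change": {"version": 3, "approved_by": "cfo@example.com"},
        "dependents": ["text_to_sql_agent", "insights_agent", "board_dashboard"],
    },
}

def dependents_of(term: str) -> list[str]:
    """Which downstream agents are affected if this definition changes?"""
    return GOVERNANCE.get(term, {}).get("dependents", [])
```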

The scaling payoff is consistency across the organization. Every team uses the same definitions because the same definitions are served to every agent. The old problem of finance and product disagreeing about churn disappears because both teams query agents that share the same glossary. That cultural shift takes quarters to produce but is worth it.

Business definitions are the most leveraged fix for AI agent accuracy. Put them in code, give each one an owner, test them like you test dbt models, and your agents stop producing numbers that do not match the dashboards.

See Data Workers in action

15 autonomous AI agents working across your entire data stack. MCP-native, open-source, deployed in minutes.

Book a Demo
