Catalog Agent Business Glossary Build

Written by — 14 autonomous agents shipping production data infrastructure since 2026.

Technically reviewed by the Data Workers engineering team.

Data Workers' Catalog Agent builds and maintains a business glossary by analyzing existing data assets, query patterns, and organizational terminology — producing standardized term definitions that bridge the gap between technical column names and business concepts. A business glossary is the foundation of data literacy, but building one manually takes months and maintaining it is a full-time job. The Catalog Agent automates both.

This guide covers the Catalog Agent's glossary generation methodology, term relationship mapping, integration with data catalogs and BI tools, and strategies for driving glossary adoption across business and technical teams.

The Business Glossary Gap

Every organization has implicit business terminology that goes undocumented. What exactly is 'revenue' — gross or net? Does 'customer' include trial users? Is 'churn' calculated monthly or annually? These ambiguities cause silent data errors: two dashboards showing different revenue numbers because they use different definitions, or a quarterly report that undercounts customers because it excludes a segment that another report includes.

A business glossary resolves these ambiguities by establishing authoritative definitions for business terms and mapping them to the technical assets (tables, columns, metrics) that implement them. Building this mapping manually requires interviewing stakeholders across the organization, reconciling conflicting definitions, and documenting the agreed-upon terms. That process typically takes 3 to 6 months, and the resulting document starts going stale as soon as it is published.

| Glossary Challenge | Manual Approach | Catalog Agent Approach |
| --- | --- | --- |
| Term discovery | Stakeholder interviews | Mine terms from queries, dashboards, dbt docs, and Slack |
| Definition drafting | Write from scratch | Generate from column stats, lineage, and usage patterns |
| Term-to-asset mapping | Manually link terms to tables | Automatic mapping based on naming, lineage, and query analysis |
| Conflict resolution | Meetings and politics | Surface conflicts with data evidence for stakeholder resolution |
| Maintenance | Quarterly review meetings | Continuous monitoring for term drift and new terms |
| Adoption | Training sessions | Embed glossary in catalog, BI tools, and SQL editor |

Automated Term Discovery

The Catalog Agent discovers business terms from multiple sources. It analyzes dbt model and column descriptions for business terminology. It mines BI dashboard titles, metric names, and filter labels. It scans Slack channels for recurring data-related terminology. It examines SQL query comments and aliases for business context. These sources collectively reveal the vocabulary that the organization actually uses, which may differ significantly from what leadership assumes.

Discovered terms are deduplicated and normalized. The agent identifies synonyms ('revenue' and 'sales'), hierarchies ('gross revenue' is a specialization of 'revenue'), and conflicts (marketing defines 'customer' differently from finance). These relationships are surfaced for stakeholder review, with the agent providing data evidence for each definition variation.
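The normalization and relationship steps can be illustrated with a minimal Python sketch. It is not the Catalog Agent's actual implementation: the function names, the suffix rule for hierarchies, and the sample definitions are assumptions chosen for illustration. Synonym detection ('revenue' versus 'sales') needs usage or lineage evidence and is not covered by string matching like this.

```python
from collections import defaultdict


def normalize(term):
    """Lowercase a term and treat underscores, hyphens, and extra spaces as one space."""
    return " ".join(term.lower().replace("_", " ").replace("-", " ").split())


def find_hierarchies(terms):
    """Return (narrower, broader) pairs where the narrower term ends with the broader one.

    Assumption: 'gross revenue' specializes 'revenue' because it adds a qualifier in front.
    """
    normalized = sorted({normalize(t) for t in terms})
    pairs = []
    for narrow in normalized:
        for broad in normalized:
            if narrow != broad and narrow.endswith(" " + broad):
                pairs.append((narrow, broad))
    return pairs


def find_conflicts(definitions):
    """Group (term, team, definition) rows and keep terms with more than one distinct definition."""
    by_term = defaultdict(dict)
    for term, team, definition in definitions:
        by_term[normalize(term)][team] = definition
    return {
        term: teams
        for term, teams in by_term.items()
        if len(set(teams.values())) > 1
    }


# Hypothetical sample input.
hierarchies = find_hierarchies(["Gross_Revenue", "revenue", "net revenue", "customer"])
conflicts = find_conflicts([
    ("customer", "marketing", "includes trial users"),
    ("Customer", "finance", "paying accounts only"),
    ("churn", "finance", "monthly cancellations"),
])
```

Each conflicting term keeps the per-team definitions side by side, which is the shape a stakeholder review would need.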

  • dbt source mining — extracts terms from model descriptions, column descriptions, and test configurations
  • BI tool analysis — discovers terms from dashboard titles, metric definitions, and filter labels in Looker, Tableau, and Metabase
  • Query analysis — identifies business terms from SQL aliases, comments, and column naming patterns
  • Documentation mining — extracts terms from existing wikis, data dictionaries, and onboarding materials
  • Stakeholder communication — scans Slack and email for recurring data terminology and definition discussions
  • Industry templates — provides starter glossaries for common industries (fintech, healthcare, e-commerce, SaaS) that accelerate initial setup
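As a rough illustration of the query-analysis source, the sketch below pulls candidate terms out of SQL column aliases and counts how many queries use each one. It is an assumption-laden example, not the agent's method: a regular expression is only a stand-in for a real SQL parser, and the type-name stoplist, the length filter, and the sample queries are all hypothetical.

```python
import re
from collections import Counter

# Matches "AS some_alias". A regex cannot fully parse SQL; this is a simplification.
ALIAS_RE = re.compile(r"\bAS\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)

# Type names that follow AS inside CAST(...) and are not business terms.
SQL_TYPE_NAMES = {
    "int", "integer", "bigint", "float", "numeric", "decimal",
    "date", "timestamp", "varchar", "text", "boolean",
}


def extract_alias_terms(sql):
    """Return candidate business terms found in the aliases of one SQL statement."""
    terms = []
    for alias in ALIAS_RE.findall(sql):
        term = alias.replace("_", " ").lower().strip()
        # Skip short table aliases such as "o" and CAST target types.
        if len(term) < 3 or term in SQL_TYPE_NAMES:
            continue
        terms.append(term)
    return terms


def count_terms(queries):
    """Count how many queries mention each term (each term counted once per query)."""
    counts = Counter()
    for sql in queries:
        counts.update(set(extract_alias_terms(sql)))
    return counts


# Hypothetical query log.
queries = [
    "SELECT CAST(created_at AS DATE) AS signup_date, SUM(amount) AS gross_revenue FROM orders AS o",
    "SELECT COUNT(DISTINCT user_id) AS active_users, SUM(amount) AS Gross_Revenue FROM orders",
]
term_counts = count_terms(queries)
```

Counting per query rather than per occurrence keeps one long report from dominating the ranking of candidate terms.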

Definition Generation and Enrichment

For each discovered term, the Catalog Agent generates a candidate definition based on the data evidence. A term like 'Monthly Active Users' gets a definition derived from how it is actually calculated in production queries: 'Count of distinct user_ids with at least one login event in the trailing 30-day window. Excludes internal staff and bot accounts. Sourced from the events.user_activity table, calculated daily in the analytics.mau_daily model.'
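To make the example definition concrete, here is a small Python sketch of the calculation it describes. The event tuple layout, the `"login"` event type string, and the choice to include the as-of date in the 30-day window are assumptions made for illustration. The production logic lives in the models named above.

```python
from datetime import date, timedelta


def monthly_active_users(events, as_of, excluded_user_ids=frozenset(), window_days=30):
    """Count distinct users with at least one login in the trailing window.

    events: iterable of (user_id, event_type, event_date) tuples (assumed layout).
    excluded_user_ids: internal staff and bot accounts to leave out.
    The window covers window_days calendar days ending on as_of, inclusive.
    """
    window_start = as_of - timedelta(days=window_days - 1)
    active = {
        user_id
        for user_id, event_type, event_date in events
        if event_type == "login"
        and window_start <= event_date <= as_of
        and user_id not in excluded_user_ids
    }
    return len(active)


# Hypothetical sample data.
sample_events = [
    (1, "login", date(2026, 3, 31)),
    (1, "login", date(2026, 3, 15)),     # same user, counted once
    (2, "login", date(2026, 3, 2)),      # first day of the window
    (3, "login", date(2026, 3, 1)),      # one day too old
    (4, "purchase", date(2026, 3, 20)),  # not a login event
    (5, "login", date(2026, 3, 10)),     # internal staff account
]
mau = monthly_active_users(sample_events, as_of=date(2026, 3, 31), excluded_user_ids={5})
```

Spelling out the window boundary and the exclusions in code is exactly the kind of detail a written definition has to pin down.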

These generated definitions are starting points for stakeholder review, not final answers. The agent presents each definition with the supporting evidence (which queries use the term, which dashboards display it, which dbt models calculate it) so reviewers can quickly confirm or refine the definition. This evidence-first approach replaces the blank-page problem that stalls most glossary initiatives.
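One way to picture an evidence-backed candidate is as a small record that carries its supporting references and a review status. The class names, fields, and status values below are hypothetical, not the Catalog Agent's schema. The dashboard reference is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    kind: str       # "query", "dashboard", or "dbt_model" (assumed categories)
    reference: str  # identifier of the supporting asset


@dataclass
class CandidateDefinition:
    term: str
    text: str
    evidence: list = field(default_factory=list)
    status: str = "pending_review"
    reviewer: str = ""

    def evidence_by_kind(self):
        """Group evidence references by kind so a reviewer can scan them quickly."""
        grouped = {}
        for item in self.evidence:
            grouped.setdefault(item.kind, []).append(item.reference)
        return grouped

    def review(self, reviewer, revised_text=None):
        """Confirm the definition as generated, or refine it with revised text."""
        if not self.evidence:
            raise ValueError("a candidate needs supporting evidence before review")
        if revised_text is not None:
            self.text = revised_text
            self.status = "refined"
        else:
            self.status = "confirmed"
        self.reviewer = reviewer


candidate = CandidateDefinition(
    term="Monthly Active Users",
    text="Count of distinct user_ids with at least one login event in the trailing 30-day window.",
    evidence=[
        Evidence("dbt_model", "analytics.mau_daily"),
        Evidence("query", "events.user_activity"),
        Evidence("dashboard", "Product Engagement (hypothetical)"),
    ],
)
candidate.review(reviewer="analytics-lead")
```

Refusing to review a candidate with no evidence keeps the process evidence-first rather than opinion-first.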

Term-to-Asset Mapping

A glossary without asset mapping is just a dictionary. The Catalog Agent automatically maps each business term to the data assets that implement it: the warehouse tables that store the data, the dbt models that transform it, the columns that contain it, the dashboards that display it, and the data quality tests that validate it. This mapping transforms the glossary from a reference document into a navigation tool.

When an analyst searches for 'revenue' in the catalog, the glossary mapping shows them which table to query, which column to use, which definition applies, and which dashboard already answers their question. This reduces duplicate work and ensures consistency — everyone uses the same 'revenue' column because the glossary guides them there.
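The lookup described above can be sketched as a small in-memory index from terms to assets. This is an illustrative data structure under assumed names, not the catalog's real API. The table, column, and dashboard names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    kind: str  # "table", "column", "dbt_model", "dashboard", or "test"
    name: str


class GlossaryIndex:
    """Maps each business term to its definition and the assets that implement it."""

    def __init__(self):
        self._definitions = {}
        self._assets = defaultdict(list)

    def define(self, term, definition):
        self._definitions[term.lower()] = definition

    def link(self, term, asset):
        key = term.lower()
        if key not in self._definitions:
            raise KeyError(f"unknown glossary term: {term}")
        self._assets[key].append(asset)

    def lookup(self, term):
        """Return the definition and assets grouped by kind, or None if undefined."""
        key = term.lower()
        if key not in self._definitions:
            return None
        by_kind = defaultdict(list)
        for asset in self._assets.get(key, []):
            by_kind[asset.kind].append(asset.name)
        return {
            "term": key,
            "definition": self._definitions[key],
            "assets": dict(by_kind),
        }


index = GlossaryIndex()
index.define("Revenue", "Net revenue after refunds and discounts.")
index.link("revenue", Asset("table", "analytics.fct_orders"))
index.link("revenue", Asset("column", "analytics.fct_orders.net_amount"))
index.link("revenue", Asset("dashboard", "Finance Overview"))
result = index.lookup("revenue")
```

Grouping the assets by kind lets a search result answer "which table, which column, which dashboard" in one response.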

Glossary Governance

Business terms need owners, just like data tables. The Catalog Agent assigns term ownership from two usage signals: which team queries the term most frequently, and which stakeholder most recently updated its definition. Term owners approve definition changes and resolve conflicts when teams use the same term differently.
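A minimal sketch of usage-based ownership, assuming a query log of (team, term) pairs and an edit history of (term, editor, date) rows. The input shapes, the `owning_team` and `steward` field names, and the sample data are hypothetical. Ties between teams fall to whichever team appears first in the log.

```python
from collections import Counter
from datetime import date


def propose_owners(query_log, definition_edits):
    """Pick, per term, the team that queries it most and the most recent definition editor."""
    queries_by_term = {}
    for team, term in query_log:
        queries_by_term.setdefault(term, Counter())[team] += 1

    latest_edit = {}
    for term, editor, edited_on in definition_edits:
        if term not in latest_edit or edited_on > latest_edit[term][1]:
            latest_edit[term] = (editor, edited_on)

    owners = {}
    for term, team_counts in queries_by_term.items():
        top_team, _ = team_counts.most_common(1)[0]
        editor = latest_edit.get(term, (None, None))[0]
        owners[term] = {"owning_team": top_team, "steward": editor}
    return owners


# Hypothetical sample data.
owners = propose_owners(
    query_log=[
        ("finance", "revenue"),
        ("finance", "revenue"),
        ("marketing", "revenue"),
        ("marketing", "customer"),
    ],
    definition_edits=[
        ("revenue", "alice", date(2026, 1, 5)),
        ("revenue", "bob", date(2026, 2, 1)),
    ],
)
```

A term with no edit history gets a team but no steward, which flags it as needing a named person.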

The agent monitors for glossary drift: new terms that appear in production queries without glossary definitions, existing terms whose usage patterns diverge from their definitions, and deprecated terms that are still referenced in active assets. Drift reports are published weekly to term owners, keeping the glossary current without requiring dedicated maintenance staff.
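The three drift categories can be expressed as simple set comparisons. In the sketch below, the glossary layout (a status and one defining column per term), the observed-usage input, and all names are assumptions for illustration. Comparing columns is only a crude proxy for "usage diverging from the definition".

```python
def detect_drift(glossary, observed_usage, active_asset_terms):
    """Return the three drift categories as sorted lists of terms.

    glossary: term -> {"status": "active" | "deprecated", "column": defining column}
    observed_usage: term -> set of columns that production queries use for that term
    active_asset_terms: terms still referenced by active dashboards or models
    """
    # Terms used in production queries that have no glossary entry.
    new_terms = sorted(set(observed_usage) - set(glossary))

    # Defined terms whose queries read columns other than the defining one.
    diverging = sorted(
        term
        for term, columns in observed_usage.items()
        if term in glossary and columns - {glossary[term]["column"]}
    )

    # Deprecated terms that active assets still reference.
    deprecated_in_use = sorted(
        term
        for term in active_asset_terms
        if glossary.get(term, {}).get("status") == "deprecated"
    )

    return {
        "new_terms": new_terms,
        "diverging": diverging,
        "deprecated_in_use": deprecated_in_use,
    }


# Hypothetical sample data.
report = detect_drift(
    glossary={
        "revenue": {"status": "active", "column": "orders.net_amount"},
        "arr": {"status": "deprecated", "column": "subscriptions.arr"},
    },
    observed_usage={
        "revenue": {"orders.net_amount", "orders.gross_amount"},
        "churn": {"subscriptions.cancelled_at"},
    },
    active_asset_terms={"arr", "revenue"},
)
```

A weekly report to term owners could be built by filtering this result to the terms each owner is responsible for.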

Driving Adoption

A glossary that nobody uses is worse than no glossary — it creates a false sense of standardization. The Catalog Agent drives adoption by embedding glossary terms in the tools people already use: hovering over a column in the SQL editor shows the linked glossary term, BI dashboards display glossary definitions alongside metrics, and data quality alerts reference the business term to provide context for technical failures.

For teams building comprehensive data governance, the business glossary integrates with auto-documentation for technical descriptions and PII classification for sensitivity labeling. Together, these capabilities transform the data catalog from a technical metadata store into a business-friendly knowledge base. Book a demo to see glossary generation on your data warehouse.

A business glossary bridges the gap between technical data assets and business concepts. The Catalog Agent automates the hardest parts — term discovery, definition generation, and asset mapping — so teams can focus on the stakeholder alignment that only humans can do.
