Data Dictionary Example: A Real-World Template You Can Copy Today
A data dictionary example shows how to document the tables, columns, data types, business definitions, and ownership for a real dataset. A good example uses a concrete business scenario, such as an ecommerce orders table, with every column defined in plain English alongside its technical metadata.
This guide provides a full working data dictionary example for an ecommerce stack, plus a template you can adapt for your own warehouse. Unlike generic blank templates, it is filled in with realistic values so you can see what good looks like, not a skeleton you still have to complete. We also cover automation: how to generate data dictionaries from catalog metadata instead of maintaining them by hand.
What Goes in a Data Dictionary
Every data dictionary row should contain these columns:
- Table name — Fully qualified name including database and schema
- Column name — As it exists in the warehouse
- Data type — Snowflake/BigQuery/Postgres native type
- Nullable — Can this column be NULL?
- Description — Business definition in plain English
- Example value — A realistic sample
- Source — Upstream system that populates this column
- PII classification — None / PII / SPI / Restricted
- Owner — Person or team accountable
- Last updated — When the definition was last reviewed
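As a sketch, the row schema above can be expressed as a small data structure. The class and field names here are illustrative, not part of any specific tool:

```python
from dataclasses import dataclass

@dataclass
class DictionaryEntry:
    """One row of a data dictionary; fields mirror the columns listed above."""
    table_name: str          # fully qualified: database.schema.table
    column_name: str         # as it exists in the warehouse
    data_type: str           # warehouse-native type, e.g. VARCHAR(36)
    nullable: bool           # can this column be NULL?
    description: str         # business definition in plain English
    example_value: str       # a realistic sample
    source: str              # upstream system populating the column
    pii_classification: str  # one of: None, PII, SPI, Restricted
    owner: str               # accountable person or team
    last_updated: str        # ISO date of last review

# Example entry for the orders table shown below
entry = DictionaryEntry(
    table_name="analytics.ecommerce.orders",
    column_name="order_id",
    data_type="VARCHAR(36)",
    nullable=False,
    description="Unique order identifier, UUID v4",
    example_value="550e8400-e29b-41d4-a716-446655440000",
    source="orders_service",
    pii_classification="None",
    owner="data-platform-team",
    last_updated="2024-06-01",
)
```

Keeping entries in a typed structure like this makes it easy to validate completeness (every field required) before rendering the dictionary for readers.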
Example: Ecommerce Orders Table Data Dictionary
| Column | Type | Description | PII |
|---|---|---|---|
| order_id | VARCHAR(36) | Unique order identifier, UUID v4 | None |
| customer_id | VARCHAR(36) | FK to customers.id, identifies the buyer | Indirect PII |
| order_placed_at | TIMESTAMP_TZ | When the customer clicked 'Place Order' in UTC | None |
| order_status | VARCHAR(20) | One of: pending, paid, shipped, delivered, cancelled, refunded | None |
| total_amount_usd | NUMERIC(10,2) | Final order amount after discounts, before tax, in USD | None |
| shipping_address_id | VARCHAR(36) | FK to addresses.id with delivery address | PII |
| payment_method | VARCHAR(20) | Payment type: credit_card, paypal, apple_pay, google_pay | None |
| discount_code | VARCHAR(50) | Promo code applied, NULL if none | None |
| source_channel | VARCHAR(30) | Acquisition channel: organic, paid_social, email, affiliate | None |
| created_at | TIMESTAMP_TZ | Row creation time in warehouse | None |
| updated_at | TIMESTAMP_TZ | Last mutation of the row | None |
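Tables like the one above stay consistent when they are rendered from structured entries rather than edited by hand. This is an illustrative sketch with made-up helper names, not tied to any particular tool:

```python
def render_dictionary(entries):
    """Render a list of column entries as a markdown data dictionary table."""
    lines = ["| Column | Type | Description | PII |", "|---|---|---|---|"]
    for e in entries:
        lines.append(
            f"| {e['column']} | {e['type']} | {e['description']} | {e['pii']} |"
        )
    return "\n".join(lines)

# First two rows of the orders table shown above
orders = [
    {"column": "order_id", "type": "VARCHAR(36)",
     "description": "Unique order identifier, UUID v4", "pii": "None"},
    {"column": "customer_id", "type": "VARCHAR(36)",
     "description": "FK to customers.id, identifies the buyer", "pii": "Indirect PII"},
]

print(render_dictionary(orders))
```

Because the source of truth is the entry list, the same data can be published to a wiki, a catalog, or a markdown file without the copies drifting apart.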
Example: Customers Table Data Dictionary
| Column | Type | Description | PII |
|---|---|---|---|
| customer_id | VARCHAR(36) | Unique customer identifier | Indirect PII |
| email | VARCHAR(320) | Customer's primary email, used for login | PII |
| first_name | VARCHAR(100) | Legal first name | PII |
| last_name | VARCHAR(100) | Legal last name | PII |
| phone_number | VARCHAR(20) | E.164 format, for shipping notifications | PII |
| date_of_birth | DATE | For age verification; NULL if not collected | SPI |
| country_code | CHAR(2) | ISO 3166-1 alpha-2 country code | None |
| signup_date | TIMESTAMP_TZ | When the account was created | None |
| email_consent | BOOLEAN | Marketing email opt-in status, governed by GDPR | None |
| lifetime_value_usd | NUMERIC(10,2) | Computed total spend across all orders | None |
How to Automate Data Dictionary Generation
Manual data dictionaries go stale within weeks. Modern governance programs generate dictionaries automatically from catalog metadata. Data Workers does this via the catalog agent: it ingests warehouse metadata, enriches it with LLM-generated descriptions, routes those descriptions to humans for approval, and publishes a living data dictionary that updates continuously.
The workflow: ingest from Snowflake/BigQuery → classify PII automatically → draft descriptions with an LLM → route to data stewards for approval → publish to the catalog → expose as MCP tools so AI agents can query the dictionary directly. Read the data dictionary best practices guide for more.
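To illustrate the classification step in that workflow, a simple keyword heuristic can draft PII labels before stewards review them. The keyword lists below are assumptions for the sketch; real systems combine this with pattern- or LLM-based classifiers:

```python
# Illustrative keyword heuristic for drafting PII classifications.
# Drafts are routed to data stewards for approval, never published as-is.
SPI_HINTS = ("date_of_birth", "ssn", "health")
PII_HINTS = ("email", "name", "phone", "address")

def draft_pii_label(column_name: str) -> str:
    """Draft a PII classification from the column name alone."""
    name = column_name.lower()
    if any(hint in name for hint in SPI_HINTS):
        return "SPI"
    if any(hint in name for hint in PII_HINTS):
        return "PII"
    return "None"

for col in ("email", "first_name", "date_of_birth", "order_id"):
    print(col, "->", draft_pii_label(col))
```

Name-based heuristics miss columns with misleading names, which is exactly why the workflow above routes drafts through human approval before publishing.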
Common Data Dictionary Mistakes
- Documenting only technical metadata, not business definitions
- Writing generic descriptions like 'customer email' instead of 'primary login email, enforced unique'
- Forgetting PII classification
- Creating a dictionary in Excel and never updating it
- Not tying dictionary entries to ownership
- Treating the dictionary as documentation rather than a runtime asset
A great data dictionary example is the fastest way to show your team what good looks like. Copy the orders and customers templates above, adapt them to your warehouse, and automate the generation so the dictionary stays current. Book a demo to see how Data Workers generates and maintains living data dictionaries automatically.
Further Reading
See Data Workers in action
15 autonomous AI agents working across your entire data stack. MCP-native, open-source, deployed in minutes.
Book a Demo
Related Resources
- Data Dictionary Best Practices: 10 Rules Teams Actually Follow — Ten operational rules for building a data dictionary that survives contact with real teams, plus dictionary health metrics.
- What Is Data Governance With Example: A Practical Guide — Real-world data governance examples from healthcare PHI, banking BCBS 239, and ecommerce GDPR with shared design principles.
- Data Catalog vs Data Dictionary: Key Differences Explained — How modern data catalogs evolved beyond static data dictionaries to include automated ingestion, lineage, and active metadata.
- Why AI Agents Need MCP Servers for Data Engineering — MCP servers give AI agents structured access to your data tools — Snowflake, BigQuery, dbt, Airflow, and more. Here is why MCP is the int…
- The Complete Guide to Agentic Data Engineering with MCP — Agentic data engineering replaces manual pipeline management with autonomous AI agents. Here is how to implement it with MCP — without lo…
- RBAC for Data Engineering Teams: Why Manual Access Control Doesn't Scale — Manual RBAC breaks down at 50+ data assets. Policy drift, orphaned permissions, and PII exposure become inevitable. AI agents enforce gov…
- From Alert to Resolution in Minutes: How AI Agents Debug Data Pipeline Incidents — The average data pipeline incident takes 4-8 hours to resolve. AI agents that understand your full data context can auto-diagnose and res…
- Build Data Pipelines with AI: From Description to Deployment in Minutes — Building a data pipeline still takes 2-6 weeks of engineering time. AI agents that understand your data context can generate, test, and d…
- Why Your Data Catalog Is Always Out of Date (And How AI Agents Fix It) — 40-60% of data catalog entries are outdated at any given time. AI agents that continuously scan, classify, and update metadata make the s…
- Data Migration Automation: How AI Agents Reduce 18-Month Timelines to Weeks — Enterprise data migrations take 6-18 months because schema mapping, data validation, and downtime coordination are manual. AI agents comp…
- Stop Building Data Connectors: How AI Agents Auto-Generate Integrations — Data teams spend 20-30% of their time maintaining connectors. AI agents that auto-generate and self-heal integrations eliminate this main…
- Why One AI Agent Isn't Enough: Coordinating Agent Swarms Across Your Data Stack — A single AI agent can handle one domain. But data engineering spans 10+ domains — quality, governance, pipelines, schema, streaming, cost…
Explore Topic Clusters
- Data Governance: The Complete Guide — Policies, access controls, PII, and compliance at scale.
- Data Catalog: The Complete Guide — Discovery, metadata, lineage, and the modern catalog stack.
- Data Lineage: The Complete Guide — Column-level lineage, impact analysis, and observability.
- Data Quality: The Complete Guide — Tests, SLAs, anomaly detection, and data reliability engineering.
- AI Data Engineering: The Complete Guide — LLMs, agents, and autonomous workflows across the data stack.
- MCP for Data: The Complete Guide — Model Context Protocol servers, tools, and agent integration.
- Data Mesh & Data Fabric: The Complete Guide — Federated ownership, domain-oriented architecture, and interop.
- Open-Source Data Stack: The Complete Guide — dbt, Airflow, Iceberg, DuckDB, and the modern OSS toolkit.