Data Dictionary Example: A Real-World Template You Can Copy
A data dictionary example shows how to document the tables, columns, data types, business definitions, and ownership for a real dataset. The best example uses a concrete business scenario — like an ecommerce 'orders' table — with every column defined in plain English plus technical metadata.
This guide provides a full working data dictionary example for an ecommerce stack, plus the template you can adapt for your own warehouse. It is filled in with realistic values so you can see what good looks like, not a blank skeleton you still have to figure out.
We also cover automation: how to generate data dictionaries from catalog metadata instead of maintaining them by hand.
What Goes in a Data Dictionary
Every data dictionary row should contain these columns:
- Table name: Fully qualified name, including database and schema
- Column name: As it exists in the warehouse
- Data type: Snowflake/BigQuery/Postgres native type
- Nullable: Can this column be NULL?
- Description: Business definition in plain English
- Example value: A realistic sample
- Source: Upstream system that populates this column
- PII classification: None / Indirect PII / PII / SPI / Restricted
- Owner: Person or team accountable
- Last updated: When the definition was last reviewed
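One way to keep every row consistent is to model the fields above as a typed record. The following is a minimal sketch, not a prescribed schema: the class name `DictionaryEntry`, the field names, and the sample values (table path, source, owner, review date) are all hypothetical. The allowed classifications include "Indirect PII" because the example tables in this guide use it.

```python
from dataclasses import dataclass, asdict
from datetime import date
from typing import Optional

# Classification labels used in this guide; adjust to your own policy.
ALLOWED_PII = {"None", "Indirect PII", "PII", "SPI", "Restricted"}


@dataclass(frozen=True)
class DictionaryEntry:
    """One row of a data dictionary (hypothetical field names)."""
    table_name: str            # fully qualified: database.schema.table
    column_name: str
    data_type: str
    nullable: bool
    description: str
    example_value: Optional[str]
    source: str
    pii_classification: str
    owner: str
    last_updated: date

    def __post_init__(self):
        # Reject classifications that are not part of the agreed vocabulary.
        if self.pii_classification not in ALLOWED_PII:
            raise ValueError(
                f"unknown PII classification: {self.pii_classification!r}"
            )


# All values below are illustrative, not taken from a real system.
entry = DictionaryEntry(
    table_name="analytics.ecommerce.orders",
    column_name="order_status",
    data_type="VARCHAR(20)",
    nullable=False,
    description="One of: pending, paid, shipped, delivered, cancelled, refunded",
    example_value="shipped",
    source="checkout service",
    pii_classification="None",
    owner="orders data team",
    last_updated=date(2024, 1, 15),
)

print(asdict(entry)["column_name"])
```

Validating the classification at construction time means a typo such as "Pii" fails loudly instead of silently creating a new category.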
Example: Ecommerce Orders Table Data Dictionary
| Column | Type | Description | PII |
|---|---|---|---|
| order_id | VARCHAR(36) | Unique order identifier, UUID v4 | None |
| customer_id | VARCHAR(36) | FK to customers.customer_id, identifies the buyer | Indirect PII |
| order_placed_at | TIMESTAMP_TZ | Time the customer clicked 'Place Order', stored in UTC | None |
| order_status | VARCHAR(20) | One of: pending, paid, shipped, delivered, cancelled, refunded | None |
| total_amount_usd | NUMERIC(10,2) | Final order amount after discounts, before tax, in USD | None |
| shipping_address_id | VARCHAR(36) | FK to addresses.id, the delivery address for the order | PII |
| payment_method | VARCHAR(20) | Payment type: credit_card, paypal, apple_pay, google_pay | None |
| discount_code | VARCHAR(50) | Promo code applied, NULL if none | None |
| source_channel | VARCHAR(30) | Acquisition channel: organic, paid_social, email, affiliate | None |
| created_at | TIMESTAMP_TZ | Row creation time in warehouse | None |
| updated_at | TIMESTAMP_TZ | Last mutation of the row | None |
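Definitions this specific can double as validation rules. The sketch below checks a single order record against a few of the rows above (UUID v4 for `order_id`, the enumerated statuses and payment methods, and the `NUMERIC(10,2)` bounds). The function name `validate_order` and the sample record are hypothetical, and a real pipeline would likely enforce these in the warehouse or a testing framework instead.

```python
import uuid
from decimal import Decimal
from typing import Dict, List

ORDER_STATUSES = {"pending", "paid", "shipped", "delivered", "cancelled", "refunded"}
PAYMENT_METHODS = {"credit_card", "paypal", "apple_pay", "google_pay"}


def validate_order(row: Dict[str, object]) -> List[str]:
    """Return a list of problems found; an empty list means the row passes."""
    problems = []

    # order_id: must parse as a UUID and be version 4.
    try:
        if uuid.UUID(row["order_id"]).version != 4:
            problems.append("order_id is not a UUID v4")
    except (KeyError, ValueError, TypeError, AttributeError):
        problems.append("order_id is missing or not a valid UUID")

    if row.get("order_status") not in ORDER_STATUSES:
        problems.append("order_status is not an allowed value")

    if row.get("payment_method") not in PAYMENT_METHODS:
        problems.append("payment_method is not an allowed value")

    # total_amount_usd: NUMERIC(10,2) allows 2 decimal places
    # and up to 8 digits before the decimal point.
    amount = row.get("total_amount_usd")
    if (
        not isinstance(amount, Decimal)
        or not amount.is_finite()
        or amount.as_tuple().exponent < -2
        or abs(amount) >= Decimal("1e8")
    ):
        problems.append("total_amount_usd does not fit NUMERIC(10,2)")

    return problems


# Illustrative record; values are made up.
good_order = {
    "order_id": str(uuid.uuid4()),
    "order_status": "shipped",
    "payment_method": "paypal",
    "total_amount_usd": Decimal("129.99"),
}
print(validate_order(good_order))
```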
Example: Customers Table Data Dictionary
| Column | Type | Description | PII |
|---|---|---|---|
| customer_id | VARCHAR(36) | Unique customer identifier | Indirect PII |
| email | VARCHAR(320) | Customer's primary email, used for login | PII |
| first_name | VARCHAR(100) | Legal first name | PII |
| last_name | VARCHAR(100) | Legal last name | PII |
| phone_number | VARCHAR(20) | E.164 format, for shipping notifications | PII |
| date_of_birth | DATE | For age verification; NULL if not collected | SPI |
| country_code | CHAR(2) | ISO 3166-1 alpha-2 country code | None |
| signup_date | TIMESTAMP_TZ | When the account was created | None |
| email_consent | BOOLEAN | Marketing email opt-in status, governed by GDPR | None |
| lifetime_value_usd | NUMERIC(10,2) | Computed total spend across all orders | None |
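The PII column becomes useful once code can read it. As a rough sketch, the snippet below uses the classifications from the customers table to mask a record before it reaches a non-privileged consumer. The names `CUSTOMER_PII` and `mask_row` are hypothetical, `email` is the assumed name of the login email column, and the unsalted truncated hash is only a placeholder for whatever pseudonymization your policy actually requires.

```python
import hashlib
from typing import Dict, FrozenSet

# Classifications copied from the customers table above.
CUSTOMER_PII = {
    "customer_id": "Indirect PII",
    "email": "PII",
    "first_name": "PII",
    "last_name": "PII",
    "phone_number": "PII",
    "date_of_birth": "SPI",
    "country_code": "None",
    "signup_date": "None",
    "email_consent": "None",
    "lifetime_value_usd": "None",
}


def mask_row(
    row: Dict[str, object],
    classifications: Dict[str, str],
    allowed: FrozenSet[str] = frozenset({"None"}),
) -> Dict[str, object]:
    """Return a copy of row with sensitive values masked."""
    masked = {}
    for column, value in row.items():
        # Columns missing from the dictionary are treated as most sensitive.
        level = classifications.get(column, "Restricted")
        if level in allowed or value is None:
            masked[column] = value
        elif level == "Indirect PII":
            # Keep joinability with a stable token (illustrative only).
            digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
            masked[column] = digest[:16]
        else:
            masked[column] = "***"
    return masked


# Made-up sample record.
customer = {
    "customer_id": "c-001",
    "email": "jane@example.com",
    "date_of_birth": None,
    "country_code": "US",
    "loyalty_tier": "gold",  # not in the dictionary, so it gets masked
}
print(mask_row(customer, CUSTOMER_PII))
```

Defaulting unknown columns to the strictest level means a newly added column stays hidden until someone documents it.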
How to Automate Data Dictionary Generation
Manually maintained data dictionaries tend to go stale within weeks. Modern governance programs generate dictionaries automatically from catalog metadata. Data Workers does this via the catalog agent: it ingests warehouse metadata, enriches it with LLM-generated descriptions, routes those descriptions to humans for approval, and publishes a living data dictionary that updates continuously.
The workflow: ingest from Snowflake/BigQuery → classify PII automatically → draft descriptions with an LLM → route to data stewards for approval → publish to the catalog → expose as MCP tools so AI agents can query the dictionary directly. Read the data dictionary best practices guide for more.
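The first steps of that workflow (ingest metadata, classify PII, queue for review) can be sketched in a few lines. This is not how Data Workers implements it: the example uses an in-memory SQLite database as a stand-in for Snowflake or BigQuery, a crude name-based heuristic in place of a real classifier, and hypothetical names throughout (`PII_NAME_HINTS`, `classify_pii`, `draft_dictionary`, the `needs_steward_review` status). Descriptions are left blank for an LLM or a steward to fill in.

```python
import sqlite3
from typing import Dict, List

# Crude, assumed heuristics: substring of the column name -> classification.
PII_NAME_HINTS = {
    "email": "PII",
    "first_name": "PII",
    "last_name": "PII",
    "phone": "PII",
    "address": "PII",
    "date_of_birth": "SPI",
}


def classify_pii(column_name: str) -> str:
    """Guess a PII level from the column name; stewards confirm it later."""
    lowered = column_name.lower()
    for hint, level in PII_NAME_HINTS.items():
        if hint in lowered:
            return level
    return "None"


def draft_dictionary(conn: sqlite3.Connection, table: str) -> List[Dict[str, object]]:
    """Build draft dictionary rows from SQLite's table metadata.

    The table name is interpolated into the PRAGMA, so it must come
    from a trusted source.
    """
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [
        {
            "table": table,
            "column": name,
            "type": col_type,
            "nullable": not notnull,
            "pii": classify_pii(name),
            "description": "",  # to be drafted, then approved by a steward
            "status": "needs_steward_review",
        }
        for _cid, name, col_type, notnull, _default, _pk in columns
    ]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "customer_id VARCHAR(36) NOT NULL, "
    "email VARCHAR(320) NOT NULL, "
    "country_code CHAR(2), "
    "date_of_birth DATE)"
)
draft = draft_dictionary(conn, "customers")
for row in draft:
    print(row["column"], row["type"], row["pii"])
```

Name-based guesses will misfire (for example, `email_consent` contains "email"), which is one reason the workflow routes drafts to a human before publishing.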
Common Data Dictionary Mistakes
- Documenting only technical metadata, not business definitions
- Writing generic descriptions like 'customer email' instead of 'primary login email, enforced unique'
- Forgetting PII classification
- Creating a dictionary in Excel and never updating it
- Not tying dictionary entries to ownership
- Treating the dictionary as documentation rather than a runtime asset
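Several of these mistakes can be caught mechanically. The sketch below lints a dictionary entry for a missing or very short description, a missing PII classification, a missing owner, and a stale review date. The function name `lint_entry`, the dictionary keys, the three-word minimum, and the 180-day review window are all assumptions chosen for illustration, not established thresholds.

```python
from datetime import date, timedelta
from typing import Dict, List


def lint_entry(entry: Dict[str, object], today: date, max_age_days: int = 180) -> List[str]:
    """Return findings for one dictionary entry; empty means it looks healthy."""
    findings = []

    description = (entry.get("description") or "").strip()
    if not description:
        findings.append("missing business description")
    elif len(description.split()) < 3:
        # Word count is a blunt proxy for "too generic".
        findings.append("description too generic")

    if not entry.get("pii_classification"):
        findings.append("missing PII classification")

    if not entry.get("owner"):
        findings.append("missing owner")

    last_updated = entry.get("last_updated")
    if last_updated is None or today - last_updated > timedelta(days=max_age_days):
        findings.append("definition not reviewed recently")

    return findings


# Made-up entries for illustration.
weak = {
    "description": "customer email",
    "pii_classification": "PII",
    "owner": "",
    "last_updated": date(2023, 1, 1),
}
strong = {
    "description": "primary login email, enforced unique",
    "pii_classification": "PII",
    "owner": "identity team",
    "last_updated": date(2023, 12, 1),
}

print(lint_entry(weak, today=date(2024, 1, 1)))
print(lint_entry(strong, today=date(2024, 1, 1)))
```

Running a check like this in CI is one way to treat the dictionary as a runtime asset instead of static documentation.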
A great data dictionary example is the fastest way to show your team what good looks like. Copy the orders and customers templates above, adapt them to your warehouse, and automate the generation so the dictionary stays current. Book a demo to see how Data Workers generates and maintains living data dictionaries automatically.