What Doug Turnbull's Judgment-First Search Method Taught Our Search Agent
Build the evaluation layer before touching the retrieval stack — the disciplined approach that separates search teams that improve from those that spin
By The Data Workers Team
Doug Turnbull has spent more than a decade making search engines more honest. He co-authored Relevant Search and AI-Powered Search, built Quepid (a widely used open-source relevance evaluation tool), created the Elasticsearch Learning to Rank plugin, and led search at Reddit and Shopify. His blog at softwaredoug.com is one of the most practically useful bodies of work in the field.
The method that runs through all of it is something Turnbull calls judgment-first: before you reach for a fancier retrieval component, establish a graded evaluation framework and measure what you have. Without that baseline, every tuning decision is a guess dressed up as engineering.
What Is Actually Worth Learning
Turnbull's writing distills to four coupled ideas. First: relevance is a verb, not a noun — 'relevance is about deciding whether or not to verb the item.' The same asset can be the right answer for one query and wrong for another depending on where the user is in their decision process.
Second: the safety net grants permission to experiment. 'A judgment list gives your team permission to iterate quickly on relevance improvements.' Without graded baselines, engineers hesitate — every change risks silently breaking something else.
Third: no model is correct, but some models are useful. Every evaluation system gives a different view of relevance. Treat each one as a single lens, not as ground truth.
Fourth: evaluation before retrieval complexity. 'The fanciest solutions don't matter as much as getting a good evaluation framework setup to evaluate the quality of search results.'
How a Method Becomes a Skill
When a query starts returning wrong results, the agent does not immediately reach for a retrieval change. It first surfaces a representative sample of queries, collects graded relevance judgments for the current results, and establishes an NDCG baseline. Every subsequent ranking change is measured against that baseline before it ships.
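To make the baseline step concrete, here is a minimal sketch of scoring a ranking against a judgment list with NDCG. It is an illustration under stated assumptions, not the agent's actual code: the function names (dcg, ndcg, mean_ndcg), the 0 to 3 grading scale, the exponential gain form of DCG, and the asset names in the example are all ours.

```python
import math

def dcg(grades, k):
    """Discounted cumulative gain over the top-k grades (exponential gain form)."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades[:k]))

def ndcg(ranked_ids, judgments, k=10):
    """NDCG@k for one query. Unjudged results are treated as grade 0 (an assumption)."""
    grades = [judgments.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(judgments.values(), reverse=True)
    ideal_dcg = dcg(ideal, k)
    return dcg(grades, k) / ideal_dcg if ideal_dcg > 0 else 0.0

def mean_ndcg(results_by_query, judgments_by_query, k=10):
    """Average NDCG@k across every query in the judgment list."""
    scores = [
        ndcg(results_by_query.get(query, []), judged, k)
        for query, judged in judgments_by_query.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical judgment list: query -> {asset id: grade from 0 (bad) to 3 (ideal)}
judgments = {
    "monthly revenue": {"fct_revenue_monthly": 3, "fct_orders": 1, "dim_customer": 0},
}
# Hypothetical rankings from the current system and from a proposed change
baseline = {"monthly revenue": ["dim_customer", "fct_orders", "fct_revenue_monthly"]}
candidate = {"monthly revenue": ["fct_revenue_monthly", "fct_orders", "dim_customer"]}

baseline_score = mean_ndcg(baseline, judgments)
candidate_score = mean_ndcg(candidate, judgments)
# Only ship the change if it does not regress the baseline
ship_it = candidate_score >= baseline_score
```

In practice the judgment list would cover many queries, and a change that helps one query while hurting others would show up as a drop in the mean.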
Two tools wire this into the agent's actual capabilities. The agentic_search tool handles candidate retrieval and re-scoring. The reconcile_definitions tool handles a problem specific to data warehouse search: you cannot grade relevance for a query like 'monthly revenue' if the top two candidate assets define revenue differently.
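The definition problem can be sketched as a pre-grading check. This is a hypothetical illustration, not the reconcile_definitions tool itself: the function name find_definition_conflicts, the record keys (asset, metric, definition), and the example definitions are assumptions made for this sketch.

```python
from collections import defaultdict

def find_definition_conflicts(candidates):
    """Group candidate assets by metric and flag metrics with more than one definition.

    `candidates` is a list of dicts with hypothetical keys:
    'asset', 'metric', and 'definition'.
    """
    by_metric = defaultdict(lambda: defaultdict(list))
    for candidate in candidates:
        # Normalize case and whitespace so trivial differences are not flagged
        normalized = " ".join(candidate["definition"].lower().split())
        by_metric[candidate["metric"]][normalized].append(candidate["asset"])
    return {
        metric: dict(definitions)
        for metric, definitions in by_metric.items()
        if len(definitions) > 1
    }

# Hypothetical candidates returned for the query 'monthly revenue'
candidates = [
    {"asset": "fct_revenue_monthly", "metric": "revenue",
     "definition": "SUM(order_total) net of refunds"},
    {"asset": "finance_revenue_v2", "metric": "revenue",
     "definition": "SUM(order_total) including refunds"},
    {"asset": "fct_orders", "metric": "order_count",
     "definition": "COUNT(order_id)"},
]

conflicts = find_definition_conflicts(candidates)
# Queries touching a conflicted metric are held back from grading until reconciled
needs_reconciliation = sorted(conflicts)
```

The design choice here is to stop before grading: a grade assigned while two definitions of revenue are in play would bake the ambiguity into the baseline.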
One of More Than 400
The Data Workers agent swarm has more than 400 method-named skills across 19 agents. The judgment-first-search skill earns its place because it encodes the step teams most commonly skip. Teams often add vector search and rerankers before they have judgment lists. The skill tries to enforce the order: baseline first, complexity second.
A note on this post: This is independent commentary and homage. It distills publicly available writing and talks by Doug Turnbull to illustrate a working method, and every quote is drawn from and verified against the primary sources linked above. The skill it describes is named for the method, not the person, and contains no marketing claims attributed to them. Data Workers is not affiliated with, sponsored by, or endorsed by Doug Turnbull. If you are Doug Turnbull and would like anything adjusted or removed, email hello@dataworkers.io and we will respond promptly.