
What Is a Feature Store? Offline + Online ML Features

Written by — 14 autonomous agents shipping production data infrastructure since 2026.

Technically reviewed by the Data Workers engineering team.


A feature store is a centralized system for storing, serving, and managing machine learning features — engineered attributes used to train and serve models. It bridges the offline world (batch training data) and the online world (low-latency feature serving at inference time), ensuring both paths produce identical features from identical logic.

Feature stores emerged to solve the training-serving skew problem in production ML. This guide walks through what a feature store does, the offline + online architecture, and the major tools in the category.

Uber's Michelangelo platform popularized the feature store concept around 2017, followed quickly by internal platforms at Airbnb, Netflix, Twitter, and others. By 2020 the category had consolidated around a few open-source and commercial options, and feature stores are now considered standard infrastructure for any team running more than a handful of production models. The productivity gain comes from the shared feature catalog — new models can reuse existing features instead of reinventing them, shrinking time-to-production by weeks.

The Training-Serving Skew Problem

ML models learn from features computed in offline pipelines (Spark, dbt, custom Python). At inference time, the same features must be computed in a low-latency serving environment — often in a different language, framework, and team. If the two implementations drift, the model sees different data at serving time than at training time, and accuracy drops silently.
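A minimal sketch of how a single shared definition prevents this drift. All names, the orders data, and the 7-day window are illustrative, not from any particular library:

```python
from datetime import date, timedelta

# Toy raw data: (day-of-month, order amount) pairs for one user.
orders = [
    {"user_id": 1, "day": date(2024, 1, d), "amount": float(a)}
    for d, a in [(1, 10), (3, 30), (5, 50), (8, 20)]
]

def avg_order_value(orders, user_id, as_of, window_days=7):
    """Single shared feature definition used by BOTH training and serving."""
    lo = as_of - timedelta(days=window_days)
    vals = [o["amount"] for o in orders
            if o["user_id"] == user_id and lo < o["day"] <= as_of]
    return sum(vals) / len(vals) if vals else 0.0

# Batch training and online serving call the identical function, so they
# cannot silently disagree on windowing, filtering, or default handling.
train_value = avg_order_value(orders, user_id=1, as_of=date(2024, 1, 8))
serve_value = avg_order_value(orders, user_id=1, as_of=date(2024, 1, 8))
assert train_value == serve_value
```

In a real stack the two call sites live in different systems (a Spark job and a serving microservice); the feature store's job is to make them share this one definition anyway.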

| Path | Environment | Typical latency |
|---|---|---|
| Training | Batch Spark or SQL | Hours to days |
| Inference | Online serving | Under 100 ms |
| Feature store (offline) | Batch jobs | Matches training |
| Feature store (online) | Low-latency KV store | Matches inference |

How a Feature Store Works

Features are defined once as code (SQL or Python) and materialized to both an offline store (warehouse or lakehouse) and an online store (Redis, DynamoDB, ScyllaDB). Training jobs read from offline; serving reads from online. Both paths use the same definition, so skew is impossible by construction.
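The dual materialization described above can be sketched in a few lines. The stores here are stand-ins (a list for the warehouse, a dict for the KV store), and the click-count feature is hypothetical:

```python
from datetime import datetime

def compute_features(raw_events):
    """The one feature definition: click count per user (illustrative)."""
    counts = {}
    for e in raw_events:
        counts[e["user_id"]] = counts.get(e["user_id"], 0) + 1
    ts = datetime(2024, 1, 2)  # materialization timestamp, fixed for the sketch
    return [{"user_id": u, "click_count": c, "event_timestamp": ts}
            for u, c in counts.items()]

offline_store = []   # stand-in for a warehouse table: full history, for training
online_store = {}    # stand-in for Redis/DynamoDB: latest value, for serving

def materialize(raw_events):
    rows = compute_features(raw_events)
    offline_store.extend(rows)          # append history for training reads
    for row in rows:                    # overwrite latest value for serving reads
        online_store[row["user_id"]] = {"click_count": row["click_count"]}

materialize([{"user_id": "a"}, {"user_id": "a"}, {"user_id": "b"}])
```

Because both stores are populated from one `compute_features` call, the training path and the serving path cannot disagree about what the feature means.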

The point-in-time join is the subtle innovation. When you build a training dataset for a model that predicts churn on day T, every feature must reflect the values that were known at day T — not today's values, not tomorrow's values. Without point-in-time joins, training data leaks future information, inflating offline accuracy and destroying production performance. Feature stores automate point-in-time joins across thousands of features, which is a significant engineering task to do correctly by hand.
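The point-in-time logic can be sketched directly. For each label at time T, we take the latest feature value whose timestamp is at or before T; the entity names, dates, and values are made up for illustration:

```python
from datetime import date

# Feature history: (entity, valid-from date, value). The later value must
# not leak backwards into training rows dated before it.
feature_rows = [
    ("user_1", date(2024, 1, 1), 0.2),
    ("user_1", date(2024, 1, 10), 0.9),
]
labels = [("user_1", date(2024, 1, 5)), ("user_1", date(2024, 1, 12))]

def point_in_time_join(labels, feature_rows):
    """For each label at time T, pick the latest feature value known at or before T."""
    out = []
    for entity, t in labels:
        candidates = [(ts, v) for e, ts, v in feature_rows if e == entity and ts <= t]
        value = max(candidates)[1] if candidates else None
        out.append({"entity": entity, "as_of": t, "feature": value})
    return out

training_set = point_in_time_join(labels, feature_rows)
# The Jan 5 label sees 0.2 (not the future 0.9); the Jan 12 label sees 0.9.
```

A production feature store does this with indexed scans rather than a linear search, across thousands of features and billions of rows, but the correctness rule is exactly this one.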

Feature Store Components

  • Feature registry — catalog of all features
  • Offline store — batch features for training
  • Online store — low-latency features for serving
  • Materialization — compute + write to both stores
  • Point-in-time join — training data without leakage

Feature Store Tools

The category includes Feast (open source), Tecton (commercial, founded by members of Uber's Michelangelo team and a core maintainer of Feast), Databricks Feature Store (lakehouse-native), Google Vertex AI Feature Store (GCP-native), and AWS SageMaker Feature Store. Feast is the most popular open-source option; Databricks and Tecton dominate managed deployments.
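For a concrete feel, a feature definition in Feast looks roughly like the sketch below. The class names reflect recent Feast versions (`Entity`, `FeatureView`, `Field`, `FileSource`), but the feature names, parquet path, and TTL are hypothetical, and the exact API may differ by version:

```python
from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32

# Entity: the key features are joined on.
user = Entity(name="user", join_keys=["user_id"])

# Batch source backing the offline store (hypothetical path).
source = FileSource(
    path="data/user_stats.parquet",
    timestamp_field="event_timestamp",
)

# One definition, materialized to both offline and online stores by Feast.
user_stats = FeatureView(
    name="user_stats",
    entities=[user],
    ttl=timedelta(days=1),
    schema=[Field(name="avg_order_value", dtype=Float32)],
    source=source,
)
```

Training code then calls `get_historical_features` (point-in-time join against the offline store) while serving code calls `get_online_features` against the online store, both resolving the same `user_stats` view.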

The choice depends on where the rest of your ML stack lives. Teams on Databricks almost always use the built-in feature store — no extra integration work required. Teams already running Snowflake plus open-source ML often pick Feast for the flexibility and low cost. Teams with strict latency requirements at serving time lean toward Tecton for its production-grade online store. Greenfield projects at large enterprises with strong support requirements sometimes pick Vertex AI or SageMaker because they are fully managed within the cloud provider they already use.

When You Need a Feature Store

Not every ML team needs a dedicated feature store. Teams with a handful of models can often get away with dbt + a KV cache for online features. Feature stores become essential when you have dozens of models, multiple teams reusing features, or strict latency requirements at serving time.

A practical rule of thumb: if two or more models share the same feature, a feature store starts paying back. If a single team maintains one model end to end, the overhead of standing up a feature store is usually not worth it. The ROI grows with the number of models and teams that share features, because each shared feature is one fewer pipeline to build and maintain separately.

For related reading, see data lineage for ML features and data catalog for ML features.

Feature Stores and Governance

Feature stores are also governance tools. Every feature has an owner, a definition, a lineage trace back to raw data, and a quality history. When a feature drifts or a source system changes, the feature store flags affected models automatically. Data Workers ML and governance agents integrate with Feast and Databricks Feature Store for autonomous feature ops.

Book a demo to see feature store automation and lineage in action.

Real-World Examples

A recommendation team at a marketplace uses Feast with Snowflake as the offline store and Redis as the online store, serving 50+ features to the ranking model with sub-10ms latency. A fraud detection team at a fintech uses Tecton to manage 200+ features across 15 models, with point-in-time joins for training and low-latency serving for real-time scoring. A churn prediction team at a SaaS company uses the Databricks Feature Store because they already run on Databricks and did not want a separate tool. All three solve the same problem with tools picked for ecosystem fit.

When You Need It

You need a feature store once you have multiple production models sharing any features, or a single model with strict training-serving consistency requirements. Below that threshold, dbt plus a caching layer often suffices. Above it, feature stores pay back by eliminating duplicated feature pipelines and catching drift early. The tipping point usually arrives around the third or fourth production model.

Common Misconceptions

A feature store is not just a key-value store with fancy marketing. Point-in-time joins, backfill support, online-offline consistency, and feature versioning are all non-trivial engineering challenges. It is also not the same as a data warehouse — feature stores add ML-specific capabilities that warehouses do not have. And feature stores do not replace the data warehouse; they sit alongside it, reading curated tables as inputs.

A feature store centralizes ML feature definitions and materializes them to both offline and online stores, eliminating training-serving skew. Adopt one when you have multiple production models or teams sharing features. Without one, every production ML team eventually reinvents feature store concepts painfully from scratch.

See Data Workers in action

15 autonomous AI agents working across your entire data stack. MCP-native, open-source, deployed in minutes.

Book a Demo
