The visitor is looking for a centralized data management layer that stores, manages, and serves machine learning features for both training and inference.

feast-dev/feast is the closest match: Feast is a comprehensive, industry-standard feature store that provides the point-in-time joins, dual-storage architecture, and transformation pipelines needed for centralized machine learning data management. Other matches: gojek/feast, logicalclocks/hopsworks, great-expectations/great_expectations, ydataai/ydata-profiling.

Why does feast-dev/feast match “a feature store for ML features”?

Feast is a comprehensive, industry-standard feature store that provides the point-in-time joins, dual-storage architecture, and transformation pipelines needed for centralized machine learning data management.

Why does gojek/feast match “a feature store for ML features”?

Feast is a comprehensive feature store that provides the required centralized management, dual-store architecture for online and offline retrieval, and point-in-time join capabilities for machine learning workflows.

Why does logicalclocks/hopsworks match “a feature store for ML features”?

Hopsworks is a comprehensive MLOps platform that includes a dedicated feature store providing both online and offline storage, point-in-time joins, and versioning capabilities to support the full machine learning lifecycle.

Why does great-expectations/great_expectations match “a feature store for ML features”?

This is a data quality and validation framework used to monitor pipeline health, but it lacks the storage, serving, and feature-engineering capabilities required for a centralized machine learning feature store.

Why does ydataai/ydata-profiling match “a feature store for ML features”?

This repository is an automated exploratory data analysis and profiling tool for assessing data quality, rather than a feature store designed to manage, version, and serve machine learning features for training and inference.

Machine Learning Feature Stores

Open-source platforms for managing, storing, and serving consistent data features for machine learning model training.


feast-dev/feast (6,727)
Feast is an open-source feature store for machine learning that provides a central platform for defining, storing, and serving features across both training and inference workflows. It operates as a declarative system where feature definitions are written as code in Python files, synchronized to a central registry, and made available for low-latency online retrieval or point-in-time correct historical joins for training datasets. The project abstracts storage behind a pluggable architecture, allowing offline and online backends to be swapped without changing retrieval logic, and coordinates materialization pipelines that move batch features from offline stores to online stores using configurable compute engines. Feast distinguishes itself through its multi-protocol serving surface, exposing the same feature values simultaneously via REST, gRPC, and MCP protocols to support diverse client ecosystems including AI agents. It includes an on-demand transformation framework that applies Python-based feature transformations at retrieval time, combining precomputed features with request-time data for flexible serving. The project also provides entity-key collocated storage, storing all features for a single entity in one document to reduce online reads to a single lookup per request, and a background registry cache refresh that prevents serving requests from blocking on cache updates. The platform covers the full lifecycle of feature management, including feature engineering and transformation from batch and streaming sources, governance and access control with application-level RBAC and OIDC authentication, real-time inference serving, and historical feature retrieval for training. It supports vector search and retrieval-augmented generation workflows by storing and querying embeddings for similarity search. 
Feast integrates with a wide range of storage backends, compute engines, and data sources, and provides tooling for deployment on Kubernetes, monitoring with Prometheus and OpenTelemetry, and lineage tracking with OpenLineage.
Python, Batch Feature Materialization, Feature Retrieval APIs, Feature Version Associations
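The materialization and on-demand transformation ideas in the feast-dev/feast description can be illustrated with a minimal, library-free sketch. It does not use the Feast API: `materialize`, `read_online`, the row layout, and the `trips_plus_pending` feature are hypothetical names chosen for illustration. The sketch copies the latest value per entity from an offline table into an online store keyed by entity, then combines the stored value with request-time data at read time.

```python
from datetime import datetime

# Hypothetical offline rows: (entity_key, event_timestamp, feature_values).
offline_rows = [
    ("driver_1", datetime(2024, 1, 1, 10), {"trips_today": 3}),
    ("driver_1", datetime(2024, 1, 1, 12), {"trips_today": 5}),
    ("driver_2", datetime(2024, 1, 1, 11), {"trips_today": 7}),
]


def materialize(rows, start, end):
    """Keep the latest feature values per entity with start <= timestamp < end."""
    latest = {}
    for entity, ts, values in rows:
        if start <= ts < end and (entity not in latest or ts > latest[entity][0]):
            latest[entity] = (ts, values)
    # One document per entity, so an online read is a single dictionary lookup.
    return {entity: dict(values) for entity, (_, values) in latest.items()}


def read_online(store, entity, request_data):
    """Combine precomputed features with request-time data at retrieval time."""
    features = dict(store.get(entity, {}))
    if "trips_today" in features:
        # Hypothetical on-demand transformation using a request-time input.
        features["trips_plus_pending"] = (
            features["trips_today"] + request_data["pending_trips"]
        )
    return features


online_store = materialize(
    offline_rows, datetime(2024, 1, 1), datetime(2024, 1, 2)
)
result = read_online(online_store, "driver_1", {"pending_trips": 2})
print(result)
```

In this toy version the online store is a plain dictionary; a real deployment would swap in a key-value backend without changing the read logic, which is the point of the pluggable storage design described above.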
gojek/feast (7,095)
Feast is a machine learning feature store and MLOps data infrastructure layer. It provides a centralized system for managing and serving features across offline training and online production environments, utilizing an online feature serving layer for low-latency retrieval. The project centers on a feature registry that acts as a central catalog for defining, governing, and discovering feature services. It employs a unified data access layer to decouple feature retrieval from physical storage and includes a point-in-time data generator to create historically accurate training datasets that prevent data leakage. The platform covers a broad range of capabilities including real-time model inference, streaming data feature engineering, and the generation of training datasets. It also supports vector embedding search for similarity-based retrieval and feature quality validation to maintain data integrity.
Python, Temporal Join Alignment, Temporal Join Generators
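The point-in-time idea mentioned in the gojek/feast description can be sketched in a few lines of standard-library Python. This is an illustration of the general technique, not Feast's implementation: the data layout, integer timestamps, and the `point_in_time_value` helper are assumptions. For each labeled event, the sketch picks the most recent feature value recorded at or before the event time, so later values never leak into the training row.

```python
from bisect import bisect_right

# Hypothetical feature history per entity, sorted by timestamp.
# Integer timestamps are used for brevity.
feature_history = {
    "user_1": [(10, 0.2), (20, 0.5), (30, 0.9)],
}

# Labeled training events: (entity, event_timestamp, label).
training_events = [("user_1", 5, 0), ("user_1", 20, 1), ("user_1", 25, 0)]


def point_in_time_value(history, event_ts):
    """Return the latest feature value with timestamp <= event_ts, else None."""
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, event_ts)
    return history[idx - 1][1] if idx > 0 else None


training_rows = [
    (entity, ts, point_in_time_value(feature_history.get(entity, []), ts), label)
    for entity, ts, label in training_events
]

for row in training_rows:
    print(row)
```

The event at timestamp 5 gets no feature value because nothing was recorded yet, and the value recorded at timestamp 30 is never attached to an earlier event.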
logicalclocks/hopsworks (1,302)
Hopsworks - Data-Intensive AI platform with a Feature Store
Java, Data Science Tools, Deep Learning Frameworks, Feature Stores
great-expectations/great_expectations (11,558)
Great Expectations is a data quality testing framework and observability platform designed to monitor the reliability of data pipelines. It provides a structured environment for defining, documenting, and automating data quality assertions, allowing teams to validate datasets against expected structure and content before they move through downstream processes. The project distinguishes itself through a declarative domain-specific language that stores quality rules as version-controlled configuration files. It utilizes an execution engine abstraction to translate these high-level assertions into native queries for various data processing frameworks, while a rendering engine automatically transforms these rules and validation outcomes into human-readable documentation for stakeholders. The platform supports a broad range of operational capabilities, including the ability to connect to diverse data sources and persist metadata and validation results across distributed environments. It integrates directly into existing orchestration pipelines to automate recurring quality checks, track data health trends over time, and trigger notifications when datasets deviate from established benchmarks.
Python, Data Quality Frameworks
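The idea of storing quality rules as declarative configuration and evaluating them against a dataset can be sketched as follows. This is not the Great Expectations API or its file schema: the JSON rule format, the `evaluate` function, and the check names are hypothetical and exist only to illustrate the pattern.

```python
import json

# Hypothetical rule file contents; rules like these could live in version control.
RULES_JSON = """
[
  {"column": "age", "check": "not_null"},
  {"column": "age", "check": "between", "min": 0, "max": 120}
]
"""

rows = [{"age": 34}, {"age": None}, {"age": 150}]


def evaluate(rule, rows):
    """Apply one declarative rule to all rows and report the failing values."""
    values = [row.get(rule["column"]) for row in rows]
    if rule["check"] == "not_null":
        failures = [v for v in values if v is None]
    elif rule["check"] == "between":
        failures = [
            v for v in values
            if v is not None and not rule["min"] <= v <= rule["max"]
        ]
    else:
        raise ValueError(f"unknown check: {rule['check']}")
    return {"rule": rule, "success": not failures, "failed_values": failures}


rules = json.loads(RULES_JSON)
report = [evaluate(rule, rows) for rule in rules]

for entry in report:
    print(entry["rule"]["check"], entry["success"], entry["failed_values"])
```

Because the rules are data rather than code, the same file can drive validation, documentation, and alerting without being rewritten for each purpose.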
ydataai/ydata-profiling (13,388)
Ydata-profiling is an automated exploratory data analysis framework designed to generate comprehensive statistical reports and visual summaries from dataframes. It functions as a diagnostic tool for assessing data quality, identifying missing values, duplicates, and outliers, while providing a scalable engine for profiling massive datasets across distributed enterprise environments. The project distinguishes itself through its ability to handle large-scale data through distributed task orchestration and lazy stream processing, which minimizes memory overhead during complex computations. It incorporates sensitive data governance by identifying and masking personally identifiable information, ensuring that generated reports remain compliant with security standards. Furthermore, the framework supports dataset drift detection by comparing multiple versions of data collections to pinpoint statistical shifts over time. Beyond its core profiling capabilities, the library offers a modular architecture that allows for schema-driven metadata enrichment and pluggable report rendering. It provides a broad surface for data quality monitoring, including the analysis of temporal trends and the export of metrics into standard formats for integration with other analytical tools.
Python, Data Quality Frameworks
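Dataset drift detection, as mentioned in the ydata-profiling description, compares a statistic across two versions of a dataset. The sketch below is one simple way to do that and is not the method ydata-profiling uses: the `drift_score` function, the sample values, and the threshold of 2.0 are assumptions made for illustration. It measures how far the mean of the current version has moved, in units of the reference version's standard deviation.

```python
from statistics import mean, pstdev


def drift_score(reference, current):
    """Shift of the current mean, in units of the reference standard deviation."""
    spread = pstdev(reference)
    if spread == 0:
        # A constant reference column: any change in the mean counts as drift.
        return 0.0 if mean(current) == mean(reference) else float("inf")
    return abs(mean(current) - mean(reference)) / spread


reference = [10, 12, 11, 13, 9]   # hypothetical earlier dataset version
current = [15, 17, 16, 18, 14]    # hypothetical later dataset version

score = drift_score(reference, current)
drifted = score > 2.0  # arbitrary threshold chosen for this example

print(f"drift score: {score:.2f}, drifted: {drifted}")
```

A mean-shift score only catches changes in location; a change in spread or shape would need a different statistic.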
featureform/featureform (1,979)
The Virtual Feature Store. Turn your existing data infrastructure into a feature store.
Featureform acts as a virtual feature store that orchestrates your existing data infrastructure to provide versioned feature management, transformation pipelines, and retrieval for both training and inference.
Go, Feature Engineering, Feature Store, Feature Stores
