# online-ml/river

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/online-ml-river).**

5,853 stars · 635 forks · Python · BSD-3-Clause

## Links

- GitHub: https://github.com/online-ml/river
- Homepage: https://riverml.xyz
- awesome-repositories: https://awesome-repositories.com/repository/online-ml-river.md

## Description

River is a Python framework for online machine learning, designed to train and evaluate models on streaming data. It enables incremental learning by updating model parameters one observation at a time, eliminating the need to store full training datasets in memory.
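As a rough illustration of what "one observation at a time" means, the sketch below trains a linear regression by stochastic gradient descent in plain Python. The method names mirror River's `learn_one` / `predict_one` convention, but `OnlineLinearRegression` and `stream` are hypothetical stand-ins written for this example, not River code.

```python
import random
from itertools import islice


class OnlineLinearRegression:
    """Toy linear regression updated by SGD, one observation at a time."""

    def __init__(self, learning_rate=0.05):
        self.learning_rate = learning_rate
        self.weights = {}  # feature name -> weight, created lazily
        self.bias = 0.0

    def predict_one(self, x):
        return self.bias + sum(self.weights.get(k, 0.0) * v for k, v in x.items())

    def learn_one(self, x, y):
        # Gradient step on the squared error 0.5 * (prediction - y) ** 2.
        error = self.predict_one(x) - y
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.learning_rate * error * v
        self.bias -= self.learning_rate * error


def stream(seed=42):
    """Hypothetical endless stream where y = 2*a - 3*b + 1."""
    rng = random.Random(seed)
    while True:
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        yield {"a": a, "b": b}, 2 * a - 3 * b + 1


model = OnlineLinearRegression()
for x, y in islice(stream(), 5000):
    model.learn_one(x, y)  # the observation can be discarded after this call
```

Only the weights persist between calls, so memory use does not grow with the number of observations seen.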

The library distinguishes itself through a dedicated concept drift detection system that monitors changes in data distributions to trigger model adaptation. It also provides a progressive validation framework that simulates real-time deployment by testing models on samples before using them for training.
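The test-then-train order behind progressive validation can be sketched in a few lines of plain Python. The helper `progressive_validation` and the toy `RunningMeanRegressor` are assumptions made for illustration; River ships its own evaluation utilities, which this sketch does not reproduce.

```python
def progressive_validation(stream, model, loss):
    """Score each prediction before the model is allowed to see the label."""
    total, n = 0.0, 0
    for x, y in stream:
        y_pred = model.predict_one(x)  # 1. predict with the current model
        total += loss(y, y_pred)       # 2. score against the ground truth
        n += 1
        model.learn_one(x, y)          # 3. only now learn from the sample
    return total / n if n else float("nan")


class RunningMeanRegressor:
    """Toy model that predicts the mean of the targets seen so far."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict_one(self, x):
        return self.mean

    def learn_one(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n


# Tiny hand-made stream of (features, target) pairs.
data = [({}, 4.0), ({}, 6.0), ({}, 5.0)]
mae = progressive_validation(data, RunningMeanRegressor(), lambda y, p: abs(y - p))
```

Because every sample is scored before it is learned, the resulting error reflects how the model would have performed had it been deployed on the stream from the first observation.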

The library covers a broad range of streaming tasks, including real-time feature engineering, time series forecasting, and online anomaly detection. It supports unsupervised learning via incremental clustering and supervised learning via incremental decision trees, along with ensemble aggregation and bandit policies for model selection.
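To make the idea of a bandit policy for model selection concrete, here is a minimal epsilon-greedy sketch in plain Python. It is one simple policy chosen for illustration, not necessarily how River implements its bandits; the class `EpsilonGreedy`, the three "models", and their fixed rewards are all hypothetical.

```python
import random


class EpsilonGreedy:
    """Pick the best-looking arm most of the time, a random arm otherwise."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms   # how often each arm was played
        self.means = [0.0] * n_arms  # running mean reward per arm
        self._rng = random.Random(seed)

    def pull(self):
        if self._rng.random() < self.epsilon:
            return self._rng.randrange(len(self.counts))  # explore
        return max(range(len(self.counts)), key=self.means.__getitem__)  # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


# Hypothetical per-model rewards (for example, 1 - loss on the latest sample).
model_rewards = [0.2, 0.5, 0.8]

policy = EpsilonGreedy(n_arms=len(model_rewards), epsilon=0.1, seed=0)
for _ in range(2000):
    arm = policy.pull()
    policy.update(arm, model_rewards[arm])
```

Here each arm stands for a candidate model; the policy keeps exploring the weaker ones occasionally, which matters on a stream where the best model can change over time.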

The project includes utilities for streaming data ingestion from sources such as CSV files and APIs, as well as tools for computing running statistics and memory-efficient data sketches.
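A running statistic keeps a constant-size summary instead of the raw values. The sketch below uses Welford's algorithm for a running mean and variance; `RunningVariance` is a hypothetical class written for this example, not one of River's own statistics objects.

```python
class RunningVariance:
    """Running mean and sample variance via Welford's algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; defined as 0.0 here when fewer than two values were seen.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stat = RunningVariance()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stat.update(value)
```

Three numbers are stored regardless of stream length, and the update avoids the cancellation problems of the naive "sum of squares minus square of sums" formula.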

## Tags

### Artificial Intelligence & ML

- [Incremental Model Training](https://awesome-repositories.com/f/artificial-intelligence-ml/incremental-model-training.md) — Implements incremental learning by updating model parameters one observation at a time to eliminate the need for storing full training datasets. ([source](https://cdn.jsdelivr.net/gh/online-ml/river@main/README.md))
- [Online Learning](https://awesome-repositories.com/f/artificial-intelligence-ml/online-learning.md) — Provides a framework for training and updating machine learning models on streaming data one observation at a time.
- [Incremental Model Updating](https://awesome-repositories.com/f/artificial-intelligence-ml/incremental-updates/incremental-model-updating.md) — Updates model weights one observation at a time to enable learning without storing full datasets in memory.
- [Incremental Training](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/training-frameworks/model-training-pipelines/incremental-training.md) — Fits models to data streams by processing observations one by one to eliminate the need for full dataset storage. ([source](https://riverml.xyz/latest/examples/batch-to-online/))
- [Model Performance Evaluators](https://awesome-repositories.com/f/artificial-intelligence-ml/model-performance-evaluators.md) — Performs progressive validation to test model performance on a stream by comparing predictions against ground truth labels. ([source](https://riverml.xyz/latest/api/overview/))
- [Model Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/model-pipelines.md) — Merges multiple preprocessing and modeling steps into a single sequence to process data streams. ([source](https://riverml.xyz/latest/api/overview/))
- [Progressive Stream Validators](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training/validation-evaluators/progressive-stream-validators.md) — Simulates real-time deployment by testing each observation before using it for model training.
- [Stochastic Gradient Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/rmsprop-optimizers/optimizer-selections/stochastic-gradient-optimizers.md) — Updates model weights using stochastic optimization algorithms such as Adam, SGD, and RMSProp. ([source](https://riverml.xyz/latest/api/overview/))
- [Streaming Feature Engineering](https://awesome-repositories.com/f/artificial-intelligence-ml/streaming-feature-engineering.md) — Converts raw streaming data into meaningful representations using categorical encoding and interaction analysis. ([source](https://riverml.xyz/latest/api/overview/))
- [Time Series Forecasting](https://awesome-repositories.com/f/artificial-intelligence-ml/time-series-forecasting.md) — Predicts future values in sequential data streams by learning from historical observations in real time. ([source](https://cdn.jsdelivr.net/gh/online-ml/river@main/README.md))
- [Anomaly Detection](https://awesome-repositories.com/f/artificial-intelligence-ml/anomaly-detection.md) — Provides algorithms and tools for identifying outliers or unusual patterns in data streams. ([source](https://riverml.xyz/latest/api/overview/))
- [Filter-Based Feature Selection](https://awesome-repositories.com/f/artificial-intelligence-ml/automated-feature-selection-tools/filter-based-feature-selection.md) — Implements filter-based feature selection using statistical tests and variance thresholds to improve model efficiency on streaming data. ([source](https://riverml.xyz/latest/api/overview/))
- [Streaming Bandit Selectors](https://awesome-repositories.com/f/artificial-intelligence-ml/custom-model-training/reward-modeling/multi-armed-bandit-modeling/streaming-bandit-selectors.md) — Implements bandit policies to balance exploration and exploitation for selecting the best model or action in streaming scenarios. ([source](https://riverml.xyz/latest/api/overview/))
- [Imbalanced Stream Resampling](https://awesome-repositories.com/f/artificial-intelligence-ml/dataset-class-balancing/imbalanced-stream-resampling.md) — Adjusts class distributions of incoming streaming data through resampling to prevent bias in imbalanced datasets. ([source](https://riverml.xyz/latest/api/overview/))
- [Incremental Decision Tree Learners](https://awesome-repositories.com/f/artificial-intelligence-ml/decision-trees/incremental-decision-tree-learners.md) — Grows decision trees incrementally using the Hoeffding bound to ensure statistical significance of splits.
- [Ensemble Learning](https://awesome-repositories.com/f/artificial-intelligence-ml/ensemble-learning.md) — Implements ensemble methods like bagging and boosting to improve predictive performance and robustness on streaming data.
- [Model Ensembling](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization/compression-techniques/model-ensembling.md) — Combines multiple incremental models through bagging, boosting, or stacking to improve overall predictive performance. ([source](https://riverml.xyz/latest/api/overview/))
- [Stochastic Gradient Descent](https://awesome-repositories.com/f/artificial-intelligence-ml/stochastic-gradient-descent.md) — Optimizes model parameters using iterative updates based on the gradient of individual samples.
- [Mini-Batch Processing](https://awesome-repositories.com/f/artificial-intelligence-ml/training-dataset-processing/mini-batch-processing.md) — Handles small groups of observations using data frames to balance batch efficiency with online learning requirements. ([source](https://riverml.xyz/latest/introduction/installation/))
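
The Hoeffding bound mentioned in the decision tree entry above states that, after `n` observations of a quantity with range `R`, the observed mean lies within `sqrt(R^2 * ln(1/delta) / (2n))` of the true mean with probability `1 - delta`. A minimal sketch of the resulting split test follows; the function names and the example numbers are assumptions for illustration, and real implementations add refinements such as tie-breaking.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Margin epsilon such that the true mean is within epsilon with prob. 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_best_gain, value_range, delta, n):
    """Split once the gap between the two best candidates exceeds the bound."""
    return best_gain - second_best_gain > hoeffding_bound(value_range, delta, n)


# Hypothetical gains for the two best split candidates at a leaf.
early = should_split(0.30, 0.05, value_range=1.0, delta=1e-7, n=100)
later = should_split(0.30, 0.05, value_range=1.0, delta=1e-7, n=200)
```

The bound shrinks as more observations reach a leaf, so the same gap in gain that is inconclusive early on can justify a split later.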

### Part of an Awesome List

- [Drift Detection](https://awesome-repositories.com/f/awesome-lists/ai/drift-detection.md) — Identifies changes in the underlying distribution of a data stream to trigger model retraining or alerts. ([source](https://cdn.jsdelivr.net/gh/online-ml/river@main/README.md))
- [Input Distribution Drift](https://awesome-repositories.com/f/awesome-lists/ai/drift-detection/input-distribution-drift.md) — Includes a dedicated system for monitoring input distribution drift to trigger model adaptation or alerts.
- [General Machine Learning](https://awesome-repositories.com/f/awesome-lists/ai/general-machine-learning.md) — Framework for general-purpose online machine learning.
- [Incremental Learning](https://awesome-repositories.com/f/awesome-lists/ai/incremental-learning.md) — Online machine learning.
- [Online Machine Learning](https://awesome-repositories.com/f/awesome-lists/ai/online-machine-learning.md) — Library for online machine learning.

### Data & Databases

- [Streaming Preprocessing Pipelines](https://awesome-repositories.com/f/data-databases/data-preprocessing-pipelines/streaming-preprocessing-pipelines.md) — Chains preprocessing and estimation steps into sequential workflows for transforming raw streaming features.
- [Anomaly Detection](https://awesome-repositories.com/f/data-databases/anomaly-detection.md) — Identifies unusual observations in live data streams by scoring samples based on evolving distributions.
- [Feature Scaling and Transformations](https://awesome-repositories.com/f/data-databases/large-scale-data-computation/large-scale-feature-transformations/feature-scaling-and-transformations.md) — Scales numeric values and encodes categories in real time to ensure data compatibility with algorithms. ([source](https://riverml.xyz/latest/api/overview/))
- [Streaming Data Sketches](https://awesome-repositories.com/f/data-databases/probabilistic-data-structures/probabilistic-sketch-aggregators/streaming-data-sketches.md) — Implements memory-efficient probabilistic structures to track statistics of high-volume data streams.
- [Personalized Ranking Optimizers](https://awesome-repositories.com/f/data-databases/ranking-engines/personalized-ranking-optimizers.md) — Provides algorithms for maximizing the posterior probability of user preferences to sort items for recommendation tasks. ([source](https://riverml.xyz/latest/api/overview/))

### Education & Learning Resources

- [Distribution Monitoring](https://awesome-repositories.com/f/education-learning-resources/sliding-window-algorithms/distribution-monitoring.md) — Detects concept drift by comparing data distributions between recent and historical sliding windows.
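
The window comparison described above can be sketched with two fixed-size queues. This is a deliberately simplified detector that compares window means against a hand-picked threshold; it is not one of River's detectors, and `WindowDriftDetector`, `window_size`, and `threshold` are hypothetical names chosen for the example.

```python
from collections import deque


class WindowDriftDetector:
    """Flags drift when the recent window's mean departs from the reference window's."""

    def __init__(self, window_size=50, threshold=0.5):
        self.threshold = threshold                  # hand-picked tolerance
        self.reference = deque(maxlen=window_size)  # older observations
        self.recent = deque(maxlen=window_size)     # newest observations

    def update(self, x):
        """Consume one value; return True if drift is flagged at this step."""
        if len(self.recent) == self.recent.maxlen:
            # The oldest recent value ages into the reference window.
            self.reference.append(self.recent[0])
        self.recent.append(x)

        if len(self.reference) < self.reference.maxlen:
            return False  # not enough history yet

        gap = abs(sum(self.recent) / len(self.recent)
                  - sum(self.reference) / len(self.reference))
        if gap > self.threshold:
            # Forget the old regime so the detector re-learns the new one.
            self.reference.clear()
            self.recent.clear()
            return True
        return False


values = [0.0] * 200 + [1.0] * 200  # synthetic stream with a shift at index 200
detector = WindowDriftDetector()
detected_at = [i for i, v in enumerate(values) if detector.update(v)]
```

A fixed threshold on the mean is the crudest possible test; production detectors typically replace it with a statistical bound and adapt the window size.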

### Scientific & Mathematical Computing

- [Online Running Statistics](https://awesome-repositories.com/f/scientific-mathematical-computing/numerical-mathematical-foundations/statistics-probability/statistical-analysis-libraries/statistical-metric-calculators/online-running-statistics.md) — Offers utilities for calculating running statistics and memory-efficient data sketches under fixed memory constraints.
- [Streaming Data Sketches](https://awesome-repositories.com/f/scientific-mathematical-computing/streaming-data-sketches.md) — Maintains memory-efficient sketches of data streams to track unique elements, heavy hitters, and histograms. ([source](https://riverml.xyz/latest/api/overview/))
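
As one example of such a structure, a Count-Min sketch estimates item frequencies in fixed memory, which is the basis for finding heavy hitters. The class below is an illustrative plain-Python version with assumed parameters (`width`, `depth`) and is not River's implementation.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency table; estimates can overcount but never undercount."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One independent-looking hash per row, derived by salting BLAKE2b.
        data = str(item).encode("utf-8")
        for row in range(self.depth):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=row.to_bytes(8, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def update(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the tightest guess.
        return min(self.table[row][col] for row, col in self._cells(item))


sketch = CountMinSketch()
for word in ["a", "b", "a", "c", "a", "b"]:
    sketch.update(word)
```

Memory is `width * depth` counters no matter how many distinct items appear; widening the table lowers the chance of overcounting.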

### Testing & Quality Assurance

- [Progressive Model Evaluation](https://awesome-repositories.com/f/testing-quality-assurance/model-accuracy-evaluators/progressive-model-evaluation.md) — Tests model accuracy on a stream using progressive validation to simulate real-world deployment.

### Graphics & Multimedia

- [Streaming Data Clustering](https://awesome-repositories.com/f/graphics-multimedia/point-cloud-clustering/map-point-clustering/streaming-data-clustering.md) — Groups incoming observations into clusters incrementally without storing the full dataset. ([source](https://riverml.xyz/latest/api/overview/))
