30 open-source projects similar to online-ml/river, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best River alternative.
This project is an educational resource providing practical code examples and implementations of machine learning algorithms using the Python language. It serves as a guide for constructing predictive pipelines, clustering models, and dimensionality reduction within the Scikit-Learn ecosystem. The repository includes comprehensive demonstrations for supervised and unsupervised learning, as well as detailed examples for implementing neural networks and deep architectures. It also provides practical guidance on exporting model parameters to JSON and wrapping trained models in web APIs for produ
sktime is a machine learning framework designed for time series analysis. It provides a unified interface for performing time series forecasting, classification, and anomaly detection, integrating these capabilities into a standardized toolkit compatible with the scikit-learn API. The framework allows for the construction of complex analysis workflows through model pipelining and ensemble-based aggregation. It uses adapter-based integration to wrap external time series libraries, providing a single entry point for diverse algorithmic implementations. Its capabilities cover temporal data tran
This project serves as an educational and practical resource for mastering machine learning workflows using Python. It provides a comprehensive collection of code examples and exercises designed to guide users through the implementation of predictive systems, ranging from fundamental algorithms to deep learning architectures. The repository distinguishes itself by offering a structured approach to both classical machine learning and neural network training. It covers the full lifecycle of model development, including the orchestration of reusable data transformation pipelines, advanced ensemb
PyCaret is a Python AutoML platform and MLOps lifecycle manager designed to automate machine learning workflows. It functions as a low-code environment that leverages a scikit-learn native engine to execute preprocessing, training, and evaluation for tabular data. The platform distinguishes itself as an LLM-powered ML copilot, using large language model agents to analyze datasets, design experiment configurations, and explain model results. It also serves as a Kubernetes ML orchestrator and model registry, enabling the versioning of trained pipelines and their promotion to production API endp
Smile is a comprehensive JVM machine learning library and statistical computing toolkit. It provides a suite of algorithms for classification, regression, and clustering, implemented natively for Java, Scala, and Kotlin. The project also functions as a deep learning framework, a natural language processing library, and an inference engine for large language models. The library distinguishes itself through GPU acceleration via LibTorch bindings and support for the ONNX model interchange format. It includes specialized capabilities for large language model inference, featuring Byte-Pair Encodin
Vowpal Wabbit is an open-source machine learning system designed for online learning, where models update incrementally from streaming data without requiring full retraining. It provides a reduction-based learning framework that composes complex tasks from simpler algorithms, and includes a feature hashing trick that maps unbounded feature names into a fixed-size vector space to keep memory usage constant regardless of dataset size. The system supports distributed training across a cluster using an allreduce protocol for synchronized updates, and offers an active learning query strategy that s
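The two ideas in the Vowpal Wabbit entry above, incremental updates and the feature hashing trick, can be illustrated with a minimal sketch. It is not Vowpal Wabbit's API or implementation: `hash_features`, `OnlineLogisticRegression`, and `NUM_BUCKETS` are hypothetical names, CRC32 stands in for whatever hash function a real system uses, and plain SGD on a logistic loss is assumed for the update rule.

```python
import math
import zlib

NUM_BUCKETS = 2 ** 18  # fixed weight-vector size (an arbitrary choice for this sketch)


def hash_features(features, num_buckets=NUM_BUCKETS):
    """Map {feature_name: value} to {bucket_index: value} using a stable hash."""
    hashed = {}
    for name, value in features.items():
        index = zlib.crc32(name.encode("utf-8")) % num_buckets
        # Names that collide share a bucket, so their values are summed.
        hashed[index] = hashed.get(index, 0.0) + value
    return hashed


class OnlineLogisticRegression:
    """Logistic regression updated one example at a time with plain SGD."""

    def __init__(self, num_buckets=NUM_BUCKETS, learning_rate=0.1):
        self.num_buckets = num_buckets
        self.learning_rate = learning_rate
        # Memory is fixed up front, however many distinct feature names arrive.
        self.weights = [0.0] * num_buckets

    def _score(self, hashed):
        score = sum(self.weights[i] * v for i, v in hashed.items())
        return max(-30.0, min(30.0, score))  # clamp to keep exp() well-behaved

    def predict_proba(self, features):
        hashed = hash_features(features, self.num_buckets)
        return 1.0 / (1.0 + math.exp(-self._score(hashed)))

    def learn_one(self, features, label):
        """Single gradient step on one (features, label) pair; no retraining."""
        hashed = hash_features(features, self.num_buckets)
        prediction = 1.0 / (1.0 + math.exp(-self._score(hashed)))
        error = prediction - label
        for i, v in hashed.items():
            self.weights[i] -= self.learning_rate * error * v


# A toy stream: each example is seen once and then discarded.
stream = [
    ({"word=free": 1.0, "word=winner": 1.0}, 1),
    ({"word=meeting": 1.0, "word=agenda": 1.0}, 0),
] * 50

model = OnlineLogisticRegression()
for features, label in stream:
    model.learn_one(features, label)
```

The weight vector keeps the same length whether the stream contains four feature names or four million. The trade-off is that unrelated names can collide in the same bucket.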
This project is a scientific computing framework for the .NET ecosystem, providing a comprehensive suite of libraries for numerical analysis, statistics, and mathematical optimization. It serves as a foundational toolkit for developing applications in machine learning, digital signal processing, and computer vision. The framework provides specialized toolkits for training and deploying predictive models, including neural networks, support vector machines, and decision trees. It further distinguishes itself with deep integrations for real-time visual analysis, such as object tracking and facia
This project is a structured learning curriculum and technical reference for mastering deep learning with TensorFlow. It provides a comprehensive guide for building, training, and deploying neural networks, combining theoretical fundamentals with practical implementation examples. The repository distinguishes itself by covering the end-to-end machine learning workflow, from low-level tensor mathematics and linear algebra to the creation of complex model architectures. It includes specific guidance on developing data pipelines for diverse data types, such as images, text, and time-series seque
Deepchecks is a machine learning model validation framework and MLOps testing library. It serves as an AI data quality suite and performance evaluator designed to verify the integrity and performance of models and datasets from research through production. The project functions as a model monitoring tool for tracking data drift and performance degradation in production environments. It allows for the creation of custom validation suites and utilizes a pluggable check architecture to automate quality checks within continuous integration pipelines. The framework covers a broad range of capabil
AutoGluon is an automated machine learning framework designed to optimize model selection and hyperparameter tuning across tabular, text, image, and time series data. It functions as an ensemble learning library and a tabular data prediction engine, aiming to build high-accuracy predictive models without manual algorithm selection. The framework integrates multimodal machine learning pipelines that combine disparate data types into a single representation using specialized encoders. It also includes a probabilistic time series forecaster that fits multiple statistical and deep learning models
PyOD is a Python anomaly detection library used to identify outliers in tabular, time series, graph, text, and image data. It provides a collection of algorithms for detecting anomalous data points and includes a unified detector interface that standardizes input and output signatures across its available detection algorithms. The project features a multi-modal outlier detector for identifying anomalies across diverse formats including unstructured text and images, as well as a specialized toolkit for graph-based and time-series anomaly detection. It includes an ensemble framework for combini
Linfa is a classical machine learning framework and statistical learning suite implemented in Rust. It provides a collection of algorithms for supervised and unsupervised learning, focused on traditional statistical methods such as regression, clustering, and decision trees. The toolkit is distinguished by its ability to be compiled into WebAssembly, enabling analytical models to execute within browser environments. It employs a trait-based algorithm interface to standardize the process of training and prediction across its various models. The library covers a broad range of capabilities, in
LightFM is a Python recommendation library and machine learning framework designed to predict user preferences. It implements a hybrid recommendation engine that combines collaborative filtering with content filtering by integrating user-item interaction data with descriptive metadata. The system utilizes hybrid matrix factorization to learn latent representations of users and items. It is specifically designed to handle implicit feedback, utilizing specialized loss functions such as Weighted Approximate Rank Pairwise and Bayesian Personalized Ranking to optimize item preferences for datasets
Cleanlab is a data-centric AI library and toolkit designed to improve machine learning model performance by detecting label errors and increasing overall dataset quality. It implements a confident learning framework that iteratively refines label noise estimates by comparing model predictions with estimated label probabilities to identify mislabeled examples. The project provides specialized utilities for active learning optimization, allowing for the selection of the most impactful examples for labeling or re-labeling. It also includes an outlier detection tool to identify atypical data poin
This project is an educational platform and research toolkit designed to teach deep learning through a combination of mathematical theory, visual diagrams, and executable code. It provides a comprehensive environment for building, training, and evaluating neural networks, grounding complex concepts in interactive computational notebooks that allow for hands-on experimentation. The framework distinguishes itself by interleaving theoretical foundations—including linear algebra, calculus, and probability—with practical implementations across multiple industry-standard libraries. It supports flex
Neuralforecast is a neural time series forecasting library designed to predict future values for one or multiple series using deep learning architectures. It functions as a distributed machine learning forecasting framework that enables the training of global models across multiple time series to improve generalization through cross-learning. The project distinguishes itself as a probabilistic forecasting toolkit that produces uncertainty intervals and probability distributions rather than single point estimates. It also includes a hierarchical forecast reconciler to ensure that predictions a
Kats is a time series analysis framework and library providing tools for statistical characterization, anomaly detection, and trend forecasting. It functions as a toolkit for predicting future values based on historical data and identifying irregular patterns or structural change points within temporal sequences. The project includes a temporal feature extraction tool to calculate descriptive statistics and characteristics that summarize time series behavior. It also provides a system for model hyperparameter tuning using self-supervised learning to improve the scale and generalization of pre
AutoGluon is an automated machine learning framework and multimodal library designed to automate the end-to-end pipeline from data preprocessing to high-accuracy model training and validation. It functions as an automated model trainer for tabular, image, text, and time series data, as well as a tool for time series forecasting and foundation model finetuning. The project is distinguished by its ability to jointly process and fuse different data types, allowing for the construction of multimodal neural networks that integrate images, text, and structured tables. It supports zero-shot inferenc
DeepPavlov is a deep learning conversational AI framework designed for building end-to-end dialog systems and chatbots. It functions as an NLP model training library and a pipeline system that connects multiple natural language processing models into a single operational chain. The framework provides a REST API model server to expose trained deep learning models as web endpoints. This allows conversational agents to be deployed as web services that handle incoming HTTP requests and return predictions. The system covers the full lifecycle of conversational AI development, including NLP pipeli
GluonTS is a probabilistic time series library and deep learning forecasting framework. It provides a toolkit for building, training, and evaluating neural network architectures that predict future values as probability distributions to quantify uncertainty. The project distinguishes itself by supporting zero-shot forecasting and integrating diverse modeling approaches, including deep probabilistic neural networks and wrappers for external statistical libraries such as Prophet and R forecast. It implements specialized architectural primitives like causal convolutions and invertible residual n
sktime is a machine learning framework for time series analysis. It provides a unified toolkit for implementing time series classification, forecasting, and anomaly detection using standardized machine learning interfaces. The library serves as a collection of tools for assigning categorical labels to temporal sequences, predicting future values based on historical patterns, and identifying outliers or unusual patterns within temporal data. The framework includes capabilities for panel-data handling and pipeline-based transformations. It utilizes a unified API wrapper and plugin-based model
This project is a comprehensive machine learning educational resource and tutorial series delivered as a collection of interactive Jupyter Notebooks. It provides practical Python implementations for the end-to-end machine learning lifecycle, covering supervised and unsupervised learning, deep learning, and reinforcement learning. The resource distinguishes itself by providing detailed implementation guides for complex architectures, including transformers, generative adversarial networks, and convolutional neural networks. It also features specialized courseware for developing reinforcement l
BERTopic is a topic modeling library used to extract interpretable themes from collections of text documents and images. It functions as a document clustering framework that transforms unstructured data into numerical vectors to group semantically similar content. The project distinguishes itself through a multimodal embedding tool that allows for joint clustering of text and images in a shared vector space. It also features a class-based TF-IDF representation engine to identify representative words for clusters and an integrated system for using large language models to generate natural lang
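The class-based TF-IDF idea mentioned in the BERTopic entry above can be sketched as follows. This is a simplified illustration, not BERTopic's code: it assumes one common formulation, in which a term's weight in a cluster is its count in that cluster times log(1 + A / f), where A is the average number of words per cluster and f is the term's count across all clusters. The function name `class_tfidf` and the toy clusters are hypothetical.

```python
import math
from collections import Counter


def class_tfidf(docs_by_cluster):
    """Score terms per cluster, treating each cluster as one merged document."""
    # Term counts inside each cluster (all of its documents concatenated).
    term_counts = {
        cluster: Counter(word for doc in docs for word in doc.lower().split())
        for cluster, docs in docs_by_cluster.items()
    }

    # Term counts across every cluster.
    total_per_term = Counter()
    for counts in term_counts.values():
        total_per_term.update(counts)

    # Average number of words per cluster.
    avg_words = sum(total_per_term.values()) / len(term_counts)

    return {
        cluster: {
            term: tf * math.log(1 + avg_words / total_per_term[term])
            for term, tf in counts.items()
        }
        for cluster, counts in term_counts.items()
    }


clusters = {
    "pets": ["the cat sat", "the dog and the cat"],
    "finance": ["the stock fell", "the bank and the stock"],
}
scores = class_tfidf(clusters)
top_terms = {c: max(s, key=s.get) for c, s in scores.items()}
```

In this toy data "the" is the most frequent word in both clusters, but because it is spread evenly across them it scores below the cluster-specific words "cat" and "stock".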
This repository is a deep learning educational resource and a neural network project suite. It provides a collection of practical TensorFlow implementations and coding projects designed to demonstrate the application of various neural network architectures to real-world data. The project includes specific samples for generative adversarial networks, focusing on synthetic image generation and style translation. It also provides examples of deep learning model construction across different learning paradigms. The codebase covers a broad range of capabilities, including computer vision for imag
This project is a suite of machine learning and statistical tools designed for stock price prediction, financial time series forecasting, and the execution of algorithmic trading strategies. It provides a collection of deep learning and statistical models used to forecast asset prices and market trends. The system includes a market scenario simulator that uses Monte Carlo sampling to generate potential price paths and estimate financial risk. It further features a portfolio optimization tool for calculating asset distributions to maximize returns based on historical volatility, as well as a m
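As a rough illustration of the Monte Carlo scenario simulation described in the entry above, the sketch below generates random price paths and reads a risk estimate off their distribution. It is not taken from that project: geometric Brownian motion, the 252-trading-day year, the parameter values, and the function names `simulate_price_paths` and `value_at_risk` are all assumptions made for this example.

```python
import math
import random


def simulate_price_paths(start_price, drift, volatility, days, num_paths, seed=0):
    """Sample daily price paths under geometric Brownian motion (an assumption)."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    dt = 1.0 / 252  # one trading day as a fraction of a year
    paths = []
    for _ in range(num_paths):
        price = start_price
        path = [price]
        for _ in range(days):
            shock = rng.gauss(0.0, 1.0)
            price *= math.exp(
                (drift - 0.5 * volatility ** 2) * dt
                + volatility * math.sqrt(dt) * shock
            )
            path.append(price)
        paths.append(path)
    return paths


def value_at_risk(paths, confidence=0.95):
    """Loss at the given quantile of simulated end-of-horizon losses."""
    start = paths[0][0]
    losses = sorted(start - path[-1] for path in paths)
    index = min(len(losses) - 1, int(confidence * len(losses)))
    return losses[index]


# 1,000 hypothetical 30-day paths for an asset starting at 100.
paths = simulate_price_paths(
    start_price=100.0, drift=0.05, volatility=0.2, days=30, num_paths=1000, seed=42
)
var_95 = value_at_risk(paths, confidence=0.95)
```

Here `var_95` is the simulated loss per unit held at the 95% quantile over the 30-day horizon, so roughly 5% of simulated paths lose more. A different price model or more paths would change the estimate.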
This project is a comprehensive collection of practical code examples and implementation libraries for machine learning. It provides a wide array of reference materials for building supervised, unsupervised, and reinforcement learning algorithms. The repository serves as a multi-domain resource, featuring specific implementation suites for financial AI, Bayesian statistical modeling, and deep learning architectures. It includes a framework for training intelligent agents using policy gradients and actor-critic models, as well as practical guides for fine-tuning transformers and utilizing larg
This project is a distributed machine learning platform and sparse deep learning framework designed for training and serving models with high-dimensional sparse data. It functions as an online model serving infrastructure and recommendation system engine, enabling real-time item retrieval and scoring using deep tree matching and neural networks. The system distinguishes itself through a multi-task learning framework that optimizes multiple objective functions within a shared representation space. It features a specialized online serving infrastructure that supports dynamic model hot-loading a
This repository is a comprehensive collection of instructional guides and practical examples for Python development, focusing on machine learning, data science, and web scraping. It provides implementations for neural networks, reinforcement learning algorithms, and deep learning architectures using PyTorch, alongside detailed manuals for scientific computing and data visualization. The project distinguishes itself by offering specialized tutorials on concurrent programming to optimize CPU performance and guides for setting up Linux development environments. It covers the implementation of ad
NuPIC is a machine learning framework that implements Hierarchical Temporal Memory (HTM) theory, a neuroscience-inspired approach to artificial intelligence. It models principles of the neocortex to build systems capable of learning patterns from streaming data, performing sequence prediction, and detecting anomalies in real-time data streams. The framework is built around a Cortical Learning Algorithm that combines spatial pooling and temporal memory to process streaming input. It uses Sparse Distributed Representations to encode input patterns, a Spatial Pooler to convert dense input into s