30 open-source projects similar to pystruct/pystruct, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Pystruct alternative.
A scikit-learn-based module for multi-label et al. classification
cuml is a GPU-accelerated machine learning library and framework that uses CUDA to accelerate tabular data preprocessing and model execution. It provides a suite of tools for training and deploying classification, regression, and clustering models on NVIDIA GPUs and GPU clusters. The library is designed for scalability, offering a distributed GPU machine learning environment that can spread computation and data across multiple hardware accelerators and nodes to handle datasets exceeding single-device memory. It mirrors standard estimator interfaces to allow the replacement of CPU-based models
mlxtend is a pure Python machine learning extension library that provides additional tools for association rule mining, ensemble learning, and feature selection. It is built on numpy and pandas, with all data operations accepting and returning pandas DataFrames, and custom estimators inherit from scikit-learn’s base classes to offer a uniform fit-predict interface compatible with grid search. The library implements the Apriori algorithm for mining frequent itemsets from transaction data and generating association rules with confidence and lift metrics. For classification, it combines multiple
CausalML is a machine learning library for causal inference, providing tools to estimate treatment effects and causal impacts using experimental and observational data. It functions as a framework for uplift modeling and the estimation of heterogeneous treatment effects to distinguish causation from correlation. The library focuses on identifying how different user segments respond to specific interventions. This includes calculating the incremental gain of target metrics to optimize marketing campaigns, targeting high-response customer segments, and personalizing user engagement through the
Generalized Additive Models in Python
Highly interpretable classifiers for scikit-learn, producing easily understood decision rules instead of black-box models
2-2000x faster ML algorithms, 50% less memory usage; works on all hardware, new and old.
dlib is a C++ machine learning toolkit and data analysis framework. It provides a collection of algorithms and utilities for building predictive modeling applications and performing statistical analysis on large datasets within native C++ environments. The project functions as a binding library that wraps low-level C++ machine learning algorithms into high-level Python scripting interfaces. This allows for the integration of high-performance native implementations with Python for machine learning development. The framework covers the implementation of predictive models, the execution of mach
mlpack is a header-only C++ machine learning library that defines matrix types as compile-time templates, enabling flexible numeric precision and memory layout without runtime overhead. Its core identity is built around a template metaprogramming architecture that allows algorithms to be included selectively as independent modules, reducing binary size, and supports compile-time serialization of neural network parameters by deducing matrix types and structure at compile time. The library distinguishes itself through a multi-language binding framework that automatically generates bindings for
PySpark + Scikit-learn = Sparkit-learn
A high-performance, easy-to-use, and scalable machine learning (ML) package that includes linear models (LR), factorization machines (FM), and field-aware factorization machines (FFM), with Python and command-line interfaces.
Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).
A modular active learning framework for Python
PyTorch is a machine learning framework centered on a GPU-ready tensor library that supports multi-dimensional array operations across both CPU and accelerator hardware. It provides a foundational infrastructure for mathematical computation and dynamic neural network construction, utilizing a tape-based automatic differentiation system that allows for flexible, non-static graph execution. The framework is designed for deep integration with Python, enabling natural usage alongside standard scientific computing ecosystems. It distinguishes itself through a comprehensive distributed training sui
LightGBM is a high-performance machine learning framework designed for constructing gradient-boosted decision tree ensembles. It provides a platform for training classification, regression, and ranking models, with a focus on memory efficiency and large-scale distributed computing. The framework distinguishes itself through specialized algorithmic strategies, including leaf-wise tree growth and histogram-based decision learning, which prioritize convergence speed. It optimizes memory usage by bundling mutually exclusive features and employs gradient-based sampling to reduce training complexit
Python package for Bayesian Machine Learning with scikit-learn API
Caffe is a high-performance deep learning framework designed for training and deploying deep neural networks. It functions as a machine learning engine and a convolutional neural network library, providing a C++ backend to accelerate computations on both GPUs and CPUs. The system includes a specialized toolset for computer vision, enabling tasks such as object detection, semantic segmentation, and large-scale image retrieval. It supports the deployment of pre-trained models for image and scene recognition, as well as the ability to fine-tune neural network weights for specialized tasks. The
MindsDB is an AI-native database engine that treats machine learning models and autonomous agents as virtual tables. By mapping external data sources, predictive models, and third-party services directly into the database schema, it enables users to perform inference, data retrieval, and complex orchestration using standard SQL syntax. The platform distinguishes itself through an autonomous agent orchestrator that executes iterative reasoning loops, allowing agents to plan data access and synthesize natural language responses from connected knowledge bases. It functions as a federated data ga
Keras is a high-level deep learning framework designed for constructing and training neural networks through the composition of modular, functional layers. It serves as a comprehensive modeling toolkit that provides standardized procedures for defining, evaluating, and deploying complex architectures. By utilizing a directed acyclic graph approach, the framework allows users to build intricate models with multiple inputs, outputs, and shared layers, ensuring consistent numerical execution through functional state management. The project distinguishes itself as a multi-backend machine learning
Open-source feature selection repository in Python
MMLSpark is a distributed framework for executing machine learning models, data transformations, and AI service integrations across Apache Spark clusters. It functions as a distributed machine learning library and pipeline orchestrator, allowing users to integrate pre-trained cognitive services and custom models into large-scale batch and streaming workflows. The project is distinguished by its ability to incorporate external AI services and web APIs directly into big data pipelines for text and vision analysis. It provides a scalable model training framework that coordinates gradient boostin
A C library for product recommendations/suggestions using collaborative filtering (CF)
CatBoost is a gradient boosting machine learning library used to train decision tree ensembles for regression, classification, and ranking tasks. It functions as a high-performance framework that provides a categorical data processor for transforming non-numeric features, a distributed trainer for large-scale datasets, and GPU acceleration to speed up model construction. The library distinguishes itself through native handling of categorical data and text features, removing the need for manual encoding. It includes a specialized model interpretability tool that leverages SHAP values and featu
scikit-opt is a Python optimization library and numerical framework designed to solve complex global optimization problems. It provides a suite of metaheuristic algorithms and tools for finding global minima or maxima of objective functions. The library implements a variety of nature-inspired and swarm intelligence algorithms, including Genetic Algorithms, Particle Swarm Optimization, Differential Evolution, Simulated Annealing, and Ant Colony Optimization. It includes specialized solvers for discrete combinatorial challenges, such as the Traveling Salesman Problem. The framework supports th
Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking and Jupyter notebook analysis.
A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.
XGBoost is a distributed machine learning library for implementing scalable gradient boosting decision trees used for regression, classification, and ranking. It functions as a predictive model framework and a cross-language toolkit, providing a core implementation with native bindings for Python, R, Java, Scala, and C++. The system is designed as a GPU-accelerated library that utilizes CUDA and NCCL to speed up the training of decision tree ensembles. It operates as a distributed framework capable of scaling training and prediction across multi-node clusters and GPU environments to process m
Deepchecks is a machine learning model validation framework and MLOps testing library. It serves as an AI data quality suite and performance evaluator designed to verify the integrity and performance of models and datasets from research through production. The project functions as a model monitoring tool for tracking data drift and performance degradation in production environments. It allows for the creation of custom validation suites and utilizes a pluggable check architecture to automate quality checks within continuous integration pipelines. The framework covers a broad range of capabil