30 open-source projects similar to xtra-computing/thundergbm, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best ThunderGBM alternative.
XGBoost is a distributed machine learning library for implementing scalable gradient boosting decision trees used for regression, classification, and ranking. It functions as a predictive model framework and a cross-language toolkit, providing a core implementation with native bindings for Python, R, Java, Scala, and C++. The system is designed as a GPU-accelerated library that utilizes CUDA and NCCL to speed up the training of decision tree ensembles. It operates as a distributed framework capable of scaling training and prediction across multi-node clusters and GPU environments to process m
LightGBM is a high-performance machine learning framework designed for constructing gradient-boosted decision tree ensembles. It provides a platform for training classification, regression, and ranking models, with a focus on memory efficiency and large-scale distributed computing. The framework distinguishes itself through specialized algorithmic strategies, including leaf-wise tree growth and histogram-based decision learning, which prioritize convergence speed. It optimizes memory usage by bundling mutually exclusive features and employs gradient-based sampling to reduce training complexit
CatBoost is a gradient boosting machine learning library used to train decision tree ensembles for regression, classification, and ranking tasks. It functions as a high-performance framework that provides a categorical data processor for transforming non-numeric features, a distributed trainer for large-scale datasets, and GPU acceleration to speed up model construction. The library distinguishes itself through native handling of categorical data and text features, removing the need for manual encoding. It includes a specialized model interpretability tool that leverages SHAP values and featu
A C library for product recommendations/suggestions using collaborative filtering (CF)
Deepchecks is a machine learning model validation framework and MLOps testing library. It serves as an AI data quality suite and performance evaluator designed to verify the integrity and performance of models and datasets from research through production. The project functions as a model monitoring tool for tracking data drift and performance degradation in production environments. It allows for the creation of custom validation suites and utilizes a pluggable check architecture to automate quality checks within continuous integration pipelines. The framework covers a broad range of capabil
Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).
dlib is a C++ machine learning toolkit and data analysis framework. It provides a collection of algorithms and utilities for building predictive modeling applications and performing statistical analysis on large datasets within native C++ environments. The project functions as a binding library that wraps low-level C++ machine learning algorithms into high-level Python scripting interfaces. This allows for the integration of high-performance native implementations with Python for machine learning development. The framework covers the implementation of predictive models, the execution of mach
Generalized Additive Models in Python (contributors welcome)
Fastai is a high-level deep learning library built on PyTorch that provides a unified interface for managing the entire machine learning lifecycle. It functions as a comprehensive training toolkit, abstracting hardware management and automating complex training loops to simplify the construction and execution of neural network models. The framework is distinguished by its notebook-centric development environment and a type-dispatching data pipeline that automatically applies transformations based on input data formats. It emphasizes transfer learning through discriminative layer-wise optimiza
scikit-opt is a Python optimization library and numerical framework designed to solve complex global optimization problems. It provides a suite of metaheuristic algorithms and tools for finding global minima or maxima of objective functions. The library implements a variety of nature-inspired and swarm intelligence algorithms, including Genetic Algorithms, Particle Swarm Optimization, Differential Evolution, Simulated Annealing, and Ant Colony Optimization. It includes specialized solvers for discrete combinatorial challenges, such as the Traveling Salesman Problem. The framework supports th
Python package for Bayesian Machine Learning with scikit-learn API
Einops is a tensor manipulation library that provides a framework-agnostic interface for reshaping, Einstein summation, and multi-dimensional array operations. It serves as an abstraction layer that works across NumPy, PyTorch, TensorFlow, and JAX, allowing for tensor transformations without changing the API. The library distinguishes itself through a declarative notation system that uses readable string patterns to describe tensor rearrangements and reductions. This approach includes an extended Einstein summation interface that supports multi-letter axis names and a named dimension mapping
Julia implementation of Decision Tree (CART) and Random Forest algorithms
Caffe is a high-performance deep learning framework designed for training and deploying deep neural networks. It functions as a machine learning engine and a convolutional neural network library, providing a C++ backend to accelerate computations on both GPUs and CPUs. The system includes a specialized toolset for computer vision, enabling tasks such as object detection, semantic segmentation, and large-scale image retrieval. It supports the deployment of pre-trained models for image and scene recognition, as well as the ability to fine-tune neural network weights for specialized tasks. The
Chainer is an open-source deep learning framework built around define-by-run automatic differentiation, where computation graphs are constructed dynamically during forward execution. This imperative approach allows networks to be built using standard Python control flow, with gradients computed automatically through reverse-mode differentiation on the dynamically recorded graph. The framework supports GPU acceleration through a NumPy-compatible array backend with CUDA and cuDNN support, and provides a pluggable device abstraction that lets users switch between CPU and GPU computation without c
2-2000x faster ML algos, 50% less memory usage, works on all hardware - new and old.
Apache MXNet is a deep learning framework and distributed machine learning library designed for training and deploying neural networks across distributed systems, mobile devices, and hardware accelerators. It functions as a cross-platform runtime and a dynamic dataflow scheduler that optimizes neural network execution. The framework provides a multi-language API, enabling the development of machine learning models using Python, R, Julia, Scala, Go, and JavaScript. It supports high-performance model training and the scaling of workloads across multiple GPUs and machines. The system covers cap
Apache Mahout - an environment for quickly creating scalable, performant machine learning applications.
Mmlspark is a distributed framework for executing machine learning models, data transformations, and AI service integrations across Apache Spark clusters. It functions as a distributed machine learning library and pipeline orchestrator, allowing users to integrate pre-trained cognitive services and custom models into large-scale batch and streaming workflows. The project is distinguished by its ability to incorporate external AI services and web APIs directly into big data pipelines for text and vision analysis. It provides a scalable model training framework that coordinates gradient boostin
Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking and Jupyter notebook analysis.
A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.
High-performance, easy-to-use, and scalable machine learning (ML) package, including linear models (LR), factorization machines (FM), and field-aware factorization machines (FFM), with Python and command-line interfaces.
This project is an automated machine learning framework and toolkit designed for training and tuning custom models for classification, regression, and recommendations. It functions as a multimodal machine learning toolkit capable of processing and training models using a combination of text, image, audio, and sensor data. The framework distinguishes itself as a multimodal data processor that can handle and visualize large datasets on a single machine using column-oriented disk storage. It includes a core machine learning model generator that converts trained models into formats compatible wit
Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.