# microsoft/LightGBM

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/microsoft-lightgbm).**

18,096 stars · 3,985 forks · C++ · MIT

## Links

- GitHub: https://github.com/microsoft/LightGBM
- Homepage: https://lightgbm.readthedocs.io/en/latest/
- awesome-repositories: https://awesome-repositories.com/repository/microsoft-lightgbm.md

## Topics

`data-mining` `decision-trees` `distributed` `gbdt` `gbm` `gbrt` `gradient-boosting` `kaggle` `lightgbm` `machine-learning` `microsoft` `parallel` `python` `r`

## Description

LightGBM is a high-performance machine learning framework designed for constructing gradient-boosted decision tree ensembles. It provides a platform for training classification, regression, and ranking models, with a focus on memory efficiency and large-scale distributed computing.

The framework distinguishes itself through two algorithmic strategies: leaf-wise tree growth, which always splits the leaf that reduces the overall loss the most, and histogram-based split finding, which discretizes continuous feature values into bins. It reduces memory usage by bundling mutually exclusive features and uses gradient-based sampling to lower training cost. For large datasets, it supports distributed training across multiple machines and can offload intensive computation to GPUs.
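
To make the histogram idea concrete, here is a minimal pure-Python sketch of split finding for one feature. It is not LightGBM's implementation: the equal-width binning, the second-order gain formula, and the names `best_histogram_split`, `num_bins`, and `reg_lambda` are assumptions chosen for illustration.

```python
def best_histogram_split(values, grads, hess, num_bins=4, reg_lambda=1.0):
    """Return (threshold, gain) for the best split of one feature, or None.

    Illustrative only: equal-width bins and a standard second-order gain
    are assumptions, not LightGBM's actual binning or scoring code.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0  # avoid zero width for constant features

    # Accumulate gradient and hessian sums per bin (the "histogram").
    g_hist = [0.0] * num_bins
    h_hist = [0.0] * num_bins
    for v, g, h in zip(values, grads, hess):
        b = min(int((v - lo) / width), num_bins - 1)
        g_hist[b] += g
        h_hist[b] += h

    g_total, h_total = sum(g_hist), sum(h_hist)

    def score(g, h):
        return g * g / (h + reg_lambda)

    # Scan bin boundaries instead of every distinct feature value.
    best_gain, best_bin = 0.0, None
    g_left = h_left = 0.0
    for b in range(num_bins - 1):
        g_left += g_hist[b]
        h_left += h_hist[b]
        gain = (score(g_left, h_left)
                + score(g_total - g_left, h_total - h_left)
                - score(g_total, h_total))
        if gain > best_gain:
            best_gain, best_bin = gain, b

    if best_bin is None:
        return None
    return lo + (best_bin + 1) * width, best_gain


result = best_histogram_split(
    values=[1, 2, 3, 4, 10, 11, 12, 13],
    grads=[-1, -1, -1, -1, 1, 1, 1, 1],
    hess=[1] * 8,
)
```

The scan visits `num_bins - 1` candidate thresholds regardless of how many rows there are, which is where the speed-up over scanning every sorted value comes from.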

The library includes native bindings for Java, so its training and inference capabilities can be embedded in existing Java applications. Users can further tailor training by supplying custom objective functions, while progress monitoring tracks accuracy improvements during optimization.
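
In gradient boosting, a custom objective is typically expressed as a function that returns the first and second derivatives of the loss with respect to the raw model score. The sketch below shows that shape for binary log loss using only the standard library. The function name `logistic_objective` is hypothetical, and the exact callable signature and array types that LightGBM expects should be checked against its documentation.

```python
import math


def logistic_objective(y_true, raw_score):
    """Per-sample gradient and hessian of binary log loss.

    Illustrative sketch: plain lists are used here, whereas a real
    training library would normally pass and expect arrays.
    """
    grad, hess = [], []
    for y, s in zip(y_true, raw_score):
        p = 1.0 / (1.0 + math.exp(-s))  # sigmoid of the raw score
        grad.append(p - y)              # d(loss)/d(score)
        hess.append(p * (1.0 - p))      # d2(loss)/d(score)2
    return grad, hess


grad, hess = logistic_objective([1, 0], [0.0, 0.0])
```

Swapping in a domain-specific loss only requires replacing the two derivative expressions.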

## Tags

### Artificial Intelligence & ML

- [Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/gradient-boosting/frameworks.md) — Provides a high-performance framework for training gradient-boosted decision tree ensembles optimized for speed and memory efficiency.
- [Gradient Boosting Libraries](https://awesome-repositories.com/f/artificial-intelligence-ml/ensemble-learning-libraries/gradient-boosting-libraries.md) — Constructs classification, regression, and ranking models using gradient-boosted decision tree ensembles.
- [Gradient Boosting Algorithms](https://awesome-repositories.com/f/artificial-intelligence-ml/gradient-boosting-algorithms.md) — Prioritizes splitting the leaf node that reduces the overall loss the most to achieve faster convergence compared to level-wise growth.
- [Training](https://awesome-repositories.com/f/artificial-intelligence-ml/gradient-boosting/training.md) — Constructs high-performance decision tree ensembles using optimized techniques for speed and memory efficiency. ([source](https://cdn.jsdelivr.net/gh/microsoft/LightGBM@master/README.md))
- [Distributed Training](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training-frameworks/distributed-training.md) — Executes model training across multiple machines to process datasets that exceed the capacity of a single node. ([source](https://cdn.jsdelivr.net/gh/microsoft/LightGBM@master/README.md))
- [Distributed Training Platforms](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/integrated-development-platforms/machine-learning-platforms/distributed-training-platforms.md) — Scales model training across multiple nodes and hardware accelerators to handle large-scale datasets.
- [Hardware-Accelerated](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/hardware-and-acceleration/tensor-computing-libraries/tensor-libraries/hardware-accelerated.md) — Offloads intensive mathematical training operations to graphics processing units to accelerate model convergence and reduce processing time for large datasets. ([source](https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html))
- [Distributed Learning](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/distributed-and-scaling-strategies/distributed-learning.md) — Scales model training across multiple computing nodes to process datasets that exceed single-machine memory capacity.
- [Hardware Acceleration](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/hardware-and-acceleration/hardware-acceleration.md) — Offloads intensive mathematical computations to specialized hardware to achieve faster model convergence.
- [Histogram-Based Learning](https://awesome-repositories.com/f/artificial-intelligence-ml/decision-trees/histogram-based-learning.md) — Accelerates the search for optimal split points by discretizing continuous feature values into bins.
- [Hardware Acceleration Kernels](https://awesome-repositories.com/f/artificial-intelligence-ml/hardware-acceleration-kernels.md) — Offloads intensive matrix and vector operations to graphics processing units to accelerate model training.
- [Gradient-Based Sampling](https://awesome-repositories.com/f/artificial-intelligence-ml/gradient-computation/gradient-based-sampling.md) — Reduces training complexity by prioritizing instances with large gradients during the tree construction process.
- [Custom Loss Functions](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/pipelines-and-orchestration/training-orchestration-systems/training-lifecycle-management/custom-loss-functions.md) — Enables domain-specific optimization by allowing the definition and integration of custom loss functions.
- [Custom Objective Configuration](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/pipelines-and-orchestration/training-orchestration-systems/training-lifecycle-management/custom-loss-functions/custom-objective-configuration.md) — Allows users to integrate specialized loss functions by passing custom objective logic directly into the model constructor. ([source](https://microsoft.github.io/FLAML/docs/Examples/AutoML-for-LightGBM/))
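
The gradient-based sampling entry in the list above can be illustrated with a small sketch: keep the rows with the largest absolute gradients, randomly sample a fraction of the rest, and up-weight the sampled rows so gradient sums stay roughly unbiased. This is a simplified illustration rather than LightGBM's code, and `goss_sample` and `seed` are hypothetical names.

```python
import random


def goss_sample(grads, top_rate=0.2, other_rate=0.1, seed=0):
    """Return {row_index: weight} for the rows kept by the sampler.

    Simplified sketch of gradient-based one-side sampling.
    """
    n = len(grads)
    # Rank rows by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(grads[i]), reverse=True)
    top_k = int(n * top_rate)
    rand_k = int(n * other_rate)

    top = order[:top_k]    # always kept
    rest = order[top_k:]   # small-gradient rows, sampled at random
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(rand_k, len(rest)))

    # Up-weight sampled small-gradient rows to compensate for the ones dropped.
    amplify = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in top}
    weights.update({i: amplify for i in sampled})
    return weights


weights = goss_sample([0.1 * i for i in range(10)])
```

With ten rows and the default rates, two large-gradient rows are kept at weight 1 and one of the remaining eight is kept with an amplified weight.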

### Programming Languages & Runtimes

- [Native Execution Engines](https://awesome-repositories.com/f/programming-languages-runtimes/language-interoperability/native-c-interoperability/native-execution-engines.md) — Executes performance-critical mathematical operations in compiled code to maximize hardware utilization.
- [Machine Learning Bindings](https://awesome-repositories.com/f/programming-languages-runtimes/programming-language-varieties/programming-languages/jvm-languages/java/machine-learning-bindings.md) — Embeds high-performance gradient boosting capabilities into Java applications via native code wrappers.
- [Machine Learning Integrations](https://awesome-repositories.com/f/programming-languages-runtimes/programming-language-varieties/programming-languages/jvm-languages/java/machine-learning-integrations.md) — Provides native Java bindings to enable high-performance machine learning model training and inference within Java applications. ([source](https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html))

### Software Engineering & Architecture

- [Distributed Training Coordination](https://awesome-repositories.com/f/software-engineering-architecture/distributed-coordination-systems/distributed-training-coordination.md) — Coordinates machine learning tasks across multiple computing nodes to improve processing speed and handle large datasets. ([source](https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html))

### Scientific & Mathematical Computing

- [MPI Communication](https://awesome-repositories.com/f/scientific-mathematical-computing/high-performance-execution-environments/high-performance-and-parallel-computing/high-performance-computing/mpi-communication.md) — Coordinates parallel training tasks across multiple nodes using high-performance message passing interfaces.

### System Administration & Monitoring

- [Optimization Monitoring](https://awesome-repositories.com/f/system-administration-monitoring/model-observability-suites/optimization-monitoring.md) — Tracks and displays improvements in model accuracy during hyperparameter optimization to identify effective settings. ([source](https://microsoft.github.io/FLAML/docs/Examples/AutoML-for-LightGBM/))

### Development Tools & Productivity

- [Feature Bundling](https://awesome-repositories.com/f/development-tools-productivity/build-tooling/build-performance-optimization/build-optimization-tools/bundle-optimizers/feature-bundling.md) — Optimizes memory usage by bundling mutually exclusive features to reduce dimensionality during tree construction.
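
As a rough illustration of feature bundling, the sketch below greedily groups features that are never non-zero on the same row and then merges each group into one column by offsetting values. It assumes strictly exclusive features with non-negative integer (binned) values; the function names are hypothetical and this is not the library's implementation.

```python
def bundle_exclusive_features(columns):
    """Greedily group features whose non-zero rows never overlap.

    columns: dict mapping feature name -> list of non-negative ints,
    where 0 is the default (empty) value.
    """
    bundles = []  # each entry: {"names": [...], "nonzero": set of row indices}
    for name, col in columns.items():
        rows = {i for i, v in enumerate(col) if v != 0}
        for bundle in bundles:
            if not (bundle["nonzero"] & rows):  # no conflicting row
                bundle["names"].append(name)
                bundle["nonzero"] |= rows
                break
        else:
            bundles.append({"names": [name], "nonzero": set(rows)})
    return [b["names"] for b in bundles]


def merge_bundle(columns, names):
    """Merge bundled features into one column using value offsets."""
    merged = [0] * len(columns[names[0]])
    offset = 0
    for name in names:
        col = columns[name]
        for i, v in enumerate(col):
            if v != 0:
                merged[i] = v + offset  # shift into this feature's own range
        offset += max(col)
    return merged


columns = {"a": [1, 0, 0, 2], "b": [0, 3, 0, 0], "c": [1, 1, 0, 0]}
bundles = bundle_exclusive_features(columns)
merged_ab = merge_bundle(columns, bundles[0])
```

Here `a` and `b` share a bundle because their non-zero rows are disjoint, while `c` conflicts with both and stays separate, so three columns become two.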
