# lightgbm-org/lightgbm

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/lightgbm-org-lightgbm).**

18,460 stars · 4,028 forks · C++ · MIT

## Links

- GitHub: https://github.com/lightgbm-org/LightGBM
- Homepage: https://lightgbm.readthedocs.io/en/latest/
- awesome-repositories: https://awesome-repositories.com/repository/lightgbm-org-lightgbm.md

## Description

LightGBM is a gradient boosting framework for training decision tree ensembles for classification, regression, and ranking. It can run as a distributed machine learning library, and it grows trees leaf-wise over histogram-binned features.
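
The following minimal pure-Python sketch shows histogram-based split finding. It is not LightGBM's code. The equal-frequency binning over distinct values, the function names, and the `lambda_l2` term are illustrative assumptions, and the gain omits the constant factor of 1/2. It shows that once values are mapped to a small number of bins, each candidate split is scored with one pass over per-bin gradient and hessian sums rather than over every row.

```python
import bisect
from typing import List, Tuple


def quantile_boundaries(values: List[float], max_bin: int) -> List[float]:
    """Pick up to max_bin upper bin boundaries at equal-frequency positions."""
    distinct = sorted(set(values))
    if len(distinct) <= max_bin:
        return distinct  # one bin per distinct value
    n = len(distinct)
    # Integer arithmetic guarantees the last boundary is the maximum value.
    return [distinct[(i + 1) * n // max_bin - 1] for i in range(max_bin)]


def build_histogram(values: List[float], grads: List[float], hess: List[float],
                    boundaries: List[float]) -> Tuple[List[float], List[float]]:
    """Accumulate gradient and hessian sums per bin in one pass over the rows."""
    g_hist = [0.0] * len(boundaries)
    h_hist = [0.0] * len(boundaries)
    for v, g, h in zip(values, grads, hess):
        b = bisect.bisect_left(boundaries, v)  # index of the first boundary >= v
        g_hist[b] += g
        h_hist[b] += h
    return g_hist, h_hist


def best_split(g_hist: List[float], h_hist: List[float],
               lambda_l2: float = 1.0) -> Tuple[int, float]:
    """Return (bin, gain): rows in bins <= bin go left. Gain omits the 1/2 factor."""
    g_total, h_total = sum(g_hist), sum(h_hist)
    parent = g_total ** 2 / (h_total + lambda_l2)
    best_bin, best_gain = -1, 0.0
    g_left = h_left = 0.0
    for b in range(len(g_hist) - 1):  # splitting after the last bin sends nothing right
        g_left += g_hist[b]
        h_left += h_hist[b]
        g_right, h_right = g_total - g_left, h_total - h_left
        gain = (g_left ** 2 / (h_left + lambda_l2)
                + g_right ** 2 / (h_right + lambda_l2) - parent)
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, best_gain


values = [1, 2, 3, 4, 5, 6, 7, 8]
grads = [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0]
hess = [1.0] * 8
bounds = quantile_boundaries(values, max_bin=4)  # [2, 4, 6, 8]
g_hist, h_hist = build_histogram(values, grads, hess, bounds)
b, gain = best_split(g_hist, h_hist)
print(b, bounds[b], round(gain, 3))  # 1 4 6.4
```

With squared-error loss every hessian is 1, so each per-bin hessian sum is simply the number of rows in that bin.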

The framework can offload heavy computation to CUDA or OpenCL devices for GPU acceleration, and it can parallelize training across multiple nodes using sockets, MPI, or Dask. It also handles categorical features natively, searching for good partitions of category values instead of requiring one-hot encoding.
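
In data-parallel training, each worker builds histograms over its own rows. Histogram entries are plain sums, so adding the workers' histograms element-wise gives the histogram of the full dataset. LightGBM's feature documentation describes merging histograms across machines with a Reduce Scatter step. The sketch below replaces the network with an in-process sum and only shows why merging is valid. The shard data and function names are hypothetical.

```python
from typing import List

Histogram = List[float]  # per-bin gradient sums for one feature


def local_histogram(bins: List[int], grads: List[float], num_bins: int) -> Histogram:
    """Gradient sums per bin over the rows held by one worker."""
    hist = [0.0] * num_bins
    for b, g in zip(bins, grads):
        hist[b] += g
    return hist


def allreduce_sum(worker_hists: List[Histogram]) -> Histogram:
    """Element-wise sum, standing in for a socket or MPI reduction across workers."""
    return [sum(column) for column in zip(*worker_hists)]


# Two hypothetical workers, each holding a disjoint shard of pre-binned rows.
shard_a = ([0, 1, 1, 3], [0.5, -1.0, 2.0, 0.25])
shard_b = ([1, 2, 3, 3], [1.0, -0.5, 0.75, -0.25])

merged = allreduce_sum([local_histogram(b, g, 4) for b, g in (shard_a, shard_b)])

# The merged result equals a histogram built over all rows on a single machine.
single_machine = local_histogram(shard_a[0] + shard_b[0], shard_a[1] + shard_b[1], 4)
print(merged)  # [0.5, 2.0, -0.5, 0.75]
```

Only histograms cross the network, not raw rows, so communication cost depends on the number of bins and features rather than on the dataset size.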

Its capabilities include training on large-scale datasets, feature importance analysis via SHAP values, and model performance evaluation. It also provides mechanisms for handling imbalanced data, grouping records into queries for ranking tasks, and applying L1/L2 regularization to prevent overfitting.
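
For ranking, LightGBM needs to know which rows belong to the same query. Its Python `Dataset` accepts this as a `group` argument that lists the sizes of consecutive groups, so the rows of one query must be stored together. The hypothetical helper below converts per-row query ids into that list and rejects ids that are not contiguous. It assumes the rows have already been sorted by query.

```python
from itertools import groupby
from typing import List


def query_ids_to_group_sizes(query_ids: List[str]) -> List[int]:
    """Collapse per-row query ids into sizes of consecutive query groups."""
    sizes: List[int] = []
    seen = set()
    for qid, run in groupby(query_ids):
        if qid in seen:
            # A ranking group must be one contiguous block of rows.
            raise ValueError(f"rows for query {qid!r} are not contiguous; sort by query first")
        seen.add(qid)
        sizes.append(sum(1 for _ in run))
    return sizes


qids = ["q1", "q1", "q1", "q2", "q2", "q3"]
group = query_ids_to_group_sizes(qids)
print(group)  # [3, 2, 1]
```

The sizes must add up to the number of rows, which is a cheap sanity check before building the dataset.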

Trained models can be serialized into JSON or text formats, or exported as C++ code to enable high-speed deployment without a runtime library.
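
Exporting a tree as C++ amounts to walking it and emitting one `if`/`else` per internal node. The sketch below assumes a hypothetical nested-dict tree format and ignores missing-value handling and categorical splits. LightGBM's own converter (the CLI `convert_model` task) produces its own code layout, which this sketch does not reproduce.

```python
from typing import Union

# Hypothetical tree format: internal nodes are dicts, leaves are floats.
Node = Union[float, dict]


def emit(node: Node, depth: int = 1) -> str:
    """Recursively render a node as nested C++ if/else statements."""
    pad = "  " * depth
    if not isinstance(node, dict):
        return f"{pad}return {node!r};\n"
    cond = f"x[{node['feature']}] <= {node['threshold']!r}"
    return (f"{pad}if ({cond}) {{\n"
            + emit(node["left"], depth + 1)
            + f"{pad}}} else {{\n"
            + emit(node["right"], depth + 1)
            + f"{pad}}}\n")


def tree_to_cpp(tree: Node, name: str = "predict_tree0") -> str:
    """Wrap the rendered tree in a C++ function taking a feature array."""
    return f"double {name}(const double* x) {{\n{emit(tree)}}}\n"


tree = {"feature": 0, "threshold": 0.5,
        "left": -1.25,
        "right": {"feature": 2, "threshold": 3.0, "left": 0.75, "right": 2.0}}
print(tree_to_cpp(tree))
```

For an ensemble, the generated function of each tree would be called and the results summed. Because the thresholds are compiled in as constants, prediction needs no model file at run time.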

## Tags

### Artificial Intelligence & ML

- [Ensemble Methods](https://awesome-repositories.com/f/artificial-intelligence-ml/decision-trees/ensemble-methods.md) — Implements a high-performance gradient boosting decision tree ensemble using leaf-wise growth and histogram binning.
- [Gradient Boosting](https://awesome-repositories.com/f/artificial-intelligence-ml/gradient-boosting.md) — Implements gradient boosting algorithms for high-performance classification, regression, and ranking tasks.
- [Histogram-Based Learning](https://awesome-repositories.com/f/artificial-intelligence-ml/decision-trees/histogram-based-learning.md) — Discretizes continuous feature values into integer bins to reduce memory and accelerate split calculations.
- [Distributed Training](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training.md) — Distributes the learning process across multiple machines to handle large-scale datasets. ([source](https://github.com/lightgbm-org/LightGBM/blob/master/docs/Parallel-Learning-Guide.rst))
- [Distributed Training](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training-frameworks/distributed-training.md) — Parallelizes model training using feature, data, or voting strategies across multiple nodes. ([source](https://github.com/lightgbm-org/LightGBM/blob/master/docs/Features.rst))
- [Gradient Boosting Libraries](https://awesome-repositories.com/f/artificial-intelligence-ml/ensemble-learning-libraries/gradient-boosting-libraries.md) — Provides a framework for constructing gradient-boosted decision tree ensembles that can be trained across multiple nodes.
- [GPU-Accelerated Training](https://awesome-repositories.com/f/artificial-intelligence-ml/hardware-acceleration-backends/cuda-mining-backends/gpu-accelerated-training.md) — Offloads the heavy histogram construction and split search operations to CUDA or OpenCL devices.
- [Large Scale Training](https://awesome-repositories.com/f/artificial-intelligence-ml/large-scale-training.md) — Processes massive datasets by distributing the learning workload across clusters to reduce training time.
- [Leaf-wise Tree Growth](https://awesome-repositories.com/f/artificial-intelligence-ml/leaf-wise-tree-growth.md) — Expands trees by splitting the leaf with the largest loss reduction, which converges faster and reaches lower error than level-wise growth. ([source](https://github.com/lightgbm-org/LightGBM/blob/master/docs/Features.rst))
- [C-Based Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/inference-engines/c-inference-backends/c-based-engines.md) — Implements the high-performance training and prediction engine in C++, exposed through a C API.
- [Feature Contribution Analysis](https://awesome-repositories.com/f/artificial-intelligence-ml/feature-contribution-analysis.md) — Calculates SHAP values to determine the specific contribution of each feature to a final prediction. ([source](https://github.com/lightgbm-org/LightGBM/blob/master/docs/Parameters.rst))
- [Feature Importance Attribution](https://awesome-repositories.com/f/artificial-intelligence-ml/feature-importance-attribution.md) — Determines feature influence on predictions using SHAP values and feature contribution estimates.
- [Gradient-Based Sampling](https://awesome-repositories.com/f/artificial-intelligence-ml/gradient-computation/gradient-based-sampling.md) — Implements Gradient-based One-Side Sampling (GOSS) to speed up training by prioritizing instances with large gradients.
- [Learning to Rank Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/learning-to-rank-frameworks.md) — Groups training records into query sets to enable learning-to-rank objectives. ([source](https://github.com/lightgbm-org/LightGBM/blob/master/docs/Parameters.rst))
- [Model Evaluation Metrics](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-evaluation-and-validation/model-evaluation-metrics.md) — Computes evaluation metrics such as AUC, log loss, or L2 error on training and validation data to monitor training progress and model quality. ([source](https://github.com/lightgbm-org/LightGBM/blob/master/docs/Parameters.rst))
- [Distributed Machine Learning Integrators](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/distributed-and-scaling-strategies/distributed-learning/distributed-machine-learning-integrators.md) — Integrates with Dask to execute training and prediction tasks across distributed data collections. ([source](https://github.com/lightgbm-org/LightGBM/blob/master/docs/Parallel-Learning-Guide.rst))
- [Model Deployment](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-fine-tuning/pre-trained-model-zoos/model-deployment.md) — Facilitates the loading of trained models into production environments for high-speed inference.
- [Early Stopping Monitors](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/training-monitoring-and-profiling/training-observability-systems/training-monitoring-tools/training-safety-monitors/early-stopping-monitors.md) — Tracks validation metrics and triggers early stopping to prevent overfitting during training. ([source](https://github.com/lightgbm-org/LightGBM/blob/master/docs/Features.rst))
- [Distributed Execution](https://awesome-repositories.com/f/artificial-intelligence-ml/model-execution-environments/distributed-execution.md) — Spreads computations across multiple machines or GPUs to process massive datasets via parallel execution. ([source](https://github.com/lightgbm-org/lightgbm#readme))
- [Model Performance Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization/profiling-and-benchmarking/model-performance-optimization.md) — Combines leaf-wise growth, which lowers training loss, with regularization that counteracts the overfitting leaf-wise trees can be prone to.
- [Model Predictions](https://awesome-repositories.com/f/artificial-intelligence-ml/model-predictions.md) — Generates predictions from trained models including raw scores, transformed scores, and leaf indices. ([source](https://github.com/lightgbm-org/LightGBM/blob/master/docs/Parameters.rst))
- [Model Exporting](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training/model-exporting.md) — Exports trained models as C++ code for deployment without a runtime library, or saves them to standalone model files. ([source](https://github.com/lightgbm-org/LightGBM/blob/master/docs/Parameters.rst))
- [Regularization and Sampling](https://awesome-repositories.com/f/artificial-intelligence-ml/regularization-and-sampling.md) — Implements L1/L2 regularization, bagging, and column sub-sampling to prevent overfitting during the training process. ([source](https://github.com/lightgbm-org/LightGBM/blob/master/docs/Features.rst))
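
Gradient-based One-Side Sampling works in three steps:

1. It keeps the fraction `a` of rows with the largest absolute gradients.
2. It draws a random fraction `b` of the remaining rows.
3. It up-weights that random subset by (1 - a) / b, so gradient sums stay approximately unbiased.

The sketch below follows that description. `top_rate` and `other_rate` mirror the names of LightGBM's GOSS parameters, but the function itself is a hypothetical illustration, not the library's implementation.

```python
import random
from typing import List, Tuple


def goss_sample(grads: List[float], top_rate: float, other_rate: float,
                rng: random.Random) -> Tuple[List[int], List[float]]:
    """Return (row indices, weights) selected by Gradient-based One-Side Sampling."""
    n = len(grads)
    order = sorted(range(n), key=lambda i: abs(grads[i]), reverse=True)
    n_top = round(top_rate * n)
    n_other = round(other_rate * n)
    top = order[:n_top]                          # always keep the large-gradient rows
    rest = rng.sample(order[n_top:], n_other)    # uniform sample of the small-gradient rows
    amplify = (1.0 - top_rate) / other_rate      # compensates for under-sampling the rest
    return top + rest, [1.0] * len(top) + [amplify] * len(rest)


grads = [0.9, -0.05, 0.02, -1.2, 0.01, 0.3, -0.04, 0.03, 0.6, -0.02]
idx, weights = goss_sample(grads, top_rate=0.2, other_rate=0.3, rng=random.Random(0))
print(idx[:2])  # [3, 0]: the two rows with the largest |gradient|
print(len(idx), [round(w, 3) for w in weights])  # 5 [1.0, 1.0, 2.667, 2.667, 2.667]
```

The returned weights would multiply each sampled row's gradient and hessian before histograms are built. Here only half of the rows are used for split finding.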

### Data & Databases

- [Native Categorical Splitting](https://awesome-repositories.com/f/data-databases/categorical-data-optimization/native-categorical-splitting.md) — Optimizes partitions for categorical variables using native splitting instead of one-hot encoding.
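
Native categorical splitting avoids trying every subset of categories. LightGBM's feature documentation describes sorting categories by their accumulated gradient statistics and then scanning the sorted order like a numeric feature. The sketch below follows that idea. The `smooth` term loosely mirrors LightGBM's `cat_smooth` parameter, and the function, its defaults and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def best_categorical_split(cats: List[str], grads: List[float], hess: List[float],
                           smooth: float = 1.0,
                           lambda_l2: float = 1.0) -> Tuple[Set[str], float]:
    """Return (categories sent left, gain) found without one-hot encoding."""
    g: Dict[str, float] = defaultdict(float)
    h: Dict[str, float] = defaultdict(float)
    for c, gi, hi in zip(cats, grads, hess):
        g[c] += gi
        h[c] += hi
    # Sort categories by smoothed mean gradient; only prefixes of this order are scored,
    # so k categories need k - 1 evaluations instead of 2**(k-1) - 1 subsets.
    order = sorted(g, key=lambda c: g[c] / (h[c] + smooth))
    g_tot, h_tot = sum(g.values()), sum(h.values())
    parent = g_tot ** 2 / (h_tot + lambda_l2)
    best_left: Set[str] = set()
    best_gain = 0.0
    g_left = h_left = 0.0
    for k in range(len(order) - 1):
        g_left += g[order[k]]
        h_left += h[order[k]]
        g_right, h_right = g_tot - g_left, h_tot - h_left
        gain = (g_left ** 2 / (h_left + lambda_l2)
                + g_right ** 2 / (h_right + lambda_l2) - parent)
        if gain > best_gain:
            best_left, best_gain = set(order[:k + 1]), gain
    return best_left, best_gain


cats = ["red", "blue", "green", "red", "green", "blue", "yellow", "yellow"]
grads = [-1.0, 1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
hess = [1.0] * 8
left, gain = best_categorical_split(cats, grads, hess)
print(sorted(left), round(gain, 3))  # ['green', 'red'] 6.4
```

On the same data, a split that isolates a single category (all that one-hot encoding allows per split) reaches a gain of only about 1.9, because "red" and "green" must be grouped to separate the negative gradients from the positive ones.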

### Networking & Communication

- [Distributed Learning Communication](https://awesome-repositories.com/f/networking-communication/distributed-learning-communication.md) — Synchronizes training state across multiple machines using socket or MPI communication.

### Programming Languages & Runtimes

- [Language Wrappers](https://awesome-repositories.com/f/programming-languages-runtimes/language-wrappers.md) — Provides high-level Python and R wrappers to simplify model configuration and data handling. ([source](https://github.com/lightgbm-org/LightGBM/blob/master/docs/Development-Guide.rst))

### Scientific & Mathematical Computing

- [MPI Communication](https://awesome-repositories.com/f/scientific-mathematical-computing/high-performance-execution-environments/high-performance-and-parallel-computing/high-performance-computing/mpi-communication.md) — Uses MPI communication protocols to synchronize parallel training tasks across multiple machines. ([source](https://github.com/lightgbm-org/LightGBM/blob/master/docs/Parameters.rst))

### Software Engineering & Architecture

- [Model](https://awesome-repositories.com/f/software-engineering-architecture/code-generators/model.md) — Translates trained decision tree structures into C++ if-else statements for high-speed deployment.
- [Regularized Tree Pruning](https://awesome-repositories.com/f/software-engineering-architecture/trees/regularized-tree-pruning.md) — Applies L1 and L2 regularization during tree construction to prevent overfitting.
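
The effect of L1 and L2 regularization on tree construction shows up in the standard second-order formulas for a leaf's optimal value and a split's gain. L1 is applied as soft-thresholding of the gradient sum G, and L2 is added to the hessian sum H. LightGBM's `lambda_l1`, `lambda_l2` and `min_gain_to_split` parameters appear to act on quantities of this form, but the functions below are an illustrative sketch, not the library's source. The `MIN_GAIN` value is hypothetical.

```python
import math


def threshold_l1(s: float, l1: float) -> float:
    """Soft-thresholding: shrink |s| by l1, clipping at zero."""
    return math.copysign(max(0.0, abs(s) - l1), s)


def leaf_output(g: float, h: float, l1: float, l2: float) -> float:
    """Regularized optimal leaf value: -T(G, l1) / (H + l2)."""
    return -threshold_l1(g, l1) / (h + l2)


def leaf_gain(g: float, h: float, l1: float, l2: float) -> float:
    """Loss reduction from a leaf at its optimal value (constant factor 1/2 dropped)."""
    return threshold_l1(g, l1) ** 2 / (h + l2)


def split_gain(gl: float, hl: float, gr: float, hr: float,
               l1: float = 0.0, l2: float = 0.0) -> float:
    """Children's combined gain minus the parent's gain."""
    return (leaf_gain(gl, hl, l1, l2) + leaf_gain(gr, hr, l1, l2)
            - leaf_gain(gl + gr, hl + hr, l1, l2))


print(leaf_output(-4.0, 4.0, l1=0.0, l2=0.0))  # 1.0
print(leaf_output(-4.0, 4.0, l1=0.0, l2=4.0))  # 0.5: L2 shrinks the value toward zero
print(leaf_output(-4.0, 4.0, l1=5.0, l2=0.0))  # 0.0: L1 zeroes leaves with small |G|

MIN_GAIN = 0.5  # hypothetical minimum gain a split must exceed to be kept
for l1 in (0.0, 1.5):
    gain = split_gain(-2.0, 2.0, 2.0, 2.0, l1=l1, l2=1.0)
    print(l1, round(gain, 4), "split" if gain > MIN_GAIN else "prune")
# 0.0 2.6667 split
# 1.5 0.1667 prune
```

The same split is accepted without L1 and pruned with it, because soft-thresholding shrinks both children's gradient sums before the gain is compared with the threshold.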

### Part of an Awesome List

- [Machine Learning](https://awesome-repositories.com/f/awesome-lists/ai/machine-learning.md) — A fast, distributed, high performance gradient boosting framework.