LightGBM is a gradient boosting framework for training decision tree ensembles on classification, regression, and ranking tasks. It grows trees leaf-wise (best-first) rather than level by level, bins feature values into histograms to speed up split finding, and supports distributed training.
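To make histogram-based split finding concrete, here is a minimal pure-Python sketch. It is not LightGBM's implementation: the equal-frequency binning, the function names, and the simplified gain formula are assumptions chosen for illustration. The idea it shows is that gradients are accumulated per bin once, and candidate splits are then scanned over bin boundaries instead of over every distinct feature value.

```python
from bisect import bisect_right

def make_bin_edges(values, max_bin):
    """Equal-frequency bin edges (a simplification of real binning)."""
    ordered = sorted(values)
    n = len(ordered)
    edges = []
    for b in range(1, max_bin):
        edge = ordered[(b * n) // max_bin]
        if not edges or edge > edges[-1]:  # skip duplicate edges
            edges.append(edge)
    return edges  # defines len(edges) + 1 bins

def build_histogram(values, grads, hess, edges):
    """Accumulate gradient and hessian sums per bin."""
    g_hist = [0.0] * (len(edges) + 1)
    h_hist = [0.0] * (len(edges) + 1)
    for v, g, h in zip(values, grads, hess):
        b = bisect_right(edges, v)  # bin b holds edges[b-1] <= v < edges[b]
        g_hist[b] += g
        h_hist[b] += h
    return g_hist, h_hist

def best_split(g_hist, h_hist, lambda_l2=1.0):
    """Scan bin boundaries and return (bin index, gain) of the best split."""
    G, H = sum(g_hist), sum(h_hist)
    parent_score = G * G / (H + lambda_l2)
    best_bin, best_gain = None, 0.0
    gl = hl = 0.0
    for b in range(len(g_hist) - 1):
        gl += g_hist[b]
        hl += h_hist[b]
        gr, hr = G - gl, H - hl
        gain = gl * gl / (hl + lambda_l2) + gr * gr / (hr + lambda_l2) - parent_score
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, best_gain

# Toy data: squared-error gradients (prediction 0.5 minus label), unit hessians.
values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
grads = [0.5 - y for y in labels]
hess = [1.0] * len(values)

edges = make_bin_edges(values, max_bin=4)
g_hist, h_hist = build_histogram(values, grads, hess, edges)
best_bin, best_gain = best_split(g_hist, h_hist)
threshold = edges[best_bin]  # rows with value < threshold go to the left child
```

In a leaf-wise scheme, a scan like this would be run for every current leaf, and only the leaf offering the largest gain would be split next.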
The framework can offload heavy computations to GPUs through CUDA or OpenCL and can parallelize training across multiple nodes using sockets, MPI, or Dask. It also handles categorical features natively, searching for good partitions of the category values without requiring one-hot encoding.
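The categorical handling described above can be sketched as follows. This is a simplified illustration and not the library's code: categories are ordered by their accumulated gradient statistics, and the best split is then searched along that ordering, so the result is a subset of categories rather than a numeric threshold. The function name is hypothetical, and the way `cat_smooth` and `lambda_l2` enter the formulas here is an assumption.

```python
from collections import defaultdict

def categorical_partition(categories, grads, hess, lambda_l2=1.0, cat_smooth=10.0):
    """Return (set of categories sent left, gain) for the best partition found."""
    g_sum = defaultdict(float)
    h_sum = defaultdict(float)
    for c, g, h in zip(categories, grads, hess):
        g_sum[c] += g
        h_sum[c] += h

    # Order categories by smoothed gradient/hessian ratio (assumed smoothing).
    ordered = sorted(g_sum, key=lambda c: g_sum[c] / (h_sum[c] + cat_smooth))

    G, H = sum(g_sum.values()), sum(h_sum.values())
    parent_score = G * G / (H + lambda_l2)
    best_left, best_gain = None, 0.0
    gl = hl = 0.0
    # Scan prefixes of the ordering; the last category must stay on the right.
    for i, c in enumerate(ordered[:-1]):
        gl += g_sum[c]
        hl += h_sum[c]
        gr, hr = G - gl, H - hl
        gain = gl * gl / (hl + lambda_l2) + gr * gr / (hr + lambda_l2) - parent_score
        if gain > best_gain:
            best_left, best_gain = set(ordered[: i + 1]), gain
    return best_left, best_gain

# Toy data: "red" rows have negative gradients, the others positive.
colors = ["red", "blue", "green", "red", "blue", "green", "red", "blue"]
grads = [-0.5, 0.5, 0.4, -0.5, 0.5, 0.4, -0.5, 0.5]
hess = [1.0] * len(colors)

left_set, gain = categorical_partition(colors, grads, hess)
```

With k categories, one-hot encoding would test only "one category versus the rest" splits, while scanning a sorted ordering considers k - 1 grouped partitions in a single pass.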
Beyond core training, LightGBM supports training on large datasets, feature importance analysis (including per-prediction SHAP values), and model evaluation with built-in metrics. It also provides options for handling imbalanced data, organizing ranking data into query groups, and applying L1/L2 regularization to limit overfitting.
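The helpers below sketch, under stated assumptions, how the three mechanisms named above are typically prepared or computed. None of the function names come from LightGBM; they are hypothetical. The parameter names in the final dictionary (`scale_pos_weight`, `lambda_l1`, `lambda_l2`) are LightGBM parameters, and the leaf-value formula is the standard second-order expression with L1 soft-thresholding, which omits details such as the learning rate.

```python
from itertools import groupby

def group_sizes(query_ids):
    """Ranking data: count consecutive rows per query.

    Assumes rows belonging to the same query are already contiguous.
    """
    return [sum(1 for _ in rows) for _, rows in groupby(query_ids)]

def negative_to_positive_ratio(labels):
    """Imbalanced binary data: a common heuristic weight for the positive class."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    return negatives / positives

def regularized_leaf_value(grad_sum, hess_sum, lambda_l1, lambda_l2):
    """Leaf output with L1 soft-thresholding and L2 shrinkage (simplified)."""
    shrunk = max(abs(grad_sum) - lambda_l1, 0.0)
    sign = 1.0 if grad_sum > 0 else -1.0
    return -sign * shrunk / (hess_sum + lambda_l2)

query_ids = ["q1", "q1", "q1", "q2", "q2", "q3"]
sizes = group_sizes(query_ids)  # one count per query, in row order

labels = [0, 0, 0, 0, 0, 0, 1, 1]
pos_weight = negative_to_positive_ratio(labels)

# Illustrative parameter dictionary; the values are arbitrary.
params = {
    "objective": "binary",
    "scale_pos_weight": pos_weight,
    "lambda_l1": 1.0,
    "lambda_l2": 2.0,
}

leaf = regularized_leaf_value(-4.0, 6.0, params["lambda_l1"], params["lambda_l2"])
```

Larger `lambda_l1` values push small leaf outputs to exactly zero, while `lambda_l2` shrinks all leaf outputs toward zero without eliminating them.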
Trained models can be serialized into JSON or text formats, or exported as C++ code to enable high-speed deployment without a runtime library.