# intel-analytics/bigdl

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/intel-analytics-bigdl).**

8,845 stars · 1,428 forks · Python · Apache-2.0 · archived

## Links

- GitHub: https://github.com/intel-analytics/BigDL
- awesome-repositories: https://awesome-repositories.com/repository/intel-analytics-bigdl.md

## Description

BigDL is a PyTorch acceleration framework and distributed inference engine designed for large language models. It provides a toolkit for running models on Intel hardware, integrating quantization tools and libraries for parameter-efficient fine-tuning.

The project uses pipeline parallelism to distribute model workloads across multiple hardware accelerators. It uses low-bit integer quantization and speculative decoding to reduce memory footprint and text generation latency.
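To make the speculative decoding idea concrete, here is a minimal, library-free sketch of the greedy draft-and-verify loop. It is not BigDL's implementation: `target_next` and `draft_next` are hypothetical toy stand-ins for a full model and a cheaper draft model, and the "single forward pass" of the full model is only simulated by counting one verification round per batch of drafted tokens.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def target_next(prefix: List[Token]) -> Token:
    """Hypothetical full model: a deterministic toy rule, not a real LLM."""
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_next(prefix: List[Token]) -> Token:
    """Hypothetical cheap model: matches the target except on some prefixes."""
    guess = target_next(prefix)
    return guess if len(prefix) % 4 else (guess + 1) % 50


def greedy_decode(next_token: NextToken, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(next_token(out))
    return out


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[Token], n_new: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Draft up to k tokens, keep the prefix the target agrees with."""
    out = list(prompt)
    goal = len(prompt) + n_new
    verify_rounds = 0
    while len(out) < goal:
        # 1. The draft model proposes up to k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))

        # 2. The target checks every proposed position. A real engine does
        #    this in one batched forward pass; here we just count one round.
        verify_rounds += 1
        n_ok = 0
        correction = None
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if tok != expected:
                correction = expected  # target's token replaces the bad guess
                break
            n_ok += 1

        # 3. Keep the agreed prefix, plus the correction if there was one.
        out.extend(proposal[:n_ok])
        if correction is not None:
            out.append(correction)
    return out, verify_rounds


prompt = [3, 1, 4]
baseline = greedy_decode(target_next, prompt, 12)
speculated, rounds = speculative_decode(target_next, draft_next, prompt, 12)
```

In this sketch the speculative output always equals the plain greedy output of the target, because every kept token is one the target itself would have produced. The saving comes from needing fewer target rounds than generated tokens whenever the draft guesses well. Production systems typically also take a bonus token from the verification pass and support sampling, both omitted here.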

Its model optimization features include weight compression and quantized model loading. It also supports hardware-accelerated training routines for adapting pre-trained models to specific tasks.
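As an illustration of what low-bit weight compression involves, the following is a minimal sketch of symmetric 4-bit quantization with one scale per weight group, packing two values into each byte. It assumes a simple round-to-nearest scheme chosen for clarity; the function names are hypothetical, and the source does not state which quantization formats or kernels the project actually uses.

```python
from typing import List, Tuple


def quantize_int4(weights: List[float]) -> Tuple[bytes, float]:
    """Map floats to integers in [-7, 7] and pack two of them per byte."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / 7.0
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    if len(q) % 2:
        q.append(0)  # pad so the values pair up evenly
    # Shift each value by 8 so it fits in an unsigned 4-bit nibble (1..15).
    packed = bytes(((a + 8) << 4) | (b + 8) for a, b in zip(q[0::2], q[1::2]))
    return packed, scale


def dequantize_int4(packed: bytes, scale: float, n: int) -> List[float]:
    """Unpack the nibbles and rescale; n drops any padding value."""
    values: List[float] = []
    for byte in packed:
        values.append(((byte >> 4) - 8) * scale)
        values.append(((byte & 0x0F) - 8) * scale)
    return values[:n]


weights = [0.12, -0.5, 0.33, 0.07, -0.21]
packed, scale = quantize_int4(weights)
restored = dequantize_int4(packed, scale, len(weights))
```

Each weight costs half a byte plus a share of the group's scale, compared with 2 or 4 bytes in FP16 or FP32, and the reconstruction error per weight is bounded by about half the scale. Real low-bit formats usually quantize in small fixed-size groups so that one outlier does not inflate the scale for the whole tensor.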

## Tags

### Artificial Intelligence & ML

- [XPU Acceleration Toolkits](https://awesome-repositories.com/f/artificial-intelligence-ml/xpu-acceleration-toolkits.md) — Optimizes compute kernels specifically for Intel CPUs and GPUs to improve inference and fine-tuning performance.
- [PyTorch-Based Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/deep-learning-frameworks/pytorch-based-frameworks.md) — Provides a toolkit for optimizing and executing PyTorch models on hardware accelerators via weight compression and parallelism.
- [Distributed Inference Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-inference-engines.md) — Provides a distributed engine that splits large model workloads across multiple accelerators using pipeline parallelism.
- [Distributed Inference Scaling](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-inference-scaling.md) — Scales inference by executing large-scale models across multiple hardware accelerators via pipeline parallelism. ([source](https://github.com/intel-analytics/bigdl#readme))
- [Distributed Model Execution](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-model-execution.md) — Executes large model workloads across multiple compute devices to balance heavy computational loads.
- [Hardware-Accelerated Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/hardware-accelerated-inference.md) — Optimizes model execution across different hardware processors to increase speed and reduce latency. ([source](https://github.com/intel-analytics/bigdl#readme))
- [Intel XPU LLM Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/intel-hardware-export/intel-xpu-llm-inference.md) — Runs large language models on Intel hardware using INT4 quantization for high-performance, low-latency inference.
- [Large Language Model Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/large-language-model-fine-tuning.md) — Provides hardware-accelerated training routines and parameter-efficient tuning to adapt pre-trained models to specific tasks. ([source](https://github.com/intel-analytics/bigdl#readme))
- [Intel XPU](https://awesome-repositories.com/f/artificial-intelligence-ml/large-language-models/inference-libraries/intel-xpu.md) — Ships a library for running large language models on Intel hardware using INT4 quantization.
- [Pipeline Parallelisms](https://awesome-repositories.com/f/artificial-intelligence-ml/neural-networks/model-training-pipelines/pipeline-parallelisms.md) — Distributes model layers across multiple hardware accelerators using pipeline parallelism to handle massive models.
- [Parameter Efficient Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/parameter-efficient-fine-tuning.md) — Provides hardware-accelerated routines for adapting pre-trained models using parameter-efficient fine-tuning.
- [PyTorch Backends](https://awesome-repositories.com/f/artificial-intelligence-ml/pytorch-backends.md) — Interfaces with PyTorch to enable seamless loading and execution of standard model architectures on accelerated hardware.
- [Weight Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/quantized-inference-runtimes/weight-quantization.md) — Implements weight quantization to compress model weights into low-bit formats, reducing memory footprint and increasing speed.
- [Speculative Decoding Strategies](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/inference-optimization/inference-acceleration-techniques/speculative-decoding-strategies.md) — Decreases text generation latency by drafting several tokens ahead and validating them together in a single forward pass.
- [Self-Speculative Decoding](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/inference-optimization/inference-acceleration-techniques/speculative-decoding-strategies/self-speculative-decoding.md) — Implements self-speculative decoding to speed up text generation by predicting multiple tokens in parallel. ([source](https://github.com/intel-analytics/bigdl#readme))
- [Quantized Model Loading](https://awesome-repositories.com/f/artificial-intelligence-ml/model-quantization/quantized-model-loading.md) — Provides the ability to import models from common compressed formats for higher efficiency and lower resource overhead. ([source](https://github.com/intel-analytics/bigdl#readme))
- [Parameter-Efficient Training Toolkits](https://awesome-repositories.com/f/artificial-intelligence-ml/parameter-efficient-training-toolkits.md) — Implements a framework for adapting pre-trained models to specific tasks using hardware-accelerated, parameter-efficient tuning.
- [PyTorch Model Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/pytorch-model-optimizations.md) — Accelerates the execution of PyTorch-based language models by optimizing them for Intel XPU hardware targets.
- [LLM Quantization Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/quantized-inference-runtimes/llm-quantization-frameworks.md) — Provides a system for reducing model memory usage by converting weights into low-bit formats.

### DevOps & Infrastructure

- [Low-Bit Weight Quantization](https://awesome-repositories.com/f/devops-infrastructure/intel-hardware-acceleration/low-bit-weight-quantization.md) — Compresses LLM weights into low-bit precision formats to reduce memory usage and increase execution speed.

### Part of an Awesome List

- [Large Language Models](https://awesome-repositories.com/f/awesome-lists/ai/large-language-models.md) — Distributed deep learning library for big data platforms.
- [Machine Learning](https://awesome-repositories.com/f/awesome-lists/ai/machine-learning.md) — Distributed deep learning library.
- [Large Language Models (LLMs)](https://awesome-repositories.com/f/awesome-lists/more/large-language-models-llms.md) — Listed in the “Large Language Models (LLMs)” section of The Incredible PyTorch awesome list.
