BigDL

BigDL - accelerate PyTorch LLM inference | Awesome Repos

Features

XPU Acceleration Toolkits - Optimizes compute kernels specifically for Intel CPUs and GPUs to improve inference and fine-tuning performance.
PyTorch-Based Frameworks - Provides a toolkit for optimizing and executing PyTorch models on hardware accelerators via weight compression and parallelism.
Distributed Inference Engines - Provides a distributed engine that splits large model workloads across multiple accelerators using pipeline parallelism.
Distributed Inference Scaling - Scales inference by executing large-scale models across multiple hardware accelerators via pipeline parallelism.
Distributed Model Execution - Executes large model workloads across multiple compute devices to balance heavy computational loads.
Hardware-Accelerated Inference - Optimizes model execution across different hardware processors to increase speed and reduce latency.
Intel XPU LLM Inference - Runs large language models on Intel hardware using INT4 quantization for high-performance, low-latency inference.
Large Language Model Fine-Tuning - Provides hardware-accelerated training routines and parameter-efficient tuning to adapt pre-trained models to specific tasks.
Intel XPU - Ships a library for running large language models on Intel hardware using INT4 quantization.
Pipeline Parallelism - Distributes model layers across multiple hardware accelerators using pipeline parallelism to handle massive models.
Parameter Efficient Fine-Tuning - Provides hardware-accelerated routines for adapting pre-trained models using parameter-efficient fine-tuning.
PyTorch Backends - Interfaces with PyTorch to enable seamless loading and execution of standard model architectures on accelerated hardware.
Weight Quantization - Implements weight quantization to compress model weights into low-bit formats, reducing memory footprint and increasing speed.
Low-Bit Weight Quantization - Compresses LLM weights into low-bit precision formats to reduce memory usage and increase execution speed.
Speculative Decoding Strategies - Decreases text generation latency by predicting and validating multiple tokens in a single forward pass.
Self-Speculative Decoding - Implements self-speculative decoding to speed up text generation by predicting multiple tokens in parallel.
Quantized Model Loading - Provides the ability to import models from common compressed formats for higher efficiency and lower resource overhead.
Parameter-Efficient Training Toolkits - Implements a framework for adapting pre-trained models to specific tasks using hardware-accelerated, parameter-efficient tuning.
PyTorch Model Optimizations - Accelerates the execution of PyTorch-based language models by optimizing them for Intel XPU hardware targets.
LLM Quantization Frameworks - Provides a system for reducing model memory usage by converting weights into low-bit formats.
Large Language Models - Distributed deep learning library for big data platforms.
Machine Learning - Distributed deep learning library.
Large Language Models (LLMs) - Listed in the “Large Language Models (LLMs)” section of The Incredible PyTorch awesome list.
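
Several of the features above mention compressing weights into INT4. The following is a minimal, library-free sketch of what symmetric per-group 4-bit weight quantization can look like. It is not BigDL's implementation: the function names (quantize_int4, dequantize_int4, pack_nibbles), the group size, and the choice of a symmetric scheme are all assumptions made for illustration.

```python
# Illustrative sketch only; names and scheme are hypothetical, not BigDL's API.

def quantize_int4(weights, group_size=4):
    """Symmetric per-group quantization to signed 4-bit codes.

    Each group of `group_size` weights shares one float scale chosen so the
    largest magnitude in the group maps to code 7 (or -7).
    Returns (codes, scales).
    """
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        scale = max_abs / 7.0 if max_abs > 0 else 1.0  # avoid division by zero
        scales.append(scale)
        # Round to the nearest code and clamp to the signed 4-bit range.
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales


def dequantize_int4(codes, scales, group_size=4):
    """Map 4-bit codes back to approximate float weights."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


def pack_nibbles(codes):
    """Store two signed 4-bit codes per byte (low nibble first)."""
    if len(codes) % 2:
        codes = codes + [0]  # pad to an even count
    return bytes(
        (lo & 0xF) | ((hi & 0xF) << 4)
        for lo, hi in zip(codes[0::2], codes[1::2])
    )


weights = [0.1, -0.7, 0.35, 0.0, 2.0, -1.0, 0.5, 0.25]
codes, scales = quantize_int4(weights)
packed = pack_nibbles(codes)
restored = dequantize_int4(codes, scales)
```

In this sketch eight float weights end up in four bytes plus one scale per group, and each restored weight differs from the original by at most half of its group's scale. Production libraries typically add refinements such as asymmetric zero points or mixed formats, which this example does not attempt to reproduce.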

Open-source alternatives to BigDL

Similar open-source projects, ranked by how many features they share with BigDL.

intel/ipex-llm
8,836 stars on GitHub
Intel XPU LLM Acceleration Library is a toolkit designed to accelerate large language model inference and finetuning on Intel CPUs, GPUs, and NPUs. It provides a distributed inference engine for scaling models across multiple accelerators, a multimodal model runtime for vision and speech tasks, and a low-bit model quantization tool for converting weights into INT4, FP8, and GGUF formats. The project features a parameter-efficient finetuning framework that enables model adaptation using QLoRA and DPO on Intel hardware. It distinguishes itself by providing specialized optimizations for Intel XP…
Python
facebookresearch/llama-recipes
18,379 stars on GitHub
This repository is a collection of frameworks and guides for Llama models, functioning as a fine-tuning framework, an inference pipeline, and an AI workflow orchestrator. It provides tools for adapting large language models to specific datasets and domains. The project includes a parameter-efficient fine-tuning toolkit that utilizes techniques like low-rank adaptation to reduce memory and compute requirements. It also serves as an implementation guide for retrieval-augmented generation, combining model inference with external data retrieval to improve response accuracy. The capability surfac…
Jupyter Notebook
intel-analytics/ipex-llm
8,836 stars on GitHub
ipex-llm is an acceleration library and inference engine designed to optimize the execution and finetuning of large language models on Intel GPUs and NPUs. It provides a HuggingFace compatible model backend and a dedicated quantization toolkit for converting model weights into low-bit precision formats. The project facilitates distributed inference by splitting large model workloads across multiple accelerators using pipeline and tensor parallelism. It enables the deployment of models on Intel Arc, Flex, and Max GPUs to increase throughput and reduce latency. The library covers a broad range…
Python
OpenBMB/MiniCPM
9,464 stars on GitHub
MiniCPM is a collection of small language models designed for local, on-device deployment in resource-constrained environments. The project focuses on running dense Transformer models on consumer hardware, including GPUs, CPUs, and Apple Silicon, without requiring custom code forks. The project distinguishes itself through heavy optimization for edge hardware, utilizing quantized weight compression in GGUF and MLX formats to reduce memory overhead. It implements advanced inference techniques such as speculative sampling and radix-tree prefix caching to accelerate generation speed and throughp…
Jupyter Notebook


intel-analytics/BigDL (archived)