DeepSpeedExamples

Features

Distributed Memory Optimizers - Implements the Zero Redundancy Optimizer (ZeRO) to partition optimizer states, gradients, and parameters across distributed GPUs.
Reference Implementations - Provides functional application examples and codebases that serve as standardized models for implementing distributed AI training and inference.
Distributed Training - Offers frameworks and utilities for scaling model training across multiple processors, GPUs, or nodes using distributed optimization.
Data-Parallel Training - Implements data-parallel training strategies to synchronize gradients across multiple compute nodes.
Distributed Training - Demonstrates how to configure data and model parallelism to train large neural networks across multiple nodes.
GPU Memory Optimizers - Manages optimizer states and model weights across CPU and GPU memory to optimize VRAM usage.
Inference Benchmarking Tools - Includes utilities for measuring processing speed, latency, and performance metrics of machine learning models across various hardware configurations.
Large Language Model Fine-Tuning - Provides practical workflows for adapting pre-trained large language models to specific tasks using distributed optimization.
Large-Scale Model Training - Provides scripts and implementations for training large-scale models that exceed single-device memory capacity.
Inference Optimizations - Uses specialized inference engines to reduce latency and increase throughput for model predictions.
Model Parallelism Frameworks - Provides reference implementations for dividing neural network layers across multiple devices using pipeline parallelism.
Optimizer State Offloading - Implements mechanisms to move optimizer states from GPU memory to system RAM so that larger models fit on the same hardware.
Mixed Precision Training - Provides implementations for using 16-bit and 32-bit precision to reduce memory usage during training.
Model Performance Benchmarking - Includes standardized test suites to measure and compare the execution speed and efficiency of model implementations.
Model Inference - Includes workflows for executing predictions on trained models across diverse hardware configurations.
Model Compression - Includes techniques for reducing the memory footprint and computational requirements of large models.
Pipeline Parallelism Partitioners - Implements utilities for partitioning large neural networks into sequential groups of layers placed on different GPUs to enable pipeline-parallel training.
Weight Quantization - Demonstrates weight compression and quantization to improve inference deployment efficiency and speed.
Model Execution Benchmarks - Provides tools for benchmarking the computational efficiency and hardware utilization of model execution.
Natural Language Processing - Listed in the “Natural Language Processing” section of the FunNLP awesome list.
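
Several of the features above (ZeRO partitioning, optimizer state offloading, mixed precision) are usually switched on through a DeepSpeed JSON configuration. The following is a minimal sketch, assuming ZeRO stage 2 with CPU optimizer offload and fp16; the batch size and the helper name render_config are illustrative choices, not values taken from the repository.

```python
import json

# Illustrative values only (assumptions); tune them for real hardware.
ds_config = {
    "train_batch_size": 32,
    # Mixed precision: run training in 16-bit floats.
    "fp16": {"enabled": True},
    "zero_optimization": {
        # Stage 2 partitions optimizer states and gradients across ranks.
        "stage": 2,
        # Keep optimizer states in pinned system RAM instead of VRAM.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}


def render_config(config):
    """Serialize the config dict to the JSON text DeepSpeed reads.

    render_config is a hypothetical helper, not part of DeepSpeed.
    """
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(render_config(ds_config))
```

The resulting JSON would typically be saved to a file and handed to the training script, or the dict itself passed through the config argument of deepspeed.initialize; check the examples in the repository for the exact settings each one uses.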

Open-source alternatives to DeepSpeedExamples

Similar open-source projects, ranked by how many features they share with DeepSpeedExamples.

microsoft/DeepSpeedExamples
6,822 | View on GitHub
DeepSpeedExamples is a collection of reference implementations for training and deploying large-scale AI models using the DeepSpeed optimization library. It provides Python code examples for training massive models across multiple GPUs through distributed optimization techniques. The repository includes optimized patterns for deploying and running large language model predictions in production environments. It also serves as a guide for model compression to reduce memory footprints and as a source for performance benchmarks to measure execution speed and resource utilization. The project cov
Python
EleutherAI/gpt-neox
7,392 | View on GitHub
gpt-neox is a distributed training system and framework for building large-scale autoregressive language models. It implements the transformer architecture and provides a toolkit for training models with billions of parameters by distributing weights across compute clusters. The framework distinguishes itself through extensive support for distributed model parallelism, including pipeline and sequence parallelism, to overcome single-device memory limits. It further supports sparse model architectures using a mixture of experts system with Sinkhorn-based routing. The project covers a broad ran
Python, deepspeed-library, gpt-3, language-model
artidoro/qlora
10,929 | View on GitHub
This project is a quantized fine-tuning framework for large language models. It implements a low-rank adaptation library and a four-bit quantizer to reduce the GPU memory requirements needed to train large models. The framework utilizes four-bit quantization and low-rank adapters to enable model training on consumer-grade hardware. It further reduces the memory footprint through double quantization and a paged optimizer that offloads states to system RAM. The system supports distributed training across multiple GPUs to handle larger parameter scales and includes utilities for custom dataset
Jupyter Notebook
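
To make the four-bit quantization idea in the description above concrete, here is a minimal sketch of uniform absmax quantization in plain Python. It is a simplified stand-in, not the quantizer this project ships: the function names are hypothetical, and the project's own number format and block-wise scaling may differ.

```python
def quantize_absmax(weights, bits=4):
    """Map floats to signed integer codes using a single absmax scale.

    With bits=4 the codes fall in [-7, 7]. This is a generic uniform
    scheme chosen for illustration (an assumption), not a library API.
    """
    qmax = 2 ** (bits - 1) - 1
    absmax = max(abs(w) for w in weights)
    # Avoid dividing by zero when every weight is zero.
    scale = absmax / qmax if absmax > 0 else 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    codes, scale = quantize_absmax([0.9, -0.4, 0.05, 0.0])
    print(codes, scale, dequantize(codes, scale))
```

Storing the small integer codes plus one scale per group of weights is what reduces memory; the price is the rounding error visible when the dequantized values are compared with the originals.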
OpenBMB/MiniCPM
9,464 | View on GitHub
MiniCPM is a collection of small language models designed for local, on-device deployment in resource-constrained environments. The project focuses on running dense Transformer models on consumer hardware, including GPUs, CPUs, and Apple Silicon, without requiring custom code forks. The project distinguishes itself through heavy optimization for edge hardware, utilizing quantized weight compression in GGUF and MLX formats to reduce memory overhead. It implements advanced inference techniques such as speculative sampling and radix-tree prefix caching to accelerate generation speed and throughp
Jupyter Notebook

See all 30 alternatives to DeepSpeedExamples
