# deepspeedai/deepspeedexamples

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/deepspeedai-deepspeedexamples).**

6,822 stars · 1,119 forks · Python · Apache-2.0

## Links

- GitHub: https://github.com/deepspeedai/DeepSpeedExamples
- awesome-repositories: https://awesome-repositories.com/repository/deepspeedai-deepspeedexamples.md

## Description

DeepSpeedExamples is a collection of reference implementations and scripts for training, fine-tuning, and running inference on large-scale AI models with DeepSpeed. It includes guides to distributed model training and practical workflows for adapting large language models with memory-efficient techniques.

The repository includes pipeline-parallelism implementations for models that exceed the memory of a single GPU, and a suite of ZeRO memory-optimization examples that reduce per-device memory overhead. It also provides standardized test suites for benchmarking the throughput and latency of models running on DeepSpeed inference engines.

The project covers GPU memory optimization, distributed AI benchmarking, and high-performance model inference. It demonstrates weight compression and distributed optimization for scaling neural networks across multiple computing nodes.
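To make the per-device saving from ZeRO concrete, the sketch below estimates model-state memory per GPU for each ZeRO stage. It assumes mixed-precision Adam with the usual accounting of 2 bytes per parameter for 16-bit weights, 2 for 16-bit gradients, and 12 for 32-bit optimizer states. The function name `zero_model_state_bytes` is hypothetical and is not part of DeepSpeed or this repository, and the estimate ignores activations, temporary buffers, and fragmentation.

```python
def zero_model_state_bytes(num_params, num_gpus, stage):
    """Estimate per-GPU bytes for model states under mixed-precision Adam.

    stage 0: plain data parallelism (everything replicated)
    stage 1: optimizer states partitioned
    stage 2: optimizer states and gradients partitioned
    stage 3: optimizer states, gradients, and parameters partitioned
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")

    params = 2 * num_params   # 16-bit weights
    grads = 2 * num_params    # 16-bit gradients
    optim = 12 * num_params   # 32-bit weights, momentum, and variance

    if stage >= 1:
        optim /= num_gpus
    if stage >= 2:
        grads /= num_gpus
    if stage >= 3:
        params /= num_gpus
    return params + grads + optim


# Illustrative values only: a 7.5B-parameter model on 64 GPUs.
for stage in range(4):
    gb = zero_model_state_bytes(7.5e9, 64, stage) / 1e9
    print(f"ZeRO stage {stage}: {gb:.1f} GB of model states per GPU")
```

Under these assumptions the estimate falls from 120 GB per GPU with everything replicated to under 2 GB per GPU at stage 3, which is why higher stages allow larger models on the same hardware at the cost of extra communication.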

## Tags

### Artificial Intelligence & ML

- [Distributed Memory Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-memory-optimizers.md) — Implements the Zero Redundancy Optimizer (ZeRO) to partition optimizer states, gradients, and parameters across distributed GPUs.
- [Distributed Training](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training.md) — Offers frameworks and utilities for scaling model training across multiple processors, GPUs, or nodes using distributed optimization.
- [Data-Parallel Training](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training-frameworks/data-parallel-training.md) — Implements data-parallel training strategies to synchronize gradients across multiple compute nodes.
- [Distributed Training](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training-frameworks/distributed-training.md) — Demonstrates how to configure data and model parallelism to train large neural networks across multiple nodes.
- [GPU Memory Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-memory-optimizers.md) — Manages optimizer states and model weights across CPU and GPU memory to optimize VRAM usage.
- [Inference Benchmarking Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/inference-benchmarking-tools.md) — Includes utilities for measuring processing speed, latency, and performance metrics of machine learning models across various hardware configurations.
- [Large Language Model Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/large-language-model-fine-tuning.md) — Provides practical workflows for adapting pre-trained large language models to specific tasks using distributed optimization. ([source](https://github.com/deepspeedai/deepspeedexamples#readme))
- [Large-Scale Model Training](https://awesome-repositories.com/f/artificial-intelligence-ml/large-scale-model-training.md) — Provides scripts and implementations for training large-scale models that exceed single-device memory capacity. ([source](https://github.com/deepspeedai/deepspeedexamples#readme))
- [Inference Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/serving-and-runtime/inference-optimizations.md) — Uses specialized inference engines to reduce latency and increase throughput for model predictions. ([source](https://github.com/deepspeedai/deepspeedexamples#readme))
- [Model Parallelism Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/model-parallelism-frameworks.md) — Provides reference implementations for dividing neural network layers across multiple devices using pipeline parallelism.
- [Mixed Precision Training](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/distributed-and-accelerated-compute/training-acceleration-tools/mixed-precision-training.md) — Provides implementations that combine 16-bit and 32-bit precision to reduce memory usage during training.
- [Model Performance Benchmarking](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-evaluation-analysis/model-analysis/model-performance-benchmarking.md) — Includes standardized test suites to measure and compare the execution speed and efficiency of model implementations. ([source](https://github.com/deepspeedai/deepspeedexamples#readme))
- [Model Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/model-integration-pipelines/model-inference.md) — Includes workflows for executing predictions on trained models across diverse hardware configurations. ([source](https://github.com/deepspeedai/deepspeedexamples#readme))
- [Model Compression](https://awesome-repositories.com/f/artificial-intelligence-ml/neural-networks/model-compression.md) — Includes techniques for reducing the memory footprint and computational requirements of large models.
- [Pipeline Parallelism Partitioners](https://awesome-repositories.com/f/artificial-intelligence-ml/pipeline-parallelism-partitioners.md) — Implements utilities for partitioning the layers of large neural networks into sequential stages across multiple GPUs to enable pipeline-parallel training.
- [Weight Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/quantized-inference-runtimes/weight-quantization.md) — Demonstrates weight compression and quantization to improve inference deployment efficiency and speed.
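As an illustration of the pipeline-parallel partitioning named in the tags above, the following minimal sketch splits a stack of layers into contiguous stages of near-equal size, one stage per GPU. The helper `partition_layers` is hypothetical and written for this page. It is not the repository's or DeepSpeed's partitioner, which may also balance stages by parameter count or layer type.

```python
def partition_layers(num_layers, num_stages):
    """Split layer indices 0..num_layers-1 into contiguous pipeline stages.

    Stages differ in size by at most one layer; earlier stages receive
    the extra layers when the split is uneven.
    """
    if num_stages < 1 or num_stages > num_layers:
        raise ValueError("need 1 <= num_stages <= num_layers")

    base, extra = divmod(num_layers, num_stages)
    stages, start = [], 0
    for stage_id in range(num_stages):
        size = base + (1 if stage_id < extra else 0)
        stages.append(list(range(start, start + size)))
        start += size
    return stages


# Illustrative values only: 10 layers spread over 4 pipeline stages.
for stage_id, layers in enumerate(partition_layers(10, 4)):
    print(f"stage {stage_id}: layers {layers}")
```

Keeping each stage contiguous means activations only flow between neighbouring stages, which is the property pipeline parallelism relies on.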

### Education & Learning Resources

- [Reference Implementations](https://awesome-repositories.com/f/education-learning-resources/educational-resources/reference-and-media/books-docs-reference/code-examples/reference-implementations.md) — Provides functional application examples and codebases that serve as standardized models for implementing distributed AI training and inference.

### Data & Databases

- [Optimizer State Offloading](https://awesome-repositories.com/f/data-databases/memory-optimization-strategies/training-memory-optimizers/optimizer-state-offloading.md) — Implements mechanisms to move optimizer states from GPU memory to system RAM so that larger models fit on the same GPUs.
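A sketch of what optimizer-state offloading can look like in a DeepSpeed configuration, expressed as a Python dictionary. The keys follow the DeepSpeed configuration schema, but every value here (ZeRO stage, batch sizes, the 8-GPU world size) is an illustrative assumption and is not taken from any script in the repository. The helper `effective_batch_size` is hypothetical.

```python
import json


def effective_batch_size(micro_batch_per_gpu, grad_accum_steps, world_size):
    """Global batch size implied by the per-GPU settings."""
    return micro_batch_per_gpu * grad_accum_steps * world_size


WORLD_SIZE = 8  # assumed number of GPUs, for illustration only

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    # DeepSpeed expects this to equal micro batch * accumulation * world size.
    "train_batch_size": effective_batch_size(4, 8, WORLD_SIZE),
    # 16-bit compute; the optimizer keeps 32-bit state.
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        # Keep optimizer states in pinned host memory instead of on the GPU.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}

print(json.dumps(ds_config, indent=2))
```

A dictionary like this would typically be written to a JSON file or passed to the training entry point. Offloading trades GPU memory for host-to-device transfer time, so it tends to suit cases where memory, not throughput, is the limiting factor.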

### Development Tools & Productivity

- [Model Execution Benchmarks](https://awesome-repositories.com/f/development-tools-productivity/performance-optimization-tools/performance-benchmarks/model-execution-benchmarks.md) — Provides tools for benchmarking the computational efficiency and hardware utilization of model execution.
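The latency and throughput measurements these benchmarks report can be sketched with a small timing harness. This is a generic pattern, not the repository's benchmark code: `benchmark` and `dummy_forward` are hypothetical names, and the CPU-bound `dummy_forward` stands in for a real model call.

```python
import statistics
import time


def benchmark(fn, batch_size, warmup=3, iters=20):
    """Time repeated calls to fn and report latency and throughput."""
    for _ in range(warmup):  # discard warm-up runs (caches, lazy init)
        fn()

    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)

    mean_latency = statistics.mean(latencies)
    return {
        "mean_latency_s": mean_latency,
        "p50_latency_s": statistics.median(latencies),
        "throughput_items_per_s": batch_size / mean_latency,
    }


def dummy_forward():
    """Placeholder workload standing in for one batched model forward pass."""
    return sum(i * i for i in range(50_000))


result = benchmark(dummy_forward, batch_size=8)
print(result)
```

When timing GPU work, the device usually has to be synchronized before each timer read (for example with `torch.cuda.synchronize()` in PyTorch), because kernel launches are asynchronous and would otherwise make latencies look smaller than they are.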

### Part of an Awesome List

- [Natural Language Processing](https://awesome-repositories.com/f/awesome-lists/ai/natural-language-processing.md) — Listed in the “Natural Language Processing” section of the FunNLP awesome list.