# zhaochenyang20/awesome-ml-sys-tutorial

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/zhaochenyang20-awesome-ml-sys-tutorial).**

5,371 stars · 350 forks · Python · apache-2.0

## Links

- GitHub: https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial
- awesome-repositories: https://awesome-repositories.com/repository/zhaochenyang20-awesome-ml-sys-tutorial.md

## Description

This project provides a comprehensive technical guide and framework for engineering large-scale machine learning systems. It covers the full lifecycle of model development, focusing on the infrastructure and computational principles required to build, train, and serve generative AI models across distributed GPU clusters.

The repository distinguishes itself with deep-dive tutorials and implementation strategies for complex systems challenges. It emphasizes high-performance architectural primitives such as collective communication orchestration, distributed tensor sharding, and CUDA graph capture of static kernel sequences to eliminate launch overhead. These are complemented by inference and memory optimizations, including speculative decoding, memory-efficient activation offloading, and tree-structured key-value (KV) cache prefix sharing, which together enable efficient model execution and resource management.
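To make tree-structured KV-cache prefix sharing concrete, here is a minimal sketch. It assumes a plain per-token trie in which each node stands in for one cached KV entry. SGLang's actual radix tree compresses edges and adds reference counting and eviction, none of which is modeled here, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Children keyed by the next token ID; each node stands in for one cached KV entry.
    children: dict = field(default_factory=dict)


class PrefixCache:
    """Hypothetical token-level trie illustrating KV-cache prefix sharing."""

    def __init__(self):
        self.root = Node()

    def match_prefix(self, tokens):
        """Return how many leading tokens already have cached KV entries."""
        node, matched = self.root, 0
        for tok in tokens:
            if tok not in node.children:
                break
            node = node.children[tok]
            matched += 1
        return matched

    def insert(self, tokens):
        """Record a finished request's tokens so later requests can reuse them."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, Node())


cache = PrefixCache()
system_prompt = [1, 15, 27, 8]
cache.insert(system_prompt + [100, 101])
# A second request that shares the system prompt only needs prefill for its new suffix.
new_request = system_prompt + [200, 201, 202]
hit = cache.match_prefix(new_request)
print(f"reused {hit} tokens, prefill {len(new_request) - hit}")  # reused 4 tokens, prefill 3
```

Because matching always walks from the root, every request that starts with the same system prompt lands on the same path, so only the divergent suffix needs fresh prefill computation.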

Beyond core training and inference, the project covers agentic workflows and multimodal architectures. This includes automated reinforcement learning pipelines, grammar-based constrained decoding for structured output, and request scheduling that prioritizes incoming requests to optimize batch composition and throughput. The project also provides extensive tooling for system observability, performance profiling, and hardware-aware resource allocation to keep production deployments stable and efficient.
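Grammar-based constrained decoding typically works by masking out, before sampling, the logits of every token the grammar forbids in the current state. The sketch below assumes a six-token toy vocabulary and a hand-written rule (one or more digits, then `<eos>`) in place of a compiled JSON schema or grammar, and a fixed list of logits in place of a model; all names are hypothetical. It computes each mask synchronously, whereas the zero-overhead scheduler tutorial describes pre-calculating masks so they do not block GPU sampling.

```python
import math

# Toy vocabulary (hypothetical): token ID -> string.
VOCAB = ["0", "1", "2", "a", "b", "<eos>"]
DIGITS = {i for i, t in enumerate(VOCAB) if t.isdigit()}
EOS = VOCAB.index("<eos>")


def allowed_tokens(generated):
    """Toy grammar: one or more digits, then <eos>."""
    if not generated:
        return DIGITS          # output must start with a digit
    return DIGITS | {EOS}      # after a digit: another digit or stop


def apply_mask(logits, allowed):
    """Set logits of disallowed tokens to -inf so they can never be chosen."""
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]


def constrained_greedy(step_logits):
    """Greedy decoding over a fixed sequence of per-step logits (stand-in for a model)."""
    out = []
    for logits in step_logits:
        masked = apply_mask(logits, allowed_tokens(out))
        tok = max(range(len(masked)), key=masked.__getitem__)
        out.append(tok)
        if tok == EOS:
            break
    return "".join(VOCAB[t] for t in out if t != EOS)


# Unconstrained argmax would pick "a" at step 0 and "b" at step 1; the mask blocks both.
fake_logits = [
    [0.1, 0.5, 0.2, 3.0, 0.0, 2.0],
    [0.3, 0.1, 0.9, 0.0, 4.0, 0.2],
    [0.0, 0.1, 0.0, 0.0, 0.0, 5.0],
]
print(constrained_greedy(fake_logits))  # 12
```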

## Tags

### Repository Format

- [Awesome List](https://awesome-repositories.com/f/repository-format/awesome-list.md) — A community-curated directory that catalogs and links out to other open-source projects, rather than a standalone tool you run yourself.

### Artificial Intelligence & ML

- [Distributed Training Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training-frameworks.md) — Serves as a comprehensive distributed training framework for orchestrating multi-node model training and collective communication.
- [Distributed Training](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training-frameworks/distributed-training.md) — Orchestrates large-scale model training across multiple GPU devices using data and model parallelism. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/fsdp/release_log/setup_fsdp.md))
- [Distributed Training Orchestration](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training-orchestration.md) — Coordinates training and inference actors across GPU clusters using resource-aware placement groups. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/code-walk-through/readme_en.md))
- [Distributed Training Sharding](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training-sharding.md) — Distributes model parameters and optimizer states across multiple GPU devices to enable large-scale model training. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/sys-design/readme-2.md))
- [Machine Learning Systems](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning-systems.md) — Provides a comprehensive framework for engineering and scaling distributed machine learning systems across GPU clusters. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial#readme))
- [Large Language Model Configurations](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/inference-optimization-and-tuning/large-language-model-configurations.md) — Configures and executes high-performance serving for large language models with support for tensor parallelism. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/qwen/coder.md))
- [Large Language Model Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/serving-and-runtime/large-language-model-optimization.md) — Provides comprehensive infrastructure and optimization techniques for accelerating large language model inference in production. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/tree/main/sglang))
- [Fully Sharded Data Parallelism](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/distributed-and-scaling-strategies/distributed-learning/fully-sharded-data-parallelism.md) — Implements memory-efficient training by partitioning model parameters, gradients, and optimizer states across data-parallel processes. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/fsdp/readme.md))
- [Model Serving](https://awesome-repositories.com/f/artificial-intelligence-ml/model-serving.md) — Provides a comprehensive inference engine for deploying and serving large language models via web endpoints. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/nvidia-dynamo/dynamo.md))
- [Reinforcement Learning](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning.md) — Provides technical guides on training workflows, rollout engine design, and memory-efficient weight update mechanisms for large-scale models. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial#readme))
- [Generative Inference Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/text-generation-apis/generative-inference-engines.md) — Acts as a high-performance generative AI inference engine with support for speculative decoding and hardware-aware execution. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/diffusion-llm/readme.md))
- [Agentic Training Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-training-frameworks.md) — Supports training models on complex multi-turn dialogues and multi-step tool-use workflows. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/fast_tokenization/multiturn_tokenization_and_masking_ZH.md))
- [Distributed Model Checkpointing](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-model-checkpointing.md) — Writes model states to disk in a sharded format, where each worker saves only its local portion, to improve checkpointing efficiency. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/sys-design/readme-2.md))
- [Training Loop Managers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/pipelines-and-orchestration/training-orchestration-systems/training-loop-managers.md) — Orchestrates the collection of model experiences and performs multi-epoch parameter updates for actor and critic models. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/OpenRLHF/readme.md))
- [Model Inference APIs](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/inference-servers-and-runtimes/model-inference-apis.md) — Generates content from models using web APIs or offline engines with support for streaming and tensor parallelism. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/diffusion-llm/readme-en.md))
- [Model Inference and Serving](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving.md) — Exposes model inference capabilities via web endpoints and direct engine execution for production use. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/code-walk-through/readme.md))
- [Multi-GPU Distribution](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization/inference-deployment/model-deployment-toolkits/distributed-deployment-utilities/multi-gpu-distribution.md) — Manages high-speed data exchange between GPUs to synchronize gradients and model parameters across distributed clusters. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/tree/main/torch))
- [Model Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization/quantization/model-quantization.md) — Defines hardware-specific quantization parameters and activation data types to optimize model memory usage and inference performance. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/quantization/quantization_architecture_en.md))
- [Multimodal Models](https://awesome-repositories.com/f/artificial-intelligence-ml/multimodal-models.md) — Provides technical infrastructure for aligning and processing interleaved multimodal inputs within reasoning engines.
- [Prefix Caching](https://awesome-repositories.com/f/artificial-intelligence-ml/prompt-caching/prefix-caching.md) — Organizes KV cache in a tree structure to share common prefixes across requests and reduce redundant computation. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/kvcache-code-walk-through/readme.md))
- [Reinforcement Learning Training Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning-training-pipelines.md) — Orchestrates the full reinforcement learning training cycle including rollout generation and policy updates. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme-2-EN.md))
- [Agentic Execution Loops](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-reasoning-loops/critic-agent-loops/agentic-execution-loops.md) — Executes conversational workflows that support tool calling and state management over multiple turns. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme-6.md))
- [Multi-turn Interaction Managers](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/conversational-voice-interaction/conversational-ai-agents/conversational-turn-detection/multi-turn-interaction-managers.md) — Manages stateful multi-turn dialogues and tool-calling workflows by injecting observations into the model context. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/vlm-multi-turn/readme.md))
- [Compute Graph Captures](https://awesome-repositories.com/f/artificial-intelligence-ml/compute-graph-builders/compute-graph-captures.md) — Captures GPU operation sequences into static graphs to eliminate kernel launch overhead. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/torch/cuda-graph/readme-2.md))
- [Distributed Training Scaling Utilities](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training-scaling-utilities.md) — Documents strategies for parallelizing model training, including tensor parallelism and communication optimization via high-speed backends. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial#readme))
- [Expert Parallelism Configurations](https://awesome-repositories.com/f/artificial-intelligence-ml/expert-parallelism-configurations.md) — Uses all-to-all communication patterns to dispatch tokens to relevant experts and aggregate results across distributed hardware. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/sys-design/readme-4-en.md))
- [Chunked Prefill Mechanisms](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/diffusion-visual-models/generation-utilities/chunked-prefill-mechanisms.md) — Implements chunked prefill mechanisms to process large input sequences in segments, maintaining attention masking across fragmented computation steps. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/light-duoattention/light-duoattention.md))
- [Distributed Gradient Synchronization](https://awesome-repositories.com/f/artificial-intelligence-ml/gradient-computation/distributed-gradient-synchronization.md) — Coordinates gradient updates across multiple compute nodes to ensure consistent model training in distributed environments. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/sys-design/readme-2.md))
- [Inference Backends](https://awesome-repositories.com/f/artificial-intelligence-ml/inference-backends.md) — Provides hardware-agnostic layers for executing models across diverse computing environments. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/OpenRLHF/develop-log.md))
- [Inference Pipeline Orchestrators](https://awesome-repositories.com/f/artificial-intelligence-ml/inference-pipeline-orchestrators.md) — Provides a unified framework for chaining sequential inference tasks like audio encoding, reasoning, and synthesis. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/transformers/omni/readme-en.md))
- [Large-Scale Model Training](https://awesome-repositories.com/f/artificial-intelligence-ml/large-scale-model-training.md) — Splits large neural network layers across multiple devices to enable training of models exceeding single-processor memory. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/tree/main/torch))
- [Long Context Training Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/long-context-training-optimizations.md) — Segments input sequences across GPUs to enable memory-efficient training on long-context tasks. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/sys-design/readme-3.md))
- [Policy Loss Calculators](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/architectures/neural-network-components/loss-functions/perceptual-loss/content-loss-calculators/focal-loss-calculators/detection-loss-calculators/policy-loss-calculators.md) — Computes loss for policy updates using clipped probability ratios to constrain model changes and ensure stable training updates. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/tree/main/rlhf/OpenRLHF))
- [Mixed Precision Training](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/distributed-and-accelerated-compute/training-acceleration-tools/mixed-precision-training.md) — Implements low-precision training techniques to reduce memory footprint and accelerate reinforcement learning workflows. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/fp8/readme_en.md))
- [Reward Functions](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/objectives-and-optimization/mathematical-training-objectives/reward-functions.md) — Integrates custom reward functions for evaluating model outputs with support for asynchronous execution. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme-5-EN.md))
- [Training Lifecycle Management](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/pipelines-and-orchestration/training-orchestration-systems/training-lifecycle-management.md) — Orchestrates the training process by handling checkpointing, logging, and state resumption. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme-5-EN.md))
- [Model Rollout Executions](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/pipelines-and-orchestration/training-orchestration-systems/training-methodologies/reinforcement-learning-integrations/model-rollout-executions.md) — Orchestrates asynchronous model generation cycles and manages state transitions between text generation and tool execution. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme-2.md))
- [Asynchronous Rollout Decoupling](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/pipelines-and-orchestration/training-orchestration-systems/training-methodologies/reinforcement-learning-integrations/model-rollout-executions/asynchronous-rollout-decoupling.md) — Separates model generation from gradient updates to enable asynchronous training and flexible deployment. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/code-walk-through/readme.md))
- [Request Schedulers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/inference-engines/request-schedulers.md) — Prioritizes incoming inference requests to optimize batch composition and system throughput. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/scheduler/readme.md))
- [Speculative Decoding Strategies](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/inference-optimization/inference-acceleration-techniques/speculative-decoding-strategies.md) — Uses a lightweight draft model to propose several tokens that the larger target model verifies in a single parallel forward pass, improving sampling throughput. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/spec/readme-en.md))
- [ML Performance Profilers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/training-algorithms/machine-learning-optimization/ml-performance-profilers.md) — Provides profiling and optimization tools to analyze hardware topology and memory patterns for high-performance machine learning.
- [Pretrained Sequence Model Loaders](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/data-and-checkpointing/model-loading/pretrained-sequence-model-loaders.md) — Provides utilities for downloading and initializing pretrained model weights for inference and fine-tuning. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/how-model-is-loaded/readme.md))
- [Asynchronous Training](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/distributed-and-scaling-strategies/asynchronous-training-utilities/asynchronous-training.md) — Decouples rollout generation from gradient updates to enable simultaneous processing and increase training throughput. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/code-walk-through/readme_en.md))
- [Mixture of Experts](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-customization/mixture-of-experts.md) — Partitions model experts across multiple devices to enable training and inference of massive models that exceed single-GPU capacity. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/sys-design/readme-4.md))
- [Training Performance Profiling](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/training-monitoring-and-profiling/training-performance-profiling.md) — Records granular timestamps across distributed workers to analyze throughput and resource efficiency during model training. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/tool_examples/profile_en.md))
- [Model Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/model-quantization.md) — Reduces memory footprint and accelerates inference by applying precision schemes like FP8 or INT8 to model weights. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/quantization/quantization_architecture.md))
- [Dynamic Weight Updates](https://awesome-repositories.com/f/artificial-intelligence-ml/model-weight-management/dynamic-weight-updates.md) — Injects new weight tensors into a running distributed engine to update model parameters dynamically without requiring a full restart. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/sglang-verl-engine/readme.md))
- [Weight Offloading](https://awesome-repositories.com/f/artificial-intelligence-ml/model-weight-management/weight-offloading.md) — Moves inactive model weights and optimizer states to CPU memory to free GPU capacity for larger batch sizes. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/fsdp/readme.md))
- [Multi-GPU Training Utilities](https://awesome-repositories.com/f/artificial-intelligence-ml/multi-gpu-training-utilities.md) — Performs collective operations across multiple GPUs to enable distributed training and data exchange. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/torch/nccl/readme.md))
- [Policy Gradient Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/policy-gradient-optimizers.md) — Updates actor and critic networks based on calculated advantages and performance metrics to improve decision-making. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme-3.md))
- [Precision Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/precision-quantization.md) — Converts model weights and activations to FP8 formats to balance numerical precision with hardware efficiency. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/fp8/readme_en.md))
- [Value Loss Calculators](https://awesome-repositories.com/f/artificial-intelligence-ml/prediction-visualization/loss-function-calculators/value-loss-calculators.md) — Minimizes the difference between predicted token values and actual returns using mean squared error to refine value model accuracy. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/tree/main/rlhf/OpenRLHF))
- [Policy Clipping](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning-optimizers/policy-clipping.md) — Updates the actor model using a clipped objective function to prevent large policy shifts during reinforcement learning. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/OpenRLHF/readme.md))
- [Reinforcement Learning Training Utilities](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning-training-utilities.md) — Updates draft model parameters during reinforcement learning by calculating cross-entropy loss against the target model's hidden states. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/spec/readme-en.md))
- [Reinforcement Learning Value Estimators](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning-value-estimators.md) — Implements generalized advantage estimation methods to determine the relative benefit of actions during reinforcement learning. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/OpenRLHF/readme.md))
- [Structured Output Parsers](https://awesome-repositories.com/f/artificial-intelligence-ml/structured-output-parsers.md) — Enforces schemas on model-generated content to ensure reliable data integration and structural compliance. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/constraint-decoding/readme.md))
- [Grammar-Constrained Samplers](https://awesome-repositories.com/f/artificial-intelligence-ml/text-generation-strategies/token-prediction/grammar-constrained-samplers.md) — Pre-calculates vocabulary masks to enforce grammar constraints without blocking GPU sampling pipelines. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/zero-overhead-scheduler/zero-overhead-batch-scheduler.md))
- [Activation Recomputation Strategies](https://awesome-repositories.com/f/artificial-intelligence-ml/activation-recomputation-strategies.md) — Minimizes peak memory usage by recomputing intermediate activations during the backward pass. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/sys-design/readme-2.md))
- [Critic Loss Minimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/actor-critic-architectures/critic-loss-minimizers.md) — Trains the critic model by minimizing the mean squared error between predicted value estimates and actual returns. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/OpenRLHF/readme.md))
- [Custom Tool Definitions](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/integration-deployment/agent-frameworks/tool-definitions-and-registration/custom-tool-definitions.md) — Provides frameworks for defining custom tools by extending base classes with specific action and observation schemas. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/release_log/verl-multiturn-rollout-Release_ZH.md))
- [Checkpoint Management](https://awesome-repositories.com/f/artificial-intelligence-ml/checkpoint-management.md) — Supports sharded checkpointing to persist model states across multiple ranks without global aggregation. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/sys-design/readme-2-en.md))
- [Sequence Packing](https://awesome-repositories.com/f/artificial-intelligence-ml/convolutional-operations/input-padding-utilities/padding-maskers/sequence-padding-utilities/sequence-packing.md) — Concatenates sequences of varying lengths into a single contiguous buffer to eliminate the compute wasted on padding. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/fsdp/readme_en.md))
- [Reward Modeling](https://awesome-repositories.com/f/artificial-intelligence-ml/custom-model-training/reward-modeling.md) — Calculates token-level rewards by combining a per-token KL-divergence penalty between the actor and reference models with the reward model's score on the final token. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/tree/main/rlhf/OpenRLHF))
- [Custom State Tracking](https://awesome-repositories.com/f/artificial-intelligence-ml/custom-state-tracking.md) — Maintains request lifecycles and state machines for managing tool calls and conversation history in agentic workflows. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme-2-EN.md))
- [Data Preprocessing Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/data-preprocessing-pipelines.md) — Converts raw datasets into structured formats by generating prompts and parsing ground truth answers for model training. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme_EN.md))
- [Diffusion Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/diffusion-pipelines.md) — Executes iterative noise-refinement image generation using optimized pipelines for diffusion models. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/code-walk-through/sgl_diffusion.md))
- [External Tool Execution](https://awesome-repositories.com/f/artificial-intelligence-ml/external-tool-execution.md) — Enables asynchronous multi-turn dialogues where models independently invoke external tools within a unified training workflow. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/release_log/verl-multiturn-rollout-Release_ZH.md))
- [Output Constraint Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/decoding-generation-controls/output-constraint-engines.md) — Enforces structured output formats like JSON or specific grammars during model inference. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/tree/main/sglang))
- [Generation Accelerators](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-models/generation-accelerators.md) — Increases inference throughput by having a smaller draft model propose tokens that a larger model then verifies in parallel. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/speculative-decoding/speculative-decoding.md))
- [GPU Kernel Implementations](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-kernel-implementations.md) — Records GPU operation sequences into static graphs to eliminate kernel launch overhead during repetitive execution. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/torch/cuda-graph/readme-2-en.md))
- [Inference Scaling](https://awesome-repositories.com/f/artificial-intelligence-ml/inference-scaling.md) — Distributes model computation across multiple physical machines to handle large-scale inference tasks. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/nvidia-dynamo/dynamo.md))
- [Knowledge Retrieval Systems](https://awesome-repositories.com/f/artificial-intelligence-ml/knowledge-retrieval-systems.md) — Implements dense retrieval engines to provide external knowledge context for models during training and inference. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/tool_examples/verl-multiturn-searchR1-like_ZH.md))
- [Sequence Importance Sampling](https://awesome-repositories.com/f/artificial-intelligence-ml/large-language-models/reinforcement-learning-alignment/sequence-importance-sampling.md) — Corrects for distribution shifts between rollout and training policies by weighting updates based on action probability ratios. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/mismatch/blog-cn.md))
- [Inference Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/long-context-training-optimizations/inference-optimizations.md) — Optimizes memory and compute efficiency for long-sequence inference using specialized attention kernels. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/light-duoattention/light-duoattention.md))
- [Inference Optimization Techniques](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/inference-optimization-and-tuning/inference-optimization-techniques.md) — Implements speculative decoding, chunked prefill, and zero-overhead scheduling to improve generative model throughput and latency. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial#readme))
- [Model Training and Inference Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-management/model-training-and-inference-engines.md) — Ensures bitwise-identical operator execution between training and rollout phases using deterministic kernels to eliminate numerical divergence. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/fp8/readme.md))
- [Numerical Consistency Verifiers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-management/model-training-and-inference-engines/numerical-consistency-verifiers.md) — Aligns log-probability calculations between training and inference backends to eliminate numerical drift. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/fsdp/readme_en.md))
- [Large Model Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/serving-and-runtime/inference-optimizations/large-model-optimizations.md) — Constructs large models using deferred allocation to bypass GPU memory limits during initialization. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/sys-design/readme-2-en.md))
- [Model Compression Suites](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization/compression-techniques/model-pruning/model-compression-suites.md) — Converts high-precision model weights into packed formats to reduce memory footprint and enable single-node execution of large-scale models. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/int4/readme-en.md))
- [Attention Kernel Fusion](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization/inference-deployment/attention-backends/attention-kernel-fusion.md) — Optimizes attention computation by fusing projection and attention operations into specialized kernels. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/light-duoattention/light-duoattention.md))
- [Performance Profilers](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization/profiling-and-benchmarking/performance-profilers.md) — Provides interactive tools to analyze execution bottlenecks, memory usage, and latency during the inference lifecycle. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/tree/main/sglang))
- [Multimodal Agent Capabilities](https://awesome-repositories.com/f/artificial-intelligence-ml/multimodal-agent-capabilities.md) — Enables agents to process and generate non-textual data like images, audio, and video. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/transformers/omni/readme-en.md))
- [Multimodal Integration Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/multimodal-integration-frameworks.md) — Integrates diverse data modalities like text, audio, and visual inputs into unified reasoning pipelines.
- [Multimodal Token Interleaving](https://awesome-repositories.com/f/artificial-intelligence-ml/multimodal-models/multimodal-token-interleaving.md) — Processes interleaved multimodal tokens to generate coherent responses while maintaining semantic context. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/transformers/omni/readme.md))
- [Mixed Granularity Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/precision-quantization/mixed-granularity-quantization.md) — Balances numerical precision and hardware efficiency by applying per-token and per-block quantization. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/fp8/readme.md))
- [Quantized Inference Runtimes](https://awesome-repositories.com/f/artificial-intelligence-ml/quantized-inference-runtimes.md) — Executes quantized models using hardware-specific acceleration for improved inference performance. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/quantization/quantization_architecture_en.md))
- [Reinforcement Learning Data Filters](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning-data-filters.md) — Selects high-quality training data based on performance metrics to optimize model learning efficiency during reinforcement learning. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/dapo.md))
- [Reinforcement Learning Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning-optimizers.md) — Applies fine-grained loss masking and reward propagation strategies to ensure accurate gradient updates across complex interaction sequences. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/release_log/verl-multiturn-rollout-Release_ZH.md))
- [Sequence Parallelism Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/sequence-parallelism-frameworks.md) — Slices input data across sequence dimensions to support longer response lengths than a single device can process. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/fsdp/readme_en.md))
- [Tool-Augmented Reasoning Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/tool-augmented-reasoning-engines.md) — Connects large language models to external data and function execution environments for search-augmented reasoning. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/tool_examples/verl-multiturn-searchR1-like_ZH.md))
- [Training Configurations](https://awesome-repositories.com/f/artificial-intelligence-ml/training-configurations.md) — Loads model-specific configuration parameters into the environment to prepare the system for distributed training tasks. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/fsdp/release_log/setup_fsdp.md))
- [Batch Size Scaling](https://awesome-repositories.com/f/artificial-intelligence-ml/training-convergence-optimization/batch-size-scaling.md) — Manages training throughput and memory usage through dynamic token-based batching. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme-5-EN.md))
- [Asynchronous Prefetchers](https://awesome-repositories.com/f/artificial-intelligence-ml/training-data-prefetchers/asynchronous-prefetchers.md) — Overlaps communication and computation by proactively fetching model layers during forward and backward passes. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/sys-design/readme-4-en.md))
- [Agent Lifecycle Management](https://awesome-repositories.com/f/artificial-intelligence-ml/agent-lifecycle-management.md) — Orchestrates the execution of agent loops and manages the lifecycle of worker processes and language model servers. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme-6.md))
- [Attention Mechanisms](https://awesome-repositories.com/f/artificial-intelligence-ml/attention-mechanisms.md) — Computes relevance scores between sequences using dot products and softmax normalization, which determine how strongly each input position is weighted. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/transformers/attention/cross_attention_en.md))
- [Streaming Attention Kernels](https://awesome-repositories.com/f/artificial-intelligence-ml/attention-mechanisms/streaming-attention-kernels.md) — Applies attention masks to retain sink and window tokens for efficient long-sequence processing. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/light-duoattention/light-duoattention.md))
- [Audio Tokenization](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-tokenization.md) — Converts high-frequency continuous audio waveforms into low-frequency discrete token sequences using vector quantization to enable efficient processing. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/transformers/omni/readme-en.md))
- [Batch Inference Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/batch-inference-engines.md) — Executes inference on large collections of input prompts and saves generated responses to structured output files. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/nvidia-dynamo/dynamo.md))
- [Curriculum Learning Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/curriculum-learning-frameworks.md) — Trains models on tasks of increasing difficulty to improve training efficiency and convergence speed. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/partial-rollout/readme.md))
- [Custom Decoding Strategies](https://awesome-repositories.com/f/artificial-intelligence-ml/decoder-architectures/custom-decoding-strategies.md) — Customizes diffusion decoding behavior through external configuration files to decouple algorithm parameters from core inference logic. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/diffusion-llm/readme.md))
- [Distributed Layer Synchronizers](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-acceleration-layers/distributed-layer-synchronizers.md) — Coordinates data movement between independent attention workers and shared MLP layers to maintain state consistency across parallel processing units. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/dp-attention/readme.md))
- [Distributed Inference Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-inference-frameworks.md) — Integrates high-performance inference engines to support distributed, multi-turn tool usage and efficient sequence generation. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme_EN.md))
- [Dynamic Parameter Initialization](https://awesome-repositories.com/f/artificial-intelligence-ml/dynamic-parameter-initialization.md) — Defers model parameter creation to CPU or virtual devices to optimize memory usage during startup. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/sys-design/readme-2.md))
- [Encoder-Decoder Architectures](https://awesome-repositories.com/f/artificial-intelligence-ml/encoder-decoder-architectures.md) — Connects data sequences by matching decoder queries against encoder keys for generation tasks. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/transformers/attention/cross_attention_en.md))
- [Feature Alignment](https://awesome-repositories.com/f/artificial-intelligence-ml/feature-alignment.md) — Aligns visual embeddings with language model sequences to maintain spatial context across data types. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/code-walk-through/multimodal_request_lifecycle.md))
- [Sequence Completion Sampling](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/decoding-generation-controls/ai-completion-services/ai-completion-sampling/sequence-completion-sampling.md) — Adjusts model generation behavior during training using sampling parameters like temperature and top-p. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme-5-EN.md))
- [Cross-Attention Mechanisms](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/diffusion-visual-models/generative-ai-architectures/cross-attention-mechanisms.md) — Integrates multi-modal conditioning signals into neural network layers using cross-attention mechanisms. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/transformers/attention/cross_attention.md))
- [Diffusion Models](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/diffusion-visual-models/generative-ai-models/diffusion-models.md) — Deploys diffusion-based models via a unified server interface supporting custom decoding algorithms. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/diffusion-llm/readme-en.md))
- [Reinforcement Learning Sampling](https://awesome-repositories.com/f/artificial-intelligence-ml/gradient-computation/gradient-based-sampling/reinforcement-learning-sampling.md) — Allocates training samples dynamically to focus learning on model weaknesses and improve training efficiency. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/partial-rollout/readme.md))
- [Dynamic Sampling Filters](https://awesome-repositories.com/f/artificial-intelligence-ml/gradient-computation/gradient-based-sampling/reinforcement-learning-sampling/dynamic-sampling-filters.md) — Applies dynamic filters to training samples to optimize learning efficiency and reward diversity. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/code-walk-through/readme_en.md))
- [Consistency Enforcement Kernels](https://awesome-repositories.com/f/artificial-intelligence-ml/inference-optimization-kernels/consistency-enforcement-kernels.md) — Ensures bitwise identical log-probability calculations by standardizing kernels and disabling non-deterministic optimizations. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/fsdp/readme.md))
- [Model Compilers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/frameworks/training-systems/model-performance-optimizations/model-compilers.md) — Transforms high-level neural network definitions into optimized, hardware-specific executable code for dynamic workloads. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/torch/cuda-graph/readme_en.md))
- [Training Configuration Management](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/training-configuration-management.md) — Provides configurable templates to preserve reasoning traces and optimize context usage for agentic models. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/fast_tokenization/multiturn_tokenization_and_masking.md))
- [Model Inference Accelerators](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/inference-servers-and-runtimes/model-inference-accelerators.md) — Boosts token generation speeds during reinforcement learning and inference by integrating speculative decoding and chunked parallel computation. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/README.md))
- [Inference Adapters](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/model-integration-pipelines/model-inference/inference-adapters.md) — Separates model training from inference engines to prevent resource contention during high-demand operations. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/server-based/veRL-server-based-rollout.md))
- [Data Preprocessing](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/data-and-checkpointing/data-preprocessing.md) — Provides utilities for loading, tokenizing, and preprocessing diverse datasets for reinforcement learning and multimodal training. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme-2-EN.md))
- [Expert Load Balancers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-customization/mixture-of-experts/expert-selection-analysis/expert-load-balancers.md) — Mitigates computational bottlenecks by managing expert routing to prevent load imbalances. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/sys-design/readme-4-en.md))
- [Memory Optimization Techniques](https://awesome-repositories.com/f/artificial-intelligence-ml/memory-optimization-techniques.md) — Offloads model parameters and optimizer states to CPU memory to enable larger batch sizes and model capacities. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/sys-design/readme-2.md))
- [Mixed Precision Training Utilities](https://awesome-repositories.com/f/artificial-intelligence-ml/mixed-precision-training-utilities.md) — Simulates low-precision arithmetic during training to ensure model convergence with reduced precision. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/int4/readme-en.md))
- [Attention Backends](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization/inference-deployment/attention-backends.md) — Performs token generation using optimized attention backends to speed up computation and improve memory reuse. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/code-walk-through/readme.md))
- [Training Backend Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization/training-efficiency/training-backend-optimizers.md) — Maintains bit-level consistency between rollout and training operations using batch-invariant operators and deterministic kernels. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/mismatch/blog-cn.md))
- [Model Parameter Management](https://awesome-repositories.com/f/artificial-intelligence-ml/model-parameter-management.md) — Serializes model weights into dictionary structures to facilitate saving, loading, and memory analysis. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/how-model-is-loaded/readme.md))
- [Model Training](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training.md) — Mitigates training collapse by using truncated importance sampling to re-weight policy gradient losses during model training. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/fsdp/readme_en.md))
- [Model Training Interfaces](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training-interfaces.md) — Standardizes APIs for model state management, output processing, and observation formatting in training loops. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/vlm-multi-turn/readme-en.md))
- [Model Weight Converters](https://awesome-repositories.com/f/artificial-intelligence-ml/model-weight-converters.md) — Exports high-precision model weights into compact formats compatible with optimized inference engines to reduce memory footprint and accelerate execution. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/int4/readme.md))
- [Multimodal Input Processors](https://awesome-repositories.com/f/artificial-intelligence-ml/multimodal-input-processors.md) — Processes diverse multimodal inputs including images and text into token sequences for model consumption. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/code-walk-through/multimodal_request_lifecycle.md))
- [Text-to-Speech](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech.md) — Converts semantic text and acoustic hidden states into audible waveforms using autoregressive token generation and vocoder decoding. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/transformers/omni/readme-en.md))
- [Training Memory Management](https://awesome-repositories.com/f/artificial-intelligence-ml/training-memory-management.md) — Reduces GPU memory consumption by offloading weights and optimizer states to CPU during training and rollout. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/fsdp/readme_en.md))
- [Agent Configuration Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/integration-deployment/agent-configuration-tools.md) — Configures tool interactions via schema definitions and manages response truncation for agentic tasks. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme-6.md))
- [Parallel Tool Execution](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/integration-deployment/agent-frameworks/tool-use-and-execution/parallel-tool-execution.md) — Coordinates tool calls within parallel training environments by restricting execution to primary ranks. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/release_log/verl-multiturn-rollout-Release_ZH.md))
- [Chat Template Configurations](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/decoding-generation-controls/chat-template-management/chat-template-formatters/chat-template-configurations.md) — Selects between training and inference templates to balance consistency with training data against optimized context formatting. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/fast_tokenization/multiturn_tokenization_and_masking_ZH.md))
- [Gradient Computation](https://awesome-repositories.com/f/artificial-intelligence-ml/gradient-computation.md) — Prevents auxiliary training losses from updating shared model parameters by detaching gradient flows. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/spec/readme-en.md))
- [Asynchronous Kernel Launchers](https://awesome-repositories.com/f/artificial-intelligence-ml/kernel-schedulers/asynchronous-kernel-launchers.md) — Decouples task scheduling from kernel execution to allow the scheduler to proceed without waiting for hardware. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/zero-overhead-scheduler/zero-overhead-batch-scheduler.md))
- [Partial Rollout Runners](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/pipelines-and-orchestration/training-orchestration-systems/training-methodologies/reinforcement-learning-integrations/model-rollout-executions/partial-rollout-runners.md) — Trains on completed samples as they arrive to prevent long-tail blocking during reinforcement learning. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/partial-rollout/readme.md))
- [Training and Evaluation Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/training-frameworks/training-and-evaluation-pipelines.md) — Provides automated workflows for filtering and validating training prompts to ensure data quality and prevent reward hacking. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/partial-rollout/readme.md))
- [Lazy Scheduling Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/model-operation-schedulers/lazy-scheduling-pipelines.md) — Pipelines inference tasks using lazy resolution to minimize cross-device synchronization latency. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/scheduler/readme.md))
- [Quantization Plugin Interfaces](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization/quantization/quantization-plugin-interfaces.md) — Allows adding custom quantization methods by implementing configuration and weight processing classes without modifying core framework logic. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/quantization/quantization_architecture_en.md))
- [Multimodal Training](https://awesome-repositories.com/f/artificial-intelligence-ml/multimodal-training.md) — Buffers fragmented multimodal tensors across turns and concatenates them in a single operation to minimize memory allocation overhead. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/vlm-multi-turn/readme-en.md))
- [Sequence Alignment Models](https://awesome-repositories.com/f/artificial-intelligence-ml/sequence-alignment-models.md) — Implements attention-based architectures for dynamic alignment in sequence-to-sequence tasks. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/transformers/attention/cross_attention.md))
- [Attention Parallelism Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/tensor-parallelism/attention-parallelism-optimizers.md) — Reduces KV cache memory usage in models with few KV heads by using data parallelism for attention layers. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/dp-attention/readme.md))
- [Length Penalization](https://awesome-repositories.com/f/artificial-intelligence-ml/text-sequence-processing/sequence-length-constraints/length-penalization.md) — Adjusts reward signals based on sequence length to discourage overly verbose reasoning paths. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/partial-rollout/readme.md))
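Several entries above (Sequence Importance Sampling, Model Training) describe re-weighting policy-gradient losses by the ratio between training-side and rollout-side action probabilities, truncated so that large ratios cannot blow up the update. The following is a minimal plain-Python sketch of that idea, assuming per-token log-probabilities are already available from both backends. The function names and the `cap` default are hypothetical and illustrative, not the repository's API.

```python
import math
from typing import List


def truncated_is_weights(train_logprobs: List[float],
                         rollout_logprobs: List[float],
                         cap: float = 2.0) -> List[float]:
    """Per-token importance weights pi_train / pi_rollout, clipped from above at `cap`.

    `cap` is an illustrative hyperparameter, not a value taken from the source.
    """
    if len(train_logprobs) != len(rollout_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    weights = []
    for lp_train, lp_rollout in zip(train_logprobs, rollout_logprobs):
        ratio = math.exp(lp_train - lp_rollout)  # probability ratio recovered from log-probs
        weights.append(min(ratio, cap))          # truncate only the large ratios
    return weights


def weighted_pg_loss(advantages: List[float],
                     train_logprobs: List[float],
                     rollout_logprobs: List[float],
                     cap: float = 2.0) -> float:
    """REINFORCE-style loss: -mean(w_t * A_t * log pi_train(a_t)).

    In an autograd framework the weights would be detached so that
    gradients flow only through the log pi_train term.
    """
    w = truncated_is_weights(train_logprobs, rollout_logprobs, cap)
    terms = [wi * a * lp for wi, a, lp in zip(w, advantages, train_logprobs)]
    return -sum(terms) / len(terms)
```

For example, a token whose training log-prob exceeds its rollout log-prob by 1.5 has a raw ratio of about 4.48, which the sketch clips to 2.0; a token on which both backends agree keeps a weight of exactly 1.0.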

### DevOps & Infrastructure

- [Training Orchestrators](https://awesome-repositories.com/f/devops-infrastructure/worker-node-management/distributed-orchestration/training-orchestrators.md) — Coordinates distributed reinforcement learning training cycles and policy updates across GPU clusters. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme-2.md))
- [Distributed Task Orchestration](https://awesome-repositories.com/f/devops-infrastructure/distributed-task-orchestration.md) — Utilizes remote task and actor primitives to scale training workloads across multi-node clusters. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme.md))
- [Distributed Inference Clusters](https://awesome-repositories.com/f/devops-infrastructure/multi-gpu-deployment/distributed-inference-clusters.md) — Enables distributed operation by allowing inference servers to communicate across multiple network nodes. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/server-based/veRL-server-based-rollout.md))
- [Resource Allocation](https://awesome-repositories.com/f/devops-infrastructure/resource-allocation.md) — Manages hardware resource distribution and process placement across distributed training and inference clusters. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/areal/code-walk-through_CN.md))
- [AI Model Load Balancers](https://awesome-repositories.com/f/devops-infrastructure/traffic-load-balancers/ai-model-load-balancers.md) — Distributes incoming requests across multiple language model servers using load balancing algorithms to optimize performance and enable prefix caching. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme-6.md))
- [Compute Throughput Optimizers](https://awesome-repositories.com/f/devops-infrastructure/performance-optimization-utilities/compute-throughput-optimizers.md) — Maximizes hardware efficiency through tensor parallelism, activation offloading, and custom fused kernels. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/fsdp/release_log/setup_fsdp.md))
- [Device Mesh Topologies](https://awesome-repositories.com/f/devops-infrastructure/device-mesh-topologies.md) — Organizes compute resources into logical meshes to coordinate parallel execution and tensor sharding. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/sglang-verl-engine/readme.md))
- [Multi-GPU Load Balancing](https://awesome-repositories.com/f/devops-infrastructure/multi-gpu-load-balancing.md) — Equalizes token counts across GPU ranks to prevent performance bottlenecks caused by uneven sequence lengths. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/areal/code-walk-through_CN.md))
- [Inference Load Balancers](https://awesome-repositories.com/f/devops-infrastructure/traffic-load-balancers/inference-load-balancers.md) — Redistributes generated sequences across GPU ranks using bin-packing algorithms to ensure consistent computational load. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/areal/code-walk-through_EN.md))
- [Process Grouping Utilities](https://awesome-repositories.com/f/devops-infrastructure/process-grouping-utilities.md) — Organizes processes into custom communication units to enable flexible, isolated data exchange strategies. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/torch/torch-distributed/readme.md))
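The Multi-GPU Load Balancing and Inference Load Balancers entries describe equalizing token counts across GPU ranks with bin-packing. Below is a minimal sketch using a greedy longest-first heuristic, an assumption chosen for clarity because the walkthrough's exact packing algorithm may differ. Each sequence, taken from longest to shortest, goes to the rank with the fewest tokens so far. `balance_sequences` is a hypothetical name.

```python
import heapq
from typing import List


def balance_sequences(seq_lens: List[int], num_ranks: int) -> List[List[int]]:
    """Assign sequence indices to ranks so that per-rank token totals stay close.

    Greedy longest-first heuristic: it does not always find the optimal
    split, but it is cheap and usually close.
    """
    if num_ranks <= 0:
        raise ValueError("num_ranks must be positive")
    heap = [(0, rank) for rank in range(num_ranks)]  # (current token total, rank)
    heapq.heapify(heap)
    assignment: List[List[int]] = [[] for _ in range(num_ranks)]
    # Visit sequences from longest to shortest.
    order = sorted(range(len(seq_lens)), key=lambda i: seq_lens[i], reverse=True)
    for idx in order:
        total, rank = heapq.heappop(heap)  # least-loaded rank so far
        assignment[rank].append(idx)
        heapq.heappush(heap, (total + seq_lens[idx], rank))
    return assignment
```

With lengths `[8, 7, 6, 5, 4]` on two ranks the heuristic produces totals of 17 and 13 tokens, while the optimum is 15 and 15. This illustrates the usual trade-off: a fast approximate packer instead of an exact but expensive search.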

### Networking & Communication

- [Parallelism Integrators](https://awesome-repositories.com/f/networking-communication/distributed-systems-p2p/distributed-computing/model-parallelism-techniques/model-parallelism-strategies/parallelism-integrators.md) — Distributes large-scale model training across multiple GPUs using combinations of tensor, pipeline, and data parallelism. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/sys-design/readme-3.md))
- [Request Batching](https://awesome-repositories.com/f/networking-communication/network-infrastructure-routing/network-routing-traffic-management/network-traffic-management/request-batching.md) — Prioritizes and batches incoming requests for prefill and decoding phases using dynamic reordering strategies. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/scheduler/readme-en.md))
- [Distributed Parameter Synchronisation](https://awesome-repositories.com/f/networking-communication/distributed-systems-p2p/distributed-computing/model-parallelism-techniques/distributed-parameter-synchronisation.md) — Coordinates gradient updates and parameter synchronization across multiple compute nodes using collective communication protocols. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/latency-accelerate-for-weight-updates/readme.md))
- [Response Streaming](https://awesome-repositories.com/f/networking-communication/api-integration-frameworks/http-client-libraries/http-client-utilities/response-streaming.md) — Streams generated text incrementally to clients to reduce perceived latency and improve user experience. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/code-walk-through/readme.md))
- [Hardware Topology Optimizers](https://awesome-repositories.com/f/networking-communication/network-topology-extensions/topology-abstraction-layers/hardware-topology-optimizers.md) — Automatically detects and selects the most efficient data transfer path between GPUs based on hardware interconnects. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/torch/nccl/readme_en.md))
- [High-Performance Data Transfer](https://awesome-repositories.com/f/networking-communication/high-performance-data-transfer.md) — Provides high-performance data transfer mechanisms to separate control signaling from large tensor data movement. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/sglang-omni/why-sglang-omni.md))
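The Response Streaming entry describes sending generated text to clients incrementally. A common transport for this is Server-Sent Events, where each chunk is a `data:` line followed by a blank line. The sketch below assumes an OpenAI-style `data: [DONE]` terminator. The generator name and the `text` payload field are hypothetical and do not reflect SGLang's actual response schema.

```python
import json
from typing import Iterable, Iterator


def sse_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Frame incremental text chunks as Server-Sent Events.

    Each event is 'data: <json>' followed by a blank line; a final
    '[DONE]' event (an OpenAI-style convention, assumed here) marks the end.
    """
    for chunk in chunks:
        yield f"data: {json.dumps({'text': chunk})}\n\n"
    yield "data: [DONE]\n\n"
```

Because the function is a generator, a web framework can flush each event as soon as the model produces the next chunk rather than waiting for the full completion.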

### Part of an Awesome List

- [Alignment and RLHF](https://awesome-repositories.com/f/awesome-lists/ai/alignment-and-rlhf.md) — Provides comprehensive reinforcement learning workflows including tool calling and search integration for model alignment. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/README.md))
- [Training and Fine-Tuning](https://awesome-repositories.com/f/awesome-lists/ai/training-and-fine-tuning.md) — Provides configuration for reinforcement learning strategies including advantage estimation and preference-feedback mechanisms. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme-5-EN.md))

### Data & Databases

- [Collective Communication Operations](https://awesome-repositories.com/f/data-databases/collective-communication-operations.md) — Coordinates high-performance collective communication operations like AllReduce and Broadcast across distributed GPU clusters. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/torch/torch-distributed/readme.md))
- [Distributed Tensor Sharding](https://awesome-repositories.com/f/data-databases/distributed-tensor-sharding.md) — Handles model parameters as distributed objects to simplify sharding and state management across GPU clusters. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/sys-design/readme-2.md))
- [Modular Pipeline Orchestration](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/processing-pipelines/modular-pipeline-orchestration.md) — Decomposes complex processing tasks into independent, swappable stages to transform raw inputs into final outputs. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/transformers/omni/readme.md))
- [Cache Quantization](https://awesome-repositories.com/f/data-databases/storage-engines/key-value/cache-quantization.md) — Reduces memory usage during long-context generation by storing attention caches in lower-precision formats. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/quantization/quantization_architecture.md))
- [Batch Matrix Multiplication Utilities](https://awesome-repositories.com/f/data-databases/batch-processing/batch-matrix-multiplication-utilities.md) — Distributes large matrix multiplications across multiple processing units to fit within memory constraints. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/torch/tensor-parallelism/readme.md))
- [Shared Memory Data Exchange](https://awesome-repositories.com/f/data-databases/shared-memory-data-exchange.md) — Enables zero-copy memory sharing between training and inference engines to minimize data transfer overhead. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/sys-design/readme-1-EN.md))
- [Data Exchange Protocols](https://awesome-repositories.com/f/data-databases/data-exchange-protocols.md) — Implements high-performance interfaces for transferring tensor data between system components with minimal overhead. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme-2-EN.md))
- [Batched Data Loading](https://awesome-repositories.com/f/data-databases/data-pipeline-orchestration/data-engineering-pipelines/batched-data-loading.md) — Provides automated collation of data samples into batches to optimize GPU utilization during training. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme-5-EN.md))
- [Pipeline Customizers](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/processing-pipelines/pipeline-customizers.md) — Defines custom generation logic and verification steps within the data rollout phase. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/code-walk-through/readme.md))
- [Sync Parameter Configurations](https://awesome-repositories.com/f/data-databases/data-synchronization-configurations/sync-endpoint-configurations/sync-parameter-configurations.md) — Provides protocols for resharding and synchronizing parameters between training and rollout engines. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/readme.md))
- [Shared Memory Transports](https://awesome-repositories.com/f/data-databases/shared-memory-transports.md) — Implements shared memory transports to optimize communication efficiency by separating control and data layers. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/sglang-omni/why-sglang-omni-en.md))
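The collective communication entry mentions AllReduce. As an illustration of how such a collective can work, the sketch below simulates a sum AllReduce with the ring algorithm (a reduce-scatter phase followed by an all-gather phase), which is one of the algorithms NCCL can use. It is a pure-Python simulation on in-memory lists, assuming the vector splits evenly across ranks; `ring_allreduce` is a hypothetical helper, and real training code would call `torch.distributed.all_reduce` instead.

```python
from typing import List


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a sum all-reduce over n ranks arranged in a ring."""
    n = len(buffers)
    length = len(buffers[0])
    assert length % n == 0, "sketch assumes the vector splits evenly into n chunks"
    size = length // n
    data = [list(b) for b in buffers]  # copy so the inputs stay untouched

    def chunk(i: int) -> range:
        return range(i * size, (i + 1) * size)

    # Phase 1, reduce-scatter: at step s, rank r sends chunk (r - s) % n to rank r + 1,
    # which adds it into its copy. Afterwards rank r holds the full sum of chunk (r + 1) % n.
    for s in range(n - 1):
        sends = [(r, (r - s) % n) for r in range(n)]
        payloads = [[data[r][k] for k in chunk(c)] for r, c in sends]  # snapshot before applying
        for (r, c), payload in zip(sends, payloads):
            dst = (r + 1) % n
            for k, v in zip(chunk(c), payload):
                data[dst][k] += v

    # Phase 2, all-gather: each rank forwards the most recently completed chunk around the ring.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n) for r in range(n)]
        payloads = [[data[r][k] for k in chunk(c)] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            dst = (r + 1) % n
            for k, v in zip(chunk(c), payload):
                data[dst][k] = v
    return data


out = ring_allreduce([[1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60], [100, 200, 300, 400, 500, 600]])
print(out[0])  # [111, 222, 333, 444, 555, 666]
```

Each rank sends and receives 2(n - 1) chunks of size length/n, which is why the ring algorithm is bandwidth-efficient for large tensors.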

### Education & Learning Resources

- [Machine Learning Tutorials](https://awesome-repositories.com/f/education-learning-resources/machine-learning-tutorials.md) — Offers a comprehensive technical tutorial on the infrastructure and engineering principles required to build large-scale machine learning systems.

### Scientific & Mathematical Computing

- [Distributed Inference Orchestrators](https://awesome-repositories.com/f/scientific-mathematical-computing/high-performance-execution-environments/high-performance-and-parallel-computing/parallel-processing/distributed-inference-orchestrators.md) — Manages multi-node and multi-GPU model execution by coordinating replicas and request scheduling. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/code-walk-through/readme.md))
- [High-Performance Computing](https://awesome-repositories.com/f/scientific-mathematical-computing/high-performance-execution-environments/high-performance-and-parallel-computing/high-performance-computing.md) — Replays pre-captured operation graphs with a single launch call, bypassing per-operation dispatcher overhead during inference. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/torch/cuda-graph/readme-2.md))
- [Policy Divergence Monitors](https://awesome-repositories.com/f/scientific-mathematical-computing/numerical-mathematical-foundations/statistics-probability/probability-distributions/divergence-measures/policy-divergence-monitors.md) — Adjusts reward signals using divergence metrics to prevent policy drift during reinforcement learning. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme-3.md))
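One common way to adjust rewards with a divergence metric, as the policy divergence entry describes, is a per-token KL penalty against a frozen reference policy. The sketch below assumes the simple log-ratio estimator `log pi - log pi_ref` and a fixed coefficient `beta`; the function name and the choice of estimator are illustrative assumptions, not necessarily what the linked verl walkthrough uses.

```python
from typing import List


def kl_penalized_rewards(
    scores: List[float],
    logp: List[float],
    ref_logp: List[float],
    beta: float,
) -> List[float]:
    """Per-token reward: task score minus beta times the per-token log-ratio KL estimate."""
    if not (len(scores) == len(logp) == len(ref_logp)):
        raise ValueError("all inputs must have one entry per response token")
    # log pi(a|s) - log pi_ref(a|s) is a simple per-token estimate of KL(pi || pi_ref).
    return [s - beta * (lp - rlp) for s, lp, rlp in zip(scores, logp, ref_logp)]


# Outcome reward placed on the final token only (a common but not universal choice).
rewards = kl_penalized_rewards(
    scores=[0.0, 0.0, 1.0],
    logp=[-1.0, -0.5, -0.2],
    ref_logp=[-1.2, -0.5, -0.4],
    beta=0.1,
)
print([round(r, 4) for r in rewards])  # [-0.02, 0.0, 0.98]
```

Tokens where the policy assigns more probability than the reference get a negative adjustment, which pulls the policy back toward the reference and limits drift.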

### Software Engineering & Architecture

- [Distributed Training Coordination](https://awesome-repositories.com/f/software-engineering-architecture/distributed-coordination-systems/distributed-training-coordination.md) — Manages the lifecycle of actor, rollout, and reward model workers across distributed GPU clusters. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/fsdp/readme.md))
- [External Tool Integrations](https://awesome-repositories.com/f/software-engineering-architecture/application-frameworks/autonomous-agent-frameworks/external-tool-integrations.md) — Parses model-generated tool calls and injects responses back into conversation history for autonomous task execution. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme-2.md))
- [Low-Bit Inference Engines](https://awesome-repositories.com/f/software-engineering-architecture/memory-layout-optimizations/bit-packed-storage/low-bit-inference-engines.md) — Minimizes memory bandwidth and latency by executing models with packed low-precision weights. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/int4/readme.md))
- [FP8 Training Optimization](https://awesome-repositories.com/f/software-engineering-architecture/performance-reliability/performance-optimization/computational-efficiency/fp8-training-optimization.md) — Optimizes training performance by utilizing 8-bit floating point formats to reduce memory footprint and increase computational throughput. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/fp8/readme.md))
- [Training Workflow Orchestrators](https://awesome-repositories.com/f/software-engineering-architecture/training-workflow-orchestrators.md) — Orchestrates automated pipelines for model training and multi-round parameter updates. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/tree/main/rlhf/OpenRLHF))
- [Distributed Complexity Abstractions](https://awesome-repositories.com/f/software-engineering-architecture/distributed-complexity-abstractions.md) — Exposes a unified single-controller interface while managing complex multi-controller sub-modules for high-performance execution. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/readme.md))
- [Latency Optimization](https://awesome-repositories.com/f/software-engineering-architecture/performance-reliability/performance-engineering/latency-optimization.md) — Provides techniques for measuring and reducing response times in network and system operations. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/sys-design/readme-2-en.md))
- [Graph Execution Compilers](https://awesome-repositories.com/f/software-engineering-architecture/execution-graphs/graph-execution-compilers.md) — Compiles captured graphs into executable objects for optimized GPU execution. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/torch/cuda-graph/readme-2-en.md))
- [Tail Latency Measurement](https://awesome-repositories.com/f/software-engineering-architecture/performance-reliability/performance-engineering/latency-optimization/tail-latency-measurement.md) — Plots cumulative distribution functions (CDFs) of request latency to expose long-tail behavior and guide tuning of response configurations. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/tool_examples/profile_en.md))
- [Latency-Based Sampling Filters](https://awesome-repositories.com/f/software-engineering-architecture/performance-reliability/performance-engineering/latency-optimization/tail-latency-measurement/latency-based-sampling-filters.md) — Maintains system throughput by discarding requests exceeding latency thresholds during rollout phases. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/tool_examples/profile.md))
- [Pipeline Optimization Techniques](https://awesome-repositories.com/f/software-engineering-architecture/pipeline-optimization-techniques.md) — Reduces idle time in pipeline parallelism using scheduling strategies like 1F1B and interleaved virtual pipelines. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/sys-design/readme-3.md))
- [Process Coordinators](https://awesome-repositories.com/f/software-engineering-architecture/process-coordinators.md) — Coordinates distributed processes via remote procedure calls to synchronize state and trigger collective actions. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/sglang-verl-engine/readme.md))
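The tail latency and latency-based filtering entries rely on a latency CDF and on percentiles. A minimal sketch of both follows, using the nearest-rank percentile definition; the helper names and the toy latency list are illustrative only, and the CDF pairs can be passed to any plotting library as a step plot.

```python
import math
from typing import List, Tuple


def empirical_cdf(latencies: List[float]) -> List[Tuple[float, float]]:
    """(latency, fraction of requests at or below it), ready for a step plot."""
    xs = sorted(latencies)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def percentile(latencies: List[float], p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]."""
    xs = sorted(latencies)
    rank = math.ceil(p * len(xs) / 100)  # 1-based rank
    return xs[max(rank, 1) - 1]


lat_s = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # toy per-request latencies in seconds
print(percentile(lat_s, 50), percentile(lat_s, 90), percentile(lat_s, 99))  # 5 9 100
print(empirical_cdf(lat_s)[-1])  # (100, 1.0)
```

In this toy data a single slow request dominates p99 while leaving the median unchanged, which is exactly the long-tail pattern a CDF makes visible and a latency threshold could filter out.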

### Game Development

- [Rollout Engines](https://awesome-repositories.com/f/game-development/rollout-engines.md) — Manages iterative interaction between a model and an environment by sampling actions and maintaining context until termination. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/vlm-multi-turn/readme-en.md))
- [Rollout Context Managers](https://awesome-repositories.com/f/game-development/rollout-engines/rollout-context-managers.md) — Dynamically adjusts or reloads inference engines to handle varying sequence lengths without stalling training pipelines. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/partial-rollout/readme.md))

### Operating Systems & Systems Programming

- [Paged KV Cache Management](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/process-and-memory-management/memory-management/buffer-and-cache-management/paged-kv-cache-management.md) — Uses fixed-size blocks to store and manage key-value cache states for improved memory efficiency. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/kvcache-code-walk-through/readme.md))
- [GPU Memory Allocators](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/process-and-memory-management/memory-management/allocation-strategies/dynamic-memory-allocation/gpu-memory-allocators.md) — Partitions available GPU memory into static and dynamic regions to reserve space for model weights and runtime activations. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/kvcache-code-walk-through/mem-fraction-static.md))
- [Inference Cache Management](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/process-and-memory-management/memory-management/inference-cache-management.md) — Allocates and manages key-value cache buffers during model inference to optimize memory usage. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/README.md))
- [GPU Memory Lifecycle Managers](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/process-and-memory-management/memory-management-systems/gpu-memory-lifecycle-managers.md) — Reports reserved and allocated GPU memory statistics to provide a high-level overview of hardware resource consumption. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/torch/mem-snapshot/readme-en.md))
- [Encoder Memory Reservoirs](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/process-and-memory-management/memory-management/allocation-strategies/dynamic-memory-allocation/gpu-memory-allocators/encoder-memory-reservoirs.md) — Allocates dedicated memory buffers for multi-modal encoders to prevent resource contention during inference. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/kvcache-code-walk-through/mem-fraction-static-en.md))
- [Stage-Based Memory Managers](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/process-and-memory-management/memory-management/allocation-strategies/dynamic-memory-allocation/gpu-memory-allocators/stage-based-memory-managers.md) — Manages GPU memory budgets across processing stages to prevent resource exhaustion during complex model execution. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/sglang-omni/why-sglang-omni-en.md))
- [Memory Allocation Tracers](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/process-and-memory-management/memory-management/allocation-strategies/dynamic-memory-allocation/memory-allocation-tracers.md) — Tracks and visualizes memory usage patterns to identify fragmentation and leaks during complex computational workloads. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/tree/main/torch))
- [Inference Resource Controllers](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/process-and-memory-management/memory-management/inference-cache-management/inference-resource-controllers.md) — Optimizes model execution by configuring tensor parallelism, memory utilization, and caching strategies. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme-5-EN.md))
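The paged KV cache entry describes storing cache state in fixed-size blocks. The sketch below shows the core bookkeeping such a design needs: a free list of physical blocks, a per-sequence block table, and a mapping from a logical token position to a physical slot. The `PagedKVAllocator` class and its methods are hypothetical and are not SGLang's memory-pool API; the actual tensors holding keys and values are omitted.

```python
from typing import Dict, List


class PagedKVAllocator:
    """Hypothetical block-table allocator for a paged KV cache (illustrative names)."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_blocks: List[int] = list(range(num_blocks))  # physical block ids
        self.block_tables: Dict[str, List[int]] = {}  # seq id -> physical blocks in order
        self.seq_lens: Dict[str, int] = {}

    def append_tokens(self, seq_id: str, n: int) -> None:
        """Reserve KV slots for n more tokens, allocating whole blocks on demand."""
        table = self.block_tables.get(seq_id, [])
        new_len = self.seq_lens.get(seq_id, 0) + n
        needed = -(-new_len // self.block_size)  # ceil(new_len / block_size)
        extra = needed - len(table)
        if extra > len(self.free_blocks):
            raise MemoryError("not enough free KV cache blocks")
        for _ in range(extra):
            table.append(self.free_blocks.pop())
        self.block_tables[seq_id] = table
        self.seq_lens[seq_id] = new_len

    def slot(self, seq_id: str, pos: int) -> int:
        """Map a logical token position to a slot index in the flat KV buffer."""
        block = self.block_tables[seq_id][pos // self.block_size]
        return block * self.block_size + pos % self.block_size

    def free(self, seq_id: str) -> None:
        """Return all of a finished sequence's blocks to the free list."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


alloc = PagedKVAllocator(num_blocks=4, block_size=16)
alloc.append_tokens("a", 20)  # 20 tokens need 2 blocks of 16
print(alloc.block_tables["a"], alloc.slot("a", 17))  # [3, 2] 33
```

Because blocks need not be contiguous, memory is wasted only in the last partially filled block of each sequence rather than in a worst-case contiguous reservation.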

### Hardware & IoT

- [GPU Operation Batchers](https://awesome-repositories.com/f/hardware-iot/integration-performance/gpu-performance/gpu-operation-batchers.md) — Reduces CPU-to-GPU launch overhead by packaging multiple operations into single executable graphs. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/torch/cuda-graph/readme_en.md))

### Development Tools & Productivity

- [Inference Runtime Integrations](https://awesome-repositories.com/f/development-tools-productivity/third-party-integrations/inference-runtime-integrations.md) — Connects training workflows to standalone inference servers via web endpoints for model weight updates. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/GRPO/SGLang_GRPO.md))
- [Environment Initialization](https://awesome-repositories.com/f/development-tools-productivity/environment-initialization.md) — Configures process groups and communication backends to enable multi-process data exchange. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/torch/torch-distributed/readme.md))

### Programming Languages & Runtimes

- [Static Graph Execution](https://awesome-repositories.com/f/programming-languages-runtimes/runtime-execution-environments/runtime-environments/execution-engines/static-graph-execution.md) — Minimizes latency in fixed computation flows by replaying pre-compiled graphs with minimal CPU overhead. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/torch/cuda-graph/readme-2-en.md))

### Security & Cryptography

- [Security Trust Models](https://awesome-repositories.com/f/security-cryptography/security-trust-models.md) — Enforces trust regions by filtering or masking training samples that deviate from the target policy to prevent model divergence. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/mismatch/blog-cn.md))
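A trust region of the kind the entry above describes can be enforced by computing a per-token importance ratio between the training policy and the policy that generated the rollout, then masking tokens whose ratio falls outside a band. This is a minimal sketch assuming token-level masking with hypothetical bounds `low=0.5` and `high=2.0`; the linked post may use sequence-level filtering, different estimators, or other thresholds.

```python
import math
from typing import List, Tuple


def trust_region_mask(
    train_logp: List[float],
    rollout_logp: List[float],
    low: float = 0.5,
    high: float = 2.0,
) -> Tuple[List[float], List[int]]:
    """Per-token importance ratio pi_train / pi_rollout and a keep-mask (1 = keep)."""
    ratios = [math.exp(t - r) for t, r in zip(train_logp, rollout_logp)]
    mask = [1 if low <= w <= high else 0 for w in ratios]
    return ratios, mask


def masked_mean(values: List[float], mask: List[int]) -> float:
    """Average only over kept tokens; 0.0 if everything was masked."""
    kept = sum(mask)
    return sum(v * m for v, m in zip(values, mask)) / kept if kept else 0.0


ratios, mask = trust_region_mask([-1.0, -1.0, -3.0], [-1.0, -2.0, -1.0])
print(mask)  # [1, 0, 0]
print(masked_mean([0.3, 5.0, -4.0], mask))  # 0.3
```

Masking drops the divergent tokens from the loss entirely, whereas truncating (clipping) the ratio would keep them with a bounded weight; which trade-off is preferable depends on how large the training/rollout mismatch is.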

### System Administration & Monitoring

- [Gradient Path Isolators](https://awesome-repositories.com/f/system-administration-monitoring/diagnostic-tools/diagnostics/telemetry-and-log-collectors/output-capture-utilities/model-layer-capture-utilities/gradient-path-isolators.md) — Detaches auxiliary model branches from the main model's output layers to ensure only specific parameters are updated during training. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/spec/readme.md))
- [Training Metrics](https://awesome-repositories.com/f/system-administration-monitoring/logging/training-metrics.md) — Computes reward scores, log probabilities, and advantage estimates to evaluate model performance during training. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme-3.md))
- [Infrastructure Monitoring](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/observability-platforms/metric-performance-monitors/infrastructure-monitoring.md) — Inspects hardware topology and system resource utilization to identify bottlenecks across GPU infrastructure. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/torch/nccl/readme.md))
- [Processing Engine Metrics](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/observability-platforms/metric-performance-monitors/processing-engine-metrics.md) — Tracks and reports real-time generation metrics such as tokens per second to diagnose inference engine performance. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/tool_examples/profile.md))
- [GPU Performance Monitoring](https://awesome-repositories.com/f/system-administration-monitoring/performance-monitoring-tools/gpu-performance-monitoring.md) — Tracks real-time GPU utilization and metrics to ensure efficient resource usage during distributed computing tasks. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/torch/nccl/readme_en.md))
- [Memory Usage Analyzers](https://awesome-repositories.com/f/system-administration-monitoring/memory-usage-analyzers.md) — Calculates and monitors the memory footprint of running processes to provide a high-level overview of reserved and active memory. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/torch/mem-snapshot/readme.md))
- [Server Metrics](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/observability-platforms/metric-performance-monitors/server-metrics.md) — Exposes performance data through metrics endpoints to track server health and operational status. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/server-based/veRL-server-based-rollout.md))
- [Performance Profiling Tools](https://awesome-repositories.com/f/system-administration-monitoring/performance-profiling-tools.md) — Provides integrated profiling tools to track training metrics and hardware utilization for identifying performance bottlenecks. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme-5.md))

### Testing & Quality Assurance

- [Execution Profilers](https://awesome-repositories.com/f/testing-quality-assurance/performance-testing-analysis/performance-diagnostics/execution-profilers.md) — Records timestamps during workflow execution to identify performance bottlenecks across distributed workers. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/tool_examples/profile.md))
- [Standardized Interfaces](https://awesome-repositories.com/f/testing-quality-assurance/api-network-testing/api-testing/api-and-ui-integration-tools/standardized-interfaces.md) — Provides a consistent set of methods for resetting states, processing model actions, and formatting observations for feedback. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/vlm-multi-turn/readme.md))
- [Worker Lifecycle Management](https://awesome-repositories.com/f/testing-quality-assurance/testing-infrastructure-management/test-orchestration/worker-process-management/worker-lifecycle-management.md) — Instantiates and manages specialized worker roles for actors and critics in distributed systems. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme_EN.md))

### Web Development

- [Computational Parallelization](https://awesome-repositories.com/f/web-development/performance-optimizations/computational-parallelization.md) — Parallelizes recursive calculations such as Generalized Advantage Estimation (GAE) by splitting long sequences into chunks that hardware accelerators can process concurrently. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/batch-GAE/ppo-gae-chunk.md))
- [GPU Command Batchers](https://awesome-repositories.com/f/web-development/performance-optimizations/computational-parallelization/parallel-gpu-schedulers/gpu-command-batchers.md) — Reduces CPU-side scheduling frequency by executing multiple generation steps on the GPU in a single command cycle. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/scheduler/readme-en.md))
- [Scheduling Overlaps](https://awesome-repositories.com/f/web-development/performance-optimizations/computational-parallelization/parallel-gpu-schedulers/scheduling-overlaps.md) — Implements asynchronous scheduling to hide latency by overlapping CPU-bound task preparation with GPU-bound computation. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/zero-overhead-scheduler/zero-overhead-batch-scheduler.md))
- [Symbolic Scheduling Links](https://awesome-repositories.com/f/web-development/performance-optimizations/computational-parallelization/parallel-gpu-schedulers/symbolic-scheduling-links.md) — Uses symbolic references to decouple scheduling from synchronization, allowing continuous GPU pipeline execution. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/scheduler/readme-en.md))
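The chunked parallelization entry refers to recursive computations such as GAE, where each advantage depends on the next one: $A_t = \delta_t + \gamma\lambda A_{t+1}$. Unrolling gives $A_t = L_t + (\gamma\lambda)^{e-t} A_e$, where $L_t$ is the advantage computed inside a chunk ending at $e$ with zero carry-in. The sketch below uses that identity: per-chunk work is independent (and could run in parallel), and only a short sequential pass over chunk boundaries remains. It is a plain-Python illustration for one trajectory with no intermediate episode ends, assuming `last_value` bootstraps the value after the final step; it is not the linked tutorial's implementation.

```python
from typing import List


def gae_sequential(rewards: List[float], values: List[float], gamma: float,
                   lam: float, last_value: float = 0.0) -> List[float]:
    """Reference GAE: one backward pass over the whole sequence."""
    T = len(rewards)
    adv = [0.0] * T
    running = 0.0
    for t in reversed(range(T)):
        next_v = values[t + 1] if t + 1 < T else last_value
        running = rewards[t] + gamma * next_v - values[t] + gamma * lam * running
        adv[t] = running
    return adv


def gae_chunked(rewards: List[float], values: List[float], gamma: float, lam: float,
                chunk: int, last_value: float = 0.0) -> List[float]:
    """Same result as gae_sequential, with the recursion split into chunks."""
    T = len(rewards)
    gl = gamma * lam
    deltas = [rewards[t] + gamma * (values[t + 1] if t + 1 < T else last_value) - values[t]
              for t in range(T)]
    starts = list(range(0, T, chunk))
    # Phase 1 (independent per chunk): local advantages assuming zero carry-in.
    local = [0.0] * T
    for s in starts:
        running = 0.0
        for t in reversed(range(s, min(s + chunk, T))):
            running = deltas[t] + gl * running
            local[t] = running
    # Phase 2 (sequential over chunks only): true advantage at each chunk start.
    carry = {T: 0.0}
    for s in reversed(starts):
        e = min(s + chunk, T)
        carry[s] = local[s] + gl ** (e - s) * carry[e]
    # Phase 3 (independent per token): add the discounted carry from the next chunk.
    adv = [0.0] * T
    for s in starts:
        e = min(s + chunk, T)
        for t in range(s, e):
            adv[t] = local[t] + gl ** (e - t) * carry[e]
    return adv


r = [1.0, 0.0, 0.5, 0.0, 2.0, -1.0, 0.0]
v = [0.2, 0.1, 0.0, 0.3, 0.4, 0.1, 0.0]
ref = gae_sequential(r, v, 0.99, 0.95)
print(max(abs(a - b) for a, b in zip(ref, gae_chunked(r, v, 0.99, 0.95, chunk=3))))
```

The printed maximum difference should be at floating-point round-off level; on an accelerator, phases 1 and 3 map naturally to batched tensor operations.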

### Graphics & Multimedia

- [Chunked Audio Streaming](https://awesome-repositories.com/f/graphics-multimedia/audio-music/audio-streaming-engines/audio-playback-engines/chunked-audio-streaming.md) — Enables low-latency playback by generating and decoding audio frames incrementally as soon as the corresponding tokens are produced. ([source](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/transformers/omni/readme-en.md))