# infrasys-ai/aiinfra

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/infrasys-ai-aiinfra).**

6,106 stars · 831 forks · Jupyter Notebook · apache-2.0

## Links

- GitHub: https://github.com/Infrasys-AI/AIInfra
- Homepage: https://infrasys-ai.github.io/aiinfra-docs/
- awesome-repositories: https://awesome-repositories.com/repository/infrasys-ai-aiinfra.md

## Topics

`aiinfra` `aisystem`

## Tags

### Artificial Intelligence & ML

- [Agentic RAG Development](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-rag-development.md) — Provides frameworks for building intelligent, self-correcting retrieval systems with agent-driven RAG. ([source](https://infrasys-ai.github.io/aiinfra-docs/))
- [Multi-Agent Collaboration Systems](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/agent-orchestration-multi-agent/autonomous-agents/multi-agent-collaboration-systems.md) — Enables multiple AI agents to work together in shared workspaces for complex task execution. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/03TTScaling.html))
- [Chip-Level Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-hardware-acceleration/chip-level-optimizations.md) — Coordinates chip-level features, communication protocols, and framework optimizations for AI accelerators. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/06Future.html))
- [AI Workload Orchestration](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-workload-orchestration.md) — Deploys and manages AI training and inference jobs on Kubernetes with containerization.
- [Cloud-Native Deployments](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-workload-orchestration/cloud-native-deployments.md) — Teaches container and Kubernetes orchestration for AI workloads with task scheduling and observability. ([source](https://cdn.jsdelivr.net/gh/infrasys-ai/aiinfra@main/README.md))
- [Distributed Training](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training-frameworks/distributed-training.md) — Offers a hands-on guide to parallel strategies and acceleration algorithms for multi-node training.
- [Distributed Training Scaling Utilities](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training-scaling-utilities.md) — Describes parallel strategies and acceleration algorithms for scaling training across multiple nodes. ([source](https://cdn.jsdelivr.net/gh/infrasys-ai/aiinfra@main/README.md))
- [Full Parameter Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/full-parameter-fine-tuning.md) — Updates all model parameters through backpropagation to adapt pretrained models to specific tasks. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train05FineTune/Code01Qwen3SFT.html))
- [Large Model Deployment Patterns](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-integration-patterns/application-pattern-explanations/large-model-deployment-patterns.md) — Introduces AI agents, RAG, and other patterns for deploying large models in production. ([source](https://cdn.jsdelivr.net/gh/infrasys-ai/aiinfra@main/README.md))
- [Multi-Node Inference Scaling](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-model-deployments/multi-node-inference-scaling.md) — Distributes large-scale inference across multiple GPUs and nodes using pipeline parallelism. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/05InferStack.html))
- [Instruction Fine-tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/instruction-fine-tuning.md) — Trains models on labeled instruction-response pairs to improve instruction-following capabilities. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train05FineTune/Code01Qwen3SFT.html))
- [Diffusion Model Adaptations](https://awesome-repositories.com/f/artificial-intelligence-ml/language-model-fine-tuning/partial-layer-fine-tunings/lora-fine-tuning-pipelines/diffusion-model-adaptations.md) — Applies LoRA to diffusion model attention layers for efficient style adaptation. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train05FineTune/Code02SDLoRA.html))
- [UNet Attention Injections](https://awesome-repositories.com/f/artificial-intelligence-ml/language-model-fine-tuning/partial-layer-fine-tunings/lora-fine-tuning-pipelines/unet-attention-injections.md) — Injects LoRA layers into UNet attention modules for efficient diffusion model adaptation. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train05FineTune/Code02SDLoRA.html))
- [System Design Surveys](https://awesome-repositories.com/f/artificial-intelligence-ml/large-scale-model-training/system-design-surveys.md) — Surveys scaling laws, training pipelines, and future trends for large model system design.
- [Large-Scale Pre-Training Orchestrators](https://awesome-repositories.com/f/artificial-intelligence-ml/large-scale-model-training/vision-transformer-pre-training/pre-trained-model-checkpoints/large-scale-pre-training-orchestrators.md) — Executes large-scale pre-training workflows with automated mixed precision and fault recovery. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/04TrainingStack.html))
- [Large Scale Training](https://awesome-repositories.com/f/artificial-intelligence-ml/large-scale-training.md) — Covers scaling model training across thousands of GPUs using parallel strategies and memory optimization.
- [Inference Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/long-context-training-optimizations/inference-optimizations.md) — Covers inference acceleration techniques including KV cache optimization, batching, and model compression.
- [LoRA Training](https://awesome-repositories.com/f/artificial-intelligence-ml/lora-training.md) — Trains LoRA-adapted models by predicting noise and updating only low-rank weights. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train05FineTune/Code02SDLoRA.html))
- [GPU Training Accelerators](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/distributed-and-accelerated-compute/training-acceleration-tools/gpu-training-accelerators.md) — Teaches distributed parallel strategies and acceleration techniques for large model training. ([source](https://infrasys-ai.github.io/aiinfra-docs))
- [Supervised Instruction Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/fine-tuning-and-alignment/supervised-instruction-fine-tuning.md) — Adjusts pretrained models on human-annotated instruction-response pairs to teach command following. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train04PostTrainRL/Code01InstructGPT.html))
- [Inference Optimization Techniques](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/inference-optimization-and-tuning/inference-optimization-techniques.md) — Covers inference acceleration techniques, KV cache optimization, batching, and model compression. ([source](https://cdn.jsdelivr.net/gh/infrasys-ai/aiinfra@main/README.md))
- [Model Inference Accelerators](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/inference-servers-and-runtimes/model-inference-accelerators.md) — Runs inference on large models with acceleration, scheduling, sampling, and compression techniques. ([source](https://infrasys-ai.github.io/aiinfra-docs/))
- [Model Performance Benchmarking](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-evaluation-analysis/model-analysis/model-performance-benchmarking.md) — Measures model capabilities across academic benchmarks, safety tests, and domain-specific tasks. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/04TrainingStack.html))
- [Efficient Training Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/model-integration-pipelines/efficient-training-pipelines.md) — Reduces per-GPU memory by partitioning model states and offloading to CPU or NVMe.
- [Large Model Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/serving-and-runtime/inference-optimizations/large-model-optimizations.md) — Provides techniques for deploying and accelerating inference for large models. ([source](https://infrasys-ai.github.io/aiinfra-docs))
- [Low-Rank Adaptation](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-fine-tuning/low-rank-adaptation.md) — Freezes most weights and trains only low-rank matrices in attention layers for efficient fine-tuning. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train05FineTune/Code01Qwen3SFT.html))
- [Task-Specific Adaptation Methods](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-fine-tuning/task-specific-adaptation-methods.md) — Provides hands-on implementations of LoRA and full fine-tuning for adapting models to specific tasks. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/04TrainingStack.html))
- [Model Parallelism](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/training-frameworks/model-training-pipelines/model-parallelism.md) — Splits model layers and parameters across multiple GPUs using data, tensor, and pipeline parallelism. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/04TrainingStack.html))
- [Distributed Training Loops](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/training-frameworks/model-training-pipelines/model-parallelism/distributed-training-loops.md) — Provides complete distributed training loops with automatic parameter distribution across devices. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train01ParallelBegin/Code02MP.html))
- [Model Fine-Tuning and Adaptation](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/model-fine-tuning-adaptation.md) — Provides techniques for adapting pretrained models to specific tasks via instruction tuning and RLHF.
- [Group Relative Policy Optimization Training](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/model-fine-tuning-adaptation/language-model-training/group-relative-policy-optimization-training.md) — Applies Group Relative Policy Optimization for stable reinforcement learning fine-tuning of language models. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train04PostTrainRL/Code03GRPO.html))
- [Model Compression Suites](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization/compression-techniques/model-pruning/model-compression-suites.md) — Reduces model size and compute cost through quantization, distillation, or pruning techniques. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/05InferStack.html))
- [Parallelism Orchestration](https://awesome-repositories.com/f/artificial-intelligence-ml/model-orchestrators/parallelism-orchestration.md) — Coordinates data, tensor, and pipeline parallelism across thousands of GPUs.
- [Model Serving Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/model-serving-frameworks.md) — Integrates deployment, scheduling, and monitoring for model serving frameworks. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/05InferStack.html))
- [Synchronous Gradient Averaging](https://awesome-repositories.com/f/artificial-intelligence-ml/multi-gpu-training-utilities/synchronous-gradient-averaging.md) — Splits datasets across GPUs and averages gradients synchronously for model training. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train01ParallelBegin/Code01DDP.html))
- [LoRA-Exclusive Freezings](https://awesome-repositories.com/f/artificial-intelligence-ml/parameter-freezing/lora-exclusive-freezings.md) — Freezes all parameters except LoRA layers to restrict updates to low-rank matrices. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train05FineTune/Code02SDLoRA.html))
- [Chain-of-Thought Enforcement](https://awesome-repositories.com/f/artificial-intelligence-ml/reasoning-models/chain-of-thought-enforcement.md) — Guides language models to generate intermediate reasoning steps for complex problem-solving. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/03TTScaling.html))
- [Reasoning Chain Training](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning-training-utilities/reasoning-chain-training.md) — Trains models to spontaneously learn extended reasoning chains using reinforcement learning. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/03TTScaling.html))
- [Training Throughput Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/training-throughput-optimization.md) — Maximizes GPU cluster utilization through mixed-parallelism and memory-efficient training techniques. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/06Future.html))
- [Model Architecture Innovations](https://awesome-repositories.com/f/artificial-intelligence-ml/artificial-intelligence-research/model-architecture-innovations.md) — Explains core Transformer and MoE architectures and innovations for text, image, video, and speech. ([source](https://infrasys-ai.github.io/aiinfra-docs))
- [Block-Wise Attention](https://awesome-repositories.com/f/artificial-intelligence-ml/attention-mechanisms/flash-attention-implementations/block-wise-attention.md) — Processes attention in smaller sequential blocks to reduce memory use from quadratic to linear in sequence length.
- [Hardware Utilization Analyzers](https://awesome-repositories.com/f/artificial-intelligence-ml/batch-sequence-training/hardware-utilization-analyzers.md) — Analyzes how batch sizes and sequence lengths affect hardware FLOPs utilization. ([source](https://infrasys-ai.github.io/aiinfra-docs/01AICluster04Performance/CODE03MFU.html))
- [Reward Modeling](https://awesome-repositories.com/f/artificial-intelligence-ml/custom-model-training/reward-modeling.md) — Learns a scalar scoring function from ranked human judgments for preference alignment. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train04PostTrainRL/Code01InstructGPT.html))
- [Flexible Scoring Paradigms](https://awesome-repositories.com/f/artificial-intelligence-ml/custom-model-training/reward-modeling/flexible-scoring-paradigms.md) — Combines multiple reward generation paradigms with scoring patterns for flexible response evaluation. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/03TTScaling.html))
- [Inference Scaling Aggregators](https://awesome-repositories.com/f/artificial-intelligence-ml/custom-model-training/reward-modeling/inference-scaling-aggregators.md) — Aggregates multiple sampled critiques from a generative reward model for more robust evaluations. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/03TTScaling.html))
- [Direct Preference Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/custom-model-training/reward-modeling/preference-alignment/direct-preference-optimization.md) — Implements Direct Preference Optimization for aligning models without a separate reward model. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train04PostTrainRL/Code02DPOPPO.html))
- [GPU-Accelerated Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/data-preparation/gpu-accelerated-pipelines.md) — Processes raw data through cleaning, deduplication, and tokenization with GPU acceleration. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/04TrainingStack.html))
- [Distributed Memory Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-memory-optimizers.md) — Partitions optimizer states across GPUs to reduce per-device memory during distributed training. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train02ParallelAdv/Code01ZeRO.html))
- [Performance Comparisons](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training-frameworks/distributed-training/performance-comparisons.md) — Compares training performance between single-device and distributed setups with speedup metrics. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train01ParallelBegin/Code01DDP.html))
- [Embedding Table Sharding](https://awesome-repositories.com/f/artificial-intelligence-ml/embedding-adaptation-utilities/vocabulary-embedding-adapters/embedding-table-sharding.md) — Implements vocabulary embedding table sharding across devices with masked All-Reduce. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train02ParallelAdv/Code02Megatron.html))
- [Sequential Model Splitting](https://awesome-repositories.com/f/artificial-intelligence-ml/foundation-model-pipelines/sequential-model-splitting.md) — Implements sequential model splitting to distribute layers across multiple devices. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train01ParallelBegin/Code02MP.html))
- [Expert Routing Gates](https://awesome-repositories.com/f/artificial-intelligence-ml/gated-sequence-models/expert-routing-gates.md) — Implements learned gating functions for routing inputs to experts in MoE models. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train02ParallelAdv/Code04Expert.html))
- [Research Report Drafting](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/generative-ai/generative-text-inference/iterative-refinement-generation/research-report-drafting.md) — Retrieves high-quality information in a loop to iteratively draft and refine research reports. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/03TTScaling.html))
- [Gradient Sharding Strategies](https://awesome-repositories.com/f/artificial-intelligence-ml/gradient-computation/gradient-aggregators/gradient-sharding-strategies.md) — Implements gradient sharding via reduce-scatter to reduce per-GPU memory during distributed training. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train02ParallelAdv/Code01ZeRO.html))
- [Group Relative Policy Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/group-relative-policy-optimization.md) — Updates language models by comparing a group of sampled responses to the same prompt using group-relative advantage estimates.
- [Instruction Tuning Datasets](https://awesome-repositories.com/f/artificial-intelligence-ml/instruction-tuning-datasets.md) — Formats training data as structured instruction-input-output triples for supervised fine-tuning. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train05FineTune/Code01Qwen3SFT.html))
- [Kernel Fusion Compilers](https://awesome-repositories.com/f/artificial-intelligence-ml/kernel-fusion-compilers.md) — Applies graph fusion, kernel tuning, and kernel fusion to reduce data movement during inference. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/05InferStack.html))
- [Activation and KV Cache Offloaders](https://awesome-repositories.com/f/artificial-intelligence-ml/kv-cache-optimizations/activation-and-kv-cache-offloaders.md) — Moves intermediate activations to slower storage and spreads KV cache across nodes for large-scale clusters. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/05InferStack.html))
- [Weight Expansion Strategies](https://awesome-repositories.com/f/artificial-intelligence-ml/large-scale-model-training/weight-expansion-strategies.md) — Provides weight expansion techniques for progressive model growth without retraining from scratch. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train02ParallelAdv/Code04Expert.html))
- [Petabyte-Scale Data Stores](https://awesome-repositories.com/f/artificial-intelligence-ml/large-scale-training/petabyte-scale-data-stores.md) — Stores and processes petabytes of diverse data using distributed file systems and object storage. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/04TrainingStack.html))
- [Rank Configuration Comparisons](https://awesome-repositories.com/f/artificial-intelligence-ml/lora-training/rank-configuration-comparisons.md) — Compares LoRA rank configurations to analyze trade-offs in parameter count and style fidelity. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train05FineTune/Code02SDLoRA.html))
- [Transformer Architectures](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/frameworks/model-construction/transformer-architectures.md) — Explores Transformer and MoE architectures, multimodal models, and prompt engineering for large models. ([source](https://cdn.jsdelivr.net/gh/infrasys-ai/aiinfra@main/README.md))
- [Text Generation Environments](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/frameworks/reinforcement-learning-environments/text-generation-environments.md) — Creates simulated text generation environments for scoring and guiding reinforcement learning. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train04PostTrainRL/Code03GRPO.html))
- [FP8 Training Implementations](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/distributed-and-accelerated-compute/training-acceleration-tools/mixed-precision-training/fp8-training-implementations.md) — Implements FP8 mixed precision training to accelerate computations and reduce memory usage. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train03TrainAcceler/Code03FP8.html))
- [Precision Comparison Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/distributed-and-accelerated-compute/training-acceleration-tools/mixed-precision-training/precision-comparison-frameworks.md) — Compares FP8 and FP32 training to measure differences in convergence, speed, and memory. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train03TrainAcceler/Code03FP8.html))
- [Group-Relative Advantage Calculators](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/utilities/gradient-optimization-techniques/policy-gradient-methods/advantage-estimation/group-relative-advantage-calculators.md) — Calculates group-relative advantages for policy updates, reducing bias and variance. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train04PostTrainRL/Code03GRPO.html))
- [Expert Device Placement](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-customization/mixture-of-experts/expert-selection-analysis/expert-device-placement.md) — Places MoE expert subnetworks across separate devices to exceed single-device memory limits. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train02ParallelAdv/Code04Expert.html))
- [Expert Load Balancers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-customization/mixture-of-experts/expert-selection-analysis/expert-load-balancers.md) — Selects which expert sub-models to activate per request and balances load across experts. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/05InferStack.html))
- [Fine-Tuning Benchmarking](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-fine-tuning/fine-tuning-benchmarking.md) — Evaluates fine-tuned models using perplexity and task-specific metrics on held-out test sets. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train05FineTune/Code01Qwen3SFT.html))
- [MLP Layer Splitting](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/training-frameworks/model-training-pipelines/model-parallelism/mlp-layer-splitting.md) — Implements column-parallel and row-parallel MLP layer splitting across GPUs. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train02ParallelAdv/Code02Megatron.html))
- [Parallel Transformer Assemblies](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/training-frameworks/model-training-pipelines/model-parallelism/parallel-transformer-assemblies.md) — Assembles complete parallel transformer models running across multiple GPUs. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train02ParallelAdv/Code02Megatron.html))
- [Rapid Retraining Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/training-frameworks/training-and-evaluation-pipelines/rapid-retraining-pipelines.md) — Accelerates model refinement with low-latency checkpoint transfers and experiment tracking. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/04TrainingStack.html))
- [Block-Wise Attention Processors](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization/inference-deployment/attention-backends/block-wise-attention-processors.md) — Breaks attention computation into smaller sequential blocks to avoid storing the full attention matrix. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train03TrainAcceler/Code01FlashAtten.html))
- [Scaling Law Predictors](https://awesome-repositories.com/f/artificial-intelligence-ml/model-predictions/scaling-law-predictors.md) — Predicts model loss improvements using power-law scaling relationships for size and data. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/02StandardScaling.html))
- [LoRA-Weighted Image Generations](https://awesome-repositories.com/f/artificial-intelligence-ml/model-weight-management/lora-adapter-loaders/lora-weighted-image-generations.md) — Generates images using LoRA-adapted weights and compares results with the original model. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train05FineTune/Code02SDLoRA.html))
- [Head Distribution Strategies](https://awesome-repositories.com/f/artificial-intelligence-ml/multi-head-attention-mechanisms/head-distribution-strategies.md) — Implements attention head distribution across GPUs for parallel computation. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train02ParallelAdv/Code02Megatron.html))
- [Efficiency Benchmarks](https://awesome-repositories.com/f/artificial-intelligence-ml/parameter-efficient-fine-tuning/efficiency-benchmarks.md) — Ships benchmarking utilities to compare computational costs of different fine-tuning approaches. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train05FineTune/Code01Qwen3SFT.html))
- [Preference Alignment Datasets](https://awesome-repositories.com/f/artificial-intelligence-ml/preference-alignment-datasets.md) — Creates synthetic pairs of preferred and dispreferred responses for preference alignment. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train04PostTrainRL/Code02DPOPPO.html))
- [Reasoning Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/reasoning-models/reasoning-optimization.md) — Generates multiple reasoning paths and selects the best using reward models and search. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/03TTScaling.html))
- [RLHF Alignment Algorithms](https://awesome-repositories.com/f/artificial-intelligence-ml/rlhf-alignment-algorithms.md) — Uses reward model scores to update policy via PPO for improved output quality. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train04PostTrainRL/Code01InstructGPT.html))
- [Prefill-Decode Disaggregation](https://awesome-repositories.com/f/artificial-intelligence-ml/sequence-decoding-models/sequence-decoders/prefill-decode-disaggregation.md) — Implements separation of prefill and decode phases to avoid resource contention. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/05InferStack.html))
- [Sparse Model Architectures](https://awesome-repositories.com/f/artificial-intelligence-ml/sparse-model-architectures.md) — Supports Transformer variants, mixture-of-experts, and compression techniques for sparse networks. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/04TrainingStack.html))
- [Dense Model FLOPs Estimators](https://awesome-repositories.com/f/artificial-intelligence-ml/throughput-estimation/moe-training-flops-estimators/dense-model-flops-estimators.md) — Estimates the computational cost of dense Transformer models for training analysis. ([source](https://infrasys-ai.github.io/aiinfra-docs/01AICluster04Performance/CODE01Modeling.html))
- [Hardware Peak Utilization Ratios](https://awesome-repositories.com/f/artificial-intelligence-ml/throughput-estimation/moe-training-flops-estimators/hardware-peak-utilization-ratios.md) — Measures model FLOPs utilization against theoretical hardware peaks for training iterations. ([source](https://infrasys-ai.github.io/aiinfra-docs/01AICluster04Performance/CODE03MFU.html))
- [MoE Transformer FLOPs Calculators](https://awesome-repositories.com/f/artificial-intelligence-ml/throughput-estimation/moe-training-flops-estimators/moe-transformer-flops-calculators.md) — Calculates FLOPs for MoE Transformer models to support performance analysis. ([source](https://infrasys-ai.github.io/aiinfra-docs/01AICluster04Performance/CODE03MFU.html))
- [Transformer Architectures](https://awesome-repositories.com/f/artificial-intelligence-ml/transformer-architectures.md) — Explains Transformer and MoE architectures and their use in multimodal vision and language models. ([source](https://infrasys-ai.github.io/aiinfra-docs/))
- [Transformer FLOPs Calculators](https://awesome-repositories.com/f/artificial-intelligence-ml/vector-embeddings/dense-embeddings/transformer-flops-calculators.md) — Calculates FLOPs for dense Transformer models to support performance analysis. ([source](https://infrasys-ai.github.io/aiinfra-docs/01AICluster04Performance/CODE03MFU.html))
- [Loss Scaling Techniques](https://awesome-repositories.com/f/artificial-intelligence-ml/vector-embeddings/embedding-layer-initialization/gradient-stabilization/loss-scaling-techniques.md) — Provides gradient scaling techniques for numerical stability in low-precision training. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train03TrainAcceler/Code03FP8.html))
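
The group-relative advantage entries in the list above can be illustrated with a minimal sketch. It assumes the commonly described formulation in which each sampled response's reward is normalized by the mean and standard deviation of its own group. The function name, the `eps` parameter, and the reward values are hypothetical and are not taken from the repository's notebooks.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per sampled response to the same prompt.
    `eps` (an assumed safeguard) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for one prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that score above the group mean receive positive advantages and the rest receive negative ones, so no separate value network is needed to form a baseline.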

### Part of an Awesome List

- [Hardware Optimized Inference](https://awesome-repositories.com/f/awesome-lists/ai/hardware-optimized-inference.md) — Provides tools and frameworks for running large language models on specific hardware like CPUs and NPUs. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/05InferStack.html))
- [Model Evaluation and Benchmarking](https://awesome-repositories.com/f/awesome-lists/ai/model-evaluation-and-benchmarking.md) — Runs standardized benchmarks across reasoning, generation, and safety dimensions to measure model performance. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train06VerifValid/Code01OpenCompass.html))
- [Low-Latency Serving Techniques](https://awesome-repositories.com/f/awesome-lists/ai/model-serving-deployment/low-latency-serving-techniques.md) — Deploys models with dynamic batching and quantization for low-latency responses. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/06Future.html))
- [Virtual Token Insertions](https://awesome-repositories.com/f/awesome-lists/ai/model-training-and-fine-tuning/frozen-encoder-fine-tunings/virtual-token-insertions.md) — Inserts trainable virtual tokens at the input layer for rapid model adaptation with minimal data. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train05FineTune/Code01Qwen3SFT.html))
- [AI Cluster Topologies](https://awesome-repositories.com/f/awesome-lists/devtools/networking-and-storage/ai-cluster-topologies.md) — Covers network topology, interconnect protocols, and storage design for AI clusters. ([source](https://cdn.jsdelivr.net/gh/infrasys-ai/aiinfra@main/README.md))
- [Kernel-Level Optimizations](https://awesome-repositories.com/f/awesome-lists/ai/attention-optimization/kernel-level-optimizations.md) — Ships optimized kernels for accelerating core matrix and attention operations. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/05InferStack.html))
- [Large Model Application Surveys](https://awesome-repositories.com/f/awesome-lists/ai/real-world-applications/large-model-application-surveys.md) — Surveys application scenarios, AI agents, RAG, autonomous driving, embodied AI, and ethical challenges. ([source](https://infrasys-ai.github.io/aiinfra-docs))
- [Token Generation Optimizations](https://awesome-repositories.com/f/awesome-lists/ai/sampling-and-optimization/token-generation-optimizations.md) — Provides hardware-accelerated sampling and streaming output for reduced latency. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/05InferStack.html))
- [Proximal Policy Optimization Alignment](https://awesome-repositories.com/f/awesome-lists/ai/training-and-alignment/proximal-policy-optimization-alignment.md) — Applies PPO with reward models for stable policy updates during alignment. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train04PostTrainRL/Code02DPOPPO.html))
- [Large Model System Surveys](https://awesome-repositories.com/f/awesome-lists/devtools/research-and-surveys/large-model-system-surveys.md) — Surveys the full stack of large model system design including scaling laws and pipelines. ([source](https://cdn.jsdelivr.net/gh/infrasys-ai/aiinfra@main/README.md))

### Content Management & Publishing

- [AI Infrastructure Curricula](https://awesome-repositories.com/f/content-management-publishing/content-management-systems/educational-curriculum-platforms/ai-infrastructure-curricula.md) — Teaches the full lifecycle of large-model systems from hardware to deployment.

### Data & Databases

- [Collective GPU Communication](https://awesome-repositories.com/f/data-databases/collective-gpu-communication.md) — Orchestrates high-bandwidth data transfers across GPUs using ring all-reduce and in-network computing.
- [Compute Cluster Orchestration](https://awesome-repositories.com/f/data-databases/compute-cluster-orchestration.md) — Controls the lifecycle and configuration of remote compute clusters for scalable data processing. ([source](https://infrasys-ai.github.io/aiinfra-docs/))
- [Training Memory Optimizers](https://awesome-repositories.com/f/data-databases/memory-optimization-strategies/training-memory-optimizers.md) — Provides comprehensive training memory optimization through state partitioning and offloading. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/04TrainingStack.html))
- [Inter-Device Activation Transfers](https://awesome-repositories.com/f/data-databases/data-synchronization/cross-device-synchronization-engines/cross-device-operation-execution/gpu-device-synchronization/inter-device-activation-transfers.md) — Manages activation and gradient transfers between GPUs for pipeline-parallel training. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train01ParallelBegin/Code02MP.html))
- [SmartNIC Offload Engines](https://awesome-repositories.com/f/data-databases/hardware-acceleration/smartnic-offload-engines.md) — Offloads network processing and data preprocessing to SmartNIC hardware for reduced latency. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/05InferStack.html))
- [Tiered Memory Management](https://awesome-repositories.com/f/data-databases/large-scale-dataset-management/tiered-memory-management.md) — Stores model weights, KV cache, and activations across HBM, DRAM, NVMe, and object storage. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/05InferStack.html))
- [Activation Memory Profilers](https://awesome-repositories.com/f/data-databases/memory-optimization-strategies/training-memory-optimizers/activation-memory-profilers.md) — Provides activation memory estimation tools critical for understanding training memory bottlenecks. ([source](https://infrasys-ai.github.io/aiinfra-docs/01AICluster04Performance/CODE01Modeling.html))
- [Per-Component Memory Profilers](https://awesome-repositories.com/f/data-databases/memory-optimization-strategies/training-memory-optimizers/per-component-memory-profilers.md) — Provides detailed per-component GPU memory analysis for training steps. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train02ParallelAdv/Code01ZeRO.html))
- [Multi-Modal Preprocessing Pipelines](https://awesome-repositories.com/f/data-databases/multi-modal-data-management/multi-modal-preprocessing-pipelines.md) — Tokenizes text, resizes images, and extracts audio features with hardware acceleration. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/05InferStack.html))
- [Distributed Matrix Multiplications](https://awesome-repositories.com/f/data-databases/parallel-matrix-operations/distributed-matrix-multiplications.md) — Implements distributed matrix multiplication with column/row partitioning across GPUs. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train02ParallelAdv/Code02Megatron.html))
- [Dynamic Inference Batching](https://awesome-repositories.com/f/data-databases/request-batching/dynamic-inference-batching.md) — Combines short requests into batches and splits long sequences across GPUs for balanced throughput. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/05InferStack.html))
- [AI Memory Tiering](https://awesome-repositories.com/f/data-databases/storage-tiering/ai-memory-tiering.md) — Stores model weights, KV cache, and activations across HBM, DRAM, NVMe, and object storage.
- [Ring Attention Distributors](https://awesome-repositories.com/f/data-databases/text-processing-pipelines/long-context-sequence-processors/ring-attention-distributors.md) — Splits long sequences into blocks distributed across devices in a ring topology for memory-efficient attention. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train03TrainAcceler/Code04RingAttn.html))
- [Tiled Attention Memory Reducers](https://awesome-repositories.com/f/data-databases/text-processing-pipelines/long-context-sequence-processors/tiled-attention-memory-reducers.md) — Lowers peak attention memory from quadratic to linear in sequence length by processing attention in tiled chunks. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train03TrainAcceler/Code01FlashAtten.html))
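
The tiled attention entry above can be sketched for a single query vector. This minimal pure-Python example processes keys and values one tile at a time with a running (online) softmax, so the full score row is never stored. It is an illustration of the general idea only, not the linked notebook's code; the function names and `tile_size` are hypothetical.

```python
import math


def _scaled_dot(q, k):
    """Dot product scaled by 1/sqrt(d), as in standard attention."""
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


def attention_reference(q, keys, values):
    """Plain softmax attention for one query; stores every score."""
    scores = [_scaled_dot(q, k) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def attention_tiled(q, keys, values, tile_size=2):
    """Same result, but only one tile of scores is alive at a time."""
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), tile_size):
        tile_k = keys[start:start + tile_size]
        tile_v = values[start:start + tile_size]
        scores = [_scaled_dot(q, k) for k in tile_k]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated under the old maximum.
        # On the first tile this factor is exp(-inf) == 0.0.
        rescale = math.exp(running_max - new_max)
        running_sum *= rescale
        acc = [a * rescale for a in acc]
        for s, v in zip(scores, tile_v):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Both functions should agree up to floating-point rounding; the tiled version trades a few extra multiplications for not materializing the whole score row.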

### DevOps & Infrastructure

- [Kubernetes Cluster Setups](https://awesome-repositories.com/f/devops-infrastructure/ai-deployment-containers/kubernetes-cluster-setups.md) — Guides hands-on setup of Docker and Kubernetes clusters for running AI training and inference jobs. ([source](https://infrasys-ai.github.io/aiinfra-docs))
- [AI Infrastructure Deployments](https://awesome-repositories.com/f/devops-infrastructure/ai-infrastructure-deployments.md) — Automates setups for launching AI model interfaces and training pipelines on GPU-accelerated cloud hardware. ([source](https://infrasys-ai.github.io/aiinfra-docs/))
- [Vertically Integrated Stacks](https://awesome-repositories.com/f/devops-infrastructure/ai-infrastructure/vertically-integrated-stacks.md) — Designs vertically integrated compute, storage, and networking systems for large-scale AI. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/06Future.html))
- [Cluster Architecture Designs](https://awesome-repositories.com/f/devops-infrastructure/cloud-infrastructure-management/gpu-training-clusters/cluster-architecture-designs.md) — Explains cluster architecture, performance modeling, and GPU/NPU precision debugging for large-scale AI. ([source](https://cdn.jsdelivr.net/gh/infrasys-ai/aiinfra@main/README.md))
- [Cluster Design Manuals](https://awesome-repositories.com/f/devops-infrastructure/cloud-infrastructure-management/gpu-training-clusters/cluster-design-manuals.md) — Provides a manual explaining cluster architecture and performance modeling for AI training.
- [AI Workload Network and Storage Configurations](https://awesome-repositories.com/f/devops-infrastructure/cloud-infrastructure/networking-connectivity/software-defined-networking-services/storage-networking/ai-workload-network-and-storage-configurations.md) — Configures high-speed network topologies and multi-tier storage systems to support distributed AI workloads. ([source](https://infrasys-ai.github.io/aiinfra-docs/))
- [Collective Communication and Storage Architectures](https://awesome-repositories.com/f/devops-infrastructure/cloud-infrastructure/networking-connectivity/software-defined-networking-services/storage-networking/collective-communication-and-storage-architectures.md) — Covers communication primitives, collective communication libraries, and storage architectures for AI clusters. ([source](https://infrasys-ai.github.io/aiinfra-docs))
- [Cloud Native GPU Orchestration](https://awesome-repositories.com/f/devops-infrastructure/cloud-native-orchestration/cloud-native-gpu-orchestration.md) — Provides a practical walkthrough of container and Kubernetes orchestration for AI workloads.
- [GPU Cluster Communications](https://awesome-repositories.com/f/devops-infrastructure/cluster-node-management/gpu-cluster-communications.md) — Orchestrates high-bandwidth, low-latency data transfers across thousands of GPUs using ring all-reduce. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/06Future.html))
- [AI Workload Schedulers](https://awesome-repositories.com/f/devops-infrastructure/control-planes/workload-scheduling/ai-workload-schedulers.md) — Dynamically allocates compute, storage, and networking resources for AI training and inference. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/06Future.html))
- [Dataset Partitioning Strategies](https://awesome-repositories.com/f/devops-infrastructure/distributed-task-workers/dataset-partitioning-strategies.md) — Provides distributed samplers that partition datasets across workers for non-overlapping training. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train01ParallelBegin/Code01DDP.html))
- [GPU Cluster Job Schedulers](https://awesome-repositories.com/f/devops-infrastructure/job-scheduling/gpu-cluster-job-schedulers.md) — Orchestrates resource allocation, task assignment, and fault recovery across GPU clusters for distributed training. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/04TrainingStack.html))
- [Multi-Thousand GPU Cluster Constructions](https://awesome-repositories.com/f/devops-infrastructure/multi-gpu-deployment/distributed-inference-clusters/multi-thousand-gpu-cluster-constructions.md) — Explains how to design and operate L0/L1 infrastructure and network topologies for multi-thousand-GPU clusters. ([source](https://infrasys-ai.github.io/aiinfra-docs))
- [Process Group Initializations](https://awesome-repositories.com/f/devops-infrastructure/process-grouping-utilities/training-process-synchronization/process-group-initializations.md) — Initializes distributed process groups with communication backends and unique ranks for training. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train01ParallelBegin/Code01DDP.html))
- [GPU and Interconnect Provisioning](https://awesome-repositories.com/f/devops-infrastructure/storage-provisioning/gpu-and-interconnect-provisioning.md) — Deploys a coordinated stack of GPUs, tiered storage, and high-bandwidth interconnects for large-scale training. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/04TrainingStack.html))
- [Energy-Aware Schedulers](https://awesome-repositories.com/f/devops-infrastructure/scalability-management/energy-aware-schedulers.md) — Mitigates compute gaps and utilization bottlenecks through energy-aware scheduling. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/04TrainingStack.html))
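
To make the dataset partitioning entry above concrete, here is a minimal sketch of strided index sharding across workers. It loosely mirrors how distributed samplers commonly behave (padding by wrapping around so every rank gets the same number of samples), but `shard_indices` is a hypothetical helper and not the API used in the linked notebook.

```python
import math


def shard_indices(num_samples: int, rank: int, world_size: int) -> list[int]:
    """Return the sample indices assigned to one worker.

    Indices are dealt out round-robin. When num_samples is not divisible
    by world_size, the tail wraps around to the start so that every rank
    receives the same count; those wrapped samples are duplicates.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    per_rank = math.ceil(num_samples / world_size)
    total = per_rank * world_size
    return [i % num_samples for i in range(rank, total, world_size)]


if __name__ == "__main__":
    for r in range(4):
        print(r, shard_indices(10, r, 4))
```

Shards are strictly non-overlapping only when the sample count divides evenly by the number of workers; otherwise the padding introduces a few repeated samples.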

### Education & Learning Resources

- [AI Infrastructure Curricula](https://awesome-repositories.com/f/education-learning-resources/curricula-instructional-design/educational-frameworks-architectures/curriculum-design-patterns/ai-infrastructure-curricula.md) — Designs a full-stack curriculum teaching the entire lifecycle of large-model systems. ([source](https://infrasys-ai.github.io/aiinfra-docs))

### Networking & Communication

- [AI Cluster Interconnects](https://awesome-repositories.com/f/networking-communication/cluster-network-orchestration/ai-cluster-interconnects.md) — Designs and operates compute clusters with high-speed interconnects for AI workloads.
- [Distributed Parameter Sharding](https://awesome-repositories.com/f/networking-communication/distributed-systems-p2p/distributed-computing/model-parallelism-techniques/distributed-parameter-sharding.md) — Implements distributed parameter sharding to partition model tensors across multiple GPUs. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train02ParallelAdv/Code01ZeRO.html))

### Operating Systems & Systems Programming

- [ZeRO Stage Memory Savings Comparisons](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/process-and-memory-management/memory-management/allocation-strategies/dynamic-memory-allocation/gpu-memory-allocators/stage-based-memory-managers/zero-stage-memory-savings-comparisons.md) — Runs DDP, ZeRO-1, ZeRO-2, and ZeRO-3 on the same model and reports per-GPU peak memory and theoretical savings for each stage. ([source](https://infrasys-ai.github.io/aiinfra-docs/04Train02ParallelAdv/Code01ZeRO.html))
- [Paged KV Cache Management](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/process-and-memory-management/memory-management/buffer-and-cache-management/paged-kv-cache-management.md) — Stores and retrieves key-value cache in non-contiguous pages with tiered migration for long sequences. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/05InferStack.html))
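
The theoretical savings mentioned in the ZeRO entry above can be approximated with the model-state formulas from the ZeRO paper. The sketch assumes mixed-precision Adam (2 bytes per parameter for weights, 2 for gradients, and 12 for optimizer state) and ignores activations and buffers. `zero_model_state_bytes` is a hypothetical helper, not code from the linked notebook.

```python
def zero_model_state_bytes(num_params: float,
                           num_gpus: int,
                           stage: int,
                           optimizer_bytes_per_param: float = 12.0) -> float:
    """Per-GPU bytes for weights, gradients, and optimizer state.

    stage 0: plain data parallelism, everything replicated.
    stage 1: optimizer state partitioned across GPUs.
    stage 2: optimizer state and gradients partitioned.
    stage 3: optimizer state, gradients, and weights partitioned.
    """
    weights = 2.0 * num_params   # fp16 weights
    grads = 2.0 * num_params     # fp16 gradients
    optim = optimizer_bytes_per_param * num_params
    if stage == 0:
        return weights + grads + optim
    if stage == 1:
        return weights + grads + optim / num_gpus
    if stage == 2:
        return weights + (grads + optim) / num_gpus
    if stage == 3:
        return (weights + grads + optim) / num_gpus
    raise ValueError("stage must be 0, 1, 2, or 3")


if __name__ == "__main__":
    for s in range(4):
        gb = zero_model_state_bytes(7.5e9, 64, s) / 1e9
        print(f"stage {s}: {gb:.1f} GB per GPU")
```

Measured peak memory will be higher than these figures because activations, temporary buffers, and fragmentation are not modeled.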

### Business & Productivity Software

- [Compute Budget Allocators](https://awesome-repositories.com/f/business-productivity-software/budgeting-tools/compute-budget-allocators.md) — Allocates a fixed compute budget between model parameters and training tokens to minimize loss. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/02StandardScaling.html))
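
A minimal sketch of the compute budget split described above, under two stated assumptions: training compute is approximated as C ≈ 6·N·D, and the tokens-per-parameter ratio defaults to roughly 20, a commonly quoted compute-optimal rule of thumb. Both the ratio and the function name `allocate_budget` are illustrative and are not taken from the linked page.

```python
import math


def allocate_budget(compute_flops: float,
                    tokens_per_param: float = 20.0) -> tuple[float, float]:
    """Split a FLOP budget into (parameters, training tokens).

    Solves C = 6 * N * D with D = tokens_per_param * N, which gives
    N = sqrt(C / (6 * tokens_per_param)).
    """
    if compute_flops <= 0 or tokens_per_param <= 0:
        raise ValueError("inputs must be positive")
    params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


if __name__ == "__main__":
    n, d = allocate_budget(1.2e20)
    print(f"params ~ {n:.3g}, tokens ~ {d:.3g}")
```

Changing `tokens_per_param` shifts the balance toward a smaller model trained on more data or a larger model trained on less, while keeping the total budget fixed.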

### Scientific & Mathematical Computing

- [Low-Rank Adaptation](https://awesome-repositories.com/f/scientific-mathematical-computing/high-performance-execution-environments/scientific-computing-platforms/scientific-computing/matrix-operations/matrix-vector-products/low-rank-decompositions/low-rank-adaptation.md) — Inserts trainable low-rank matrices into frozen attention layers for efficient adaptation.

### Software Engineering & Architecture

- [Computational Efficiency](https://awesome-repositories.com/f/software-engineering-architecture/performance-reliability/performance-optimization/computational-efficiency.md) — Implements computational efficiency strategies for maximizing GPU utilization during model training. ([source](https://infrasys-ai.github.io/aiinfra-docs/00Summary/04TrainingStack.html))

### System Administration & Monitoring

- [Phase-Aware Schedulers](https://awesome-repositories.com/f/system-administration-monitoring/concurrency-management-systems/inference-batching-schedulers/phase-aware-schedulers.md) — Separates prefill and decode phases with dynamic batching and KV cache management.
