# datawhalechina/so-large-lm

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/datawhalechina-so-large-lm).**

7,400 stars · 615 forks

## Links

- GitHub: https://github.com/datawhalechina/so-large-lm
- Homepage: https://datawhalechina.github.io/so-large-lm
- awesome-repositories: https://awesome-repositories.com/repository/datawhalechina-so-large-lm.md

## Description

This project is an educational curriculum that provides a structured learning path through the full lifecycle of large language models: theory, architecture, training, and deployment.

The curriculum includes specialized guides on transformer architecture, model training tutorials, and frameworks for designing autonomous agents. It also provides dedicated resources for studying model safety and ethics.

The material covers a wide range of technical capabilities, including distributed training strategies, parameter-efficient fine-tuning, and the implementation of retrieval-augmented generation. It further addresses data curation, content moderation, and the analysis of environmental and legal impacts associated with model development.

## Tags

### Education & Learning Resources

- [Large Language Model Curricula](https://awesome-repositories.com/f/education-learning-resources/deep-learning-curriculum/large-language-model-curricula.md) — Provides a structured educational program for learning the engineering and application of large language models.
- [LLM Training Modules](https://awesome-repositories.com/f/education-learning-resources/educational-resources/courses-training-certifications/software-engineering-training-courses/data-engineering-training/llm-training-modules.md) — Provides instructional modules on data preparation, tokenization, and distributed training strategies for large models.
- [LLM Education](https://awesome-repositories.com/f/education-learning-resources/llm-education.md) — Provides a comprehensive educational curriculum covering the theory, architecture, training, and deployment of large language models.
- [Transformer Architecture Walkthroughs](https://awesome-repositories.com/f/education-learning-resources/neural-network-tutorials/transformer-architecture-walkthroughs.md) — Ships a structured tutorial explaining transformer internals, attention mechanisms, and decoder-only architectures.

### Artificial Intelligence & ML

- [Autonomous Agent Designers](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-llm-frameworks/autonomous-agent-designers.md) — Provides design patterns and architectural frameworks for building autonomous systems that use LLMs as reasoning engines. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch13.md))
- [Agentic Reasoning Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-reasoning-frameworks.md) — Outlines architectural patterns that integrate internal reasoning loops with external tool execution.
- [AI Ethics and Fairness](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-ethics-and-fairness.md) — Examines social bias, hallucinations, copyright law, and ethical considerations in the development of LLMs.
- [LLM Safety and Ethics](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-ethics-and-fairness/llm-safety-and-ethics.md) — Includes a dedicated educational module examining social bias, hallucination, copyright, and environmental impact.
- [Attention Computations](https://awesome-repositories.com/f/artificial-intelligence-ml/attention-based-pair-representations/attention-computations.md) — Details the fundamental computation of matching queries against key-value pairs to produce weighted token combinations. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch03.md))
- [Adapter Layers](https://awesome-repositories.com/f/artificial-intelligence-ml/backbone-integrations/adapter-layers.md) — Explains the use of lightweight trainable adapter layers to specialize frozen pretrained models for specific tasks. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch07.md))
- [Bottleneck Layers](https://awesome-repositories.com/f/artificial-intelligence-ml/backbone-integrations/adapter-layers/bottleneck-layers.md) — Covers the implementation of bottleneck residual networks for efficient model adaptation. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch07.md))
- [Search-Based Reasoning Strategies](https://awesome-repositories.com/f/artificial-intelligence-ml/branching-reasoning-explorations/search-based-reasoning-strategies.md) — Explores multiple reasoning paths using search strategies to select the most promising route for problem solving. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch13.md))
- [Chain-of-Thought Prompting](https://awesome-repositories.com/f/artificial-intelligence-ml/chain-of-thought-prompting.md) — Covers chain-of-thought prompting techniques to enable complex reasoning through structured intermediate steps. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch13.md))
- [Attention Weight Injection](https://awesome-repositories.com/f/artificial-intelligence-ml/context-injection/runtime-context-injections/attention-weight-injection.md) — Teaches the implementation of learnable attention weights to capture structural dependencies. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch07.md))
- [Retrieval-Augmented Generation](https://awesome-repositories.com/f/artificial-intelligence-ml/conversational-interfaces/retrieval-augmented-generation.md) — Teaches how to ground language model responses by retrieving relevant documents from a search index. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch04.md))
- [Decoder Architectures](https://awesome-repositories.com/f/artificial-intelligence-ml/decoder-architectures.md) — Explains transformer structures utilizing causal attention mechanisms for autoregressive sequence generation.
- [Distributed Training](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training-frameworks/distributed-training.md) — Teaches data and model parallelism strategies to train large neural networks across multiple devices.
- [Prompt Embedding Adaptation](https://awesome-repositories.com/f/artificial-intelligence-ml/embedding-model-fine-tuning/prompt-embedding-adaptation.md) — Provides a guide on tuning model behavior by prepending learnable continuous token embeddings. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch07.md))
- [Encoder-Decoder Architectures](https://awesome-repositories.com/f/artificial-intelligence-ml/encoder-decoder-architectures.md) — Covers neural network designs that map input sequences to output sequences via intermediate representations. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch06.md))
- [External Tool Integrations](https://awesome-repositories.com/f/artificial-intelligence-ml/external-service-integrations/external-knowledge-integrators/external-tool-integrations.md) — Explains how to connect AI assistants to external utilities like search APIs and browser automation tools. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch13.md))
- [Full Parameter Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/full-parameter-fine-tuning.md) — Guides users through full-parameter fine-tuning workflows to optimize core weights and task-specific heads. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch07.md))
- [LLM Application Development](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/generative-ai/llm-application-development.md) — Guides the development of autonomous agents and retrieval-augmented generation systems using language models.
- [Human Preference Alignment](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-fine-tuning/quantized-fine-tuning/human-preference-alignment.md) — Provides resources on using human feedback and reinforcement learning to align model outputs with safety guidelines. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch07.md))
- [Byte Pair Encodings](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/language-tools/tokenization-algorithms/byte-pair-encodings.md) — Covers subword tokenization using iterative character pair merging to build model vocabularies.
- [Multi-Head Attention Mechanisms](https://awesome-repositories.com/f/artificial-intelligence-ml/multi-head-attention-mechanisms.md) — Provides detailed instruction on running parallel attention heads to capture diverse dependencies within text. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch03.md))
- [Text Tokenization](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-processing/text-tokenization.md) — Includes guides on transforming raw strings into sequences using BPE or Unigram tokenization models. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch03.md))
- [Contextual Embeddings](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-processing/word-embeddings/contextual-embeddings.md) — Explains how transformer architectures map token sequences into vectors that capture meaning based on surrounding context. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch03.md))
- [Parameter Efficient Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/parameter-efficient-fine-tuning.md) — Offers comprehensive tutorials on memory-efficient fine-tuning by updating only a small subset of model parameters. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch07.md))
- [Prefix Vector Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/prefix-vector-tuning.md) — Explains how to use learnable prefix vectors in attention layers for task-specific adaptation. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch07.md))
- [Prompt Embedding Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/prompt-embedding-tuning.md) — Details how to optimize model inputs using learnable continuous token embeddings. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch07.md))
- [Reinforcement Learning Alignment](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning-alignment.md) — Details reinforcement learning alignment techniques to reduce toxicity and improve human value adherence. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch07.md))
- [Representation Probing](https://awesome-repositories.com/f/artificial-intelligence-ml/representation-probing.md) — Provides tutorials on using lightweight prediction layers to probe learned features in frozen models. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch07.md))
- [Self-Attention Mechanisms](https://awesome-repositories.com/f/artificial-intelligence-ml/self-attention-mechanisms.md) — Explains the mechanism where each token in a sequence is compared to all others to exchange information. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch03.md))
- [Dynamic Plan Refinement](https://awesome-repositories.com/f/artificial-intelligence-ml/task-planning-systems/dynamic-plan-refinement.md) — Details processes for adjusting task subgoals and action sequences based on real-time environmental feedback. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch13.md))
- [Encoder-Only Classification Methods](https://awesome-repositories.com/f/artificial-intelligence-ml/text-classification/encoder-only-classification-methods.md) — Details bidirectional text classification using contextual embeddings for tasks like sentiment analysis. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch03.md))
- [Token Prediction](https://awesome-repositories.com/f/artificial-intelligence-ml/text-generation-strategies/token-prediction.md) — Details the specific token-level prediction mechanisms used to generate text sequences.
- [Transformer Architectures](https://awesome-repositories.com/f/artificial-intelligence-ml/transformer-architectures.md) — Provides detailed study of transformer internals, including attention mechanisms, tokenization, and decoder architectures.
- [Encoder-Decoder Training Methods](https://awesome-repositories.com/f/artificial-intelligence-ml/artificial-intelligence-tooling/encoder-decoder-model-integrations/encoder-decoder-training-methods.md) — Explains training methods for sequence-to-sequence encoder-decoder architectures. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch06.md))
- [Attention Mechanisms](https://awesome-repositories.com/f/artificial-intelligence-ml/attention-mechanisms.md) — Describes the mathematical implementation of the query-key-value attention mechanism.
- [Text Dataset Curators](https://awesome-repositories.com/f/artificial-intelligence-ml/dataset-management/evaluation-datasets/dataset-curation/text-dataset-curators.md) — Provides guidance on pipelines that filter, format, and deduplicate text data to create high-quality LLM training datasets. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch05.md))
- [Representational Harm Auditors](https://awesome-repositories.com/f/artificial-intelligence-ml/dataset-quality-analyzers/representational-harm-auditors.md) — Guides the analysis of training data for representational harms and demographic imbalances. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch05.md))
- [Environmental Impact Assessment](https://awesome-repositories.com/f/artificial-intelligence-ml/environmental-impact-assessment.md) — Includes tools and methods for quantifying the energy and carbon footprint of machine learning workloads. ([source](https://cdn.jsdelivr.net/gh/datawhalechina/so-large-lm@main/README.md))
- [Feed-Forward Network Layers](https://awesome-repositories.com/f/artificial-intelligence-ml/feed-forward-network-layers.md) — Describes the two-layer nonlinear transformations used to add representational capacity to each token. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch03.md))
- [One-Shot Learning](https://awesome-repositories.com/f/artificial-intelligence-ml/few-shot-learning-baselines/one-shot-learning.md) — Explains machine learning techniques for executing tasks based on a single training example. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch02.md))
- [Few-Shot Text Learning](https://awesome-repositories.com/f/artificial-intelligence-ml/few-shot-text-learning.md) — Covers techniques for text classification and entity extraction using minimal labeled examples in prompts. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch01.md))
- [Reliability Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-training/scalable-generative-ai-model-training/reliability-optimizations.md) — Explains how to implement automated error detection and scalable storage to minimize downtime during long training runs. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch14.md))
- [Toxicity Evaluation Metrics](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-model-evaluation/toxicity-evaluation-metrics.md) — Provides metrics for measuring the probability of toxic completions generated from specific prompts. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch10.md))
- [Multi-Node Training Scaling](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-model-deployments/multi-node-training-scaling.md) — Provides techniques for distributing model and data across multiple hardware nodes to scale training. ([source](https://cdn.jsdelivr.net/gh/datawhalechina/so-large-lm@main/README.md))
- [Data Parallelism](https://awesome-repositories.com/f/artificial-intelligence-ml/gradient-computation/distributed-gradient-synchronization/data-parallelism.md) — Explains how to split datasets across multiple devices and synchronize gradients using AllReduce. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch08.md))
- [Learning Rate Schedulers](https://awesome-repositories.com/f/artificial-intelligence-ml/learning-rate-schedulers.md) — Describes algorithms for dynamically adjusting learning rates and momentum to improve convergence. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch06.md))
- [LLM Operational Cost Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/llm-operational-cost-optimization.md) — Provides strategies to minimize financial and environmental expenses associated with training and running models. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch01.md))
- [LLM Training Cost Estimation](https://awesome-repositories.com/f/artificial-intelligence-ml/llm-training-cost-estimation.md) — Teaches how to calculate financial and energy costs associated with training and running large models. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch01.md))
- [Logical and Arithmetic Reasoning](https://awesome-repositories.com/f/artificial-intelligence-ml/logical-and-arithmetic-reasoning.md) — Covers the capability of models to solve non-linguistic logic and arithmetic problems. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch02.md))
- [Long-Form Text Generation](https://awesome-repositories.com/f/artificial-intelligence-ml/long-form-text-generation.md) — Teaches the capability to produce extended, coherent sequences of natural language text. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch02.md))
- [Perplexity Calculators](https://awesome-repositories.com/f/artificial-intelligence-ml/loss-function-utilities/average-loss-calculators/perplexity-calculators.md) — Includes utilities for calculating model perplexity using cross-entropy loss to assess predictive quality. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch02.md))
- [Sequential Text Processing Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/frameworks/model-construction/neural-network-layers/convolution-layers/layered-architectures/sequential-text-processing-pipelines.md) — Teaches how recurrent hidden states are computed using RNN, LSTM, or GRU cells for sequential text processing. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch03.md))
- [Mixed Precision Training](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/distributed-and-accelerated-compute/training-acceleration-tools/mixed-precision-training.md) — Covers techniques for using lower-bit precision formats to accelerate training and reduce memory usage. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch06.md))
- [Mixture of Experts](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-customization/mixture-of-experts.md) — Implements routing and recording of expert paths within Mixture-of-Experts architectures. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch04.md))
- [Expert Load Balancers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-customization/mixture-of-experts/expert-selection-analysis/expert-load-balancers.md) — Explains strategies for routing traffic across model experts to prevent computational bottlenecks. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch04.md))
- [Multi-Stage Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-fine-tuning/quantized-fine-tuning/domain-adaptation/multi-stage-pipelines.md) — Explains the progression from self-supervised pretraining through supervised fine-tuning to reinforcement learning from human feedback.
- [Model Parallelism](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/training-frameworks/model-training-pipelines/model-parallelism.md) — Covers strategies for partitioning model parameters across multiple GPUs to exceed single-device memory limits. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch08.md))
- [Masked Language Modeling](https://awesome-repositories.com/f/artificial-intelligence-ml/masked-language-modeling.md) — Explains pre-training techniques using masked language modeling to build bidirectional representations. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch06.md))
- [Model Auditing Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/model-auditing-tools.md) — Provides a framework for auditing model performance disparities across different demographic groups. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch09.md))
- [Decoder-Only Training Methods](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training/decoder-only-training-methods.md) — Covers training procedures for autoregressive decoder-only models using causal left-to-right context. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch06.md))
- [Encoder-Only Training Methods](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training/encoder-only-training-methods.md) — Provides guides on training bidirectional encoder-only models using masked language modeling. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch06.md))
- [N-Gram Language Models](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-processing/word-embeddings/skip-gram-model-architectures/n-gram-language-models.md) — Explains statistical n-gram models for predicting word sequences. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch01.md))
- [Long-Range Dependency Mechanisms](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-processing/word-embeddings/skip-gram-model-architectures/n-gram-language-models/long-range-dependency-mechanisms.md) — Covers techniques for capturing dependencies between distant tokens to overcome the limitations of n-gram models. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch01.md))
- [Stabilization Techniques](https://awesome-repositories.com/f/artificial-intelligence-ml/neural-network-architectures/stabilization-techniques.md) — Covers the use of residual connections and layer normalization to ensure stable neural network training. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch03.md))
- [Pipeline Parallelisms](https://awesome-repositories.com/f/artificial-intelligence-ml/neural-networks/model-training-pipelines/pipeline-parallelisms.md) — Describes the strategy of splitting a neural network into sequential stages assigned to different devices. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch08.md))
- [Adam Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/optimization-algorithms/adam-optimizers.md) — Provides educational material on Adam optimizers for stable gradient-based optimization. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch06.md))
- [Sinusoidal Encodings](https://awesome-repositories.com/f/artificial-intelligence-ml/positional-encoding-techniques/sinusoidal-encodings.md) — Explains the use of fixed sine and cosine functions to encode sequence order in transformer models.
- [Prompt-Based Language Switching](https://awesome-repositories.com/f/artificial-intelligence-ml/prompt-based-language-switching.md) — Provides guidance on changing output languages via instructions within the prompt context. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch02.md))
- [Prompt-Based Text Generation](https://awesome-repositories.com/f/artificial-intelligence-ml/prompt-based-text-generation.md) — Covers generating text sequences conditioned on input prompts using generative models. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch01.md))
- [Retrieval Augmented Generation Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/retrieval-augmented-generation-pipelines.md) — Describes the end-to-end pipeline for integrating external data retrieval with language model generation.
- [Sequence-to-Sequence Transformer Architectures](https://awesome-repositories.com/f/artificial-intelligence-ml/sequence-to-sequence-transformer-architectures.md) — Explains integrated transformer architectures that combine encoder and decoder components for translation. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch03.md))
- [Supervised Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/supervised-fine-tuning.md) — Provides tutorials on adapting pre-trained models using labeled instruction datasets for specific tasks. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch14.md))
- [Task Decompositions](https://awesome-repositories.com/f/artificial-intelligence-ml/task-decompositions.md) — Explains methods for breaking down complex objectives into a sequence of executable sub-tasks. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch13.md))
- [Decoding-Based Toxicity Reducers](https://awesome-repositories.com/f/artificial-intelligence-ml/toxic-behavior-removal/decoding-based-toxicity-reducers.md) — Details techniques for reducing toxic output via data-based training and decoding guidance. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch10.md))
- [Source Documenters](https://awesome-repositories.com/f/artificial-intelligence-ml/training-data-curators/source-documenters.md) — Offers resources on documenting the origin, composition, and curation process of datasets used for model training. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch05.md))
- [Training Memory Management](https://awesome-repositories.com/f/artificial-intelligence-ml/training-memory-management.md) — Teaches utilities for optimizing memory usage and handling hardware constraints during large-scale training. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch06.md))
- [Transformer Blocks](https://awesome-repositories.com/f/artificial-intelligence-ml/transformer-blocks.md) — Explains the interaction between self-attention and feed-forward layers within a transformer block. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch03.md))
- [Weight Initialization](https://awesome-repositories.com/f/artificial-intelligence-ml/weight-initialization.md) — Describes methods for setting initial neural network parameter values to improve stability. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch06.md))
- [Zero-Shot Generalization Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/zero-shot-generalization-tuning.md) — Covers strategies for improving zero-shot performance through diverse prompt format fine-tuning. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch07.md))
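
The Attention Computations entry in the list above describes matching a query against key-value pairs to produce a weighted combination. A minimal sketch of that idea follows, assuming the standard scaled dot-product form; the function names `softmax` and `attention` are illustrative and are not taken from the course material.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Single-query scaled dot-product attention over plain Python lists.

    query:  list of d floats
    keys:   list of n vectors, each with d floats
    values: list of n vectors, each with the same length
    """
    d = len(query)
    # Similarity of the query to every key, scaled by sqrt(d).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    # Weighted sum of the value vectors, one output coordinate at a time.
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


# Two identical keys receive equal weight, so the output is the mean of the values.
mixed = attention([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]], [[2.0, 0.0], [0.0, 2.0]])
```

Multi-head attention, also listed above, would run several such computations in parallel on different learned projections of the same input; that step is omitted here.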

### Part of an Awesome List

- [LLM Training and Optimization](https://awesome-repositories.com/f/awesome-lists/ai/llm-training-and-optimization.md) — Manages large-scale model development including distributed training and parameter-efficient fine-tuning.
- [Transformer Representation Analysis](https://awesome-repositories.com/f/awesome-lists/ai/representation-learning-and-analysis/interpretable-representation-analysis/transformer-representation-analysis.md) — Explains how to analyze internal transformer representations using linear and feed-forward probes. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch07.md))
- [Probability-Based Bias Detection](https://awesome-repositories.com/f/awesome-lists/ai/bias-and-fairness/bias-detection-loops/probability-based-bias-detection.md) — Covers the detection of social bias by analyzing probability differences across demographic groups. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch09.md))
- [Conditional Generation Tasks](https://awesome-repositories.com/f/awesome-lists/ai/downstream-vision-tasks/downstream-task-enhancers/conditional-generation-tasks.md) — Demonstrates the use of conditional generation for tasks like question answering and article creation. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch01.md))
- [Natural Language Understanding](https://awesome-repositories.com/f/awesome-lists/ai/natural-language-understanding.md) — Covers the use of encoder-only architectures for generating contextual embeddings used in classification tasks. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch03.md))
- [Poisoning Detections](https://awesome-repositories.com/f/awesome-lists/ai/poisoning-attack-implementations/poisoning-detections.md) — Covers the identification and mitigation of malicious training data used to corrupt model behavior. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch01.md))
- [From-Scratch Training](https://awesome-repositories.com/f/awesome-lists/ai/pre-trained-models/from-scratch-training.md) — Teaches the process of training foundational models from scratch without pre-existing weights. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch06.md))
- [Question Answering](https://awesome-repositories.com/f/awesome-lists/ai/question-answering.md) — Explains prompting strategies for generating answers using only the model's internal knowledge. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch02.md))
- [Training Carbon Footprint Estimation](https://awesome-repositories.com/f/awesome-lists/devops/cloud-carbon-management/training-carbon-footprint-estimation.md) — Provides methods to calculate the carbon footprint of training based on hardware, runtime, and energy sources. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch12.md))
- [AI and Machine Learning](https://awesome-repositories.com/f/awesome-lists/ai/ai-and-machine-learning.md) — Offers deep dives into the architecture of large language models.

### Data & Databases

- [Agent Memory Management](https://awesome-repositories.com/f/data-databases/session-management/agent-memory-management.md) — Teaches systems for storing and retrieving long-term memories and execution traces to maintain interaction continuity. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch13.md))
- [K-Nearest Neighbor Retrieval](https://awesome-repositories.com/f/data-databases/k-nearest-neighbor-retrieval.md) — Provides educational material on k-nearest neighbor retrieval for identifying similar sequences. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch04.md))
- [Optimizer State Compression](https://awesome-repositories.com/f/data-databases/memory-optimization-strategies/training-memory-optimizers/optimizer-state-compression.md) — Details the use of AdaFactor to reduce training memory by storing low-rank (factored) approximations of the optimizer's moment estimates. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch06.md))
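
As a rough illustration of the K-Nearest Neighbor Retrieval entry above, the sketch below ranks stored vectors by cosine similarity to a query and returns the top `k`. The similarity measure, the `(name, vector)` item layout, and the function names are assumptions made for this example; the course chapter may use a different distance or index structure.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn(query, items, k):
    """Return the names of the k items whose vectors are most similar to the query.

    items is a list of (name, vector) pairs; this is a brute-force scan,
    so the cost grows linearly with the number of stored items.
    """
    ranked = sorted(items, key=lambda item: cosine(query, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical toy index with three stored vectors.
index = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
nearest = knn([1.0, 0.1], index, k=2)
```

A production retrieval system would usually replace the linear scan with an approximate nearest-neighbor index, which is outside the scope of this sketch.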

### Security & Cryptography

- [AI Risk Assessments](https://awesome-repositories.com/f/security-cryptography/ai-risk-assessments.md) — Provides processes for evaluating the security posture and misinformation risks of AI systems. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch01.md))
- [Content Moderation](https://awesome-repositories.com/f/security-cryptography/content-moderation.md) — Covers the classification of user text as harmful or acceptable using trained moderation models. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch10.md))
- [Disinformation Detection](https://awesome-repositories.com/f/security-cryptography/content-moderation-filters/disinformation-detection.md) — Teaches how to identify false or misleading content by analyzing deceptive intent in text. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch10.md))
- [Few-Shot Harm Detection Models](https://awesome-repositories.com/f/security-cryptography/content-moderation/few-shot-harm-detection-models.md) — Covers identifying hate speech and incitement to violence using few-shot learning models. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch10.md))
- [Lifecycle Policies](https://awesome-repositories.com/f/security-cryptography/data-governance-policies/lifecycle-policies.md) — Establishes policies for the responsible creation, maintenance, and use of datasets throughout their entire lifecycle. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch05.md))
- [LLM Safety Enforcers](https://awesome-repositories.com/f/security-cryptography/request-authorization-enforcers/llm-safety-enforcers.md) — Provides resources for evaluating and enforcing controls against jailbreaks, PII leaks, and hallucinations. ([source](https://cdn.jsdelivr.net/gh/datawhalechina/so-large-lm@main/README.md))

### System Administration & Monitoring

- [Energy Consumption Analyzers](https://awesome-repositories.com/f/system-administration-monitoring/energy-management/energy-consumption-analyzers.md) — Provides utilities for measuring and reporting the power usage and carbon footprint of software workloads. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch12.md))
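
The energy and carbon entries in this catalog describe estimating a footprint from hardware, runtime, and energy source. One common back-of-the-envelope approach multiplies device power by runtime, scales by a data-center overhead factor (PUE), and converts to emissions with a grid carbon intensity. The sketch below assumes that approach; the function name, parameter names, and default values are hypothetical placeholders, not figures from the course.

```python
def training_emissions_kg(num_devices, device_power_watts, hours,
                          pue=1.0, grid_kg_co2_per_kwh=0.4):
    """Estimate CO2-equivalent emissions (kg) for a training run.

    num_devices:          number of accelerators used
    device_power_watts:   average power draw per device, in watts
    hours:                wall-clock training time
    pue:                  power usage effectiveness (placeholder default)
    grid_kg_co2_per_kwh:  carbon intensity of the electricity (placeholder default)
    """
    # Energy drawn by the devices, including facility overhead via PUE.
    energy_kwh = num_devices * (device_power_watts / 1000.0) * hours * pue
    return energy_kwh * grid_kg_co2_per_kwh


# Hypothetical run: 8 devices at 300 W each for 24 hours.
estimate = training_emissions_kg(8, 300, 24, pue=1.0, grid_kg_co2_per_kwh=0.5)
```

The result is only as good as its inputs: measured power draw and a region-specific carbon intensity matter far more than the arithmetic itself.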

### Testing & Quality Assurance

- [Dataset Contamination Detection](https://awesome-repositories.com/f/testing-quality-assurance/dataset-contamination-detection.md) — Teaches methods for identifying overlap between training data and evaluation benchmarks to ensure valid performance metrics. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch05.md))
- [Naturalness Scoring](https://awesome-repositories.com/f/testing-quality-assurance/response-quality-scoring/naturalness-scoring.md) — Explains how to use probability scores for token sequences to evaluate the naturalness of generated text. ([source](https://github.com/datawhalechina/so-large-lm/blob/main/docs/content/ch01.md))
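
The Naturalness Scoring entry above refers to scoring text with the probabilities a model assigns to its tokens, and the Perplexity Calculators entry earlier in this catalog names a related metric. A minimal sketch of perplexity computed from per-token probabilities follows; it assumes those probabilities have already been obtained from some language model, and the function name is illustrative.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence given the probability assigned to each token.

    Computed as exp of the average negative log-probability; lower values
    mean the model found the sequence more predictable.
    """
    if not token_probs:
        raise ValueError("token_probs must contain at least one probability")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)


# A model that assigns probability 0.25 to every token is as uncertain as a
# uniform choice among four options, so its perplexity is about 4.
score = perplexity([0.25, 0.25, 0.25, 0.25])
```

Because perplexity averages over tokens, it allows sequences of different lengths to be compared, which a raw product of probabilities does not.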