# blinkdl/rwkv-lm

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/blinkdl-rwkv-lm).**

14,568 stars · 1,008 forks · Python · Apache-2.0

## Links

- GitHub: https://github.com/BlinkDL/RWKV-LM
- awesome-repositories: https://awesome-repositories.com/repository/blinkdl-rwkv-lm.md

## Description

RWKV-LM is a framework for training and deploying recurrent language models. It uses a linear-time recurrent architecture that generates text and processes sequences with a fixed-size state, so memory and per-token compute stay constant as the context grows. This avoids the growing key-value cache and the quadratic attention cost of standard transformers.
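
To make the memory claim concrete, the following back-of-the-envelope sketch compares a transformer key-value cache, which grows with every token, against a fixed-size recurrent state. The model dimensions and the `state_vectors_per_layer` parameter are hypothetical values chosen for illustration, not figures taken from RWKV-LM.

```python
def kv_cache_bytes(n_tokens, n_layers, d_model, bytes_per_value=2):
    """Approximate KV-cache size: one key and one value vector
    of width d_model per token, per layer."""
    return 2 * n_layers * n_tokens * d_model * bytes_per_value


def recurrent_state_bytes(n_layers, d_model, state_vectors_per_layer=5,
                          bytes_per_value=2):
    """Approximate recurrent state size, assuming (hypothetically) a fixed
    number of d_model-wide vectors per layer. Independent of token count."""
    return n_layers * state_vectors_per_layer * d_model * bytes_per_value


if __name__ == "__main__":
    # Hypothetical model shape: 24 layers, width 2048, 16-bit values.
    for n_tokens in (1_000, 10_000, 100_000):
        cache = kv_cache_bytes(n_tokens, n_layers=24, d_model=2048)
        state = recurrent_state_bytes(n_layers=24, d_model=2048)
        print(f"{n_tokens:>7} tokens: cache={cache:>14,} B  state={state:,} B")
```

Under these assumptions the cache size scales linearly with the number of tokens, while the state size does not depend on it at all.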

The project implements a parallelizable training mechanism: the recurrent model is trained with transformer-like global operations over the whole sequence, while inference stays recurrent and cache-free. It includes state tuning, which optimizes the initial hidden state, and uses adaptive probability-mass sampling to control token diversity during generation.
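
The idea that one model can be evaluated either in parallel or step by step can be illustrated with a toy exponentially decayed sum. This is a deliberately simplified stand-in, not RWKV's actual formulation; the function names and the scalar `decay` parameter are assumptions made for the example.

```python
def parallel_form(xs, decay):
    """Compute every output directly from its prefix. Each position is
    independent of the others, so all of them could be evaluated at once."""
    return [
        sum(decay ** (t - i) * xs[i] for i in range(t + 1))
        for t in range(len(xs))
    ]


def recurrent_form(xs, decay):
    """Compute the same outputs one token at a time, carrying only a
    single running state between steps."""
    state = 0.0
    outputs = []
    for x in xs:
        state = decay * state + x
        outputs.append(state)
    return outputs


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    par = parallel_form(xs, decay=0.5)
    rec = recurrent_form(xs, decay=0.5)
    # Both forms agree up to floating-point rounding.
    assert all(abs(p - r) < 1e-9 for p, r in zip(par, rec))
    print(rec)
```

The parallel form suits training, where the whole sequence is known in advance; the recurrent form suits generation, where tokens arrive one at a time and only the state needs to be kept.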

The system covers the full lifecycle of large language model development, including recurrent model training, custom fine-tuning via datasets, and high-dimensional text embedding extraction.
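
The tags below describe fine-tuning data as JSONL, that is, one JSON object per line. A minimal sketch of writing and reading such a file follows; the `text` field name and the helper functions are assumptions for illustration, so check the upstream README for the exact schema the training scripts expect.

```python
import json
import os
import tempfile


def write_jsonl(path, texts):
    """Write one JSON object per line. The "text" key is an assumed schema."""
    with open(path, "w", encoding="utf-8") as f:
        for text in texts:
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the documents back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line)["text"] for line in f if line.strip()]


if __name__ == "__main__":
    docs = ["First training document.", "Second document,\nwith a newline."]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.jsonl")
        write_jsonl(path, docs)
        assert read_jsonl(path) == docs
```

Because `json.dumps` escapes embedded newlines, each document stays on a single physical line, which is what makes the format easy to stream.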

## Tags

### Artificial Intelligence & ML

- [Linear-Time Sequence Models](https://awesome-repositories.com/f/artificial-intelligence-ml/sequence-learning-models/linear-time-sequence-models.md) — Implements a linear-time recurrent architecture that processes sequences with constant memory and constant per-token compute.
- [Generative Text Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/generative-ai/generative-text-inference.md) — Produces token sequences via a recurrent architecture that supports theoretically unbounded context length without a KV cache. ([source](https://github.com/blinkdl/rwkv-lm#readme))
- [Large Language Models](https://awesome-repositories.com/f/artificial-intelligence-ml/large-language-models.md) — Implements a recurrent architecture for processing long sequences and generating text without KV-cache overhead.
- [Recurrent Parallel Training](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/frameworks/training-systems/model-training-engines/transformer-training-engines/recurrent-parallel-training.md) — Ships a parallelizable training mechanism that combines transformer-like global operations with recurrent inference properties.
- [Language Model Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/language-model-fine-tuning.md) — Provides specialized workflows for adapting pre-trained recurrent language models to specific domains using custom datasets.
- [Model Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-fine-tuning.md) — Allows adjusting pre-trained model weights using JSONL datasets to adapt to specific domains. ([source](https://github.com/blinkdl/rwkv-lm#readme))
- [Model Training Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/training-frameworks/model-training-frameworks.md) — Provides infrastructure for training custom recurrent language models using parallel training techniques. ([source](https://github.com/blinkdl/rwkv-lm#readme))
- [Memory-Efficient Deep Learning](https://awesome-repositories.com/f/artificial-intelligence-ml/memory-efficient-deep-learning.md) — Enables high-performance model deployment with constant memory usage and linear time complexity.
- [Recurrent Neural Network Training](https://awesome-repositories.com/f/artificial-intelligence-ml/recurrent-neural-network-training.md) — Provides a framework for training recurrent neural networks using parallel processing to match transformer performance.
- [Adaptive Probability Sampling](https://awesome-repositories.com/f/artificial-intelligence-ml/adaptive-probability-sampling.md) — Utilizes adaptive probability-mass sampling to dynamically filter token candidates and control generation diversity.
- [Text Embedding Extraction](https://awesome-repositories.com/f/artificial-intelligence-ml/feature-extraction/text-embedding-extraction.md) — Provides a mechanism to extract high-dimensional vector representations of text from the model's internal hidden states.
- [Hidden State Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/neural-network-optimizers/weight-optimization-utilities/hidden-state-tuning.md) — Includes state-tuning capabilities to optimize the initial recurrent hidden state for better performance.
- [Adaptive Top-A Sampling](https://awesome-repositories.com/f/artificial-intelligence-ml/nucleus-sampling/adaptive-top-a-sampling.md) — Implements a dynamic threshold filtering mechanism to balance token variety and quality. ([source](https://github.com/blinkdl/rwkv-lm#readme))
- [Precise Top-P-X Sampling](https://awesome-repositories.com/f/artificial-intelligence-ml/nucleus-sampling/precise-top-p-x-sampling.md) — Combines probability mass thresholds with a minimum floor to optimize the quality of generated tokens. ([source](https://github.com/blinkdl/rwkv-lm#readme))
- [State Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/performance-tuning/state-optimization.md) — Optimizes the initial recurrent hidden state to refine model performance at no additional inference cost. ([source](https://github.com/blinkdl/rwkv-lm#readme))
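
The two sampling tags above can be sketched as filters over a probability vector. The exact rules here are assumptions based on the descriptions in this list: the top-a filter drops tokens below `a * p_max**2`, and the top-p-x filter applies ordinary top-p but additionally keeps every token whose probability exceeds a floor `x`. The function names and default values are hypothetical, and the upstream implementation may differ.

```python
import random


def _renormalize(probs, keep):
    """Zero out tokens not in `keep` and rescale the rest to sum to 1."""
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]


def top_a_filter(probs, a=0.2):
    """Adaptive threshold: the cutoff scales with the square of the top
    probability, so a confident distribution is pruned more aggressively."""
    threshold = a * max(probs) ** 2
    keep = {i for i, p in enumerate(probs) if p >= threshold}
    return _renormalize(probs, keep)


def top_p_x_filter(probs, top_p=0.9, x=0.01):
    """Top-p with a floor: keep the highest-probability tokens until their
    mass reaches top_p, plus any token whose probability exceeds x."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = set(), 0.0
    for i in order:
        if mass < top_p or probs[i] > x:
            keep.add(i)
        mass += probs[i]
    return _renormalize(probs, keep)


def sample(probs, rng=random):
    """Draw one token index from a (filtered) distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.15, 0.04, 0.01]
    print(top_a_filter(probs, a=0.2))
    print(top_p_x_filter(probs, top_p=0.7, x=0.1))
    print(sample(top_p_x_filter(probs, top_p=0.7, x=0.1)))
```

With the example distribution, plain top-p at 0.7 would stop after the first two tokens, while the floor `x=0.1` also retains the third token because its probability is above the floor.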

### Operating Systems & Systems Programming

- [Cache-Free Inference](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/process-and-memory-management/memory-management/inference-cache-management/cache-free-inference.md) — Implements a recurrent architecture that generates text without the memory overhead of a traditional KV cache.

### Part of an Awesome List

- [Alternative Model Architectures](https://awesome-repositories.com/f/awesome-lists/ai/alternative-model-architectures.md) — RNN-based architecture designed for efficient transformer-era sequence modeling.
- [General Purpose Models](https://awesome-repositories.com/f/awesome-lists/ai/general-purpose-models.md) — RNN-based architecture that scales like a transformer for efficient inference.