# QwenLM/Qwen

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/qwenlm-qwen).**

20,423 stars · 1,709 forks · Python · apache-2.0

## Links

- GitHub: https://github.com/QwenLM/Qwen
- awesome-repositories: https://awesome-repositories.com/repository/qwenlm-qwen.md

## Topics

`chinese` `flash-attention` `large-language-models` `llm` `natural-language-processing` `pretrained-models`

## Description

Qwen is a comprehensive framework for large language model development, serving, and deployment. It provides a complete ecosystem for transformer-based sequence modeling, offering base models alongside specialized tools for instruction-tuned alignment, fine-tuning, and long-context inference. The project is designed to support both research and production environments, enabling users to train, optimize, and host generative models locally or across distributed hardware.

The framework distinguishes itself through its focus on high-performance serving and extensibility. It features a high-performance inference engine that exposes OpenAI-compatible HTTP endpoints, allowing for integration into existing application architectures. To support complex workflows, it includes native capabilities for agentic tool use and function calling, which can be further refined through dedicated fine-tuning processes.

The platform covers a broad range of operational requirements, including model quantization, multi-device tensor parallelism, and memory-efficient key-value caching to optimize throughput and resource usage. It also provides robust utilities for benchmarking performance, managing system-level behaviors, and securing model endpoints through authentication and safety-aligned configurations.

The repository includes extensive documentation and scripts for model weight conversion, vocabulary expansion, and deployment across both CPU and GPU hardware.

## Tags

### Artificial Intelligence & ML

- [Large Language Models](https://awesome-repositories.com/f/artificial-intelligence-ml/large-language-models.md) — Provides base generative models trained on diverse datasets for reasoning, coding, and natural language tasks.
- [OpenAI-Compatible APIs](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/model-integration-serving/model-integration-interfaces/ai-integration-apis/openai-compatible-apis.md) — Provides local HTTP endpoints compatible with standard OpenAI API clients for seamless integration. ([source](https://github.com/QwenLM/Qwen#readme))
- [Sequence Learning Models](https://awesome-repositories.com/f/artificial-intelligence-ml/sequence-learning-models.md) — Processes input tokens through stacked attention layers to predict subsequent text based on learned statistical patterns.
- [Tool Calling](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/decoding-generation-controls/tool-calling.md) — Enables models to interpret natural language instructions and invoke external software tools for complex tasks. ([source](https://github.com/QwenLM/Qwen#readme))
- [LLM Fine-Tuning Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/frameworks/training-systems/model-training-engines/llm-fine-tuning-engines.md) — Includes specialized training tools and scripts for adapting model weights and vocabularies to specialized domains.
- [Large Language Model Fine-Tuning Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/integrated-development-platforms/machine-learning-platforms/large-language-model-fine-tuning-frameworks.md) — Provides a comprehensive framework for fine-tuning large language models on custom datasets to improve domain-specific performance.
- [Supervised Instruction Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/fine-tuning-and-alignment/supervised-instruction-fine-tuning.md) — Refines pretrained model weights using supervised datasets to ensure responses follow human intent and safety constraints.
- [High-Throughput Model Serving](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/inference-servers-and-runtimes/high-throughput-model-serving.md) — Wraps model execution in high-performance serving environments to increase throughput and reduce latency. ([source](https://github.com/QwenLM/Qwen/tree/main/examples))
- [Model Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-fine-tuning.md) — Supports efficient fine-tuning of internal model parameters using custom datasets to improve task-specific performance. ([source](https://github.com/QwenLM/Qwen#readme))
- [AI Agent Tool Integrations](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-agent-integrations/ai-agent-tool-integrations.md) — Enables models to interpret natural language instructions and invoke external software tools for complex workflows.
- [Hardware Acceleration Support](https://awesome-repositories.com/f/artificial-intelligence-ml/hardware-acceleration-support.md) — Utilizes specialized hardware kernels to speed up model calculations on compatible graphics processors. ([source](https://github.com/QwenLM/Qwen/blob/main/FAQ_zh.md))
- [Context Window Management](https://awesome-repositories.com/f/artificial-intelligence-ml/long-context-training-optimizations/context-window-management.md) — Optimizes serving environments to handle extended token windows and long-context sequences using advanced attention scaling.
- [Local AI Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/local-and-on-device-inference/local-ai-inference.md) — Supports local execution of generative models on CPU and GPU hardware to ensure data privacy and operational control.
- [Model Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/model-integration-pipelines/model-inference.md) — Manages dependencies for model inference, tokenization, and text generation to ensure consistent performance across environments. ([source](https://github.com/QwenLM/Qwen/blob/main/requirements.txt))
- [Function Calling Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-fine-tuning/function-calling-fine-tuning.md) — Enables models to learn accurate function-calling patterns through specialized training on tool-interaction datasets. ([source](https://github.com/QwenLM/Qwen/tree/main/examples))
- [Model Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/model-quantization.md) — Reduces memory footprint and computational requirements by converting model weights to lower-bit precision for efficient deployment. ([source](https://github.com/QwenLM/Qwen/blob/main/run_gptq.py))
- [Tensor Parallelism](https://awesome-repositories.com/f/artificial-intelligence-ml/tensor-parallelism.md) — Splits large model layers across multiple graphics processors to distribute computational load and memory usage.
- [Positional Embedding Scaling](https://awesome-repositories.com/f/artificial-intelligence-ml/attention-mechanisms/positional-embedding-scaling.md) — Adjusts internal positional embeddings to maintain coherence and retrieval accuracy when processing inputs exceeding original training lengths.
- [Batch Inference Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/batch-inference-engines.md) — Handles multiple inputs simultaneously to increase throughput and improve the speed of text generation. ([source](https://github.com/QwenLM/Qwen/blob/main/README.md))
- [Inference Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/long-context-training-optimizations/inference-optimizations.md) — Implements advanced attention scaling and cache techniques to maintain coherence over extended input sequences.
- [Preference-Based Model Alignments](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/fine-tuning-and-alignment/preference-based-model-alignments.md) — Refines pretrained models through fine-tuning to ensure responses follow human intent and maintain safety standards. ([source](https://github.com/QwenLM/Qwen/blob/main/tech_memo.md))
- [Model Performance Benchmarking](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-evaluation-analysis/model-analysis/model-performance-benchmarking.md) — Provides standardized evaluation scripts to benchmark model performance on reasoning, knowledge, and coding tasks. ([source](https://github.com/QwenLM/Qwen/blob/main/eval/EVALUATION.md))
- [Model Quantization Utilities](https://awesome-repositories.com/f/artificial-intelligence-ml/model-quantization-utilities.md) — Provides a toolkit for compressing model parameters into lower-bit formats to accelerate inference on various hardware.
- [Model Serving Interfaces](https://awesome-repositories.com/f/artificial-intelligence-ml/model-serving-interfaces.md) — Exposes generative language models through a standard web interface for external integration. ([source](https://github.com/QwenLM/Qwen/blob/main/openai_api.py))
- [System Prompts](https://awesome-repositories.com/f/artificial-intelligence-ml/system-prompts.md) — Allows users to set persistent character traits and system prompts to guide model responses consistently. ([source](https://github.com/QwenLM/Qwen/blob/main/examples/system_prompt.md))
- [Vocabulary Builders](https://awesome-repositories.com/f/artificial-intelligence-ml/embedding-adaptation-utilities/vocabulary-embedding-adapters/vocabulary-builders.md) — Supports adding custom tokens to an existing vocabulary by learning new merge rules from text frequency data. ([source](https://github.com/QwenLM/Qwen/blob/main/tokenization_note.md))
- [Model Validation Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/model-validation-tools.md) — Verifies model capabilities in executing external tools by testing performance against predefined task scenarios. ([source](https://github.com/QwenLM/Qwen/blob/main/eval/EVALUATION.md))
- [CPU Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/cpu-optimizations.md) — Provides optimized implementations for running model inference on central processing units. ([source](https://github.com/QwenLM/Qwen/blob/main/README.md))
- [Byte Pair Encodings](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/language-tools/tokenization-algorithms/byte-pair-encodings.md) — Converts raw text into numerical sequences by iteratively merging frequent character pairs.
- [Text Tokenization Utilities](https://awesome-repositories.com/f/artificial-intelligence-ml/text-tokenization-utilities.md) — Converts raw text into numerical sequences using byte-pair encoding to prepare data for model processing. ([source](https://github.com/QwenLM/Qwen/blob/main/tokenization_note.md))

### Data & Databases

- [Cache Quantization](https://awesome-repositories.com/f/data-databases/storage-engines/key-value/cache-quantization.md) — Quantizes attention states to reduce memory footprint and increase throughput during long sequence generation.

### DevOps & Infrastructure

- [Model Conversion](https://awesome-repositories.com/f/devops-infrastructure/model-conversion.md) — Provides scripts and utilities for converting model weights into formats compatible with various inference backends. ([source](https://github.com/QwenLM/Qwen/tree/main/ascend-support))
- [Multi-GPU Deployment](https://awesome-repositories.com/f/devops-infrastructure/multi-gpu-deployment.md) — Implements tensor parallelism to distribute large model layers across multiple graphics cards for memory-efficient inference. ([source](https://github.com/QwenLM/Qwen/blob/main/utils.py))

### Operating Systems & Systems Programming

- [Paged KV Cache Management](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/process-and-memory-management/memory-management/buffer-and-cache-management/paged-kv-cache-management.md) — Quantizes and compresses attention key-value states to reduce memory usage and support longer generation sequences. ([source](https://github.com/QwenLM/Qwen/blob/main/README.md))

### Web Development

- [Response Streaming Interfaces](https://awesome-repositories.com/f/web-development/response-streaming-interfaces.md) — Delivers model output incrementally as it is produced to provide immediate feedback during text generation. ([source](https://github.com/QwenLM/Qwen/blob/main/FAQ_ja.md))

### Security & Cryptography

- [API Access Control](https://awesome-repositories.com/f/security-cryptography/api-access-control.md) — Secures model endpoints by requiring valid authentication headers for programmatic access to inference services. ([source](https://github.com/QwenLM/Qwen/blob/main/openai_api.py))