# zai-org/chatglm3

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/zai-org-chatglm3).**

13,764 stars · 1,614 forks · Python · apache-2.0

## Links

- GitHub: https://github.com/zai-org/ChatGLM3
- awesome-repositories: https://awesome-repositories.com/repository/zai-org-chatglm3.md

## Description

ChatGLM3 is a comprehensive framework for deploying, fine-tuning, and serving large language models. It functions as a high-performance inference engine designed to support conversational AI, enabling developers to build interactive agents capable of multi-turn dialogue, autonomous code execution, and structured tool invocation.

The project distinguishes itself through its focus on hardware-agnostic deployment and resource optimization. It supports distributed model parallelism across multiple graphics cards, paged key-value caching for concurrent request processing, and weight quantization to reduce memory footprints. These capabilities allow the system to run on diverse hardware, including specialized acceleration backends for Apple Silicon and high-performance production environments.

Beyond inference, the framework provides a complete pipeline for model adaptation. It includes tools for fine-tuning base models on custom datasets, managing training checkpoints, and configuring optimization parameters. The system also features a sandboxed environment for executing dynamically generated code and a standardized message formatting protocol to ensure secure, consistent interactions between the model and external tools.

The repository includes support for deploying web-based interactive interfaces and standard-compliant API servers for integration into external applications.

## Tags

### Artificial Intelligence & ML

- [Conversational AI Agents](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/conversational-voice-interaction/conversational-ai-agents.md) — Provides a comprehensive framework for building interactive agents capable of multi-turn dialogue, autonomous code execution, and structured tool invocation. ([source](https://github.com/zai-org/ChatGLM3/tree/main/composite_demo))
- [Large Language Models](https://awesome-repositories.com/f/artificial-intelligence-ml/large-language-models.md) — Enables local deployment and hosting of large language models across diverse hardware configurations.
- [Local AI Runtimes](https://awesome-repositories.com/f/artificial-intelligence-ml/local-ai-runtimes.md) — Functions as a high-performance runtime for executing large language models locally on diverse hardware.
- [Large Language Model Fine-Tuning Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/integrated-development-platforms/machine-learning-platforms/large-language-model-fine-tuning-frameworks.md) — Serves as a comprehensive toolkit for deploying, fine-tuning, and serving conversational AI models.
- [Model Serving APIs](https://awesome-repositories.com/f/artificial-intelligence-ml/model-serving-apis.md) — Exposes model inference capabilities through standard web interfaces and protocols for external application integration. ([source](https://github.com/zai-org/ChatGLM3/blob/main/requirements.txt))
- [AI Agent](https://awesome-repositories.com/f/artificial-intelligence-ml/agent-architectures/orchestration-engines/ai-agent.md) — Enables language models to act as agents by executing external code and interacting with custom functions.
- [Conversational Interfaces](https://awesome-repositories.com/f/artificial-intelligence-ml/artificial-intelligence-tooling/chat-conversational-interfaces/conversational-interfaces.md) — Interacts with users through a chat interface to provide responses and maintain context during multi-turn conversations. ([source](https://github.com/zai-org/ChatGLM3/blob/main/composite_demo/README.md))
- [Conversational AI](https://awesome-repositories.com/f/artificial-intelligence-ml/conversational-ai.md) — Executes optimized language models to generate conversational text responses. ([source](https://github.com/zai-org/ChatGLM3/blob/main/Intel_device_demo/openvino_demo/README.md))
- [Generative Text Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/generative-ai/generative-text-inference.md) — Executes text generation sessions using compiled model engines, supporting multi-turn conversations and configurable sampling parameters. ([source](https://github.com/zai-org/ChatGLM3/blob/main/tensorrt_llm_demo/tensorrt_llm_cli_demo.py))
- [Language Model Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/language-model-fine-tuning.md) — Trains models on custom datasets using multi-turn conversation formats and specialized tool-calling interactions. ([source](https://github.com/zai-org/ChatGLM3#readme))
- [LLM Fine-Tuning Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/frameworks/training-systems/model-training-engines/llm-fine-tuning-engines.md) — Includes a complete pipeline for fine-tuning base models on custom datasets with checkpoint management.
- [Model Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-fine-tuning.md) — Adapts base models to specific datasets using a training framework to improve performance on specialized tasks. ([source](https://github.com/zai-org/ChatGLM3/blob/main/README_en.md))
- [Text Generation APIs](https://awesome-repositories.com/f/artificial-intelligence-ml/text-generation-apis.md) — Processes natural language prompts to produce conversational text, supporting both standard request-response cycles and incremental streaming of output tokens. ([source](https://github.com/zai-org/ChatGLM3/blob/main/openai_api_demo/openai_api_request.py))
- [Conversation History Managers](https://awesome-repositories.com/f/artificial-intelligence-ml/artificial-intelligence-tooling/language-model-integrations/conversation-history-managers.md) — Maintains context across a sequence of user and assistant exchanges by assigning specific roles to each message in the dialogue history. ([source](https://github.com/zai-org/ChatGLM3/blob/main/PROMPT_en.md))
- [Code Execution Environments](https://awesome-repositories.com/f/artificial-intelligence-ml/code-execution-environments.md) — Runs generated code within a sandboxed environment to perform complex tasks like data visualization or symbolic computation. ([source](https://github.com/zai-org/ChatGLM3/blob/main/composite_demo/README.md))
- [Conversational AI APIs](https://awesome-repositories.com/f/artificial-intelligence-ml/conversational-ai-apis.md) — Exposes programmatic interfaces for interacting with chat-based AI systems and managing conversation histories.
- [Conversational AI Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/conversational-ai-frameworks.md) — Provides a framework for building interactive chat applications with support for multi-turn dialogue and tool invocation.
- [Distributed Training](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training-frameworks/distributed-training.md) — Runs fine-tuning jobs across single or multiple graphics processing units using accelerated backends to optimize performance. ([source](https://github.com/zai-org/ChatGLM3/blob/main/finetune_demo/README.md))
- [External Tool Execution](https://awesome-repositories.com/f/artificial-intelligence-ml/external-tool-execution.md) — Performs external operations like mathematical calculations or code interpretation by invoking integrated tools through a standardized calling environment. ([source](https://github.com/zai-org/ChatGLM3/blob/main/README.md))
- [Structured Tool Invocations](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/decoding-generation-controls/tool-calling/structured-tool-invocations.md) — Parses model output into defined function calls and parameters to enable autonomous interaction with external data and code environments.
- [Local Model Management](https://awesome-repositories.com/f/artificial-intelligence-ml/local-model-management.md) — Includes web-based interactive interfaces for local model interaction and demonstration. ([source](https://github.com/zai-org/ChatGLM3/tree/main/composite_demo))
- [AI Service Integrations](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/inference-servers-and-runtimes/machine-learning-model-apis/ai-service-integrations.md) — Provides standard-compliant API interfaces for integrating conversational AI capabilities into external applications.
- [Model Inference Servers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/engines-runtimes-servers/model-inference-servers.md) — Deploys scalable network services for high-volume production inference. ([source](https://github.com/zai-org/ChatGLM3/blob/main/tensorrt_llm_demo/README.md))
- [Inference Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/inference-optimization.md) — Implements quantization and hardware-specific acceleration to optimize inference speed and memory usage.
- [Model Capability Extensions](https://awesome-repositories.com/f/artificial-intelligence-ml/model-capability-extensions.md) — Registers custom functions as external tools that the model can invoke using function metadata. ([source](https://github.com/zai-org/ChatGLM3/tree/main/composite_demo))
- [AI Agent Development](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-agent-development.md) — Provides tools for building interactive systems that allow models to execute code and interact with external functions.
- [External Tool Integration](https://awesome-repositories.com/f/artificial-intelligence-ml/external-tool-integration.md) — Parses model requests into structured data and feeds tool execution results back into the conversation flow to generate informed responses. ([source](https://github.com/zai-org/ChatGLM3/tree/main/tools_using_demo))
- [Hardware-Accelerated Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/hardware-accelerated-inference.md) — Utilizes specialized hardware to minimize latency and maximize text generation speed. ([source](https://github.com/zai-org/ChatGLM3/blob/main/tensorrt_llm_demo/README.md))
- [Hardware Acceleration Backends](https://awesome-repositories.com/f/artificial-intelligence-ml/hardware-acceleration-backends.md) — Utilizes optimized compute kernels and specialized acceleration libraries to maximize performance across diverse graphics and system processors.
- [Fine-tuned Model Deployment](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-fine-tuning/fine-tuned-model-deployment.md) — Integrates custom-trained model weights into inference pipelines by loading adapter configurations alongside base model architectures. ([source](https://github.com/zai-org/ChatGLM3/blob/main/finetune_demo/README.md))
- [Distributed Deployment Utilities](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization/inference-deployment/model-deployment-toolkits/distributed-deployment-utilities.md) — Supports scaling model inference across multiple hardware devices using parameter sharding. ([source](https://github.com/zai-org/ChatGLM3/blob/main/DEPLOYMENT_en.md))
- [Multi-GPU Distribution](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization/inference-deployment/model-deployment-toolkits/distributed-deployment-utilities/multi-gpu-distribution.md) — Enables inference on large models by splitting parameters across multiple graphics cards. ([source](https://github.com/zai-org/ChatGLM3/blob/main/DEPLOYMENT.md))
- [Model Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/model-quantization.md) — Supports weight quantization to reduce memory footprints for efficient model deployment on limited hardware.
- [Precision Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/precision-quantization.md) — Reduces the memory footprint of large language models by converting weights to lower precision for execution on hardware with limited memory.
- [Training Checkpointing](https://awesome-repositories.com/f/artificial-intelligence-ml/training-checkpointing.md) — Restarts interrupted training sessions from specific saved states to maintain progress without starting the entire process over. ([source](https://github.com/zai-org/ChatGLM3/tree/main/finetune_demo))
- [Training Parameter Configurations](https://awesome-repositories.com/f/artificial-intelligence-ml/training-configurations/training-parameter-configurations.md) — Defines training workflows, optimization settings, and hardware acceleration strategies through centralized configuration files. ([source](https://github.com/zai-org/ChatGLM3/tree/main/finetune_demo))
- [Model Conversion Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/serving-and-runtime/large-language-model-optimization/model-inference-optimizations/model-conversion-tools.md) — Transforms standard language models into optimized intermediate representations to improve execution speed and reduce memory usage. ([source](https://github.com/zai-org/ChatGLM3/blob/main/Intel_device_demo/openvino_demo/README.md))
- [Memory Optimization Techniques](https://awesome-repositories.com/f/artificial-intelligence-ml/memory-optimization-techniques.md) — Reduces memory footprint through quantization to enable execution on resource-constrained hardware. ([source](https://github.com/zai-org/ChatGLM3/blob/main/DEPLOYMENT_en.md))
- [Model Deployment Toolkits](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization/inference-deployment/model-deployment-toolkits.md) — Supports high-performance inference deployment using specialized hardware acceleration toolkits. ([source](https://github.com/zai-org/ChatGLM3/tree/main/tensorrt_llm_demo))
- [Training Checkpointers](https://awesome-repositories.com/f/artificial-intelligence-ml/training-checkpointers.md) — Restarts interrupted training processes from specific checkpoints or the latest saved state to avoid losing progress. ([source](https://github.com/zai-org/ChatGLM3/blob/main/finetune_demo/README.md))
- [Chat Message Formats](https://awesome-repositories.com/f/artificial-intelligence-ml/chat-message-formats.md) — Structures conversational history into role-based sequences to ensure consistent input processing and prevent injection during multi-turn interactions.
- [Conversational Input Protocols](https://awesome-repositories.com/f/artificial-intelligence-ml/conversational-input-protocols.md) — Structures messages into standardized formats to ensure consistent and secure interactions. ([source](https://github.com/zai-org/ChatGLM3/blob/main/PROMPT.md))
- [Embedding Generators](https://awesome-repositories.com/f/artificial-intelligence-ml/embedding-generators.md) — Converts input text into numerical vector representations to enable semantic search, clustering, or similarity analysis tasks. ([source](https://github.com/zai-org/ChatGLM3/blob/main/openai_api_demo/openai_api_request.py))
- [Inference Benchmarking Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/inference-benchmarking-tools.md) — Provides utilities for measuring latency and throughput of model execution. ([source](https://github.com/zai-org/ChatGLM3/tree/main/tensorrt_llm_demo))
- [Model Inference and Serving](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving.md) — Supports scalable online inference services using dedicated backends for production traffic. ([source](https://github.com/zai-org/ChatGLM3/tree/main/tensorrt_llm_demo))
- [Natural Language Processing](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-processing.md) — Performs text processing tasks such as summarization and interactive dialogue by leveraging pre-trained model engines. ([source](https://github.com/zai-org/ChatGLM3/blob/main/tensorrt_llm_demo/README.md))

### Networking & Communication

- [Model Parallelism Strategies](https://awesome-repositories.com/f/networking-communication/distributed-systems-p2p/distributed-computing/model-parallelism-techniques/model-parallelism-strategies.md) — Splits large model parameters across multiple graphics cards to overcome individual device memory limitations during inference and training.

### Operating Systems & Systems Programming

- [Inference Cache Management](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/process-and-memory-management/memory-management/inference-cache-management.md) — Manages memory dynamically during inference to increase throughput by processing multiple concurrent requests within a shared memory space.

### Web Development

- [Interactive Model Interfaces](https://awesome-repositories.com/f/web-development/web-interfaces/interactive-model-interfaces.md) — Provides web-based graphical user interfaces for real-time model interaction and demonstration of capabilities. ([source](https://github.com/zai-org/ChatGLM3/blob/main/requirements.txt))

### Development Tools & Productivity

- [Dialogue Interaction Engines](https://awesome-repositories.com/f/development-tools-productivity/interactive-execution-interfaces/dialogue-interaction-engines.md) — Engages in dialogue, executes external tools, or runs code in a notebook environment to complete complex tasks through a unified interface. ([source](https://github.com/zai-org/ChatGLM3/blob/main/README_en.md))
- [Sandboxed Execution Environments](https://awesome-repositories.com/f/development-tools-productivity/sandboxed-execution-environments.md) — Performs interactive text-based dialogue, utilizes external tools, and runs code in a sandboxed environment to solve complex problems. ([source](https://github.com/zai-org/ChatGLM3#readme))

### DevOps & Infrastructure

- [Apple Silicon Deployment](https://awesome-repositories.com/f/devops-infrastructure/apple-silicon-deployment.md) — Utilizes specialized hardware backends on desktop devices to perform model inference using local system memory. ([source](https://github.com/zai-org/ChatGLM3/blob/main/DEPLOYMENT.md))
- [Apple Silicon Inference](https://awesome-repositories.com/f/devops-infrastructure/apple-silicon-deployment/apple-silicon-inference.md) — Utilizes Apple Silicon hardware via metal performance shaders to perform model inference on desktop operating systems. ([source](https://github.com/zai-org/ChatGLM3/blob/main/DEPLOYMENT_en.md))
- [Code Execution Sandboxes](https://awesome-repositories.com/f/devops-infrastructure/execution-environments/code-execution-runtimes/code-execution-sandboxes.md) — Runs dynamically generated code in an isolated environment to perform complex tasks while protecting the host system from unauthorized operations.
- [Workload Orchestration](https://awesome-repositories.com/f/devops-infrastructure/workload-orchestration.md) — Manages the distribution of inference workloads across diverse hardware processors. ([source](https://github.com/zai-org/ChatGLM3/blob/main/README_en.md))