# mlc-ai/web-llm

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/mlc-ai-web-llm).**

17,376 stars · 1,198 forks · TypeScript · apache-2.0

## Links

- GitHub: https://github.com/mlc-ai/web-llm
- Homepage: https://webllm.mlc.ai
- awesome-repositories: https://awesome-repositories.com/repository/mlc-ai-web-llm.md

## Topics

`chatgpt` `deep-learning` `language-model` `llm` `tvm` `webgpu` `webml`

## Description

WebLLM is a library for executing large language models directly within web browsers. It provides a framework for building conversational artificial intelligence applications that perform inference locally, ensuring user data privacy by eliminating the need for external server dependencies.

The project runs intensive machine learning computations on the client side through WebGPU, the browser's native GPU API. It keeps the application responsive by offloading model work to background threads, and it can host the engine in a service worker so that it keeps running independently of any single tab's lifecycle. It also stores model weights persistently to avoid repeated downloads across sessions and supports custom, externally compiled model architectures.
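Since inference depends on the browser's GPU API, an application would typically probe for it before loading a model. The sketch below assumes only the standard `navigator.gpu.requestAdapter()` entry point from the WebGPU specification. The `GpuLike` and `NavigatorLike` types and the `detectWebGPU` function are illustrative names, not part of WebLLM.

```typescript
// Minimal structural types so the sketch compiles without DOM/WebGPU typings.
// These names are illustrative and not part of WebLLM.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

type WebGPUStatus = "available" | "no-adapter" | "unsupported";

// Reports whether a WebGPU adapter can be obtained from the given navigator.
async function detectWebGPU(nav: NavigatorLike): Promise<WebGPUStatus> {
  if (!nav.gpu) {
    return "unsupported"; // the browser does not expose navigator.gpu
  }
  const adapter = await nav.gpu.requestAdapter();
  return adapter ? "available" : "no-adapter";
}
```

In a browser this would be called with the global `navigator` object (a cast may be needed depending on the installed typings), and the app could fall back to a server-side model or an explanatory message when the result is not `"available"`.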

The library includes tools for managing the model lifecycle, including initialization, weight loading, and memory management. It exposes an OpenAI-compatible interface, so developers can integrate local inference into existing workflows. It also provides fine-grained control over output through logit bias configuration and includes utilities for inspecting hardware capabilities to verify environment compatibility.
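To make the logit bias idea concrete: a per-token offset is added to the raw logits before they are converted to probabilities, so a large negative bias effectively bans a token and a positive bias favours it. The following is a generic illustration of that technique, not WebLLM's internal code. `applyLogitBias` and `softmax` are hypothetical helper names, and the map from token id to bias follows the convention of OpenAI-style `logit_bias` request fields.

```typescript
// Adds per-token biases to a copy of the logits vector.
// Keys of `bias` are token ids; ids outside the vocabulary are ignored.
function applyLogitBias(
  logits: number[],
  bias: Record<number, number>,
): number[] {
  const out = logits.slice();
  for (const key of Object.keys(bias)) {
    const id = Number(key);
    if (Number.isInteger(id) && id >= 0 && id < out.length) {
      out[id] += bias[id];
    }
  }
  return out;
}

// Numerically stable softmax: subtract the maximum before exponentiating.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}
```

For instance, with logits `[1, 2, 3]` and bias `{ 2: -100 }`, token 2 goes from the most likely choice to one that is effectively never sampled.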

## Tags

### Artificial Intelligence & ML

- [Browser-based Inference Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/engines-runtimes-servers/browser-based-inference-engines.md) — Provides a browser-based inference engine for running large language models locally with hardware acceleration.
- [Local Model Execution](https://awesome-repositories.com/f/artificial-intelligence-ml/local-model-execution.md) — Executes large language models directly on local hardware to ensure user data privacy and offline capability. ([source](https://webllm.mlc.ai/docs/))
- [AI Runtimes](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-runtimes.md) — Provides a high-performance AI runtime that leverages WebGPU for client-side machine learning.
- [Hardware-Accelerated Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/hardware-accelerated-inference.md) — Leverages WebGPU to execute tensor operations on the graphics processor for high-performance local inference.
- [Chat Completion Services](https://awesome-repositories.com/f/artificial-intelligence-ml/chat-completion-services.md) — Provides standardized interfaces for generating conversational text responses using local language models. ([source](https://webllm.mlc.ai/docs/user/api_reference.html))
- [LLM Application Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/llm-application-frameworks.md) — Offers a framework for building conversational AI applications that run entirely within the browser.
- [OpenAI-Compatible APIs](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/model-integration-serving/model-integration-interfaces/ai-integration-apis/openai-compatible-apis.md) — Exposes an in-browser JavaScript API that mirrors the OpenAI API specification for local model interoperability. ([source](https://webllm.mlc.ai/docs/))
- [Local Model Lifecycle Managers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-management/local-model-lifecycle-managers.md) — Provides utilities for initializing, managing, and clearing large language models from browser memory. ([source](https://webllm.mlc.ai/docs/user/api_reference.html))
- [Conversation State Management](https://awesome-repositories.com/f/artificial-intelligence-ml/conversation-state-management.md) — Manages conversational context and interaction history for multi-turn AI chat sessions. ([source](https://webllm.mlc.ai/docs/user/api_reference.html))
- [Custom Model Integrations](https://awesome-repositories.com/f/artificial-intelligence-ml/custom-model-integrations.md) — Supports the integration and execution of custom, externally compiled language model architectures within the browser. ([source](https://webllm.mlc.ai/docs/developer/add_models.html))
- [Local Model Integrations](https://awesome-repositories.com/f/artificial-intelligence-ml/local-model-integrations.md) — Provides tools for integrating and testing local language model services within web projects.
- [Model Weight Management](https://awesome-repositories.com/f/artificial-intelligence-ml/model-weight-management.md) — Handles the downloading, storage, and initialization of model weights for local inference. ([source](https://webllm.mlc.ai/docs/user/basic_usage.html))
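The chat completion and conversation state entries above can be illustrated with a small sketch. The message shape follows the OpenAI chat format. The `ChatBackend` interface is a hypothetical stand-in for the engine (with WebLLM, the call would correspond to its OpenAI-style `chat.completions.create` method), used here so the example runs without a GPU or model download. `Conversation` is likewise an illustrative name.

```typescript
// Message shape follows the OpenAI chat format (role + content).
type Role = "system" | "user" | "assistant";
interface ChatMessage {
  role: Role;
  content: string;
}

// Hypothetical stand-in for an inference engine; not a WebLLM type.
interface ChatBackend {
  complete(messages: ChatMessage[]): Promise<string>;
}

// Keeps the running history so every request carries the full context.
class Conversation {
  private history: ChatMessage[] = [];
  private backend: ChatBackend;

  constructor(backend: ChatBackend, systemPrompt?: string) {
    this.backend = backend;
    if (systemPrompt) {
      this.history.push({ role: "system", content: systemPrompt });
    }
  }

  // Appends the user turn, asks the backend, then records the reply.
  async send(userText: string): Promise<string> {
    this.history.push({ role: "user", content: userText });
    const reply = await this.backend.complete(this.history.slice());
    this.history.push({ role: "assistant", content: reply });
    return reply;
  }

  // Returns a copy so callers cannot mutate the internal history.
  messages(): ChatMessage[] {
    return this.history.slice();
  }
}
```

Passing the whole history on each call is what makes multi-turn chat work with a stateless completion interface. A production version would also need to trim old turns once the context window fills up.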

### Web Development

- [Background Processing Workers](https://awesome-repositories.com/f/web-development/background-processing-workers.md) — Offloads intensive model computations to background threads to maintain main application responsiveness. ([source](https://webllm.mlc.ai/docs/user/advanced_usage.html))
- [WebAssembly Runtimes](https://awesome-repositories.com/f/web-development/webassembly-runtimes.md) — Executes high-performance model logic and tensor operations using WebAssembly within the browser.
- [Persistent Background Workers](https://awesome-repositories.com/f/web-development/background-processing-workers/persistent-background-workers.md) — Maintains model state and data availability across browser sessions using background service workers. ([source](https://webllm.mlc.ai/docs/user/advanced_usage.html))
- [Browser Extensions](https://awesome-repositories.com/f/web-development/browser-integration-utilities/browser-extension-development/browser-extensions.md) — Facilitates the development of browser extensions that provide persistent, local AI capabilities. ([source](https://webllm.mlc.ai/docs/user/advanced_usage.html))
- [Response Streaming Interfaces](https://awesome-repositories.com/f/web-development/response-streaming-interfaces.md) — Supports incremental text delivery to enable real-time response rendering in web interfaces. ([source](https://webllm.mlc.ai/docs/user/basic_usage.html))
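Response streaming, as listed above, usually means consuming an asynchronous sequence of small text deltas and rendering each one as it arrives. The sketch below assumes a chunk shape modelled on OpenAI-style streaming responses (`choices[0].delta.content`). `collectStream` and `fakeStream` are hypothetical names, and the fake generator stands in for a real model.

```typescript
// Chunk shape modelled on OpenAI-style streaming responses (an assumption).
interface StreamChunk {
  choices: { delta: { content?: string } }[];
}

// Consumes a stream of chunks, forwarding each text delta to `onDelta`
// and returning the full concatenated reply.
async function collectStream(
  stream: AsyncIterable<StreamChunk>,
  onDelta: (text: string) => void,
): Promise<string> {
  let full = "";
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta.content ?? "";
    if (delta) {
      full += delta;
      onDelta(delta);
    }
  }
  return full;
}

// Hypothetical stand-in for a model: yields one chunk per text piece.
async function* fakeStream(pieces: string[]): AsyncGenerator<StreamChunk> {
  for (const piece of pieces) {
    yield { choices: [{ delta: { content: piece } }] };
  }
}
```

In a web interface, `onDelta` would append the text to the DOM so the reply appears incrementally instead of after the full generation finishes.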

### Data & Databases

- [IndexedDB Stores](https://awesome-repositories.com/f/data-databases/database-management-systems/database-systems-management/database-systems/indexeddb-stores.md) — Supports IndexedDB as a persistent store for model weights, avoiding redundant downloads across sessions.
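Avoiding redundant downloads comes down to a cache-first lookup: check the persistent store, and only fetch when the entry is missing. The sketch below shows that pattern in general terms and does not reproduce WebLLM's caching code. `BlobStore`, `MemoryStore`, and `loadCached` are hypothetical names, and the in-memory `Map` stands in for a browser store such as IndexedDB.

```typescript
// Hypothetical async key-value store. In a browser this could be backed by
// IndexedDB; an in-memory Map is used here for illustration.
interface BlobStore {
  get(key: string): Promise<Uint8Array | undefined>;
  put(key: string, value: Uint8Array): Promise<void>;
}

class MemoryStore implements BlobStore {
  private data = new Map<string, Uint8Array>();

  async get(key: string): Promise<Uint8Array | undefined> {
    return this.data.get(key);
  }

  async put(key: string, value: Uint8Array): Promise<void> {
    this.data.set(key, value);
  }
}

// Returns cached bytes when present; otherwise downloads once and stores them.
async function loadCached(
  url: string,
  store: BlobStore,
  download: (url: string) => Promise<Uint8Array>,
): Promise<Uint8Array> {
  const hit = await store.get(url);
  if (hit !== undefined) {
    return hit;
  }
  const bytes = await download(url);
  await store.put(url, bytes);
  return bytes;
}
```

Keying the store by URL keeps the sketch simple. A real implementation would also need to handle storage quota errors and invalidate entries when a model is updated.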

### Software Engineering & Architecture

- [Background Thread Dispatchers](https://awesome-repositories.com/f/software-engineering-architecture/background-thread-dispatchers.md) — Offloads intensive model computations to background threads to ensure main UI responsiveness.
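Dispatching work to a background thread typically relies on message passing, with each request tagged by an id so that the reply can be matched to the caller. The sketch below illustrates that pattern under stated assumptions: `WorkerPort` is a hypothetical abstraction over a worker's `postMessage` and message event, and `GenRequest`, `GenReply`, and `WorkerClient` are illustrative names rather than WebLLM types.

```typescript
// Messages exchanged between the UI thread and the background thread.
interface GenRequest {
  id: number;
  prompt: string;
}
interface GenReply {
  id: number;
  text: string;
}

// Hypothetical abstraction over a worker's postMessage/message-event pair.
interface WorkerPort {
  post(msg: GenRequest): void;
  onMessage(handler: (msg: GenReply) => void): void;
}

// UI-thread side: sends requests and resolves the matching promise when
// a reply with the same id comes back.
class WorkerClient {
  private nextId = 0;
  private pending = new Map<number, (text: string) => void>();
  private port: WorkerPort;

  constructor(port: WorkerPort) {
    this.port = port;
    port.onMessage((reply) => {
      const resolve = this.pending.get(reply.id);
      if (resolve) {
        this.pending.delete(reply.id);
        resolve(reply.text);
      }
    });
  }

  generate(prompt: string): Promise<string> {
    const id = this.nextId++;
    return new Promise((resolve) => {
      this.pending.set(id, resolve);
      this.port.post({ id, prompt });
    });
  }
}
```

Because the UI thread only posts a message and awaits a promise, rendering and input handling stay responsive while the heavy computation runs elsewhere.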
