The visitor wants a desktop application that provides a graphical interface for running and interacting with large language models locally on their own hardware.

nomic-ai/gpt4all is the closest match — This is a cross-platform desktop application that provides a complete graphical interface for managing and running local LLMs, featuring offline inference, GPU acceleration, chat history, and an OpenAI-compatible API.. Other strong matches: josstorer/rwkv-runner, bin-huang/chatbox, chatboxai/chatbox, janhq/jan.

Why does nomic-ai/gpt4all match “a desktop client for running local LLMs”?

This is a cross-platform desktop application that provides a complete graphical interface for managing and running local LLMs, featuring offline inference, GPU acceleration, chat history, and an OpenAI-compatible API.

Why does josstorer/rwkv-runner match “a desktop client for running local LLMs”?

This application provides a graphical interface for running local models with support for GPU acceleration, model management, and an OpenAI-compatible API, making it a functional desktop client for local LLM interaction.

Why does bin-huang/chatbox match “a desktop client for running local LLMs”?

This is a cross-platform desktop client that provides a graphical interface for interacting with LLMs, though it primarily functions as a frontend for external APIs and local model servers rather than performing the inference itself.

Why does chatboxai/chatbox match “a desktop client for running local LLMs”?

Chatbox is a cross-platform desktop application that provides a unified graphical interface for managing local LLMs and remote AI providers, featuring offline storage, chat history management, and support for local model backends like Ollama.

Why does janhq/jan match “a desktop client for running local LLMs”?

Jan is a cross-platform desktop application that provides a graphical interface for running local LLMs, managing model weights, and exposing an OpenAI-compatible API, fulfilling all the requirements for a local AI client.

Local LLM Chat Clients

Desktop applications that enable private, offline interaction with large language models running on local hardware.

Find the best repos with AI.We'll search the best matching repositories with AI.

nomic-ai/gpt4all
nomic-ai/gpt4all
77,375View on GitHub
GPT4All is a cross-platform runtime environment designed to execute large language models directly on local consumer hardware. By leveraging an optimized C++ inference backend, it enables private, offline AI interactions without requiring an internet connection or external cloud services. The project provides a comprehensive ecosystem for managing the entire model lifecycle, including discovery, downloading, and configuration of local weights. What distinguishes the platform is its integrated retrieval-augmented generation engine, which allows users to index local documents into semantic vector spaces. This capability enables context-aware chat sessions where the model can reference private files, notes, and spreadsheets to provide grounded, relevant responses. The system also features a local HTTP server that exposes an OpenAI-compatible API, allowing developers to integrate these private, self-hosted models into existing applications and workflows. Beyond its core inference and retrieval capabilities, the project includes a graphical desktop interface for end-user interaction and a Python software development kit for programmatic access. These tools support advanced configuration of model parameters, performance monitoring, and the management of local embedding pipelines for custom semantic search tasks. The software is distributed as a unified application package, with documentation available to guide users through installation and local environment setup.
This is a cross-platform desktop application that provides a complete graphical interface for managing and running local LLMs, featuring offline inference, GPU acceleration, chat history, and an OpenAI-compatible API.
C++Model ManagementOpenAI-CompatibleOpenAI-Compatible APIs
View on GitHub77,375
josstorer/rwkv-runner
josStorer/RWKV-Runner
6,219View on GitHub
This application provides a graphical interface for running local models with support for GPU acceleration, model management, and an OpenAI-compatible API, making it a functional desktop client for local LLM interaction.
TypeScriptOpenAI-CompatibleLocal LLM ToolsModel Downloaders
View on GitHub6,219
bin-huang/chatbox
Bin-Huang/chatbox
40,509View on GitHub
Chatbox is a desktop client and multi-provider chat interface for interacting with large language model APIs across various service providers and local installations. It functions as a local-first AI conversation manager that stores chat history and user settings directly on the device. The application provides a unified interface to connect multiple AI backends for text generation and image creation. It includes a specialized rendering system for AI responses that supports technical documentation through syntax highlighting, Markdown, and Latex mathematical notation. The platform manages prompt engineering workflows through a searchable library of reusable templates and supports real-time streaming of AI responses. It also includes capabilities for local data privacy, including the local storage of API credentials and conversation histories.
This is a cross-platform desktop client that provides a graphical interface for interacting with LLMs, though it primarily functions as a frontend for external APIs and local model servers rather than performing the inference itself.
TypeScriptDesktop AI ClientsDesktop Clients
View on GitHub40,509
chatboxai/chatbox
chatboxai/chatbox
40,499View on GitHub
Chatbox is a cross-platform desktop application that provides a unified interface for interacting with a wide range of artificial intelligence models. It functions as a model-agnostic client, allowing users to connect to various third-party AI providers or execute open-source models directly on their own hardware. By centralizing these diverse services into a single workspace, the application enables users to manage multiple chat sessions, adjust model parameters, and switch between different AI backends with ease. The project distinguishes itself through a local-first architecture that prioritizes data privacy and user control. All conversation logs, settings, and uploaded documents are stored directly on the local device, ensuring that sensitive information remains private and accessible offline. Furthermore, the application features a built-in vector-based knowledge retrieval system that parses and indexes local files, allowing the AI to reference private documents during chat sessions to provide context-aware responses. Beyond its core chat capabilities, the application includes tools for productivity and workflow management. It supports real-time web search integration, image generation, and the ability to render professional content like formulas and charts. Users can navigate the interface efficiently using global keyboard shortcuts and automate the configuration of external services through deep-link injection, which simplifies the process of importing provider settings and credentials. The application is distributed as a native desktop shell that wraps web-based interface components to provide system-level window management. It is designed to be installed and run on standard desktop operating systems.
Chatbox is a cross-platform desktop application that provides a unified graphical interface for managing local LLMs and remote AI providers, featuring offline storage, chat history management, and support for local model backends like Ollama.
TypeScriptAI Orchestration PlatformsLocal Model RuntimesModel Provider Integrations
View on GitHub40,499
janhq/jan
janhq/jan
43,043View on GitHub
Jan is a desktop application that functions as a local artificial intelligence model runtime and an open-standard API server. It enables the execution of large language models directly on local hardware, ensuring that data remains private and accessible offline while providing a unified interface for managing model weights and inference runtimes. The platform distinguishes itself by offering a modular inference backend that allows users to swap execution engines based on hardware compatibility and performance needs. It acts as a cross-platform orchestrator, providing the ability to switch between local model files and remote cloud-based AI providers through a single interface. By exposing these capabilities via an open-standard server layer, the application supports the integration of local AI into external software and development tools. Beyond its core runtime capabilities, the software provides an environment for configuring agentic workflows and autonomous task automation. It includes tools for managing server behaviors, such as network access, authentication, and remote tool execution, while maintaining state persistence through a local file-based database. The application is distributed as a cross-platform container to ensure consistent access to local files and system resources across different operating systems.
Jan is a cross-platform desktop application that provides a graphical interface for running local LLMs, managing model weights, and exposing an OpenAI-compatible API, fulfilling all the requirements for a local AI client.
TypeScriptLocal Model RuntimesDesktop AI RuntimesOpenAI-Compatible Servers
View on GitHub43,043
cherryhq/cherry-studio
CherryHQ/cherry-studio
47,419View on GitHub
Cherry Studio is a cross-platform desktop application that serves as a centralized workspace for managing and interacting with multiple artificial intelligence models. It functions as a local-first orchestrator, prioritizing user privacy by storing all conversation history and knowledge bases directly on your device. By providing a unified interface for both cloud-based and local AI services, the platform simplifies API key management and allows for consistent model interaction across different operating systems. The application distinguishes itself through a robust retrieval-augmented generation pipeline that grounds model responses in your own local documents and web content. It features an extensible agent framework that connects language models to external tools and persistent memory, enabling the development of autonomous agents for complex, multi-step workflows. Users can further refine their experience by configuring custom AI assistants, comparing model performance side-by-side, and utilizing execution trace visualization to monitor token usage and interaction flows. Beyond core orchestration, the platform includes a suite of productivity tools such as global keyboard shortcuts for immediate AI access, real-time web search integration, and automated translation capabilities. The interface is highly customizable, allowing users to adjust layouts, visual styles, and input settings to suit their specific workflows. The software is distributed as a native desktop client, ensuring system-level integration and offline availability for all managed data and AI tasks.
Cherry Studio is a cross-platform desktop application that provides a unified graphical interface for managing and interacting with both local and cloud-based LLMs, featuring robust support for chat history, RAG, and model orchestration.
TypeScriptModel OrchestratorsRetrieval Augmented Generation PipelinesAutonomous Agent Frameworks
View on GitHub47,419
meta-llama/llama
meta-llama/llama
59,464View on GitHub
Llama is a computational framework and runtime environment designed for executing transformer-based neural networks locally. It functions as a generative AI inference engine, enabling the processing of input sequences through pre-trained model weights to produce text completions and structured data outputs directly on your own hardware. The system distinguishes itself through specialized memory and computation management techniques, including memory-mapped weight loading and quantization-aware inference, which allow for efficient execution on standard consumer hardware. It utilizes a stateless request execution model and a tensor-based computation graph to handle token-based sequence processing, ensuring that each inference task operates independently without reliance on persistent server state. This project provides the necessary tools for local large language model deployment, including a command-line interface for retrieving authorized model checkpoints and configuration files. It supports offline research and the integration of text generation capabilities into custom software applications, allowing users to manage model parameters such as sequence length and batch size to meet specific performance requirements.
This is a command-line inference engine and runtime library for executing transformer models, rather than a desktop application with a graphical user interface for interacting with LLMs.
PythonModel ManagementLocal Inference EnginesLocal Inference Runners
View on GitHub59,464
imartinez/privategpt
imartinez/privateGPT
57,281View on GitHub
PrivateGPT is a private AI document assistant and local knowledge base manager designed for querying private files and documents using retrieval-augmented generation. It functions as a local language model application and API gateway, allowing users to obtain cited answers from unstructured data without sending information to external servers. The system differentiates itself by acting as a tool integrator that connects language models to external functions, including web search, tabular data analysis, and custom action extensions. It provides a standardized API layer that allows local inference servers to communicate with third-party applications and execute multi-step agentic workflows. The platform covers a broad capability surface including document-to-embedding pipelines, vector database indexing, and the processing of tabular data from CSV files. It also supports asynchronous request handling, response streaming, and API interaction debugging for troubleshooting model exchanges.
PrivateGPT is a local LLM-powered document assistant that provides a graphical interface for querying private files, though its primary focus is on RAG and knowledge management rather than general-purpose model interaction.
PythonOpenAI-Compatible APIsLocal API Servers
View on GitHub57,281
jaymody/picogpt
jaymody/picoGPT
3,449View on GitHub
picoGPT is a lightweight, low-level runtime environment and inference engine designed to load pre-trained checkpoints and execute generative transformer model inference. It provides a minimal implementation of the generative pre-trained transformer architecture to facilitate local language model execution. The project includes a C++ machine learning library for converting model parameters and executing greedy token generation without heavy external dependencies. It handles remote asset synchronization by downloading pre-trained weights, hyperparameters, and vocabulary files from remote servers for local use. The system covers model management through weight-tensor conversion and pre-trained weight loading. It supports text sequence generation using a transformer-based language modeling approach to predict tokens based on provided prompts.
This is a low-level inference engine and runtime library for executing transformer models, rather than a desktop application with a graphical user interface for interacting with LLMs.
PythonLocal LLM ExecutionModel Weight ManagementModel Downloaders
View on GitHub3,449
oobabooga/text-generation-webui
oobabooga/text-generation-webui
47,323View on GitHub
This project is a comprehensive platform for hosting and interacting with large language models directly on local hardware. It provides a web-based graphical interface that allows users to manage model loading, configure generation parameters, and execute text or chat interactions entirely offline. By running models locally, the software ensures complete data privacy and eliminates reliance on external cloud services for generative tasks. Beyond basic inference, the platform functions as a versatile workbench for generative AI development. It includes an integrated pipeline for fine-tuning models on local compute resources, enabling users to adapt pre-trained models to specialized datasets or niche requirements. The system also exposes its internal capabilities through a standardized network interface, allowing developers to integrate local text generation into external software applications and custom workflows. The environment is designed for portability and consistent performance across diverse host operating systems. It supports multiple deployment methods, including containerized environments and automated installation scripts, which manage complex machine learning dependencies and hardware acceleration settings. Users can further customize the application behavior at startup through command-line arguments to suit specific computing environments.
This project provides a robust web-based graphical interface for local LLM interaction, model management, and offline inference, serving as a comprehensive workbench for generative AI tasks.
PythonLocal Inference Engines
View on GitHub47,323
ggerganov/llama.cpp
ggerganov/llama.cpp
116,912View on GitHub
llama.cpp is a high-performance C++ inference engine and runtime for executing large language models locally across various hardware architectures. It provides the core components for local model execution, including a dedicated model quantizer for compressing weights into the GGUF format and a system for generating text embeddings for semantic search. The project distinguishes itself through specialized memory and execution optimizations, such as block-wise weight quantization to reduce memory footprints and memory-mapped model loading. It supports structured text generation by using formal grammars to force model outputs to adhere to specific JSON schemas or patterns, and it implements speculative decoding to increase inference speed. Broad capabilities include hardware acceleration for GPUs, tools for converting models between different data formats, and utilities for measuring model quality via perplexity and divergence metrics. The engine can be wrapped in an HTTP server that provides an OpenAI-compatible API for integration with external tools.
This is a high-performance inference engine and backend library rather than a desktop application with a graphical user interface for end-user interaction.
C++OpenAI-Compatible APIsLocal Inference EnginesLocal Inference Engines
View on GitHub116,912
chatgptnextweb/nextchat
ChatGPTNextWeb/NextChat
88,256View on GitHub
NextChat is a self-hosted web application that provides a unified interface for interacting with multiple large language models. It functions as a conversational platform where users can manage and switch between diverse AI providers through configurable API backends, maintaining full control over their data and infrastructure. The platform features a persistent session layer designed to handle long-running dialogues by managing message history and context. It distinguishes itself through a structured prompt engineering environment that allows for the development and application of templates to refine model inputs. To ensure consistent performance during extended interactions, the application includes automated context window compression and dynamic prompt injection, which adjust historical message arrays to fit within model token limits. The software supports secure deployment via containerization, utilizing server-side proxying to manage sensitive API keys and authentication headers. It also incorporates local browser storage for low-latency access and offers options for synchronizing chat records across multiple sessions and devices. The application is configured through environment variables, allowing for flexible integration into private hosting environments.
This application provides a cross-platform desktop interface for interacting with local and remote LLMs, though it functions primarily as a web-based UI that relies on external API backends or local model runners like Ollama rather than performing inference itself.
TypeScriptPersistent Chat Histories
View on GitHub88,256
mlc-ai/mlc-llm
mlc-ai/mlc-llm
22,057View on GitHub
MLC LLM is a machine learning compiler and inference engine designed to execute large language models locally across diverse hardware platforms, including desktop, mobile, and web environments. By utilizing machine learning compilation, the project transforms high-level model definitions into specialized, hardware-specific binary libraries. This process optimizes model weights and generates compute kernels tailored to the unique memory and processing characteristics of target graphics and mobile hardware. The engine distinguishes itself by providing a unified runtime abstraction that enables native execution on consumer hardware while maintaining compatibility with standard development workflows. It includes a local server architecture that exposes inference endpoints compatible with common chat completion patterns, allowing developers to integrate private, offline language models into external applications. The toolchain supports the entire lifecycle of model deployment, from the conversion and quantization of weights to the generation of standalone binary libraries. These capabilities ensure that models run efficiently with minimal runtime dependencies, regardless of the underlying hardware backend. The project provides both a command-line interface for direct interaction and programmatic interfaces for embedding model execution into custom application logic.
This project is a high-performance inference engine and compiler for running models locally, but it functions as a backend library and server rather than providing the user-facing desktop application with a graphical interface that you are looking for.
PythonOpenAI-Compatible APIsLocal API ServersLocal Inference Engines
View on GitHub22,057
fauxpilot/fauxpilot
fauxpilot/fauxpilot
14,732View on GitHub
Fauxpilot is a self-hosted AI coding assistant and local inference server. It functions as a proxy and API gateway that redirects traffic from IDE plugins to a local large language model, allowing for AI-assisted programming without external cloud dependencies. The project provides a specialized API emulation layer that mimics coding assistant protocols and a standardized OpenAI-compatible interface. This enables supported code editors to use local models for completions and suggestions by overriding default proxy URLs. The system includes capabilities for downloading and deploying local models, as well as a format-conversion pipeline to transform model files into optimized versions for specific inference engines. A model-agnostic backend allows for switching between different inference engines while maintaining the same API interfaces.
This project is a backend inference server and API proxy designed to integrate with IDE plugins rather than a desktop application with a graphical chat interface for general LLM interaction.
PythonOpenAI-CompatibleOpenAI-Compatible APIs
View on GitHub14,732

Local LLM Chat Clients

nomic-ai/gpt4all

josStorer/RWKV-Runner

Bin-Huang/chatbox

chatboxai/chatbox

janhq/jan

CherryHQ/cherry-studio

meta-llama/llama

imartinez/privateGPT

jaymody/picoGPT

oobabooga/text-generation-webui

ggerganov/llama.cpp

ChatGPTNextWeb/NextChat

mlc-ai/mlc-llm

fauxpilot/fauxpilot