What are the best Local Model Loading GitHub repositories?

Mechanisms for importing model weights from local storage. **Distinguishing note:** Focuses on local file-based model loading. Explore 11 awesome GitHub repositories matching data & databases · Local Model Loading. Refine with filters or upvote what's useful. Top picks: zai-org/chatglm-6b, mlfoundations/open_clip, runanywhereai/runanywhere-sdks, mochidiffusion/mochidiffusion, nat/openplayground, serge-chat/serge, nsarrazin/serge, michael-a-kuykendall/shimmy, pytorch/serve, jaymody/picogpt.

Why is zai-org/chatglm-6b a recommended Local Model Loading repository?

Imports pre-trained model weights from local storage to perform inference without external hosting.

Why is mlfoundations/open_clip a recommended Local Model Loading repository?

Provides mechanisms to initialize model architectures using weights stored on a local disk.

Why is runanywhereai/runanywhere-sdks a recommended Local Model Loading repository?

Handles the downloading of model files from remote URLs and loading them into device memory.

Why is mochidiffusion/mochidiffusion a recommended Local Model Loading repository?

Loads machine learning weights by scanning local filesystem paths for compatible model files.

Why is nat/openplayground a recommended Local Model Loading repository?

Includes mechanisms for importing model weights from local file system storage for on-device inference.

Why is serge-chat/serge a recommended Local Model Loading repository?

Loads GGUF-format model files from a curated list of supported open-source families for local inference.

Why is nsarrazin/serge a recommended Local Model Loading repository?

Loads GGUF-format models from a curated list for local execution through a structured pipeline.

Why is michael-a-kuykendall/shimmy a recommended Local Model Loading repository?

Provides a pipeline for discovering and loading GGUF-format models for local serving.

Why is pytorch/serve a recommended Local Model Loading repository?

TorchServe imports model files and handlers into the runtime environment using a specified directory.

Why is jaymody/picogpt a recommended Local Model Loading repository?

Maps saved parameter tensors from local storage directly into the active model structure.

11 repositories

Awesome GitHub Repositories · Local Model Loading



zai-org/ChatGLM-6B · 41,039
ChatGLM-6B is a generative AI inference engine designed for local execution of transformer-based language models. It provides a comprehensive runtime environment that allows users to load and run pre-trained neural network weights directly on their own hardware, ensuring data privacy and independence from external cloud services. The project distinguishes itself through a hardware-agnostic execution backend that supports deployment across diverse environments, including standard processors, Apple Silicon, and multi-GPU configurations. It incorporates advanced optimization techniques such as w
Imports pre-trained model weights from local storage to perform inference without external hosting.
Python
mlfoundations/open_clip · 13,935
Open CLIP is an open source framework for training and deploying Contrastive Language-Image Pre-training models. It serves as a vision-language training framework and multimodal embedding engine that maps images and text into a shared vector space for similarity searches and zero-shot classification. The project provides a toolkit for distributed training of contrastive models and includes an image-to-text generative model for producing natural language descriptions. It supports custom text encoder integration and utilizes teacher-student model distillation to transfer knowledge from large pr
Provides mechanisms to initialize model architectures using weights stored on a local disk.
Python · computer-vision · contrastive-loss · deep-learning
RunanywhereAI/runanywhere-sdks · 8,781
This project is an on-device AI SDK providing a framework for running large language models, vision models, and speech models locally. It serves as an orchestration layer for local LLM execution, ensuring data privacy and offline availability by utilizing hardware acceleration on the device. The SDK is distinguished by its comprehensive voice and multimodal capabilities, including a coordinated voice pipeline for activity detection, speech-to-text, and text-to-speech synthesis. It also provides a dedicated implementation kit for local retrieval-augmented generation and tools for processing co
Handles the downloading of model files from remote URLs and loading them into device memory.
C++ · android · apple-intelligence · cpp
MochiDiffusion/MochiDiffusion · 7,895
MochiDiffusion is a local client for Stable Diffusion that functions as an AI image generation studio. It provides a workspace for performing text-to-image, image-to-image, and inpainting tasks, enabling the production of high-resolution images offline using local hardware and neural engine acceleration. The project includes a local model manager for importing, organizing, and converting machine learning models into compatible formats for offline execution. It features a ControlNet integration tool to guide structural composition and spatial layout, alongside a dedicated image upscaler that u
Loads machine learning weights by scanning local filesystem paths for compatible model files.
Swift · ane · apple · apple-silicon
nat/openplayground · 6,353
OpenPlayground is a web-based comparison playground and multi-provider client used to test and evaluate outputs from multiple large language models and local inference engines side-by-side. It serves as a local testing environment for routing prompts to various external APIs and on-device models through a single interface. The project enables concurrent request dispatching, allowing a single prompt to be sent to multiple models simultaneously for comparative analysis. It includes a parameter tuning interface for refining model behavior via generation settings and provides a system for detecti
Includes mechanisms for importing model weights from local file system storage for on-device inference.
TypeScript
serge-chat/serge · 5,725
Serge is a self-hosted web chat interface for running large language models locally using the llama.cpp inference engine. It loads GGUF-format model files directly on your own machine, removing the need for internet connectivity or external API keys, and streams responses to the browser in real time via WebSocket connections. The project is packaged for containerized deployment using Docker and Docker Compose, with a Traefik reverse proxy that handles HTTP and WebSocket routing along with automatic TLS certificate management. Ready-made Kubernetes manifests are also provided, enabling deploym
Loads GGUF-format model files from a curated list of supported open-source families for local inference.
Svelte · alpaca · docker · fastapi
nsarrazin/serge · 5,725
A web interface for chatting with Alpaca through llama.cpp. Fully dockerized, with an easy-to-use API.
Loads GGUF-format models from a curated list for local execution through a structured pipeline.
Svelte
Michael-A-Kuykendall/shimmy · 5,428
Shimmy is a local large language model inference engine and server that loads and serves weights in GGUF format. It is distributed as a single binary runtime written in Rust, providing a self-contained environment for running models without external runtime dependencies. The project uses WebGPU for hardware acceleration, allowing the model's compute kernels to run across a range of graphics hardware through a standardized interface. It features a local server that implements an OpenAI-compatible API layer, letting applications interact with local models through standardized REST endpoints. Memory and performance are managed through quantized key-value cache compression, which reduces GPU VRAM usage, and rotary embedding scaling, which extends the model's context window. The system also includes automatic model file discovery to scan and register compatible weights from local storage. The server is managed through a dedicated command-line interface for controlling operations and verifying model generation.
Provides a pipeline for discovering and loading GGUF-format models for local serving.
Rust
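The discovery step described for shimmy (scan local storage, register compatible weights) can be illustrated with a short Python sketch. This is not shimmy's code, which is written in Rust; the function names and the registry shape are hypothetical. The only format fact relied on is that a GGUF file begins with the four ASCII bytes `GGUF`.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # a GGUF file starts with these four bytes


def is_gguf(path: Path) -> bool:
    """Return True if the file starts with the GGUF magic bytes."""
    try:
        with path.open("rb") as handle:
            return handle.read(4) == GGUF_MAGIC
    except OSError:
        # Unreadable files are treated as incompatible rather than fatal.
        return False


def discover_gguf_models(root) -> dict:
    """Recursively scan `root` and map model name (file stem) to file path.

    Hypothetical registry shape: if two files share a stem, the one that
    sorts later wins.
    """
    registry = {}
    for path in sorted(Path(root).rglob("*.gguf")):
        if path.is_file() and is_gguf(path):
            registry[path.stem] = path
    return registry
```

Checking the magic bytes in addition to the file extension keeps renamed or truncated downloads out of the registry, at the cost of opening every candidate file once.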
pytorch/serve · 4,354
This project is a PyTorch model serving framework designed to deploy and scale machine learning models in production through scalable network endpoints. It functions as a high-performance inference server, optimizer, and model lifecycle manager that handles model loading, request batching, and hardware acceleration. The system is distinguished by its advanced orchestration and optimization capabilities, such as chaining multiple models into sequential workflows through execution graphs and using dynamic batching to improve throughput and latency. It provides specialized support for generative AI and large language models (LLMs) through continuous batching and tensor parallelism. Capability areas include GPU resource management across diverse hardware such as NVIDIA, AMD, and Apple Silicon, as well as comprehensive model lifecycle management for registration, versioning, and worker scaling. It also integrates observability tools to track system health and model performance through Prometheus-compatible metrics. The server is managed through a command-line interface used for lifecycle control and runtime parameter configuration.
TorchServe imports model files and handlers into the runtime environment using a specified directory.
Java
jaymody/picoGPT · 3,449
picoGPT is a lightweight, low-level runtime environment and inference engine designed to load pre-trained checkpoints and execute generative transformer model inference. It provides a minimal implementation of the generative pre-trained transformer architecture to facilitate local language model execution. The project is implemented in Python with NumPy, converting model parameters and executing greedy token generation without heavy external dependencies. It handles remote asset synchronization by downloading pre-trained weights, hyperparameters, and vocabulary files from remote server
Maps saved parameter tensors from local storage directly into the active model structure.
Python · deep-learning · gpt · gpt-2
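Mapping saved parameter tensors into a model structure usually means turning a flat table of checkpoint names into the nested layout the forward pass expects. The sketch below is illustrative only and is not picoGPT's actual loader. It assumes GPT-2-style names such as `model/h0/attn/c_attn/w`, and the helper names, the `prefix` default, and the `blocks` key are hypothetical. Plain lists stand in for tensors so the example needs only the standard library.

```python
import re


def set_in_nested_dict(tree, keys, value):
    """Create nested dicts along `keys` and store `value` at the leaf."""
    for key in keys[:-1]:
        tree = tree.setdefault(key, {})
    tree[keys[-1]] = value


def map_flat_params(flat_params, n_layer, prefix="model/"):
    """Turn {"model/h0/attn/c_attn/w": tensor, ...} into a nested structure.

    Per-layer entries (names starting with h<index>/) go into
    params["blocks"][index]; everything else is stored at the top level.
    """
    params = {"blocks": [{} for _ in range(n_layer)]}
    for name, tensor in flat_params.items():
        if name.startswith(prefix):
            name = name[len(prefix):]
        match = re.fullmatch(r"h(\d+)/(.+)", name)
        if match:
            layer = int(match.group(1))
            keys = match.group(2).split("/")
            set_in_nested_dict(params["blocks"][layer], keys, tensor)
        else:
            set_in_nested_dict(params, name.split("/"), tensor)
    return params
```

With this layout the model code can address a weight as `params["blocks"][0]["attn"]["c_attn"]["w"]` without knowing how the checkpoint named it on disk.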
protectai/llm-guard · 2,561
LLM Guard is a security firewall and guardrail framework designed to scan and sanitize inputs and outputs for large language models. It functions as a proxy gateway and security layer to block prompt injections, toxicity, and sensitive data leakage while ensuring that model interactions remain compliant with organizational policies. The system distinguishes itself through a modular scanner pipeline that utilizes local model orchestration to eliminate external network dependencies. It supports real-time security filtering via streaming chunk analysis and implements a fail-fast execution model
Supports loading model weights from local directories to eliminate the need for downloading assets during startup.
Python · adversarial-machine-learning · chatgpt · large-language-models
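Loading from a local directory instead of downloading at startup generally comes down to validating the directory up front and failing fast when files are missing. The following is a minimal sketch of that idea under stated assumptions, not LLM Guard's API: `resolve_local_model` and the default `required` file list are hypothetical.

```python
from pathlib import Path


def resolve_local_model(model_dir, required=("config.json",)):
    """Return the absolute model directory if every required file exists.

    Raises FileNotFoundError instead of falling back to a download, so a
    missing asset surfaces at startup rather than on the first request.
    """
    root = Path(model_dir).expanduser()
    missing = [name for name in required if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"{root} is missing: {', '.join(missing)}")
    return root.resolve()
```

A caller would pass the returned path to whichever loader it uses; the check itself makes no network requests.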
