What are the best open-source GitHub repositories for an open source model for local inference?

deepseek-ai/deepseek-v3 is the closest match — DeepSeek-V3 is an openly released large language model with pre-trained weights, native 8-bit quantization support, and thorough deployment documentation, squarely matching your interest in open-source LLMs that can be run locally and fine-tuned.. Other strong matches: databrickslabs/dolly, facebookresearch/codellama, meta-llama/codellama, ymcui/chinese-llama-alpaca-2.

Why does deepseek-ai/deepseek-v3 match “an open source model for local inference”?

DeepSeek-V3 is an openly released large language model with pre-trained weights, native 8-bit quantization support, and thorough deployment documentation, squarely matching your interest in open-source LLMs that can be run locally and fine-tuned.

Why does databrickslabs/dolly match “an open source model for local inference”?

Dolly is an instruction-tuned, open-source large language model with permissive licensing and accessible pre-trained weights, directly matching your need for a model you can run locally or fine-tune, though it offers a single parameter size rather than multiple variants.

Why does facebookresearch/codellama match “an open source model for local inference”?

Code Llama is a large language model family specialized for code tasks, built on Llama 2 with accessible weights, instruction-tuned variants, multiple parameter sizes, and quantization support, fitting the search for open-source LLMs—though its code-specific focus may be narrower than general-purpo…

Why does meta-llama/codellama match “an open source model for local inference”?

CodeLlama is a family of open-source large language models focused on code generation, with pre-trained weights available in multiple sizes and instruction-tuned variants, so you can run it locally or fine-tune it — though its code‑specific focus means you should look elsewhere if you need a genera…

Why does ymcui/chinese-llama-alpaca-2 match “an open source model for local inference”?

This repository provides an instruction-tuned Chinese large language model based on LLaMA-2 with accessible pre-trained weights, support for fine-tuning, quantization, and long-context processing, making it a fitting open-source LLM for local deployment and customization.

large language model

We curate open-source GitHub repositories matching “open source llm models”. Results are ranked by relevance to your query — pick filters below to narrow, or refine with AI.

Find the best repos with AI.We'll search the best matching repositories with AI.

deepseek-ai/deepseek-v3
deepseek-ai/DeepSeek-V3
103,753View on GitHub
DeepSeek-V3 is a large language model that provides comprehensive resources for model utilization, including technical specifications, pre-trained weights, and evaluation benchmarks. The project details the core transformer architecture, including parameter counts and multi-token prediction modules, while supporting native 8-bit floating-point quantization. The repository offers extensive support for local and distributed inference through integration with multiple frameworks and engines. It includes documentation for deploying the model across various hardware configurations, such as GPUs an
DeepSeek-V3 is an openly released large language model with pre-trained weights, native 8-bit quantization support, and thorough deployment documentation, squarely matching your interest in open-source LLMs that can be run locally and fine-tuned.
PythonModel Weights
View on GitHub103,753
databrickslabs/dolly
databrickslabs/dolly
10,795View on GitHub
Dolly is an instruction-tuned large language model designed to follow complex natural language directions. It operates as a causal language model that predicts the next token in a sequence to generate coherent conversational responses and perform tasks such as brainstorming, classification, and question answering. The project focuses on the development of models using open datasets suitable for commercial application. It enables the creation of instruction-following models by utilizing curated collections of human-generated instruction-response pairs. The repository provides capabilities for
Dolly is an instruction-tuned, open-source large language model with permissive licensing and accessible pre-trained weights, directly matching your need for a model you can run locally or fine-tune, though it offers a single parameter size rather than multiple variants.
PythonInstruction TuningInstruction-Tuned Language Models
View on GitHub10,795
facebookresearch/codellama
facebookresearch/codellama
16,307View on GitHub
Code Llama is a large language model based on Llama 2 trained specifically for programming tasks and software development. It provides specialized model types optimized for general code generation, instruction following, and context-aware infilling. The project includes an instruction-tuned programming model for executing technical tasks via natural language prompts and a code infilling model that predicts missing sections based on surrounding source context. A large context code model is also provided to analyze extensive blocks of source code for improved coherence. The system covers capab
Code Llama is a large language model family specialized for code tasks, built on Llama 2 with accessible weights, instruction-tuned variants, multiple parameter sizes, and quantization support, fitting the search for open-source LLMs—though its code-specific focus may be narrower than general-purpose models.
PythonInstruction Fine-tuningInstruction-Tuned Language Models
View on GitHub16,307
meta-llama/codellama
meta-llama/codellama
16,307View on GitHub
CodeLlama is a family of large language models derived from the Llama 2 architecture and specialized for producing, completing, and refactoring source code across multiple programming languages. It functions as a code generation model capable of synthesizing source code from natural language descriptions. The project includes specific model variants designed for different programming tasks. This includes instruction-tuned models trained to follow complex natural language directions and code infilling models that predict and insert missing code segments into existing files by analyzing surroun
CodeLlama is a family of open-source large language models focused on code generation, with pre-trained weights available in multiple sizes and instruction-tuned variants, so you can run it locally or fine-tune it — though its code‑specific focus means you should look elsewhere if you need a general‑purpose model.
PythonInstruction Fine-tuningInstruction-Tuned Language Models
View on GitHub16,307
ymcui/chinese-llama-alpaca-2
ymcui/Chinese-LLaMA-Alpaca-2
7,136View on GitHub
This project provides a Chinese large language model based on the LLaMA architecture. It is an instruction-tuned model optimized for natural language processing and multi-turn conversations in Chinese. The system includes a framework for parameter-efficient fine-tuning using low-rank adaptation and quantization to reduce memory requirements. It also implements retrieval augmented generation for local document question answering and supports long-context processing for sequences up to 64K tokens. The project covers a broad set of capabilities including supervised instruction tuning, reinforce
This repository provides an instruction-tuned Chinese large language model based on LLaMA-2 with accessible pre-trained weights, support for fine-tuning, quantization, and long-context processing, making it a fitting open-source LLM for local deployment and customization.
PythonInstruction Fine-tuningInstruction TuningInstruction-Tuned Language Models
View on GitHub7,136
stability-ai/stablelm
Stability-AI/StableLM
15,699View on GitHub
StableLM is a pre-trained transformer-based large language model designed for natural language generation and zero-shot inference. It functions as a causal language model that predicts the next token in a sequence to produce human-like text for conversational and creative writing tasks. The model is built as a fine-tunable base, allowing the adaptation of pre-trained weights to specific tasks or styles through custom dataset training and weight regularization. It utilizes rotary positional embeddings and flash-attention to optimize memory usage and processing efficiency during deployment on G
StableLM is a pre-trained transformer-based large language model with open weights that supports fine-tuning, multiple parameter sizes, and is designed for local deployment, making it a solid fit for an open-source LLM that can be run or adapted locally.
Jupyter NotebookModel CheckpointsPre-trained Models
View on GitHub15,699
qwenlm/qwen3
QwenLM/Qwen3
27,324View on GitHub
Qwen3 is a transformer-based large language model designed as a generative AI foundation for understanding, reasoning, and generating human language. It functions as a comprehensive ecosystem for model training, fine-tuning, and production-ready inference, providing the underlying architecture and weights necessary to build diverse artificial intelligence applications. The project distinguishes itself through extensive support for model quantization and distributed inference, enabling efficient execution across a wide range of hardware from consumer-grade devices to scalable cloud infrastruct
Qwen3 is an open-source transformer large language model family with publicly available pre-trained weights, multiple parameter sizes, instruction-tuned variants, quantization support, and full compatibility with Hugging Face Transformers, directly meeting the need for locally runnable and fine-tunable models with community benchmarks.
PythonModel Quantization
View on GitHub27,324
meta-llama/llama
meta-llama/llama
59,464View on GitHub
Llama is a computational framework and runtime environment designed for executing transformer-based neural networks locally. It functions as a generative AI inference engine, enabling the processing of input sequences through pre-trained model weights to produce text completions and structured data outputs directly on your own hardware. The system distinguishes itself through specialized memory and computation management techniques, including memory-mapped weight loading and quantization-aware inference, which allow for efficient execution on standard consumer hardware. It utilizes a stateles
Meta's Llama repository is the official codebase for the Llama family of large language models, providing pre-trained weights (available separately), support for multiple sizes from 7B to 70B, instruction-tuned versions, quantization for local inference, and integration with Hugging Face Transformers, making it a flagship open-source LLM that directly matches your need for accessible weights and fine-tuning capability.
PythonInference EnginesLarge Language Model RuntimesLocal Inference Engines
View on GitHub59,464
01-ai/yi
01-ai/Yi
7,822View on GitHub
Yi is a bilingual language model and foundation model designed for natural language processing, reasoning, and reading comprehension in both English and Chinese. It is built as a transformer-based architecture capable of general purpose text generation and conversational tasks. The model is distinguished by its ability to function as a long context system, processing and analyzing extended input sequences up to 200k tokens. It also supports quantized versions that use low-bit precision to reduce memory footprints, enabling execution on consumer-grade hardware. The project covers a broad rang
Yi is a bilingual open-source large language model with released weights, long-context support, and quantization for local execution, fitting the requirement for an accessible, locally runnable LLM.
Jupyter NotebookModel QuantizationPrecision Quantization
View on GitHub7,822
haotian-liu/llava
haotian-liu/LLaVA
24,465View on GitHub
LLaVA is a multimodal large language model architecture designed to process and interpret both image and text inputs to generate natural language responses. It functions as a research-oriented platform for visual instruction tuning, providing a framework to align language models with human intent through training on diverse datasets of paired images and text queries. The system distinguishes itself through a specialized vision-language training pipeline that connects visual data to language models using projection layers and instruction-based fine-tuning. It supports distributed inference by
LLaVA is an open-source multimodal large language model with accessible pre-trained weights that you can run locally, fine-tune, and use via Hugging Face Transformers, though the base model license (LLaMA) is not fully permissive.
PythonInstruction Fine-tuning
View on GitHub24,465
allenai/olmo
allenai/OLMo
6,313View on GitHub
OLMo is a fully open-source large language model family from AI2 with publicly released pre-trained weights under the Apache 2.0 license, supporting Hugging Face Transformers, multiple parameter sizes, instruction-tuned variants, and quantization tools, making it exactly the kind of open-source LLM this search targets.
PythonLarge Language Model Training Frameworks8-Bit Inference Quantizers8-Bit Load-Time Quantizers
View on GitHub6,313
qwenlm/qwen2.5
QwenLM/Qwen2.5
27,307View on GitHub
Qwen2.5 is a suite of large language model foundation models designed for natural language generation, code production, and complex mathematical reasoning. The project encompasses a multilingual language model capable of processing dozens of languages and a specialized code generation model for technical problem solving and debugging. The framework is distinguished by its long context capabilities, enabling the analysis of massive inputs ranging from 256K up to 1 million tokens. It further functions as an agentic framework, utilizing standardized templates and parsers to execute autonomous wo
Qwen2.5 is a comprehensive family of open-source LLMs released under a permissive license (Apache 2.0), with pre-trained weights in multiple sizes from 0.5B to 72B, full Hugging Face Transformers support, instruction-tuned variants, and quantization support—exactly what you need for local deployment and fine-tuning.
PythonFoundation ModelsLong-Context ModelsAdvanced Reasoning Models
View on GitHub27,307
google/gemma_pytorch
google/gemma_pytorch
5,697View on GitHub
The official PyTorch implementation of Google's Gemma models
This repository provides the official PyTorch implementation of Google's Gemma open-weight LLMs, giving you direct access to pretrained weights, multiple parameter sizes, instruction-tuned variants, and tools for fine-tuning and local inference — exactly the kind of open-source large language model you're looking for.
PythonDecoder-Only InferenceLLM Inference EnginesLong Context Processing
View on GitHub5,697
meta-llama/llama-models
meta-llama/llama-models
7,643View on GitHub
This project provides a foundational framework and reference implementation for executing causal language modeling and multimodal reasoning on local systems. It includes a set of core components for managing model assets, a fine-tuning framework, and structural definitions required to instantiate transformer-based architectures. The system is distinguished by its ability to process combined text and image inputs through multimodal transformer models for visual reasoning and document analysis. It also supports the deployment of quantized models, reducing memory footprints through low-precision
This repository provides the official framework for running and fine-tuning Meta's Llama models locally, with pre-trained weights, multiple parameter sizes, instruction-tuned variants, and quantization support, though its license is a custom one rather than a standard permissive open-source license.
PythonModel QuantizationModel Weight Management
View on GitHub7,643
thudm/chatglm-6b
THUDM/ChatGLM-6B
41,040View on GitHub
ChatGLM-6B is an open-source bilingual large language model designed for natural dialogue and text generation in both English and Chinese. It is structured as a dialogue model capable of tasks such as role-playing and information extraction. The project provides implementations for quantized language models, using low-precision weights to reduce GPU memory requirements for local inference. It also supports parameter-efficient fine-tuning, allowing model behavior to be optimized for specific tasks without requiring full retraining. The model includes capabilities for local execution on GPUs a
ChatGLM-6B is an open-source bilingual large language model with pre-trained weights, quantization for local inference, and parameter-efficient fine-tuning, making it a solid fit for running or adapting locally, though it offers only one size (6B) and doesn't explicitly mention Hugging Face Transformers or multiple variants.
PythonPrecision Quantization
View on GitHub41,040
clue-ai/chatyuan
clue-ai/ChatYuan
1,870View on GitHub
ChatYuan: Large Language Model for Dialogue in Chinese and English
ChatYuan is an open-source large language model for dialogue in Chinese and English, fitting the search for open-weight LLMs, though details on Hugging Face support, parameter sizes, or instruction-tuned variants are not confirmed in the provided evidence.
PythonOpen Source ModelsText LLM Models
View on GitHub1,870
google-research/bert
google-research/bert
39,869View on GitHub
This project is a transformer-based language model and natural language processing toolkit designed to generate deep contextual representations of text. By utilizing a transformer-based encoder architecture, the system processes input sequences through stacked self-attention layers to capture the semantic meaning of tokens based on their surrounding sentence structure. The model distinguishes itself through bidirectional contextual processing, which analyzes text in both directions simultaneously, and masked language modeling, which trains the system by predicting hidden tokens within a seque
BERT is an open-source transformer-based language model with pre-trained weights available for fine-tuning and local use, fitting the search for accessible LLMs despite focusing on bidirectional encoding rather than generative tasks.
PythonTransformer Language ModelsTransformer EncodersMasked Language Modeling
View on GitHub39,869

large language model

deepseek-ai/DeepSeek-V3

databrickslabs/dolly

facebookresearch/codellama

meta-llama/codellama

ymcui/Chinese-LLaMA-Alpaca-2

Stability-AI/StableLM

QwenLM/Qwen3

meta-llama/llama

01-ai/Yi

haotian-liu/LLaVA

allenai/OLMo

QwenLM/Qwen2.5

google/gemma_pytorch

meta-llama/llama-models

THUDM/ChatGLM-6B

clue-ai/ChatYuan

google-research/bert