We curate open-source GitHub repositories matching “open source llm models”. Results are ranked by relevance to your query.
DeepSeek-V3 is a large language model that provides comprehensive resources for model utilization, including technical specifications, pre-trained weights, and evaluation benchmarks. The project details the core transformer architecture, including parameter counts and multi-token prediction modules, while supporting native 8-bit floating-point quantization. The repository offers extensive support for local and distributed inference through integration with multiple frameworks and engines. It includes documentation for deploying the model across various hardware configurations, such as GPUs an
DeepSeek-V3 is an openly released large language model with pre-trained weights, native 8-bit quantization support, and thorough deployment documentation, squarely matching your interest in open-source LLMs that can be run locally and fine-tuned.
Dolly is an instruction-tuned large language model designed to follow complex natural language directions. It operates as a causal language model that predicts the next token in a sequence to generate coherent conversational responses and perform tasks such as brainstorming, classification, and question answering. The project focuses on the development of models using open datasets suitable for commercial application. It enables the creation of instruction-following models by utilizing curated collections of human-generated instruction-response pairs. The repository provides capabilities for
Dolly is an instruction-tuned, open-source large language model with permissive licensing and accessible pre-trained weights, directly matching your need for a model you can run locally or fine-tune, though it offers a single parameter size rather than multiple variants.
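As an illustration of the next-token prediction loop that the Dolly description mentions, here is a minimal sketch. It is not Dolly's code: the "model" is a hypothetical bigram frequency table built from a toy corpus, and `train_bigram` and `generate` are names invented for this example. A real causal language model replaces the table lookup with a transformer forward pass, but the decoding loop has the same shape.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token (toy stand-in for a model)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, start, max_new_tokens=5):
    """Greedy decoding: repeatedly append the most frequent follower of the last token."""
    out = [start]
    for _ in range(max_new_tokens):
        followers = counts.get(out[-1])
        if not followers:  # no known continuation, stop early
            break
        out.append(followers.most_common(1)[0][0])
    return out


# Hypothetical toy corpus, chosen so that every greedy choice is unambiguous.
corpus = "the cat sat on the mat the cat sat on the rug".split()
model = train_bigram(corpus)
print(generate(model, "the", max_new_tokens=3))  # ['the', 'cat', 'sat', 'on']
```

Greedy selection is only one decoding strategy; sampling-based strategies pick from the predicted distribution instead of always taking the top token.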
Code Llama is a large language model based on Llama 2 trained specifically for programming tasks and software development. It provides specialized model types optimized for general code generation, instruction following, and context-aware infilling. The project includes an instruction-tuned programming model for executing technical tasks via natural language prompts and a code infilling model that predicts missing sections based on surrounding source context. A large context code model is also provided to analyze extensive blocks of source code for improved coherence. The system covers capab
Code Llama is a large language model family specialized for code tasks, built on Llama 2 with accessible weights, instruction-tuned variants, multiple parameter sizes, and quantization support. It fits the search for open-source LLMs, though its code-specific focus makes it narrower than a general-purpose model.
CodeLlama is a family of large language models derived from the Llama 2 architecture and specialized for producing, completing, and refactoring source code across multiple programming languages. It functions as a code generation model capable of synthesizing source code from natural language descriptions. The project includes specific model variants designed for different programming tasks. This includes instruction-tuned models trained to follow complex natural language directions and code infilling models that predict and insert missing code segments into existing files by analyzing surroun
CodeLlama is a family of open-source large language models focused on code generation, with pre-trained weights available in multiple sizes and instruction-tuned variants, so you can run it locally or fine-tune it, though its code-specific focus means you should look elsewhere if you need a general-purpose model.
This project provides a Chinese large language model based on the LLaMA architecture. It is an instruction-tuned model optimized for natural language processing and multi-turn conversations in Chinese. The system includes a framework for parameter-efficient fine-tuning using low-rank adaptation and quantization to reduce memory requirements. It also implements retrieval augmented generation for local document question answering and supports long-context processing for sequences up to 64K tokens. The project covers a broad set of capabilities including supervised instruction tuning, reinforce
This repository provides an instruction-tuned Chinese large language model based on LLaMA-2 with accessible pre-trained weights, support for fine-tuning, quantization, and long-context processing, making it a fitting open-source LLM for local deployment and customization.
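To see why low-rank adaptation reduces the cost of fine-tuning, the rough arithmetic below compares the trainable parameters of one full weight matrix with those of its two low-rank factors. The layer size (4096 x 4096) and the rank (8) are hypothetical values chosen for illustration, not figures from this repository, and the function names are invented for the sketch.

```python
def full_params(d_in, d_out):
    """Parameters in a dense weight matrix W of shape (d_out, d_in)."""
    return d_in * d_out


def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


# Hypothetical dimensions for a single projection layer.
d_in = d_out = 4096
rank = 8

full = full_params(d_in, d_out)                  # 16777216
lora = lora_trainable_params(d_in, d_out, rank)  # 65536
print(f"full: {full}, LoRA: {lora}, ratio: {lora / full:.4%}")
```

Under these assumptions the adapter trains well under 1% of the layer's weights while the original matrix stays frozen, which is where the memory saving comes from.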
StableLM is a pre-trained transformer-based large language model designed for natural language generation and zero-shot inference. It functions as a causal language model that predicts the next token in a sequence to produce human-like text for conversational and creative writing tasks. The model is built as a fine-tunable base, allowing the adaptation of pre-trained weights to specific tasks or styles through custom dataset training and weight regularization. It utilizes rotary positional embeddings and flash-attention to optimize memory usage and processing efficiency during deployment on G
StableLM is a pre-trained transformer-based large language model with open weights that supports fine-tuning, multiple parameter sizes, and is designed for local deployment, making it a solid fit for an open-source LLM that can be run or adapted locally.
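The rotary positional embeddings mentioned in the StableLM description can be sketched in a few lines. This is an illustrative pure-Python version, not StableLM's implementation: it assumes the interleaved pairing of dimensions (0,1), (2,3), ... and the commonly used base of 10000, whereas some implementations pair dimension j with j + d/2 instead. The function name `rope` is invented for this example.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate each consecutive pair of dimensions by a position-dependent angle."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        # Lower dimensions rotate quickly, higher dimensions slowly.
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical query and key vectors.
q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]

# Both pairs of positions are 2 apart, so the two scores agree up to rounding.
print(dot(rope(q, 3), rope(k, 1)))
print(dot(rope(q, 7), rope(k, 5)))
```

The point of the rotation is that the attention score between a query and a key depends only on their relative offset, not on their absolute positions.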
Qwen3 is a transformer-based large language model designed as a generative AI foundation for understanding, reasoning, and generating human language. It functions as a comprehensive ecosystem for model training, fine-tuning, and production-ready inference, providing the underlying architecture and weights necessary to build diverse artificial intelligence applications. The project distinguishes itself through extensive support for model quantization and distributed inference, enabling efficient execution across a wide range of hardware from consumer-grade devices to scalable cloud infrastruct
Qwen3 is an open-source transformer large language model family with publicly available pre-trained weights, multiple parameter sizes, instruction-tuned variants, quantization support, and full compatibility with Hugging Face Transformers, directly meeting the need for locally runnable and fine-tunable models with community benchmarks.
Llama is a computational framework and runtime environment designed for executing transformer-based neural networks locally. It functions as a generative AI inference engine, enabling the processing of input sequences through pre-trained model weights to produce text completions and structured data outputs directly on your own hardware. The system distinguishes itself through specialized memory and computation management techniques, including memory-mapped weight loading and quantization-aware inference, which allow for efficient execution on standard consumer hardware. It utilizes a stateles
Meta's Llama repository is the official codebase for the Llama family of large language models, providing pre-trained weights (available separately), support for multiple sizes from 7B to 70B, instruction-tuned versions, quantization for local inference, and integration with Hugging Face Transformers, making it a flagship open-source LLM that directly matches your need for accessible weights and fine-tuning capability.
Yi is a bilingual foundation language model designed for natural language processing, reasoning, and reading comprehension in both English and Chinese. It is built on a transformer architecture and handles general-purpose text generation and conversational tasks. The model is distinguished by its ability to function as a long context system, processing and analyzing extended input sequences up to 200k tokens. It also supports quantized versions that use low-bit precision to reduce memory footprints, enabling execution on consumer-grade hardware. The project covers a broad rang
Yi is a bilingual open-source large language model with released weights, long-context support, and quantization for local execution, fitting the requirement for an accessible, locally runnable LLM.
LLaVA is a multimodal large language model architecture designed to process and interpret both image and text inputs to generate natural language responses. It functions as a research-oriented platform for visual instruction tuning, providing a framework to align language models with human intent through training on diverse datasets of paired images and text queries. The system distinguishes itself through a specialized vision-language training pipeline that connects visual data to language models using projection layers and instruction-based fine-tuning. It supports distributed inference by
LLaVA is an open-source multimodal large language model with accessible pre-trained weights that you can run locally, fine-tune, and use via Hugging Face Transformers, though the base model license (LLaMA) is not fully permissive.
OLMo is a fully open-source large language model family from AI2 with publicly released pre-trained weights under the Apache 2.0 license, supporting Hugging Face Transformers, multiple parameter sizes, instruction-tuned variants, and quantization tools, making it exactly the kind of open-source LLM this search targets.
Qwen2.5 is a suite of foundation large language models designed for natural language generation, code production, and complex mathematical reasoning. The project encompasses a multilingual language model capable of processing dozens of languages and a specialized code generation model for technical problem solving and debugging. The framework is distinguished by its long context capabilities, enabling the analysis of massive inputs ranging from 256K up to 1 million tokens. It further functions as an agentic framework, utilizing standardized templates and parsers to execute autonomous wo
Qwen2.5 is a comprehensive family of open-source LLMs released under a permissive license (Apache 2.0), with pre-trained weights in multiple sizes from 0.5B to 72B, full Hugging Face Transformers support, instruction-tuned variants, and quantization support—exactly what you need for local deployment and fine-tuning.
The official PyTorch implementation of Google's Gemma models
This repository provides the official PyTorch implementation of Google's Gemma open-weight LLMs, giving you direct access to pretrained weights, multiple parameter sizes, instruction-tuned variants, and tools for fine-tuning and local inference — exactly the kind of open-source large language model you're looking for.
This project provides a foundational framework and reference implementation for executing causal language modeling and multimodal reasoning on local systems. It includes a set of core components for managing model assets, a fine-tuning framework, and structural definitions required to instantiate transformer-based architectures. The system is distinguished by its ability to process combined text and image inputs through multimodal transformer models for visual reasoning and document analysis. It also supports the deployment of quantized models, reducing memory footprints through low-precision
This repository provides the official framework for running and fine-tuning Meta's Llama models locally, with pre-trained weights, multiple parameter sizes, instruction-tuned variants, and quantization support, though its license is a custom one rather than a standard permissive open-source license.
ChatGLM-6B is an open-source bilingual large language model designed for natural dialogue and text generation in both English and Chinese. It is structured as a dialogue model capable of tasks such as role-playing and information extraction. The project provides implementations for quantized language models, using low-precision weights to reduce GPU memory requirements for local inference. It also supports parameter-efficient fine-tuning, allowing model behavior to be optimized for specific tasks without requiring full retraining. The model includes capabilities for local execution on GPUs a
ChatGLM-6B is an open-source bilingual large language model with pre-trained weights, quantization for local inference, and parameter-efficient fine-tuning, making it a solid fit for running or adapting locally, though it offers only one size (6B), and its description does not mention Hugging Face Transformers support or other variants.
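A back-of-the-envelope calculation shows why low-precision weights matter for local inference with a 6B-parameter model. The sketch below counts weight storage only, and it assumes exactly 6 billion parameters and decimal gigabytes. Activations, the attention cache, and the scale factors stored alongside quantized weights are ignored, so real memory use will be higher. The function name is invented for this example.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in decimal gigabytes (weights only)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 6_000_000_000  # assumption: "6B" taken as exactly 6 billion

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
# 16-bit weights: 12.0 GB
# 8-bit weights: 6.0 GB
# 4-bit weights: 3.0 GB
```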
ChatYuan: Large Language Model for Dialogue in Chinese and English
ChatYuan is an open-source large language model for dialogue in Chinese and English, fitting the search for open-weight LLMs, though details on Hugging Face support, parameter sizes, or instruction-tuned variants are not confirmed in the provided evidence.
This project is a transformer-based language model and natural language processing toolkit designed to generate deep contextual representations of text. By utilizing a transformer-based encoder architecture, the system processes input sequences through stacked self-attention layers to capture the semantic meaning of tokens based on their surrounding sentence structure. The model distinguishes itself through bidirectional contextual processing, which analyzes text in both directions simultaneously, and masked language modeling, which trains the system by predicting hidden tokens within a seque
BERT is an open-source transformer-based language model with pre-trained weights available for fine-tuning and local use, fitting the search for accessible LLMs despite focusing on bidirectional encoding rather than generative tasks.
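The masked language modeling objective described above can be illustrated with a simplified masking step. This sketch is an assumption-laden simplification, not the repository's code: it replaces every selected token with `[MASK]`, while BERT's published recipe selects about 15% of tokens and then replaces 80% of those with the mask token, swaps 10% for a random token, and leaves 10% unchanged. The function name and the example sentence are invented here.

```python
import random


def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Return (masked_tokens, labels); labels hold the original token at masked positions."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)   # the model is trained to recover this token
        else:
            masked.append(tok)
            labels.append(None)  # position does not contribute to the loss
    return masked, labels


sentence = "the model reads context on both sides of each hidden token".split()
masked, labels = mask_tokens(sentence, mask_prob=0.3)
print(masked)
print(labels)
```

Because the encoder sees the unmasked tokens on both sides of each hidden position, the training signal is bidirectional, which is the property the description highlights.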