30 open-source projects similar to google-deepmind/gemma, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Gemma alternative.
This project is a collection of educational resources and technical guides focused on the development and implementation of large language models. It provides a comprehensive curriculum covering transformer architectures, training methods, and deployment strategies. The materials give detailed instructions for building autonomous agents using reasoning loops and tool integration, as well as guides for fine-tuning models through supervised learning and preference optimization. It also includes tutorials for constructing retrieval-augmented generation pipelines and implementing transformer m
Qwen-7B is a pretrained causal language model designed for natural language generation, text processing, and complex reasoning tasks. It is available as an instruction-tuned model optimized for conversational interactions and a tool-use model capable of executing function calls and interacting with external APIs. The project provides a quantized version of the model to reduce GPU memory usage and supports the development of autonomous agents that can execute code and perform functions to complete complex goals. The system covers a wide range of capabilities including model fine-tuning throug
ChatGLM3 is an open-weight large language model designed for bilingual conversational interactions in English and Chinese. It functions as a tool-augmented system capable of calling external functions and executing internal code to resolve complex tasks. The model uses four-bit quantization to reduce memory requirements, enabling inference on consumer hardware and diverse processing units including GPUs and CPUs. It features an expanded context window for processing and summarizing long documents and includes a supervised fine-tuning pipeline for adapting the model to specialized domains
GLM-4 is an open-weight large language model designed as a multimodal chat system. It functions as a reasoning-focused and multilingual model capable of processing and generating responses across text and visual data types. The model is distinguished by its function-calling capabilities, allowing it to interface with external tools and APIs to execute tasks and retrieve real-time information. It is optimized for complex logical reasoning, mathematical problem solving, and deep research involving long-form content generation. Broad capabilities include multilingual text generation, the creat
ChatGLM2-6B is an open-weight large language model designed for natural language conversations and text generation in both English and Chinese. It functions as a bilingual chat model capable of processing and maintaining coherence across text sequences up to 32K tokens. The model is optimized for local deployment through precision quantization, which reduces memory requirements to allow execution on consumer-grade hardware. It supports distributing model weights across multiple graphics cards to handle parameters that exceed the memory of a single device. The project covers capabilities for
Torchtune is a PyTorch-native library for fine-tuning, aligning, and quantizing large language models. It provides a configurable training pipeline orchestrated through YAML recipes, with CLI overrides and component swapping, distributed training via FSDP2, memory optimizations, and parameter-efficient fine-tuning methods like LoRA, DoRA, and QLoRA. The library distinguishes itself through its YAML-driven configuration system that defines all training parameters and instantiates components from config files, with full CLI override capability for any field or component at launch time. It suppo
gpt-oss is an open-weight large language model and reasoning engine designed for complex reasoning and agentic workflows. It functions as an AI agent framework and model serving API, allowing for local deployment and the hosting of standardized interfaces to expose model completions and internal reasoning processes. The project distinguishes itself as a quantized inference engine, utilizing tensor parallelism and weight quantization to run high-parameter models on limited hardware. It features a reasoning model that employs chain-of-thought processing to solve multi-step logical tasks. The s
bitsandbytes is a deep learning quantization tool and library designed to reduce the memory footprint of large language models. It serves as a GPU memory optimizer and quantization framework, compressing model weights and features to 8-bit and 4-bit precision to enable inference and training on hardware with limited memory. The project provides a framework for low-rank adaptation, allowing the fine-tuning of quantized models by combining 4-bit weights with small trainable matrices. It further distinguishes itself through memory paging, which moves optimizer states between CPU and GPU memory t
IF is a text-to-image diffusion system that translates natural language descriptions into visual imagery. The project provides a generative pipeline for creating images, an inpainting tool for modifying specific image sections, and a super-resolution upscaler to increase pixel density and clarity. The system includes a concept fine-tuning framework that allows for the teaching of new visual concepts by updating a small set of parameters. It also supports image style transfer to apply the aesthetic characteristics of a reference image to a new output.
MiniCPM is a collection of small language models designed for local, on-device deployment in resource-constrained environments. The project focuses on running dense Transformer models on consumer hardware, including GPUs, CPUs, and Apple Silicon, without requiring custom code forks. The project distinguishes itself through heavy optimization for edge hardware, utilizing quantized weight compression in GGUF and MLX formats to reduce memory overhead. It implements advanced inference techniques such as speculative sampling and radix-tree prefix caching to accelerate generation speed and throughp
Torchtune is a PyTorch-native library for fine-tuning, aligning, and quantizing large language models. It provides a config-driven system for instantiating components, orchestrating distributed training, and managing parameter-efficient fine-tuning with quantization support, all through YAML-based configurations and command-line overrides. The library distinguishes itself through its comprehensive post-training workflow orchestration, combining supervised fine-tuning, preference optimization (DPO, PPO, GRPO), knowledge distillation, and quantization-aware training in a single configurable pip
This repository provides a collection of reference implementations and code examples for training and deploying machine learning models using the MLX framework. It serves as a practical guide for executing distributed training, fine-tuning large language models, converting model weights, and implementing multimodal generative workflows. The project distinguishes itself through specialized examples for local hardware execution, featuring weight quantization to reduce memory usage and low-rank adaptation for parameter-efficient fine-tuning. It also includes scripts for transforming external mod
Sana is a framework for high-resolution image and video synthesis based on a linear diffusion transformer. It provides a toolkit for the training, fine-tuning, and execution of text-to-image and text-to-video models, as well as a video generative world model capable of simulating physical environments with precise spatial control. The project is distinguished by its use of linear complexity layers to handle high resolutions and its support for long-form, minute-length video generation in real time. It implements a two-stage inference paradigm that separates structural generation from visual t
Tiny Universe is an educational monorepo that delivers multiple independent implementations of core AI subsystems as self-contained Jupyter notebooks. It provides from-scratch constructions of foundational architectures including a complete Transformer model built from the original paper specification, a denoising diffusion probabilistic model for image generation, and a ReAct-style autonomous agent framework that equips an LLM with tools for planning and multi-step task execution. The project distinguishes itself by covering the full lifecycle of modern AI systems through hands-on implementa
OpenNMT-py is a PyTorch framework for training and deploying neural machine translation models and large language models. It functions as a distributed model training system, an inference engine, and a toolkit for fine-tuning large language models. The framework distinguishes itself with a dedicated toolkit for adapting large language models through low-rank adaptation, quantization, and instruction tuning. It also includes a neural machine translation server that allows trained models to be hosted and exposed via REST API endpoints. The project covers a broad range
h2o-llmstudio is a language model training framework that provides a no-code graphical interface for fine-tuning large language models on custom datasets. It functions as a specialized tool for managing the training lifecycle, from configuring hyperparameters to monitoring performance metrics. The project distinguishes itself through a multi-GPU training orchestrator that distributes workloads via data parallel processing and a low-rank adaptation tool for memory-efficient fine-tuning. It also includes a model evaluation dashboard featuring an interactive chat interface to verify conversation
CogVLM is a multimodal large language model designed for visual reasoning and multi-turn dialogue. It functions as a visual grounding model and a quantized vision model, combining text and image processing to perform complex understanding and maintain context across visual inputs. The project includes capabilities as a GUI automation agent, allowing it to analyze application screenshots, plan operational steps, and return precise screen coordinates for interface interaction. It further supports visual grounding by generating bounding box coordinates to map text descriptions to specific spatia
This project is a comprehensive collection of educational examples and reference implementations for building vision and language models using PyTorch. It serves as a deep learning tutorial covering the end-to-end process of developing neural networks, from initial architecture definition to final production deployment. The repository provides detailed guides on implementing a wide range of domain-specific models, including convolutional neural networks for object detection and segmentation, as well as transformer and recurrent architectures for natural language processing. It emphasizes gene
Fairseq is a PyTorch toolkit for sequence-to-sequence modeling, specializing in neural machine translation, automatic speech recognition, and large-scale language model training. It provides a framework for processing and aligning diverse data sources, including text, audio, and video, to support tasks such as speech-to-text conversion and multimodal sequence learning. The project is distinguished by its distributed training capabilities, which utilize parameter sharding, mixed-precision training, and CPU offloading to handle models that exceed single-device memory. It also includes specializ
This project provides a foundational framework and reference implementation for executing causal language modeling and multimodal reasoning on local systems. It includes a set of core components for managing model assets, a fine-tuning framework, and structural definitions required to instantiate transformer-based architectures. The system is distinguished by its ability to process combined text and image inputs through multimodal transformer models for visual reasoning and document analysis. It also supports the deployment of quantized models, reducing memory footprints through low-precision
This project is an AI-powered IDE extension and LLM coding assistant that provides a conversational interface for generating, refactoring, and debugging code. It functions as an AI agent framework and a Model Context Protocol client, connecting AI models to external data sources and tools to automate complex development tasks. The system is distinguished by its use of autonomous AI agents capable of multi-step task execution, including the ability to read files, modify code, and run terminal commands iteratively. It supports recursive agent orchestration through subagent delegation and employ
This project is a comprehensive framework and toolkit for developing, optimizing, and deploying transformer-based models across multimodal, document intelligence, and natural language processing tasks. It provides a unified neural architecture that processes text, vision, audio, and document layout data through a shared set of weights, enabling researchers and developers to build foundational models that align cross-modal representations. The platform distinguishes itself through advanced training and inference strategies designed for large-scale deep learning. It incorporates specialized mec
KoboldCPP is a local large language model inference engine and GGUF model runner designed to execute quantized models on personal hardware. It functions as a multimodal AI server and API gateway, providing OpenAI-compatible endpoints that allow third-party clients to interact with locally hosted models. The project distinguishes itself as an AI storytelling backend, featuring dedicated tools for long-form narrative management through persistent memory, world lore tracking, and character state management. It further extends its capabilities as a multimodal server capable of processing text, im
tiny-llm is a large language model inference engine and transformer model implementation. It serves as a quantized model runtime and paged key-value cache manager, providing a specialized inference stack optimized for Apple Silicon. The system distinguishes itself through high-throughput execution techniques, including continuous batching and paged attention. It utilizes a paged memory system to eliminate fragmentation during token generation and employs on-the-fly dequantization of compressed weights to reduce the memory footprint during matrix multiplication. The project covers a broad ran
This project is a research framework and toolkit designed for training large-scale vision transformers and multimodal language models. It provides a comprehensive suite for vision-language pretraining, enabling the development of models that map images and text into shared latent spaces. The framework is distinguished by its capabilities in high-fidelity image generation and multimodal research, utilizing normalizing flows and variational autoencoders to produce images from text prompts or class labels. It supports the development of both generative and contrastive models, allowing for a wide
DeepSeek-VL2 is a multimodal large language model and vision-language system designed to analyze visual scenes and generate descriptive text. It functions as a visual question answering and visual grounding model, capable of extracting information from documents and locating specific objects or regions within images based on textual descriptions. The project uses a mixture-of-experts architecture to process combined image and text inputs. It is optimized for inference through incremental prefilling, which reduces GPU memory requirements. The model covers multimodal data a
This project is an educational curriculum and set of technical guides for building production-ready large language model and retrieval-augmented generation systems. It provides instructional materials and hands-on lessons focused on model specialization, LLMOps, and the implementation of vector databases. The course covers the development of retrieval-augmented generation systems, including tutorials on creating data pipelines that crawl, chunk, and embed content into vector stores. It includes training guides for the deployment, monitoring, and maintenance of language models in production en
Metaseq is a transformer sequence modeling toolkit designed for training, fine-tuning, and deploying sequence-to-sequence models using open pre-trained weights. It provides a comprehensive framework for large language model training, including dedicated tools for sequence dataset processing and a standalone inference server for generating text via API requests. The project features specialized utilities for model quantization to reduce parameter precision to eight bits, which lowers memory usage and increases inference speed. It also includes a checkpoint conversion pipeline to transform mode
OpenChat is a framework for the training, fine-tuning, and deployment of large language models optimized for conversational and mathematical reasoning tasks. It provides a comprehensive lifecycle for these models, ranging from training pipelines and deployment stacks to a web-based chat interface. The project focuses on enabling high-performance model execution on consumer-grade hardware without the need for enterprise-grade accelerators. It includes a production-ready inference server that implements the OpenAI chat completion protocol and utilizes dynamic request batching to optimize hardwa