30 open-source projects similar to xiaomimimo/mimo, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best MiMo alternative.
Qwen3 is a transformer-based large language model designed as a generative AI foundation for understanding, reasoning about, and generating human language. It functions as a comprehensive ecosystem for model training, fine-tuning, and production-ready inference, providing the underlying architecture and weights needed to build diverse AI applications. The project distinguishes itself through extensive support for model quantization and distributed inference, enabling efficient execution across a wide range of hardware, from consumer-grade devices to scalable cloud infrastructure.
DeepSeek-R1 is an open-weights large language model focused on advanced reasoning. It uses chain-of-thought processing and internal monologues to solve complex mathematical and logical problems by breaking tasks into sequential, verifiable reasoning steps. The model is developed with reinforcement learning to optimize its reasoning patterns and verify logical steps. A distillation process transfers these reasoning capabilities from a large teacher model into smaller, computationally efficient versions. The training framework incorporates group relative policy optimization (GRPO).
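Group relative policy optimization (GRPO) scores each sampled completion against the other completions for the same prompt instead of relying on a learned value function. The sketch below shows only that group-relative advantage step, assuming binary correctness rewards and a population standard deviation (implementations differ on this detail); the clipped policy-ratio term and KL penalty of the full objective are omitted, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each completion's reward against its own sampling group.

    Advantage_i = (r_i - mean(group)) / (std(group) + eps).
    eps keeps the division finite when every reward in the group is equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one math prompt:
# 1.0 = verifiably correct final answer, 0.0 = incorrect.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])  # → [1.0, -1.0, -1.0, 1.0]
```

Correct completions receive positive advantages and incorrect ones negative, so the policy update pushes probability mass toward the better answers within each group.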
Kimi-K2 is a conversational AI engine and reasoning framework designed for text generation, advanced problem solving, and coding tasks. It functions as a tool-augmented language model that produces human-like chat responses through a compatible model interface. The system uses a reasoning-optimized architecture that separates standard conversational flow from deep logical processing, allowing the model to execute autonomous tasks by invoking external functions and calling APIs to retrieve real-time data. The project supports structured JSON output parsing for function calls.
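Parsing structured JSON output for function calls typically means decoding the model's emitted call and dispatching it to a registered tool. The sketch below assumes a `{"name": ..., "arguments": {...}}` shape; Kimi-K2's actual output format may differ, and `get_weather`, `TOOLS`, and `dispatch_tool_call` are hypothetical names used only for illustration.

```python
import json
from typing import Any, Callable


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would call an external API for live data."""
    return f"Sunny in {city}"


# Hypothetical registry mapping tool names the model may emit to Python callables.
TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch_tool_call(model_output: str) -> Any:
    """Decode a JSON function call emitted by the model and invoke the tool."""
    call = json.loads(model_output)  # raises json.JSONDecodeError on malformed output
    name = call["name"]
    args = call.get("arguments", {})
    if isinstance(args, str):
        # Some chat APIs encode the arguments as a nested JSON string.
        args = json.loads(args)
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**args)


print(dispatch_tool_call('{"name": "get_weather", "arguments": {"city": "Beijing"}}'))
# → Sunny in Beijing
```

Rejecting unknown tool names explicitly keeps a hallucinated call from failing silently; the tool's result would then be fed back to the model as context for its next turn.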
Qwen2.5 is a suite of foundation large language models designed for natural language generation, code production, and complex mathematical reasoning. The project includes a multilingual language model capable of processing dozens of languages and a specialized code generation model for technical problem solving and debugging. The framework is distinguished by its long-context capabilities, handling inputs ranging from 256K up to 1 million tokens. It also functions as an agentic framework, using standardized templates and parsers to execute autonomous workflows.
An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Megatron-LM is a distributed transformer training library and large language model training framework designed to scale models across thousands of GPUs. It functions as a GPU-optimized deep learning toolkit and a scaling engine for mixture-of-experts architectures, enabling the training of models with hundreds of billions of parameters. The project implements multi-dimensional model parallelism, combining tensor, pipeline, data, expert, and context parallelism. It specifically optimizes mixture-of-experts architectures through integrated memory and communication improvements.
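Tensor parallelism, one of the dimensions listed above, can be illustrated with a column-parallel linear layer: the weight matrix is split by columns, each device multiplies the full input by its own shard, and the partial outputs are concatenated. This is a single-process, pure-Python simulation with hypothetical helper names, not Megatron-LM's API; on real hardware each shard lives on its own GPU and the concatenation is a collective all-gather.

```python
def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Plain matrix multiply: (m x k) @ (k x n) -> (m x n)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def split_columns(w: list[list[float]], num_shards: int) -> list[list[list[float]]]:
    """Partition W's columns into contiguous shards, one per simulated device."""
    n = len(w[0])
    assert n % num_shards == 0, "columns must divide evenly across shards"
    step = n // num_shards
    return [[row[s * step:(s + 1) * step] for row in w] for s in range(num_shards)]


def column_parallel_linear(x: list[list[float]], w: list[list[float]], num_shards: int) -> list[list[float]]:
    """Each shard computes X @ W_s independently; rows are then concatenated shard by shard."""
    partial = [matmul(x, w_s) for w_s in split_columns(w, num_shards)]
    # Concatenating row i of every shard output stands in for the all-gather.
    return [sum((p[i] for p in partial), []) for i in range(len(x))]


x = [[1.0, 2.0]]
w = [[1.0, 0.0, 2.0, 1.0],
     [0.0, 1.0, 1.0, 3.0]]
print(column_parallel_linear(x, w, num_shards=2))  # → [[1.0, 2.0, 4.0, 7.0]]
```

Because no shard needs another shard's weights to compute its slice, the only communication is gathering the outputs, which is why column splits pair naturally with row-split layers that follow them.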
MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.
GLM-4.5 is a multimodal large language model and advanced reasoning system. It functions as an AI coding assistant, an autonomous AI agent, and a multimodal content generator capable of processing and generating text, images, audio, and video within a single unified system. The project is distinguished by its deep reasoning capabilities, using chain-of-thought processing to solve complex mathematical, logical, and technical problems. Its agentic architecture supports autonomous task execution, long-horizon goal planning, and interaction with external tools.
✊ Unleashing the Power of Reinforcement Learning for Math and Code Reasoners 🤖
Code and data associated with the AmbiEnt dataset in "We're Afraid Language Models Aren't Modeling Ambiguity" (Liu et al., 2023)
From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓
This project is a quantized fine-tuning framework for large language models. It implements a low-rank adaptation library and a four-bit quantizer to reduce the GPU memory needed to train large models. Combining four-bit quantization with low-rank adapters makes model training feasible on consumer-grade hardware. The framework further reduces the memory footprint through double quantization and a paged optimizer that offloads optimizer states to system RAM. It supports distributed training across multiple GPUs to handle larger parameter counts and includes utilities for custom datasets.
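The four-bit idea above can be sketched as blockwise absmax quantization: each block of weights shares one floating-point scale, and every weight is stored as a small signed integer. This is a simplified, assumed scheme (uniform integer codes in [-7, 7]); the framework itself may use a different 4-bit data type, its double quantization step (which also compresses the per-block scales) is omitted, and the function names are hypothetical.

```python
def quantize_int4_blockwise(values: list[float], block_size: int = 4) -> tuple[list[int], list[float]]:
    """Symmetric absmax quantization to signed 4-bit codes in [-7, 7], one scale per block."""
    codes: list[int] = []
    scales: list[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block)
        scale = absmax / 7 if absmax > 0 else 1.0  # all-zero block: any scale works
        scales.append(scale)
        # Round to the nearest code and clamp in case of floating-point overshoot.
        codes.extend(max(-7, min(7, round(v / scale))) for v in block)
    return codes, scales


def dequantize_int4_blockwise(codes: list[int], scales: list[float], block_size: int = 4) -> list[float]:
    """Recover approximate weights by multiplying each code by its block's scale."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


weights = [0.7, -0.35, 0.1, 0.0, 2.0, -1.0, 0.5, 0.25]
codes, scales = quantize_int4_blockwise(weights)
restored = dequantize_int4_blockwise(codes, scales)
print(codes, [round(r, 3) for r in restored])
```

Each restored weight is within half a quantization step of the original. This sketch keeps codes as Python ints; a real kernel packs two 4-bit codes per byte, so a block of 64 weights takes 32 bytes plus one scale, versus 128 bytes at 16-bit precision.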
AudioGPT is an LLM-driven audio framework and processing suite that uses large language models to orchestrate neural audio pipelines. It functions as a multimodal audio generator and processing system, integrating a collection of pretrained models to handle speech synthesis, sound generation, and audio manipulation. The system is distinguished by its ability to generate audio from diverse inputs, including text and images, and to produce synchronized talking-head videos. It also operates as a neural speech translator, converting spoken language from one language to another.
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency
Exploring Applications of GRPO
Chroma is a specialized vector database designed to index and retrieve high-dimensional data representations for semantic similarity search. It functions as a comprehensive platform for information retrieval, storing and managing unstructured documents alongside structured metadata. By mapping data into numerical representations, the system enables fast similarity lookups across large datasets. The platform distinguishes itself through a hybrid search infrastructure that combines dense vector embeddings with sparse keyword and regular-expression matching to balance semantic relevance with exact matching.
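A hybrid query can be pictured as blending a dense similarity score with a sparse keyword score, optionally after a regular-expression filter. The sketch below does not use Chroma's client API: the function names, the `alpha` weight, and the linear fusion rule are assumptions chosen for illustration (production systems often use other fusion schemes, such as reciprocal rank fusion), and the embeddings are hand-written toy vectors.

```python
import math
import re


def cosine(a: list[float], b: list[float]) -> float:
    """Dense signal: cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def keyword_score(query: str, text: str) -> float:
    """Sparse signal: fraction of query terms that appear in the document."""
    q_terms = set(re.findall(r"\w+", query.lower()))
    d_terms = set(re.findall(r"\w+", text.lower()))
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def hybrid_search(query, query_vec, docs, alpha=0.5, pattern=None):
    """Rank docs by alpha * dense + (1 - alpha) * sparse, after an optional regex filter."""
    results = []
    for doc_id, text, vec in docs:
        if pattern is not None and not re.search(pattern, text):
            continue
        score = alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, text)
        results.append((score, doc_id))
    return [doc_id for _, doc_id in sorted(results, reverse=True)]


# Toy corpus: (id, text, hand-written 2-d "embedding").
DOCS = [
    ("a", "vector databases store embeddings", [1.0, 0.0]),
    ("b", "recipes for sourdough bread", [0.0, 1.0]),
    ("c", "keyword search with embeddings", [0.7, 0.7]),
]
print(hybrid_search("embeddings search", [1.0, 0.2], DOCS))  # → ['c', 'a', 'b']
```

With `alpha=1.0` (dense only) document `a` ranks first; blending in the keyword signal lifts `c`, which matches both query terms exactly.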
Please refer to our new GitHub Wiki, which documents in detail our efforts to create the open-source version of GitHub Copilot.
DeepEval is a framework for testing and evaluating large language model applications. It provides tools for running automated regression tests, validating model output quality against defined standards, and tracing the execution of complex agent workflows. By integrating these capabilities into development pipelines, the platform helps maintain consistent performance and reliability throughout the software lifecycle. The platform distinguishes itself through its focus on programmatic validation and observability. It uses secondary language models to score output quality.
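The idea of scoring outputs with a secondary model and gating on a threshold can be sketched as a small regression harness. This is not DeepEval's API: `EvalCase`, `overlap_judge`, and `failing_cases` are hypothetical names, and a word-overlap function stands in for the LLM grader so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    output: str      # what the application under test produced
    reference: str   # expected answer


def overlap_judge(case: EvalCase) -> float:
    """Stand-in for an LLM grader: share of reference words found in the output (0 to 1)."""
    ref = set(case.reference.lower().split())
    out = set(case.output.lower().split())
    return len(ref & out) / len(ref) if ref else 1.0


def failing_cases(cases: list[EvalCase], judge: Callable[[EvalCase], float],
                  threshold: float = 0.7) -> list[tuple[EvalCase, float]]:
    """Return (case, score) pairs whose judge score falls below the pass threshold."""
    scored = [(case, judge(case)) for case in cases]
    return [(case, score) for case, score in scored if score < threshold]


cases = [
    EvalCase("capital of France?", "The capital is Paris", "Paris is the capital"),
    EvalCase("2+2?", "five", "four"),
]
for case, score in failing_cases(cases, overlap_judge):
    print(f"FAIL {case.prompt!r}: score={score:.2f}")  # → FAIL '2+2?': score=0.00
```

Passing the judge in as a callable keeps the harness independent of the grader, so swapping in a real LLM-based scorer changes one argument; in CI the harness would typically assert that `failing_cases` returns an empty list.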
ChatBot Injection and Exploit Examples: A Curated List of Prompt Engineer Commands - ChatGPT
Explanations of key concepts in ML
Dolly is an instruction-tuned large language model designed to follow complex natural language directions. It operates as a causal language model that predicts the next token in a sequence to generate coherent conversational responses and perform tasks such as brainstorming, classification, and question answering. The project focuses on developing models from open datasets suitable for commercial use. It enables the creation of instruction-following models by training on curated collections of human-generated instruction-response pairs.
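Next-token prediction drives generation autoregressively: the model proposes a distribution over the next token given the prefix, one token is chosen, appended, and the loop repeats until an end marker. The sketch below uses greedy decoding and a hypothetical hand-written probability table in place of a trained network; a real causal model conditions on the entire prefix, whereas this toy looks only at the last token.

```python
# Hypothetical next-token distributions, keyed by the previous token.
BIGRAM_PROBS: dict[str, dict[str, float]] = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"model": 0.7, "answer": 0.3},
    "a": {"model": 1.0},
    "model": {"responds": 0.8, "</s>": 0.2},
    "responds": {"</s>": 1.0},
    "answer": {"</s>": 1.0},
}


def toy_next_token_probs(prefix: list[str]) -> dict[str, float]:
    """Stand-in for a causal LM's P(next token | prefix); only the last token is used here."""
    return BIGRAM_PROBS.get(prefix[-1], {"</s>": 1.0})


def greedy_generate(start: str = "<s>", max_new_tokens: int = 10) -> list[str]:
    """Repeatedly append the most probable next token until </s> or the length limit."""
    tokens = [start]
    for _ in range(max_new_tokens):
        probs = toy_next_token_probs(tokens)
        next_token = max(probs, key=probs.get)
        if next_token == "</s>":
            break
        tokens.append(next_token)
    return tokens[1:]  # generated tokens, excluding the start token


print(greedy_generate())  # → ['the', 'model', 'responds']
```

Greedy decoding always takes the single most likely token; sampling from `probs` instead would yield varied responses, which is how chat models trade determinism for diversity.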