30 open-source projects similar to modalminds/mm-eureka, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best MM EUREKA alternative.
EasyR1 is a distributed model training system and reinforcement learning framework for large language and vision-language models. It functions as a multimodal trainer and an implementation of a Proximal Policy Optimization pipeline designed to refine the reasoning and perception capabilities of models that process both text and images. The system specializes in distributing reinforcement learning workloads across multiple compute nodes to manage high memory requirements. It optimizes hardware utilization through padding-free training and fine-tuning to fit large models onto available graphics
VLM-R1 is a reasoning vision-language model and embodied AI framework designed to map visual inputs and language instructions into physical navigation waypoints and robotic actions. It functions as a multimodal policy optimizer and an open vocabulary detector capable of locating objects based on arbitrary natural language descriptions. The system distinguishes itself through the use of chain-of-thought reasoning and reinforcement learning to solve complex visual and spatial tasks. It utilizes a video semantic memory system, which employs a visual cache to maintain a history of live video for
TinyZero is a reinforcement learning framework and implementation designed to train language models to develop reasoning and self-verification abilities. It provides a training pipeline to optimize model performance on mathematical and logical tasks. The project serves as a minimal reproduction of the DeepSeek R1 architectural and training approach. It focuses on creating reasoning models that can solve structured problems through autonomous chain-of-thought discovery. The framework incorporates group relative policy optimization and reward-based self-correction to improve accuracy on logica
An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
AReaL is a system for agent orchestration, distributed model training, and parameter-efficient tuning. It provides a framework for developing multi-turn reasoning agents and training large models using reinforcement learning from human feedback. The project implements a toolkit for improving the visual reasoning and geometry problem solving capabilities of vision-language models. It utilizes a memory-efficient tuning system to optimize mathematical and reasoning models across different inference backends. The infrastructure supports large-scale training through tensor, pipeline, and expert p
🚀 Reinforcement Learning for Language Agents 🌟
MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.
OpenRLHF is a training framework and alignment library designed for reinforcement learning from human feedback across distributed GPU clusters. It provides tools for aligning large language models and multimodal vision-language models using algorithms such as PPO, GRPO, and DPO. The framework distinguishes itself through a distributed inference engine that overlaps sample rollout with training to increase throughput. It supports scaling to models exceeding 70 billion parameters via parameter sharding and handles long-context sequences through ring-attention sequence parallelism. The project
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
Qwen2.5 is a suite of foundation large language models designed for natural language generation, code production, and complex mathematical reasoning. The project encompasses a multilingual language model capable of processing dozens of languages and a specialized code generation model for technical problem solving and debugging. The suite is distinguished by its long-context capabilities, enabling the analysis of massive inputs ranging from 256K up to 1 million tokens. It further functions as an agentic framework, utilizing standardized templates and parsers to execute autonomous wo
Dopamine is a reinforcement learning research framework designed for prototyping and testing algorithms across diverse simulated environments. It provides an agent development toolkit that utilizes a flat class hierarchy to facilitate the creation and extension of learning agents. The framework includes a standardization layer via environment wrappers that connect agents to various physics simulations and gaming environments. It also features a high-performance experience replay buffer for storing and sampling transition data to improve training stability, alongside a dedicated hyperparameter
Habitat-Lab is an open-source platform for training and evaluating embodied AI agents in photorealistic 3D indoor environments. It functions as a high-performance 3D indoor environment simulator that supports physics-based interaction, enabling research into navigation and manipulation tasks. The platform provides a modular task-environment abstraction that separates task logic from environment simulation, using configuration-driven pipeline assembly to compose simulation and training pipelines. It includes a hierarchical sensor-actuator architecture for mixing and matching perception and act
Exploring Applications of GRPO
ROLL is a distributed reinforcement learning framework and model alignment toolkit designed for large language models. It serves as a scalable training pipeline and GPU cluster manager, providing the infrastructure to align model behavior using reinforcement learning algorithms and preference optimization techniques. The project distinguishes itself through an agentic rollout orchestrator that generates and collects multi-turn interaction trajectories between AI agents and simulated environments. It supports specialized alignment methods including Direct Preference Optimization, reinforcement
Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
Checkpoints take up a lot of space. Please email yninghong@gmail.com if you need them.
R1-onevision, a visual language model capable of deep CoT reasoning.
DeepSeek-R1 is an open-weights large language model focused on advanced reasoning. It uses chain-of-thought processing and internal monologues to solve complex mathematical and logical problems by breaking tasks into sequential, verifiable thought processes. The model is developed using reinforcement learning to optimize reasoning patterns and verify logical steps. It employs a distillation process to transfer these high-performance logic capabilities from a large teacher model into smaller, computationally efficient versions. The training framework incorporates group relative policy optimiz