30 open-source projects similar to deep-agent/r1-v, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best R1-V alternative.
EasyR1 is a distributed model training system and reinforcement learning framework for large language and vision-language models. It functions as a multimodal trainer and an implementation of a Proximal Policy Optimization pipeline designed to refine the reasoning and perception capabilities of models that process both text and images. The system specializes in distributing reinforcement learning workloads across multiple compute nodes to manage high memory requirements. It optimizes hardware utilization through padding-free training and fine-tuning to fit large models onto available graphics
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
TinyZero is a reinforcement learning framework and implementation designed to train language models to develop reasoning and self-verification abilities. It provides a training pipeline to optimize model performance on mathematical and logical tasks. The project serves as a minimal reproduction of the DeepSeek R1 architectural and training approach. It focuses on creating reasoning models that can solve structured problems through autonomous chain-of-thought discovery. The framework incorporates group relative policy optimization and reward-based self-correction to improve accuracy on logica
Reinforcement Learning for Language Agents
VLM-R1 is a reasoning vision-language model and embodied AI framework designed to map visual inputs and language instructions into physical navigation waypoints and robotic actions. It functions as a multimodal policy optimizer and an open vocabulary detector capable of locating objects based on arbitrary natural language descriptions. The system distinguishes itself through the use of chain-of-thought reasoning and reinforcement learning to solve complex visual and spatial tasks. It utilizes a video semantic memory system, which employs a visual cache to maintain a history of live video for
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
AReaL is a system for agent orchestration, distributed model training, and parameter-efficient tuning. It provides a framework for developing multi-turn reasoning agents and training large models using reinforcement learning from human feedback. The project implements a toolkit for improving the visual reasoning and geometry problem solving capabilities of vision-language models. It utilizes a memory-efficient tuning system to optimize mathematical and reasoning models across different inference backends. The infrastructure supports large-scale training through tensor, pipeline, and expert p
Habitat-Lab is an open-source platform for training and evaluating embodied AI agents in photorealistic 3D indoor environments. It functions as a high-performance 3D indoor environment simulator that supports physics-based interaction, enabling research into navigation and manipulation tasks. The platform provides a modular task-environment abstraction that separates task logic from environment simulation, using configuration-driven pipeline assembly to compose simulation and training pipelines. It includes a hierarchical sensor-actuator architecture for mixing and matching perception and act
Qwen2.5 is a suite of foundation large language models designed for natural language generation, code production, and complex mathematical reasoning. The project encompasses a multilingual language model capable of processing dozens of languages and a specialized code generation model for technical problem solving and debugging. The framework is distinguished by its long-context capabilities, enabling the analysis of massive inputs ranging from 256K up to 1 million tokens. It further functions as an agentic framework, utilizing standardized templates and parsers to execute autonomous wo
Dopamine is a reinforcement learning research framework designed for prototyping and testing algorithms across diverse simulated environments. It provides an agent development toolkit that utilizes a flat class hierarchy to facilitate the creation and extension of learning agents. The framework includes a standardization layer via environment wrappers that connect agents to various physics simulations and gaming environments. It also features a high-performance experience replay buffer for storing and sampling transition data to improve training stability, alongside a dedicated hyperparameter
Exploring Applications of GRPO
ROLL is a distributed reinforcement learning framework and model alignment toolkit designed for large language models. It serves as a scalable training pipeline and GPU cluster manager, providing the infrastructure to align model behavior using reinforcement learning algorithms and preference optimization techniques. The project distinguishes itself through an agentic rollout orchestrator that generates and collects multi-turn interaction trajectories between AI agents and simulated environments. It supports specialized alignment methods including Direct Preference Optimization, reinforcement
Checkpoints take up a lot of space. Please email yninghong@gmail.com if you need them.
Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
About | Quick Start | Agentless Mini | Citation | Acknowledgements
R1-onevision, a visual language model capable of deep CoT reasoning.
OpenSeek aims to unite the global open source community to drive collaborative innovation in algorithms, data and systems to develop next-generation models.
Medical o1, Towards medical complex reasoning with LLMs