CleanRL is a PyTorch-based reinforcement learning library that provides a suite of reproducible implementations of online reinforcement learning algorithms. It serves as a deep reinforcement learning benchmark suite and experiment orchestrator for research and agent development in both discrete and continuous action spaces. The project is distinguished by its single-file approach: each algorithm is implemented in one standalone script, which avoids complex class hierarchies. This structure is paired with a system for scheduling and executing large-s
keras-rl is a reinforcement learning library for training neural-network agents with Keras. It serves as a framework for implementing deep reinforcement learning agents that interact with simulated environments to discover optimal behaviors and maximize cumulative reward. The library provides a system for configuring, training, and managing these agents, and it handles the interaction loop between agent and environment so that models learn from direct experience through gradient-based optimization. The framework includes capabilities for model weight management, allowing
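The agent-environment interaction loop that keras-rl manages can be illustrated with a minimal, library-free sketch. This is not keras-rl's API: `CorridorEnv`, `run_episode`, and the `on_transition` hook are hypothetical names chosen for illustration, and the environment is a toy corridor where the agent earns +1.0 for reaching the goal and pays -0.1 per other step. A learning agent would update its model inside the `on_transition` hook.

```python
class CorridorEnv:
    """Hypothetical toy environment: walk right along a corridor to reach the goal."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # Action 1 moves right; any other action moves left (bounded at 0).
        if action == 1:
            self.position += 1
        else:
            self.position = max(0, self.position - 1)
        done = self.position >= self.length
        reward = 1.0 if done else -0.1  # goal bonus vs. per-step cost
        return self.position, reward, done


def run_episode(env, policy, on_transition=None, max_steps=100):
    """Roll out one episode and return the cumulative (undiscounted) reward."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        next_observation, reward, done = env.step(action)
        if on_transition is not None:
            # Hypothetical hook: a learning agent would store or learn from
            # this transition here (e.g. a gradient step on its network).
            on_transition(observation, action, reward, next_observation, done)
        total_reward += reward
        observation = next_observation
        if done:
            break
    return total_reward
```

For example, an always-move-right policy (`lambda obs: 1`) on the default five-cell corridor returns about 0.6: four -0.1 steps plus the +1.0 goal reward.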
Dopamine is a reinforcement learning research framework for prototyping and testing algorithms across diverse simulated environments. Its agent development toolkit uses a flat class hierarchy, which simplifies creating and extending learning agents. Environment wrappers provide a standard interface that connects agents to various physics simulations and game environments. It also features a high-performance experience replay buffer for storing and sampling transition data to improve training stability, alongside a dedicated hyperparameter
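The idea behind an experience replay buffer can be shown with a small uniform-sampling sketch. This is a generic illustration, not Dopamine's implementation; `ReplayBuffer` and `Transition` are hypothetical names. The buffer keeps a fixed number of recent transitions, evicts the oldest when full, and samples random minibatches so that consecutive, highly correlated steps are not used together in training.

```python
import random
from collections import deque, namedtuple

# One stored interaction step (field names are illustrative).
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])


class ReplayBuffer:
    """Fixed-capacity FIFO buffer with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        # deque(maxlen=...) silently drops the oldest item when full.
        self.storage = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.storage.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up temporal correlation.
        return self.rng.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)
```

With `capacity=3`, adding five transitions leaves only the last three in the buffer. Production buffers typically store fields in preallocated arrays instead of Python objects for speed; the deque here favors readability.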
RLinf is a distributed reinforcement learning orchestrator and embodied AI training framework. It provides the infrastructure to train vision-language-action models and robotic policies with a combination of reinforcement learning and supervised fine-tuning. The system is designed to scale workloads across GPU clusters and manages the placement of actors, rollout workers, and environment components. It features a specialized robotics data collection pipeline that gathers teleoperated demonstrations and simulation trajectories into standardized replay buffers, alongside a hardware interface
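To make "placement" concrete, here is a minimal sketch of one simple strategy: give each component (actor, rollout workers, environments) its own contiguous, non-overlapping range of GPU ids. This is an assumption for illustration only, not RLinf's placement logic; `Component` and `place_components` are hypothetical names, and a real orchestrator may use other strategies.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical description of a workload component and its GPU demand."""
    name: str
    num_gpus: int


def place_components(components, total_gpus):
    """Assign each component an exclusive, contiguous range of GPU ids."""
    placement = {}
    next_gpu = 0
    for comp in components:
        if next_gpu + comp.num_gpus > total_gpus:
            raise ValueError(f"not enough GPUs for {comp.name}")
        placement[comp.name] = list(range(next_gpu, next_gpu + comp.num_gpus))
        next_gpu += comp.num_gpus  # the next component starts after this range
    return placement
```

On an 8-GPU node, an actor needing 4 GPUs, rollout workers needing 2, and environments needing 2 would receive GPUs 0-3, 4-5, and 6-7 respectively; asking for more GPUs than exist raises `ValueError`.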