rlm is an LLM code execution engine and orchestration framework designed to coordinate multiple language model calls and recursive sub-tasks through a programmable environment. It provides a sandboxed REPL environment and a recursive context processor to handle inputs that exceed standard token limits by programmatically decomposing prompts. The project differentiates itself through a reinforcement learning training harness used to teach models how to utilize recursive calls and code execution. It includes a reasoning visualization system that records and renders execution trajectories to ana
mini-swe-agent is an autonomous software engineering system designed to develop features and fix bugs by combining large language models with a bash interface. It operates as an agentic framework that executes coding tasks and documentation updates through a continuous cycle of model reasoning and tool execution. The project differentiates itself with a strong focus on safety and evaluation, utilizing container-based sandbox execution via Docker or Singularity to isolate command execution. It includes a batch-parallel evaluation harness to measure code-fixing accuracy against standardized sof
DeepAnalyze is an autonomous data science agent and research pipeline designed to transform raw datasets into comprehensive analysis reports. It operates by generating and executing Python code to perform data preparation, modeling, and visualization. The system utilizes a secure, containerized execution environment to run generated scripts in isolation from the host system. It includes a benchmarking tool to evaluate the accuracy and performance of large language models against standardized data science tasks and a standardized API gateway for managing model completions and file uploads. Th
Sandbox Agent is a platform designed to manage, secure, and orchestrate autonomous coding assistants. It provides a standardized infrastructure for executing untrusted code and managing agent lifecycles within isolated, containerized environments. By decoupling agent execution from client connections, the platform ensures that session states remain persistent across process restarts and network interruptions. The project distinguishes itself through a capability-based security model that enforces granular permission checks on tool usage, ensuring that autonomous processes operate within defin