mini-swe-agent is an autonomous software engineering system designed to develop features and fix bugs by combining large language models with a bash interface. It operates as an agentic framework that executes coding tasks and documentation updates through a continuous cycle of model reasoning and tool execution.
The project differentiates itself with a strong focus on safety and evaluation, utilizing container-based sandbox execution via Docker or Singularity to isolate command execution. It includes a batch-parallel evaluation harness to measure code-fixing accuracy against standardized software engineering datasets and a constraint-based control system to enforce limits on step counts, time, and API expenditure.
The system provides comprehensive LLM API orchestration, supporting a unified interface for multiple model providers, native tool calling, and detailed expenditure tracking. Additional capabilities cover interactive human-in-the-loop oversight via a REPL-style interface, trajectory serialization for post-run analysis, and a flexible configuration system using Jinja2 templates for prompt and observation formatting.