OpenHands/OpenHands
OpenHands
OpenHands is an autonomous agent framework designed for software engineering workflows. It provides a modular platform for orchestrating AI agents that reason, plan, and execute tasks within isolated, containerized development environments. By integrating with standard version control and development tools, the system enables agents to autonomously navigate codebases, implement features, and resolve issues through iterative reasoning and tool execution.
The platform distinguishes itself through a model-agnostic orchestrator that connects diverse language models to a unified tool registry. It supports complex, multi-agent collaboration via hierarchical task delegation, allowing parent agents to spawn and manage independent sub-agents for parallelized workflows. Security is managed through configurable action approval policies and real-time risk evaluation, ensuring that autonomous operations remain within defined safety boundaries.
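The approval policies and risk evaluation described above can be pictured as an ordered risk scale plus a policy that decides when to ask the user. This is a minimal, self-contained illustration, not the OpenHands API. `RiskLevel`, `ConfirmationPolicy`, `run_action`, the three mode strings, and the `approve` callback are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable


class RiskLevel(IntEnum):
    """Ordered risk levels, so LOW < MEDIUM < HIGH comparisons work."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class ConfirmationPolicy:
    """Hypothetical policy: confirm everything, nothing, or only risky actions."""
    mode: str = "risky"                    # "always", "never", or "risky"
    threshold: RiskLevel = RiskLevel.HIGH  # used only in "risky" mode

    def __post_init__(self) -> None:
        if self.mode not in {"always", "never", "risky"}:
            raise ValueError(f"unknown confirmation mode: {self.mode!r}")

    def requires_confirmation(self, risk: RiskLevel) -> bool:
        if self.mode == "always":
            return True
        if self.mode == "never":
            return False
        return risk >= self.threshold


def run_action(
    name: str,
    risk: RiskLevel,
    policy: ConfirmationPolicy,
    approve: Callable[[str, RiskLevel], bool],
) -> str:
    """Run immediately, or ask `approve` first when the policy requires it."""
    if policy.requires_confirmation(risk) and not approve(name, risk):
        return f"rejected: {name}"
    return f"executed: {name}"  # placeholder for real tool execution
```

For example, `ConfirmationPolicy("risky", RiskLevel.MEDIUM)` lets low-risk reads run unattended but routes medium- and high-risk actions through `approve`.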
The system covers a broad capability surface including persistent conversation state management, automated code review, and web research automation. It features an event-driven architecture that serializes interactions into immutable logs, facilitating observability and time-travel debugging. Developers can extend agent functionality through custom skill definitions, plugin packages, and integration with external services via standardized protocols.
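An append-only log with sequential indices is enough to show how serializing interactions supports persistence and time-travel debugging. The sketch assumes hypothetical `Event` and `EventLog` classes and a JSON Lines storage format. OpenHands' real event types and on-disk layout may differ.

```python
import copy
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    """A frozen log entry; `index` is assigned by the log, never by the caller."""
    index: int
    source: str   # e.g. "user", "agent", "environment"
    kind: str     # e.g. "message", "action", "observation"
    payload: dict


class EventLog:
    """Hypothetical append-only, sequentially indexed event log."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def __len__(self) -> int:
        return len(self._events)

    def append(self, source: str, kind: str, payload: dict) -> Event:
        # Deep-copy the payload so later mutations by the caller cannot rewrite history.
        event = Event(len(self._events), source, kind, copy.deepcopy(payload))
        self._events.append(event)
        return event

    def to_jsonl(self) -> str:
        """Serialize one JSON object per line, in index order."""
        return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in self._events)

    @classmethod
    def from_jsonl(cls, text: str) -> "EventLog":
        log = cls()
        for line in text.splitlines():
            if line.strip():
                log._events.append(Event(**json.loads(line)))
        return log

    def replay(self, upto: int) -> List[Event]:
        """Return the history as it stood right after event `upto` was recorded."""
        return self._events[: upto + 1]
```

Replaying a prefix of the log reconstructs the history at any earlier step, so a past state can be inspected without modifying the original record.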
The project provides a command-line interface for managing agent sessions, remote server deployments, and containerized workspace lifecycles. It is designed for extensibility, allowing users to configure agent behavior through structured objects, markdown-based definitions, and environment-specific settings.
Features
- Remote Agent Deployments - Deploy agent software as containerized backend services to manage isolated workspaces, execute shell commands, and stream real-time events to client applications.
- Agent Configuration Schemas - Define agent configurations by specifying language models, available tools, and condenser settings within structured objects for consistent agent initialization.
- Agent Orchestration - Initialize functional agent instances directly from validated settings objects to begin tasks or integrate into conversation workflows.
- Agent Orchestrators - A unified interface for connecting diverse language models to custom tools, enabling complex reasoning cycles and multi-agent delegation strategies.
- Agent Skill Definitions - Define reusable agent skills using standardized directory structures and configuration files to provide context, instructions, and triggers for agent tasks.
- Agent Tool Definitions - Define custom tools for agent use by specifying names and parameters, with built-in validation for tool names and parameter structures.
- Agent Tool Execution - Execute tools using action-observation patterns, with configurable security levels that determine whether to run actions immediately or require explicit user confirmation.
- Custom Tool Definitions - Define custom tools by extending base classes with action and observation schemas, allowing agents to execute business logic and return structured results.
- Task Delegation Configurations - Register custom sub-agent types with specialized skills and add task management tools to agents to enable autonomous sub-task execution and conversation persistence.
- Agent Evaluation Frameworks - Evaluate agent actions and conversation history in real time to estimate the probability of task success and provide feedback for automated quality monitoring.
- Agent Refinement Workflows - Configure agents to automatically review and improve work by triggering iterative refinement cycles when success probability scores fall below defined thresholds.
- Agent Tool Integrations - Integrate external tools into agent environments by embedding server configurations within skill files, allowing systems to spawn clients and register tools dynamically.
- Agent Tooling Interfaces - Define custom tools by implementing action, observation, and executor classes, then register them to extend agent capabilities with new functionality and logic.
- Agent Tooling Registries - Register and retrieve specific sets of tools for agents to restrict capabilities to tasks like file analysis, searching, or structured planning.
- Agent Action Representations - Represent agent actions as events that can be converted into model messages, including details like tool calls, reasoning content, and security risk assessments.
- Agent Configuration Formats - Define custom agents using Markdown files with configuration headers to specify agent identity, available tools, and system prompts for task execution.
- Agent Configuration Profiles - Configure advanced agent capabilities including external server connections with environment variable resolution and custom lifecycle hooks for specialized execution behavior.
- Agent Prompt Templates - Define custom system prompts using templates to enforce specific behavioral constraints, goal structures, and execution strategies for specialized agent roles.
- Agent Registries - Register built-in sub-agents from predefined directories to enable specialized capabilities like code exploration, bash execution, and web research for delegation tasks.
- Agent Task Refinement - Configure automatic iterative refinement to trigger follow-up prompts when agent performance falls below defined thresholds, repeating until success criteria are met.
- Agent Tooling Definitions - Define custom tools by registering them at module level and packaging them within custom Docker images to extend agent capabilities in remote server environments.
- Tool Registration Systems - Register tools or factory functions by name to make them available for agents, supporting shared executors and dynamic configuration based on workspace state.
- Parallel Tool Execution - Configure the maximum number of tools agents execute simultaneously to improve performance for independent I/O-bound operations and sub-agent delegation tasks.
- Agent Querying Interfaces - Query agents for information about current tasks or conversation history without interrupting primary execution flows or modifying ongoing agent states.
- Sequential Task Delegation - Delegate complex, multi-step tasks to specialized sub-agents that run synchronously, allowing parent agents to block until sub-agents complete assigned work.
- Agent Workspace Management - Manage isolated agent workspaces by creating them, executing commands inside them, and monitoring container lifecycles to enforce strict resource limits and protect users.
- Conversation Management - Manage agent-user interactions by handling message exchange, execution control, and conversation state, providing unified interfaces for both local and remote conversation implementations.
- Reasoning-Action Loops - Orchestrates autonomous cycles where agents process inputs, query language models for decisions, and execute tools to perform tasks.
- Hierarchical Task Delegation - Enables agents to spawn and manage independent sub-agents to parallelize complex workflows and consolidate results into a primary conversation.
- Task Delegation Systems - Delegate complex tasks to multiple independent sub-agents that run in parallel, allowing main agents to consolidate results and improve processing throughput.
- Agent Evaluation Feedback - Access critic evaluation scores, success status, and feedback messages programmatically via event callbacks to build custom automated workflows based on agent performance.
- Concurrent Agent Execution - Execute multiple agent conversation tasks concurrently using asynchronous executors to improve performance and handle several independent workflows within a single application process.
- Execution Control Flows - Pause active agent threads from external processes and resume execution later to perform intermediate operations or wait for specific conditions.
- Execution Message Injection - Send messages to active agents during execution by running conversation processes in background threads and injecting new instructions while agents are working.
- Task Completion Signals - Signal the completion of a task or conversation by calling a dedicated tool that terminates the agent's current workflow.
- Reasoning Cycle Orchestrators - Execute reasoning cycles by processing pending actions, condensing event history, querying language models, and performing tool-based actions or message responses.
- Agentic Development Environments - A collaborative workspace that integrates AI-driven code review, pull request management, and autonomous task implementation into standard software development workflows.
- Agent Terminal Interfaces - Interact with autonomous agents using natural language commands while monitoring progress and accessing system settings through a dedicated command palette.
- Agent Execution Engines - Execute AI agent steps by calling language models, processing tool interactions, and updating conversation state with assistant responses and tool results.
- Agent Reasoning Configurations - Configure agent reasoning parameters by enabling retrieval-augmented generation for context awareness and assigning dedicated language models for internal mental modeling tasks.
- Agentic Workflow Orchestrators - Managing complex task delegation, multi-agent collaboration, and persistent conversation state to handle long-running development projects across diverse environments.
- Agent Configuration Serialization - Serialize agent configuration objects into JSON-compatible formats for storage, transmission, or rehydration across different processes and user interfaces.
- LLM Request Routers - Route incoming requests to different language models based on performance, cost, or capability requirements to optimize agent execution.
- Autonomous Agent Frameworks - A modular platform for building and orchestrating AI agents that reason, plan, and execute tasks within isolated development environments.
- Automated Code Reviewers - Automating the analysis of pull requests to provide actionable feedback on code quality, security, and adherence to project-specific architectural standards.
- Multimodal Vision Inputs - Pass images alongside text in conversation messages to enable multimodal analysis by vision-capable language models, provided the active model supports image inputs.
- Autonomous Software Engineering - Building and deploying AI agents that can autonomously navigate codebases, implement features, and resolve issues through iterative reasoning and tool execution.
- Web Research Agents - Automate web interactions by combining browser control, file editing, and command-line tools to navigate pages, extract content, and summarize information from web sources.
- Conversation State Management - Create and manage conversation states, including persistence, agent configuration, and execution status, while providing locking mechanisms for thread-safe access to conversation resources.
- Conversation Context Condensation - Condense long conversation histories by summarizing older messages while preserving recent context and critical information to reduce token usage and maintain agent performance.
- Conversation Event Visualizers - Visualize conversation events by implementing custom handlers that receive state updates and event logs, supporting sub-agent delegation through nested visualizer creation.
- Conversation Forking - Create deep-copied branches of existing conversations to experiment with different agents, debug specific execution states, or perform comparative testing without altering original history.
- Conversation History Condensation - Manage conversation history by condensing events, removing forgotten items, and inserting summary events to maintain context while optimizing event streams.
- Conversation History Optimizers - Manage conversation views by filtering out forgotten events and inserting summary events to ensure LLMs receive coherent and optimized histories.
- Conversation Session Initializers - Initialize conversations by automatically selecting between local or remote execution modes based on provided workspace types, allowing seamless switching between deployment environments.
- Conversation State Persistence - Persist conversation state to disk and restore it across sessions by providing unique identifiers and directory paths during conversation initialization.
- Conversation Summarization Strategies - Implement rolling window strategies that preserve initial system prompts and recent events while summarizing middle sections of conversations.
- Conversation Summarization - Summarize conversation history using LLMs to compress long event logs into concise summaries, preserving critical context while staying within token limits.
- Condensation Triggers - Trigger condensation automatically when event thresholds are exceeded or manually in response to context window errors to maintain manageable conversation history.
- Conversation Visualizers - Configure how conversation events are displayed by passing visualizer classes or instances to conversation objects, or disable visualization entirely for specific workflows.
- Custom Visualizers - Create custom visualizers by subclassing base visualizer classes and implementing event-handling logic to define unique output formats, state tracking, or external system integrations.
- Event Attribution Management - Distinguish between event origin and model role representation to ensure accurate attribution of messages while maintaining correct formatting for model input.
- Event Transformation Strategies - Convert event streams into model-compatible message formats, providing base interfaces for events to define their own transformation logic for model consumption.
- Model Provider Adapters - Integrate with many different language model providers using unified interfaces that handle provider-specific request formatting, response parsing, and error normalization.
- LLM Response Streaming - Stream LLM responses progressively by enabling streaming on models, defining token callback functions, and registering callbacks with conversation objects.
- Model Instance Registries - Manage multiple model instances centrally using unique usage identifiers to track costs and performance metrics independently across different parts of an application.
- LLM Provider Adapters - Interact with various language models through unified interfaces that handle configuration, API authentication, retry logic, and tool calling capabilities.
- Modular Capability Compositions - Configures agent behavior by composing interchangeable components like model providers, security policies, and skill sets rather than using inheritance.
- Event-Driven State Management - Manages agent interactions by serializing state changes into immutable, sequentially indexed event logs for persistence and time-travel debugging.
- Event-Driven Agent Architectures - A state-management system that tracks conversation history, tool interactions, and agent reasoning through structured event streams for persistence and observability.
- Model Context Protocol Implementations - Execute MCP tools by converting agent-generated actions into protocol-compliant calls, parsing resulting data, and wrapping responses or errors into standardized observations.
- Model Context Protocol Servers - Manage MCP server connections using background event loops that allow synchronous agent code to invoke asynchronous tools without manual await statements.
- Tool Discovery Systems - Discover and register external tools by spawning MCP servers, parsing JSON schemas, and dynamically generating Pydantic models for seamless integration into tool registries.
- Browser Automation Tools - Configure specific browser capabilities by manually registering individual tools and creating custom executors to control which web interaction features are exposed to agents.
- Dynamic Command Execution - Execute shell commands within skill content to dynamically inject repository state, environment information, or computed values into agent contexts at runtime.
- System Command Execution Tools - Execute bash commands within sandboxed environments, whether running locally on host systems or remotely on agent servers, with timeout and output management.
- Patch-Based Editing Configurations - Configure agents to use patch-based file editing by swapping standard file editor tools for specialized patch application tools.
- AI Tool Integration Layers - Connecting autonomous agents to external APIs, databases, and development tools through standardized protocols with configurable safety and approval policies.
- Agent APIs - Expose agent capabilities through REST API and WebSocket interfaces to support authenticated requests, real-time event streaming, and continuous health monitoring.
- Sandbox Provisioning Services - Connect to remote runtime API services to automatically provision and manage sandboxed container environments for agent execution without manual infrastructure handling.
- Action Approval Policies - Require user approval before executing agent actions by setting policies that trigger confirmation prompts for all, none, or only risky operations.
- Action Risk Classifications - Categorize agent actions into risk levels based on potential impact, such as read-only operations, file modifications, or dangerous system commands.
- Action Security Evaluations - Evaluate the risk level of agent actions before execution using LLMs to flag potentially dangerous operations and assign risk scores.
- Confirmation Policies - Define confirmation policies to determine whether specific agent actions require user approval based on evaluated security risk levels.
- Secret Management Systems - Update conversation secrets using static strings or dynamic callable functions to integrate with external credential management systems and secure secret stores.
- Security Policy Configurations - Define custom risk assessment guidelines using templates to tailor how agents evaluate the safety and security of proposed actions.
- Security Risk Analysis - Analyze agent actions and conversation history for security risks by integrating with external safety monitoring services or internal LLM-based evaluation logic.
- Security Risk Categorization - Categorize security risks into standardized levels to enable consistent comparison, ordering, and visual representation of potential threats during agent execution.
- Confirmation Policies - Define confirmation policies to control when user approval is required for agent actions, ranging from always confirming to risk-based thresholds or fully autonomous execution.
- Agent Context Management - Configure agent behavior by injecting project-specific skills, domain knowledge, and custom prompt templates into language model system and user message contexts.
- Task Success Predictors - Predict task success in real-time using experimental critics that evaluate agent performance and provide probability scores for completed tasks.
- Agent Task Initiations - Start agent sessions with specific tasks provided as direct command-line strings or by referencing local text files containing detailed instructions.
- Iterative Refinement Workflows - Implement iterative refinement workflows where refactoring agents perform tasks and critique agents provide feedback until quality thresholds are reached.
- Reasoning Process Monitors - Intercept and display internal reasoning steps from models by registering callbacks that detect and process thinking blocks within conversation events.
- User Preference Management - Consult user models to disambiguate vague requests and process conversation history into persistent preference profiles stored locally for long-term personalization.
- Model Configuration Management - Manage persistent model configurations by saving, loading, and listing model parameters and credentials in local directories for reuse across scripts and environments.
- Model Request Orchestrators - Execute chat completions or responses API calls with automatic validation, exponential backoff retry logic, and telemetry collection for token usage, latency, and cost tracking.
- Headless Task Runners - Run automated tasks without an interactive terminal interface to support continuous integration pipelines, batch processing, and integration with external scripting environments.
- Agent Delegation Systems - Register and configure delegation tools to enable agents to spawn sub-agents and assign tasks, with optional limits on concurrent sub-agents.
- Cloud Sandbox Provisioning - Provision and manage sandboxed environments in the cloud by connecting to APIs, handling lifecycle tasks like creation, status polling, and cleanup automatically.
- User Intent Modeling - Implement agents that interpret vague user instructions, infer intent, and build long-term preference profiles by modeling user mental states across conversations.
- Containerized Workspace Managers - Manage the lifecycle of containerized workspaces using context managers that handle image pulling, container startup, command execution, and automatic cleanup.
- Agent Action Approval Policies - Configure agent action approval levels, including manual confirmation, automatic approval, or automated security analysis to validate operations before execution.
- CLI Task Managers - Manage agentic tasks, configure runtime settings, and interact with server environments using global flags and subcommands within the command-line interface.
- Language Model Configurations - Configure language model instances programmatically, via environment variables, or by loading serialized JSON files to manage model settings, API keys, and operational parameters.
- CLI Installation Utilities - Install command-line interface tools using package managers, executable scripts, or containerized environments to manage and run applications on local or remote systems.
- Conversation History Management - Resume previous interaction sessions by listing recent history or specifying unique conversation identifiers to continue work exactly where it was interrupted.
- Automated Task Resolvers - Scan source code for pending tasks, generate necessary code changes to resolve them, and open new pull requests with appropriate reviewers assigned.
- Remote Editor Interfaces - Expose web-based code editors within containerized workspaces by enabling extra ports and generating authenticated URLs that open the editor directly from the host.
- Workspace Command Execution - Execute shell commands within workspace environments, returning standard output, error streams, and exit codes regardless of whether execution is local or remote.
- Containerized Development Environments - Provisioning isolated, secure, and reproducible workspaces for running AI agents and executing system commands without risking host machine integrity.
- Containerized Runtimes - Executes agent tasks within isolated, ephemeral container environments to ensure system security and consistent runtime dependencies.
- Containerized Execution Runtimes - A secure infrastructure layer that manages isolated workspaces, persistent terminal sessions, and resource-constrained sandboxes for automated code execution.
- Automated Pull Request Reviewers - Analyze incoming code changes for quality, security, and best practices, posting actionable feedback directly as comments to streamline the collaborative development process.
- LLM Fallback Strategies - Configure sequences of fallback language models to automatically retry requests when primary models fail, ensuring continued operation through chains of secondary providers.
- Model Registries - Register and retrieve language model configurations by unique usage identifiers to manage multiple model instances within a single application environment.
- Model Switching Strategies - Switch active LLM models during running conversations to leverage different model capabilities while maintaining conversation history and aggregating usage metrics across entire sessions.
- Language Model Metrics - Track token usage, latency, and costs for every model request, with support for overriding default pricing for custom models or specific billing agreements.
- Agent Observability Configurations - Configure observability by setting environment variables to automatically instrument agent steps, tool executions, and LLM calls without modifying application code.
- Agent Plugin Definitions - Define custom agent capabilities, event hooks, and MCP configurations within structured directories to extend agent functionality through modular, reusable plugin packages.
- Conversation State Synchronization - Synchronize conversation state across remote clients via websocket updates by serializing state changes into events that clients can consume.
- Remote Conversation Management - Establish remote conversations with containerized workspaces to execute commands, manage files, and receive real-time event updates through persistent, bidirectional WebSocket connections.
- Remote Workspace Command Execution - Execute shell commands and manage files on remote servers by connecting workspace instances to specific host URLs for secure interaction.
- Inline Risk Analysis - Analyze action security risks inline by requiring LLMs to include risk parameters in tool calls, avoiding additional latency or separate analysis steps.
- Secret Management - Manage sensitive information by scanning text for secret keys, injecting them as environment variables for command execution, and masking secret values in output logs.
- Multi-tenant Security - Secure multi-tenant environments using container-level isolation, strict input validation, network restrictions, and mandatory authentication to protect against unauthorized access.
- Sandbox Authentication Strategies - Authenticate requests to hosted runtime services by providing required API keys, ensuring secure access to remote sandbox infrastructure.
- Tool Registries - Registers and dynamically discovers agent capabilities through a standardized interface that abstracts local execution and remote protocol-based tools.
- Container Workspace Configurations - Configure isolated workspaces using pre-built images, custom base images, or existing files to run agent services in secure container environments.
- Container Orchestration - Deploy agent servers across diverse environments, ranging from local development machines to horizontally scaled clusters managed by container orchestration platforms.
- Subprocess Lifecycle Managers - Manage the lifecycle of agent server subprocesses by automatically starting, stopping, and performing health checks to ensure readiness before executing tasks.
- Server Lifecycle Managers - Manage server configurations by listing, retrieving details, removing, enabling, or disabling servers through the command-line interface to maintain integration environments.
- Stuck Agent Detection - Monitor agent event history to identify repetitive action-observation patterns and flag conversations as stuck when agents repeat identical actions multiple times.
- Reasoning Effort Configurations - Configure model reasoning effort levels to control the depth of internal thought processes and capture resulting reasoning traces during conversation events.
- OAuth Device Flows - Authenticate with cloud services using OAuth device flows, supporting custom server URLs via command-line flags or environment variables for enterprise deployments.
- Subscription-Based Authentication - Authenticate with model providers using OAuth through browser-based login flows that cache credentials locally, so later runs can reuse them and refresh tokens automatically.
- Rootless Container Runtimes - Execute containers without requiring root privileges by leveraging user namespace isolation, ensuring compatibility with strict security policies and environments.
- IDE Configuration Managers - Integrate agent servers with development environments by defining command, argument, and environment variable configurations in JSON files for AI-assisted coding.
- Resource Usage Policies - Enforce resource limits and automated cleanup policies on containerized agents to prevent exhaustion, control operational costs, and maintain consistent server stability.
- LLM Usage Metrics - Track token usage, costs, and response latencies for individual LLM instances by accessing metrics directly from LLM objects after conversation execution.
- Usage Metric Monitors - Track and aggregate token usage and cost metrics across primary and fallback models to monitor total expenditure and individual model performance after completing conversations.
- MCP Server Configurations - Configure and filter Model Context Protocol servers to provide agents with external tools, using regex patterns to restrict which specific tools are accessible.
- Agent Configuration Standards - Define repository-wide guidelines and coding standards that remain active in agent contexts by placing configuration files in project roots or designated directories.
- Workspace Configurations - Configure workspace execution environments to run either directly on host systems for performance or within isolated containers for enhanced security.
- Keyword-Based Skill Triggers - Trigger domain-specific knowledge or task-oriented skills automatically by matching keywords or patterns found within user messages during interaction lifecycles.
- Tool Metadata Annotations - Provide metadata hints about tool behavior, such as idempotency or read-only status, to guide agent execution based on protocol specifications.
- Visual Browser Monitoring - Enable browser automation tools for agents within containerized environments, providing visual monitoring of web interactions through VNC interfaces.
- Browser Session Recorders - Capture browser interactions and DOM mutations into JSON files by starting and stopping recording sessions during automated browser tasks.
- Real-time Event Streams - Handle real-time communication with agent servers over WebSockets by using remote conversation instances to stream events and manage ongoing agent interactions.
- WebSocket Event Streams - Facilitates real-time communication between client interfaces and backend agent services using bidirectional WebSocket connections for event synchronization.
- Structured Event Streams - Stream agent events as structured JSON lines to enable programmatic parsing, logging, and integration with external monitoring or automation tools.
- Sandbox Environment Images - Specify pre-built agent server images to be pulled and executed by remote runtime APIs within sandboxed environments.
- Workspace File Transfer Utilities - Transfer files between local machines and workspace environments, supporting both direct filesystem copies for local workspaces and network-based transfers for remote server connections.
- Review Guidelines - Define project-specific standards for code quality and communication to ensure automated feedback aligns with team architectural requirements and internal policies.
- Typed Exception Hierarchies - Handle provider errors using unified sets of typed exceptions that abstract away provider-specific status codes, messages, and error classes for consistent application logic.
- Model Credential Managers - Store and retrieve OAuth credentials for various model providers in local directories to manage authentication tokens and expiry information securely.
- Persistence Automation - Automatically save conversation state changes to disk by detecting modifications to public fields and separating data into atomic base state updates and incremental event logs.
- Persistent Session Managers - Run shell commands within persistent terminal sessions to maintain state across interactions while monitoring output and managing execution with configurable timeouts.
- Remote Sandbox Environments - Execute shell commands within remote sandboxes to verify connectivity and ensure environments are correctly configured before starting agent tasks.
- Conversation Cost Aggregators - Retrieve aggregated cost and performance statistics for entire conversations, including usage data from all language models involved in interactions.
- Workspace File Operations - Manage file transfers between hosts and workspaces, supporting both direct local filesystem copies and remote HTTP-based upload and download operations.
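Several features above (Agent Tool Definitions, Custom Tool Definitions, Agent Tooling Registries, Tool Registration Systems) rest on the same action-observation pattern: a named tool receives a structured action and returns an observation. Here is a minimal sketch. `Action`, `Observation`, `register_tool`, `resolve_tools`, `execute`, and the `word_count` tool are hypothetical names, not the project's actual classes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    tool: str
    arguments: dict


@dataclass
class Observation:
    content: str
    is_error: bool = False


Executor = Callable[[dict], Observation]
_REGISTRY: Dict[str, Executor] = {}  # hypothetical process-wide registry


def register_tool(name: str, executor: Executor) -> None:
    """Register an executor under a unique, identifier-like name."""
    if not name.replace("_", "").isalnum():
        raise ValueError(f"invalid tool name: {name!r}")
    if name in _REGISTRY:
        raise ValueError(f"tool already registered: {name!r}")
    _REGISTRY[name] = executor


def resolve_tools(names: List[str]) -> Dict[str, Executor]:
    """Return only the requested subset, restricting what an agent may call."""
    return {n: _REGISTRY[n] for n in names}


def execute(action: Action, tools: Dict[str, Executor]) -> Observation:
    """Dispatch an action; unknown tools yield an error observation, not an exception."""
    executor = tools.get(action.tool)
    if executor is None:
        return Observation(f"tool not available: {action.tool}", is_error=True)
    return executor(action.arguments)


# Hypothetical example tool: count the words in a "text" argument.
register_tool("word_count", lambda args: Observation(str(len(args["text"].split()))))
```

Returning an error observation for an unavailable tool, rather than raising, lets the model see the failure in its history and choose a different action.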
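The rolling-window approach from Conversation Summarization Strategies keeps the opening events (such as the system prompt) and the most recent ones, and replaces the middle with a single summary. The sketch assumes events are plain strings and that `summarize` is a caller-supplied function, which in practice would be an LLM call. The function and parameter names are illustrative.

```python
from typing import Callable, List


def condense(
    events: List[str],
    max_events: int,
    keep_first: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Keep the first `keep_first` events and as many recent events as fit,
    replacing everything in between with one summary event."""
    if len(events) <= max_events:
        return list(events)                   # below threshold: nothing to condense
    keep_last = max_events - keep_first - 1   # reserve one slot for the summary
    if keep_last < 1:
        raise ValueError("max_events is too small for keep_first")
    head = events[:keep_first]
    middle = events[keep_first : len(events) - keep_last]
    tail = events[len(events) - keep_last :]
    return head + [summarize(middle)] + tail
```

Because condensation only runs once the event count exceeds `max_events`, short conversations pass through untouched, matching the threshold-based trigger described under Condensation Triggers.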
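Stuck Agent Detection can be approximated by checking whether the most recent action-observation pairs are all identical. This is a deliberately simple heuristic that assumes actions and observations are comparable strings. A fuller detector might also look for alternating loops or repeated errors.

```python
from typing import List, Tuple


def is_stuck(history: List[Tuple[str, str]], repeats: int = 3) -> bool:
    """Return True when the last `repeats` (action, observation) pairs are identical.

    `history` lists (action, observation) pairs, oldest first.
    """
    if repeats < 2 or len(history) < repeats:
        return False
    tail = history[-repeats:]
    return all(pair == tail[0] for pair in tail)
```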
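LLM Fallback Strategies and the retry behavior in Model Request Orchestrators combine naturally: retry each model with exponential backoff, then move on to the next one. In this sketch `ModelError` is a stand-in exception and each model is a hypothetical `(name, callable)` pair. Real provider SDKs raise their own error types. The `sleep` parameter is injectable so the backoff can be tested without waiting.

```python
import time
from typing import Callable, Sequence, Tuple


class ModelError(Exception):
    """Stand-in for a provider error (hypothetical; real SDKs define their own)."""


def complete_with_fallback(
    prompt: str,
    models: Sequence[Tuple[str, Callable[[str], str]]],
    retries: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, str]:
    """Try each (name, call) in order, retrying each with exponential backoff.

    Returns (model_name, response); re-raises the last error if every model fails.
    """
    last_error: Exception = ModelError("no models configured")
    for name, call in models:
        for attempt in range(retries + 1):
            try:
                return name, call(prompt)
            except ModelError as err:
                last_error = err
                if attempt < retries:
                    sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise last_error
```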
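The Secret Management entry describes three steps: find the secret names a command references, expose them as environment variables, and mask their values in the output. Below is a minimal standard-library sketch of those steps. The helper names and the `<secret-hidden>` placeholder are assumptions, and `shell=True` assumes a POSIX shell.

```python
import os
import subprocess
from typing import Dict

MASK = "<secret-hidden>"  # hypothetical placeholder text


def secret_env(command: str, secrets: Dict[str, str]) -> Dict[str, str]:
    """Select the secrets whose names appear in the command text."""
    return {name: value for name, value in secrets.items() if name in command}


def mask_secrets(text: str, secrets: Dict[str, str]) -> str:
    """Replace every known secret value in `text`, longest values first."""
    for value in sorted(secrets.values(), key=len, reverse=True):
        if value:
            text = text.replace(value, MASK)
    return text


def run_with_secrets(command: str, secrets: Dict[str, str]) -> str:
    """Run `command` with referenced secrets exported; return masked stdout+stderr."""
    env = {**os.environ, **secret_env(command, secrets)}
    result = subprocess.run(command, shell=True, env=env, capture_output=True, text=True)
    return mask_secrets(result.stdout + result.stderr, secrets)
```

Masking uses every known secret, not just the injected ones, so a value that leaks through some other path is still hidden.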
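For Parallel Tool Execution, a bounded thread pool is one straightforward way to cap how many independent, I/O-bound tool calls run at once. The function name and the `max_parallel` parameter are hypothetical. The sketch shows that results keep their submission order even when calls finish out of order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_tools_in_parallel(calls: List[Callable[[], T]], max_parallel: int = 4) -> List[T]:
    """Run independent tool calls with at most `max_parallel` in flight at once.

    Results are returned in the same order as `calls`, regardless of completion order.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be >= 1")
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [pool.submit(call) for call in calls]
        return [f.result() for f in futures]  # re-raises any tool exception
```

Setting `max_parallel=1` degrades gracefully to sequential execution, which is a safe default when tools might touch shared state.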
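Keyword-Based Skill Triggers amount to matching trigger words against each user message. This sketch assumes a hypothetical `Skill` record and whole-word, case-insensitive matching, which requires each keyword to begin and end with a word character.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Skill:
    name: str
    triggers: List[str]  # keywords, matched case-insensitively as whole words
    content: str         # knowledge injected into the agent context when triggered


def triggered_skills(message: str, skills: List[Skill]) -> List[Skill]:
    """Return the skills whose trigger keywords appear in the user message."""
    matched = []
    for skill in skills:
        for keyword in skill.triggers:
            if re.search(rf"\b{re.escape(keyword)}\b", message, re.IGNORECASE):
                matched.append(skill)
                break  # one matching keyword is enough for this skill
    return matched
```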
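Language Model Metrics, Usage Metric Monitors, and Conversation Cost Aggregators all reduce to per-model counters that are summed at the end of a conversation. In this sketch the usage identifiers (`"agent"`, `"condenser"`) and the per-million-token prices are made-up inputs, not real rates. Passing prices per call is one way to allow the pricing overrides mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Metrics:
    """Per-model counters; prices are USD per 1M tokens (caller-supplied, hypothetical)."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    cost: float = 0.0
    latencies: List[float] = field(default_factory=list)

    def record(self, prompt_tokens: int, completion_tokens: int, latency_s: float,
               price_in: float, price_out: float) -> None:
        """Add one request's token counts, cost, and latency."""
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens
        self.cost += (prompt_tokens * price_in + completion_tokens * price_out) / 1_000_000
        self.latencies.append(latency_s)


def aggregate(per_usage_id: Dict[str, Metrics]) -> Metrics:
    """Sum the metrics of every model instance used in a conversation."""
    total = Metrics()
    for m in per_usage_id.values():
        total.prompt_tokens += m.prompt_tokens
        total.completion_tokens += m.completion_tokens
        total.cost += m.cost
        total.latencies.extend(m.latencies)
    return total
```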
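Agent Refinement Workflows and Agent Task Refinement describe a loop: score the result with a critic and, while the score stays below a threshold, send a follow-up prompt. The sketch treats the agent run and the critic as plain callables. `refine_until`, the follow-up wording, and the default threshold of 0.7 are assumptions for illustration.

```python
from typing import Callable, List, Tuple


def refine_until(
    run: Callable[[str], str],
    critic: Callable[[str], float],
    task: str,
    threshold: float = 0.7,
    max_refinements: int = 3,
) -> Tuple[str, List[float]]:
    """Run the task, then re-prompt while the critic score is below `threshold`.

    Returns the last result and every score observed, in order.
    """
    result = run(task)
    scores = [critic(result)]
    while scores[-1] < threshold and len(scores) - 1 < max_refinements:
        follow_up = (
            f"The previous attempt scored {scores[-1]:.2f}, below {threshold:.2f}. "
            f"Review the work and improve it.\n\nOriginal task: {task}"
        )
        result = run(follow_up)
        scores.append(critic(result))
    return result, scores
```

Capping the number of refinements keeps a critic that never reaches the threshold from looping forever.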