30 open-source projects similar to bytebot-ai/bytebot, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best Bytebot alternative.
Agent-S is a multimodal AI agent and LLM desktop automation framework designed to control operating systems through graphical user interface interactions. It functions as a computer use interface, utilizing vision-language grounding to translate natural language goals into precise screen coordinates and system actions. The project differentiates itself by combining structured accessibility tree inspection with vision-based element localization. It manages cross-application workflows by mapping conceptual descriptions to physical pixels and simulating low-level keyboard and mouse events to mov
PraisonAI is an autonomous AI agent platform that coordinates multiple LLM-powered agents for research, planning, and execution of complex workflows. It functions as a multi-agent orchestration framework, a workflow builder, and a Model Context Protocol server, while also providing retrieval-augmented generation through vector knowledge bases. Agents can interact via CLI, web, or standardized protocols with sandboxed code execution. The platform distinguishes itself with a rich set of agent communication protocols, including A2A, REST, WebSocket, voice and telephony integration, and MCP, allo
This project is a Python framework for building autonomous, event-driven agent systems. It provides a unified runtime for orchestrating multi-agent workflows, managing persistent conversation state, and executing code within secure, isolated sandbox environments. The framework is designed to handle complex task delegation, allowing agents to invoke other agents as tools while maintaining context across multi-turn interactions. The framework distinguishes itself through its deep integration with the Model Context Protocol, enabling agents to connect to external data sources and remote services
Letta is a framework for building, deploying, and managing autonomous AI agents that maintain persistent state across long-term interactions. It provides a comprehensive suite of primitives for defining agents with configurable personas, modular memory blocks, and tool-use capabilities, enabling them to retain user preferences and conversation history over extended sessions. The platform distinguishes itself through its advanced memory management and orchestration capabilities. It allows agents to autonomously update their own memory, perform retrieval-augmented generation, and coordinate com
This project is a comprehensive framework for building and managing autonomous agent systems. It provides a unified architecture for orchestrating multi-agent societies, where specialized agents collaborate through roleplay to decompose and solve complex tasks. The system integrates language models with external environments, enabling agents to perform real-world actions through a standardized tool-calling abstraction layer. The framework distinguishes itself through its focus on iterative reasoning and data reliability. It employs automated feedback loops to refine agent outputs and self-eva
Hermes-agent is an autonomous AI agent framework and runtime designed to execute complex tasks and synthesize new skills from execution traces. It includes a provider-agnostic gateway for routing requests across multiple model backends and a serverless runtime that suspends idle agent instances and resumes them on demand across containers and virtual machines. The project provides a desktop automation toolset that controls native GUI workflows on Linux by querying accessibility APIs and injecting input events. It further distinguishes itself with the ability to generate procedural skills from
PyAutoGUI is a Python GUI automation library and desktop automation framework. It provides a set of tools for programmatically controlling the mouse and keyboard to automate user interface interactions across different operating systems. The project functions as a cross-platform input simulator and computer vision screen scanner. It enables the simulation of keystrokes and cursor movements to perform repetitive tasks and utilizes screen analysis to locate specific images or pixel colors on the display. Its capability surface includes mouse and keyboard input simulation, screen image capture,
This is a Model Context Protocol server that exposes Windows desktop automation and system administration functions to large language models. It provides programmatic control of mouse, keyboard, windows, and UI elements on Windows through simulated user input, while also enabling LLMs to manage the Windows registry, processes, files, and execute PowerShell commands through a remote interface. The server supports multiple transport protocols including stdio, SSE, and streamable HTTP, allowing flexible integration with different language model clients. It implements OAuth 2.0 with PKCE for secu
Robotjs is a native Node.js automation library and desktop input simulator. It uses C++ bindings to provide low-level access to operating system functions, allowing for the programmatic control of the mouse and keyboard and the analysis of screen pixels. The library functions as a toolkit for automating user interfaces and desktop workflows, including those within Electron applications. It enables the simulation of key presses and mouse movements to automate interactions with desktop software and perform automated data entry. Its capabilities extend to screen pixel analysis, where it capture
Robotgo is a cross-platform desktop automation framework for the Go programming language. It provides a comprehensive toolkit for programmatically interacting with graphical user interfaces, enabling developers to simulate human input, manage application windows, and monitor system-wide hardware events. The library distinguishes itself through its low-level system integration, utilizing a foreign function interface to interact directly with native operating system APIs. It employs pixel-buffer memory mapping and real-time screen capture to perform visual element identification, allowing for i
Hammerspoon is a programmable automation engine for macOS that enables deep system-level control through a Lua scripting environment. By bridging high-level scripts with native Objective-C APIs, it allows users to interact with the operating system's accessibility tree, intercept hardware input streams, and manage the lifecycle of running applications. The project distinguishes itself through an event-driven architecture that registers asynchronous hooks for system notifications and hardware events. This allows for real-time automation, such as remapping keyboard and mouse inputs, managing wi
Open-computer-use is a framework designed to connect vision-capable language models to isolated cloud-based desktop environments. It functions as an agentic interface that enables autonomous systems to interact with graphical user interfaces by simulating mouse movements, keyboard keystrokes, and shell commands. By bridging language models with remote workspaces, the platform facilitates the execution of complex, long-running tasks within secure, sandboxed environments. The platform distinguishes itself through its ability to orchestrate thousands of concurrent, isolated instances, making it
UFO is a multi-device task orchestrator and LLM agent orchestration framework designed to decompose natural language requests into executable task graphs. It functions as a cross-platform UI automation tool capable of performing interactions on Windows and mobile devices while routing tasks to distributed agents based on their hardware and software capabilities. The system is distinguished by its RAG-enhanced agent architecture, which integrates external documentation and previous execution traces to improve decision-making. It employs a hybrid UI detection approach that combines computer vis
mcp-agent is a framework for building AI agents that integrate with Model Context Protocol servers to execute tools and access data. It functions as a multi-agent orchestrator and protocol-compliant server, enabling the creation of agents that can discover and invoke tools from connected external servers. The project distinguishes itself through a durable workflow engine that supports long-running tasks capable of pausing, resuming, and surviving restarts. It implements complex orchestration patterns, including iterative evaluator-optimizer loops, hierarchical workflow nesting, and specialist
Claude Code is a command-line interface and multi-agent orchestration framework designed for autonomous software engineering. It enables AI agents to perform codebase modifications, debugging, and Git workflow management while coordinating multiple specialized agents to decompose and execute complex engineering tasks in parallel. The system distinguishes itself through a high degree of isolation and safety, utilizing Git worktrees to create independent working directories for concurrent agents and implementing a tiered permission system that combines user rules, project policies, and OS-level
Tambo is an orchestration platform and framework designed for building generative user interfaces and conversational AI agents. It provides the infrastructure to manage persistent chat threads, execute multi-step reasoning workflows, and integrate large language models with external tools and services. By combining an agent orchestration layer with a component-based library, the project enables developers to create interactive interfaces where AI models dynamically render and update UI elements in real-time. The framework distinguishes itself through its generative UI capabilities, which allo
This project provides a translation layer and set of adapters designed to bridge AI agents with the Model Context Protocol. It functions as an integration layer that allows agents to operate as protocol-compliant servers and enables the conversion of protocol-based tools into formats compatible with agent frameworks and logic graphs. The adapters facilitate tool interoperability by wrapping external protocol tools for use within agent workflows and exposing internal agent capabilities to any client implementing the Model Context Protocol. This creates a communication bridge that supports inte
Mastra is an orchestration framework designed for building, deploying, and managing autonomous AI agents and multi-agent systems. It provides a comprehensive suite of primitives for creating resilient AI applications, including durable workflow orchestration, event-driven agent loops, and semantic memory management. By integrating these core components, the platform enables developers to build complex, multi-step processes that can reason about goals and execute tasks without manual intervention. The framework distinguishes itself through its focus on observability and secure, isolated execut
LangChain.js is a framework for building, executing, and monitoring stateful agentic applications. It provides an orchestration engine that models workflows as directed graphs, allowing developers to connect language models, data sources, and external tools into modular, multi-step processes. The platform distinguishes itself through its focus on stateful execution and human-in-the-loop control. It manages agent lifecycles by persisting execution state across threads, enabling fault tolerance and the ability to pause workflows at designated breakpoints for manual review or modification. This
Cua is an agent benchmarking and desktop automation platform designed to evaluate autonomous agents and execute repetitive tasks within isolated, virtualized environments. It provides a framework for provisioning consistent workspaces and measuring agent performance against standardized desktop operations. The platform distinguishes itself by integrating virtual machine orchestration with headless interaction capabilities. By leveraging hypervisor-based virtualization, it runs operating systems at near-native speeds, while its automation layer injects commands directly into application proces
OSWorld is an evaluation framework and multimodal agent benchmark designed to test the ability of large language models to complete complex tasks within virtualized operating system environments. It provides a virtualized desktop sandbox and a virtual machine orchestrator to deploy, snapshot, and reset cloud-based desktops, ensuring reproducible test states for AI agent interactions. The system distinguishes itself by providing an OS-level action space that translates model decisions into mouse clicks, keyboard inputs, and system commands. It employs a standardized interface to integrate vari
Handy is a local speech-to-text automation tool designed to convert spoken audio into text and inject it directly into active desktop applications. By running machine learning models entirely on the host hardware, it provides a private, offline-first environment for dictation and command execution. The system functions as a background service that manages microphone input, transcription state, and text output, enabling hands-free typing across various software environments. The project distinguishes itself through a modular pipeline that integrates local language models for post-transcription
This project provides a containerized environment for running a full macOS desktop operating system. It utilizes a hardware-accelerated virtualization engine to execute the guest environment, allowing for the deployment and management of virtual machines through standard container orchestration tools. The platform distinguishes itself by enabling direct hardware passthrough, which maps physical host disks, partitions, and USB controllers directly into the virtual machine for native driver access. It also supports advanced network integration, allowing the guest system to obtain its own unique
Langchain-Chatchat is a system for building retrieval-augmented generation applications and autonomous AI agents. It integrates a knowledge base management system and an agent framework to enable language models to interact with private documents and execute multi-step tasks through external tools. The platform supports local deployment of language models on private infrastructure, so it can operate without an internet connection. It also includes multimodal features that combine vision models for image analysis with text-to-image generation. The system provides a web-based conversatio
DeepSeek-TUI is an AI coding agent orchestrator and framework designed to automate complex programming tasks. It functions as a harness for coordinating AI models that can read source code, edit files, and execute shell commands through automated agent workflows. The system is distinguished by its multi-agent coordination capabilities, which allow for the spawning of parallel sub-agents to handle concurrent investigations or implementation slices. It employs autonomous goal-seeking loops to pursue objectives across multiple turns and utilizes a tool integration gateway to connect models to ex
Casibase is an open-source platform that orchestrates multi-turn conversations with large language models and manages retrieval-augmented knowledge bases from a single interface. It provides a unified system for connecting to over 30 AI model providers, ingesting documents into vector embeddings for semantic search, and running autonomous agent loops that can drive a browser, search the web, execute commands, and integrate with external tools. The platform distinguishes itself by combining AI conversation management with infrastructure and application orchestration capabilities. It includes a
This project is an AI-powered IDE extension and LLM coding assistant that provides a conversational interface for generating, refactoring, and debugging code. It functions as an AI agent framework and a Model Context Protocol client, connecting AI models to external data sources and tools to automate complex development tasks. The system is distinguished by its use of autonomous AI agents capable of multi-step task execution, including the ability to read files, modify code, and run terminal commands iteratively. It supports recursive agent orchestration through subagent delegation and employ
XAgent is an autonomous agent system that decomposes complex goals into sequential subtasks for execution via a planner and actor model. It functions as a collaboration framework that integrates human-in-the-loop workflows, allowing users to provide real-time guidance and missing information during the automation process. The system features a containerized tool sandbox to isolate the execution of shells and browsers, ensuring system safety and consistency. It includes a state-based execution recorder that captures snapshots of agent runs to enable the exact reproduction of specific task sequ
Youtu Agent is an open-source framework for building, running, and evaluating autonomous agents powered by large language models. It provides the core infrastructure for creating agents that follow reasoning loops, use toolkits, and coordinate with other agents to solve complex tasks, all managed through YAML-driven configuration files. The framework distinguishes itself through its support for multi-agent orchestration, where a planner agent decomposes tasks and coordinates specialized worker agents, and through its integration with the Model Context Protocol for connecting to external toolk
OmniParser is a multimodal interaction engine designed to function as a desktop automation agent. It interprets visual screen information to execute complex, multi-step tasks across operating system environments by bridging visual interface perception with language models. Through a continuous cycle of observation and command execution, the system grounds high-level natural language instructions into precise, coordinate-based actions. The project distinguishes itself by utilizing vision-based parsing to interact with software interfaces without requiring access to underlying application progr