30 open-source projects similar to x-plug/mobileagent, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best MobileAgent alternative.
UI-TARS is an LLM GUI automation framework and multimodal action grounding system. It functions as a GUI agent orchestrator and cross-platform device controller that uses large language models to interpret graphical interfaces and execute actions across desktop and mobile operating systems. The system translates model-generated coordinates into precise screen positions to interact with visual user interface elements. It employs a multimodal approach to interpret screen layouts and decomposes complex goals into multi-step trajectories through reasoning and error correction. The project provid
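The coordinate-translation step mentioned for UI-TARS can be pictured with a minimal sketch. It assumes the model reports points on a normalized 0 to 1000 grid; that range and the function name `to_screen_point` are hypothetical, chosen for illustration, and are not taken from the UI-TARS codebase.

```python
def to_screen_point(model_x, model_y, screen_width, screen_height, grid=1000):
    """Map a point on an assumed normalized grid to integer screen pixels."""
    # Scale each axis independently to the real screen resolution.
    px = round(model_x / grid * screen_width)
    py = round(model_y / grid * screen_height)
    # Clamp so a point on the far edge still lands on a valid pixel.
    px = min(max(px, 0), screen_width - 1)
    py = min(max(py, 0), screen_height - 1)
    return px, py


if __name__ == "__main__":
    # Centre of the assumed grid on a 1920x1080 display.
    print(to_screen_point(500, 500, 1920, 1080))  # (960, 540)
```

Clamping is a design choice here: it keeps a slightly out-of-range prediction clickable instead of raising an error.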
Auto-GPT is an autonomous agent framework that uses large language models to decompose complex goals and execute multi-step tasks without human intervention. It functions as a workflow automation tool that chains language model tasks and manages memory to achieve specific objectives. The project features a visual agent designer that allows users to define behaviors and goals by connecting functional blocks through a graphical interface. It employs a vector database memory system to recall information across different sessions and a sliding-window buffer for immediate short-term context. The
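A sliding-window buffer for short-term context, as described for Auto-GPT, can be sketched in a few lines. This is an illustration only: the class name `SlidingWindowBuffer` and its methods are hypothetical and do not reflect Auto-GPT's actual implementation.

```python
from collections import deque


class SlidingWindowBuffer:
    """Keep only the most recent messages as short-term context."""

    def __init__(self, max_messages):
        # A deque with maxlen silently drops the oldest item when full.
        self._messages = deque(maxlen=max_messages)

    def add(self, role, text):
        self._messages.append((role, text))

    def context(self):
        # Return the retained messages, oldest first.
        return list(self._messages)


if __name__ == "__main__":
    buf = SlidingWindowBuffer(max_messages=2)
    buf.add("user", "first")
    buf.add("assistant", "second")
    buf.add("user", "third")
    print(buf.context())  # [('assistant', 'second'), ('user', 'third')]
```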
This is a framework for building autonomous agents that use large language models to plan, execute, and refine their own tasks. It functions as an autonomous task orchestrator and agent framework, utilizing a function registry to manage the code-based tools and plugins the agents use to achieve complex goals. The system is distinguished by its ability to perform autonomous code generation, where the agent analyzes requirements to write new reusable functions on the fly. It employs a recursive loop-based planning model to continuously update its goal list and refine its performance based on ex
AgentScope is a comprehensive toolkit for developing and orchestrating autonomous multi-agent systems. It provides a unified framework for building agents that can reason, execute tools, and manage memory, enabling the creation of complex, collaborative workflows where multiple specialized agents interact to solve multi-step objectives. The platform distinguishes itself through a robust orchestration engine that supports both sequential and concurrent agent pipelines. It utilizes a centralized event bus for real-time telemetry, allowing developers to track agent reasoning, tool usage, and sys
Qwen-Agent is a development framework for building autonomous software applications that leverage large language models to plan, reason, and execute complex tasks. It functions as an orchestration engine that enables models to interact with external APIs, manage persistent memory, and maintain context across multi-step workflows. The framework distinguishes itself through a multi-agent collaboration platform that allows independent agent instances to exchange structured messages and delegate sub-tasks to one another. By utilizing iterative reasoning loops and dynamic prompt injection, the sys
Agent-S is a multimodal AI agent and LLM desktop automation framework designed to control operating systems through graphical user interface interactions. It functions as a computer use interface, utilizing vision-language grounding to translate natural language goals into precise screen coordinates and system actions. The project differentiates itself by combining structured accessibility tree inspection with vision-based element localization. It manages cross-application workflows by mapping conceptual descriptions to physical pixels and simulating low-level keyboard and mouse events to mov
CogVLM is a multimodal large language model designed for visual reasoning and multi-turn dialogue. It functions as a visual grounding model and a quantized vision model, combining text and image processing to perform complex understanding and maintain context across visual inputs. The project includes capabilities as a GUI automation agent, allowing it to analyze application screenshots, plan operational steps, and return precise screen coordinates for interface interaction. It further supports visual grounding by generating bounding box coordinates to map text descriptions to specific spatia
This project is a comprehensive framework for building and managing autonomous agent systems. It provides a unified architecture for orchestrating multi-agent societies, where specialized agents collaborate through roleplay to decompose and solve complex tasks. The system integrates language models with external environments, enabling agents to perform real-world actions through a standardized tool-calling abstraction layer. The framework distinguishes itself through its focus on iterative reasoning and data reliability. It employs automated feedback loops to refine agent outputs and self-eva
gh-aw is a GitHub automation platform and orchestration framework that uses an agentic workflow engine to automate repository management and code reviews. It translates natural language markdown and configuration files into secure, automated task sequences driven by large language models. The system integrates a Model Context Protocol gateway to route calls between AI agents and external tools. It distinguishes itself through a comprehensive security guardrail system that provides sandboxed execution for protocol servers, network egress controls via domain allowlists, and human-in-the-loop ap
AgentScope is a multi-agent framework and orchestration platform designed for building and coordinating teams of language model agents. It provides a system for managing multiple agents that collaborate to solve complex tasks through structured communication and state sharing. The project distinguishes itself with a focus on production-ready deployment and security, featuring a multi-tenant hosting service that ensures session isolation between different users. It includes a sandboxed tool execution environment and fine-grained permission controls to manage how agents access system resources
UFO is a multi-device task orchestrator and LLM agent orchestration framework designed to decompose natural language requests into executable task graphs. It functions as a cross-platform UI automation tool capable of performing interactions on Windows and mobile devices while routing tasks to distributed agents based on their hardware and software capabilities. The system is distinguished by its RAG-enhanced agent architecture, which integrates external documentation and previous execution traces to improve decision-making. It employs a hybrid UI detection approach that combines computer vis
This project is an LLM financial agent framework and multi-agent orchestration system designed to execute complex investment banking and wealth management workflows. It provides a financial data integration layer using a standardized context protocol to connect autonomous agents to real-time market data and third-party feeds. The system utilizes a multi-agent architecture that coordinates specialized worker agents through a steering event bus to handle task delegation and secure handoffs. It includes an enterprise AI deployment manifest for provisioning agent personas, prompts, and skill sets
Auto-GPT is an autonomous agent framework designed for creating and deploying AI agents that use large language models to plan and execute complex goals independently. The system provides a comprehensive environment for managing the entire agent lifecycle, from initial design and testing to live production deployment. The project features a low-code workflow designer that allows users to define agent behaviors by connecting functional blocks in a visual interface. It includes an agent marketplace for discovering and deploying pre-configured agent templates and a standardized evaluation tool t
Claude-flow is an autonomous agent coordination platform and orchestration framework designed for building complex, multi-step workflows powered by large language models. It functions as a TypeScript-based engine that decomposes high-level objectives into executable action sequences, enabling the creation of collaborative agent teams that operate with minimal manual oversight. The platform distinguishes itself through its ability to federate autonomous agents across network boundaries using secure communication channels and identity verification. It integrates a goal-oriented planning engine
ml-ferret is a multimodal large language model framework and visual reasoning engine designed to reason about images and user interfaces. It functions as a UI grounding model and referring expression comprehension tool that maps natural language descriptions to precise pixel coordinates. The system focuses on high-resolution image analysis to identify and locate specific interface components. It employs multi-resolution image processing and region-aware visual encoding to preserve detail across different aspect ratios, enabling the model to analyze spatial relationships and functional layouts
Qwen2.5 is a suite of foundation large language models designed for natural language generation, code production, and complex mathematical reasoning. The project encompasses a multilingual language model capable of processing dozens of languages and a specialized code generation model for technical problem solving and debugging. The framework is distinguished by its long context capabilities, enabling the analysis of massive inputs ranging from 256K up to 1 million tokens. It further functions as an agentic framework, utilizing standardized templates and parsers to execute autonomous wo
gptme is an autonomous AI agent server and framework designed for local system automation, software development, and code execution. It operates as a local execution engine that enables language models to run shell commands, modify local files, and interact with the operating system. The project functions as a Model Context Protocol client, integrating with external servers to expand agent capabilities with standardized tools and data sources. It features a provider-agnostic routing system to orchestrate tasks across multiple proprietary cloud APIs and local AI backends. The system includes
Kiro is an AI-powered development tool and multi-agent workflow orchestrator. It functions as a context-aware code generator and coding assistant that transforms natural language requirements into structured implementation plans and production-grade code. The system distinguishes itself through multi-agent task decomposition, where complex requirements are broken into sequenced tasks and assigned to specialized agents. It features multi-model orchestration to select specific language models based on reasoning complexity, cost, and latency, and includes a headless command-line interface for id
Claude Code is a command-line interface and multi-agent orchestration framework designed for autonomous software engineering. It enables AI agents to perform codebase modifications, debugging, and Git workflow management while coordinating multiple specialized agents to decompose and execute complex engineering tasks in parallel. The system distinguishes itself through a high degree of isolation and safety, utilizing Git worktrees to create independent working directories for concurrent agents and implementing a tiered permission system that combines user rules, project policies, and OS-level
Refact is an autonomous AI software engineering system and code assistant. It functions as an agent orchestrator capable of planning, executing, and managing multi-step development workflows to complete complex software tasks independently. The system distinguishes itself through agentic state management, using isolated worktrees and versioned checkpoints to allow autonomous agents to experiment with code changes and roll back to stable states if tasks fail. It further extends its capabilities via the Model Context Protocol, connecting the AI engine to external databases, version control syst
Droidrun is a mobile device automation framework that uses large language models to translate natural language commands into executable actions on mobile operating systems. It functions as an agent orchestrator and UI automation engine, providing a reasoning engine that decomposes complex mobile tasks into smaller, manageable steps. The system distinguishes itself through a hierarchical action translation process and the ability to analyze accessibility trees and screenshots to determine the visual layout and current status of mobile applications. It supports execution across both physical ha
This project is an autonomous software engineering platform and orchestration framework designed to manage specialized artificial intelligence agents. It provides a suite of tools for coordinating autonomous entities to execute complex development tasks, ranging from architectural planning and code reviews to performance optimization. The platform distinguishes itself through its multi-agent orchestration layer, which dynamically assigns roles based on an analysis of a project's technology stack. By utilizing a modular agent registry, the system scales capabilities across different software m
CogVLM is a multimodal large language model designed to integrate visual and textual data for reasoning about images and generating natural language. It functions as a visual question answering system that analyzes image content to provide detailed descriptions or answer specific questions. The project includes a visual grounding model capable of mapping text descriptions to precise bounding box coordinates within an image. It also features a vision-based automation agent that analyzes screen captures to generate execution plans and interaction coordinates for software interfaces. The system
This project is a computer control framework that uses multimodal vision models to simulate mouse and keyboard inputs for automating desktop tasks. It functions as an autonomous agent and vision-based orchestrator that interprets screen visuals to interact with user interfaces. The system employs vision language models and object detection to locate and click interface elements. It utilizes visual grounding to overlay numerical markers on UI components and uses optical character recognition to map on-screen text to precise pixel coordinates. The framework supports voice-controlled computing
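The idea of overlaying numerical markers on UI components can be sketched as a lookup from marker number to click point. The sketch assumes bounding boxes have already been produced by some detector, given as `(left, top, right, bottom)` pixel tuples; the function name `assign_markers` is hypothetical and not part of the project described above.

```python
def assign_markers(boxes):
    """Number each (left, top, right, bottom) box and return {marker_id: centre}."""
    markers = {}
    for marker_id, (left, top, right, bottom) in enumerate(boxes, start=1):
        # The centre of the box is used as the click target for that marker.
        markers[marker_id] = ((left + right) // 2, (top + bottom) // 2)
    return markers


if __name__ == "__main__":
    detected = [(0, 0, 100, 40), (10, 50, 30, 70)]
    print(assign_markers(detected))  # {1: (50, 20), 2: (20, 60)}
```

A vision model that answers with a marker number, rather than raw coordinates, can then be resolved to a pixel position through this mapping.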
AIOS is an LLM agent operating system and orchestration kernel designed to manage memory, resource scheduling, and tool execution for multiple autonomous AI agents. It serves as a comprehensive framework for developing and deploying agents, featuring a dedicated resource manager that coordinates model backends, GPU memory, and isolated kernel instances. The system distinguishes itself through a semantic memory engine that uses vector search and autonomous clustering for long-term knowledge management, and a semantic file system that allows users to control computer files and system operations
Memary is a memory-augmented agent framework that stores and retrieves contextual information from a knowledge graph to personalize responses and maintain long-term memory across interactions. It automatically captures all agent interactions and stores them as structured memories without requiring explicit instrumentation, then injects top-ranked user entities and themes into the active context window to tailor agent responses dynamically. The framework distinguishes itself through a multi-retriever memory search that combines ColBERT reranking with recursive graph queries across databases, e
GLM-4.5 is a multimodal large language model and advanced reasoning system. It functions as an AI coding assistant, an autonomous AI agent, and a multimodal content generator capable of processing and generating text, images, audio, and video within a single unified system. The project is distinguished by its deep reasoning capabilities, utilizing chain-of-thought processing to solve complex mathematical, logical, and technical problems. It features an agentic architecture that allows for autonomous task execution, long-horizon goal planning, and the ability to interact with external tools an
Hive is an artificial intelligence workflow automation engine and development platform designed for building and deploying autonomous agents. It provides a framework for orchestrating complex, multi-step business processes by coordinating tasks across multiple specialized agents using directed graph structures. The platform distinguishes itself through a focus on production-grade reliability and state management. It maintains persistent execution context and conversation history on disk, enabling crash recovery and continuity for long-running automated sessions. Furthermore, it incorporates a
Kilocode is an autonomous engineering platform designed to orchestrate AI agents for complex software development tasks. It functions as a comprehensive system for automating coding, testing, and repository management by integrating directly with your codebase and terminal. The platform provides a unified gateway for model orchestration, allowing for the management of agentic workflows, event-driven automation, and persistent session state across distributed development environments. The platform distinguishes itself through its federated task management and policy-based access control, which
Llama-stack is a standardized orchestration stack and generative AI API gateway. It provides a unified communication layer and a consistent interface for deploying, managing, and interacting with various large language model providers and deployments. The system functions as an agent framework that manages tool execution and versioned skill bundles to automate complex tasks. It includes a batch processing system for handling large volumes of asynchronous requests through offline processing and a vector database interface for storing and searching documents to enable retrieval augmented genera