30 open-source projects similar to tebelorg/rpa-python, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best RPA Python alternative.
This application is a cross-platform desktop utility designed for automated translation, optical character recognition, and speech synthesis. It functions as a modular client that integrates various local and remote language services, allowing users to process text through hotkeys, clipboard monitoring, or direct input. The software distinguishes itself through a plugin-based architecture and a built-in automation framework. By exposing a local network interface, it enables external applications and scripts to programmatically trigger its translation and recognition workflows. Users can furth
Phoenix is a macOS workspace automator and window manager that uses a JavaScript scripting engine to control system-level behaviors. It functions as an AppleScript automation bridge, allowing users to programmatically manipulate application states, window geometry, and desktop interactions. The project enables the creation of custom workflows by binding keyboard shortcuts to JavaScript functions. This allows for the automation of complex system actions, such as organizing application layouts across multiple screens and managing virtual spaces. Its capability surface covers window and applica
Hammerspoon is a programmable automation engine for macOS that enables deep system-level control through a Lua scripting environment. By bridging high-level scripts with native Objective-C APIs, it allows users to interact with the operating system's accessibility tree, intercept hardware input streams, and manage the lifecycle of running applications. The project distinguishes itself through an event-driven architecture that registers asynchronous hooks for system notifications and hardware events. This allows for real-time automation, such as remapping keyboard and mouse inputs, managing wi
Kit is a desktop automation framework and scriptable UI toolkit designed for building personalized productivity tools. It serves as a cross-platform CLI wrapper and macOS system automator, providing an environment to execute scripts that manage operating system tasks, file management, and application workflows. The project distinguishes itself with a dedicated LLM integration layer for structured data extraction and text generation, alongside a specialized UI framework for creating interactive input forms, HTML windows, and floating widgets. It features deep macOS integration through AppleScr
Robotgo is a cross-platform desktop automation framework for the Go programming language. It provides a comprehensive toolkit for programmatically interacting with graphical user interfaces, enabling developers to simulate human input, manage application windows, and monitor system-wide hardware events. The library distinguishes itself through its low-level system integration, utilizing a foreign function interface to interact directly with native operating system APIs. It employs pixel-buffer memory mapping and real-time screen capture to perform visual element identification, allowing for i
LaVague is an LLM web agent framework and large action model designed to translate natural language instructions into executable browser automation scripts. It functions as a multi-modal orchestrator that reasons over web page states and HTML content to automate multi-step tasks via a Selenium-based automation engine. The framework features a modular model provider layer, allowing users to swap between different language and vision models from providers such as Anthropic, Gemini, and Azure OpenAI. It employs a multi-modal world model to process screenshots and HTML structures, utilizing retri
This project is a Go shell scripting library and framework designed for writing automation scripts and CLI tools. It provides a concurrent data pipeline system for chaining sources, filters, and sinks to process text and JSON streams. The library distinguishes itself through a comprehensive toolkit for shell-like operations, including a text processing engine for regular expression filtering and frequency analysis, a filesystem utility toolkit for recursive search and path manipulation, and an integrated HTTP client wrapper for building data pipelines that fetch web content. The capability s
This project is a high-level Python library and wrapper for Selenium designed for web browser automation and functional testing. It provides a simplified interface for controlling browsers to execute automated workflows and end-to-end tests across Chrome and Firefox. The library distinguishes itself by replacing technical CSS selectors and identifiers with label-based element discovery, allowing elements to be located via visible text. It further simplifies browser control by automating window management through page titles and handling nested frame interactions without requiring manual conte
Accomplish is an artificial intelligence action framework and desktop automation agent designed to execute productivity tasks through natural language prompts. It functions as a workflow orchestrator that manages connections between various cloud and local language model providers to perform cross-platform operations. The system distinguishes itself through the ability to define and save stateful, reusable custom skills for recurring workflows. It integrates local application programming interfaces with third-party services to synchronize data and manage information across different platforms
Claude Code is a command-line interface and multi-agent orchestration framework designed for autonomous software engineering. It enables AI agents to perform codebase modifications, debugging, and Git workflow management while coordinating multiple specialized agents to decompose and execute complex engineering tasks in parallel. The system distinguishes itself through a high degree of isolation and safety, utilizing Git worktrees to create independent working directories for concurrent agents and implementing a tiered permission system that combines user rules, project policies, and OS-level
Skim is a cross-platform interactive fuzzy finder that runs as a terminal application, a Rust library, a Vim and Neovim plugin, and a shell integration tool. It provides real-time filtering and selection from lists of items, supporting keyboard and mouse navigation, live preview panes, and multi-select functionality across Linux, macOS, and Windows. The tool distinguishes itself through a composable query expression tree that supports fuzzy, exact, inverse, prefix, suffix, and logical AND/OR operators, combined with a Smith-Waterman scoring engine that penalizes typos and gaps for natural rel
MisakaTranslator is a real-time game translation tool designed to extract text from games and manga and provide machine translations via external engines. It functions as a text extractor using both memory hooking to retrieve raw text directly from running processes and optical character recognition to convert images of in-game text into editable strings. The tool includes a speech synthesizer to read translated dialogue and sentences aloud. To maintain accuracy, it utilizes a custom translation dictionary to manage specialized word lists and manual phrase mappings for character names and loc
Dango-Translator is an OCR translation system and multi-engine translation client designed to extract text from images or screens and replace it with translated content. It functions as an image text translator and real-time screen translator, utilizing optical character recognition to convert text between different languages automatically. The software distinguishes itself through coordinate-based image typesetting and a glossary manager. These tools allow for the replacement of original image content with translated text in the same area and the use of specialized dictionaries to ensure con
Mjolnir is a macOS automation framework and extensible scripting engine. It provides a system for creating custom productivity workflows, managing application states, and controlling the macOS desktop interface programmatically. The project functions as a global hotkey manager that binds keyboard shortcuts to trigger automated scripts across the operating system. It includes a macOS application controller to inspect active windows and manage system-wide user interface interactions. The environment supports extensibility through a pluggable package management system, allowing for the installa
Helium is a Python library and high-level wrapper for Selenium designed for browser automation, functional UI testing, and web scraping. It provides a simplified interface for interacting with web applications across different browser engines. The library distinguishes itself by allowing users to identify and interact with web elements using visible text labels rather than relying exclusively on technical identifiers like XPaths or CSS selectors. This approach enables the creation of automation scripts based on human-readable labels. The toolkit covers a broad range of browser automation cap
Bob is an extensible macOS utility designed for screen text extraction, translation aggregation, and speech synthesis. It functions as a wrapper that integrates multiple optical character recognition and translation services into a single interface, allowing users to capture screen areas, decode QR codes, and convert visual text into editable strings. The tool distinguishes itself through a plugin-based architecture that supports the integration of custom translation, speech synthesis, and image recognition APIs. It enables multi-engine parallel execution, allowing a single request to be proc
KeymouseGo is an input automation tool and macro recorder designed to capture, edit, and replay keyboard and mouse sequences to automate repetitive desktop tasks. It functions as a scriptable input automator that translates recorded user interactions into reusable blueprints for automated playback. The system distinguishes itself through a logic-based scripting framework that supports conditional branching, sub-routine calls, and jump-to-labels for complex workflow control. It further extends runtime behavior via a plugin system that allows for the registration of custom functions to modify t
This project is a computer control framework that uses multimodal vision models to simulate mouse and keyboard inputs for automating desktop tasks. It functions as an autonomous agent and vision-based orchestrator that interprets screen visuals to interact with user interfaces. The system employs vision language models and object detection to locate and click interface elements. It utilizes visual grounding to overlay numerical markers on UI components and uses optical character recognition to map on-screen text to precise pixel coordinates. The framework supports voice-controlled computing
Peco is an interactive text filter and fuzzy finder for the terminal. It serves as a terminal user interface selection tool that filters standard input in real-time using fuzzy matching and regular expressions. The tool preserves and renders ANSI color escape sequences from piped input streams while performing matching logic on plain-text versions. It supports multi-stage filtering, allowing users to freeze result sets to create a new base for subsequent refinements. Capability areas include advanced search filtering with negative matching, multi-item selection, and the ability to pipe selec
Vis is a terminal-based modal text editor that utilizes vi keybindings and a system of structural regular expressions. It functions as a scriptable environment where Lua is used for configuration, custom key mappings, and plugin development. The editor distinguishes itself through a syntax highlighting system based on Parsing Expression Grammars and a pattern matching engine that treats text as a structure for complex search and replace operations. It also integrates directly with the system shell, allowing users to pipe text ranges to external commands and capture the resulting output. The
This is a Model Context Protocol server that exposes Windows desktop automation and system administration functions to large language models. It provides programmatic control of mouse, keyboard, windows, and UI elements on Windows through simulated user input, while also enabling LLMs to manage the Windows registry, processes, files, and execute PowerShell commands through a remote interface. The server supports multiple transport protocols including stdio, SSE, and streamable HTTP, allowing flexible integration with different language model clients. It implements OAuth 2.0 with PKCE for secu
AzurLaneAutoScript is a mobile game automation system designed to perform repetitive gameplay tasks unattended. It functions as a screenshot-driven bot that controls Android devices, emulators, and cloud phones via ADB and uiautomator2, using computer vision to make interaction decisions instead of fixed timers. The project distinguishes itself through an advanced computer vision suite that includes local optical character recognition and perspective-aware grid detection. These tools allow the bot to parse 3D game maps, compute vanishing points, and normalize grid-centered objects for precise
Zed is a terminal-based code editor built in Rust that provides a full-featured editing experience with familiar keybindings, mouse support, and multiple cursors. It runs entirely in the terminal while offering capabilities typically found in graphical editors, including split panes, a command palette, and integrated language server protocol support for real-time diagnostics, completions, go-to-definition, and code actions across multiple languages. The editor distinguishes itself through a plugin system that runs sandboxed TypeScript plugins in a QuickJS runtime, with an asynchronous bridge
This project is an autonomous desktop automation agent that interprets natural language instructions to control applications, browser interfaces, and system terminals. It functions as a cross-platform utility designed to manage complex workflows by integrating visual screen analysis with system-level input simulation. The agent distinguishes itself through its ability to perform tasks asynchronously, ensuring that web and terminal operations run in the background without interrupting the active user session or desktop focus. By combining computer vision to map interface elements with event-dr
gptme is an autonomous AI agent server and framework designed for local system automation, software development, and code execution. It operates as a local execution engine that enables language models to run shell commands, modify local files, and interact with the operating system. The project functions as a Model Context Protocol client, integrating with external servers to expand agent capabilities with standardized tools and data sources. It features a provider-agnostic routing system to orchestrate tasks across multiple proprietary cloud APIs and local AI backends. The system includes
Codeception is a full-stack testing framework for PHP applications that provides a unified interface for unit, functional, and acceptance testing. It serves as a tool for automating real desktop and mobile browsers via the WebDriver protocol and acts as a client for testing REST and SOAP APIs. The framework is distinguished by its support for Behavior-Driven Development, allowing users to write human-readable test specifications in the Gherkin language to align technical tests with business requirements. It implements actor-based action mapping to connect these natural language steps to executabl
wxauto is a Python library and bot framework designed for the programmatic control of the WeChat Windows desktop client. It functions as a wrapper that enables the automation of messaging and social feed functions by simulating user interface interactions. The library distinguishes itself by providing a bridge between network requests and local UI automation, allowing users to expose automation capabilities via a web interface. It utilizes background execution and simulated system-level inputs to trigger application events without moving the physical mouse cursor. The project covers extensiv
LunaTranslator is a real-time translation tool designed for visual novels and games. It functions as a multi-engine translation hub and text extractor that captures dialogue via memory hooking or optical character recognition to convert it into a target language. The project distinguishes itself through specialized linguistic tools, including a Japanese text analyzer for sentence segmentation and phonetic readings. It also operates as a digital dictionary aggregator, querying multiple online and offline databases simultaneously to provide comprehensive vocabulary definitions for language lear
This is a collection of Python automation scripts and utility tools designed to handle repetitive technical tasks, system administration, and developer workflows. The project serves as a suite for task automation, data utility, and web automation. The collection includes specialized tools for multimedia processing, such as optical character recognition for extracting text from images, speech-to-text conversion, and real-time face and human body detection. It also features web scraping and monitoring capabilities to track product prices, fetch external API content, and automate interactions wi
WebContainer is a browser-based runtime environment designed to execute server-side code, operating system commands, and full-stack development toolchains directly within a web tab. It provides the infrastructure for cloud IDEs and zero-install development workflows by simulating a runtime that eliminates the need for local installations or remote virtual machines. The system leverages WebAssembly to map system calls and implements a virtual POSIX-compliant filesystem and network interception layer. This allows the runtime to spawn command-line processes, execute shell commands, and route int