30 open-source projects similar to asweigart/pyautogui, ranked by the number of features they have in common. Compare stars, activity, and what each one does to find the best PyAutoGUI alternative.
Robotgo is a cross-platform desktop automation framework for the Go programming language. It provides a comprehensive toolkit for programmatically interacting with graphical user interfaces, enabling developers to simulate human input, manage application windows, and monitor system-wide hardware events. The library distinguishes itself through its low-level system integration, utilizing a foreign function interface to interact directly with native operating system APIs. It employs pixel-buffer memory mapping and real-time screen capture to perform visual element identification, allowing for i
Agent-S is a multimodal AI agent and LLM desktop automation framework designed to control operating systems through graphical user interface interactions. It functions as a computer use interface, utilizing vision-language grounding to translate natural language goals into precise screen coordinates and system actions. The project differentiates itself by combining structured accessibility tree inspection with vision-based element localization. It manages cross-application workflows by mapping conceptual descriptions to physical pixels and simulating low-level keyboard and mouse events to mov
Robotjs is a native Node.js automation library and desktop input simulator. It uses C++ bindings to provide low-level access to operating system functions, allowing for the programmatic control of the mouse and keyboard and the analysis of screen pixels. The library functions as a toolkit for automating user interfaces and desktop workflows, including those within Electron applications. It enables the simulation of key presses and mouse movements to automate interactions with desktop software and perform automated data entry. Its capabilities extend to screen pixel analysis, where it capture
Bytebot is an LLM desktop automation framework and virtual Linux desktop environment. It enables AI agents to plan and execute mouse and keyboard actions on a virtual computer using natural language, allowing for autonomous desktop automation and the integration of legacy systems that lack native APIs. The system operates as an LLM API gateway and a Model Context Protocol server, routing requests across multiple language model providers with integrated load balancing and rate limiting. It provides isolated, containerized environments where agents use visual reasoning to interpret screenshots
This is a Model Context Protocol server that exposes Windows desktop automation and system administration functions to large language models. It provides programmatic control of mouse, keyboard, windows, and UI elements on Windows through simulated user input, while also enabling LLMs to manage the Windows registry, processes, files, and execute PowerShell commands through a remote interface. The server supports multiple transport protocols including stdio, SSE, and streamable HTTP, allowing flexible integration with different language model clients. It implements OAuth 2.0 with PKCE for secu
Robot Framework is a keyword-driven automation framework designed for acceptance testing and robotic process automation. It utilizes a human-readable, tabular syntax to define test cases and workflows, separating the automation logic from the underlying implementation. By mapping plain-text keywords to executable commands, the framework enables the creation of maintainable and reusable automation sequences. The platform distinguishes itself through a modular architecture that supports the integration of custom libraries and external modules. This extensibility allows users to expand the frame
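The keyword-to-command mapping described for Robot Framework can be illustrated with a minimal sketch in plain Python. This is not Robot Framework's API: the `KEYWORDS` table, the `run_table` helper, the three example keywords, and the choice of two or more spaces as the cell separator are all assumptions made for illustration.

```python
import re

# Hypothetical keyword library: plain-text keyword names mapped to Python callables.
def open_app(state, name):
    state["app"] = name

def type_text(state, text):
    state.setdefault("typed", []).append(text)

def should_have_typed(state, expected):
    assert expected in state.get("typed", []), f"{expected!r} was never typed"

KEYWORDS = {
    "Open App": open_app,
    "Type Text": type_text,
    "Should Have Typed": should_have_typed,
}

def run_table(table):
    """Run each row: the first cell is the keyword, the remaining cells are its arguments."""
    state = {}
    for row in table.strip().splitlines():
        cells = re.split(r"\s{2,}", row.strip())  # cells are separated by 2+ spaces
        keyword, args = cells[0], cells[1:]
        KEYWORDS[keyword](state, *args)
    return state

# A tabular, human-readable workflow (illustrative only).
WORKFLOW = """
Open App    Notepad
Type Text    hello world
Should Have Typed    hello world
"""

result = run_table(WORKFLOW)
```

The workflow text never mentions a Python function, so the implementation behind a keyword can change without touching the table.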
Locust is a distributed performance testing framework that allows users to define complex system stress scenarios using standard Python code. By modeling concurrent users as classes with weighted tasks and lifecycle hooks, it enables the simulation of realistic user behavior across large-scale environments. The tool functions as a scalable load generator capable of orchestrating traffic across multiple worker nodes to measure system stability and responsiveness under heavy, real-world conditions. The framework is distinguished by its protocol-agnostic architecture, which supports diverse comm
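The idea of modeling a user as a class with weighted tasks and a lifecycle hook can be sketched with the standard library alone. This is not Locust's API: `SimulatedUser`, its `tasks` list, and `run` are hypothetical names, and no network traffic is generated.

```python
import random

class SimulatedUser:
    """Hypothetical user model: each task carries a relative weight."""

    def __init__(self, rng):
        self.rng = rng
        self.log = []

    def on_start(self):
        # Lifecycle hook: runs once before any task is picked.
        self.log.append("login")

    def view_item(self):
        self.log.append("view_item")

    def checkout(self):
        self.log.append("checkout")

    # (task, weight) pairs: view_item is picked about three times as often.
    tasks = [(view_item, 3), (checkout, 1)]

    def run(self, steps):
        self.on_start()
        funcs = [task for task, _ in self.tasks]
        weights = [weight for _, weight in self.tasks]
        for _ in range(steps):
            task = self.rng.choices(funcs, weights=weights, k=1)[0]
            task(self)
        return self.log

# Seeded generator so the run is repeatable.
user = SimulatedUser(random.Random(0))
log = user.run(1000)
```

A real load generator would run many such users concurrently and replace the log entries with timed requests.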
UI-TARS is an LLM GUI automation framework and multimodal action grounding system. It functions as a GUI agent orchestrator and cross-platform device controller that uses large language models to interpret graphical interfaces and execute actions across desktop and mobile operating systems. The system translates model-generated coordinates into precise screen positions to interact with visual user interface elements. It employs a multimodal approach to interpret screen layouts and decomposes complex goals into multi-step trajectories through reasoning and error correction. The project provid
Handy is a local speech-to-text automation tool designed to convert spoken audio into text and inject it directly into active desktop applications. By running machine learning models entirely on the host hardware, it provides a private, offline-first environment for dictation and command execution. The system functions as a background service that manages microphone input, transcription state, and text output, enabling hands-free typing across various software environments. The project distinguishes itself through a modular pipeline that integrates local language models for post-transcription
This project is an Android RPA framework designed for automating user interfaces and system tasks on rooted Android devices using Python and ADB. It provides a suite of tools for rooted device management, allowing for programmatic control of system settings, application lifecycles, and shell command execution via a remote API. The framework distinguishes itself through a combination of dynamic instrumentation and AI integration. It can inject scripts into running processes to hook Java interfaces and modify application behavior in real time. Additionally, it supports large language model in
This project is a computer control framework that uses multimodal vision models to simulate mouse and keyboard inputs for automating desktop tasks. It functions as an autonomous agent and vision-based orchestrator that interprets screen visuals to interact with user interfaces. The system employs vision language models and object detection to locate and click interface elements. It utilizes visual grounding to overlay numerical markers on UI components and uses optical character recognition to map on-screen text to precise pixel coordinates. The framework supports voice-controlled computing
This project is a Model Context Protocol server and automation framework designed to control and automate iOS and Android devices. It provides a unified API that abstracts interactions between physical hardware and simulators across different mobile operating systems, functioning as a cross-platform device bridge. The system is distinguished by a visual UI automation toolkit that uses screenshots and coordinate-based gestures—such as tapping, swiping, and long-pressing—rather than relying on element selectors. It supports remote connectivity via an HTTP server using Server-Sent Events, which
This project is a collection of toolsets for executing visual effects, simulating operating system-level input, and manipulating graphical user interface window states. It functions as a Windows API automation tool and GUI manipulator designed to programmatically alter the behavior and appearance of active application windows. The toolkit features a desktop visual effects engine capable of applying rotational animations, hue shifts, and view transformations to the display. It includes an input simulator that can spawn multiple independent mouse cursors and automate user interactions across th
This project is an autonomous desktop automation agent that interprets natural language instructions to control applications, browser interfaces, and system terminals. It functions as a cross-platform utility designed to manage complex workflows by integrating visual screen analysis with system-level input simulation. The agent distinguishes itself through its ability to perform tasks asynchronously, ensuring that web and terminal operations run in the background without interrupting the active user session or desktop focus. By combining computer vision to map interface elements with event-dr
Hammerspoon is a programmable automation engine for macOS that enables deep system-level control through a Lua scripting environment. By bridging high-level scripts with native Objective-C APIs, it allows users to interact with the operating system's accessibility tree, intercept hardware input streams, and manage the lifecycle of running applications. The project distinguishes itself through an event-driven architecture that registers asynchronous hooks for system notifications and hardware events. This allows for real-time automation, such as remapping keyboard and mouse inputs, managing wi
Better Genshin Impact is a computer vision-based automation framework designed to perform repetitive tasks and combat sequences within game environments. It functions as a macro scripting engine that utilizes synthetic input injection to simulate human interaction with the operating system, allowing for hands-free execution of complex gameplay loops. The system distinguishes itself through a combination of template-matching visual recognition and state-machine logic, which enables the software to identify on-screen game elements and transition between operational states in real time. By mappi
This is a collection of Python automation scripts and utility tools designed to handle repetitive technical tasks, system administration, and developer workflows. The project serves as a suite for task automation, data utility, and web automation. The collection includes specialized tools for multimedia processing, such as optical character recognition for extracting text from images, speech-to-text conversion, and real-time face and human body detection. It also features web scraping and monitoring capabilities to track product prices, fetch external API content, and automate interactions wi
SerpentAI is a game AI development kit and computer vision framework designed for building autonomous agents that interact with video games. It serves as a game input automation tool and a machine learning model integration engine, allowing developers to create agents that perceive game states and execute actions. The framework utilizes a plugin-based agent architecture to provide modular extensions for game-specific logic and behaviors. It features a specialized system for training, bundling, and deploying machine learning classifiers to recognize visual contexts and game states in real time
KeymouseGo is an input automation tool and macro recorder designed to capture, edit, and replay keyboard and mouse sequences to automate repetitive desktop tasks. It functions as a scriptable input automator that translates recorded user interactions into reusable blueprints for automated playback. The system distinguishes itself through a logic-based scripting framework that supports conditional branching, sub-routine calls, and jump-to-labels for complex workflow control. It further extends runtime behavior via a plugin system that allows for the registration of custom functions to modify t
Karabiner-Elements is a system-level utility designed for advanced keyboard and mouse customization. It functions as a background service that intercepts raw hardware input signals at the driver level, allowing for the transformation of key presses and pointer movements before they reach the operating system. By utilizing virtual input device emulation, the software re-injects modified events into the system stream, enabling complex remapping, macro execution, and hardware-specific control. The project distinguishes itself through a sophisticated state-based logic engine that enables context-
BackstopJS is an automated screenshot testing framework and visual regression testing tool designed to identify pixel-level discrepancies between different versions of a web application. It functions as a browser automation testing suite that captures visual snapshots of a user interface and compares them against stored reference images to detect unintended changes. The project utilizes a containerized testing environment via Docker to ensure consistent browser rendering and prevent cross-platform visual discrepancies. It includes a web UI diffing interface that allows users to analyze visual
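Comparing a snapshot against a stored reference image comes down to a per-pixel diff. The sketch below is an assumption-laden illustration in Python, not BackstopJS code: images are plain nested lists of RGB tuples, and `diff_ratio` and `tolerance` are hypothetical names.

```python
def diff_ratio(reference, candidate, tolerance=0):
    """Return the fraction of pixels whose channels differ by more than `tolerance`."""
    same_shape = len(reference) == len(candidate) and all(
        len(ref_row) == len(cand_row)
        for ref_row, cand_row in zip(reference, candidate)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")

    total = changed = 0
    for ref_row, cand_row in zip(reference, candidate):
        for ref_px, cand_px in zip(ref_row, cand_row):
            total += 1
            if any(abs(a - b) > tolerance for a, b in zip(ref_px, cand_px)):
                changed += 1
    return changed / total if total else 0.0

WHITE, BLACK, GREY = (255, 255, 255), (0, 0, 0), (250, 250, 250)

reference = [[WHITE, WHITE], [WHITE, WHITE]]
candidate = [[WHITE, BLACK], [WHITE, GREY]]

strict = diff_ratio(reference, candidate)                # BLACK and GREY both count
lenient = diff_ratio(reference, candidate, tolerance=8)  # only BLACK counts
```

A regression check would then fail whenever the ratio exceeds a chosen threshold.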
Open Interpreter is an autonomous agent runtime that translates natural language instructions into executable code to interact with local software and operating systems. It functions as an orchestration framework that connects language models to a secure execution environment, enabling the development of agents capable of managing system resources and performing complex tasks. To ensure safety, the system mandates explicit user verification before executing any generated code and provides robust isolation through containerized sandboxing. The project distinguishes itself through its deep inte
Maestro is a declarative mobile and web UI automation framework designed for end-to-end testing. It operates by querying the native accessibility tree of an application, allowing for black-box testing without requiring source code instrumentation or platform-specific dependencies. The framework distinguishes itself through a unified command syntax that abstracts interactions across Android, iOS, and web environments. It features a dynamic synchronization engine that automatically pauses test execution to account for non-deterministic animations and network-dependent content loading, ensuring
pytest is a testing framework for Python that provides a command-line runner for discovering and executing test suites. It is built on a modular architecture that uses standard language assertions to verify code correctness, automatically inspecting expressions to provide detailed failure reports without requiring specialized assertion methods. The framework distinguishes itself through a dependency injection system that manages setup and teardown logic by automatically resolving and injecting resources into test functions. It also features a hook-based plugin architecture that allows for dee
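Injecting resources into test functions by name can be sketched with `inspect.signature`. This is a simplified illustration, not pytest's implementation: the `fixture` decorator, `resolve`, and `run_check` here are hypothetical stand-ins, and teardown is omitted.

```python
import inspect

FIXTURES = {}

def fixture(func):
    """Register a resource factory under its function name."""
    FIXTURES[func.__name__] = func
    return func

def resolve(name, cache):
    """Build the named resource, resolving its own dependencies first."""
    if name not in cache:
        factory = FIXTURES[name]
        deps = [resolve(param, cache) for param in inspect.signature(factory).parameters]
        cache[name] = factory(*deps)
    return cache[name]

def run_check(func):
    """Call `func`, passing a resource for each of its parameter names."""
    cache = {}  # each resource is built once per run and then shared
    args = [resolve(param, cache) for param in inspect.signature(func).parameters]
    return func(*args)

@fixture
def config():
    return {"user": "alice"}

@fixture
def session(config):
    return f"session-for-{config['user']}"

def check_session(session, config):
    # A plain assert statement: no special assertion methods needed.
    assert session == "session-for-alice"
    return session, config

outcome = run_check(check_session)
```

The function under test only declares what it needs by parameter name; the runner works out the construction order.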
Automated Integration Testing and Live Documentation for your API
Jest is a JavaScript testing framework that integrates a test runner, an assertion library, and a snapshot testing tool. Its primary purpose is to provide a comprehensive environment for writing and running automated JavaScript tests to verify software correctness. The framework is distinguished by its snapshot testing capabilities, which capture the state of large objects or rendered components to detect regressions over time. It also features a reactive watch mode that monitors file changes and automatically executes only the tests related to modified code. The project covers a broad range
Hypothesis is a Python property-based testing library and data generation engine. It enables the discovery of edge cases and bugs by generating a wide range of randomized inputs based on defined strategies and shrinking complex failing examples to their smallest possible form. It also functions as a state machine testing framework to verify system behavior across sequences of interdependent operations. The project features a fuzzing integration layer that converts raw byte buffers from coverage-guided fuzzers into structured test cases. It includes a persistence mechanism to store and synchro
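Generating random inputs and shrinking a failing example can be sketched in a few lines of standard-library Python. This is a greedy toy shrinker, not the algorithm Hypothesis uses: `find_failure`, `shrink`, and the deliberately false property are assumptions for illustration.

```python
import random

def find_failure(prop, rng, tries=1000):
    """Generate random integer lists until one falsifies `prop`."""
    for _ in range(tries):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 10))]
        if not prop(xs):
            return xs
    return None

def shrink(xs, prop):
    """Greedily simplify a failing list while it keeps failing."""
    improved = True
    while improved:
        improved = False
        # Candidate simplifications: drop one element...
        candidates = [xs[:i] + xs[i + 1:] for i in range(len(xs))]
        # ...or move one element toward zero.
        for i, x in enumerate(xs):
            for smaller in (int(x / 2), x - 1 if x > 0 else x + 1):
                if abs(smaller) < abs(x):
                    candidates.append(xs[:i] + [smaller] + xs[i + 1:])
        for candidate in candidates:
            if not prop(candidate):
                xs, improved = candidate, True
                break
    return xs

def all_below_50(xs):
    # A deliberately false property: "every element is below 50".
    return all(x < 50 for x in xs)

failing = find_failure(all_below_50, random.Random(0))
minimal = shrink(failing, all_below_50)
```

Whatever failing list the generator produces, the shrinker reduces it to the single-element list `[50]`, the smallest input that still breaks the property.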
Selenium is a comprehensive browser automation framework that provides a standardized interface for controlling web browsers to perform automated tasks, user interactions, and data extraction. It functions as a cross-browser testing tool, enabling developers to execute identical automation scripts across various browser engines and operating systems to ensure consistent application behavior. By implementing the WebDriver protocol, it maps high-level automation commands to browser-specific drivers using a standardized HTTP-based wire protocol. The project distinguishes itself through its distr
Playwright for Python is a browser automation framework designed for end-to-end testing, web scraping, and user interaction simulation. It functions as a headless browser controller that enables programmatic navigation, data extraction, and the execution of complex workflows across multiple rendering engines. The framework distinguishes itself through an actionability-aware interaction engine that automatically verifies element readiness before performing actions, significantly reducing test flakiness. It utilizes isolated browser contexts to maintain separate storage and cookies for parallel
Runs a load test on the selected URL. Fast and easy to use. Can be integrated in your own workflow using the API.