30 open-source projects similar to microsoft/winappdriver, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best WinAppDriver alternative.
wxauto is a Python library and bot framework designed for the programmatic control of the WeChat Windows desktop client. It functions as a wrapper that enables the automation of messaging and social feed functions by simulating user interface interactions. The library distinguishes itself by providing a bridge between network requests and local UI automation, allowing users to expose automation capabilities via a web interface. It utilizes background execution and simulated system-level inputs to trigger application events without moving the physical mouse cursor. The project covers extensiv
This is a Model Context Protocol server that exposes Windows desktop automation and system administration functions to large language models. It provides programmatic control of mouse, keyboard, windows, and UI elements on Windows through simulated user input, while also enabling LLMs to manage the Windows registry, processes, files, and execute PowerShell commands through a remote interface. The server supports multiple transport protocols including stdio, SSE, and streamable HTTP, allowing flexible integration with different language model clients. It implements OAuth 2.0 with PKCE for secu
WebDriverAgent is an on-device control agent and automation server that implements the WebDriver protocol for iOS devices. It serves as a bridge that enables the remote control of applications and the operating system on physical iOS devices and simulators for automated testing. The project provides a UI testing framework capable of interacting with on-screen elements, capturing screenshots, and simulating user gestures. It translates remote commands into native system calls to interact with the iOS accessibility hierarchy. The server covers a broad range of device orchestration and UI autom
php-webdriver is a WebDriver PHP client and browser automation framework that implements the W3C WebDriver standard. It serves as a programmatic interface for controlling web browsers, executing JavaScript, and managing browser sessions in both headed and headless environments. The library functions as a Selenium protocol implementation, allowing PHP applications to communicate with browser drivers such as ChromeDriver or GeckoDriver. It provides the ability to automate user actions, navigate pages, and validate DOM elements for web UI testing. Its capabilities cover broad areas of browser i
Maestro is a declarative mobile and web UI automation framework designed for end-to-end testing. It operates by querying the native accessibility tree of an application, allowing for black-box testing without requiring source code instrumentation or platform-specific dependencies. The framework distinguishes itself through a unified command syntax that abstracts interactions across Android, iOS, and web environments. It features a dynamic synchronization engine that automatically pauses test execution to account for non-deterministic animations and network-dependent content loading, ensuring
Open-AutoGLM is an autonomous agent framework designed to perform complex user workflows on mobile devices. By translating natural language instructions into precise sequences of taps, scrolls, and text inputs, the system enables the automation of mobile application interactions and testing. The platform distinguishes itself through a combination of vision-language processing and reinforcement learning. It converts graphical user interfaces into structured data, allowing agents to parse screen elements and map natural language commands to coordinate-based actions. To ensure reliability, the s
Midscene is a multimodal automation framework designed to enable AI agents to perceive, navigate, and manipulate graphical user interfaces across web, mobile, and desktop environments. By leveraging vision-capable AI models, the platform interprets interface screenshots to execute tasks based on natural language instructions, removing the reliance on traditional, brittle code-based selectors. The framework distinguishes itself through its ability to decompose high-level goals into autonomous, multi-step sequences that function consistently across diverse platforms. It provides a visual ground
WebDriverIO is a Node.js test automation framework used for automating functional tests across web browsers and mobile applications. It acts as a WebDriver protocol client that manages remote browser sessions and executes commands against WebDriver and Appium servers to perform end-to-end testing. The framework is distinguished by its ability to control both native and hybrid mobile applications and its support for running automated suites across local machines, remote grids, and cloud device providers. It includes specialized capabilities for coordinating multi-browser interactions and estab
Helium is a Python library and high-level wrapper for Selenium designed for browser automation, functional UI testing, and web scraping. It provides a simplified interface for interacting with web applications across different browser engines. The library distinguishes itself by allowing users to identify and interact with web elements using visible text labels rather than relying exclusively on technical identifiers like XPaths or CSS selectors. This approach enables the creation of automation scripts based on human-readable labels. The toolkit covers a broad range of browser automation cap
March 7th Assistant (三月七小助手): full automation for Honkai: Star Rail.
Resemble.js is an image comparison framework and visual difference engine designed for automated regression testing. It functions as a library to normalize image dimensions and analyze visual discrepancies to determine if two images are identical. The system identifies pixel-level changes between images while providing capabilities for bounding-box isolation and the exclusion of specific regions. It calculates a percentage of difference by measuring the numerical distance between RGBA color channel values. The library covers visual regression testing and frontend quality assurance by compari
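The percentage-of-difference idea in the Resemble.js entry can be illustrated with a minimal Python sketch. This is not Resemble.js code or its API: the function name `mismatch_percentage`, the `tolerance` parameter, and the `ignore_boxes` parameter are hypothetical, and images are assumed to be plain nested lists of RGBA tuples with identical dimensions.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int, int]   # (R, G, B, A), each channel 0-255
Image = List[List[Pixel]]           # rows of pixels
Box = Tuple[int, int, int, int]     # (left, top, right, bottom), right/bottom exclusive


def mismatch_percentage(
    a: Image,
    b: Image,
    tolerance: int = 0,
    ignore_boxes: Sequence[Box] = (),
) -> float:
    """Return the percentage of compared pixels whose RGBA values differ.

    A pixel counts as different when any channel is further apart than
    `tolerance`. Pixels inside any box in `ignore_boxes` are skipped.
    Hypothetical helper for illustration only.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have identical dimensions")

    compared = 0
    mismatched = 0
    for y, (row_a, row_b) in enumerate(zip(a, b)):
        for x, (pa, pb) in enumerate(zip(row_a, row_b)):
            # Skip pixels that fall inside an excluded region.
            if any(l <= x < r and t <= y < bt for l, t, r, bt in ignore_boxes):
                continue
            compared += 1
            # Per-channel numerical distance between the two pixels.
            if any(abs(ca - cb) > tolerance for ca, cb in zip(pa, pb)):
                mismatched += 1

    return 100.0 * mismatched / compared if compared else 0.0


if __name__ == "__main__":
    black = (0, 0, 0, 255)
    base = [[black, black], [black, black]]
    changed = [[black, black], [black, (10, 0, 0, 255)]]
    print(mismatch_percentage(base, changed))
    print(mismatch_percentage(base, changed, ignore_boxes=[(1, 1, 2, 2)]))
```

A real tool would first normalize the two images to the same dimensions, as the entry describes; the sketch simply rejects mismatched sizes.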
Huxley is a visual regression testing tool and browser automation framework designed to detect pixel-level interface changes. It functions as an automated browser screenshotter that records user interactions and replays them to verify that web interfaces remain visually consistent across updates. The system generates visual diffs by comparing current screenshots against stored baseline images to highlight specific pixels that have changed. It includes mechanisms to manage these baselines, allowing users to update reference screenshots when interface changes are intentional. The framework cov
This repository provides a collection of reference implementations and patterns for testing Android applications. It serves as a guide for developers to integrate standard testing libraries and frameworks into their projects, covering the full spectrum of verification from local business logic to complex interface interactions. The project distinguishes itself by demonstrating how to configure and execute tests across diverse environments, including local virtual machines and physical devices or emulators. It provides specific patterns for validating inter-application communication, automatin
KIF is a functional testing framework and UI automation tool for iOS. It enables the simulation of user interactions and the verification of application states by driving interface components through their defined accessibility attributes. The framework utilizes an actor-pattern action wrapper to group reusable interaction sequences and allows the definition of custom, high-level test steps through a method-extension action library. It performs in-process interface driving, interacting with the application directly within the same process to execute actions and validate view states. The tool
AutoJs6 is an Android automation framework and JavaScript runtime designed to automate user interface interactions and system tasks on mobile devices. It functions as a UI automator that inspects screen hierarchies and manipulates on-screen controls via selectors to automate manual workflows. The project includes an Android script compiler that bundles automation scripts into standalone APK files for distribution. It also provides a remote debugging tool that creates a network-based bridge between a mobile device and a desktop IDE for writing and testing scripts. The framework covers a broad
Appium is a cross-platform automation server that enables user interface testing across mobile, desktop, and web environments. It functions as a unified server architecture that translates automation scripts into platform-specific actions using the W3C WebDriver protocol. The project distinguishes itself through a modular architecture that decouples core server logic from platform-specific implementations. This design allows for the integration of custom drivers and plugins, enabling support for specialized hardware, unique application environments, and non-standard interaction patterns that
Open Interpreter is a local language model agent framework that enables the deployment of autonomous agents capable of controlling a local operating system and its applications. It provides an execution environment where language models can run code and scripts directly on a computer to automate system tasks. The framework includes a computer control interface that allows language models to interact with web browsers and native user interfaces through programmatic commands. To ensure system stability, it utilizes a secure sandbox environment for the execution of model-generated code. The sys
Streamlit is a Python framework designed to transform data scripts into interactive web applications. It utilizes a reactive execution engine that automatically reruns scripts from top to bottom whenever a user interaction triggers a state change, ensuring the interface remains synchronized with the underlying data. By providing a declarative interface, it allows developers to build functional applications without requiring extensive knowledge of frontend web technologies. The framework distinguishes itself through an identity-based widget reconciliation system that persists user input across
AutoHotkey is a Windows automation scripting language and task automator. It serves as a keyboard macro engine and custom hotkey manager designed to map specific key combinations to scripted actions. The project provides a domain-specific language for automating repetitive tasks and operating system functions. It enables the creation of keyboard shortcuts and macros to replace manual input and streamline digital workflows on Windows. The system covers window and process management, virtual input simulation, and the interception of keyboard and mouse input via operating system hooks. It furth
TAICHI-flet is an AI-integrated resource browser and Windows desktop application built with Flet. It serves as a centralized multimedia hub and web content aggregator designed to combine artificial intelligence utilities with tools for searching and accessing movies, music, and software. The application enables the aggregation of resources from multiple sources, including cloud storage drives and external web addresses. It provides specialized tools for streaming and downloading anime and music, reading online novels with text-to-speech playback, and automating operations on the Windows opera
wpftoolkit is a UI component library and development toolset for building Windows desktop applications. It provides a collection of pre-built input controls, layout panels, and specialized components designed to extend standard interface capabilities. The project includes a data visualization toolkit for rendering complex datasets via charts, gauges, and three-dimensional views. It also features a window management framework for organizing application structures through docking layouts, wizard flows, and container controls, alongside a visual theme engine for managing the appearance of interf
WebDriverAgent is an iOS device automation driver and server that enables the programmatic control of applications on physical devices and simulators. It functions as a bridge that exposes Apple XCUITest capabilities via a network interface, translating WebDriver commands into native iOS actions for mobile UI testing. The system implements a WebDriver server that uses the JSON Wire Protocol to receive instructions and return results. It translates these network requests into local commands to manage application lifecycles, perform screen gestures, and verify the presence of specific user inte
Docker-Android runs a full Android emulator inside a Docker container, enabling mobile app testing and automation without requiring a physical device. The emulator uses QEMU-based virtualization with optional KVM acceleration for hardware-backed performance, and supports nested virtualization on cloud VMs from providers like AWS, GCP, and Azure for environments without direct hardware acceleration. The container exposes the Android Debug Bridge over TCP/IP, allowing host-side tools to connect to the emulator as if it were a local device. It provides browser-based interaction with the emulator
This project is a collection of toolsets for executing visual effects, simulating operating system-level input, and manipulating graphical user interface window states. It functions as a Windows API automation tool and GUI manipulator designed to programmatically alter the behavior and appearance of active application windows. The toolkit features a desktop visual effects engine capable of applying rotational animations, hue shifts, and view transformations to the display. It includes an input simulator that can spawn multiple independent mouse cursors and automate user interactions across th
XcodeBuildMCP is a Model Context Protocol server and development tool bridge that provides AI agents with the ability to control xcodebuild, manage simulators, and automate the compilation and execution of Apple platform applications. It functions as a persistent daemon that proxies native IDE build and debug capabilities to external clients and agents. The project distinguishes itself by using the Model Context Protocol to expose build and device management tools through a standardized interface. It implements specialized skill priming and instruction configuration to ensure AI agents can in
Qwen2-VL is a multimodal large language model and vision language model designed to process and reason across text, images, and video content. It functions as a visual reasoning engine and a visual agent framework, capable of interpreting visual data to perform object detection, document parsing, and spatial reasoning. The model is distinguished by its ability to act as a video understanding model, processing hour-long videos with second-level indexing and event recall. It further differentiates itself through a visual agent capability that interacts with software interfaces and robotic hardw
Protractor is a WebDriver-based end-to-end testing framework and browser automation tool. It serves as a frontend integration test suite used to verify web application flows by simulating user behavior and executing JavaScript within a browser. The framework is specifically designed for testing Angular applications, providing specialized locators and synchronization tools that align with the framework lifecycle. It distinguishes itself through automatic test step synchronization, which pauses execution until pending page tasks are completed to ensure stable browser execution. The tool covers
Open Interpreter is a coding agent that uses large language models to write and execute code directly on a local host machine. It functions as a system for performing operating system tasks and file manipulations through a natural language interface. The project features a model orchestrator that allows switching between different language model providers and emulation harnesses. It employs a loop-based reasoning process to iteratively generate code and process execution output until a goal is achieved. Its capabilities include cross-platform system automation, local model integration for da
This project is a Java GUI framework used to build cross-platform desktop, mobile, and embedded applications. It centers on a hardware accelerated graphics engine that provides 2D and 3D visualizations and visual effects, complemented by a reactive UI binding system for synchronizing data and interface updates. The framework distinguishes itself through the FXML markup language, which separates the visual structure of an interface from its procedural logic. It also includes a dedicated CSS styling engine that allows for the customization of component appearances using external stylesheets and
koa-router is a routing middleware for Koa applications that maps incoming HTTP requests to specific handler functions based on URL patterns and HTTP methods. It provides the foundation for organizing web endpoints and developing REST APIs by linking request paths to their corresponding controller actions. The project enables the organization of complex endpoints through recursive router nesting, allowing multiple router instances to be mounted as middleware to create logical route hierarchies. It supports dynamic URL generation via named route mapping, which allows the creation of URL string
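The routing concepts in the koa-router entry (method and path matching, nested routers, named routes) can be sketched in a few lines of Python. This is an illustration of the general technique, not koa-router's JavaScript API: the `Router` class and its `add`, `mount`, `dispatch`, and `url` methods are hypothetical names chosen for this sketch.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple


class Router:
    """Toy router: maps (HTTP method, URL pattern) to a handler function."""

    def __init__(self, prefix: str = "") -> None:
        self.prefix = prefix
        self.routes: List[Tuple[str, str, "re.Pattern[str]", Callable]] = []
        self.named: Dict[str, str] = {}

    def add(self, method: str, path: str, handler: Callable,
            name: Optional[str] = None) -> None:
        full = self.prefix + path
        # Turn ":param" segments into named regex groups.
        regex = re.compile("^" + re.sub(r":(\w+)", r"(?P<\1>[^/]+)", full) + "$")
        self.routes.append((method.upper(), full, regex, handler))
        if name:
            self.named[name] = full

    def mount(self, child: "Router") -> None:
        # Nesting: re-register the child's routes under this router's prefix.
        for method, path, _, handler in child.routes:
            self.add(method, path, handler)
        for name, path in child.named.items():
            self.named[name] = self.prefix + path

    def dispatch(self, method: str, url: str):
        for route_method, _, regex, handler in self.routes:
            match = regex.match(url)
            if route_method == method.upper() and match:
                return handler(**match.groupdict())
        return None  # no matching route

    def url(self, name: str, **params) -> str:
        # Named-route URL generation: substitute parameters into the pattern.
        return re.sub(r":(\w+)", lambda m: str(params[m.group(1)]), self.named[name])


if __name__ == "__main__":
    users = Router(prefix="/users")
    users.add("GET", "/:id", lambda id: f"user {id}", name="user")

    app = Router()
    app.mount(users)
    print(app.dispatch("GET", "/users/42"))
    print(app.url("user", id=7))
```

The sketch skips middleware chaining and regex escaping of literal path characters, both of which a production router would need.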