30 open-source projects similar to ecthros/uncaptcha2, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best Uncaptcha2 alternative.
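As a rough illustration of what ranking by shared features could look like, the sketch below counts the feature tags each candidate has in common with a reference project and sorts by that count. The tag sets, the placeholder project names, and the `rank_by_shared_features` helper are all hypothetical; the listing does not publish its actual feature data or scoring method.

```python
def rank_by_shared_features(reference, candidates):
    """Return (name, shared_count) pairs, most shared features first."""
    scored = [(name, len(reference & feats)) for name, feats in candidates.items()]
    # Sort by shared count descending, then by name as a deterministic tie-break.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical feature tags, invented for illustration only.
reference_features = {"captcha-solving", "speech-to-text", "browser-automation"}
candidate_features = {
    "project-a": {"captcha-solving", "speech-to-text", "mouse-simulation"},
    "project-b": {"speech-to-text", "offline"},
    "project-c": {"web-scraping", "proxy-rotation"},
}

ranking = rank_by_shared_features(reference_features, candidate_features)
for name, shared in ranking:
    print(f"{name}: {shared} shared feature(s)")
```

A plain overlap count favors projects with many tags; a real ranking might normalize by the size of each tag set, but that is a design choice the listing does not describe.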
Buster is a browser extension that solves reCAPTCHA audio challenges by transcribing them to text with speech recognition, and it simulates human-like mouse interactions to get past visual verification prompts. The extension talks to a companion desktop application over local inter-process communication; the desktop app simulates natural mouse movements and clicks to improve the success rate of automated solving. The project distinguishes itself by combining audio transcription with human behavior simulation, using randomized mouse trajectories and timing to mimic hu
PocketSphinx is an offline speech recognition engine that converts raw audio from files or live microphone streams into written text without requiring a network connection. It functions as a speech-to-text library, a real-time transcription engine, and a voice command processor, capable of detecting and transcribing spoken commands from continuous audio streams with configurable acoustic and language models. The engine uses weighted finite-state transducers to represent acoustic, phonetic, and language models as a single search graph for efficient decoding. It employs fixed-point acoustic mod
This project is a hardware-accelerated transcription server and offline subtitle generator. It functions as a speech-to-text tool that converts audio and video files into plain text, JSON, and SRT subtitle formats using the Whisper model. The system operates as an OpenAI Audio API emulator, providing a local server that mimics a specific audio interface. This allows it to serve transcriptions to existing client configurations without requiring changes to the client software. The service utilizes GPU acceleration to increase voice recognition speed and includes utilities for hardware detectio
ESPnet is a comprehensive speech processing toolkit and PyTorch-based trainer designed for building end-to-end speech recognition, synthesis, and translation models. It provides a structured framework for developing automatic speech recognition systems using transducer and encoder-decoder architectures, alongside engines for text-to-speech synthesis and speech translation pipelines. The project distinguishes itself through a recipe-based workflow execution system that ensures experimental reproducibility by running standardized sequences of scripts for data preparation and model training. It
Whishper is a graphical user interface for transcribing audio and video files into text using the Whisper model. It serves as a speech-to-text tool and subtitle file generator that converts spoken content into editable text and timed subtitle formats. The project features an integrated transcription and translation interface, allowing users to refine automated results and convert transcribed text into different languages. It includes a visual editor for correcting speech recognition errors, adjusting segment timecodes, and performing bilingual translation reviews. The system handles the full
CasperJS is a scripting utility and testing framework for automating web scenarios via headless browsers. It enables the execution of navigation steps and form inputs to automate complex user scenarios, extract web data, and validate the state of remote pages. The project provides specific tooling for PhantomJS and SlimerJS, allowing users to write programmable sequences for web navigation and data extraction. It includes capabilities for capturing visual snapshots of full pages or specific elements to perform user interface regression testing. The framework covers broad automation areas inc
This project is a social media automation tool designed to publish videos and images across multiple social networks programmatically. It functions as a headless browser content publisher and a multi-platform posting API, allowing for automated social media posting and content distribution. The system utilizes browser automation to execute uploads and logins on platforms without official public APIs. It features a social media command-line manager for executing batch media uploads and managing account sessions, as well as a programmatic interface for triggering uploads and scheduling content
This project is a LinkedIn data scraper and professional profile extractor designed to collect information from professional networking sites. It functions as a headless browser scraper that extracts professional profiles, company details, and job listings using automated browser sessions. The tool includes a session manager that saves and loads authentication cookies to maintain persistent access to protected profiles. It employs configurable browser settings and user-agent mimicry to simulate human activity and bypass bot detection. Data extraction capabilities cover person profiles, compa
Vibe is a cross-platform transcription tool that converts spoken audio into text by running Whisper neural models directly on your device, with no cloud dependency. It can transcribe audio from files, microphones, system output, and network streams, and supports both batch processing of multiple files and real-time captioning from continuous input. Beyond basic transcription, Vibe identifies and labels different speakers through speaker diarization, and offers a choice of command-line interface or HTTP API for automated and remote workflows. It also includes plugins to export transcripts to c
Splash is a headless browser HTTP API and JavaScript rendering engine designed to convert dynamic web content into static HTML or images. It functions as a Lua-scriptable browser service that exposes browser automation and rendering capabilities through a RESTful interface for programmatic data extraction. The service distinguishes itself by allowing the execution of custom Lua scripts to automate complex user interaction sequences and page navigation. It provides the ability to switch rendering engines on a per-request basis to verify cross-browser compatibility and visual consistency. The
Social-analyzer is an open-source intelligence framework designed for the automated discovery, correlation, and verification of digital identities across online platforms. It functions as a comprehensive engine for gathering social media intelligence, utilizing distributed browser automation to extract metadata and profile information from hundreds of websites simultaneously. The platform distinguishes itself through its ability to perform cross-platform identity correlation using heuristic-based pattern matching and name permutation generation. It processes these findings through a confidenc
PhantomJS is a scriptable, headless browser engine based on WebKit that provides a programmatic interface for automating web page interactions. It operates without a graphical user interface, allowing for the execution of JavaScript to navigate pages, manipulate the document object model, and perform functional testing of web applications. The tool distinguishes itself by providing low-level control over the browser rendering lifecycle and network stack. It enables real-time interception and modification of network traffic, alongside the ability to generate visual snapshots and document expor
Crawlee is a web scraping framework designed for building scalable, reliable, and distributed data extraction pipelines. It provides a unified interface for managing headless browser automation and lightweight HTTP requests, allowing developers to handle complex web navigation, dynamic content rendering, and large-scale data collection within a single, modular architecture. The project distinguishes itself through its resource-aware concurrency controller, which dynamically scales task execution based on real-time CPU and memory usage to prevent host machine exhaustion. It also features a rob
This project is a Node.js framework designed for headless browser automation, enabling the creation of automated messaging clients. It functions by controlling a headless browser instance to programmatically interact with the messaging interface, allowing developers to simulate user sessions and manage complex chat workflows. The library distinguishes itself through its comprehensive session management and event-driven architecture. It supports persistent authentication by serializing session data to local or remote storage, ensuring that automated clients can maintain continuous connectivity
Browserless is a service-oriented platform designed for remote browser automation and headless execution. It provides a distributed infrastructure that manages browser sessions through containerized isolation, allowing users to execute scripts and interact with web content without maintaining local browser state or infrastructure. The platform functions as a remote API and WebSocket-based control layer, enabling stateless HTTP requests for tasks like document generation and real-time browser interaction. It incorporates proxy-based routing to manage traffic signatures and supports the integra
This project is a high-performance headless browser engine designed for scalable web automation, data extraction, and AI agent integration. It provides a specialized environment that allows autonomous agents and testing frameworks to interact with web content through standardized remote control protocols. By executing pages in a lightweight, headless state, the engine minimizes resource consumption while maintaining the ability to perform complex navigation and dynamic content rendering. The platform distinguishes itself through deep integration with AI-centric communication layers and advanc
Undetected-chromedriver is a framework for automated browser navigation designed to bypass anti-bot security measures. It functions by patching browser drivers at the binary level to obscure automation signals, allowing scripts to interact with protected websites without being flagged or blocked by security services. The project distinguishes itself through its ability to maintain stealth during automated sessions, including those executed in headless mode. It achieves this by injecting custom configurations to mimic human user behavior and by hooking into low-level browser debugging protocol
Omniparse is a multimodal content parser and generative AI ingestion engine designed to convert documents, images, and multimedia into a uniform format. It functions as a data preprocessing pipeline that transforms diverse raw data sources into structured markdown to improve the performance of large language model workflows. The system extracts text and structural data from PDFs, images, audio, and video files. It includes a web crawler that converts dynamic website content into clean markdown and a multimodal transformation process that maps disparate input formats into a unified data schema
X-crawl is a Node.js-based web scraping framework designed to automate data collection from both static and dynamic websites. It integrates artificial intelligence to perform semantic parsing, allowing it to transform unstructured HTML into structured data formats that remain accurate even when website layouts or class names change. The project distinguishes itself through a comprehensive suite of stealth and reliability features. It manages crawler identity by randomizing device fingerprints and rotating proxy servers to bypass access restrictions. To handle complex, JavaScript-heavy interfa
Fonoster is a conversational AI framework and multi-tenant communications platform as a service. It serves as a programmable voice gateway and SIP telephony platform, enabling the creation of voice-based assistants and automated communication workflows using large language models. The project distinguishes itself through a vendor-agnostic speech integration engine that abstracts speech-to-text and text-to-speech providers. It features a multi-tenant architecture that isolates telephony resources and user identities into distinct organizational workspaces. The system covers a broad range of t
This project is a multimodal translation framework and large language model capable of speech-to-speech, speech-to-text, and text-to-text translation across nearly 100 languages. It provides a real-time speech translation engine and a comprehensive toolkit for converting spoken audio between languages. The system is distinguished by its ability to preserve the original speaker's tone, pace, and prosody during translation. It utilizes a specialized on-device inference toolkit that converts model checkpoints into C-based libraries, enabling low-latency execution on mobile and edge hardware with
Crawlee-python is a web crawling framework for building scalable scrapers using Python. It serves as a comprehensive tool for web scraping automation, providing a system to extract structured data from websites using both lightweight HTTP requests and headless browser automation. The framework is distinguished by its anti-bot evasion capabilities, which include browser fingerprint impersonation and tiered proxy rotation to bypass detection systems and get past challenge pages such as Cloudflare's. It also incorporates artificial intelligence for autonomous website navigation and schema-based data extra
chromedp is a browser automation framework and driver that controls web browsers via the Chrome DevTools Protocol. It functions as a headless browser automation tool and web browser controller, enabling the programmatic management of browser sessions, targets, and network responses through a remote debugging interface. The project provides specialized capabilities for Chrome DevTools Protocol automation, including headless browser testing, web scraping and data extraction, and mobile device emulation. It also supports browser-based visual regression by capturing precise screenshots of web pag
Autosub is a command-line media processor and automatic subtitle generator that converts audio streams from video and audio files into timed text overlays. It functions as an AI speech-to-text converter that uses OpenAI Whisper to generate synchronized subtitles. The tool includes a language translation pipeline to convert transcribed speech into target languages, enabling multilingual video captioning. It manages the process from audio-stream extraction to the serialization of final subtitle files for local storage. The system covers audio-to-text transcription, time-stamped text mapping, a
Eko is a framework for designing and deploying agentic workflows, featuring an LLM agent workflow orchestrator and a browser automation engine. It provides a server-side process manager for executing system-level operations and managing local files, alongside a human-in-the-loop agent controller for manual oversight and direction during automated decision processes. The system coordinates multi-agent collaboration through role-based partitioning and workflow orchestration, dividing complex tasks into distinct roles and managing execution handoffs. It integrates the Model Context Protocol to s
gstack is an AI agent framework and development workflow system designed to automate the software development lifecycle. It coordinates specialized AI personas to manage tasks across product design, engineering management, and quality assurance, transforming product intent into technical specifications and final releases. The project is distinguished by its deep integration of headless browser automation and semantic code memory. It utilizes a persistent Chromium daemon for web scraping and visual auditing, and implements a searchable knowledge base that logs architectural decisions and repos
Open-claude-cowork is an LLM agent workflow orchestrator and multi-agent collaborative workspace. It serves as a SaaS tool integration framework and a real-time AI chat interface designed to connect large language models with external software applications and browser tools to automate complex business processes. The platform functions as a headless browser automation tool, enabling AI agents to navigate websites and interact with web-based interfaces automatically. It allows for the creation of shared environments where multiple agents coordinate using external tools and shared memory to com
Super Video Downloader is an integrated application designed for capturing, managing, and playing streaming media from web sources. It functions as a comprehensive utility that combines a web browser with media extraction tools, allowing users to save video and audio content directly to local storage for offline access. The application distinguishes itself by incorporating a headless browser engine that automates navigation and interacts with dynamic web content. It includes built-in privacy and security features, such as proxy-based traffic routing and encrypted domain name queries, to prote
musicdl is a command line music downloader and library manager designed for searching and downloading audio tracks and playlists from streaming platforms to local storage. It functions as a tool for music library archiving, allowing for the bulk acquisition of media and the organization of local audio collections. The project includes an AI lyric transcriber that uses machine learning models to generate text lyrics from audio files, supporting synchronized playback where lyrics are highlighted based on playback timestamps. To maintain access to streaming platforms, it employs a network proxy