30 open-source projects similar to fake-useragent/fake-useragent, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best fake-useragent alternative.
Anti-Anti-Spider is an automated web scraping toolkit and CAPTCHA bypass framework. It uses convolutional neural networks to recognize characters and digits in image-based security challenges, enabling programmatic access to protected web content. The project functions as an image recognition model trainer, providing a workflow to preprocess labeled image datasets and train custom neural networks. Users can configure model architectures and hyperparameters to align the recognition system with the visual style of specific target websites. The toolkit covers capabilities for image data preproc
Autoscraper is an automatic web scraping library and pattern-based data extractor that learns extraction rules from sample data. It identifies and retrieves text, URLs, and HTML elements from web pages by analyzing sample values to replicate data patterns across different URLs. The system functions as a web scraping model manager, allowing users to save and reload learned rules to maintain consistent data extraction. It supports the export and import of scraping rules to a local file system to avoid repeating the training process for the same website. The library covers automated web data ex
Alova is a frontend data fetching library and HTTP request toolkit designed to manage remote data states and asynchronous communication in web applications. It functions as a request state management system that automates data retrieval, response caching, and synchronization between backend services and the user interface. The library distinguishes itself through a modular architecture that decouples network transport layers from request logic, allowing for consistent behavior across different environments. It utilizes a plugin-driven middleware system to extend core operations, enabling the
SpringSide 4 is an enterprise Java reference architecture and utility library built on the Spring Framework. It provides a pragmatic, best-practice application stack for building RESTful web services, web applications, and data access layers, along with a curated collection of high-performance utility classes for common operations like text, date, collection, reflection, concurrency, and I/O handling. The project distinguishes itself by combining a complete reference application scaffold with production-oriented infrastructure. It includes a JPA-based data access layer that automatically tran
MechanicalSoup is a Python web automation library and scraping framework designed to simulate browser sessions and navigate websites without requiring JavaScript execution. It functions as an HTML parsing tool and HTTP session manager, allowing for the programmatic retrieval of page content and the automation of web interactions. The library distinguishes itself by combining session persistence with automated form interaction. It maps user data to HTML input fields and selection boxes for programmatic submission and maintains authenticated states by managing cookies and user-agent headers acr
This project is a PHP user agent parser and mobile device detector. It analyzes HTTP user agent strings to identify the visitor's web browser, operating system, and device type. The library provides a dedicated integration for the Laravel web framework. It distinguishes between mobile phones, tablets, and desktop computers while identifying web robots and the specific names of search engine crawlers. The tool's capabilities include extracting preferred languages from request headers and verifying specific user agent properties. It uses regular expression pattern matching and static data mapp
Scraperr is a self-hosted web scraping and crawling platform designed for extracting structured data from websites using XPath selectors. It functions as a containerized system for managing scraping jobs through a queue and analyzing the resulting content using artificial intelligence. The project differentiates itself through its Kubernetes-native architecture, allowing for scalable deployment and management via package managers. It includes a crawling engine capable of domain-level spidering to discover linked pages and a data analyzer that uses artificial intelligence to query extracted we
curl_cffi is a Python HTTP client built on libcurl that focuses on browser fingerprint impersonation to evade anti-bot detection. By replacing default TLS handshake and HTTP/2 settings with those extracted from real browsers like Chrome and Firefox, it allows HTTP requests that closely mimic actual browser traffic, reducing the likelihood of being blocked by services that fingerprint automated clients. Beyond fingerprint impersonation, curl_cffi offers a dual API supporting both synchronous and asynchronous execution, with per-request proxy assignment, automatic retry with exponential backoff
Obscura is a web scraping infrastructure and headless browser server designed for AI agents. It provides a system for AI models to control browser sessions, interact with websites, and extract web data using a WebSocket implementation of the Chrome DevTools Protocol. The project focuses on bot detection evasion by randomizing browser fingerprints, masking native functions, and blocking tracking scripts to mimic human behavior. It further secures identities through a traffic layer that routes network requests via HTTP or SOCKS5 proxies. The system supports large-scale data extraction through
Bowser is a browser detection library that parses user-agent strings to identify a browser's name, version, rendering engine, and operating system. It functions as a user-agent parser and version constraint checker, extracting structured browser and platform details from raw user-agent strings without external dependencies. The library distinguishes itself by integrating User-Agent Client Hints alongside traditional user-agent data for more accurate browser identification in modern environments. It provides cross-platform browser detection that works consistently across desktop and mobile ope
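To make the idea of user-agent parsing and version constraint checking concrete, here is a minimal Python sketch. It is not Bowser's API or implementation: the rule table, `parse_browser`, and `satisfies_minimum` are hypothetical names, and the patterns cover only a few common browsers.

```python
import re

# Hypothetical, ordered rule table: more specific tokens come first, because
# Edge user agents also contain "Chrome/" and Chrome user agents contain "Safari/".
BROWSER_RULES = [
    ("Edge", re.compile(r"Edg/(\d+(?:\.\d+)*)")),
    ("Chrome", re.compile(r"Chrome/(\d+(?:\.\d+)*)")),
    ("Firefox", re.compile(r"Firefox/(\d+(?:\.\d+)*)")),
    ("Safari", re.compile(r"Version/(\d+(?:\.\d+)*).*Safari/")),
]


def parse_browser(user_agent):
    """Return the first matching browser name and version string."""
    for name, pattern in BROWSER_RULES:
        match = pattern.search(user_agent)
        if match:
            return {"name": name, "version": match.group(1)}
    return {"name": "unknown", "version": None}


def _version_tuple(version):
    """Turn '120.0.1' into (120, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(version, minimum):
    """Check a simple 'at least this version' constraint."""
    return _version_tuple(version) >= _version_tuple(minimum)


if __name__ == "__main__":
    ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
    browser = parse_browser(ua)
    print(browser, satisfies_minimum(browser["version"], "100"))
```

The ordering of the rules matters more than the patterns themselves, which is one reason production parsers rely on maintained rule sets rather than a handful of regular expressions.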
snscrape is a Python-based social media web scraper and crawler designed to extract public posts, profiles, and hashtags from social networks without the use of official APIs. It functions as an archival tool and a utility for open-source intelligence data collection, allowing for the gathering of publicly available information to investigate trends and people. The tool facilitates social media data extraction for research and archival purposes, enabling the creation of historical records of conversations and user activity. It supports workflows for academic social analysis and the export of
Camoufox is a Firefox-based stealth automation browser designed to evade detection during automated browsing. It combines a fingerprint randomization engine that generates thousands of unique device attributes per session, native-level API interception to spoof WebRTC, WebGL, media, and other fingerprintable properties, and human behavior simulation that moves the cursor along natural, distance-aware trajectories. The browser is compiled from source with build-time stealth patches and runs headlessly via a lightweight virtual display buffer, making it suitable for web scraping, automated testi
current-device is a JavaScript device detection library used to identify operating systems, device categories, and screen orientations within a web browser environment. It serves as a browser environment detector and client-side mapper that translates browser metadata and user-agent strings into predefined device labels and hardware categories. The library provides tools for executing specific JavaScript logic based on the detected mobile or desktop platform. It also functions as a conditional styling tool and document body class injector, applying descriptive HTML classes to enable device-sp
python-goose is a Python library for web scraping and content extraction. It functions as an HTML boilerplate remover and article parser designed to isolate primary text and metadata from web pages by stripping away navigation, layout noise, and non-essential elements. The tool features multilingual processing capabilities, utilizing language-specific stop-word analyzers to identify and extract primary content across different languages. It also identifies and collects embedded media, including source URLs and embed codes for lead images and videos associated with an article. The library cov
Colly is a web scraping framework and concurrent crawler written in Go. It provides a system for traversing web pages, following links, and extracting structured data from HTML and XML documents. The framework includes a distributed scraping engine designed to spread data collection tasks across multiple instances to increase throughput. It ensures compliance with website owner policies by automatically reading and respecting robots.txt files. The system manages request lifecycles through domain-based rate limiting, concurrency controls, and session management via a stateful cookie jar. It s
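The two politeness mechanisms named in the Colly entry, robots.txt checks and domain-based rate limiting, can be sketched in a few lines. Colly itself is a Go framework; the sketch below uses Python's standard `urllib.robotparser` purely for illustration, and `PoliteScheduler`, the `example-bot` agent string, and the sample robots.txt are all hypothetical.

```python
import urllib.robotparser
from urllib.parse import urlparse

# Sample robots.txt content, made up for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


class PoliteScheduler:
    """Hypothetical helper: robots.txt check plus a per-domain minimum delay."""

    def __init__(self, robots_lines, min_delay):
        self.robots = urllib.robotparser.RobotFileParser()
        self.robots.parse(robots_lines)  # parse rules without a network call
        self.min_delay = min_delay
        self.next_allowed = {}  # domain -> earliest time of the next request

    def allowed(self, url, agent="example-bot"):
        """True if the parsed robots.txt rules permit fetching this URL."""
        return self.robots.can_fetch(agent, url)

    def wait_time(self, url, now):
        """Seconds to wait before requesting url; reserves the next slot."""
        domain = urlparse(url).netloc
        ready_at = self.next_allowed.get(domain, now)
        wait = max(0.0, ready_at - now)
        self.next_allowed[domain] = max(ready_at, now) + self.min_delay
        return wait


if __name__ == "__main__":
    scheduler = PoliteScheduler(ROBOTS_TXT.splitlines(), min_delay=2.0)
    print(scheduler.allowed("https://example.com/private/report"))
    print(scheduler.wait_time("https://example.com/a", now=0.0))
    print(scheduler.wait_time("https://example.com/b", now=0.5))
```

Passing the current time in explicitly keeps the scheduler deterministic and easy to test; a real crawler would read a clock and sleep for the returned duration.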
CrawlerTutorial is a comprehensive Python web scraping tutorial and framework designed for extracting data from static and dynamic websites. It functions as a web data extraction pipeline and an HTTP request orchestrator, covering the full lifecycle of scraping applications from initial fetching to final data storage. The project provides specialized guidance on anti-bot bypass techniques and web API reverse engineering. It includes methods for evading browser detection through identity masking and proxy rotation, as well as techniques for identifying hidden API endpoints by analyzing network
X-Ray is a web scraping framework and asynchronous web crawler designed to extract structured data from websites. It functions as an HTML data extractor that transforms raw page content into a defined schema using CSS-style selectors. The project implements a headless browser crawler capable of executing JavaScript to render dynamic content. It handles website content discovery through a breadth-first crawling strategy and automatic pagination discovery to traverse multi-page result sets. The framework manages web data pipelines using a concurrency-limited request queue and request rate cont
bilibili-api is a Bilibili API wrapper and content scraper designed for programmatically accessing video metadata, user profiles, and content data. It functions as an anti-bot crawler framework and a WebSocket live chat client for retrieving platform information and real-time interaction data. The project incorporates tools to bypass anti-crawling measures and rate limits through the use of proxies and TLS fingerprint spoofing. It also includes logic for mapping and converting various video and content identifiers to ensure consistent data retrieval across different endpoints. Its capability
Haxl is a Haskell library and remote service request orchestrator designed for coordinating concurrent data fetching, request batching, and caching across multiple remote service providers. It functions as a framework for retrieving data from external databases and web services while minimizing network round trips. The project distinguishes itself through an applicative-based request batching system that groups multiple individual data requests into single calls to reduce network overhead. It employs an asynchronous parallel request scheduler to execute independent requests concurrently and u
This project is a containerized search infrastructure designed to deploy a privacy-focused metasearch engine. It acts as a self-hosted search proxy that aggregates results from multiple external web, image, and academic search providers while anonymizing requests and stripping trackers to protect user identity. The system utilizes Docker to orchestrate the search instance, integrating caching mechanisms and reverse proxy support to ensure a private and efficient search environment. It employs a modular adapter-based integration to standardize diverse external API responses and a processing pi
Cloud-mail is a cloud-based mail server and API platform providing a programmable interface for managing user accounts, sending bulk messages, and performing complex searches on email data. It serves as an automated email extraction tool and forwarding gateway, enabling the identification and capture of verification codes and the routing of incoming messages to external services. The infrastructure is hosted on serverless edge workers to remove the need for dedicated server hardware. It utilizes object storage for managing email attachments and employs a serverless message routing system to p
is.js is a JavaScript validation library and data type checker. It provides a suite of utilities to verify whether variables are primitives, arrays, functions, or specific object types. The project allows for the implementation of custom validation logic and the use of naming namespaces to override default rules and regular expressions. It covers a broad range of validation capabilities, including numeric arithmetic properties, date and time analysis for relative and absolute ranges, and string format verification for patterns such as emails, URLs, and IP addresses. The library also includes
HanekeSwift is a generic caching library for iOS and a specialized image caching framework. It provides a multi-level system that stores arbitrary data types in memory and on persistent disk storage to reduce network requests. The project features a specialized image handler that manages asynchronous loading, resizing, and disk storage for user interface components. It includes a background retrieval system that fetches remote content and automatically populates local caches. The library covers key-value data storage with sequential fallbacks, where it checks memory, then disk, and finally r
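The sequential fallback described in the HanekeSwift entry can be illustrated with a small sketch. HanekeSwift is a Swift library for iOS; this Python version is only an illustration of the lookup order, not its API. The class name `TwoLevelCache`, the JSON-on-disk format, and the final `fetch` callable (assumed here from the entry's mention of remote retrieval) are all hypothetical, and keys are assumed to be safe to use as file names.

```python
import json
from pathlib import Path


class TwoLevelCache:
    """Hypothetical cache: check memory, then disk, then call a fetch function."""

    def __init__(self, directory, fetch):
        self.memory = {}
        self.directory = Path(directory)
        self.fetch = fetch  # callable used only when both cache levels miss

    def _path(self, key):
        # Assumes key is a plain string that is valid as a file name.
        return self.directory / f"{key}.json"

    def get(self, key):
        """Return (value, source), where source names the level that answered."""
        if key in self.memory:
            return self.memory[key], "memory"
        path = self._path(key)
        if path.exists():
            value = json.loads(path.read_text())
            self.memory[key] = value  # promote to the faster level
            return value, "disk"
        value = self.fetch(key)
        path.write_text(json.dumps(value))  # populate both levels
        self.memory[key] = value
        return value, "fetch"
```

A first `get` for a key reports `"fetch"`, a repeat reports `"memory"`, and a fresh cache object pointed at the same directory reports `"disk"`, so the expensive fetch runs once per key.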
uncaptcha2 is an audio CAPTCHA bypass tool and reCAPTCHA solver designed to circumvent bot detection systems. It functions as a speech-to-text CAPTCHA solver that converts audio security challenges into text to automate the completion of digit-based verification steps. The tool enables automated web scraping and form submission by programmatically solving the reCAPTCHA verification steps required to access restricted websites.
This project is a comprehensive resource directory for web data extraction, providing a curated collection of tools and libraries for parsing data, automating browsers, and managing network operations. It serves as a guide for extracting structured information from HTML, XML, JSON, and PDF formats. The toolkit focuses on advanced data collection strategies, including headless browser automation to interact with JavaScript and a suite of network utilities for DNS resolution and WebSocket connections. It specifically covers methods for bypassing bot protections through proxy pool management, us
PathPicker is a command line file selector and interactive shell file picker that serves as a bridge between terminal output and external tools. It converts text output from shell commands into a visual list, allowing users to identify and isolate specific file paths for further action. The utility parses text streams from tools such as git or grep to identify file paths, which users can then filter and select via an interactive interface. These selected paths are injected into configurable command templates or passed as arguments to external processes, such as text editors or custom shell sc
This project is a collection of official plugin packages and a native integration library designed to provide a consistent interface for accessing hardware and software functionality across different mobile and desktop platforms. It serves as a native platform bridge, enabling cross-platform applications to invoke native code and manage operating system dependencies. The project utilizes a federated plugin architecture, splitting plugins into common interfaces and separate platform implementations to allow for independent development and extension. It further supports native integration throu
This is a collection of Python automation scripts and utility tools designed to handle repetitive technical tasks, system administration, and developer workflows. The project serves as a suite for task automation, data utility, and web automation. The collection includes specialized tools for multimedia processing, such as optical character recognition for extracting text from images, speech-to-text conversion, and real-time face and human body detection. It also features web scraping and monitoring capabilities to track product prices, fetch external API content, and automate interactions wi
so-novel is a web novel downloader and scraping engine designed to extract structured text from websites and convert it into electronic book formats. It functions as a multi-interface content extractor, providing a shared backend accessible via a web-based management dashboard, a terminal user interface, and a command line interface. The system utilizes a rule-driven approach for data extraction, using CSS selectors and XPath rules defined in external configuration files to map web elements to specific data fields. To maintain access to content, it includes a proxy-routed request pipeline to
MovieSwiftUI is a movie discovery application built with SwiftUI that integrates with the MovieDB API to retrieve and display movie information, ratings, and metadata. It functions as a cross-platform Apple application, providing a consistent user experience across iOS and macOS from a single codebase. The project implements a reactive data flow using Combine to synchronize global application state with the user interface. It employs a unidirectional data flow and a centralized store to maintain a single source of truth across different screens and components. The application utilizes declar