30 open-source projects similar to gxcuizy/python, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Python alternative.
CrawlerTutorial is a comprehensive Python web scraping tutorial and framework designed for extracting data from static and dynamic websites. It functions as a web data extraction pipeline and an HTTP request orchestrator, covering the full lifecycle of scraping applications from initial fetching to final data storage. The project provides specialized guidance on anti-bot bypass techniques and web API reverse engineering. It includes methods for evading browser detection through identity masking and proxy rotation, as well as techniques for identifying hidden API endpoints by analyzing network
php-webdriver is a WebDriver PHP client and browser automation framework that implements the W3C WebDriver standard. It serves as a programmatic interface for controlling web browsers, executing JavaScript, and managing browser sessions in both headed and headless environments. The library functions as a Selenium protocol implementation, allowing PHP applications to communicate with browser drivers such as ChromeDriver or GeckoDriver. It provides the ability to automate user actions, navigate pages, and validate DOM elements for web UI testing. Its capabilities cover broad areas of browser i
LaVague is an LLM web agent framework and large action model designed to translate natural language instructions into executable browser automation scripts. It functions as a multi-modal orchestrator that reasons over web page states and HTML content to automate multi-step tasks via a Selenium-based automation engine. The framework features a modular model provider layer, allowing users to swap between different language and vision models from providers such as Anthropic, Gemini, and Azure OpenAI. It employs a multi-modal world model to process screenshots and HTML structures, utilizing retri
UserScript is a collection of JavaScript userscripts designed to modify website behavior and appearance. It functions as a system for automating web tasks, modifying page structures, filtering content, and optimizing the retrieval of web assets. The project includes specialized utilities for accelerating file downloads from hosting platforms and enhancing cloud storage interfaces with advanced sorting and direct download options. It also provides tools for web content filtering to remove intrusive elements and keywords, alongside a web automator for handling repetitive site actions like login
This project is a high-level Python library and wrapper for Selenium designed for web browser automation and functional testing. It provides a simplified interface for controlling browsers to execute automated workflows and end-to-end tests across Chrome and Firefox. The library distinguishes itself by replacing technical CSS selectors and identifiers with label-based element discovery, allowing elements to be located via visible text. It further simplifies browser control by automating window management through page titles and handling nested frame interactions without requiring manual conte
This is a collection of Python automation scripts and utility tools designed to handle repetitive technical tasks, system administration, and developer workflows. The project serves as a suite for task automation, data utility, and web automation. The collection includes specialized tools for multimedia processing, such as optical character recognition for extracting text from images, speech-to-text conversion, and real-time face and human body detection. It also features web scraping and monitoring capabilities to track product prices, fetch external API content, and automate interactions wi
Mechanize is a Ruby library for web browser automation and headless browser emulation. It allows for programmatically navigating websites and simulating human behavior without a graphical user interface. The library provides an automated interface for populating and submitting web forms, including text fields, checkboxes, and file uploads. It manages stateful sessions by automatically storing and sending cookies across multiple requests to maintain user authentication and identity. Additional capabilities include web data scraping, the ability to download remote web content, and the maintena
Helium is a Python library and high-level wrapper for Selenium designed for browser automation, functional UI testing, and web scraping. It provides a simplified interface for interacting with web applications across different browser engines. The library distinguishes itself by allowing users to identify and interact with web elements using visible text labels rather than relying exclusively on technical identifiers like XPaths or CSS selectors. This approach enables the creation of automation scripts based on human-readable labels. The toolkit covers a broad range of browser automation cap
This project is a comprehensive geographic location dataset and reference library providing standardized data for countries, states, and cities. It serves as a source of truth for regional hierarchies, ISO codes, coordinates, and timezone information, available as both a relational SQL database and a document-based JSON library. The project includes a custom dataset export tool that functions as a filtering engine. This allows for the generation of tailored geographic files in JSON, CSV, and GeoJSON formats by selecting only the specific regions or fields required. The dataset covers global
This project is an open source discovery resource that provides curated lists of reusable code and libraries to help developers find technical solutions for specific tasks. It utilizes a category-based indexing system to organize diverse software tools by their functional capabilities. The repository is structured as a collection of markdown-based documentation and static content, serving as a directory for manual discovery and reference. The directory covers a wide range of capability areas, including cross-platform application development, cybersecurity tool creation, network protocol impl
This project is a public health dataset providing historical and real-time COVID-19 case and death counts across the United States. It consists of a collection of CSV files containing time-series pandemic data organized by date, state, and county. The dataset includes specialized records for institutional outbreaks, tracking infection and death rates within correctional facilities, colleges, and universities. It also provides statistics on excess mortality to estimate total pandemic impact and survey-based data on mask usage prevalence across different counties. To facilitate geographic anal
Countries is a static data repository that provides standardized country information based on the ISO 3166-1 schema. The dataset includes comprehensive attributes such as country names, codes, currencies, languages, borders, and area, stored as flat files in multiple formats including JSON, CSV, XML, and YAML without requiring a database or runtime server. The project includes a command-line tool that allows users to customize the dataset by including or excluding specific fields during export, enabling the creation of tailored country data outputs. Supplementary geographic assets such as Geo
NX_Firmware is a binary repository manager and firmware distribution server designed to store and serve versioned system update files to hardware devices. It functions as a stateless file host that organizes compiled firmware binaries in a structured directory for automated retrieval via a REST API. The system maps specific firmware builds to compatible hardware revisions using schema-based metadata. It utilizes a flat-file asset hosting model, serving binary blobs directly from the filesystem to avoid the overhead of a database management system. The server handles hardware version control
This project is a community-maintained directory that serves as a comprehensive index of software tools, frameworks, and educational materials. It functions as an open-source knowledge base, organizing diverse engineering domains and technical resources into a structured taxonomy to assist developers in discovering high-quality content. The directory distinguishes itself through a decentralized peer-review model, where independent contributors curate, verify, and update entries to ensure accuracy and relevance. All information is stored in a version-controlled, flat-file markdown format, whic
iScript is a collection of Python automation scripts designed for file downloads and data extraction from various web services and cloud platforms. The project provides specialized tools for managing cloud storage, converting torrent links, retrieving music, and fixing archive encoding errors. The toolkit includes a music downloader that fetches high-quality audio tracks and applies ID3 metadata tags, as well as a magnet link converter that transforms torrent files and filters results by keyword. It also features a utility to correct character encoding discrepancies in zip archives created on
This project is a collection of educational notes and tutorials focused on Python programming, scientific computing, and data analysis. It serves as a reference for learning language basics, advanced techniques, and object-oriented design. The materials include implementation guides for building linear, logistic, and convolutional neural networks using symbolic graph frameworks. It also provides instruction on manipulating and visualizing structured data frames and performing complex mathematical operations through numerical libraries. The repository includes a system for converting interact
This project is a collection of Python scripts and educational mini-projects kept in a shared open source repository, where developers can practice coding and explore data science concepts through hands-on implementation. It also works as a collaborative learning resource built around the fork and pull request workflow, using a distributed version control system to coordinate community contributions and peer review of the scripts.
This repository is a comprehensive collection of instructional guides and practical examples for Python development, focusing on machine learning, data science, and web scraping. It provides implementations for neural networks, reinforcement learning algorithms, and deep learning architectures using PyTorch, alongside detailed manuals for scientific computing and data visualization. The project distinguishes itself by offering specialized tutorials on concurrent programming to optimize CPU performance and guides for setting up Linux development environments. It covers the implementation of ad
Playwright-cli is a command line interface for executing web tasks and automating browser interactions using the Playwright framework. It serves as a browser binary manager for downloading and installing specific browser engines and their required system dependencies, as well as a tool for running automated test suites across multiple engines to verify application behavior. The utility functions as a browser session controller, managing browser profiles and persistent storage states via the command line. It enables the execution of automation suites across different browser engines and config
ECommerceCrawlers is an educational collection of Python-based crawler scripts designed to extract data from a variety of public websites, including e-commerce platforms, social media sites, news outlets, and multimedia sources. The project serves as a learning resource for web scraping techniques, offering ready-to-run examples that demonstrate practical data extraction methods. The toolkit covers a broad range of data types, including product listings and prices from online retail platforms, public posts and profiles from social networking sites, articles from news and blogging platforms, p
This project is a Node.js web scraping framework designed to automate data extraction through a programmatic workflow of requests, parsing, and document interaction. It functions as a headless web crawler, an HTTP request manager, and a DOM parser and extractor. The framework distinguishes itself by pairing a JavaScript execution engine, used to interact with dynamic content, with a hybrid selection system that accepts both CSS and XPath selectors. It includes specialized middleware for proxy rotation and cookie-jar session management to maintain authenticated states and manage automated traffic.
Taiko is a browser automation framework and web end-to-end testing library used to perform programmatic user actions and verify application behavior. It functions as a headless browser testing tool capable of simulating real interactions and asserting page states in Chromium and Firefox. The project includes a browser interaction recorder that captures live actions and exports them as executable JavaScript automation scripts. It also serves as a web accessibility auditor, analyzing pages to detect accessibility violations and ensure compliance with inclusive design standards. The framework c
This project is a collection of Python implementations for web scraping, network traffic interception, data analysis, and sentiment analysis. It provides methods for extracting structured data from websites and mobile application interfaces. The collection includes tools for capturing and analyzing network packets from mobile applications to identify hidden internal API endpoints. It also features scripts for evaluating the emotional tone and public perception of text data. The project covers data manipulation and transformation of large datasets, as well as the generation of charts and grap
This project is a collection of practical and idiomatic Python code recipes, technical tutorials, and programming references. It serves as an example-driven resource that translates theoretical programming concepts into executable Python source code. The repository is organized as a series of standalone scripts and modular recipes. Each sample is designed for stateless execution, allowing individual problem-solving patterns to be run independently without shared global state or complex setup. The content focuses on Python language mastery and software development. It covers the implementatio
This project is a Python education repository and programming tutorial designed to teach language fundamentals, from basic syntax and variables to advanced concepts. It serves as a data science starter kit and a guide for REST API integration. The repository provides instructional scripts and sample code covering object-oriented programming patterns and asynchronous programming. It includes practical demonstrations for fetching and processing JSON data from external web services using HTTP requests. The materials cover a broad capability surface including data analysis workflows with interac
Python-Guide-CN is a Chinese translation of a comprehensive guide to idiomatic Python programming and software development. It serves as a curated programming tutorial and ecosystem reference, providing a structured path for learning Python syntax, standard libraries, and professional coding patterns. The project distinguishes itself by offering detailed instructions for setting up development environments across Windows, macOS, and Linux. It specifically focuses on the selection of interpreters and the management of virtual environments to ensure a consistent workspace. The guide covers a b
Scraperr is a self-hosted web scraping and crawling platform designed for extracting structured data from websites using XPath selectors. It functions as a containerized system for managing scraping jobs through a queue and analyzing the resulting content using artificial intelligence. The project differentiates itself through its Kubernetes-native architecture, allowing for scalable deployment and management via package managers. It includes a crawling engine capable of domain-level spidering to discover linked pages and a data analyzer that uses artificial intelligence to query extracted we
TestCafe is a Node.js end-to-end web testing framework used to automate browser tests with JavaScript or TypeScript. It serves as a cross-browser testing tool and a command-line execution engine designed for integration into continuous integration pipelines. The framework supports behavior-driven development by mapping human-readable Gherkin syntax to automation logic. It also includes an integrated web accessibility auditor to identify violations within web applications. The toolset covers a broad range of automation capabilities, including parallel test execution across multiple browser in
Puppeteer Sharp is a web browser automation library and a headless Chrome .NET API. It provides a type-safe C# interface for controlling headless browsers, functioning as a Chrome DevTools Protocol wrapper that translates .NET method calls into JSON-RPC messages. The project enables programmatic navigation of pages, interaction with elements, and the execution of JavaScript within a .NET environment. It serves as an end-to-end testing framework for simulating user workflows and verifying web application behavior. Additional capabilities include automated screenshot generation for visual regr