This project is a Python-based proxy pool manager that collects, validates, and serves free proxy IP addresses through an HTTP API. It consists of an automated scraper to gather addresses from multiple online sources, a persistent database-backed store for organization, and a delivery interface for retrieving validated proxies. The system features a pluggable scraper architecture that allows for the integration of custom discovery methods and source expansion via generator functions. It employs decorator-based validation logic, enabling the definition of custom connectivity and HTTPS criteria
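A minimal sketch of how generator-based sources and decorator-registered validators could fit together. Every name here (`fetcher`, `validator`, `FETCHERS`, `VALIDATORS`, `collect`) is hypothetical and not taken from the project, and the format check stands in for a real connectivity test so the example runs offline.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical registries; these names are illustrative, not the project's API.
FETCHERS: List[Callable[[], Iterator[str]]] = []
VALIDATORS: Dict[str, List[Callable[[str], bool]]] = {"http": [], "https": []}


def fetcher(func):
    """Register a generator function that yields 'host:port' strings."""
    FETCHERS.append(func)
    return func


def validator(kind):
    """Register a check under the given kind ('http' or 'https')."""
    def register(func):
        VALIDATORS[kind].append(func)
        return func
    return register


@fetcher
def static_source():
    # A real source would scrape a web page; fixed documentation-range
    # addresses keep the sketch self-contained.
    yield "192.0.2.1:8080"
    yield "192.0.2.2:notaport"
    yield "192.0.2.3:3128"


@validator("http")
def has_numeric_port(proxy):
    # Stand-in for a connectivity test: only checks the address format.
    host, _, port = proxy.rpartition(":")
    return bool(host) and port.isdigit() and 0 < int(port) < 65536


def collect(kind="http"):
    """Run every registered source once and keep proxies passing all checks."""
    seen = set()
    accepted = []
    for fetch in FETCHERS:
        for proxy in fetch():
            if proxy in seen:
                continue
            seen.add(proxy)
            if all(check(proxy) for check in VALIDATORS[kind]):
                accepted.append(proxy)
    return accepted
```

Adding a source under this scheme means writing one more generator and decorating it; the collection loop does not change.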
ProxyPool is a proxy pool manager that automatically collects, validates, and serves HTTP proxies from multiple sources through a web API. At its core, it runs scheduled background processes that scrape free and paid proxy websites, test each proxy's availability against configurable target URLs using asynchronous HTTP clients, and store the results in a Redis-backed sorted set where proxies are scored and ranked by reliability. The system distinguishes itself through a pluggable crawler architecture that allows users to add new proxy sources by writing a simple class with target URLs and a p
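The score-and-rank idea can be illustrated with an in-memory stand-in for a sorted set. The score constants and the rule "promote to the maximum on success, decrement on failure, drop at the minimum" are assumptions made for this sketch, not confirmed project behavior. Against a real Redis instance, the same operations would roughly correspond to redis-py's `zadd`, `zincrby`, and `zrem`.

```python
# Assumed scoring constants; the project's actual values may differ.
INITIAL_SCORE = 10
MAX_SCORE = 100
MIN_SCORE = 0


class ScoredPool:
    """In-memory stand-in for a sorted set of proxies keyed by score."""

    def __init__(self):
        self._scores = {}

    def add(self, proxy):
        # New proxies start at the initial score; existing ones are untouched.
        self._scores.setdefault(proxy, INITIAL_SCORE)

    def mark_success(self, proxy):
        # A proxy that passes a test is promoted to the maximum score.
        self._scores[proxy] = MAX_SCORE

    def mark_failure(self, proxy):
        # Each failed test costs one point; at the minimum the proxy is dropped.
        score = self._scores.get(proxy)
        if score is None:
            return
        score -= 1
        if score <= MIN_SCORE:
            del self._scores[proxy]
        else:
            self._scores[proxy] = score

    def ranked(self):
        # Highest score first; ties are broken alphabetically for determinism.
        return sorted(self._scores, key=lambda p: (-self._scores[p], p))
```

Ranking by score means a caller can always ask for the most reliable proxy first, while repeatedly failing proxies age out without a separate cleanup job.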
ProxyBroker is a tool for scraping public HTTP and SOCKS proxy addresses, validating their connectivity, and managing a curated pool of functional proxies. It consists of a proxy scraper for discovery, a validation engine to check anonymity and response times, and a pool manager to maintain a filtered queue of servers. The project includes a local rotating proxy server that acts as a single entry point, automatically distributing incoming network traffic across a pool of validated external proxies. This infrastructure allows for the rotation of IP addresses to maintain resilience during web d
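The rotation step of such a local server can be sketched as a round-robin queue over the validated pool. This is only an illustration of the idea, not ProxyBroker's implementation; `RotatingPool`, `next_proxy`, and `discard` are hypothetical names, and a real server would also forward the request through the selected proxy.

```python
from collections import deque


class RotatingPool:
    """Hands out validated proxies in round-robin order (hypothetical helper)."""

    def __init__(self, proxies):
        self._queue = deque(proxies)

    def next_proxy(self):
        # Take the proxy at the front, then rotate it to the back so the
        # following request is routed through a different address.
        if not self._queue:
            raise LookupError("no proxies available")
        proxy = self._queue[0]
        self._queue.rotate(-1)
        return proxy

    def discard(self, proxy):
        # Remove a proxy that stopped working; unknown proxies are ignored.
        try:
            self._queue.remove(proxy)
        except ValueError:
            pass
```

Each incoming request would call `next_proxy()` once, and a failed upstream connection would trigger `discard()` so the dead proxy is not selected again.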
Scylla is a system for managing HTTP proxy pools and automating web extraction. It provides a specialized data acquisition pipeline designed for gathering large-scale internet datasets for training and fine-tuning large language models. The project features a proxy rotation gateway that assigns fresh proxy addresses to incoming requests to mask origin traffic and avoid IP blocking. It includes a proxy pool manager that handles the collection, functional validation, and orchestration of proxy servers, complemented by a web dashboard for monitoring the health and geographic distribution of the