# jhao104/proxy_pool

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/jhao104-proxy-pool).**

23,426 stars · 5,389 forks · Python · MIT

## Links

- GitHub: https://github.com/jhao104/proxy_pool
- Homepage: https://jhao104.github.io/proxy_pool/
- awesome-repositories: https://awesome-repositories.com/repository/jhao104-proxy-pool.md

## Topics

`crawler` `http` `proxy` `redis` `spider`

## Description

This project is a Python-based proxy pool manager that collects, validates, and serves free proxy IP addresses through an HTTP API. It has three parts: an automated scraper that gathers addresses from multiple online sources, a persistent database-backed store that holds them, and a delivery interface from which clients retrieve validated proxies.
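
Before any network check, a scraped string first has to look like a proxy address. The sketch below shows one such format pre-check. It assumes addresses arrive as `host:port` strings with an IPv4 host, and `is_well_formed_proxy` is a hypothetical helper, not part of proxy_pool's API.

```python
import ipaddress


def is_well_formed_proxy(candidate: str) -> bool:
    """Return True if `candidate` looks like 'IPv4:port' (hypothetical helper)."""
    host, sep, port = candidate.strip().rpartition(":")
    if not sep or not port.isdecimal():
        return False
    try:
        ipaddress.IPv4Address(host)  # rejects malformed hosts and octets such as 300
    except ValueError:
        return False
    return 0 < int(port) <= 65535  # valid TCP port range


scraped = ["1.2.3.4:8080", "300.1.1.1:80", "example", "10.0.0.1:70000"]
print([p for p in scraped if is_well_formed_proxy(p)])  # ['1.2.3.4:8080']
```

A cheap syntactic filter like this discards obvious garbage before any slower connectivity test runs.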

The scraper architecture is pluggable: new discovery methods and sources are added as generator functions that yield proxy addresses. Validation is decorator-based, so custom connectivity and HTTPS checks (for example, on response status code or content length) can be registered as additional validators.
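
The plugin idea described above can be illustrated with two small registries. Source functions are generators that `yield` `host:port` strings. Validators are registered with a decorator, and a proxy is kept only if every validator returns `True`. This is a self-contained sketch. `register_fetcher`, `http_validator`, `FAKE_RESPONSES`, and the sample functions are hypothetical names, not proxy_pool's actual classes. The status-code and content-length rule is only an example criterion, and it is checked against canned data instead of a real request.

```python
from typing import Callable, Dict, Iterator, List, Tuple

FETCHERS: List[Callable[[], Iterator[str]]] = []
HTTP_VALIDATORS: List[Callable[[str], bool]] = []

# Hypothetical canned results standing in for real requests: (status_code, content_length).
FAKE_RESPONSES: Dict[str, Tuple[int, int]] = {
    "1.2.3.4:8080": (200, 512),
    "5.6.7.8:3128": (503, 0),
}


def register_fetcher(func):
    """Decorator: add a generator function as a proxy source."""
    FETCHERS.append(func)
    return func


def http_validator(func):
    """Decorator: add a check that every proxy must pass."""
    HTTP_VALIDATORS.append(func)
    return func


@register_fetcher
def static_source() -> Iterator[str]:
    # A real source would scrape a web page; this one yields fixed sample strings.
    yield "1.2.3.4:8080"
    yield "5.6.7.8:3128"


@http_validator
def looks_healthy(proxy: str) -> bool:
    # Example criterion: status 200 and a non-empty body.
    status_code, content_length = FAKE_RESPONSES.get(proxy, (0, 0))
    return status_code == 200 and content_length > 0


def collect_valid() -> List[str]:
    """Run every registered source and keep proxies that pass all validators."""
    found = (proxy for fetch in FETCHERS for proxy in fetch())
    return [p for p in found if all(check(p) for check in HTTP_VALIDATORS)]


print(collect_valid())  # ['1.2.3.4:8080']
```

Because the decorators return the original function, registered sources and validators remain ordinary callables that can be tested on their own.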

The manager runs collection and maintenance on a background schedule to keep a steady supply of working addresses. It provides programmatic interfaces for retrieving, listing, and removing proxies, backed by a key-value store that offers fast retrieval and persistent state.
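
The maintenance half of that cycle can be sketched as a pass that re-checks every stored proxy and evicts the ones that keep failing. The consecutive-failure tolerance, `ProxyRecord`, and `maintenance_pass` are assumptions made for illustration, because the description does not say which eviction rule proxy_pool applies. A plain `dict` stands in for the key-value store.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ProxyRecord:
    address: str
    fail_count: int = 0  # consecutive failed checks (hypothetical field)


def maintenance_pass(pool: Dict[str, ProxyRecord],
                     is_alive: Callable[[str], bool],
                     max_failures: int = 2) -> None:
    """Re-check every proxy; evict one after `max_failures` consecutive failures.

    `max_failures` is a hypothetical tolerance, not a proxy_pool setting.
    """
    for address in list(pool):  # copy the keys because entries may be deleted
        record = pool[address]
        if is_alive(address):
            record.fail_count = 0  # a success resets the streak
        else:
            record.fail_count += 1
            if record.fail_count >= max_failures:
                del pool[address]


pool = {a: ProxyRecord(a) for a in ["1.2.3.4:8080", "5.6.7.8:3128"]}
dead = {"5.6.7.8:3128"}
for _ in range(2):  # two scheduled passes
    maintenance_pass(pool, lambda a: a not in dead)
print(sorted(pool))  # ['1.2.3.4:8080']
```

Tolerating a failure or two avoids discarding a proxy over a single transient timeout. Setting `max_failures=1` evicts on the first miss.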

The API server's listening IP address and port are configurable.
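
The sketch below shows one way to make a listening address configurable, assuming the values come from environment variables with fallbacks. The variable names `PROXY_POOL_HOST` and `PROXY_POOL_PORT` and both defaults are hypothetical and are not proxy_pool's own setting names.

```python
import os
from typing import Mapping, Tuple

DEFAULT_HOST = "0.0.0.0"  # listen on all interfaces (hypothetical default)
DEFAULT_PORT = 5010       # hypothetical default port


def listen_address(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Resolve (host, port) for the API server from the environment."""
    host = env.get("PROXY_POOL_HOST", DEFAULT_HOST)
    port = int(env.get("PROXY_POOL_PORT", DEFAULT_PORT))
    if not 0 < port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return host, port


print(listen_address({"PROXY_POOL_PORT": "8080"}))  # ('0.0.0.0', 8080)
```

Binding to `0.0.0.0` exposes the API on every network interface. Binding to `127.0.0.1` keeps it reachable only from the local machine.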

## Tags

### Networking & Communication

- [Proxy Management Systems](https://awesome-repositories.com/f/networking-communication/proxy-management-systems.md) — Provides a comprehensive system to collect, validate, and serve free proxy IP addresses via an HTTP API.
- [Proxy Pool Automation](https://awesome-repositories.com/f/networking-communication/proxy-pool-automation.md) — Automates the collection and validation of free proxy addresses for consistent availability in a database. ([source](https://proxy-pool.readthedocs.io/zh/latest/user/index.html))
- [Proxy Request Routers](https://awesome-repositories.com/f/networking-communication/http-proxies/proxy-request-routers.md) — Exposes a REST interface from which clients fetch proxies, so they can rotate outgoing requests across the pool and reduce the risk of IP bans.
- [Proxy List APIs](https://awesome-repositories.com/f/networking-communication/proxy-list-apis.md) — Provides programmatic interfaces for retrieving curated lists of functional proxy servers. ([source](https://cdn.jsdelivr.net/gh/jhao104/proxy_pool@master/README.md))
- [Proxy Scrapers](https://awesome-repositories.com/f/networking-communication/proxy-scrapers.md) — Provides an automated scraper to gather proxy addresses from multiple online sources for pool population.
- [Proxy Connectivity Testing](https://awesome-repositories.com/f/networking-communication/proxy-connectivity-testing.md) — Tests collected proxy addresses for connectivity to ensure only functional IPs are served. ([source](https://proxy-pool.readthedocs.io/zh/latest/dev/index.html))
- [Proxy Pool Pruning](https://awesome-repositories.com/f/networking-communication/proxy-pool-pruning.md) — Removes unresponsive proxy addresses from the storage layer immediately upon receiving failure reports from clients.
- [Proxy Source Aggregators](https://awesome-repositories.com/f/networking-communication/proxy-source-aggregators.md) — Combines proxy addresses from multiple custom discovery methods into a single managed pool. ([source](https://cdn.jsdelivr.net/gh/jhao104/proxy_pool@master/README.md))
- [Dynamic Proxy Selection](https://awesome-repositories.com/f/networking-communication/traffic-proxying/proxy-traffic-management/dynamic-proxy-selection.md) — Enables the selection of a single functional proxy IP from the pool for specific scraping tasks. ([source](https://proxy-pool.readthedocs.io/zh/latest/))
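
On the client side, selecting a proxy and reporting a dead one come down to two HTTP calls. The sketch below uses only the standard library. It assumes a server at `http://127.0.0.1:5010` with a `/get/` endpoint that returns JSON containing a `proxy` field and a `/delete/?proxy=...` endpoint. Check these paths and the response shape against the project's README. The `fetch` parameter exists only so the functions can be exercised without a live server.

```python
import json
from typing import Callable, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "http://127.0.0.1:5010"  # assumed address of a running proxy_pool API


def _http_get(url: str) -> bytes:
    """Perform a GET request and return the raw response body."""
    with urlopen(url, timeout=5) as resp:
        return resp.read()


def get_proxy(fetch: Callable[[str], bytes] = _http_get) -> Optional[str]:
    """Ask the pool for one proxy; assumes a JSON body with a 'proxy' field."""
    payload = json.loads(fetch(f"{API_BASE}/get/"))
    return payload.get("proxy")


def delete_proxy(proxy: str, fetch: Callable[[str], bytes] = _http_get) -> None:
    """Report a dead proxy so the pool drops it (assumed /delete/?proxy=...)."""
    fetch(f"{API_BASE}/delete/?{urlencode({'proxy': proxy})}")
```

A scraper would typically call `get_proxy()`, send its request through `http://<proxy>`, and call `delete_proxy(proxy)` when that request fails. This failure report is what drives the pruning described in the list above.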

### Part of an Awesome List

- [Proxy Availability Checks](https://awesome-repositories.com/f/awesome-lists/devops/tasks-and-scheduling/proxy-availability-checks.md) — Gathers proxies and verifies their reachability on a background schedule. ([source](https://proxy-pool.readthedocs.io/zh/latest/user/how_to_run.html))

### Data & Databases

- [Proxy Stores](https://awesome-repositories.com/f/data-databases/key-value-stores/sql-backed-stores/proxy-stores.md) — Implements a persistent database-backed store using hashes for high-speed proxy IP retrieval.
- [Key-Value Stores](https://awesome-repositories.com/f/data-databases/key-value-stores.md) — Uses a Redis-based key-value store to maintain validated proxy addresses for high-speed retrieval.
- [Proxy Address Storage](https://awesome-repositories.com/f/data-databases/proxy-address-storage.md) — Saves validated proxy addresses into a database hash for persistent storage and high-speed retrieval. ([source](https://proxy-pool.readthedocs.io/zh/latest/user/how_to_use.html))
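
Storing each proxy as a field of a single Redis hash makes lookup, listing, and deletion one command each. The sketch below assumes a redis-py client created with `decode_responses=True`. The class, its method names, and the hash name `proxies` are hypothetical and do not reproduce proxy_pool's own storage interface. Random selection reads all fields with `HKEYS`, which is O(N). On Redis 6.2 and later, the `HRANDFIELD` command can pick a field on the server instead.

```python
import json
import random
from typing import List, Optional


class ProxyHashStore:
    """Keep proxies in one Redis hash: field = 'host:port', value = JSON metadata.

    `client` is expected to behave like a redis-py ``Redis`` instance created
    with ``decode_responses=True``; the default hash name is hypothetical.
    """

    def __init__(self, client, name: str = "proxies") -> None:
        self.client = client
        self.name = name

    def put(self, proxy: str, **meta) -> None:
        # HSET overwrites existing metadata for the same proxy.
        self.client.hset(self.name, proxy, json.dumps(meta))

    def random(self) -> Optional[str]:
        fields: List[str] = self.client.hkeys(self.name)
        return random.choice(fields) if fields else None

    def all(self) -> List[str]:
        return list(self.client.hkeys(self.name))

    def delete(self, proxy: str) -> bool:
        # HDEL returns the number of fields actually removed.
        return self.client.hdel(self.name, proxy) == 1

    def count(self) -> int:
        return self.client.hlen(self.name)


# Usage with a real server (requires the `redis` package):
#   import redis
#   store = ProxyHashStore(redis.Redis(decode_responses=True))
#   store.put("1.2.3.4:8080", https=True)
```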

### Web Development

- [Proxy Delivery Interfaces](https://awesome-repositories.com/f/web-development/proxy-management-interfaces/proxy-delivery-interfaces.md) — Implements an HTTP interface to serve valid proxy addresses from a managed pool to external clients. ([source](https://proxy-pool.readthedocs.io/zh/latest/user/how_to_run.html))
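
The delivery side can be sketched with nothing but the standard library. A pure `dispatch` function maps request paths to JSON responses, and a thin `BaseHTTPRequestHandler` subclass writes them out. The endpoint names (`/get`, `/all`, `/delete?proxy=`) and response fields are assumptions that mirror the operations described on this page. The in-memory `POOL` stands in for the key-value store, and none of this is proxy_pool's actual server code.

```python
import json
import random
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Set, Tuple
from urllib.parse import parse_qs, urlparse

POOL: Set[str] = {"1.2.3.4:8080", "5.6.7.8:3128"}  # sample data in place of the store


def dispatch(path: str) -> Tuple[int, dict]:
    """Map a request path to (status, JSON body); endpoint names are assumptions."""
    url = urlparse(path)
    route = url.path.rstrip("/")  # treat "/get" and "/get/" alike
    if route == "/get":
        if not POOL:
            return 404, {"error": "pool is empty"}
        return 200, {"proxy": random.choice(sorted(POOL))}
    if route == "/all":
        return 200, {"proxies": sorted(POOL)}
    if route == "/delete":
        proxy = parse_qs(url.query).get("proxy", [""])[0]
        removed = proxy in POOL
        POOL.discard(proxy)
        return 200, {"deleted": removed}
    return 404, {"error": "unknown endpoint"}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = dispatch(self.path)
        data = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host: str = "127.0.0.1", port: int = 5010) -> None:
    """Block forever serving the API (host and port are illustrative)."""
    ThreadingHTTPServer((host, port), Handler).serve_forever()
```

Keeping the routing in `dispatch` separates the HTTP plumbing from the pool logic, so the endpoints can be tested without opening a socket.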

### Software Engineering & Architecture

- [Background Task Schedulers](https://awesome-repositories.com/f/software-engineering-architecture/execution-control/background-task-schedulers.md) — Uses a timer-based background process to periodically scrape and refresh the pool of proxy addresses.
- [Pluggable Scraper Architectures](https://awesome-repositories.com/f/software-engineering-architecture/pluggable-scraper-architectures.md) — Provides a pluggable architecture allowing new proxy discovery methods to be integrated via generator functions.
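
A timer-driven refresh loop of the kind described can be sketched with a daemon thread and a `threading.Event`. `start_periodic` is a hypothetical helper, not proxy_pool's scheduler, and the interval is whatever cadence the caller chooses. Waiting on the event instead of calling `time.sleep` lets a shutdown request interrupt the wait immediately.

```python
import threading
from typing import Callable


def start_periodic(task: Callable[[], None], interval: float) -> threading.Event:
    """Run `task` every `interval` seconds on a daemon thread until the returned event is set."""
    stop = threading.Event()

    def loop() -> None:
        while not stop.is_set():
            try:
                task()
            except Exception as exc:  # keep the schedule alive if one run fails
                print(f"scheduled task failed: {exc!r}")
            stop.wait(interval)  # sleeps, but returns early once stop is set

    threading.Thread(target=loop, daemon=True).start()
    return stop


# Usage (hypothetical cadence):
#   stop = start_periodic(lambda: print("refresh pool"), interval=240)
#   ...
#   stop.set()  # ends the loop after the current run
```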

### System Administration & Monitoring

- [Proxy Control APIs](https://awesome-repositories.com/f/system-administration-monitoring/proxy-control-apis.md) — Provides a programmable interface to manage and remove unresponsive proxies from the active pool. ([source](https://proxy-pool.readthedocs.io/zh/latest/))
- [Recurring Maintenance Scheduling](https://awesome-repositories.com/f/system-administration-monitoring/recurring-maintenance-scheduling.md) — Gathers free proxy addresses from multiple online sources on a schedule to maintain pool health. ([source](https://cdn.jsdelivr.net/gh/jhao104/proxy_pool@master/README.md))
