# spiderclub/haipproxy

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/spiderclub-haipproxy).**

5,535 stars · 899 forks · Python · MIT

## Links

- GitHub: https://github.com/SpiderClub/haipproxy
- Homepage: https://spiderclub.github.io/haipproxy/
- awesome-repositories: https://awesome-repositories.com/repository/spiderclub-haipproxy.md

## Topics

`crawler` `distributed` `high-availability` `ipproxy` `redis` `scheduler` `scrapy` `spider`

## Description

Haipproxy is a high-availability proxy gateway and distributed proxy pool manager. It combines three parts: a Redis-backed store that holds and rotates verified IP proxy addresses, a web crawler that discovers anonymous proxies from public sources, and a validation engine that checks each proxy against specific target domains.
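As a rough illustration of the storage-and-rotation part, the sketch below mimics a per-site Redis sorted set with a plain dictionary so it runs without a Redis server. `ScoredProxyPool`, its method names, and the scoring rule (plus one on success, minus one on failure, evict below a threshold) are assumptions made for this example, not haipproxy's actual API or data layout.

```python
from typing import Dict, List


class ScoredProxyPool:
    """Hypothetical in-memory stand-in for a Redis sorted set per target site."""

    def __init__(self, initial_score: int = 5, min_score: int = 0) -> None:
        self.initial_score = initial_score
        self.min_score = min_score
        # site name -> {proxy address -> score}
        self._pools: Dict[str, Dict[str, int]] = {}

    def add(self, site: str, proxy: str) -> None:
        # Comparable to ZADD with NX: a known proxy keeps its current score.
        self._pools.setdefault(site, {}).setdefault(proxy, self.initial_score)

    def report(self, site: str, proxy: str, ok: bool) -> None:
        # Comparable to ZINCRBY: reward a success, penalise a failure.
        pool = self._pools.get(site, {})
        if proxy not in pool:
            return
        pool[proxy] += 1 if ok else -1
        if pool[proxy] < self.min_score:
            del pool[proxy]  # evict proxies that keep failing

    def best(self, site: str, count: int = 1) -> List[str]:
        # Comparable to ZREVRANGE: highest score first, ties broken by address.
        pool = self._pools.get(site, {})
        ranked = sorted(pool.items(), key=lambda item: (-item[1], item[0]))
        return [proxy for proxy, _ in ranked[:count]]
```

Keeping one scored set per target site means a proxy can rank well for one domain and be evicted for another. With a real Redis server, the dictionary operations would map onto sorted-set commands.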

The project implements a middleware layer that gives requests a stable entry point while automatically rotating the backend IP addresses behind it. This includes a local proxy server that acts as a bridge between the client and the pool. Because the server refreshes its own internal IP list, clients stay decoupled from changes in the pool.
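The rotation idea can be sketched as a small object that clients call for the next upstream address while a separate refresher swaps in a new list. This is a minimal round-robin sketch under stated assumptions: `RotatingUpstream`, `refresh`, and `next_proxy` are hypothetical names, and the source does not say which rotation strategy the project's local proxy server uses.

```python
import threading
from typing import Iterable, List


class RotatingUpstream:
    """Hypothetical stable entry point that cycles through upstream proxies."""

    def __init__(self, proxies: Iterable[str]) -> None:
        self._lock = threading.Lock()
        self._proxies: List[str] = list(proxies)
        self._index = 0

    def refresh(self, proxies: Iterable[str]) -> None:
        # Swap in a freshly validated list; callers never see the change.
        with self._lock:
            self._proxies = list(proxies)
            self._index = 0

    def next_proxy(self) -> str:
        # Round-robin selection over the current list.
        with self._lock:
            if not self._proxies:
                raise LookupError("no upstream proxies available")
            proxy = self._proxies[self._index % len(self._proxies)]
            self._index += 1
            return proxy
```

Callers only ever hold a reference to the `RotatingUpstream` object, so the list behind it can be replaced at any time without touching client code.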

The system covers distributed IP management, anonymous proxy retrieval, and health monitoring that tracks operational metrics and application bugs in large-scale deployments. It also provides site-specific validation logic, which confirms that a proxy works for a particular domain rather than merely being reachable.
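To make the difference between "reachable" and "works for this site" concrete, here is a minimal sketch of a site-specific check. The `Response` type, the `is_valid_for_site` function, and the marker-string rule are assumptions for illustration only. The HTTP call is injected as a `fetch` callable so the example runs without network access.

```python
from typing import Callable, NamedTuple


class Response(NamedTuple):
    status: int
    body: str


# Hypothetical signature: fetch(url, proxy) -> Response
Fetch = Callable[[str, str], Response]


def is_valid_for_site(fetch: Fetch, proxy: str, url: str, marker: str) -> bool:
    """Return True only if the target page loads through the proxy as expected."""
    try:
        response = fetch(url, proxy)
    except OSError:
        # Connection refused, reset, timed out, and similar network failures.
        return False
    # A 200 status alone is not enough: block pages and captchas also return 200,
    # so the body must contain text that only the real page carries.
    return response.status == 200 and marker in response.body
```

A proxy that answers with a captcha page would pass a generic liveness check but fail here, which is the distinction the description draws.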

## Tags

### Networking & Communication

- [Proxy and User-Agent Rotation Middleware](https://awesome-repositories.com/f/networking-communication/proxy-rotation-services/proxy-and-fingerprint-rotation/proxy-and-user-agent-rotation-middleware.md) — Provides a middleware layer that automatically cycles through available proxy addresses to ensure a stable entry point for requests.
- [Distributed Proxy Managers](https://awesome-repositories.com/f/networking-communication/distributed-proxy-managers.md) — Manages a distributed pool of verified proxy addresses using Redis to ensure consistent website access.
- [Redis-Backed Proxy Pools](https://awesome-repositories.com/f/networking-communication/network-reliability-diagnostics/network-filtering/ip-address-filters/network-traffic-proxying/outbound-ip-rotation/proxy-pool-builders/redis-backed-proxy-pools.md) — Provides a distributed system for storing, rotating and managing a pool of verified IP proxy addresses using Redis.
- [Local Middleware Proxies](https://awesome-repositories.com/f/networking-communication/proxy-servers/local-middleware-proxies.md) — Implements a local proxy server that acts as a bridge between the client and the proxy pool.
- [Local Proxy Bridge Configurations](https://awesome-repositories.com/f/networking-communication/proxy-servers/proxy-configurations/local-proxy-bridge-configurations.md) — Allows the setup of a local proxy server that automatically updates IP lists to act as middleware. ([source](https://spiderclub.github.io/haipproxy/))
- [Single Proxy Fetches](https://awesome-repositories.com/f/networking-communication/direct-url-retrievals/url-dataset-fetches/proxy-configuration-fetches/single-proxy-fetches.md) — Enables fetching verified IP addresses from a cached pool to ensure reliable access to target websites. ([source](https://cdn.jsdelivr.net/gh/spiderclub/haipproxy@master/README.md))
- [High Availability Web Scraping](https://awesome-repositories.com/f/networking-communication/high-availability-web-scraping.md) — Scales data extraction workflows by rotating through a distributed set of healthy proxies to avoid IP blocks.
- [Domain-Specific Proxy Validators](https://awesome-repositories.com/f/networking-communication/http-proxies/domain-specific-proxy-validators.md) — Checks proxy functionality against specific target domains to ensure consistent connectivity.
- [Proxy Response Validators](https://awesome-repositories.com/f/networking-communication/http-proxies/proxy-response-validators.md) — Provides custom validation logic to verify proxy functionality for specific domains. ([source](https://github.com/SpiderClub/haipproxy/blob/master/docs/%E9%92%88%E5%AF%B9%E7%89%B9%E5%AE%9A%E7%AB%99%E7%82%B9%E6%B7%BB%E5%8A%A0%E6%A0%A1%E9%AA%8C%E5%99%A8.md))

### DevOps & Infrastructure

- [Gateway High Availability](https://awesome-repositories.com/f/devops-infrastructure/gateway-high-availability.md) — Implements a high-availability gateway that provides a stable entry point while rotating backend IP addresses.
- [Anonymous Proxy Retrieval](https://awesome-repositories.com/f/devops-infrastructure/rest-api-endpoint-management/single-endpoint-apis/proxy-retrieval-endpoints/anonymous-proxy-retrieval.md) — Provides a mechanism to retrieve validated anonymous IP addresses through a dedicated client. ([source](https://spiderclub.github.io/haipproxy/))

### Software Engineering & Architecture

- [Proxy Pool Caches](https://awesome-repositories.com/f/software-engineering-architecture/distributed-task-queues/redis-backed-queues/proxy-pool-caches.md) — Uses Redis as a distributed cache to store validated IP addresses for fast retrieval across crawler instances.
- [Proxy Crawler Frameworks](https://awesome-repositories.com/f/software-engineering-architecture/proxy-crawler-frameworks.md) — Implements a specialized crawling system to automatically discover and validate anonymous proxies from public sources.
- [Site-Specific Proxy Validators](https://awesome-repositories.com/f/software-engineering-architecture/site-specific-proxy-validators.md) — Verifies that proxies are functional for specific target domains rather than just globally active.

### Web Development

- [Scrapy-Framework-Based Crawlers](https://awesome-repositories.com/f/web-development/web-crawlers/scrapy-framework-based-crawlers.md) — Uses the Scrapy framework to discover and harvest anonymous proxy addresses from public web sources.
