Haipproxy

Haipproxy - manage distributed proxy pools | Awesome Repos

Features

Proxy and User-Agent Rotation Middleware - Provides a middleware layer that automatically rotates proxy addresses and User-Agent strings across outgoing requests.
Gateway High Availability - Implements a high-availability gateway that provides a stable entry point while rotating backend IP addresses.
Distributed Proxy Managers - Manages a distributed pool of verified proxy addresses using Redis to ensure consistent website access.
Redis-Backed Proxy Pools - Provides a distributed system for storing, rotating and managing a pool of verified IP proxy addresses using Redis.
Local Middleware Proxies - Implements a local proxy server that acts as a bridge between the client and the proxy pool.
Local Proxy Bridge Configurations - Allows setting up a local proxy server that acts as middleware and updates its IP list automatically.
Proxy Pool Caches - Uses Redis as a distributed cache to store validated IP addresses for fast retrieval across crawler instances.
Proxy Crawler Frameworks - Implements a specialized crawling system to automatically discover and validate anonymous proxies from public sources.
Scrapy-Framework-Based Crawlers - Uses the Scrapy framework to discover and harvest anonymous proxy addresses from public web sources.
Anonymous Proxy Retrieval - Provides a mechanism to retrieve validated anonymous IP addresses through a dedicated client.
Single Proxy Fetches - Fetches one verified IP address at a time from the cached pool for reliable access to target websites.
High-Availability Web Scraping - Scales data extraction workflows by rotating through a distributed set of healthy proxies to avoid IP blocks.
Domain-Specific Proxy Validators - Checks proxy functionality against specific target domains to ensure consistent connectivity.
Proxy Response Validators - Provides custom validation logic to verify proxy functionality for specific domains.
Site-Specific Proxy Validators - Verifies that proxies are functional for specific target domains rather than just globally active.
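The crawler features above describe harvesting proxy addresses from public listing pages. Haipproxy does this with Scrapy spiders; the sketch below leaves Scrapy out and shows only a possible parsing step on already-downloaded HTML. The function name `extract_proxies` and the two page layouts it recognizes (`ip:port` text, and IP and port in adjacent table cells) are assumptions for illustration, and real sources vary widely.

```python
import ipaddress
import re

# Hypothetical pattern: matches "a.b.c.d:port" or an IP and port in adjacent
# <td> cells. The lookarounds stop it from matching inside longer digit runs.
_CANDIDATE = re.compile(
    r"(?<![\d.])(\d{1,3}(?:\.\d{1,3}){3})(?::|</td>\s*<td>)\s*(\d{1,5})(?!\d)"
)


def extract_proxies(html: str) -> list:
    """Return unique, well-formed 'ip:port' strings in first-seen order."""
    seen = {}  # dict keeps insertion order, so it doubles as an ordered set
    for ip, port in _CANDIDATE.findall(html):
        try:
            ipaddress.IPv4Address(ip)  # rejects octets above 255
        except ipaddress.AddressValueError:
            continue
        if not 0 < int(port) <= 65535:  # rejects out-of-range ports
            continue
        seen.setdefault(f"{ip}:{port}", None)
    return list(seen)
```

Harvested addresses are only candidates; they still need to pass validation before entering the pool.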
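The three validator entries describe checking a proxy against a specific target domain rather than only testing whether it is alive. A minimal sketch of that idea, assuming the response has already been fetched through the proxy by some HTTP client: `DomainRule`, `RULES`, and `proxy_passes` are hypothetical names, not Haipproxy's API, and the marker text and latency limit are illustrative values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainRule:
    """Hypothetical per-domain rule for judging a proxied response."""

    expected_status: int = 200
    must_contain: str = ""      # text known to appear on the genuine page
    max_latency_s: float = 5.0  # assumed upper bound on response time


# Illustrative rule table; each real target would need its own marker.
RULES = {
    "example.com": DomainRule(must_contain="Example Domain"),
}
DEFAULT_RULE = DomainRule()


def proxy_passes(domain: str, status: int, body: str, latency_s: float) -> bool:
    """Return True if a response fetched through a proxy looks genuine for `domain`."""
    rule = RULES.get(domain, DEFAULT_RULE)
    return (
        status == rule.expected_status
        and rule.must_contain in body  # catches block pages served with HTTP 200
        and latency_s <= rule.max_latency_s
    )


if __name__ == "__main__":
    print(proxy_passes("example.com", 200, "<h1>Example Domain</h1>", 0.4))  # True
    print(proxy_passes("example.com", 200, "<h1>Access denied</h1>", 0.4))   # False
```

Keeping the per-domain rules in data means a new target needs only a new table entry, and leaving the network fetch out keeps the check a pure function that is easy to test.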
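The Redis-backed pool, rotation, and single-proxy fetch features can be pictured with a small in-memory sketch. It is not Haipproxy's code: the class name `ScoredProxyPool`, the score thresholds, and the round-robin policy are assumptions, and a plain dict stands in for the Redis structure (for example, a sorted set of proxies scored by reliability) that a distributed deployment would share across crawler instances.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ScoredProxyPool:
    """Hypothetical in-memory stand-in for a Redis-backed proxy pool.

    Each proxy maps to an integer score; only proxies at or above
    `min_score` are handed out, in round-robin order.
    """

    min_score: int = 5   # assumed threshold for "healthy"
    max_score: int = 10  # assumed score after a successful check
    scores: dict[str, int] = field(default_factory=dict)
    _cursor: int = 0     # position for round-robin rotation

    def add(self, proxy: str) -> None:
        # A newly harvested proxy starts at the threshold.
        self.scores.setdefault(proxy, self.min_score)

    def report(self, proxy: str, ok: bool) -> None:
        # Success restores the maximum score; failure costs one point,
        # and a proxy whose score reaches zero is evicted.
        if proxy not in self.scores:
            return
        if ok:
            self.scores[proxy] = self.max_score
            return
        self.scores[proxy] -= 1
        if self.scores[proxy] <= 0:
            del self.scores[proxy]

    def healthy(self) -> list[str]:
        # Sorted so the rotation order is deterministic.
        return sorted(p for p, s in self.scores.items() if s >= self.min_score)

    def next_proxy(self) -> str | None:
        # Fetch a single proxy, cycling through the healthy ones.
        candidates = self.healthy()
        if not candidates:
            return None
        proxy = candidates[self._cursor % len(candidates)]
        self._cursor += 1
        return proxy


if __name__ == "__main__":
    pool = ScoredProxyPool()
    for addr in ("203.0.113.1:8080", "203.0.113.2:8080", "203.0.113.3:8080"):
        pool.add(addr)
    pool.report("203.0.113.3:8080", ok=False)  # drops below the threshold
    print([pool.next_proxy() for _ in range(3)])
    # ['203.0.113.1:8080', '203.0.113.2:8080', '203.0.113.1:8080']
```

Decaying scores on failure instead of deleting immediately lets a proxy survive one transient error while still being excluded from rotation until it passes a check again.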

Open-source alternatives to Haipproxy

Similar open-source projects, ranked by how many features they share with Haipproxy.

Python3WebSpider/ProxyPool
6,223 stars
ProxyPool is a proxy pool manager that automatically collects, validates, and serves HTTP proxies from multiple sources through a web API. At its core, it runs scheduled background processes that scrape free and paid proxy websites, test each proxy's availability against configurable target URLs using asynchronous HTTP clients, and store the results in a Redis-backed sorted set where proxies are scored and ranked by reliability. The system distinguishes itself through a pluggable crawler architecture that allows users to add new proxy sources by writing a simple class with target URLs and a p
Python, flask, http, proxy
deuxfleurs-org/garage
2,944 stars
Garage is a distributed object storage system that provides an S3-compatible API gateway. It is designed to synchronize metadata across distributed nodes using conflict-free replicated data types and Merkle-tree state alignment to maintain cluster-wide consistency. The system ensures data resilience through zone-aware replication, distributing data copies across multiple physical locations. It employs quorum-based request routing and versioned layout management to validate and commit cluster configuration changes. The project covers a broad range of operational capabilities, including automa
Rust, object-storage, rust, s3
hazelcast/hazelcast
6,570 stars
Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis
Java, big-data, caching, data-in-motion
henrylee2cn/pholcus
7,578 stars
Pholcus is a distributed web crawler framework written in Go designed for high-concurrency data extraction. It functions as a distributed crawling orchestrator and dynamic data extraction engine, utilizing a server-client architecture to coordinate tasks across multiple nodes. The system integrates a headless browser engine to render dynamic content and execute JavaScript, allowing it to extract data from single-page applications. It features a web-based management interface for configuring spider parameters and monitoring execution progress, alongside the ability to update extraction rules v
Go

See all 30 alternatives to Haipproxy

SpiderClub/haipproxy