This project is a distributed web crawling framework that enables the horizontal scaling of scraping tasks. It uses Redis as a centralized request queue manager and state store to coordinate crawl progress and request metadata across multiple server instances. The system distributes crawling workloads by sharing a single request queue and utilizes a distributed duplicate filter to prevent multiple workers from visiting the same page. It persists complex request state and metadata as JSON strings within the shared remote store. The framework also provides capabilities for distributed data pro
Haipproxy is a high-availability proxy gateway and distributed proxy pool manager. It consists of a system for storing and rotating verified IP proxy addresses using Redis, a web crawling system to discover anonymous proxies from public sources, and a validation engine that checks proxy functionality against specific target domains. The project implements a middleware layer that provides a stable entry point for requests by automatically rotating backend IP addresses. This includes a local proxy server that acts as a bridge between the client and the pool, decoupling the two by updating inter
Kvrocks is a distributed key-value store and Redis-compatible NoSQL database. It utilizes a RocksDB storage engine to provide disk-based persistence, allowing for high-capacity data storage with reduced memory costs compared to in-memory systems. The system functions as a vector database and full-text search engine, supporting nearest-neighbor searches on vector embeddings and complex document queries via text matching. It employs a proxyless cluster architecture with slot-based routing to distribute data and scale capacity across multiple nodes. The platform covers a wide range of data mana
Sidekiq is a Ruby background processing framework and asynchronous task runner. It functions as a Redis-backed background job processor that offloads heavy or time-consuming work from web requests to separate worker processes to ensure the main application remains responsive. The system operates as a Redis task queue, storing pending jobs in Redis to be processed concurrently by multiple threads. It provides a framework for distributed task queueing and asynchronous job scheduling to coordinate work across multiple server instances. The project covers Ruby application scaling by executing ba