ProxyPool is a proxy pool manager that automatically collects, validates, and serves HTTP proxies from multiple sources through a web API. At its core, it runs scheduled background processes that scrape free and paid proxy websites, test each proxy's availability against configurable target URLs using asynchronous HTTP clients, and store the results in a Redis-backed sorted set where proxies are scored and ranked by reliability. The system distinguishes itself through a pluggable crawler architecture that allows users to add new proxy sources by writing a simple class with target URLs and a p
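The scoring and ranking described above can be illustrated with a minimal sketch. It uses an in-memory dictionary as a stand-in for the Redis sorted set so that it runs without a server. The class name `ProxyScoreboard`, the score constants, and the rule "success resets to the maximum, failure subtracts one" are assumptions made for illustration; the text does not specify the actual scoring policy.

```python
# Hypothetical score bounds; the real values are not given in the description.
INITIAL_SCORE = 10
MAX_SCORE = 100
MIN_SCORE = 0


class ProxyScoreboard:
    """In-memory stand-in for a Redis sorted set (member -> score)."""

    def __init__(self):
        self._scores = {}

    def add(self, proxy):
        # Keep the existing score if the proxy is already known.
        self._scores.setdefault(proxy, INITIAL_SCORE)

    def mark_success(self, proxy):
        # Assumed policy: a proxy that passes validation jumps to the top score.
        self._scores[proxy] = MAX_SCORE

    def mark_failure(self, proxy):
        # Assumed policy: each failed check costs one point; drop at the floor.
        score = self._scores.get(proxy, INITIAL_SCORE) - 1
        if score <= MIN_SCORE:
            self._scores.pop(proxy, None)
        else:
            self._scores[proxy] = score

    def best(self, count=1):
        # Highest score first; ties broken by proxy string for determinism.
        ranked = sorted(self._scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [proxy for proxy, _ in ranked[:count]]
```

With a real Redis backend, the same operations would map onto sorted-set commands such as `ZADD`, `ZINCRBY`, `ZREM`, and a reverse range query.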
Garage is a distributed object storage system that provides an S3-compatible API gateway. It is designed to synchronize metadata across distributed nodes using conflict-free replicated data types and Merkle-tree state alignment to maintain cluster-wide consistency. The system ensures data resilience through zone-aware replication, distributing data copies across multiple physical locations. It employs quorum-based request routing and versioned layout management to validate and commit cluster configuration changes. The project covers a broad range of operational capabilities, including automa
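To make the idea of conflict-free replicated data types concrete, here is a sketch of one of the simplest CRDTs, a last-writer-wins register. It is a generic illustration, not Garage's implementation; the `LwwRegister` name and its fields are hypothetical. The point it shows is that a merge function that is commutative, associative, and idempotent lets nodes exchange state in any order and still converge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LwwRegister:
    """Last-writer-wins register: the entry with the greatest stamp survives."""

    timestamp: int
    node_id: str
    value: str

    def merge(self, other):
        # Compare on (timestamp, node_id, value) so the order is total and
        # every replica picks the same winner regardless of merge order.
        return max(self, other, key=lambda r: (r.timestamp, r.node_id, r.value))


def converge(states):
    """Fold any sequence of replica states into a single merged state."""
    result = states[0]
    for state in states[1:]:
        result = result.merge(state)
    return result
```

Because the merge is order-independent, two nodes that have seen the same set of updates hold the same value even if the updates arrived in different sequences.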
Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis
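The partitioned, replicated key-value model described above can be sketched as a two-step lookup: hash the key to a partition, then map the partition to an owner and its backups. This is a simplified illustration and not Hazelcast's actual algorithm; the function names, the CRC32 hash, the partition count, and the round-robin placement are all assumptions chosen to keep the example small.

```python
import zlib

# Illustrative partition count; not taken from the description above.
PARTITION_COUNT = 271


def partition_for(key, partition_count=PARTITION_COUNT):
    """Map a key to a partition id using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % partition_count


def replicas_for(partition_id, members, backup_count=1):
    """Return the owner followed by backups, all on distinct members."""
    copies = min(backup_count + 1, len(members))
    owner_index = partition_id % len(members)
    return [members[(owner_index + i) % len(members)] for i in range(copies)]


def locate(key, members, backup_count=1):
    """Convenience wrapper: which members hold the given key?"""
    return replicas_for(partition_for(key), members, backup_count)
```

Routing by partition rather than by key means that only the partition-to-member table has to change when nodes join or leave, while each key's partition stays fixed.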
Pholcus is a distributed web crawler framework written in Go and designed for high-concurrency data extraction. It functions as a distributed crawling orchestrator and dynamic data extraction engine, using a server-client architecture to coordinate tasks across multiple nodes. The system integrates a headless browser engine to render dynamic content and execute JavaScript, allowing it to extract data from single-page applications. It features a web-based management interface for configuring spider parameters and monitoring execution progress, alongside the ability to update extraction rules v
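The server-client coordination described above can be sketched as a server that owns the task queue and clients that pull work and report results. The sketch below is in-process Python with no networking, and it does not reflect Pholcus's real API; `CrawlServer`, `run_client`, and the injected `fetch` callable are hypothetical names used only to show the division of responsibilities.

```python
from collections import deque


class CrawlServer:
    """Holds pending URLs and collects the results that clients report."""

    def __init__(self, urls):
        self._pending = deque(urls)
        self._results = {}

    def next_task(self):
        # Hand out one URL at a time; None signals that no work is left.
        return self._pending.popleft() if self._pending else None

    def report(self, url, data):
        self._results[url] = data

    @property
    def results(self):
        return dict(self._results)


def run_client(server, fetch):
    """Pull tasks until the server runs dry; return how many were handled."""
    handled = 0
    while (url := server.next_task()) is not None:
        server.report(url, fetch(url))
        handled += 1
    return handled
```

In a real deployment the `next_task` and `report` calls would cross the network, and `fetch` would be replaced by an HTTP client or a headless browser session.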