30 open-source projects similar to spiderclub/haipproxy, ranked by the number of features they share with it. Compare stars, activity, and what each one does to find the best haipproxy alternative.
ProxyPool is a proxy pool manager that automatically collects, validates, and serves HTTP proxies from multiple sources through a web API. At its core, it runs scheduled background processes that scrape free and paid proxy websites, test each proxy's availability against configurable target URLs using asynchronous HTTP clients, and store the results in a Redis-backed sorted set where proxies are scored and ranked by reliability. The system distinguishes itself through a pluggable crawler architecture that allows users to add new proxy sources by writing a simple class with target URLs and a p
Garage is a distributed object storage system that provides an S3-compatible API gateway. It is designed to synchronize metadata across distributed nodes using conflict-free replicated data types and Merkle-tree state alignment to maintain cluster-wide consistency. The system ensures data resilience through zone-aware replication, distributing data copies across multiple physical locations. It employs quorum-based request routing and versioned layout management to validate and commit cluster configuration changes. The project covers a broad range of operational capabilities, including automa
Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis
Pholcus is a distributed web crawler framework written in Go and designed for high-concurrency data extraction. It functions as a distributed crawling orchestrator and dynamic data extraction engine, using a server-client architecture to coordinate tasks across multiple nodes. The system integrates a headless browser engine to render dynamic content and execute JavaScript, allowing it to extract data from single-page applications. It features a web-based management interface for configuring spider parameters and monitoring execution progress, alongside the ability to update extraction rules v
ECommerceCrawlers is an educational collection of Python-based crawler scripts designed to extract data from a variety of public websites, including e-commerce platforms, social media sites, news outlets, and multimedia sources. The project serves as a learning resource for web scraping techniques, offering ready-to-run examples that demonstrate practical data extraction methods. The toolkit covers a broad range of data types, including product listings and prices from online retail platforms, public posts and profiles from social networking sites, articles from news and blogging platforms, p
This project is a proxy aggregation platform designed to collect and verify free proxy server lists from web platforms, social media, and public repositories. It functions as a crawler framework that gathers proxy data and subscription links, a validation tool for testing server liveness, and a synchronization service for distributing the results. The system uses a plugin-based architecture that allows for the integration of custom Python scripts to handle diverse web source structures. It also includes utilities to transform raw proxy data into standardized configuration formats compatible w
This project is a distributed web crawling framework that enables the horizontal scaling of scraping tasks. It uses Redis as a centralized request queue manager and state store to coordinate crawl progress and request metadata across multiple server instances. The system distributes crawling workloads by sharing a single request queue and utilizes a distributed duplicate filter to prevent multiple workers from visiting the same page. It persists complex request state and metadata as JSON strings within the shared remote store. The framework also provides capabilities for distributed data pro
MikuMikuBeam is a hybrid command-line and web-based tool for launching configurable network stress tests with real-time monitoring and plugin extensibility. It provides a modular pipeline for constructing and executing network attacks, supporting configurable parameters such as target, packet size, duration, and delay. The tool distinguishes itself through a dual-mode configuration interface that allows attack parameters to be set via both a web UI and command-line arguments, with the CLI providing colored real-time output. It features isolated client session management where each browser tab spa
XX-Net is a cross-platform desktop application that functions as a local proxy server and network traffic router. It intercepts outgoing network requests from a local machine and redirects them through encrypted tunnels to a distributed mesh of cloud-based nodes, facilitating secure and reliable access to external resources. The software distinguishes itself by providing a centralized management interface for coordinating complex proxy infrastructure. It employs rule-based traffic routing, allowing users to define custom logic based on destination addresses and protocols to determine the opti
Cerebro is an administration tool for OpenSearch and Elasticsearch clusters, providing a web-based graphical interface to monitor health and manage performance. It serves as a central console for cluster administration, including the creation and organization of indices, aliases, and index templates. The project distinguishes itself through integrated directory authentication, utilizing LDAP services to manage user identities and access permissions. It also includes a dedicated REST client console for sending manual requests to clusters, featuring autocompletion and the ability to export requ
k8sgpt is a suite of Kubernetes-focused tools designed for AI-powered debugging, cluster diagnostics, and self-healing. It functions as an automated analyzer and debugger that uses large language models to explain cluster errors, suggest remediation steps, and identify resource failures. The project distinguishes itself through an extensible analysis framework that supports custom diagnostic plugins and a Model Context Protocol server, which exposes cluster diagnostics as tools for AI assistants. It includes a self-healing agent capable of automatically generating and applying fixes for detec
Kafka Manager is a web-based management interface and monitoring tool for Apache Kafka clusters. It serves as a central control plane for topic administration, consumer monitoring, and cluster health inspection. The project provides specialized utilities for data rebalancing and partition reassignment to distribute workloads across brokers. It also includes tools to optimize partition leadership by electing preferred replicas. The platform covers a broad range of administrative capabilities, including the creation and configuration of message topics, tracking of consumer offsets, and the col
This project is a comprehensive educational resource and operational handbook for Kubernetes. It serves as a technical reference for installing, managing, and scaling container orchestration clusters across diverse environments, covering the core architectural principles and system components required to maintain containerized applications. The resource provides structured guides for cluster administration, including high availability setups, resource control, and data backup operations. It also functions as a security audit and troubleshooting manual, offering instructions for identifying no
DevOps-Bash-tools is a collection of shell scripts and aliases designed to automate cloud infrastructure, container orchestration, and CI/CD pipelines. It provides a comprehensive toolset for managing operational workflows through the command line. The project specializes in automating tasks across multiple platforms, including managing namespaces and secrets in Kubernetes, auditing resources in AWS and GCP, and triggering builds or managing environment variables in GitHub Actions, GitLab CI, and CircleCI. It also includes a toolkit for interacting with container registries to query manifests
Coroot is an observability platform and Kubernetes performance monitor that uses eBPF to automatically collect metrics, logs, and traces without requiring manual code instrumentation. It functions as an OpenTelemetry trace analyzer and an LLM observability gateway, exposing system health data to large language models through the Model Context Protocol. The platform differentiates itself by combining automated root cause analysis with AI-driven diagnostics to investigate performance regressions. It also includes a cloud cost monitoring tool that attributes infrastructure spending to specifi
Pigsty is a full-stack orchestration suite for deploying, monitoring, and managing high-availability PostgreSQL clusters and their supporting infrastructure. It functions as a cluster management platform and high-availability suite that automates failover, manages virtual IPs, and ensures data consistency through distributed consensus. The project distinguishes itself by providing a comprehensive database infrastructure-as-code framework and a dedicated observability stack. It incorporates a backup and recovery manager supporting point-in-time recovery via S3-compatible object storage, alongs
Arkime is a distributed packet analysis platform and full packet capture system designed for recording raw network traffic, indexing metadata, and performing network forensics. It functions as a network traffic indexer and security tool that enables the monitoring, querying, and browsing of large-scale network traffic across multi-cluster architectures. The platform distinguishes itself through its ability to manage distributed capture clusters from a centralized administrative dashboard. It integrates external data feeds with internal traffic logs to identify known threats and provides a pro
Elasticvue is a management user interface and graphical dashboard for Elasticsearch clusters. It serves as a cluster browser and administration tool, providing a visual way to monitor cluster health, manage indices, and browse documents without writing raw JSON. The project is delivered as a cross-platform application, available as a browser extension, a standalone desktop application, or a containerized web service via Docker. The interface covers cluster administration, including the management of shards, aliases, and snapshot repositories for backups. It also includes data exploration too
Flower is a monitoring and administration tool for Celery task queues. It provides a real-time web dashboard and a REST API to monitor distributed task clusters, manage worker instances, and observe message broker health. The project distinguishes itself by offering centralized control over the task lifecycle, allowing users to trigger, revoke, or terminate tasks and apply execution rate limits. It also includes a Prometheus metrics exporter to surface internal performance and status data for external monitoring and alerting systems. The tool covers a broad range of observability and managem
Elasticsearch Head is a web-based graphical interface for monitoring and administering Elasticsearch clusters. It serves as a cluster management UI, a topology visualizer for nodes and shards, and a REST API client for sending HTTP requests and analyzing JSON responses. The tool distinguishes itself by providing a visual map of cluster topology to monitor data distribution and health. It includes a local proxy to enable administration of remote clusters that are not directly accessible and supports the injection of basic authentication headers for secure request handling. The platform covers
Vitess is a database clustering system for horizontal scaling of MySQL. It functions as a middleware layer that abstracts complex sharding and physical topology, allowing applications to interact with a distributed database environment through a unified interface. By intercepting and routing SQL queries across multiple shards, it enables large-scale data management while maintaining the appearance of a single database instance. The platform distinguishes itself through its ability to perform online schema migrations and distributed transaction coordination without requiring application downti
GreptimeDB is a distributed, open-source time-series database built for unified observability. It stores and queries metrics, logs, and traces together in a single columnar engine, supporting both SQL and PromQL for analysis. The database is designed as a Kubernetes-native operator with a decoupled compute and storage architecture, enabling horizontal scaling and multi-region deployment. What distinguishes GreptimeDB is its role as a multi-protocol ingestion gateway, accepting data through OpenTelemetry, Prometheus Remote Write, InfluxDB, Loki, Elasticsearch, Kafka, and MQTT protocols without
StatsD is a network-based metrics daemon and aggregator that collects application performance data, such as counters and timers, for periodic delivery to backend services. It functions as system monitoring middleware, receiving telemetry via UDP to minimize performance overhead on monitored services. The system acts as a distributed metrics router, employing consistent hashing to distribute data points across clusters and ensure aggregation accuracy. It includes cluster health monitoring to track node availability and automatically recalculate routing paths when services go offline. The proj
This project provides a comprehensive architectural blueprint and implementation set for building a platform-as-a-service on Kubernetes. It serves as a technical resource for deploying container orchestration environments, managing the full software development lifecycle, and integrating a complete DevOps toolchain. The implementation emphasizes automated software delivery through the integration of build and delivery pipelines, private container registries, and distributed configuration systems. It enables the decoupling of application settings from images via a centralized configuration man
Elasticsearch-HQ is a web-based management interface and orchestration tool for monitoring and administering Elasticsearch clusters. It provides a centralized application to manage multiple remote instances, facilitating cluster monitoring, index management, and API administration through a graphical user interface. The project functions as an API proxy and management suite, allowing users to execute REST API requests, track node performance and shard stability via real-time dashboards, and handle data maintenance tasks. It includes dedicated utilities for managing snapshot repositories and b
Olares is a comprehensive suite of self-hosted identity, storage, AI, and orchestration services designed for private infrastructure management. It functions as a Kubernetes home server orchestrator, enabling the deployment of containerized applications, AI models, and GPU resources on local hardware to replace third-party cloud services. The platform distinguishes itself through a combination of self-hosted AI infrastructure for running large language models and image generators, alongside a decentralized identity manager that uses cryptographic keys and OIDC for trustless authentication. It
KnowStreaming is a centralized Kafka cluster management platform that unifies multi-cluster federation, load balancing, disaster recovery, and resource governance through a web-based graphical interface. It provides a single control plane for administering brokers, topics, partitions, consumer groups, ACLs, and connectors across heterogeneous Kafka clusters without requiring CLI commands or agent deployment on brokers. The platform distinguishes itself through automated load balancing that redistributes partition leaders and replicas to eliminate hotspots and improve throughput, combined with
This is a Raft consensus library and distributed consensus engine implemented in Go. It provides the primitives necessary to build fault-tolerant distributed services by implementing a replicated state machine that ensures a group of servers agree on a shared system state through leader election and log replication. The project distinguishes itself through a pluggable architecture for storage backends and snapshot storage, decoupling the consensus logic from physical persistence. It includes specialized mechanisms for leadership transfer, protocol version management to support rolling upgrade
Flux is a Kubernetes GitOps delivery tool used to automate application deployments by synchronizing cluster state with configurations stored in Git, OCI, or Helm repositories. It functions as a set of controllers that monitor desired state in external sources and continuously reconcile the live cluster to match those definitions. The system distinguishes itself through a multi-cluster management plane that coordinates application delivery across fleets of remote clusters from a central hub. It provides a dedicated mechanism for automated image updates, which scans container registries for new
This is a learning collection of example projects that demonstrate core Spring Cloud patterns for building microservice architectures. The repository covers the fundamental building blocks of a microservices system, including service discovery through a central registry, centralized configuration management from Git or SVN repositories, API gateway-based request routing, circuit breaker patterns for fault tolerance, and distributed request tracing across service boundaries. The examples show how to implement service registration and dynamic discovery so that clients can locate microservices b