30 open-source projects similar to prometheus/alertmanager, ranked by the number of features they share with it. Compare stars, activity, and what each project does to find the best Alertmanager alternative.
VictoriaMetrics is a high-performance, scalable time series database and observability platform designed for long-term storage and analysis of metric, log, and trace data. It functions as a unified backend for monitoring ecosystems, offering full compatibility with industry-standard protocols and query languages. The system is built to handle massive data volumes through a distributed architecture that supports horizontal scaling and efficient data lifecycle management. The platform distinguishes itself through a storage engine that utilizes consistent hashing for data sharding and log-struct
Keep is an open-source AIOps alert management platform that aggregates, deduplicates, and orchestrates the lifecycle of alerts from multiple monitoring tools. It functions as a multi-provider integration hub to centralize the flow of data between observability, ticketing, and communication tools. The platform distinguishes itself through incident workflow automation and AI-powered enrichment. It uses a declarative workflow engine to execute multi-step operational sequences and integrates large language models to summarize event data and correlate technical logs for faster incident resolution.
Nightingale is a Prometheus-compatible monitoring and alerting platform designed to centralize telemetry management across multiple time-series databases. It functions as a multi-source alerting engine and metric data pipeline that ingests telemetry via remote write protocols and triggers alarms based on data from sources such as Prometheus, Elasticsearch, Loki, and ClickHouse. The system is distinguished by its automated alert healing system, which executes predefined scripts and RPC-based corrective actions when monitoring thresholds are breached. It supports distributed alert processing, a
HertzBeat is an agentless monitoring platform designed to collect performance metrics from network devices, databases, and servers without requiring client software. It functions as an infrastructure monitoring dashboard, an alert management system, and a centralized log aggregator using the OpenTelemetry Protocol. The system utilizes a cloud-edge collection hierarchy to scale data gathering across clusters and isolated networks. It distinguishes itself with a flexible extensibility model, allowing users to define new monitoring workflows through configuration-based metric templates and custo
This project is a comprehensive educational resource and curriculum focused on site reliability engineering, distributed systems, and infrastructure operations. It provides technical guides, a systems engineering course, and instructional manuals designed to teach the principles of managing large-scale computing environments. The curriculum covers high-level architectural design for scalability and resilience, including fault-tolerant infrastructure, high-availability patterns, and microservices decomposition. It emphasizes the practical application of site reliability engineering through the
Cortex is an open-source, horizontally scalable metrics platform that ingests, stores, and queries Prometheus-compatible time-series data with multi-tenant isolation. It accepts metrics via Prometheus remote write and OpenTelemetry, executes PromQL queries against both recent and historical data, and provides a Prometheus-compatible alerting and recording rule engine with an integrated Alertmanager. The system is built as a set of independently scalable microservices that use hash-ring-based sharding, gossip-based cluster membership, and tenant-aware object storage to distribute workloads acro
CrowdSec is a collaborative, distributed security engine designed for threat detection and infrastructure protection. It functions as an intrusion detection system that parses logs and network traffic to identify malicious patterns, utilizing a bucket-based threshold detection model to aggregate events and trigger alerts. The platform is built on a modular architecture that includes a centralized local API server for managing security signals and a relational database for persistent storage of remediation decisions. What distinguishes the project is its decoupled enforcement model, which offl
Subfinder is a security reconnaissance framework designed for subdomain enumeration and attack surface management. It functions as a discovery engine that identifies and maps internet-exposed infrastructure, cloud-hosted assets, and network ranges to maintain a comprehensive inventory of an organization's digital footprint. The project distinguishes itself through a modular, template-driven scanning engine that executes security checks against discovered assets. It leverages cloud-native asset discovery to query provider APIs and infrastructure metadata, while supporting distributed agent orc
Checkmate is an open-source, self-hosted tool that tracks and monitors server hardware, uptime, response times, and incidents in real time, with visualizations of the collected data. Community Discord: https://discord.com/invite/NAb6H3UTjK
Tautulli is a monitoring tool and administration interface for Plex Media Servers. It tracks real-time streaming activity, maintains detailed playback histories, and provides a centralized dashboard for server analytics. The project distinguishes itself through an event-driven notification system that triggers custom scripts and alerts based on server activity. It includes a template-based engine for generating periodic newsletters and utilizes webhooks to dispatch alerts to third-party services. The software covers broad capability areas including media library auditing, usage trend analysi
Uptrace is an OpenTelemetry-based observability platform designed to collect, store, and analyze distributed traces, metrics, and logs. It functions as a centralized logging backend, a distributed tracing system, and a metrics engine to monitor application performance and system health. The platform is distinguished by AI-powered operational capabilities, allowing users to query telemetry data and manage monitoring dashboards using natural language. It specifically includes specialized monitoring for generative AI pipelines, tracking token usage and response quality for LLM interactions and r
Prometheus is a comprehensive monitoring and alerting platform designed to track infrastructure health and application performance. It functions as a time series database that ingests, indexes, and queries high-frequency numerical data points. By utilizing a pull-based model, the system periodically collects multi-dimensional metrics from monitored targets, storing them in an optimized block storage format that supports high-throughput ingestion and efficient historical analysis. The platform distinguishes itself through a specialized query engine that enables real-time analysis of performanc
bililive-go is an automated broadcast archivist and recording tool designed specifically for Bilibili live streams. It functions as a monitoring service that tracks broadcast status and automatically captures live video content to local storage based on target identifiers. The system features a web-based manager that allows for the remote configuration of recording targets and global settings via a browser interface. It supports the simultaneous recording of multiple streams and provides real-time status alerts through external messaging services when broadcasts start, end, or encounter error
dockprom is a monitoring stack based on Prometheus and Grafana designed to track the performance of Docker containers and their underlying hosts. It functions as a complete solution for gathering real-time metrics and displaying them through a self-hosted dashboard. The project includes a suite of tools for collecting container and host metrics, as well as a discovery tool specifically for automatically identifying and adding tagged EC2 instances to the monitoring configuration. The system covers several observability areas, including time-series data storage and the creation of performance
SigNoz is a full-stack observability platform designed to collect, store, and visualize metrics, logs, and distributed traces in a unified environment. It leverages OpenTelemetry-based data collection to ingest telemetry from diverse sources using vendor-neutral protocols, ensuring interoperability across complex microservices architectures. The platform utilizes a high-performance columnar storage engine to enable rapid aggregation and filtering, providing a centralized backend for monitoring application health and performance. What distinguishes the platform is its focus on automated instru
HertzBeat is a real-time observability platform that provides agentless monitoring for servers, databases, and networks. It functions as an infrastructure alerting manager, an OpenTelemetry Protocol log aggregator, and a public status page generator. The platform integrates an analysis engine that uses large language models to process monitoring data and generate system insights. It utilizes a cloud-edge collaborative architecture and distributed collector clustering to scale data gathering across large-scale networks. The system covers a broad range of observability capabilities, including
This project is a detection-as-code framework providing a library of security monitoring rules and predefined detection content for Elasticsearch data indices. It serves as a threat detection rule library designed to identify malicious activity and attack patterns across diverse data streams in cloud and on-premises environments. The framework implements a detection engineering workflow where rules are defined in YAML and managed as versioned code. It includes a set of command-line utilities for automated rule deployment, metadata searching, and template generation, supported by a Python-base
Akka.NET is an actor model framework used for building concurrent and distributed applications. It functions as a distributed computing platform and state manager that enables isolated actors to communicate via asynchronous message passing, ensuring thread-safe state management without manual locks. The project is distinguished by its decentralized coordination capabilities, including a distributed state manager that uses sharding and dynamic rebalancing to maintain high availability. It incorporates an event sourcing engine that persists state as a sequence of events in an append-only log an
Akka Core is an actor model framework and asynchronous concurrency library used for building scalable and resilient distributed systems. It provides a distributed computing platform and fault-tolerant runtime that manages communication and state across networked nodes. The system uses location-transparent messaging and a cluster management system to organize nodes into high-availability architectures. This allows for the creation of elastic clusters that scale resources on demand and coordinate distributed workloads. The platform handles concurrent state management and distributed systems or
Dunst is a lightweight notification daemon for Linux desktop environments that receives and displays system alerts via the DBus protocol. It functions as a configurable alert service and notification manager, rendering pop-up messages for X11 and Wayland. The project distinguishes itself through a pattern-based rule engine that allows for dynamic alert filtering and conditional modifications of visual styles or behavioral settings based on the sending application or category. It also supports notification workflow automation by triggering external scripts and system commands when specific not
Velociraptor is a digital forensics and incident response platform, endpoint detection and response system, and visibility tool. It provides a query engine and remote forensic collector used to hunt for indicators of compromise and perform triage across a fleet of hosts. The system is distinguished by its specialized query language for interrogating host state and parsing binary files. It features a notebook environment that combines markdown documentation with executable query cells to standardize investigative workflows and enable collaborative reporting. The platform covers a wide range o
DataHub is a metadata management platform designed to unify technical, operational, and business context across diverse data ecosystems. By utilizing a graph-based metadata model and an event-driven ingestion architecture, it creates a centralized source of truth that maps complex data relationships, lineage, and ownership. This foundational framework enables organizations to maintain a synchronized view of their data landscape, supporting both human-led discovery and automated data operations. The platform distinguishes itself through its focus on grounding artificial intelligence and autono
kube-prometheus is a monitoring stack deployment and orchestration framework. It uses an operator pattern to automate the installation and lifecycle management of Prometheus and Alertmanager via custom resource definitions. The project focuses on scaling data collection through hash-based target sharding and topology-aware distribution to reduce cross-zone traffic. It implements a sidecar-based configuration reloading mechanism and utilizes consistent hashing to distribute scrape targets across multiple instances. The system covers broad observability capabilities including metric data colle
OpenObserve is a unified observability data platform designed to ingest, store, and analyze logs, metrics, and traces. It functions as a cloud-native monitoring tool that centralizes telemetry from diverse sources, including standard collectors and cloud service providers, into a single, scalable system. By utilizing a columnar storage engine backed by object storage, the platform enables efficient long-term data retention and high-performance analytical querying. The platform distinguishes itself through deep integration with artificial intelligence, allowing users to query data using natura
Flux is a Kubernetes GitOps delivery tool used to automate application deployments by synchronizing cluster state with configurations stored in Git, OCI, or Helm repositories. It functions as a set of controllers that monitor desired state in external sources and continuously reconcile the live cluster to match those definitions. The system distinguishes itself through a multi-cluster management plane that coordinates application delivery across fleets of remote clusters from a central hub. It provides a dedicated mechanism for automated image updates, which scans container registries for new
HyperDX is an OpenTelemetry observability platform that provides centralized log management, distributed tracing, and a self-hosted monitoring stack. It functions as a unified system for collecting, indexing, and visualizing logs, metrics, and traces from cloud and container environments. The platform distinguishes itself with specialized tooling for large language model monitoring and session replay, allowing user interactions in the browser to be linked to backend telemetry. It employs schema-less JSON parsing to index structured logs dynamically and uses source maps to resolve minified sta
TacticalRMM is a remote monitoring and management platform designed for overseeing endpoints and automating IT administration. It functions as an endpoint management tool and IT automation framework, providing a centralized dashboard for executing scripts, monitoring system health, and managing remote devices across multiple tenants. The platform distinguishes itself through a comprehensive remote administration suite that includes real-time shell access, remote file management, and registry editing. It integrates with third-party remote desktop software and provides a hierarchical policy inh
Flagger is a Kubernetes operator designed to automate the lifecycle of application deployments through progressive delivery. It functions as a controller that monitors custom resource definitions to orchestrate complex release strategies, including canary, blue/green, and A/B testing. By continuously reconciling the desired cluster state with the actual environment, it ensures that deployments adhere to defined specifications while managing the underlying infrastructure required for traffic routing. The project distinguishes itself through a sophisticated metric-driven analysis loop that eval
Bullet is an Active Record performance monitor and query profiler for Ruby on Rails applications. It serves as a diagnostic utility to identify inefficient database access patterns, flag redundant requests, and suggest eager loading strategies to improve response times. The tool specifically detects N+1 queries, missing counter caches, and unused eager loading. It monitors these patterns across both standard web requests and background jobs, identifying records that are fetched but never accessed to reduce memory usage and query overhead. Analysis is supported by a system that intercepts dat
ElastAlert is an alerting framework and query monitor for Elasticsearch. It functions as a real-time log monitoring tool and event notification engine that scans indices for specific patterns to trigger automated alerts when predefined rules are matched. The system distinguishes itself through specialized detection logic, including event spike detection, event frequency monitoring, field change tracking, and the identification of new terms within data fields. It handles notification noise via stateful alert suppression to prevent redundant messages and provides time-windowed aggregation to gr
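The event frequency monitoring and stateful alert suppression mentioned for ElastAlert can be illustrated with a small, generic sketch. This is not ElastAlert's code or configuration format: the `FrequencyRule` class and its `threshold`, `window`, and `suppress_for` parameters are hypothetical names chosen for illustration, and timestamps are plain seconds rather than Elasticsearch documents.

```python
from collections import deque


class FrequencyRule:
    """Hypothetical rule: match when at least `threshold` events fall inside
    a sliding `window` (seconds), then stay silent for `suppress_for` seconds."""

    def __init__(self, threshold, window, suppress_for):
        self.threshold = threshold
        self.window = window
        self.suppress_for = suppress_for
        self.events = deque()        # timestamps still inside the window
        self.silenced_until = None   # end of the active suppression period

    def observe(self, ts):
        """Record one event at time `ts`; return True if an alert is sent."""
        self.events.append(ts)
        # Drop timestamps that have slid out of the window.
        while self.events and self.events[0] <= ts - self.window:
            self.events.popleft()
        if len(self.events) < self.threshold:
            return False
        if self.silenced_until is not None and ts < self.silenced_until:
            return False  # the rule matched, but the alert is suppressed
        self.silenced_until = ts + self.suppress_for
        return True


rule = FrequencyRule(threshold=3, window=60, suppress_for=300)
fired = [ts for ts in (0, 10, 20, 30, 40, 400, 410, 420) if rule.observe(ts)]
print(fired)  # [20, 420]
```

In this run the events at 30 and 40 seconds still satisfy the frequency condition, but they fall inside the suppression period opened at 20 seconds, so only two alerts are sent. Keeping the suppression state separate from the match logic is one simple way to cut redundant notifications without losing the underlying events.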