Open-source platforms for managing incident response schedules, alert routing, and automated on-call team notifications.
OneUptime is an open-source observability platform designed for monitoring service availability, infrastructure health, and application performance. It functions as a comprehensive system for tracking uptime and managing the end-to-end lifecycle of production incidents. The platform distinguishes itself through automated root cause analysis agents that identify failure triggers and generate code fixes via pull requests. It also provides branded public status pages to communicate real-time service availability and historical uptime data to end users. The system covers a broad range of operati
OneUptime is a comprehensive, self-hostable observability and incident management platform that includes native support for on-call scheduling, escalation policies, and multi-channel alert routing.
Keep is an open-source AIOps alert management platform that aggregates, deduplicates, and orchestrates the lifecycle of alerts from multiple monitoring tools. It functions as a multi-provider integration hub to centralize the flow of data between observability, ticketing, and communication tools. The platform distinguishes itself through incident workflow automation and AI-powered enrichment. It uses a declarative workflow engine to execute multi-step operational sequences and integrates large language models to summarize event data and correlate technical logs for faster incident resolution.
Keep is a self-hostable alert management and orchestration platform that handles alert routing and incident workflows. It fits the core requirements of an incident response system, although it focuses more on automation than on traditional on-call scheduling.
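To make the idea of aggregating and deduplicating alerts from several monitoring tools concrete, here is a minimal sketch in Python. It is not Keep's actual API or implementation: the function names `fingerprint` and `deduplicate`, the alert fields (`source`, `name`, `host`), and the hash-based fingerprint are all assumptions chosen for illustration.

```python
import hashlib


def fingerprint(alert: dict, fields=("source", "name", "host")) -> str:
    """Build a stable identity for an alert from a few chosen fields.

    The field names are hypothetical; a real tool would let you configure them.
    """
    key = "|".join(str(alert.get(f, "")) for f in fields)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def deduplicate(alerts: list) -> list:
    """Collapse alerts that share a fingerprint, counting the repeats."""
    seen = {}
    for alert in alerts:
        fp = fingerprint(alert)
        if fp in seen:
            seen[fp]["count"] += 1
        else:
            # Copy the first occurrence and start its repeat counter at 1.
            seen[fp] = {**alert, "count": 1}
    return list(seen.values())


# Illustrative input: the first two alerts differ only in their message.
incoming = [
    {"source": "prometheus", "name": "HighCPU", "host": "web-1", "msg": "cpu 91%"},
    {"source": "prometheus", "name": "HighCPU", "host": "web-1", "msg": "cpu 95%"},
    {"source": "grafana", "name": "DiskFull", "host": "db-1", "msg": "disk 99%"},
]
unique = deduplicate(incoming)
```

Fields left out of the fingerprint, such as `msg` here, do not create a new alert, so repeated firings with changing values collapse into one entry with a higher count.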
Alertmanager is a monitoring notification gateway and routing service that deduplicates, groups, and directs alerts to the correct receivers. It functions as a central manager for Prometheus alerts, using a hierarchical routing tree and label-based matchers to dispatch notifications to external services. The system employs a peer-to-peer mesh network to coordinate multiple instances in a high availability cluster, ensuring continuous alert processing. It features a dedicated inhibition engine and grouping mechanisms to reduce notification noise by suppressing redundant alerts when related iss
Alertmanager is a specialized notification routing and alert management engine designed to process alerts from monitoring systems, but it lacks the on-call scheduling and incident lifecycle management features required for a full incident response platform.
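The hierarchical routing tree with label-based matchers that Alertmanager is described as using can be illustrated with a simplified model in Python. This is a sketch under stated assumptions, not Alertmanager's code or configuration format: the `Route` class, the `resolve_receiver` function, and the receiver names are hypothetical, matchers are limited to exact equality, and the first matching child always wins.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    """One node of a routing tree (hypothetical, simplified model)."""
    receiver: str
    matchers: dict = field(default_factory=dict)  # label name -> required value
    children: list = field(default_factory=list)  # nested, more specific routes

    def matches(self, labels: dict) -> bool:
        # A route matches when every matcher equals the alert's label value.
        return all(labels.get(k) == v for k, v in self.matchers.items())


def resolve_receiver(route: Route, labels: dict) -> str:
    """Walk the tree depth-first and return the most specific receiver."""
    for child in route.children:
        if child.matches(labels):
            return resolve_receiver(child, labels)
    # No child matched, so this node handles the alert itself.
    return route.receiver


# Illustrative tree: a catch-all root with per-service child routes.
tree = Route(
    receiver="default-email",
    children=[
        Route(
            receiver="db-team",
            matchers={"service": "database"},
            children=[Route(receiver="db-pager", matchers={"severity": "critical"})],
        ),
        Route(receiver="web-team", matchers={"service": "frontend"}),
    ],
)

critical_db = resolve_receiver(tree, {"service": "database", "severity": "critical"})
warning_db = resolve_receiver(tree, {"service": "database", "severity": "warning"})
unknown = resolve_receiver(tree, {"service": "cache"})
```

In this model the root route has no matchers and acts as the fallback, so an alert whose labels match no child still reaches a receiver.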
dockprom is a monitoring stack based on Prometheus and Grafana designed to track the performance of Docker containers and their underlying hosts. It functions as a complete solution for gathering real-time metrics and displaying them through a self-hosted dashboard. The project includes a suite of tools for collecting container and host metrics, as well as a discovery tool specifically for automatically identifying and adding tagged EC2 instances to the monitoring configuration. The system covers several observability areas, including time-series data storage and the creation of performance
dockprom is a monitoring and observability stack for infrastructure metrics, not an incident response platform designed for on-call scheduling and human-centric escalation workflows.