30 open-source projects similar to google/cadvisor, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best cAdvisor alternative.
VictoriaMetrics is a high-performance, scalable time series database and observability platform designed for long-term storage and analysis of metric, log, and trace data. It functions as a unified backend for monitoring ecosystems, offering full compatibility with industry-standard protocols and query languages. The system is built to handle massive data volumes through a distributed architecture that supports horizontal scaling and efficient data lifecycle management. The platform distinguishes itself through a storage engine that utilizes consistent hashing for data sharding and log-struct
Vector is a high-performance observability data pipeline designed to collect, transform, and route logs, metrics, and traces across distributed infrastructure. It functions as a modular engine that decouples data ingestion from processing and transmission, utilizing a component-based architecture to connect diverse sources to multiple destinations. The project distinguishes itself through a focus on reliability and flow control. It implements backpressure-aware data movement to prevent data loss during traffic spikes and utilizes disk-backed event buffering to ensure durability during network
Uptrace is an OpenTelemetry-based observability platform designed to collect, store, and analyze distributed traces, metrics, and logs. It functions as a centralized logging backend, a distributed tracing system, and a metrics engine to monitor application performance and system health. The platform is distinguished by AI-powered operational capabilities, allowing users to query telemetry data and manage monitoring dashboards using natural language. It specifically includes specialized monitoring for generative AI pipelines, tracking token usage and response quality for LLM interactions and r
Glances is a cross-platform system monitoring tool designed to track real-time resource usage and hardware health metrics across diverse computing environments. It functions as a command-line utility that provides a unified view of system performance, identifying bottlenecks and maintaining infrastructure stability through a consistent abstraction layer that translates kernel calls into actionable data. The project distinguishes itself through its distributed capabilities, offering a web-based interface that enables remote access to live performance metrics from any device without requiring d
Gatus is a service health monitoring tool and automated status page that tracks the availability and performance of endpoints. It functions as a multi-protocol uptime monitor, validating service health through response conditions, certificate expiration checks, and multi-step workflow executions. The system distinguishes itself by supporting a wide range of communication standards including HTTP, TCP, UDP, WebSocket, gRPC, and DNS. It enables the creation of developer-oriented dashboards that display real-time uptime, publish incident announcements, and generate dynamic uptime badges for exte
Telegraf is a modular, cross-platform telemetry pipeline designed to collect, process, and route metrics from diverse infrastructure, applications, and hardware. It functions as a server-side middleware that normalizes heterogeneous data into a unified format, enabling consistent monitoring across complex environments. By utilizing a plugin-driven architecture, the agent manages the entire lifecycle of telemetry data from initial ingestion to final transmission. The project distinguishes itself through a declarative, configuration-driven execution model that allows users to define complex dat
Prometheus is a comprehensive monitoring and alerting platform designed to track infrastructure health and application performance. It functions as a time series database that ingests, indexes, and queries high-frequency numerical data points. By utilizing a pull-based model, the system periodically collects multi-dimensional metrics from monitored targets, storing them in an optimized block storage format that supports high-throughput ingestion and efficient historical analysis. The platform distinguishes itself through a specialized query engine that enables real-time analysis of performanc
Quarkus is a Kubernetes-native Java framework designed for building high-performance, memory-efficient applications. It utilizes ahead-of-time native compilation to transform Java code into standalone, optimized binaries that eliminate the need for a virtual machine, enabling rapid startup and reduced memory consumption. By performing code augmentation during the build phase, it shifts heavy processing tasks away from runtime, ensuring that applications are optimized for cloud-native environments. The framework distinguishes itself through a unified approach to reactive and imperative program
Sysdig is a Linux system observability tool and kernel event analyzer designed for capturing and analyzing kernel-level system calls and operating system events. It functions as a system call tracer and container security monitor, providing deep visibility into the activity of machines, virtual machines, and containers. The project specializes in non-invasive container inspection, allowing for the monitoring of container activity and resource usage without modifying the container environment or adding instrumentation. It enables the recording of detailed system traces into binary files for re
LibreNMS is an SNMP network monitoring system and IT infrastructure management suite. It serves as an automated network discovery tool and infrastructure dashboard, enabling the identification and monitoring of network hardware and operating systems. The system differentiates itself through a rule-based alerting engine and a comprehensive IT incident workflow integration. It supports complex alert routing, including escalation sequences and direct ticket generation for project management and service desk platforms. Its observability capabilities cover multi-vendor hardware oversight, applica
OpenObserve is a unified observability data platform designed to ingest, store, and analyze logs, metrics, and traces. It functions as a cloud-native monitoring tool that centralizes telemetry from diverse sources, including standard collectors and cloud service providers, into a single, scalable system. By utilizing a columnar storage engine backed by object storage, the platform enables efficient long-term data retention and high-performance analytical querying. The platform distinguishes itself through deep integration with artificial intelligence, allowing users to query data using natura
Healthchecks is a heartbeat monitoring service and cron job monitoring tool designed to track the execution and success of scheduled tasks and systemd timers. It functions as a dead man switch, alerting users when expected periodic signals from remote processes fail to arrive. The system accepts health signals via HTTP and SMTP, allowing it to track infrastructure heartbeats from sources ranging from CI/CD workflows to network routers. It distinguishes itself by supporting the capture of diagnostic data, including exit codes and execution logs, and by calculating the duration between start an
Riemann is a Clojure-based event stream processor and real-time analytics engine. It functions as a network telemetry pipeline and extensible event router that ingests, transforms, and routes event data from distributed systems. The system uses a domain-specific language to compute metrics and statistical patterns over continuous streams, enabling network trend analysis and real-time alerting. It supports dynamic plugin loading from the classpath and allows for live configuration reloading without interrupting active event streams. Capabilities include centralized telemetry aggregation, even
all-in-one is a containerized deployment system designed to install and manage a complete suite of productivity and collaboration services. It functions as a cloud suite deployer that orchestrates the installation of a self-hosted content platform, incorporating necessary dependencies via Docker or Kubernetes. The project distinguishes itself by providing a web-based dashboard for orchestrating, updating, and monitoring the lifecycle of service containers. It also serves as a local AI inference server, enabling the execution of generative text models, image diffusion, and speech processing on
The Windows Exporter is a service that collects system, performance, and hardware metrics from Windows servers and exposes them via a text-based HTTP endpoint for Prometheus to scrape. It functions as a system metrics collector and service monitor designed to provide observability across Windows environments. The project utilizes a modular collector design that gathers data through Windows Management Instrumentation, native performance counters, and registry keys. It also includes a text-file metrics importer that allows user-defined or third-party business metrics to be read from local plain
Jaeger is a distributed tracing platform used for collecting, storing, and visualizing request flows across microservices. It identifies performance bottlenecks and errors by tracking requests as they move through multiple service boundaries. The system includes telemetry collectors, a multi-tenant backend, and a trace visualizer. The platform provides a multi-tenant tracing infrastructure that isolates data and queries by tenant to support shared environments. It supports standardized telemetry ingestion via the OpenTelemetry Protocol over gRPC and HTTP. To manage storage costs and overhead,
ExternalDNS is a controller that automatically synchronizes Kubernetes resource states with external DNS providers. It monitors cluster resources such as services, ingresses, and gateway APIs to dynamically create and update DNS records, enabling automated service discovery and external traffic management. The project features a provider-agnostic interface that supports a wide array of cloud-managed vendors and on-premises providers, as well as an extension system for custom providers via webhooks and sidecars. It implements a reconciliation loop that uses resource annotations and custom reso
HyperDX is an OpenTelemetry observability platform that provides centralized log management, distributed tracing, and a self-hosted monitoring stack. It functions as a unified system for collecting, indexing, and visualizing logs, metrics, and traces from cloud and container environments. The platform distinguishes itself with specialized tooling for large language model monitoring and session replay, allowing user interactions in the browser to be linked to backend telemetry. It employs schema-less JSON parsing to index structured logs dynamically and uses source maps to resolve minified sta
OSHI is a Java system information library and cross-platform hardware monitor used to extract real-time performance data and specifications from processors, memory, disks, network interfaces, and system firmware. It serves as an operating system metadata provider, querying system boot times, uptime, and detailed version information across various desktop and server distributions. The library integrates with observability pipelines by exporting system and process metrics to external monitoring backends using the Micrometer standard. It also supports connecting to vendor libraries to extract ad
SwanLab is an open-source machine learning experiment tracking platform and observability tool. It provides a centralized dashboard for logging training metrics, hyperparameters, and hardware performance to monitor and analyze AI model training runs. The platform is distinguished by its focus on self-hosted infrastructure, allowing users to deploy private instances via Docker or Kubernetes for secure on-premises data control. It also includes specialized utilities for migrating historical experiment logs and synchronizing real-time metrics from external tools like MLflow. The system covers a
Linkerd is a Kubernetes service mesh that manages network traffic between microservices. It functions as a transparent networking proxy, layer 7 traffic manager, and mutual TLS security layer, providing observability and reliability for service-to-service communication without requiring changes to application code. The project distinguishes itself through a sidecar-proxy architecture that intercepts TCP and application-level traffic to provide automatic mutual TLS encryption and identity verification. It enables cross-cluster service networking to link multiple clusters and implements cloud-n
Gatus is a multi-protocol health checker and automated service alerting tool. It provides a monitoring dashboard for tracking the uptime and health of HTTP, TCP, DNS, and gRPC endpoints, and serves as a Prometheus metrics exporter to track response times and success rates. The project distinguishes itself with a developer-oriented approach to status pages, securing administrative access and dashboards via OpenID Connect and Basic Authentication. It supports complex network environments through SSH tunneling to monitor internal services via bastion hosts and allows remote agents to push health
Sozu is a high-performance, memory-safe reverse proxy and load balancer built in Rust. It is designed to manage HTTP, TCP, and UDP traffic through a multi-process architecture that leverages isolated worker processes to ensure fault tolerance and efficient resource utilization across multi-core hardware. The project distinguishes itself through a focus on continuous availability and dynamic control. It features a unique binary hot-reloading mechanism and a Unix-socket-based control plane, allowing administrators to update proxy configurations, modify listener settings, and even replace the pr
Scanopy is a self-hosted infrastructure inventory and network discovery tool. It identifies hosts, services, and workloads across subnets to build a live model of network infrastructure, maintaining a searchable catalog of assets. The system features an interactive network topology visualizer that generates physical, logical, and application dependency diagrams. It maps the nesting chain from physical hardware and hypervisors down to virtual machines and containers, utilizing SNMP for hardware metadata and container APIs for workload discovery. The platform supports distributed network scann
Fluentd is a unified logging layer and distributed event router that collects, parses, and routes log data from diverse sources to various storage backends. It functions as a log forwarding agent and pipeline orchestrator, transforming raw unstructured log strings into formatted objects using structured log parsing. The project utilizes a plugin-based pipeline architecture to route data through independent input, filter, and output stages. It differentiates itself through tag-based event routing, which uses regular expression patterns to direct specific data streams to their intended destinat
The Intel Performance Counter Monitor is a set of specialized diagnostic tools designed for monitoring raw hardware events, memory latency, PCIe throughput, and processor power states on Intel architecture. The project provides dedicated utilities for measuring data throughput across sockets and PCIe buses, tracking power usage and sleep states to identify frequency throttling, and analyzing cache misses and memory access times. It also includes a hardware event profiler for querying raw core and uncore register events to monitor specific processor behaviors. Capabilities cover comprehensi
G-helper is a system utility designed for the management and optimization of ASUS laptop hardware. It functions as a background service that interfaces directly with kernel-level drivers and ACPI tables to provide granular control over device performance, thermal profiles, and power states. The utility distinguishes itself by offering automated firmware orchestration, which handles the retrieval and verification of manufacturer-signed updates to maintain system stability. It also provides specialized hardware control, including the ability to toggle between integrated and dedicated graphics m
Onyx is an enterprise-grade AI platform designed for knowledge management, search, and autonomous agent orchestration. It functions as a centralized system that aggregates unstructured organizational data, enabling secure, context-aware retrieval and interaction across internal documents and communication history. By integrating retrieval-augmented generation with multi-model orchestration, the platform provides a unified interface for teams to query internal knowledge bases and execute complex, multi-step business processes. The platform distinguishes itself through a focus on private infras