Cortex is an open-source, horizontally scalable metrics platform that ingests, stores, and queries Prometheus-compatible time-series data with multi-tenant isolation. It accepts metrics via Prometheus remote write and OpenTelemetry, executes PromQL queries against both recent and historical data, and provides a Prometheus-compatible alerting and recording rule engine with an integrated Alertmanager. The system is built as a set of independently scalable microservices that use hash-ring-based sharding, gossip-based cluster membership, and tenant-aware object storage to distribute workloads across a cluster.
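To make the hash-ring-based sharding mentioned above more concrete, here is a minimal sketch of a consistent-hash ring that maps a tenant and series to a set of replicas. It is an illustration only, not Cortex's implementation: the `HashRing` class, the `token_for` helper, the token count, the instance names, and the choice of SHA-256 are all assumptions made for this example.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Map a string to a 32-bit position on the ring (hash choice is illustrative)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    """Toy consistent-hash ring in which each instance owns several tokens."""

    def __init__(self, instances, tokens_per_instance=64):
        # Every instance registers several tokens so load spreads more evenly.
        self._ring = sorted(
            (token_for(f"{name}-{i}"), name)
            for name in instances
            for i in range(tokens_per_instance)
        )
        self._tokens = [token for token, _ in self._ring]

    def owners(self, tenant: str, series: str, replicas: int = 3):
        """Walk clockwise from the key's token, collecting distinct instances."""
        # Including the tenant in the key keeps placement tenant-aware.
        start = bisect.bisect_right(self._tokens, token_for(f"{tenant}/{series}"))
        found = []
        for offset in range(len(self._ring)):
            _, name = self._ring[(start + offset) % len(self._ring)]
            if name not in found:
                found.append(name)
                if len(found) == replicas:
                    break
        return found


# Hypothetical instance names, for illustration only.
ring = HashRing(["ingester-1", "ingester-2", "ingester-3", "ingester-4"])
owners = ring.owners("team-a", 'http_requests_total{job="api"}')
print(owners)
```

Because placement depends only on the tokens and the key, every component that sees the same ring state computes the same owners without a central coordinator.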
Cortex distinguishes itself through its multi-tenant architecture, which isolates data, queries, and alerts for independent teams or customers within a single cluster using shuffle sharding and per-tenant resource limits. It supports long-term metrics storage on low-cost object storage backends such as Amazon S3, Google Cloud Storage (GCS), and Azure Blob Storage, with block compaction and deduplication to improve storage efficiency and query performance. The platform offers a migration path from the chunks storage engine to the blocks storage engine without downtime, and provides zone-aware replication for fault tolerance across availability zones.
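The idea behind shuffle sharding can be sketched in a few lines: each tenant is assigned a stable, pseudo-random subset of instances, so a misbehaving tenant affects only the instances in its own shard. The sketch below is a simplification under stated assumptions, not how Cortex selects shards internally: the `shuffle_shard` function, the seeding scheme, and the rule that a non-positive shard size means "use every instance" are choices made for this example.

```python
import hashlib
import random


def shuffle_shard(tenant: str, instances: list[str], shard_size: int) -> list[str]:
    """Pick a stable pseudo-random subset of instances for one tenant."""
    # In this sketch, a non-positive or oversized shard means "use all instances".
    if shard_size <= 0 or shard_size >= len(instances):
        return sorted(instances)
    # Seed the generator from the tenant ID so the choice is repeatable.
    seed = int.from_bytes(hashlib.sha256(tenant.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    # Sort first so the result does not depend on the input ordering.
    return sorted(rng.sample(sorted(instances), shard_size))


# Hypothetical instance and tenant names, for illustration only.
instances = [f"ingester-{i}" for i in range(1, 9)]
shard_a = shuffle_shard("team-a", instances, 3)
shard_b = shuffle_shard("team-b", instances, 3)
print("team-a:", shard_a)
print("team-b:", shard_b)
print("shared:", sorted(set(shard_a) & set(shard_b)))
```

With 8 instances and shards of 3, two tenants rarely share their entire shard, which is what limits the blast radius of a single noisy tenant.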
The system includes an HTTP API for metric ingestion, PromQL querying, alert and rule management, and per-tenant configuration overrides that can be applied at runtime without restarting components. It supports caching at multiple levels (metadata, indexes, chunks, and query results) using Memcached or Redis to accelerate query execution. Cortex also provides operational tooling for safe ingester scaling, rolling updates, and cluster capacity planning based on active series counts and retention periods.
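As an illustration of tenant-scoped access to the query API, the sketch below builds (but does not send) an instant PromQL query request. It assumes the tenant is identified by the `X-Scope-OrgID` request header and that the query endpoint is served under the `/prometheus` prefix; both should be checked against the actual deployment. The base URL, tenant name, and `build_instant_query` helper are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_instant_query(base_url: str, tenant: str, promql: str) -> Request:
    """Build a tenant-scoped instant query request without sending it."""
    query_string = urlencode({"query": promql})
    # Assumed path: Prometheus-compatible query API under the /prometheus prefix.
    url = f"{base_url.rstrip('/')}/prometheus/api/v1/query?{query_string}"
    # Assumed tenant header; requests for different tenants differ only here.
    return Request(url, headers={"X-Scope-OrgID": tenant}, method="GET")


# Hypothetical endpoint and tenant, for illustration only.
req = build_instant_query(
    "http://cortex.example.internal:9009",
    "team-a",
    "sum(rate(http_requests_total[5m]))",
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would execute it against a running cluster; the same PromQL sent with a different tenant header would be evaluated against that tenant's data only.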
Configuration is managed through YAML files, CLI flags, and runtime overrides, with support for environment variable injection and Kubernetes-based declarative management.