Tools that dynamically adjust pod replicas based on real-time application performance and custom infrastructure metrics.
The Windows Exporter is a service that collects system, performance, and hardware metrics from Windows servers and exposes them via a text-based HTTP endpoint for Prometheus to scrape. It functions as a system metrics collector and service monitor designed to provide observability across Windows environments. The project utilizes a modular collector design that gathers data through Windows Management Instrumentation, native performance counters, and registry keys. It also includes a text-file metrics importer that allows user-defined or third-party business metrics to be read from local plain-text files and included in the exported stream. The exporter tracks the health and performance of background processes, software installations, and specialized server roles such as directory services and naming systems. It supports deployment within containerized environments to standardize monitoring across various operating system nodes. System and collector behaviors are managed through command-line arguments and configuration files.
This is a metrics collection agent for Windows servers rather than an autoscaling controller, though the metrics it exposes could serve as a data source for a Kubernetes-native autoscaler that scales on custom metrics.
llm-d is a distributed serving framework designed for large language model inference. It functions as an inference orchestrator and gateway, providing a control plane for deploying model replicas and managing hardware accelerators. The system includes a batch inference scheduler and a cache manager to coordinate request flow and memory utilization. The project is distinguished by a disaggregated serving architecture that separates the prefill and decode execution phases across specialized workers to maximize throughput. It employs a hardware-agnostic control plane and tiered cache offloading, moving memory blocks between GPU memory, host RAM, and shared storage to support long-context workloads. The framework also handles traffic management and scaling, including SLO-aware autoscaling, cache-affinity routing, and predictive latency scoring. It provides mechanisms for offline batch processing and high-availability scheduler management to balance interactive traffic with asynchronous workloads. The system exposes these capabilities via an OpenAI-compatible chat completion API.
This is a specialized inference orchestration framework for large language models rather than a general-purpose Kubernetes autoscaling controller, though it does include internal scaling logic for managing model replicas.
This project is a Go language library that provides a programmatic interface for interacting with the Kubernetes API server. It serves as a client for managing cluster resources, offering both typed interfaces for compile-time safety and dynamic interfaces for unstructured data and custom resource management. The library includes a controller framework designed for building event-driven automation. This framework utilizes informers to maintain local resource caches and rate-limited work queues to decouple event detection from state reconciliation. High availability is supported through a leader election tool that uses shared lease objects to ensure single-writer exclusivity. Beyond core API interaction, the project covers secure authentication via internal service tokens and pluggable external credential providers. It also provides utilities for server-side apply functionality, API capability discovery, and tools for mocking API responses during testing.
This is a foundational Go library for interacting with the Kubernetes API, providing the building blocks to create custom controllers rather than serving as a pre-built autoscaling controller itself.
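The controller pattern described above, in which event detection is decoupled from reconciliation by a rate-limited work queue, can be sketched conceptually as follows. This is a Python illustration of the idea only, not the library's Go API: `RateLimitedQueue`, `drain`, and the delay values are hypothetical, and a real controller would sleep for the returned delay before re-adding a key.

```python
from collections import deque


class RateLimitedQueue:
    """Deduplicating work queue with per-key exponential backoff (illustrative)."""

    def __init__(self, base_delay=0.5, max_delay=60.0):
        self._items = deque()
        self._pending = set()   # keys currently queued, used for deduplication
        self._failures = {}     # key -> consecutive failure count
        self.base_delay = base_delay
        self.max_delay = max_delay

    def add(self, key):
        # Many events for the same object collapse into a single queued key.
        if key not in self._pending:
            self._pending.add(key)
            self._items.append(key)

    def get(self):
        key = self._items.popleft()
        self._pending.discard(key)
        return key

    def backoff(self, key):
        """Record a failure and return the delay before the key should be retried."""
        failures = self._failures.get(key, 0)
        self._failures[key] = failures + 1
        return min(self.base_delay * (2 ** failures), self.max_delay)

    def forget(self, key):
        # Reset the failure count after a successful reconcile.
        self._failures.pop(key, None)

    def __len__(self):
        return len(self._items)


def drain(queue, reconcile):
    """Reconcile every queued key once; return {key: retry_delay} for failures."""
    retries = {}
    while len(queue):
        key = queue.get()
        try:
            reconcile(key)
            queue.forget(key)
        except Exception:
            retries[key] = queue.backoff(key)
    return retries
```

The event handler only calls `add` with an object key, so a burst of updates to the same object results in one reconcile, and a failing key is retried with a growing delay instead of blocking the others.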
Fission is a function-as-a-service platform and serverless framework for Kubernetes. It manages the lifecycle and execution of code snippets as serverless functions, providing an orchestrator that triggers these functions based on HTTP requests, message queues, or scheduled events. The platform features a cold-start optimized runtime that uses warm container pools and dynamic loaders to keep function start-up latency in the millisecond range. It includes a native autoscaler to adjust the number of function instances based on real-time traffic demand and supports canary release testing to split incoming traffic between different function versions. The system covers event-driven orchestration, automatic workload scaling, and runtime environment management. It also provides capabilities for monitoring system performance and provisioning local development clusters.
Fission is a serverless framework for running functions on Kubernetes rather than a general-purpose autoscaling controller designed to manage custom metrics for arbitrary workloads.
StatsD is a network-based metrics daemon and aggregator that collects application performance data, such as counters and timers, for periodic delivery to backend services. It functions as system monitoring middleware, receiving telemetry via UDP to minimize performance overhead on monitored services. The system acts as a distributed metrics router, employing consistent hashing to distribute data points across clusters and ensure aggregation accuracy. It includes cluster health monitoring to track node availability and automatically recalculate routing paths when services go offline. The project covers metrics aggregation and organization through the use of hierarchical namespaces. It also provides mechanisms for external metrics export, pushing aggregated system health and performance statistics to external monitoring services at defined intervals.
This is a metrics aggregation and routing daemon used for collecting telemetry, but it lacks the Kubernetes-native autoscaling controller logic required to trigger pod scaling based on those metrics.
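The consistent-hashing routing described above can be illustrated with a small hash ring. This is a minimal sketch of the general technique, assuming SHA-256 and 100 virtual nodes per instance; the `HashRing` class, the node addresses, and those parameters are hypothetical and do not reflect the daemon's actual implementation.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node):
        # Each node owns several points so load spreads more evenly.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        # Dropping a node removes only its own points from the ring.
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def lookup(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    ring = HashRing(["10.0.0.1:8125", "10.0.0.2:8125", "10.0.0.3:8125"])
    print(ring.lookup("app.checkout.latency"))
```

Because a metric name always hashes to the same point, every sample for that name reaches the same instance, which is what keeps per-metric aggregation accurate. When an instance is removed, only the names it owned are reassigned; the rest keep their previous owner.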
StatsD is a metrics aggregator and UDP collection server that collects system counters and timers. It functions as a time-series data forwarder, receiving high-frequency metric updates via a lightweight line protocol and summarizing them before flushing the data to a backend. The project features a pluggable metrics backend framework, allowing aggregated statistics to be routed to various third-party monitoring services or time-series databases such as Graphite. It supports horizontal scaling and high availability through a proxy ring distribution system that forwards incoming packets across a cluster of instances. The system provides capabilities for real-time metric aggregation, including event rate tracking, state gauge monitoring, and unique event counting. It performs statistical analysis on timing data using histogram-based sampling to calculate medians and percentiles for latency analysis. Metrics are organized using a hierarchical namespace structure to allow for logical grouping and filtering. The daemon can be deployed via a container image and is managed through a remote administration interface over TCP.
This is a metrics collection and aggregation server used to gather data, but it lacks the Kubernetes-native controller logic required to perform horizontal pod autoscaling based on those metrics.
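As an illustration of the line protocol and aggregation described above, the sketch below parses lines of the form `name:value|type` with an optional `|@rate` suffix and aggregates counters, gauges, timers, and sets. The `Aggregator` class and metric names are hypothetical, and the percentile uses a simple nearest-rank method over the sorted samples rather than whatever sampling the daemon itself performs.

```python
import math
from collections import defaultdict


def parse_line(line):
    """Parse 'name:value|type[|@rate]' into (name, value, type, rate)."""
    name, _, rest = line.strip().partition(":")
    fields = rest.split("|")
    if not name or len(fields) < 2:
        raise ValueError(f"malformed metric line: {line!r}")
    rate = 1.0
    for extra in fields[2:]:
        if extra.startswith("@"):
            rate = float(extra[1:])
    return name, fields[0], fields[1], rate


class Aggregator:
    """Accumulates metrics between flushes (illustrative, not the daemon's code)."""

    def __init__(self):
        self.counters = defaultdict(float)
        self.gauges = {}
        self.timers = defaultdict(list)
        self.sets = defaultdict(set)

    def ingest(self, line):
        name, value, metric_type, rate = parse_line(line)
        if metric_type == "c":
            # A sampled counter is scaled by 1/rate to estimate the true count.
            self.counters[name] += float(value) / rate
        elif metric_type == "g":
            if value[:1] in ("+", "-"):
                # A signed gauge value adjusts the current reading.
                self.gauges[name] = self.gauges.get(name, 0.0) + float(value)
            else:
                self.gauges[name] = float(value)
        elif metric_type == "ms":
            self.timers[name].append(float(value))
        elif metric_type == "s":
            # Sets count unique values, so the raw string is stored.
            self.sets[name].add(value)
        else:
            raise ValueError(f"unknown metric type: {metric_type!r}")


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```

A flush step would then read these structures at each interval, compute statistics such as `percentile(agg.timers[name], 90)`, forward them to a backend, and reset the counters and timers.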