Cortex is a Kubernetes-based machine learning infrastructure platform designed for deploying, scaling, and managing models and workloads. It functions as a serverless inference engine and GPU cluster orchestrator, providing the tools necessary to execute real-time, asynchronous, and batch model predictions.
The platform utilizes declarative infrastructure-as-code for provisioning model clusters and environments. It optimizes operational costs by elastically scaling CPU and GPU resources through the use of spot instances.
The system covers a broad set of operational capabilities, including workload orchestration, private cloud network isolation with integrated identity management, and observability pipelines that stream logs and performance metrics to external monitoring tools.