Chitu is a distributed serving platform and orchestrator for large language model (LLM) inference. It acts as a compute manager that deploys and scales model workloads across GPUs, CPUs, and clusters that mix different hardware types.
Deployment targets include NVIDIA GPUs, regional chipsets, and legacy hardware. Running models across these environments increases the computing capacity available for inference and improves resource utilization.
Chitu supports distributed inference orchestration and scaling across heterogeneous hardware, so models can run on anything from a single device to a large production cluster. It also manages concurrent traffic and queues requests to stay stable under heavy load.
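To make the idea of concurrent traffic management and request queueing concrete, the following is a minimal Python sketch of one common approach: a cap on the number of requests running at once, plus a bounded waiting queue that rejects new requests when it is full. It is not Chitu's actual implementation or API. The names `RequestGate`, `QueueFullError`, `fake_inference`, and `demo`, and the parameters `max_concurrent` and `max_queued`, are all hypothetical and chosen only for illustration.

```python
import asyncio


class QueueFullError(RuntimeError):
    """Raised when the waiting queue is full (hypothetical name)."""


class RequestGate:
    """Illustrative admission controller, not Chitu's API.

    At most `max_concurrent` requests run at once; at most `max_queued`
    further requests may wait for a free slot. Anything beyond that is
    rejected immediately instead of piling up.
    """

    def __init__(self, max_concurrent: int, max_queued: int) -> None:
        self._slots = asyncio.Semaphore(max_concurrent)
        self._max_queued = max_queued
        self._queued = 0        # requests currently waiting for a slot
        self._running = 0       # requests currently executing
        self.peak_running = 0   # highest concurrency observed so far

    async def submit(self, handler, *args):
        # Shed load early when the waiting queue is already full.
        if self._queued >= self._max_queued:
            raise QueueFullError("too many waiting requests")
        self._queued += 1
        try:
            await self._slots.acquire()  # wait for a free execution slot
        finally:
            self._queued -= 1
        self._running += 1
        self.peak_running = max(self.peak_running, self._running)
        try:
            return await handler(*args)
        finally:
            self._running -= 1
            self._slots.release()


async def fake_inference(request_id: int) -> str:
    """Stand-in for a model call; a real backend would go here."""
    await asyncio.sleep(0.01)
    return f"done {request_id}"


async def demo():
    gate = RequestGate(max_concurrent=2, max_queued=2)
    # Five simultaneous requests: two run, two wait, one is rejected.
    results = await asyncio.gather(
        *(gate.submit(fake_inference, i) for i in range(5)),
        return_exceptions=True,
    )
    return gate, results


if __name__ == "__main__":
    gate, results = asyncio.run(demo())
    for item in results:
        print(repr(item))
    print("peak concurrency:", gate.peak_running)
```

Rejecting requests once the queue is full, rather than letting the queue grow without bound, is one way a serving layer can keep latency and memory use predictable under heavy load. A production system would likely add timeouts, priorities, and batching on top of this.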