FastChat is a training and serving platform for large language models that provides an integrated toolkit for fine-tuning, hosting, and benchmarking chatbots. It functions as an inference server capable of hosting multiple models and exposing them via a standardized API for chat applications.
The platform distinguishes itself through a distributed model controller that manages worker nodes and routes requests across a hardware-agnostic inference layer supporting various accelerators. It includes a dedicated evaluation framework for assessing model quality using automated judges, multi-turn dialogue benchmarking, and side-by-side preference ranking for human-driven comparisons.
The system also covers model specialization through a fine-tuning toolkit that utilizes low-rank adaptation to reduce training memory requirements. For deployment and access, it provides an OpenAI-compatible REST API and a web interface for distributed user interactions, as well as a command line interface for local inference.