1 repo
Services that expose generative model capabilities over network protocols for integration into external applications.
Explore 1 awesome GitHub repository matching artificial intelligence & ml · Distributed Model Servers. Refine with filters or upvote what's useful.
vLLM is a high-throughput inference engine designed for the efficient serving and execution of large language models. It functions as a production-ready distributed model server, providing standard API protocols for online serving while also supporting offline batch processing. The system is built to maximize token gen
Exposes generative model capabilities through standard network protocols for integration into external applications and chat interfaces.