1 repo
Systems that support the execution of user-defined or custom model architectures via optimized backends.
Category: Artificial Intelligence & ML · Custom Model Execution Engines.
vLLM is a high-throughput inference engine designed for the efficient serving and execution of large language models. It functions as a production-ready distributed model server, providing standard API protocols for online serving while also supporting offline batch processing. The system is built to maximize token gen
Executes custom model architectures using highly optimized native implementations, with support for various data formats.
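As a rough illustration of the online serving described in the vLLM entry above, the sketch below builds a completion request for a vLLM server's OpenAI-compatible HTTP API using only the Python standard library. It assumes a server is already running locally on port 8000; the model name `my-model` and the helper names `build_completion_request` and `complete` are hypothetical placeholders, not part of vLLM.

```python
import json
import urllib.request

# Assumption: a vLLM server is already running locally and exposes its
# OpenAI-compatible API under /v1 on port 8000.
BASE_URL = "http://localhost:8000/v1"
MODEL_NAME = "my-model"  # hypothetical placeholder; use the model you actually serve


def build_completion_request(prompt, max_tokens=64, base_url=BASE_URL, model=MODEL_NAME):
    """Build (but do not send) a POST request for the /completions endpoint."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the first generated text (needs a live server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running, `complete("Hello, my name is")` would return the generated continuation. Keeping request construction separate from sending makes the payload easy to inspect without network access.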