Specialized model structures that deviate from standard off-the-shelf architectures to meet unique inference requirements.
1 GitHub repository matches Artificial Intelligence & ML · Custom Model Architectures.
vLLM is a high-throughput inference engine designed for the efficient serving and execution of large language models. It functions as a production-ready distributed model server, providing standard API protocols for online serving while also supporting offline batch processing. The system is built to maximize token gen