What are the best open-source alternatives to TensorRT LLM?

30 open-source projects similar to nvidia/tensorrt-llm, ranked by shared features. Top picks: vllm-project/vllm, sgl-project/sglang, geeeekexplorer/nano-vllm, zhaochenyang20/awesome-ml-sys-tutorial, ggerganov/llama.cpp, lyogavin/airllm, pytorch/examples, internlm/lmdeploy, huggingface/text-generation-inference, bentoml/openllm.

Is vllm-project/vllm a good alternative to TensorRT LLM?

vLLM is a high-throughput inference engine designed for the efficient serving and execution of large language models. It functions as a production-ready distributed model server, providing standard API protocols for online serving while also supporting offline batch processing. The system is built…

Is sgl-project/sglang a good alternative to TensorRT LLM?

SGLang is a high-performance inference engine and serving system designed for large language and multimodal models. It provides a programmable interface for orchestrating complex generation workflows, enabling developers to coordinate multi-turn dialogues, tool invocations, and reasoning chains thr…

Is geeeekexplorer/nano-vllm a good alternative to TensorRT LLM?

Nano-vLLM is a high-performance inference engine designed for executing large language models locally. It functions as a specialized runtime that prioritizes accelerated token generation and efficient hardware utilization for text generation tasks. The project distinguishes itself through a compre…

Is zhaochenyang20/awesome-ml-sys-tutorial a good alternative to TensorRT LLM?

This project provides a comprehensive technical guide and framework for engineering large-scale machine learning systems. It covers the full lifecycle of model development, focusing on the infrastructure and computational principles required to build, train, and serve generative AI models across di…

Is ggerganov/llama.cpp a good alternative to TensorRT LLM?

llama.cpp is a high-performance C++ inference engine and runtime for executing large language models locally across various hardware architectures. It provides the core components for local model execution, including a dedicated model quantizer for compressing weights into the GGUF format and a sys…

Is lyogavin/airllm a good alternative to TensorRT LLM?

AirLLM is a framework designed to execute and fine-tune large language models on consumer-grade hardware. By employing layer-wise model decomposition and memory-efficient loading techniques, the engine enables the operation of massive models that would otherwise exceed available system or video mem…

Is pytorch/examples a good alternative to TensorRT LLM?

This repository serves as a comprehensive collection of reference implementations for the PyTorch machine learning library. It provides practical examples for building, training, and deploying deep learning models, functioning as a toolkit for developers to explore neural network architectures and…

Is internlm/lmdeploy a good alternative to TensorRT LLM?

LMDeploy is a high-performance inference engine and deployment framework for large language models and vision models. It functions as a multi-modal model server and compression toolkit designed to serve models with high throughput and low latency. The system enables the distribution of model servi…

Is huggingface/text-generation-inference a good alternative to TensorRT LLM?

Text Generation Inference is a production-ready engine designed for the deployment and serving of large language models. It functions as a containerized runtime environment that manages model execution, scales across distributed hardware, and provides high-performance inference capabilities for dem…

Is bentoml/openllm a good alternative to TensorRT LLM?

OpenLLM is a framework for deploying, managing, and scaling open-source large language models.


Open-source alternatives to TensorRT LLM

30 open-source projects similar to nvidia/tensorrt-llm, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best TensorRT LLM alternative.

vllm-project/vllm
83,048 stars
vLLM is a high-throughput inference engine designed for the efficient serving and execution of large language models. It functions as a production-ready distributed model server, providing standard API protocols for online serving while also supporting offline batch processing. The system is built to maximize token generation speed and memory efficiency, enabling both large-scale cloud deployments and local execution on personal hardware. The project distinguishes itself through advanced memory management and request scheduling techniques, most notably its use of non-contiguous key-value cach…
Python, amd, blackwell, cuda
sgl-project/sglang
29,079 stars
SGLang is a high-performance inference engine and serving system designed for large language and multimodal models. It provides a programmable interface for orchestrating complex generation workflows, enabling developers to coordinate multi-turn dialogues, tool invocations, and reasoning chains through a domain-specific language. The platform is built to support production-scale deployments, offering an OpenAI-compatible API that allows for integration with existing application ecosystems. The system distinguishes itself through a disaggregated architecture that separates compute-intensive pr…
Python, attention, blackwell, cuda
GeeeekExplorer/nano-vllm
11,745 stars
Nano-vLLM is a high-performance inference engine designed for executing large language models locally. It functions as a specialized runtime that prioritizes accelerated token generation and efficient hardware utilization for text generation tasks. The project distinguishes itself through a comprehensive suite of optimization techniques, including a graph compilation engine that transforms neural network operations into pre-compiled execution plans. It also incorporates a tensor parallelism framework to distribute model weights across multiple hardware accelerators, effectively reducing memor…
Python, deep-learning, inference, llm

Open-source alternatives to TensorRT LLM

vllm-project/vllm

sgl-project/sglang

GeeeekExplorer/nano-vllm

zhaochenyang20/Awesome-ML-SYS-Tutorial

ggerganov/llama.cpp

lyogavin/airllm

pytorch/examples

InternLM/lmdeploy

huggingface/text-generation-inference

bentoml/OpenLLM

Michael-A-Kuykendall/shimmy

mudler/LocalAI

ollama/ollama

alibaba/MNN

FMInference/FlexGen

sgl-project/mini-sglang

thu-pacman/chitu

kvcache-ai/ktransformers

oobabooga/text-generation-webui

FlowiseAI/Flowise

jmorganca/ollama

ggml-org/llama.cpp

BerriAI/litellm

janhq/jan

langchain-ai/langchain

ggml-org/whisper.cpp

karpathy/autoresearch

intel/neural-compressor

huggingface/peft

NVIDIA/TensorRT