What are the best open-source alternatives to LightLLM?

30 open-source projects similar to modeltc/lightllm, ranked by shared features. Top picks: sgl-project/sglang, predibase/lorax, ai-dynamo/dynamo, EricLBuehler/mistral.rs, intel/ipex-llm, kserve/kserve, sgl-project/mini-sglang, zhaochenyang20/Awesome-ML-SYS-Tutorial, dusty-nv/jetson-inference, openvinotoolkit/openvino.

Is sgl-project/sglang a good alternative to LightLLM?

SGLang is a high-performance inference engine and serving system designed for large language and multimodal models. It provides a programmable interface for orchestrating complex generation workflows, enabling developers to coordinate multi-turn dialogues, tool invocations, and reasoning chains thr…

Is predibase/lorax a good alternative to LightLLM?

LoRAX is a GPU-accelerated inference server and multi-adapter engine designed for serving large language models. It functions as a high-throughput system capable of deploying models via Kubernetes and managing the dynamic swapping of Low-Rank Adaptation adapters per request. The server distinguish…

Is ai-dynamo/dynamo a good alternative to LightLLM?

Dynamo is a distributed inference orchestration platform designed for large language models. It functions as a system to coordinate prefill and decode phases across GPU nodes, utilizing a multi-backend runtime adapter to connect engines like vLLM and TensorRT-LLM through a unified block-oriented me…

Is EricLBuehler/mistral.rs a good alternative to LightLLM?

mistral.rs is an inference engine for large language models that runs locally and exposes models behind OpenAI and Anthropic-compatible APIs. It serves as a multi-model serving platform, capable of loading several models in a single server process with per-request routing and on-demand loading and…

Is intel/ipex-llm a good alternative to LightLLM?

Intel XPU LLM Acceleration Library is a toolkit designed to accelerate large language model inference and finetuning on Intel CPUs, GPUs, and NPUs. It provides a distributed inference engine for scaling models across multiple accelerators, a multimodal model runtime for vision and speech tasks, and…

Is kserve/kserve a good alternative to LightLLM?

KServe is a Kubernetes-native platform for deploying and serving machine learning models as scalable inference services. It supports both generative AI models, including large language models, and traditional predictive models from frameworks such as TensorFlow, PyTorch, Scikit-Learn, XGBoost, and…

Is sgl-project/mini-sglang a good alternative to LightLLM?

mini-sglang is a collection of tools for large language model inference, serving as an OpenAI-compatible inference server, a memory-efficient prefill engine, and a tensor parallelism runtime. It also functions as a local batch processing engine for offline benchmarking and ablation studies. The pr…

Is zhaochenyang20/Awesome-ML-SYS-Tutorial a good alternative to LightLLM?

This project provides a comprehensive technical guide and framework for engineering large-scale machine learning systems. It covers the full lifecycle of model development, focusing on the infrastructure and computational principles required to build, train, and serve generative AI models across di…

Is dusty-nv/jetson-inference a good alternative to LightLLM?

jetson-inference is a set of libraries and tools for executing optimized deep learning models on embedded GPU hardware. Its primary purpose is to enable real-time computer vision and AI inference at the edge with low latency and high throughput. The project distinguishes itself through high-perfor…

Is openvinotoolkit/openvino a good alternative to LightLLM?

OpenVINO is an AI inference engine and model serving platform designed to execute optimized deep learning models across CPUs, GPUs, and NPUs through a unified API. It includes a model optimization toolkit for converting, quantizing, and compressing models from various frameworks, alongside a specia…

Open-source alternatives to LightLLM

30 open-source projects similar to modeltc/lightllm, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best LightLLM alternative.

sgl-project/sglang
29,079 stars
SGLang is a high-performance inference engine and serving system designed for large language and multimodal models. It provides a programmable interface for orchestrating complex generation workflows, enabling developers to coordinate multi-turn dialogues, tool invocations, and reasoning chains through a domain-specific language. The platform is built to support production-scale deployments, offering an OpenAI-compatible API that allows for integration with existing application ecosystems. The system distinguishes itself through a disaggregated architecture that separates compute-intensive pr…
Language: Python. Topics: attention, blackwell, cuda

predibase/lorax
3,724 stars
LoRAX is a GPU-accelerated inference server and multi-adapter engine designed for serving large language models. It functions as a high-throughput system capable of deploying models via Kubernetes and managing the dynamic swapping of Low-Rank Adaptation adapters per request. The server distinguishes itself through multi-adapter dynamic batching, which allows requests using different adapter weights to be processed in a single GPU forward pass. It employs just-in-time adapter loading and weighted adapter merging to maximize throughput and enable multi-tasking without sacrificing performance.
Language: Python. Topics: fine-tuning, gpt, llama

ai-dynamo/dynamo
6,112 stars
Dynamo is a distributed inference orchestration platform designed for large language models. It functions as a system to coordinate prefill and decode phases across GPU nodes, utilizing a multi-backend runtime adapter to connect engines like vLLM and TensorRT-LLM through a unified block-oriented memory interface. An OpenAI-compatible API server provides the frontend for integration with existing tools and clients. The project is distinguished by its disaggregated serving architecture, which separates prompt processing and token generation onto independent GPU pools to optimize throughput and…
Language: Rust

Open-source alternatives to LightLLM

sgl-project/sglang

predibase/lorax

ai-dynamo/dynamo

EricLBuehler/mistral.rs

intel/ipex-llm

kserve/kserve

sgl-project/mini-sglang

zhaochenyang20/Awesome-ML-SYS-Tutorial

dusty-nv/jetson-inference

openvinotoolkit/openvino

LMCache/LMCache

facebookresearch/llama

meta-llama/llama-models

kubeflow/kfserving

SakuraLLM/SakuraLLM

QwenLM/Qwen

flashinfer-ai/flashinfer

LostRuins/koboldcpp

OpenNMT/CTranslate2

NVIDIA/Isaac-GR00T

OpenGVLab/InternVL

pytorch/executorch

OpenBMB/MiniCPM

zai-org/ChatGLM3

lm-sys/FastChat

pytorch-labs/gpt-fast

mistralai/mistral-src

zai-org/ChatGLM-6B

anthropics/anthropic-sdk-python

OpenNMT/OpenNMT-py