FlexLLMGen

Open-source alternatives to FlexLLMGen

Similar open-source projects, ranked by how many features they share with FlexLLMGen.

FMInference/FlexGen (9,366 stars)
FlexGen is an inference engine for large language models designed for high-throughput execution on single or multiple GPUs. It functions as a framework for managing model execution through a combination of memory offloading, weight compression, and pipeline orchestration. The system enables the execution of models that exceed available GPU memory by moving tensors and caches between GPU memory, system RAM, and disk storage. It utilizes 4-bit weight quantization to reduce the memory footprint of model parameters, allowing for increased batch processing capacity. The project covers distributed
Python
llm-d/llm-d (2,514 stars)
llm-d is a distributed serving framework designed for large language model inference. It functions as an inference orchestrator and gateway, providing a control plane for deploying model replicas and managing hardware accelerators. The system includes a batch inference scheduler and a cache manager to coordinate request flow and memory utilization. The project is distinguished by a disaggregated serving architecture that separates prefill and decode execution phases across specialized workers to maximize throughput. It employs a hardware-agnostic control plane and tiered cache offloading, mov
Shell
ModelTC/LightLLM (3,901 stars)
LightLLM is a high-performance serving framework for deploying and executing large language models. It functions as a multi-GPU inference engine and server capable of handling dense architectures, mixture-of-experts designs, and multimodal models that process both text and images. The system is distinguished by its specialized support for Mixture-of-Experts models using expert parallelism and fused kernels. It implements structured text generation through deterministic state machines and pushdown automata to enforce precise output formats. To optimize throughput, the framework employs specula
Python (topics: deep-learning, gpt, llama)
Infrasys-AI/AISystem (17,017 stars)
AISystem is a comprehensive AI full-stack infrastructure project covering the entire pipeline from AI chip architecture to high-level training frameworks. It encompasses the development of AI compiler frameworks, inference engines, and distributed training orchestrators designed to coordinate workloads across a heterogeneous compute stack of CPUs, GPUs, and NPUs. The project focuses on the deep integration of software and hardware, employing software-hardware co-design to align tensor layouts with physical memory structures. It provides specialized capabilities for accelerating Transformer mo
Jupyter Notebook (topics: ai, aiinfra, aisys)

See all 30 alternatives to FlexLLMGen

FMInference/FlexLLMGen (archived)