What are the best open-source alternatives to Ktransformers?

30 open-source projects similar to kvcache-ai/ktransformers, ranked by shared features. Top picks: sgl-project/sglang, huggingface/text-generation-inference, zai-org/chatglm3, zhaochenyang20/awesome-ml-sys-tutorial, internlm/lmdeploy, lyogavin/airllm, lm-sys/fastchat, liguodongiot/llm-action, microsoft/bitnet, intel/ipex-llm.

Is sgl-project/sglang a good alternative to Ktransformers?

Sglang is a high-performance inference engine and serving system designed for large language and multimodal models. It provides a programmable interface for orchestrating complex generation workflows, enabling developers to coordinate multi-turn dialogues, tool invocations, and reasoning chains thr…

Is huggingface/text-generation-inference a good alternative to Ktransformers?

Text Generation Inference is a production-ready engine designed for the deployment and serving of large language models. It functions as a containerized runtime environment that manages model execution, scales across distributed hardware, and provides high-performance inference capabilities for dem…

Is zai-org/chatglm3 a good alternative to Ktransformers?

ChatGLM3 is a comprehensive framework for deploying, fine-tuning, and serving large language models. It functions as a high-performance inference engine designed to support conversational AI, enabling developers to build interactive agents capable of multi-turn dialogue, autonomous code execution,…

Is zhaochenyang20/awesome-ml-sys-tutorial a good alternative to Ktransformers?

This project provides a comprehensive technical guide and framework for engineering large-scale machine learning systems. It covers the full lifecycle of model development, focusing on the infrastructure and computational principles required to build, train, and serve generative AI models across di…

Is internlm/lmdeploy a good alternative to Ktransformers?

LMDeploy is a high-performance inference engine and deployment framework for large language models and vision models. It functions as a multi-modal model server and compression toolkit designed to serve models with high throughput and low latency. The system enables the distribution of model servi…

Is lyogavin/airllm a good alternative to Ktransformers?

Airllm is a framework designed to execute and fine-tune large language models on consumer-grade hardware. By employing layer-wise model decomposition and memory-efficient loading techniques, the engine enables the operation of massive models that would otherwise exceed available system or video mem…

Is lm-sys/fastchat a good alternative to Ktransformers?

FastChat is a training and serving platform for large language models that provides an integrated toolkit for fine-tuning, hosting, and benchmarking chatbots. It functions as an inference server capable of hosting multiple models and exposing them via a standardized API for chat applications. The…

Is liguodongiot/llm-action a good alternative to Ktransformers?

This project is a comprehensive framework for the training, fine-tuning, and deployment of large language models. It functions as a distributed deep learning platform that enables users to scale model workflows across multiple hardware nodes while providing tools for model evaluation and performanc…

Is microsoft/bitnet a good alternative to Ktransformers?

BitNet is a quantized inference engine designed to execute highly compressed language models by performing arithmetic on low-precision, bit-level weight data. It functions as a model optimization toolkit and a high-performance kernel library, enabling the execution of large language models on consu…

Is intel/ipex-llm a good alternative to Ktransformers?

Intel XPU LLM Acceleration Library is a toolkit designed to accelerate large language model inference and finetuning on Intel CPUs, GPUs, and NPUs. It provides a distributed inference engine for scaling models across multiple accelerators, a multimodal model runtime for vision and speech tasks, and…

Open-source alternatives to Ktransformers

30 open-source projects similar to kvcache-ai/ktransformers, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Ktransformers alternative.

sgl-project/sglang
29,079 stars
Sglang is a high-performance inference engine and serving system designed for large language and multimodal models. It provides a programmable interface for orchestrating complex generation workflows, enabling developers to coordinate multi-turn dialogues, tool invocations, and reasoning chains through a domain-specific language. The platform is built to support production-scale deployments, offering an OpenAI-compatible API that allows for integration with existing application ecosystems. The system distinguishes itself through a disaggregated architecture that separates compute-intensive pr
Language: Python. Topics: attention, blackwell, cuda

huggingface/text-generation-inference
10,775 stars
Text Generation Inference is a production-ready engine designed for the deployment and serving of large language models. It functions as a containerized runtime environment that manages model execution, scales across distributed hardware, and provides high-performance inference capabilities for demanding production environments. The project distinguishes itself through advanced optimization techniques, including continuous batching to maximize hardware utilization and tensor parallelism to shard large models across multiple accelerator cards. It supports efficient inference through custom com
Language: Python. Topics: bloom, deep-learning, falcon

zai-org/ChatGLM3
13,764 stars
ChatGLM3 is a comprehensive framework for deploying, fine-tuning, and serving large language models. It functions as a high-performance inference engine designed to support conversational AI, enabling developers to build interactive agents capable of multi-turn dialogue, autonomous code execution, and structured tool invocation. The project distinguishes itself through its focus on hardware-agnostic deployment and resource optimization. It supports distributed model parallelism across multiple graphics cards, paged key-value caching for concurrent request processing, and weight quantization t
Language: Python

Open-source alternatives to Ktransformers

sgl-project/sglang

huggingface/text-generation-inference

zai-org/ChatGLM3

zhaochenyang20/Awesome-ML-SYS-Tutorial

InternLM/lmdeploy

lyogavin/airllm

lm-sys/FastChat

liguodongiot/llm-action

microsoft/BitNet

intel/ipex-llm

openvinotoolkit/openvino

OpenBMB/MiniCPM

ModelTC/LightLLM

bentoml/OpenLLM

QwenLM/Qwen

xorbitsai/inference

ggerganov/llama.cpp

EricLBuehler/mistral.rs

mlc-ai/mlc-llm

modelscope/ms-swift

LostRuins/koboldcpp

xming521/WeClone

facebookresearch/metaseq

meta-llama/llama3

microsoft/onnxruntime

pytorch/examples

vllm-project/vllm

google-ai-edge/LiteRT

OpenNMT/CTranslate2

hiyouga/LlamaFactory