What are the best open-source alternatives to Llama Cpp Python?

30 open-source projects similar to abetlen/llama-cpp-python, ranked by shared features. Top picks: sgl-project/sglang, ericlbuehler/mistral.rs, openvinotoolkit/openvino, ggerganov/llama.cpp, lostruins/koboldcpp, microsoft/onnxruntime, predibase/lorax, zhaochenyang20/awesome-ml-sys-tutorial, tiiny-ai/powerinfer, facebookresearch/fairseq.

Is sgl-project/sglang a good alternative to Llama Cpp Python?

Sglang is a high-performance inference engine and serving system designed for large language and multimodal models. It provides a programmable interface for orchestrating complex generation workflows, enabling developers to coordinate multi-turn dialogues, tool invocations, and reasoning chains through a domain-specific language.

Is ericlbuehler/mistral.rs a good alternative to Llama Cpp Python?

mistral.rs is an inference engine for large language models that runs locally and exposes models behind OpenAI- and Anthropic-compatible APIs. It serves as a multi-model serving platform, capable of loading several models in a single server process with per-request routing and on-demand loading and unloading.

Is openvinotoolkit/openvino a good alternative to Llama Cpp Python?

OpenVINO is an AI inference engine and model serving platform designed to execute optimized deep learning models across CPUs, GPUs, and NPUs through a unified API. It includes a model optimization toolkit for converting, quantizing, and compressing models from various frameworks, alongside a specialized generative AI runtime for large language models.

Is ggerganov/llama.cpp a good alternative to Llama Cpp Python?

llama.cpp is a high-performance C++ inference engine and runtime for executing large language models locally across various hardware architectures. It provides the core components for local model execution, including a dedicated model quantizer for compressing weights into the GGUF format and a sys…

Is lostruins/koboldcpp a good alternative to Llama Cpp Python?

KoboldCPP is a local large language model inference engine and GGUF model runner designed to execute quantized models on personal hardware. It functions as a multimodal AI server and API gateway, providing OpenAI-compatible endpoints that allow third-party clients to interact with locally hosted mo…

Is microsoft/onnxruntime a good alternative to Llama Cpp Python?

ONNX Runtime is a cross-platform machine learning inference engine designed to execute pre-trained models across diverse operating systems and hardware environments. It functions as a standardized execution framework that manages the entire lifecycle of model inference, from loading and graph optim…

Is predibase/lorax a good alternative to Llama Cpp Python?

LoRAX is a GPU-accelerated inference server and multi-adapter engine designed for serving large language models. It functions as a high-throughput system capable of deploying models via Kubernetes and dynamically swapping Low-Rank Adaptation (LoRA) adapters per request. The server distinguish…

Is zhaochenyang20/awesome-ml-sys-tutorial a good alternative to Llama Cpp Python?

Awesome-ML-SYS-Tutorial provides a comprehensive technical guide and framework for engineering large-scale machine learning systems. It covers the full lifecycle of model development, focusing on the infrastructure and computational principles required to build, train, and serve generative AI models across di…

Is tiiny-ai/powerinfer a good alternative to Llama Cpp Python?

PowerInfer is a high-performance local large language model inference engine and sparse inference framework. It provides a runtime for executing models on consumer-grade hardware, utilizing a GPU acceleration backend to optimize tensor operations for graphics processors. The system distinguishes i…

Is facebookresearch/fairseq a good alternative to Llama Cpp Python?

Fairseq is a PyTorch toolkit for sequence-to-sequence modeling, specializing in neural machine translation, automatic speech recognition, and large-scale language model training. It provides a framework for processing and aligning diverse data sources, including text, audio, and video, to support t…

Open-source alternatives to Llama Cpp Python

30 open-source projects similar to abetlen/llama-cpp-python, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Llama Cpp Python alternative.

sgl-project/sglang
29,079 stars
Sglang is a high-performance inference engine and serving system designed for large language and multimodal models. It provides a programmable interface for orchestrating complex generation workflows, enabling developers to coordinate multi-turn dialogues, tool invocations, and reasoning chains through a domain-specific language. The platform is built to support production-scale deployments, offering an OpenAI-compatible API that allows for integration with existing application ecosystems. The system distinguishes itself through a disaggregated architecture that separates compute-intensive pr…
Language: Python · Topics: attention, blackwell, cuda

EricLBuehler/mistral.rs
6,597 stars
mistral.rs is an inference engine for large language models that runs locally and exposes models behind OpenAI- and Anthropic-compatible APIs. It serves as a multi-model serving platform, capable of loading several models in a single server process with per-request routing and on-demand loading and unloading. The engine supports multimodal inference, processing text alongside images, video, audio, and speech inputs, and includes a quantized model deployment runtime that reduces memory use and speeds up inference on consumer hardware. The project distinguishes itself through an agentic tool exe…
Language: Rust · Topics: llm, rust, uqff

openvinotoolkit/openvino
10,414 stars
OpenVINO is an AI inference engine and model serving platform designed to execute optimized deep learning models across CPUs, GPUs, and NPUs through a unified API. It includes a model optimization toolkit for converting, quantizing, and compressing models from various frameworks, alongside a specialized generative AI runtime for large language models. The project distinguishes itself through a plugin-based hardware acceleration layer that maps neural network operations to vendor-specific drivers. It features advanced execution mechanisms such as continuous batching, speculative decoding, and…
Language: C++ · Topics: ai, computer-vision, deep-learning

sgl-project/sglang

EricLBuehler/mistral.rs

openvinotoolkit/openvino

ggerganov/llama.cpp

LostRuins/koboldcpp

microsoft/onnxruntime

predibase/lorax

zhaochenyang20/Awesome-ML-SYS-Tutorial

Tiiny-AI/PowerInfer

facebookresearch/fairseq

google/gemma.cpp

pytorch/executorch

ModelTC/LightLLM

zai-org/ChatGLM3

google-ai-edge/LiteRT-LM

huggingface/transformers.js

intel/ipex-llm

vercel/ai

josStorer/RWKV-Runner

langroid/langroid

QuantumNous/new-api

kvcache-ai/ktransformers

OpenBMB/MiniCPM

BerriAI/litellm

vercel/vercel

NVIDIA-AI-IOT/torch2trt

alibaba/MNN

microsoft/guidance

normal-computing/outlines

lancedb/lancedb