What are the best open-source alternatives to Llama.cpp?

30 open-source projects similar to ggml-org/llama.cpp, ranked by shared features. Top picks: ggerganov/llama.cpp, berriai/litellm, sgl-project/sglang, lyogavin/airllm, ollama/ollama, oobabooga/text-generation-webui, bentoml/openllm, langchain-ai/langchain, mudler/localai, vllm-project/vllm.

Is ggerganov/llama.cpp a good alternative to Llama.cpp?

llama.cpp is a high-performance C++ inference engine and runtime for executing large language models locally across various hardware architectures. It provides the core components for local model execution, including a dedicated model quantizer for compressing weights into the GGUF format and a sys…

Is berriai/litellm a good alternative to Llama.cpp?

LiteLLM is a unified gateway and proxy server designed to centralize access to over one hundred language model providers. It provides a standardized API interface that abstracts vendor-specific schemas, allowing developers to interact with diverse models through a single, consistent format. By acti…

Is sgl-project/sglang a good alternative to Llama.cpp?

SGLang is a high-performance inference engine and serving system designed for large language and multimodal models. It provides a programmable interface for orchestrating complex generation workflows, enabling developers to coordinate multi-turn dialogues, tool invocations, and reasoning chains thr…

Is lyogavin/airllm a good alternative to Llama.cpp?

AirLLM is a framework designed to execute and fine-tune large language models on consumer-grade hardware. By employing layer-wise model decomposition and memory-efficient loading techniques, the engine enables the operation of massive models that would otherwise exceed available system or video mem…

Is ollama/ollama a good alternative to Llama.cpp?

Ollama provides a framework for running and managing local machine learning models. It includes a command-line interface for model lifecycle management, such as creation, embedding generation, and configuration, alongside a stable API for programmatic interaction across multiple programming languag…

Is oobabooga/text-generation-webui a good alternative to Llama.cpp?

This project is a comprehensive platform for hosting and interacting with large language models directly on local hardware. It provides a web-based graphical interface that allows users to manage model loading, configure generation parameters, and execute text or chat interactions entirely offline.…

Is bentoml/openllm a good alternative to Llama.cpp?

OpenLLM is a framework for deploying, managing, and scaling open-source large language models.

Is langchain-ai/langchain a good alternative to Llama.cpp?

LangChain is an orchestration framework designed for building, managing, and deploying applications powered by large language models. It provides a unified integration layer that normalizes disparate model provider APIs into a consistent set of primitives, enabling developers to build complex, mult…

Is mudler/localai a good alternative to Llama.cpp?

LocalAI is a self-hosted inference server that enables the execution of machine learning models directly on local hardware. By providing a unified interface for text, image, and audio processing, it allows users to maintain full control over data privacy and infrastructure costs while eliminating d…

Is vllm-project/vllm a good alternative to Llama.cpp?

vLLM is a high-throughput inference engine designed for the efficient serving and execution of large language models. It functions as a production-ready distributed model server, providing standard API protocols for online serving while also supporting offline batch processing. The system is built…


Open-source alternatives to Llama.cpp

30 open-source projects similar to ggml-org/llama.cpp, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Llama.cpp alternative.

ggerganov/llama.cpp
116,912 stars
llama.cpp is a high-performance C++ inference engine and runtime for executing large language models locally across various hardware architectures. It provides the core components for local model execution, including a dedicated model quantizer for compressing weights into the GGUF format and a system for generating text embeddings for semantic search. The project distinguishes itself through specialized memory and execution optimizations, such as block-wise weight quantization to reduce memory footprints and memory-mapped model loading. It supports structured text generation by using formal
Language: C++

BerriAI/litellm
50,579 stars
LiteLLM is a unified gateway and proxy server designed to centralize access to over one hundred language model providers. It provides a standardized API interface that abstracts vendor-specific schemas, allowing developers to interact with diverse models through a single, consistent format. By acting as a central traffic management layer, it enables organizations to route, secure, and govern model interactions across multiple deployments. The platform distinguishes itself through its policy-driven architecture, which uses configuration-based routing to manage traffic distribution, load balanc
Language: Python. Topics: ai-gateway, anthropic, azure-openai

sgl-project/sglang
29,079 stars
SGLang is a high-performance inference engine and serving system designed for large language and multimodal models. It provides a programmable interface for orchestrating complex generation workflows, enabling developers to coordinate multi-turn dialogues, tool invocations, and reasoning chains through a domain-specific language. The platform is built to support production-scale deployments, offering an OpenAI-compatible API that allows for integration with existing application ecosystems. The system distinguishes itself through a disaggregated architecture that separates compute-intensive pr
Language: Python. Topics: attention, blackwell, cuda

Open-source alternatives to Llama.cpp

ggerganov/llama.cpp

BerriAI/litellm

sgl-project/sglang

lyogavin/airllm

ollama/ollama

oobabooga/text-generation-webui

bentoml/OpenLLM

langchain-ai/langchain

mudler/LocalAI

vllm-project/vllm

open-webui/open-webui

lm-sys/FastChat

ai-dynamo/dynamo

abetlen/llama-cpp-python

NVIDIA/TensorRT-LLM

nomic-ai/gpt4all

mlc-ai/web-llm

skypilot-org/skypilot

LostRuins/koboldcpp

kvcache-ai/ktransformers

InternLM/lmdeploy

Michael-A-Kuykendall/shimmy

huggingface/chat-ui

janhq/jan

openvinotoolkit/openvino

EricLBuehler/mistral.rs

mistralai/mistral-src

NexaAI/nexa-sdk

bentoml/BentoML

triton-inference-server/server