What are the best open-source alternatives to Gemma.cpp?

30 open-source projects similar to google/gemma.cpp, ranked by shared features. Top picks: pytorch/executorch, microsoft/onnxruntime, abetlen/llama-cpp-python, ggerganov/llama.cpp, alibaba/mnn, openbmb/minicpm-v, lostruins/koboldcpp, google-ai-edge/litert-lm, nexaai/nexa-sdk, rustformers/llm.

Is pytorch/executorch a good alternative to Gemma.cpp?

ExecuTorch is a lightweight C++ runtime for deploying PyTorch models on mobile, embedded, and edge hardware. It provides an ahead-of-time compilation pipeline that exports, quantizes, and lowers model graphs into compact serialized programs, then executes them through a minimal runtime with hardwar…

Is microsoft/onnxruntime a good alternative to Gemma.cpp?

This project is a cross-platform machine learning inference engine designed to execute pre-trained models across diverse operating systems and hardware environments. It functions as a standardized execution framework that manages the entire lifecycle of model inference, from loading and graph optim…

Is abetlen/llama-cpp-python a good alternative to Gemma.cpp?

llama-cpp-python provides a Python interface for the llama.cpp library, enabling the execution of large language models with hardware acceleration. It functions as a GGUF model loader and a structured text generator capable of running inference servers and multimodal runtimes for processing both te…

Is ggerganov/llama.cpp a good alternative to Gemma.cpp?

llama.cpp is a high-performance C++ inference engine and runtime for executing large language models locally across various hardware architectures. It provides the core components for local model execution, including a dedicated model quantizer for compressing weights into the GGUF format and a sys…

Is alibaba/mnn a good alternative to Gemma.cpp?

MNN is a high-performance inference engine and framework designed for on-device machine learning. It provides a comprehensive environment for executing, optimizing, and deploying neural network models directly on mobile and resource-constrained edge devices. The framework distinguishes itself thro…

Is openbmb/minicpm-v a good alternative to Gemma.cpp?

MiniCPM-V is a multimodal large language model (a vision-language system) designed for complex visual and linguistic understanding. It is a compact on-device model that can process text, images, and video. The project is specifically developed as…

Is lostruins/koboldcpp a good alternative to Gemma.cpp?

KoboldCPP is a local large language model inference engine and GGUF model runner designed to execute quantized models on personal hardware. It functions as a multimodal AI server and API gateway, providing OpenAI-compatible endpoints that allow third-party clients to interact with locally hosted mo…

Is google-ai-edge/litert-lm a good alternative to Gemma.cpp?

LiteRT-LM is a high-performance inference framework designed to execute large language models locally on mobile, desktop, and IoT hardware. It serves as an on-device model runtime that utilizes CPU, GPU, and NPU acceleration to provide low-latency processing. The framework is distinguished by its…

Is nexaai/nexa-sdk a good alternative to Gemma.cpp?

The nexa-sdk is an on-device AI SDK and multimodal inference engine designed to run large language, vision, and audio models locally on mobile and desktop hardware. It functions as a local LLM runtime and NPU acceleration framework, enabling the execution of generative and discriminative models wit…

Is rustformers/llm a good alternative to Gemma.cpp?

This project is a library and command-line interface for local large language model inference. It enables the generation of text completions and chat responses from various model architectures. The project provides tools for weight quantization to reduce memory footprints and incorporates hardware…


Open-source alternatives to Gemma.cpp

30 open-source projects similar to google/gemma.cpp, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Gemma.cpp alternative.
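The page does not say how "features in common" are counted. The following is a minimal sketch, assuming each project is described by a set of feature tags and that its score is simply the number of tags it shares with Gemma.cpp. The tag sets and project names below are hypothetical placeholders, not the data behind this ranking.

```python
# Hypothetical feature tags for Gemma.cpp; the real feature sets are not given on this page.
target = {"inference", "cpp", "quantization", "on-device", "llm"}

# Hypothetical candidate projects and their feature tags.
candidates = {
    "project-a": {"inference", "cpp", "quantization", "gpu"},
    "project-b": {"inference", "python"},
    "project-c": {"inference", "cpp", "quantization", "on-device", "llm"},
}


def rank_by_shared_features(target, candidates):
    """Return (name, shared_count) pairs, most shared tags first; ties broken by name."""
    scored = [(name, len(target & feats)) for name, feats in candidates.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for name, score in rank_by_shared_features(target, candidates):
        print(f"{name}: {score} shared features")
```

A plain overlap count like this favors projects with many tags; a real ranking might instead normalize by set size (for example, Jaccard similarity), but the page does not state which approach it uses.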

pytorch/executorch
GitHub stars: 4,296
ExecuTorch is a lightweight C++ runtime for deploying PyTorch models on mobile, embedded, and edge hardware. It provides an ahead-of-time compilation pipeline that exports, quantizes, and lowers model graphs into compact serialized programs, then executes them through a minimal runtime with hardware acceleration and on-device large language model inference capabilities. The project distinguishes itself through a hardware accelerator delegate system that partitions model subgraphs and offloads computation to specialized backends including NPUs, GPUs, and DSPs from Apple, Arm, Intel, MediaTek, …
Language: Python. Topics: deep-learning, embedded, gpu.
microsoft/onnxruntime
GitHub stars: 19,347
This project is a cross-platform machine learning inference engine designed to execute pre-trained models across diverse operating systems and hardware environments. It functions as a standardized execution framework that manages the entire lifecycle of model inference, from loading and graph optimization to hardware-accelerated execution and generative sequence management. The runtime distinguishes itself through a highly modular architecture that decouples model logic from hardware-specific kernels. By utilizing an execution provider abstraction, it enables developers to offload computation…
Language: C++. Topics: ai-framework, deep-learning, hardware-acceleration.
abetlen/llama-cpp-python
GitHub stars: 9,993
llama-cpp-python provides a Python interface for the llama.cpp library, enabling the execution of large language models with hardware acceleration. It functions as a GGUF model loader and a structured text generator capable of running inference servers and multimodal runtimes for processing both text and image inputs. The project distinguishes itself through a local inference server that exposes model capabilities via an OpenAI-compatible web API. It supports advanced execution techniques including speculative decoding, weight quantization, and layer-based GPU offloading to manage memory acro…
Language: Python.

All 30 alternatives to Gemma.cpp

pytorch/executorch

microsoft/onnxruntime

abetlen/llama-cpp-python

ggerganov/llama.cpp

alibaba/MNN

OpenBMB/MiniCPM-V

LostRuins/koboldcpp

google-ai-edge/LiteRT-LM

NexaAI/nexa-sdk

rustformers/llm

apple/ml-fastvlm

google/sentencepiece

ngxson/smolvlm-realtime-webcam

TingsongYu/PyTorch_Tutorial

xai-org/grok-1

RunanywhereAI/runanywhere-sdks

OpenBMB/MiniCPM

huggingface/transformers.js

ggerganov/whisper.cpp

openvinotoolkit/openvino

NVIDIA/Isaac-GR00T

moonshine-ai/moonshine

tiny-dnn/tiny-dnn

state-spaces/mamba

Zackriya-Solutions/meeting-minutes

BVLC/caffe

DLLXW/baby-llama2-chinese

d2l-ai/d2l-en

jbhuang0604/awesome-computer-vision

EleutherAI/gpt-neox