# cactus-compute/cactus

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/cactus-compute-cactus).**

5,363 stars · 431 forks · C++ · License: NOASSERTION

## Links

- GitHub: https://github.com/cactus-compute/cactus
- Homepage: https://cactuscompute.com
- awesome-repositories: https://awesome-repositories.com/repository/cactus-compute-cactus.md

## Topics

`ai` `android` `arm` `edge` `edge-ai` `framework` `ios` `llamacpp` `llm` `llm-inference` `llms` `mobile` `mobile-inference` `on-device-ai` `quantiz` `rag` `smartphone` `speech` `transformer` `whisper`

## Description

Cactus is an on-device AI inference engine for running large language models, vision models, and speech-to-text systems on mobile and wearable hardware. It provides a programmable tensor computation graph for defining sequences of matrix operations and activation functions, alongside a local retrieval-augmented generation (RAG) framework that grounds model responses in local text files.
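To make the idea of a tensor computation graph concrete, here is a minimal sketch in plain Python. It is not the Cactus API: the `Graph` and `Node` classes and the `input`, `matmul`, `relu`, and `run` methods are hypothetical names chosen for illustration. The sketch records operations as nodes and evaluates them in insertion order, which is a valid topological order because each node is added after its inputs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    op: str                                          # "input", "matmul", or "relu"
    inputs: List[int] = field(default_factory=list)  # indices of upstream nodes
    value: Optional[list] = None                     # set for inputs, computed otherwise


class Graph:
    """Hypothetical toy graph; names are illustrative, not taken from Cactus."""

    def __init__(self):
        self.nodes = []

    def _add(self, op, inputs=(), value=None):
        self.nodes.append(Node(op, list(inputs), value))
        return len(self.nodes) - 1

    def input(self, matrix):
        return self._add("input", value=matrix)

    def matmul(self, a, b):
        return self._add("matmul", (a, b))

    def relu(self, a):
        return self._add("relu", (a,))

    def run(self, target):
        # Evaluate every node up to the target in insertion (topological) order.
        for node in self.nodes[: target + 1]:
            if node.op == "input":
                continue
            args = [self.nodes[i].value for i in node.inputs]
            if node.op == "matmul":
                a, b = args
                node.value = [
                    [sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
                    for row in a
                ]
            elif node.op == "relu":
                node.value = [[max(0.0, x) for x in row] for row in args[0]]
        return self.nodes[target].value


g = Graph()
x = g.input([[1.0, -2.0]])
w = g.input([[3.0, -1.0], [1.0, 2.0]])
y = g.relu(g.matmul(x, w))
result = g.run(y)
print(result)  # [[1.0, 0.0]]
```

Separating graph definition from execution is what lets an engine plan memory and pick kernels before any data flows through; the sketch above only shows the definition-then-run shape of that design.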

The project ships a multiplatform SDK with language bindings for integrating AI capabilities into mobile applications, and a model conversion system that transforms external model formats for optimized local execution. A hybrid routing system redirects workloads between on-device execution and cloud-based providers based on hardware capacity.
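The routing decision can be pictured as a small policy function. The sketch below is an assumption about how such a policy might look, not a description of how Cactus implements it: `DeviceProfile`, `Request`, `choose_backend`, and the `headroom` factor are all hypothetical, and free memory is used as a stand-in for "hardware capacity".

```python
from dataclasses import dataclass


@dataclass
class DeviceProfile:
    free_memory_mb: int   # memory currently available on the device
    is_online: bool       # whether a cloud provider is reachable


@dataclass
class Request:
    model_memory_mb: int  # estimated memory needed to run the model locally


def choose_backend(device: DeviceProfile, request: Request, headroom: float = 1.2) -> str:
    """Return "local" when the model fits with some headroom, else "cloud"."""
    fits_locally = device.free_memory_mb >= request.model_memory_mb * headroom
    if fits_locally:
        return "local"
    if device.is_online:
        return "cloud"
    raise RuntimeError("model does not fit locally and no network is available")


print(choose_backend(DeviceProfile(4096, True), Request(2000)))  # local
print(choose_backend(DeviceProfile(1024, True), Request(2000)))  # cloud
```

A real policy would likely weigh more signals (battery, thermal state, latency targets); the sketch keeps a single threshold so the local-first, cloud-fallback ordering stays visible.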

The engine covers a broad capability surface, including on-device audio processing for voice activity detection and transcription, vector embedding generation for similarity search, and tool integration that parses model outputs into external function calls. These capabilities are backed by optimized native kernels tuned for low-latency performance on mobile hardware.
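As an illustration of parsing model output into a function call, the following sketch assumes the model emits a JSON object with `name` and `arguments` fields. That output format, the `TOOLS` registry, `get_weather`, and `dispatch_tool_call` are assumptions made for this example and are not taken from the Cactus documentation.

```python
import json


def get_weather(city: str) -> str:
    # Hypothetical tool; a real one would query a weather service.
    return f"Sunny in {city}"


# Registry mapping tool names the model may emit to Python callables.
TOOLS = {"get_weather": get_weather}


def dispatch_tool_call(model_output: str):
    """Parse model output as a JSON tool call and run it; return None for plain text."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return None  # ordinary text response, not a tool call
    if not isinstance(call, dict):
        return None
    name = call.get("name")
    arguments = call.get("arguments", {})
    if not isinstance(name, str) or name not in TOOLS or not isinstance(arguments, dict):
        return None
    return TOOLS[name](**arguments)


print(dispatch_tool_call('{"name": "get_weather", "arguments": {"city": "Oslo"}}'))  # Sunny in Oslo
print(dispatch_tool_call("Hello there"))  # None
```

Validating the tool name against an explicit registry before calling anything keeps malformed or unexpected model output from triggering arbitrary code.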

## Tags

### Artificial Intelligence & ML

- [Local AI Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/local-and-on-device-inference/local-ai-inference.md) — Executes large language and vision models directly on mobile and wearable hardware using optimized kernels. ([source](https://docs.cactuscompute.com/v1.7/))
- [On-Device Inference Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/on-device-inference-engines.md) — Serves as an on-device AI inference engine for executing large language, vision, and speech models on mobile and wearable hardware.
- [Multimodal Input Processing](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-model-inference/multimodal-input-processing.md) — Performs inference on image and sound data to enable visual understanding and speech-to-text capabilities. ([source](https://docs.cactuscompute.com/v1.13/))
- [Chat Completion Services](https://awesome-repositories.com/f/artificial-intelligence-ml/chat-completion-services.md) — Produces natural language conversational responses based on chat history and configurable generation options. ([source](https://docs.cactuscompute.com/v1.11/))
- [Retrieval-Augmented Generation](https://awesome-repositories.com/f/artificial-intelligence-ml/conversational-interfaces/retrieval-augmented-generation.md) — Grounds model responses using locally stored text documents and directories to provide context-aware generation. ([source](https://docs.cactuscompute.com/v1.13/))
- [RAG Document Retrieval](https://awesome-repositories.com/f/artificial-intelligence-ml/documentation-retrieval-engines/rag-document-retrieval.md) — Retrieves relevant snippets from local text files to provide grounded context for LLM responses.
- [Inference Optimization Kernels](https://awesome-repositories.com/f/artificial-intelligence-ml/inference-optimization-kernels.md) — Utilizes native kernels tuned for low-latency, energy-efficient mathematical operations on mobile hardware.
- [Local RAG Implementations](https://awesome-repositories.com/f/artificial-intelligence-ml/local-rag-implementations.md) — Provides a local retrieval-augmented generation framework that grounds model responses using local text files without cloud access.
- [Local Inference Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/serving-and-runtime/large-language-model-optimization/local-inference-engines.md) — Provides an optimized runtime for executing large language models and vision models locally on consumer mobile hardware.
- [On-Device Speech-to-Text SDKs](https://awesome-repositories.com/f/artificial-intelligence-ml/on-device-models/on-device-speech-to-text-sdks.md) — Provides on-device speech-to-text transcription using locally executed models on mobile and wearable hardware.
- [RAG Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/rag-frameworks.md) — Provides a framework for building local retrieval-augmented generation systems that ground responses in local directories.
- [Speech Transcription](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-transcription.md) — Provides local on-device speech-to-text transcription services with low-latency execution. ([source](https://cdn.jsdelivr.net/gh/cactus-compute/cactus@main/README.md))
- [Vector Embeddings](https://awesome-repositories.com/f/artificial-intelligence-ml/vector-embeddings.md) — Generates numerical vector representations of text, visual, and speech inputs for similarity search and retrieval. ([source](https://docs.cactuscompute.com/v1.14/))
- [AI Integration Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-integration-tools.md) — Connects local AI models to external system functions and tools to perform actions based on model outputs.
- [Model Request Routing](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-model-clients/model-request-routing.md) — Redirects inference requests to cloud providers when local hardware capacity is insufficient. ([source](https://docs.cactuscompute.com/v1.14/))
- [Cross-Framework Model Conversion](https://awesome-repositories.com/f/artificial-intelligence-ml/cross-framework-model-conversion.md) — Transforms external model formats into representations optimized for mobile and wearable hardware.
- [Function Calling Interfaces](https://awesome-repositories.com/f/artificial-intelligence-ml/function-calling-interfaces.md) — Parses model outputs into structured function calls to interact with external system tools. ([source](https://cdn.jsdelivr.net/gh/cactus-compute/cactus@main/README.md))
- [Hybrid Local-Remote AI Routing](https://awesome-repositories.com/f/artificial-intelligence-ml/hybrid-local-remote-ai-routing.md) — Routes AI workloads between local on-device execution and cloud-based providers based on hardware capacity.
- [Local Speech-to-Text](https://awesome-repositories.com/f/artificial-intelligence-ml/local-speech-to-text.md) — Includes a low-latency on-device transcription system for converting audio input into text.
- [On-Device Speech Recognizers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/automatic-speech-recognition/on-device-speech-recognizers.md) — Performs local speech-to-text transcription and voice activity detection on handheld and wearable devices.
- [Voice Activity Detection](https://awesome-repositories.com/f/artificial-intelligence-ml/voice-activity-detection.md) — Identifies periods of human speech within audio streams to trigger transcription and downstream processing. ([source](https://docs.cactuscompute.com/v1.10/))
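
Several entries in this list involve embedding-based retrieval over local text. The sketch below shows the general shape of that technique under loud assumptions: `embed` is a toy term-frequency stand-in for a learned embedding model, and `embed`, `cosine`, and `retrieve` are hypothetical helper names, not Cactus functions. Documents are ranked by cosine similarity to the query and the best match is returned as grounding context.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: counts of lowercase whitespace-split words.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list, k: int = 1) -> list:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


docs = [
    "the battery lasts ten hours",
    "the camera has two lenses",
    "shipping takes five days",
]
top = retrieve("how long does the battery last", docs)
print(top)  # ['the battery lasts ten hours']
```

In a full pipeline the retrieved snippet would be prepended to the prompt before generation; swapping the toy `embed` for a real embedding model leaves the ranking logic unchanged.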

### Mobile Development

- [AI Integration SDKs](https://awesome-repositories.com/f/mobile-development/ai-integration-sdks.md) — Ships a multiplatform SDK with language bindings for integrating local AI capabilities into mobile applications.
- [Mobile Framework Integrations](https://awesome-repositories.com/f/mobile-development/mobile-model-deployment/mobile-framework-integrations.md) — Offers native software kits to integrate AI capabilities into handheld and wearable operating systems. ([source](https://docs.cactuscompute.com/v1.12/))
- [Mobile Model Format Converters](https://awesome-repositories.com/f/mobile-development/mobile-model-format-converters.md) — Transforms external model formats into optimized representations compatible with local mobile and wearable hardware execution. ([source](https://docs.cactuscompute.com/))

### Scientific & Mathematical Computing

- [Tensor Computation Graphs](https://awesome-repositories.com/f/scientific-mathematical-computing/high-performance-execution-environments/scientific-computing-platforms/computational-frameworks/tensor-computation-graphs.md) — Allows defining sequences of tensor operations and activation functions as computational graphs for local execution. ([source](https://docs.cactuscompute.com/v2.0/))
- [Graph-Based Execution Engines](https://awesome-repositories.com/f/scientific-mathematical-computing/high-performance-execution-environments/scientific-computing-platforms/graph-based-execution-engines.md) — Executes mathematical workflows as a sequence of tensor operations and activation functions via directed acyclic graphs.

### Software Engineering & Architecture

- [Language Bindings](https://awesome-repositories.com/f/software-engineering-architecture/language-bindings.md) — Provides multiplatform software development kits and language bindings to connect the core engine to external applications. ([source](https://docs.cactuscompute.com/))