# nexaai/nexa-sdk

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/nexaai-nexa-sdk).**

7,721 stars · 948 forks · Kotlin · apache-2.0

## Links

- GitHub: https://github.com/NexaAI/nexa-sdk
- Homepage: https://docs.nexa.ai/
- awesome-repositories: https://awesome-repositories.com/repository/nexaai-nexa-sdk.md

## Topics

`gemma3` `go` `gpt-oss` `granite4` `llama` `llama3` `llm` `on-device-ai` `phi3` `qwen3` `qwen3vl` `sdk` `stable-diffusion` `vlm`

## Description

The nexa-sdk is an on-device AI SDK and multimodal inference engine designed to run large language, vision, and audio models locally on mobile and desktop hardware. It functions as a local LLM runtime and NPU acceleration framework, enabling the execution of generative and discriminative models without reliance on cloud services.

The project distinguishes itself through a dedicated NPU acceleration framework that optimizes model execution on Neural Processing Units to reduce latency and power consumption. It employs hardware-agnostic backend routing to dynamically distribute computations across CPUs, GPUs, and NPUs, and supports GGUF-based model loading for efficient memory mapping and layer offloading.

Its capabilities cover a broad spectrum of AI tasks, including conversational text generation, text-to-image synthesis, and automatic speech recognition. It also provides tools for vector embedding generation and document reranking for local semantic search, as well as a REST-based inference server with an OpenAI-compatible interface for external integration.

The SDK supports cross-platform deployment across Android and Linux environments and includes a Python library for developer integration.

## Tags

### Artificial Intelligence & ML

- [Conversational Response Generation](https://awesome-repositories.com/f/artificial-intelligence-ml/language-model-response-generators/response-generation-configurations/conversational-response-generation.md) — Produces natural language responses for interactive dialogue through streaming or static interfaces. ([source](https://docs.nexa.ai/en/nexa-sdk-python/quickstart.md))
- [NPU Acceleration](https://awesome-repositories.com/f/artificial-intelligence-ml/npu-acceleration.md) — Integrates specialized neural processing units to optimize model inference performance and power efficiency on edge hardware.
- [OpenAI-Compatible APIs](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/model-integration-serving/model-integration-interfaces/ai-integration-apis/openai-compatible-apis.md) — Implements standard HTTP endpoints for model interaction and function calling in an OpenAI-compatible format. ([source](https://docs.nexa.ai/en/nexa-sdk-go/overview.md))
- [NPU Inference Execution](https://awesome-repositories.com/f/artificial-intelligence-ml/hardware-accelerated-inference/npu-inference-execution.md) — Executes large language models on NPUs using native formats and GGUF layer offloading. ([source](https://docs.nexa.ai/en/nexa-sdk-android/APIReference_NPU.md))
- [Hardware Acceleration Backends](https://awesome-repositories.com/f/artificial-intelligence-ml/hardware-acceleration-backends.md) — Implements hardware-agnostic routing to distribute model computations across CPU, GPU, and NPU backends. ([source](https://docs.nexa.ai/en/nexa-sdk-android/APIReference.md))
- [On-Device Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/inference-clients/on-device-inference.md) — Executes large language, vision, and audio models directly on local mobile or desktop hardware.
- [Local Model Runtimes](https://awesome-repositories.com/f/artificial-intelligence-ml/local-model-runtimes.md) — Acts as a local runtime for executing language models in GGUF and other formats on-device.
- [External Model Loading](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning-model-formats/external-model-loading.md) — Imports and initializes pre-trained models from standardized open formats and external repositories. ([source](https://docs.nexa.ai/en/nexa-sdk-go/gguf.md))
- [Multimodal Inference Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/inference-servers-and-runtimes/multimodal-inference-engines.md) — Functions as a multimodal engine capable of processing combined text, image, and audio inputs locally.
- [Local Model Inference Servers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/local-and-on-device-inference/local-model-inference-servers.md) — Provides a REST-based local inference server with an OpenAI-compatible interface for chat, embeddings, and reranking.
- [Remote Model Hubs](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/model-hubs-and-pre-made-models/model-management-utilities/remote-model-hubs.md) — Downloads pre-trained model snapshots from centralized remote hubs to a local directory for deployment. ([source](https://docs.nexa.ai/en/nexa-sdk-ios/quickstart.md))
- [Memory-Mapped Loading](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/data-and-checkpointing/model-loading/memory-mapped-loading.md) — Loads large language models using memory-mapped I/O and layer offloading via the GGUF format.
- [Model Format Parsers](https://awesome-repositories.com/f/artificial-intelligence-ml/model-format-parsers.md) — Interprets and translates diverse machine learning model file formats to ensure cross-architecture compatibility. ([source](https://docs.nexa.ai/en/nexa-sdk-go/overview.md))
- [Multimodal Analysis Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/multimodal-analysis-tools.md) — Processes combined text, image, and audio inputs for visual understanding and complex multimodal reasoning.
- [Multimodal Analytical Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/multimodal-analytical-pipelines.md) — Features a unified multimodal pipeline for processing text, image, and audio data streams.
- [Backend-Agnostic Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/neural-network-research/neural-network-toolkits/backend-agnostic-engines.md) — Implements a backend-agnostic engine that decouples neural network operations from specific hardware backends for cross-platform execution.
- [On-Device Models](https://awesome-repositories.com/f/artificial-intelligence-ml/on-device-models.md) — Provides a comprehensive SDK for running large language, vision, and audio models locally on mobile and desktop hardware.
- [Generative Media Runtimes](https://awesome-repositories.com/f/artificial-intelligence-ml/on-device-models/generative-media-runtimes.md) — Produces synthetic text, images, and speech locally using generative models without cloud reliance.
- [Sampling Controls](https://awesome-repositories.com/f/artificial-intelligence-ml/probabilistic-modeling/sampling-controls.md) — Adjusts temperature, token selection, and chat templates to control the style and accuracy of generated text. ([source](https://docs.nexa.ai/en/nexa-sdk-python/overview.md))
- [Automatic Speech Recognition](https://awesome-repositories.com/f/artificial-intelligence-ml/automatic-speech-recognition.md) — Converts spoken audio into written text across multiple languages using batch or real-time streaming. ([source](https://docs.nexa.ai/en/nexa-sdk-ios/APIReference.md))
- [Computer Vision](https://awesome-repositories.com/f/artificial-intelligence-ml/computer-vision-systems/computer-vision.md) — Processes visual data to identify objects or extract text via optical character recognition. ([source](https://docs.nexa.ai/en/nexa-sdk-python/api-reference.md))
- [Document Rerankers](https://awesome-repositories.com/f/artificial-intelligence-ml/document-rerankers.md) — Reranks document lists based on relevance to improve retrieval accuracy in semantic search. ([source](https://docs.nexa.ai/en/nexa-sdk-python/quickstart.md))
- [Text-to-Image Generators](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/diffusion-visual-models/generative-ai-pipelines/text-to-image-generators.md) — Generates high-resolution images from natural language text prompts using hardware-optimized diffusion models. ([source](https://docs.nexa.ai/en/nexa-sdk-python/quickstart))
- [Image Generation APIs](https://awesome-repositories.com/f/artificial-intelligence-ml/image-generation-apis.md) — Provides API endpoints to create visual imagery from text prompts using optimized diffusion models. ([source](https://docs.nexa.ai/en/nexa-sdk-go/NexaAPI.md))
- [Command Line Inference Interfaces](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/local-and-on-device-inference/command-line-inference-interfaces.md) — Provides a terminal-based interface for executing text generation, audio analysis, and speech transcription. ([source](https://docs.nexa.ai/en/nexa-sdk-go/NexaCLI.md))
- [Incremental Inference Streaming](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/model-integration-pipelines/model-inference/inference-result-processors/incremental-inference-streaming.md) — Sends model outputs to users incrementally as tokens are generated to reduce perceived latency. ([source](https://cdn.jsdelivr.net/gh/nexaai/nexa-sdk@main/README.md))
- [Local Model Lifecycle Managers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-management/local-model-lifecycle-managers.md) — Includes tools for managing the local lifecycle of models, including listing and removing cached versions. ([source](https://docs.nexa.ai/en/nexa-sdk-go/NexaCLI.md))
- [Model Downloaders](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-management/model-downloaders.md) — Facilitates the retrieval of model weights and quantization files from remote hubs for local storage. ([source](https://docs.nexa.ai/en/nexa-sdk-go/NexaCLI))
- [Audio Transcription](https://awesome-repositories.com/f/artificial-intelligence-ml/npu-acceleration/audio-transcription.md) — Offers NPU-accelerated automatic speech recognition for efficient local audio transcription. ([source](https://docs.nexa.ai/en/nexa-sdk-android/APIReference_NPU.md))
- [Document Reranking](https://awesome-repositories.com/f/artificial-intelligence-ml/npu-acceleration/document-reranking.md) — Uses NPU-accelerated reranker models to score document relevance against specific queries locally. ([source](https://docs.nexa.ai/en/nexa-sdk-android/APIReference_NPU.md))
- [Optical Character Recognition](https://awesome-repositories.com/f/artificial-intelligence-ml/npu-acceleration/optical-character-recognition.md) — Performs NPU-accelerated text extraction from images using local computer vision models. ([source](https://docs.nexa.ai/en/nexa-sdk-android/APIReference_NPU.md))
- [Vector Embedding Generation](https://awesome-repositories.com/f/artificial-intelligence-ml/npu-acceleration/vector-embedding-generation.md) — Utilizes NPU acceleration to convert text strings into high-dimensional vector representations for local semantic search. ([source](https://docs.nexa.ai/en/nexa-sdk-android/APIReference_NPU.md))
- [Vision Language Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/npu-acceleration/vision-language-inference.md) — Processes combined text and image inputs through NPU-accelerated vision language models. ([source](https://docs.nexa.ai/en/nexa-sdk-android/APIReference_NPU.md))
- [Text-to-Speech](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech.md) — Synthesizes natural human speech from text input using on-device generative models. ([source](https://docs.nexa.ai/en/nexa-sdk-python/api-reference.md))
- [Vector Embeddings](https://awesome-repositories.com/f/artificial-intelligence-ml/vector-embeddings.md) — Creates high-dimensional numerical representations of text to enable semantic search and retrieval-augmented generation. ([source](https://docs.nexa.ai/en/nexa-sdk-ios/APIReference.md))

### Data & Databases

- [Inference State Management](https://awesome-repositories.com/f/data-databases/storage-engines/key-value/inference-state-caching/inference-state-management.md) — Manages internal key-value caches and state to optimize the performance of local model responses. ([source](https://docs.nexa.ai/en/nexa-sdk-python/overview))
- [Vector Search](https://awesome-repositories.com/f/data-databases/vector-search.md) — Implements on-device vector embedding generation and document reranking for semantic search.

### DevOps & Infrastructure

- [Cross-Platform Deployment Targets](https://awesome-repositories.com/f/devops-infrastructure/cross-platform-deployment-targets.md) — Supports deployment of AI capabilities across multiple operating systems using native SDKs and containers. ([source](https://cdn.jsdelivr.net/gh/nexaai/nexa-sdk@main/README.md))
- [On-Device Model Management](https://awesome-repositories.com/f/devops-infrastructure/ml-model-hosting/on-device-model-management.md) — Manages the local downloading, storage, and versioning of model snapshots and quantization versions on device.

### Mobile Development

- [Android Platform Integrations](https://awesome-repositories.com/f/mobile-development/android-ecosystem/android-platform-integrations.md) — Integrates AI model execution into Android applications and embedded platforms for local processing. ([source](https://docs.nexa.ai/))

### Web Development

- [REST APIs](https://awesome-repositories.com/f/web-development/rest-apis.md) — Provides a REST API to perform chat completions, generate embeddings, and execute reranking tasks. ([source](https://docs.nexa.ai/en/nexa-sdk-docker/quickstart))

### Part of an Awesome List

- [Visual Analysis](https://awesome-repositories.com/f/awesome-lists/ai/multimodal-inference/visual-analysis.md) — Processes images and text simultaneously to perform multimodal reasoning and visual understanding. ([source](https://docs.nexa.ai/en/nexa-sdk-ios/APIReference.md))
- [Model Deployment Tools](https://awesome-repositories.com/f/awesome-lists/devtools/model-deployment-tools.md) — Token-compressed models for efficient on-device inference.

### Development Tools & Productivity

- [Python Library Integrations](https://awesome-repositories.com/f/development-tools-productivity/python-library-integrations.md) — Exposes model management and inference capabilities through a Python library for application integration. ([source](https://docs.nexa.ai/))