Nexa Sdk | Awesome Repository

The nexa-sdk is an on-device AI SDK and multimodal inference engine designed to run large language, vision, and audio models locally on mobile and desktop hardware. It functions as a local LLM runtime and NPU acceleration framework, enabling the execution of generative and discriminative models without reliance on cloud services.

The project distinguishes itself through a dedicated NPU acceleration framework that optimizes model execution on Neural Processing Units to reduce latency and power consumption. It employs hardware-agnostic backend routing to dynamically distribute computations across CPUs, GPUs, and NPUs, and supports GGUF-based model loading for efficient memory mapping and layer offloading.

Its capabilities cover a broad spectrum of AI tasks, including conversational text generation, text-to-image synthesis, and automatic speech recognition. It also provides tools for vector embedding generation and document reranking for local semantic search, as well as a REST-based inference server with an OpenAI-compatible interface for external integration.

The SDK supports cross-platform deployment across Android and Linux environments and includes a Python library for developer integration.

Features

Conversational Response Generation - Produces natural language responses for interactive dialogue through streaming or static interfaces.
NPU Acceleration - Integrates specialized neural processing units to optimize model inference performance and power efficiency on edge hardware.
OpenAI-Compatible APIs - Implements standard HTTP endpoints for model interaction and function calling in an OpenAI-compatible format.
NPU Inference Execution - Executes large language models on NPUs using native formats and GGUF layer offloading.

Features

Conversational Response Generation - Produces natural language responses for interactive dialogue through streaming or static interfaces.
NPU Acceleration - Integrates specialized neural processing units to optimize model inference performance and power efficiency on edge hardware.
OpenAI-Compatible APIs - Implements standard HTTP endpoints for model interaction and function calling in an OpenAI-compatible format.
NPU Inference Execution - Executes large language models on NPUs using native formats and GGUF layer offloading.

The SDK supports cross-platform deployment across Android and Linux environments and includes a Python library for developer integration.