# Edge and Mobile ML Deployment

> Search results for `deploy ML models to edge and mobile devices` on awesome-repositories.com. 115 total matches; showing the first 50.

Explore on the web: https://awesome-repositories.com/q/deploy-ml-models-to-edge-and-mobile-devices

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [this search on awesome-repositories.com](https://awesome-repositories.com/q/deploy-ml-models-to-edge-and-mobile-devices).**

## Results

- [jomjol/ai-on-the-edge-device](https://awesome-repositories.com/repository/jomjol-ai-on-the-edge-device.md) (8,461 ⭐) — AI-on-the-edge-device is an edge AI meter digitizer and computer vision image processor designed to convert images of analog and digital utility meters into numeric values. It functions as an IoT gateway that runs neural network inference locally on hardware to monitor water, power, and gas readings.

The system is distinguished by its ability to handle both analog pointers and digital digits through custom-trained neural networks. It includes specialized tools for image alignment, region-of-interest extraction, and hardware-level lighting control to minimize glare on glass surfaces. To maintain performance on edge hardware, the project employs integer-quantized models.

The project covers a broad operational surface, including data processing for reading validation and rollover correction, and a variety of integration methods such as MQTT, REST APIs, and webhooks. It also features device management capabilities, including GPIO control, a web-based administration dashboard, and Prometheus metrics export for hardware telemetry.

The system supports over-the-air firmware updates and provides a wireless access point for initial network and system configuration.
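
As an illustration of what "reading validation and rollover correction" can mean for a meter digitizer, here is a minimal Python sketch. It is not taken from the project: the function name `validate_reading`, the `digits` and `max_step` parameters, and the wrap-around rule are all assumptions made for this example.

```python
from typing import Optional


def validate_reading(previous: int, current: int, digits: int = 5,
                     max_step: int = 100) -> Optional[int]:
    """Return the consumption since the previous reading, or None if implausible.

    Hypothetical rule set: a counter with `digits` wheels wraps from its
    maximum back to zero, and a reading may advance by at most `max_step`
    units between two captures.
    """
    modulus = 10 ** digits
    if not (0 <= previous < modulus and 0 <= current < modulus):
        return None  # value cannot be shown on a counter of this width
    # The modular difference treats a wrap (e.g. 99998 -> 00003) as a small
    # forward step instead of a large negative one.
    step = (current - previous) % modulus
    if step > max_step:
        return None  # jump is too large, or the reading went backwards
    return step
```

With these defaults, `validate_reading(99998, 3)` yields `5`, while a backwards reading such as `validate_reading(120, 100)` is rejected and returns `None`.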
- [google-ai-edge/litert](https://awesome-repositories.com/repository/google-ai-edge-litert.md) (2,561 ⭐) — LiteRT is a runtime and API for executing machine learning and generative AI models on mobile, desktop, and IoT hardware. It consists of an inference engine and a specialized environment for running quantized large language and diffusion models locally on edge hardware.

The system includes an ahead-of-time model compiler that translates models into hardware-specific bytecode to reduce startup latency and memory overhead. It provides a unified interface for Neural Processing Units with automatic fallback routing to CPUs or GPUs when specific subgraph support is unavailable. An edge model converter transforms trained models into optimized formats for deployment on resource-constrained devices.

The project covers model optimization through format conversion and post-training quantization to reduce binary size. It manages hardware acceleration through automatic accelerator selection and zero-copy buffer handling that avoids CPU-side memory copies. The framework also supports custom operator definitions through a low-level kernel interface to extend the supported mathematical operations.
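
The fallback routing described in this entry can be pictured as splitting a model's operators into runs that an accelerator supports and runs that fall back to the CPU. The sketch below is a conceptual Python illustration only and does not use the LiteRT API; `partition_ops`, the operator names, and the `npu_supported` set are hypothetical, and a real runtime would partition a graph rather than a flat sequence.

```python
from itertools import groupby
from typing import Iterable, List, Set, Tuple


def partition_ops(ops: Iterable[str],
                  npu_supported: Set[str]) -> List[Tuple[str, List[str]]]:
    """Group consecutive operators by the device that would run them."""
    def device(op: str) -> str:
        # Anything the (hypothetical) NPU cannot run falls back to the CPU.
        return "npu" if op in npu_supported else "cpu"

    return [(dev, list(run)) for dev, run in groupby(ops, key=device)]


plan = partition_ops(
    ["conv2d", "relu", "custom_nms", "softmax"],
    npu_supported={"conv2d", "relu", "softmax"},
)
```

Here `plan` holds three segments: an NPU run for `conv2d` and `relu`, a CPU run for the unsupported `custom_nms`, and a second NPU run for `softmax`. Each boundary between segments is a point where tensors would have to move between devices, which is why fewer, longer segments are generally preferable.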
- [google-ai-edge/gallery](https://awesome-repositories.com/repository/google-ai-edge-gallery.md) (15,162 ⭐) — This project is a development framework for building edge-based AI agents that perform multimodal inference and system-level automation directly on mobile devices. By prioritizing local-first execution, the platform ensures data privacy and offline functionality, allowing developers to run large language models on hardware without requiring external server connectivity.

The framework distinguishes itself through an integrated orchestration layer that connects language models to custom tools, scripts, and native device intents. It provides a structured registry for mapping natural language instructions to executable code, enabling agents to perform proactive tasks, trigger system actions, and interact with local or remote services. To support complex workflows, the platform includes sandboxed script execution and dynamic webview rendering, allowing models to generate and display interactive interfaces within the conversation flow.

Beyond core inference, the system offers comprehensive utilities for managing and benchmarking local model files, including tools for prompt engineering and performance tuning. It also features diagnostic capabilities that visualize the internal reasoning traces of models and provide debugging logs for script execution. The platform is designed with security in mind, incorporating native credential management and repository access controls to maintain compliance while processing sensitive data locally.
- [apache/mxnet](https://awesome-repositories.com/repository/apache-mxnet.md) (20,829 ⭐) — This project is a deep learning framework designed for constructing, training, and deploying neural networks across diverse hardware environments. It functions as a high-performance tensor computation library that provides both imperative and symbolic programming interfaces, allowing developers to balance flexible, step-by-step model building with the efficiency of compiled computation graphs.

The framework distinguishes itself through a hybrid execution engine that integrates declarative graph compilation with imperative runtime logic. It supports scalable, distributed training across multiple compute nodes and devices, utilizing a shared key-value store and sophisticated synchronization strategies to manage parameters and gradient updates. The system is built on a language-agnostic native core, ensuring consistent performance and behavior when accessed through its various language bindings.

Beyond core training and inference, the project includes comprehensive tools for managing data pipelines, including utilities for streaming, resizing, and prefetching datasets from local or cloud storage. It also provides extensive monitoring, profiling, and visualization capabilities to track performance metrics, inspect intermediate outputs, and identify bottlenecks during the development process.

The software is designed for production-grade deployment, offering support for model serialization, mobile optimization, and secure execution environments. It includes specialized memory planning and hardware-specific tuning to maximize throughput and minimize resource usage across CPUs and graphics cards.
- [harvard-edge/cs249r_book](https://awesome-repositories.com/repository/harvard-edge-cs249r-book.md) (20,217 ⭐) — This project is a comprehensive educational framework designed to teach the design, deployment, and performance optimization of machine learning systems. It provides a structured curriculum that covers the full stack of artificial intelligence engineering, ranging from the construction of core framework components like tensors and automatic differentiation engines to the orchestration of large-scale distributed training clusters.

The platform distinguishes itself through its integration of physics-grounded systems modeling and interactive simulation environments. Users can experiment with distributed training strategies, analyze communication overhead, and perform economic modeling to estimate the total cost of ownership, energy consumption, and reliability of hardware clusters. By combining these analytical tools with hands-on embedded hardware kits and browser-based notebooks, the project enables students to bridge the gap between theoretical architecture and practical deployment on resource-constrained edge devices.

Beyond core training, the project offers a broad suite of capabilities for evaluating machine learning operations. This includes tools for assessing inference latency, quantifying environmental impact, and optimizing production workloads across diverse environments. The curriculum is supported by extensive pedagogical resources, including lecture materials, assessment banks, and interview preparation scenarios that focus on hardware selection and parallel scaling strategies.

The project is maintained as an open-source repository, providing version-controlled educational content and modular software components that allow for collaborative development and adaptation by the academic community.
- [openbmb/minicpm-v](https://awesome-repositories.com/repository/openbmb-minicpm-v.md) (25,653 ⭐) — MiniCPM-V is a multimodal large language model and vision-language system designed for complex visual and linguistic understanding. It functions as an on-device AI model, providing the capacity to process text, images, and video as a compact neural network.

The project is specifically developed as an edge AI framework, utilizing quantization and weight sharding to run on memory-constrained mobile chipsets. This allows for the deployment of multimodal intelligence directly on mobile operating systems for local inference.

Its capabilities cover multimodal content analysis of high-resolution images and high-frame-rate video, as well as real-time voice interaction. The system includes speech synthesis for voice cloning, prosody control, and the ability to maintain natural dialogue across simultaneous video and audio streams.
- [paulescu/hands-on-train-and-deploy-ml](https://awesome-repositories.com/repository/paulescu-hands-on-train-and-deploy-ml.md) (885 ⭐) — Train and Deploy an ML REST API to predict crypto prices, in 10 steps
- [gam-team/gam](https://awesome-repositories.com/repository/gam-team-gam.md) (4,206 ⭐) — GAM is a command-line tool for administering Google Workspace and Cloud Identity. It translates command-line arguments into structured API calls, enabling administrators to manage users, groups, organizational units, and domain settings across a Google Workspace environment. The tool handles authentication through OAuth2 flows, service accounts, and workload identity federation, and supports multi-tenant configurations for managing multiple domains or cloud projects from a single installation.

GAM distinguishes itself through its batch processing and automation capabilities. It can process large datasets from CSV files, Google Sheets, or cloud storage, distributing independent API requests across parallel worker threads for efficient execution. The tool supports template-based string substitution for personalizing content like email signatures, regex-based resource filtering for targeting specific users or files, and external script extensibility for implementing custom workflows beyond the built-in command set. It also provides keyless authentication methods, allowing short-lived tokens from external identity providers to replace static service account keys.

The tool covers a broad range of administrative domains including user account lifecycle management, group and membership administration, Drive file and folder operations, calendar event management, Gmail configuration and message handling, Google Classroom course administration, Chrome browser and device policy management, and Google Chat space management. It also includes capabilities for managing Shared Drives, contacts, tasks, forms, Google Meet spaces, and Google Vault matters, holds, and exports. Reporting and auditing features allow extraction of activity logs, usage statistics, and security alerts across workspace services.

Documentation is available through a built-in help system that displays the tool version and the path to the local command syntax file, along with a link to the online wiki.
- [balena-io/etcher](https://awesome-repositories.com/repository/balena-io-etcher.md) (33,872 ⭐) — Etcher is a cross-platform utility designed for creating bootable media by flashing raw disk images onto USB drives and SD cards. It functions as a desktop application that provides a graphical interface for low-level storage device management, ensuring data integrity through built-in validation during the writing process.

The application utilizes a unified interface layer to map high-level commands to native system utilities, allowing it to operate consistently across different operating systems. It employs a stream-based data pipeline to pipe image contents directly to storage media, which minimizes memory usage during large write operations. To maintain system security, the tool delegates administrative disk access tasks to a background process.

Beyond image deployment, the software includes capabilities for storage device maintenance, such as clearing partition tables and reformatting corrupted or unusable drives. It is distributed through various native package managers and community repositories across Windows, macOS, and Linux environments.
- [tensorflow/tensorflow](https://awesome-repositories.com/repository/tensorflow-tensorflow.md) (195,697 ⭐) — TensorFlow is a comprehensive machine learning framework designed for the construction, training, and deployment of complex mathematical models. It utilizes a graph-based execution model that represents operations as directed acyclic graphs, enabling automatic differentiation and efficient parallel processing. The system provides high-level interfaces for defining neural network architectures, alongside a robust engine for managing multidimensional array structures and tensor mathematics.

The framework distinguishes itself through a scalable distributed runtime that orchestrates workloads across heterogeneous hardware accelerators and decentralized network nodes. It employs deferred-execution symbolic graphs to perform graph-level optimizations, fusion, and ahead-of-time kernel compilation for specific hardware architectures. To ensure consistent performance across production environments, it features a standardized serialization format for model graphs and specialized tools for model serving, quantization, and compression.

Beyond core training capabilities, the platform includes a high-throughput data ingestion engine that supports asynchronous, multi-threaded pipelines to prevent bottlenecks. It also offers extensive support for hardware abstraction, allowing for pluggable device integration and containerized acceleration. The ecosystem is rounded out by utilities for data validation, federated learning, and specialized modeling tasks, providing a complete toolchain for moving models from research into high-availability production environments.
- [googlecloudplatform/click-to-deploy](https://awesome-repositories.com/repository/googlecloudplatform-click-to-deploy.md) (0 ⭐) — Source for Google Click to Deploy solutions listed on Google Cloud Marketplace.
- [trainindata/deploying-machine-learning-models](https://awesome-repositories.com/repository/trainindata-deploying-machine-learning-models.md) (0 ⭐) — Accompanying repo for the online course Deployment of Machine Learning Models.
- [openbmb/minicpm-o](https://awesome-repositories.com/repository/openbmb-minicpm-o.md) (23,850 ⭐) — MiniCPM-o is a multimodal large language model designed to function as a real-time conversational assistant on edge devices. By mapping text, image, video, and audio inputs into a unified latent space, the system enables simultaneous cross-modal reasoning and full-duplex interaction. It is built as an edge-side inference engine, utilizing quantized model weights to maintain high-performance processing on consumer hardware.

The system distinguishes itself through its integrated speech synthesis and voice cloning capabilities, which allow for the generation of expressive, personalized vocal output from short audio samples without additional training. Users can modulate the emotional tone, speed, and emphasis of synthesized speech in real time using latent prosody control tokens. Furthermore, the model supports the adoption of specific personas and roles, facilitating immersive, situation-aware dialogue.

Beyond its core conversational features, the framework provides tools for proactive visual assistance, such as monitoring environments to trigger navigation or scheduling alerts. The architecture is configurable, allowing for adjustments to visual token compression and frame sampling rates to balance accuracy and speed. The project supports fine-tuning for specialized domains, enabling developers to adapt the model to custom tasks using standard training frameworks.
- [huggingface/candle](https://awesome-repositories.com/repository/huggingface-candle.md) (19,422 ⭐) — Candle is a minimalist machine learning framework and deep learning inference engine designed for the Rust programming language. It functions as a low-level tensor computation library, providing the necessary primitives for multi-dimensional array operations and mathematical transformations required to execute pre-trained neural network models.

The framework distinguishes itself through a focus on memory efficiency and hardware utilization. It employs statically typed tensor operations to enforce shape validation and memory safety at compile time, while utilizing a lazy-loaded computational graph to minimize overhead. By implementing zero-copy memory mapping and ahead-of-time model compilation, the library reduces data duplication and eliminates interpretation latency during the inference phase.

The engine supports cross-platform deployment by routing mathematical operations through a modular backend dispatcher. This allows for the execution of complex neural networks across diverse hardware, including CPUs, GPUs, and specialized accelerators, making it suitable for resource-constrained edge environments. The project is distributed as a library for Rust, enabling the integration of machine learning capabilities into systems where performance and low resource consumption are required.
- [google-ai-edge/model-explorer](https://awesome-repositories.com/repository/google-ai-edge-model-explorer.md) (1,504 ⭐) — A modern model graph visualizer and debugger
- [edge-classic/edge-classic](https://awesome-repositories.com/repository/edge-classic-edge-classic.md) (0 ⭐) — EDGE-Classic is a Doom source port that provides advanced features, ease of modding, and attractive visuals while keeping hardware requirements very modest. It is a revival of the EDGE 1.35 codebase for modern systems.
- [fingerprintjs/fingerprintjs](https://awesome-repositories.com/repository/fingerprintjs-fingerprintjs.md) (27,334 ⭐) — Fingerprint is a visitor identification and fraud detection platform that generates persistent, unique identifiers by analyzing browser and device attributes. By extracting technical signals from the client environment, it enables reliable user tracking across sessions without relying on traditional cookies.

The platform distinguishes itself through its focus on high-accuracy identification and security-first architecture. It employs edge-side proxying to bypass ad-blockers and privacy restrictions, ensuring consistent data collection. To maintain data integrity, it uses cryptographic payload sealing and server-side verification flows, which prevent tampering by ensuring that identification data is processed securely on the backend rather than solely on the client.

Beyond core identification, the project provides a comprehensive suite for bot detection and security. It analyzes network metadata, device reputation, and behavioral patterns to identify malicious traffic, AI agents, and automated scrapers. These capabilities are supported by granular risk assessment tools, including confidence scoring and protection rulesets that allow for automated blocking of suspicious interactions.

The platform offers extensive administrative and integration features, including multi-environment resource isolation, regional data residency controls, and programmatic API management. It supports diverse deployment environments through framework-specific SDKs, mobile integration, and automated proxy infrastructure deployment.
- [apache/tvm](https://awesome-repositories.com/repository/apache-tvm.md) (13,497 ⭐) — TVM is a machine learning compiler framework designed to convert deep learning models from various frameworks into optimized machine code. It functions as a cross-platform deployment engine that transforms high-level model definitions into efficient, hardware-specific binaries for diverse computing architectures.

The system utilizes a multi-level compilation pipeline that decouples algorithm logic from hardware implementation through tensor-operator abstractions. It employs a graph-level intermediate representation to perform cross-operator optimizations and memory planning before lowering computations to target-specific instructions. To maximize performance, the framework includes an automated schedule space search that explores potential loop transformations and hardware mappings, alongside a lightweight virtual machine runtime for consistent model execution.

This toolkit supports the deployment of computational workloads across a wide range of devices, including CPUs, GPUs, and specialized accelerators. It provides capabilities for cross-compiling models for various operating systems and processor architectures, facilitating the development of high-performance machine learning applications for resource-constrained edge devices.
- [tensorflow/model-optimization](https://awesome-repositories.com/repository/tensorflow-model-optimization.md) (1,573 ⭐) — A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
- [paddlepaddle/paddledetection](https://awesome-repositories.com/repository/paddlepaddle-paddledetection.md) (14,243 ⭐) — PaddleDetection is an object detection framework designed for the end-to-end development, training, and deployment of computer vision models. It provides a comprehensive library of modular neural network architectures and pipelines that support object detection, instance segmentation, and multi-object tracking tasks.

The project distinguishes itself through a configuration-driven approach that decouples model components like backbones and heads, allowing for the flexible assembly of custom vision workflows. It incorporates advanced techniques such as anchor-free detection logic, joint detection-embedding architectures for tracking, and knowledge distillation to improve student model efficiency. To ensure consistent performance in real-time scenarios, the framework includes temporal prediction smoothing and multi-scale feature aggregation.

The toolkit covers a broad capability surface, including automated training schedules, distributed training support, and extensive data augmentation strategies. It provides specialized tools for analyzing human and vehicle activity, estimating poses, and monitoring traffic patterns. Users can optimize models for diverse environments through quantization, pruning, and export options for standardized inference runtimes.

The repository includes a model zoo of pre-trained architectures and supports deployment across server, mobile, and edge hardware via C++ and hardware-accelerated runtimes.
- [emqx/emqx](https://awesome-repositories.com/repository/emqx-emqx.md) (16,422 ⭐) — This project is a high-performance MQTT broker and IoT data platform designed to manage millions of concurrent device connections. It provides a scalable infrastructure for ingesting, processing, and routing telemetry data across distributed systems, utilizing an actor-based concurrency model to maintain high availability and state synchronization across cluster nodes.

The platform distinguishes itself through integrated stream processing and edge computing capabilities. It allows users to execute declarative SQL-based rules directly against incoming message streams for real-time filtering, transformation, and routing. Furthermore, it functions as an industrial connectivity hub and edge gateway, enabling local data processing, inference, and protocol bridging to normalize data from heterogeneous devices before it reaches cloud or enterprise systems.

Beyond core messaging, the platform encompasses a broad suite of operational tools including multi-tenant resource isolation, comprehensive security controls, and durable message delivery. It supports complex data lifecycles through persistent queues, schema validation, and direct integration with various storage backends for long-term archiving and time-series analysis.

The system provides a unified interface for global infrastructure monitoring and automated fleet orchestration. It is designed for flexible deployment across on-premise, cloud, and serverless environments, offering command-line tools to manage configuration, scaling, and system health.
- [likec4/likec4](https://awesome-repositories.com/repository/likec4-likec4.md) (2,723 ⭐) — likec4 is an architecture-as-code framework that transforms text-based architecture definitions into interactive diagrams, static websites, and image files. It serves as a system architecture visualizer and C4 model diagram generator, allowing users to define software components, boundaries, and deployment infrastructure using a domain-specific language.

The tool distinguishes itself by providing a modeling environment with Language Server Protocol integration for real-time validation and autocomplete. It enables interactive architecture documentation where users can navigate through hierarchical system views via drill-down exploration and scoped views.

Beyond basic visualization, the framework covers deployment modeling to map logical software components to physical infrastructure. It includes capabilities for interaction sequence visualization, architectural drift detection to identify discrepancies between design and implementation, and the generation of React or web components for embedding diagrams into applications.

A command-line interface is provided for automated diagram generation in CI/CD pipelines and hosting a local preview server for real-time updates.
- [sipeed/nanokvm](https://awesome-repositories.com/repository/sipeed-nanokvm.md) (6,061 ⭐) — NanoKVM is a KVM-over-IP device that provides remote keyboard, video, and mouse control over IP networks for headless server management. It functions as remote server management hardware enabling out-of-band control of a computer's power, BIOS, and operating system over a network, while also serving as a RISC-V single-board computer for embedded and edge applications. The device additionally operates as an AI edge inference device running neural network models locally for real-time image recognition and object detection, and integrates Tailscale as a VPN appliance for secure peer-to-peer connections without requiring a public IP address.

The system delivers comprehensive remote server management through high-resolution video streaming with hardware-accelerated encoding, remote keyboard and mouse control via USB HID emulation, and ATX power control through GPIO pins for remote power cycling. It provides remote BIOS access, serial console access through a web-based terminal, ISO image mounting via virtual USB drives, and Wake-on-LAN host wakeup. Network connectivity is established through wired Ethernet or wireless connections, with support for static IP assignment, mDNS service discovery, and HTTPS secure access using TLS certificates.

For remote access beyond the local network, NanoKVM integrates Tailscale VPN for peer-to-peer connections and Cloudflare Tunnel for secure outbound-only proxying, both configurable through a browser-based interface. The device supports custom domain assignment for public internet exposure, DNS configuration via boot-partition files, and SSH access toggling through web settings. Automation capabilities include custom script execution for device tasks, while monitoring features display connection status and performance metrics on an OLED screen with automatic service restart on failure.

The system is updated through firmware and software update mechanisms via SD card, USB, or web interface, with initial setup performed by flashing a disk image to an SD card.
- [pdollar/edges](https://awesome-repositories.com/repository/pdollar-edges.md) (839 ⭐) — Structured Edge Detection Toolbox
- [kubeedge/kubeedge](https://awesome-repositories.com/repository/kubeedge-kubeedge.md) (7,487 ⭐) — KubeEdge is a distributed edge computing framework that extends Kubernetes to manage containerized workloads and hardware devices at the edge. It functions as a Kubernetes edge orchestration system, allowing the deployment and management of applications across distributed edge nodes using native Kubernetes APIs and workflows.

The project distinguishes itself through a specialized focus on IoT integration and node autonomy. It employs digital-twin state modeling to represent physical hardware devices as virtual objects, utilizing an MQTT-based messaging bus for communication with heterogeneous devices. To ensure operational stability during network instability or cloud disconnections, it implements local metadata caching and state persistence, allowing edge nodes to maintain local application operations independently.

The framework provides a comprehensive set of capabilities covering cloud-edge networking via WebSocket and QUIC protocols, distributed device management, and container lifecycle orchestration. It further includes tools for remote pod debugging, centralized node status reporting, and the management of storage volumes and resource reclamation at the edge.
- [google-ai-edge/litert-lm](https://awesome-repositories.com/repository/google-ai-edge-litert-lm.md) (5,619 ⭐) — LiteRT-LM is Google's production-ready, high-performance, open-source inference framework for deploying Large Language Models on edge devices.
- [paddlepaddle/paddle-lite](https://awesome-repositories.com/repository/paddlepaddle-paddle-lite.md) (7,260 ⭐) — Paddle-Lite is a deep learning inference engine and edge computing runtime designed to execute trained models on mobile and edge devices. It provides a hardware-accelerated inference framework and a decoupled runtime with a minimal binary footprint to operate in resource-constrained environments without third-party dependencies.

The project includes a model quantization tool for reducing precision and size via static and dynamic quantization, as well as a computation graph optimizer. These tools reduce latency and memory usage by fusing operators and pruning the model intermediate representation.

The system supports mixed-hardware model deployment, scheduling computations across CPUs, GPUs, and NPUs. It further optimizes performance through hardware-specific kernel implementations and a scheduling system that distributes tasks across available accelerators.
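
To make the size-for-precision trade behind quantization concrete, here is a small, framework-independent sketch of affine 8-bit quantization in plain Python. It does not reflect Paddle-Lite's actual tooling; the `quantize` and `dequantize` helpers and the min/max calibration rule are assumptions chosen for illustration.

```python
from typing import List, Tuple


def quantize(values: List[float]) -> Tuple[List[int], float, int]:
    """Map floats to int8 codes using a scale and zero point from min/max."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(-128 - lo / scale)
    # Round to the nearest code and clamp into the int8 range.
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate floats from the int8 codes."""
    return [(x - zero_point) * scale for x in q]
```

Each value is stored in one byte instead of four, at the cost of a reconstruction error on the order of `scale`. Static quantization would fix `scale` and `zero_point` ahead of time from calibration data, whereas dynamic quantization would compute them at run time from the tensor being processed.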
- [c0re100/qbittorrent-enhanced-edition](https://awesome-repositories.com/repository/c0re100-qbittorrent-enhanced-edition.md) (25,128 ⭐) — qBittorrent-Enhanced-Edition is a cross-platform desktop application designed to manage the downloading and uploading of files across peer-to-peer networks. It functions as an open-source file sharer, facilitating the decentralized distribution of digital content by breaking files into smaller pieces for efficient transfer.

The application utilizes a high-performance library to handle complex protocol specifications and employs a mature widget toolkit to provide a consistent native user interface across Windows, macOS, and Linux. It operates as a network traffic manager, incorporating asynchronous event-driven networking and multi-threaded task scheduling to maintain high throughput and system responsiveness during large-scale data transfers.

Beyond core file sharing, the software includes capabilities for automated content acquisition, remote management via web browsers, and granular bandwidth control. It supports extensible search functionality through external scripts and maintains state integrity using a local relational database for metadata storage.
- [michaelhenry/deploy-to-cocoapods-github-action](https://awesome-repositories.com/repository/michaelhenry-deploy-to-cocoapods-github-action.md) (37 ⭐) — GitHub Action for deploying to CocoaPods.org
- [openbmb/minicpm](https://awesome-repositories.com/repository/openbmb-minicpm.md) (9,464 ⭐) — MiniCPM is a collection of small language models designed for local, on-device deployment in resource-constrained environments. The project focuses on running dense Transformer models on consumer hardware, including GPUs, CPUs, and Apple Silicon, without requiring custom code forks.

The project distinguishes itself through heavy optimization for edge hardware, utilizing quantized weight compression in GGUF and MLX formats to reduce memory overhead. It implements advanced inference techniques such as speculative sampling and radix-tree prefix caching to accelerate generation speed and throughput.

Capability areas cover the full model lifecycle, including supervised fine-tuning and preference optimization via parameter-efficient LoRA adapters. The system supports structured tool calling for external agent integration and provides various serving options, including OpenAI-compatible APIs, REST endpoints, and a command-line interface.

The implementation includes tools for converting model checkpoints between formats and distributing training workloads across multiple GPUs.
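
The prefix caching mentioned above can be illustrated with a minimal sketch. It is not MiniCPM's implementation: the `PrefixCache` class and its methods are hypothetical names, it uses a plain token-level trie rather than a compressed radix tree, and it only counts reusable tokens instead of storing any attention state.

```python
class PrefixCache:
    """Token-level trie. A radix tree would also merge single-child chains."""

    def __init__(self):
        self._root = {}

    def insert(self, tokens):
        """Record a token sequence whose computed state could be reused."""
        node = self._root
        for tok in tokens:
            node = node.setdefault(tok, {})

    def longest_prefix(self, tokens):
        """Return how many leading tokens of `tokens` are already cached."""
        node, matched = self._root, 0
        for tok in tokens:
            if tok not in node:
                break
            node = node[tok]
            matched += 1
        return matched


cache = PrefixCache()
cache.insert([1, 2, 3, 4])                      # e.g. a shared system prompt
reused = cache.longest_prefix([1, 2, 3, 9, 9])  # new request, same opening
```

Here `reused` is the number of leading tokens a new request shares with an earlier one, which is the portion of the prompt an engine would not need to recompute.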
- [agiresearch/aios](https://awesome-repositories.com/repository/agiresearch-aios.md) (5,168 ⭐) — AIOS is an LLM agent operating system and orchestration kernel designed to manage memory, resource scheduling, and tool execution for multiple autonomous AI agents. It serves as a comprehensive framework for developing and deploying agents, featuring a dedicated resource manager that coordinates model backends, GPU memory, and isolated kernel instances.

The system distinguishes itself through a semantic memory engine that uses vector search and autonomous clustering for long-term knowledge management, and a semantic file system that allows users to control computer files and system operations via natural language. It also implements a virtualization layer for multi-kernel scheduling and provides a compatibility layer to run agents developed in third-party frameworks.

Broad capabilities include a unified model provider interface for routing requests across cloud and local backends, a tool orchestrator for executing external functions with structured JSON output, and secure virtual machine sandboxing for system interactions. The project also provides mechanisms for agent and tool distribution through remote hubs and a command-line interface for local testing and management.
- [imputnet/cobalt](https://awesome-repositories.com/repository/imputnet-cobalt.md) (41,096 ⭐) — Cobalt is a cross-platform web application designed as a distributed service platform for managing media content downloading. It functions as a full-stack monorepo that integrates a backend API with a responsive frontend, providing a unified interface for users to fetch and save media files from various online platforms.

The project utilizes a modular architecture where backend services, frontend interfaces, and shared logic are organized into decoupled packages within a single repository. This monorepo structure employs centralized workspace orchestration to manage dependencies and cross-package builds, ensuring consistent versioning across the entire application. The backend exposes structured RESTful API endpoints to handle data operations, while the frontend is delivered as pre-compiled static assets for client-side rendering.

The system supports containerized deployment and environment-variable configuration, allowing for consistent execution and self-hosted instances across different infrastructures. Comprehensive technical documentation is included within the repository to guide the deployment and operation of the service.
- [matomo-org/device-detector](https://awesome-repositories.com/repository/matomo-org-device-detector.md) (3,494 ⭐) — The Universal Device Detection library will parse any User Agent and detect the browser, operating system, device used (desktop, tablet, mobile, tv, cars, console, etc.), brand and model.
- [rebelinblue/deployer](https://awesome-repositories.com/repository/rebelinblue-deployer.md) (907 ⭐) — Deployer is a free and open source deployment tool.
- [microsoft/onnxruntime](https://awesome-repositories.com/repository/microsoft-onnxruntime.md) (19,347 ⭐) — This project is a cross-platform machine learning inference engine designed to execute pre-trained models across diverse operating systems and hardware environments. It functions as a standardized execution framework that manages the entire lifecycle of model inference, from loading and graph optimization to hardware-accelerated execution and generative sequence management.

The runtime distinguishes itself through a highly modular architecture that decouples model logic from hardware-specific kernels. By utilizing an execution provider abstraction, it enables developers to offload computations to specialized hardware such as GPUs, NPUs, and dedicated chipsets. It also provides a comprehensive toolkit for model optimization, including quantization, precision conversion, and graph-level transformations, which allow for significant reductions in binary size and latency for both edge and cloud deployments.

Beyond core inference, the project includes extensive support for generative AI, offering built-in capabilities for tokenization, chat template formatting, and streaming output generation. It supports complex model architectures through custom operator registration and modular adapter management, ensuring that developers can integrate specialized mathematical operations or fine-tuned model weights into their pipelines.

The software is built primarily in C++ and provides language-specific bindings to facilitate integration into various programming environments. It includes robust diagnostic and profiling tools that allow for granular performance analysis, hardware utilization tracking, and debugging of tensor data during the inference process.
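
As a rough illustration of the quantization step described above, the sketch below applies affine 8-bit quantization to a short list of weights in plain Python. It does not use the ONNX Runtime API; the function names and the sample weights are assumptions made for the example.

```python
def quantize_int8(values):
    """Affine-quantize floats to int8; returns (quantized, scale, zero_point)."""
    # The representable range must include 0 so that 0.0 maps to an exact integer.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0   # fall back to 1.0 for all-zero input
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map int8 values back to approximate floats."""
    return [(x - zero_point) * scale for x in q]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, scale, zp = quantize_int8(weights)
restored = dequantize(q, scale, zp)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight is stored in one byte instead of four, at the cost of a reconstruction error bounded by roughly one quantization step (`scale`).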
- [firerpa/lamda](https://awesome-repositories.com/repository/firerpa-lamda.md) (7,834 ⭐) — This project is an Android RPA framework designed for automating user interfaces and system tasks on rooted Android devices using Python and ADB. It provides a suite of tools for rooted device management, allowing for programmatic control of system settings, application lifecycles, and shell command execution via a remote API.

The framework distinguishes itself through a combination of dynamic instrumentation and AI integration. It can inject scripts into running processes to hook Java interfaces and modify application behavior in real time. Additionally, it supports large language model integration through a standardized protocol, enabling the translation of natural language prompts into executable device actions.

The system covers a broad range of capabilities, including network traffic analysis via man-in-the-middle proxies, remote administration with real-time screen streaming and touch simulation, and a comprehensive security analysis toolset for binary patching and disassembly. It also provides an emulated Debian runtime environment for native code compilation and a variety of UI automation primitives such as optical character recognition and image-based element location.

The framework supports remote connectivity through VPNs, port forwarding, and a WebSocket-based control interface.
- [deployphp/deployer](https://awesome-repositories.com/repository/deployphp-deployer.md) (11,077 ⭐) — Deployer is a PHP deployment tool and SSH-based deployment automator used to push applications to remote servers and automate the provisioning of hosting environments. It functions as a zero-downtime deployment manager that utilizes symbolic links to switch between application versions, ensuring continuous site availability.

The system employs pre-defined deployment recipes tailored to the specific requirements of popular PHP web frameworks. This framework-specific automation allows for the execution of task sequences designed for particular software environments.

The tool covers remote server provisioning, host-based target mapping, and stateful release versioning to allow for rollbacks. It includes a plugin-based extension system for integrating external monitoring and notification tools into the deployment pipeline.
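
The symlink-switching idea behind zero-downtime releases can be sketched in a few lines. This is not Deployer's code (Deployer is written in PHP); the `releases/` and `current` layout and the `activate_release` helper are assumptions for illustration, and the atomic rename relies on POSIX filesystem behavior.

```python
import os
import tempfile


def activate_release(base_dir, release_name):
    """Point base_dir/current at base_dir/releases/<release_name>."""
    target = os.path.join("releases", release_name)  # relative link target
    current = os.path.join(base_dir, "current")
    tmp_link = os.path.join(base_dir, "current.tmp")

    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    # Renaming over the old link swaps versions in a single step on POSIX,
    # so readers never observe a missing "current".
    os.replace(tmp_link, current)
    return os.readlink(current)


def demo():
    seen = []
    with tempfile.TemporaryDirectory() as base:
        for name in ("001", "002"):
            os.makedirs(os.path.join(base, "releases", name))
            seen.append(activate_release(base, name))
        # A rollback is just re-pointing the link at an older release.
        seen.append(activate_release(base, "001"))
    return seen
```

Because every release stays on disk in its own directory, rolling back requires no redeploy, only another link swap.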
- [neuphonic/neutts](https://awesome-repositories.com/repository/neuphonic-neutts.md) (6,007 ⭐) — Neutts is a neural text-to-speech engine designed for real-time streaming output on edge devices such as phones and laptops. It supports voice cloning from short audio references, enabling zero-shot reproduction of a target speaker's voice, and can be fine-tuned or retrained from scratch for custom voices and styles.

The system distinguishes itself through a decoder-only architecture that halves memory use and accelerates generation on constrained hardware, combined with quantized model inference for a reduced memory footprint. Its streaming decoder loop interleaves synthesis with playback to keep latency low. Additionally, each generated utterance can embed an inaudible or perceptible audio watermark to verify synthetic origin and support traceability.

Beyond core synthesis, neutts offers capabilities such as pre-encoding reference audio to skip encoding on repeated runs, and full model customization through fine-tuning on paired text-audio data. The project provides tools for adapting the model to edge deployment and supporting on-device real-time speech generation.
- [allegroai/clearml](https://awesome-repositories.com/repository/allegroai-clearml.md) (6,733 ⭐) — ClearML is a comprehensive MLOps platform designed to manage the entire machine learning lifecycle. It functions as an experiment tracking tool, a data versioning system, and a pipeline orchestrator, while providing infrastructure for GPU cluster management and model serving.

The platform is distinguished by its ability to handle hybrid-cloud compute scheduling and fractional GPU allocation, allowing multiple workloads to share a single hardware accelerator. It employs a metadata-based approach to data versioning, using virtual views to track large datasets and artifacts without duplicating raw files.

The system covers a broad range of capabilities including automated machine learning pipeline orchestration via task-graph dependencies, hyperparameter optimization, and distributed model training. It also provides an integrated AI workbench for remote development and a centralized control plane for tracking models from training through to production deployment.

Governance and observability are integrated through multi-tenant resource isolation, role-based access control, and real-time monitoring of compute resources and model performance.
- [ekhoo/device](https://awesome-repositories.com/repository/ekhoo-device.md) (1,729 ⭐) — Lightweight tool, written in Swift, for detecting the current device and screen size.
- [nvidia/triton-inference-server](https://awesome-repositories.com/repository/nvidia-triton-inference-server.md) (10,756 ⭐) — Triton Inference Server is a high-performance AI model inference server and multi-framework model runtime designed for deploying machine learning models across cloud, data center, and embedded edge infrastructure. It serves as an execution engine that allows for the concurrent running of models from various frameworks to optimize hardware utilization.

The project features a dynamic batching inference engine that groups individual requests into larger batches to increase total processing throughput. It also provides a model ensemble pipeline, which enables the chaining of multiple models together to create complex data processing and inference sequences.

The server covers broader capabilities including model lifecycle management through a central storage repository, performance monitoring for hardware utilization and latency, and the ability to integrate in-process via native APIs. It supports routing requests through standard web protocols and utilizes shared memory for efficient data exchange.
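
The dynamic batching described above can be illustrated with a simplified sketch. It is not Triton's API or scheduler: `form_batches` and `run_model` are hypothetical names, and the sketch omits the queue-delay timeout that a real batcher would use to decide how long to wait for more requests.

```python
from collections import deque


def form_batches(queue, max_batch_size):
    """Drain pending requests into batches of at most max_batch_size."""
    batches = []
    while queue:
        batch = []
        while queue and len(batch) < max_batch_size:
            batch.append(queue.popleft())
        batches.append(batch)
    return batches


def run_model(batch):
    """Stand-in for one batched forward pass; here it doubles each input."""
    return [x * 2 for x in batch]


pending = deque([1, 2, 3, 4, 5])   # five individual requests
batches = form_batches(pending, max_batch_size=2)
results = [y for b in batches for y in run_model(b)]
```

Five requests are served with three model invocations instead of five, which is where the throughput gain comes from when a batched pass costs little more than a single one.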
- [clearml/clearml](https://awesome-repositories.com/repository/clearml-clearml.md) (6,740 ⭐) — ClearML is a comprehensive MLOps platform designed to manage the end-to-end machine learning lifecycle, from initial experimentation to production deployment. It provides a suite of integrated tools including a pipeline orchestrator for automating workflows, an experiment tracking tool for logging hyperparameters and metrics, and a metadata-driven data versioning system for managing large-scale datasets and model artifacts.

The platform is distinguished by its advanced compute management and serving capabilities. It features a GPU compute manager that supports fractional resource slicing and priority scheduling across hybrid cloud environments. Additionally, it includes a dedicated serving framework for hosting large language models and agentic workflows through secure APIs with integrated autoscaling.

The system covers a broad range of operational capabilities, including real-time infrastructure cost tracking, multi-tenant resource isolation, and automated execution environment reproduction. It also provides observability tools for monitoring inference endpoints, auditing AI workflows, and analyzing system-level hardware utilization.

The orchestration engine can be deployed via containerized or cloud-image based installations to host the platform's lifecycle infrastructure.
- [huggingface/lerobot](https://awesome-repositories.com/repository/huggingface-lerobot.md) (21,687 ⭐) — This project is a comprehensive research platform designed for the end-to-end lifecycle of robotic learning. It provides a modular framework for training neural network policies—specifically through imitation and reinforcement learning—and deploying them onto physical robotic hardware. By offering a unified interface for hardware abstraction, the platform decouples high-level control logic from the specific sensors and actuators of diverse robotic systems.

The framework distinguishes itself through a standardized approach to data and policy management. It utilizes a consistent schema for recording and sharing interaction data, which includes synchronized video and state information. To support complex training requirements, it features distributed optimization across multiple graphics processing units and a kinematic engine that handles coordinate transformations between joint space and Cartesian systems. These capabilities are complemented by a flexible architecture that allows for the modular design of vision-language-action models.

Beyond core training, the platform includes extensive utilities for data processing, such as observation standardization and action normalization, ensuring compatibility across different environments and hardware configurations. It also provides integrated tools for benchmarking performance through standardized rollout loops and evaluation scripts. For resource-constrained hardware, the system supports remote inference streaming, allowing computational workloads to be offloaded to external servers while maintaining real-time control.
- [born-ml/born](https://awesome-repositories.com/repository/born-ml-born.md) (100 ⭐) — Born is a modern ML framework for Go — train and deploy models as single binaries. Pure Go, zero CGO, GPU accelerated.
- [greenrobot/eventbus](https://awesome-repositories.com/repository/greenrobot-eventbus.md) (24,760 ⭐) — EventBus is a publish-subscribe messaging library designed to facilitate decoupled communication between components in Java applications. It functions as a central hub where producers dispatch events that are routed to subscribers based on the class type of the payload. By using annotation-based markers, the system maps event handlers to specific data types, allowing different parts of an application to exchange information without requiring direct references between classes.

The library distinguishes itself through a focus on performance and execution control. It utilizes a compile-time indexing mechanism that generates static lookup tables, replacing slow runtime reflection with direct method calls to accelerate message routing. Furthermore, it provides a thread-aware dispatcher that allows developers to configure whether event handlers execute on the main interface thread, in background pools, or synchronously within the posting thread.

Beyond basic routing, the system supports advanced messaging patterns including priority-ordered delivery and sticky events. Sticky events maintain a memory-based cache of recent data, ensuring that late-registering subscribers automatically receive the most current state upon initialization. The library also offers granular control over the event lifecycle, enabling developers to cancel event propagation or manage custom thread pools and error handling strategies to maintain application responsiveness.
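
The type-based routing and sticky events described above can be sketched as follows. This is a Python illustration, not the EventBus Java API: `TinyEventBus` and `BatteryLow` are hypothetical names, handlers run synchronously in the posting thread, and matching is on the exact event type only.

```python
from collections import defaultdict


class TinyEventBus:
    def __init__(self):
        self._handlers = defaultdict(list)  # event type -> list of handlers
        self._sticky = {}                   # event type -> last sticky event

    def register(self, event_type, handler):
        self._handlers[event_type].append(handler)
        # A late subscriber immediately receives the cached sticky event.
        if event_type in self._sticky:
            handler(self._sticky[event_type])

    def post(self, event, sticky=False):
        if sticky:
            self._sticky[type(event)] = event
        # Route by the class of the payload.
        for handler in list(self._handlers[type(event)]):
            handler(event)


class BatteryLow:
    def __init__(self, percent):
        self.percent = percent


bus = TinyEventBus()
bus.post(BatteryLow(15), sticky=True)   # posted before anyone subscribes
received = []
bus.register(BatteryLow, lambda e: received.append(e.percent))
bus.post(BatteryLow(9))                 # delivered to the live subscriber
```

The subscriber registered after the first post still sees that event because it was cached as sticky, then receives the second event through normal routing.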
- [apple/ml-fastvlm](https://awesome-repositories.com/repository/apple-ml-fastvlm.md) (7,375 ⭐) — This project is a vision language model framework and vision-to-text pipeline designed for deploying and optimizing models that process both images and text. It provides an on-device inference engine for running quantized models locally on mobile and desktop hardware accelerators.

The framework features a model quantization toolkit to reduce weight precision for lower memory footprints and increased execution speed on specialized silicon. It also includes an efficient vision encoder utilizing a hybrid encoding system to compress image tokens, which reduces processing time and memory usage.

The system covers a broad range of capabilities, including model export to hardware-specific, silicon-optimized formats, vision encoder optimization, and template-based prompt engineering. It supports vision-language tasks such as visual question answering and visual content description, and it tracks inference latency to measure time-to-first-token performance.
- [huggingface/smollm](https://awesome-repositories.com/repository/huggingface-smollm.md) (3,624 ⭐) — SmolLM is a project dedicated to the development of small language models. It focuses on training and fine-tuning compact models that maintain high performance while utilizing fewer parameters.

The project emphasizes efficient AI inference and on-device text generation, aiming to enable the deployment of lightweight models on edge devices with limited memory and processing power. It utilizes synthetic data generation to produce artificial datasets that improve the reasoning and training of these AI systems.

The system supports a variety of optimization and training capabilities, including weight quantization, parameter-efficient fine-tuning, and mixed-precision compute. It also covers multilingual text processing and the management of long context windows.
- [dongwookim-ml/python-topic-model](https://awesome-repositories.com/repository/dongwookim-ml-python-topic-model.md) (374 ⭐) — Implementation of various topic models
- [infrasys-ai/aisystem](https://awesome-repositories.com/repository/infrasys-ai-aisystem.md) (17,017 ⭐) — AISystem is a comprehensive AI full-stack infrastructure project covering the entire pipeline from AI chip architecture to high-level training frameworks. It encompasses the development of AI compiler frameworks, inference engines, and distributed training orchestrators designed to coordinate workloads across a heterogeneous compute stack of CPUs, GPUs, and NPUs.

The project focuses on the deep integration of software and hardware, employing software-hardware co-design to align tensor layouts with physical memory structures. It provides specialized capabilities for accelerating Transformer models and Mixture of Experts through dedicated engines and sparse computation acceleration.

Its broader scope includes multi-dimensional distributed parallelism for large-scale model training, high-performance inference optimization via quantization and pruning, and advanced memory management techniques such as tiled memory and unified memory spaces. It also addresses hardware interconnects and collective communication primitives to scale compute clusters.

The project is primarily implemented and documented via Jupyter Notebooks.
- [pytorch/vision](https://awesome-repositories.com/repository/pytorch-vision.md) (17,743 ⭐) — This project is a comprehensive computer vision library for the PyTorch ecosystem, providing a standardized collection of neural network architectures, datasets, and high-performance transformation utilities. It serves as a foundational framework for building, training, and deploying deep learning models, offering a centralized model registry that allows developers to instantiate architectures with pre-trained weights for tasks such as image classification, object detection, and semantic segmentation.

The library distinguishes itself through its modular approach to data and compute management. It features composable transformation pipelines that sequence complex image processing and augmentation operations into unified execution flows, ensuring consistent data preparation. To maximize performance, the project utilizes hardware-agnostic tensor abstractions and automated kernel-level execution dispatch, which selects and registers optimized compute kernels to ensure efficient hardware utilization across diverse environments.

Beyond core vision tasks, the project supports a broad capability surface including distributed training collectives for scaling large-scale models across multiple nodes and devices. It also provides extensive tooling for model optimization, including weight quantization, efficient inference compilation, and support for deploying models to resource-constrained edge devices. The framework is designed for extensibility, allowing users to integrate custom media backends and external tools to support specialized computer vision workflows.
