30 open-source projects similar to apple/coremltools, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Coremltools alternative.
TensorRT is a deep learning inference engine and software development kit designed to optimize and deploy neural networks for high-performance execution on NVIDIA GPUs. It functions as a GPU acceleration framework that reduces latency and increases throughput for trained models during production deployment. The toolkit imports models from the Open Neural Network Exchange (ONNX) format and transforms them into optimized engines. It applies graph-level optimization, layer fusion with kernel generation, and quantization, which converts floating-point weights into lower-precision formats.
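As a rough illustration of what converting floating-point weights to a lower-precision format means in general, the sketch below applies symmetric per-tensor INT8 quantization in plain Python. It is not TensorRT's API or its calibration procedure; the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical, chosen only for this example.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127].

    Hypothetical helper for illustration only, not a TensorRT function.
    Returns the integer values and the scale needed to map them back.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Avoid division by zero when every weight is 0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the int8 representation."""
    return [q * scale for q in quantized]


weights = [0.4, -1.0, 0.25, 0.0]   # assumed sample weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The round-trip error for each weight is bounded by about half a step (scale / 2).
max_error = max(abs(r - w) for r, w in zip(restored, weights))
print(q, scale, max_error)
```

The largest-magnitude weight maps to the edge of the integer range, and every other weight is rounded to the nearest step, which is why the precision lost is bounded by roughly half the scale.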
ExecuTorch is a lightweight C++ runtime for deploying PyTorch models on mobile, embedded, and edge hardware. It provides an ahead-of-time compilation pipeline that exports, quantizes, and lowers model graphs into compact serialized programs, then executes them through a minimal runtime with hardware acceleration and on-device large language model inference capabilities. The project distinguishes itself through a hardware accelerator delegate system that partitions model subgraphs and offloads computation to specialized backends including NPUs, GPUs, and DSPs from Apple, Arm, Intel, MediaTek,
CoreNet is a deep learning training framework and computer vision model library designed for developing neural networks across vision, text, and audio modalities. It functions as a distributed training orchestrator for scaling workloads across multiple compute nodes and provides a multimodal data pipeline for processing image, text, and video data. The project includes a model conversion toolkit for transforming weights and architectures between different machine learning frameworks. It also provides tools for optimizing model performance on Apple Silicon and reducing response latency in gene
Ivy is a machine learning framework transpiler and model converter designed to ensure deep learning portability. It serves as a tool for migrating source code and models between different deep learning frameworks while maintaining original functionality. The system enables cross-framework model portability by translating model weights, architectures, and source code. It uses abstract syntax tree based transpilation and computational graph tracing to capture execution flows and rewrite high-level logic into target framework code. The project covers model interoperability through weight-layout
This project is a comprehensive suite for neural speech synthesis, featuring a deep learning text-to-speech engine, a neural speech synthesis trainer, and a voice cloning toolkit. It provides a system for synthesizing human-like speech from text using neural network models and high-fidelity vocoders. The suite includes a speech model conversion utility to transform deep learning models between different formats for deployment across various hardware runtimes. It also provides a self-contained HTTP server to expose pre-trained text-to-speech models as a remote audio API. Capabilities include
This project is a cross-platform machine learning inference engine designed to execute pre-trained models across diverse operating systems and hardware environments. It functions as a standardized execution framework that manages the entire lifecycle of model inference, from loading and graph optimization to hardware-accelerated execution and generative sequence management. The runtime distinguishes itself through a highly modular architecture that decouples model logic from hardware-specific kernels. By utilizing an execution provider abstraction, it enables developers to offload computation
This project is a multimodal translation framework and large language model capable of speech-to-speech, speech-to-text, and text-to-text translation across nearly 100 languages. It provides a real-time speech translation engine and a comprehensive toolkit for converting spoken audio between languages. The system is distinguished by its ability to preserve the original speaker's tone, pace, and prosody during translation. It utilizes a specialized on-device inference toolkit that converts model checkpoints into C-based libraries, enabling low-latency execution on mobile and edge hardware with
TensorFlow.js is a JavaScript machine learning library used for training and deploying models in web browsers and server-side environments. It functions as a browser-based model trainer, a WebAssembly inference engine, and a WebGPU accelerated tensor library for low-level linear algebra. The project also includes a model converter to transform Python-based models into optimized formats for JavaScript execution. The library distinguishes itself through a pluggable backend architecture that allows mathematical operations to be executed via CPU, WebGL, or WebGPU. It supports the conversion of Py
This project is a deep learning model compiler and parser that converts ONNX models into optimized TensorRT engines. It functions as a bridge that maps standardized ONNX operators to vendor-specific kernels to enable high-performance inference on NVIDIA GPUs. The system operates as a GPU inference optimizer, selecting hardware-specific kernels and tuning memory allocation to maximize throughput. It transforms neural network graphs into serialized binary execution plans to reduce runtime overhead. The toolset covers deep learning model deployment and edge AI performance tuning. It includes ca
MMdnn is a deep learning model converter and migrator designed to translate neural network architectures and weights between different frameworks such as TensorFlow, PyTorch, and Keras. It utilizes a standardized intermediate representation to decouple network structures and weights from specific framework implementations, enabling the transformation of pre-trained models across different environments. The project distinguishes itself by generating native Python reconstruction code from its intermediate representations, allowing models to be rebuilt and fine-tuned in target environments. It a
This project is a collection of pre-trained machine learning models and conversion pipelines designed for running inference directly in the browser using TensorFlow.js. It provides a library of ready-to-use models for computer vision, audio classification, and natural language processing tasks. The suite includes specialized tools for transforming Python-based Keras models into JSON formats compatible with web environments. It enables the deployment of these models by fetching architectures and weight shards via HTTP for client-side execution. The project covers a broad range of capabilities
This project is a containerized local AI infrastructure stack designed to deploy large language models and vector databases on private hardware. It functions as an orchestration platform that combines AI runners, knowledge graphs, and a visual workflow builder for creating agentic chatflows and automating tasks via tool integration. The platform distinguishes itself through a low-code approach to agent orchestration, utilizing a visual interface to design complex sequences and connect agents to external tools and search engines. It includes a dedicated local observability stack to track promp
TNN is a deep learning inference framework designed to execute pre-trained neural networks across mobile, desktop, and server hardware. It functions as a hardware-accelerated runtime and model compression toolkit, providing a unified interface for deploying models in diverse environments. The framework includes an ONNX model converter to transform models from various training frameworks into a standardized internal format. It distinguishes itself through a combination of model compression tools—including weight quantization and static-code pruning—and a memory management system that reuses bu
Nebullvm is an AI inference accelerator, GPU resource orchestrator, and performance optimization library for large language models. It functions as an optimization layer designed to lower operational costs by aligning model execution with underlying hardware architectures. The system maximizes cluster efficiency through real-time dynamic partitioning and elastic quotas for shared hardware resources. It employs alignment methods to reduce the hardware and data requirements for tuning large language models. The project covers broad capability areas including AI infrast
Ivy is a machine learning framework transpiler and model converter designed to translate code and computational graphs between different deep learning ecosystems. It serves as a portability tool for migrating model architectures and logic across competing frameworks to enable flexible deployment. The system achieves cross-framework conversion by utilizing abstract syntax tree analysis to rewrite source code and by employing a computational graph tracer to capture tensor flows and operation sequences during live execution. This process allows for the translation of both high-level model defini
LiteRT is a runtime and API for executing machine learning and generative AI models on mobile, desktop, and IoT hardware. It consists of an inference engine and a specialized environment for running quantized large language and diffusion models locally on edge hardware. The system includes an ahead-of-time model compiler that translates models into hardware-specific bytecode to reduce startup latency and memory overhead. It provides a unified interface for Neural Processing Units with automatic fallback routing to CPUs or GPUs when specific subgraph support is unavailable. An edge model conve
Neural Compressor is a deep learning model compression toolkit and AI inference acceleration engine. It functions as an automated model quantization tool and hardware-aware model compiler designed to reduce the memory footprint of neural networks and decrease execution latency. The project provides specialized frameworks for optimizing large language models, utilizing weight-only quantization and hardware-specific kernels to improve the operational efficiency of generative AI workloads. It maps neural network operators to specialized CPU and GPU vector instructions to accelerate model executi
tensorrtx is a computer vision inference engine and model implementation library designed for graphics processor acceleration. It provides a framework for optimizing deep learning models through a GPU inference optimizer, a deep learning model converter for transforming weights from frameworks like TensorFlow and PyTorch, and a custom plugin library to implement operations not natively supported by the TensorRT API. The project distinguishes itself through a comprehensive collection of pre-defined network implementations, ranging from various YOLO versions and DETR transformers for object det
This project is a vision language model framework and vision-to-text pipeline designed for deploying and optimizing models that process both images and text. It provides an on-device inference engine to run quantized models locally on mobile and desktop hardware accelerators. The framework features a model quantization toolkit to reduce weight precision for lower memory footprints and increased execution speed on specialized silicon. It also includes an efficient vision encoder utilizing a hybrid encoding system to compress image tokens, which reduces pro
DiffusionBee is a Stable Diffusion desktop client for macOS that functions as an AI image generator and editor. It allows for the local generation of images from text prompts and the management of diffusion models without requiring external cloud services or technical setup. The application includes a local diffusion model manager for importing and switching between custom trained model files to achieve specific artistic styles. It also features a system for tracking generation history and uploading assets to a public gallery. The software covers several image synthesis and manipulation work
Paddle-Lite is a deep learning inference engine and edge computing runtime designed to execute trained models on mobile and edge devices. It provides a hardware-accelerated inference framework and a decoupled runtime with a minimal binary footprint to operate in resource-constrained environments without third-party dependencies. The project includes a model quantization tool for reducing precision and size via static and dynamic quantization, as well as a computation graph optimizer. These tools reduce latency and memory usage by fusing operators and pruning the model intermediate representat
OpenVINO is an AI inference engine and model serving platform designed to execute optimized deep learning models across CPUs, GPUs, and NPUs through a unified API. It includes a model optimization toolkit for converting, quantizing, and compressing models from various frameworks, alongside a specialized generative AI runtime for large language models. The project distinguishes itself through a plugin-based hardware acceleration layer that maps neural network operations to vendor-specific drivers. It features advanced execution mechanisms such as continuous batching, speculative decoding, and
chineseocr is an end-to-end deep learning pipeline for detecting and recognizing Chinese and English text in images. The project combines text region detection using YOLOv3 with sequence-based recognition via Convolutional Recurrent Neural Networks (CRNN) and dense OCR models, forming a complete optical character recognition workflow. The pipeline includes orientation detection to handle text rotated at 0, 90, 180, or 270 degrees before recognition, and supports structured field extraction from identity cards and train tickets. A multi-framework model converter enables trained models to be co
chaiNNer is a GPU-accelerated AI image upscaling application that uses a visual node-based interface for constructing image processing pipelines. At its core, it provides a node-based visual programming environment where users connect processing nodes in a directed acyclic graph, with a graph execution scheduler that traverses the pipeline in topological order. The application includes an iterator-based batch processing system that automatically applies the same pipeline to multiple files, and a model format conversion pipeline that transforms neural network models between PyTorch, ONNX, and N
Cactus is an on-device AI inference engine designed for executing large language models, vision models, and speech-to-text systems on mobile and wearable hardware. It provides a programmable tensor computation graph for defining sequences of matrix operations and activation functions, alongside a local retrieval augmented generation framework that grounds model responses using local text files. The project features a multiplatform SDK with language bindings for integrating AI capabilities into mobile applications and a model conversion system that transforms external model formats for optimiz
Qwen is a comprehensive framework for large language model development, serving, and deployment. It provides a complete ecosystem for transformer-based sequence modeling, offering base models alongside specialized tools for instruction-tuned alignment, fine-tuning, and long-context inference. The project is designed to support both research and production environments, enabling users to train, optimize, and host generative models locally or across distributed hardware. The framework distinguishes itself through its focus on high-performance serving and extensibility. It features a high-perfor
MNN is a high-performance inference engine and framework designed for on-device machine learning. It provides a comprehensive environment for executing, optimizing, and deploying neural network models directly on mobile and resource-constrained edge devices. The framework distinguishes itself through a robust model optimization toolkit that supports quantization, compression, and structural graph manipulation to minimize memory footprint and maximize execution speed. It features a modular architecture that abstracts hardware-specific backends, allowing models to run efficiently across diverse
Paddle is a deep learning framework designed for building, training, and deploying neural networks. It provides a platform for constructing models using tensor-based computations and supports both dynamic and static execution graphs to facilitate research and production workflows. The platform functions as a distributed machine learning system, enabling the scaling of training workloads across multiple nodes and hardware clusters. It includes a comprehensive toolkit for model deployment and optimization, allowing users to convert external model formats, compress trained models for resource-co
Modular is a unified machine learning development platform designed for building, compiling, and deploying high-performance neural network models. It provides a comprehensive execution engine that supports both local and production-grade inference, enabling developers to manage the entire model lifecycle from initial architecture definition to scalable, containerized service deployment. The platform distinguishes itself through a hardware-agnostic runtime that abstracts diverse silicon architectures, allowing models to execute efficiently across varied compute environments. It includes a spec
ONNX is an open-source standard for machine learning interoperability that provides a unified format for representing neural network models. By defining a common set of operators and a standardized file structure, it enables models to be shared, exported, and executed consistently across different training frameworks and software ecosystems. The project functions as an intermediate representation layer that decouples model development from deployment. It utilizes a language-neutral binary serialization format to store model structures and weights, ensuring that computational graphs remain por