30 open-source projects similar to microsoft/mmdnn, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best MMdnn alternative.
Tensorspace is a WebGL-based 3D visualization framework and renderer designed to map deep learning model architectures and tensor data into interactive three-dimensional spaces. It serves as a neural network architecture visualizer and model inspector, allowing users to render model topologies and analyze data flow within a web browser. The project distinguishes itself through its ability to convert pre-trained Keras and TensorFlow models into spatial representations. It integrates with TensorFlow.js to execute inference in the browser, enabling the real-time visualization of intermediate act
Ivy is a machine learning framework transpiler and model converter designed to ensure deep learning portability. It serves as a tool for migrating source code and models between different deep learning frameworks while maintaining original functionality. The system enables cross-framework model portability by translating model weights, architectures, and source code. It uses transpilation based on abstract syntax trees, together with computational graph tracing, to capture execution flows and rewrite high-level logic into target framework code. The project covers model interoperability through weight-layout
PocketFlow is an integrated toolkit for deep learning model compression, distributed training, and mobile format optimization. It provides a system for reducing the size and complexity of neural networks to improve inference efficiency, featuring a dedicated engine for knowledge distillation and a mobile model optimizer. The framework differentiates itself through an automated hyperparameter tuning system that uses reinforcement learning and statistical models to determine optimal compression ratios and layer-wise bit allocation. It also includes a distributed training system that utilizes mu
YOLOv6 is a single-stage deep learning framework designed for industrial object detection. It serves as a computer vision model trainer for identifying and locating objects within images, as well as an instance segmentation tool that delineates precise object boundaries using masks. The project includes a specialized mobile inference optimizer and a model quantization toolkit. These components focus on reducing model size and resolution to improve execution speed on ARM-based chipsets and converting models to low-precision formats to decrease file size. The framework covers a broad range of
This project is a collection of pre-trained machine learning models and conversion pipelines designed for running inference directly in the browser using TensorFlow.js. It provides a library of ready-to-use models for computer vision, audio classification, and natural language processing tasks. The suite includes specialized tools for transforming Python-based Keras models into JSON formats compatible with web environments. It enables the deployment of these models by fetching architectures and weight shards via HTTP for client-side execution. The project covers a broad range of capabilities
Cactus is an on-device AI inference engine designed for executing large language models, vision models, and speech-to-text systems on mobile and wearable hardware. It provides a programmable tensor computation graph for defining sequences of matrix operations and activation functions, alongside a local retrieval-augmented generation framework that grounds model responses using local text files. The project features a multiplatform SDK with language bindings for integrating AI capabilities into mobile applications and a model conversion system that transforms external model formats for optimiz
tflearn is a deep learning framework and high-level API wrapper for TensorFlow. It provides a toolkit for designing neural network architectures and a system for executing training loops and optimizing model weights across CPUs and GPUs. The project simplifies the process of building and training models through a modular interface and a high-level API for prototyping. It includes specialized utilities for deep learning visualization, allowing for the generation of graphical diagrams to analyze network structures, weights, gradients, and activations. The framework covers a broad range of capa
This project is a comprehensive suite for neural speech synthesis, featuring a deep learning text-to-speech engine, a neural speech synthesis trainer, and a voice cloning toolkit. It provides a system for synthesizing human-like speech from text using neural network models and high-fidelity vocoders. The suite includes a speech model conversion utility to transform deep learning models between different formats for deployment across various hardware runtimes. It also provides a self-contained HTTP server to expose pre-trained text-to-speech models as a remote audio API. Capabilities include
OpenVINO is an AI inference engine and model serving platform designed to execute optimized deep learning models across CPUs, GPUs, and NPUs through a unified API. It includes a model optimization toolkit for converting, quantizing, and compressing models from various frameworks, alongside a specialized generative AI runtime for large language models. The project distinguishes itself through a plugin-based hardware acceleration layer that maps neural network operations to vendor-specific drivers. It features advanced execution mechanisms such as continuous batching, speculative decoding, and
ONNX is an open-source standard for machine learning interoperability that provides a unified format for representing neural network models. By defining a common set of operators and a standardized file structure, it enables models to be shared, exported, and executed consistently across different training frameworks and software ecosystems. The project functions as an intermediate representation layer that decouples model development from deployment. It utilizes a language-neutral binary serialization format to store model structures and weights, ensuring that computational graphs remain por
coremltools is a conversion toolkit and translator designed to transform machine learning models from various frameworks into the Core ML format for execution on Apple hardware. It provides a suite of tools for migrating weights and architectures from external libraries into a deployable model format. The project includes an optimization tool and a programmatic interface for editing model graphs and modifying metadata to improve performance on target hardware. It also features a validation suite used to check model specifications and operation compatibility to ensure correct execution within
ExecuTorch is a lightweight C++ runtime for deploying PyTorch models on mobile, embedded, and edge hardware. It provides an ahead-of-time compilation pipeline that exports, quantizes, and lowers model graphs into compact serialized programs, then executes them through a minimal runtime with hardware acceleration and on-device large language model inference capabilities. The project distinguishes itself through a hardware accelerator delegate system that partitions model subgraphs and offloads computation to specialized backends including NPUs, GPUs, and DSPs from Apple, Arm, Intel, MediaTek,
rust-cuda is a GPU programming framework and device compiler that allows for the development and execution of high-performance kernels on NVIDIA hardware using Rust. It provides a driver wrapper to manage device memory allocation and kernel launching, effectively serving as a system for writing GPU compute logic without relying on C++. The project includes a compute library with hardware-optimized primitives for neural network acceleration and hardware-accelerated raytracing. It utilizes a compilation toolchain that translates source code into a low-level intermediate representation for execu
This project is a vision language model framework and vision-to-text pipeline designed for deploying and optimizing models that process both images and text. It provides an on-device inference engine to run quantized models locally on mobile and desktop hardware accelerators. The framework features a model quantization toolkit to reduce weight precision for lower memory footprints and increased execution speed on specialized silicon. It also includes an efficient vision encoder utilizing a hybrid encoding system to compress image tokens, which reduces pro
cnn-explainer is an interactive web application and educational sandbox designed for visualizing the internal operations and layers of convolutional neural networks. It functions as a tool for understanding how these networks process image data through real-time graphics and interactive visualizations. The project includes a browser-based environment for training small convolutional neural networks on specific image classes. It also provides a model converter that transforms trained neural network files from backend framework formats into web-compatible versions for browser loading. The appl
RF-DETR is a Python library for training and deploying object detection, instance segmentation, and keypoint detection models built on a vision transformer architecture. It provides a unified command-line interface and Python API for the full workflow, from fine-tuning pretrained checkpoints on custom datasets to running inference on images, video files, and live camera streams. The project supports training on datasets in COCO or YOLO format, with automatic format detection and configurable augmentation pipelines. Models can be exported to ONNX, TFLite, or TensorRT for deployment across edge
This project is a multimodal translation framework and large language model capable of speech-to-speech, speech-to-text, and text-to-text translation across nearly 100 languages. It provides a real-time speech translation engine and a comprehensive toolkit for converting spoken audio between languages. The system is distinguished by its ability to preserve the original speaker's tone, pace, and prosody during translation. It utilizes a specialized on-device inference toolkit that converts model checkpoints into C-based libraries, enabling low-latency execution on mobile and edge hardware with
OpenChat is a framework for the training, fine-tuning, and deployment of large language models optimized for conversational and mathematical reasoning tasks. It provides a comprehensive lifecycle for these models, ranging from training pipelines and deployment stacks to a web-based chat interface. The project focuses on enabling high-performance model execution on consumer-grade hardware without the need for enterprise-grade accelerators. It includes a production-ready inference server that implements the OpenAI chat completion protocol and utilizes dynamic request batching to optimize hardwa
MXNet is a deep learning framework and distributed machine learning engine designed for training and deploying neural networks. It functions as a hardware-agnostic backend that allows for the development of deep learning models through a hybrid of symbolic and imperative programming. The system distinguishes itself through automatic distributed parallelism, which scales training workloads across multiple GPUs and machines. It features an extensible hardware backend interface that enables the integration of custom accelerators and proprietary libraries without modifying the core source code.
KServe is a Kubernetes-native platform for deploying and serving machine learning models as scalable inference services. It supports both generative AI models, including large language models, and traditional predictive models from frameworks such as TensorFlow, PyTorch, Scikit-Learn, XGBoost, and ONNX. The platform manages the full lifecycle of model deployments, including revision tracking, canary rollouts, A/B testing, and automatic rollbacks, and provides serverless scale-to-zero capabilities for cost-efficient resource management. KServe distinguishes itself through a standardized infere
This project is a browser-based machine learning education tool and neural network sandbox. It provides an interactive environment for experimenting with network architectures and hyperparameters to understand deep learning concepts. The tool functions as a visualizer for TensorFlow neural networks, allowing users to see how models learn and classify data in real time. It enables the prototyping of model architectures to observe how different hidden layers and neurons affect a network's ability to solve specific data patterns. The system covers neural network architecture and operation visua
This project is a collection of pretrained reinforcement learning agents and training scripts built on Stable Baselines3 and Gymnasium. It provides a framework for training agents to solve specific tasks, managing experiment reproducibility, and deploying pretrained models. The system includes a specialized benchmarking suite and optimization tools for tuning agent settings. It utilizes automated search spaces and distributed trials to maximize performance, while employing bootstrap sampling to generate statistically robust performance metrics and confidence intervals. Broad capabilities cov
DeepPavlov is a conversational AI framework and deep learning NLP library designed for building end-to-end dialogue systems and chatbots. It functions as an NLP pipeline orchestrator that allows users to compose pre-trained models and text processing components into sequential data flows for complex linguistic tasks. The system is distinguished by its ability to act as a chatbot deployment server, exposing trained conversational models as web services via REST and Socket APIs. It utilizes JSON-based pipeline configurations and dynamic variable interpolation to decouple model logic from infras
LiteRT-LM is a high-performance inference framework designed to execute large language models locally on mobile, desktop, and IoT hardware. It serves as an on-device model runtime that utilizes CPU, GPU, and NPU acceleration to provide low-latency processing. The framework is distinguished by its ability to process text, vision, and audio inputs through a single multi-modal inference engine. It features a local HTTP server that emulates OpenAI-compatible API endpoints and a WebGPU-based runtime for executing models directly within a web browser. To ensure output reliability, it includes a con
GHDL is a compiler and simulator for VHDL hardware descriptions. It functions as a multi-pass analysis elaborator that resolves design hierarchies and dependencies to prepare hardware descriptions for simulation or synthesis. The project transforms VHDL source code into executable binaries for high-speed digital design verification and serves as a synthesis tool that converts descriptions into structural netlists compatible with vendor or open-source flows. It also implements the Language Server Protocol to provide static analysis, autocomplete, and code navigation for VHDL files. The toolse
Bosque is an experimental programming language and development platform designed for machine-assisted software construction. It combines functional programming semantics with imperative syntax to enforce logic correctness and runtime safety, providing a type-safe environment that utilizes structured data models to maintain information integrity throughout the application lifecycle. The platform distinguishes itself through deep integration with formal verification tools, including automated theorem provers and symbolic execution engines. By transforming source code into a regularized intermed
This project is a large language model inference library and framework designed to run models for text generation, problem solving, and coding assistance. It includes a multimodal framework for processing combined image and text inputs and a tool-use implementation that enables the execution of external functions based on model reasoning. The system features a distributed GPU inference engine that spreads large model workloads across multiple graphics processors to increase processing speed and meet memory requirements. It also provides containerized model deployment through pre-packaged imag
GB Studio is a visual integrated development environment and game engine for creating 8-bit games for Game Boy hardware. It functions as a retro hardware ROM compiler that transpiles graphical logic into native Z80 assembly and binary images compatible with original handhelds and emulators. The project serves as a cross-platform build tool, generating both native hardware ROMs and web-compatible builds from a single project source. It utilizes a drag-and-drop interface for game logic and scene design, allowing for the creation of game mechanics and asset placement without writing low-level ma
gemma.cpp is a C++ inference engine for Gemma, PaliGemma, and Griffin language models, designed to run directly on-device without Python dependencies. It provides a self-contained runtime that loads quantized model weights and performs text generation on CPU or GPU, along with a model checkpoint converter that transforms PyTorch or Keras checkpoints into a compact binary format for fast loading. The engine supports multiple model architectures, including the Griffin recurrent architecture with gated linear recurrent layers and sliding-window attention for efficient long-sequence handling, as
This project is a PyTorch transformer model library and pre-trained model framework. It serves as a deep learning model hub and multimodal inference engine, providing a centralized system for loading, executing, and fine-tuning state-of-the-art model checkpoints. The library focuses on multimodal machine learning, enabling predictions across text, vision, and audio data. It provides specialized capabilities for model framework interoperability, allowing the conversion of weights and definitions between different deep learning libraries. The platform covers the full model lifecycle, including