30 open-source projects similar to apache/mxnet, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best MXNet alternative.
This project provides a comprehensive technical guide and framework for engineering large-scale machine learning systems. It covers the full lifecycle of model development, focusing on the infrastructure and computational principles required to build, train, and serve generative AI models across distributed GPU clusters. The repository distinguishes itself by offering deep-dive tutorials and implementation strategies for complex system challenges. It emphasizes high-performance architectural primitives, such as collective communication orchestration, distributed tensor sharding, and static gr
This project is an educational platform and research toolkit designed to teach deep learning through a combination of mathematical theory, visual diagrams, and executable code. It provides a comprehensive environment for building, training, and evaluating neural networks, grounding complex concepts in interactive computational notebooks that allow for hands-on experimentation. The framework distinguishes itself by interleaving theoretical foundations—including linear algebra, calculus, and probability—with practical implementations across multiple industry-standard libraries. It supports flex
This project is a structured learning curriculum and technical reference for mastering deep learning with TensorFlow. It provides a comprehensive guide for building, training, and deploying neural networks, combining theoretical fundamentals with practical implementation examples. The repository distinguishes itself by covering the end-to-end machine learning workflow, from low-level tensor mathematics and linear algebra to the creation of complex model architectures. It includes specific guidance on developing data pipelines for diverse data types, such as images, text, and time-series seque
Paddle is a deep learning framework designed for building, training, and deploying neural networks. It provides a platform for constructing models using tensor-based computations and supports both dynamic and static execution graphs to facilitate research and production workflows. The platform functions as a distributed machine learning system, enabling the scaling of training workloads across multiple nodes and hardware clusters. It includes a comprehensive toolkit for model deployment and optimization, allowing users to convert external model formats, compress trained models for resource-co
MNN is a high-performance inference engine and framework designed for on-device machine learning. It provides a comprehensive environment for executing, optimizing, and deploying neural network models directly on mobile and resource-constrained edge devices. The framework distinguishes itself through a robust model optimization toolkit that supports quantization, compression, and structural graph manipulation to minimize memory footprint and maximize execution speed. It features a modular architecture that abstracts hardware-specific backends, allowing models to run efficiently across diverse
This project is a machine learning array framework and tensor computation library designed for high-performance numerical computing. It provides a comprehensive suite of tools for constructing and training neural networks, featuring an automatic differentiation engine that facilitates gradient-based optimization and complex mathematical modeling. The library distinguishes itself through a unified memory architecture that allows data to be shared across CPU and GPU devices without explicit copies, significantly reducing data movement overhead. Its execution model relies on a lazy evaluation en
Flashlight is a C++ machine learning library and deep learning framework designed for building and training neural networks. It functions as a tensor manipulation library and an automatic differentiation engine that tracks operations to calculate gradients via backpropagation for model optimization. The project is distinguished by its role as a distributed training framework, utilizing all-reduce gradient synchronization and distributed environments to scale machine learning workloads across multiple nodes and devices. It features a backend-agnostic memory interface and RAII-based management
This project is a comprehensive collection of educational examples and reference implementations for building vision and language models using PyTorch. It serves as a deep learning tutorial covering the end-to-end process of developing neural networks, from initial architecture definition to final production deployment. The repository provides detailed guides on implementing a wide range of domain-specific models, including convolutional neural networks for object detection and segmentation, as well as transformer and recurrent architectures for natural language processing. It emphasizes gene
Flashlight is a standalone C++ machine learning library and tensor library used for building and training neural networks. It functions as a comprehensive neural network framework and automatic differentiation engine, providing the tools to construct computation graphs and calculate gradients via backpropagation. The project serves as a distributed training framework, utilizing all-reduce operations to synchronize gradients and parameters across multiple compute nodes and devices. It distinguishes itself through deep integration of high-performance tensor manipulation, native device memory in
This project is a comprehensive educational resource and technical documentation suite for learning and developing deep learning models. It serves as an open-source textbook, implementation manual, and framework tutorial designed to guide users through the mathematical foundations and practical application of neural networks. The resource provides detailed instructional content on building various model architectures, including convolutional and recurrent neural networks. It includes a dedicated distributed training guide and a learning path that covers the fundamentals of tensors, automatic
AutoGluon is an automated machine learning framework and multimodal library designed to automate the end-to-end pipeline from data preprocessing to high-accuracy model training and validation. It functions as an automated model trainer for tabular, image, text, and time series data, as well as a tool for time series forecasting and foundation model finetuning. The project is distinguished by its ability to jointly process and fuse different data types, allowing for the construction of multimodal neural networks that integrate images, text, and structured tables. It supports zero-shot inferenc
Deeplearning4j is a JVM-based deep learning framework and tensor computing library. It provides a computational graph engine for defining and executing deep learning workflows and mathematical operations within the Java Virtual Machine. The project includes a dedicated importer for loading and running pretrained models exported from Keras, TensorFlow, and ONNX formats. Its tensor computing capabilities are driven by a modular native C++ math core to execute high-performance linear algebra operations. The framework covers neural network training, deep learning model inference, and the constru
This project is a Rust interface for the PyTorch C++ library, serving as a deep learning framework and tensor computing library. It functions as a C++ API wrapper that enables the manipulation of multi-dimensional arrays and the execution of neural network architectures across CPU and GPU hardware accelerators. The library provides a TorchScript inference engine to load and execute just-in-time compiled models. It also supports Rust and Python interoperability, allowing for the creation of Python extensions that share tensor data through a common interface. The system covers deep learning mo
AISystem is a comprehensive AI full-stack infrastructure project covering the entire pipeline from AI chip architecture to high-level training frameworks. It encompasses the development of AI compiler frameworks, inference engines, and distributed training orchestrators designed to coordinate workloads across a heterogeneous compute stack of CPUs, GPUs, and NPUs. The project focuses on the deep integration of software and hardware, employing software-hardware co-design to align tensor layouts with physical memory structures. It provides specialized capabilities for accelerating Transformer mo
This project provides Rust bindings for the TensorFlow C API, serving as a tensor computation interface and machine learning library. It enables the construction and execution of machine learning models and neural networks by bridging a systems language to high-performance backends. The framework supports GPU-accelerated computing to increase the speed of model training and inference by offloading mathematical operations to graphics processing units. It offers both graph-based computation for defining static network architectures and an eager execution mode for immediate operation calls durin
This is an interactive notebook-based course that teaches machine learning from Python fundamentals through deep learning and natural language processing. It uses real datasets and multiple frameworks within a structured, hands-on curriculum that combines concise explanations with executable code cells, built-in datasets, and embedded exercise checkpoints. Learning progresses through data preparation and exploration, classical machine learning workflows, computer vision with convolutional neural networks, and natural language processing with deep learning, all delivered as a cohesive progressi
mmagic is a multimodal training pipeline and framework for generative AI, focusing on visual synthesis and restoration. It provides the infrastructure to build and train models for tasks such as text-to-image and text-to-video generation, 3D-aware content synthesis, and high-fidelity image translation using diffusion models and generative adversarial networks. The project distinguishes itself through specialized capabilities for generative model personalization, including techniques for fine-tuning subjects and styles. It also supports advanced visual manipulations such as latent space interp
Fastai is a high-level deep learning library built on PyTorch that provides a unified interface for managing the entire machine learning lifecycle. It functions as a comprehensive training toolkit, abstracting hardware management and automating complex training loops to simplify the construction and execution of neural network models. The framework is distinguished by its notebook-centric development environment and a type-dispatching data pipeline that automatically applies transformations based on input data formats. It emphasizes transfer learning through discriminative layer-wise optimiza
Monolith is a distributed recommendation model framework and asynchronous training engine designed to build and train large-scale deep learning architectures. It functions as a distributed model trainer that processes massive datasets across multiple compute nodes using asynchronous update mechanisms. The system features a dedicated embedding table manager that creates unique, feature-isolated tables to prevent representation collisions. It also includes a real-time weight updater to capture immediate changes in user interest and data hotspots through continuous parameter synchronization. Th
Composer is a PyTorch distributed training framework designed for training large models across multi-node GPU clusters. It functions as a large language model trainer, a distributed model optimizer, and a training lifecycle manager. The project differentiates itself as a deep learning regularization library, providing specialized optimization techniques such as Sharpness-Aware Minimization, MixUp, and CutMix to improve model generalization. It further distinguishes its training flow through the use of sequence length warmup, progressive layer freezing, and sharded-state checkpointing for
PaddleDetection is an object detection framework designed for the end-to-end development, training, and deployment of computer vision models. It provides a comprehensive library of modular neural network architectures and pipelines that support object detection, instance segmentation, and multi-object tracking tasks. The project distinguishes itself through a configuration-driven approach that decouples model components like backbones and heads, allowing for the flexible assembly of custom vision workflows. It incorporates advanced techniques such as anchor-free detection logic, joint detecti
This project provides a collection of practical machine learning code examples, including implementations for supervised, unsupervised, and reinforcement learning algorithms. It features deep learning model implementations for convolutional, recurrent, and generative architectures, alongside specific examples of reinforcement learning agents that maximize rewards in simulated environments. The repository includes dedicated data preprocessing pipelines for sanitization, feature scaling, and dimensionality reduction. It also provides implementations for a wide range of specific models, such as
This project is a distributed training infrastructure designed for aligning large language models through reinforcement learning. It functions as an end-to-end engine for complex alignment tasks, including proximal policy optimization, direct preference optimization, and iterative self-play. By providing a unified framework for multi-turn interactions and tool-use scenarios, it enables the development of models capable of reasoning and external environment engagement. The framework distinguishes itself through a decoupled architecture that separates model training from sample generation. This
This project is a comprehensive instructional resource and course for building neural networks using PyTorch. It covers the fundamental building blocks of deep learning, including tensor manipulation, automatic differentiation, and the construction of modular neural network components. The repository serves as a technical guide for several specialized domains. It provides implementation details for computer vision tasks such as image classification, object detection, and semantic segmentation, as well as natural language processing workflows involving transformers, recurrent networks, and gen
SGLang is a high-performance inference engine and serving system designed for large language and multimodal models. It provides a programmable interface for orchestrating complex generation workflows, enabling developers to coordinate multi-turn dialogues, tool invocations, and reasoning chains through a domain-specific language. The platform is built to support production-scale deployments, offering an OpenAI-compatible API that integrates with existing application ecosystems. The system distinguishes itself through a disaggregated architecture that separates compute-intensive pr
This repository serves as a comprehensive collection of reference implementations for the PyTorch machine learning library. It provides practical examples for building, training, and deploying deep learning models, functioning as a toolkit for developers to explore neural network architectures and training workflows. The project distinguishes itself by offering concrete demonstrations of complex machine learning operations, ranging from computer vision tasks like object detection and depth estimation to the training of large-scale transformer models. These examples illustrate how to implement
This project is a collection of TensorFlow 2.x machine learning tutorials and practical code examples. It serves as a deep learning implementation guide for constructing diverse neural network architectures, including convolutional, recurrent, and generative networks. The repository provides templates and examples for several specialized domains, including computer vision for image classification and object detection, natural language processing for text generation and language understanding, and generative AI for synthesizing data using adversarial networks and autoencoders. It also includes
ncnn is a high-performance neural network inference framework designed for executing deep learning models locally on mobile and desktop hardware. It functions as a specialized engine that enables the deployment of artificial intelligence tasks directly on resource-constrained devices, eliminating the need for external network connectivity or cloud-based processing services. The framework provides a comprehensive toolset for model optimization, allowing users to convert and quantize machine learning models into specialized binary structures. By utilizing static model graph compilation and zero
jetson-inference is a set of libraries and tools for executing optimized deep learning models on embedded GPU hardware. Its primary purpose is to enable real-time computer vision and AI inference at the edge with low latency and high throughput. The project distinguishes itself through high-performance streaming analytics and the ability to execute concurrent AI pipelines on auto-grade silicon. It provides specialized support for multi-sensor stream processing, utilizing zero-copy data transport to load camera frames directly into GPU memory. The codebase covers a broad surface of capabiliti
Axolotl is a configuration-driven framework designed for the fine-tuning, evaluation, and quantization of large language models. It functions as a comprehensive orchestrator for distributed training, enabling users to manage complex workflows across multi-node and multi-GPU environments. By utilizing structured configuration files, the platform streamlines the setup of training parameters, dataset paths, and hardware distribution strategies. The project distinguishes itself through its support for diverse training methodologies, including full-parameter tuning, parameter-efficient adaptation,