30 open-source projects similar to megengine/megengine, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best MegEngine alternative.
MindSpore is a deep learning framework designed for building and training neural networks across cloud, edge, and mobile environments. It functions as a distributed training system and a hardware-accelerated AI toolkit capable of executing workloads on CPUs, GPUs, and specialized AI processors. The project includes an automatic differentiation engine that computes gradients through source transformation and static compilation. It enables distributed model training by splitting workloads across hardware using data and model parallelism. The framework covers cross-platform AI deployment and mo
Apache MXNet is a deep learning framework and distributed machine learning library designed for training and deploying neural networks across distributed systems, mobile devices, and hardware accelerators. It functions as a cross-platform runtime and a dynamic dataflow scheduler that optimizes neural network execution. The framework provides a multi-language API, enabling the development of machine learning models using Python, R, Julia, Scala, Go, and JavaScript. It supports high-performance model training and the scaling of workloads across multiple GPUs and machines. The system covers cap
This repository serves as a comprehensive collection of reference implementations for the PyTorch machine learning library. It provides practical examples for building, training, and deploying deep learning models, functioning as a toolkit for developers to explore neural network architectures and training workflows. The project distinguishes itself by offering concrete demonstrations of complex machine learning operations, ranging from computer vision tasks like object detection and depth estimation to the training of large-scale transformer models. These examples illustrate how to implement
Chainer is an open-source deep learning framework built around define-by-run automatic differentiation, where computation graphs are constructed dynamically during forward execution. This imperative approach allows networks to be built using standard Python control flow, with gradients computed automatically through reverse-mode differentiation on the dynamically recorded graph. The framework supports GPU acceleration through a NumPy-compatible array backend with CUDA and cuDNN support, and provides a pluggable device abstraction that lets users switch between CPU and GPU computation without c
Tinygrad is a deep learning framework and tensor computation engine designed for building and training neural networks. It functions as a hardware abstraction layer that manages device memory, command queues, and kernel dispatching across heterogeneous computing architectures. By utilizing a lazy-evaluation approach, the framework constructs computational graphs that defer execution until data is explicitly required, allowing it to process only the necessary operations for a given result. The project distinguishes itself through a just-in-time compilation layer that transforms abstract comput
PyTorch Lightning is a high-level deep learning framework for PyTorch that automates training loops and removes repetitive engineering boilerplate. It functions as a structured pipeline for managing machine learning experiments, providing a distributed training orchestrator and tools for mixed-precision training. The framework decouples scientific model architecture from the engineering required for infrastructure and scaling. This separation allows the same model code to execute across CPUs, GPUs, or TPUs through a hardware-agnostic execution engine and a centralized trainer that manages the
This project is a comprehensive instructional resource and course for building neural networks using PyTorch. It covers the fundamental building blocks of deep learning, including tensor manipulation, automatic differentiation, and the construction of modular neural network components. The repository serves as a technical guide for several specialized domains. It provides implementation details for computer vision tasks such as image classification, object detection, and semantic segmentation, as well as natural language processing workflows involving transformers, recurrent networks, and gen
Deep Java Library is a Java deep learning framework and JVM model inference engine. It provides a high-level API for building and deploying deep learning models within the Java ecosystem, acting as a cross-platform runtime for executing models across CPUs, GPUs, and mobile devices. The library is engine-agnostic, allowing users to switch between different deep learning engines such as PyTorch, TensorFlow, and MXNet while maintaining a single unified API. This enables the deployment of the same model across different backends without changing the application code. The framework supports the f
tflearn is a deep learning framework and high-level API wrapper for TensorFlow. It provides a toolkit for designing neural network architectures and a system for executing training loops and optimizing model weights across CPUs and GPUs. The project simplifies the process of building and training models through a modular interface and a high-level API for prototyping. It includes specialized utilities for deep learning visualization, allowing for the generation of graphical diagrams to analyze network structures, weights, gradients, and activations. The framework covers a broad range of capa
Flax is a deep learning framework and JAX neural network library designed for building complex machine learning models. It functions as a distributed training library and model state manager, providing a toolkit for defining flexible neural network architectures and scaling their training across multiple hardware devices. The project is characterized by a design that separates network logic from parameter values to remain compatible with pure functions. It uses hierarchical module composition to organize networks as trees of nested modules and employs a reference-based state management system
This project is a deep learning tutorial series and educational curriculum designed to teach PyTorch fundamentals. It serves as a structured training guide for mastering neural network architecture, automatic differentiation, and the use of tensors and dynamic computation graphs. The curriculum focuses on practical implementations, specifically guiding the development of recommendation systems, advertising models, and interest networks to predict user preferences. It also provides instructional content for time series forecasting and processing sequential data. The material covers a broad ra
oneDNN is a library for deep learning acceleration that provides optimized building blocks for neural network training and inference. It manages tensor computation across CPU and GPU hardware, enabling the execution of high-performance primitives for model training and neural network inference optimization. The project distinguishes itself through hardware-specific kernel optimization and the use of just-in-time compilation to target specific processor instruction sets. It supports quantized neural network execution using both static and dynamic quantization to reduce memory usage and increas
The PyTorch Tutorials repository is a collection of educational resources that provides step-by-step guidance on building, training, and deploying neural networks using the PyTorch framework. It covers the complete machine learning workflow, from data loading and model definition through optimization loops and model persistence, with dedicated guides for distributed training, model fine-tuning, and deployment. The tutorials offer practical demonstrations of adapting pre-trained models to new tasks through transfer learning, scaling training across multiple GPUs or machines using PyTorch's dis
Flux.jl is a deep learning framework and numerical computing toolkit written in Julia. It serves as a machine learning library for designing and training neural networks, providing a system for automatic differentiation to optimize model parameters. The framework enables deep learning development and machine learning research by representing layers as parameterized functions. It supports scientific machine learning, integrating neural networks into workflows for solving physical and mathematical problems. The toolkit provides native GPU acceleration for tensor computations and utilizes rever
Caffe is a high-performance deep learning framework designed for training and deploying deep neural networks. It functions as a machine learning engine and a convolutional neural network library, providing a C++ backend to accelerate computations on both GPUs and CPUs. The system includes a specialized toolset for computer vision, enabling tasks such as object detection, semantic segmentation, and large-scale image retrieval. It supports the deployment of pre-trained models for image and scene recognition, as well as the ability to fine-tune neural network weights for specialized tasks. The
Caffe2 is a high-performance deep learning framework and C++ machine learning library. It serves as a modular system for designing, training, and executing scalable neural networks. The project functions as an inference engine and a scalable neural network engine designed to run models across distributed systems and diverse hardware. Its architecture allows for the construction of custom neural network components that can be scaled from research to production environments. The framework covers the full lifecycle of deep learning development, including modular network architecture design, mod
Gorgonia is a Go library that provides an automatic differentiation engine and a computation graph framework for building and training neural networks. It functions as a CUDA-accelerated tensor library and a SIMD-optimized math library, enabling machine learning workflows entirely within the Go ecosystem. The library distinguishes itself through a dual-backend architecture that dispatches neural network operations to either a GPU or CPU depending on CUDA availability at runtime. It constructs differentiable directed acyclic graphs of tensor operations, supports reverse-mode automatic gradient
PyTorch Lightning is a deep learning research framework that provides a structured environment for organizing machine learning code. It functions as a unified trainer orchestrator, centralizing the execution flow by managing the interaction between hardware resources, data loaders, and model components. By decoupling model architecture from training logic, the framework enables researchers to maintain clean, modular codebases that remain portable across different environments. The framework distinguishes itself through a hardware-agnostic abstraction layer that scales deep learning workloads
tiny-dnn is a header-only C++14 deep learning framework for building, training, and running inference on neural networks. It constructs static computational graphs at compile time using template-based layer composition, with a gradient-based backpropagation engine and minibatch stochastic gradient descent for training, all without external dependencies beyond the C++14 standard library. The framework supports importing pre-trained models from the Caffe framework directly, parsing its binary serialization format without requiring external protocol buffer libraries. It provides CPU-optimized te
PlaidML is a deep learning compiler framework and cross-platform runtime designed to execute machine learning models on a wide variety of hardware targets. It functions as a hardware-agnostic tensor engine that translates tensor models into executable code, allowing deep learning networks to run across different compute devices without requiring specific driver dependencies. The system enables the execution of models on custom or limited hardware by using JSON specifications to define device hardware. It employs a domain-specific language to describe tensor computations and provides a middle
Stable-baselines3 is a reinforcement learning library built on the PyTorch deep learning framework. It provides a collection of reliable, standardized implementations of reinforcement learning algorithms designed for training, testing, and benchmarking agent policies in diverse simulated environments. The library functions as an agent training toolkit that emphasizes modularity and reproducibility. It features a unified environment interface and supports vectorized execution to accelerate data collection across multiple simulation instances. Users can customize neural network architectures, f
This project serves as a comprehensive educational resource and technical guide for mastering deep learning through the PyTorch framework. It provides structured tutorials and practical code examples designed to teach core machine learning principles, ranging from fundamental tensor operations to the construction of complex neural network architectures. The repository distinguishes itself by bridging the gap between theoretical concepts and hands-on implementation. It covers the development of generative applications, such as image synthesis and style transfer, while offering guidance on opti
llm-d is a distributed serving framework designed for large language model inference. It functions as an inference orchestrator and gateway, providing a control plane for deploying model replicas and managing hardware accelerators. The system includes a batch inference scheduler and a cache manager to coordinate request flow and memory utilization. The project is distinguished by a disaggregated serving architecture that separates prefill and decode execution phases across specialized workers to maximize throughput. It employs a hardware-agnostic control plane and tiered cache offloading, mov
DGL is a Python library for building and training graph neural networks. It functions as a graph message passing framework and a geometric deep learning tool, enabling the development of models that analyze graph-structured data. The library is designed for large-scale graph processing, utilizing distributed training and neighbor sampling to handle datasets with billions of edges. It provides specialized support for heterogeneous graph modeling, allowing for the representation of complex real-world entities with multiple node and edge types. Its capabilities cover a wide range of graph tasks
Deeplearning4j is a JVM-based deep learning framework and tensor computing library. It provides a computational graph engine for defining and executing deep learning workflows and mathematical operations within the Java Virtual Machine. The project includes a dedicated importer for loading and running pretrained models exported from Keras, TensorFlow, and ONNX formats. Its tensor computing capabilities are driven by a modular native C++ math core to execute high-performance linear algebra operations. The framework covers neural network training, deep learning model inference, and the constru
This project is a deep learning educational resource consisting of PyTorch model implementations and code examples. It provides functional Python scripts and notebooks for building, training, and optimizing neural networks using tensor-based computation. The repository includes implementations for designing custom network layers and loss functions, as well as examples of transfer learning workflows that load pretrained model weights to accelerate development. The codebase covers a broad range of deep learning capabilities, including neural network training, custom model component design, and
This project is an open-source deep learning textbook and educational resource. It provides a structured curriculum of theory and practical examples designed for mastering the training of regression, classification, and generative models using the TensorFlow framework. The repository functions as a machine learning code collection, utilizing interactive notebooks and source code to demonstrate neural network implementation and tensor operations. It covers the development of deep learning models and the study of reinforcement learning. The material employs a case-study-driven pedagogy, combin
Skorch is a deep learning workflow manager and tensor-based model interface. It provides a consistent API for training and predicting with neural networks within standard machine learning workflows, and it supports hyperparameter search for finding good network configurations. The library specializes in wrapping PyTorch neural networks in a scikit-learn-compatible interface. This allows tensor-based models to be used within traditional machine learning pipelines and grid search tools, including the mapping of parameter grids to model configurations. The framework covers training lifecycle
oneDNN is a deep learning primitive library and hardware acceleration framework designed to optimize neural network operations. It serves as an inference engine that accelerates the training and execution of computational graphs using optimized primitives for convolutions and matrix multiplications, following the oneAPI standard for cross-architecture performance. The project enables cross-architecture AI deployment by tuning workloads for specific CPU and GPU microarchitectures across different hardware vendors. It integrates with hardware runtimes and system drivers to share execution conte