30 open-source projects similar to caffe2/caffe2, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Caffe2 alternative.
Apache MXNet is a deep learning framework and distributed machine learning library designed for training and deploying neural networks across distributed systems, mobile devices, and hardware accelerators. It functions as a cross-platform runtime and a dynamic dataflow scheduler that optimizes neural network execution. The framework provides a multi-language API, enabling the development of machine learning models using Python, R, Julia, Scala, Go, and JavaScript. It supports high-performance model training and the scaling of workloads across multiple GPUs and machines. The system covers cap
Caffe is a high-performance deep learning framework and convolutional neural network library designed for training and deploying neural networks. It functions as a GPU-accelerated machine learning engine with a core implemented in C++ to enable high-throughput tensor operations. The project utilizes a declarative configuration system where model architectures and hyperparameters are defined in external text files, separating the network design from the execution code. It includes a model serialization system to export trained weights and topologies into binary files for efficient deployment a
This project is a comprehensive collection of educational examples and reference implementations for building vision and language models using PyTorch. It serves as a deep learning tutorial covering the end-to-end process of developing neural networks, from initial architecture definition to final production deployment. The repository provides detailed guides on implementing a wide range of domain-specific models, including convolutional neural networks for object detection and segmentation, as well as transformer and recurrent architectures for natural language processing. It emphasizes gene
TNN is a deep learning inference framework designed to execute pre-trained neural networks across mobile, desktop, and server hardware. It functions as a hardware-accelerated runtime and model compression toolkit, providing a unified interface for deploying models in diverse environments. The framework includes an ONNX model converter to transform models from various training frameworks into a standardized internal format. It distinguishes itself through a combination of model compression tools—including weight quantization and static-code pruning—and a memory management system that reuses bu
Tensorpack is a high-level TensorFlow neural network framework and research library designed for building and training deep learning models. It provides a collection of reproducible neural network architectures for computer vision, generative tasks, reinforcement learning, and natural language processing. The project distinguishes itself through a specialized deep learning data pipeline that uses pure Python for parallel data loading and streaming. It includes a multi-GPU training orchestrator for distributing workloads via data-parallel strategies and a dedicated interpretability toolkit for
MindSpore is a deep learning framework designed for building and training neural networks across cloud, edge, and mobile environments. It functions as a distributed training system and a hardware-accelerated AI toolkit capable of executing workloads on CPUs, GPUs, and specialized AI processors. The project includes an automatic differentiation engine that computes gradients through source transformation and static compilation. It enables distributed model training by splitting workloads across hardware using data and model parallelism. The framework covers cross-platform AI deployment and mo
Flashlight is a standalone C++ machine learning library and tensor library used for building and training neural networks. It functions as a comprehensive neural network framework and automatic differentiation engine, providing the tools to construct computation graphs and calculate gradients via backpropagation. The project serves as a distributed training framework, utilizing all-reduce operations to synchronize gradients and parameters across multiple compute nodes and devices. It distinguishes itself through deep integration of high-performance tensor manipulation, native device memory in
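The all-reduce gradient synchronization mentioned in the Flashlight entry can be illustrated with a small single-process simulation. This is only a sketch of the general idea, not Flashlight's API: the function name `all_reduce_mean` is hypothetical, workers are modeled as plain Python lists, and averaging (rather than summing) the gradients is an assumption.

```python
def all_reduce_mean(worker_grads):
    """Simulate an all-reduce: every worker ends up with the element-wise mean.

    worker_grads is a list with one gradient vector (list of floats) per worker.
    Hypothetical helper for illustration; real frameworks do this over a network.
    """
    if not worker_grads:
        raise ValueError("need at least one worker")
    length = len(worker_grads[0])
    if any(len(g) != length for g in worker_grads):
        raise ValueError("all workers must hold gradients of the same shape")

    n = len(worker_grads)
    # Reduce step: sum each gradient component across workers, then average.
    mean = [sum(g[i] for g in worker_grads) / n for i in range(length)]
    # Broadcast step: every worker receives its own copy of the result.
    return [list(mean) for _ in range(n)]


# Two workers computed different gradients on their local batches.
synced = all_reduce_mean([[1.0, 2.0], [3.0, 4.0]])
```

After the call, both workers hold the same averaged gradient, so applying the same optimizer step on each keeps their parameters identical.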
jetson-inference is a set of libraries and tools for executing optimized deep learning models on embedded GPU hardware. Its primary purpose is to enable real-time computer vision and AI inference at the edge with low latency and high throughput. The project distinguishes itself through high-performance streaming analytics and the ability to execute concurrent AI pipelines on auto-grade silicon. It provides specialized support for multi-sensor stream processing, utilizing zero-copy data transport to load camera frames directly into GPU memory. The codebase covers a broad surface of capabiliti
This project is a structured learning curriculum and technical reference for mastering deep learning with TensorFlow. It provides a comprehensive guide for building, training, and deploying neural networks, combining theoretical fundamentals with practical implementation examples. The repository distinguishes itself by covering the end-to-end machine learning workflow, from low-level tensor mathematics and linear algebra to the creation of complex model architectures. It includes specific guidance on developing data pipelines for diverse data types, such as images, text, and time-series seque
This project is a comprehensive educational resource and technical documentation suite for learning and developing deep learning models. It serves as an open-source textbook, implementation manual, and framework tutorial designed to guide users through the mathematical foundations and practical application of neural networks. The resource provides detailed instructional content on building various model architectures, including convolutional and recurrent neural networks. It includes a dedicated distributed training guide and a learning path that covers the fundamentals of tensors, automatic
Leaf is a machine learning framework and neural network architecture toolkit used for building, training, and deploying models. It functions as a hardware abstraction layer, mapping high-level computational graphs to low-level instructions across various CPU and GPU backends and operating systems. The system enables the design of flexible model structures through a modular architecture where reusable container layers encapsulate weights and mathematical operations. This allows for the composition of complex neural networks via nested components. The framework includes a data engineering pipe
PowerInfer is a high-performance local large language model inference engine and sparse inference framework. It provides a runtime for executing models on consumer-grade hardware, utilizing a GPU acceleration backend to optimize tensor operations for graphics processors. The system distinguishes itself by exploiting activation sparsity to skip computations for inactive neurons, which increases generation speed. It includes a GGUF model converter for transforming weights and metadata into a unified binary format, as well as an OpenAI API-compatible server for inte
MXNet is a deep learning framework and distributed machine learning engine designed for training and deploying neural networks. It functions as a hardware-agnostic backend that allows for the development of deep learning models through a hybrid of symbolic and imperative programming. The system distinguishes itself through automatic distributed parallelism, which scales training workloads across multiple GPUs and machines. It features an extensible hardware backend interface that enables the integration of custom accelerators and proprietary libraries without modifying the core source code.
Flax is a deep learning framework and JAX neural network library designed for building complex machine learning models. It functions as a distributed training library and model state manager, providing a toolkit for defining flexible neural network architectures and scaling their training across multiple hardware devices. The project is characterized by a design that separates network logic from parameter values to remain compatible with pure functions. It uses hierarchical module composition to organize networks as trees of nested modules and employs a reference-based state management system
tflearn is a deep learning framework and high-level API wrapper for TensorFlow. It provides a toolkit for designing neural network architectures and a system for executing training loops and optimizing model weights across CPUs and GPUs. The project simplifies the process of building and training models through a modular interface and a high-level API for prototyping. It includes specialized utilities for deep learning visualization, allowing for the generation of graphical diagrams to analyze network structures, weights, gradients, and activations. The framework covers a broad range of capa
This repository serves as a comprehensive collection of reference implementations for the PyTorch machine learning library. It provides practical examples for building, training, and deploying deep learning models, functioning as a toolkit for developers to explore neural network architectures and training workflows. The project distinguishes itself by offering concrete demonstrations of complex machine learning operations, ranging from computer vision tasks like object detection and depth estimation to the training of large-scale transformer models. These examples illustrate how to implement
OneFlow is a deep learning framework and distributed execution engine designed for building, training, and deploying neural network architectures. It functions as a scalable neural network library that allows for the development of deep learning models and their execution across distributed hardware. The project includes a machine learning graph compiler used to optimize neural network execution graphs. This allows for the acceleration of model performance and the reduction of latency during both training and inference. The framework covers broad capability areas including large-scale model
Flux.jl is a deep learning framework and numerical computing toolkit written in Julia. It serves as a machine learning library for designing and training neural networks, providing a system for automatic differentiation to optimize model parameters. The framework enables deep learning development and machine learning research by representing layers as parameterized functions. It supports scientific machine learning, integrating neural networks into workflows for solving physical and mathematical problems. The toolkit provides native GPU acceleration for tensor computations and utilizes rever
MegEngine is a deep learning framework and automatic differentiation engine used for training and deploying neural networks. It functions as a differentiable programming library that enables the creation of mathematical models where operations are differentiable for gradient-based optimization. The project provides a hardware-agnostic tensor runtime and cross-platform model runtime, allowing models to execute across diverse CPU and GPU hardware architectures. It utilizes a dynamic computational graph engine to build execution graphs on the fly, supporting flexible input shapes and complex con
Caffe is a high-performance deep learning framework designed for training and deploying deep neural networks. It functions as a machine learning engine and a convolutional neural network library, providing a C++ backend to accelerate computations on both GPUs and CPUs. The system includes a specialized toolset for computer vision, enabling tasks such as object detection, semantic segmentation, and large-scale image retrieval. It supports the deployment of pre-trained models for image and scene recognition, as well as the ability to fine-tune neural network weights for specialized tasks. The
Stable-baselines3 is a reinforcement learning library built on the PyTorch deep learning framework. It provides a collection of reliable, standardized implementations of reinforcement learning algorithms designed for training, testing, and benchmarking agent policies in diverse simulated environments. The library functions as an agent training toolkit that emphasizes modularity and reproducibility. It features a unified environment interface and supports vectorized execution to accelerate data collection across multiple simulation instances. Users can customize neural network architectures, f
This project is a deep learning framework designed for constructing, training, and deploying neural networks across diverse hardware environments. It functions as a high-performance tensor computation library that provides both imperative and symbolic programming interfaces, allowing developers to balance flexible, step-by-step model building with the efficiency of compiled computation graphs. The framework distinguishes itself through a hybrid execution engine that integrates declarative graph compilation with imperative runtime logic. It supports scalable, distributed training across multip
This project is an educational resource and learning path for building and training neural network architectures. It provides a structured collection of instructional guides, notes, and exercises designed to help users master the fundamentals of deep learning model development and prototyping. The resource focuses on translating conceptual deep learning theory into executable code using a symbolic mathematics library. It includes specific guides and tutorials for executing neural network computations on graphics hardware to reduce model training time. The content covers the implementation of
PyTorch is a machine learning framework centered on a GPU-ready tensor library that supports multi-dimensional array operations across both CPU and accelerator hardware. It provides a foundational infrastructure for mathematical computation and dynamic neural network construction, utilizing a tape-based automatic differentiation system that allows for flexible, non-static graph execution. The framework is designed for deep integration with Python, enabling natural usage alongside standard scientific computing ecosystems. It distinguishes itself through a comprehensive distributed training sui
MNN is a high-performance inference engine and framework designed for on-device machine learning. It provides a comprehensive environment for executing, optimizing, and deploying neural network models directly on mobile and resource-constrained edge devices. The framework distinguishes itself through a robust model optimization toolkit that supports quantization, compression, and structural graph manipulation to minimize memory footprint and maximize execution speed. It features a modular architecture that abstracts hardware-specific backends, allowing models to run efficiently across diverse
This project is a character-level language modeling system that uses recurrent neural networks to predict and generate text one character at a time. It implements LSTM and GRU architectures to learn sequential patterns and probability distributions from text corpora. The system includes mechanisms for text generation sampling, allowing users to produce new sequences from trained models. It features temperature-based stochasticity to control the randomness and diversity of the generated output. The implementation covers the full model lifecycle, including training, state persistence through c
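The temperature-based sampling described in the entry above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code from the project: the names `softmax_with_temperature` and `sample_char`, the toy vocabulary, and the example logits are all hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, scaled by a temperature.

    Temperatures below 1 sharpen the distribution (less random output);
    temperatures above 1 flatten it (more diverse output).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_char(logits, vocab, temperature=1.0, rng=random):
    """Draw one character from vocab according to the tempered distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


# Hypothetical model output for a three-character vocabulary.
vocab = ["a", "b", "c"]
logits = [1.0, 2.0, 3.0]
next_char = sample_char(logits, vocab, temperature=0.8, rng=random.Random(0))
```

In a generation loop, the sampled character would be fed back into the model to produce the logits for the next step.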
Horovod is a distributed deep learning framework and gradient synchronizer designed to scale model training across multiple GPUs and compute nodes. It functions as a distributed training orchestrator and an elastic training engine, utilizing an MPI collective communication library to synchronize weights and gradients across TensorFlow, PyTorch, Keras, and MXNet models. The system distinguishes itself through dynamic elastic scaling, which allows it to adjust the number of active workers at runtime and recover from node failures. It optimizes communication efficiency using tensor fusion batchi
dlib is a C++ machine learning toolkit and data analysis framework. It provides a collection of algorithms and utilities for building predictive modeling applications and performing statistical analysis on large datasets within native C++ environments. The project functions as a binding library that wraps low-level C++ machine learning algorithms into high-level Python scripting interfaces. This allows for the integration of high-performance native implementations with Python for machine learning development. The framework covers the implementation of predictive models, the execution of mach
PyTorch Lightning is a high-level deep learning framework for PyTorch that automates training loops and removes repetitive engineering boilerplate. It functions as a structured pipeline for managing machine learning experiments, providing a distributed training orchestrator and tools for mixed-precision training. The framework decouples scientific model architecture from the engineering required for infrastructure and scaling. This separation allows the same model code to execute across CPUs, GPUs, or TPUs through a hardware-agnostic execution engine and a centralized trainer that manages the
Deep Java Library is a Java deep learning framework and JVM model inference engine. It provides a high-level API for building and deploying deep learning models within the Java ecosystem, acting as a cross-platform runtime for executing models across CPUs, GPUs, and mobile devices. The library is engine-agnostic, allowing users to switch between different deep learning engines such as PyTorch, TensorFlow, and MXNet while maintaining a single unified API. This enables the deployment of the same model across different backends without changing the application code. The framework supports the f