30 open-source projects similar to flashlight/flashlight, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Flashlight alternative.
Flashlight is a C++ machine learning library and deep learning framework designed for building and training neural networks. It functions as a tensor manipulation library and an automatic differentiation engine that tracks operations to calculate gradients via backpropagation for model optimization. The project is distinguished by its role as a distributed training framework, utilizing all-reduce gradient synchronization and distributed environments to scale machine learning workloads across multiple nodes and devices. It features a backend-agnostic memory interface and RAII-based management
This project is a comprehensive collection of educational examples and reference implementations for building vision and language models using PyTorch. It serves as a deep learning tutorial covering the end-to-end process of developing neural networks, from initial architecture definition to final production deployment. The repository provides detailed guides on implementing a wide range of domain-specific models, including convolutional neural networks for object detection and segmentation, as well as transformer and recurrent architectures for natural language processing. It emphasizes gene
This project is a structured learning curriculum and technical reference for mastering deep learning with TensorFlow. It provides a comprehensive guide for building, training, and deploying neural networks, combining theoretical fundamentals with practical implementation examples. The repository distinguishes itself by covering the end-to-end machine learning workflow, from low-level tensor mathematics and linear algebra to the creation of complex model architectures. It includes specific guidance on developing data pipelines for diverse data types, such as images, text, and time-series seque
Tinygrad is a deep learning framework and tensor computation engine designed for building and training neural networks. It functions as a hardware abstraction layer that manages device memory, command queues, and kernel dispatching across heterogeneous computing architectures. By utilizing a lazy-evaluation approach, the framework constructs computational graphs that defer execution until data is explicitly required, allowing it to process only the necessary operations for a given result. The project distinguishes itself through a just-in-time compilation layer that transforms abstract comput
This project is an educational platform and research toolkit designed to teach deep learning through a combination of mathematical theory, visual diagrams, and executable code. It provides a comprehensive environment for building, training, and evaluating neural networks, grounding complex concepts in interactive computational notebooks that allow for hands-on experimentation. The framework distinguishes itself by interleaving theoretical foundations—including linear algebra, calculus, and probability—with practical implementations across multiple industry-standard libraries. It supports flex
Torch7 is a scientific computing environment and tensor computation library used for deep learning research and numerical analysis. It functions as a Lua-based framework for training neural networks and learning agents, providing a toolkit for implementing architectures and training through reinforcement learning algorithms. The project is distinguished by its tight integration with C, utilizing a binding layer to map high-level scripting to low-level C structures for direct memory access. It supports hardware-accelerated computation by offloading linear algebra and convolution operations to
This project is a deep learning framework designed for constructing, training, and deploying neural networks across diverse hardware environments. It functions as a high-performance tensor computation library that provides both imperative and symbolic programming interfaces, allowing developers to balance flexible, step-by-step model building with the efficiency of compiled computation graphs. The framework distinguishes itself through a hybrid execution engine that integrates declarative graph compilation with imperative runtime logic. It supports scalable, distributed training across multip
MNN is a high-performance inference engine and framework designed for on-device machine learning. It provides a comprehensive environment for executing, optimizing, and deploying neural network models directly on mobile and resource-constrained edge devices. The framework distinguishes itself through a robust model optimization toolkit that supports quantization, compression, and structural graph manipulation to minimize memory footprint and maximize execution speed. It features a modular architecture that abstracts hardware-specific backends, allowing models to run efficiently across diverse
TensorFlow-World is a collection of tutorials, implementation guides, and model templates for building and training machine learning models using the TensorFlow framework. It serves as an educational resource for designing deep learning architectures and implementing predictive models. The project provides ready-to-use examples for constructing neural network architectures and linear classifiers. It includes guides on performing tensor operations, automatic differentiation, and gradient descent optimization. The materials cover a range of machine learning capabilities, including the use of h
xtensor is a C++ multidimensional array library for numerical computing that provides N-dimensional containers with an interface mirroring the NumPy API. It utilizes a lazy evaluation expression engine to defer numerical computations until assignment, which minimizes memory allocations and intermediate copies. The library features a foreign memory array adaptor that allows it to wrap external buffers, such as NumPy arrays, to perform numerical operations in-place without duplicating data. It further optimizes performance through lazy broadcasting and a system that manages the lifetime of temp
This project is a comprehensive educational resource and curriculum focused on the design and implementation of the full machine learning software and hardware stack. It serves as a technical reference for architecting machine learning systems, spanning from low-level programming interfaces to large-scale deployment infrastructure. The project provides instructional guidance on several specialized domains, including the development of AI compilers through intermediate representations and graph optimizations. It covers the architectural patterns required for distributed training across GPU clu
TensorFlow is a comprehensive machine learning framework designed for the construction, training, and deployment of complex mathematical models. It utilizes a graph-based execution model that represents operations as directed acyclic graphs, enabling automatic differentiation and efficient parallel processing. The system provides high-level interfaces for defining neural network architectures, alongside a robust engine for managing multidimensional array structures and tensor mathematics. The framework distinguishes itself through a scalable distributed runtime that orchestrates workloads acr
Chainer is an open-source deep learning framework built around define-by-run automatic differentiation, where computation graphs are constructed dynamically during forward execution. This imperative approach allows networks to be built using standard Python control flow, with gradients computed automatically through reverse-mode differentiation on the dynamically recorded graph. The framework supports GPU acceleration through a NumPy-compatible array backend with CUDA and cuDNN support, and provides a pluggable device abstraction that lets users switch between CPU and GPU computation without c
MindSpore is a deep learning framework designed for building and training neural networks across cloud, edge, and mobile environments. It functions as a distributed training system and a hardware accelerated AI toolkit capable of executing workloads on CPUs, GPUs, and specialized AI processors. The project includes an automatic differentiation engine that computes gradients through source transformation and static compilation. It enables distributed model training by splitting workloads across hardware using data and model parallelism. The framework covers cross-platform AI deployment and mo
This project is a collection of TensorFlow 2.x machine learning tutorials and practical code examples. It serves as a deep learning implementation guide for constructing diverse neural network architectures, including convolutional, recurrent, and generative networks. The repository provides templates and examples for several specialized domains, including computer vision for image classification and object detection, natural language processing for text generation and language understanding, and generative AI for synthesizing data using adversarial networks and autoencoders. It also includes
Grokking-Deep-Learning is a collection of educational resources and courseware designed to teach the construction of neural networks from scratch. It serves as a programming tutorial and implementation guide for understanding the internal mechanics of deep learning. The project focuses on building various network architectures, including convolutional, recurrent, and long short-term memory networks. It provides step-by-step implementations of fundamental mechanisms such as forward propagation, backpropagation, and gradient descent. The material covers a broad range of deep learning capabilit
Trax is a deep learning framework and hardware-agnostic tensor engine built for designing and training neural networks. It serves as a research tool providing high-level combinators for composing complex architectures, alongside a dedicated library for building transformer models and a toolkit for reinforcement learning. The framework is distinguished by its support for reversible and sparse transformer architectures, which reduce memory and computational overhead. It enables a single set of model instructions to execute across different hardware backends without changing the underlying co
Caffe is a high-performance deep learning framework and convolutional neural network library designed for training and deploying neural networks. It functions as a GPU-accelerated machine learning engine with a core implemented in C++ to enable high-throughput tensor operations. The project utilizes a declarative configuration system where model architectures and hyperparameters are defined in external text files, separating the network design from the execution code. It includes a model serialization system to export trained weights and topologies into binary files for efficient deployment a
This project is a machine learning array framework and tensor computation library designed for high-performance numerical computing. It provides a comprehensive suite of tools for constructing and training neural networks, featuring an automatic differentiation engine that facilitates gradient-based optimization and complex mathematical modeling. The library distinguishes itself through a unified memory architecture that allows data to be shared across CPU and GPU devices without explicit copies, significantly reducing data movement overhead. Its execution model relies on a lazy evaluation en
This project is a comprehensive educational resource and technical documentation suite for learning and developing deep learning models. It serves as an open-source textbook, implementation manual, and framework tutorial designed to guide users through the mathematical foundations and practical application of neural networks. The resource provides detailed instructional content on building various model architectures, including convolutional and recurrent neural networks. It includes a dedicated distributed training guide and a learning path that covers the fundamentals of tensors, automatic
This project is a deep learning tutorial series and educational curriculum designed to teach PyTorch fundamentals. It serves as a structured training guide for mastering neural network architecture, automatic differentiation, and the use of tensors and dynamic computation graphs. The curriculum focuses on practical implementations, specifically guiding the development of recommendation systems, advertising models, and interest networks to predict user preferences. It also provides instructional content for time series forecasting and processing sequential data. The material covers a broad ra
This project is a collection of PyTorch learning resources and educational guides designed to teach the construction and training of neural networks. It serves as a comprehensive deep learning tutorial covering various model architectures and practical implementation strategies. The resources provide specific guidance on implementing computer vision tasks, such as image classification and synthetic imagery generation, as well as reinforcement learning agents using value networks and experience replay. It also covers sequential data modeling through recurrent networks and generative modeling u
Monolith is a distributed recommendation model framework and asynchronous training engine designed to build and train large-scale deep learning architectures. It functions as a distributed model trainer that processes massive datasets across multiple compute nodes using asynchronous update mechanisms. The system features a dedicated embedding table manager that creates unique, feature-isolated tables to prevent representation collisions. It also includes a real-time weight updater to capture immediate changes in user interest and data hotspots through continuous parameter synchronization. Th
Sonnet is a modular machine learning framework and TensorFlow neural network library designed for building composable deep learning architectures. It functions as a model orchestrator that manages parameters, state serialization, and graph exports during the training process. The framework provides a distributed training system to synchronize gradients and spread workloads across multiple GPUs or hardware devices. It enables the design of reusable research components through high-level abstractions and subclassing. The library covers neural network architecture design through sequential laye
This project is a Rust interface for the PyTorch C++ library, serving as a deep learning framework and tensor computing library. It functions as a C++ API wrapper that enables the manipulation of multi-dimensional arrays and the execution of neural network architectures across CPU and GPU hardware accelerators. The library provides a TorchScript inference engine to load and execute just-in-time compiled models. It also supports Rust and Python interoperability, allowing for the creation of Python extensions that share tensor data through a common interface. The system covers deep learning mo
This project is a comprehensive machine learning educational resource and tutorial series delivered as a collection of interactive Jupyter Notebooks. It provides practical Python implementations for the end-to-end machine learning lifecycle, covering supervised and unsupervised learning, deep learning, and reinforcement learning. The resource distinguishes itself by providing detailed implementation guides for complex architectures, including transformers, generative adversarial networks, and convolutional neural networks. It also features specialized courseware for developing reinforcement l
RLinf is a distributed reinforcement learning orchestrator and embodied AI training framework. It provides the infrastructure to train vision-language-action models and robotic policies using a combination of reinforcement learning and supervised fine-tuning. The system is designed for scaling workloads across GPU clusters, managing the placement of actors, rollout workers, and environment components. It features a specialized robotics data collection pipeline for gathering teleoperated demonstrations and simulation trajectories into standardized replay buffers, alongside a hardware interface
This project provides a comprehensive technical guide and framework for engineering large-scale machine learning systems. It covers the full lifecycle of model development, focusing on the infrastructure and computational principles required to build, train, and serve generative AI models across distributed GPU clusters. The repository distinguishes itself by offering deep-dive tutorials and implementation strategies for complex system challenges. It emphasizes high-performance architectural primitives, such as collective communication orchestration, distributed tensor sharding, and static gr
TransformerLens is a library for mechanistic interpretability research designed to reverse engineer the learned algorithms within large language models. It provides a standardized framework for wrapping diverse transformer architectures, allowing researchers to extract, manipulate, and analyze internal activations and weights through a consistent interface. The project distinguishes itself through a comprehensive system of activation hooks that can capture, patch, and ablate internal tensors during the forward pass. It includes specialized utilities for decomposing fused projections, material
This project is a PyTorch project boilerplate and training framework designed to standardize the development of deep learning experiments. It provides a structured directory layout and a set of base classes to bootstrap new projects, ensuring a consistent workflow from data pipeline construction to model execution. The framework distinguishes itself through a centralized configuration manager for hyperparameters that supports command line overrides and a hardware acceleration layer for distributing computational tasks across multiple graphics processing units. It also implements a base-class