30 open-source projects similar to facebookresearch/flashlight, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Flashlight alternative.
Flashlight is a standalone C++ library for machine learning and tensor computation, used to build and train neural networks. It combines a neural network framework with an automatic differentiation engine that constructs computation graphs and computes gradients via backpropagation. It also supports distributed training, using all-reduce operations to synchronize gradients and parameters across multiple compute nodes and devices. It distinguishes itself through deep integration of high-performance tensor manipulation, native device memory in
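The all-reduce step described for Flashlight can be illustrated with a minimal single-process sketch. It assumes each worker holds a flat list of gradients and uses a naive sum-then-broadcast scheme. The function name `all_reduce_mean` is hypothetical and is not Flashlight's API, and production libraries typically use ring- or tree-based algorithms instead.

```python
# Hypothetical sketch: averaging gradients across workers, as an all-reduce does.
# Not Flashlight's API; a naive single-process simulation for illustration.

def all_reduce_mean(worker_grads):
    """Return the element-wise mean of all workers' gradients, one copy per worker."""
    num_workers = len(worker_grads)
    # "Reduce": sum the i-th gradient entry across every worker.
    summed = [sum(values) for values in zip(*worker_grads)]
    averaged = [total / num_workers for total in summed]
    # "All": every worker ends up with an identical copy of the result.
    return [list(averaged) for _ in range(num_workers)]


grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # three workers, two parameters each
synced = all_reduce_mean(grads)
print(synced[0])  # [3.0, 4.0]
```

After this step every replica applies the same averaged update, which is what keeps the model copies identical.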
This project is a comprehensive collection of educational examples and reference implementations for building vision and language models using PyTorch. It serves as a deep learning tutorial covering the end-to-end process of developing neural networks, from initial architecture definition to final production deployment. The repository provides detailed guides on implementing a wide range of domain-specific models, including convolutional neural networks for object detection and segmentation, as well as transformer and recurrent architectures for natural language processing. It emphasizes gene
This project is a structured learning curriculum and technical reference for mastering deep learning with TensorFlow. It provides a comprehensive guide for building, training, and deploying neural networks, combining theoretical fundamentals with practical implementation examples. The repository distinguishes itself by covering the end-to-end machine learning workflow, from low-level tensor mathematics and linear algebra to the creation of complex model architectures. It includes specific guidance on developing data pipelines for diverse data types, such as images, text, and time-series seque
This project is an educational platform and research toolkit designed to teach deep learning through a combination of mathematical theory, visual diagrams, and executable code. It provides a comprehensive environment for building, training, and evaluating neural networks, grounding complex concepts in interactive computational notebooks that allow for hands-on experimentation. The framework distinguishes itself by interleaving theoretical foundations—including linear algebra, calculus, and probability—with practical implementations across multiple industry-standard libraries. It supports flex
MNN is a high-performance inference engine and framework designed for on-device machine learning. It provides a comprehensive environment for executing, optimizing, and deploying neural network models directly on mobile and resource-constrained edge devices. The framework distinguishes itself through a robust model optimization toolkit that supports quantization, compression, and structural graph manipulation to minimize memory footprint and maximize execution speed. It features a modular architecture that abstracts hardware-specific backends, allowing models to run efficiently across diverse
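Quantization, one of the optimizations listed for MNN, can be sketched generically as symmetric per-tensor int8 quantization. This common scheme is assumed here for illustration only; it is not MNN's quantization toolkit or API, and `quantize_int8` and `dequantize` are hypothetical names.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]


weights = [0.3, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)  # [38, -127, 32, 0]
```

Each weight now needs one byte plus a single shared float scale, and the reconstruction error per weight is at most half a quantization step (`scale / 2`).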
xtensor is a C++ multidimensional array library for numerical computing that provides N-dimensional containers with an interface mirroring the NumPy API. It utilizes a lazy evaluation expression engine to defer numerical computations until assignment, which minimizes memory allocations and intermediate copies. The library features a foreign memory array adaptor that allows it to wrap external buffers, such as NumPy arrays, to perform numerical operations in-place without duplicating data. It further optimizes performance through lazy broadcasting and a system that manages the lifetime of temp
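The idea behind lazy evaluation with broadcasting can be shown with a small Python analogy of an expression engine: arithmetic builds a tree of deferred operations, and values are computed element by element only when the result is materialized. This is a hypothetical one-dimensional sketch, not xtensor's C++ expression templates; the names `Expr`, `array`, and `evaluate` are assumptions.

```python
def broadcast_len(a, b):
    """1-D broadcasting rule: equal lengths match, and length 1 stretches."""
    if a == b or b == 1:
        return a
    if a == 1:
        return b
    raise ValueError(f"cannot broadcast lengths {a} and {b}")


class Expr:
    """A deferred element-wise expression; nothing is computed until evaluate()."""

    def __init__(self, getter, length):
        self.getter = getter  # function: index -> element value
        self.length = length

    def at(self, i):
        # Lazy broadcasting: a length-1 operand is never copied, just reused.
        return self.getter(0 if self.length == 1 else i)

    def _combine(self, other, op):
        n = broadcast_len(self.length, other.length)
        return Expr(lambda i: op(self.at(i), other.at(i)), n)

    def __add__(self, other):
        return self._combine(other, lambda x, y: x + y)

    def __mul__(self, other):
        return self._combine(other, lambda x, y: x * y)

    def evaluate(self):
        # The only allocation: one result list, with no intermediate arrays.
        return [self.at(i) for i in range(self.length)]


def array(values):
    data = list(values)
    return Expr(lambda i: data[i], len(data))


a = array([1, 2, 3])
b = array([10])
expr = a * b + a        # builds an expression tree; no arithmetic happens yet
print(expr.evaluate())  # [11, 22, 33]
```

Shape errors surface when the expression is built, while the arithmetic itself waits until `evaluate()` is called.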
Torch7 is a scientific computing environment and tensor computation library used for deep learning research and numerical analysis. It is a Lua-based framework for training neural networks and learning agents, providing a toolkit for implementing network architectures and reinforcement learning algorithms. The project is distinguished by its tight integration with C, using a binding layer that maps high-level Lua scripts to low-level C structures for direct memory access. It supports hardware-accelerated computation by offloading linear algebra and convolution operations to
Tinygrad is a deep learning framework and tensor computation engine designed for building and training neural networks. It functions as a hardware abstraction layer that manages device memory, command queues, and kernel dispatching across heterogeneous computing architectures. By utilizing a lazy-evaluation approach, the framework constructs computational graphs that defer execution until data is explicitly required, allowing it to process only the necessary operations for a given result. The project distinguishes itself through a just-in-time compilation layer that transforms abstract comput
This project is a deep learning framework designed for constructing, training, and deploying neural networks across diverse hardware environments. It functions as a high-performance tensor computation library that provides both imperative and symbolic programming interfaces, allowing developers to balance flexible, step-by-step model building with the efficiency of compiled computation graphs. The framework distinguishes itself through a hybrid execution engine that integrates declarative graph compilation with imperative runtime logic. It supports scalable, distributed training across multip
This project is a collection of PyTorch learning resources and educational guides designed to teach the construction and training of neural networks. It serves as a comprehensive deep learning tutorial covering various model architectures and practical implementation strategies. The resources provide specific guidance on implementing computer vision tasks, such as image classification and synthetic imagery generation, as well as reinforcement learning agents using value networks and experience replay. It also covers sequential data modeling through recurrent networks and generative modeling u
This project provides a comprehensive technical guide and framework for engineering large-scale machine learning systems. It covers the full lifecycle of model development, focusing on the infrastructure and computational principles required to build, train, and serve generative AI models across distributed GPU clusters. The repository distinguishes itself by offering deep-dive tutorials and implementation strategies for complex system challenges. It emphasizes high-performance architectural primitives, such as collective communication orchestration, distributed tensor sharding, and static gr
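Distributed tensor sharding can be sketched with a column-sharded matrix multiply: each device holds a slice of the weight matrix's columns, computes its part of the output, and the parts are concatenated (an all-gather). The helper names are hypothetical, the devices are simulated in one process, and the split assumes the column count divides evenly by the number of shards.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def shard_columns(w, num_shards):
    """Split w column-wise into num_shards equal blocks (assumes an even split)."""
    cols = list(zip(*w))
    size = len(cols) // num_shards
    return [[list(row) for row in zip(*cols[s * size:(s + 1) * size])]
            for s in range(num_shards)]


def sharded_matmul(x, w, num_shards):
    # Each simulated device multiplies x by its own column block of w.
    partial = [matmul(x, shard) for shard in shard_columns(w, num_shards)]
    # All-gather: concatenate each row's partial outputs in shard order.
    return [sum((p[r] for p in partial), []) for r in range(len(x))]


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 2]]
print(sharded_matmul(x, w, 2))  # [[1, 2, 4, 5], [3, 4, 10, 11]]
```

No device ever holds the full weight matrix, yet the gathered result equals the unsharded product.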
TransformerLens is a library for mechanistic interpretability research designed to reverse engineer the learned algorithms within large language models. It provides a standardized framework for wrapping diverse transformer architectures, allowing researchers to extract, manipulate, and analyze internal activations and weights through a consistent interface. The project distinguishes itself through a comprehensive system of activation hooks that can capture, patch, and ablate internal tensors during the forward pass. It includes specialized utilities for decomposing fused projections, material
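The capture, patch, and ablate pattern behind activation hooks can be sketched with a toy model whose intermediate values pass through named hook points. This is a hypothetical illustration, not TransformerLens's API; the class `HookedModel`, the methods `add_hook` and `reset_hooks`, and the hook names are assumptions.

```python
class HookedModel:
    """Toy two-layer model whose activations pass through named hook points."""

    def __init__(self):
        self.hooks = {}  # hook-point name -> list of hook functions

    def add_hook(self, name, fn):
        self.hooks.setdefault(name, []).append(fn)

    def reset_hooks(self):
        self.hooks.clear()

    def _hook_point(self, name, activation):
        for fn in self.hooks.get(name, []):
            replacement = fn(activation)
            # A hook that returns a value patches the activation; None leaves it alone.
            if replacement is not None:
                activation = replacement
        return activation

    def forward(self, x):
        hidden = self._hook_point("layer1.out", [2 * v for v in x])
        return self._hook_point("layer2.out", [v + 1 for v in hidden])


model = HookedModel()
cache = {}

# Capture: store a copy of the activation and return None so nothing changes.
model.add_hook("layer1.out", lambda act: cache.update({"layer1.out": list(act)}))
print(model.forward([1, 2]))  # [3, 5]
print(cache["layer1.out"])    # [2, 4]

# Ablate: replace the activation with zeros and observe the downstream effect.
model.reset_hooks()
model.add_hook("layer1.out", lambda act: [0] * len(act))
print(model.forward([1, 2]))  # [1, 1]
```

Comparing outputs with and without the ablation shows how much the downstream result depends on that internal activation.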
Monolith is a distributed recommendation model framework and asynchronous training engine designed to build and train large-scale deep learning architectures. It functions as a distributed model trainer that processes massive datasets across multiple compute nodes using asynchronous update mechanisms. The system features a dedicated embedding table manager that creates unique, feature-isolated tables to prevent representation collisions. It also includes a real-time weight updater to capture immediate changes in user interest and data hotspots through continuous parameter synchronization. Th
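The collision problem that feature-isolated embedding tables avoid can be sketched by contrasting a fixed-size hashed table with one that grows a dedicated row per feature ID. Both classes are hypothetical, modulo stands in for a hash function, and a Python dict stands in for whatever hash map a production system would use. Neither class reflects Monolith's actual implementation.

```python
import random


class HashedEmbedding:
    """Fixed-size table: IDs are mapped to buckets, so distinct IDs can share a row."""

    def __init__(self, num_buckets, dim):
        self.num_buckets = num_buckets
        self.rows = [[0.0] * dim for _ in range(num_buckets)]

    def lookup(self, feature_id):
        return self.rows[feature_id % self.num_buckets]  # modulo stands in for a hash


class CollisionlessEmbedding:
    """Grows a dedicated row the first time each feature ID is seen."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rows = {}  # feature ID -> its own embedding vector
        self.rng = random.Random(seed)

    def lookup(self, feature_id):
        if feature_id not in self.rows:
            self.rows[feature_id] = [self.rng.gauss(0.0, 0.01) for _ in range(self.dim)]
        return self.rows[feature_id]


hashed = HashedEmbedding(num_buckets=1000, dim=4)
isolated = CollisionlessEmbedding(dim=4)
print(hashed.lookup(3) is hashed.lookup(1003))      # True: two features share parameters
print(isolated.lookup(3) is isolated.lookup(1003))  # False: each feature has its own row
```

The trade-off is memory: the isolated table grows with the number of distinct IDs, while the hashed table stays fixed but lets unrelated features overwrite each other's updates.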
This project is a collection of structured study notes and notebooks serving as an educational resource for deep learning and neural network fundamentals. It provides a technical reference for implementing machine learning theory, covering everything from basic network design to the construction of advanced architectures. The material specifically focuses on the implementation of convolutional neural networks for computer vision and sequence models for natural language processing. It includes detailed guidance on building object detection systems, face recognition, and speech transcription mo
This project is a Rust interface for the PyTorch C++ library, serving as a deep learning framework and tensor computing library. It functions as a C++ API wrapper that enables the manipulation of multi-dimensional arrays and the execution of neural network architectures across CPU and GPU hardware accelerators. The library provides a TorchScript inference engine to load and execute just-in-time compiled models. It also supports Rust and Python interoperability, allowing for the creation of Python extensions that share tensor data through a common interface. The system covers deep learning mo
This project is a comprehensive educational resource and technical documentation suite for learning and developing deep learning models. It serves as an open-source textbook, implementation manual, and framework tutorial designed to guide users through the mathematical foundations and practical application of neural networks. The resource provides detailed instructional content on building various model architectures, including convolutional and recurrent neural networks. It includes a dedicated distributed training guide and a learning path that covers the fundamentals of tensors, automatic
This project is a machine learning array framework and tensor computation library designed for high-performance numerical computing. It provides a comprehensive suite of tools for constructing and training neural networks, featuring an automatic differentiation engine that facilitates gradient-based optimization and complex mathematical modeling. The library distinguishes itself through a unified memory architecture that allows data to be shared across CPU and GPU devices without explicit copies, significantly reducing data movement overhead. Its execution model relies on a lazy evaluation en
AISystem is a comprehensive AI full-stack infrastructure project covering the entire pipeline from AI chip architecture to high-level training frameworks. It encompasses the development of AI compiler frameworks, inference engines, and distributed training orchestrators designed to coordinate workloads across a heterogeneous compute stack of CPUs, GPUs, and NPUs. The project focuses on the deep integration of software and hardware, employing software-hardware co-design to align tensor layouts with physical memory structures. It provides specialized capabilities for accelerating Transformer mo
This project is a low-dependency engine designed for training large language models using native C and CUDA. It provides a bare-metal environment for tensor computation, allowing for the execution of neural network operations directly on hardware accelerators without the overhead of high-level software abstractions. The framework distinguishes itself by implementing manual gradient backpropagation and custom hardware-specific kernels, providing granular control over memory mapping and computational precision. It supports distributed training across multiple graphics processors and compute nod
This is an interactive notebook-based course that teaches machine learning from Python fundamentals through deep learning and natural language processing. It uses real datasets and multiple frameworks within a structured, hands-on curriculum that combines concise explanations with executable code cells, built-in datasets, and embedded exercise checkpoints. Learning progresses through data preparation and exploration, classical machine learning workflows, computer vision with convolutional neural networks, and natural language processing with deep learning, all delivered as a cohesive progressi
This project is a comprehensive machine learning educational resource and tutorial series delivered as a collection of interactive Jupyter Notebooks. It provides practical Python implementations for the end-to-end machine learning lifecycle, covering supervised and unsupervised learning, deep learning, and reinforcement learning. The resource distinguishes itself by providing detailed implementation guides for complex architectures, including transformers, generative adversarial networks, and convolutional neural networks. It also features specialized courseware for developing reinforcement l
This project is a comprehensive Chinese translation of a technical deep learning textbook, providing an educational resource on the theory and implementation of neural networks. It functions as a collaborative technical translation project designed to make complex academic AI literature accessible to non-English speakers. The project utilizes a community-driven translation model that integrates external suggestions and pull requests to refine linguistic accuracy and reduce bias. It employs standardized terminology mapping to ensure a uniform vocabulary throughout the translated content. To i
Deeplearnjs is a JavaScript deep learning framework and automatic differentiation engine designed for building and training artificial intelligence models within a web browser environment. It functions as a machine learning library that leverages WebGL to provide hardware acceleration for neural networks. The project serves as a high-performance linear algebra library, using the GPU to execute operations on multi-dimensional arrays. This enables the implementation of deep learning models and the execution of client-side machine learning inference. The framework covers the complete automatic
Autograd is an automatic differentiation library and gradient engine for Python and NumPy code. Its primary purpose is to compute exact gradients of mathematical functions, rather than numerical approximations, to enable gradient-based optimization and the training of mathematical models. By automating the calculation of derivatives, the library simplifies the implementation of optimization algorithms, supporting machine learning research, gradient-based learning, and the optimization of numerical models.
Sonnet is a modular machine learning framework and TensorFlow library used for building, training, and managing deep learning models. It functions as a system for composing neural networks from reusable modules and layers that encapsulate their own parameters and internal states. The project provides specialized tools for distributed model training, enabling the synchronization of gradients across multiple hardware devices. It also serves as a model state management system, allowing for the persistence of neural network weights and the export of portable models that separate the computation g
This project is a deep learning tutorial series and educational curriculum designed to teach PyTorch fundamentals. It serves as a structured training guide for mastering neural network architecture, automatic differentiation, and the use of tensors and dynamic computation graphs. The curriculum focuses on practical implementations, specifically guiding the development of recommendation systems, advertising models, and interest networks to predict user preferences. It also provides instructional content for time series forecasting and processing sequential data. The material covers a broad ra
Sonnet is a modular machine learning framework and TensorFlow neural network library designed for building composable deep learning architectures. It functions as a model orchestrator that manages parameters, state serialization, and graph exports during the training process. The framework provides a distributed training system to synchronize gradients and spread workloads across multiple GPUs or hardware devices. It enables the design of reusable research components through high-level abstractions and subclassing. The library covers neural network architecture design through sequential laye
micrograd is a scalar autograd engine and minimal neural network library. It implements a system for reverse-mode automatic differentiation over a dynamic graph of scalar operations to calculate gradients. The project includes a computation graph visualizer that generates representations of data flow and gradient propagation. It provides a set of tools for constructing and training multi-layer perceptrons using an API modeled after PyTorch. The library covers the fundamentals of backpropagation and neural network construction, specifically for binary classification tasks. This includes the i
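The reverse-mode idea behind a scalar autograd engine can be sketched in a few dozen lines. This is an independent illustration in the spirit of the description above, not micrograd's source. The `Value` class and its methods are assumptions, only `+`, `*`, and `tanh` are supported, and the recursive graph walk suits small graphs only.

```python
import math


class Value:
    """A scalar node in a dynamically built computation graph."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # propagates self.grad to the parents

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def _backward():
            self.grad += (1 - t * t) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


x = Value(2.0)
y = Value(-3.0)
z = x * y + x          # dz/dx = y + 1 and dz/dy = x
z.backward()
print(z.data, x.grad, y.grad)  # -4.0 -2.0 2.0
```

Because `x` is used twice, its gradient is accumulated with `+=` from both paths, which is why `x.grad` ends up as `y + 1`.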
Corenet is a deep learning training framework and computer vision model library designed for developing neural networks across vision, text, and audio modalities. It functions as a distributed training orchestrator for scaling workloads across multiple compute nodes and provides a multimodal data pipeline for processing image, text, and video data. The project includes a model conversion toolkit for transforming weights and architectures between different machine learning frameworks. It also provides tools for optimizing model performance on Apple Silicon and reducing response latency in gene
DeepXDE is a scientific machine learning library and deep learning PDE solver used to compute solutions for forward and inverse ordinary, partial, and integro-differential equations. It functions as a physics-informed neural network library that embeds physical laws and boundary conditions directly into the neural network loss function. The project provides a deep operator network framework for learning operator mappings that approximate relationships between functions in multiphysics problems. It supports multiple tensor-library backends, allowing the system to switch between differen
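The core of a physics-informed loss is the sum of a differential-equation residual at collocation points and a boundary-condition penalty. The sketch below assumes the toy problem du/dx = u with u(0) = 1, whose exact solution is e^x. As a plainly named substitute for the automatic differentiation a PINN library would use, it computes du/dx by central finite differences. It does not use DeepXDE's API, and `physics_informed_loss` is a hypothetical name.

```python
import math


def physics_informed_loss(u, xs, h=1e-4):
    """Loss for du/dx = u on collocation points xs, with the condition u(0) = 1.

    Central finite differences stand in for automatic differentiation here.
    """
    residuals = []
    for x in xs:
        du_dx = (u(x + h) - u(x - h)) / (2 * h)
        residuals.append(du_dx - u(x))     # equation residual at x
    pde_loss = sum(r * r for r in residuals) / len(residuals)
    bc_loss = (u(0.0) - 1.0) ** 2          # boundary/initial condition penalty
    return pde_loss + bc_loss


xs = [i / 10 for i in range(11)]                      # collocation points on [0, 1]
print(physics_informed_loss(math.exp, xs))            # close to 0
print(physics_informed_loss(lambda x: 1.0 + x, xs))   # about 0.35
```

In a PINN, `u` would be a neural network and this loss would be minimized over its weights; a candidate that satisfies both the equation and the condition drives the loss toward zero.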