# flashlight/flashlight

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/flashlight-flashlight).**

5,443 stars · 500 forks · C++ · MIT

## Links

- GitHub: https://github.com/flashlight/flashlight
- Homepage: https://fl.readthedocs.io/en/latest/
- awesome-repositories: https://awesome-repositories.com/repository/flashlight-flashlight.md

## Description

Flashlight is a standalone C++ machine learning and tensor library for building and training neural networks. It combines a neural network framework with an automatic differentiation engine that records a computation graph and calculates gradients via backpropagation.

The project also supports distributed training, using all-reduce operations to synchronize gradients and parameters across multiple compute nodes and devices. It combines high-performance tensor manipulation with native device memory interoperability, and keeps weights synchronized across distributed workers to accelerate large-scale model training.
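
To make the all-reduce idea concrete, here is a minimal sketch in plain standard C++. It does not use the Flashlight API: `allReduceAverage` is a hypothetical helper, and the "workers" are simulated as in-process vectors rather than separate nodes or devices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Flashlight: simulates a sum all-reduce
// followed by averaging. Each inner vector is one worker's local gradient.
void allReduceAverage(std::vector<std::vector<double>>& workerGrads) {
  if (workerGrads.empty()) return;
  const std::size_t n = workerGrads.front().size();
  std::vector<double> sum(n, 0.0);

  // Reduce step: accumulate every worker's gradient element-wise.
  for (const auto& g : workerGrads) {
    assert(g.size() == n);
    for (std::size_t i = 0; i < n; ++i) sum[i] += g[i];
  }

  // Average, then "broadcast": every worker ends up with the same values.
  const double scale = 1.0 / static_cast<double>(workerGrads.size());
  for (auto& g : workerGrads) {
    for (std::size_t i = 0; i < n; ++i) g[i] = sum[i] * scale;
  }
}
```

After the call, every worker holds the same averaged gradient, so identical optimizer steps keep the replicas' weights in sync.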

The framework supports modular layer composition for architectures such as residual blocks and recurrent cells. It provides data management utilities for ingestion and prefetching, serialization for persisting model state, and monitoring tools for tracking training metrics and measuring sequence errors.

The library is implemented in C++.

## Tags

### Artificial Intelligence & ML

- [Automatic Differentiation](https://awesome-repositories.com/f/artificial-intelligence-ml/automatic-differentiation.md) — Provides a comprehensive automatic differentiation engine that calculates gradients via backpropagation through a computation graph.
- [C++ Machine Learning Libraries](https://awesome-repositories.com/f/artificial-intelligence-ml/c-machine-learning-libraries.md) — Serves as a standalone C++ machine learning library for implementing deep learning operations and training neural networks.
- [Distributed Training](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/distributed-training.md) — Synchronizes gradients and parameters across multiple compute nodes and devices to accelerate large-scale model training. ([source](https://fl.readthedocs.io/en/latest/))
- [Automatic Differentiation Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/automatic-differentiation-engines.md) — Features a built-in automatic differentiation engine that constructs computation graphs to calculate gradients via backpropagation.
- [C++ Machine Learning Development](https://awesome-repositories.com/f/artificial-intelligence-ml/c-machine-learning-development.md) — Provides a comprehensive library for building and training high-performance neural networks using native C++.
- [Deep Learning Architectures](https://awesome-repositories.com/f/artificial-intelligence-ml/deep-learning-architectures.md) — Implements a modular framework for constructing complex deep learning architectures, including residual blocks and recurrent cells.
- [Distributed Tensor Synchronization](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-tensor-synchronization.md) — Performs all-reduce operations on tensors to aggregate values from all nodes into a synchronized result. ([source](https://fl.readthedocs.io/en/latest/distributed.html))
- [Distributed Training Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training-frameworks.md) — Provides a distributed training framework that synchronizes gradients and parameters across compute nodes using all-reduce operations.
- [Dynamic Tensor Shapes](https://awesome-repositories.com/f/artificial-intelligence-ml/dynamic-tensor-shapes.md) — Changes dimensions and permutes axes of tensors during model execution. ([source](https://fl.readthedocs.io/en/latest/tensor.html))
- [Gradient Computation](https://awesome-repositories.com/f/artificial-intelligence-ml/gradient-computation.md) — Calculates gradients from output to input by traversing the computation graph in topologically sorted order. ([source](https://fl.readthedocs.io/en/latest/linearregression.html))
- [Distributed Gradient Synchronization](https://awesome-repositories.com/f/artificial-intelligence-ml/gradient-computation/distributed-gradient-synchronization.md) — Implements all-reduce operations to synchronize gradients across distributed compute nodes during large-scale model training.
- [Weight Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/objectives-and-optimization/weight-optimizers.md) — Implements gradient descent algorithms to update network parameters and minimize loss functions. ([source](https://fl.readthedocs.io/en/latest/mnist.html))
- [Tensor Libraries](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/hardware-and-acceleration/tensor-computing-libraries/tensor-libraries.md) — Implements a multi-dimensional array library supporting various data types and device memory management.
- [Modular Layer Compositions](https://awesome-repositories.com/f/artificial-intelligence-ml/model-composition-architectures/hybrid-layer-compositions/modular-layer-compositions.md) — Supports constructing neural network architectures by stacking modular computation units into sequential containers.
- [Model Performance Evaluators](https://awesome-repositories.com/f/artificial-intelligence-ml/model-performance-evaluators.md) — Measures model accuracy and reliability on test data by disabling gradient tracking and training components. ([source](https://fl.readthedocs.io/en/latest/mnist.html))
- [Neural Network Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/neural-network-frameworks.md) — Offers a modular collection of layers, activation functions, and optimizers for constructing complex deep learning models.
- [Neural Network Modules](https://awesome-repositories.com/f/artificial-intelligence-ml/neural-network-modules.md) — Provides modular neural network modules that encapsulate mutable parameters and define forward pass calculations. ([source](https://fl.readthedocs.io/en/latest/modules.html))
- [Loss Function Selections](https://awesome-repositories.com/f/artificial-intelligence-ml/prediction-visualization/loss-function-calculators/binary-cross-entropy-calculators/cross-entropy-loss-functions/loss-function-selections.md) — Calculates errors between predictions and targets using standard loss functions like Mean Squared Error and Cross Entropy. ([source](https://fl.readthedocs.io/en/latest/modules.html))
- [Tensor Indexing](https://awesome-repositories.com/f/artificial-intelligence-ml/tensor-indexing.md) — Retrieves subtensors using literal values, ranges, and advanced indexing. ([source](https://fl.readthedocs.io/en/latest/tensor.html))
- [Tensor Initialization](https://awesome-repositories.com/f/artificial-intelligence-ml/tensor-initialization.md) — Initializes multi-dimensional arrays with specific shapes, data types, and sparse representations. ([source](https://fl.readthedocs.io/en/latest/tensor.html))
- [Tensor Reshaping](https://awesome-repositories.com/f/artificial-intelligence-ml/tensor-reshaping.md) — Modifies tensor dimensions without changing the order of underlying elements. ([source](https://fl.readthedocs.io/en/latest/modules.html))
- [Tensor Type Conversion](https://awesome-repositories.com/f/artificial-intelligence-ml/tensor-type-conversion.md) — Converts tensor elements between different numerical data types for compatibility. ([source](https://fl.readthedocs.io/en/latest/tensor.html))
- [Activation Functions](https://awesome-repositories.com/f/artificial-intelligence-ml/activation-functions.md) — Provides various non-linear activation functions including ReLU, Sigmoid, Tanh, and Gated Linear Units. ([source](https://fl.readthedocs.io/en/latest/modules.html))
- [Batch Normalization](https://awesome-repositories.com/f/artificial-intelligence-ml/batch-normalization.md) — Implements batch normalization to rescale input tensors using mean and variance to accelerate training. ([source](https://fl.readthedocs.io/en/latest/modules.html))
- [Custom Neural Network Layers](https://awesome-repositories.com/f/artificial-intelligence-ml/custom-neural-network-layers.md) — Allows extending base module classes and defining custom forward pass logic to create specialized neural network layers. ([source](https://fl.readthedocs.io/en/latest/extend.html))
- [Dataset Batch Loading](https://awesome-repositories.com/f/artificial-intelligence-ml/dataset-batch-loading.md) — Packs individual training samples into fixed or dynamic batch sizes using custom batching functions. ([source](https://fl.readthedocs.io/en/latest/data_loading.html))
- [Dropout Regularization](https://awesome-repositories.com/f/artificial-intelligence-ml/dropout-regularization.md) — Provides dropout regularization to prevent feature co-adaptation by randomly zeroing out input values. ([source](https://fl.readthedocs.io/en/latest/modules.html))
- [Linear Transformation Layers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/algorithms/linear-regression-implementations/linear-transformation-layers.md) — Implements linear transformation layers that use matrix multiplication and optional bias to transform input tensor sizes. ([source](https://fl.readthedocs.io/en/latest/modules.html))
- [Convolution Layers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/frameworks/model-construction/neural-network-layers/convolution-layers.md) — Implements 2D convolutional layers with configurable stride, padding, and dilation for 4D input tensors. ([source](https://fl.readthedocs.io/en/latest/modules.html))
- [Residual Block Composers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/frameworks/model-construction/neural-network-layers/convolution-layers/convolutional-block-composers/residual-block-composers.md) — Provides utilities to construct residual blocks with skip connections and scaling factors. ([source](https://fl.readthedocs.io/en/latest/contrib.html))
- [Normalization Layers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/frameworks/model-construction/neural-network-layers/normalization-layers.md) — Provides normalization layers that rescale inputs along a feature axis using learnable affine transformation parameters. ([source](https://fl.readthedocs.io/en/latest/modules.html))
- [Recurrent Layers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/frameworks/model-construction/neural-network-layers/recurrent-layers.md) — Implements standard recurrent layers including RNNs, LSTMs, and GRUs for sequential data processing. ([source](https://fl.readthedocs.io/en/latest/modules.html))
- [Sequential Containers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/frameworks/model-construction/sequential-containers.md) — Implements sequential containers that wrap layers and activation functions for streamlined model definition. ([source](https://fl.readthedocs.io/en/latest/linearregression.html))
- [Domain-Specific Processing Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/domain-specific-processing-pipelines.md) — Handles specialized data pipelines tailored for speech, vision, and text application modalities. ([source](https://cdn.jsdelivr.net/gh/flashlight/flashlight@main/README.md))
- [Mixed-Precision Computing](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/distributed-and-accelerated-compute/training-acceleration-tools/mixed-precision-training/mixed-precision-computing.md) — Adjusts computation precision across operators and normalization layers to balance performance and stability. ([source](https://fl.readthedocs.io/en/latest/common.html))
- [Sequential Model Builders](https://awesome-repositories.com/f/artificial-intelligence-ml/neural-network-construction/sequential-model-builders.md) — Supports stacking convolution, pooling, and linear layers in a linear sequence to build model architectures. ([source](https://fl.readthedocs.io/en/latest/mnist.html))
- [Tensor Debuggers](https://awesome-repositories.com/f/artificial-intelligence-ml/tensor-shape-inspection/tensor-debuggers.md) — Provides tools to output tensor values and gradients to a stream for manual numerical verification. ([source](https://fl.readthedocs.io/en/latest/debugging.html))
- [Training Data Ingestion](https://awesome-repositories.com/f/artificial-intelligence-ml/training-data-ingestion.md) — Provides built-in utilities to ingest and preprocess data for efficient delivery to neural network models. ([source](https://fl.readthedocs.io/en/latest/))
- [Training Data Prefetchers](https://awesome-repositories.com/f/artificial-intelligence-ml/training-data-prefetchers.md) — Uses background worker threads to prefetch and transform training samples, preventing data starvation during training.
- [Training Dataset Management](https://awesome-repositories.com/f/artificial-intelligence-ml/training-dataset-management.md) — Wraps input and target tensors into datasets and iterators to simplify training loop iterations. ([source](https://fl.readthedocs.io/en/latest/linearregression.html))
- [Embedding Lookup Layers](https://awesome-repositories.com/f/artificial-intelligence-ml/vector-embeddings/dense-embeddings/token-embedding-layers/embedding-lookup-layers.md) — Implements embedding lookups to retrieve vectors from learnable dictionaries using index lists. ([source](https://fl.readthedocs.io/en/latest/modules.html))
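
The sequential-container entries above describe stacking modules so that each output feeds the next input. A minimal sketch of that pattern in plain standard C++ might look like the following. It is not the Flashlight API: `SequentialSketch`, `Layer`, `relu`, and `scaleBy` are hypothetical names, and a layer is reduced to a function on `std::vector<double>`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Vec = std::vector<double>;
using Layer = std::function<Vec(const Vec&)>;  // hypothetical layer type

// Hypothetical stand-in for a sequential container: layers run in the
// order they were added, each consuming the previous layer's output.
class SequentialSketch {
 public:
  void add(Layer layer) { layers_.push_back(std::move(layer)); }

  Vec forward(Vec x) const {
    for (const auto& layer : layers_) x = layer(x);
    return x;
  }

 private:
  std::vector<Layer> layers_;
};

// Element-wise ReLU activation.
Vec relu(const Vec& x) {
  Vec y(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) y[i] = x[i] > 0.0 ? x[i] : 0.0;
  return y;
}

// Returns a layer that multiplies every element by a fixed factor.
Layer scaleBy(double factor) {
  return [factor](const Vec& x) {
    Vec y(x);
    for (double& v : y) v *= factor;
    return y;
  };
}
```

A model is then built with calls such as `model.add(scaleBy(2.0)); model.add(relu);` and evaluated with `model.forward(input)`.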

### Data & Databases

- [Sequence Tensor Generation](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-transformation/array-tensor-manipulation/tensor-transformations/constant-tensor-generation/sequence-tensor-generation.md) — Generates tensors containing identity matrices, sequential ranges, and evenly-spaced values. ([source](https://fl.readthedocs.io/en/latest/tensor.html))
- [High-Performance Tensor Libraries](https://awesome-repositories.com/f/data-databases/high-performance-tensor-libraries.md) — Provides high-performance multi-dimensional array operations and custom memory management for hardware accelerators.
- [Parameter Synchronization](https://awesome-repositories.com/f/data-databases/real-time-data-synchronization/parameter-synchronization.md) — Broadcasts or reduces parameter values across the network to ensure all processes start with identical weights. ([source](https://fl.readthedocs.io/en/latest/distributed.html))
- [Tensor-Based](https://awesome-repositories.com/f/data-databases/data-collections-datasets/dataset-creation/tensor-based.md) — Maps dataset indices to samples of tensor vectors, supporting splitting and resampling of training data. ([source](https://fl.readthedocs.io/en/latest/data_loading.html))
- [Tensor Serialization Utilities](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-transformation/array-tensor-manipulation/tensor-transformations/tensor-serialization-utilities.md) — Provides utilities for saving and loading tensors, shapes, and model modules to binary files or streams. ([source](https://fl.readthedocs.io/en/latest/serial.html))

### Operating Systems & Systems Programming

- [Communicator-Based Process Groupings](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/system-programming-primitives/inter-process-communication/communicator-based-process-groupings.md) — Allows the configuration of process groups, cluster ranks, and sizes to organize distributed workers. ([source](https://fl.readthedocs.io/en/latest/dist.html))
- [Device Memory Interoperability](https://awesome-repositories.com/f/operating-systems-systems-programming/device-memory-interoperability.md) — Interfaces directly with backend device memory and pressure functions for native hardware interoperability. ([source](https://fl.readthedocs.io/en/latest/memory.html))
- [Direct-Pointer Memory Access](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/kernel-development/kernel-driver-implementation/operator-kernel-implementations/direct-pointer-memory-access.md) — Enables custom GPU kernels to operate on raw tensor memory addresses for high-performance mathematical operations.
- [Custom Memory Allocators](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/process-and-memory-management/memory-management/allocation-strategies/dynamic-memory-allocation/custom-memory-allocators.md) — Allows for the definition of custom memory allocation and management logic to override default device behaviors. ([source](https://fl.readthedocs.io/en/latest/memory.html))

### Scientific & Mathematical Computing

- [Graph Construction Engines](https://awesome-repositories.com/f/scientific-mathematical-computing/data-modeling-processing/computational-graphs/graph-construction-engines.md) — Records inputs and gradient functions during operations to build a symbolic graph for automatic differentiation. ([source](https://fl.readthedocs.io/en/latest/variable.html))
- [Mean Squared Error Scorers](https://awesome-repositories.com/f/scientific-mathematical-computing/ordinary-least-squares/mean-squared-error-scorers.md) — Computes the average squared difference between prediction and target tensors to evaluate regression performance. ([source](https://fl.readthedocs.io/en/latest/meters.html))
- [Tensor Arithmetic](https://awesome-repositories.com/f/scientific-mathematical-computing/tensor-arithmetic.md) — Provides fundamental mathematical operations including addition, subtraction, multiplication, and division on multi-dimensional arrays. ([source](https://fl.readthedocs.io/en/latest/functions.html))
- [Graph-Based Backpropagation](https://awesome-repositories.com/f/scientific-mathematical-computing/topological-sorting-algorithms/graph-based-backpropagation.md) — Traverses the computation graph in reverse topological order to calculate gradients from the loss back to the inputs.
- [Transcendental Function Implementations](https://awesome-repositories.com/f/scientific-mathematical-computing/transcendental-function-implementations.md) — Computes element-wise transcendental functions such as exponentials, natural logarithms, and reciprocals. ([source](https://fl.readthedocs.io/en/latest/functions.html))
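
The graph-construction and backpropagation entries above can be illustrated with a tiny scalar tape. This is a sketch in plain standard C++ under simplifying assumptions, not Flashlight's implementation: `Tape` and `Node` are hypothetical, only `+` and `*` are supported, and creation order doubles as the topological order because a node can only refer to nodes created before it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One recorded operation: its value, accumulated gradient, and inputs.
struct Node {
  double value;
  double grad;
  int lhs;  // index of the first input, -1 for a leaf
  int rhs;  // index of the second input, -1 for a leaf
  char op;  // '+', '*', or 'x' for a leaf
};

class Tape {
 public:
  int leaf(double v) { return push({v, 0.0, -1, -1, 'x'}); }
  int add(int a, int b) {
    return push({nodes_[a].value + nodes_[b].value, 0.0, a, b, '+'});
  }
  int mul(int a, int b) {
    return push({nodes_[a].value * nodes_[b].value, 0.0, a, b, '*'});
  }

  // Walk the tape from the output back to the leaves (reverse topological
  // order), applying the chain rule at each recorded operation.
  void backward(int out) {
    for (auto& n : nodes_) n.grad = 0.0;
    nodes_[out].grad = 1.0;
    for (int i = out; i >= 0; --i) {
      const Node n = nodes_[i];
      if (n.op == '+') {
        nodes_[n.lhs].grad += n.grad;
        nodes_[n.rhs].grad += n.grad;
      } else if (n.op == '*') {
        nodes_[n.lhs].grad += n.grad * nodes_[n.rhs].value;
        nodes_[n.rhs].grad += n.grad * nodes_[n.lhs].value;
      }
    }
  }

  double value(int i) const { return nodes_[i].value; }
  double grad(int i) const { return nodes_[i].grad; }

 private:
  int push(Node n) {
    nodes_.push_back(n);
    return static_cast<int>(nodes_.size()) - 1;
  }
  std::vector<Node> nodes_;
};
```

For `f = x * y + x`, calling `backward(f)` accumulates `y + 1` into the gradient of `x` and `x` into the gradient of `y`, because `x` contributes to the output along two paths.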

### Software Engineering & Architecture

- [Distributed Cluster Coordination](https://awesome-repositories.com/f/software-engineering-architecture/distributed-cluster-coordination.md) — Coordinates multiple processes and devices across a cluster using shared filesystems for parallel computation. ([source](https://fl.readthedocs.io/en/latest/distributed.html))
- [Tensor Comparison Operators](https://awesome-repositories.com/f/software-engineering-architecture/logical-comparison-operators/tensor-comparison-operators.md) — Performs element-wise logical comparisons and boolean operations between tensors or scalars. ([source](https://fl.readthedocs.io/en/latest/functions.html))
- [Custom Kernel Accelerators](https://awesome-repositories.com/f/software-engineering-architecture/performance-reliability/performance-optimization/computational-efficiency/custom-kernel-accelerators.md) — Integrates hand-optimized GPU kernels by providing direct access to raw tensor memory pointers. ([source](https://fl.readthedocs.io/en/latest/extend.html))
- [Tensor Rearrangements](https://awesome-repositories.com/f/software-engineering-architecture/sorting-algorithms/array-rearrangement-algorithms/tensor-rearrangements.md) — Rearranges tensor axes to change shape while maintaining data contiguity. ([source](https://fl.readthedocs.io/en/latest/modules.html))
- [Model State Serialization](https://awesome-repositories.com/f/software-engineering-architecture/configuration-serializers/execution-state-serializers/model-state-serialization.md) — Saves and loads neural network weights, modules, and optimizer states to disk for checkpointing. ([source](https://fl.readthedocs.io/en/latest/))
- [Memory-Efficient Graph Lifecycles](https://awesome-repositories.com/f/software-engineering-architecture/execution-graphs/lifecycle-graph-management/memory-efficient-graph-lifecycles.md) — Minimizes peak memory usage by controlling the lifecycle of intermediate variables during the backward pass. ([source](https://fl.readthedocs.io/en/latest/autograd.html))
- [Device Memory RAII Wrappers](https://awesome-repositories.com/f/software-engineering-architecture/raii-resource-management/device-memory-raii-wrappers.md) — Wraps raw hardware pointers in RAII objects to automate memory release and prevent leaks on accelerator devices.
- [Tensor Memory RAII Wrappers](https://awesome-repositories.com/f/software-engineering-architecture/raii-resource-management/tensor-memory-raii-wrappers.md) — Uses RAII wrappers to automate the acquisition and release of device pointers for tensor arrays to prevent memory leaks. ([source](https://fl.readthedocs.io/en/latest/))
- [Kernel Call Fusion](https://awesome-repositories.com/f/software-engineering-architecture/shared-memory-management/memory-access-profilers/tiled-memory-access-patterns/memory-access-pattern-optimizers/kernel-call-fusion.md) — Reduces memory allocations and improves performance by fusing multiple function calls into a single kernel call. ([source](https://fl.readthedocs.io/en/latest/autograd.html))
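
The RAII wrapper entries above describe acquiring a raw pointer for a custom kernel and releasing it automatically. The sketch below shows the general pattern in plain standard C++. It is an assumption-laden illustration, not Flashlight's API: `BufferSketch`, `PtrGuard`, and `fillKernel` are hypothetical, and host memory stands in for device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical buffer that exposes its raw pointer only while "locked",
// standing in for a tensor whose memory is owned by a backend.
class BufferSketch {
 public:
  explicit BufferSketch(std::size_t n) : data_(n, 0.0f) {}
  float* lock() { ++locks_; return data_.data(); }
  void unlock() { --locks_; }
  int activeLocks() const { return locks_; }
  float at(std::size_t i) const { return data_[i]; }

 private:
  std::vector<float> data_;
  int locks_ = 0;
};

// RAII guard: acquires the pointer in the constructor and releases it in
// the destructor, so early returns and exceptions cannot leak the lock.
class PtrGuard {
 public:
  explicit PtrGuard(BufferSketch& buf) : buf_(buf), ptr_(buf.lock()) {}
  ~PtrGuard() { buf_.unlock(); }
  PtrGuard(const PtrGuard&) = delete;
  PtrGuard& operator=(const PtrGuard&) = delete;
  float* get() const { return ptr_; }

 private:
  BufferSketch& buf_;
  float* ptr_;
};

// Stand-in for a hand-written kernel that works on a raw address.
void fillKernel(float* p, std::size_t n, float v) {
  for (std::size_t i = 0; i < n; ++i) p[i] = v;
}
```

Scoping the guard in a block (`{ PtrGuard g(buf); fillKernel(g.get(), n, v); }`) releases the pointer as soon as the kernel call is done.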

### System Administration & Monitoring

- [Training Metric Monitors](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/observability-platforms/metric-performance-monitors/training-metric-monitors.md) — Tracks machine learning performance indicators, such as running averages of loss, during the training process. ([source](https://fl.readthedocs.io/en/latest/linearregression.html))
- [GPU Memory Monitors](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/observability-platforms/metric-performance-monitors/system-usage-monitoring/system-usage-monitors/gpu-memory-monitors.md) — Reports memory manager statistics and device information to identify leaks and troubleshoot GPU memory pressure. ([source](https://fl.readthedocs.io/en/latest/debugging.html))
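
A running average of the loss, as mentioned under training metric monitors, needs very little state. The class below is a minimal sketch in plain standard C++; `AverageMeterSketch` is a hypothetical name and is not taken from the Flashlight API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical meter: accumulates values and reports their mean so far.
class AverageMeterSketch {
 public:
  void add(double v) {
    sum_ += v;
    ++count_;
  }

  // Returns 0.0 when nothing has been recorded yet.
  double mean() const {
    return count_ == 0 ? 0.0 : sum_ / static_cast<double>(count_);
  }

  // Typically called at the start of each epoch.
  void reset() {
    sum_ = 0.0;
    count_ = 0;
  }

 private:
  double sum_ = 0.0;
  std::size_t count_ = 0;
};
```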

### Development Tools & Productivity

- [Dataset Pipeline Management](https://awesome-repositories.com/f/development-tools-productivity/task-pipeline-managers/machine-learning-pipelines/dataset-pipeline-management.md) — Includes extensive utilities for data ingestion, prefetching, and batching of speech, vision, and text datasets.

### DevOps & Infrastructure

- [Dataset Partitioning Strategies](https://awesome-repositories.com/f/devops-infrastructure/distributed-task-workers/dataset-partitioning-strategies.md) — Distributes sample IDs across multiple worker partitions using round-robin or token-based strategies. ([source](https://fl.readthedocs.io/en/latest/data_loading.html))
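
Round-robin partitioning of sample IDs can be sketched in a few lines of plain standard C++. The function name `roundRobinShard` and its parameters are assumptions made for illustration and are not part of the Flashlight API. The token-based strategy is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sample i is assigned to partition i % numPartitions.
// Returns the sample IDs owned by `partition`, or an empty list if the
// arguments are out of range.
std::vector<std::size_t> roundRobinShard(std::size_t numSamples,
                                         std::size_t partition,
                                         std::size_t numPartitions) {
  std::vector<std::size_t> ids;
  if (numPartitions == 0 || partition >= numPartitions) return ids;
  for (std::size_t i = partition; i < numSamples; i += numPartitions) {
    ids.push_back(i);
  }
  return ids;
}
```

With 7 samples and 3 workers, worker 0 receives IDs 0, 3, 6 and worker 1 receives IDs 1, 4, so the partitions are disjoint and differ in size by at most one.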

### User Interface & Experience

- [Sequential Computation Flow](https://awesome-repositories.com/f/user-interface-experience/buttons/button-groups/sequential-computation-flow.md) — Arranges multiple computation units into an ordered sequence where output flows directly into the next input. ([source](https://fl.readthedocs.io/en/latest/modules.html))

### Part of an Awesome List

- [AI & Machine Learning](https://awesome-repositories.com/f/awesome-lists/ai/ai-machine-learning.md) — Standalone machine learning library
- [Artificial Intelligence](https://awesome-repositories.com/f/awesome-lists/ai/artificial-intelligence.md) — Fast, flexible machine learning library built for C++.
- [Machine Learning and AI](https://awesome-repositories.com/f/awesome-lists/ai/machine-learning-and-ai.md) — Fast and flexible machine learning library.
- [Computation and Optimization](https://awesome-repositories.com/f/awesome-lists/devtools/computation-and-optimization.md) — Fast, flexible machine learning library written in C++.