PyTorch Metric Learning

PyTorch Metric Learning is an open-source library for training neural networks to produce similarity-preserving embedding spaces. It provides a modular framework where interchangeable loss functions, mining strategies, and evaluation tools can be composed to learn representations that map similar items to nearby points and dissimilar items to distant points in the embedding space.
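The idea of pulling similar items together and pushing dissimilar items apart can be illustrated with a triplet margin objective. The following is a minimal pure-Python sketch, not the library's API: the function names and the margin value of 0.2 are assumptions chosen for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Hypothetical helper: the penalty is zero once the negative is at
    least `margin` farther from the anchor than the positive is."""
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    return max(d_ap - d_an + margin, 0.0)


anchor = [0.0, 0.0]
near_point = [0.0, 1.0]   # distance 1 from the anchor
far_point = [3.0, 4.0]    # distance 5 from the anchor

# The positive is close and the negative is far, so the constraint holds.
satisfied = triplet_margin_loss(anchor, near_point, far_point)

# Swapping the roles violates the constraint and yields a positive penalty.
violated = triplet_margin_loss(anchor, far_point, near_point)
```

A training loop would average such penalties over many triplets and backpropagate through the embedding network; this sketch only shows the penalty for a single triplet.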

The library distinguishes itself through a highly configurable architecture that separates concerns across several interchangeable components. Users can assemble custom loss functions from pluggable distance metrics, reducers, and regularizers to control how similarity is measured and penalties are aggregated. It supports cross-batch memory queues that retain embeddings from previous iterations to enlarge the pool of negative samples without increasing batch size, and offers cascaded embedding training that splits the embedder output into sections for progressive feature refinement. The framework also includes distributed loss wrapping for multi-GPU training, hierarchical sampling strategies for datasets with nested categories, and self-supervised label wrappers that enable standard metric losses to train without manual label arrays.
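The cross-batch memory idea described above can be sketched as a fixed-size FIFO queue. This is an illustrative pure-Python sketch under stated assumptions, not the library's implementation: the class name, method names, and `memory_size` parameter are hypothetical.

```python
from collections import deque


class CrossBatchMemory:
    """Hypothetical FIFO store of (embedding, label) pairs from recent batches."""

    def __init__(self, memory_size):
        # A bounded deque discards the oldest entries once it is full.
        self.queue = deque(maxlen=memory_size)

    def enqueue(self, embeddings, labels):
        """Append the current batch; the oldest entries fall out if needed."""
        self.queue.extend(zip(embeddings, labels))

    def negatives_for(self, label):
        """Return stored embeddings whose label differs from `label`."""
        return [emb for emb, lab in self.queue if lab != label]


memory = CrossBatchMemory(memory_size=4)
memory.enqueue([[0.1, 0.2], [0.9, 0.8]], labels=[0, 1])
memory.enqueue([[0.2, 0.1], [0.7, 0.9], [0.5, 0.5]], labels=[0, 1, 2])

# Five items were added to a queue of size 4, so the oldest one was dropped.
negatives = memory.negatives_for(label=0)
```

The negatives available to an anchor now come from several past batches rather than only the current one, which is how the pool grows without a larger batch size.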

The library covers the full metric learning workflow, including data loading with balanced batch construction and hard example mining, a comprehensive suite of loss functions spanning triplet, contrastive, proxy-based, and classification-style objectives, and evaluation infrastructure for measuring embedding quality through retrieval and clustering metrics. It supports training workflows that combine multiple loss functions, integrate classifier heads, or employ adversarial generation, and provides inference utilities for encoding new data into a learned embedding space. The documentation includes ready-to-use training loops and testers for running complete train-test workflows.
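As an illustration of a retrieval metric computed from embeddings, the sketch below scores precision at 1: the fraction of queries whose nearest reference embedding shares their label. It is a pure-Python sketch with hypothetical names and toy data, not the library's evaluation code.

```python
import math


def precision_at_1(queries, query_labels, references, reference_labels):
    """Fraction of queries whose nearest reference has the same label."""
    hits = 0
    for query, query_label in zip(queries, query_labels):
        # Index of the reference embedding closest to this query.
        nearest = min(
            range(len(references)),
            key=lambda i: math.dist(query, references[i]),
        )
        if reference_labels[nearest] == query_label:
            hits += 1
    return hits / len(queries)


# Toy data (made up for illustration).
references = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
reference_labels = [0, 0, 1]
queries = [[0.2, 0.1], [4.5, 5.2], [0.9, 0.2]]
query_labels = [0, 1, 1]

score = precision_at_1(queries, query_labels, references, reference_labels)
```

The third query lies next to a class-0 reference but is labeled 1, so two of the three queries are retrieved correctly.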

Features

Metric Learning Pipelines - Provides complete training pipelines for deep metric learning with configurable loss functions and mining strategies.
Batch Size Tuning - Implements cross-batch memory queues that store embeddings from previous iterations for contrastive learning.
Retrieval Accuracy Metrics - Calculates retrieval and clustering metrics like precision, recall, and NMI from embedding distances.
Clustering Evaluation Metrics - Provides clustering and retrieval metrics to evaluate embedding quality on held-out datasets.
Loss Metric Optimizer Customizers - Swaps distance metrics, reducers, or regularizers inside loss functions to control similarity computation.
Metric Learning Libraries - An open-source library for training similarity-preserving embeddings using modular loss functions, miners, and reducers.
Embedding Computation - Computes clustering and retrieval metrics from query and reference embeddings with labels.
Embedding Loss Functions - Provides loss functions that compute penalties from embeddings and labels for metric learning.
End-to-End Training Pipelines - Provides ready-to-use trainers that handle the full training loop for metric learning models.
Triplet Loss Functions - Provides triplet loss computation from all possible triplets within a batch for metric learning.
Self-Supervised Label Generators - Wraps standard metric losses to train embeddings without labels using augmented views as positive pairs.
Loss Function Customization - Provides modular loss function assembly from interchangeable distance metrics, reducers, and regularizers.
Contrastive Loss Implementations - Trains embeddings by minimizing distance between similar pairs and maximizing distance between dissimilar pairs.
Composable Loss Components - Provides a modular loss composition system where distance metrics, reducers, and regularizers can be swapped to customize loss behavior.
Triplet Margin Losses - Implements triplet margin loss for metric learning to separate classes in embedding space.
Model Training Pipelines - Provides complete training pipelines that initialize models, optimizers, losses, miners, and testers with logging.
Metric Learning Pipelines - Orchestrates training loops combining loss functions, miners, and optimizers for similarity-preserving embeddings.
Embedding Model Training - Trains neural networks to map similar items to nearby points in an embedding space using labeled data.
Metric Learning Training - Runs training loops combining metric loss functions, mining strategies, and classifiers for similarity embeddings.
Cascaded Embedding Refiners - Splits the embedder output into sections and computes separate losses for progressive refinement.
Metric Loss Pipelines - Orchestrates a full training loop that applies loss functions, miners, and reducers in a configurable sequence.
Self-Supervised Loss Calculators - Wraps standard metric learning losses to work with unlabeled data via augmented view pairs.
Embedding Training Toolkits - Provides a toolkit for training embedding models with triplet, contrastive, proxy-based, and self-supervised losses.
Metric Loss Training - Trains models using triplet and pair-based loss functions to learn similarity-preserving embeddings.
Embedding Accuracy Evaluators - Evaluates how well the learned embedding space separates classes using retrieval and clustering metrics.
Hard Negative Mining - Implements hard negative and positive pair mining to focus training on challenging examples.
Batch-Level Hard Pair Miners - Provides configurable batch-level mining strategies that select informative pairs and triplets to accelerate metric learning convergence.
Hard Pair and Triplet Miners - Provides configurable miners that select hard pairs and triplets from batches for metric learning.
Hard Triplet Miners - Ships miners that select hard triplets from batches to accelerate metric learning convergence.
Margin-Violation Triplet Miners - Ships miners that select triplets violating a margin constraint for metric learning.
Multi-Similarity Miners - Provides multi-similarity mining to select the most informative pairs during metric learning training.
Embedding Similarity Loss Functions - Implements loss functions that use configurable similarity measures to separate classes in embedding space.
Cross-Batch Memory Queues - Implements cross-batch memory queues that store embeddings from previous iterations for contrastive learning.
Metric Learning Workflows - Ships ready-to-use training loops and testers for end-to-end metric learning model training and evaluation.
Contrastive Learning Wrappers - Implements MoCo-style self-supervised contrastive learning with cross-batch memory for embedding training.
ML Batch Training Optimizations - Implements cross-batch memory queues that store embeddings from previous iterations for contrastive learning.
Loss Value Reducers - Ships a pluggable reducer architecture that aggregates per-element or per-pair losses using interchangeable strategies like mean or threshold averaging.
Cross-Batch Memory Queues - Implements cross-batch memory queues to enlarge negative sample pools without increasing batch size.
In-Memory Databases with Persistence - Implements cross-batch memory queues that store embeddings from previous iterations for contrastive learning.
Similarity Learning Libraries - Provides a library for learning similarity-preserving embeddings through configurable mining, losses, and metrics.
Triplet Loss Training - Trains similarity models by computing triplet loss from all possible anchor-positive and anchor-negative pairs within a batch.
Embedding Quality Diagnostics - Evaluates embedding quality using retrieval and clustering metrics like precision, recall, and NMI.
Embedding Pairwise Distance Calculators - Computes pairwise distance or similarity matrices between embeddings using multiple configurable metrics.
Embedding Evaluation Suites - Ships a suite of tools for evaluating embedding quality using retrieval and clustering metrics.
Metric Learning Frameworks - Provides a modular framework for composing and training metric learning models with interchangeable components.
Metric Learning Classifier Heads - Combines metric and classification losses for architectures with trunk, embedder, and classifier heads.
Balanced Class Samplers - Implements balanced batch sampling that draws a fixed number of samples per class to ensure balanced representation during training.
Distributed GPU Training - Supports multi-GPU distributed training with correctly wrapped loss functions and miners for PyTorch.
Distributed Training - Wraps loss functions and miners to operate correctly under PyTorch's DistributedDataParallel.
Data-Parallel Training - Wraps loss functions and miners to operate correctly under PyTorch's DistributedDataParallel.
Fixed-Depth Sampling - Provides fixed-depth sampling strategies that limit neighbor selection to constrain computational growth during training.
Metric Learning Classifier Heads - Trains embeddings jointly with a classification layer to combine metric learning and classification objectives.
Asymmetric Embedding Comparisons - Enables loss computation with separate anchor and reference embedding sets for asymmetric comparisons.
Per-Class Instance Samplers - Implements per-class instance sampling that draws a fixed number of samples from each class for balanced batch construction.
Angular Softmax Loss Training - Trains embeddings by applying a normalized softmax with angular margins for class separation.
Combined Loss Functions - Wraps multiple loss functions together, summing their outputs with optional per-loss weights and miners.
Configurable Loss Combinations - Sums or averages outputs of several loss functions with per-loss miners and weights in a single forward pass.
Metric and Classification Loss Combiners - Supports combining metric and classification losses for joint embedding training.
Anchor-Reference Separation - Supports loss computation with separate anchor and reference embedding sources for asymmetric comparisons.
Multi-Strategy Training Loops - Orchestrates training loops that combine metric losses, classifiers, cascaded embeddings, or adversarial objectives.
Embedding Inference Utilities - Ships utilities to encode new data points into a learned embedding space using a trained model.
Adversarial Robustness Training - Generates adversarial samples during training to make embeddings robust to small perturbations.
Precomputed Pair and Triplet Loss Acceptors - Accepts precomputed pair or triplet indices directly for loss computation without labels.
Angular Loss Training - Trains embeddings by optimizing the angle between feature vectors for class separation.
Margin-Based Loss Training - Trains embeddings by applying a margin constraint on pairwise distances for class separation.
Distribution Regularizers - Ships distribution regularizers that encourage uniform Lp norms or zero-mean embeddings during metric learning training.
Norm Penalizers - Provides norm-based penalties to shrink embedding magnitudes during metric learning training.
Space Shaping Regularizers - Applies center invariance and sparsity penalties to shape the learned embedding space distribution.
Distance-Range Pair Miners - Provides miners that select pairs based on distance ranges for focused metric learning.
Distance-Weighted Pair Miners - Implements distance-weighted sampling that selects pairs or triplets with probability proportional to their distance for stable training.
Hardness-Percentage Pair Miners - Implements miners that select the hardest percentage of pairs for focused training.
Uniform-Distance Pair Miners - Implements miners that select pairs with uniform distance distribution for balanced training.
Cross-Batch Embedding Queues - Ships cross-batch memory queues that retain embeddings from previous iterations to enlarge the negative sample pool.
Similarity-Threshold Pair Miners - Provides miners that select pairs by similarity thresholds for metric learning training.
Developer Tools - Metric learning library.
More to explore - Library for metric learning applications.
PyTorch Utilities - Listed in the “PyTorch Utilities” section of The Incredible PyTorch awesome list.
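Several of the features above concern mining hard triplets from a batch. One common strategy, batch-hard mining, can be sketched as follows: for each anchor, pick the farthest same-class sample and the closest different-class sample. This is a pure-Python sketch with a hypothetical function name and toy data; it is not presented as any specific miner shipped by the library.

```python
import math


def batch_hard_triplets(embeddings, labels):
    """Return (anchor, positive, negative) index triplets, choosing the
    farthest positive and the closest negative for every anchor."""
    triplets = []
    for a, (emb_a, lab_a) in enumerate(zip(embeddings, labels)):
        positives = [i for i, lab in enumerate(labels) if lab == lab_a and i != a]
        negatives = [i for i, lab in enumerate(labels) if lab != lab_a]
        if not positives or not negatives:
            continue  # an anchor needs at least one of each to form a triplet
        hardest_pos = max(positives, key=lambda i: math.dist(emb_a, embeddings[i]))
        hardest_neg = min(negatives, key=lambda i: math.dist(emb_a, embeddings[i]))
        triplets.append((a, hardest_pos, hardest_neg))
    return triplets


# Toy batch on a line (made up for illustration).
embeddings = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [1.5, 0.0], [6.0, 0.0]]
labels = [0, 0, 0, 1, 1]

triplets = batch_hard_triplets(embeddings, labels)
```

The selected index triplets would then be passed to a triplet loss, so that training concentrates on the examples that currently violate the desired ordering the most.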

tensorflow/similarity

1,025 stars on GitHub

TensorFlow Similarity is a Python framework designed for training neural networks to learn high-dimensional vector representations and perform similarity-based retrieval. It provides a comprehensive toolkit for metric learning, enabling the development of systems that group similar items together in vector space and identify them through distance-based comparisons. The library distinguishes itself by integrating specialized training techniques, such as contrastive and triplet-based learning, with robust data management tools that ensure stable model convergence. It supports self-supervised re

lightly-ai/lightly

3,684 stars on GitHub

Lightly is a self-supervised learning framework and computer vision data curation tool designed to manage large image datasets and train models on unlabeled data. It functions as a PyTorch vision library and dataset management SDK, providing tools to convert raw images into high-dimensional vectors for similarity search, visualization, and feature extraction. The project implements a variety of self-supervised architectures, including MoCo, SimCLR, VICReg, Barlow Twins, and masked image modeling. It distinguishes itself by combining these learning frameworks with active learning capabilities,

maiot-io/zenml

5,452 stars on GitHub

ZenML is an extensible machine learning orchestration framework designed to manage the end-to-end lifecycle of data pipelines and AI agent workflows. It functions as a durable orchestrator that executes machine learning tasks as directed acyclic graphs, ensuring that every step is containerized for consistent performance across local, cloud, and hybrid infrastructure. By decoupling pipeline code from underlying compute and storage backends, the platform allows developers to define infrastructure-agnostic stacks that remain portable across diverse environments. The project distinguishes itself

huggingface/sentence-transformers

18,817 stars on GitHub

This project is a transformer-based framework for generating dense and sparse vector embeddings of text and multimodal data. It serves as a library for fine-tuning models to perform semantic similarity tasks, retrieval, and reranking. The system is distinguished by its support for diverse architectural patterns, including bi-encoders for fast similarity search and cross-encoders for high-precision reranking. It provides dedicated pipelines for multimodal embeddings, mapping text and images into a shared vector space, and implements knowledge distillation to compress large models into smaller,

KevinMusgrave/pytorch-metric-learning
