# lmcinnes/umap

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/lmcinnes-umap).**

8,215 stars · 862 forks · Python · BSD-3-Clause

## Links

- GitHub: https://github.com/lmcinnes/umap
- Homepage: https://umap-learn.readthedocs.io
- awesome-repositories: https://awesome-repositories.com/repository/lmcinnes-umap.md

## Topics

`dimensionality-reduction` `machine-learning` `topological-data-analysis` `umap` `visualization`

## Description

This project is a manifold learning and non-linear dimensionality reduction library used to project high-dimensional data into lower-dimensional spaces while preserving topological structure. It functions as a parametric embedding framework and a topological data visualization library for identifying clusters and patterns within complex datasets.
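In UMAP, topological structure is captured by a fuzzy simplicial set. This is a weighted k-nearest-neighbour graph in which each point gets its own distance scale. The NumPy sketch below illustrates that construction under simplifying assumptions: brute-force neighbours, no `local_connectivity` handling, and a fixed iteration budget. It is not umap-learn's implementation, and the function names `knn`, `bandwidth` and `fuzzy_simplicial_set` are hypothetical. Each point's bandwidth is binary-searched so that its memberships sum to log2(k). The resulting directed graph is then symmetrised with a fuzzy union.

```python
import numpy as np


def knn(X, k):
    """Brute-force k nearest neighbours of every row, excluding the row itself."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    idx = np.argsort(d, axis=1)[:, :k]
    return idx, np.take_along_axis(d, idx, axis=1)


def bandwidth(dists, rho, target, n_iter=64, tol=1e-5):
    """Binary-search sigma so that sum(exp(-(d - rho) / sigma)) is close to target."""
    lo, hi, sigma = 0.0, np.inf, 1.0
    for _ in range(n_iter):
        psum = np.exp(-np.maximum(dists - rho, 0.0) / sigma).sum()
        if abs(psum - target) < tol:
            break
        if psum > target:  # memberships too large: shrink sigma
            hi = sigma
            sigma = (lo + hi) / 2.0
        else:  # memberships too small: grow sigma
            lo = sigma
            sigma = sigma * 2.0 if hi == np.inf else (lo + hi) / 2.0
    return sigma


def fuzzy_simplicial_set(X, k=5):
    """Symmetric fuzzy membership matrix built from each point's k nearest neighbours."""
    n = X.shape[0]
    idx, dists = knn(X, k)
    A = np.zeros((n, n))
    target = np.log2(k)
    for i in range(n):
        rho = dists[i, 0]  # the nearest neighbour always gets membership 1
        sigma = bandwidth(dists[i], rho, target)
        A[i, idx[i]] = np.exp(-np.maximum(dists[i] - rho, 0.0) / sigma)
    # Fuzzy union (probabilistic t-conorm) turns the directed graph into a symmetric one.
    return A + A.T - A * A.T
```

On a fitted `umap.UMAP` model, the library's own version of this graph is available as the sparse `graph_` attribute.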

Beyond the core algorithm, the library offers parametric UMAP. This variant trains a neural network to learn an explicit mapping from the input space to the embedding, which supports out-of-sample projection and, with a decoder, reconstruction of the original data. The library also supports supervised and semi-supervised dimensionality reduction, using categorical labels to improve class separation. It can also project data into non-Euclidean output spaces such as spheres or hyperboloids.
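When a spherical output space (`output_metric="haversine"`) or a hyperbolic one (`output_metric="hyperboloid"`) is requested, the raw embedding coordinates usually need converting before plotting. The helpers below are a minimal sketch of that conversion, and the helper names are hypothetical. The sketch assumes the two sphere columns are (theta, phi) angles; check that convention against the embedding-space documentation. For the hyperboloid, each (x, y) point is lifted to z = sqrt(1 + x^2 + y^2) and then projected onto the Poincaré disk.

```python
import numpy as np


def sphere_to_cartesian(emb):
    """Treat the two columns as (theta, phi) angles and map them onto the unit sphere."""
    theta, phi = emb[:, 0], emb[:, 1]
    x = np.sin(theta) * np.cos(phi)
    y = np.sin(theta) * np.sin(phi)
    z = np.cos(theta)
    return np.column_stack([x, y, z])


def hyperboloid_to_poincare(emb):
    """Lift (x, y) onto the hyperboloid, then project it into the unit Poincaré disk."""
    x, y = emb[:, 0], emb[:, 1]
    z = np.sqrt(1.0 + x ** 2 + y ** 2)  # height on the hyperboloid sheet
    return np.column_stack([x / (1.0 + z), y / (1.0 + z)])
```

For example, `sphere_to_cartesian(umap.UMAP(output_metric="haversine").fit_transform(X))` would yield 3-D points for a scatter plot, provided the angle convention above holds.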

Supported analysis tasks include density-based anomaly detection, direct reduction of sparse matrices, and embedding of high-dimensional text data. The library can align embeddings across multiple related datasets or time-sequenced slices. It also offers density regularization, which preserves local density in the projection. Visualization features include interactive plotting, graph connectivity views, and density-based rendering for large datasets.
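Once data has been embedded, a simple way to score outliers is to measure how far each point sits from its nearest neighbours in the low-dimensional space. As we understand it, the outlier-detection guide applies scikit-learn's `LocalOutlierFactor` to the embedding. The sketch below swaps in a plainer mean k-NN distance score so that it only needs NumPy. The name `knn_outlier_scores` is hypothetical.

```python
import numpy as np


def knn_outlier_scores(embedding, k=5):
    """Mean distance to the k nearest neighbours; higher scores mean more isolated points."""
    d = np.linalg.norm(embedding[:, None, :] - embedding[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # ignore each point's distance to itself
    nearest = np.sort(d, axis=1)[:, :k]
    return nearest.mean(axis=1)
```

Brute-force distances keep the sketch short. For large embeddings, a spatial index would be the usual replacement.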

The framework includes utilities for model serialization, multi-threaded data processing, and the integration of custom neural network architectures.

## Tags

### Artificial Intelligence & ML

- [Manifold Learning Algorithms](https://awesome-repositories.com/f/artificial-intelligence-ml/manifold-learning-algorithms.md) — Implements manifold learning algorithms to project high-dimensional data while preserving topological structure.
- [Parametric Neural Mappings](https://awesome-repositories.com/f/artificial-intelligence-ml/neural-networks/parametric-neural-mappings.md) — Uses neural networks to learn a functional mapping from high-dimensional input to a low-dimensional embedding.
- [Parametric Embedding Learning](https://awesome-repositories.com/f/artificial-intelligence-ml/parametric-embedding-learning.md) — Uses a neural network to learn the relationship between high-dimensional data and its low-dimensional projection. ([source](https://umap-learn.readthedocs.io/en/latest/parametric_umap.html))
- [Joint Autoencoder Losses](https://awesome-repositories.com/f/artificial-intelligence-ml/autoencoders/joint-autoencoder-losses.md) — Provides a joint autoencoder loss to optimize the neural network for simultaneous embedding and data reconstruction.
- [Dimensionality Reduction](https://awesome-repositories.com/f/artificial-intelligence-ml/dimensionality-reduction.md) — Projects high-dimensional data into a lower-dimensional space using non-linear techniques to facilitate visualization and analysis. ([source](https://cdn.jsdelivr.net/gh/lmcinnes/umap@master/README.md))
- [Semi-Supervised](https://awesome-repositories.com/f/artificial-intelligence-ml/dimensionality-reduction/semi-supervised.md) — Incorporates partial label information by treating unlabelled points as noise, refining class separation when only some data is tagged. ([source](https://umap-learn.readthedocs.io/en/latest/supervised.html))
- [Supervised](https://awesome-repositories.com/f/artificial-intelligence-ml/dimensionality-reduction/supervised.md) — Uses categorical labels during the embedding process to pull apart known classes while preserving data structure. ([source](https://umap-learn.readthedocs.io/en/latest/supervised.html))
- [Embedding Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/embedding-frameworks.md) — Offers a framework for training parametric embeddings using neural networks to map high-dimensional spaces.
- [Out-of-Sample Projections](https://awesome-repositories.com/f/artificial-intelligence-ml/out-of-sample-projections.md) — Transforms new data points into a previously learned lower-dimensional embedding for downstream machine learning tasks. ([source](https://umap-learn.readthedocs.io/en/latest/transform.html))
- [Parametric Manifold Learning](https://awesome-repositories.com/f/artificial-intelligence-ml/parametric-manifold-learning.md) — Trains neural networks to learn a mapping from high-dimensional spaces to low-dimensional embeddings.
- [Supervised Embedding Learning](https://awesome-repositories.com/f/artificial-intelligence-ml/supervised-learning/supervised-embedding-learning.md) — Uses categorical labels during the projection process to improve class separation and reveal data structures.
- [Anomaly Detection Preprocessing](https://awesome-repositories.com/f/artificial-intelligence-ml/anomaly-detection/ensembles/preprocessing-transform/anomaly-detection-preprocessing.md) — Reduces the complexity of high-dimensional datasets to improve the performance of density-based anomaly detection.
- [Clustering Preprocessing](https://awesome-repositories.com/f/artificial-intelligence-ml/data-preprocessing/clustering-preprocessing.md) — Reduces high-dimensional data to a lower-dimensional manifold to improve density-based clustering performance. ([source](https://umap-learn.readthedocs.io/en/latest/clustering.html))
- [Density-Based Anomaly Detectors](https://awesome-repositories.com/f/artificial-intelligence-ml/density-estimation/density-based-anomaly-detectors.md) — Identifies outliers in high-dimensional datasets by projecting data into a lower-dimensional space for efficient density-based detection. ([source](https://umap-learn.readthedocs.io/en/latest/outliers.html))
- [Density Regularization](https://awesome-repositories.com/f/artificial-intelligence-ml/density-estimation/density-regularization.md) — Estimates local density of high-dimensional data and uses it as a regularizer for low-dimensional projections. ([source](https://cdn.jsdelivr.net/gh/lmcinnes/umap@master/README.md))
- [Local Density Regularization](https://awesome-repositories.com/f/artificial-intelligence-ml/density-estimation/local-density-regularization.md) — Adjusts the embedding process by estimating high-dimensional density to ensure relative spacing is preserved in the projection.
- [Projection Plotting](https://awesome-repositories.com/f/artificial-intelligence-ml/dimensionality-reduction/projection-plotting.md) — Generates scatterplots of embeddings with automatic point-sizing and parameter watermarking for projection evaluation. ([source](https://umap-learn.readthedocs.io/en/latest/plotting.html))
- [Sparse Matrix Reduction](https://awesome-repositories.com/f/artificial-intelligence-ml/dimensionality-reduction/sparse-matrix-reduction.md) — Provides direct dimensionality reduction for high-dimensional sparse matrices without requiring conversion to dense arrays. ([source](https://umap-learn.readthedocs.io/en/latest/sparse.html))
- [Embedding Alignment](https://awesome-repositories.com/f/artificial-intelligence-ml/embedding-alignment.md) — Optimizes several low-dimensional projections simultaneously using shared-point constraints for consistent point locations. ([source](https://umap-learn.readthedocs.io/en/latest/aligned_umap_basic_usage.html))
- [Landmark-Constrained Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/embedding-model-fine-tuning/landmark-constrained-fine-tuning.md) — Fine-tunes learned embedding models to incorporate new information while preserving structure via landmark constraints. ([source](https://umap-learn.readthedocs.io/en/latest/transform_landmarked_pumap.html))
- [Incremental Embedding Updates](https://awesome-repositories.com/f/artificial-intelligence-ml/incremental-updates/incremental-model-updating/incremental-embedding-updates.md) — Appends new data slices to an existing aligned model by mapping shared points between slices. ([source](https://umap-learn.readthedocs.io/en/latest/aligned_umap_basic_usage.html))
- [Landmark-Based Model Updating](https://awesome-repositories.com/f/artificial-intelligence-ml/incremental-updates/incremental-model-updating/landmark-based-model-updating.md) — Integrates new data into existing projections by anchoring the mapping to shared landmark points between data slices.
- [Inverse Embedding Reconstruction](https://awesome-repositories.com/f/artificial-intelligence-ml/inverse-embedding-reconstruction.md) — Trains a neural network to learn an inverse mapping from low-dimensional embeddings back to the original data space. ([source](https://umap-learn.readthedocs.io/en/0.5dev/parametric_umap.html))
- [Supervised Metric Learning](https://awesome-repositories.com/f/artificial-intelligence-ml/supervised-learning/supervised-metric-learning.md) — Trains a supervised embedding on labelled data and transforms unlabelled points into that space for feature engineering. ([source](https://umap-learn.readthedocs.io/en/latest/supervised.html))
- [Autoencoder Training Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/training-optimization-techniques/autoencoder-training-optimizations.md) — Optimizes the encoder network using a joint loss function that combines dimensionality reduction goals with reconstruction accuracy. ([source](https://umap-learn.readthedocs.io/en/latest/parametric_umap.html))
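Out-of-sample projection places new points into an existing embedding. As we understand it, umap-learn's `transform` first positions each new point at a weighted average of the embedding coordinates of its nearest training neighbours, then refines those positions by optimisation. The sketch below covers only the initial placement. It uses inverse-distance weights as a stand-in for fuzzy membership strengths, which is an assumption. The name `init_new_points` is hypothetical.

```python
import numpy as np


def init_new_points(X_train, emb_train, X_new, k=5, eps=1e-8):
    """Place each new point at a weighted average of its k nearest training points' embeddings."""
    d = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]
    dk = np.take_along_axis(d, idx, axis=1)
    w = 1.0 / (dk + eps)  # closer training neighbours get larger weights
    w /= w.sum(axis=1, keepdims=True)  # normalise weights per new point
    # emb_train[idx] has shape (n_new, k, n_components)
    return np.einsum("nk,nkd->nd", w, emb_train[idx])
```

In practice, one would call `mapper.transform(X_new)` on a fitted model, which handles both the placement and the refinement.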

### Data & Databases

- [Dimensionality Projection Plots](https://awesome-repositories.com/f/data-databases/data-analysis-visualization/visualization-frameworks-libraries/data-visualization/three-dimensional-visualizations/dimensionality-projection-plots.md) — Generates dimensionality projection plots to visually identify clusters and trends in complex datasets.
- [Mutual k-NN Graph Projections](https://awesome-repositories.com/f/data-databases/project-graph-querying/mutual-k-nn-graph-projections.md) — Improves class separation in low-dimensional projections by using a mutual k-nearest neighbor graph to reduce distance concentration. ([source](https://umap-learn.readthedocs.io/en/latest/mutual_nn_umap.html))
- [Simplicial Set Intersection](https://awesome-repositories.com/f/data-databases/relationship-management/set-intersection-analysis/simplicial-set-intersection.md) — Combines a fuzzy simplicial set with categorical label data to respect discrete metric distances. ([source](https://umap-learn.readthedocs.io/en/latest/api.html))
- [Non-Euclidean Space Projections](https://awesome-repositories.com/f/data-databases/data-analysis-visualization/visualization-frameworks-libraries/data-visualization/three-dimensional-visualizations/dimensionality-projection-plots/non-euclidean-space-projections.md) — Maps high-dimensional data into specialized output spaces such as spheres, toruses, or hyperboloids by specifying non-Euclidean distance metrics. ([source](https://umap-learn.readthedocs.io/en/latest/embedding_space.html))
- [Data Visualization Libraries](https://awesome-repositories.com/f/data-databases/data-engineering/data-visualization-libraries.md) — Provides a toolkit for rendering topological data visualizations, including scatterplots and graph connectivity views.
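In supervised and semi-supervised embedding, label information is combined with the fuzzy simplicial set by down-weighting edges that cross classes. The sketch below illustrates that intersection on a dense matrix. Edges between different labels are scaled by exp(-far_dist), and edges touching an unlabelled point (label -1) are scaled by exp(-unknown_dist). The default constants mirror what we believe umap-learn uses but should be treated as assumptions. The function name is hypothetical, and any renormalisation the library applies afterwards is omitted.

```python
import numpy as np


def categorical_intersection(graph, labels, unknown_dist=1.0, far_dist=5.0):
    """Down-weight edges between differently labelled points; -1 marks an unknown label."""
    labels = np.asarray(labels)
    g = graph.copy()  # leave the caller's matrix untouched
    rows, cols = np.nonzero(g)
    li, lj = labels[rows], labels[cols]
    unknown = (li == -1) | (lj == -1)
    differ = ~unknown & (li != lj)
    g[rows[unknown], cols[unknown]] *= np.exp(-unknown_dist)
    g[rows[differ], cols[differ]] *= np.exp(-far_dist)
    return g
```

With umap-learn itself, passing labels to `fit` triggers the library's own version of this step. For example, call `umap.UMAP().fit(X, y=labels)`, with -1 marking unknown labels.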

### Scientific & Mathematical Computing

- [Fuzzy Simplicial Set Construction](https://awesome-repositories.com/f/scientific-mathematical-computing/fuzzy-simplicial-set-construction.md) — Constructs membership strength data for local fuzzy simplicial sets using nearest neighbor indices and distances. ([source](https://umap-learn.readthedocs.io/en/latest/api.html))
- [Fuzzy Simplicial Set Constructions](https://awesome-repositories.com/f/scientific-mathematical-computing/nearest-neighbor-searches/fuzzy-simplicial-set-constructions.md) — Implements fuzzy simplicial set construction to build the weighted graphs necessary for manifold learning.
- [Mutual K-Nearest Neighbor Graphs](https://awesome-repositories.com/f/scientific-mathematical-computing/nearest-neighbor-searches/mutual-k-nearest-neighbor-graphs.md) — Refines manifold connectivity using mutual k-nearest neighbor graphs to reduce hub effects and improve projection quality.
- [Non-Euclidean Metric Projections](https://awesome-repositories.com/f/scientific-mathematical-computing/distance-metrics/non-euclidean-metric-projections.md) — Maps high-dimensional data into specialized geometric shapes like spheres or hyperboloids by utilizing non-Euclidean distance metrics.
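A mutual k-nearest-neighbour graph keeps an edge only when each endpoint lists the other among its k nearest neighbours. This removes many of the edges that point into "hub" points. The sketch below builds that adjacency with brute-force distances, and `mutual_knn_adjacency` is a hypothetical name. Points left with no mutual neighbours become isolated, so a complete pipeline needs some way to reconnect them. This sketch omits that step.

```python
import numpy as np


def mutual_knn_adjacency(X, k):
    """Boolean adjacency with edge (i, j) only if i and j are in each other's k-NN lists."""
    n = X.shape[0]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude self-neighbours
    idx = np.argsort(d, axis=1)[:, :k]
    directed = np.zeros((n, n), dtype=bool)
    directed[np.repeat(np.arange(n), k), idx.ravel()] = True  # i -> its k-NN
    return directed & directed.T  # keep only reciprocated edges
```

For points at positions 0, 1, 3 and 10 on a line with k = 1, only the pair (0, 1) survives. The points at 3 and 10 each point to a neighbour that does not point back.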

### Part of an Awesome List

- [Time-Sequenced Alignment](https://awesome-repositories.com/f/awesome-lists/data/sequence-alignment/time-sequenced-alignment.md) — Projects multiple time-sequenced data segments into a lower-dimensional space while preserving relative structure. ([source](https://umap-learn.readthedocs.io/en/latest/aligned_umap_politics_demo.html))
- [Data Visualization](https://awesome-repositories.com/f/awesome-lists/ai/data-visualization.md) — Dimensionality reduction for visualizing high-dimensional data.
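Aligning time-sequenced slices requires knowing which rows correspond across consecutive slices. In umap-learn, this mapping is passed to `AlignedUMAP.fit` as `relations`, a list of dictionaries that map row indices in one slice to row indices in the next. The helper below is a hypothetical sketch that derives those dictionaries from per-slice ID lists.

```python
def build_relations(slice_ids):
    """For each consecutive pair of slices, map row i in the earlier slice to the row
    holding the same ID in the later slice (IDs missing from either slice are skipped)."""
    relations = []
    for prev_ids, next_ids in zip(slice_ids[:-1], slice_ids[1:]):
        pos_next = {item: j for j, item in enumerate(next_ids)}
        relations.append(
            {i: pos_next[item] for i, item in enumerate(prev_ids) if item in pos_next}
        )
    return relations
```

The result could then be passed as `umap.AlignedUMAP().fit(slices, relations=build_relations(slice_ids))`, assuming `slices` holds the data arrays in the same order as `slice_ids`.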

### Graphics & Multimedia

- [Topological Graph Visualizations](https://awesome-repositories.com/f/graphics-multimedia/topological-graph-visualizations.md) — Plots the weighted graph representing the manifold structure, featuring edge-bundling to reduce visual clutter. ([source](https://umap-learn.readthedocs.io/en/latest/plotting.html))

### User Interface & Experience

- [Interactive Visualization Toolkits](https://awesome-repositories.com/f/user-interface-experience/visualization-primitive-toolkits/interactive-visualization-toolkits.md) — Provides zoomable, pannable interactive visualizations with data-driven tooltips for exploring individual points. ([source](https://umap-learn.readthedocs.io/en/latest/plotting.html))
