UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualisation similarly to t-SNE, but also for general non-linear dimension reduction. The algorithm is founded on three assumptions about the data:
- The data is uniformly distributed on a Riemannian manifold;
- The Riemannian metric is locally constant (or can be approximated as such);
- The manifold is locally connected.
From these assumptions it is possible to model the manifold with a fuzzy topological structure. The embedding is found by searching for a low dimensional projection of the data that has the closest possible equivalent fuzzy topological structure.
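To make the idea of a fuzzy topological structure a little more concrete, here is a minimal pure-Python sketch of one ingredient as it is described in the UMAP paper: turning one point's nearest-neighbour distances into fuzzy membership strengths. The function names (`sketch_smooth_knn`, `sketch_memberships`, `fuzzy_union`) are hypothetical and are not the library's API; the real implementation is vectorised, compiled with numba, and handles edge cases this sketch ignores.

```python
import math


def sketch_smooth_knn(knn_dists, n_iter=64, tol=1e-5):
    """Return (rho, sigma) for one point (illustrative sketch only).

    knn_dists: distances to the k nearest neighbours, excluding the point itself.
    rho is the distance to the closest neighbour; sigma is found by bisection
    so that sum_j exp(-max(0, d_j - rho) / sigma) is close to log2(k).
    """
    k = len(knn_dists)
    if k < 2:
        raise ValueError("need at least two neighbour distances")
    target = math.log2(k)
    rho = min(knn_dists)
    lo, hi, sigma = 0.0, math.inf, 1.0
    for _ in range(n_iter):
        total = sum(math.exp(-max(0.0, d - rho) / sigma) for d in knn_dists)
        if abs(total - target) < tol:
            break
        if total > target:
            # Strengths too large: shrink sigma.
            hi = sigma
            sigma = (lo + hi) / 2.0
        else:
            # Strengths too small: grow sigma (double until an upper bound exists).
            lo = sigma
            sigma = sigma * 2.0 if math.isinf(hi) else (lo + hi) / 2.0
    return rho, sigma


def sketch_memberships(knn_dists):
    """Directed fuzzy membership strength of each neighbour edge."""
    rho, sigma = sketch_smooth_knn(knn_dists)
    return [math.exp(-max(0.0, d - rho) / sigma) for d in knn_dists]


def fuzzy_union(a, b):
    """Probabilistic fuzzy union used to symmetrise the two directed strengths."""
    return a + b - a * b


dists = [0.5, 1.0, 1.5, 2.0, 3.0]  # made-up distances for illustration
weights = sketch_memberships(dists)
print([round(w, 3) for w in weights])
```

The nearest neighbour always receives strength 1, which reflects the local connectivity assumption, and rescaling by a per-point `sigma` reflects the assumption that the data is uniformly distributed with respect to a locally varying metric.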
The details of the underlying mathematics can be found in our paper on arXiv:
McInnes, L, Healy, J, UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, arXiv e-prints 1802.03426, 2018
You can find the software on GitHub.
Installation
Conda install, via the excellent work of the conda-forge team:
conda install -c conda-forge umap-learn
The conda-forge packages are available for Linux, OS X, and 64-bit Windows.
PyPI install, presuming you have numba and scikit-learn and all of its requirements (numpy and scipy) installed:
pip install umap-learn
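After installing, it can be handy to confirm that the package is visible to the interpreter you intend to use. The snippet below is a small sketch that relies only on the standard library; the helper name `umap_learn_status` is hypothetical. Note that the distribution is named `umap-learn` while the import name is `umap`.

```python
import importlib.util
from importlib import metadata


def umap_learn_status():
    """Return (importable, version) without actually importing umap."""
    # The package is installed as "umap-learn" but imported as "umap".
    importable = importlib.util.find_spec("umap") is not None
    try:
        version = metadata.version("umap-learn")
    except metadata.PackageNotFoundError:
        version = None
    return importable, version


importable, version = umap_learn_status()
print(f"umap importable: {importable}, umap-learn version: {version}")
```

If the module is importable but no `umap-learn` version is reported, a different package providing a module called `umap` may be installed instead.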
- How to Use UMAP
- Basic UMAP Parameters
- Plotting UMAP results
- UMAP Reproducibility
- Transforming New Data with UMAP
- Inverse transforms
- Parametric (neural network) Embedding
- Transforming New Data with Parametric UMAP
- UMAP on sparse data
- UMAP for Supervised Dimension Reduction and Metric Learning
- Using UMAP for Clustering
- Outlier detection using UMAP
- Combining multiple UMAP models
- Better Preserving Local Density with DensMAP
- Improving the Separation Between Similar Classes Using a Mutual k-NN Graph
- Document embedding using UMAP
- Embedding to non-Euclidean spaces
- How to use AlignedUMAP
- AlignedUMAP for Time Varying Data
- Precomputed k-nn
- Performance Comparison of Dimension Reduction Implementations
- Release Notes
- Frequently Asked Questions
- Should I normalise my features?
- Can I cluster the results of UMAP?
- The clusters are all squashed together and I can’t see internal structure
- I ran out of memory. Help!
- UMAP is eating all my cores. Help!
- Is there GPU or multicore-CPU support?
- Can I add a custom loss function?
- Is there support for the R language?
- Is there a C/C++ implementation?
- I can’t get UMAP to run properly!
- What is the difference between PCA / UMAP / VAEs?
- How UMAP can go wrong
- Successful use-cases
- Interactive Visualizations
- Exploratory Analysis of Interesting Datasets
- Scientific Papers
- The single-cell transcriptional landscape of mammalian organogenesis
- A lineage-resolved molecular atlas of C. elegans embryogenesis at single-cell resolution
- Exploring Neural Networks with Activation Atlases
- TimeCluster: dimension reduction applied to temporal data for visual analytics
- Dimensionality reduction for visualizing single-cell data using UMAP
- Revealing multi-scale population structure in large cohorts
- Understanding Vulnerability of Children in Surrey
- UMAP API Guide
- UMAP
- ParametricUMAP
- Useful Functions
- compute_membership_strengths()
- discrete_metric_simplicial_set_intersection()
- fast_intersection()
- fast_metric_intersection()
- find_ab_params()
- fuzzy_simplicial_set()
- init_graph_transform()
- init_transform()
- make_epochs_per_sample()
- nearest_neighbors()
- raise_disconnected_warning()
- reset_local_connectivity()
- simplicial_set_embedding()
- smooth_knn_dist()