UMAP of Text Embeddings with Nomic Atlas

Nomic Atlas is a platform for interactively visualizing and exploring massive datasets. It automates the creation of embeddings and 2D coordinate projections using UMAP.

UMAP interactive visualization with Nomic Atlas

Nomic Atlas automatically generates embeddings for your data and allows you to explore large datasets in a web browser. Atlas provides:

  • In-browser analysis of your UMAP data with the Atlas Analyst

  • Vector search over your UMAP data using the Nomic API

  • Interactive features like zooming, recoloring, searching, and filtering in the Nomic Atlas data map

  • Scalability for millions of data points

  • Rich information display on hover

  • Shareable UMAPs via URL links to your embeddings and data maps in Atlas

This example demonstrates how to use Nomic Atlas to create interactive maps of text using embeddings and UMAP.

Setup

  1. Install the required Python packages with pip install nomic pandas

  2. Get a Nomic API key from your Nomic Atlas account

  3. Run nomic login nk-... in a terminal window or use the following code:

import nomic
nomic.login('nk-...')
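
To keep the key out of source files, one option is to read it from an environment variable. The sketch below assumes a variable named NOMIC_API_KEY (a hypothetical name chosen for illustration, not one the nomic package is known to read itself) and assumes keys start with nk-, as the placeholder above suggests.

```python
import os


def read_nomic_key(env_var: str = "NOMIC_API_KEY") -> str:
    """Return the API key stored in env_var, or raise if it looks wrong.

    NOMIC_API_KEY is a hypothetical variable name used for illustration.
    """
    key = os.environ.get(env_var, "").strip()
    # Assumption: keys start with "nk-", as in the placeholder above.
    if not key.startswith("nk-"):
        raise ValueError(f"{env_var} is unset or does not start with 'nk-'")
    return key


def login_from_env() -> None:
    """Log in to Nomic with the key read from the environment."""
    import nomic  # imported lazily so read_nomic_key works without the package

    nomic.login(read_nomic_key())
```

Calling login_from_env() then replaces the hard-coded nomic.login('nk-...') call.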

Download Example Data

import pandas as pd

# Example data
df = pd.read_csv("https://docs.nomic.ai/singapore_airlines_reviews.csv")
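
Because the map is built from the text field, it can help to check locally for rows where that field is missing or blank before uploading. The helper below is a minimal sketch, not part of the Nomic API: blank_text_rows is a hypothetical name, and it works on plain row dictionaries such as those returned by df.to_dict("records"). Atlas may handle such rows on its own; this is only a local sanity check.

```python
from typing import Any, Dict, List


def blank_text_rows(records: List[Dict[str, Any]], field: str = "text") -> List[int]:
    """Return positions of rows whose field is missing, non-string, or blank."""
    bad = []
    for i, row in enumerate(records):
        value = row.get(field)
        # pandas represents missing values as float NaN, which is not a str
        if not isinstance(value, str) or not value.strip():
            bad.append(i)
    return bad


sample = [{"text": "Great crew"}, {"text": "  "}, {"text": float("nan")}, {"rating": 5}]
print(blank_text_rows(sample))  # [1, 2, 3]
```

With the example data, blank_text_rows(df.to_dict("records")) returns the row positions worth inspecting or dropping.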

Create Atlas Dataset

from nomic import AtlasDataset
dataset = AtlasDataset("airline-reviews-data")

Upload to Atlas

dataset.add_data(df)

Create Data Map

We pass the text column of df as indexed_field, the field Atlas creates embeddings from, and choose some standard UMAP parameters.

from nomic.data_inference import ProjectionOptions

# model="umap" is how you choose UMAP in Nomic Atlas
# You can adjust n_neighbors, min_dist,
# and n_epochs as you would with the UMAP library.
atlas_map = dataset.create_index(
    indexed_field='text',
    projection=ProjectionOptions(
        model="umap",
        n_neighbors=20,
        min_dist=0.01,
        n_epochs=200,
    ),
)

print(f"Explore your interactive map at: {atlas_map.map_link}")

Your map will be available in your Atlas Dashboard.
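
When experimenting with other settings, keep in mind that a larger n_neighbors emphasizes global structure and a smaller min_dist packs points more tightly. The sketch below checks values against the ranges the umap-learn library accepts before a map is created. The helper name check_umap_params is hypothetical, spread=1.0 is umap-learn's default rather than an Atlas option shown above, and whether Atlas enforces the same ranges is an assumption.

```python
def check_umap_params(
    n_neighbors: int, min_dist: float, n_epochs: int, spread: float = 1.0
) -> None:
    """Raise ValueError if values fall outside the ranges umap-learn accepts."""
    if n_neighbors < 2:
        raise ValueError("n_neighbors must be greater than 1")
    if not 0.0 <= min_dist <= spread:
        raise ValueError("min_dist must be between 0 and spread")
    if n_epochs < 0:
        raise ValueError("n_epochs must be a nonnegative integer")


# The values used in the example above pass silently.
check_umap_params(n_neighbors=20, min_dist=0.01, n_epochs=200)
```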