Using public Pinecone datasets
Overview
This document explains how to use existing Pinecone datasets.
To learn about creating and listing datasets, see Creating datasets.
Datasets contain vectors and metadata
Pinecone datasets contain rows of dense and sparse vector values and metadata. Pinecone’s Python client supports upserting vectors from a dataset. You can also use datasets to iterate over vectors to automate queries.
Listing public datasets
To list available public Pinecone datasets, use the list_datasets() method.
Example
The following example retrieves an object containing information about public Pinecone datasets.
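A minimal sketch of that call, assuming the `pinecone-datasets` package is installed and that `list_datasets()` returns an iterable of dataset names. The `find_public_datasets` helper and its `keyword` and `lister` parameters are hypothetical names introduced here for illustration; `lister` exists only so the helper can be exercised without network access.

```python
def find_public_datasets(keyword="", lister=None):
    """Return public dataset names that contain `keyword` (case-insensitive).

    `lister` is a hypothetical injection point; by default the helper
    calls list_datasets() from the pinecone_datasets package.
    """
    if lister is None:
        # Imported lazily so the helper can be used without the package
        # when a custom lister is supplied.
        from pinecone_datasets import list_datasets
        lister = list_datasets
    needle = keyword.lower()
    # Assumption: the call yields dataset names (or objects whose string
    # form is the name).
    return [name for name in lister() if needle in str(name).lower()]
```

Calling `find_public_datasets("quora")` would then narrow the catalog to datasets whose names mention Quora.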
The example above returns an object like the following:
Loading datasets
To load a dataset into memory, use the load_dataset method. You can load a Pinecone public dataset or your own dataset.
Example
The following example loads the quora_all-MiniLM-L6-bm25 Pinecone public dataset.
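One way this might look in practice, assuming the `pinecone-datasets` package is installed and that the loaded dataset object exposes a pandas-style `head()` method for previewing rows. The `load_and_preview` wrapper and its `loader` and `rows` parameters are hypothetical names used only for this sketch.

```python
def load_and_preview(name, rows=5, loader=None):
    """Load a dataset by name and return it with a preview of its first rows.

    `loader` is a hypothetical injection point; by default the helper
    calls load_dataset() from the pinecone_datasets package.
    """
    if loader is None:
        # Imported lazily; loading a public dataset downloads it into memory.
        from pinecone_datasets import load_dataset
        loader = load_dataset
    dataset = loader(name)
    # Assumption: head(n) returns the first n documents of the dataset.
    return dataset, dataset.head(rows)
```

For example, `dataset, preview = load_and_preview("quora_all-MiniLM-L6-bm25")` loads the dataset named above and keeps a small preview for inspection.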
The example above prints the following output:
Iterating over datasets
You can iterate over vector data in a dataset using the iter_documents method. You can use this method to upsert or update vectors, to automate benchmarking, or to perform other tasks.
Example
The following example loads the quora_all-MiniLM-L6-bm25 dataset, then iterates over the documents in the dataset in batches of 100 and upserts the vector data to a Pinecone index named my-index.
Iterate over documents in batches and upsert to an index.
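The batching loop described above can be sketched as follows. This assumes `dataset` comes from load_dataset, that `iter_documents(batch_size=...)` yields lists of documents, and that `index` is a Pinecone index handle whose `upsert` method accepts a `vectors` argument. The `upsert_in_batches` function name is hypothetical.

```python
def upsert_in_batches(dataset, index, batch_size=100):
    """Upsert every document in `dataset` to `index`, one batch at a time.

    Returns the total number of documents sent.
    """
    upserted = 0
    # Each batch is expected to be a list of documents (id, values, and
    # optionally sparse values and metadata).
    for batch in dataset.iter_documents(batch_size=batch_size):
        index.upsert(vectors=batch)
        upserted += len(batch)
    return upserted
```

With a real setup, `index` would be a handle to the existing my-index index obtained from the Pinecone Python client, and the index dimension must match the dimension of the dataset's vectors.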
The following example upserts the dataset as a dataframe.
Upsert the dataset as a dataframe.
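A hedged sketch of the dataframe route. It assumes the dataset exposes its rows as a pandas DataFrame through a `documents` attribute and that the index handle provides `upsert_from_dataframe` with a `batch_size` parameter. Dropping a `blob` column before upserting is also an assumption about the dataset schema, not something stated on this page. The `upsert_as_dataframe` name is hypothetical.

```python
def upsert_as_dataframe(dataset, index, batch_size=100):
    """Upsert a whole dataset through the client's dataframe helper."""
    # Assumption: `documents` is a DataFrame with one row per vector.
    df = dataset.documents
    # Assumption: a `blob` column, when present, is not part of the
    # upsert payload and should be removed first.
    if "blob" in df.columns:
        df = df.drop(columns=["blob"])
    return index.upsert_from_dataframe(df, batch_size=batch_size)
```

Compared with the explicit loop, this hands batching to the client, which is convenient when no per-batch logic is needed.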
What’s next
- Learn more about using datasets in the Pinecone Python client