Overview

This document describes Pinecone datasets.

To learn about using public Pinecone datasets, see Using public datasets.

To learn about creating and listing datasets, see Creating datasets.

Datasets contain vectors and metadata

Pinecone datasets contain rows of dense and sparse vector values and metadata. Pinecone’s Python client supports upserting vectors from a dataset. You can also use datasets to iterate over vectors to automate queries.
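To make the row structure and the batch-upsert pattern concrete, here is a minimal, library-free sketch. It does not call the Pinecone client: `DatasetRow`, `iter_batches`, `upsert_all`, and the `upsert_fn` callback are hypothetical names introduced for illustration, and the field layout is an assumption based on the description above (dense values, sparse values, and metadata per row).

```python
from dataclasses import dataclass, field
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List


@dataclass
class DatasetRow:
    """Hypothetical stand-in for one dataset row (not a Pinecone type)."""
    id: str
    values: List[float]  # dense vector values
    # Assumed sparse layout: parallel lists of indices and weights.
    sparse_values: Dict[str, list] = field(default_factory=dict)
    metadata: Dict[str, Any] = field(default_factory=dict)


def iter_batches(rows: Iterable[DatasetRow], batch_size: int) -> Iterator[List[DatasetRow]]:
    """Yield consecutive lists of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def upsert_all(
    rows: Iterable[DatasetRow],
    upsert_fn: Callable[[List[DatasetRow]], None],
    batch_size: int = 100,
) -> int:
    """Send every row to upsert_fn in batches; return the number of rows sent."""
    total = 0
    for batch in iter_batches(rows, batch_size):
        upsert_fn(batch)  # in real use, this would wrap a client upsert call
        total += len(batch)
    return total


# Usage: five toy rows, collected by a fake upsert function.
rows = [
    DatasetRow(id=str(i), values=[float(i), 0.0], metadata={"n": i})
    for i in range(5)
]
sent: List[List[DatasetRow]] = []
count = upsert_all(rows, sent.append, batch_size=2)
```

The same `iter_batches` helper could drive automated queries by passing each row's `values` to a query function instead of an upsert function.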

Available public datasets

The following table lists the public Pinecone datasets that are currently available:

| name | documents | source | bucket | task | dense model | sparse model |
| --- | --- | --- | --- | --- | --- | --- |
| ANN_DEEP1B_d96_angular | 9,990,000 | https://github.com/erikbern/ann-benchmarks | gs://pinecone-datasets-dev/ANN_DEEP1B_d96_angular | ANN | ANN benchmark | None |
| ANN_Fashion-MNIST_d784_euclidean | 60,000 | https://github.com/erikbern/ann-benchmarks | gs://pinecone-datasets-dev/ANN_Fashion-MNIST_d784_euclidean | ANN | ANN benchmark | None |
| ANN_GloVe_d200_angular | 1,183,514 | https://github.com/erikbern/ann-benchmarks | gs://pinecone-datasets-dev/ANN_GloVe_d200_angular | ANN | ANN benchmark | None |
| ANN_GloVe_d50_angular | 1,183,514 | https://github.com/erikbern/ann-benchmarks | gs://pinecone-datasets-dev/ANN_GloVe_d50_angular | ANN | ANN benchmark | None |
| ANN_GloVe_d64_angular | 292,385 | https://github.com/erikbern/ann-benchmarks | gs://pinecone-datasets-dev/ANN_GloVe_d64_angular | ANN | ANN benchmark | None |
| ANN_MNIST_d784_euclidean | 60,000 | https://github.com/erikbern/ann-benchmarks | gs://pinecone-datasets-dev/ANN_MNIST_d784_euclidean | ANN | ANN benchmark | None |
| ANN_NYTimes_d256_angular | 290,000 | https://github.com/erikbern/ann-benchmarks | gs://pinecone-datasets-dev/ANN_NYTimes_d256_angular | ANN | ANN benchmark | None |
| ANN_SIFT1M_d128_euclidean | 1,000,000 | https://github.com/erikbern/ann-benchmarks | gs://pinecone-datasets-dev/ANN_SIFT1M_d128_euclidean | ANN | ANN benchmark | None |
| quora_all-MiniLM-L6-bm25 | 522,931 | https://quoradata.quora.com/First-Quora-Dataset-Release-Question-Pairs | gs://pinecone-datasets-dev/quora_all-MiniLM-L6-bm25 | similar questions | sentence-transformers/msmarco-MiniLM-L6-cos-v5 | naver/splade-cocondenser-ensembledistil |
| quora_all-MiniLM-L6-v2_Splade | 522,931 | https://quoradata.quora.com/First-Quora-Dataset-Release-Question-Pairs | gs://pinecone-datasets-dev/quora_all-MiniLM-L6-v2_Splade | similar questions | sentence-transformers/msmarco-MiniLM-L6-cos-v5 | naver/splade-cocondenser-ensembledistil |
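In the table above, each bucket path is the dataset name appended to `gs://pinecone-datasets-dev/`, and the ANN dataset names appear to follow an `ANN_<corpus>_d<dimension>_<metric>` pattern. The sketch below encodes both observations. The helper names `bucket_path` and `parse_ann_name` are hypothetical, and reading the `d<number>` segment as the vector dimension is an assumption drawn from the naming, not something the table states.

```python
import re
from typing import NamedTuple, Optional

# Bucket root as it appears in every row of the table.
BUCKET_ROOT = "gs://pinecone-datasets-dev"

# Assumed naming pattern for the ANN benchmark datasets.
_ANN_NAME = re.compile(r"^ANN_(?P<corpus>.+)_d(?P<dim>\d+)_(?P<metric>[a-z]+)$")


class AnnName(NamedTuple):
    corpus: str
    dimension: int
    metric: str


def bucket_path(name: str) -> str:
    """Build the bucket path for a dataset name, following the table's pattern."""
    return f"{BUCKET_ROOT}/{name}"


def parse_ann_name(name: str) -> Optional[AnnName]:
    """Split an ANN dataset name into its parts; return None if it does not match."""
    match = _ANN_NAME.match(name)
    if match is None:
        return None
    return AnnName(match["corpus"], int(match["dim"]), match["metric"])


sift_path = bucket_path("ANN_SIFT1M_d128_euclidean")
sift_info = parse_ann_name("ANN_SIFT1M_d128_euclidean")
quora_info = parse_ann_name("quora_all-MiniLM-L6-bm25")  # not an ANN name
```

Names that do not follow the ANN pattern, such as the Quora datasets, return `None` from `parse_ann_name` but still work with `bucket_path`.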

What’s next