Insert data
After creating a Pinecone index, you can start inserting vector embeddings and metadata into the index.
Inserting records
- Create a client instance and target an index:
- Use the upsert operation to write records into the index:
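The two steps above can be sketched as follows. This is a minimal sketch that assumes the current `pinecone` Python package, where a client is created with `Pinecone(api_key=...)` and an index handle with `pc.Index("example-index")`. The index name, the 8-dimensional toy vectors, and the helper names `example_records` and `upsert_example_records` are hypothetical. The helper only needs an object with an `upsert` method, so it can be exercised without network access.

```python
from typing import List, Tuple

# A record as (id, dense vector values). The 8-dimensional vectors below are
# hypothetical; real vectors must match the dimension of your index.
Record = Tuple[str, List[float]]


def example_records() -> List[Record]:
    """Return a few toy records, one per ID."""
    return [
        ("A", [0.1] * 8),
        ("B", [0.2] * 8),
        ("C", [0.3] * 8),
    ]


def upsert_example_records(index, records: List[Record]):
    """Write records into the index with a single upsert request.

    `index` is assumed to be a Pinecone index handle, for example
    Pinecone(api_key="YOUR_API_KEY").Index("example-index").
    """
    return index.upsert(vectors=records)
```

With a real client, this would be called as `upsert_example_records(pc.Index("example-index"), example_records())` after creating `pc = Pinecone(api_key="YOUR_API_KEY")`.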
Immediately after the upsert response is received, records may not yet be visible to queries, because Pinecone is eventually consistent. In most situations, you can confirm that the records have been received by checking that the record counts returned by `describe_index_stats()` have been updated. Keep in mind that if your index has multiple replicas, they may not all become consistent at the same time.
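One way to wait for an upsert to become visible is to poll the record count until it reaches the expected value. This is a sketch under stated assumptions: `wait_for_count` is a hypothetical helper that accepts any zero-argument callable returning the current count. With the Python client, that callable might read `total_vector_count` from the response of `index.describe_index_stats()`, which is an assumption about the response shape. Because replicas may become consistent at different times, a matching count is a signal, not a guarantee, that every query will see the new records.

```python
import time
from typing import Callable


def wait_for_count(
    get_count: Callable[[], int],
    expected: int,
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
) -> bool:
    """Poll get_count() until it reports at least `expected` records.

    Returns True if the count was reached before `timeout_s` elapsed,
    False otherwise. With the Pinecone Python client, get_count could be
    (an assumption, not a documented contract):
        lambda: index.describe_index_stats().total_vector_count
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_count() >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        # Back off between polls so the stats endpoint is not hammered.
        time.sleep(interval_s)
```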
Batching upserts
When upserting larger amounts of data, insert it into the index in batches of 100 vectors or fewer, spread over multiple upsert requests.
Example
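A minimal sketch of batching, assuming records are `(id, values)` tuples as accepted by the Python client's `upsert`. The helper names `chunks` and `upsert_in_batches` are hypothetical; the default batch size of 100 follows the guidance above.

```python
import itertools
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, List[float]]


def chunks(iterable: Iterable[Record], batch_size: int = 100) -> Iterator[Tuple[Record, ...]]:
    """Yield successive tuples of at most `batch_size` items."""
    it = iter(iterable)
    chunk = tuple(itertools.islice(it, batch_size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it, batch_size))


def upsert_in_batches(index, records: Iterable[Record], batch_size: int = 100) -> int:
    """Upsert records in batches of at most `batch_size`, one request per batch.

    Returns the number of upsert requests sent.
    """
    requests = 0
    for batch in chunks(records, batch_size):
        index.upsert(vectors=list(batch))
        requests += 1
    return requests
```

Because `chunks` consumes its input lazily, `records` can be a generator, so the full dataset never has to be held in memory at once.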
Sending upserts in parallel
By default, all vector operations sent using the Python client block until the response has been received. The client can also send them asynchronously: for the batching upserts example, create the index handle with `pool_threads` set, pass `async_req=True` to each `upsert` call, and then wait on the returned results.
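A sketch of the asynchronous variant. It assumes that the Python client's `upsert` accepts `async_req=True` and then returns a handle whose `get()` blocks until that request finishes, and that the index handle is created with a thread pool, for example `pc.Index("example-index", pool_threads=30)`. The helper name `upsert_async` is hypothetical, and the batching is repeated inline so the block stands alone.

```python
import itertools
from typing import Iterable, List, Tuple

Record = Tuple[str, List[float]]


def upsert_async(index, records: Iterable[Record], batch_size: int = 100) -> list:
    """Send one asynchronous upsert per batch, then wait for all of them.

    Assumes `index.upsert(..., async_req=True)` returns an object with a
    blocking `get()` method, which `get()` re-raises if the request failed.
    """
    it = iter(records)
    pending = []
    # Fire off every batch without waiting for the previous one to finish.
    while batch := list(itertools.islice(it, batch_size)):
        pending.append(index.upsert(vectors=batch, async_req=True))
    # Block until every request has completed and collect the responses.
    return [result.get() for result in pending]
```

The number of requests actually in flight is bounded by the size of the client's thread pool, not by the number of batches.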
Pinecone is thread-safe, so you can launch multiple read requests or multiple write requests in parallel, which can improve your throughput. However, reads and writes cannot be performed in parallel with each other, so writing in large batches might affect query latency, and vice versa.
If you experience slow uploads, see Performance tuning for advice.
Partitioning an index into namespaces
You can organize the records added to an index into partitions, or “namespaces,” to limit queries and other vector operations to only one such namespace at a time. For more information, see: Namespaces.
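A minimal sketch of writing to and querying within a single namespace. It assumes the Python client's `upsert` and `query` methods accept a `namespace` argument, and that `query` takes `vector` and `top_k`. The namespace names and the helper names `upsert_to_namespace` and `query_namespace` are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[str, List[float]]


def upsert_to_namespace(index, records: List[Record], namespace: str):
    """Write records into a single namespace of the index."""
    return index.upsert(vectors=records, namespace=namespace)


def query_namespace(index, vector: List[float], namespace: str, top_k: int = 3):
    """Search only the records stored in `namespace`."""
    return index.query(vector=vector, top_k=top_k, namespace=namespace)
```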
Inserting records with metadata
You can insert records that contain metadata as key-value pairs.
You can then use the metadata as filter criteria when sending a query: Pinecone searches for similar vector embeddings only among the records that match the filter. For more information, see: Metadata Filtering.
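A minimal sketch of records carrying metadata and a query that filters on it. It assumes the Python client accepts records as dictionaries with `id`, `values`, and `metadata` keys, and that `query` takes a `filter` argument written with Pinecone's operator syntax such as `$eq`, plus an `include_metadata` flag. The `genre` and `year` fields and the helper names are hypothetical.

```python
from typing import Any, Dict, List


def movie_record(record_id: str, values: List[float], genre: str, year: int) -> Dict[str, Any]:
    """Build a record whose metadata can later be used in a query filter."""
    return {"id": record_id, "values": values, "metadata": {"genre": genre, "year": year}}


def query_by_genre(index, vector: List[float], genre: str, top_k: int = 3):
    """Query only among records whose metadata field `genre` equals `genre`."""
    return index.query(
        vector=vector,
        top_k=top_k,
        filter={"genre": {"$eq": genre}},
        include_metadata=True,
    )
```

Records built with `movie_record` would be written with `index.upsert(vectors=[...])` like any other records.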
Upserting records with sparse values
Sparse vector values can be upserted alongside dense vector values.
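A minimal sketch of a record that carries both kinds of values. It assumes the Python client accepts a `sparse_values` dictionary with parallel `indices` and `values` lists alongside the dense `values`. The ID, numbers, and the helper name `hybrid_record` are hypothetical.

```python
from typing import Any, Dict, List


def hybrid_record(
    record_id: str,
    dense: List[float],
    sparse_indices: List[int],
    sparse_values: List[float],
) -> Dict[str, Any]:
    """Build a record that carries both dense and sparse vector values.

    `sparse_indices` and `sparse_values` are parallel lists: entry i of the
    sparse vector sits at position sparse_indices[i] with weight sparse_values[i].
    """
    if len(sparse_indices) != len(sparse_values):
        raise ValueError("sparse indices and values must have the same length")
    return {
        "id": record_id,
        "values": dense,
        "sparse_values": {"indices": sparse_indices, "values": sparse_values},
    }
```

Such a record would then be written with, for example, `index.upsert(vectors=[hybrid_record("vec1", [0.1, 0.2, 0.3], [10, 45], [0.5, 0.5])])`.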
Limitations
The following limitations apply to upserting records with sparse vectors:
- You cannot upsert a record with sparse vector values without dense vector values.
- Only `s1` and `p1` pod types using the `dotproduct` metric support querying sparse vectors. There is no error at upsert time: if you attempt to query any other pod type using sparse vectors, Pinecone returns an error.
- You can only upsert sparse vector values with up to 1000 non-zero values.
- Indexes created before February 22, 2023 do not support sparse values.
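The record-level limitations above can be checked on the client before sending a request. This is a hypothetical helper, not part of any Pinecone client. It assumes records use the dictionary layout with `values` and `sparse_values` keys, counts sparse entries whose weight is non-zero, and does not check the pod-type and metric restriction or the index creation date, since those depend on the index rather than on the record.

```python
from typing import Any, Dict, List

MAX_SPARSE_NONZERO = 1000  # limit stated in the list above


def sparse_record_problems(record: Dict[str, Any]) -> List[str]:
    """Return the reasons a record violates the upsert limitations.

    An empty list means the record passes these client-side checks.
    """
    problems: List[str] = []
    sparse = record.get("sparse_values")
    if sparse is None:
        return problems  # dense-only records are not affected
    if not record.get("values"):
        problems.append("sparse values require dense values")
    # Count only entries whose weight is actually non-zero.
    nonzero = sum(1 for v in sparse.get("values", []) if v != 0)
    if nonzero > MAX_SPARSE_NONZERO:
        problems.append(f"{nonzero} non-zero sparse values exceeds {MAX_SPARSE_NONZERO}")
    return problems
```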
Troubleshooting index fullness errors
When upserting data, you may receive an error reporting that the index is full.
New upserts may fail once the index's capacity is exhausted. Your index can still serve queries, but it must be scaled to accommodate more vectors.
To resolve this issue, scale your index.