Performance tuning
This section provides some tips for getting the best performance out of Pinecone.
Basic performance checklist
- Switch to a cloud environment. For example: EC2, GCE, Google Colab, GCP AI Platform Notebook, or SageMaker Notebook. If you experience slow uploads or high query latencies, it might be because you are accessing Pinecone from your home network.
- Deploy your application and your Pinecone service in the same region. Contact us if you need a dedicated deployment.
- Reuse connections. We recommend you reuse the same pinecone.Index() instance when you are upserting and querying the same index.
- Operate within known limits.
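One way to follow the connection-reuse advice is to cache the index object per index name. This is a minimal sketch, not part of the Pinecone client: make_index_getter and get_index are hypothetical helper names, and the factory is passed in (for example pinecone.Index) so the sketch runs without network access.

```python
from functools import lru_cache


def make_index_getter(index_factory):
    """Wrap a factory such as pinecone.Index so each index name is built only once."""

    @lru_cache(maxsize=None)
    def get_index(name):
        # Runs only on the first request for a given name; later calls reuse the result.
        return index_factory(name)

    return get_index


# Intended usage with the real client (assumption: pinecone.init() was already called):
#   import pinecone
#   get_index = make_index_getter(pinecone.Index)
#   index = get_index("example-index")   # created once
#   same = get_index("example-index")    # same instance, no new connection
```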
How to increase throughput
To increase throughput (QPS), increase the number of replicas for your index.
Example
The following example increases the number of replicas for example-index to 4.
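A minimal sketch of that call, assuming the module-level Python client in which pinecone.init() sets credentials and pinecone.configure_index() accepts a replicas argument. The helper name set_replicas is hypothetical, and the credentials shown in the comments are placeholders. The client is passed in as an argument so the helper can be exercised without a live service.

```python
def set_replicas(client, index_name, replicas):
    """Scale an index to `replicas` replicas through the client's configure_index call."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    client.configure_index(index_name, replicas=replicas)


# Intended usage with the real client (placeholder credentials):
#   import pinecone
#   pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")
#   set_replicas(pinecone, "example-index", 4)
```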
See the configure_index API reference for more details.
Using the gRPC client to get higher upsert speeds
Pinecone has a gRPC flavor of the standard client (see the installation instructions) that can provide higher upsert speeds for multi-pod indexes.
To connect to an index via the gRPC client:
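A minimal sketch, assuming a version of the Python client that exposes pinecone.GRPCIndex once the gRPC extras are installed. Newer client releases may organize this differently. The helper name connect_grpc is hypothetical, and the credentials in the comments are placeholders.

```python
def connect_grpc(client, index_name):
    """Return a gRPC-backed index handle; `client` is the initialized pinecone module."""
    return client.GRPCIndex(index_name)


# Intended usage with the real client (assumption: gRPC extras are installed):
#   import pinecone
#   pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")
#   index = connect_grpc(pinecone, "example-index")
```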
The syntax for upsert, query, fetch, and delete with the gRPC client remains the same as with the standard client.
We recommend you use parallel upserts to get the best performance.
We recommend you use the gRPC client for multi-pod indexes only. The performance of the standard and gRPC clients is similar in a single-pod index.
It’s possible to be write-throttled sooner when upserting with the gRPC client. If you see this often, we recommend you use a backoff algorithm while upserting.
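One possible backoff scheme is exponential backoff with full jitter, sketched below. The helper name upsert_with_backoff and all of its parameters are hypothetical. Which exception signals throttling depends on the client version, so the exception types to retry on are passed in through retry_on. The default of retrying on any Exception is an assumption you will likely want to narrow.

```python
import random
import time


def upsert_with_backoff(index, vectors, max_retries=5, base_delay=0.5,
                        max_delay=30.0, retry_on=(Exception,), sleep=time.sleep):
    """Call index.upsert, retrying failures with exponential backoff and full jitter."""
    for attempt in range(max_retries + 1):
        try:
            return index.upsert(vectors=vectors)
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error to the caller
            # The delay ceiling doubles on each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: sleep a random time in [0, ceiling] to spread out retries.
            sleep(random.uniform(0, ceiling))
```

Passing sleep as a parameter keeps the helper easy to test. In production the default time.sleep is used.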
Pinecone is thread-safe, so you can launch multiple read requests and multiple write requests in parallel, which can improve your throughput. However, reads and writes cannot be performed in parallel with each other, so writing in large batches might affect query latency, and vice versa.
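Because the client is thread-safe, one way to issue parallel writes is to split the vectors into batches and upsert them from a thread pool. This is a minimal sketch using Python's standard concurrent.futures module, not the client's own asynchronous options. The helper names chunked and parallel_upsert, the batch size, and the worker count are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield successive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def parallel_upsert(index, vectors, batch_size=100, max_workers=8):
    """Upsert `vectors` in batches from a thread pool; returns one response per batch."""
    batches = list(chunked(vectors, batch_size))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps batch order and re-raises a worker's exception when its result is read.
        return list(pool.map(lambda batch: index.upsert(vectors=batch), batches))
```

Smaller batches and a modest worker count may help limit the effect of large writes on query latency.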