Pod type | Dimensions | Estimated max vectors per pod |
---|---|---|
p1 | 512 | 1,250,000 |
p1 | 768 | 1,000,000 |
p1 | 1024 | 675,000 |
p2 | 512 | 1,250,000 |
p2 | 768 | 1,100,000 |
p2 | 1024 | 1,000,000 |
s1 | 512 | 8,000,000 |
s1 | 768 | 5,000,000 |
s1 | 1024 | 4,000,000 |
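
The capacity figures in the table above can be turned into a rough pod-count estimate. The following is a minimal sketch, not an official sizing tool: the `estimate_pods` helper and the `MAX_VECTORS_PER_POD` dictionary are hypothetical names introduced here, and the sketch only covers the three dimensionalities listed in the table. It also ignores metadata size, which can reduce real capacity.

```python
import math

# Estimated max vectors per pod, copied from the capacity table above.
MAX_VECTORS_PER_POD = {
    "p1": {512: 1_250_000, 768: 1_000_000, 1024: 675_000},
    "p2": {512: 1_250_000, 768: 1_100_000, 1024: 1_000_000},
    "s1": {512: 8_000_000, 768: 5_000_000, 1024: 4_000_000},
}


def estimate_pods(pod_type: str, dimensions: int, num_vectors: int) -> int:
    """Return a rough lower bound on the pods needed to hold num_vectors.

    Hypothetical helper: it divides the vector count by the table estimate
    and rounds up. It is only defined for the tabulated dimensions.
    """
    per_pod = MAX_VECTORS_PER_POD[pod_type][dimensions]
    return max(1, math.ceil(num_vectors / per_pod))


print(estimate_pods("p1", 768, 3_500_000))  # 4
print(estimate_pods("s1", 768, 3_500_000))  # 1
```

Treat the result as a starting point; the table values are estimates, so leaving some headroom is a reasonable design choice.
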
top_k value of queries. The pod type is the primary factor driving QPS, as the different pod types are optimized for different approaches.
The p1 pods are performance-optimized pods that provide very low query latencies but hold fewer vectors per pod than s1 pods. They are ideal for applications with low latency requirements (<100 ms). The s1 pods are optimized for storage: they provide large storage capacity and lower overall costs, with slightly higher query latencies than p1 pods. They are ideal for very large indexes with moderate or relaxed latency requirements.
The p2 pods provide greater query throughput with lower latency. They support 200 QPS per replica and return queries in less than 10 ms, so their query throughput and latency are better than those of s1 and p1 pods, especially for low-dimension vectors (<512D).
As a rule, a single p1 pod with 1M vectors of 768 dimensions each and no replicas can handle about 20 QPS. Actual throughput can be higher or lower, depending on the size of your metadata, the number of vectors, the dimensionality of your vectors, and the top_k value for your search. See Table 2 below for more examples.
Table 2: QPS by pod type and top_k value*
Pod type | top_k 10 | top_k 250 | top_k 1000 |
---|---|---|---|
p1 | 30 | 25 | 20 |
p2 | 150 | 50 | 20 |
s1 | 10 | 10 | 10 |
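
As a rough illustration of how the Table 2 figures might be used for planning, the sketch below estimates how many copies of an index would be needed to reach a target QPS. It rests on a loud assumption that is not stated in the table: throughput is taken to scale linearly with the number of copies. The `estimate_copies` helper and the `BASELINE_QPS` dictionary are hypothetical names, and for a top_k between tabulated values the sketch conservatively uses the next larger tabulated top_k.

```python
import math

# Baseline QPS per pod type and top_k, copied from Table 2.
BASELINE_QPS = {
    "p1": {10: 30, 250: 25, 1000: 20},
    "p2": {10: 150, 250: 50, 1000: 20},
    "s1": {10: 10, 250: 10, 1000: 10},
}


def estimate_copies(pod_type: str, top_k: int, target_qps: float) -> int:
    """Estimate the total index copies needed to serve target_qps.

    Assumption (not from the source): QPS scales linearly with copies.
    """
    table = BASELINE_QPS[pod_type]
    # Pick the smallest tabulated top_k that is >= the requested top_k,
    # which gives a conservative (lower) per-copy QPS figure.
    candidates = [k for k in sorted(table) if k >= top_k]
    if not candidates:
        raise ValueError("top_k exceeds the largest value in Table 2")
    per_copy_qps = table[candidates[0]]
    return max(1, math.ceil(target_qps / per_copy_qps))


print(estimate_copies("p1", 100, 100))  # 4
print(estimate_copies("p2", 10, 300))   # 2
print(estimate_copies("s1", 1000, 25))  # 3
```

Measured throughput on your own data should take precedence over this estimate, since metadata size, vector count, and dimensionality all shift the baseline.
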