Affected Component
Codebase
Current Behavior
The native engine currently enforces a hard `topk` ceiling of 1024 on `collection.query()`. Requesting `topk > 1024` raises: `query validate failed: topk[2048] is too large, max is 1024`
Desired Improvement
I'd like to request that this limit be increased, ideally to 16,384 (aligning with Milvus), or made configurable.
Use case
We run a social media search platform with millions of posts indexed in Zvec using both HNSW (dense) and sparse (BM25) vector fields. For dashboard and analytics queries, users need to retrieve large ranked result sets (2K–20K posts) in a single query — for example, "find all posts mentioning a brand, ranked by relevance."
Currently we work around this by falling back to MongoDB full-text search for large result sets, but this loses the BM25 scoring quality that Zvec's sparse index provides.
Context from other vector databases
| Database | topk limit |
| --- | --- |
| Zvec | 1,024 |
| Milvus | 16,384 |
| Pinecone | 10,000 |
| Qdrant | No hard cap |
| Weaviate | No hard cap |
Proposal
One of these approaches would work for us:
- Increase the default limit to 16,384 (matching Milvus, which has a similar architecture)
- Make it configurable — e.g. a `max_topk` parameter on `zvec.init()` or `CollectionOption`, so users can raise it when they accept the memory/latency tradeoff
- Pagination support — a `query()` option like `offset` + `limit` to iterate beyond 1024 in batches
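To make the pagination proposal concrete, here is a minimal sketch of how a client could collect a large ranked result set in batches if `query()` accepted `offset` and `limit`. The `offset`/`limit` parameters are the hypothetical API being proposed, not an existing Zvec feature; the query function is mocked with a pre-ranked list so only the paging logic is shown.

```python
# Sketch of the proposed offset + limit pagination. `query_fn` stands in
# for collection.query(); here it is mocked with a pre-ranked list so the
# paging logic itself is runnable.

MAX_TOPK = 1024  # current hard ceiling in the native engine

def query_paged(query_fn, total, page_size=MAX_TOPK):
    """Collect up to `total` results by issuing batched queries of at
    most `page_size` hits each, via hypothetical offset/limit options."""
    results = []
    offset = 0
    while offset < total:
        limit = min(page_size, total - offset)
        batch = query_fn(offset=offset, limit=limit)
        if not batch:
            break  # collection has fewer results than requested
        results.extend(batch)
        offset += len(batch)
    return results

# Mock: 3,000 ranked hits standing in for a real collection.
ranked = [{"id": i, "score": 1.0 / (i + 1)} for i in range(3000)]
mock_query = lambda offset, limit: ranked[offset : offset + limit]

hits = query_paged(mock_query, total=2048)  # 2 batches: 1024 + 1024
```

This keeps each individual query under the engine's per-call ceiling while still letting analytics workloads assemble the 2K–20K result sets described above.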
Impact
Option 2 feels like the best balance; the default stays conservative, but power users can opt in.