[Enhance]: Increase topk hard limit beyond 1024 for query() #335

@Rm1n90

Description

Affected Component

Codebase

Current Behavior

The native engine currently enforces a hard topk ceiling of 1024 on collection.query(). Requesting topk > 1024 raises:

query validate failed: topk[2048] is too large, max is 1024

Desired Improvement

I would like this limit raised, ideally to 16,384 (matching Milvus), or made configurable.

Use case

We run a social media search platform with millions of posts indexed in Zvec using both HNSW (dense) and sparse (BM25) vector fields. For dashboard and analytics queries, users need to retrieve large ranked result sets (2K–20K posts) in a single query — for example, "find all posts mentioning a brand, ranked by relevance."

Currently we work around this by falling back to MongoDB full-text search for large result sets, but this loses the BM25 scoring quality that Zvec's sparse index provides.

Context from other vector databases

| Database | topk limit  |
|----------|-------------|
| Zvec     | 1,024       |
| Milvus   | 16,384      |
| Pinecone | 10,000      |
| Qdrant   | No hard cap |
| Weaviate | No hard cap |

Proposal

One of these approaches would work for us:

  1. Increase the default limit to 16,384 (matching Milvus, which has a similar architecture)
  2. Make it configurable — e.g. a max_topk parameter on zvec.init() or CollectionOption, so users can raise it when they accept the memory/latency tradeoff
  3. Pagination support — a query() option like offset + limit to iterate beyond 1024 in batches
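To make option 3 concrete, here is a minimal sketch of how a caller might walk past the 1,024 ceiling in batches if an offset + limit option existed. Nothing here is Zvec's actual API: `query_page`, `iter_ranked`, the `(id, score)` hit shape, and the in-memory `RANKED` list are all hypothetical stand-ins so the loop can run on its own.

```python
from typing import Callable, Iterator, List, Tuple

Hit = Tuple[str, float]  # (document id, score); hypothetical result shape

MAX_TOPK = 1024  # per-query ceiling reported in this issue


def iter_ranked(
    query_page: Callable[[int, int], List[Hit]],
    total: int,
    page_size: int = MAX_TOPK,
) -> Iterator[Hit]:
    """Yield up to `total` hits, requesting at most `page_size` per call.

    `query_page(offset, limit)` is a hypothetical callable standing in for
    a paginated query(); it is NOT an existing Zvec function.
    """
    if not 1 <= page_size <= MAX_TOPK:
        raise ValueError(f"page_size must be between 1 and {MAX_TOPK}")
    offset = 0
    while offset < total:
        limit = min(page_size, total - offset)
        page = query_page(offset, limit)
        yield from page
        if len(page) < limit:
            break  # the backend ran out of results
        offset += len(page)


# Stand-in backend: 5,000 pre-ranked posts held in memory.
RANKED: List[Hit] = [(f"post-{i}", 1.0 - i / 5000) for i in range(5000)]


def fake_query_page(offset: int, limit: int) -> List[Hit]:
    """Return one slice of the pre-ranked list, mimicking offset + limit."""
    return RANKED[offset:offset + limit]


hits = list(iter_ranked(fake_query_page, total=2048))
```

With this shape, a request for 2,048 results becomes two calls of 1,024 each, and the loop stops early when a page comes back short. Whether the engine could keep ranking stable across pages is an open design question this sketch does not address.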

Impact

Option 2 feels like the best balance; the default stays conservative, but power users can opt in.
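As a rough illustration of option 2, the validation could compare topk against a per-collection setting instead of a constant. This is only a sketch under assumptions: `CollectionOptionSketch`, its `max_topk` field, and `validate_topk` are hypothetical names, not part of Zvec; only the wording of the "too large" error is taken from the message quoted above.

```python
from dataclasses import dataclass

DEFAULT_MAX_TOPK = 1024  # current hard limit, kept as the conservative default


@dataclass(frozen=True)
class CollectionOptionSketch:
    """Hypothetical stand-in for a CollectionOption carrying max_topk."""
    max_topk: int = DEFAULT_MAX_TOPK


def validate_topk(
    topk: int,
    option: CollectionOptionSketch = CollectionOptionSketch(),
) -> None:
    """Reject topk values outside 1..option.max_topk (hypothetical check)."""
    if topk < 1:
        raise ValueError(f"query validate failed: topk[{topk}] must be positive")
    if topk > option.max_topk:
        raise ValueError(
            f"query validate failed: topk[{topk}] is too large, "
            f"max is {option.max_topk}"
        )


# Default behaviour is unchanged: 1024 passes, 2048 would be rejected.
validate_topk(1024)

# A user who accepts the memory/latency tradeoff opts in explicitly.
validate_topk(2048, CollectionOptionSketch(max_topk=16384))
```

Keeping the default at 1,024 means existing deployments behave exactly as before, and the higher ceiling only applies where someone has set it deliberately.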

Metadata

Labels

enhancement: Improve an existing feature or component

Projects

Status

Backlog
