Candidate multiprocessing rebuilds full index in every worker #11

@drogers0

Summary

Candidate retrieval multiprocessing rebuilds the full index in every worker.

Affected code

  • src/clonehunter/similarity/candidates.py:65
  • src/clonehunter/similarity/candidates.py:127

Problem

Each worker process receives the full set of snippets and embeddings and runs _retrieve_matches, which rebuilds the entire index. The index is therefore constructed once per process instead of once overall.

Impact

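High memory and CPU overhead on large repos: every additional process holds another full copy of the index and repeats the full build, so scaling worsens as the process count increases.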

Expected

Build the index once (or use a shared or index-friendly parallel query strategy) and parallelize only the query workload, without redundant full-index construction.
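
A minimal sketch of the "build once, parallelize queries" shape, not the project's actual code. BruteForceIndex, retrieve_candidates, and the chunk_size parameter are hypothetical names invented for illustration, and the brute-force cosine search is a stand-in for whatever index _retrieve_matches really builds. It assumes a thread pool is acceptable, which only yields real speedup if the index's search releases the GIL (as native-code indexes often do); otherwise the same structure would need a process pool with the prebuilt index shared or inherited rather than rebuilt.

```python
from concurrent.futures import ThreadPoolExecutor
from math import sqrt


def _normalise(vector):
    """Scale a vector to unit length (zero vectors are left as-is)."""
    norm = sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


class BruteForceIndex:
    """Hypothetical stand-in for the real index; counts how often it is built."""

    builds = 0

    def __init__(self, embeddings):
        BruteForceIndex.builds += 1
        # Normalise once at build time so each query is a plain dot product.
        self._vectors = [_normalise(v) for v in embeddings]

    def query(self, vector, top_k):
        """Return the ids of the top_k most cosine-similar vectors."""
        q = _normalise(vector)
        scored = [
            (sum(a * b for a, b in zip(q, v)), i)
            for i, v in enumerate(self._vectors)
        ]
        # Highest similarity first; ties broken by lower id for determinism.
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [i for _, i in scored[:top_k]]


def retrieve_candidates(embeddings, top_k=2, workers=4, chunk_size=2):
    """Build the index once, then fan out only the query work."""
    index = BruteForceIndex(embeddings)  # the single build, in the parent

    def query_chunk(start):
        # Each task receives only a range of query ids, never the raw data.
        stop = min(start + chunk_size, len(embeddings))
        return [(i, index.query(embeddings[i], top_k)) for i in range(start, stop)]

    results = {}
    starts = range(0, len(embeddings), chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for chunk in pool.map(query_chunk, starts):
            results.update(chunk)
    return results
```

The point of the sketch is that workers are handed query ranges and read from one shared index, so the number of index builds stays at one regardless of the worker count.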
