Summary
Candidate retrieval multiprocessing rebuilds the full index in every worker.
Affected code
src/clonehunter/similarity/candidates.py:65
src/clonehunter/similarity/candidates.py:127
Problem
Each worker receives the full snippets/embeddings and runs _retrieve_matches, which rebuilds the full index per process.
Impact
High memory and CPU overhead on large repos: index construction cost and index memory are paid once per worker, so scaling gets worse as the process count increases.
Expected
Build the index once (or use a shared, index-friendly parallel query strategy) and parallelize the query workload without redundant full-index construction.
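A minimal sketch of one way to get there, not the actual CloneHunter code. It assumes embeddings are plain lists of floats and uses a brute-force cosine "index" as a stand-in for whatever candidates.py really builds. The names build_index, _init_worker, _query_chunk and retrieve_candidates are all hypothetical. The idea is that the parent builds the index once, each worker receives the finished index through a pool initializer, and only ranges of query rows are distributed as tasks.

```python
import math
from concurrent.futures import ProcessPoolExecutor

_INDEX = None  # per-process handle to the prebuilt index (hypothetical global)


def build_index(embeddings):
    """Normalize each vector once so cosine similarity becomes a dot product."""
    index = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        index.append(tuple(x / norm for x in vec))
    return index


def _init_worker(index):
    # Runs once per worker: stores the already-built index, no rebuild.
    global _INDEX
    _INDEX = index


def _query_chunk(task):
    """Return (i, j, score) triples for query rows in [start, stop)."""
    start, stop, top_k = task
    matches = []
    for i in range(start, stop):
        query = _INDEX[i]
        scored = [
            (sum(a * b for a, b in zip(query, other)), j)
            for j, other in enumerate(_INDEX)
            if j != i  # skip self-match
        ]
        scored.sort(reverse=True)
        matches.extend((i, j, score) for score, j in scored[:top_k])
    return matches


def retrieve_candidates(embeddings, top_k=5, workers=1, chunk_size=256):
    index = build_index(embeddings)  # built exactly once, in the parent
    # Tasks carry only row ranges, not snippets or embeddings.
    tasks = [
        (start, min(start + chunk_size, len(index)), top_k)
        for start in range(0, len(index), chunk_size)
    ]
    if workers <= 1:
        _init_worker(index)
        results = list(map(_query_chunk, tasks))
    else:
        with ProcessPoolExecutor(
            max_workers=workers, initializer=_init_worker, initargs=(index,)
        ) as pool:
            results = list(pool.map(_query_chunk, tasks))
    return [match for chunk in results for match in chunk]
```

This removes the redundant construction, but each worker still holds its own copy of the index when processes are spawned rather than forked. Addressing the memory side as well would need shared memory or a thread-based query path over an index that releases the GIL. When using workers > 1 from a script, the call should sit under an `if __name__ == "__main__":` guard.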