
~200x compounds speedup for VCFs with large variants #189

Open
fellen31 wants to merge 11 commits into main from fix-slow-cat

Conversation

@fellen31
Contributor

@fellen31 fellen31 commented Jan 5, 2026

Description

Scoring and sorting a Nallo clinical SV VCF takes ~5 seconds, but printing the variants to the output file then takes ~15 minutes. This PR replaces print_variant() with cat, which cuts the printing step to about two seconds.
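For reviewers who want the idea without reading the diff, here is a minimal sketch of the approach, not the code in this PR. It assumes the sorted variants already sit in a temporary file and that a `cat` binary is on PATH. The function name `write_header_then_cat` and its parameters are hypothetical.

```python
import subprocess


def write_header_then_cat(header_lines, variants_path, out_path):
    """Write header lines from Python, then let `cat` append the variant file.

    Hypothetical helper: the names and file layout are assumptions made
    for illustration, not the functions used in this PR.
    """
    # Header lines are few, so writing them line by line is cheap.
    with open(out_path, "w") as out:
        for line in header_lines:
            out.write(line.rstrip("\n") + "\n")

    # Reopen in binary append mode so the header is flushed to disk first,
    # then stream the whole variant file through `cat` in one go instead of
    # parsing and re-printing each (possibly very long) line in Python.
    with open(out_path, "ab") as out:
        subprocess.run(["cat", str(variants_path)], stdout=out, check=True)
```

If depending on an external binary is a concern, `shutil.copyfileobj` from the standard library should give a similar byte-for-byte copy without leaving Python.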

Additionally, I ran into some multiprocessing errors. Replacing Manager with Queue and then closing the results queue seems to do the trick (LLM suggestion). This PR also turns off logging to stderr by default to reduce noise in e.g. pytest.
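The Manager-to-Queue change can be pictured roughly as below. This is a sketch under assumptions, not the code in this PR: `_score_worker` and `score_batches` are hypothetical names, doubling integers stands in for the real compound scoring, and the `fork` start method is chosen only to keep the example self-contained on POSIX systems.

```python
import multiprocessing as mp


def _score_worker(task_queue, results_queue):
    # Consume batches until the None sentinel arrives.
    for batch in iter(task_queue.get, None):
        # Placeholder for the real scoring step (assumption).
        results_queue.put([value * 2 for value in batch])


def score_batches(batches, n_workers=2):
    """Score batches in worker processes using plain queues, no Manager."""
    ctx = mp.get_context("fork")  # assumption: POSIX only
    tasks = ctx.Queue()
    results = ctx.Queue()

    workers = [
        ctx.Process(target=_score_worker, args=(tasks, results))
        for _ in range(n_workers)
    ]
    for worker in workers:
        worker.start()

    for batch in batches:
        tasks.put(batch)
    for _ in workers:
        tasks.put(None)  # one sentinel per worker

    # Drain the results before joining: a worker cannot exit while its
    # queued items are still unread, so joining first risks a deadlock.
    collected = [results.get() for _ in batches]

    for worker in workers:
        worker.join()

    # Explicitly close both queues and wait for their feeder threads,
    # so no queue resources are left open when the function returns.
    for queue in (tasks, results):
        queue.close()
        queue.join_thread()

    return collected
```

Results arrive in completion order rather than submission order, so a caller that needs a stable order has to sort or tag the batches.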

Changed

  • Compounds print_variant() to use cat, which provides significant speedups for large variants
  • Logging to use standard hierarchy instead of stderr
  • Multiprocessing Manager to Queue in score compounds
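As an illustration of the logging item above, this is one common way to set up a logger hierarchy that stays silent unless the caller opts in. It is a sketch, not the code in this PR: the package name `mypackage`, the module logger name, and `enable_stderr_logging` are all hypothetical.

```python
import logging
import sys

PACKAGE_LOGGER = "mypackage"  # hypothetical package name

# A NullHandler on the package logger keeps the library silent by default;
# without any handler, logging's last-resort handler would print warnings
# to stderr.
logging.getLogger(PACKAGE_LOGGER).addHandler(logging.NullHandler())

# Modules log through children of the package logger
# (usually logging.getLogger(__name__)).
module_logger = logging.getLogger(PACKAGE_LOGGER + ".compounds")  # hypothetical


def enable_stderr_logging(level=logging.INFO):
    """Opt in to stderr output, e.g. from a CLI entry point."""
    package_logger = logging.getLogger(PACKAGE_LOGGER)
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
    )
    package_logger.addHandler(handler)
    package_logger.setLevel(level)
    return handler
```

With this layout, test runs import the package without any stderr output, while pytest's caplog fixture can still capture the records.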

How to prepare for test

  • Ssh to relevant server (depending on type of change)
  • Use stage: us
  • Reserve the environment: paxa
  • Install on stage (example for Hasta):
    bash /home/proj/production/servers/resources/hasta.scilifelab.se/update-tool-stage.sh -e S_[TOOL] -t [TOOL] -b [THIS-BRANCH-NAME] -a

How to test

  • Do ...

Expected test outcome

  • Check that ...
  • Take a screenshot and attach or copy/paste the output.

Review

  • Tests executed by
  • "Merge and deploy" approved by
    Thanks for filling in who performed the code review and the test!

This version is a

  • MAJOR - when you make incompatible API changes
  • MINOR - when you add functionality in a backwards compatible manner
  • PATCH - when you make backwards compatible bug fixes or documentation/instructions

Implementation Plan

  • Document in ...
  • Deploy this branch on ...
  • Inform to ...

@fellen31 fellen31 changed the title from "~200x compounds printing speedup for VCFs with large variants" to "~200x compounds speedup for VCFs with large variants" Jan 5, 2026
@dnil
Member

dnil commented Mar 18, 2026

Well found, well worth looking into: that priority concept feels very much an-old-research-and-development-idea-we-dont-use, doesn't it?

@fellen31
Contributor Author

> Well found, well worth looking into: that priority concept feels very much an-old-research-and-development-idea-we-dont-use, doesn't it?

Not sure I understand what you mean by priority concept?

@fellen31 fellen31 marked this pull request as ready for review March 20, 2026 16:54
@fellen31 fellen31 linked an issue Mar 23, 2026 that may be closed by this pull request

Development

Successfully merging this pull request may close these issues.

Fix pytest warnings

2 participants