Add 500-protein human SwissProt subset for mhcquant SDRF test #1987

Merged
jonasscheid merged 1 commit into nf-core:mhcquant from jonasscheid:add-500prot-fasta
Apr 14, 2026

Conversation

@jonasscheid

Summary

Adds testdata/UP000005640_9606_500prot.fasta, a 500-protein subset of the human SwissProt proteome (UP000005640), for use as the default FASTA in the upcoming test_sdrf profile in nf-core/mhcquant.

Rationale

The current full proteome (UP000005640_9606.fasta, 20,610 proteins) causes the CometAdapter step in the SDRF test to exceed the 6 GB CI memory cap or hit the 2 h wall-time limit when searched with unspecific cleavage against the PXD009752 test RAW files.

Benchmarking showed the 500-protein subset:

  • Reduces CometAdapter wall-time from ~35 min to ~3 min per file (at spectrum_batch_size=20000)
  • Keeps peak RSS within the 6 GB cap
  • Still yields ~100 peptides at 1% FDR (q < 0.01), giving Percolator enough targets to train a stable SVM

Content

  • First 500 entries of UP000005640_9606.fasta (deterministic head)
  • 329 KB (vs 13.6 MB for the full proteome)
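
For reference, a "deterministic head" of a FASTA file can be sketched as below. This is only an illustration of taking the first N records in file order; the PR does not state which tool produced the subset, and the function names (`read_fasta_entries`, `head_fasta`, `write_subset`) are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def read_fasta_entries(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each FASTA record as a list of lines (header line first)."""
    record: List[str] = []
    for line in lines:
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if record:
                yield record
            record = [line]
        elif record:
            # Sequence line belonging to the current record.
            record.append(line)
    if record:
        yield record


def head_fasta(lines: Iterable[str], n: int) -> List[str]:
    """Return the lines of the first n FASTA records, preserving file order."""
    out: List[str] = []
    for record in islice(read_fasta_entries(lines), n):
        out.extend(record)
    return out


def write_subset(src_path: str, dst_path: str, n: int = 500) -> None:
    """Write the first n records of src_path to dst_path (hypothetical helper)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(head_fasta(src, n))
```

Calling `write_subset("UP000005640_9606.fasta", "UP000005640_9606_500prot.fasta")` would produce a subset of this shape. Because records are taken in file order with no sampling, repeated runs on the same input give identical output.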

Related

Used by nf-core/mhcquant#445 (SDRF/PRIDE input support).

@jonasscheid jonasscheid merged commit c7b55c6 into nf-core:mhcquant Apr 14, 2026
1 check passed
