tpall commented Nov 27, 2025

Changes

  • Added an array_size = 10 parameter to nextflow.config and an array directive to conf/base.config for more efficient cluster execution.
  • Fixed the inclusion of rRNA, tRNA, and QUAST summaries in genome_stats.tsv and metabolism_summary.xlsx in the bin/distill.py script.
  • Refactored channel factory usage (Channel to channel) for consistency across workflows, and improved readability by naming the closure parameter explicitly instead of relying on the implicit variable (e.g. it.name to it -> it.name).
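
A minimal sketch of how the new parameter could be declared, assuming nextflow.config uses a params block (the actual layout of DRAM's config isn't shown here). A value declared under params is a default that the matching command-line flag overrides, which is how --array_size 10 in the command below is picked up.

```groovy
// nextflow.config (sketch): default number of tasks grouped into each job array.
// Overridden at launch by a flag such as --array_size 20.
params {
    array_size = 10
}
```

conf/base.config then reads the value through params.array_size inside its withName selector.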

Computing environment and command

  • nextflow version 25.10.0.10289
  • openjdk 22.0.1-internal 2024-04-16
  • singularity 3.8.5
  • slurm
  • x86_64 GNU/Linux
nextflow run tpall/DRAM -r dev --input_fasta ./DRAM/input_fasta --outdir ./DRAM/call-annotate-distill --threads 8 --summarize --qc --use_kofam --use_dbcan --use_merops --use_viral --use_methyl --use_sulfur -profile singularity --slurm --partition main -with-report -with-trace -with-timeline --array_size 10 --queue_size 10 -resume --annotate 

madeline-scyphers self-requested a review December 1, 2025 21:14

madeline-scyphers (Member) commented Dec 1, 2025

Hey @tpall, thanks for this. The job array addition is nice. We have a larger update planned that batches many of the inputs into single jobs to reduce the burden on the queue, since running DRAM with lots of inputs can overwhelm a SLURM scheduler. Job arrays weren't supported by the Nextflow version we initially developed DRAM2 on, but we recently moved to >=24. If we add job arrays, we should lock the requirement to >=24.04.0, since there are early 24.* prereleases out there.

I will have to do some testing on job arrays and their implications, because from my initial testing it seems that an array stops the next step from proceeding until there are enough inputs to fill it. That might be OK, but if we are going to be doing batching anyway, job arrays might not be that important and not worth it.
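
Locking in the minimum version could be done in the manifest scope of nextflow.config. A sketch, assuming DRAM's manifest block (if it already has one) takes this field alongside its existing entries:

```groovy
// nextflow.config (sketch): require Nextflow 24.04.0 or newer.
// The leading '!' makes the version check a hard error instead of a warning.
manifest {
    nextflowVersion = '!>=24.04.0'
}
```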

Also thanks for some of the other QoL updates like updating some of the syntax to DSL2 (Channel -> channel, etc.).

I will have to review the code more fully, which I can get to in a couple of weeks. I have a deadline next week and probably won't be able to review much before then.

But I will leave just a couple of quick thoughts.

Thanks again

}

withName: 'DRAM:ANNOTATE:CALL:.*|DRAM:ANNOTATE:DB_SEARCH:.*' {
    array = params.array_size

Member

I would like to support people running DRAM2 with the local executor (for example, on their own computer), which doesn't support array. So array should only be used with executors that support it.
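
One way to keep local runs working, sketched under assumptions: put the array setting in a profile that users opt into only on SLURM, so the local executor never sees the directive. The profile name slurm_array is hypothetical, and if DRAM's nextflow.config already has a profiles block, the entry would go inside it.

```groovy
// nextflow.config (sketch). 'slurm_array' is a hypothetical profile name.
// Only runs that select this profile request job arrays; local runs are unaffected.
profiles {
    slurm_array {
        process {
            withName: 'DRAM:ANNOTATE:CALL:.*|DRAM:ANNOTATE:DB_SEARCH:.*' {
                array = params.array_size
            }
        }
    }
}
```

A SLURM run would then combine it with the existing profile and flag, e.g. -profile singularity,slurm_array --slurm. Keying the directive off a profile keeps the default configuration free of executor-specific settings; the trade-off is one more option for cluster users to remember.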

conf/base.config (outdated)
maxRetries = 2
}

withName: 'DRAM:ANNOTATE:CALL:.*|DRAM:ANNOTATE:DB_SEARCH:.*' {

Member

Jobs under DRAM:ANNOTATE:QC:COLLECT.* could also use a job array.
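
A sketch of extending arrays to those steps, reusing the same parameter. It assumes the selector sits in the same process scope as the existing one; appending |DRAM:ANNOTATE:QC:COLLECT.* to the existing pattern would be equivalent.

```groovy
// Sketch: also submit the QC collection tasks as job arrays.
process {
    withName: 'DRAM:ANNOTATE:QC:COLLECT.*' {
        array = params.array_size
    }
}
```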

conf/base.config (outdated)
Comment on lines 68 to 71
withName: 'DRAM:ANNOTATE:CALL:.*|DRAM:ANNOTATE:DB_SEARCH:.*' {
    array = params.array_size
}

Member

This job array code in base.config should probably go in modules.config instead.
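
A sketch of that split, assuming nextflow.config loads both files with includeConfig and that the new file lives at conf/modules.config (a hypothetical path). base.config keeps generic resource defaults such as maxRetries, while per-process directives like array move to the new file.

```groovy
// --- nextflow.config (sketch) ---
// params.array_size should be declared above these lines so the
// included files can reference it.
includeConfig 'conf/base.config'     // generic resource defaults
includeConfig 'conf/modules.config'  // per-process directives such as array

// --- conf/modules.config (sketch) ---
process {
    withName: 'DRAM:ANNOTATE:CALL:.*|DRAM:ANNOTATE:DB_SEARCH:.*' {
        array = params.array_size
    }
}
```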

Labels

None yet

Projects

Status: To Sort

2 participants