Submit tasks as job arrays and fix RNAs in distill summaries #472

tpall · 2025-11-27T10:08:16Z

Changes

Added an array_size = 10 parameter to nextflow.config and the array directive to conf/base.config for more efficient cluster execution.
Fixed the inclusion of rRNA, tRNA, and QUAST summaries in genome_stats.tsv and metabolism_summary.xlsx in the bin/distill.py script.
Refactored channel usage (Channel to channel) for consistency across workflows, and replaced the implicit closure variable with an explicit parameter (e.g. it.name to it -> it.name) to improve readability.

Computing environment and command

nextflow version 25.10.0.10289
openjdk 22.0.1-internal 2024-04-16
singularity 3.8.5
slurm
x86_64 GNU/Linux

nextflow run tpall/DRAM -r dev --input_fasta ./DRAM/input_fasta --outdir ./DRAM/call-annotate-distill --threads 8 --summarize --qc --use_kofam --use_dbcan --use_merops --use_viral --use_methyl --use_sulfur -profile singularity --slurm --partition main -with-report -with-trace -with-timeline --array_size 10 --queue_size 10 -resume --annotate

madeline-scyphers · 2025-12-01T22:13:09Z

Hey @tpall, thanks for this. The job array addition is nice. There is a larger planned update to batch many of the inputs into single jobs to reduce the burden on the queue, since running DRAM with lots of inputs can overwhelm a SLURM scheduler. Job arrays weren't supported on the version of Nextflow we initially developed DRAM2 on, but we recently moved to >=24 (if we add job arrays, we should lock in >=24.04.0, since there are early 24.* prereleases out there). I will have to do some testing on job arrays and their implications, because from my initial testing it seems that they stop the next step from proceeding until there are enough inputs to fill an array. That might be OK. But if we are going to be batching anyway, job arrays might not be that important and might not be worth it.
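
A minimal sketch of how that version floor could be pinned, assuming it belongs in the manifest scope of nextflow.config (where DRAM actually declares its manifest is an assumption, not something shown in this PR):

```groovy
// nextflow.config (sketch; placement within DRAM's config is an assumption)
manifest {
    // The array directive is available from Nextflow 24.04.0 onwards.
    // The leading '!' turns a version mismatch into a hard error; without it
    // Nextflow only prints a warning and keeps running.
    nextflowVersion = '!>=24.04.0'
}
```

Early 24.* prereleases carry lower version numbers than 24.04.0, so this constraint should exclude them as well.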

Also, thanks for some of the other QoL updates, like updating some of the syntax to DSL2 (Channel -> channel, etc.).

I will have to review the code more fully, which I can get to in a couple of weeks. I have a deadline next week and probably won't be able to review much before then.

But I will leave just a couple of quick thoughts.

Thanks again

madeline-scyphers · 2025-12-01T22:13:34Z

conf/base.config

    }
+
+    withName: 'DRAM:ANNOTATE:CALL:.*|DRAM:ANNOTATE:DB_SEARCH:.*' { 
+                array = params.array_size


I would like to support people running DRAM2 with the local executor (for example, on their own computer), which doesn't support the array directive. So array should only be set for executors that support it.
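
One possible way to do that, sketched under assumptions: move the selector into an opt-in profile so that local runs never see the array directive. The profile name slurm_arrays is hypothetical, and params.array_size is the parameter this PR adds to nextflow.config.

```groovy
// Sketch only: slurm_arrays is a hypothetical profile name, not part of DRAM.
profiles {
    slurm_arrays {
        process {
            // Same selector as in the diff above; applied only when the
            // profile is requested, so the local executor is unaffected.
            withName: 'DRAM:ANNOTATE:CALL:.*|DRAM:ANNOTATE:DB_SEARCH:.*' {
                array = params.array_size
            }
        }
    }
}
```

Under this layout a cluster run would request it explicitly, e.g. -profile singularity,slurm_arrays alongside the existing --slurm option, while a plain local run would omit the profile.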

madeline-scyphers · 2025-12-01T22:13:53Z

conf/base.config

        maxRetries    = 2
    }
+
+    withName: 'DRAM:ANNOTATE:CALL:.*|DRAM:ANNOTATE:DB_SEARCH:.*' { 


Jobs under DRAM:ANNOTATE:QC:COLLECT.* could also be submitted as a job array.
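
A sketch of the widened selector, assuming the QC process names match COLLECT.* exactly as written above (not checked against the workflow definitions):

```groovy
process {
    // Extends the existing CALL and DB_SEARCH pattern with the QC collection processes.
    withName: 'DRAM:ANNOTATE:CALL:.*|DRAM:ANNOTATE:DB_SEARCH:.*|DRAM:ANNOTATE:QC:COLLECT.*' {
        array = params.array_size
    }
}
```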

madeline-scyphers · 2025-12-01T22:14:54Z

conf/base.config

+    withName: 'DRAM:ANNOTATE:CALL:.*|DRAM:ANNOTATE:DB_SEARCH:.*' { 
+                array = params.array_size
+                }
+


This job array code in base.config should probably live in modules.config instead.

tpall added 10 commits November 20, 2025 09:13

updated config files, added array size parameter for cluster execution

976c683

updated nextflow.config

ef13eed

Swap output assignments for rRNA and tRNA collections

d9a3bc6

Merge branch 'dev' of https://github.com/WrightonLabCSU/DRAM into dev

a087cf4

Merge branch 'dev' of https://github.com/tpall/DRAM into dev

f5697b2

Refactor distill script and configuration for improved clarity and functionality - Update groupby_column default value to "input_fasta" in distill.py - Adjust input paths in distill.nf for consistency - Enhance argument handling in SUMMARIZE process

63ea268

Refactor input and output path definitions for consistency in the SUMMARIZE process

7414979

Fix conditional check for gene columns in genome summary export to prevent errors

a77c29e

Refactor channel usage for consistency across workflows and improve readability

e8b0e95

Update SUMMARIZE module to use parameterized fasta column for grouping

4418739

github-project-automation bot added this to DRAM Nov 27, 2025

github-project-automation bot moved this to To Sort in DRAM Nov 27, 2025

tpall added 3 commits November 28, 2025 10:53

Fix closure in QC workflow

cd3b7ac

Fix closure in DB_SEARCH workflow

ed054bd

Updated combine_annotations.py to fix binwise summary. TODO: getting rid of FASTA_COLUMN environment variable

d39ff14

madeline-scyphers self-requested a review December 1, 2025 21:14

madeline-scyphers reviewed Dec 1, 2025

Add QC:COLLECT_RNA to array pattern

64ab39e
