-
Notifications
You must be signed in to change notification settings - Fork 57
Submit tasks as job arrays and fix RNAs in distill summaries #472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
…nctionality - Update groupby_column default value to "input_fasta" in distill.py - Adjust input paths in distill.nf for consistency - Enhance argument handling in SUMMARIZE process
…rid of FASTA_COLUMN environment variable
|
Hey @tpall Thanks for this. The job array addition it nice. There is a larger planned update to batch a lot of the inputs into singular jobs to reduce the burden on the queue, since running DRAM with lots of inputs can overwhelm a SLURM scheduler, but adding in job arrays, which weren't supported on the version of Nextflow we initially developed DRAM2 on, but we recently moved to >=24 (we should lock in >=24.04.0 since there are early 24.* prereleases out there if we add in job arrays). I will have to do some testing on utilizing job arrays and their implication. Because from my initial testing it seems like it stops the next stop from proceeding until their are enough inputs to fill an array. Which might be ok. But if we are going to be doing batching anyway, it might not be that important and not worth it. Also thanks for some of the other QoL updates like updating some of the syntax to DSL2 (Channel -> channel, etc.). I will have to more fully review the code, which I can get to in a couple weeks. I have deadline for next week, and probably won't be able to review much before then. But I will leave just a couple of quick thoughts. Thanks again |
| } | ||
|
|
||
| withName: 'DRAM:ANNOTATE:CALL:.*|DRAM:ANNOTATE:DB_SEARCH:.*' { | ||
| array = params.array_size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to support people running DRAM2 with local executor (such as on their own computer if they want), which doesn't support array. So the array should only be used with executors that support it.
conf/base.config
Outdated
| maxRetries = 2 | ||
| } | ||
|
|
||
| withName: 'DRAM:ANNOTATE:CALL:.*|DRAM:ANNOTATE:DB_SEARCH:.*' { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jobs under DRAM:ANNOTATE:QC:COLLECT.* could also have a job array
conf/base.config
Outdated
| withName: 'DRAM:ANNOTATE:CALL:.*|DRAM:ANNOTATE:DB_SEARCH:.*' { | ||
| array = params.array_size | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code here in base.config for the job array should probably be in modules.config
Changes
array_size = 10parameter to nextflow.config andarrayto conf/base.config for more efficient cluster execution.Channeltochannel) for consistency across workflows and improve readability + usage of implicit variable within closures (e.g.it.nametoit -> it.name)Computing environment and command