Hi,
first, thank you for providing this profile. It's extremely useful.
I'm having a problem launching multiple jobs at the same time. For example, I want to launch 5 jobs simultaneously, each with 64 cores.
If I run snakemake --cores 64, I find that the jobs get launched sequentially rather than in parallel. I understand that this is because I requested a "maximum" of 64 cores, and thus if a job takes that up, I can only run one at a time.
Now, I wrote a function that is passed to the rule's threads directive and multiplies workflow.cores by, say, 0.2, so I can pass snakemake --cores 320 and each rule is allocated 64 cores (I've included a sketch of the function further down). However, I'm finding that this value somehow gets "squared". What happens is:
- the Snakemake STDOUT (what is shown on the screen) shows the correct number of threads:
rule map_reads:
input: output/mapping/H/catalogue.mmi, output/qc/merged/H_S003_R1.fq.gz, output/qc/merged/H_S003_R2.fq.gz
output: output/mapping/bam/H/H_S003.map.bam
log: output/logs/mapping/map_reads/H-H_S003.log
jobid: 247
benchmark: output/benchmarks/mapping/map_reads/H-H_S003.txt
reason: Missing output files: output/mapping/bam/H/H_S003.map.bam
wildcards: binning_group=H, sample=H_S003
threads: 64
resources: tmpdir=/tmp, mem_mb=149952
minimap2 -t 64 > output/mapping/bam/H/H_S003.map.bam
Submitted job 247 with external jobid '35894421'.
That looks fine: I want this rule to be launched with 64 cores, and when I do this, 5 instances of the rule get launched at the same time.
When I open the job's SLURM log, however, I find that this value of 64 is passed to the job as "Provided cores" and is therefore multiplied by 0.2 again.
Contents of slurm-35894421.out:
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 64
Rules claiming more threads will be scaled down.
Select jobs to execute...
rule map_reads:
input: output/mapping/H/catalogue.mmi, output/qc/merged/H_S003_R1.fq.gz, output/qc/merged/H_S003_R2.fq.gz
output: output/mapping/bam/H/H_S003.map.bam
log: output/logs/mapping/map_reads/H-H_S003.log
jobid: 247
benchmark: output/benchmarks/mapping/map_reads/H-H_S003.txt
reason: Missing output files: output/mapping/bam/H/H_S003.map.bam
wildcards: binning_group=H, sample=H_S003
threads: 13
resources: tmpdir=/tmp, mem_mb=149952
minimap2 -t 13 > output/mapping/bam/H/H_S003.map.bam
Even worse, my job allocates 64 cores but only uses 13 of them (64 * 0.2, rounded). It's really strange to me that the Snakemake output shows the "correct" value while the SLURM log shows the "real" value that was actually used; why do they differ?
I am trying to understand what I am doing wrong. When I set a breakpoint in the function I use to compute the number of threads, the workflow.cores variable is always what I pass on the command line (320), never what is shown in the SLURM log.
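For reference, the threads function and the rule look roughly like the sketch below (trimmed down; the 0.2 fraction and the workflow.cores lookup are as described above, and the minimap2 invocation is abbreviated):

```python
# Rough sketch of my setup (rule trimmed down; the fraction and the
# workflow.cores lookup are exactly what I describe above).

def fraction_of_cores(fraction=0.2):
    """Return a callable for the threads: directive that requests a
    fixed fraction of the total cores given on the command line."""
    def _threads(wildcards):
        # workflow.cores is whatever was passed via `snakemake --cores N`
        return max(1, round(workflow.cores * fraction))
    return _threads


rule map_reads:
    input:
        index="output/mapping/{binning_group}/catalogue.mmi",
        r1="output/qc/merged/{sample}_R1.fq.gz",
        r2="output/qc/merged/{sample}_R2.fq.gz",
    output:
        "output/mapping/bam/{binning_group}/{sample}.map.bam",
    log:
        "output/logs/mapping/map_reads/{binning_group}-{sample}.log",
    threads: fraction_of_cores(0.2)
    shell:
        # minimap2 invocation abbreviated, as in the logs above
        "minimap2 -t {threads} {input.index} {input.r1} {input.r2} > {output} 2> {log}"
```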
I tried adding a nodes: 5 or a jobs: 5 key to the profile's config.yaml (see below), but neither does any good. Is there anything I can modify in the profile to make sure that I can launch as many parallel jobs as possible?
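This is roughly what I added to the profile's config.yaml (values are just the ones I tried; the rest of the profile is unchanged):

```yaml
# Keys I tried adding to the profile's config.yaml (neither changed anything):
jobs: 5     # hoping this would allow 5 cluster jobs to run concurrently
# or, alternatively:
nodes: 5
```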
Please let me know what other information I can provide. Thank you very much.
Best,
V