This issue might not really be related to pestpp itself; it may be more about my bash scripting or our cluster setup. I'm using a cluster with 16 nodes, each carrying 28 cores. I can run pestpp-gsa in parallel using the master/worker setup on one node with the SLURM script below:
#!/bin/bash
#SBATCH -n 1 # total number of tasks requested
#SBATCH --cpus-per-task=1 # cpus to allocate per task
#SBATCH -p shortq # queue (partition) -- defq, eduq, gpuq.
#SBATCH -t 12:00:00 # run time (hh:mm:ss) - 12.0 hours in this case.
cd /home/hdashti/scratch/ED_BSU/old_ed2/26jan17/ED/working_morris_ws/master
pestpp-gsa gsa_karun /h :4004 &   # start the master, listening on port 4004
MASTER_PID=$!
cd /home/hdashti/scratch/ED_BSU/old_ed2/26jan17/ED/working_morris_ws
# launch 20 workers, one per wrk* directory, each connecting back to the master
parallel -i bash -c "cd {} ; pestpp-gsa gsa_karun /h 127.0.0.1:4004" -- wrk1 wrk2 wrk3 wrk4 wrk5 wrk6 wrk7 wrk8 wrk9 wrk10 wrk11 wrk12 wrk13 wrk14 wrk15 wrk16 wrk17 wrk18 wrk19 wrk20
kill ${MASTER_PID}   # shut down the master once all workers have finished
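(In case the parallel line is unfamiliar: that looks like moreutils parallel syntax. The same launch can be read as a plain bash loop -- a minimal sketch of what the invocation above does, assuming the same wrk1 ... wrk20 directory layout:
cd /home/hdashti/scratch/ED_BSU/old_ed2/26jan17/ED/working_morris_ws
for d in wrk{1..20}; do
    (cd "$d" && pestpp-gsa gsa_karun /h 127.0.0.1:4004) &   # one worker per directory
done
wait   # returns once all 20 workers have exited
)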
The script above, which uses 20 cores of one node, works fine. I then tried to use multiple nodes with more workers via the following script:
#!/bin/bash
#SBATCH -N 4
#SBATCH --tasks-per-node=28
#SBATCH -p defq
#SBATCH -t 120:00:00
ulimit -u 9999
ulimit -s unlimited
ulimit -v unlimited
cd /home/hdashti/scratch/ED_BSU/old_ed2/26jan17/ED/working_morris_ws/master
pestpp-gsa gsa_karun /h :4004 &   # start the master, listening on port 4004
MASTER_PID=$!
LEADER=$SLURMD_NODENAME                                    # node running the master
NODELIST=($(scontrol show hostname $SLURM_JOB_NODELIST))   # all nodes in the allocation
FOLDERS=($(seq 1 112))
for i in $(seq 0 111); do
    # round-robin the 112 workers across the 4 allocated nodes
    ssh -f ${NODELIST[$(echo "$i % 4" | bc)]} "cd /home/hdashti/scratch/ED_BSU/old_ed2/26jan17/ED/working_morris_ws/wrk${FOLDERS[$i]} ; nohup pestpp-gsa gsa_karun /h ${LEADER}:4004 > worker.log &"
done
wait ${MASTER_PID}
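(For comparison -- this is a rough sketch I have not tested, not the script I actually ran -- the per-worker ssh calls could instead be srun job steps, so SLURM itself places each worker inside the allocation:
for i in $(seq 1 112); do
    # one single-task job step per worker; srun picks a node in the allocation,
    # and --exclusive keeps steps from sharing the same CPUs
    srun -N1 -n1 --exclusive bash -c "cd /home/hdashti/scratch/ED_BSU/old_ed2/26jan17/ED/working_morris_ws/wrk${i} && pestpp-gsa gsa_karun /h ${LEADER}:4004 > worker.log" &
done
wait
)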
Although I'm now using 112 cores, the pestpp run takes much longer to finish. Did anyone else run into the same problem, or am I missing something here? I'm posting this here because I'm not sure whether it's a pestpp problem or our cluster setup. Thanks