
Running pestpp on multiple nodes is slower #35

@hamiddashti


This issue might not really be related to pestpp itself; it may be more about my bash scripting or our cluster setup. I'm using a cluster with 16 nodes, each with 28 cores. I can run pestpp-gsa in parallel with workers (slaves) on a single node using the Slurm script below:

```shell
#!/bin/bash
#SBATCH -n 1 # total number of tasks requested
#SBATCH --cpus-per-task=1 # cpus to allocate per task
#SBATCH -p shortq # queue (partition) -- defq, eduq, gpuq.
#SBATCH -t 12:00:00 # run time (hh:mm:ss) - 12 hours here
cd /home/hdashti/scratch/ED_BSU/old_ed2/26jan17/ED/working_morris_ws/master
# start the master in the background, listening on port 4004
pestpp-gsa gsa_karun /h :4004 &
MASTER_PID=$!
cd /home/hdashti/scratch/ED_BSU/old_ed2/26jan17/ED/working_morris_ws
# start one worker in each wrkN directory; {} is replaced by each directory name given after --
parallel -i bash -c "cd {} ; pestpp-gsa gsa_karun /h 127.0.0.1:4004" -- wrk1 wrk2 wrk3 wrk4 wrk5 wrk6 wrk7 wrk8 wrk9 wrk10 wrk11 wrk12 wrk13 wrk14 wrk15 wrk16 wrk17 wrk18 wrk19 wrk20
# stop the master (if it is still running) once all workers have exited
kill ${MASTER_PID}
```

The single-node script, which uses 20 cores of one node, works fine. To use more workers, I tried to spread them over multiple nodes with the following script:

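```shell
#!/bin/bash
#SBATCH -N 4 # number of nodes
#SBATCH --ntasks-per-node=28 # tasks (cores) per node
#SBATCH -p defq
#SBATCH -t 120:00:00
ulimit -u 9999
ulimit -s unlimited
ulimit -v unlimited
cd /home/hdashti/scratch/ED_BSU/old_ed2/26jan17/ED/working_morris_ws/master
# start the master in the background on the node running this script
pestpp-gsa gsa_karun /h :4004 &
MASTER_PID=$!
LEADER=$SLURMD_NODENAME
# expand the compressed Slurm node list into a bash array of host names
NODELIST=($(scontrol show hostname "$SLURM_JOB_NODELIST"))
FOLDERS=($(seq 1 112))
for i in $(seq 0 111); do
  # assign worker directories to nodes round-robin and start each worker via ssh
  NODE=${NODELIST[$(( i % ${#NODELIST[@]} ))]}
  ssh -f "$NODE" "cd /home/hdashti/scratch/ED_BSU/old_ed2/26jan17/ED/working_morris_ws/wrk${FOLDERS[$i]} && nohup pestpp-gsa gsa_karun /h ${LEADER}:4004 > worker.log 2>&1 &"
done
# keep the job alive until the master finishes
wait ${MASTER_PID}
```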
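One way to check the worker loop without touching the cluster is a dry run that only computes the placement. This is a minimal sketch that assumes four hypothetical node names in place of the real `scontrol show hostname` output. It also shows why the loop needs command substitution: a bare `for i in seq 0 111` iterates over the three words `seq`, `0` and `111` rather than over 112 numbers, so the loop would issue only three ssh commands, and `FOLDERS=(seq 1 112)` likewise holds only three words.

```shell
#!/bin/bash
# Dry run of the worker placement: no ssh and no pestpp-gsa, only bookkeeping.
# Hypothetical node names; inside the job they would come from
#   scontrol show hostname "$SLURM_JOB_NODELIST"
NODELIST=(node01 node02 node03 node04)
NWORKERS=112

# Pitfall: without $( ), "seq" is just a word, not a command.
FOLDERS_BAD=(seq 1 112)          # 3 elements: "seq" "1" "112"
FOLDERS=($(seq 1 "$NWORKERS"))   # 112 elements: 1 2 ... 112
bare=0
for i in seq 0 111; do bare=$(( bare + 1 )); done
echo "array sizes: ${#FOLDERS_BAD[@]} vs ${#FOLDERS[@]}; bare loop iterations: $bare"

# Round-robin placement as in the job script: worker i goes to node i % (number of nodes).
COUNTS=()
for i in $(seq 0 $(( NWORKERS - 1 ))); do
  idx=$(( i % ${#NODELIST[@]} ))
  COUNTS[idx]=$(( ${COUNTS[idx]:-0} + 1 ))
done
for idx in "${!NODELIST[@]}"; do
  echo "${NODELIST[idx]}: ${COUNTS[idx]} workers"
done
```

With four nodes and 112 workers, each node is assigned 28 workers, which matches the 28 tasks per node requested in the job header.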

Although I'm now using 112 cores, pestpp-gsa takes much longer to finish than in the single-node run. Has anyone else run into the same problem, or am I missing something here? I'm posting this here because I'm not sure whether it's a pestpp problem or our cluster setup. Thanks
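For comparison with the `parallel` line in the single-node script, the worker launch can also be written with plain bash background jobs. This is a minimal sketch, assuming worker directories named `wrk1`, `wrk2`, … under one base directory; `WORKER_CMD`, `NWORKERS` and `launch_workers` are hypothetical names introduced here so the launch command can be swapped for a harmless one during a dry run.

```shell
#!/bin/bash
# Sketch: one worker per wrkN directory, each started as a background job.
# WORKER_CMD, NWORKERS and launch_workers are hypothetical names for illustration.
WORKER_CMD=${WORKER_CMD:-"pestpp-gsa gsa_karun /h 127.0.0.1:4004"}
NWORKERS=${NWORKERS:-20}

launch_workers() {
  local base=$1 i
  local pids=()
  for i in $(seq 1 "$NWORKERS"); do
    # Run in a subshell so the cd applies only to this worker;
    # && skips the worker if its directory is missing.
    ( cd "$base/wrk$i" && $WORKER_CMD > worker.log 2>&1 ) &
    pids+=("$!")
  done
  # Block until every worker process has exited.
  wait "${pids[@]}"
}

# Usage inside the job script, after starting the master:
#   launch_workers /home/hdashti/scratch/ED_BSU/old_ed2/26jan17/ED/working_morris_ws
```

Starting each worker in its own subshell keeps the `cd` local to that worker, and `wait` on the collected PIDs returns only after all of them have exited, which is the same point at which the single-node script moves on to `kill ${MASTER_PID}`.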
