This has been bugging me for a while now and has always seemed a bit mysterious. It only happens when running on Mac OS X and redirecting stdout (either > out.log or &> out.log). The Cascadia example job scripts have this < /dev/null at the end of the command:
mpirun -n 8 ./Mod3DMT -I NLCG cascad_half_prior.ws cascad_errfl5.dat 1e1 1e-6 cascad_half.cov < /dev/null >& ModEM_NLCG.out &
I'm getting the following error message when running:
[mpiexec@IGSKCI164LM054] HYDU_sock_read (lib/utils/sock.c:206): read error (Input/output error)
[mpiexec@IGSKCI164LM054] stdin_cb (mpiexec/pmiserv_cb.c:248): error reading from stdin
[mpiexec@IGSKCI164LM054] HYDT_dmxu_poll_wait_for_event (lib/tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@IGSKCI164LM054] HYD_pmci_wait_for_completion (mpiexec/pmiserv_pmci.c:180): error waiting for event
[mpiexec@IGSKCI164LM054] main (mpiexec/mpiexec.c:261): process manager error waiting for completion
I wonder if this is due to our use of GETARG and iargc. Nowadays it's better to use GET_COMMAND, GET_COMMAND_ARGUMENT, etc. My suspicion is that macOS somehow passes the redirect or & as an argument and ModEM gets confused.
Actually, I think this is a difference in the exec implementation (man 2 execve), because I can't imagine zsh or bash passing different arguments to execve on different platforms. I expect the shells to behave the same everywhere, but I could see BSD execve behaving differently from GNU's.
(Actually, it looks like BSD execve conforms only to 4.2BSD, whereas on a USGS server it conforms to numerous POSIX standards and 4.3BSD.)
Regardless, I think we can resolve this issue by updating the UserCtrl file to use the GET_COMMAND family of intrinsics introduced in the Fortran 2003 standard.
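One way to test the "redirect passed as an argument" hypothesis directly is to print what the Fortran runtime actually received. The sketch below is a hypothetical diagnostic (the module name `arg_probe` and routine `report_command_line` are illustrative, not part of ModEM) that writes COMMAND_ARGUMENT_COUNT and the full GET_COMMAND string to a given unit.

```fortran
! Hypothetical diagnostic module (not part of ModEM): reports the command line
! exactly as the Fortran runtime received it.
module arg_probe
  implicit none
  private
  public :: report_command_line
contains
  subroutine report_command_line(unit)
    integer, intent(in) :: unit   ! where to write the report, e.g. output_unit
    character(len=:), allocatable :: cmd
    integer :: length, stat

    ! Query only the length first so the buffer can be sized exactly.
    call get_command(length=length, status=stat)
    if (stat > 0) then
       write(unit, '(a,i0)') 'get_command failed, status ', stat
       return
    end if
    allocate(character(len=length) :: cmd)
    call get_command(command=cmd)

    ! If the shell had forwarded "< /dev/null" or "&", they would show up here.
    write(unit, '(a,i0)') 'argument count: ', command_argument_count()
    write(unit, '(3a)') 'command line: [', cmd, ']'
  end subroutine report_command_line
end module arg_probe
```

Called from a minimal driver program (`use iso_fortran_env, only: output_unit` followed by `call report_command_line(output_unit)`), it could be run as `mpirun -n 1 ./probe a b < /dev/null > probe.log` and again without the redirection. POSIX shells process redirections and `&` themselves before calling execve, so I'd expect both runs to report two arguments; a difference on macOS would support the hypothesis.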
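A minimal sketch of what the standard-conforming replacement might look like, assuming UserCtrl currently calls `iargc()` and `call getarg(i, arg)` with a fixed-length buffer. The module `cli_args` and the names `n_arguments` and `fetch_argument` are hypothetical, not ModEM's actual API.

```fortran
! Hypothetical helper (names are illustrative, not ModEM's UserCtrl API):
! a Fortran 2003 replacement for the nonstandard IARGC/GETARG pair.
module cli_args
  implicit none
  private
  public :: n_arguments, fetch_argument
contains
  ! Standard replacement for IARGC().
  integer function n_arguments()
    n_arguments = command_argument_count()
  end function n_arguments

  ! Standard replacement for CALL GETARG(number, value). The result is sized
  ! to fit, so long file names are never silently truncated.
  ! status is 0 on success and positive if the argument cannot be retrieved.
  subroutine fetch_argument(number, value, status)
    integer, intent(in) :: number
    character(len=:), allocatable, intent(out) :: value
    integer, intent(out) :: status
    integer :: length

    call get_command_argument(number, length=length, status=status)
    if (status /= 0 .or. length == 0) then
       value = ''   ! missing, unretrievable, or genuinely empty argument
       return
    end if
    allocate(character(len=length) :: value)
    call get_command_argument(number, value=value, status=status)
  end subroutine fetch_argument
end module cli_args
```

A call such as `call fetch_argument(1, arg, stat)` would then stand in for `call getarg(1, arg)`, with `arg` declared as `character(len=:), allocatable`. Checking `stat` also gives a clean way to report a missing model or data file name instead of silently reading blanks.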