Container jobs are silently killed with exit code 137 when they are allocated insufficient resources. Exit code 137 is 128 + 9, meaning the process received SIGKILL, so this appears to be an out-of-memory condition in which the application tries to use more memory than has been allocated to the container.
This was observed with a LISFLOOD-FP run. The process starts successfully but is then killed with exit code 137:
[INFO] Running LISFLOOD-FP with parfile: /data/30831/runs/Q100/par.par
***************************
LISFLOOD-FP version 8.1.0 (double)
CUDA supported
***************************
./entrypoint.sh: line 27: 19 Killed /usr/local/bin/lisflood "$parfilepath"
[ERROR] LISFLOOD-FP failed with exit code 137
It would be good if SEPEX reported when it kills a job, so that the cause of the failure is clear.
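As a stopgap on the container side, the entrypoint script could decode the exit status itself. The following is only a sketch, not how SEPEX works: the function names `describe_exit_code` and `oom_kill_count` are hypothetical, and the default path `/sys/fs/cgroup/memory.events` assumes the container runs under cgroup v2 and can read its own memory events file.

```shell
#!/usr/bin/env bash

# Hypothetical helper: translate an exit status into a readable cause.
# Shells report "killed by signal N" as exit status 128 + N.
describe_exit_code() {
    local code="$1"
    if [ "$code" -eq 0 ]; then
        echo "success"
    elif [ "$code" -gt 128 ]; then
        local sig=$((code - 128))
        if [ "$sig" -eq 9 ]; then
            echo "killed by signal 9 (SIGKILL), possibly the OOM killer"
        else
            echo "killed by signal $sig"
        fi
    else
        echo "exited with status $code"
    fi
}

# Hypothetical helper: read the oom_kill counter from a cgroup v2
# memory.events file. Prints 0 if the file is missing or has no counter.
# Assumption: the file is readable from inside the container.
oom_kill_count() {
    local events="${1:-/sys/fs/cgroup/memory.events}"
    if [ -r "$events" ]; then
        awk '$1 == "oom_kill" { print $2; found = 1 }
             END { if (!found) print 0 }' "$events"
    else
        echo 0
    fi
}
```

In `entrypoint.sh`, the status could be captured with `code=$?` right after the `lisflood` call and logged together with `$(describe_exit_code "$code")` and `$(oom_kill_count)`. A non-zero OOM kill count alongside exit code 137 would point strongly to a memory limit rather than a manual kill.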