Hello, when running the load-balancer on HPC, the first 8 jobs invariably seem to not actually perform any model evaluations, and give "bus error"s. They also appear to run serially, and it can take a few minutes to get through them all before job9 onwards runs and actually evaluates the model. Example of load-balancer output when not actually evaluating models:
...
Load balancer running port4242
Listening on port 4242...
Waiting for job 2 to start.
Job 2 started.
2024-11-18T15:29:53Z INFO Job 2 canceled (1 tasks canceled, 0 tasks already finished)
Waiting for job 3 to start.
Job 3 started.
...
Job stdout is only:
Waiting for model server to respond at XX.XX.XX.XX:XXXXX...
Model server responded
======== Running on http://X.X.X.X:XXXXX ========
(Press CTRL+C to quit)
stderr is:
After job 8 has run, job 9 onwards run fine with no bus errors and correct evaluations.
Any ideas if I'm doing something wrong? Thanks!
Hello, when running the load-balancer on HPC, the first 8 jobs invariably seem to not actually perform any model evaluations, and give "bus error"s. They also appear to run serially, and it can take a few minutes to get through them all before job9 onwards runs and actually evaluates the model. Example of load-balancer output when not actually evaluating models:
Job stdout is only:
stderr is:
After job 8 has run, job 9 onwards run fine with no bus errors and correct evaluations.
Any ideas if I'm doing something wrong? Thanks!