First 8 jobs run by load-balancer appear to have bus errors #88

@jonmaddock

Hello, when running the load-balancer on HPC, the first 8 jobs invariably fail to perform any model evaluations and exit with a "Bus error". They also appear to run serially, and it can take a few minutes to get through them all before job 9 onwards runs and actually evaluates the model. Example load-balancer output while no models are being evaluated:

...
Load balancer running port4242
Listening on port 4242...
Waiting for job 2 to start.
Job 2 started.
2024-11-18T15:29:53Z INFO Job 2 canceled (1 tasks canceled, 0 tasks already finished)
Waiting for job 3 to start.
Job 3 started.
...

Job stdout is only:

Waiting for model server to respond at XX.XX.XX.XX:XXXXX...
Model server responded
======== Running on http://X.X.X.X:XXXXX ========
(Press CTRL+C to quit)

stderr is:

Bus error

After job 8 has run, job 9 onwards run fine with no bus errors and correct evaluations.
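
For anyone trying to confirm the same pattern, a minimal sketch of a check over the per-job stderr files could look like the following. The directory layout, the `*.err` file pattern, and the function name `jobs_with_bus_error` are assumptions for illustration, not something the load-balancer itself provides.

```python
from pathlib import Path


def jobs_with_bus_error(log_dir, pattern="*.err"):
    """Return the names of stderr files under log_dir that contain 'Bus error'.

    Assumes each job writes its stderr to its own file matching `pattern`
    (a hypothetical layout; adjust to however your scheduler names them).
    """
    failed = []
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" avoids crashing on any non-text bytes in the log
        text = path.read_text(errors="replace")
        if "Bus error" in text:
            failed.append(path.name)
    return failed
```

Calling `jobs_with_bus_error("path/to/logs")` returns the sorted list of stderr file names that contain the message, which makes it easy to see whether only the earliest jobs are affected.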

Any ideas if I'm doing something wrong? Thanks!

Status: Future todo