Description
Occasionally my jobs volume fills up, so launch_config files get created but are left empty. Every job affected this way then produces the following error, repeated in a loop:
2025-09-24 14:54:03,105 ERROR [pulsar.managers.stateful][[manager=vgp_jetstream2]-[action=monitor]] Failed checking active job status for job_id 70361127
Traceback (most recent call last):
  File "/srv/pulsar/main/venv/lib64/python3.9/site-packages/pulsar/managers/stateful.py", line 382, in _monitor_active_jobs
    self._check_active_job_status(active_job_id)
  File "/srv/pulsar/main/venv/lib64/python3.9/site-packages/pulsar/managers/stateful.py", line 396, in _check_active_job_status
    self.stateful_manager.get_status(active_job_id)
  File "/srv/pulsar/main/venv/lib64/python3.9/site-packages/pulsar/managers/stateful.py", line 165, in get_status
    proxy_status, state_change = self.__proxy_status(job_directory, job_id)
  File "/srv/pulsar/main/venv/lib64/python3.9/site-packages/pulsar/managers/stateful.py", line 187, in __proxy_status
    proxy_status = job_directory.load_metadata(JOB_FILE_FINAL_STATUS)
  File "/srv/pulsar/main/venv/lib64/python3.9/site-packages/pulsar/managers/base/__init__.py", line 367, in load_metadata
    return json.loads(contents.decode())
  File "/usr/lib64/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

To work around this, I remove the active-jobs files for the affected jobs and requeue them. It would be better if Pulsar recovered from this more gracefully instead of repeating the error on every monitoring pass.
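The traceback shows load_metadata (pulsar/managers/base/__init__.py, line 367) passing the raw file contents straight to json.loads, which raises exactly this JSONDecodeError when given an empty string. Below is a minimal sketch of a more tolerant loader, assuming a plain file path. The name load_metadata_tolerant and the default parameter are hypothetical and not part of Pulsar's API; a real fix would have to live inside Pulsar's job directory classes.

```python
import json
import os
import tempfile


def load_metadata_tolerant(path, default=None):
    """Hypothetical helper (not Pulsar's API).

    Return the decoded JSON stored at ``path``, or ``default`` when the
    file is missing, empty, or does not contain valid UTF-8 JSON.
    """
    try:
        with open(path, "rb") as fh:
            contents = fh.read()
    except FileNotFoundError:
        return default
    if not contents.strip():
        # Empty file, e.g. created just before the volume ran out of space.
        return default
    try:
        return json.loads(contents.decode())
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Truncated or otherwise corrupt contents.
        return default


if __name__ == "__main__":
    # Decoding an empty string reproduces the error from the traceback.
    try:
        json.loads("")
    except json.JSONDecodeError as exc:
        print(exc)  # Expecting value: line 1 column 1 (char 0)

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example_status")  # hypothetical file name
        open(path, "w").close()  # create an empty file
        print(load_metadata_tolerant(path, default="unreadable"))  # unreadable
```

How the caller handles the default is a design choice. One option (an assumption, not existing Pulsar behavior) is for the stateful manager to treat an unreadable final status as a failed job and drop it from the active jobs, so the monitor stops looping without the active-jobs files having to be removed by hand.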