
[Bug]: WFO Orchestrator Scheduler will not trigger jobs as Scheduled #1134

@Val-HEAnet

Description


Contact Details

valentine.hayes@heanet.ie

What happened?

Note: may be related to #1119
Python version (not available in this bug report template's dropdown): 3.13.9

Module: WFO Scheduler

Expected behaviour: scheduled jobs should run at their configured times, for example:

@Scheduler(name="Nightly Node Subscriptions Validator", time_unit="day", at="00:01")
def run_nightly_node_validation() -> None:
    ...
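To make the expected schedule concrete, here is a minimal sketch of how a daily `at="00:01"` trigger resolves to a run time. `next_daily_run` is a hypothetical helper written for illustration, not orchestrator-core's or APScheduler's implementation, and it assumes the scheduler interprets `at` in UTC.

```python
from datetime import datetime, time, timedelta, timezone


def next_daily_run(after: datetime, at: str) -> datetime:
    """Hypothetical helper: first daily HH:MM run strictly after `after` (same tz)."""
    hour, minute = map(int, at.split(":"))
    # Today's slot in the same timezone as `after`.
    candidate = datetime.combine(after.date(), time(hour, minute), tzinfo=after.tzinfo)
    # If today's slot has already passed (or is right now), roll over to tomorrow.
    if candidate <= after:
        candidate += timedelta(days=1)
    return candidate


# Assumption: at="00:01" is evaluated in UTC.
last_check = datetime(2025, 10, 20, 12, 0, tzinfo=timezone.utc)
print(next_daily_run(last_check, "00:01"))  # 2025-10-21 00:01:00+00:00
```

With this schedule, the nightly validator should fire once per day at 00:01 UTC, which is the moment the job store error below appears.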

Observed behaviour: the jobs do not run. Instead, the scheduler logs a job store error: "Error getting due jobs from job store 'default'". A log snippet is below; the full scheduler log is attached to this report:

wfo.scheduler.log

Version

Orchestrator Core 4.4.2 (UI 5.3.3)

What python version are you seeing the problem on?

All

Relevant log output

2025-10-21 00:01:00 [debug    ] Looking for jobs to run        [apscheduler.scheduler]
2025-10-21 00:01:00 [warning  ] Error getting due jobs from job store 'default': (psycopg2.OperationalError) server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

[SQL: SELECT apscheduler_jobs.id, apscheduler_jobs.job_state
FROM apscheduler_jobs
WHERE apscheduler_jobs.next_run_time <= %(next_run_time_1)s ORDER BY apscheduler_jobs.next_run_time]
[parameters: {'next_run_time_1': 1761004860.000935}]
(Background on this error at: https://sqlalche.me/e/20/e3q8) [apscheduler.scheduler]
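The `next_run_time_1` bind parameter in the failed query appears to be a POSIX timestamp (APScheduler's SQLAlchemy job store keeps `next_run_time` as a float). Decoding it with the standard library is a quick sanity check; this sketch assumes the scheduler runs in UTC, as the `+00:00` timestamps in the second log suggest. It shows that the poll that failed was the one for 00:01:00 UTC, the slot the nightly validator is configured for.

```python
from datetime import datetime, timezone

# Bind parameter copied from the failing job-store query in the log above.
NEXT_RUN_TIME_1 = 1761004860.000935

# Interpret the float as seconds since the Unix epoch, in UTC.
due_cutoff = datetime.fromtimestamp(NEXT_RUN_TIME_1, tz=timezone.utc)
print(due_cutoff.isoformat())  # 2025-10-21T00:01:00.000935+00:00

# Assumption: at="00:01" is evaluated in UTC, so this is the nightly validator's slot.
nightly_slot = datetime(2025, 10, 21, 0, 1, tzinfo=timezone.utc)
print(due_cutoff - nightly_slot)  # 0:00:00.000935
```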

Also, a few minutes later the hourly "Resume workflows" job fails with a PendingRollbackError:

2025-10-21 00:06:20 [debug    ] Looking for jobs to run        [apscheduler.scheduler]
2025-10-21 00:06:20 [info     ] Running job "Resume workflows (trigger: interval[1:00:00], next run at: 2025-10-21 01:06:20 UTC)" (scheduled at 2025-10-21 00:06:20.600251+00:00) [apscheduler.executors.default]
2025-10-21 00:06:20 [debug    ] Post form                      [pydantic_forms.core.sync] state={'process_id': UUID('db81090d-77c2-4dce-9456-05b9558c0618'), 'reporter': 'SYSTEM', 'workflow_name': 'task_resume_workflows', 'workflow_target': <Target.SYSTEM: 'SYSTEM'>} user_inputs=[{}]
2025-10-21 00:06:20 [error    ] Job "Resume workflows (trigger: interval[1:00:00], next run at: 2025-10-21 01:06:20 UTC)" raised an exception [apscheduler.executors.default]
Traceback (most recent call last):
  File "/usr/local/lib/python3.13/site-packages/apscheduler/executors/base.py", line 131, in run_job
    retval = job.func(*job.args, **job.kwargs)
  File "/usr/local/lib/python3.13/site-packages/orchestrator/schedules/resume_workflows.py", line 21, in run_resume_workflows
    start_process("task_resume_workflows")
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/orchestrator/services/processes.py", line 495, in start_process
    pstat = create_process(workflow_key, user_inputs=user_inputs, user=user)
  File "/usr/local/lib/python3.13/site-packages/orchestrator/services/processes.py", line 470, in create_process
    _db_create_process(pstat)
    ~~~~~~~~~~~~~~~~~~^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/orchestrator/services/processes.py", line 107, in _db_create_process
    wf_table = get_workflow_by_name(stat.workflow.name)
  File "/usr/local/lib/python3.13/site-packages/orchestrator/services/workflows.py", line 64, in get_workflow_by_name
    return db.session.scalar(select(WorkflowTable).where(WorkflowTable.name == workflow_name))
           ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/sqlalchemy/orm/session.py", line 2413, in scalar
    return self._execute_internal(
           ~~~~~~~~~~~~~~~~~~~~~~^
        statement,
        ^^^^^^^^^^
    ...<4 lines>...
        **kw,
        ^^^^^
    )
    ^
  File "/usr/local/lib/python3.13/site-packages/sqlalchemy/orm/session.py", line 2251, in _execute_internal
    result: Result[Any] = compile_state_cls.orm_execute_statement(
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        self,
        ^^^^^
    ...<4 lines>...
        conn,
        ^^^^^
    )
    ^
  File "/usr/local/lib/python3.13/site-packages/sqlalchemy/orm/context.py", line 306, in orm_execute_statement
    result = conn.execute(
        statement, params or {}, execution_options=execution_options
    )
  File "/usr/local/lib/python3.13/site-packages/sqlalchemy/engine/base.py", line 1415, in execute
    return meth(
        self,
        distilled_parameters,
        execution_options or NO_OPTIONS,
    )
  File "/usr/local/lib/python3.13/site-packages/sqlalchemy/sql/elements.py", line 523, in _execute_on_connection
    return connection._execute_clauseelement(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        self, distilled_params, execution_options
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/local/lib/python3.13/site-packages/sqlalchemy/engine/base.py", line 1637, in _execute_clauseelement
    ret = self._execute_context(
        dialect,
    ...<8 lines>...
        cache_hit=cache_hit,
    )
  File "/usr/local/lib/python3.13/site-packages/sqlalchemy/engine/base.py", line 1809, in _execute_context
    conn = self._revalidate_connection()
  File "/usr/local/lib/python3.13/site-packages/sqlalchemy/engine/base.py", line 675, in _revalidate_connection
    self._invalid_transaction()
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/local/lib/python3.13/site-packages/sqlalchemy/engine/base.py", line 665, in _invalid_transaction
    raise exc.PendingRollbackError(
    ...<4 lines>...
    )
sqlalchemy.exc.PendingRollbackError: Can't reconnect until invalid transaction is rolled back.  Please rollback() fully before proceeding (Background on this error at: https://sqlalche.me/e/20/8s2b)

Metadata


Assignees

No one assigned

Labels

bug (Something isn't working), triage (Issue that need to be triaged)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
