feat(job-wait): Fix IndexError and add ApiException for better error handling by haracejacob · Pull Request #155 · rundeck-plugins/kubernetes

haracejacob · 2023-03-30T10:30:06Z

When job-wait is executed immediately after job-create, the core_v1.list_namespaced_pod() function sometimes returns an empty array, which causes an IndexError when accessing pod_list.items[0]. This is due to the delay between job creation and pod creation.

To fix this issue, the code now raises ApiException(404) when an empty array is returned, so that the existing handler runs time.sleep() and retries. In the error case it raises TimeoutError.

AS-IS

Traceback (most recent call last):
  File "/home1/rundeck/libext/cache/kubernetes-2.0.10/job-wait.py", line 138, in <module>
    main()
  File "/home1/rundeck/libext/cache/kubernetes-2.0.10/job-wait.py", line 134, in main
    wait()
  File "/home1/rundeck/libext/cache/kubernetes-2.0.10/job-wait.py", line 60, in wait
    first_item = pod_list.items[0]
IndexError: list index out of range
Failed: NonZeroResultCode: Script result code was: 1

TO-BE

WARNING: kubernetes-wait-job: Pod is not ready, status: 404
INFO: kubernetes-wait-job: waiting for log
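
For illustration, the behavior described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the code in job-wait.py: `ApiException` here is a local stand-in for the kubernetes client's exception class, and `first_pod_or_404`, `wait_for_first_pod`, `list_pods`, `retries`, and `delay` are hypothetical names introduced only for this example.

```python
import time


class ApiException(Exception):
    """Local stand-in for the kubernetes client's ApiException (assumption:
    only the `status` attribute matters for this sketch)."""

    def __init__(self, status=None):
        super().__init__(f"status: {status}")
        self.status = status


def first_pod_or_404(items):
    """Return the first pod, or raise ApiException(404) if the list is empty."""
    if not items:
        # An empty list would otherwise fail with IndexError on items[0].
        raise ApiException(status=404)
    return items[0]


def wait_for_first_pod(list_pods, retries=3, delay=0.0):
    """Poll `list_pods` (hypothetical callable returning a list of pods)."""
    for _ in range(retries):
        try:
            return first_pod_or_404(list_pods())
        except ApiException as e:
            # The existing retry path: log, sleep, try again.
            print(f"Pod is not ready, status: {e.status}")
            time.sleep(delay)
    raise TimeoutError("no pod appeared within the allowed retries")
```

With a pod list that is empty on the first two polls and populated on the third, `wait_for_first_pod` logs two "not ready" lines and then returns the pod.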

mbranchnl · 2025-02-25T13:04:14Z

This issue still occurs; we solved it by manually patching job-wait.py:

  pod_list = core_v1.list_namespaced_pod(
      namespace,
      label_selector="job-name==" + name
  )

+++++++
  if not pod_list.items:
      log.warning("No pods found for the job yet, retry")
      time.sleep(5)
      # Handle this situation as needed
  else:
+++++++
      first_item = pod_list.items[0]

fdevans · 2026-03-06T21:11:25Z

Thank you for identifying and fixing this race condition! This is a real issue that affects customers, as confirmed by @mbranchnl's recent comment.

However, we'd like to request a cleaner implementation approach. The current solution of throwing ApiException(404) when the pod list is empty works, but it's semantically incorrect - an empty list isn't actually an API exception, it's just a timing issue where the pod hasn't been created yet.

We'd prefer an approach similar to what @mbranchnl suggested in their manual patch - explicitly handle the empty pod list case with a clear warning and retry logic:

pod_list = core_v1.list_namespaced_pod(
    namespace,
    label_selector="job-name==" + name
)

if not pod_list.items:
    log.warning("No pods found for job yet, waiting for pod creation")
    time.sleep(5)
    continue  # Continue the while loop to retry

first_item = pod_list.items[0]
pod_name = first_item.metadata.name

This makes the code more maintainable and clearly communicates what's happening (pod not created yet) versus masking it as an API error.
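
To try the suggested pattern outside a cluster, a self-contained sketch could look like this. It assumes a hypothetical `list_pods` callable standing in for `core_v1.list_namespaced_pod(...)`, and it fakes the API response with `SimpleNamespace` objects; `wait_for_pod_name` and `poll_interval` are names invented for the example.

```python
import logging
import time
from types import SimpleNamespace

log = logging.getLogger("kubernetes-wait-job")


def wait_for_pod_name(list_pods, poll_interval=5):
    """Return the first pod's name, retrying while the pod list is empty.

    `list_pods` is a hypothetical zero-argument callable that returns an
    object with an `items` list, like the real pod-list response.
    Note: this sketch has no upper bound on how long it waits.
    """
    while True:
        pod_list = list_pods()
        if not pod_list.items:
            log.warning("No pods found for job yet, waiting for pod creation")
            time.sleep(poll_interval)
            continue  # retry the listing on the next iteration
        return pod_list.items[0].metadata.name


# Usage with faked responses: empty twice, then one pod.
empty = SimpleNamespace(items=[])
pod = SimpleNamespace(metadata=SimpleNamespace(name="job-abc-xyz"))
responses = iter([empty, empty, SimpleNamespace(items=[pod])])
print(wait_for_pod_name(lambda: next(responses), poll_interval=0))
```

The empty case is handled as ordinary control flow, so no exception has to be raised to reach the retry.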

Could you update the PR to use this approach instead? Also, please rebase on the latest master branch before updating.

Thanks again for the contribution!

haracejacob · 2026-03-08T12:56:08Z

@fdevans

Thank you for the review.
I have updated the retry logic based on your feedback and rebased the branch on the latest master.
I would appreciate it if you could take another look!

fdevans · 2026-03-09T22:25:06Z

Thank you for updating the PR with the cleaner approach! The implementation looks much better.

However, we found one issue: the timeout protection is bypassed in the new code path.

The problem:

The timeout check (lines 74-75) only runs inside the except ApiException block. Your new code uses continue before reaching that check, which means if pods never appear, the loop will run forever instead of timing out after 300 seconds.

The fix:

Please add the timeout check before the continue:

if not pod_list.items:
    log.warning("No pods found for job yet, waiting for pod creation")
    time.sleep(5)
    if timeout and time.time() - start_time > timeout:
        raise TimeoutError
    continue

This ensures the 5-minute timeout protection works for the empty pod list scenario.
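
As a rough, runnable illustration of that check, the sketch below places the deadline test in the empty-list branch. It is not the plugin's code: `wait_for_pod`, `list_pods`, and the injectable `sleep` and `clock` parameters are hypothetical, added so the timeout can be exercised without actually waiting.

```python
import time


def wait_for_pod(list_pods, timeout=300, poll_interval=5,
                 sleep=time.sleep, clock=time.time):
    """Return the first pod, or raise TimeoutError if none appears in time.

    `list_pods` is a hypothetical callable returning a plain list that
    stands in for pod_list.items.
    """
    start_time = clock()
    while True:
        pods = list_pods()
        if not pods:
            sleep(poll_interval)
            # Check the deadline before `continue`; otherwise this path
            # would never time out.
            if timeout and clock() - start_time > timeout:
                raise TimeoutError("no pod was created within the timeout")
            continue
        return pods[0]


# Usage with a fake clock that advances only when "sleeping".
now = [0.0]


def fake_sleep(seconds):
    now[0] += seconds


try:
    wait_for_pod(lambda: [], timeout=12, poll_interval=5,
                 sleep=fake_sleep, clock=lambda: now[0])
except TimeoutError as exc:
    print(f"timed out after {now[0]} simulated seconds: {exc}")
```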

Could you update the PR with this change?

Thanks!

haracejacob · 2026-03-14T13:23:06Z

@fdevans

Thank you for catching that issue. You're right—the continue statement would have bypassed the timeout check, potentially leading to an infinite loop.

I've updated the PR to address this, but instead of adding a second timeout check, I moved the logic to the top of the while True loop. I believe this is a cleaner and more robust approach for a couple of reasons:

Centralized Protection: It ensures the timeout is checked on every single iteration, regardless of which code path (like continue) is taken later in the loop.
DRY (Don't Repeat Yourself): It avoids duplicating the timeout logic, making the code easier to maintain.

Please let me know if this centralized approach looks good to you!
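
A minimal sketch of a centralized check of this kind, under stated assumptions: the actual loop in job-wait.py may be structured differently, `ApiException` is a local stand-in for the kubernetes client's class, and `wait_for_pod`, `list_pods`, `sleep`, and `clock` are hypothetical names used only here.

```python
import time


class ApiException(Exception):
    """Local stand-in for the kubernetes client's ApiException."""


def wait_for_pod(list_pods, timeout=300, poll_interval=5,
                 sleep=time.sleep, clock=time.time):
    """Return the first pod; every retry path shares one deadline check."""
    start_time = clock()
    while True:
        # Single deadline check, evaluated at the top of every iteration,
        # so any later `continue` still passes through it.
        if timeout and clock() - start_time > timeout:
            raise TimeoutError("pod was not ready within the timeout")
        try:
            pods = list_pods()
        except ApiException:
            sleep(poll_interval)
            continue  # retry path 1: API error
        if not pods:
            sleep(poll_interval)
            continue  # retry path 2: pod not created yet
        return pods[0]
```

One trade-off of this placement is that the deadline is evaluated before each attempt rather than right after each sleep, so the loop makes no further API call once the deadline has passed.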

haracejacob mentioned this pull request Aug 1, 2023

When job-wait is executed immediately after job-create, it cause an IndexError #165

Open

haracejacob and others added 2 commits March 8, 2026 21:52

Fix IndexError and raise ApiException to do time.sleep

f68674c

Handle empty pod list with warning and retry

9fd373d

haracejacob force-pushed the master branch from c711f16 to 9fd373d Compare March 8, 2026 12:52

fdevans requested a review from a team March 9, 2026 22:32

Refactor timeout check to cover all retry paths

8d3dc82

haracejacob wants to merge 3 commits into rundeck-plugins:master from haracejacob:master


3 participants
