Skip to content

PSv2 cleanup: use is_complete() and dispatch_mode in job progress handler#1125

Merged
mihow merged 2 commits intomainfrom
fix/job-status-cleanup
Feb 10, 2026
Merged

PSv2 cleanup: use is_complete() and dispatch_mode in job progress handler#1125
mihow merged 2 commits intomainfrom
fix/job-status-cleanup

Conversation

@mihow
Copy link
Collaborator

@mihow mihow commented Feb 10, 2026

Summary

  • Replace hardcoded stage == "results" check in _update_job_progress with job.progress.is_complete(), which verifies all stages are done rather than assuming a specific stage name signals completion
  • Replace feature_flags.async_pipeline_workers check in _cleanup_job_if_needed with job.dispatch_mode == JobDispatchMode.ASYNC_API, which is immutable for the job's lifetime and more correct than re-reading a mutable flag

Both issues were identified during the dispatch_mode PR review. The existing abstractions (JobProgress.is_complete() and Job.dispatch_mode) already handle these cases correctly.

Test plan

  • All 32 job-related tests pass (python manage.py test -k job)
  • Cleanup integration tests updated to set dispatch_mode=ASYNC_API and complete all stages
  • No remaining hardcoded stage == "results" checks in ami/

Summary by CodeRabbit

  • Bug Fixes

    • Improved job completion detection to be more reliable across different processing workflows.
  • Chores

    • Refactored resource cleanup logic to align with job dispatch configuration.

mihow and others added 2 commits February 9, 2026 16:00
Replace hardcoded `stage == "results"` check with `job.progress.is_complete()`
which verifies ALL stages are done, making it work for any job type.

Replace feature flag check in cleanup with `dispatch_mode == ASYNC_API`
which is immutable for the job's lifetime and more correct than re-reading
a mutable flag that could change between job creation and completion.

Co-Authored-By: Claude <noreply@anthropic.com>
Set dispatch_mode=ASYNC_API on test jobs to match the new cleanup guard.
Complete all stages (collect, process, results) in the completion test
since is_complete() correctly requires all stages to be done.

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings February 10, 2026 00:01
@netlify
Copy link

netlify bot commented Feb 10, 2026

Deploy Preview for antenna-ssec canceled.

Name Link
🔨 Latest commit 6fde58e
🔍 Latest deploy log https://app.netlify.com/projects/antenna-ssec/deploys/698a754385a16a0008997bfd

@netlify
Copy link

netlify bot commented Feb 10, 2026

Deploy Preview for antenna-preview ready!

Name Link
🔨 Latest commit 6fde58e
🔍 Latest deploy log https://app.netlify.com/projects/antenna-preview/deploys/698a7543c6de8c0008bb55ae
😎 Deploy Preview https://deploy-preview-1125--antenna-preview.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.
Lighthouse
Lighthouse
1 paths audited
Performance: 41 (🔴 down 25 from production)
Accessibility: 80 (no change from production)
Best Practices: 100 (no change from production)
SEO: 92 (no change from production)
PWA: 80 (no change from production)
View the detailed breakdown and full score reports

To edit notification comments on pull requests, go to your Netlify project configuration.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 10, 2026

📝 Walkthrough

Walkthrough

This change refactors job completion and cleanup logic from stage/progress-based conditions to generalized completeness checks, and replaces feature-flag-based routing with dispatch-mode-based cleanup eligibility determination.

Changes

Cohort / File(s) Summary
Job completion and cleanup logic
ami/jobs/tasks.py
Refactored progress update trigger from stage == "results" and progress >= 1.0 check to job.progress.is_complete(). Updated cleanup eligibility from ML job feature flag check to job.dispatch_mode == JobDispatchMode.ASYNC_API condition. Job is now re-fetched outside transaction before cleanup execution.
Cleanup test updates
ami/ml/orchestration/tests/test_cleanup.py
Added JobDispatchMode.ASYNC_API enum import and usage in Job creation. Test now advances through all three stages (collect, process, results) instead of single stage update to exercise completion logic.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Poem

🐰 A rabbit hops through cleanup paths so clear,
Dispatch modes guide each job without fear,
Progress checks unified, no stages to decode,
Simpler logic makes the burrow's road! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description check ⚠️ Warning The description includes Summary and Test plan sections, but lacks List of Changes, Related Issues, Detailed Description, How to Test, Screenshots, Deployment Notes, and Checklist sections from the template. Add missing template sections: List of Changes (modified files), Related Issues (PR #1125), Detailed Description (rationale and side effects), How to Test (test commands), Deployment Notes, and Checklist items.
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main changes: refactoring to use is_complete() and dispatch_mode in the job progress handler.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/job-status-cleanup

No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
ami/ml/orchestration/tests/test_cleanup.py (1)

20-21: Stale docstring: still references async_pipeline_workers.

The setUp docstring says "Set up test fixtures with async_pipeline_workers enabled" but the cleanup eligibility is now driven by dispatch_mode, not the feature flag. Consider updating for consistency.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@mihow mihow changed the title refactor: use is_complete() and dispatch_mode in job progress handler PSv2 cleanup: use is_complete() and dispatch_mode in job progress handler Feb 10, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors async job progress completion and cleanup logic to rely on existing abstractions (JobProgress.is_complete() and Job.dispatch_mode) rather than hardcoded stage names and mutable feature flags.

Changes:

  • Update _update_job_progress to treat job completion based on job.progress.is_complete() instead of stage == "results".
  • Update _cleanup_job_if_needed to gate NATS/Redis cleanup on job.dispatch_mode == ASYNC_API rather than project.feature_flags.async_pipeline_workers.
  • Adjust cleanup integration test setup to use dispatch_mode=ASYNC_API and complete all stages before asserting cleanup.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
ami/jobs/tasks.py Switches completion detection and cleanup gating to job-level abstractions (is_complete, dispatch_mode).
ami/ml/orchestration/tests/test_cleanup.py Updates integration test to align with new completion semantics and dispatch mode gating.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@mihow mihow merged commit bd71745 into main Feb 10, 2026
13 checks passed
@mihow mihow deleted the fix/job-status-cleanup branch February 10, 2026 00:15
carlosgjs pushed a commit to uw-ssec/antenna that referenced this pull request Feb 10, 2026
…dler (RolnickLab#1125)

* refactor: use is_complete() and dispatch_mode in job progress handler

Replace hardcoded `stage == "results"` check with `job.progress.is_complete()`
which verifies ALL stages are done, making it work for any job type.

Replace feature flag check in cleanup with `dispatch_mode == ASYNC_API`
which is immutable for the job's lifetime and more correct than re-reading
a mutable flag that could change between job creation and completion.

Co-Authored-By: Claude <noreply@anthropic.com>

* test: update cleanup tests for is_complete() and dispatch_mode checks

Set dispatch_mode=ASYNC_API on test jobs to match the new cleanup guard.
Complete all stages (collect, process, results) in the completion test
since is_complete() correctly requires all stages to be done.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
mihow added a commit that referenced this pull request Feb 17, 2026
* merge

* Update ML job counts in async case

* Update date picker version and tweak layout logic (#1105)

* fix: update date picker version and tweak layout logic

* feat: set start month based on selected date

* fix: Properly handle async job state with celery tasks  (#1114)

* merge

* fix: Properly handle async job state with celery tasks

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Delete implemented plan

---------

Co-authored-by: Carlos Garcia Jurado Suarez <carlos@irreverentlabs.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* PSv2: Implement queue clean-up upon job completion (#1113)

* merge

* feat: PSv2 - Queue/redis clean-up upon job completion

* fix: catch specific exception

* chore: move tests to a subdir

---------

Co-authored-by: Carlos Garcia Jurado Suarez <carlos@irreverentlabs.com>
Co-authored-by: Michael Bunsen <notbot@gmail.com>

* fix: PSv2: Workers should not try to fetch tasks from v1 jobs (#1118)

Introduces the dispatch_mode field on the Job model to track how each job dispatches its workload. 


This allows API clients (including the AMI worker) to filter jobs by dispatch mode — for example, fetching only async_api jobs so workers don't pull synchronous or internal jobs.

JobDispatchMode enum (ami/jobs/models.py):

internal — work handled entirely within the platform (Celery worker, no external calls). Default for all jobs.
sync_api — worker calls an external processing service API synchronously and waits for each response.
async_api — worker publishes items to NATS for external processing service workers to pick up independently.
Database and Model Changes:

Added dispatch_mode CharField with TextChoices, defaulting to internal, with the migration in ami/jobs/migrations/0019_job_dispatch_mode.py.
ML jobs set dispatch_mode = async_api when the project's async_pipeline_workers feature flag is enabled.
ML jobs set dispatch_mode = sync_api on the synchronous processing path (previously unset).
API and Filtering:

dispatch_mode is exposed (read-only) in job list and detail serializers.
Filterable via query parameter: ?dispatch_mode=async_api
The /tasks endpoint now returns 400 for non-async_api jobs, since only those have NATS tasks to fetch.
Architecture doc: docs/claude/job-dispatch-modes.md documents the three modes, naming decisions, and per-job-type mapping.

---------

Co-authored-by: Carlos Garcia Jurado Suarez <carlos@irreverentlabs.com>
Co-authored-by: Michael Bunsen <notbot@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>

* PSv2 cleanup: use is_complete() and dispatch_mode in job progress handler (#1125)

* refactor: use is_complete() and dispatch_mode in job progress handler

Replace hardcoded `stage == "results"` check with `job.progress.is_complete()`
which verifies ALL stages are done, making it work for any job type.

Replace feature flag check in cleanup with `dispatch_mode == ASYNC_API`
which is immutable for the job's lifetime and more correct than re-reading
a mutable flag that could change between job creation and completion.

Co-Authored-By: Claude <noreply@anthropic.com>

* test: update cleanup tests for is_complete() and dispatch_mode checks

Set dispatch_mode=ASYNC_API on test jobs to match the new cleanup guard.
Complete all stages (collect, process, results) in the completion test
since is_complete() correctly requires all stages to be done.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>

* track captures and failures

* Update tests, CR feedback, log error images

* CR feedback

* fix type checking

* refactor: rename _get_progress to _commit_update in TaskStateManager

Clarify naming to distinguish mutating vs read-only methods:
- _commit_update(): private, writes mutations to Redis, returns progress
- get_progress(): public, read-only snapshot (added in #1129)
- update_state(): public API, acquires lock, calls _commit_update()

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: unify FAILURE_THRESHOLD and convert TaskProgress to dataclass

- Single FAILURE_THRESHOLD constant in tasks.py, imported by models.py
- Fix async path to use `> FAILURE_THRESHOLD` (was `>=`) to match
  the sync path's boundary behavior at exactly 50%
- Convert TaskProgress from namedtuple to dataclass with defaults,
  so new fields don't break existing callers

Co-Authored-By: Claude <noreply@anthropic.com>

* refactor: rename TaskProgress to JobStateProgress

Clarify that this dataclass tracks job-level progress in Redis,
not individual task/image progress. Aligns with the naming of
JobProgress (the Django/Pydantic model equivalent).

Co-Authored-By: Claude <noreply@anthropic.com>

* docs: update NATS todo and planning docs with session learnings

Mark connection handling as done (PR #1130), add worktree/remote
mapping and docker testing notes for future sessions.

Co-Authored-By: Claude <noreply@anthropic.com>

* Rename TaskStateManager to AsyncJobStateManager

* Track results counts in the job itself vs Redis

* small simplification

* Reset counts to 0 on reset

* chore: remove local planning docs from PR branch

Co-Authored-By: Claude <noreply@anthropic.com>

* docs: clarify three-layer job state architecture in docstrings

Explain the relationship between AsyncJobStateManager (Redis),
JobProgress (JSONB), and JobState (enum). Clarify that all counts
in JobStateProgress refer to source images (captures).

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Carlos Garcia Jurado Suarez <carlos@irreverentlabs.com>
Co-authored-by: Anna Viklund <annamariaviklund@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Michael Bunsen <notbot@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
carlosgjs pushed a commit to uw-ssec/antenna that referenced this pull request Feb 18, 2026
…dler (RolnickLab#1125)

* refactor: use is_complete() and dispatch_mode in job progress handler

Replace hardcoded `stage == "results"` check with `job.progress.is_complete()`
which verifies ALL stages are done, making it work for any job type.

Replace feature flag check in cleanup with `dispatch_mode == ASYNC_API`
which is immutable for the job's lifetime and more correct than re-reading
a mutable flag that could change between job creation and completion.

Co-Authored-By: Claude <noreply@anthropic.com>

* test: update cleanup tests for is_complete() and dispatch_mode checks

Set dispatch_mode=ASYNC_API on test jobs to match the new cleanup guard.
Complete all stages (collect, process, results) in the completion test
since is_complete() correctly requires all stages to be done.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
carlosgjs added a commit to uw-ssec/antenna that referenced this pull request Feb 18, 2026
* merge

* Update ML job counts in async case

* Update date picker version and tweak layout logic (RolnickLab#1105)

* fix: update date picker version and tweak layout logic

* feat: set start month based on selected date

* fix: Properly handle async job state with celery tasks  (RolnickLab#1114)

* merge

* fix: Properly handle async job state with celery tasks

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Delete implemented plan

---------

Co-authored-by: Carlos Garcia Jurado Suarez <carlos@irreverentlabs.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* PSv2: Implement queue clean-up upon job completion (RolnickLab#1113)

* merge

* feat: PSv2 - Queue/redis clean-up upon job completion

* fix: catch specific exception

* chore: move tests to a subdir

---------

Co-authored-by: Carlos Garcia Jurado Suarez <carlos@irreverentlabs.com>
Co-authored-by: Michael Bunsen <notbot@gmail.com>

* fix: PSv2: Workers should not try to fetch tasks from v1 jobs (RolnickLab#1118)

Introduces the dispatch_mode field on the Job model to track how each job dispatches its workload. 


This allows API clients (including the AMI worker) to filter jobs by dispatch mode — for example, fetching only async_api jobs so workers don't pull synchronous or internal jobs.

JobDispatchMode enum (ami/jobs/models.py):

internal — work handled entirely within the platform (Celery worker, no external calls). Default for all jobs.
sync_api — worker calls an external processing service API synchronously and waits for each response.
async_api — worker publishes items to NATS for external processing service workers to pick up independently.
Database and Model Changes:

Added dispatch_mode CharField with TextChoices, defaulting to internal, with the migration in ami/jobs/migrations/0019_job_dispatch_mode.py.
ML jobs set dispatch_mode = async_api when the project's async_pipeline_workers feature flag is enabled.
ML jobs set dispatch_mode = sync_api on the synchronous processing path (previously unset).
API and Filtering:

dispatch_mode is exposed (read-only) in job list and detail serializers.
Filterable via query parameter: ?dispatch_mode=async_api
The /tasks endpoint now returns 400 for non-async_api jobs, since only those have NATS tasks to fetch.
Architecture doc: docs/claude/job-dispatch-modes.md documents the three modes, naming decisions, and per-job-type mapping.

---------

Co-authored-by: Carlos Garcia Jurado Suarez <carlos@irreverentlabs.com>
Co-authored-by: Michael Bunsen <notbot@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>

* PSv2 cleanup: use is_complete() and dispatch_mode in job progress handler (RolnickLab#1125)

* refactor: use is_complete() and dispatch_mode in job progress handler

Replace hardcoded `stage == "results"` check with `job.progress.is_complete()`
which verifies ALL stages are done, making it work for any job type.

Replace feature flag check in cleanup with `dispatch_mode == ASYNC_API`
which is immutable for the job's lifetime and more correct than re-reading
a mutable flag that could change between job creation and completion.

Co-Authored-By: Claude <noreply@anthropic.com>

* test: update cleanup tests for is_complete() and dispatch_mode checks

Set dispatch_mode=ASYNC_API on test jobs to match the new cleanup guard.
Complete all stages (collect, process, results) in the completion test
since is_complete() correctly requires all stages to be done.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>

* track captures and failures

* Update tests, CR feedback, log error images

* CR feedback

* fix type checking

* refactor: rename _get_progress to _commit_update in TaskStateManager

Clarify naming to distinguish mutating vs read-only methods:
- _commit_update(): private, writes mutations to Redis, returns progress
- get_progress(): public, read-only snapshot (added in RolnickLab#1129)
- update_state(): public API, acquires lock, calls _commit_update()

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: unify FAILURE_THRESHOLD and convert TaskProgress to dataclass

- Single FAILURE_THRESHOLD constant in tasks.py, imported by models.py
- Fix async path to use `> FAILURE_THRESHOLD` (was `>=`) to match
  the sync path's boundary behavior at exactly 50%
- Convert TaskProgress from namedtuple to dataclass with defaults,
  so new fields don't break existing callers

Co-Authored-By: Claude <noreply@anthropic.com>

* refactor: rename TaskProgress to JobStateProgress

Clarify that this dataclass tracks job-level progress in Redis,
not individual task/image progress. Aligns with the naming of
JobProgress (the Django/Pydantic model equivalent).

Co-Authored-By: Claude <noreply@anthropic.com>

* docs: update NATS todo and planning docs with session learnings

Mark connection handling as done (PR RolnickLab#1130), add worktree/remote
mapping and docker testing notes for future sessions.

Co-Authored-By: Claude <noreply@anthropic.com>

* Rename TaskStateManager to AsyncJobStateManager

* Track results counts in the job itself vs Redis

* small simplification

* Reset counts to 0 on reset

* chore: remove local planning docs from PR branch

Co-Authored-By: Claude <noreply@anthropic.com>

* docs: clarify three-layer job state architecture in docstrings

Explain the relationship between AsyncJobStateManager (Redis),
JobProgress (JSONB), and JobState (enum). Clarify that all counts
in JobStateProgress refer to source images (captures).

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Carlos Garcia Jurado Suarez <carlos@irreverentlabs.com>
Co-authored-by: Anna Viklund <annamariaviklund@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Michael Bunsen <notbot@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments