Describe the bug
When using deadline queue sync-output for automatic/incremental downloads, if the download is interrupted (network failure, user cancellation, system crash, expired credentials), all progress is lost. The checkpoint is saved only at the very end of the operation, so the next run restarts from the beginning rather than resuming from where it stopped.
For queues with many files or large outputs, this causes:
- Significant time waste re-downloading already completed files
- Increased S3 transfer costs for customers
- Poor user experience during long-running sync operations
Expected Behaviour
Save the checkpoint periodically during the download (e.g., every 60 seconds, or after a certain number of files have been downloaded) rather than only at completion. This would allow an interrupted download to resume from a recent checkpoint.
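As a rough illustration of the idea, here is a minimal sketch of a time- and count-based checkpointer. It is not the actual queue_group.py code: PeriodicCheckpointer, download_all, the JSON state layout, and the 60-second / 100-file thresholds are all hypothetical names and values chosen for the example. A real change would presumably persist IncrementalDownloadState instead of a plain dict.

```python
import json
import os
import tempfile
import time


class PeriodicCheckpointer:
    """Hypothetical helper: saves state when enough time or files have passed."""

    def __init__(self, path, interval_seconds=60.0, max_files=100,
                 clock=time.monotonic):
        self.path = path
        self.interval_seconds = interval_seconds  # assumed threshold
        self.max_files = max_files                # assumed threshold
        self._clock = clock
        self._last_save = clock()
        self._files_since_save = 0

    def record_file(self, state):
        """Call after each completed file; returns True if a save happened."""
        self._files_since_save += 1
        elapsed = self._clock() - self._last_save
        if (elapsed >= self.interval_seconds
                or self._files_since_save >= self.max_files):
            self.save(state)
            return True
        return False

    def save(self, state):
        """Write to a temp file, then rename, so a crash mid-write
        never leaves a truncated checkpoint behind."""
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w") as f:
                json.dump(state, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp_path, self.path)
        except BaseException:
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)
            raise
        self._last_save = self._clock()
        self._files_since_save = 0


def download_all(files, download_one, checkpointer, state):
    """Download files not yet listed in state["completed"]."""
    try:
        for name in files:
            if name in state["completed"]:
                continue  # already downloaded in an earlier run
            download_one(name)
            state["completed"].append(name)
            checkpointer.record_file(state)
    finally:
        # Best-effort final save; this also runs on Ctrl+C or an exception.
        checkpointer.save(state)
```

The final save in the finally block covers Ctrl+C and ordinary exceptions, while the periodic saves bound the amount of lost work when the process is killed outright or the machine crashes.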
Current Behaviour
The checkpoint (IncrementalDownloadState) is saved only once at the end of sync_output() in queue_group.py over here.
Reproduction Steps
- Submit a job with many output files (or multiple jobs) to a queue in the farm.
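- Wait for some tasks to complete and produce outputs.
- Start the incremental download. Instructions here
- While files are downloading, interrupt the process (Ctrl+C, network disconnect, or kill the process).
- Run the incremental download again and observe that it restarts from the beginning instead of resuming.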
Environment
This is not environment specific.
Please share other details about your environment that you think might be relevant to reproducing the bug.