Raise ACI delete timeout from 20s to 120s (fixes #1543) #1544
bingran-you wants to merge 2 commits into dev
Azure CLI's `az container delete` routinely takes 25-60s in westus2, especially when the CLI process is cold. The 20s hot-path timeout was tripping on nearly every completed run_task execution on prod, producing misleading "delete-request failed" log entries and leaving containers for the periodic reconciler / pool_manager to clean up. The retry helper `delete_aci_container` already uses 120s for the same operation; align the hot path with that value so cleanup actually succeeds inline.

Evidence: `pm2 logs dw_worker --err --nostream | grep -c 'delete-request failed.*20s'` on prod returned 11 hits over ~7h uptime, all on successful task executions.
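As a rough illustration of the alignment described above, the sketch below routes both call sites through one shared timeout constant and a polling wait. It is not the code in this PR: `ACI_DELETE_TIMEOUT`, `delete_command`, and `wait_with_timeout` are hypothetical names, and only the Rust standard library is assumed.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical shared constant: one value for the hot path and the retry
/// helper, so the two cannot drift apart again (120s per this PR).
pub const ACI_DELETE_TIMEOUT: Duration = Duration::from_secs(120);

/// Builds an `az container delete` invocation; `--yes` skips the prompt.
/// The helper name and parameter names are illustrative.
pub fn delete_command(resource_group: &str, name: &str) -> Command {
    let mut cmd = Command::new("az");
    cmd.args([
        "container", "delete",
        "--resource-group", resource_group,
        "--name", name,
        "--yes",
    ]);
    cmd
}

/// Polls `child` until it exits or `timeout` elapses.
/// Returns `Ok(Some(status))` on exit, or `Ok(None)` after killing and
/// reaping a child that outlived the deadline.
pub fn wait_with_timeout(
    child: &mut Child,
    timeout: Duration,
) -> io::Result<Option<ExitStatus>> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            // Ignore kill errors: the child may have exited in the meantime.
            let _ = child.kill();
            child.wait()?;
            return Ok(None);
        }
        sleep(Duration::from_millis(50));
    }
}
```

A caller would spawn `delete_command(...)` and pass the child to `wait_with_timeout(&mut child, ACI_DELETE_TIMEOUT)`, treating `None` as a timeout to log and hand off to the reconciler.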
Blocker: merge conflict; prod still wedged on
Per the devops-scan follow-up on #1559 (comment), prod scheduler has been deferring ~35 due tasks/second since 01:08 UTC with
Requesting: rebase this branch on latest
Summary
- Raise the ACI delete timeout from 20s to 120s, aligning `run_codex_task_azure_aci` with the retry helper (`delete_aci_container` uses 120s). The 20s value was tripping on nearly every task completion on prod, producing noisy `delete-request failed` lines and leaving orphan containers for the periodic reconciler.

Evidence
Prod `dowhizprod1` worker stderr over ~7h uptime contained 11 occurrences of lines matching `delete-request failed.*20s`.

Fixes #1543. Related: #1537, #1539.
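For checking a captured log without the shell pipeline, the count can be approximated with a small helper. This is a minimal sketch that assumes the log is already in memory as text; the function name `count_delete_timeouts` is hypothetical, and the match mirrors the `delete-request failed.*20s` pattern rather than any exact log format.

```rust
/// Counts lines in which "delete-request failed" is followed, later on the
/// same line, by "20s" (the same condition as `grep -c` with the pattern
/// `delete-request failed.*20s`).
pub fn count_delete_timeouts(log: &str) -> usize {
    const MARKER: &str = "delete-request failed";
    log.lines()
        .filter(|line| {
            line.find(MARKER)
                // Only text after the first marker needs checking: anything
                // after a later marker is also after the first one.
                .map_or(false, |i| line[i + MARKER.len()..].contains("20s"))
        })
        .count()
}
```

Note that, like the grep pattern, this substring match also fires on a line mentioning `120s`, so a stricter match may be needed when checking logs produced after the timeout is raised.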
Test plan
- `cargo build -p run_task_module` locally
- Check for `delete-request failed` lines over at least 1h of task activity