Fix #433 Prevent per-page hangs & avoid killing job on maxbackoff #438
akshan-main wants to merge 1 commit into allenai:main
Thanks for this suggestion, let me think on it for a day or two. The reason the job exits now is that in the huge runs we do, with hundreds of millions of documents, I found it easier to have the job die and surface an obvious error right away than to have half-complete or empty files generated when some consistent backend issue occurred. We have hit weird cluster issues where jobs worked fine, then produced empty or largely incomplete jsonl result files, then went back to working, and that wasn't fun. Can you explain more about the cases you ran into?
Hey, I get why you’d rather crash early in giant runs. In my case the problem wasn’t bad output; there was no output at all, because of a hang. apost() waits on socket reads without a timeout, so if the server stalls mid-response, the coroutine blocks forever (there is no per-request deadline). With concurrency effectively at 1, it looks like it’s stuck on the last page, but it’s really just whichever page hit the wedged request first. That’s why I think the timeout is important. For the max-backoff case, I replaced sys.exit(1) because there is already fallback handling, and I didn’t want a one-off failure to kill the entire PDF. But let me know if it’s better to make that behavior opt-in (behind a flag) or to add a threshold so repeated failures still stop the job loudly. I can align my solution with that and open a PR for it as well.
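To make the threshold option concrete, here is a minimal sketch. It assumes a hypothetical FailureBudget helper (not part of this PR or the existing code) that tolerates isolated page failures but still exits loudly once too many occur in a row; the limit of 3 is an arbitrary illustration.

```python
class FailureBudget:
    """Hypothetical helper: allow one-off failures, abort on a streak."""

    def __init__(self, max_consecutive: int) -> None:
        self.max_consecutive = max_consecutive
        self.consecutive = 0

    def record(self, ok: bool) -> None:
        if ok:
            # Any success resets the streak, so isolated failures are tolerated.
            self.consecutive = 0
            return
        self.consecutive += 1
        if self.consecutive >= self.max_consecutive:
            # Same effect as sys.exit(1): the job stops with a non-zero status.
            raise SystemExit(1)


budget = FailureBudget(max_consecutive=3)

# Scattered failures do not stop the job.
for ok in [True, False, True, False, False]:
    budget.record(ok)
streak_after_mixed_run = budget.consecutive

# A third failure in a row does.
aborted = False
try:
    budget.record(False)
except SystemExit:
    aborted = True
```

With this shape, a single wedged page falls through to the fallback path, while a persistent backend problem still surfaces as an obvious job failure.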
Closes #433
Changes proposed in this pull request:
- apost() now takes a timeout_s param and wraps the entire network path in asyncio.timeout(), so a stalled server can't block forever
- Hitting max backoff now returns None instead of calling sys.exit(1); the existing fallback path (make_fallback_result) handles it from there, so the rest of the PDF still gets processed
- Adds a --request_timeout_s CLI flag (default 120s) to control the per-request timeout
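The timeout-plus-fallback behavior described in this PR can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: stalled_request, fetch_page, fallback_result, and process_page are hypothetical stand-ins, and asyncio.wait_for is used instead of asyncio.timeout() only so the sketch also runs on Python versions before 3.11.

```python
import argparse
import asyncio
from typing import Optional

# CLI flag as described in the PR: per-request timeout, default 120 seconds.
parser = argparse.ArgumentParser()
parser.add_argument("--request_timeout_s", type=float, default=120.0)
args = parser.parse_args([])


async def stalled_request() -> str:
    # Hypothetical stand-in for a server that stops sending mid-response.
    await asyncio.sleep(3600)
    return "never reached"


async def fetch_page(timeout_s: float) -> Optional[str]:
    """Return the response text, or None if the deadline expires."""
    try:
        # One deadline covers the whole request, not just a single read.
        return await asyncio.wait_for(stalled_request(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None


def fallback_result(page_num: int) -> dict:
    # Hypothetical placeholder; the real make_fallback_result is not shown here.
    return {"page": page_num, "text": "", "is_fallback": True}


async def process_page(page_num: int, timeout_s: float) -> dict:
    text = await fetch_page(timeout_s)
    if text is None:
        # A timed-out page degrades to a fallback instead of killing the job.
        return fallback_result(page_num)
    return {"page": page_num, "text": text, "is_fallback": False}


# A very short timeout keeps the demonstration fast.
result = asyncio.run(process_page(1, timeout_s=0.05))
```

Here the stalled request is cancelled once the deadline passes, and the page is recorded as a fallback so the remaining pages can still be processed.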