fix(onboard): increase endpoint probe timeout for large model inference #1080
Closed
The onboard endpoint validation sends a full inference request to verify the provider is reachable. The 20s max-time was too tight for large models like nemotron-3-super-120b-a12b on NVIDIA Endpoints, causing the probe to time out and onboard to fail in non-interactive mode. Increase connect-timeout from 5s to 10s and max-time from 20s to 60s. This only runs once during onboard, so the longer timeout is acceptable. This probe was added in #648 (March 24) but has never run successfully in the nightly e2e because the e2e has been broken since March 23.
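As a rough illustration of the change described above, the timing arguments might be assembled along these lines. This is only a sketch: `getCurlTimingArgs()` is named in the PR summary, but its return shape (an array of string arguments), the constant names, and the `isCurlTimeout` helper are assumptions made for this example, not the repository's actual code. The `--connect-timeout` and `--max-time` flags and exit code 28 (operation timed out) are standard curl behavior.

```javascript
// Hypothetical sketch; the real getCurlTimingArgs() in the repository may differ.
const CONNECT_TIMEOUT_SECONDS = 10; // previously 5
const MAX_TIME_SECONDS = 60; // previously 20

// Returns the curl flags that bound the connection phase and the whole transfer.
function getCurlTimingArgs() {
  return [
    "--connect-timeout", String(CONNECT_TIMEOUT_SECONDS),
    "--max-time", String(MAX_TIME_SECONDS),
  ];
}

// curl exits with code 28 when an operation times out.
const CURL_EXIT_TIMEOUT = 28;

// Assumed helper: tells a probe timeout apart from other curl failures.
function isCurlTimeout(exitCode) {
  return exitCode === CURL_EXIT_TIMEOUT;
}
```

Keeping the values in named constants would let the test assertions and the probe share one source of truth, though the PR itself only states that the numbers changed.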
Walkthrough
Updated curl timeout parameters in the network probe utility from 5 to 10 seconds for connection timeout and 20 to 60 seconds for maximum time. Corresponding test assertions were updated to match the new timeout values.
Summary
- Increase `--connect-timeout` from 5s to 10s and `--max-time` from 20s to 60s in `getCurlTimingArgs()`
- The 20s max-time was too tight for large models like `nvidia/nemotron-3-super-120b-a12b` on NVIDIA Endpoints, causing a curl timeout (exit 28) and non-interactive onboard failure.
Test plan
- `cloud-e2e` job passes the endpoint validation step (requires #1079, "fix(install): upgrade Node.js via nvm when system version is below minimum", to also be merged for the Node.js install fix)
- `credential-exposure.test.js` updated
Summary by CodeRabbit
Bug Fixes
Tests