fix(onboard): increase endpoint probe timeout for large model inference by ericksoa · Pull Request #1080 · NVIDIA/NemoClaw

ericksoa · 2026-03-30T02:17:56Z

Summary

Increase --connect-timeout from 5s to 10s and --max-time from 20s to 60s in getCurlTimingArgs()
The onboard endpoint probe sends a full inference request to validate the provider. 20s was too tight for nvidia/nemotron-3-super-120b-a12b on NVIDIA Endpoints, causing a curl timeout (exit 28) and non-interactive onboard failure.
This probe was added in #648 (feat: expand provider onboarding and validation, March 24) but has never passed in the nightly e2e. The e2e has been broken since March 23 for unrelated reasons, so this code path was never validated in CI.
The probe runs only once during onboard, so the longer timeout has no UX impact.
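
For illustration, the change can be sketched as below. The body of getCurlTimingArgs() is not shown on this page, so the return shape (an argv-style array of strings) and the probe URL are assumptions; only the function name and the four values come from the summary above. Note that curl's --max-time bounds the whole transfer, connection phase included, so the worst-case wait per probe is 60s rather than 70s.

```javascript
// Hypothetical sketch; the real getCurlTimingArgs() in bin/lib/onboard.js may differ.
// Assumption: it returns curl flags as an argv-style array of strings.
function getCurlTimingArgs() {
  return [
    "--connect-timeout", "10", // was 5: seconds allowed for the connection phase
    "--max-time", "60",        // was 20: seconds allowed for the whole transfer
  ];
}

// Assemble a probe command line without running it (the URL is a placeholder).
const probeArgs = ["-sS", ...getCurlTimingArgs(), "https://example.invalid/v1/chat/completions"];
console.log(["curl", ...probeArgs].join(" "));
```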

Test plan

Nightly e2e cloud-e2e job passes the endpoint validation step. This also requires #1079 (fix(install): upgrade Node.js via nvm when system version is below minimum) to be merged for the Node.js install fix.
Unit tests pass (credential-exposure.test.js updated)

Summary by CodeRabbit

Bug Fixes
- Increased network probe timeout values to improve reliability during network operations.
Tests
- Updated test assertions to reflect network probe timeout changes.

The onboard endpoint validation sends a full inference request to verify the provider is reachable. The 20s max-time was too tight for large models like nemotron-3-super-120b-a12b on NVIDIA Endpoints, causing the probe to time out and onboard to fail in non-interactive mode. Increase connect-timeout from 5s to 10s and max-time from 20s to 60s. This only runs once during onboard, so the longer timeout is acceptable. This probe was added in #648 (March 24) but has never run successfully in the nightly e2e because the e2e has been broken since March 23.
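
The failure described here surfaces as curl exit status 28, which is curl's code for an operation timeout. As a minimal sketch of how a caller might tell that case apart from other probe failures, the helper below maps a few well-known curl exit statuses to labels. The helper name and the labels are hypothetical and are not part of this PR.

```javascript
// Hypothetical helper, not part of this PR: map a curl exit status to a probe outcome.
function classifyCurlExit(status) {
  switch (status) {
    case 0:
      return "ok";
    case 6:
      return "dns-failure"; // curl: could not resolve host
    case 7:
      return "connect-failure"; // curl: failed to connect to host
    case 28:
      return "timeout"; // curl: operation timeout (--connect-timeout or --max-time hit)
    default:
      return "error";
  }
}

console.log(classifyCurlExit(28)); // prints "timeout"
```

Separating the timeout case lets a non-interactive onboard report that the endpoint was slow rather than unreachable.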

coderabbitai · 2026-03-30T02:18:13Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 542713b2-8fcd-484a-b23d-a0e9bacd0f45

📥 Commits

Reviewing files that changed from the base of the PR and between a146385 and 7875f3e.

📒 Files selected for processing (2)

bin/lib/onboard.js
test/credential-exposure.test.js

📝 Walkthrough

Updated curl timeout parameters in the network probe utility from 5 to 10 seconds for connection timeout and 20 to 60 seconds for maximum time. Corresponding test assertions were updated to match the new timeout values.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Curl Timeout Configuration: `bin/lib/onboard.js`, `test/credential-exposure.test.js` | Increased curl timing arguments in network probe from `--connect-timeout 5` and `--max-time 20` to `--connect-timeout 10` and `--max-time 60`. Updated test assertions to verify the new timeout values. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

🐰 Whiskers twitching with delight,
Timeouts doubled, oh what sight!
Ten and sixty now take flight,
Network probes sleep through the night! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: increasing endpoint probe timeouts in the onboard function for better compatibility with large model inference. |

ericksoa requested a review from cv March 30, 2026 02:17

ericksoa marked this pull request as draft March 30, 2026 02:18

ericksoa had a problem deploying to NVIDIA_API_KEY March 30, 2026 02:19 — with GitHub Actions Failure

ericksoa closed this Mar 30, 2026

ericksoa deleted the fix/onboard-probe-timeout branch March 30, 2026 02:23

ericksoa wants to merge 1 commit into main from fix/onboard-probe-timeout
