Pull requests: NVIDIA/nvidia-resiliency-ext
Pull requests list
Refactor log aggregation: add writer thread and persistent reader
ci-approved
Approved to run CI
#254
opened Jan 29, 2026 by
hexinw-nvidia
Used multi_set for group rank assignment.
ci-approved
Approved to run CI
#252
opened Jan 28, 2026 by
hexinw-nvidia
feat: NVRX Attribution Service and NVRX Slurm Monitor Service
ci-approved
Approved to run CI
#248
opened Jan 20, 2026 by
namitdhameja
Include Source Git Hash in NVRx Installation
ci-approved
Approved to run CI
#233
opened Dec 10, 2025 by
continue-revolution
InJob: Include Source Git Hash in NVRx Installation
ci-approved
Approved to run CI
#229
opened Dec 8, 2025 by
continue-revolution
Infra HC service over UDS
ci-approved
Approved to run CI
#227
opened Dec 6, 2025 by
namitdhameja
Add cycle tracking and REST API for failure attribution
#217
opened Nov 4, 2025 by
hexinw-nvidia
Draft
[FR attribution] FR logic update to remove any use of PG description but window-based ordering
ci-approved
Approved to run CI
#216
opened Nov 4, 2025 by
sbak5
feat: add non-retryable exception pattern matching
#212
opened Oct 28, 2025 by
hexinw-nvidia
Add example for multimodal models
ci-approved
Approved to run CI
#131
opened Jul 25, 2025 by
Ava-A4098