GitHub API failures: "dial tcp 140.82.112.6:443: i/o timeout" #215

@jameslamb

Description

Recently we've seen a handful of CI failures that look like timeouts connecting to the GitHub API.

They've covered many different operations with the GitHub API, and not all in code we control.

failed to get run: Get "https://api.github.com/repos/rapidsai/ucxx/actions/runs/17744054101?exclude_pull_requests=true": dial tcp 140.82.113.6:443: i/o timeout
jq: parse error: Invalid numeric literal at line 2, column 2

(Sep 15, 2025 - rapidsai/ucxx - conda-cpp-build - "C++ build" stage)

failed to get run: Get "https://api.github.com/repos/rapidsai/ucxx/actions/runs/17744054101?exclude_pull_requests=true": dial tcp 140.82.114.6:443: i/o timeout
jq: parse error: Invalid numeric literal at line 2, column 2
failed to get run: Get "https://api.github.com/repos/rapidsai/ucxx/actions/runs/17744054101?exclude_pull_requests=true": dial tcp 140.82.113.5:443: i/o timeout
Error: Process completed with exit code 5.

(Sep 15, 2025 - rapidsai/ucxx - conda-cpp-build - "C++ build" stage)

Run if ! type gh >/dev/null; then
Get "https://api.github.com/rate_limit": dial tcp 140.82.113.6:443: i/o timeout
Error: Process completed with exit code 1.

(Sep 15, 2025 - rapidsai/ucxx - wheel-build-ucxx - "Check GitHub API Rate Limits" stage)

Download action repository 'aws-actions/configure-aws-credentials@7474bc4690e29a8392af63c5b98e7449536d5c3a' (SHA:7474bc4690e29a8392af63c5b98e7449536d5c3a)
Warning: Failed to download action 'https://api.github.com/repos/aws-actions/configure-aws-credentials/tarball/7474bc4690e29a8392af63c5b98e7449536d5c3a'. Error: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing. 
Warning: Back off 18.513 seconds before retry.
Warning: Failed to download action 'https://api.github.com/repos/aws-actions/configure-aws-credentials/tarball/7474bc4690e29a8392af63c5b98e7449536d5c3a'. Error: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing. 
Warning: Back off 16.196 seconds before retry.
Error: Action 'https://api.github.com/repos/aws-actions/configure-aws-credentials/tarball/7474bc4690e29a8392af63c5b98e7449536d5c3a' download has timed out. Error: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing. 

(Sep 15, 2025 - rapidsai/ucxx - wheel-build-distributed-ucxx - "Set up Job" stage)

failed to get run: Get "https://api.github.com/repos/rapidsai/rapidsmpf/actions/runs/18000623832?exclude_pull_requests=true": dial tcp 140.82.112.6:443: i/o timeout
jq: parse error: Invalid numeric literal at line 2, column 2

(Sep 25, 2025 - rapidsai/rapidsmpf - conda-python-build - "Build Python" stage)

Opening this to track possible remediations.

Reproducible Example

Hard to reproduce... we've noticed this at random, with no obvious pattern.

It always seems to be resolved by a re-run a few hours later.

Notes

Cases like the "Set up Job" stage failing suggest that some of the failures are upstream of RAPIDS-controlled code... either networking issues with NVIDIA's self-hosted runners or something on GitHub's side (like a synchronization issue between load balancers and back-end servers).

Opening this here because some possible workarounds might involve changes to gha-tools scripts (e.g. more / longer retrying, fewer overall network calls).
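To make the "more / longer retrying" idea concrete, here is a minimal sketch of a retry wrapper with exponential backoff. The function name `retry_with_backoff` and its argument order are hypothetical, made up for illustration. This is not an existing gha-tools script and may not match how gha-tools retries today.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of gha-tools): retry a command with
# exponential backoff between attempts.
#
# Usage: retry_with_backoff <max_attempts> <initial_delay_seconds> <command> [args...]
retry_with_backoff() {
  local max_attempts="$1"
  local delay="$2"
  shift 2

  local attempt=1
  local status=0
  while true; do
    # Run the command; stop retrying as soon as it succeeds.
    "$@" && return 0
    status=$?

    if (( attempt >= max_attempts )); then
      echo "retry_with_backoff: giving up after ${attempt} attempts (exit code ${status})" >&2
      return "${status}"
    fi

    echo "retry_with_backoff: attempt ${attempt}/${max_attempts} failed (exit code ${status}); sleeping ${delay}s" >&2
    sleep "${delay}"

    # Double the wait before the next attempt.
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}
```

For example, `retry_with_backoff 5 2 gh api rate_limit` would try up to 5 times, sleeping 2, 4, 8, and 16 seconds between attempts. Since the failures reported here seem to clear only after hours, a wrapper like this would probably help with short blips but not with a longer outage.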

Labels: bug
