google-cloud-auth: permanent credential failure after transient network failures #4541

@johnsonj

Summary

The google-cloud-auth library can get into a state where authentication credentials are permanently marked as failed after transient token refresh failures (for example, DNS fails for a period of time). Once in this state, the client never recovers, even after the underlying network issue is resolved.

The credentials used to create the client are still valid, so retrying would be safe. Because of this behavior, however, the only workaround is to recreate the entire client.

The error shows up as: "cannot create the authentication headers All retry attempts to fetch the token were exhausted. Subsequent calls with this credential will also fail." All future attempts with the same client fail as well.

Reproduction

This repro uses a mock GCE Metadata Service (MDS) server that proxies to real ADC credentials but returns tokens with a shorter expiration time. Initially it serves tokens normally; then, for a period of time, it closes each TCP connection immediately.

gist here

[0-4s]  ✓ SUCCESS - Real GCS requests work
[5-19s] ✗ OUTAGE - Mock server drops connections
[20s+]  ✗ RECOVERY - Server works, but client never asks for tokens!

^^^ BUG: Client is permanently poisoned ^^^
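
The outage phase can be approximated without the gist. The following is a minimal sketch, not the gist's code: `spawn_flaky_server` and the `outage` flag are hypothetical names, the token JSON is a fake placeholder (the real repro proxies ADC tokens), and only the Rust standard library is used.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpListener};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Starts a tiny HTTP server on a random local port (hypothetical helper).
/// While `outage` is true, every accepted connection is closed immediately,
/// which is the failure mode described in the repro.
pub fn spawn_flaky_server(outage: Arc<AtomicBool>) -> std::io::Result<SocketAddr> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    thread::spawn(move || {
        for stream in listener.incoming() {
            let Ok(mut stream) = stream else { continue };
            if outage.load(Ordering::SeqCst) {
                // Simulated outage: drop the connection without responding.
                drop(stream);
                continue;
            }
            // Read (and ignore) the request, then return a short-lived fake token.
            let mut buf = [0u8; 1024];
            let _ = stream.read(&mut buf);
            let body = r#"{"access_token":"fake","expires_in":5,"token_type":"Bearer"}"#;
            let _ = write!(
                stream,
                "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
                body.len(),
                body
            );
        }
    });
    Ok(addr)
}
```

Flipping the shared `outage` flag from a test or a timer reproduces the three phases in the timeline above. Pointing a real client at this server is left out here, since that setup lives in the gist.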

Root Cause

In src/auth/src/retry.rs:

// After exhausting retries, the error is marked as non-transient
let transient = remaining > 0;
return Err(CredentialsError::new(transient, source));

The problem: once transient = false, this flag is permanent. The cached credential state is never cleared, so the library never attempts to refresh credentials again.
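
To make the failure mode concrete, here is a small self-contained model of it. This is a sketch based only on the description above, not the library's code: `SketchError`, `fetch_with_retry`, `TokenCache`, and the `poison_on_permanent` switch are all hypothetical names, and the fixed limit of 3 attempts is an assumption.

```rust
/// Hypothetical stand-in for `CredentialsError`; only the flag matters here.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchError {
    pub transient: bool,
    pub message: String,
}

/// Calls `fetch` until it succeeds or the attempts run out. As in the snippet
/// quoted above, the error returned once `remaining` hits zero is non-transient.
pub fn fetch_with_retry<F>(max_attempts: u32, mut fetch: F) -> Result<String, SketchError>
where
    F: FnMut() -> Result<String, String>,
{
    let mut remaining = max_attempts;
    loop {
        match fetch() {
            Ok(token) => return Ok(token),
            Err(source) => {
                remaining = remaining.saturating_sub(1);
                let transient = remaining > 0;
                if !transient {
                    return Err(SketchError { transient, message: source });
                }
            }
        }
    }
}

/// Models the cached credential state (assumed shape, not the real cache).
pub struct TokenCache {
    last_error: Option<SketchError>,
    poison_on_permanent: bool,
}

impl TokenCache {
    pub fn new(poison_on_permanent: bool) -> Self {
        Self { last_error: None, poison_on_permanent }
    }

    /// Returns a token, treating any previous token as already expired.
    pub fn token<F>(&mut self, fetch: F) -> Result<String, SketchError>
    where
        F: FnMut() -> Result<String, String>,
    {
        if self.poison_on_permanent {
            if let Some(err) = &self.last_error {
                if !err.transient {
                    // Behavior reported in this issue: `fetch` is never called again.
                    return Err(err.clone());
                }
            }
        }
        let result = fetch_with_retry(3, fetch);
        self.last_error = result.as_ref().err().cloned();
        result
    }
}
```

With `poison_on_permanent` set to `true`, a cache that has seen one exhausted retry loop keeps returning the stored error even when the fetch closure would now succeed. With `false`, the next call retries and recovers. The second mode is only one possible direction for a fix, not a description of what the maintainers plan to do.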

Workaround

Detect permanent auth failures and recreate the GCS client:

if is_permanent_auth_failure(&error) {
    client = Storage::builder().build().await?; // Recreate!
}

...
fn is_permanent_auth_failure(error: &GcsError) -> bool {
    use std::error::Error;

    if !error.is_authentication() {
        return false;
    }

    let mut source = error.source();
    while let Some(err) = source {
        if let Some(cred_err) = err.downcast_ref::<CredentialsError>() {
            return !cred_err.is_transient();
        }
        source = err.source();
    }

    false
}

Environment

  • google-cloud-auth: 1.x
  • google-cloud-gax: 1.4.x
  • google-cloud-storage: 1.x
  • Rust: 1.75+

Labels

  • auth (Issues related to the auth library)
  • priority: p1 (Important issue which blocks shipping the next release. Will be fixed prior to next release.)
  • type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
