Skip to content

Fix #779: don't log SAS-token rotation as a broken connection#866

Merged
Nox-MSFT merged 1 commit intodevelopfrom
user/msft/nox-msft/fix-connection-flow
Apr 27, 2026
Merged

Fix #779: don't log SAS-token rotation as a broken connection#866
Nox-MSFT merged 1 commit intodevelopfrom
user/msft/nox-msft/fix-connection-flow

Conversation

@Nox-MSFT
Copy link
Copy Markdown
Collaborator

The IoT Hub C SDK fires the connection-status callback with reason IOTHUB_CLIENT_CONNECTION_EXPIRED_SAS_TOKEN as part of its normal token-rotation flow (~every 48 minutes for symmetric-key auth). The agent's UNAUTHENTICATED handler logged "IoTHub connection is broken." unconditionally, producing ~30 alarming errors per day on healthy devices and creating noise during support triage.

Distinguish the benign SAS-token rotation case from a real failure: introduce an internal helper IoTHub_CommunicationManager_Categorize- Unauthenticated() that maps EXPIRED_SAS_TOKEN to a transient category and everything else to "broken". The callback now logs Info ("SAS token expired; SDK will renew the connection.") for the transient case and leaves g_first_unauthenticated_time untouched so a real outage that follows still trips the existing "broken for N seconds" escalation branch. All other reasons retain the original error wording and state-tracking behavior; the retry-delay logic in PerformChannelManagement (which already special-cases EXPIRED_SAS_TOKEN to a 15s retry) is unchanged.

Add unit tests in iothub_communication_manager_ut.cpp:

  • "Issue Expired SAS token logs error #779: SAS token expiry must not be categorized as 'broken'" -- direct regression assertion over the helper for EXPIRED_SAS_TOKEN plus six other reasons.
  • "Issue Expired SAS token logs error #779: SAS expiry callback is handled benignly" -- drives the public callback through an authenticated -> SAS-expiry -> SAS-expiry sequence and asserts IsAuthenticated() flips to false without crashing.
  • Update the existing "exercises both unauthenticated sub-branches" case to use BAD_CREDENTIAL for the broken-branch coverage, since EXPIRED_SAS_TOKEN no longer takes that path.

The IoT Hub C SDK fires the connection-status callback with reason
IOTHUB_CLIENT_CONNECTION_EXPIRED_SAS_TOKEN as part of its normal
token-rotation flow (~every 48 minutes for symmetric-key auth). The
agent's UNAUTHENTICATED handler logged "IoTHub connection is broken."
unconditionally, producing ~30 alarming errors per day on healthy
devices and creating noise during support triage.

Distinguish the benign SAS-token rotation case from a real failure:
introduce an internal helper IoTHub_CommunicationManager_Categorize-
Unauthenticated() that maps EXPIRED_SAS_TOKEN to a transient category
and everything else to "broken". The callback now logs Info ("SAS
token expired; SDK will renew the connection.") for the transient
case and leaves g_first_unauthenticated_time untouched so a real
outage that follows still trips the existing "broken for N seconds"
escalation branch. All other reasons retain the original error
wording and state-tracking behavior; the retry-delay logic in
PerformChannelManagement (which already special-cases
EXPIRED_SAS_TOKEN to a 15s retry) is unchanged.

Add unit tests in iothub_communication_manager_ut.cpp:
  - "Issue #779: SAS token expiry must not be categorized as
    'broken'" -- direct regression assertion over the helper for
    EXPIRED_SAS_TOKEN plus six other reasons.
  - "Issue #779: SAS expiry callback is handled benignly" -- drives
    the public callback through an authenticated -> SAS-expiry ->
    SAS-expiry sequence and asserts IsAuthenticated() flips to false
    without crashing.
  - Update the existing "exercises both unauthenticated sub-branches"
    case to use BAD_CREDENTIAL for the broken-branch coverage, since
    EXPIRED_SAS_TOKEN no longer takes that path.
@Nox-MSFT Nox-MSFT requested review from chgennar and mirskiy April 27, 2026 21:10
@Nox-MSFT Nox-MSFT self-assigned this Apr 27, 2026
@Nox-MSFT Nox-MSFT merged commit 3e8f0ee into develop Apr 27, 2026
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants