Fix #779: don't log SAS-token rotation as a broken connection (#866)
Merged
The IoT Hub C SDK fires the connection-status callback with reason
IOTHUB_CLIENT_CONNECTION_EXPIRED_SAS_TOKEN as part of its normal
token-rotation flow (~every 48 minutes for symmetric-key auth). The
agent's UNAUTHENTICATED handler logged "IoTHub connection is broken."
unconditionally, producing ~30 alarming errors per day on healthy
devices and creating noise during support triage.
Distinguish the benign SAS-token rotation case from a real failure:
introduce an internal helper,
IoTHub_CommunicationManager_CategorizeUnauthenticated(), that maps
EXPIRED_SAS_TOKEN to a transient category
and everything else to "broken". The callback now logs Info ("SAS
token expired; SDK will renew the connection.") for the transient
case and leaves g_first_unauthenticated_time untouched so a real
outage that follows still trips the existing "broken for N seconds"
escalation branch. All other reasons retain the original error
wording and state-tracking behavior; the retry-delay logic in
PerformChannelManagement (which already special-cases
EXPIRED_SAS_TOKEN to a 15s retry) is unchanged.
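
The split described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Sketch*` enums, `CategorizeUnauthenticated()` and `OnUnauthenticated()` are hypothetical stand-ins for the SDK reason enum, the new helper and the agent's callback, and recording the first-failure timestamp on the broken path is an assumption about the existing state tracking. Only `g_first_unauthenticated_time` and the two log strings are taken from the description.

```c
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for the SDK's connection-status reason enum. */
typedef enum {
    SKETCH_REASON_EXPIRED_SAS_TOKEN,
    SKETCH_REASON_BAD_CREDENTIAL,
    SKETCH_REASON_DEVICE_DISABLED,
    SKETCH_REASON_NO_NETWORK,
    SKETCH_REASON_COMMUNICATION_ERROR
} SketchReason;

/* Hypothetical category type returned by the helper. */
typedef enum {
    SKETCH_UNAUTH_TRANSIENT, /* benign: the SDK renews the token itself */
    SKETCH_UNAUTH_BROKEN     /* anything else is treated as a real failure */
} SketchUnauthCategory;

/* Only SAS-token expiry is transient; every other reason stays "broken". */
static SketchUnauthCategory CategorizeUnauthenticated(SketchReason reason)
{
    return (reason == SKETCH_REASON_EXPIRED_SAS_TOKEN)
               ? SKETCH_UNAUTH_TRANSIENT
               : SKETCH_UNAUTH_BROKEN;
}

static bool g_is_authenticated = false;
static time_t g_first_unauthenticated_time = 0; /* 0 means "not set" */

/* Sketch of the UNAUTHENTICATED branch of the connection-status callback. */
static void OnUnauthenticated(SketchReason reason, time_t now)
{
    g_is_authenticated = false;

    if (CategorizeUnauthenticated(reason) == SKETCH_UNAUTH_TRANSIENT) {
        /* Info only; the outage timer is deliberately left untouched. */
        printf("Info: SAS token expired; SDK will renew the connection.\n");
        return;
    }

    /* Assumed state tracking: remember when the first real failure began. */
    if (g_first_unauthenticated_time == 0) {
        g_first_unauthenticated_time = now;
    }
    fprintf(stderr, "Error: IoTHub connection is broken.\n");
}
```

Because the transient path returns before touching the timestamp, a genuine failure that arrives after a rotation still starts the outage clock from its own first occurrence.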
Add unit tests in iothub_communication_manager_ut.cpp:
- "Issue #779: SAS token expiry must not be categorized as
'broken'" -- direct regression assertion over the helper for
EXPIRED_SAS_TOKEN plus six other reasons.
- "Issue #779: SAS expiry callback is handled benignly" -- drives
the public callback through an authenticated -> SAS-expiry ->
SAS-expiry sequence and asserts IsAuthenticated() flips to false
without crashing.
- Update the existing "exercises both unauthenticated sub-branches"
case to use BAD_CREDENTIAL for the broken-branch coverage, since
EXPIRED_SAS_TOKEN no longer takes that path.
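
The first regression test above amounts to a table-driven check over the helper. A rough sketch in plain C, without the test framework used by iothub_communication_manager_ut.cpp, might look like this. All names here are hypothetical, and the six non-SAS reasons are assumed for illustration; the description does not say which six the real test covers.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the SDK reason enum and the helper's category. */
typedef enum {
    SKETCH_EXPIRED_SAS_TOKEN,
    SKETCH_BAD_CREDENTIAL,
    SKETCH_DEVICE_DISABLED,
    SKETCH_RETRY_EXPIRED,
    SKETCH_NO_NETWORK,
    SKETCH_COMMUNICATION_ERROR,
    SKETCH_NO_PING_RESPONSE
} SketchReason;

typedef enum { SKETCH_TRANSIENT, SKETCH_BROKEN } SketchCategory;

/* Stand-in for the helper under test: only SAS expiry is transient. */
static SketchCategory SketchCategorize(SketchReason reason)
{
    return (reason == SKETCH_EXPIRED_SAS_TOKEN) ? SKETCH_TRANSIENT
                                                : SKETCH_BROKEN;
}

typedef struct {
    SketchReason reason;
    SketchCategory expected;
} SketchCase;

/* Returns how many table entries the helper categorizes incorrectly. */
static size_t CountCategorizeMismatches(void)
{
    static const SketchCase cases[] = {
        { SKETCH_EXPIRED_SAS_TOKEN,   SKETCH_TRANSIENT }, /* the #779 case */
        { SKETCH_BAD_CREDENTIAL,      SKETCH_BROKEN },
        { SKETCH_DEVICE_DISABLED,     SKETCH_BROKEN },
        { SKETCH_RETRY_EXPIRED,       SKETCH_BROKEN },
        { SKETCH_NO_NETWORK,          SKETCH_BROKEN },
        { SKETCH_COMMUNICATION_ERROR, SKETCH_BROKEN },
        { SKETCH_NO_PING_RESPONSE,    SKETCH_BROKEN },
    };
    size_t mismatches = 0;

    for (size_t i = 0; i < sizeof(cases) / sizeof(cases[0]); ++i) {
        if (SketchCategorize(cases[i].reason) != cases[i].expected) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

A zero return means every reason in the table lands in its expected category; keeping the expectations in one table makes it cheap to add a row if another benign reason is identified later.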
chgennar
approved these changes
Apr 27, 2026