fix: use replication/role metric for authoritative node role detection (GTI-608) #23
Merged
ymendez-redis merged 5 commits into main on Apr 24, 2026
The commands/calls metric's 'role' label sometimes reports both nodes as 'replica' for Standard Tier instances. This causes redis2re to calculate 0 bytes for those clusters and fall back to the 0.1 GB minimum.

Query redis.googleapis.com/replication/role (1=primary, 0=replica) after initial data collection to overwrite NodeRole with the authoritative value.

Ref: https://cloud.google.com/memorystore/docs/redis/supported-monitoring-metrics
Fixes: GTI-608
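To illustrate why a wrong role label ends in the 0.1 GB fallback, here is a minimal sketch. It assumes that sizing sums memory over primary nodes only and then applies the minimum. That rule, the function name `estimated_size_gb`, and the record fields `role` and `used_bytes` are hypothetical and are not taken from redis2re.

```python
from typing import Dict, List

GIB = 1024 ** 3
MIN_SIZE_GB = 0.1  # minimum size mentioned in the description above


def estimated_size_gb(nodes: List[Dict]) -> float:
    """Sum memory over primary nodes only, then apply the minimum (assumed logic)."""
    primary_bytes = sum(n["used_bytes"] for n in nodes if n["role"] == "primary")
    return max(primary_bytes / GIB, MIN_SIZE_GB)


# Both nodes mislabelled as replica: nothing is counted, so the minimum applies.
mislabelled = [
    {"role": "replica", "used_bytes": 4 * GIB},
    {"role": "replica", "used_bytes": 4 * GIB},
]
# Same instance with the true primary identified.
corrected = [
    {"role": "primary", "used_bytes": 4 * GIB},
    {"role": "replica", "used_bytes": 4 * GIB},
]

print(estimated_size_gb(mislabelled))  # 0.1
print(estimated_size_gb(corrected))    # 4.0
```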
ymendez-redis commented on Apr 24, 2026
  )

- (options, _) = parser.parse_args()
+ options, _ = parser.parse_args()
Collaborator
Author
formatted because checks were not passing
Problem
memorystore.py determines node roles (Master/Replica) by reading the role label from redis.googleapis.com/commands/calls. After a failover, GCP does not immediately update this label, causing both nodes to report as replica. This affects ~93 Standard Tier instances, resulting in ~408.5 GB of missed memory in downstream calculations.

Root Cause
The role label on commands/calls is metadata, not designed to authoritatively report node roles. A dedicated metric exists: redis.googleapis.com/replication/role (1=primary, 0=replica).

Fix
- Added replication_role to REDIS_METRICS
- Added an _attach_node_role() function that queries replication/role and overwrites the unreliable role label
- collect_for_product()

Testing: Reproduced on live GCP instance
Triggered a manual failover on memorystore-redis-instance (Standard HA) in redislabs-sales-pivotal:
scan_115638_NO_FIX.csv
scan_115638_WITH_FIX.csv

Raw GCP metrics after failover confirmed the discrepancy:
Ref: https://cloud.google.com/memorystore/docs/redis/supported-monitoring-metrics
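The overwrite step described under Fix could look roughly like the sketch below. It is not the code from this PR. The function `attach_node_role`, the row keys `InstanceId` and `NodeId`, and the role string "primary" are assumptions made for illustration. Only NodeRole, the metric type, and the 1/0 encoding come from the description. The Cloud Monitoring query itself is left out, and the latest metric values are passed in as a plain dictionary.

```python
from typing import Dict, List, Tuple

# Cloud Monitoring filter selecting the dedicated role metric named in the PR.
ROLE_METRIC_FILTER = 'metric.type = "redis.googleapis.com/replication/role"'


def role_from_value(value: float) -> str:
    """Map the metric value to a role name: 1 = primary, 0 = replica."""
    return "primary" if int(value) == 1 else "replica"


def attach_node_role(
    rows: List[Dict[str, str]],
    latest_role_value: Dict[Tuple[str, str], float],
) -> List[Dict[str, str]]:
    """Overwrite NodeRole in each row using the replication/role value.

    Rows without a matching data point keep the role they already have.
    The (InstanceId, NodeId) key is a hypothetical way to match rows to nodes.
    """
    for row in rows:
        key = (row["InstanceId"], row["NodeId"])
        if key in latest_role_value:
            row["NodeRole"] = role_from_value(latest_role_value[key])
    return rows


# After a failover, both nodes carry a stale 'replica' label.
rows = [
    {"InstanceId": "inst-1", "NodeId": "node-0", "NodeRole": "replica"},
    {"InstanceId": "inst-1", "NodeId": "node-1", "NodeRole": "replica"},
]
fixed = attach_node_role(rows, {("inst-1", "node-0"): 1, ("inst-1", "node-1"): 0})
print([r["NodeRole"] for r in fixed])  # ['primary', 'replica']
```

Leaving rows untouched when no data point is available means a missing metric degrades to the previous behaviour instead of dropping the node.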