dog-64 commented Nov 23, 2025

Problem

The current implementation has a critical issue with gauge-type metrics like is_banned and is_paused:

Gauge metrics retain their last reported value indefinitely until explicitly updated. This means:

  • A server is banned and its gauge is set to is_banned=1
  • Information about that server later becomes unavailable, so the gauge is no longer updated
  • The metric keeps showing is_banned=1 until it is explicitly set to 0, which creates a false positive in monitoring: the server appears banned even after it has been unbanned

This is especially problematic because state metrics should always reflect the current actual state, not a stale cached value.
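
The stale-gauge behavior described above can be reproduced with a small sketch. It does not use the project's real metrics code: `Gauges`, `export_if_info_available`, and the server name `db1` are hypothetical stand-ins, and a `HashMap` plays the role of the gauge registry.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a gauge registry:
/// maps (metric name, server name) to the last value that was set.
#[derive(Default)]
struct Gauges {
    values: HashMap<(String, String), i64>,
}

impl Gauges {
    fn set(&mut self, metric: &str, server: &str, value: i64) {
        self.values
            .insert((metric.to_string(), server.to_string()), value);
    }

    fn get(&self, metric: &str, server: &str) -> Option<i64> {
        self.values
            .get(&(metric.to_string(), server.to_string()))
            .copied()
    }
}

/// Assumed shape of the problematic pattern: the gauge is only written
/// when information about the server is available (`Some`).
fn export_if_info_available(gauges: &mut Gauges, server: &str, is_banned: Option<bool>) {
    if let Some(banned) = is_banned {
        gauges.set("is_banned", server, banned as i64);
    }
}

/// Runs two collection cycles and returns what the gauge reports afterwards.
fn stale_gauge_demo() -> Option<i64> {
    let mut gauges = Gauges::default();
    // Cycle 1: the server is banned and information is available.
    export_if_info_available(&mut gauges, "db1", Some(true));
    // Cycle 2: the server has been unbanned, but no information is available,
    // so the gauge is never written and keeps its old value.
    export_if_info_available(&mut gauges, "db1", None);
    gauges.get("is_banned", "db1")
}

fn main() {
    // The gauge still reports 1 although the server is no longer banned.
    println!("is_banned after cycle 2: {:?}", stale_gauge_demo());
}
```

In this sketch the second cycle never touches the gauge, so the value from the first cycle is what a scrape would still see.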

Solution

This PR refactors push_server_stats() to separate state and activity metrics:

  • state_metrics (is_banned, is_paused) are now exported for every server on every metrics collection cycle

    • Ensures metrics always reflect current state
    • Guarantees stale values are overwritten on the next collection cycle
  • activity_metrics (bytes_received, bytes_sent, etc.) remain conditional

    • Only exported when server_info is available
    • Reduces noise for inactive servers

Impact

  • Prometheus metrics now reflect server state as of the most recent collection cycle
  • No false positives from stale gauge values
  • Monitoring/alerting based on is_banned and is_paused becomes reliable
  • No breaking changes to metric format or API

Testing

  • cargo check passes without errors
  • Metrics endpoint functionality unchanged

Split server metrics collection into two separate loops:
- state_metrics (is_banned, is_paused) are now exported for all servers
- activity_metrics (bytes_received, bytes_sent, etc.) are only exported when server_info is available

This ensures that ban and pause states are always visible in metrics,
even for servers without activity information. This improves observability
for connection pool management and server health monitoring.