Swap usage alarm enhancement #7161

@DavidePrincipi

Description

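The current node-monitor alarm logic is based on the SwapFree/SwapTotal ratio. However, after a high memory usage peak, swap usage stays high and the alarm is never reset: pages moved out to swap remain there until they are accessed again, even after the memory pressure is gone.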
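A minimal sketch of a ratio-only check like the one described above, assuming the alarm fires when the used fraction of swap (1 − SwapFree/SwapTotal) exceeds a threshold. The function names and the 0.8 threshold are hypothetical and not node-monitor's actual code. Because stale pages can keep SwapFree low long after a peak, `ratio_only_alarm` keeps returning `True`, which is the problem this issue describes.

```python
from __future__ import annotations

# Hypothetical threshold: alarm when more than 80% of swap is in use.
USAGE_THRESHOLD = 0.8


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo content into {field: value}; most values are in kB."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_ratio(meminfo: dict[str, int]) -> float:
    """Fraction of swap in use: 1 - SwapFree/SwapTotal (0.0 if no swap is configured)."""
    total = meminfo.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return 1.0 - meminfo.get("SwapFree", 0) / total


def ratio_only_alarm(used_ratio: float, threshold: float = USAGE_THRESHOLD) -> bool:
    """Ratio-only check: stays True as long as pages occupy swap, active or not."""
    return used_ratio > threshold


def read_meminfo(path: str = "/proc/meminfo") -> dict[str, int]:
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

On a Linux host, `ratio_only_alarm(swap_used_ratio(read_meminfo()))` evaluates the check against the live system.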

Improve the monitoring tool's ability to detect high Linux swap usage by adding a secondary factor to the existing SwapFree/SwapTotal ratio check. The enhancement should provide better insight into memory pressure during and after swap peaks.

Requirements:

  1. Current Metric:

    • Continue using the /proc/meminfo SwapFree/SwapTotal ratio to monitor overall swap utilization.
  2. New Metric:

    • Introduce monitoring of /proc/vmstat for:
      • pswpin: Number of pages swapped into memory.
      • pswpout: Number of pages swapped out of memory.
    • Implement tracking of the rate of change in these counters over a configurable interval (e.g., every 10 seconds).
  3. Thresholds:

    • Add configurable thresholds for both pswpin and pswpout rates to trigger alerts when they exceed a certain limit (indicating sustained memory pressure).
  4. Alerting Logic:

    • Trigger an alert if:
      • The swap usage ratio (based on SwapFree/SwapTotal) rises above the defined threshold, and
      • A high rate of pswpin or pswpout events is detected over a defined period (indicating that swap is actively in use, not just holding pages left over from a past peak).
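The requirements above can be sketched as follows, assuming the swap usage ratio is computed separately from /proc/meminfo as 1 − SwapFree/SwapTotal. `SwapAlarmConfig`, its field names, and its default values are hypothetical placeholders for the configurable thresholds and interval, not an existing node-monitor API. Since pswpin and pswpout are cumulative counters (in pages, since boot), the rate is the difference between two samples divided by the elapsed time.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SwapAlarmConfig:
    # Hypothetical settings; names and default values are illustrative only.
    usage_threshold: float = 0.8           # fraction of swap in use
    pswpin_rate_threshold: float = 100.0   # pages swapped in per second
    pswpout_rate_threshold: float = 100.0  # pages swapped out per second
    interval: float = 10.0                 # seconds between vmstat samples


def parse_vmstat(text: str) -> dict[str, int]:
    """Parse /proc/vmstat content ("name value" per line) into a dict."""
    counters: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(prev: dict[str, int], curr: dict[str, int], interval: float) -> tuple[float, float]:
    """Pages/s swapped in and out between two samples taken `interval` seconds apart.

    Negative deltas (e.g. a counter reset) are clamped to 0.
    """
    rin = max(0, curr["pswpin"] - prev["pswpin"]) / interval
    rout = max(0, curr["pswpout"] - prev["pswpout"]) / interval
    return rin, rout


def swap_alert(used_ratio: float, rates: tuple[float, float], cfg: SwapAlarmConfig) -> bool:
    """Alert only if swap is mostly used AND pages are actively moving in or out."""
    rin, rout = rates
    active = rin > cfg.pswpin_rate_threshold or rout > cfg.pswpout_rate_threshold
    return used_ratio > cfg.usage_threshold and active


def read_vmstat(path: str = "/proc/vmstat") -> dict[str, int]:
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_vmstat(f.read())
```

A monitor loop would call `read_vmstat()`, wait `cfg.interval` seconds, sample again, and pass the result of `swap_rates` to `swap_alert`. With no swap activity the alert clears even if stale pages keep the usage ratio high. This sketch judges a single interval; requiring several consecutive high-rate intervals would be one way to implement the "defined period".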

Discussion @nrauso https://mattermost.nethesis.it/nethesis/pl/5kgxh85pep8atgjd1ppce64ikr

Metadata

Labels: verified (All test cases were verified successfully)

Project status: Done


Development: No branches or pull requests
