monitor: fall back to cgroup v2 memory on invalid meminfo#238
monitor: fall back to cgroup v2 memory on invalid meminfo#238pexcn wants to merge 1 commit intonezhahq:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses incorrect memory reporting (notably when /proc/meminfo / gopsutil returns invalid values such as Available > Total) by falling back to cgroup v2 memory metrics, aligning with the reported issue (#1184).
Changes:
- Added a cgroup v2 memory reader (
memory.current/memory.max) and used it to overrideMemTotalwhen available. - Adjusted
MemUsedcalculation to avoid underflow and to fall back to cgroup v2 usage whenAvailable > Total. - Switched swap used calculation to checked subtraction to avoid underflow.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func readCgroupV2Memory() (used uint64, total uint64, ok bool) { | ||
| currentBytes, err := os.ReadFile("/sys/fs/cgroup/memory.current") | ||
| if err != nil { | ||
| return 0, 0, false | ||
| } | ||
|
|
||
| maxBytes, err := os.ReadFile("/sys/fs/cgroup/memory.max") | ||
| if err != nil { |
There was a problem hiding this comment.
readCgroupV2Memory reads fixed paths under /sys/fs/cgroup. On cgroup v2 systems without a cgroup namespace, the process is usually in a sub-cgroup (per /proc/self/cgroup), so /sys/fs/cgroup/memory.{current,max} will report the root cgroup rather than the current process/container limits. Consider resolving the current cgroup v2 path from /proc/self/cgroup and reading memory.current/memory.max from that directory (fall back to /sys/fs/cgroup when parsing fails).
| func readCgroupV2Memory() (used uint64, total uint64, ok bool) { | ||
| currentBytes, err := os.ReadFile("/sys/fs/cgroup/memory.current") | ||
| if err != nil { | ||
| return 0, 0, false | ||
| } | ||
|
|
||
| maxBytes, err := os.ReadFile("/sys/fs/cgroup/memory.max") | ||
| if err != nil { | ||
| return 0, 0, false | ||
| } | ||
|
|
||
| currentStr := strings.TrimSpace(string(currentBytes)) | ||
| maxStr := strings.TrimSpace(string(maxBytes)) | ||
| if currentStr == "" || maxStr == "" || maxStr == "max" { | ||
| return 0, 0, false | ||
| } | ||
|
|
||
| used, err = strconv.ParseUint(currentStr, 10, 64) | ||
| if err != nil { | ||
| return 0, 0, false | ||
| } | ||
|
|
||
| total, err = strconv.ParseUint(maxStr, 10, 64) | ||
| if err != nil || total == 0 { |
There was a problem hiding this comment.
readCgroupV2Memory returns ok=false when memory.max is "max". That prevents using memory.current as a fallback for MemUsed when /proc/meminfo parsing is broken, even though memory.current is still meaningful when there is no hard limit. Suggest separating “can read used” from “can read total” (e.g., return usedOk/totalOk or split into two helpers) so GetState can still fall back to cgroup used independently of whether a max limit exists.
| func readCgroupV2Memory() (used uint64, total uint64, ok bool) { | |
| currentBytes, err := os.ReadFile("/sys/fs/cgroup/memory.current") | |
| if err != nil { | |
| return 0, 0, false | |
| } | |
| maxBytes, err := os.ReadFile("/sys/fs/cgroup/memory.max") | |
| if err != nil { | |
| return 0, 0, false | |
| } | |
| currentStr := strings.TrimSpace(string(currentBytes)) | |
| maxStr := strings.TrimSpace(string(maxBytes)) | |
| if currentStr == "" || maxStr == "" || maxStr == "max" { | |
| return 0, 0, false | |
| } | |
| used, err = strconv.ParseUint(currentStr, 10, 64) | |
| if err != nil { | |
| return 0, 0, false | |
| } | |
| total, err = strconv.ParseUint(maxStr, 10, 64) | |
| if err != nil || total == 0 { | |
| func readCgroupV2MemoryUsed() (used uint64, ok bool) { | |
| currentBytes, err := os.ReadFile("/sys/fs/cgroup/memory.current") | |
| if err != nil { | |
| return 0, false | |
| } | |
| currentStr := strings.TrimSpace(string(currentBytes)) | |
| if currentStr == "" { | |
| return 0, false | |
| } | |
| used, err = strconv.ParseUint(currentStr, 10, 64) | |
| if err != nil { | |
| return 0, false | |
| } | |
| return used, true | |
| } | |
| func readCgroupV2MemoryTotal() (total uint64, ok bool) { | |
| maxBytes, err := os.ReadFile("/sys/fs/cgroup/memory.max") | |
| if err != nil { | |
| return 0, false | |
| } | |
| maxStr := strings.TrimSpace(string(maxBytes)) | |
| if maxStr == "" || maxStr == "max" { | |
| return 0, false | |
| } | |
| total, err = strconv.ParseUint(maxStr, 10, 64) | |
| if err != nil || total == 0 { | |
| return 0, false | |
| } | |
| return total, true | |
| } | |
| func readCgroupV2Memory() (used uint64, total uint64, ok bool) { | |
| used, usedOK := readCgroupV2MemoryUsed() | |
| total, totalOK := readCgroupV2MemoryTotal() | |
| if !usedOK || !totalOK { |
| if _, cgTotal, ok := readCgroupV2Memory(); ok { | ||
| ret.MemTotal = cgTotal | ||
| } |
There was a problem hiding this comment.
GetHost only needs the cgroup memory limit to override MemTotal, but readCgroupV2Memory requires reading/parsing both memory.current and memory.max. This adds unnecessary I/O and can suppress a valid MemTotal override if memory.current is unreadable for any reason. Consider a dedicated helper to read/parse memory.max (and use readCgroupV2MemoryCurrent only where needed).
| vm, err := mem.VirtualMemory() | ||
| if err != nil { | ||
| printf("mem.VirtualMemory error: %v", err) | ||
| } else { | ||
| ret.MemUsed = vm.Total - vm.Available | ||
| if vm.Available > vm.Total { | ||
| if cgUsed, _, ok := readCgroupV2Memory(); ok { | ||
| ret.MemUsed = cgUsed | ||
| } else { | ||
| ret.MemUsed = util.SubUintChecked(vm.Total, vm.Available) | ||
| } | ||
| } else { | ||
| ret.MemUsed = util.SubUintChecked(vm.Total, vm.Available) | ||
| } |
There was a problem hiding this comment.
This change introduces new cgroup v2 parsing and fallback behavior, but there are currently no unit tests covering the parsing edge cases (e.g., trailing newlines, memory.max == "max", invalid numbers, and selecting the correct cgroup directory). Given CI runs go test ./..., consider adding tests by factoring the parsing/path resolution into a pure function and/or injecting a readFile function so it can be tested without relying on /sys/fs/cgroup.
|
看看copilot的review |
|
我晚点看看 |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func readCgroupV2Memory() (used uint64, total uint64, ok bool) { | ||
| currentBytes, err := os.ReadFile("/sys/fs/cgroup/memory.current") | ||
| if err != nil { | ||
| return 0, 0, false | ||
| } | ||
|
|
||
| maxBytes, err := os.ReadFile("/sys/fs/cgroup/memory.max") | ||
| if err != nil { | ||
| return 0, 0, false | ||
| } | ||
|
|
||
| currentStr := strings.TrimSpace(string(currentBytes)) | ||
| maxStr := strings.TrimSpace(string(maxBytes)) | ||
| if currentStr == "" || maxStr == "" || maxStr == "max" { | ||
| return 0, 0, false | ||
| } | ||
|
|
||
| used, err = strconv.ParseUint(currentStr, 10, 64) | ||
| if err != nil { | ||
| return 0, 0, false | ||
| } | ||
|
|
||
| total, err = strconv.ParseUint(maxStr, 10, 64) | ||
| if err != nil || total == 0 { | ||
| return 0, 0, false | ||
| } | ||
|
|
||
| if used > total { | ||
| used = total | ||
| } | ||
|
|
||
| return used, total, true |
There was a problem hiding this comment.
readCgroupV2Memory returns ok=false when memory.max is "max". That prevents using memory.current as a fallback for MemUsed even though it’s still valid/available, so the "invalid meminfo" scenario can still end up reporting 0 used memory. Consider separating reading/parsing of memory.current from the limit (or returning a separate okUsed/okTotal), so callers can still use cgroup used bytes when the limit is unlimited while keeping MemTotal unchanged.
| func readCgroupV2Memory() (used uint64, total uint64, ok bool) { | |
| currentBytes, err := os.ReadFile("/sys/fs/cgroup/memory.current") | |
| if err != nil { | |
| return 0, 0, false | |
| } | |
| maxBytes, err := os.ReadFile("/sys/fs/cgroup/memory.max") | |
| if err != nil { | |
| return 0, 0, false | |
| } | |
| currentStr := strings.TrimSpace(string(currentBytes)) | |
| maxStr := strings.TrimSpace(string(maxBytes)) | |
| if currentStr == "" || maxStr == "" || maxStr == "max" { | |
| return 0, 0, false | |
| } | |
| used, err = strconv.ParseUint(currentStr, 10, 64) | |
| if err != nil { | |
| return 0, 0, false | |
| } | |
| total, err = strconv.ParseUint(maxStr, 10, 64) | |
| if err != nil || total == 0 { | |
| return 0, 0, false | |
| } | |
| if used > total { | |
| used = total | |
| } | |
| return used, total, true | |
| func readCgroupV2Memory() (used uint64, total uint64, okUsed bool, okTotal bool) { | |
| currentBytes, err := os.ReadFile("/sys/fs/cgroup/memory.current") | |
| if err != nil { | |
| return 0, 0, false, false | |
| } | |
| currentStr := strings.TrimSpace(string(currentBytes)) | |
| if currentStr == "" { | |
| return 0, 0, false, false | |
| } | |
| used, err = strconv.ParseUint(currentStr, 10, 64) | |
| if err != nil { | |
| return 0, 0, false, false | |
| } | |
| okUsed = true | |
| maxBytes, err := os.ReadFile("/sys/fs/cgroup/memory.max") | |
| if err != nil { | |
| return used, 0, true, false | |
| } | |
| maxStr := strings.TrimSpace(string(maxBytes)) | |
| if maxStr == "" || maxStr == "max" { | |
| return used, 0, true, false | |
| } | |
| total, err = strconv.ParseUint(maxStr, 10, 64) | |
| if err != nil || total == 0 { | |
| return used, 0, true, false | |
| } | |
| okTotal = true | |
| if used > total { | |
| used = total | |
| } | |
| return used, total, true, true |
|
如果确定是 LXC 容器的问题,那么建议可以根据当前虚拟化类型选择要使用的内存读取逻辑。比如可以在 |
Fix nezhahq/nezha#1184