
monitor: fall back to cgroup v2 memory on invalid meminfo #238

Open

pexcn wants to merge 1 commit into nezhahq:main from pexcn:fix-mem-usage

pexcn commented Apr 17, 2026

Copilot AI left a comment

Pull request overview

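This PR addresses incorrect memory reporting when /proc/meminfo, as read through gopsutil, yields invalid values such as Available > Total. Because MemUsed was computed as the unsigned difference Total - Available, such values wrapped around to roughly 16 EB. In that case the PR falls back to cgroup v2 memory metrics, addressing the reported issue (nezhahq/nezha#1184).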
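
The ~16 EB figure follows from Go's unsigned arithmetic: uint64 subtraction is performed modulo 2^64, so Total - Available with Available > Total yields a value just below 2^64 bytes (16 EiB). The sketch below reproduces this with made-up byte counts (illustrative values, not real meminfo output); wrappedUsed is a hypothetical name that mimics the old expression.

```go
package main

import "fmt"

// wrappedUsed mimics the old MemUsed calculation. With unsigned integers,
// total - available wraps around modulo 2^64 when available > total.
func wrappedUsed(total, available uint64) uint64 {
	return total - available
}

func main() {
	const total uint64 = 1 << 30     // 1 GiB (illustrative)
	const available uint64 = 2 << 30 // 2 GiB, i.e. Available > Total
	used := wrappedUsed(total, available)
	// used is 2^64 - 1 GiB bytes, just under 16 EiB.
	fmt.Printf("used = %d bytes (%.2f EiB)\n", used, float64(used)/float64(uint64(1)<<60))
}
```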

Changes:

  • Added a cgroup v2 memory reader (memory.current / memory.max) and used it to override MemTotal when available.
  • Adjusted MemUsed calculation to avoid underflow and to fall back to cgroup v2 usage when Available > Total.
  • Switched swap used calculation to checked subtraction to avoid underflow.
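
The PR relies on util.SubUintChecked from the agent's own util package, whose exact signature and semantics are not shown here. The helper below is a hypothetical stand-in that assumes "checked" means returning 0 instead of wrapping when the subtrahend is larger; the swap numbers are illustrative.

```go
package main

import "fmt"

// subUintChecked is a hypothetical stand-in for util.SubUintChecked:
// it returns a - b, or 0 when b > a, instead of wrapping around.
func subUintChecked(a, b uint64) uint64 {
	if b > a {
		return 0
	}
	return a - b
}

func main() {
	// Illustrative swap values where "free" is reported larger than "total".
	swapTotal, swapFree := uint64(512), uint64(1024)
	fmt.Println("swap used:", subUintChecked(swapTotal, swapFree)) // swap used: 0
	fmt.Println("normal case:", subUintChecked(1024, 256))         // normal case: 768
}
```

Clamping to 0 under-reports rather than reporting an impossible value, which is why the MemUsed path additionally tries the cgroup reading before falling back to this.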


Comment thread pkg/monitor/monitor.go
Comment on lines +71 to +78
```go
func readCgroupV2Memory() (used uint64, total uint64, ok bool) {
	currentBytes, err := os.ReadFile("/sys/fs/cgroup/memory.current")
	if err != nil {
		return 0, 0, false
	}

	maxBytes, err := os.ReadFile("/sys/fs/cgroup/memory.max")
	if err != nil {
```

Copilot AI Apr 17, 2026

readCgroupV2Memory reads fixed paths under /sys/fs/cgroup. On cgroup v2 systems without a cgroup namespace, the process is usually in a sub-cgroup (per /proc/self/cgroup), so /sys/fs/cgroup/memory.{current,max} will report the root cgroup rather than the current process/container limits. Consider resolving the current cgroup v2 path from /proc/self/cgroup and reading memory.current/memory.max from that directory (fall back to /sys/fs/cgroup when parsing fails).

Copilot uses AI. Check for mistakes.
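
The path resolution Copilot describes can be sketched as a small pure function. cgroupV2Dir is a hypothetical helper, not part of the PR: it takes the contents of /proc/self/cgroup, looks for the cgroup v2 entry of the form 0::<path>, and joins that path onto the mount point. It assumes the unified hierarchy is mounted at /sys/fs/cgroup (on hybrid v1/v2 hosts it is often /sys/fs/cgroup/unified instead) and falls back to the root when no v2 entry is found. The service name in the sample input is made up.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// cgroupV2Root is an assumption: the unified hierarchy's mount point.
const cgroupV2Root = "/sys/fs/cgroup"

// cgroupV2Dir returns the directory that should hold memory.current and
// memory.max for the process whose /proc/self/cgroup content is given.
// It falls back to the root when no "0::<path>" entry is present.
func cgroupV2Dir(procSelfCgroup string) string {
	sc := bufio.NewScanner(strings.NewReader(procSelfCgroup))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "0::") {
			continue // cgroup v1 lines look like "N:controller:/path"
		}
		rel := strings.TrimPrefix(line, "0::")
		if strings.HasPrefix(rel, "/") {
			return filepath.Join(cgroupV2Root, rel)
		}
	}
	return cgroupV2Root
}

func main() {
	// Host process in a systemd service without a cgroup namespace.
	fmt.Println(cgroupV2Dir("0::/system.slice/nezha-agent.service\n")) // /sys/fs/cgroup/system.slice/nezha-agent.service
	// Container with its own cgroup namespace: its cgroup appears as "/".
	fmt.Println(cgroupV2Dir("0::/\n")) // /sys/fs/cgroup
	// No v2 entry (pure cgroup v1): fall back to the root.
	fmt.Println(cgroupV2Dir("4:memory:/docker/abc\n")) // /sys/fs/cgroup

	// On a live system the input would come from /proc/self/cgroup.
	if b, err := os.ReadFile("/proc/self/cgroup"); err == nil {
		fmt.Println("this process:", cgroupV2Dir(string(b)))
	}
}
```

Inside a container with its own cgroup namespace the entry is 0::/, so the helper resolves to the root, which is already the container's cgroup; only hosts without a namespace see the difference.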
Comment thread pkg/monitor/monitor.go
Comment on lines +71 to +94
```go
func readCgroupV2Memory() (used uint64, total uint64, ok bool) {
	currentBytes, err := os.ReadFile("/sys/fs/cgroup/memory.current")
	if err != nil {
		return 0, 0, false
	}

	maxBytes, err := os.ReadFile("/sys/fs/cgroup/memory.max")
	if err != nil {
		return 0, 0, false
	}

	currentStr := strings.TrimSpace(string(currentBytes))
	maxStr := strings.TrimSpace(string(maxBytes))
	if currentStr == "" || maxStr == "" || maxStr == "max" {
		return 0, 0, false
	}

	used, err = strconv.ParseUint(currentStr, 10, 64)
	if err != nil {
		return 0, 0, false
	}

	total, err = strconv.ParseUint(maxStr, 10, 64)
	if err != nil || total == 0 {
```

Copilot AI Apr 17, 2026

readCgroupV2Memory returns ok=false when memory.max is "max". That prevents using memory.current as a fallback for MemUsed when /proc/meminfo parsing is broken, even though memory.current is still meaningful when there is no hard limit. Suggest separating “can read used” from “can read total” (e.g., return usedOk/totalOk or split into two helpers) so GetState can still fall back to cgroup used independently of whether a max limit exists.

Suggested change

```go
func readCgroupV2MemoryUsed() (used uint64, ok bool) {
	currentBytes, err := os.ReadFile("/sys/fs/cgroup/memory.current")
	if err != nil {
		return 0, false
	}
	currentStr := strings.TrimSpace(string(currentBytes))
	if currentStr == "" {
		return 0, false
	}
	used, err = strconv.ParseUint(currentStr, 10, 64)
	if err != nil {
		return 0, false
	}
	return used, true
}

func readCgroupV2MemoryTotal() (total uint64, ok bool) {
	maxBytes, err := os.ReadFile("/sys/fs/cgroup/memory.max")
	if err != nil {
		return 0, false
	}
	maxStr := strings.TrimSpace(string(maxBytes))
	if maxStr == "" || maxStr == "max" {
		return 0, false
	}
	total, err = strconv.ParseUint(maxStr, 10, 64)
	if err != nil || total == 0 {
		return 0, false
	}
	return total, true
}

func readCgroupV2Memory() (used uint64, total uint64, ok bool) {
	used, usedOK := readCgroupV2MemoryUsed()
	total, totalOK := readCgroupV2MemoryTotal()
	if !usedOK || !totalOK {
```

Comment thread pkg/monitor/monitor.go
Comment on lines +149 to +151
```go
if _, cgTotal, ok := readCgroupV2Memory(); ok {
	ret.MemTotal = cgTotal
}
```

Copilot AI Apr 17, 2026

GetHost only needs the cgroup memory limit to override MemTotal, but readCgroupV2Memory requires reading/parsing both memory.current and memory.max. This adds unnecessary I/O and can suppress a valid MemTotal override if memory.current is unreadable for any reason. Consider a dedicated helper to read/parse memory.max (and use readCgroupV2MemoryCurrent only where needed).

Comment thread pkg/monitor/monitor.go
Comment on lines 176 to +188
```diff
 vm, err := mem.VirtualMemory()
 if err != nil {
 	printf("mem.VirtualMemory error: %v", err)
 } else {
-	ret.MemUsed = vm.Total - vm.Available
+	if vm.Available > vm.Total {
+		if cgUsed, _, ok := readCgroupV2Memory(); ok {
+			ret.MemUsed = cgUsed
+		} else {
+			ret.MemUsed = util.SubUintChecked(vm.Total, vm.Available)
+		}
+	} else {
+		ret.MemUsed = util.SubUintChecked(vm.Total, vm.Available)
+	}
```

Copilot AI Apr 17, 2026

This change introduces new cgroup v2 parsing and fallback behavior, but there are currently no unit tests covering the parsing edge cases (e.g., trailing newlines, memory.max == "max", invalid numbers, and selecting the correct cgroup directory). Given CI runs go test ./..., consider adding tests by factoring the parsing/path resolution into a pure function and/or injecting a readFile function so it can be tested without relying on /sys/fs/cgroup.

naiba (Member) commented Apr 23, 2026

Take a look at Copilot's review.

pexcn (Author) commented Apr 23, 2026

I'll take a look at it later.

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.



Comment thread pkg/monitor/monitor.go
Comment on lines +71 to +102
```go
func readCgroupV2Memory() (used uint64, total uint64, ok bool) {
	currentBytes, err := os.ReadFile("/sys/fs/cgroup/memory.current")
	if err != nil {
		return 0, 0, false
	}

	maxBytes, err := os.ReadFile("/sys/fs/cgroup/memory.max")
	if err != nil {
		return 0, 0, false
	}

	currentStr := strings.TrimSpace(string(currentBytes))
	maxStr := strings.TrimSpace(string(maxBytes))
	if currentStr == "" || maxStr == "" || maxStr == "max" {
		return 0, 0, false
	}

	used, err = strconv.ParseUint(currentStr, 10, 64)
	if err != nil {
		return 0, 0, false
	}

	total, err = strconv.ParseUint(maxStr, 10, 64)
	if err != nil || total == 0 {
		return 0, 0, false
	}

	if used > total {
		used = total
	}

	return used, total, true
```

Copilot AI Apr 24, 2026

readCgroupV2Memory returns ok=false when memory.max is "max". That prevents using memory.current as a fallback for MemUsed even though it’s still valid/available, so the "invalid meminfo" scenario can still end up reporting 0 used memory. Consider separating reading/parsing of memory.current from the limit (or returning a separate okUsed/okTotal), so callers can still use cgroup used bytes when the limit is unlimited while keeping MemTotal unchanged.

Suggested change

```go
func readCgroupV2Memory() (used uint64, total uint64, okUsed bool, okTotal bool) {
	currentBytes, err := os.ReadFile("/sys/fs/cgroup/memory.current")
	if err != nil {
		return 0, 0, false, false
	}
	currentStr := strings.TrimSpace(string(currentBytes))
	if currentStr == "" {
		return 0, 0, false, false
	}
	used, err = strconv.ParseUint(currentStr, 10, 64)
	if err != nil {
		return 0, 0, false, false
	}
	okUsed = true
	maxBytes, err := os.ReadFile("/sys/fs/cgroup/memory.max")
	if err != nil {
		return used, 0, true, false
	}
	maxStr := strings.TrimSpace(string(maxBytes))
	if maxStr == "" || maxStr == "max" {
		return used, 0, true, false
	}
	total, err = strconv.ParseUint(maxStr, 10, 64)
	if err != nil || total == 0 {
		return used, 0, true, false
	}
	okTotal = true
	if used > total {
		used = total
	}
	return used, total, true, true
```
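
With okUsed reported separately, the MemUsed branch in GetState could consume it roughly as below. memUsed is a hypothetical pure function, not code from the PR: it mirrors the PR's branch structure and returns 0 where the PR calls util.SubUintChecked, assuming that helper clamps instead of wrapping. An unlimited memory.max no longer blocks the cgroup fallback because only okUsed is consulted.

```go
package main

import "fmt"

// memUsed decides MemUsed from gopsutil's Total/Available and the
// memory.current reading (cgUsed, valid only when okUsed is true).
func memUsed(total, available, cgUsed uint64, okUsed bool) uint64 {
	if available <= total {
		return total - available // meminfo looks sane
	}
	if okUsed {
		return cgUsed // meminfo is invalid; trust memory.current
	}
	return 0 // nothing trustworthy: clamp instead of wrapping to ~16 EiB
}

func main() {
	fmt.Println(memUsed(8<<30, 6<<30, 0, false))      // 2147483648
	fmt.Println(memUsed(1<<30, 2<<30, 300<<20, true)) // 314572800
	fmt.Println(memUsed(1<<30, 2<<30, 0, false))      // 0
}
```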

uubulb (Contributor) commented Apr 26, 2026

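If this is confirmed to be an LXC container issue, I'd suggest choosing which memory-reading logic to use based on the current virtualization type. For example, you could create a new package under monitor/mem, move the current memory-reading logic into it, and then handle the general case and the LXC case depending on the condition passed in.
The virtualization type is already obtained in GetHost().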

Also, take a look at nezhahq/nezha#1184 (comment).
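
The dispatch uubulb proposes can be sketched as a small pure function. This assumes the virtualization values GetHost() obtains are gopsutil's host.InfoStat VirtualizationSystem and VirtualizationRole, and that an LXC container reports "lxc"/"guest"; memStrategy, chooseMemStrategy, and the constant names are hypothetical, and in the proposed monitor/mem package each strategy would map to its own reader.

```go
package main

import "fmt"

// memStrategy names which memory-reading logic to use.
type memStrategy int

const (
	strategyMeminfo  memStrategy = iota // gopsutil mem.VirtualMemory (/proc/meminfo)
	strategyCgroupV2                    // memory.current / memory.max
)

func (s memStrategy) String() string {
	switch s {
	case strategyCgroupV2:
		return "cgroup-v2"
	default:
		return "meminfo"
	}
}

// chooseMemStrategy selects the cgroup reader only inside an LXC guest;
// an LXC host and all other platforms keep the general meminfo path.
func chooseMemStrategy(virtSystem, virtRole string) memStrategy {
	if virtSystem == "lxc" && virtRole == "guest" {
		return strategyCgroupV2
	}
	return strategyMeminfo
}

func main() {
	fmt.Println(chooseMemStrategy("lxc", "guest")) // cgroup-v2
	fmt.Println(chooseMemStrategy("lxc", "host"))  // meminfo
	fmt.Println(chooseMemStrategy("kvm", "guest")) // meminfo
}
```

Checking the role as well as the system keeps a machine that hosts LXC containers on the general path.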


Development

Successfully merging this pull request may close these issues.

Dashboard displays memory as 16EB

4 participants