perf: override GOMAXPROCS to use all available CPUs #215
joshfriend wants to merge 1 commit into main from
Conversation
Container environments often set GOMAXPROCS to a low value (e.g. 2) based on cgroup CPU limits. With only 2 OS threads available, the Go scheduler can't run the zstd decoder, tar reader, and file-write goroutines concurrently — they time-slice, and the tar reader (the single-threaded funnel feeding the pipeline) gets starved. Overriding to runtime.NumCPU() lets all goroutines actually run in parallel. Benchmarked on r8id.metal-48xlarge, this alone accounted for 3.7s of a 5.7s total improvement on a 334K-file 2.4 GB cache restore.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c170c5fc30
```go
// Go scheduler can't run the zstd decoder, tar reader, and file-write
// goroutines concurrently — they time-slice, and the pipeline stalls.
// Using all available CPUs lets all goroutines actually run in parallel.
runtime.GOMAXPROCS(runtime.NumCPU())
```
Avoid bypassing quota-aware GOMAXPROCS defaults
Calling `runtime.GOMAXPROCS(runtime.NumCPU())` here overrides both Go 1.25's container-aware default and any operator-provided `GOMAXPROCS` value. In a quota-limited container (for example, a pod limited to 2 CPUs on a 64-core node), this can over-subscribe CPU and cause cgroup throttling and latency regressions under load; the same pattern is also introduced in cmd/cachew/main.go. Please keep the runtime default, or gate this behind an explicit opt-in, so deployments that rely on CPU limits are not penalized.