Auto-stop warmup when server is already warmed up (#541) by excelle08 · Pull Request #541 · facebookresearch/DCPerf

excelle08 · 2026-03-24T16:42:08Z

Summary:

Add auto-warmup detection that monitors server stats during warmup and signals
clients to stop early when the server is warmed up. The server also terminates
early once warmup is detected, instead of waiting for the full warmup_time.
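
The early exit can be pictured as a timed wait on a shared flag. This is a minimal sketch rather than the PR's code: `wait_for_warmup` and the `ready` event are hypothetical names, and it assumes the monitor sets a `threading.Event` once warmup is detected.

```python
import threading


def wait_for_warmup(ready, warmup_time):
    """Block until `ready` is set or `warmup_time` seconds elapse.

    Returns True if warmup was detected early, False if the full
    warmup_time passed without a signal.
    """
    # Event.wait returns as soon as the flag is set, so the caller no
    # longer sleeps for the whole warmup_time unconditionally.
    return ready.wait(timeout=warmup_time)
```

With this shape, the fixed `warmup_time` acts as an upper bound, so a run where warmup is never detected behaves as it did before.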

The server monitors hit_rate and QPS stability via background log tailing threads.
When all server instances reach hit_rate >= 95% of target (default 0.855) and QPS
stabilizes (CV < 5% over 2 minutes), a TCP control server responds READY to client
polls, causing clients to terminate warmup early and proceed to the test phase.
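
The readiness criterion above could be sketched as follows. The function names, the use of the population standard deviation, and the idea that each QPS list holds the samples from the last two minutes are all assumptions for illustration. The target is passed explicitly because the description does not make clear whether 0.855 is the target itself or 95% of it.

```python
from statistics import mean, pstdev


def qps_is_stable(samples, max_cv=0.05):
    """True when the coefficient of variation (stddev / mean) is below max_cv."""
    if len(samples) < 2:
        return False  # not enough data to judge stability
    avg = mean(samples)
    if avg <= 0:
        return False  # no traffic yet
    return pstdev(samples) / avg < max_cv


def is_warmed_up(hit_rates, qps_windows, target, fraction=0.95, max_cv=0.05):
    """True when every instance meets the hit-rate threshold and has stable QPS.

    hit_rates:   latest hit rate per server instance
    qps_windows: recent QPS samples per server instance (assumed ~2 minutes)
    """
    threshold = fraction * target
    hits_ok = all(rate >= threshold for rate in hit_rates)
    qps_ok = all(qps_is_stable(window, max_cv) for window in qps_windows)
    return hits_ok and qps_ok
```

Requiring both conditions on every instance means a single lagging instance keeps the whole server in warmup, which matches the "all server instances" wording above.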

Changes:

- New warmup_monitor.py: WarmupMonitor, WarmupControlServer, LogTailer classes
- Add --auto-warmup and --target-hit-ratio server args
- Add --control-port client arg
- Server-side: start warmup monitors and control server in run_autoscale.py
- Server-side: dynamic wait replaces fixed warmup_time sleep when auto-warmup enabled
- Client-side: poll control port during warmup, stop early on READY
- IPv6 dual-stack support for control port (AF_INET6 + IPV6_V6ONLY=0)
- Opt-in for tao_bench_autoscale (auto_warmup=0)
- Default on for tao_bench_autoscale_v2_beta (auto_warmup=1)
- Update README with auto-warmup documentation
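
The control-port items in the list above can be illustrated with a small dual-stack listener and a client poll. This is a hedged sketch, not the contents of warmup_monitor.py: the function names are hypothetical, and the `WAIT` reply and the newline-terminated format are assumptions, since the description only names the `READY` response.

```python
import socket
import threading


def open_dual_stack_listener(port):
    """Listen on all IPv6 and IPv4 addresses through a single AF_INET6 socket."""
    sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # IPV6_V6ONLY=0 lets IPv4 clients connect via IPv4-mapped addresses.
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
    sock.bind(("::", port))
    sock.listen()
    return sock


def serve_status(listener, ready):
    """Answer each connection with READY or WAIT (WAIT is an assumed reply)."""
    while True:
        try:
            conn, _ = listener.accept()
        except OSError:
            return  # listener was closed
        with conn:
            conn.sendall(b"READY\n" if ready.is_set() else b"WAIT\n")


def poll_once(host, port, timeout=2.0):
    """Client side: True only if the server answered READY."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            return conn.recv(64).strip() == b"READY"
    except OSError:
        return False  # unreachable server is treated as "not ready yet"
```

A client would call `poll_once` periodically during warmup and move on to the test phase the first time it returns True. Treating connection errors as "not ready" keeps the client on the normal fixed-length warmup path when the server does not expose a control port.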

Reviewed By: gandhijayneel

Differential Revision: D97244665

meta-codesync · 2026-03-24T16:42:18Z

@excelle08 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D97244665.


Summary:

Pull Request resolved: facebookresearch#540

Expose memcached's native `-e` memory file option in TaoBench. When specified, memcached mmaps the given file for slab storage. On graceful shutdown (SIGUSR1), state is saved to the file. On subsequent runs with the same file, cache data is pre-loaded, drastically reducing warmup time.

Changes:

- Add `--memory-file` argument to server args
- Append `-e <path>` to memcached command when memory file specified
- Use SIGUSR1 for graceful shutdown (60s grace period) when memory file is in use
- Per-instance memory files in autoscale mode (suffix `.0`, `.1`, etc.)
- Auto-expand `/dev/shm` when tmpfs is smaller than memsize
- Add `memory_file` variable to all TaoBench job configurations
- Update README with memory file documentation

Reviewed By: gandhijayneel

Differential Revision: D97244738
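
The memory-file handling in this commit message could look roughly like the sketch below. The helper names are hypothetical and this is not TaoBench's actual code. It assumes the server is launched as an argument list and managed through `subprocess.Popen`, and the hard kill after the grace period is an assumed fallback.

```python
import signal
import subprocess


def instance_memory_file(base_path, index):
    """Per-instance file name for autoscale mode: base.0, base.1, ..."""
    return f"{base_path}.{index}"


def with_memory_file(cmd, memory_file):
    """Append `-e <path>` to the command only when a memory file is given."""
    cmd = list(cmd)
    if memory_file:
        cmd += ["-e", memory_file]
    return cmd


def stop_gracefully(proc, grace_seconds=60):
    """Send SIGUSR1 so state can be saved, then wait up to grace_seconds."""
    proc.send_signal(signal.SIGUSR1)
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # Assumed fallback: force-stop a process that ignored the grace period.
        proc.kill()
        proc.wait()
```

For example, `with_memory_file(["memcached", "-m", "1024"], instance_memory_file("/dev/shm/tao.mem", 0))` yields a command ending in `-e /dev/shm/tao.mem.0`, where the path is only an illustrative value.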


Summary:

Pull Request resolved: #541

Add auto-warmup detection that monitors server stats during warmup and signals clients to stop early when the server is warmed up. The server also terminates early once warmup is detected, instead of waiting for the full warmup_time.

The server monitors hit_rate and QPS stability via background log tailing threads. When all server instances reach hit_rate >= 95% of target (default 0.855) and QPS stabilizes (CV < 5% over 2 minutes), a TCP control server responds READY to client polls, causing clients to terminate warmup early and proceed to the test phase.

Changes:

- New warmup_monitor.py: WarmupMonitor, WarmupControlServer, LogTailer classes
- Add --auto-warmup and --target-hit-ratio server args
- Add --control-port client arg
- Server-side: start warmup monitors and control server in run_autoscale.py
- Server-side: dynamic wait replaces fixed warmup_time sleep when auto-warmup enabled
- Client-side: poll control port during warmup, stop early on READY
- IPv6 dual-stack support for control port (AF_INET6 + IPV6_V6ONLY=0)
- Opt-in for tao_bench_autoscale (auto_warmup=0)
- Default on for tao_bench_autoscale_v2_beta (auto_warmup=1)
- Update README with auto-warmup documentation

Reviewed By: gandhijayneel

Differential Revision: D97244665

fbshipit-source-id: 9abaf658b236f7b87f96b9e2baf5a34b7deb99cf

meta-cla bot added the CLA Signed label Mar 24, 2026

meta-codesync bot added the fb-exported and meta-exported labels Mar 24, 2026

meta-codesync bot changed the title ~~Auto-stop warmup when server is already warmed up~~ Auto-stop warmup when server is already warmed up (#541) Mar 25, 2026

excelle08 force-pushed the export-D97244665-to-v2-beta branch from d2d1e29 to ed5f2ca, March 25, 2026 17:47

excelle08 force-pushed the export-D97244665-to-v2-beta branch from ed5f2ca to dfacb7d, March 25, 2026 20:42

excelle08 force-pushed the export-D97244665-to-v2-beta branch from dfacb7d to 6beb904, March 25, 2026 20:45

excelle08 force-pushed the export-D97244665-to-v2-beta branch from 6beb904 to 9715176, March 25, 2026 20:46

excelle08 force-pushed the export-D97244665-to-v2-beta branch from 9715176 to a8cb3df, March 25, 2026 20:54

excelle08 wants to merge 2 commits into facebookresearch:v2-beta from excelle08:export-D97244665-to-v2-beta
