Auto-stop warmup when server is already warmed up (#541)
Open
excelle08 wants to merge 2 commits into facebookresearch:v2-beta
@excelle08 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D97244665.
excelle08 added a commit to excelle08/DCPerf-1 that referenced this pull request (Mar 25, 2026)
Summary:

Add auto-warmup detection that monitors server stats during warmup and signals clients to stop early when the server is warmed up. The server also terminates early once warmup is detected, instead of waiting for the full warmup_time.

The server monitors hit_rate and QPS stability via background log-tailing threads. When all server instances reach hit_rate >= 95% of target (default 0.855) and QPS stabilizes (CV < 5% over 2 minutes), a TCP control server responds READY to client polls, causing clients to terminate warmup early and proceed to the test phase.

Changes:
- New warmup_monitor.py: WarmupMonitor, WarmupControlServer, LogTailer classes
- Add --auto-warmup and --target-hit-ratio server args
- Add --control-port client arg
- Server-side: start warmup monitors and control server in run_autoscale.py
- Server-side: dynamic wait replaces fixed warmup_time sleep when auto-warmup enabled
- Client-side: poll control port during warmup, stop early on READY
- IPv6 dual-stack support for control port (AF_INET6 + IPV6_V6ONLY=0)
- Opt-in for tao_bench_autoscale (auto_warmup=0)
- Default on for tao_bench_autoscale_v2_beta (auto_warmup=1)
- Update README with auto-warmup documentation

Reviewed By: gandhijayneel

Differential Revision: D97244665
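The readiness rule in the summary can be sketched as follows. This is a minimal illustration, not the code in warmup_monitor.py: the class name `WarmupCheck`, its parameters, and the one-sample-per-second assumption are all hypothetical. The summary does not say whether 0.855 is the target itself or the resulting threshold, so the target is left as a required argument and the threshold is taken to be 0.95 times that target.

```python
import statistics
from collections import deque


def coefficient_of_variation(samples):
    """Population stddev divided by mean; infinity when it cannot be computed."""
    if len(samples) < 2:
        return float("inf")
    mean = statistics.fmean(samples)
    if mean <= 0:
        return float("inf")
    return statistics.pstdev(samples) / mean


class WarmupCheck:
    """Hypothetical per-instance readiness check (not the PR's WarmupMonitor)."""

    def __init__(self, target_hit_ratio, hit_fraction=0.95, max_cv=0.05, window=120):
        # Assumption: "95% of target" means threshold = 0.95 * target.
        self.threshold = hit_fraction * target_hit_ratio
        self.max_cv = max_cv
        self.window = window
        # Assumption: one QPS sample per second, so 120 samples cover 2 minutes.
        self.qps = deque(maxlen=window)
        self.hit_rate = 0.0

    def update(self, hit_rate, qps):
        """Record the latest stats parsed from one server log line."""
        self.hit_rate = hit_rate
        self.qps.append(qps)

    def is_warm(self):
        """True once the window is full, hit rate is high enough, and QPS is stable."""
        return (
            len(self.qps) == self.window
            and self.hit_rate >= self.threshold
            and coefficient_of_variation(self.qps) < self.max_cv
        )


def all_warm(checks):
    """The summary requires every server instance to be warm, not just one."""
    return bool(checks) and all(check.is_warm() for check in checks)
```

Requiring a full window before reporting readiness avoids declaring QPS "stable" from only a handful of samples taken right after startup.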
d2d1e29 to ed5f2ca
Summary:

Pull Request resolved: facebookresearch#540

Expose memcached's native `-e` memory file option in TaoBench. When specified, memcached mmaps the given file for slab storage. On graceful shutdown (SIGUSR1), state is saved to the file. On subsequent runs with the same file, cache data is pre-loaded, drastically reducing warmup time.

Changes:
- Add `--memory-file` argument to server args
- Append `-e <path>` to memcached command when memory file specified
- Use SIGUSR1 for graceful shutdown (60s grace period) when memory file is in use
- Per-instance memory files in autoscale mode (suffix `.0`, `.1`, etc.)
- Auto-expand `/dev/shm` when tmpfs is smaller than memsize
- Add `memory_file` variable to all TaoBench job configurations
- Update README with memory file documentation

Reviewed By: gandhijayneel

Differential Revision: D97244738
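A rough sketch of how the memory-file handling described above might look. The helper names (`instance_memory_file`, `build_memcached_cmd`, `stop_server`) are hypothetical and are not taken from the diff. The fallback to SIGTERM and then SIGKILL after the grace period is also an assumption, since the summary only states that a 60s grace period is used.

```python
import signal
import subprocess


def instance_memory_file(base_path, index):
    """Per-instance file name in autoscale mode: base.0, base.1, and so on."""
    return f"{base_path}.{index}"


def build_memcached_cmd(base_cmd, memory_file=None):
    """Append memcached's `-e <path>` option only when a memory file is given."""
    cmd = list(base_cmd)
    if memory_file:
        cmd += ["-e", memory_file]
    return cmd


def stop_server(proc, memory_file=None, grace_seconds=60):
    """Stop a server started with subprocess.Popen.

    With a memory file in use, send SIGUSR1 so the server can save its state,
    and wait up to the grace period before falling back.
    """
    if memory_file:
        proc.send_signal(signal.SIGUSR1)
        try:
            proc.wait(timeout=grace_seconds)
            return
        except subprocess.TimeoutExpired:
            pass  # assumption: fall through to a normal termination
    proc.terminate()
    try:
        proc.wait(timeout=10)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
```

For example, `build_memcached_cmd(["memcached", "-m", "1024"], instance_memory_file("/dev/shm/tao", 0))` yields a command ending in `-e /dev/shm/tao.0`; the path here is purely illustrative.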
ed5f2ca to dfacb7d
dfacb7d to 6beb904
6beb904 to 9715176
9715176 to a8cb3df
meta-codesync bot pushed a commit that referenced this pull request (Mar 26, 2026)
Summary:

Pull Request resolved: #541

Add auto-warmup detection that monitors server stats during warmup and signals clients to stop early when the server is warmed up. The server also terminates early once warmup is detected, instead of waiting for the full warmup_time.

The server monitors hit_rate and QPS stability via background log-tailing threads. When all server instances reach hit_rate >= 95% of target (default 0.855) and QPS stabilizes (CV < 5% over 2 minutes), a TCP control server responds READY to client polls, causing clients to terminate warmup early and proceed to the test phase.

Changes:
- New warmup_monitor.py: WarmupMonitor, WarmupControlServer, LogTailer classes
- Add --auto-warmup and --target-hit-ratio server args
- Add --control-port client arg
- Server-side: start warmup monitors and control server in run_autoscale.py
- Server-side: dynamic wait replaces fixed warmup_time sleep when auto-warmup enabled
- Client-side: poll control port during warmup, stop early on READY
- IPv6 dual-stack support for control port (AF_INET6 + IPV6_V6ONLY=0)
- Opt-in for tao_bench_autoscale (auto_warmup=0)
- Default on for tao_bench_autoscale_v2_beta (auto_warmup=1)
- Update README with auto-warmup documentation

Reviewed By: gandhijayneel

Differential Revision: D97244665

fbshipit-source-id: 9abaf658b236f7b87f96b9e2baf5a34b7deb99cf
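The dual-stack control port mentioned in the change list (AF_INET6 + IPV6_V6ONLY=0) can be illustrated roughly as below. This is a sketch under assumptions, not the PR's WarmupControlServer: the function names, the `WAIT` reply for the not-yet-ready case, the newline-terminated wire format, and the IPv4 fallback when IPv6 is unavailable are all hypothetical.

```python
import socket
import threading


def make_listener(port):
    """Listen on both IPv4 and IPv6 via one AF_INET6 socket where possible."""
    if socket.has_ipv6:
        sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            # IPV6_V6ONLY=0 lets IPv4 clients connect as IPv4-mapped addresses.
            sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
            sock.bind(("::", port))
            sock.listen()
            return sock
        except OSError:
            sock.close()  # assumption: fall back to IPv4 if IPv6 cannot bind
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("0.0.0.0", port))
    sock.listen()
    return sock


def serve_status(listener, ready_event, stop_event):
    """Answer each client poll with READY once warmup has been detected."""
    listener.settimeout(0.5)  # wake up regularly to notice stop_event
    while not stop_event.is_set():
        try:
            conn, _addr = listener.accept()
        except socket.timeout:
            continue
        with conn:
            # "WAIT" is a placeholder reply; only READY appears in the summary.
            conn.sendall(b"READY\n" if ready_event.is_set() else b"WAIT\n")
    listener.close()
```

Running `serve_status` in a daemon thread and setting `ready_event` from the monitoring code keeps the control port responsive while the server process waits on warmup.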
Summary:
Add auto-warmup detection that monitors server stats during warmup and signals
clients to stop early when the server is warmed up. The server also terminates
early once warmup is detected, instead of waiting for the full warmup_time.
The server monitors hit_rate and QPS stability via background log tailing threads.
When all server instances reach hit_rate >= 95% of target (default 0.855) and QPS
stabilizes (CV < 5% over 2 minutes), a TCP control server responds READY to client
polls, causing clients to terminate warmup early and proceed to the test phase.
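Changes:
- New warmup_monitor.py: WarmupMonitor, WarmupControlServer, LogTailer classes
- Add --auto-warmup and --target-hit-ratio server args
- Add --control-port client arg
- Server-side: start warmup monitors and control server in run_autoscale.py
- Server-side: dynamic wait replaces fixed warmup_time sleep when auto-warmup enabled
- Client-side: poll control port during warmup, stop early on READY
- IPv6 dual-stack support for control port (AF_INET6 + IPV6_V6ONLY=0)
- Opt-in for tao_bench_autoscale (auto_warmup=0)
- Default on for tao_bench_autoscale_v2_beta (auto_warmup=1)
- Update README with auto-warmup documentation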
Reviewed By: gandhijayneel
Differential Revision: D97244665
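On the client side, the polling behaviour described in the summary might look roughly like this. It is a sketch only: `server_is_ready`, `wait_for_warmup`, the 5-second poll interval, and the assumption that the server answers a bare TCP connection with the bytes `READY` are illustrative and not confirmed by the diff.

```python
import socket
import time


def server_is_ready(host, port, timeout=2.0):
    """Connect to the control port and report whether the reply is READY."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            return conn.recv(64).strip() == b"READY"
    except OSError:
        # Connection refused or timed out: treat as "not ready yet".
        return False


def wait_for_warmup(host, port, max_warmup_seconds, poll_interval=5.0,
                    is_ready=server_is_ready):
    """Poll until the server reports READY or the full warmup time elapses.

    Returns True if warmup ended early, False if the time limit was reached.
    """
    deadline = time.monotonic() + max_warmup_seconds
    while time.monotonic() < deadline:
        if is_ready(host, port):
            return True
        remaining = deadline - time.monotonic()
        time.sleep(max(0.0, min(poll_interval, remaining)))
    return False
```

Bounding the loop by the original warmup duration means that a client whose polls never succeed still behaves as it would with a fixed warmup time.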