Auto-stop warmup when server is already warmed up (#541)

Open
excelle08 wants to merge 2 commits into facebookresearch:v2-beta from excelle08:export-D97244665-to-v2-beta

@excelle08
Contributor

@excelle08 excelle08 commented Mar 24, 2026

Summary:

Add auto-warmup detection that monitors server stats during warmup and signals
clients to stop early when the server is warmed up. The server also terminates
early once warmup is detected, instead of waiting for the full warmup_time.

The server monitors hit_rate and QPS stability using background threads that tail the
server logs. When every server instance reaches hit_rate >= 95% of target (default 0.855)
and QPS stabilizes (coefficient of variation, CV, < 5% over 2 minutes), a TCP control
server responds READY to client polls, so clients end warmup early and proceed to the
test phase.
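
The readiness condition described above can be sketched roughly as follows. This is a minimal illustration, not the code in warmup_monitor.py: the `WarmupCheck` class, its parameter names, and the fixed-size sample window are assumptions made for the example, and the target hit ratio is passed in explicitly rather than taken from the PR's defaults.

```python
import statistics
from collections import deque


class WarmupCheck:
    """Recent samples for one server instance (hypothetical helper)."""

    def __init__(self, target_hit_ratio, window_size, hit_fraction=0.95, max_cv=0.05):
        # Hit rate must reach this fraction of the target (95% in the PR text).
        self.threshold = target_hit_ratio * hit_fraction
        # QPS counts as stable when its coefficient of variation is below this.
        self.max_cv = max_cv
        # Assumption: the "2 minutes" window is modeled as a fixed sample count.
        self.qps = deque(maxlen=window_size)
        self.hit_rate = 0.0

    def add_sample(self, hit_rate, qps):
        self.hit_rate = hit_rate
        self.qps.append(qps)

    def is_warm(self):
        # Do not decide until the window is full.
        if len(self.qps) < self.qps.maxlen:
            return False
        mean = statistics.fmean(self.qps)
        if mean <= 0:
            return False
        cv = statistics.pstdev(self.qps) / mean
        return self.hit_rate >= self.threshold and cv < self.max_cv


def all_warm(checks):
    """READY only when every instance satisfies both conditions."""
    return bool(checks) and all(c.is_warm() for c in checks)
```

Requiring a full window before deciding avoids declaring readiness on one or two early samples that happen to look stable.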

Changes:

  • New warmup_monitor.py: WarmupMonitor, WarmupControlServer, LogTailer classes
  • Add --auto-warmup and --target-hit-ratio server args
  • Add --control-port client arg
  • Server-side: start warmup monitors and control server in run_autoscale.py
  • Server-side: dynamic wait replaces fixed warmup_time sleep when auto-warmup enabled
  • Client-side: poll control port during warmup, stop early on READY
  • IPv6 dual-stack support for control port (AF_INET6 + IPV6_V6ONLY=0)
  • Opt-in for tao_bench_autoscale (auto_warmup=0)
  • Default on for tao_bench_autoscale_v2_beta (auto_warmup=1)
  • Update README with auto-warmup documentation
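
The control-port items in the list above (dual-stack listener, client polling, stop on READY) could look roughly like the sketch below. Only the `READY` reply and the `AF_INET6` + `IPV6_V6ONLY=0` combination come from the PR description. The function names, the `WARMING` reply, the newline framing, the connect-and-read exchange, and the IPv4 fallback are assumptions for illustration.

```python
import socket
import threading


def make_listener(port):
    """Open a dual-stack listener; fall back to IPv4 if IPv6 is unavailable."""
    sock = None
    try:
        sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        # IPV6_V6ONLY=0 lets the same socket accept IPv4 clients as well.
        sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
        sock.bind(("::", port))
    except OSError:
        # Assumption: hosts without IPv6 get a plain IPv4 listener instead.
        if sock is not None:
            sock.close()
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("0.0.0.0", port))
    sock.listen()
    return sock


def serve_status(listener, ready_event):
    """Answer every connection with one status line, then close it."""
    while True:
        try:
            conn, _ = listener.accept()
        except OSError:
            return  # listener was closed
        with conn:
            reply = b"READY\n" if ready_event.is_set() else b"WARMING\n"
            conn.sendall(reply)


def poll_ready(host, port, timeout=2.0):
    """Client side: True only if the control server answers READY."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            return conn.recv(64).strip() == b"READY"
    except OSError:
        # An unreachable control port is treated as "not ready yet".
        return False
```

In this sketch a client would call `poll_ready` periodically during warmup and leave the warmup loop on the first `True`, while the server sets `ready_event` once its monitors report that all instances are warm. Treating connection errors as "not ready" means a client that cannot reach the port simply runs the full warmup.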

Reviewed By: gandhijayneel

Differential Revision: D97244665

@meta-cla meta-cla bot added the CLA Signed label Mar 24, 2026
@meta-codesync

meta-codesync bot commented Mar 24, 2026

@excelle08 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D97244665.

@meta-codesync meta-codesync bot changed the title from "Auto-stop warmup when server is already warmed up" to "Auto-stop warmup when server is already warmed up (#541)" Mar 25, 2026
excelle08 added a commit to excelle08/DCPerf-1 that referenced this pull request Mar 25, 2026
@excelle08 excelle08 force-pushed the export-D97244665-to-v2-beta branch from d2d1e29 to ed5f2ca Compare March 25, 2026 17:47
Summary:
Pull Request resolved: facebookresearch#540

Expose memcached's native `-e` memory file option in TaoBench. When specified,
memcached mmaps the given file for slab storage. On graceful shutdown (SIGUSR1),
state is saved to the file. On subsequent runs with the same file, cache data is
pre-loaded, drastically reducing warmup time.

Changes:
- Add `--memory-file` argument to server args
- Append `-e <path>` to memcached command when memory file specified
- Use SIGUSR1 for graceful shutdown (60s grace period) when memory file is in use
- Per-instance memory files in autoscale mode (suffix `.0`, `.1`, etc.)
- Auto-expand `/dev/shm` when tmpfs is smaller than memsize
- Add `memory_file` variable to all TaoBench job configurations
- Update README with memory file documentation
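
The command-line and shutdown changes listed above can be sketched as below. The `-e <path>` option, the `.0`/`.1` suffixes, SIGUSR1, and the 60-second grace period come from the commit message. The function names, the shape of `base_cmd`, and the behavior when no memory file is set are hypothetical and not taken from the TaoBench sources.

```python
import signal
import subprocess


def memory_file_for_instance(memory_file, instance_index, autoscale):
    """In autoscale mode each instance gets its own file: path.0, path.1, ..."""
    if not memory_file:
        return None
    return f"{memory_file}.{instance_index}" if autoscale else memory_file


def build_server_cmd(base_cmd, memory_file):
    """Append '-e <path>' only when a memory file was requested."""
    cmd = list(base_cmd)
    if memory_file:
        cmd += ["-e", memory_file]
    return cmd


def stop_server(proc, memory_file, grace_seconds=60):
    """Use SIGUSR1 plus a grace period when state must be saved to the file."""
    if memory_file:
        proc.send_signal(signal.SIGUSR1)
        try:
            proc.wait(timeout=grace_seconds)
            return
        except subprocess.TimeoutExpired:
            pass  # grace period expired; fall through to a hard stop
    # Assumption: without a memory file an ordinary terminate is enough.
    proc.terminate()
    proc.wait()
```

Giving each autoscale instance its own file keeps several server processes from mapping the same slab storage.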

Reviewed By: gandhijayneel

Differential Revision: D97244738
excelle08 added a commit to excelle08/DCPerf-1 that referenced this pull request Mar 25, 2026
@excelle08 excelle08 force-pushed the export-D97244665-to-v2-beta branch from ed5f2ca to dfacb7d Compare March 25, 2026 20:42
excelle08 added a commit to excelle08/DCPerf-1 that referenced this pull request Mar 25, 2026
@excelle08 excelle08 force-pushed the export-D97244665-to-v2-beta branch from dfacb7d to 6beb904 Compare March 25, 2026 20:45
excelle08 added a commit to excelle08/DCPerf-1 that referenced this pull request Mar 25, 2026
Summary:
Pull Request resolved: facebookresearch#541

@excelle08 excelle08 force-pushed the export-D97244665-to-v2-beta branch from 6beb904 to 9715176 Compare March 25, 2026 20:46
@excelle08 excelle08 force-pushed the export-D97244665-to-v2-beta branch from 9715176 to a8cb3df Compare March 25, 2026 20:54
meta-codesync bot pushed a commit that referenced this pull request Mar 26, 2026
Summary:
Pull Request resolved: #541

fbshipit-source-id: 9abaf658b236f7b87f96b9e2baf5a34b7deb99cf
Labels

CLA Signed, fb-exported, meta-exported
