[Performance]: High CPU usage due to busy-wait in TransferEngineOperationState::wait_for_completion

### Describe your performance question

### Describe

TransferEngineOperationState::[wait_for_completion](https://github.com/kvcache-ai/Mooncake/blob/6d05c9344ba4522144ca4b22106cc3fc8661fdb1/mooncake-store/src/transfer_task.cpp#L291)() busy-waits. It loops indefinitely and calls [check_task_status()](https://github.com/kvcache-ai/Mooncake/blob/6d05c9344ba4522144ca4b22106cc3fc8661fdb1/mooncake-store/src/transfer_task.cpp#L224) on every iteration, with no sleep, yield, or backoff in between. When RDMA latency is high or NIC bandwidth is saturated, transfers take longer to complete, and the waiting thread can peg a CPU core for the whole interval. On CPU-constrained hosts this directly reduces overall throughput.
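One common mitigation is adaptive backoff polling. The following minimal sketch spins briefly to keep completion latency low for fast transfers, then yields, and then sleeps with exponentially growing delays up to a cap. `TaskStatus`, `BackoffConfig`, and `wait_with_backoff` are hypothetical names used only for illustration. The real `check_task_status()` signature and status types in `transfer_task.cpp` may differ, and the thresholds are untuned assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical status type; the real check_task_status() in
// mooncake-store/src/transfer_task.cpp may report status differently.
enum class TaskStatus { kPending, kSuccess, kFailed };

// Hypothetical, untuned thresholds for illustration only.
struct BackoffConfig {
    int spin_iterations = 64;   // tight polls: cheapest path for fast completions
    int yield_iterations = 64;  // polls that give up the time slice between checks
    std::chrono::microseconds initial_sleep{10};
    std::chrono::microseconds max_sleep{1000};
};

// Polls `check` until it returns a terminal status, escalating from
// spinning to yielding to exponentially growing (capped) sleeps.
TaskStatus wait_with_backoff(const std::function<TaskStatus()>& check,
                             const BackoffConfig& cfg = BackoffConfig{}) {
    assert(cfg.initial_sleep <= cfg.max_sleep);
    std::chrono::microseconds delay = cfg.initial_sleep;
    int polls = 0;  // stops growing once the sleep phase is reached
    while (true) {
        const TaskStatus status = check();
        if (status != TaskStatus::kPending) {
            return status;
        }
        if (polls < cfg.spin_iterations) {
            ++polls;  // spin: poll again immediately
            continue;
        }
        if (polls < cfg.spin_iterations + cfg.yield_iterations) {
            ++polls;
            std::this_thread::yield();  // let other runnable threads use the core
            continue;
        }
        std::this_thread::sleep_for(delay);  // thread is descheduled while sleeping
        delay = std::min(delay * 2, cfg.max_sleep);
    }
}
```

The trade-off is that once the sleep phase is reached, a slow transfer's completion may be noticed up to `max_sleep` later. In exchange, the waiting thread uses almost no CPU during long waits.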

### Impact

- Sustained high CPU usage by the waiting thread(s) for the entire duration of large or long-running transfers.
- Lower system throughput under CPU contention.

### Real-world scenario: Offline inference, throughput-first

Network: Large data streams easily saturate the RDMA NIC. Under saturation or transient congestion, transfer completion latency increases, so the current tight polling loop keeps spinning for long periods.

CPU contention: Each waiting thread can peg a CPU core, competing with CPU-heavy preprocessing stages (tokenization, chunking/sharding, mmap reads, decoding).

Resource efficiency: Spinning wastes CPU cycles that could otherwise go to data preparation. This starves the stages that feed GPU inference and degrades end-to-end throughput.
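If completion could be signalled directly, for example by the transfer engine or by a single shared poller thread, waiting threads could block instead of polling at all. The sketch below assumes such a hook exists, which is not confirmed for Mooncake. `CompletionSignal` is a hypothetical helper built on `std::condition_variable`. The code that observes completion calls `complete()`, and waiters sleep in `wait()` without consuming CPU.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper (not part of Mooncake): the code that observes the
// transfer finishing calls complete(); waiting threads block instead of polling.
class CompletionSignal {
public:
    // Records the result and wakes every blocked waiter.
    void complete(bool success) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            assert(!done_ && "complete() should be called once");
            done_ = true;
            success_ = success;
        }
        cv_.notify_all();
    }

    // Blocks without spinning until complete() runs; returns its result.
    bool wait() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return done_; });
        return success_;
    }

    // Bounded wait: true if completion was signalled within `timeout`.
    bool wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mu_);
        return cv_.wait_for(lock, timeout, [this] { return done_; });
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    bool done_ = false;
    bool success_ = false;
};
```

With this design, one poller or completion event could serve many waiters. The CPU cost of waiting would then no longer scale with the number of threads blocked on transfers.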


### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues and read the [documentation](https://kvcache-ai.github.io/Mooncake/)
