Description
Summary
When multiple pipeline load requests arrive concurrently for the same pipeline, a race condition causes transient failures. The second request detects that another thread is already loading the pipeline but returns a failure instead of waiting for the first load to complete.
Error Logs
From Grafana fal.ai logs (2026-02-21 04:33 UTC):
04:33:27.277 - Loading 1 pipeline(s): ['streamdiffusionv2']
04:33:27.645 - Loading 1 pipeline(s): ['streamdiffusionv2'] # Second concurrent request
04:33:28.134 - Loading pipeline: streamdiffusionv2
04:33:28.135 - Pipeline streamdiffusionv2 already loading by another thread
04:33:28.138 - ERROR - Failed to load pipeline: streamdiffusionv2
04:33:28.139 - ERROR - Some pipelines failed to load
The pipeline eventually loaded successfully ~27 seconds later, but the intermediate failure triggers error logs and potentially user-facing errors.
Expected Behavior
When a pipeline load request detects that another thread is already loading the same pipeline, it should:
- Wait for the first load to complete
- Return success if the pipeline is now loaded (reusing the result from the first load)
- Only fail if the first load also failed
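The expected behavior above can be sketched as a per-pipeline "load once, others wait" pattern. This is a minimal, hypothetical sketch. It is not the actual `scope/server/pipeline_manager.py` API. `PipelineLoader`, `load_fn`, and the `timeout` parameter are illustrative names, and the `threading.Event`-per-pipeline approach is one assumed way to implement the fix. The first caller performs the load. Concurrent callers wait for it to finish and then report success if the pipeline is loaded, or failure only if that load failed.

```python
import threading
import time
from typing import Callable, Dict, Optional


class PipelineLoader:
    """Hypothetical sketch, not the real pipeline_manager.py API.

    Concurrent load() calls for the same pipeline share a single load:
    the first caller does the work, later callers wait on an Event.
    """

    def __init__(self, load_fn: Callable[[str], object]) -> None:
        self._load_fn = load_fn                           # performs the slow load
        self._lock = threading.Lock()                     # guards both dicts below
        self._loaded: Dict[str, object] = {}              # name -> loaded pipeline
        self._in_flight: Dict[str, threading.Event] = {}  # name -> "load finished" signal

    def load(self, name: str, timeout: Optional[float] = None) -> bool:
        with self._lock:
            if name in self._loaded:
                return True                               # already loaded: reuse it
            event = self._in_flight.get(name)
            owner = event is None
            if owner:
                event = threading.Event()                 # this thread does the load
                self._in_flight[name] = event

        if not owner:
            # Another thread is loading: wait for it instead of failing.
            if not event.wait(timeout):
                return False                              # gave up after the timeout
            with self._lock:
                return name in self._loaded               # False only if that load failed

        ok = False
        try:
            pipeline = self._load_fn(name)
            with self._lock:
                self._loaded[name] = pipeline
            ok = True
        except Exception:
            ok = False                                    # a real manager would log this once
        finally:
            with self._lock:
                del self._in_flight[name]                 # allow a retry after a failure
            event.set()                                   # wake every waiting thread
        return ok


# Usage: two concurrent requests for the same pipeline.
calls = []


def slow_load(name: str) -> object:
    calls.append(name)   # record how many real loads happen
    time.sleep(0.2)      # simulate a slow model load
    return object()


loader = PipelineLoader(slow_load)
results = []
threads = [
    threading.Thread(target=lambda: results.append(loader.load("streamdiffusionv2")))
    for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results, calls)  # both requests succeed; the underlying load ran once
```

The second request reuses the first load's result instead of starting a duplicate load or logging an error. The in-flight entry is removed before waiters are woken, so a failed load can be retried by a later request.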
Current Behavior
The second concurrent request immediately returns a failure when it detects another thread is loading.
Impact
- Unnecessary ERROR-level logs in monitoring
- Potential user-facing errors during concurrent operations
- The system self-recovers, but the spurious errors make the logs confusing
Component
scope/server/pipeline_manager.py
Filed automatically by Scope Error Monitor