Fix: resolve performance profiling deadlock with dynamic task count u…#94
Open
ChaoZheng109 wants to merge 1 commit into ChaoWao:main from
…pdates

Fixes spinlock deadlock in AICPU performance profiling where Device threads hang waiting for Host to read buffers. The root cause was Host exiting collection early due to seeing total_tasks=0 during parallel orchestration.

Changes:
- Use 0xFFFFFFFF as uninitialized marker for total_tasks instead of 0
- AICPU dynamically updates total_tasks from orchestrator's current_task_index
- Host polls and refreshes expected_tasks during collection
- Optimize to avoid duplicate atomic reads by reusing visible_tasks
- Reduce log frequency to only first update and orchestration completion
- Increase profiling timeout from 2s to 30s for large graphs
- Add timeout detection in switch_perf_buffer spinlock

This enables real-time visibility into orchestrator progress and prevents deadlock when orchestration and scheduling run in parallel.
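The sentinel, the Host-side refresh, and the bounded spin described above can be sketched roughly as follows. This is an illustration only, not the PR's actual diff: `PerfHeader`, `publish_total_tasks`, `collection_done`, `spin_until`, and the `orchestration_finished` flag are hypothetical names introduced here, and the real shared-memory layout and completion signal may differ. Only the 0xFFFFFFFF marker and the names `total_tasks`, `current_task_index`, and `visible_tasks` come from the description.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Sentinel from the PR description: 0 is a legal "no tasks yet" count,
// so an all-ones value marks "not published yet" instead.
constexpr uint32_t kTotalTasksUninit = 0xFFFFFFFFu;

// Hypothetical stand-in for the shared profiling header.
struct PerfHeader {
    std::atomic<uint32_t> total_tasks{kTotalTasksUninit};
};

// Device side (assumed shape): publish the orchestrator's progress so the
// Host can see how many tasks exist so far.
inline void publish_total_tasks(PerfHeader& h, uint32_t current_task_index) {
    h.total_tasks.store(current_task_index, std::memory_order_release);
}

// Host side (assumed shape): decide whether collection may stop.
// The atomic is read once into visible_tasks and reused, mirroring the
// "avoid duplicate atomic reads" item in the change list.
inline bool collection_done(const PerfHeader& h, uint32_t collected,
                            bool orchestration_finished) {
    const uint32_t visible_tasks = h.total_tasks.load(std::memory_order_acquire);
    if (visible_tasks == kTotalTasksUninit) {
        return false;  // nothing published yet: keep polling, do not exit early
    }
    // The count can still grow while orchestration runs, so it is only
    // treated as final once orchestration reports completion (assumption).
    return orchestration_finished && collected >= visible_tasks;
}

// Bounded spin: returns false when the deadline passes instead of hanging.
template <class Pred>
bool spin_until(Pred ready, std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!ready()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // caller can log and bail out rather than deadlock
        }
    }
    return true;
}
```

With this shape, a Host that starts collecting before the Device has published anything sees the sentinel and keeps polling, where the old zero default would have looked like "all 0 tasks collected". A spin that returns false on timeout gives the Device side a point at which to report a stuck buffer switch.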
c56b48c to 1f13cf5