Fix #858: manage linux platform layer async worker thread lifecycle #862
Open
The Download/Backup/Install/Apply/Restore callbacks in LinuxPlatformLayer spawned a std::thread and called detach() on it. The worker captured raw pointers to workCompletionData and workflowData, which are only guaranteed valid until WorkCompletionCallback fires. Agent shutdown (or workflow teardown) while the worker was still running could therefore invalidate those pointers and cause use-after-free / core dumps (related to #852).

Changes:
- Add an instance-owned std::thread (_activeWorker) plus a std::mutex (_activeWorkerMutex) to the LinuxPlatformLayer.
- Introduce a TrackWorker() helper that joins any previously installed worker under the mutex and takes ownership of the new one. Replace each of the five worker.detach() sites with a single call to TrackWorker(std::move(worker)), keeping the workflow-engine guarantee that at most one async operation is in flight while also making the ownership model explicit.
- Extend ~LinuxPlatformLayer to signal cancellation and join the active worker before ExtensionManager::Uninit(). This guarantees that by the time the platform layer is destroyed, no worker can still be touching workCompletionData / workflowData.

Tests (src/platform_layers/linux_platform_layer/tests/linux_adu_core_impl_ut.cpp):
- Happy-path async completion for Apply and Download (worker runs on a separate thread and invokes WorkCompletionCallback).
- Structural coverage for Backup, Install, Restore through the same TrackWorker path.
- Sequential invocations: five back-to-back ApplyCallback calls all complete without leaking or detaching the prior worker.
- Destructor waits for a naturally-completing worker (regression for #858).
- Destructor blocks until an in-flight worker finishes: the completion callback is pinned via a hold flag, Unregister is invoked from a second thread, and the test asserts Unregister does not return until the hold is released. Pre-fix (with detach), the test would have observed Unregister returning early, so it proves the join is effective.
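For reviewers who want the ownership model in one place, the following is a minimal sketch of the join-then-replace pattern described above, not the code in this PR. `PlatformLayerSketch`, `_cancelRequested`, `IsCancelRequested()` and `RunWorkersAndDestroy()` are hypothetical names introduced for illustration; only `_activeWorker`, `_activeWorkerMutex` and `TrackWorker()` come from the description, and the way cancellation is actually signalled is assumed here to be an atomic flag.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for LinuxPlatformLayer; only worker ownership is sketched.
class PlatformLayerSketch
{
public:
    ~PlatformLayerSketch()
    {
        // Assumption: cancellation is signalled through an atomic flag that a
        // cooperative worker may poll. Afterwards, block until the worker is done.
        _cancelRequested.store(true);
        std::lock_guard<std::mutex> lock(_activeWorkerMutex);
        if (_activeWorker.joinable())
        {
            _activeWorker.join();
        }
    }

    // Joins any previously tracked worker, then takes ownership of the new one.
    void TrackWorker(std::thread worker)
    {
        assert(worker.joinable());
        std::lock_guard<std::mutex> lock(_activeWorkerMutex);
        if (_activeWorker.joinable())
        {
            _activeWorker.join();
        }
        _activeWorker = std::move(worker);
    }

    bool IsCancelRequested() const
    {
        return _cancelRequested.load();
    }

private:
    std::thread _activeWorker;
    std::mutex _activeWorkerMutex;
    std::atomic<bool> _cancelRequested{ false };
};

// Starts workerCount short-lived workers back to back, destroys the layer,
// and returns how many workers had completed by the time the destructor returned.
int RunWorkersAndDestroy(int workerCount)
{
    std::atomic<int> completed{ 0 };
    {
        PlatformLayerSketch layer;
        for (int i = 0; i < workerCount; ++i)
        {
            std::thread worker([&completed] {
                std::this_thread::sleep_for(std::chrono::milliseconds(10));
                completed.fetch_add(1);
            });
            layer.TrackWorker(std::move(worker));
        }
    } // ~PlatformLayerSketch joins the last worker here.
    return completed.load();
}
```

Because each `TrackWorker()` call joins the previous worker and the destructor joins the last one, `RunWorkersAndDestroy(5)` can only return after all five workers have run. One caveat of this sketch: a worker must never call `TrackWorker()` on its own layer, since joining under the mutex from the tracked thread would deadlock.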