fix: timing issue in S3 locker to prevent zombie locks #812
+74
−7
What kind of change does this PR introduce?
Bug fix
What is the current behavior?
There is a potential race condition in the S3 locker: if the lock is unlocked during the renewal operation (between the get and the put), a zombie lock can be created.
What is the new behavior?
After the S3 get operation completes, check that the lock is still locked, and exit if it has already been unlocked.
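To illustrate the race and the guard, here is a minimal sketch rather than the code in this PR. `FakeBucket`, `SketchLock`, the `afterGet` hook, and the `checkAfterGet` flag are all hypothetical names made up for the example; the bucket is an in-memory map, and the hook stands in for whatever runs while a real S3 get is in flight.

```typescript
// Hypothetical in-memory stand-in for an S3 bucket (not the project's real store).
class FakeBucket {
  private objects = new Map<string, string>();
  get(key: string): string | undefined { return this.objects.get(key); }
  put(key: string, body: string): void { this.objects.set(key, body); }
  delete(key: string): void { this.objects.delete(key); }
  has(key: string): boolean { return this.objects.has(key); }
}

// Hypothetical lock whose renewal does a get followed by a put.
class SketchLock {
  private locked = false;

  constructor(private bucket: FakeBucket, private key: string) {}

  lock(): void {
    this.locked = true;
    this.bucket.put(this.key, JSON.stringify({ renewedAt: Date.now() }));
  }

  unlock(): void {
    this.locked = false;
    this.bucket.delete(this.key);
  }

  // `afterGet` simulates work that interleaves while the get is in flight.
  // `checkAfterGet` toggles the guard so both behaviors can be compared.
  renew(checkAfterGet: boolean, afterGet: () => void = () => {}): void {
    if (!this.locked) return;
    const current = this.bucket.get(this.key);
    afterGet();
    if (current === undefined) return;
    // The guard: the lock may have been released while the get was pending.
    if (checkAfterGet && !this.locked) return;
    // Without the guard, this put recreates an object that unlock() just deleted.
    this.bucket.put(this.key, JSON.stringify({ renewedAt: Date.now() }));
  }
}

// Runs one renewal during which the lock is released, and reports
// whether a lock object is left behind in the bucket.
function leavesZombie(checkAfterGet: boolean): boolean {
  const bucket = new FakeBucket();
  const lock = new SketchLock(bucket, "locks/upload-1");
  lock.lock();
  lock.renew(checkAfterGet, () => lock.unlock());
  return bucket.has("locks/upload-1");
}

const zombieWithoutCheck = leavesZombie(false);
const zombieWithCheck = leavesZombie(true);
console.log({ zombieWithoutCheck, zombieWithCheck });
```

In this sketch, the unguarded renewal leaves an object behind after `unlock()` has run, while the guarded renewal returns before the put. A real implementation would be asynchronous; the synchronous hook is used here only to make the interleaving deterministic.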
Additional context
This issue caused intermittent test failures that surfaced in the "double unlock" test but were actually caused by the abort test leaving behind zombie objects. Running the S3 locker tests in a loop, I was able to reproduce the failure semi-consistently. It usually took only a few test runs, but sometimes it went upwards of 50 runs without a failure.
The newly added test expands on the "handle abort signal" test and consistently fails without this fix.