Merged
Conversation
Ensure directory prefixes always end with '/' when listing S3 objects to prevent matching adjacent paths (e.g., 'data/' vs 'data_backup/'). Fixes the issue where downloading s3://bucket/data/ would incorrectly include files from s3://bucket/data_backup/, causing 404 errors.
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. |
Verifies that downloading s3://bucket/my_dir (no trailing slash) correctly adds the slash after head_object returns 404, preventing matches against adjacent paths like my_dir_backup/.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
pos3 incorrectly included files from adjacent S3 paths when downloading due to overly broad prefix matching.
When downloading
s3://bucket/data/, pos3 would also try to download files froms3://bucket/data_backup/, resulting in 404 errors because it would construct invalid S3 keys.Root Cause
The
_list_s3_objects()function used_normalize_s3_url()which strips trailing slashes, sos3://bucket/data/becames3://bucket/data. When calling S3'slist_objects_v2withPrefix="data", it would match bothdata/ANDdata_backup/because S3 does string prefix matching.Solution
Modified
_list_s3_objects()to ensure directory prefixes always end with/when listing:list_prefixvariable to track the final prefix used for listing/andhead_objectreturns 404 (not a single file), append/to createlist_prefix/skiphead_objectentirely and use the key as-ishead_objectwhen the file existsThis ensures
Prefix="data/"only matches keys starting withdata/, notdata_backup/.Changes
_list_s3_objects()method (lines 709-741)TestPrefixBoundaryMatchingclass with 3 test casesTesting
Impact
This bug affected ANY S3 path where multiple keys share the same string prefix:
data/vsdata_backup/logs/vslogs_archive/recovery/vsrecovery_towels/The fix ensures proper "directory boundary" semantics for S3 prefix matching.