Fix S3 prefix boundary matching bug #5

Merged
vertix merged 2 commits into main from fix/s3-prefix-boundary-matching
Jan 14, 2026

@vertix vertix commented Jan 14, 2026

Problem

pos3 incorrectly included files from adjacent S3 paths when downloading due to overly broad prefix matching.

When downloading s3://bucket/data/, pos3 would also try to download files from s3://bucket/data_backup/, resulting in 404 errors because it would construct invalid S3 keys.

Root Cause

The _list_s3_objects() function used _normalize_s3_url() which strips trailing slashes, so s3://bucket/data/ became s3://bucket/data. When calling S3's list_objects_v2 with Prefix="data", it would match both data/ AND data_backup/ because S3 does string prefix matching.
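
The boundary problem can be reproduced without touching S3 at all. The sketch below mimics the plain string-prefix filtering described above using an in-memory list; the key names and the `list_keys` helper are hypothetical and exist only for illustration.

```python
# Hypothetical object keys in a bucket (illustration only).
keys = [
    "data/a.csv",
    "data/nested/b.csv",
    "data_backup/a.csv",
]


def list_keys(all_keys, prefix):
    """Mimic the string-prefix filter that a Prefix argument applies."""
    return [k for k in all_keys if k.startswith(prefix)]


# Prefix without a trailing slash also matches the adjacent path.
without_slash = list_keys(keys, "data")

# Prefix with a trailing slash stays inside the "directory".
with_slash = list_keys(keys, "data/")

print(without_slash)
print(with_slash)
```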

Solution

Modified _list_s3_objects() to ensure directory prefixes always end with / when listing:

  1. Introduced list_prefix variable to track the final prefix used for listing
  2. When a key doesn't end with / and head_object returns 404 (not a single file), append / to create list_prefix
  3. Keys already ending with / skip head_object entirely and use the key as-is
  4. Single file downloads still work via head_object when the file exists

This ensures Prefix="data/" only matches keys starting with data/, not data_backup/.
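
The steps above can be sketched roughly as follows. This is not the actual pos3 code: `resolve_list_prefix` and the injected `object_exists` callable are hypothetical names, and the empty-key branch is an assumption that the PR does not describe. With boto3, `object_exists` would typically wrap `head_object` and treat a `ClientError` with code `"404"` as "not found".

```python
from typing import Callable, Tuple


def resolve_list_prefix(
    key: str, object_exists: Callable[[str], bool]
) -> Tuple[str, bool]:
    """Return (list_prefix_or_key, is_single_file) for a normalized key.

    object_exists stands in for a head_object call (hypothetical helper).
    """
    # Assumption: an empty key means the bucket root, so list everything.
    if key == "":
        return "", False
    # Keys already ending with "/" are directories: skip the existence check.
    if key.endswith("/"):
        return key, False
    # An existing object with this exact key is a single-file download.
    if object_exists(key):
        return key, True
    # Otherwise treat the key as a directory and enforce the boundary.
    return key + "/", False


# Usage with an in-memory set standing in for the bucket contents.
existing = {"data/file.txt", "data_backup/file.txt"}
print(resolve_list_prefix("data", existing.__contains__))
print(resolve_list_prefix("data/file.txt", existing.__contains__))
```

Passing the existence check in as a callable keeps the decision logic testable without network access, which is one possible design choice among several.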

Changes

  • pos3/__init__.py: Fixed _list_s3_objects() method (lines 709-741)
  • tests/test_s3.py: Added TestPrefixBoundaryMatching class with 3 test cases
  • pyproject.toml: Bumped version to 0.2.1
  • CHANGELOG.md: Documented the fix
  • uv.lock: Updated for new version

Testing

  • All 58 tests pass (including 3 new regression tests)
  • 93% code coverage maintained
  • Created verification script that confirms the fix prevents spurious matches
  • No regressions in existing functionality
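
A regression test for this kind of bug might look like the sketch below. It is not the TestPrefixBoundaryMatching class from this PR: `FakeS3Client`, `listed_keys`, and the test names are hypothetical, and the fake implements only the `Bucket` and `Prefix` arguments of `list_objects_v2`.

```python
class FakeS3Client:
    """In-memory stand-in for an S3 client (hypothetical, illustration only)."""

    def __init__(self, keys):
        self._keys = list(keys)

    def list_objects_v2(self, Bucket, Prefix=""):
        # Same string-prefix semantics as the real listing call.
        matched = [{"Key": k} for k in self._keys if k.startswith(Prefix)]
        return {"KeyCount": len(matched), "Contents": matched}


def listed_keys(client, bucket, prefix):
    """Collect the keys returned for a given prefix."""
    response = client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in response.get("Contents", [])]


def test_directory_prefix_excludes_adjacent_paths():
    client = FakeS3Client(["my_dir/a.txt", "my_dir_backup/a.txt"])
    assert listed_keys(client, "bucket", "my_dir/") == ["my_dir/a.txt"]


def test_bare_prefix_reproduces_original_bug():
    client = FakeS3Client(["my_dir/a.txt", "my_dir_backup/a.txt"])
    assert listed_keys(client, "bucket", "my_dir") == [
        "my_dir/a.txt",
        "my_dir_backup/a.txt",
    ]
```

Both functions follow pytest naming conventions, so they would be collected automatically if placed in a test module.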

Impact

This bug affected any S3 directory path whose name is a string prefix of a sibling path's name:

  • data/ vs data_backup/
  • logs/ vs logs_archive/
  • recovery/ vs recovery_towels/

The fix ensures proper "directory boundary" semantics for S3 prefix matching.

Ensure directory prefixes always end with '/' when listing S3 objects
to prevent matching adjacent paths (e.g., 'data/' vs 'data_backup/').

Fixes the issue where downloading s3://bucket/data/ would incorrectly
include files from s3://bucket/data_backup/, causing 404 errors.
@vertix vertix marked this pull request as draft January 14, 2026 12:25
@vertix vertix marked this pull request as ready for review January 14, 2026 12:25
@github-actions

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

Verifies that downloading s3://bucket/my_dir (no trailing slash)
correctly adds the slash after head_object returns 404, preventing
matches against adjacent paths like my_dir_backup/.
@vertix vertix merged commit cf69d82 into main Jan 14, 2026
2 checks passed
@vertix vertix deleted the fix/s3-prefix-boundary-matching branch January 14, 2026 12:36