@jonasyr (Collaborator) commented Aug 23, 2025

Closes #10

Problem

The codebase-to-text tool was vulnerable to directory traversal: a malicious repository could make the tool read files outside the intended base directory through paths such as ../../../etc/passwd. Processing an untrusted repository could therefore expose sensitive system files.

Solution

  • Added _validate_file_path() method that uses os.path.commonpath() to ensure all file paths stay within the base directory
  • Integrated path validation into both file processing (_process_single_file) and folder parsing (_generate_file_entries) pipelines
  • Added security logging of rejected paths when verbose mode is enabled
  • Resolved symlinks with os.path.realpath() before validation, so links that point outside the base directory are rejected
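
The check described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the standalone function name validate_file_path, its parameters, and the log message format are assumptions made for the example. Only os.path.realpath() and os.path.commonpath() are taken from the description.

```python
import os


def validate_file_path(file_path, base_dir, verbose=False):
    """Return True only if file_path resolves to a location inside base_dir.

    Hypothetical stand-in for the PR's _validate_file_path() method.
    """
    # Resolve symlinks and ".." segments first, so the comparison is made
    # on real filesystem locations rather than on the raw strings.
    real_base = os.path.realpath(base_dir)
    # os.path.join() discards real_base when file_path is absolute, so
    # absolute paths are checked as-is.
    real_path = os.path.realpath(os.path.join(real_base, file_path))
    try:
        inside = os.path.commonpath([real_base, real_path]) == real_base
    except ValueError:
        # Raised, for example, on Windows when the paths are on different drives.
        inside = False
    if not inside and verbose:
        # Assumed message format, for illustration only.
        print(f"[SECURITY] Rejected path outside base directory: {file_path}")
    return inside
```

Comparing the result of commonpath() with the resolved base works on whole path components, which avoids the classic prefix bug where /srv/repo-evil would pass a naive startswith("/srv/repo") test.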

Tests

  • 15 new security-focused test cases covering:
    • Legitimate files (should pass)
    • Directory traversal attempts with ../ patterns (should be blocked)
    • Absolute paths outside base directory (should be blocked)
    • Malicious symlinks pointing outside (should be blocked)
    • Internal symlinks (should pass)
    • Security verbose logging validation
    • Performance impact verification (<5% overhead)
    • Error handling for edge cases
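
A few of the cases listed above might look like this as unittest tests. The sketch does not reproduce the PR's actual test classes: is_within() is a hypothetical stand-in for _validate_file_path(), and the class and test names are invented for the example.

```python
import os
import shutil
import tempfile
import unittest


def is_within(base_dir, candidate):
    """Hypothetical stand-in for the PR's _validate_file_path()."""
    real_base = os.path.realpath(base_dir)
    real_path = os.path.realpath(os.path.join(real_base, candidate))
    try:
        return os.path.commonpath([real_base, real_path]) == real_base
    except ValueError:
        return False


class PathTraversalTests(unittest.TestCase):
    def setUp(self):
        # One directory plays the repository, the other the outside world.
        self.base = tempfile.mkdtemp()
        self.outside = tempfile.mkdtemp()
        self.addCleanup(shutil.rmtree, self.base, ignore_errors=True)
        self.addCleanup(shutil.rmtree, self.outside, ignore_errors=True)
        with open(os.path.join(self.base, "ok.txt"), "w") as fh:
            fh.write("ok")
        with open(os.path.join(self.outside, "secret.txt"), "w") as fh:
            fh.write("secret")

    def _symlink(self, target, link):
        # Skip instead of failing where symlinks cannot be created.
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError, AttributeError):
            self.skipTest("symlinks not available on this system")

    def test_legitimate_file_passes(self):
        self.assertTrue(is_within(self.base, "ok.txt"))

    def test_dotdot_traversal_blocked(self):
        self.assertFalse(is_within(self.base, "../../../etc/passwd"))

    def test_absolute_path_outside_blocked(self):
        target = os.path.join(self.outside, "secret.txt")
        self.assertFalse(is_within(self.base, target))

    def test_symlink_pointing_outside_blocked(self):
        self._symlink(os.path.join(self.outside, "secret.txt"),
                      os.path.join(self.base, "evil"))
        self.assertFalse(is_within(self.base, "evil"))

    def test_internal_symlink_passes(self):
        self._symlink(os.path.join(self.base, "ok.txt"),
                      os.path.join(self.base, "alias.txt"))
        self.assertTrue(is_within(self.base, "alias.txt"))
```

Saved as a module, the sketch can be run with python -m unittest. The symlink tests are skipped on systems where the current user cannot create symlinks.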

Coverage/CI Notes

  • All existing tests pass (no regression)
  • Cross-platform testing for Windows/Unix path handling
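
One way to exercise both path flavours on a single machine is to call the standard library's ntpath and posixpath modules directly instead of os.path. The sketch below is an assumption about how such a check could be written, not the PR's test code. It is purely lexical (normpath() instead of realpath()), so it does not cover symlinks.

```python
import ntpath
import posixpath


def is_within(pathmod, base, candidate):
    """Lexical containment check for an explicit path flavour.

    pathmod is either ntpath (Windows rules) or posixpath (Unix rules).
    Hypothetical helper, for illustration only.
    """
    base = pathmod.normpath(base)
    candidate = pathmod.normpath(pathmod.join(base, candidate))
    try:
        return pathmod.commonpath([base, candidate]) == base
    except ValueError:
        # ntpath.commonpath() raises ValueError for paths on different drives.
        return False


# Unix-style paths
print(is_within(posixpath, "/repo", "src/a.py"))
print(is_within(posixpath, "/repo", "../etc/passwd"))

# Windows-style paths, checked with Windows rules on any host OS
print(is_within(ntpath, r"C:\repo", r"src\a.py"))
print(is_within(ntpath, r"C:\repo", r"..\Windows\system.ini"))
print(is_within(ntpath, r"C:\repo", r"D:\secrets.txt"))
```

The different-drive case is the reason for the try/except: without it, a path on another drive would raise an exception instead of being rejected cleanly.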

Risks

  • Low risk: Purely additive security feature with no API changes
  • Backward compatible: No breaking changes to existing functionality
  • Performance: Minimal overhead (~2-3%) from path validation operations

Rollback Plan

If issues arise:

  1. Revert the three modified methods to original versions
  2. Remove _validate_file_path() method
  3. Remove security test classes
  4. All functionality returns to pre-patch behavior

Testing Checklist

  • Unit tests for path validation logic
  • Integration tests with malicious path simulation
  • Performance impact verification
  • Cross-platform path handling (Windows/Unix)
  • Symlink handling on supported systems
  • Verbose logging verification
  • Existing functionality regression testing
  • Error handling for edge cases

@jonasyr requested a review from QaisarRajput August 23, 2025 16:46
@jonasyr self-assigned this Aug 23, 2025
@jonasyr linked an issue Aug 23, 2025 that may be closed by this pull request

Development

Successfully merging this pull request may close these issues.

[SECURITY] Add path traversal protection to prevent malicious file access

3 participants