Feat/upload dir#121

Open
sqhyz55 wants to merge 4 commits into xorbitsai:main from sqhyz55:feat/upload_dir

@sqhyz55
Collaborator

@sqhyz55 sqhyz55 commented Mar 10, 2026

No description provided.

@sqhyz55 sqhyz55 requested a review from rogercloud March 10, 2026 01:42
@qinxuye
Contributor

qinxuye commented Mar 10, 2026

At a glance, I think we need some alignment on how KB handles files.

Currently, KB relies on its own file-upload capability, but I think it should leverage the one provided by file management. The benefit is clear: in the future, a file will expose only its file ID and metadata, and reading a file can go through an internal utility function. The path then need not be on the local file system; it could also point to external storage.

We should stop relying on the local file system.
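
To make the proposal concrete, here is a minimal Python sketch of a file service in which callers see only a file ID and metadata, and every read goes through one internal function. All names (`FileMeta`, `StorageBackend`, `InMemoryBackend`, `FileService`, `read_file`) are hypothetical and are not taken from this repository or from #15; the in-memory backend merely stands in for local disk or external storage.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class FileMeta:
    """What callers get back: an ID and metadata, but no filesystem path."""
    file_id: str
    filename: str
    size: int


class StorageBackend(Protocol):
    """Hypothetical interface; local disk or object storage could implement it."""
    def write(self, key: str, data: bytes) -> None: ...
    def read(self, key: str) -> bytes: ...


class InMemoryBackend:
    """Stand-in backend used only to keep this sketch self-contained."""
    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def read(self, key: str) -> bytes:
        return self._blobs[key]


class FileService:
    """Owns the mapping from file ID to stored bytes."""
    def __init__(self, backend: StorageBackend) -> None:
        self._backend = backend
        self._meta: Dict[str, FileMeta] = {}

    def upload(self, filename: str, data: bytes) -> FileMeta:
        file_id = uuid.uuid4().hex
        self._backend.write(file_id, data)
        meta = FileMeta(file_id=file_id, filename=filename, size=len(data))
        self._meta[file_id] = meta
        return meta

    def read_file(self, file_id: str) -> bytes:
        # The single internal entry point for reads; callers never build paths.
        if file_id not in self._meta:
            raise KeyError(f"unknown file id: {file_id}")
        return self._backend.read(file_id)
```

With this shape, KB ingest would hold a `file_id` and call `read_file`, so swapping the backend would not touch KB code.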

@qinxuye
Contributor

qinxuye commented Mar 10, 2026

To be clear, we need to wait for #15 before we introduce any file-related modifications.

@sqhyz55
Collaborator Author

sqhyz55 commented Mar 10, 2026

> To be clear, we need to wait for #15 before we introduce any file related modification.

Agreed. We can evaluate and make adjustments after #15 is merged. I’ll switch this PR to draft status for now.

@sqhyz55 sqhyz55 marked this pull request as draft March 10, 2026 04:03
sqhyz55 and others added 3 commits March 12, 2026 10:35
…support physical cleanup

- Add sanitize_path_component() for path traversal protection
- Add collection parameter to get_upload_path() and get_file_url()
- KB uploads now stored in user_{id}/{collection}/{filename} structure
- Delete collection physically removes directory before DB deletion
- Rename collection physically renames directory before DB update
- Add validate_file_path() and extract_user_id_from_prefix() to files API
- Enhanced security validation in upload, download, preview endpoints

Ported from FenixAOS feat/upload_dir branch (without LanceDB upgrade)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
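
For readers without the diff at hand, the following is a minimal sketch of how the path sanitization and the `user_{id}/{collection}/{filename}` layout described above could fit together. The function names `sanitize_path_component` and `get_upload_path` come from the commit message, but the signatures, the `root` parameter, and the bodies are assumptions and may differ from the PR's actual code.

```python
import re
from pathlib import Path

# Anything other than word characters, dots, and hyphens is replaced.
_UNSAFE = re.compile(r"[^\w.\-]")


def sanitize_path_component(name: str) -> str:
    """Reduce user input to a single safe path segment (assumed behavior)."""
    # Keep only the last segment so "a/../../etc/passwd" cannot climb out.
    base = name.replace("\\", "/").split("/")[-1]
    # Replace unsafe characters and drop leading dots ("..", ".hidden").
    cleaned = _UNSAFE.sub("_", base).lstrip(".")
    if not cleaned:
        raise ValueError(f"unusable path component: {name!r}")
    return cleaned


def get_upload_path(root: Path, user_id: int, collection: str, filename: str) -> Path:
    """Build root/user_{id}/{collection}/{filename} and verify containment."""
    root = root.resolve()
    target = (
        root
        / f"user_{user_id}"
        / sanitize_path_component(collection)
        / sanitize_path_component(filename)
    ).resolve()
    # Defense in depth: raises ValueError if target escaped the upload root.
    target.relative_to(root)
    return target
```

The containment check after sanitizing is redundant by design, so that a future change to the sanitizer cannot silently reopen a traversal path.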
- Add filelock for collection dir operations to avoid concurrent conflicts
- Replace rmtree with move_collection_dir_to_trash for atomicity and recovery
- Add kb_physical_sync module (lock + trash helpers, cleanup_trash)
- Update delete_collection_api and rename_collection_api to use lock and trash
- Add filelock dependency via pyproject.toml
- Fix tests: mock move_collection_dir_to_trash and assert new behavior
- .gitignore: ignore filelock test artifacts (*MagicMock*.lock)

Made-with: Cursor
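
As an illustration of why moving to trash is safer than `rmtree`, here is a stdlib-only sketch of the two helpers named above. The signatures, the trash naming scheme, and the `max_age_seconds` parameter are assumptions, not the PR's code. The PR additionally guards these operations with a `filelock` lock, which is omitted here to keep the example dependency-free.

```python
import os
import shutil
import time
import uuid
from pathlib import Path
from typing import Optional


def move_collection_dir_to_trash(collection_dir: Path, trash_root: Path) -> Optional[Path]:
    """Rename the collection directory into the trash; return its new location."""
    if not collection_dir.is_dir():
        return None  # nothing on disk to remove
    trash_root.mkdir(parents=True, exist_ok=True)
    # The timestamp prefix records when the directory was trashed; the random
    # part avoids collisions when the same collection name is trashed twice.
    name = f"{int(time.time())}.{uuid.uuid4().hex}.{collection_dir.name}"
    dest = trash_root / name
    # A rename within one filesystem is a single step, so a crash cannot leave
    # a half-deleted directory behind, and the data stays recoverable.
    os.replace(collection_dir, dest)
    return dest


def cleanup_trash(trash_root: Path, max_age_seconds: float) -> int:
    """Permanently delete trashed directories older than max_age_seconds."""
    if not trash_root.is_dir():
        return 0
    cutoff = time.time() - max_age_seconds
    removed = 0
    for entry in trash_root.iterdir():
        try:
            trashed_at = int(entry.name.split(".", 1)[0])
        except ValueError:
            continue  # not created by move_collection_dir_to_trash; leave it
        if entry.is_dir() and trashed_at <= cutoff:
            shutil.rmtree(entry)
            removed += 1
    return removed
```

If the DB deletion fails after the move, the directory can be renamed back from the trash, which is not possible after an `rmtree`.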
- kb: register UploadedFile on ingest, sync DB on collection delete/rename
  with path sanitization; add type ignore for SQLAlchemy assignment
- workspace_file_tool: use workspace_dir.resolve() for relative_to to fix
  macOS /var vs /private/var path mismatch in tests
- tests: override get_db in test_kb_ingest_separators (mock); pass mock_db
  in test_multitenancy delete_collection_api; fix test_file_upload for
  file_id (task_id for folder validation, use /upload, KB ingest for list)
- tests: remove test_files_security (tested helpers removed with file_id)

Made-with: Cursor
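
The `workspace_dir.resolve()` fix is easiest to see in isolation. On macOS, temporary directories are reported under `/var/...`, which is a symlink to `/private/var/...`; if only one side of `relative_to` has been resolved, the call raises `ValueError` even though both paths name the same location. The helper below is a hypothetical sketch (`to_workspace_relative` is not a name from the PR) that resolves both sides before comparing.

```python
from pathlib import Path


def to_workspace_relative(workspace_dir: Path, candidate: Path) -> Path:
    """Return candidate relative to the workspace, resolving symlinks on both sides."""
    root = workspace_dir.resolve()
    # Relative inputs are interpreted as relative to the workspace root.
    target = candidate if candidate.is_absolute() else root / candidate
    # relative_to raises ValueError if the resolved target is outside root.
    return target.resolve().relative_to(root)
```

Because both paths pass through `resolve()`, the same call also rejects `..` segments that would leave the workspace.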
@sqhyz55 sqhyz55 marked this pull request as ready for review March 12, 2026 04:00
CI uses langfuse 4.x stubs which omit start_span; local may run 3.x.
Cast to Any in v3 branch to keep mypy clean without type: ignore.

Made-with: Cursor
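
The cast-to-`Any` pattern mentioned above can be sketched without importing langfuse at all. The classes below are hypothetical stand-ins, not the langfuse API: `LegacyClient` plays the role of a runtime object that has `start_span`, while the static type seen by mypy (here just `object`) does not declare it.

```python
from typing import Any, cast


class LegacyClient:
    """Hypothetical stand-in for an older client that provides start_span."""
    def start_span(self, name: str) -> str:
        return f"span:{name}"


class CurrentClient:
    """Hypothetical stand-in for a newer client without start_span."""


def open_span(client: object, name: str) -> str:
    if hasattr(client, "start_span"):
        # The declared type does not include start_span, so a direct call
        # would be flagged by the type checker. Casting to Any turns off
        # attribute checking for this one expression, without "type: ignore".
        legacy = cast(Any, client)
        return str(legacy.start_span(name))
    raise NotImplementedError("client has no start_span")
```

Unlike a `# type: ignore` comment, the cast does not trigger unused-ignore warnings when the installed stubs do declare the method.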