fix: robust bug fixes and storage verification tools#110
fix: robust bug fixes and storage verification tools#110zerodiscount wants to merge 12 commits intoagritheory:version-15from
Conversation
This commit introduces several stability improvements and tools for the Cloud Storage app, developed during a large-scale enterprise migration.
1. **Fix Recursion Error**:
* Addressed a RecursionError in CloudStorageFile.validate. The recursive loop occurred when associate_files() called save(), re-triggering validation. This fix checks ignore_file_validate *before* calling association logic.
2. **Robust Migration Tools**:
* Updated migrate-files-to-cloud-storage:
* **Smart Sync**: Checks if file already exists in S3 (head_object) before uploading. If found, it syncs the DB record (s3_key) and skips upload.
* **Validation Bypass**: Added ignore_links, ignore_permissions, and ignore_validate flags to file_doc.
3. **File Logic Hardening**:
* Updated write_file and associate_files in overrides/file.py to correctly propagate validation flags when updating *existing* file records.
* Patched has_permission to gracefully handle DoesNotExistError.
lauty95
left a comment
There was a problem hiding this comment.
Hi!
Great audit tool! I have a few suggestions:
-
Documentation: Please add documentation in the
docs/directory explaining what this tool does and how to use it (similar to commands.md). -
CLI Command: Consider adding a bench command in
commands.pyfor easier access -
Additional Statistics: Consider adding more useful metrics:
- Total storage used (size in MB/GB) - you can reuse
_format_bytes()frommigration.py - Local files not yet migrated (count)
- Files marked as migrated but missing in cloud (this is critical for data integrity)
| [tool.poetry] | ||
| name = "cloud_storage" | ||
| version = "15.9.1" | ||
| version = "1.1.1" |
There was a problem hiding this comment.
The version change from 15.9.1 to 1.1.1 breaks the existing versioning convention. Please keep the major version aligned with the Frappe version (e.g., 15.9.1) for clarity and to indicate compatibility.
| "jszip": "^3.10.1", | ||
| "madr": "^3.0.0" | ||
| "madr": "^3.0.0", | ||
| "vue": "^3.3.4" |
There was a problem hiding this comment.
In the PR body you wrote: "Vue Dependency: Added Vue to package.json to resolve bench build failures on v15."
Could you share the exact error message?
Frappe v15 already includes Vue 3 in core, so it shouldn't throw an error. Adding Vue here might cause version conflicts or bundle duplication.
|
|
||
| local_count = 0 | ||
| minio_count = 0 | ||
| synced_count = 0 |
There was a problem hiding this comment.
synced_count is an unused variable.
| client = get_cloud_storage_client() | ||
| bucket = client.bucket | ||
| except Exception as e: | ||
| print(f"Failed to initialize MinIO client: {e}") |
There was a problem hiding this comment.
This project supports S3, DigitalOcean Spaces, Backblaze, etc. Consider using generic messaging like:
storage_name = config.get("endpoint_url", "Cloud Storage")
print(f"Failed to initialize client: {e}")
print(f"Found in Cloud Storage: {minio_count}")| print(f"\n[ERROR] Missing in MinIO (DB claims migrated) ({len(missing_minio)}):") | ||
| print(", ".join(missing_minio[:10]) + ("..." if len(missing_minio) > 10 else "")) | ||
|
|
||
| # Verification of count |
There was a problem hiding this comment.
Comparing local_count vs minio_count is confusing because:
- After a complete migration with
remove_local=True,local_countwould be 0 - If a migration is in progress, they will always be different
A better logic would be:
marked_as_migrated = len([f for f in files if f.s3_key])
if marked_as_migrated != minio_count:
print(f"[ERROR] Sync Issue: DB claims {marked_as_migrated} files migrated, but only {minio_count} found in cloud!")This catches the critical case where DB thinks files are migrated but they're missing from cloud storage.
| if f.is_private: | ||
| local_path = frappe.get_site_path("private", "files", f.file_name) | ||
| else: | ||
| local_path = frappe.get_site_path("public", "files", f.file_name) |
There was a problem hiding this comment.
This assumes all public files are in public/files/, but they can be in various paths (/files/, /assets/, etc.).
Should we use something like:
local_path = frappe.get_site_path("public", f.file_url.lstrip("/"))
Description
This PR addresses several critical issues and adds valid verification tools for the cloud storage integration.
Bug Fixes
New Features