-
Notifications
You must be signed in to change notification settings - Fork 2
Skip checking for missing media the first time #538
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Reviewer's GuideIntroduce an optimization to skip per-file missing-media checks on first load by detecting an empty cache directory, thread this behavior through load/stream paths via a new scan_for_missing_files flag, and add a targeted regression test plus a utility helper to detect empty directories. Sequence diagram for skipping missing-file scans on first loadsequenceDiagram
actor Client
participant Load as load
participant CacheRoot as database_cache_root
participant Utils as is_empty
participant Loader as _load_files
participant Missing as _missing_files
Client->>Load: load(name, version, cache_root, flavor, verbose)
Load->>CacheRoot: database_cache_root(name, version, cache_root, flavor)
CacheRoot-->>Load: db_root
Load->>Utils: is_empty(db_root)
Utils-->>Load: is_empty
Load->>Load: scan_for_missing_files = not is_empty
alt cache_not_empty
Load->>Loader: _load_files(files, files_type, db_root, cached_versions, flavor, cache_root, pickle_tables, scan_for_missing_files=true, num_workers, verbose)
Loader->>Missing: _missing_files(files, files_type, db_root, flavor, verbose)
Missing-->>Loader: missing_files
else cache_empty
Load->>Loader: _load_files(files, files_type, db_root, cached_versions, flavor, cache_root, pickle_tables, scan_for_missing_files=false, num_workers, verbose)
Loader->>Loader: missing_files = list(files)
end
Loader-->>Load: cached_versions
Load-->>Client: database
Updated class diagram for load and utility functions with scan_for_missing_filesclassDiagram
class LoadModule {
+load(name, version, cache_root, flavor, verbose, ...) database
+load_media(name, version, cache_root, flavor, verbose, ...) media
+load_table(name, version, cache_root, verbose, ...) table
+_load_files(files, files_type, db_root, cached_versions, flavor, cache_root, pickle_tables, scan_for_missing_files, num_workers, verbose) CachedVersions
}
class StreamModule {
+stream(name, version, cache_root, flavor, verbose, ...) DatabaseIterator
}
class UtilsModule {
+is_empty(path) bool
}
class CacheModule {
+database_cache_root(name, version, cache_root, flavor) str
}
class MissingFilesModule {
+_missing_files(files, files_type, db_root, flavor, verbose) list
+_cached_versions(name, version, db_root, cache_root, flavor, verbose) CachedVersions
}
LoadModule --> UtilsModule : uses is_empty
StreamModule --> UtilsModule : uses is_empty
LoadModule --> CacheModule : uses database_cache_root
StreamModule --> CacheModule : uses database_cache_root
LoadModule --> MissingFilesModule : uses _missing_files
LoadModule --> MissingFilesModule : uses _cached_versions
StreamModule --> LoadModule : uses _load_files
File-Level Changes
Assessment against linked issues
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey - I've left some high level feedback:
- The new
is_empty()helper assumes the directory exists and is a directory; consider handling non-existent paths (and possibly non-directory paths) by returningTrueinstead of raising to make it safer for reuse outside the current call sites. - You now compute
scan_for_missing_filesin several places (load,load_media,load_table,stream) with identical logic; consider centralizing this decision in a small helper or inside_load_filesto avoid repetition and keep behavior consistent if it ever needs to change.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new `is_empty()` helper assumes the directory exists and is a directory; consider handling non-existent paths (and possibly non-directory paths) by returning `True` instead of raising to make it safer for reuse outside the current call sites.
- You now compute `scan_for_missing_files` in several places (`load`, `load_media`, `load_table`, `stream`) with identical logic; consider centralizing this decision in a small helper or inside `_load_files` to avoid repetition and keep behavior consistent if it ever needs to change.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Closes #526
This checks if we have any files in the cache folder and only then run the scan for missing files.
Instead of checking for empty folder with
is_empty()as introduced in this pull request we could also change howaudb.core.cache.database_cache_root()works by letting it not create the folder. Then we could simply check if thedb_rootexists here instead of usingis_empty(db_root). If you think this is the better approach, we should first merge #540, and otherwise close #540.Summary by Sourcery
Tests:
Summary by Sourcery
Optimize detection of missing media files on first database load by skipping per-file existence checks when the cache directory is still empty and wiring this optimization through all relevant load and streaming paths.
New Features:
Enhancements:
Tests: