perf: Caching and I/O optimizations (P3, P10) #158
Summary
Performance optimizations for caching and I/O operations.
Changes
P3: list_cansim_cached_tables single-pass optimization
- Merges the separate lapply() calls into a single iteration over cached table paths
- Collects timeCached, rawSize, and title in one pass per cached table
- Removes the dir() calls and file read operations that were being done 3x per path
- Uses vapply() for type-safe extraction from collected metadata
P10: Avoid unnecessary tibble conversion
- Checks tibble::is_tibble() before calling as_tibble()
Files Modified
- R/cansim.R: tibble conversion check
- R/cansim_parquet.R: single-pass cache metadata collection
Benchmark Notes
P3: list_cansim_cached_tables - Not directly benchmarked
Reason: Requires a populated cache directory with multiple cached tables to meaningfully benchmark.
Expected improvement: The optimization reduces file I/O operations from 3N to N, where N is the number of cached tables.
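The 3N versus N difference can be illustrated with a small self-contained sketch. It is not the code in R/cansim_parquet.R: the cache layout (one sub-directory per table with `data.txt` and `meta.rds`), the `read_entry()` helper, and the `reads` counter are all hypothetical names invented for illustration.

```r
# Hypothetical cache layout, invented for this sketch: one sub-directory
# per cached table holding a data file and a small metadata file.
cache_dir <- file.path(tempdir(), "cache_sketch")
unlink(cache_dir, recursive = TRUE)
for (id in c("table_a", "table_b", "table_c")) {
  table_dir <- file.path(cache_dir, id)
  dir.create(table_dir, recursive = TRUE)
  writeLines(rep("x", 100), file.path(table_dir, "data.txt"))
  saveRDS(list(title = paste("Title of", id)), file.path(table_dir, "meta.rds"))
}
paths <- list.dirs(cache_dir, recursive = FALSE)

# Hypothetical helper: lists the directory, reads the metadata, and counts
# how often it is called so the two strategies can be compared.
reads <- 0L
read_entry <- function(p) {
  reads <<- reads + 1L
  files <- dir(p, full.names = TRUE)
  data_file <- files[basename(files) == "data.txt"]
  meta <- readRDS(files[basename(files) == "meta.rds"])
  list(
    timeCached = format(file.mtime(data_file)),
    rawSize = file.size(data_file),
    title = meta$title
  )
}

# Three separate passes: every path is listed and read once per field (3N).
reads <- 0L
time_cached <- lapply(paths, function(p) read_entry(p)$timeCached)
raw_size <- lapply(paths, function(p) read_entry(p)$rawSize)
title <- lapply(paths, function(p) read_entry(p)$title)
reads_three_pass <- reads  # 9 with three tables

# Single pass: collect everything once per path (N), then extract each
# field with vapply() so the result types are checked.
reads <- 0L
entries <- lapply(paths, read_entry)
result <- data.frame(
  path = basename(paths),
  timeCached = vapply(entries, function(e) e$timeCached, character(1)),
  rawSize = vapply(entries, function(e) e$rawSize, numeric(1)),
  title = vapply(entries, function(e) e$title, character(1)),
  stringsAsFactors = FALSE
)
reads_single_pass <- reads  # 3 with three tables
```

Using `vapply()` with an explicit template such as `character(1)` makes the extraction fail loudly if a metadata entry has an unexpected type or length, which `sapply()` would silently paper over.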
To benchmark manually:
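One possible approach, as a rough sketch: time repeated calls with base R's `system.time()` on each branch and compare the distributions. The `time_calls()` helper and the `stand_in()` workload are hypothetical and exist only so the sketch runs without cansim or a populated cache.

```r
# Hypothetical helper: call a zero-argument function `times` times and
# return the elapsed wall-clock seconds of each call.
time_calls <- function(f, times = 20L) {
  vapply(
    seq_len(times),
    function(i) system.time(f())[["elapsed"]],
    numeric(1)
  )
}

# Stand-in workload (an assumption for illustration): stat every file in
# the session's temporary directory.
stand_in <- function() {
  invisible(file.info(list.files(tempdir(), full.names = TRUE)))
}

timings <- time_calls(stand_in, times = 5L)
print(summary(timings))

# With cansim installed and several tables already cached, the same helper
# could be pointed at the real function on each branch, for example:
# timings <- time_calls(function() cansim::list_cansim_cached_tables())
```

Because the change targets file I/O, results are likely to depend on disk caching by the operating system, so comparing medians over several runs is probably more informative than a single timing.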
P10: tibble conversion check - Not directly benchmarked
Reason: Negligible impact - this is primarily a code quality improvement.
Analysis: The is_tibble() check is O(1) and very fast. Most data flowing through normalize_cansim_values() is already a tibble from prior processing, so the as_tibble() call was largely unnecessary. The improvement comes from avoiding the conversion overhead when the data is already the correct type.
Deferred Optimizations
P6 (field cache utilization): Evaluated but not implemented
P7 (csv2sqlite transform copies): Evaluated but not implemented
- {if (...) ... else .} would harm code readability
Summary
Test Plan
- devtools::check() passes (0 errors, 0 warnings)

🤖 Generated with Claude Code