- Add persistent cache directory for downloaded hydrology data
  - Data persists across test runs (in .test_cache/)
  - Significantly reduces test execution time on subsequent runs
- Update test fixtures to use shared cache directory
  - default_config and consolidate_config now include CACHE_DIR
  - All tests share the same cache for downloaded files
- Add GitHub Actions cache for test data
  - Uses actions/cache to persist .test_cache between CI runs
  - First CI run downloads data, subsequent runs use cache
- Add pytest-xdist for parallel test capability
  - Available for local parallel test execution if desired

Performance improvement: ~36% faster on cached runs (97s -> 62s)

Co-authored-by: Sam Neubardt <samneubardt@gmail.com>

- Document cache location and size (~85MB)
- Explain how to clear the cache
- Note CI cache key versioning
- Describe available session-scoped fixtures for future optimization

Co-authored-by: Sam Neubardt <samneubardt@gmail.com>

- Fix download_if_missing to handle partial/corrupted downloads:
  - Check file size > 0, not just existence
  - Use atomic writes (temp file + rename) to prevent caching incomplete files
  - Clean up corrupted 0-byte files automatically
  - Increase timeout from 10s to 30s for large files
- Fix CI cache workflow:
  - Split into separate restore/save steps
  - Only save cache if not already cached (prevents overwriting good cache)
  - Bump cache key to v2 to invalidate any corrupted cache

This fixes the issue where interrupted downloads created 0-byte files that would never be re-downloaded.

Co-authored-by: Sam Neubardt <samneubardt@gmail.com>
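
The download fix described above (size check, cleanup of 0-byte files, temp file plus rename, 30s timeout) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: only the name `download_if_missing` comes from the commit message, while the signature, the use of `urllib.request`, and the `.part` suffix are assumptions.

```python
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download_if_missing(url: str, dest: Path, timeout: float = 30.0) -> Path:
    """Download url to dest unless a non-empty file already exists there.

    Signature and HTTP client are assumptions made for this sketch.
    """
    dest = Path(dest)
    if dest.exists():
        if dest.stat().st_size > 0:
            return dest  # cache hit: file exists and is non-empty
        dest.unlink()  # 0-byte leftover from an interrupted download

    dest.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the destination directory so the final
    # rename never crosses a filesystem boundary.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp, \
                urllib.request.urlopen(url, timeout=timeout) as resp:
            shutil.copyfileobj(resp, tmp)
        if os.path.getsize(tmp_name) == 0:
            raise OSError(f"empty download from {url}")
        # os.replace is atomic: dest is either absent or complete.
        os.replace(tmp_name, dest)
    finally:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
    return dest
```

Because the destination path only ever appears through the final rename, an interrupted download leaves at most a stray `.part` file (removed in the `finally` block when the process gets that far) and never a truncated file under the cached name.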
Cache static hydrology data and enable CI caching to significantly improve test performance.
Previously, static hydrology data (~85MB) was re-downloaded for every test run, leading to slow execution. This PR introduces a persistent local cache (`.test_cache/`) for downloaded hydrology data, integrates it with GitHub Actions for CI caching, and updates test configurations to use this shared cache. It also adds `pytest-xdist` for parallel test execution and provides session-scoped fixtures for sharing expensive delineation results. Cached test runs are about 36% faster (97.92s down to 62.28s).
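
As a rough picture of how the shared cache could be wired into the fixtures, here is a hedged `conftest.py` sketch. The names `default_config` and `CACHE_DIR` appear in the commit messages. The `cache_dir` fixture, the `build_config` helper, the config being a plain dict, and the cache living under the current working directory are all assumptions for illustration.

```python
from pathlib import Path

import pytest

# Assumed location: .test_cache/ under the directory pytest is run from.
CACHE_DIR = Path(".test_cache").resolve()


def build_config(cache_dir: Path) -> dict:
    """Hypothetical helper: a config dict pointing at the shared cache."""
    return {"CACHE_DIR": str(cache_dir)}


@pytest.fixture(scope="session")
def cache_dir() -> Path:
    """Create the cache directory once per test session (hypothetical fixture)."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    return CACHE_DIR


@pytest.fixture
def default_config(cache_dir: Path) -> dict:
    """Per-test config; every test sees the same CACHE_DIR."""
    return build_config(cache_dir)
```

With `scope="session"`, the directory is set up once and reused by every test, so files downloaded by one test are visible to the rest of the run and to later runs.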