Rev caching #398
Conversation
Pull request overview
This pull request revises the dataset caching mechanism to better handle cache invalidation and adds fallback download methods (wget, curl) for when the primary requests library fails. The cache location is moved to ~/.cache/modelforge_testing_dataset_cache, and a new tracking file downloaded_datasets.txt is used to generate cache keys based on required datasets and versions. Additionally, a new Aimnet2 dataset curation module is introduced.
Changes:
- Modified the caching scheme to track dataset requirements via a hash of the downloaded_datasets.txt file, enabling proper cache invalidation
- Added wget and curl fallback mechanisms for dataset downloads when the Python requests library fails
- Relocated the dataset cache from a local directory to ~/.cache/modelforge_testing_dataset_cache
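The manifest-plus-hash idea above can be sketched as follows. This is an illustrative sketch only: the function names (write_dataset_manifest, manifest_cache_key) and the manifest line format are assumptions, not the code from this PR; only the file name downloaded_datasets.txt comes from the PR description.

```python
# Sketch of manifest-based cache keying (names and format are assumptions).
import hashlib
from pathlib import Path


def write_dataset_manifest(datasets, path="downloaded_datasets.txt"):
    """Record the required dataset names and versions, one per line.

    Sorting makes the manifest (and thus the hash) independent of the
    order in which datasets were listed.
    """
    lines = sorted(f"{name}=={version}" for name, version in datasets)
    Path(path).write_text("\n".join(lines) + "\n")


def manifest_cache_key(path="downloaded_datasets.txt"):
    """Derive a cache key from the manifest contents.

    Any change to the required datasets or versions changes the hash,
    which forces a cache miss and a fresh cache entry.
    """
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return f"dataset-cache-{digest[:16]}"
```

Because the key is derived purely from the manifest contents, adding, removing, or bumping a dataset version automatically invalidates the old cache without any manual cleanup.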
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 17 comments.
Show a summary per file
| File | Description |
|---|---|
| modelforge/utils/remote.py | Added wget_wrapper, curl_wrapper, and requests_wrapper functions for download fallbacks; refactored download_from_url to support multiple download schemes with retry logic |
| modelforge/tests/list_test_datasets.py | New file that generates downloaded_datasets.txt tracking file for cache key hashing |
| .github/workflows/CI.yaml | Updated cache configuration to use hashFiles on downloaded_datasets.txt for cache invalidation; upgraded to actions/cache@v4 |
| modelforge/tests/conftest.py | Updated dataset_temp_dir fixture to use new cache location with expanduser |
| modelforge/tests/fetch_test_datasets.py | Updated to use new cache location and added documentation |
| modelforge/tests/test_remote.py | Added parametrized tests for different download schemes (auto, curl, wget) |
| modelforge/tests/test_aaa_setup.py | Removed entire file (dataset download logic moved to fetch_test_datasets.py) |
| scripts/plot_dataset.py | Updated hardcoded paths to use new cache location |
| modelforge/utils/io.py | Added fmt:off comments to preserve message formatting |
| modelforge-curate/modelforge/curate/datasets/aimnet2_curation.py | New curation module for AIMNet2 dataset |
| modelforge/dataset/yaml_files/ani2x.yaml | Updated file names to match correct naming conventions |
| modelforge/dataset/parameters.py | Added commented placeholder for future spin_multiplicity_per_atom property |
| modelforge/tests/precalculated_values.py | Added fmt:off comment for code formatting |
| README.md | Added installation instructions and updated copyright year to 2026 |
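The CI.yaml change described in the table might look roughly like the workflow fragment below. This is a hedged sketch, not the PR's actual workflow: step names are invented, and only the script path, cache path, manifest file name, and actions/cache@v4 with hashFiles come from the descriptions above.

```yaml
# Illustrative fragment only; step names are assumptions.
- name: Generate dataset manifest
  run: python modelforge/tests/list_test_datasets.py

- name: Cache test datasets
  uses: actions/cache@v4
  with:
    path: ~/.cache/modelforge_testing_dataset_cache
    key: dataset-cache-${{ hashFiles('downloaded_datasets.txt') }}
```

With hashFiles in the key, editing the list of required datasets produces a new key, so the workflow saves a fresh cache entry instead of silently reusing a stale one.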
Pull Request Summary
The main purpose of this PR is a slight revision to the caching, as discussed in #396. Briefly, prior to this PR the code would fetch any datasets not in the cache (if, say, the defaults changed), but the updated files would not be saved into a new cache, requiring manual removal of the old cache to generate a new one. This PR adds a step that writes a text file containing a list of the dataset names and versions. We then hash that file and add the hash to the cache key, so we know when we have a cache miss.
Also, this changes the location of the cache to ~/.cache/modelforge_testing_dataset_cache. For some reason the previous location was not saving the cache for me initially, though there could have been other issues; the move was made to avoid problems with local testing.

This PR also adds fallbacks for downloading files. As discussed in #397, when Zenodo was rejecting Python requests connections, I found I could use curl or wget successfully from the command line. I've added wrappers around those system tools; they are called after the maximum number of retries fails (first trying wget if available, then curl if available). I did add logic to specify the download method in the download function, but that is not exposed at the level of the .toml files, as I'm not sure that is necessary.
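The fallback chain described above can be sketched as follows. The wrapper names mirror the wget_wrapper and curl_wrapper mentioned in the review summary, but the bodies here are assumptions about how such wrappers might work, not the PR's actual implementation.

```python
# Sketch of requests -> wget -> curl fallback (implementation is assumed).
import shutil
import subprocess


def wget_wrapper(url: str, output_path: str) -> bool:
    """Download with the system wget tool; return True on success."""
    if shutil.which("wget") is None:
        return False
    result = subprocess.run(["wget", "-q", "-O", output_path, url])
    return result.returncode == 0


def curl_wrapper(url: str, output_path: str) -> bool:
    """Download with the system curl tool; return True on success."""
    if shutil.which("curl") is None:
        return False
    result = subprocess.run(["curl", "-sSL", "-o", output_path, url])
    return result.returncode == 0


def download_with_fallback(url: str, output_path: str, max_retries: int = 3):
    """Try requests up to max_retries times, then fall back to wget, then curl."""
    import requests  # third-party; imported lazily so the fallbacks work without it

    for _ in range(max_retries):
        try:
            response = requests.get(url, timeout=30)
            response.raise_for_status()
            with open(output_path, "wb") as f:
                f.write(response.content)
            return
        except requests.RequestException:
            continue
    # Primary method exhausted; hand off to the system tools.
    for tool in (wget_wrapper, curl_wrapper):
        if tool(url, output_path):
            return
    raise RuntimeError(f"All download methods failed for {url}")
```

Each wrapper checks tool availability with shutil.which before invoking it, so the chain degrades gracefully on systems where wget or curl is not installed.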