Skip to content

chore(cache): for generating test outputs#786

Open
HaoZeke wants to merge 10 commits intometatensor:mainfrom
HaoZeke:cacheTestOutputs
Open

chore(cache): for generating test outputs#786
HaoZeke wants to merge 10 commits intometatensor:mainfrom
HaoZeke:cacheTestOutputs

Conversation

@HaoZeke
Copy link
Member

@HaoZeke HaoZeke commented Oct 8, 2025

Not really sure it matters to CI but locally it is nicer.


📚 Documentation preview 📚: https://metatrain--786.org.readthedocs.build/en/786/

mtt train options.yaml -o model-32-bit.pt -r base_precision=32
mtt train options.yaml -o model-64-bit.pt -r base_precision=64
mtt train options-pet.yaml -o model-pet.pt
FORCE_REGENERATE=false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also want to regenerate if the code changed, in case one of the change broke the tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, but this is a bit trickier? rn I pushed something relative to the last commit but then of course it doesn't run if the user ended up making an uncommitted change which would have broken tests..

Copy link
Contributor

@PicoCentauri PicoCentauri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally like the idea of caching a lot! I was always to lazy to implement it. Thanks!

Comment on lines +30 to +45
if [ "$FORCE_REGENERATE" = true ] || [ ! -f "model-32-bit.pt" ]; then
mtt train options.yaml -o model-32-bit.pt -r base_precision=32
fi

if [ "$FORCE_REGENERATE" = true ] || [ ! -f "model-64-bit.pt" ]; then
mtt train options.yaml -o model-64-bit.pt -r base_precision=64
fi

if [ "$FORCE_REGENERATE" = true ] || [ ! -f "model-pet.pt" ]; then
mtt train options-pet.yaml -o model-pet.pt
fi

if [ "$FORCE_REGENERATE" = true ]; then
echo "Saving current git commit hash to version the data."
git rev-parse HEAD > "$HASH_FILE"
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we loop this? If we add more files.

BTW: we also have some files to generate for the docs 🙈

Copy link
Member Author

@HaoZeke HaoZeke Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we loop this? If we add more files.

+1

BTW: we also have some files to generate for the docs 🙈

makes sense to design it once here then! I'm thinking now of maybe watching the status of some sources with git status as well. Or @Luthaf suggested we make the cache opt-in instead of opt-out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PicoCentauri looping is a bit weird because of the different arguments, so I have 527daae

but I'd rather revert that one 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm going to push a bit for making this opt-in for now. This way we don't break anything in the default case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm going to push a bit for making this opt-in for now. This way we don't break anything in the default case

Made an opt-in variant.

@HaoZeke HaoZeke requested a review from PicoCentauri October 8, 2025 15:15
Comment on lines +47 to +48
# CI should always generate test files
FORCE_REGENERATE: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is now opt in. We don't need to declare these variables here, right?

package.json

# caching githash
.data_version.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather create this file inside .tox/<env-using-the-cached-data>


By default, the main test suite regenerates the necessary model files every time
it runs. For faster local development, you can **opt-in** to caching these files
by setting the ``USE_CACHE`` environment variable to ``1``:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
by setting the ``USE_CACHE`` environment variable to ``1``:
by setting the ``MTT_CACHE_TEST_DATA`` environment variable to ``1``:

would be a lot clearer

(or MTT_CACHE_TOX_DATA to also apply to the docs things?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants