Some tests are still not using the fixtures, which is why they are failing; I will fix them later. |
1ae874b to d0e20fb
Luthaf
left a comment
This looks good to me! I like using upper case names for global fixtures.
|
cscs-ci run |
|
Ok, I tried to run the tests locally with a GPU and they pass, so I don't know exactly what is going on here, I will try to investigate. The training script fails when running |
|
cscs-ci run |
We could write this down somewhere. For the LLM and for us to remember. |
|
CI failure seems relevant! |
|
Yes yes, I have to understand what is going on 😅 |
Motivation
Whenever I wanted to run an isolated test with tox -e tests, the generate-outputs.sh script was run, and therefore I had to wait for all the training runs to finish just to run a test that doesn't need them.
Alternatives until now
Running pytest directly, which is fine, but I think it is better if we can run it with the true testing environment.
Editing tox.ini, which is super annoying to do each time and to remember to undo when pushing changes.
Implementation in this PR
The model paths are made a fixture. They run only when a test needs them. At that point, we can run the training for that specific model.
In this way, trainings are run only if/when needed.
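As a rough illustration of the idea (not the code in this PR), a session-scoped pytest fixture can trigger training lazily. The names train_model, ensure_trained, MODEL_PATH, and "hypothetical-model" are made up for this sketch, and the "training" just writes a placeholder file.

```python
from pathlib import Path

import pytest


def train_model(name: str, out_dir: Path) -> Path:
    """Hypothetical stand-in for a real training run: writes a placeholder file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    model_path = out_dir / f"{name}.pt"
    model_path.write_text(f"trained {name}\n")
    return model_path


def ensure_trained(name: str, out_dir: Path) -> Path:
    """Train the model only if its output file does not exist yet."""
    model_path = out_dir / f"{name}.pt"
    if not model_path.exists():
        train_model(name, out_dir)
    return model_path


@pytest.fixture(scope="session")
def MODEL_PATH(tmp_path_factory) -> Path:
    # Runs at most once per session, and only if some test requests it.
    out_dir = tmp_path_factory.mktemp("models")
    return ensure_trained("hypothetical-model", out_dir)


def test_needs_model(MODEL_PATH):
    # Requesting the fixture is what triggers the training.
    assert MODEL_PATH.exists()


def test_needs_nothing():
    # Selecting only this test never runs any training.
    assert 1 + 1 == 2
```

With this layout, selecting only test_needs_nothing never touches the training code, while test_needs_model pays for exactly one training.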
Complications
We support running tests in parallel, so we have to make sure that if two processes request the same fixture they don't both train at the same time. I used a simple lockfile so that the other workers wait for the worker that is doing the training.
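A minimal sketch of such a lockfile, using only the standard library, could look like the following. This is an assumption about how it might be done, not the implementation in this PR; file_lock and train_once are hypothetical names, and train is any callable that writes the model file.

```python
import os
import time
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def file_lock(lock_path: Path, poll_interval: float = 0.05, timeout: float = 3600.0):
    """Hold an exclusive lock by atomically creating lock_path."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only
            # one worker can create the lockfile at any given time.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll_interval)  # another worker holds the lock
    try:
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)  # release the lock for the waiting workers


def train_once(model_path: Path, train) -> Path:
    """Run train(model_path) at most once, even with concurrent callers."""
    lock_path = model_path.with_name(model_path.name + ".lock")
    with file_lock(lock_path):
        # Checked while holding the lock: workers that waited find the
        # finished model here and skip the training.
        if not model_path.exists():
            train(model_path)
    return model_path
```

One caveat of this simple approach is that a worker that crashes while holding the lock leaves a stale lockfile behind; a dedicated package such as filelock may handle that more robustly.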
Nice side effects
Since the trainings are run per model, in principle they can run in parallel. However, I think this may not happen in practice, because different workers will often ask for the same training, and all but one of them will just sit idle. Some smarter splitting of the tests would ensure that trainings run in parallel, but I don't know if it's possible (I haven't looked into it).
📚 Documentation preview 📚: https://metatrain--1015.org.readthedocs.build/en/1015/