An assortment of helper functions for machine learning. Heavily overfit to my coding idiosyncrasies, and meant to be used in conjunction with my project template.
The base package installs the shared helpers (including mlh.hypers) without PyTorch. Install via pip install machine-learning-helpers or include the git dependency in a uv script without extra flags. Only the modules that rely on PyTorch need it:
- Install with PyTorch helpers when needed:
pip install machine-learning-helpers[torch]
- Importing mlh.torch_helpers or other torch-based utilities without the extra will raise an ImportError explaining that torch is required.
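As a rough illustration of the behavior described above, an optional-dependency guard might look like the following. This is a minimal sketch, not the package's actual code: `require_optional` is a hypothetical helper name, and the error message wording is an assumption.

```python
import importlib


def require_optional(module_name: str, extra: str):
    """Import an optional dependency, or raise an ImportError naming the extra to install.

    Hypothetical helper for illustration; not part of machine-learning-helpers.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name} is required for this module. "
            f"Install it with: pip install machine-learning-helpers[{extra}]"
        ) from err


# A module that is installed imports normally (json stands in for torch here).
json_module = require_optional("json", extra="torch")

# A missing module produces the explanatory error (this module name is made up).
try:
    require_optional("definitely_not_installed_module_xyz", extra="torch")
except ImportError as err:
    message = str(err)
```

A torch-dependent module could call such a guard at import time, so the failure surfaces immediately with an install hint rather than later as a bare ModuleNotFoundError.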
To make these helpers globally available to all your projects, create a hidden directory in your home directory (e.g. ~/.python) and set your PYTHONPATH in your bashrc via:
export PYTHONPATH=~/.python:$PYTHONPATH
For notebooks, store your boilerplate code in init.ipynb and run
%run ~/.python/init at the beginning of each new notebook.
Then you can use import ml_helpers as mlh in all your projects.
These scripts are highly opinionated, meaning they enforce a strict directory structure and are only set up to work with my machine learning project skeleton. That said, they should be easy to modify for your own use.
Submissions happen entirely through Python. See submit.py for an example. The user specifies a dictionary of job options and a dictionary of hyperparameters, and calls submit() from job_submitter.py. A few comments:
- Rather than copying static.py and job_submitter.py into every new project, create a hidden directory in your home directory (e.g. ~/.python) and set your PYTHONPATH in your bashrc via:
export PYTHONPATH=~/.python:$PYTHONPATH
Then you can use import job_submitter in all your projects. Super handy to prevent the case where you have K copies of job_submitter.py and you've tweaked one of them but can't remember which.
- submit.py strictly enforces a few things (see comments within submit.py for more info). If you use my project skeleton, all requirements should be met. Just make sure to make a new directory for each experiment, and call submit.py within modified_ml_project_skeleton/experiments/my_experiment_name/submit.py. Strict enforcement of directory structure is to ensure the output from each experiment is self-contained.
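To make the directory requirement concrete, here is a minimal sketch of the kind of check involved. It is not the actual enforcement logic in submit.py: `check_experiment_dir` is a hypothetical function, and the only rule it assumes is that submit.py sits directly inside experiments/<experiment name>/.

```python
from pathlib import Path
import tempfile


def check_experiment_dir(script_path: Path) -> Path:
    """Return the experiment directory if script_path is experiments/<name>/submit.py.

    Hypothetical check for illustration; the real rules live in submit.py.
    """
    script_path = script_path.resolve()
    experiment_dir = script_path.parent
    if script_path.name != "submit.py":
        raise ValueError(f"expected a file named submit.py, got {script_path.name}")
    if experiment_dir.parent.name != "experiments":
        raise ValueError(f"{experiment_dir} is not directly inside an experiments/ directory")
    return experiment_dir


# Demo on a throwaway directory tree so nothing on disk is left behind.
with tempfile.TemporaryDirectory() as root:
    good = Path(root) / "experiments" / "my_experiment_name" / "submit.py"
    good.parent.mkdir(parents=True)
    good.touch()
    good_name = check_experiment_dir(good).name

    # A submit.py placed outside experiments/<name>/ is rejected.
    bad = Path(root) / "submit.py"
    bad.touch()
    try:
        check_experiment_dir(bad)
        bad_rejected = False
    except ValueError:
        bad_rejected = True
```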
The logic of job_submitter.py is broken into a few components, each of which should be easily modifiable for your own purposes:
- First it validates directory structure. verify_dirs() checks that the required path exists, and loads global parameters to avoid passing paths around everywhere. It also creates a unique timestamped results directory within the experiments folder.
- Then it processes hyperparameters.
  - This expects a dictionary (or a list of dictionaries) in the format
    my_hypers = { "lr":[0.001, 0.0001, ...], "seed":[1, 2, ...], "other_hyper": ['cat'] }
    and will return a list of strings of the form
    my_hypers_strings = [
      "'lr=0.001' 'seed=1' 'other_hyper=cat'",
      "'lr=0.001' 'seed=2' 'other_hyper=cat'",
      "'lr=0.0001' 'seed=1' 'other_hyper=cat'",
      "'lr=0.0001' 'seed=2' 'other_hyper=cat'",
      ...
    ]
    Note the extra single quotes within each string. This is tailored for sacred's command line interface. Modify line 157 for a different string format.
- Next, iterate through each hyperparameter string and ask the user if they want to submit the job. The purpose of this is that the first submission will invariably fail for some reason or another. Submit a test job, wait to see that it runs correctly, then submit the rest.
- When a user submits a job, two command line strings are made in make_commands(). The first turns the hyperparameter string into a sacred-specific python command, which looks something like
  python_command = "python main.py with 'lr=0.001' 'seed=1' 'other_hyper=cat' "
  Modify line 209 for a different python command. The second produces the slurm command itself and shouldn't need to be modified.
- Finally, in make_bash_script(), a bash script submit.sh is made and saved using a prewritten template in static.py and the previously made python command. Modify make_bash_script() and static.py for different slurm configurations. Line 79 actually calls the bash command. submit.sh is rewritten each time to prevent a buildup of submit.sh files, but if you just want to make them and then submit them yourself for debugging purposes, use the manual_mode=True flag.
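The hyperparameter and command steps above can be sketched roughly as follows. This is an illustrative reimplementation under stated assumptions, not the code in job_submitter.py: `expand_hypers` and `make_python_command` are hypothetical names, and the sketch assumes the grid is the Cartesian product of the value lists, taken in dictionary order.

```python
from itertools import product


def expand_hypers(hypers):
    """Expand a dict (or list of dicts) of value lists into sacred-style strings.

    Hypothetical helper: each output string holds one combination of values,
    with every 'name=value' pair wrapped in single quotes.
    """
    if isinstance(hypers, dict):
        hypers = [hypers]
    strings = []
    for grid in hypers:
        names = list(grid)
        # product() varies the last hyperparameter fastest.
        for values in product(*(grid[name] for name in names)):
            strings.append(" ".join(f"'{n}={v}'" for n, v in zip(names, values)))
    return strings


def make_python_command(hyper_string, entry_point="main.py"):
    """Build the sacred-style python command for one hyperparameter string."""
    return f"python {entry_point} with {hyper_string}"


my_hypers = {"lr": [0.001, 0.0001], "seed": [1, 2], "other_hyper": ["cat"]}
my_hypers_strings = expand_hypers(my_hypers)
python_command = make_python_command(my_hypers_strings[0])
```

Passing a list of dictionaries simply concatenates the grids, which is handy when two groups of hyperparameters should not be crossed with each other.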