A bunch of helper functions for machine learning

An assortment of helper functions for machine learning. Heavily overfit to my coding idiosyncrasies, and meant to be used in conjunction with my project template.

Installation

The base package installs the shared helpers (including mlh.hypers) without PyTorch. Install it with pip install machine-learning-helpers, or include the git dependency in a uv script without extra flags. PyTorch is needed only by the modules that rely on it:

Install with PyTorch helpers when needed: pip install machine-learning-helpers[torch].
Importing mlh.torch_helpers or other torch-based utilities without the extra will raise an ImportError explaining that torch is required.
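
As a rough illustration of how an optional dependency like this can be guarded, here is a minimal sketch. The helper name require_optional is hypothetical and is not taken from the package; the actual check inside mlh may look different.

```python
import importlib


def require_optional(module_name: str, extra: str):
    """Import an optional dependency, or raise an ImportError naming the pip extra."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with a message that tells the user how to fix the problem.
        raise ImportError(
            f"{module_name} is required for this module. "
            f"Install it with: pip install machine-learning-helpers[{extra}]"
        ) from exc


# A torch-based module could call this once at import time (assumed usage):
# torch = require_optional("torch", "torch")
```

The point of the pattern is that the base install stays light, and the failure only happens when a torch-based module is actually imported.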

To make these helpers globally available to all your projects, create a hidden directory in your home directory (e.g. ~/.python) and set your PYTHONPATH in your bashrc via:

export PYTHONPATH=~/.python:$PYTHONPATH

For notebooks, store your boilerplate code in init.ipynb and run

%run ~/.python/init at the beginning of each new notebook.

Then you can use import ml_helpers as mlh in all your projects.

Job submission script instructions

These scripts are highly opinionated, meaning they enforce a strict directory structure and are only set up to work with my machine learning project skeleton. That said, they should be easy to modify for your own use.

If you want to use the project template + job submission script

Submissions happen entirely through Python. See submit.py for an example. The user specifies a dictionary of job options and a dictionary of hyperparameters, then calls submit() from job_submitter.py. A few comments:

Rather than copying static.py and job_submitter.py into every new project, create a hidden directory in your home directory (e.g. ~/.python) and set your PYTHONPATH in your bashrc via:

export PYTHONPATH=~/.python:$PYTHONPATH

then you can use import job_submitter in all your projects. Super handy to prevent the case where you have K copies of job_submitter.py and you've tweaked one of them but can't remember which.

submit.py strictly enforces a few things (see comments within submit.py for more info). If you use my project skeleton, all requirements should be met. Just make sure to make a new directory for each experiment, so that the script lives at modified_ml_project_skeleton/experiments/my_experiment_name/submit.py, and call it from there. The directory structure is strictly enforced to ensure the output from each experiment is self-contained.

If you want to modify the scripts for your own project

The logic of job_submitter.py is broken into a few components, each of which should be easily modifiable for your own purposes:

First it validates directory structure.
- verify_dirs() checks the required path exists, and loads global parameters to avoid passing paths around everywhere. Also creates a unique timestamped results directory within the experiments folder.
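
A minimal sketch of this validation step, under stated assumptions: the function name make_results_dir, the "results" subfolder, and the timestamp format are all hypothetical, and this version returns the path instead of storing it in global parameters as verify_dirs() does.

```python
from datetime import datetime
from pathlib import Path


def make_results_dir(experiment_dir, required="submit.py"):
    """Check the expected layout, then create a timestamped results directory."""
    experiment_dir = Path(experiment_dir)
    # Fail early if the experiment folder does not look the way we expect.
    if not (experiment_dir / required).exists():
        raise FileNotFoundError(f"Expected {required} in {experiment_dir}")
    # Second-level timestamp; two calls within the same second would collide,
    # in which case mkdir raises FileExistsError rather than reusing the folder.
    stamp = datetime.now().strftime("%Y_%m_%d_%H_%M_%S")
    results_dir = experiment_dir / "results" / stamp
    results_dir.mkdir(parents=True, exist_ok=False)
    return results_dir
```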

Then it processes the hyperparameters.

This expects a dictionary (or a list of dictionaries) in the format

my_hypers = {
    "lr":[0.001, 0.0001, ...],
    "seed":[1, 2, ...],
    "other_hyper": ['cat']
}

and will return a list of strings of the form

my_hypers_strings = [
    "'lr=0.001' 'seed=1' 'other_hyper=cat'",
    "'lr=0.001' 'seed=2' 'other_hyper=cat'",
    "'lr=0.0001' 'seed=1' 'other_hyper=cat'",
    "'lr=0.0001' 'seed=2' 'other_hyper=cat'",
    ...
]

Note the extra single quotes within each string. This is tailored for sacred's command line interface. Modify line 157 for a different string format.
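
The expansion described above is a Cartesian product over the value lists. A minimal sketch using itertools.product follows; the function name hypers_to_strings is hypothetical and this is not necessarily how job_submitter.py implements it.

```python
from itertools import product


def hypers_to_strings(hypers):
    """Expand a dict of lists (or a list of such dicts) into sacred-style strings."""
    grids = hypers if isinstance(hypers, list) else [hypers]
    strings = []
    for grid in grids:
        keys = list(grid)
        # product() varies the last key fastest, matching the ordering shown above.
        for values in product(*(grid[k] for k in keys)):
            strings.append(" ".join(f"'{k}={v}'" for k, v in zip(keys, values)))
    return strings


my_hypers = {"lr": [0.001, 0.0001], "seed": [1, 2], "other_hyper": ["cat"]}
my_hypers_strings = hypers_to_strings(my_hypers)
for s in my_hypers_strings:
    print(s)
```

Passing a list of dictionaries simply concatenates the grids, which is handy when some hyperparameter combinations should not be crossed with each other.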

Next, it iterates through each hyperparameter string and asks the user whether to submit the job. The reason is that the first submission will invariably fail for one reason or another: submit a test job, wait to see that it runs correctly, then submit the rest.

When a user submits a job, two command line strings are made in make_commands(). The first turns the hyperparameter string into a sacred-specific python command, which looks something like
```
 python_command = "python main.py with 'lr=0.001' 'seed=1' 'other_hyper=cat' "
```
Modify line 209 for a different python command. The second produces the slurm command itself and shouldn't need to be modified.

Finally, in make_bash_script(), a bash script submit.sh is made from a prewritten template in static.py and the previously made python command, then saved. Modify make_bash_script() and static.py for different slurm configurations. Line 79 actually calls the bash command. submit.sh is rewritten each time to prevent a buildup of submit.sh files, but if you just want to generate the scripts and submit them yourself for debugging purposes, use the manual_mode=True flag.
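
To make the last two steps concrete, here is a minimal sketch of building the python command and rendering submit.sh from a template. Everything named here is an assumption for illustration: SUBMIT_TEMPLATE, its two SBATCH lines, the job option keys, and the function names are hypothetical (the real template lives in static.py), and calling sbatch on the script is assumed to be the slurm command.

```python
import subprocess
from pathlib import Path
from string import Template

# Hypothetical template; the real one in static.py will have different options.
SUBMIT_TEMPLATE = Template(
    "#!/bin/bash\n"
    "#SBATCH --job-name=$job_name\n"
    "#SBATCH --time=$time\n"
    "\n"
    "$python_command\n"
)


def make_python_command(hyper_string):
    """Turn one hyperparameter string into a sacred-style command line."""
    return f"python main.py with {hyper_string}"


def write_submit_script(hyper_string, job_options, path="submit.sh", manual_mode=False):
    """Render submit.sh; submit it with sbatch unless manual_mode is set."""
    script = SUBMIT_TEMPLATE.substitute(
        python_command=make_python_command(hyper_string), **job_options
    )
    path = Path(path)
    path.write_text(script)  # overwritten on every call, so files do not pile up
    if not manual_mode:
        # Assumed submission call; requires slurm's sbatch to be on PATH.
        subprocess.run(["sbatch", str(path)], check=True)
    return path
```

With manual_mode=True the script is only written to disk, so it can be inspected and submitted by hand while debugging.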
