Hi folks 👋 I made an end-to-end example showing how to launch & track parameter-golf training experiments on Hugging Face infra using HF Jobs and Buckets, with live metrics via Trackio: https://github.com/lewtun/parameter-golf/tree/hf-jobs-example/examples/hf_jobs
The setup is just two files:
- launch_job.py runs locally: it creates a bucket, submits the training job, and streams logs back to your terminal.
- train_job.py runs on the Hub as a uv script: it installs dependencies, downloads the FineWeb data, logs metrics to Trackio, and uploads results (logs, metrics, model artifact) to your bucket.
A typical launch looks like:

    python launch_job.py --name baseline-9L --hardware h200x8

And the outputs look something like this:
- Trackio Space: https://huggingface.co/spaces/lewtun/parameter-golf-experiments
- Bucket with all the experiments: https://huggingface.co/buckets/lewtun/parameter-golf
It also comes with instructions to equip your favourite agent with skills for managing the infra, so you can launch autoresearch experiments to your heart's content. I hope you find it useful in your golfing 🏌️‍♂️!