
Run & track experiments on Hugging Face infra #520

@lewtun

Hi folks 👋 I made an end-to-end example of how to launch and track parameter-golf training experiments on Hugging Face infra using HF Jobs and Buckets, with live metrics via Trackio: https://github.com/lewtun/parameter-golf/tree/hf-jobs-example/examples/hf_jobs

The setup is just two files:

  • launch_job.py runs locally: it creates a bucket, submits the training job, and streams logs back to your terminal.
  • train_job.py runs on the Hub as a uv script: it installs dependencies, downloads the FineWeb data, logs metrics to Trackio, and uploads results (logs, metrics, model artifact) to your bucket.
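To make the "uv script" part concrete, here is a rough sketch of how a script like train_job.py could be laid out. It is not the actual file from the example: the dependency list, the stage function names, and the placeholder bodies are all assumptions for illustration. The only fixed part is the inline metadata header (PEP 723), which is how a uv script declares the dependencies to install before it runs.

```python
# Assumption: the dependency names below are illustrative, not copied from the example.
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "huggingface_hub",
#     "trackio",
# ]
# ///
from typing import Callable


def download_data() -> str:
    """Hypothetical placeholder for downloading the FineWeb data."""
    return "download_data"


def train_and_log() -> str:
    """Hypothetical placeholder for training while logging metrics."""
    return "train_and_log"


def upload_results() -> str:
    """Hypothetical placeholder for uploading logs, metrics, and the model artifact."""
    return "upload_results"


# Stages run in the order described above; dependencies are installed by uv beforehand.
STAGES: list[Callable[[], str]] = [download_data, train_and_log, upload_results]


def main() -> list[str]:
    """Run every stage in order and return the names of the completed stages."""
    return [stage() for stage in STAGES]
```

Because the dependencies are declared inside the script itself, a single file is all that needs to be submitted for the job.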

A typical launch looks like:

python launch_job.py --name baseline-9L --hardware h200x8
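For anyone curious how those two flags could be wired up, here is a minimal sketch of the argument handling. Only the flag names `--name` and `--hardware` come from the command above; `build_parser`, `build_job_spec`, and the keys of the returned dictionary are hypothetical, and the real launch_job.py may accept more options and be structured differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags shown in the launch command."""
    parser = argparse.ArgumentParser(description="Launch a training job (illustrative sketch).")
    parser.add_argument("--name", required=True, help="Run name, e.g. baseline-9L")
    parser.add_argument("--hardware", required=True, help="Hardware flavor, e.g. h200x8")
    return parser


def build_job_spec(args: argparse.Namespace) -> dict[str, str]:
    """Collect the parsed flags into a hypothetical job description."""
    return {
        "run_name": args.name,
        "flavor": args.hardware,
        "script": "train_job.py",  # the script that runs on the Hub
    }


# Parse an explicit argument list so the sketch does not depend on sys.argv.
example_args = build_parser().parse_args(["--name", "baseline-9L", "--hardware", "h200x8"])
example_spec = build_job_spec(example_args)
```

In this sketch the run name and hardware flavor are simply passed through to the job description; creating the bucket, submitting the job, and streaming logs would happen after this step.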

And the outputs look something like this:

It also comes with instructions to equip your favourite agent with skills for managing the infra, so you can launch autoresearch experiments to your heart's content. I hope you find it useful in your golfing 🏌️‍♂️!
