Sample fine-tuning environment #157
Conversation
Also improve default checkpointing strategy.
bf16_mode and bring back autocast
rchan26 left a comment
Thanks @wesselb - this looks great! Just some small comments. The main one is about making sure that we definitely use the right .venv inside the image. Once we've confirmed that, I think we can merge this in :)
Co-authored-by: Ryan Chan <rchan@turing.ac.uk>
…t/aurora into wesselb/fine-tuning-adjustments
rchan26 left a comment
Looks good to me! Thanks again @wesselb!
Thanks for the review, @rchan26 :)
Adds bf16_mode and brings back autocast (closes #121, Illegal Memory Access with Mixed Precision). Adds attn_drop_rate in the backbone; this will cause stochasticity if the model wasn't .eval()'d. I've verified that the instructions work on a fresh Azure VM.
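A minimal sketch of the two behaviours mentioned above, using plain PyTorch: attention dropout (stood in for here by an ordinary Dropout layer, since the actual Aurora backbone isn't shown) makes outputs stochastic until the model is put in .eval() mode, and bf16 autocast runs supported ops in bfloat16. The model and names below are illustrative, not the Aurora API.

```python
import torch

# Stand-in for a backbone with attn_drop_rate > 0 (hypothetical example).
model = torch.nn.Sequential(
    torch.nn.Linear(8, 8),
    torch.nn.Dropout(p=0.1),  # plays the role of attention dropout
)
x = torch.randn(4, 8)

# In training mode, dropout is active, so repeated forward passes
# on the same input can differ.
model.train()
y_train = model(x)

# After .eval(), dropout is disabled and the output is deterministic.
model.eval()
y1 = model(x)
y2 = model(x)
assert torch.equal(y1, y2)

# bf16 autocast: ops inside the context run in bfloat16 where supported.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y_bf16 = model(x)
```

The same .eval() call is what the PR description warns about: if it is skipped, the dropout added via attn_drop_rate makes inference non-deterministic.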