Speed up by reducing precision#308

Draft
danbraunai-goodfire wants to merge 3 commits into main from feature/bf16

Conversation

Collaborator

@danbraunai-goodfire commented Dec 12, 2025

STATUS:
Initial tests showed that using bfloat16 everywhere was ~3x faster but diverged wildly, while mixed precision with torch.autocast gave only a ~0.5x speedup. However, later tests in a multi-node setup showed no speedup at all. Needs investigating.
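For reference, a minimal sketch of the mixed-precision approach mentioned above (not the PR's actual code): `torch.autocast` runs eligible ops in bfloat16 while keeping the parameters in fp32, which is the trade-off being benchmarked here. The toy model and shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Toy stand-in for the real model; parameters stay in fp32.
model = nn.Linear(64, 64)
x = torch.randn(8, 64)

# Inside the autocast region, autocast-eligible ops (e.g. the linear's
# matmul) run in bfloat16; other ops keep their input precision.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)

print(y.dtype)
```

On GPU the same pattern uses `device_type="cuda"`; whether this yields a real speedup depends on hardware support for bf16, which may explain the divergent single-node vs. multi-node timings.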

Description

Related Issue

Motivation and Context

How Has This Been Tested?

Does this PR introduce a breaking change?

@danbraunai-goodfire changed the title from "Support bfloat16" to "Speed up by reducing precision" on Dec 29, 2025