
Add script for converting to a HF model #37

Merged
danbraunai merged 14 commits into dev from feature/convert-to-hf
Aug 13, 2025

Conversation

@danbraunai (Collaborator) commented Aug 2, 2025

Description

TODO:

  • Script for uploading a trained model to HF. It should first convert the model to HF format and then upload it. May require the user to set environment variables to authenticate with HF.

  • Adds convert_to_hf.py which contains convert_llama_model_to_hf for converting our custom Llama models to a HF LlamaForCausalLM model.

  • Tests that the above works for our canonical runs
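The core of a converter like `convert_llama_model_to_hf` is remapping the custom model's `state_dict` keys onto the names `LlamaForCausalLM` expects. A minimal sketch of that key-renaming step, with hypothetical parameter names (the real mapping lives in `convert_to_hf.py` in this PR and will differ):

```python
import re

# Hypothetical renaming table: these patterns are illustrative, not the
# actual mapping used by convert_llama_model_to_hf.
KEY_PATTERNS = [
    (re.compile(r"^transformer\.wte\.weight$"), "model.embed_tokens.weight"),
    (re.compile(r"^transformer\.h\.(\d+)\.attn\.q_proj\.weight$"),
     r"model.layers.\1.self_attn.q_proj.weight"),
    (re.compile(r"^transformer\.ln_f\.weight$"), "model.norm.weight"),
    (re.compile(r"^lm_head\.weight$"), "lm_head.weight"),
]

def rename_state_dict_keys(state_dict: dict) -> dict:
    """Return a copy of state_dict with keys renamed via KEY_PATTERNS.

    Raises KeyError on any unmapped key, so a schema mismatch fails loudly
    instead of being silently dropped.
    """
    out = {}
    for key, value in state_dict.items():
        for pattern, repl in KEY_PATTERNS:
            if pattern.match(key):
                out[pattern.sub(repl, key)] = value
                break
        else:
            raise KeyError(f"No mapping for parameter {key!r}")
    return out
```

Raising on unmapped keys (rather than skipping them) is the design choice that makes a strict `load_state_dict` viable afterwards.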

Misc changes separate from the main thrust of this PR:

  • Adds rms_norm_eps: float = 1e-6 to the config. Without this, we have to hardcode 1e-6 when converting to a HF model, which is dangerous.
  • Re-enable tests in CI
  • Fix all linting errors.

Related Issue

Closes #35

Motivation and Context

Our models are hosted on HF, but we don't have code to convert our Llama models to HF models.

How Has This Been Tested?

Added tests/test_hf_compatibility, which tests that each of our canonical models can be converted and produce the same logits on the same inputs (with both the custom and HF tokenizers).
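The actual test compares model logits with the usual tolerance-based check (e.g. `torch.allclose`). As a self-contained illustration of the idea, here is a tiny pure-Python stand-in that compares nested lists of logits; the function names and tolerance are hypothetical, not taken from the test file:

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two nested lists
    of equal shape (a tiny stand-in for torch.allclose on logit tensors)."""
    if isinstance(a, (int, float)):
        return abs(a - b)
    assert len(a) == len(b), "shape mismatch"
    return max(max_abs_diff(x, y) for x, y in zip(a, b))

def logits_match(custom_logits, hf_logits, atol=1e-5):
    """True if the converted HF model reproduces the custom model's logits
    within an absolute tolerance."""
    return max_abs_diff(custom_logits, hf_logits) <= atol
```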

Does this PR introduce a breaking change?

No

@danbraunai (Collaborator, Author):

Separate from the main thrust of this PR. Not sure why this was commented out.


from simple_stories_train.utils import print0

# pyright: reportAttributeAccessIssue=false
@danbraunai (Collaborator, Author):

I noticed that we had a bunch of linter errors without suppressions like this. I didn't fix them all "properly"; not worth it at the moment.

) # Note that llama 3.1 n_key_value_heads does not scale with n_heads
use_grouped_query_attention: bool = True
flash_attention: bool = True
rms_norm_eps: float = 1e-6
@danbraunai (Collaborator, Author):

Without this, we have to hardcode 1e-6 when converting to a HF model, which is dangerous.

@danbraunai (Collaborator, Author):

Separate from the main thrust of this PR, sorry.


# Load the custom model
model_config = MODEL_CONFIGS[model_size]
custom_model = Llama.from_pretrained(f"SimpleStories/SimpleStories-{model_size}", model_config)
@chandanms (Collaborator) commented Aug 3, 2025:

I think a better test here is to instantiate a model from Llama, convert it, and then compare the outputs, since Llama.from_pretrained(f"SimpleStories/SimpleStories-{model_size}", model_config) loads directly from HF. Something like:

custom_model = Llama(model_config)
hf_model = convert_llama_model_to_hf(custom_model)

# compare the outputs

@danbraunai (Collaborator, Author) commented Aug 3, 2025:

@chandanms Oh, good spot. Llama.from_pretrained is actually broken. The line that loads the weights is:

        model.load_state_dict(state_dict, strict=False)

The keys of the Llama model.state_dict() are completely different from the LlamaForCausalLM HF state_dict that's being downloaded, but strict=False hides everything. If this method were to actually load the LlamaForCausalLM weights into the Llama model, there would need to be a function that's basically the inverse of the new convert_llama_model_to_hf created in this PR.

I think perhaps the Llama.from_pretrained() does work when it's actually loading some kind of Llama weights from HF, as opposed to LlamaForCausalLM weights? Though we don't seem to save any Llama weights on HF (which is reasonable).

I've added Issue #38 for this. This PR shouldn't be sorted until that's done. We should probably have both convert_llama_to_llama_for_causal_lm as well as convert_llama_for_causal_lm_to_llama, with tests for both of them. We should also avoid strict=False for all of these. (cc @lennart-finke, heads up for this bug)

I probably won't get to this tomorrow or maybe Tuesday. But if it's not fixed in a few days then I'll try to make sure to do it as it's a pretty pressing issue.
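To make the failure mode concrete: with strict=True, load_state_dict raises on exactly the mismatch that strict=False silently swallows. A minimal sketch of that key check, in plain Python (function name and structure are illustrative, not from the codebase):

```python
def check_state_dict_keys(model_keys, loaded_keys):
    """Report the mismatch that strict=True would raise on and that
    strict=False silently ignores."""
    model_keys, loaded_keys = set(model_keys), set(loaded_keys)
    return {
        # Parameters the model expects but the checkpoint lacks:
        # with strict=False these keep their random init values.
        "missing": sorted(model_keys - loaded_keys),
        # Checkpoint weights the model has no slot for:
        # with strict=False these are silently discarded.
        "unexpected": sorted(loaded_keys - model_keys),
    }
```

If every custom-Llama key differs from the HF key, both lists contain the whole state dict, i.e. strict=False "loads" nothing at all while reporting success.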

@chandanms (Collaborator):

Ah yes. The converted HF model would have to be converted back into Llama. I think currently Llama.from_pretrained can only load models that are trained locally. I will try to take this up tomorrow.

@danbraunai danbraunai changed the base branch from main to dev August 13, 2025 16:32
@danbraunai danbraunai marked this pull request as ready for review August 13, 2025 16:32
@danbraunai danbraunai merged commit 3502203 into dev Aug 13, 2025
1 check passed


Development

Successfully merging this pull request may close these issues.

Conversion Script to HF