
Add coefficient scheduling (warmup + anneal) to importance minimality loss#439

Open
Antovigo wants to merge 2 commits into dev from feature/impmin_scheduling

@Antovigo
Collaborator

Description

Add coefficient scheduling to the importance minimality loss. Four new config fields on ImportanceMinimalityLossConfig:

  • coeff_peak_multiplier: multiplier applied to the loss coeff at the peak of the schedule
  • coeff_anneal_start_frac / coeff_anneal_end_frac: linearly anneal the multiplier from coeff_peak_multiplier back to 1.0 between these two fractions of training
  • coeff_warmup_frac: linearly ramp the loss coefficient from 0 to the peak value (coeff * coeff_peak_multiplier) over this fraction of training
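A minimal sketch of the resulting coefficient schedule, for reviewers who want to see the shape at a glance. It is not the PR's code: the function name impmin_coeff, the definition of training progress as step / total_steps, the defaults (multiplier 1.0, no warmup, anneal fractions at 1.0), and the assumption that the coefficient is held at the peak between the end of warmup and coeff_anneal_start_frac are all hypothetical.

```python
def impmin_coeff(
    step: int,
    total_steps: int,
    coeff: float,
    coeff_peak_multiplier: float = 1.0,  # hypothetical default: no boost
    coeff_warmup_frac: float = 0.0,  # hypothetical default: no warmup
    coeff_anneal_start_frac: float = 1.0,  # hypothetical default: never anneal
    coeff_anneal_end_frac: float = 1.0,
) -> float:
    """Sketch of the scheduled importance minimality coefficient (assumed shape)."""
    frac = step / total_steps  # assumed definition of training progress
    peak = coeff * coeff_peak_multiplier

    # Warmup: linear ramp from 0 to the peak value (skipped when warmup_frac == 0).
    if frac < coeff_warmup_frac:
        return peak * frac / coeff_warmup_frac

    # Assumption: hold at the peak until annealing starts.
    if frac < coeff_anneal_start_frac:
        return peak

    # After annealing ends, the multiplier is back to 1.0.
    if frac >= coeff_anneal_end_frac:
        return coeff

    # Anneal: interpolate the multiplier linearly from the peak back to 1.0.
    # Only reached when start <= frac < end, so the denominator is positive.
    t = (frac - coeff_anneal_start_frac) / (coeff_anneal_end_frac - coeff_anneal_start_frac)
    multiplier = coeff_peak_multiplier + t * (1.0 - coeff_peak_multiplier)
    return coeff * multiplier
```

For example, with coeff_peak_multiplier=4.0, coeff_warmup_frac=0.1 and annealing between 0.5 and 0.9, the coefficient ramps from 0 to 4 * coeff over the first 10% of training, stays there until the halfway point, then decays linearly back to coeff by 90%. Under these assumed defaults the function returns a constant coeff, in line with the PR's claim that the defaults preserve existing behavior.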

Also adds a config validator ensuring scheduling fractions are ordered correctly (including for the existing p_anneal_* fields), and moves the p_anneal ordering assertion from the loss function into the validator for consistency.
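The ordering check can be pictured as a construction-time validator. The sketch below uses a plain dataclass to stay dependency-free; the class name ImpMinScheduleConfig, the field names p_anneal_start_frac / p_anneal_end_frac (standing in for the existing p_anneal_* fields), the defaults, and the exact rule (all fractions in [0, 1], warmup ends no later than the anneal starts, and each anneal window has start <= end) are assumptions rather than the PR's actual validator.

```python
from dataclasses import dataclass


@dataclass
class ImpMinScheduleConfig:
    """Hypothetical stand-in for the scheduling fields; not the real config class."""

    coeff: float
    coeff_peak_multiplier: float = 1.0
    coeff_warmup_frac: float = 0.0
    coeff_anneal_start_frac: float = 1.0
    coeff_anneal_end_frac: float = 1.0
    # Hypothetical names for the existing p_anneal_* fields.
    p_anneal_start_frac: float = 1.0
    p_anneal_end_frac: float = 1.0

    def __post_init__(self) -> None:
        fracs = {
            "coeff_warmup_frac": self.coeff_warmup_frac,
            "coeff_anneal_start_frac": self.coeff_anneal_start_frac,
            "coeff_anneal_end_frac": self.coeff_anneal_end_frac,
            "p_anneal_start_frac": self.p_anneal_start_frac,
            "p_anneal_end_frac": self.p_anneal_end_frac,
        }
        # Every scheduling fraction must be a fraction of training.
        for name, value in fracs.items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")
        # Assumed ordering rule: warmup finishes before the coeff anneal starts,
        # and every anneal window has start <= end.
        if not (
            self.coeff_warmup_frac
            <= self.coeff_anneal_start_frac
            <= self.coeff_anneal_end_frac
        ):
            raise ValueError(
                "expected coeff_warmup_frac <= coeff_anneal_start_frac <= coeff_anneal_end_frac"
            )
        if self.p_anneal_start_frac > self.p_anneal_end_frac:
            raise ValueError("expected p_anneal_start_frac <= p_anneal_end_frac")
```

Checking the ordering when the config is built means a misordered schedule is rejected before training begins, instead of surfacing only when the loss is first evaluated.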

Related Issue

N/A

Motivation and Context

Allows experimenting with non-constant weighting of the importance minimality loss, e.g. starting with stronger sparsity pressure and then relaxing it, or warming up the loss gradually.

How Has This Been Tested?

Tested on a pile_llama transformer and on the resid_mlp2 toy model.

Does this PR introduce a breaking change?

No. The default values for all new fields preserve existing behavior.
