Releases: HomebrewML/HeavyBall
SCION + Split Optimizer
- SplitOptimizer, following Andreas Kirsch's research on continual learning (https://x.com/BlackHC/status/2001961535120568542)
- Approximate SCION
- Numpy 2.0.0 support
Configurable Division
By default, HeavyBall's division differs from the industry standard, potentially giving meaningfully different results for otherwise identical optimizer hyperparameters.
You can now set `heavyball.utils.default_division_backend` to one of:
- `eps_clamp`, HeavyBall's default (`x / y.clamp(min=eps)`)
- `eps_add`, the standard used by PyTorch, Optax, and others (`x / (y + eps)`)
- `atan2`, following Adam-Atan2 (`atan2(x / scale, y) * scale`) - may require `heavyball.utils.atan2_scale` to be adjusted to clamp to a different range of target values
- `nan_to_0`, resulting in `(x / y).nan_to_num(0, 0, 0)`
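In element-wise terms, the four backends compute the following. The snippet also shows switching the global backend; the exact values accepted by `default_division_backend` (plain strings here) are an assumption based on the names above:

```python
import torch
import heavyball.utils

x, y = torch.randn(4), torch.rand(4)
eps, scale = 1e-8, 1.0

out_eps_clamp = x / y.clamp(min=eps)           # eps_clamp: HeavyBall's default
out_eps_add   = x / (y + eps)                  # eps_add: PyTorch/Optax convention
out_atan2     = torch.atan2(x / scale, y) * scale  # atan2: Adam-Atan2 style, bounded output
out_nan_to_0  = (x / y).nan_to_num(0, 0, 0)    # nan_to_0: map NaN/±Inf to zero

# Switch the global backend (value format is an assumption, see the docs):
heavyball.utils.default_division_backend = "eps_add"
```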
NAdam + AdEMAMix
This release focuses on adding new optimizers:
- NAdam (following @tom-jod's research)
- AdEMAMix
- `SOAPNAdam` - SOAP with NAdam in the eigenbasis
- `SOAPAdEMAMix` - SOAP with AdEMAMix in the eigenbasis
Note that this changes the previous SOAP infrastructure. SOAP variants manually created for the previous version will not work out of the box, but can be trivially converted.
Bugfixes, Memory reduction, Save/Restore
- torch autocasts psgd's internal step from int64->fp64, which caused a mismatch in states before/after loading
- an unbounded lru_cache, used to speed up parameter accesses, may have kept parameters around indefinitely
- with psgd, caution=True and foreach=False, caution was only applied on the first parameter
- psgd quad with bf16 parameters tried to multiply a bf16 matrix with an fp32 matrix
- psgd's `preconditioner_update_probability` was ignored if set to 0, resulting in the default schedule being used
- not all optimizers were exposed in `__all__`
- psgd's scheduler was not stepped properly, causing the scheduler to remain at 100% precond update probability
- `msign`/`thinky_polar_express` (a zeroth-power backend) always returned fp32 tensors, where it should've been adaptive to the input dtype
- soap's initial eigh did not put the tensors back into their original dtype
- the built-in ema exited after the first empty parameter group, potentially skipping updates
- the built-in ema was updated once for each parameter group, causing different effective ema betas for different param group counts
- the finite-differences HVP did not divide by the epsilon scaling (see the sketch after this list)
- `pointwise_lr_adaptation` called `lr_adaptation` instead of `pointwise_lr_adaptation`
- fused_hook may have processed 1-tensor models incorrectly
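For context on the epsilon-scaling fix above: a finite-differences Hessian-vector product approximates Hv as (∇f(x + εv) − ∇f(x)) / ε, so omitting the final division by ε inflates the result. A minimal standalone sketch of the technique (not HeavyBall's internal code):

```python
import torch

def finite_difference_hvp(f, x, v, eps=1e-3):
    # Hv ≈ (grad f(x + eps * v) - grad f(x)) / eps; the final division by eps
    # is the scaling that the fix above restores.
    x0 = x.detach().requires_grad_(True)
    g0 = torch.autograd.grad(f(x0), x0)[0]
    x1 = (x.detach() + eps * v).requires_grad_(True)
    g1 = torch.autograd.grad(f(x1), x1)[0]
    return (g1 - g0) / eps

# f(x) = 0.5 * ||x||^2 has Hessian I, so the HVP should be close to v itself.
x, v = torch.randn(5), torch.randn(5)
print(finite_difference_hvp(lambda t: 0.5 * t.square().sum(), x, v))
```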
Stable Muon
Stability, Stability, Stability (and bug-fixes)
- Higher numerical stability in Newton-Schulz orthogonalization (affects: Muon)
- Higher accuracy in SVD computation (affects: PSGD)
- Advanced checkpointing (affects: old checkpoints, SOAP and PSGD)
- Reworked chainable backend, allowing more freedom in function composition (affects: custom optimizers)
For the full release notes and migration instructions, see here
- Benchmark
- Optimizers
- AdamC implemented (#65, by @Ryu1845 & #77, by @drexalt)
- Fixed all clipping algorithms with non-default clipping thresholds, added clipping tests (#73 & #74, by @alexjwilliams)
- More accurate NS5 iterations, backporting Muon research (#69, by @xTimeCrystal)
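Both the stability work and the NS5 backport concern the quintic Newton-Schulz iteration that Muon-style optimizers use to orthogonalize updates. A generic sketch using the coefficients from the widely used public Muon implementation (HeavyBall's exact constants and dtype handling may differ):

```python
import torch

def ns5_orthogonalize(g: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    # Push a matrix toward its nearest semi-orthogonal matrix via the quintic
    # Newton-Schulz iteration X <- a*X + (b*A + c*A@A) @ X with A = X @ X^T.
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (g.norm() + eps)           # scale so the iteration converges
    transposed = x.shape[0] > x.shape[1]
    if transposed:                     # iterate on the smaller side for cheaper matmuls
        x = x.T
    for _ in range(steps):
        A = x @ x.T
        x = a * x + (b * A + c * A @ A) @ x
    return x.T if transposed else x

u = ns5_orthogonalize(torch.randn(128, 512))
print(u @ u.T)  # approximately the identity
```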
Fixed SOAP, HVP PSGD
Bugfixes:
- @francois-rozet fixed a severe convergence regression in SOAP. It's now faster and converges better than before (#42)
- ADOPT now correctly matches the paper, significantly improving its convergence
- FP64 storage and/or computation now works for more optimizers
Improvements:
- NewtonPSGD now supports exact HVP calculation instead of the previous approximation. (Handles BatchNorm better but doesn't support all architectures.)
"smart_one_diag"is a next-to-no-downsidesmemory_save_modefor PSGD. It reduces memory and compute cost compared tomemory_save_mode=Noneand improves convergence compared tomemory_save_mode="one_diag"*
*Instead of preconditioning all dimensions (memory_save_mode=None) or preconditioning all but the largest dimension (memory_save_mode="one_diag") we remove the largest dimension iff it's larger than the second largest. So, a Linear(128, 1024) will now create one 128x128 preconditioner (instead of 128x128 + 1024x1024, 8x as large as the parameters), while a Linear(128, 128) can still benefit from preconditioning both sides.
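A usage sketch, assuming a PSGD Kron optimizer class such as `ForeachPSGDKron` exposing the `memory_save_mode` keyword (check the docs for the exact class and signature you use):

```python
import torch
import torch.nn as nn
import heavyball

model = nn.Sequential(nn.Linear(128, 1024), nn.ReLU(), nn.Linear(1024, 128))

# smart_one_diag: full preconditioners everywhere except the strictly largest
# dimension of each weight, which falls back to a diagonal preconditioner.
opt = heavyball.ForeachPSGDKron(model.parameters(), lr=1e-3,
                                memory_save_mode="smart_one_diag")

x = torch.randn(16, 128)
loss = model(x).square().mean()
loss.backward()
opt.step()
opt.zero_grad()
```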
OrthoGrad & PSGD improvements
- General
  - `precond_schedule` matches its docs (@francois-rozet, #31)
  - unified `warmup_steps` API (@francois-rozet, #32)
  - add `eps` arg to `scale_by_adam` (#33)
  - allow external management of LR (for `foreach=True` optimizers)
- OrthoGrad, a "grokking-first" optimizer that works (see the sketch after this list)
- PSGD
  - no more OOM in `torch.linalg.solve`
  - speed up cache by skipping it when it wouldn't give speedups
  - add newton-PSGD ("hvp-PSGD") using finite-difference approximation
  - caution momentum, not update (-> improved convergence; closer to paper's intention)
- Benchmarks
  - `grokking` benchmark, using modular addition and wide MLPs
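For the OrthoGrad item above: OrthoGrad projects each parameter's gradient onto the subspace orthogonal to the parameter itself and then restores the original gradient norm. A minimal standalone sketch of that projection (not HeavyBall's implementation):

```python
import torch

def orthograd_projection(w: torch.Tensor, g: torch.Tensor, eps: float = 1e-30) -> torch.Tensor:
    # Remove the component of the gradient parallel to the weights, then rescale
    # the result back to the original gradient norm.
    w_flat, g_flat = w.flatten(), g.flatten()
    proj = (w_flat @ g_flat) / (w_flat @ w_flat + eps)
    g_orth = g_flat - proj * w_flat
    g_orth = g_orth * (g_flat.norm() / (g_orth.norm() + eps))
    return g_orth.view_as(g)
```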
Fix PSGD, spring cleaning
- Previously, only the first parameter of PSGD was trained; this is now fixed
- All PSGDs were `PurePSGD` - now `momentum_into_precond_update` and `exp_avg_input` have their expected effect again
- preliminary support for external changes of `group['lr']`
