
I am attaching a graph comparing loss curves for my model across two commits of HeavyBall: be36047 and c48bcaf.
Nothing else changed in my code, neither the model nor the data. My call to the optimizer looks as follows:
optimizer = heavyball.ForeachPSGDKron(
    opt_grouped_params,
    lr=2e-3,
    weight_decay=5e-4,
    warmup_steps=1024*16,
)
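
To make the gap easier to pin down than by eyeballing the graph, the two runs could also be compared numerically. Below is a minimal sketch, assuming the per-step losses from each commit have been logged to plain Python lists and that losses are positive. The function name `first_divergence`, the tolerance, and the window size are all hypothetical choices for illustration, not part of HeavyBall.

```python
def first_divergence(losses_a, losses_b, rel_tol=0.05, window=3):
    """Find where run B starts doing clearly worse than run A.

    Returns the first step index at which the window-averaged loss of
    run B exceeds that of run A by more than `rel_tol` (relative),
    or None if no such step exists. Assumes positive loss values.
    """
    # Only compare the steps that both runs actually reached.
    n = min(len(losses_a), len(losses_b))
    for start in range(0, n - window + 1):
        # Average over a short window to reduce step-to-step noise.
        mean_a = sum(losses_a[start:start + window]) / window
        mean_b = sum(losses_b[start:start + window]) / window
        if mean_b > mean_a * (1.0 + rel_tol):
            return start
    return None
```

Calling it with the loss log from be36047 as `losses_a` and the one from c48bcaf as `losses_b` would give a rough step index for where the curves separate, which can then be checked against the warmup length of `1024*16` steps.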
Granted, c48bcaf is not the current HEAD, but I was taken aback by this sudden and drastic loss of performance. @ClashLuke, any hints?