Open
Description
Just a quick comment: I did a little profiling with this code (thanks, by the way, it's great to have a clean reference), and it seems that, for the TopKSAE at least, the aux_loss and l1_loss terms do not significantly alter training performance. Training on l2_loss alone seems to work just as well. For reference, this is the relevant loss computation:
l2_loss = (x_reconstruct.float() - x.float()).pow(2).mean()
variance = ((x - x.mean(0)) ** 2).mean()
l1_norm = acts_topk.float().abs().sum(-1).mean()
l1_loss = self.cfg["l1_coeff"] * l1_norm
l0_norm = (acts_topk > 0).float().sum(-1).mean()
aux_loss = self.get_auxiliary_loss(x, x_reconstruct, acts)
loss = l2_loss + l1_loss + aux_loss
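For anyone who wants to reproduce the comparison, here is a minimal sketch of one way to switch the extra terms on and off. The helper `combine_losses` and its `use_l1` / `use_aux` flags are hypothetical names I made up for illustration; they are not part of this repo. The sketch uses plain floats as stand-ins, but since it only relies on `+` it should work the same way on the tensors computed above.

```python
def combine_losses(l2_loss, l1_loss, aux_loss, use_l1=True, use_aux=True):
    """Sum the loss terms, optionally dropping l1_loss and/or aux_loss.

    Hypothetical helper (not from the repo). It only uses `+`, so it
    accepts plain floats as well as tensors.
    """
    loss = l2_loss
    if use_l1:
        loss = loss + l1_loss
    if use_aux:
        loss = loss + aux_loss
    return loss


# Toy scalar values standing in for the terms computed in the forward pass.
full = combine_losses(0.5, 0.25, 0.125)
l2_only = combine_losses(0.5, 0.25, 0.125, use_l1=False, use_aux=False)
```

Note that setting `self.cfg["l1_coeff"]` to 0 already zeroes out `l1_loss` in the code above, so a flag like this would mainly matter for `aux_loss`.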