Aux and L1 losses may not be worthwhile for TopKSAE

Just a quick comment: I did a bit of profiling with this code (thanks, by the way; it's great to have a clean reference), and it seems that, for the `TopKSAE` at least, the `aux_loss` and `l1_loss` terms do not significantly change training performance. Training with `l2_loss` alone seems to work just as well.


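```python
l2_loss = (x_reconstruct.float() - x.float()).pow(2).mean()  # reconstruction MSE
variance = ((x - x.mean(0)) ** 2).mean()  # input variance (not part of `loss` here)
l1_norm = acts_topk.float().abs().sum(-1).mean()  # mean L1 norm of top-k activations
l1_loss = self.cfg["l1_coeff"] * l1_norm  # sparsity penalty
l0_norm = (acts_topk > 0).float().sum(-1).mean()  # mean number of active latents
aux_loss = self.get_auxiliary_loss(x, x_reconstruct, acts)  # auxiliary loss term
loss = l2_loss + l1_loss + aux_loss  # total training objective
```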
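To run the comparison described above, it can help to make both extra terms switchable. The following is a minimal sketch, not this repository's code: it uses NumPy in place of PyTorch to stay dependency-light, omits encoder/decoder biases, and `topk_mask`, `auxk_loss`, `sae_loss`, the `use_l1`/`use_aux` flags, the dead-latent mask, and `aux_coeff=1/32` are all hypothetical choices. The auxiliary term follows the AuxK idea from OpenAI's TopK SAE work (reconstruct the main residual using the top-`k_aux` dead latents), which may differ in detail from `get_auxiliary_loss`.

```python
import numpy as np


def topk_mask(acts, k):
    """Keep the k largest entries in each row and zero the rest (hypothetical helper)."""
    idx = np.argpartition(acts, -k, axis=-1)[:, -k:]  # k largest per row (unordered)
    out = np.zeros_like(acts)
    np.put_along_axis(out, idx, np.take_along_axis(acts, idx, axis=-1), axis=-1)
    return out


def auxk_loss(x, x_reconstruct, acts, W_dec, dead_mask, k_aux, aux_coeff):
    """AuxK-style term (assumed form): reconstruct the residual with dead latents only."""
    n_dead = int(dead_mask.sum())
    if n_dead == 0:
        return 0.0  # no dead latents, so the term vanishes
    residual = x - x_reconstruct  # error left over by the main top-k reconstruction
    dead_acts = topk_mask(acts[:, dead_mask], min(k_aux, n_dead))
    residual_hat = dead_acts @ W_dec[dead_mask]  # decode using dead latents only
    return aux_coeff * np.mean((residual_hat - residual) ** 2)


def sae_loss(x, x_reconstruct, acts_topk, aux, l1_coeff, use_l1=True, use_aux=True):
    """Total loss with switchable L1 and auxiliary terms (hypothetical flags)."""
    l2_loss = np.mean((x_reconstruct - x) ** 2)
    l1_loss = l1_coeff * np.abs(acts_topk).sum(-1).mean() if use_l1 else 0.0
    aux_loss = aux if use_aux else 0.0
    l0_norm = (acts_topk > 0).sum(-1).mean()  # logged only, not optimized
    return {"loss": l2_loss + l1_loss + aux_loss, "l2_loss": l2_loss, "l0_norm": l0_norm}


# Toy example with random weights (no training, biases omitted).
rng = np.random.default_rng(0)
batch, d_model, d_sae, k = 8, 4, 16, 2
x = rng.normal(size=(batch, d_model))
W_enc = rng.normal(size=(d_model, d_sae)) * 0.1
W_dec = W_enc.T.copy()  # tied initialization, shape (d_sae, d_model)
acts = np.maximum(x @ W_enc, 0.0)  # ReLU encoder activations
acts_topk = topk_mask(acts, k)
x_reconstruct = acts_topk @ W_dec
dead_mask = np.zeros(d_sae, dtype=bool)
dead_mask[:4] = True  # pretend the first 4 latents are dead
aux = auxk_loss(x, x_reconstruct, acts, W_dec, dead_mask, k_aux=2, aux_coeff=1 / 32)

full = sae_loss(x, x_reconstruct, acts_topk, aux, l1_coeff=1e-3)
l2_only = sae_loss(x, x_reconstruct, acts_topk, aux, l1_coeff=1e-3,
                   use_l1=False, use_aux=False)
```

Since an AuxK-style term only involves latents flagged as dead, its effect may show up more in the dead-latent count over training than in `l2_loss`, so that count is probably worth logging alongside the reconstruction loss when comparing runs.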
