Description
General idea: the hypernetwork effectively initializes infinitely many weights (a large number in practice). Optimization should therefore explore a far wider range of local optima. See the figure for a toy example of this idea.
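A minimal sketch of this idea, under assumptions not in the original: the "hypernetwork" is just a fixed affine map from a latent code to a scalar weight, the loss is a hypothetical 1-D multimodal function, and optimization is plain gradient descent. Sampling many latent codes yields initializations that descend into several distinct local optima rather than one.

```python
import math
import random

def loss(w):
    # Toy multimodal loss: the sine term creates many local minima,
    # the quadratic term keeps iterates bounded. Purely illustrative.
    return math.sin(3.0 * w) + 0.1 * w * w

def grad(w):
    # Analytic gradient of the toy loss above.
    return 3.0 * math.cos(3.0 * w) + 0.2 * w

def hypernet(z, a=2.5, b=0.0):
    # Stand-in "hypernetwork": a fixed affine map from latent z to a weight.
    return a * z + b

random.seed(0)
finals = []
for _ in range(200):
    z = random.gauss(0.0, 1.0)    # sample a latent code
    w = hypernet(z)               # hypernetwork produces an initial weight
    for _ in range(500):          # plain gradient descent on the toy loss
        w -= 0.01 * grad(w)
    finals.append(w)

# Distinct basins reached, rounded to merge near-identical endpoints.
optima = sorted(set(round(w, 2) for w in finals))
print(optima)
```

The printed list contains several distinct values, i.e. the sampled initializations populate multiple minima instead of collapsing to a single one.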
We should do two things to explore this:
- Examine the diversity in learned weights: given a task with a known multimodal loss (dataset + target), we should be able to use UMAP to visualize the final trained weights and see that sampled weights cluster in distinct optima. We could even use this to visualize the loss surface by plotting the loss of each weight sample in the UMAP embedding space; that would be a very cool figure.
- We can also add a post-hoc weight selection step that selects the learned theta samples minimizing a validation loss. This can be seen as exploring multiple minima and then selecting the weights from the global optimum. See the figure above for a toy example; we select the weights in the deepest trough.
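The post-hoc selection step can be sketched as an argmin over sampled weights, scored on held-out data. Everything here is a hypothetical stand-in: a 1-D linear model, synthetic validation pairs, and a hand-picked list of theta samples meant to represent weights recovered from distinct optima.

```python
import random

def val_loss(theta, val_data):
    # Mean squared error of a 1-D linear model y = theta * x on held-out pairs.
    return sum((theta * x - y) ** 2 for x, y in val_data) / len(val_data)

random.seed(1)
true_theta = 1.5
# Synthetic held-out data from the data-generating weight, plus small noise.
val_data = [(x, true_theta * x + random.gauss(0.0, 0.01))
            for x in [-2.0, -1.0, 0.5, 1.0, 2.0]]

# Hypothetical theta samples, as if drawn from distinct trained optima.
theta_samples = [-1.4, 0.2, 1.48, 3.0]

# Post-hoc selection: keep the sample with the lowest validation loss.
best = min(theta_samples, key=lambda t: val_loss(t, val_data))
print(best)  # 1.48, the sample closest to the data-generating weight
```

In the real setup the candidates would be weights sampled from the trained hypernetwork, but the selection rule is the same one-liner: argmin of validation loss over samples.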
