Hi Kaspar,
I tried replicating the results from your post in PyTorch, but I'm unable to get anywhere close to the results you display on your blog. I'm sure there is an error on my end somewhere, but I have pored over the paper, your code, and your blog post, and I can't see anything that could be behind it. I had a friend look over my implementation too, and they were unable to spot any substantive difference.
I have tried to follow the architecture and setup of your experiment 1 as closely as possible, and I see very different behavior. The code is very simple; it is, after all, only a handful of small neural networks:
https://github.com/chrisorm/Machine-Learning/blob/ngp/Neural%20GP.ipynb
Some things I observe that you don't seem to see (shown in the notebook):
- My q distribution concentrates (i.e. its std goes to 0).
- Related to the above, I see no substantial difference between function samples when extrapolating outside the data, the way you seem to.
- My prior function samples display substantially less variance than yours seem to.
The first led me to suspect an error in my KLD term, but that doesn't seem to be the case: I unit tested my implementation and believe it is correct. The loss looks reasonable and the network clearly converges.
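For reference, this is the closed-form expression I'm testing my KLD term against (the function name is mine; I'm assuming q is a diagonal Gaussian and the prior is N(0, I), as in the paper):

```python
import torch

def kl_to_standard_normal(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    # Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ),
    # summed over the latent dimensions.
    return 0.5 * torch.sum(mu.pow(2) + logvar.exp() - 1.0 - logvar)

# Sanity check: the KL is exactly zero when q equals the prior.
mu = torch.zeros(8)
logvar = torch.zeros(8)  # log(1) = 0, i.e. unit variance
print(kl_to_standard_normal(mu, logvar).item())  # 0.0
```

It agrees with `torch.distributions.kl_divergence` between two `Normal` distributions on the cases I tried, so I don't think the KL itself is the problem.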
The second is a bit stranger: do you perhaps use some particular weight initialization when drawing these samples, over and above sampling z ~ N(0, 1)?
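For concreteness, this is roughly how I'm drawing the prior function samples (the decoder below is a stand-in with default PyTorch initialization, not your exact architecture): sample z ~ N(0, I), tile it across the input grid, and push it through the decoder.

```python
import torch

# Stand-in decoder: any nn.Module mapping (x, z) -> y; here x is 1-D
# and z is 2-D, so the input is 1 + 2 = 3 features.
decoder = torch.nn.Sequential(
    torch.nn.Linear(3, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1)
)

x = torch.linspace(-3.0, 3.0, 100).unsqueeze(-1)        # (100, 1) input grid
samples = []
for _ in range(5):
    z = torch.randn(2)                                   # z ~ N(0, I)
    zx = torch.cat([x, z.expand(x.size(0), -1)], dim=-1) # tile z across the grid
    samples.append(decoder(zx))                          # one prior function sample
```

With default initialization the resulting curves stay fairly flat, which is why I wondered whether you rescale or re-initialize the weights before drawing yours.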
Would you happen to have any insights as to what may be behind this difference?
Thanks for taking the time to write the post; it has some really great insights into the method!
Chris