Description
In the paper, the weights are the solution to equation (8), which minimizes the squared Frobenius norms of the weighted RFF covariance matrices for each pair of features, subject to the constraint that the weights form a probability distribution.
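For concreteness, here is my paraphrase of that problem (the notation is mine, not copied from the paper):

```latex
\hat{w} \;=\; \operatorname*{arg\,min}_{w \in \Delta_n}
  \sum_{i \neq j} \bigl\| \widehat{\Sigma}^{\,w}_{ij} \bigr\|_F^2,
\qquad
\Delta_n = \Bigl\{ w \in \mathbb{R}^n : w_k \ge 0,\ \textstyle\sum_{k=1}^n w_k = 1 \Bigr\},
```

where \(\widehat{\Sigma}^{\,w}_{ij}\) is the \(w\)-weighted cross-covariance between the RFF maps of features \(i\) and \(j\).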
In the code, the `weight_learner` function appears to solve this problem by running gradient descent on a modified objective that combines the squared Frobenius norms of the weighted RFF covariance matrices with an Lp norm of the weight vector. What is the purpose of the Lp norm on the weight vector, given that the weights are already obtained by applying softmax to logits and are therefore a probability vector?
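To make sure I'm reading the code correctly, here is a minimal sketch of the objective as I understand it, with toy stand-ins for the RFF features; the variable names, the exponent `p`, and the coefficient `lam` are my own guesses, not taken from the repo:

```python
import torch

torch.manual_seed(0)

# Toy stand-ins: in the repo these would be the RFF-mapped network features.
n, d = 128, 6                       # samples, feature dimensions
rff = torch.randn(n, d)

# Learnable logits; the sample weights are their softmax,
# so they are already a probability vector.
logits = torch.zeros(n, requires_grad=True)
optimizer = torch.optim.SGD([logits], lr=0.1)
p = 2                               # exponent of the extra Lp penalty (my guess)
lam = 1.0                           # trade-off coefficient (my guess)

for step in range(100):
    w = torch.softmax(logits, dim=0)             # w >= 0, w.sum() == 1

    # Weighted mean-centering of the features.
    mean = (w.unsqueeze(1) * rff).sum(dim=0, keepdim=True)
    centered = rff - mean

    # Weighted covariance matrix; its off-diagonal entries are the pairwise
    # cross-covariances whose squared Frobenius norm eq. (8) minimizes.
    cov = centered.t() @ (w.unsqueeze(1) * centered)
    off_diag = cov - torch.diag(torch.diag(cov))
    balance_loss = (off_diag ** 2).sum()

    # The term I'm asking about: an Lp norm of the (already normalized) weights.
    lp_penalty = torch.norm(w, p=p)

    loss = balance_loss + lam * lp_penalty
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

If this matches the intent of `weight_learner`, my question is specifically about the `lp_penalty` term.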
Does this somehow ensure that the logits don't go off to infinity? If that is the aim, why not directly regularize by the size of the logits?