
understanding the weight learning algorithm #18

@gittea-rpi

Description


In the paper, the weights are the solution to equation (8), which minimizes the squared Frobenius norms of the weighted RFF covariance matrices for each pair of features, subject to the constraint that the weights form a probability distribution.
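For reference, my reading of that problem is roughly the following (my own transcription and notation, not copied verbatim from the paper):

$$
\min_{w \in \Delta^{n-1}} \; \sum_{i < j} \left\lVert \hat\Sigma_w^{(i,j)} \right\rVert_F^2,
\qquad
\hat\Sigma_w^{(i,j)} = \sum_{k=1}^{n} w_k\, u(x_{k,i})\, u(x_{k,j})^{\top}
- \Big(\sum_{k} w_k\, u(x_{k,i})\Big)\Big(\sum_{k} w_k\, u(x_{k,j})\Big)^{\top},
$$

where $u(\cdot)$ is the RFF map and $\Delta^{n-1}$ is the probability simplex over the $n$ samples.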

In the code, the weight_learner function appears to solve this problem by running gradient descent on a modified objective that combines the squared Frobenius norms of the weighted RFF covariance matrices with an lp norm of the weight vector. What is the purpose of the lp norm penalty on the weight vector, given that the weights are already produced by a softmax on logits and are therefore already a probability vector?

Does this somehow ensure that the logits don't go off to infinity? If that is the aim, why not regularize the size of the logits directly?
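For concreteness, here is a minimal sketch of what I understand the code to be doing; the function and variable names below are my own placeholders, not the actual identifiers in the repo:

```python
import torch
import torch.nn.functional as F

def rff(x, omegas, phases):
    """Random Fourier features for one scalar feature column: (n,) -> (n, d_rff)."""
    return (2.0 ** 0.5) * torch.cos(x.unsqueeze(1) * omegas + phases)

def weighted_cross_cov(fu, fv, w):
    """Weighted cross-covariance of two RFF maps, each (n, d_rff) -> (d_rff, d_rff)."""
    mu_u = (w.unsqueeze(1) * fu).sum(0)
    mu_v = (w.unsqueeze(1) * fv).sum(0)
    return (w.unsqueeze(1) * fu).t() @ fv - torch.outer(mu_u, mu_v)

def weight_learner_sketch(X, d_rff=5, p=2, lam=0.1, steps=200, lr=0.05):
    """Learn sample weights w = softmax(logits) that decorrelate feature pairs in RFF space."""
    n, m = X.shape
    logits = torch.zeros(n, requires_grad=True)
    omegas = torch.randn(d_rff)
    phases = 2 * torch.pi * torch.rand(d_rff)
    feats = [rff(X[:, j], omegas, phases) for j in range(m)]
    opt = torch.optim.SGD([logits], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        w = F.softmax(logits, dim=0)          # weights are already a probability vector
        loss = X.new_zeros(())
        for i in range(m):
            for j in range(i + 1, m):
                c = weighted_cross_cov(feats[i], feats[j], w)
                loss = loss + (c ** 2).sum()  # squared Frobenius norm per feature pair
        loss = loss + lam * w.norm(p=p)       # the extra lp penalty my question is about
        loss.backward()
        opt.step()
    return F.softmax(logits, dim=0).detach()
```

The last penalty term in the loss is the piece I am asking about.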
