FETA subsampling not working #160

@timokau

While working on #116, I noticed that the sub_sampling function of feta_network is broken. It's not exercised in our standard test suite, since it's only needed when the number of objects is higher than the 5 objects our test suite uses.

The function is implemented as follows:

def sub_sampling(self, X, Y):
    if self.n_objects_fit_ > self.max_number_of_objects:
        bucket_size = int(self.n_objects_fit_ / self.max_number_of_objects)
        idx = self.random_state_.randint(
            bucket_size, size=(len(X), self.n_objects_fit_)
        )
        # TODO: subsampling multiple rankings
        idx += np.arange(start=0, stop=self.n_objects_fit_, step=bucket_size)[
            : self.n_objects_fit_
        ]
        X = X[np.arange(len(X))[:, None], idx]
        Y = Y[np.arange(len(X))[:, None], idx]
        tmp_sort = Y.argsort(axis=-1)
        Y = np.empty_like(Y)
        Y[np.arange(len(X))[:, None], tmp_sort] = np.arange(self.n_objects_fit_)
    return X, Y

and breaks at the idx += line because of a dimension mismatch. It's trying to add arrays like

[[0 1 0 0 0]
 [0 0 1 1 0]]

and

[0 2 4]

i.e. a 2d array of shape (2, 5) with a 1d array of shape (3,), which cannot be broadcast together. I'm not sure how this sampling is supposed to work. Is the intention documented somewhere @kiudee @prithagupta?
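
For reference, here is a minimal sketch that reproduces the mismatch outside the estimator. The sizes (2 instances, 5 objects, max_number_of_objects = 2) are assumptions chosen to match the arrays above, and the plain variables are illustrative stand-ins for the attributes used in sub_sampling:

```python
import numpy as np

# Assumed sizes, picked so the arrays look like the ones quoted in the issue.
n_instances = 2
n_objects_fit = 5
max_number_of_objects = 2

rng = np.random.RandomState(0)  # stand-in for self.random_state_
bucket_size = int(n_objects_fit / max_number_of_objects)  # 2

# One random in-bucket position per (instance, object): shape (2, 5).
idx = rng.randint(bucket_size, size=(n_instances, n_objects_fit))

# One start offset per bucket: [0, 2, 4], shape (3,).
offsets = np.arange(0, n_objects_fit, bucket_size)[:n_objects_fit]

try:
    idx += offsets  # same in-place addition as in sub_sampling
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("ValueError:", err)

print(idx.shape, offsets.shape, broadcast_failed)
```

The addition can only broadcast if the last dimension of idx equals the number of offsets (or one of them is 1), which is not the case here: idx has n_objects_fit columns while there is one offset per bucket.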
