multiclass-utils #5

@percyliang

This program is very inefficient because it's written in Ruby. Make it faster.

Also, it should reject datasets that:

  • have repeated features
  • have features which are not sorted
  • do not have exactly one space between features and labels

Says Benoit Favre:
The dataset checker for multiclass does not check that the dataset
is well formed, it just counts features/labels: it should check that
labels go from 1 to N, features go from 1 to M in ascending order,
with valid values (no infinity), that empty examples do not exist
(only a label, no features), that feature ids are not sparse (most
classifiers fail on that one).
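
For discussion, here is a rough sketch of what the requested checks could look like. It is not the existing checker, and it rests on assumptions the issue does not confirm: that each example is one line of the form `label id:value id:value ...`, and that "not sparse" means every id between 1 and the largest id seen is used at least once. The names `parse_id`, `check_example`, and `check_dataset` are hypothetical.

```python
import math


def parse_id(text):
    """Return text as a positive integer, or None if it is not one."""
    if text.isascii() and text.isdigit() and int(text) >= 1:
        return int(text)
    return None


def check_example(line):
    """Check one line; return (label, feature_ids, problems).

    Assumed line format (not confirmed by the issue): 'label id:value ...'.
    """
    problems = []
    text = line.rstrip("\n")
    tokens = text.split(" ")
    # Splitting on a single space yields empty tokens for doubled,
    # leading, or trailing spaces.
    if text and "" in tokens:
        problems.append("fields must be separated by exactly one space")
    tokens = [t for t in tokens if t]
    if not tokens:
        return None, [], problems + ["empty line"]
    label = parse_id(tokens[0])
    if label is None:
        problems.append(f"label {tokens[0]!r} is not a positive integer")
    if len(tokens) == 1:
        problems.append("example has a label but no features")
    seen, previous = set(), 0
    for token in tokens[1:]:
        raw_id, sep, raw_value = token.partition(":")
        feature_id = parse_id(raw_id)
        if not sep or feature_id is None:
            problems.append(f"malformed feature {token!r}")
            continue
        try:
            finite = math.isfinite(float(raw_value))  # rejects inf and nan
        except ValueError:
            finite = False
        if not finite:
            problems.append(f"feature {feature_id} has an invalid value")
        if feature_id in seen:
            problems.append(f"feature {feature_id} is repeated")
        elif feature_id < previous:
            problems.append(f"feature {feature_id} is out of order")
        seen.add(feature_id)
        previous = max(previous, feature_id)
    return label, sorted(seen), problems


def check_dataset(lines):
    """Return (line_number, problem) pairs.

    line_number is None for problems that concern the dataset as a whole.
    """
    report, labels, feature_ids = [], set(), set()
    for number, line in enumerate(lines, start=1):
        label, ids, problems = check_example(line)
        report.extend((number, problem) for problem in problems)
        if label is not None:
            labels.add(label)
        feature_ids.update(ids)
    # Assumed meaning of "1 to N" and "not sparse": no gaps below the max id.
    for name, used in (("label", labels), ("feature", feature_ids)):
        missing = sorted(set(range(1, max(used, default=0) + 1)) - used)
        if missing:
            report.append((None, f"unused {name} ids: {missing}"))
    return report
```

Calling `check_dataset(open(path))` on a dataset file would return an empty list for a well-formed dataset under these assumptions. Each line is read once and only the sets of used ids are kept, so memory use depends on the number of distinct ids rather than the number of examples.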
