This program is slow, mainly because it is written in Ruby. Make it faster.
Also, it should reject datasets that:
- have repeated features
- have features which are not sorted
- do not have exactly one space between the label and the features
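The three per-line rules above could be checked roughly as follows. This is a minimal sketch in Python, meant only to illustrate the checks and not as a claim about speed. It assumes a libsvm-style line format, `<label> <id>:<value> <id>:<value> ...`, and that the single-space rule applies to every separator on the line. The function name `line_errors` is hypothetical.

```python
def line_errors(line: str) -> list:
    """Return the reasons one example line should be rejected (empty list if valid).

    Hypothetical format: "<label> <id>:<value> <id>:<value> ..."
    """
    errors = []
    # split(" ") keeps empty strings, so doubled, leading or trailing
    # spaces show up as empty tokens; tabs end up inside malformed tokens.
    tokens = line.rstrip("\n").split(" ")
    if "" in tokens:
        errors.append("separator is not exactly one space")
    if not tokens[0].isdecimal():
        errors.append(f"malformed label {tokens[0]!r}")
    ids = []
    for tok in tokens[1:]:
        if tok == "":
            continue  # already reported as a spacing error
        fid, sep, _value = tok.partition(":")
        if not sep or not fid.isdecimal():
            errors.append(f"malformed feature {tok!r}")
            continue
        ids.append(int(fid))
    # A set drops duplicates, so a size mismatch means some id is repeated.
    if len(set(ids)) != len(ids):
        errors.append("repeated feature id")
    if ids != sorted(ids):
        errors.append("feature ids not sorted")
    return errors
```

Reporting every failed rule for a line, rather than stopping at the first one, makes it easier to fix a large file in a single pass.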
Benoit Favre says:
The dataset checker for multiclass does not check that the dataset
is well formed; it just counts features and labels. It should check that
labels go from 1 to N, that features go from 1 to M in ascending order
with valid values (no infinity), that there are no empty examples
(a label with no features), and that feature ids are not sparse (most
classifiers fail on that one).
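The dataset-level checks Benoit Favre lists could look something like the sketch below. It is a Python illustration under stated assumptions, not the project's checker: it assumes the same libsvm-style `<label> <id>:<value> ...` format, and it reads "labels go from 1 to N" and "feature ids are not sparse" as "every label from 1 to the largest label, and every feature id from 1 to the largest id, appears at least once in the file". It splits on any whitespace, so it does not enforce the single-space rule. The name `dataset_errors` is hypothetical.

```python
import math


def dataset_errors(lines):
    """Return human-readable errors for a whole dataset (empty list if valid).

    Hypothetical format: "<label> <id>:<value> <id>:<value> ..." per line,
    with integer labels and feature ids starting at 1.
    """
    errors = []
    labels_seen = set()
    ids_seen = set()
    for n, line in enumerate(lines, start=1):
        tokens = line.split()  # any whitespace; spacing rules are not checked here
        if not tokens:
            errors.append(f"line {n}: blank line")
            continue
        label, feats = tokens[0], tokens[1:]
        if label.isdecimal() and int(label) >= 1:
            labels_seen.add(int(label))
        else:
            errors.append(f"line {n}: label {label!r} is not an integer >= 1")
        if not feats:
            errors.append(f"line {n}: empty example (label but no features)")
        prev = 0  # ids must start at 1 and strictly increase (no repeats)
        for tok in feats:
            fid, _, value = tok.partition(":")
            if not fid.isdecimal() or int(fid) <= prev:
                errors.append(f"line {n}: feature id {fid!r} is not greater than previous id {prev}")
                continue
            prev = int(fid)
            ids_seen.add(prev)
            # float() accepts "inf" and "nan", so check finiteness explicitly.
            try:
                finite = math.isfinite(float(value))
            except ValueError:
                finite = False
            if not finite:
                errors.append(f"line {n}: feature {prev} has non-finite or invalid value {value!r}")
    # Density checks: labels must cover 1..N and feature ids 1..M with no gaps.
    if labels_seen:
        gaps = sorted(set(range(1, max(labels_seen) + 1)) - labels_seen)
        if gaps:
            errors.append(f"labels never used: {gaps}")
    if ids_seen:
        gaps = sorted(set(range(1, max(ids_seen) + 1)) - ids_seen)
        if gaps:
            errors.append(f"feature ids never used: {gaps}")
    return errors
```

The gap checks need the whole file, so they run once after the loop, while the per-example checks are reported with their line numbers as the file is read.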