Create multiple versions of NoVote_Min_Cost for data quality -- Esoteric, fix last #19

@ipeirotis

NoVote_Min_Cost uses the prior probabilities to define the baseline cost of a "strategic spammer".
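To make the dependence on the priors concrete, here is a minimal sketch. It assumes the strategic spammer always assigns the single category that minimizes the expected misclassification cost under the priors. That definition, the function name `no_vote_min_cost`, the categories, and the cost values are all assumptions for illustration, not taken from the actual code.

```python
def no_vote_min_cost(priors, costs):
    """Expected cost of the cheapest constant labeling under the given priors.

    priors: dict mapping category -> prior probability
    costs:  dict mapping true category -> {assigned category -> cost}
    """
    categories = list(priors)
    return min(
        # Expected cost if every object is assigned to `assigned`.
        sum(priors[true] * costs[true][assigned] for true in categories)
        for assigned in categories
    )


# Hypothetical two-category example with 0/1 misclassification costs.
priors = {"A": 0.2, "B": 0.8}
costs = {
    "A": {"A": 0.0, "B": 1.0},
    "B": {"A": 1.0, "B": 0.0},
}
baseline = no_vote_min_cost(priors, costs)
```

Under this assumption, changing the priors changes the baseline directly, which is why the choice of prior source matters for the normalization.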

The key issue is that the prior probabilities can be estimated in different ways:

  1. Use fixed priors, passed by the user in the categories.txt file (preferred)
  2. Estimate the priors from the evaluation data (measure the percentage of objects in each category in the evaluation data)
  3. Estimate the priors from the training data (DS.categories.getPrior when running without fixed priors; when running with fixed priors, we need to measure the percentage of objects in each category). This creates the problem that DS reports different priors than MV.
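As a rough illustration of "measure the percentage of objects in each category", the sketch below computes empirical priors from a list of labels. The helper name `empirical_priors` and the sample labels are hypothetical; the same counting would apply to evaluation data (option 2) or to estimated training labels (option 3).

```python
from collections import Counter


def empirical_priors(labels, categories):
    """Return the fraction of objects in each category.

    labels:     one category label per object
    categories: every category that should appear in the result,
                including those with zero observed objects
    """
    if not labels:
        raise ValueError("cannot estimate priors from an empty label list")
    counts = Counter(labels)
    total = len(labels)
    return {category: counts[category] / total for category in categories}


# Hypothetical evaluation data: three objects in A, one in B.
evaluation_labels = ["A", "A", "B", "A"]
estimated = empirical_priors(evaluation_labels, ["A", "B"])
```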

I would add an advanced command-line switch that determines which type of prior to use for the normalization. The default should be (1), with (2) as the secondary preference. Option (3), which is the current implementation when we do not have fixed priors and which uses the DS priors, should come with a warning.
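One possible shape for such a switch is sketched below. The flag name `--prior-source`, its values, and the helper names are hypothetical and do not reflect the tool's existing command line; the sketch only shows the proposed preference order and the warning for option (3).

```python
import argparse
import warnings

# Hypothetical values for the proposed switch, in order of preference.
PRIOR_SOURCES = ("fixed", "evaluation", "training")


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--prior-source",
        choices=PRIOR_SOURCES,
        default=None,
        help="which priors to use when normalizing against NoVote_Min_Cost",
    )
    return parser


def resolve_prior_source(requested, has_fixed_priors, has_evaluation_data):
    """Pick the prior source, preferring fixed, then evaluation, then training."""
    if requested is None:
        if has_fixed_priors:
            return "fixed"
        if has_evaluation_data:
            return "evaluation"
        requested = "training"
    if requested == "fixed" and not has_fixed_priors:
        raise ValueError("fixed priors requested but none were provided")
    if requested == "evaluation" and not has_evaluation_data:
        raise ValueError("evaluation priors requested but no evaluation data given")
    if requested == "training":
        # Option (3): priors estimated from training data may differ between DS and MV.
        warnings.warn(
            "using priors estimated from the training data; "
            "DS and MV may report different priors",
            stacklevel=2,
        )
    return requested


args = build_parser().parse_args(["--prior-source", "evaluation"])
chosen = resolve_prior_source(args.prior_source, has_fixed_priors=True, has_evaluation_data=True)
```

Leaving the flag unset lets the tool fall back through the preference order, while an explicit value that cannot be honored fails loudly instead of silently switching sources.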
