
Add ability to binarise / discretise targets based on threshold or distribution #34

@hmacdope


Equivalent issue in models: OpenADMET/openadmet-models#37

We need the ability to discretise continuous data into binary or multiclass labels. There are two main approaches:

Based on thresholds (preferably chosen according to some kind of scientifically reasonable criterion).

Polaris folks have a great example of how this is done here:

https://github.com/polaris-hub/auroris/blob/main/auroris/curation/actions/_discretize.py
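For discussion, here is a minimal sketch of what threshold-based discretisation could look like using only NumPy. The function name `discretise_by_thresholds` and the example cut-offs are hypothetical and chosen for illustration; this is not the auroris implementation linked above, and the NaN handling is an assumption about what we would want.

```python
import numpy as np


def discretise_by_thresholds(values, thresholds):
    """Hypothetical helper: map continuous values to integer class labels.

    One threshold gives binary labels (0/1); k thresholds give k + 1 classes.
    NaNs are kept as NaN instead of being assigned to a class (assumption).
    """
    values = np.asarray(values, dtype=float)
    edges = np.sort(np.asarray(thresholds, dtype=float))
    # np.digitize returns the index of the bin each value falls into.
    # With the default right=False, a value equal to a threshold is
    # placed in the upper class.
    labels = np.digitize(values, edges).astype(float)
    labels[np.isnan(values)] = np.nan
    return labels


# Illustrative pIC50-like values; the cut-offs are placeholders,
# not scientifically recommended thresholds.
pic50 = [4.2, 5.0, 6.7, 8.1, np.nan]
binary = discretise_by_thresholds(pic50, [6.0])           # 0 = below 6.0, 1 = at or above
multiclass = discretise_by_thresholds(pic50, [5.0, 7.0])  # three classes: 0, 1, 2
```

Returning floats keeps missing values representable as NaN; if we would rather drop or reject missing values, integer labels would be the more natural return type.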

Based on the distribution of values (i.e. equal-width bins, quartiles, or clustering). We need to discuss this further, as it is likely only applicable in certain situations where the dynamic range is large.
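To make the distribution-based option concrete, here is a rough sketch covering the equal-width and quantile cases with NumPy (clustering is left out). The function name `discretise_by_distribution` and its `strategy` argument are hypothetical; the bin edges are derived from the data itself, which is the key difference from fixed thresholds.

```python
import numpy as np


def discretise_by_distribution(values, n_bins=4, strategy="quantile"):
    """Hypothetical helper: derive bin edges from the data, then label.

    strategy="quantile" gives roughly equal-sized classes (quartiles for
    n_bins=4); strategy="uniform" gives equal-width bins over the range.
    Returns (labels, edges) so the edges can be reused on new data.
    """
    values = np.asarray(values, dtype=float)
    finite = values[~np.isnan(values)]
    # Interior cut points only: n_bins classes need n_bins - 1 edges.
    fractions = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    if strategy == "quantile":
        edges = np.quantile(finite, fractions)
    elif strategy == "uniform":
        edges = finite.min() + fractions * (finite.max() - finite.min())
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    labels = np.digitize(values, edges).astype(float)
    labels[np.isnan(values)] = np.nan  # keep missing values missing (assumption)
    return labels, edges


# A skewed toy example with one large outlier.
data = [1, 2, 3, 4, 5, 6, 7, 100]
quantile_labels, quantile_edges = discretise_by_distribution(data, 4, "quantile")
uniform_labels, uniform_edges = discretise_by_distribution(data, 4, "uniform")
```

On the skewed toy data, the quantile strategy spreads the values evenly over the four classes, while the equal-width strategy puts everything except the outlier into the lowest class. That illustrates why the choice likely depends on the dynamic range and shape of the data.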

At this point it's unclear whether we should implement this here or in models; I'm just collecting issues here for now.

Labels: enhancement (New feature or request)