This repository is the official implementation of DISCO: Internal Evaluation of Density-Based Clustering with Noise Labels.
The notebook used to generate this can be found here.
DISCO is designed on a pointwise level:
The code used to generate these examples can be found here.
Internal CVIs should reflect what external CVIs, like ARI, indicate when the ground truth is given.
To assess this, we compute the Pearson Correlation Coefficient (PCC) between the CVI scores and the ARI values across various common clusterings.
Each reported PCC is the correlation between ARI and the CVI named in the column, computed on the dataset named in the row.
Because some CVIs are undefined for certain edge cases, e.g., singleton clusters, the PCC cannot always be computed; such entries are marked with `--`.
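As a rough illustration of this comparison, the following minimal sketch computes the PCC between a list of CVI scores and the corresponding ARI values, and returns `None` when the correlation is undefined. The function name `pearson`, the rule for treating undefined scores, and the example values are assumptions made for illustration; they are not taken from the repository's scripts or results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    if len(xs) != len(ys) or len(xs) < 2:
        return None
    # Assumption: if a CVI is undefined for any clustering (None or NaN),
    # the PCC for that CVI/dataset pair is reported as undefined.
    if any(v is None or math.isnan(v) for v in list(xs) + list(ys)):
        return None
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # A constant sequence has zero variance, so the correlation is undefined.
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for four clusterings of one dataset (not real results).
ari_values = [0.10, 0.45, 0.80, 0.95]
cvi_scores = [0.20, 0.50, 0.70, 0.90]

pcc = pearson(cvi_scores, ari_values)
print("--" if pcc is None else f"{pcc:.2f}")
```

A PCC close to 1 would mean the internal CVI ranks the clusterings much like ARI does, while an undefined value corresponds to a `--` entry.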
The scripts used to generate the results can be found here.
Our repository is structured as follows:
.
│
├── data # dataset infos
│
├── datasets
│ ├── DENSIRED # data generator
│ ├── synth # synthetic data
│ └── ... # file to provide access to data
│
├── imgs # image files (plots, figures)
│ └── ...
│
├── src
│ ├── Clusterer # implementations for clustering methods
│ ├── Evaluation # implementations of CVIs
│ ├── Experiments # experiment scripts
│ │ ├── DatasetsJupyterNotebooks # datasets
│ │ ├── JupyterNotebooks_Analysis # notebooks to analyse datasets
│ │ ├── JupyterNotebooks_SyntheticExperiments # notebooks to generate experiment results
│ │ ├── scripts # additional experiments
│ │ └── ...
│ ├── utils # colors, metrics, utility functions
│ ├── __init__.py # init file
│ └── __setup.ipynb # notebook for setup
│
├── .gitignore # files that should not be committed to Git
└── README.md # project description

We performed an ablation study to assess the sensitivity of DISCO regarding its only hyperparameter.
The experiments for the ablation study can be found here. We set our hyperparameter to
| Method | Hyperparameter | Value |
|---|---|---|
| CDBW | number of representative points | 10 |
| CVDD | number of neighborhoods | 7 |
| CVNN | number of nearest neighbors | 10 |
| DCSI | corepoints | 5 |
| DISCO | min pts | 5 |
