3CPO

Poisson Subspace Clustering: Focusing on the Essentials in Count Data

Install Requirements

To install the requirements, run the following commands:

pip install numpy

pip install git+https://github.com/collinleiber/ClustPy.git
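
To confirm that both requirements are importable after installation, a quick check along these lines may help. This is a minimal sketch: it assumes ClustPy is importable under the module name `clustpy`, and `check_requirements` is a hypothetical helper that is not part of this repository.

```python
from importlib.util import find_spec


def check_requirements(modules=("numpy", "clustpy")):
    """Return a dict mapping each module name to True if it can be imported."""
    # find_spec returns None for a top-level module that is not installed.
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, available in check_requirements().items():
        print(f"{name}: {'ok' if available else 'missing'}")
```

Any module reported as missing can be installed with the corresponding pip command above.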

Datasets

The datasets used are available here:

Wholesales: https://archive.ics.uci.edu/dataset/292/wholesale+customers
SportA: https://archive.ics.uci.edu/dataset/450/sports+articles+for+objectivity+analysis
Optdigits: https://archive.ics.uci.edu/dataset/80/optical+recognition+of+handwritten+digits
BBCSports: http://mlg.ucd.ie/datasets/bbc.html
BBCNews: http://mlg.ucd.ie/datasets/bbc.html
WebKB: https://www.cs.cmu.edu/~webkb/
Reuters: https://archive.ics.uci.edu/dataset/137/reuters+21578+text+categorization+collection
20NewsG: https://archive.ics.uci.edu/dataset/113/twenty+newsgroups (https://scikit-learn.org/0.19/datasets/twenty_newsgroups.html)
MouseAtlas: https://github.com/JinmiaoChenLab/Batch-effect-removal-benchmarking/tree/master/Data/dataset2 (renamed "filtered_total_batch1_seqwell_batch2_10x.txt" -> "mouse_cell_atlas.txt")
GeneExp: https://archive.ics.uci.edu/dataset/401/gene+expression+cancer+rna+seq
HDendritic: https://github.com/JinmiaoChenLab/Batch-effect-removal-benchmarking/tree/master/Data/dataset1 (renamed "dataset1_sm_uc3.txt" -> "human_dendritic_cells.txt")
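
The two renames noted above can be scripted. The following is a minimal sketch, assuming the downloaded files sit in a single local directory. The `data` folder name and the `rename_downloads` helper are hypothetical; where the loaders in datasets.py expect the files is not stated here.

```python
from pathlib import Path

# Original file name -> name given in the dataset list above.
RENAMES = {
    "filtered_total_batch1_seqwell_batch2_10x.txt": "mouse_cell_atlas.txt",
    "dataset1_sm_uc3.txt": "human_dendritic_cells.txt",
}


def rename_downloads(data_dir):
    """Rename the downloaded files found in data_dir; return the new paths."""
    data_dir = Path(data_dir)
    renamed = []
    for old_name, new_name in RENAMES.items():
        source = data_dir / old_name
        if source.exists():  # skip files that were not downloaded
            target = data_dir / new_name
            source.rename(target)
            renamed.append(target)
    return renamed


if __name__ == "__main__":
    # "data" is an assumed download location; adjust as needed.
    print(rename_downloads("data"))
```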

Comparison Algorithms

The comparison algorithms Spherical-k-Means (SKM), PoissonL and PoissonC were implemented by us and can be found in the competitors.py file.

The co-clustering algorithms CROINFO, CoclustMod, CoclustSpecMod, ELBM, SELBM and TauCC are contained in the coclustering directory and were originally obtained here:

CROINFO, CoclustMod, CoclustSpecMod: https://github.com/franrole/cclust_package / https://www.jstatsoft.org/article/view/v088i07
ELBM, SELBM: https://github.com/Saeidhoseinipour/ELBMcoclust
TauCC: https://github.com/rupensa/tauCC

Experiments

You can test 3CPO manually by running it on a dataset of your choice:

import numpy as np

from threecpo import ThreeCPO
from datasets import load_synth_data

X, L = load_synth_data()  # Replace with any other dataset
n_clusters = len(np.unique(L))  # number of ground-truth clusters
threeCPO = ThreeCPO(n_clusters=n_clusters)
threeCPO.fit(X)
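
After fitting, the clustering can be compared with the ground-truth labels `L`. The sketch below computes normalized mutual information (NMI, arithmetic-mean normalization) in plain Python. It is an illustration only, not the evaluation code from experiments.py. How the predicted labels are read from the fitted object is an assumption: estimators in this style often expose a scikit-learn-like `labels_` attribute, but this should be verified for `ThreeCPO`.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(labels_true, labels_pred):
    """Normalized mutual information between two labelings, in [0, 1]."""
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    # I(U;V) = sum_{t,p} p(t,p) * log( p(t,p) / (p(t) * p(p)) )
    mutual_info = sum(
        (c / n) * log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in joint.items()
    )
    h_true, h_pred = entropy(labels_true), entropy(labels_pred)
    if h_true + h_pred == 0:
        return 1.0  # both labelings consist of a single cluster
    return 2 * mutual_info / (h_true + h_pred)


# Assumed usage (the `labels_` attribute is an assumption, see above):
# score = nmi(list(L), list(threeCPO.labels_))
```

NMI is invariant to a permutation of the cluster ids, so the predicted ids do not need to match the ground-truth ids.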

Our results can be reproduced by running the methods in the experiments.py file. Each method covers the whole pipeline (loading datasets, running algorithms, evaluation). Examples:

from experiments import experiment_table, experiment_text_data, experiment_ablations, experiment_initializations, experiment_robustness_amount_noise_columns, experiment_robustness_maximum_noise_value, experiment_runtime_rows, experiment_runtime_columns, experiment_estimate_k
from datasets import load_synth_data

experiment_table()
experiment_text_data()
experiment_ablations()
experiment_initializations()
experiment_robustness_amount_noise_columns()
experiment_robustness_maximum_noise_value()
experiment_runtime_rows()
experiment_runtime_columns()
X, L = load_synth_data(return_X_y=True)
experiment_estimate_k(X, L)
