Reproducibility of BBKNN results #66

@liangyf24

Hi there,

Thank you very much for developing this wonderful tool for data integration.

However, I have encountered some reproducibility issues in my analysis and would really appreciate your suggestions. When running BBKNN with the code below, I find that the final UMAP embeddings are not reproducible, even though I have set all the random seeds that I can think of.

import random
import numba
import numpy as np
import scanpy as sc

np.random.seed(202504)
random.seed(202504)
numba.set_num_threads(1)

adata = sc.read_h5ad("~/adata.h5ad")
adata

I run BBKNN using either of the following configurations:

%%time
sc.external.pp.bbknn(adata, batch_key="sample_id", use_rep="X_pca", n_pcs=50, neighbors_within_batch=3)  # uses the default Annoy backend

or

%%time
sc.external.pp.bbknn(adata, batch_key="sample_id", use_rep="X_pca", n_pcs=30, neighbors_within_batch=3, computation="pynndescent", pynndescent_n_neighbors=100, pynndescent_random_state=0)

sc.tl.umap(adata_int_PBMC,random_state=202504)

Two runs of the same code gave these two different results:

[Image] [Image]

I would like to ask whether this behavior is expected, since I did not see this problem when running the same code on other datasets, and whether there are additional settings or recommended practices for achieving fully reproducible results when using BBKNN and UMAP.
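
To narrow down where the two runs first diverge, a comparison along the following lines might help. This is only a minimal sketch: `to_dense`, `same_matrix` and `locate_divergence` are hypothetical helper names, not part of BBKNN or scanpy, and it assumes the neighbor graph and embedding of each run were kept (for example copies of `adata.obsp["connectivities"]` and `adata.obsm["X_umap"]`).

```python
import numpy as np


def to_dense(m):
    """Return a dense NumPy array from a dense array or a scipy.sparse matrix."""
    # Densifying is only practical for small or subsampled graphs.
    return m.toarray() if hasattr(m, "toarray") else np.asarray(m)


def same_matrix(a, b, atol=0.0):
    """True if both matrices have the same shape and agree within atol."""
    a, b = to_dense(a), to_dense(b)
    return a.shape == b.shape and bool(np.allclose(a, b, rtol=0.0, atol=atol))


def locate_divergence(graph_1, graph_2, umap_1, umap_2, atol=0.0):
    """Report the first stage at which two runs differ.

    Returns "neighbor graph" if the graphs already differ,
    "umap" if the graphs match but the embeddings differ, else "none".
    """
    if not same_matrix(graph_1, graph_2, atol):
        return "neighbor graph"
    if not same_matrix(umap_1, umap_2, atol):
        return "umap"
    return "none"


# Toy example: identical graphs, different embeddings.
graph = np.array([[0.0, 1.0], [1.0, 0.0]])
emb_a = np.array([[0.0, 0.0], [1.0, 1.0]])
emb_b = np.array([[0.0, 0.0], [1.0, 2.0]])
print(locate_divergence(graph, graph.copy(), emb_a, emb_b))
```

If the graphs already differ, the nondeterminism would be in the neighbor search step; if only the embeddings differ, it would be in the UMAP step.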

Thank you very much for your time and help.
