Reproducibility of BBKNN results

Hi there,

Thank you very much for developing this wonderful tool for data integration.

However, I have encountered some reproducibility issues in my analysis and would really appreciate your suggestions. When running BBKNN with the code below, I find that the final UMAP embeddings are not reproducible, even though I have set all the random seeds that I can think of.

import random

import numba
import numpy as np
import scanpy as sc

np.random.seed(202504)
random.seed(202504)
numba.set_num_threads(1)

adata = sc.read_h5ad("~/adata.h5ad")
adata
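
One further setting that might be relevant, offered as a sketch rather than a confirmed fix: capping the threading backends through environment variables before the numerical libraries are imported. The helper name `pin_single_thread` is hypothetical; the environment variables themselves are the standard ones read by OpenMP, OpenBLAS, MKL, and Numba. Whether this affects BBKNN or UMAP output on a given dataset is an assumption that would need testing.

```python
import os

# Standard thread-count variables for common numerical backends.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "NUMBA_NUM_THREADS",
)


def pin_single_thread(env=os.environ):
    """Hypothetical helper: set each thread-count variable to "1" in `env`."""
    for name in THREAD_ENV_VARS:
        env[name] = "1"
    return {name: env[name] for name in THREAD_ENV_VARS}


# Call this before importing numpy, numba, or scanpy, since these
# libraries typically read the variables once at import time.
pin_single_thread()
```

In a notebook this would go in the very first cell, after a kernel restart.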

**I run BBKNN using either of the following configurations:**
%%time
sc.external.pp.bbknn(adata, batch_key="sample_id", use_rep="X_pca", n_pcs=50, neighbors_within_batch=3)  # uses the default Annoy backend

**or**

%%time
sc.external.pp.bbknn(adata, batch_key="sample_id",use_rep='X_pca', n_pcs=30, neighbors_within_batch=3, computation="pynndescent", pynndescent_n_neighbors=100, pynndescent_random_state=0)

sc.tl.umap(adata, random_state=202504)

The UMAP embeddings from two runs of the same code are shown here:

<img width="362" height="370" alt="Image" src="https://github.com/user-attachments/assets/fc27faa0-c12c-4be8-ba7f-305d572006f6" />

<img width="359" height="370" alt="Image" src="https://github.com/user-attachments/assets/a3bb23ff-a30d-4bbe-8830-e20f7d4fe149" />

I would like to ask whether this behavior is expected. I used the same code on other datasets and did not see this problem there. Are there additional settings or recommended practices to achieve fully reproducible results when using BBKNN and UMAP?
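
To narrow down whether the runs diverge at the BBKNN graph or only at the UMAP step, a check along the following lines could be run after each step and compared across runs. This is a minimal sketch: `array_fingerprint` is a hypothetical helper, and the commented usage assumes that the neighbor graph is stored in `adata.obsp["connectivities"]` and the embedding in `adata.obsm["X_umap"]`.

```python
import hashlib


def array_fingerprint(*arrays):
    """Return a SHA-256 hex digest over the raw bytes of the given arrays.

    Each argument only needs a `.tobytes()` method (NumPy arrays have one).
    """
    digest = hashlib.sha256()
    for arr in arrays:
        digest.update(arr.tobytes())
    return digest.hexdigest()


# Assumed usage after sc.external.pp.bbknn(...) and sc.tl.umap(...):
#
#   conn = adata.obsp["connectivities"].tocsr()
#   conn.sort_indices()  # canonical index order within each row
#   print("graph:", array_fingerprint(conn.indptr, conn.indices, conn.data))
#   print("umap: ", array_fingerprint(adata.obsm["X_umap"]))
```

If the graph fingerprint already differs between runs, the nondeterminism is in the neighbor search; if only the UMAP fingerprint differs, it is in the embedding step.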

Thank you very much for your time and help.

Reproducibility of BBKNN results #66