Hi there,
Thank you very much for developing this wonderful tool for data integration.
However, I have encountered some reproducibility issues in my analysis and would really appreciate your suggestions. When running BBKNN with the code below, I find that the final UMAP embeddings are not reproducible, even though I have set all the random seeds that I can think of.
import random
import numba
import numpy as np
import scanpy as sc
np.random.seed(202504)
random.seed(202504)
numba.set_num_threads(1)
adata = sc.read_h5ad("~/adata.h5ad")
adata
I run BBKNN using either of the following configurations:
%%time
sc.external.pp.bbknn(adata, batch_key="sample_id", use_rep="X_pca", n_pcs=50, neighbors_within_batch=3)  # uses the default Annoy backend
or
%%time
sc.external.pp.bbknn(adata, batch_key="sample_id", use_rep="X_pca", n_pcs=30, neighbors_within_batch=3, computation="pynndescent", pynndescent_n_neighbors=100, pynndescent_random_state=0)
sc.tl.umap(adata, random_state=202504)
The results of two runs are shown here:
I would like to ask whether this behavior is expected. When I run the same code on other datasets, I do not see this problem. Are there additional settings or recommended practices to achieve fully reproducible results when using BBKNN and UMAP?
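To narrow down which step differs between runs, one option might be to fingerprint the neighbor graph right after BBKNN and the embedding right after UMAP, then compare the fingerprints across runs. The sketch below is only an illustration: `buffer_digest` is a hypothetical helper (not part of BBKNN or scanpy), and the commented usage assumes the graph is stored in `adata.obsp["connectivities"]` and the embedding in `adata.obsm["X_umap"]`.

```python
import hashlib
from array import array


def buffer_digest(*buffers):
    """Return a SHA-256 hex digest over the raw bytes of one or more buffers.

    Hypothetical helper: accepts any C-contiguous object that supports the
    buffer protocol (for example NumPy arrays or array.array).
    """
    h = hashlib.sha256()
    for buf in buffers:
        # View the buffer as plain bytes without copying it.
        h.update(memoryview(buf).cast("B"))
    return h.hexdigest()


# Stand-in data so the sketch runs without scanpy installed.
run_1 = array("d", [0.10, 0.20, 0.30])
run_2 = array("d", [0.10, 0.20, 0.30])
print(buffer_digest(run_1) == buffer_digest(run_2))

# Assumed usage after each run (not executed here):
#   g = adata.obsp["connectivities"].tocsr()
#   g.sort_indices()
#   graph_id = buffer_digest(g.indptr, g.indices, g.data)
#   umap_id = buffer_digest(np.ascontiguousarray(adata.obsm["X_umap"]))
```

If the graph fingerprints already differ between runs, the variation would come from the BBKNN step. If they match and only the embedding fingerprints differ, it would come from UMAP.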
Thank you very much for your time and help.