Skip to content

Feature request: Neighborhood groups in milopy #49

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
MaximilianNuber opened this issue Nov 18, 2024 · 2 comments
Closed

Feature request: Neighborhood groups in milopy #49

MaximilianNuber opened this issue Nov 18, 2024 · 2 comments

Comments

@MaximilianNuber
Copy link

MaximilianNuber commented Nov 18, 2024

Dear @emdann,

I like to use Milo for analyzing intra-celltype heterogeneity, particularly with neighborhood groups.
To my understanding, neighborhood groups are currently implemented only in R.
Therefore, I would like to ask if this feature will be added to milopy?

In case that there are no preparations for this, yet, would you be interested to implement neighborhood groups on basis of the following code? For any case, I would be thankful for any help to make it work correctly.

As an example I process the embryo gastrulation data from the MiloR tutorial in R, calculate neighborhood groups, transfer to python and calculate neighborhood groups for comparison with code I adapted from the clustering modules in scanpy.

image image

Code for examples:

python-code for find_nhood_groups:

from contextlib import contextmanager, suppress
def get_igraph_from_adjacency(adjacency, directed=None):
    """Get igraph graph from adjacency matrix. Copy/pasted from scanpy."""
    import igraph as ig

    sources, targets = adjacency.nonzero()
    weights = adjacency[sources, targets]
    if isinstance(weights, np.matrix):
        weights = weights.A1
    g = ig.Graph(directed=directed)
    g.add_vertices(adjacency.shape[0])  # this adds adjacency.shape[0] vertices
    g.add_edges(list(zip(sources, targets)))
    with suppress(KeyError):
        g.es["weight"] = weights
    if g.vcount() != adjacency.shape[0]:
        logg.warning(
            f"The constructed graph has only {g.vcount()} nodes. "
            "Your adjacency matrix contained redundant nodes."
        )
    return g

def find_nhood_groups(
    adata, 
    SpatialFDR_threshold = 0.1,
    merge_discord = False,
    max_lfc_delta=None,
    overlap=1,
    subset_nhoods=None,
):
    """Get igraph graph from adjacency matrix. Adjusted from scanpy louvain clustering."""
    nhs = adata.obsm["nhoods"].copy()
    nhood_adj = adata.uns["nhood_adata"].obsp["nhood_connectivities"].copy()
    da_res = adata.uns["nhood_adata"].obs.copy()
    is_da = np.asarray(da_res.SpatialFDR <= SpatialFDR_threshold)

    # da_res.loc[is_da, 'logFC'].values @ (da_res.loc[is_da, 'logFC']).T.values

    if merge_discord is False:
        # discord_sign = np.sign(da_res.loc[is_da, 'logFC'].values @ (da_res.loc[is_da, 'logFC'])) < 0
        # Assuming da.res is a pandas DataFrame and is.da is a boolean array
        discord_sign = np.sign(da_res.loc[is_da, 'logFC'].values @ da_res.loc[is_da, 'logFC'].values.T) < 0
        # nhood_adj[is_da, is_da][discord_sign] <- 0
        nhood_adj[(is_da, is_da)][discord_sign] = 0

    if overlap > 1:
        nhood_adj[nhood_adj < overlap] = 0

    if max_lfc_delta is not None:  
        lfc_diff = np.array([da_res['logFC'].values - x for x in da_res['logFC'].values])
        nhood_adj[np.abs(lfc_diff) > max_lfc_delta] = 0

    # binarise
    # nhood_adj = np.asarray((nhood_adj > 0) + 0)
    nhood_adj = (nhood_adj > 0).astype(int)

    g = get_igraph_from_adjacency(nhood_adj, directed = False)

    weights = None
    part = g.community_multilevel(weights=weights)

    groups = np.array(part.membership)

    adata.uns["nhood_adata"].obs["nhood_groups"] = groups

Example code -- R processing

%%R
suppressMessages(library(MouseGastrulationData))
library(miloR)
library(SingleCellExperiment)
library(scater)
library(scran)
library(dplyr)
library(patchwork)

select_samples <- c(2,  3,  6, 15,
                    # 4, 19, 
                    10, 14, 20, 30
                    #31, 32
                    )
embryo_data = EmbryoAtlasData(samples = select_samples)

embryo_data <- embryo_data[,apply(reducedDim(embryo_data, "pca.corrected"), 1, function(x) !all(is.na(x)))]
embryo_data <- runUMAP(embryo_data, dimred = "pca.corrected", name = 'umap')

embryo_milo <- Milo(embryo_data)
embryo_milo <- buildGraph(embryo_milo, k = 30, d = 30, reduced.dim = "pca.corrected")
embryo_milo <- makeNhoods(embryo_milo, prop = 0.1, k = 30, d=30, refined = TRUE, reduced_dims = "pca.corrected")
# plotNhoodSizeHist(embryo_milo)

embryo_milo <- countCells(embryo_milo, meta.data = as.data.frame(colData(embryo_milo)), sample="sample")

embryo_design <- data.frame(colData(embryo_milo))[,c("sample", "stage", "sequencing.batch")]

## Convert batch info from integer to factor
embryo_design$sequencing.batch <- as.factor(embryo_design$sequencing.batch) 
embryo_design <- distinct(embryo_design)
rownames(embryo_design) <- embryo_design$sample

embryo_milo <- calcNhoodDistance(embryo_milo, d=30, reduced.dim = "pca.corrected")

da_results <- testNhoods(embryo_milo, design = ~ sequencing.batch + stage, design.df = embryo_design, reduced.dim = "pca.corrected")

embryo_milo <- buildNhoodGraph(embryo_milo)

## Plot single-cell UMAP
umap_pl <- plotReducedDim(embryo_milo, dimred = "umap", colour_by="stage", text_by = "celltype", 
                          text_size = 3, point_size=0.5) +
  guides(fill="none")

## Plot neighbourhood graph
nh_graph_pl <- plotNhoodGraphDA(embryo_milo, da_results, layout="umap",alpha=0.1) 

## Add log normalized count to Milo object
embryo_milo <- logNormCounts(embryo_milo)

da_results$NhoodGroup <- as.numeric(da_results$SpatialFDR < 0.1 & da_results$logFC < 0)
da_nhood_markers <- findNhoodGroupMarkers(embryo_milo, da_results, subset.row = rownames(embryo_milo)[1:10])

## Run buildNhoodGraph to store nhood adjacency matrix
embryo_milo <- buildNhoodGraph(embryo_milo)

## Find groups
da_results <- groupNhoods(embryo_milo, da_results, max.lfc.delta = 2)
plotNhoodGroups(embryo_milo, da_results, layout="umap")

Transfer to python:

adata = anndata2ri.rpy2py(r('embryo_data'))
adata.obsm["nhoods"] = scipy2ri.rpy2py(r('nhoods(embryo_milo)'))

nhoodCounts = scipy2ri.rpy2py(r('(nhoodCounts(embryo_milo))'))
nhood_adata = anndata.AnnData(X=nhoodCounts)

with localconverter(default_converter+numpy2ri.converter):
    nhood_adata.obs_names = (r('rownames(nhoodCounts(embryo_milo))'))

with localconverter(default_converter+numpy2ri.converter):
    nhood_adata.var_names = (r('colnames(nhoodCounts(embryo_milo))'))

with localconverter(default_converter+pandas2ri.converter): 
    da_results = (r('da_results'))

nhood_adata.obs = pd.concat([nhood_adata.obs, da_results], axis = 1)

adata.uns["nhood_adata"] = nhood_adata

with localconverter(default_converter+numpy2ri.converter):
    adata.uns["nhood_adata"].obs["index_cell"] = r('colnames(embryo_milo)[unlist(nhoodIndex(embryo_milo))]')

adata.uns["nhood_adata"].obsp["nhood_connectivities"] = scipy2ri.rpy2py(r('nhoodAdjacency(embryo_milo)'))

nhood_sizes = adata.obsm["nhoods"].sum(axis=0)
nhood_sizes = np.asarray(nhood_sizes)
nhood_sizes = nhood_sizes.ravel()
adata.uns["nhood_adata"].obs["Nhood_size"] = nhood_sizes
# adata.uns["nhood_adata"].obs["nhood_groups"] = adata.uns["nhood_adata"].obs["nhood_groups"].astype("str")
%%R -o milo_graph
suppressMessages(library(dplyr))
milo_graph = nhoodIndex(embryo_milo) %>%
    unlist() %>%
    reducedDim(embryo_milo, "umap")[., ] %>%
    as.data.frame
milo_graph = milo_graph.loc[adata.uns["nhood_adata"].obs["index_cell"], :]

milo_graph = milo_graph.reset_index(drop = True)
milo_graph.index = np.asarray(milo_graph.index)+1
adata.uns["nhood_adata"].obsm["X_milo_graph"] = milo_graph.values

find_nhood_groups(adata, max_lfc_delta = 2)
adata.uns["nhood_adata"].obs["nhood_groups"] = adata.uns["nhood_adata"].obs["nhood_groups"].astype("str")
plt_df = pd.DataFrame(adata.uns["nhood_adata"].obsm["X_milo_graph"],
                      columns = ["UMAP_"+str(i+1) for i in range(2)]
                     )

plt_df["nhood_groups"] = adata.uns["nhood_adata"].obs["nhood_groups"].values
plt_df["nhood_groups"] = plt_df["nhood_groups"].astype(str)

plt_df["size"] = np.asarray(adata.uns["nhood_adata"].X.sum(axis = 1)).ravel()
plt_df = plt_df.sort_values("nhood_groups")

label_df = label_df = (
    plt_df
    .groupby("nhood_groups")
    .median()
)

from legendkit import size_legend, cat_legend, vstack


# groups = plt_df["nhood_groups"].unique()
groups = []
[groups.append(n) for n in plt_df["nhood_groups"] if n not in groups]
print(groups)

palette = [color for color in sns.color_palette("Paired", 12) for _ in range(len(groups))]
sns.scatterplot(
    x=plt_df["UMAP_1"],
    y=plt_df["UMAP_2"],
    size = plt_df["size"], #.apply(lambda x: x*0.2),
    hue = plt_df["nhood_groups"],#.astype("category").cat.codes,
    legend = False,
    linewidth=0,
    palette = sns.color_palette("Paired", 12)
)
sns.despine()
slegend = size_legend(adata.uns["nhood_adata"].obs.Nhood_size.to_list(), loc = "out right center",
           title = "Nhood size", )

color = dict(
    colors = sns.color_palette("Paired", 12),
    labels = groups
)

ax = plt.gca()
ax.set_title("Python Neighborhood groups", fontsize = 18)
handles, labels = ax.get_legend_handles_labels()

clegend = cat_legend(**color, title = "Nhood group", handle = "circle")
vstack([clegend, slegend], # title="Stack Vertically",
       loc="out right center", spacing=10, frameon=False, ax=ax)

for _group, row in label_df.iterrows():
    plt.text(row["UMAP_1"], row["UMAP_2"], _group, fontsize = 10)

Best regards,
max

@emdann
Copy link
Owner

emdann commented Nov 20, 2024

Hi Max! Thank you for taking the time to code this up. I am soon going to archive this python package, since we've moved and improved all the functionality in here within the Pertpy toolbox https://github.com/scverse/pertpy/tree/main (see this tutorial for example usage). I would recommend posting a feature request in pertpy - you can reference this issue - and we can take it from there! It should be relatively easy to implement if there's sufficient interest in this feature

@MaximilianNuber
Copy link
Author

Thank you for your reply!
I submitted the issue here: scverse/pertpy#680

Best regards, max

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants