cell_selection

Anna Vlot edited this page Jan 5, 2022 · 2 revisions

from_gui(umap, figsize=(1500, 1500))
Creates a figure widget to manually select cells from a 2D UMAP embedding.

Arguments
umap: matrix-like
    The input coordinates of the 2D UMAP representation of the cells in a sample.
figsize: (int, int)
    The size of a figure to pass to the plotly graph_object.

Returns a plotly graph_object


from_gui_3d(umap, figsize=(1500, 1500))
Creates a figure widget to manually select cells from a 3D UMAP embedding.

Arguments
umap: matrix-like
    The input coordinates of the 3D UMAP representation of the cells in a sample.
figsize: (int, int)
    The size of a figure to pass to the plotly graph_object.

Returns a plotly graph_object


get_cells_from_gui(fig)
Returns the cells selected in from_gui().

Arguments
fig: a plotly graph_object
    The plotly graph object created by from_gui()

Returns a numpy array of selected cells.


from_knn_dist(X, start=np.random(rows), n_ret=None, metric="euclidean", seed=42, metric_params=None, roundup=True)
This function selects a set of n_ret cells based on their distance to the previously selected cells, leaving each selected cell's nearest neighbours out of consideration. The number of neighbours left out depends on the number of cells to be returned.

Arguments
X: matrix-like (n_samples, n_features)
    An array where the rows are samples (e.g. cells) and the columns are features (e.g. genes or peaks).
    Accepts numpy arrays, pandas dataframes, and scipy CSR matrices.
start: str or int
    If X is a numpy array or scipy matrix, the int index of the desired starting cell.
    Alternatively, provide the string "random" to select a random starting cell.
    We advise selecting the cell with the largest average distance to the rest of the population.
n_ret: int
    The number of cells to be returned. Defaults to 0.01 * n_cells.
metric: str
    The metric to be used for the distance calculations.
    Any metric from sklearn.metrics.pairwise_distances or sklearn.metrics.pairwise.pairwise_kernels is available.
seed: int
    The random seed used for the numpy.random module.
metric_params: dict
    A dictionary of parameter values to pass to the distance function if applicable.
roundup: boolean
    Whether to round up the number of nearest neighbours to exclude.
    This ensures that the whole space is explored, but fewer than n_ret cells might be selected.
    If roundup is False, exactly n_ret cells will be selected, but a subset of the space might be left unexplored.

Returns a list of approximately n_ret cell indices.
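
The selection described above can be illustrated with a minimal sketch. This is not the library's implementation. It assumes a Euclidean metric, an integer start index, and a "farthest remaining cell" rule for picking the next cell. The function name select_by_knn_exclusion is hypothetical and exists only for illustration.

```python
import math

import numpy as np


def select_by_knn_exclusion(X, start=0, n_ret=None, roundup=True):
    """Hypothetical sketch: greedy selection with nearest-neighbour exclusion."""
    X = np.asarray(X, dtype=float)
    n_cells = X.shape[0]
    if n_ret is None:
        n_ret = max(1, int(0.01 * n_cells))  # documented default: 1% of the cells

    # Size of the neighbourhood removed per selected cell (the cell itself included).
    ratio = n_cells / n_ret
    k = max(1, math.ceil(ratio) if roundup else math.floor(ratio))

    available = np.ones(n_cells, dtype=bool)
    min_dist = np.full(n_cells, np.inf)  # distance to the closest selected cell
    selected = []
    current = int(start)
    while len(selected) < n_ret:
        selected.append(current)
        available[current] = False
        dist = np.linalg.norm(X - X[current], axis=1)
        min_dist = np.minimum(min_dist, dist)

        # Exclude the k - 1 nearest cells that are still available.
        candidates = np.flatnonzero(available)
        nearest = candidates[np.argsort(dist[candidates])[: k - 1]]
        available[nearest] = False

        remaining = np.flatnonzero(available)
        if remaining.size == 0:
            break  # space exhausted; may return fewer than n_ret cells
        # Assumed rule: continue with the cell farthest from all selected cells.
        current = int(remaining[np.argmax(min_dist[remaining])])
    return selected
```

In this sketch, rounding the neighbourhood size up can exhaust the population before n_ret cells are chosen, while rounding down always yields exactly n_ret cells but may leave some cells neither selected nor excluded. This mirrors the trade-off described for roundup.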


from_2D_embedding(X, g=(5, 5), d=0.25)
This function returns a set of cells by fitting a grid to a 2D embedding. The number of selected cells is dependent on the size of the grid and the minimum distance between the cells.

Arguments
X: matrix-like object (n_samples, 2)
    A matrix-like object which contains the coordinates of a 2D embedding of the original data,
    where rows are the cells and the columns are the two dimensions.
g: tuple (int, int)
    A tuple of the grid size (g, g) to fit over the 2D embedding.
    Since cells are selected based on the intersection of grid lines, this determines the maximum number of cells to be selected.
d: float
    The minimum distance between selected cells, expressed as a fraction of the diagonal of a single grid cell.
    Defaults to 0.25, which means that two selected cells must be at least 0.25 times the length of that diagonal apart.

Returns a numpy array of cell indices
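
A minimal sketch of grid-based selection follows. It is not the library's implementation. It assumes that g gives the number of grid lines per axis, that the grid spans the bounding box of the embedding, and that the cell nearest to each intersection is the candidate. The function name select_on_grid is hypothetical.

```python
import numpy as np


def select_on_grid(X, g=(5, 5), d=0.25):
    """Hypothetical sketch: pick the cell nearest to each grid intersection."""
    X = np.asarray(X, dtype=float)
    mins, maxs = X.min(axis=0), X.max(axis=0)

    # Assumption: g[0] x g[1] evenly spaced intersections over the bounding box.
    xs = np.linspace(mins[0], maxs[0], g[0])
    ys = np.linspace(mins[1], maxs[1], g[1])

    # Minimum separation is d times the diagonal of a single grid cell.
    min_sep = d * np.hypot(xs[1] - xs[0], ys[1] - ys[0])

    selected = []
    for gx in xs:
        for gy in ys:
            dist_to_node = np.linalg.norm(X - np.array([gx, gy]), axis=1)
            nearest = int(np.argmin(dist_to_node))
            if nearest in selected:
                continue
            # Keep the candidate only if it is far enough from all selected cells.
            far_enough = all(
                np.linalg.norm(X[nearest] - X[s]) >= min_sep for s in selected
            )
            if far_enough:
                selected.append(nearest)
    return np.array(selected, dtype=int)
```

Under these assumptions the number of intersections, g[0] * g[1], caps the number of returned cells, and a larger d reduces it further.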


from_kmeans_pp(X, n_cells, seed=42)
Runs the sklearn implementation of kmeans++ to select n_cells reference cells.

Arguments
X: matrix-like object (n_samples, n_features)
    An array where the rows are samples (e.g. cells) and the columns are features (e.g. genes or peaks).
    Accepts pandas dataframes, numpy arrays, and scipy compressed sparse row matrices.
n_cells: int
    The number of cells to return from the population.
seed: int
    The random_state seed to enable reproducibility.

Returns an array of reference cell indices
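
To show what kmeans++ seeding does when it is used to pick reference cells, here is a plain numpy sketch of the basic sampling scheme. It is not the sklearn code that this function runs, and it omits the extra local trials that sklearn performs at each step. The function name kmeans_pp_indices is hypothetical.

```python
import numpy as np


def kmeans_pp_indices(X, n_cells, seed=42):
    """Hypothetical sketch of basic kmeans++ seeding; returns row indices."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]

    # The first reference cell is drawn uniformly at random.
    indices = [int(rng.integers(n_samples))]
    closest_sq = np.sum((X - X[indices[0]]) ** 2, axis=1)

    while len(indices) < n_cells:
        total = closest_sq.sum()
        if total == 0:
            break  # every remaining cell coincides with a chosen one
        # Draw the next cell with probability proportional to its squared
        # distance to the closest cell chosen so far.
        nxt = int(rng.choice(n_samples, p=closest_sq / total))
        indices.append(nxt)
        closest_sq = np.minimum(closest_sq, np.sum((X - X[nxt]) ** 2, axis=1))
    return np.array(indices, dtype=int)
```

Cells that are already chosen have zero probability of being drawn again, so the returned indices are distinct, and fixing seed makes the selection repeatable.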
