tacco.tools.segment

segment(molecules, distance_scale, max_size, position_scale='auto', position_keys=['x', 'y'], result_key=None, max_distance=None, annotation_key=None, annotation_distance=None, distance_kw_args={}, gene_key=None, verbose=1, **kw_args)

Segment single molecules in space to get a cell-like annotation.

Parameters:
  • molecules – A DataFrame with columns containing spatial coordinates and annotation.

  • distance_scale – The smooth size scale of the molecule neighborhood considered in clustering, i.e. the width parameter of the Gaussian used to calculate affinities.

  • max_size – The most important parameter for the hierarchical spectral clustering in spectral_clustering(): clustering continues until no cluster has more elements than this. Additional arguments can be supplied as keyword arguments.

  • position_scale – The most important parameter for the hierarchical spectral clustering in spectral_clustering() when spatial information is provided: the expected feature size used to split the problem spatially. If position_keys or position_scale is None, hierarchical clustering is used to iteratively split the problem into smaller subproblems. If position_scale is “auto”, it is estimated with a heuristic.

  • position_keys – Array-like of column keys which contain the position of the molecules.

  • result_key – The column of molecules in which to store the resulting annotation. If None, molecules is not modified and the annotation is returned as a Series instead.

  • max_distance – The maximum distance to consider in the distance matrix. It should be large enough to capture the wider local connectivity between molecules. max_distance and distance_scale have similar effects: max_distance is a hard cutoff, which is crucial for fast computations, while distance_scale (the Gaussian width, sigma) gives a smooth cutoff. If None, max_distance defaults to 2*distance_scale.

  • annotation_key – The column containing categorical annotation information to support the segmentation, e.g. cell type. If None, the segmentation is done using the molecule distribution without any further annotation.

  • annotation_distance

    Specifies the effect of annotation_key in adding a distance between two observations of different type. It can be:

    • a scalar to use for all annotation pairs

    • a DataFrame giving every annotation pair its own finite distance. Pairs that should keep an infinite distance can be marked with np.inf, np.nan or negative values

    • None to use an infinite distance between different annotations

    • a metric to calculate a distance between the annotation profiles. This is forwarded to cdist() as the metric argument, so everything available there is also possible here, e.g. ‘h2’.

  • distance_kw_args – A dictionary of additional keyword arguments to be forwarded to distance_matrix().

  • gene_key – The name of the column which contains the molecule species annotation. It is used if and only if annotation_distance specifies a metric as a string.

  • verbose – Level of verbosity, with 0 (no output), 1 (some output), …

  • **kw_args – Additional keyword arguments are forwarded to spectral_clustering().
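The interplay of distance_scale, max_distance and a scalar or None annotation_distance can be illustrated with a toy pairwise affinity. This is a minimal sketch, not TACCO's implementation: the helpers pair_affinity and affinity_pairs are hypothetical, the Gaussian form exp(-d²/(2·distance_scale²)) is a common choice assumed here, and it is assumed that the annotation distance is added to the spatial distance before the max_distance cutoff is applied.

```python
import math


def pair_affinity(p, q, ann_p, ann_q, distance_scale,
                  max_distance=None, annotation_distance=None):
    """Hypothetical toy affinity between two molecules (not TACCO's code)."""
    if max_distance is None:
        max_distance = 2 * distance_scale  # documented default of segment()
    d = math.dist(p, q)  # Euclidean distance between the two positions
    if ann_p != ann_q:
        if annotation_distance is None:
            return 0.0  # infinite distance between annotations: no affinity
        d += annotation_distance  # assumption: scalar distance is added
    if d > max_distance:
        return 0.0  # hard cutoff: the pair is left out of the sparse matrix
    # assumed Gaussian affinity with width sigma = distance_scale
    return math.exp(-0.5 * (d / distance_scale) ** 2)


def affinity_pairs(positions, annotations, distance_scale, **kw):
    """Collect the non-zero affinities as a sparse {(i, j): weight} dict."""
    pairs = {}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            w = pair_affinity(positions[i], positions[j], annotations[i],
                              annotations[j], distance_scale, **kw)
            if w > 0.0:  # only pairs within max_distance survive
                pairs[(i, j)] = w
    return pairs
```

For example, affinity_pairs([(0, 0), (1, 0), (5, 0)], ["A", "A", "A"], 1.0) keeps only the pair (0, 1), since the other two pairs lie beyond the default cutoff of 2*distance_scale. The double loop is quadratic and only meant for illustration; a hard cutoff is what allows a neighbor search and a sparse matrix to be used instead.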

Returns:

If result_key is None, returns the cell-like annotation as a new Series; otherwise, returns molecules with the cell-like annotation written to the column result_key.
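The stopping rule behind max_size and the two return conventions of result_key can be sketched with a toy stand-in. toy_segment below is hypothetical and deliberately simple: it represents molecules as a list of dicts rather than a DataFrame and replaces spectral clustering with recursive median bisection along the coordinate with the largest spread. It therefore shows only the control flow (split until no cluster exceeds max_size, then either return the labels or write them to result_key), not TACCO's segmentation.

```python
def toy_segment(molecules, max_size, position_keys=("x", "y"), result_key=None):
    """Hypothetical stand-in for segment(): median bisection, NOT spectral."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    labels = [None] * len(molecules)
    next_label = 0

    def spread(indices, axis):
        # range of the coordinate `axis` over the given molecules
        values = [molecules[i][axis] for i in indices]
        return max(values) - min(values)

    def split(indices):
        nonlocal next_label
        if len(indices) <= max_size:
            # small enough: this group becomes one cell-like segment
            for i in indices:
                labels[i] = next_label
            next_label += 1
            return
        # bisect at the median of the coordinate with the largest spread
        axis = max(position_keys, key=lambda a: spread(indices, a))
        ordered = sorted(indices, key=lambda i: molecules[i][axis])
        half = len(ordered) // 2
        split(ordered[:half])
        split(ordered[half:])

    split(list(range(len(molecules))))
    if result_key is None:
        return labels  # analogous to returning the annotation as a Series
    for row, label in zip(molecules, labels):
        row[result_key] = label  # analogous to writing a new column
    return molecules
```

With four molecules at x = 0, 1, 10 and 11 on a line, toy_segment(mols, max_size=2) returns [0, 0, 1, 1], while passing result_key="cell" instead adds a "cell" entry with those labels to every molecule.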