tacco.tools.segment
- segment(molecules, distance_scale, max_size, position_scale='auto', position_keys=['x', 'y'], result_key=None, max_distance=None, annotation_key=None, annotation_distance=None, distance_kw_args={}, gene_key=None, verbose=1, **kw_args)
Segment single molecules in space to get a cell-like annotation.
- Parameters:
molecules – A DataFrame with columns containing spatial coordinates and annotation.
distance_scale – The smooth size of the molecule neighborhood considered in clustering, i.e. the width parameter of the Gaussian used for the affinity calculation.
max_size – The most important parameter for the hierarchical spectral clustering in spectral_clustering(): the clustering continues until no cluster has more elements than this. Additional arguments can be supplied as keyword arguments.
position_scale – The most important parameter for the hierarchical spectral clustering in spectral_clustering() when spatial information is provided: the expected feature size to use for splitting the problem spatially. If position_keys or position_scale is None, hierarchical clustering is used to iteratively split the problem into smaller subproblems. If position_scale is “auto”, it is estimated based on a heuristic.
position_keys – Array-like of column keys which contain the position of the molecules.
result_key – The key of molecules under which to store the resulting annotation. If None, do not write to molecules and return the annotation as a Series instead.
max_distance – The maximum distance to consider in the distance matrix. This should be large enough to capture the wider local connectivity between molecules. max_distance and distance_scale (the Gaussian width, sigma) have similar effects: max_distance gives a hard cutoff, which is crucial for fast computations, while distance_scale gives a smooth cutoff. If None, max_distance is taken to be 2*distance_scale.
annotation_key – The column containing categorical annotation information to support the segmentation, e.g. cell type. If None, the segmentation is done using the molecule distribution without any further annotation.
annotation_distance – Specifies the effect of annotation_key by adding a distance between two observations of different type. It can be:
a scalar to use for all annotation pairs
a DataFrame to give every annotation pair its own finite distance. If some pairs should retain infinite distance, use np.inf, np.nan or negative values
None to use an infinite distance between different annotations
a metric to calculate a distance between the annotation profiles. This is forwarded to cdist() as the metric argument, so everything available there is also possible here, e.g. ‘h2’.
distance_kw_args – A dictionary of additional keyword arguments to be forwarded to distance_matrix().
gene_key – The name of the column which contains the molecule species annotation. This is used if and only if the annotation distance is to be calculated with a metric specified by a string via annotation_distance.
verbose – Level of verbosity, with 0 (no output), 1 (some output), …
**kw_args – Additional keyword arguments are forwarded to spectral_clustering().
- Returns:
Depending on result_key, either returns the original molecules with the cell-like annotation written in the corresponding column, or just the cell-like annotation as a new Series.
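The interplay of distance_scale, max_distance and annotation_distance described above can be illustrated with a minimal sketch. This is not TACCO's implementation: the helper name pair_affinity, the exact Gaussian form exp(-d²/(2·distance_scale²)), the choice to apply the hard cutoff to the spatial distance only, and the simple addition of a scalar annotation distance are all assumptions made for illustration. Only the default max_distance = 2*distance_scale and the infinite distance for annotation_distance=None are taken from the parameter descriptions.

```python
import math


def pair_affinity(p, q, distance_scale, max_distance=None,
                  label_p=None, label_q=None, annotation_distance=None):
    """Hypothetical affinity between two molecules (illustration only).

    p, q                : coordinate tuples, e.g. (x, y)
    distance_scale      : width of the Gaussian (smooth cutoff)
    max_distance        : hard cutoff; defaults to 2 * distance_scale
    label_p, label_q    : optional categorical annotation of each molecule
    annotation_distance : scalar added for differing labels, or None for
                          an infinite distance between different labels
    """
    if max_distance is None:
        # Default stated in the parameter description above.
        max_distance = 2 * distance_scale

    d = math.dist(p, q)

    # Hard cutoff: pairs beyond max_distance never enter the distance matrix.
    # Assumption: the cutoff is applied to the spatial distance only.
    if d > max_distance:
        return 0.0

    if label_p != label_q:
        if annotation_distance is None:
            # Infinite distance between different annotations.
            return 0.0
        # Assumption: a scalar annotation distance is simply added.
        d += annotation_distance

    # Smooth cutoff: Gaussian of width distance_scale (assumed normalization).
    return math.exp(-d ** 2 / (2 * distance_scale ** 2))


if __name__ == "__main__":
    same = pair_affinity((0, 0), (1, 0), distance_scale=1.0,
                         label_p="A", label_q="A")
    mixed = pair_affinity((0, 0), (1, 0), distance_scale=1.0,
                          label_p="A", label_q="B", annotation_distance=1.0)
    far = pair_affinity((0, 0), (3, 0), distance_scale=1.0)
    print(same, mixed, far)
```

In this sketch, two nearby molecules with the same annotation get a higher affinity than an equally distant pair with different annotations, and any pair farther apart than max_distance gets zero affinity regardless of distance_scale.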