tacco.tools.co_occurrence

co_occurrence(adata, annotation_key, center_key=None, sample_key=None, distance_key=None, position_key=('x', 'y'), result_key=None, max_distance=None, sparse=True, min_distance=None, delta_distance=None, reads=True, counts_location=None, n_permutation=0, seed=42, verbose=1, **kw_args)

Calculates a spatial co-occurrence score: the conditional probability of finding an annotation a at some distance d from an observation with annotation b, normalized by the probability of finding an annotation a at distance d from an observation regardless of its annotation: p(a|bd)/p(a|d). This is a more general, more accurate, and faster alternative to the squidpy function of the same name, co_occurrence(). For center_key==None the result is compatible with the corresponding squidpy code, but it is not identical to it, because the parametrization and heuristics differ.
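As an illustration of the ratio p(a|bd)/p(a|d), the following sketch computes it from a small table of neighbourship counts indexed as counts[a][b][d]. It is a minimal pure-Python sketch, not TACCO's implementation: the function name co_occurrence_score and the toy counts are hypothetical, and weighting by reads, averaging over samples, and the min_distance heuristic are left out.

```python
def co_occurrence_score(counts):
    """Return score[a][b][d] = p(a|bd) / p(a|d) for counts[a][b][d].

    Hypothetical helper for illustration only; entries whose
    denominators are zero are left as NaN.
    """
    n_a, n_b, n_d = len(counts), len(counts[0]), len(counts[0][0])
    nan = float("nan")
    score = [[[nan] * n_d for _ in range(n_b)] for _ in range(n_a)]
    for d in range(n_d):
        # All pairs observed in distance bin d, whatever a and b are.
        total_d = sum(counts[a][b][d] for a in range(n_a) for b in range(n_b))
        for b in range(n_b):
            # All pairs in bin d whose center carries annotation b.
            total_bd = sum(counts[a][b][d] for a in range(n_a))
            for a in range(n_a):
                # All pairs in bin d whose neighbour carries annotation a.
                total_ad = sum(counts[a][bb][d] for bb in range(n_b))
                if total_bd > 0 and total_ad > 0:
                    p_a_given_bd = counts[a][b][d] / total_bd
                    p_a_given_d = total_ad / total_d
                    score[a][b][d] = p_a_given_bd / p_a_given_d
    return score


# Toy counts: 2 annotations, 2 center annotations, 2 distance bins.
# Bin 0 favours equal annotations, bin 1 is uniform.
counts = [
    [[8, 5], [2, 5]],  # a = 0: b = 0, b = 1
    [[2, 5], [8, 5]],  # a = 1: b = 0, b = 1
]
score = co_occurrence_score(counts)
print(score[0][0])  # score of a = 0 around b = 0 centers, per distance bin
```

A score above 1 means that annotation a is found more often at distance d from centers annotated b than at distance d from an arbitrary center; a score of 1 means no enrichment or depletion.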

Parameters:
  • adata – An AnnData

  • annotation_key – The .obs or .obsm key for the annotation a in p(a|bd)/p(a|d).

  • center_key – The .obs or .obsm key for the annotation b in p(a|bd)/p(a|d). If None, takes the annotation_key.

  • sample_key – A categorical sample annotation. The results from different samples are averaged for the final result, and their standard deviation gives an estimate of the error. If None, all observations are assumed to belong to the same sample.

  • distance_key – The .obsp key containing a precomputed distance matrix to use. If None, the distances are computed on the fly with the positions found in position_key. Otherwise position_key is ignored.

  • position_key – The .obsm key or array-like of .obs keys with the position space coordinates.

  • result_key

    The .uns key to contain the result. If None, the result is returned as a dictionary containing the keys:

    • “occ”: mean over samples of the p(a|bd)/p(a|d) scores as an ndarray with dimensions according to a, b, and d,

    • “log_occ”: like “occ”, but with the sample mean taken over the logarithms of the scores,

    • “z”: z-scores, see n_permutation,

    • “composition”: like “occ”, but with the sample mean taken over p(a|bd),

    • “log_composition”: like “occ”, but with the sample mean taken over log(p(a|bd)),

    • “distance_distribution”: like “occ”, but with the sample mean taken over p(d|ab),

    • “log_distance_distribution”: like “occ”, but with the sample mean taken over log(p(d|ab)),

    • “relative_distance_distribution”: like “occ”, but with the sample mean taken over p(d|ab)/p(d|*b),

    • “log_relative_distance_distribution”: like “occ”, but with the sample mean taken over log(p(d|ab)/p(d|*b)),

    • “sample_counts”: the neighbourship counts per sample as an ndarray with dimensions according to samples, a, b, and d,

    • “permutation_counts”: the neighbourship counts per permutation sample as an ndarray with dimensions according to permutation samples, a, b, and d, see also n_permutation,

    • “interval”: the boundaries of the distance bins,

    • “annotation”: the order of the a annotations,

    • “center”: the order of the b annotations.

  • max_distance – The maximum distance to use. If None or np.inf, uses the maximum distance in the data (if there are multiple samples, then only the first sample is used). If the distance matrix is not precomputed (see distance_key), None and np.inf result in dense distance computation (which can be infeasible for larger datasets).

  • sparse – Whether to calculate a sparse or dense distance matrix, if it is not precomputed. If None, this is determined by the value of max_distance.

  • min_distance – The minimum distance to use. If None, uses a heuristic to find a sensible lower distance cutoff that excludes distances with deviations from a uniform distribution (e.g. cell-size effects).

  • delta_distance – The width in distance for distance discretization. If None, takes max_distance/100.

  • reads – Whether to weight the co-occurrence counts with the counts of the two participating observations.

  • counts_location – A string or tuple specifying where the count matrix is stored, e.g. 'X', ('raw','X'), ('raw','obsm','my_counts_key'), ('layer','my_counts_key'), … For details see counts(). This is only relevant if reads==True.

  • n_permutation – The number of permutation samples to generate with randomly permuted annotations at fixed centers. This is used only for the calculation of the z-score. If 0, the z-score is not calculated.

  • seed – A random seed for the z-score computation. See n_permutation.

  • verbose – Level of verbosity, with 0 (no output), 1 (some output), …

  • **kw_args – Additional keyword arguments, forwarded to the on-the-fly distance calculation with distance_matrix() if necessary.
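The n_permutation and seed parameters describe a permutation-based z-score. The sketch below shows the general technique on a toy one-dimensional example: positions stay fixed, annotations are shuffled, and the observed statistic is compared with the mean and standard deviation of the shuffled values. It is an assumption-laden illustration, not TACCO's code: the names permutation_z_score and adjacent_same are hypothetical, the toy statistic stands in for the neighbourship counts, and the use of the population standard deviation is an assumption.

```python
import random
import statistics


def permutation_z_score(labels, statistic, n_permutation, seed=42):
    """z-score of statistic(labels) against shuffled copies of labels.

    Hypothetical helper: positions are implied by list order and stay
    fixed, only the annotations are permuted.
    """
    if n_permutation == 0:
        return None  # mirrors "If 0, the z-score is not calculated"
    rng = random.Random(seed)  # a fixed seed makes the result reproducible
    observed = statistic(labels)
    null = []
    for _ in range(n_permutation):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        null.append(statistic(shuffled))
    mu = statistics.mean(null)
    sigma = statistics.pstdev(null)  # assumption: population std
    if sigma == 0:
        return float("nan")
    return (observed - mu) / sigma


def adjacent_same(labels):
    """Toy statistic: number of neighbouring pairs with equal annotation."""
    return sum(a == b for a, b in zip(labels, labels[1:]))


# Two spatially separated blocks of annotations: strongly clustered.
labels = ["T"] * 10 + ["B"] * 10
z = permutation_z_score(labels, adjacent_same, n_permutation=200, seed=42)
print(z)
```

A large positive z indicates that equal annotations sit next to each other more often than expected under random relabelling; rerunning with the same seed reproduces the value.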

Returns:

Depending on result_key, returns either the updated input adata or the result directly, in the format described under result_key.
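Once a result dictionary in the format described under result_key is available (returned directly, or read from adata.uns[result_key]), a single score profile over distance can be looked up through the “annotation”, “center”, and “interval” entries. The helper below is a hypothetical sketch: occ_profile is not part of TACCO, it assumes the score arrays are indexed in the order a, b, d as documented, and the toy dictionary uses nested lists and made-up numbers in place of real output.

```python
def occ_profile(result, annotation, center, key="occ"):
    """Return (interval, values) for one annotation/center pair.

    Hypothetical helper; assumes result[key] is indexed as [a][b][d].
    """
    ia = list(result["annotation"]).index(annotation)
    ib = list(result["center"]).index(center)
    return list(result["interval"]), list(result[key][ia][ib])


# Made-up stand-in for a result dictionary with two distance bins.
toy_result = {
    "annotation": ["T", "B"],
    "center": ["T", "B"],
    "interval": [0.0, 10.0, 20.0],  # boundaries of the two bins
    "occ": [
        [[1.5, 1.0], [0.5, 1.0]],  # a = "T": b = "T", b = "B"
        [[0.5, 1.0], [1.5, 1.0]],  # a = "B": b = "T", b = "B"
    ],
}

interval, values = occ_profile(toy_result, "T", "B")
for low, high, value in zip(interval[:-1], interval[1:], values):
    print(f"[{low}, {high}): {value}")
```

Passing key="log_occ" or key="z" would select the corresponding entry instead, provided it is present in the dictionary.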