tacco.tools.get_contributions

get_contributions(adata, value_key, group_key=None, sample_key=None, position_key=None, position_split=2, min_obs=0, value_location=None, fillna=None, restrict_groups=None, restrict_values=None, reduction='sum', normalization='gmean', assume_counts=None, reads=False, counts_location=None)

Get the contributions of groups.

Parameters:
  • adata – An AnnData with annotation in .obs (and .obsm). Can also be a DataFrame which is then used in place of .obs.

  • value_key – The .obs key, .obsm key, or .var index value (i.e. gene) holding the values to determine the enrichment for. Can also be a list of genes and non-categorical .obs keys. If None, use all annotation available in value_location (see below).

  • group_key – The .obs key with categorical group information. If None, determine the contributions for the whole dataset and name this group ‘all’.

  • sample_key – The .obs key with categorical sample information. If not None, the data is aggregated per sample; otherwise, it is aggregated as a whole.

  • position_key – The .obsm key or array-like of .obs keys with the position space coordinates. If None, no position splits are performed. NOTE: Splitting samples spatially on the fly is deprecated. Instead, use split_spatial_samples() explicitly and supply it as the sample_key.

  • position_split – The number of splits per spatial dimension before enrichment. Can be a tuple with the spatial dimension as length to assign a different split per dimension. If None, no position splits are performed. See also min_obs. NOTE: Splitting samples spatially on the fly is deprecated. Instead, use split_spatial_samples() explicitly and supply it as the sample_key.

  • min_obs – The minimum number of observations per sample: if fewer observations are available, the sample is not used. This also limits the number of splits from position_split: splitting stops if a further split would decrease the number of observations below this threshold.

  • value_location

    The location of value_key within adata. Possible values are:

    • ‘obs’: value_key is a key in .obs

    • ‘obsm’: value_key is a key in .obsm

    • ‘X’: value_key is an index value in .var, i.e. a gene

    • None: find it automatically if possible

    Can also be a list of specifications if value_key is a list. If value_key is None, all keys found in value_location are used.

  • fillna – NaN values in the data are replaced with this value. If None, the reduction and/or normalization operations handle the NaNs, e.g. by ignoring them in a sum.

  • restrict_groups – A list-like containing the groups within which the enrichment analysis is to be done. If None, all groups are included.

  • restrict_values – A list-like containing the values within which the enrichment analysis is to be done. If None, all values are included. Works only for categorical values.

  • reduction

    The reduction to apply on each (group,sample) subset of the data. Possible values are:

    • ‘sum’: sum of the values over observations

    • ‘mean’: mean of the values over observations

    • ‘median’: median of the values over observations

    • None: use observations directly

    • a callable mapping a DataFrame to its reduced counterpart

  • normalization

    The normalization to apply on each reduced (group,sample) subset of the data. Possible values are:

    • ‘sum’: normalize values by their sum (yields fractions)

    • ‘percent’: like ‘sum’ scaled by 100 (yields percentages)

    • ‘gmean’: normalize values by their geometric mean (yields contributions which make more sense for enrichments than fractions, due to the zero-sum issue; see enrichments())

    • ‘clr’: “Center logratio transform”; like ‘gmean’ with an additional log transform; makes the distribution more normal and better suited for t tests

    • None: no normalization

    • a value name from value_key: all values are normalized to this contribution

    • a callable mapping a DataFrame to its normalized counterpart

  • assume_counts – Only relevant for normalization==‘gmean’ and normalization==‘clr’; whether to regularize zeros by adding a pseudo count of 1 or by replacing them by 1e-3 of the minimum value. If None, check whether the data are consistent with count data and assume counts accordingly, unless reads==True, in which case assume_counts is also taken to be True.

  • reads – Whether to weight the values by the total count per observation

  • counts_location – A string or tuple specifying where the count matrix is stored, e.g. ‘X’, (‘raw’,‘X’), (‘raw’,‘obsm’,‘my_counts_key’), (‘layer’,‘my_counts_key’), … For details see counts(). The counts are only used if reads is True.
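
The reduction, zero regularization and normalization options described above can be illustrated with a small pure-Python sketch. This is a conceptual model only, not the library's implementation: the helper names (reduce_subsets, regularize_zeros, normalize) and the toy data are hypothetical. It also assumes that the pseudo count is added to every entry and that "minimum value" means the smallest nonzero entry.

```python
import math
from collections import defaultdict
from statistics import mean, median

REDUCERS = {"sum": sum, "mean": mean, "median": median}

def reduce_subsets(rows, reduction="sum"):
    """Reduce per (group, sample); rows are (group, sample, value_name, number)."""
    buckets = defaultdict(lambda: defaultdict(list))
    for group, sample, name, number in rows:
        buckets[(group, sample)][name].append(number)
    reducer = REDUCERS[reduction]
    return {
        key: {name: reducer(nums) for name, nums in per_value.items()}
        for key, per_value in buckets.items()
    }

def regularize_zeros(values, assume_counts):
    """Make all entries positive so that logarithms are defined."""
    if assume_counts:
        # Assumption: the pseudo count of 1 is added to every entry.
        return {k: v + 1 for k, v in values.items()}
    # Assumption: "minimum value" means the smallest nonzero entry.
    floor = 1e-3 * min(v for v in values.values() if v > 0)
    return {k: (v if v > 0 else floor) for k, v in values.items()}

def normalize(values, normalization="gmean", assume_counts=True):
    """Normalize one reduced subset (dict: value name -> number)."""
    if normalization is None:
        return dict(values)
    if normalization in ("sum", "percent"):
        scale = 100.0 if normalization == "percent" else 1.0
        total = sum(values.values())
        return {k: scale * v / total for k, v in values.items()}
    if normalization in ("gmean", "clr"):
        positive = regularize_zeros(values, assume_counts)
        log_gmean = sum(math.log(v) for v in positive.values()) / len(positive)
        if normalization == "clr":
            return {k: math.log(v) - log_gmean for k, v in positive.items()}
        return {k: v / math.exp(log_gmean) for k, v in positive.items()}
    # Otherwise treat `normalization` as a value name used as the reference.
    reference = values[normalization]
    return {k: v / reference for k, v in values.items()}

# Toy data: two observations in group "tumor", sample "s1", three values each.
rows = [
    ("tumor", "s1", "A", 0), ("tumor", "s1", "B", 1), ("tumor", "s1", "C", 3),
    ("tumor", "s1", "A", 0), ("tumor", "s1", "B", 2), ("tumor", "s1", "C", 12),
]
reduced = reduce_subsets(rows, "sum")  # A=0, B=3, C=15
contributions = normalize(reduced[("tumor", "s1")], "gmean", assume_counts=True)
```

In this sketch the ‘clr’ result is the logarithm of the ‘gmean’ result, so its entries sum to zero within each (group, sample) subset, whereas ‘sum’ and ‘percent’ constrain the entries to add up to 1 and 100 respectively.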

Returns:

A DataFrame containing the contributions of groups.
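
A minimal calling sketch, based only on the signature and parameter descriptions above. The column names "cell_type", "region" and "sample" and the wrapper function are hypothetical. The import path is taken from the page title, and the import is done lazily so the snippet loads even where tacco is not installed.

```python
# Keyword arguments come from the documented signature; the column names
# "cell_type", "region" and "sample" are hypothetical.
CALL_KWARGS = {
    "value_key": "cell_type",
    "group_key": "region",
    "sample_key": "sample",
    "position_key": None,    # no on-the-fly spatial splitting (deprecated)
    "reduction": "sum",
    "normalization": "sum",  # fractions per (group, sample)
}

def contributions_by_region(obs):
    """Run get_contributions on a DataFrame used in place of .obs."""
    # Imported lazily so that this module loads without tacco installed.
    from tacco.tools import get_contributions
    return get_contributions(obs, **CALL_KWARGS)
```

Calling contributions_by_region(obs) with a pandas DataFrame that has these three categorical columns is expected to return a DataFrame of contributions; the exact index and column layout of the result is not specified in this section.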