tacco.tools.enrichments

enrichments(adata, value_key, group_key, sample_key=None, position_key=None, position_split=2, reference_group=None, min_obs=0, value_location=None, p_corr='fdr_bh', method='mwu', n_boot=0, direction='both', reduction=None, normalization=None, assume_counts=None, fillna=None, restrict_groups=None, restrict_values=None, reads=False, counts_location=None)

Find enrichments (or purifications) of values in groups.

Parameters:

adata – An AnnData with annotation in .obs (and .obsm). Can also be a DataFrame which is then used in place of .obs.
value_key – The .obs key, .obsm key, or .var index value (i.e. a gene) containing the values for which to determine the enrichment. Can also be a list of genes and non-categorical .obs keys. If None, use all annotations available in value_location (see below).
group_key – The .obs key with categorical group information.
sample_key – The .obs key with categorical sample information. If None, the enrichment is calculated at the observation level; otherwise it is calculated on quantities aggregated per sample. See the parameters reduction and normalization for details.
position_key – The .obsm key, or array-like of .obs keys, containing the spatial coordinates. If None, no position splits are performed. NOTE: Splitting samples spatially on the fly is deprecated. Instead, call split_spatial_samples() explicitly and use its result as the sample_key.
position_split – The number of splits per spatial dimension before computing the enrichment. Can be a tuple with one entry per spatial dimension to use a different number of splits in each dimension. If None, no position splits are performed. See also min_obs. NOTE: Splitting samples spatially on the fly is deprecated. Instead, call split_spatial_samples() explicitly and use its result as the sample_key.
reference_group – The particular group value to which all other groups are compared, i.e. this group serves as the reference for all remaining groups. If None, all groups are compared in a 1-vs-rest scheme.
min_obs – The minimum number of observations per sample: if fewer observations are available, the sample is not used. This also limits position_split: splitting stops if a further split would reduce the number of observations per sample below this threshold.
value_location –
The location of value_key within adata. Possible values are:
- 'obs': value_key is a key in .obs
- 'obsm': value_key is a key in .obsm
- 'X': value_key is an index value in .var, i.e. a gene
- None: find it automatically if possible
Can also be a list of specifications if value_key is a list. If value_key is None, all keys found in value_location are used.
p_corr – The name of the p-value correction method to use. Possible values are the methods accepted by multipletests() from statsmodels. If None, no p-value correction is performed.
method –
Specification of methods to use for enrichment. Available are:
- 'fisher': Fisher's exact test; only for categorical values. Ignores the reduction and normalization arguments.
- 'mwu': Mann-Whitney U test
- 't': Student's t test
- 'welch': Welch's t test
n_boot – The number of bootstrap samples to include in addition to the real samples. Bootstrap samples are only supported for the t tests.
direction –
Which direction to test for. This also affects the multiple testing correction. Available options are:
- 'enrichment': Test only for enrichment
- 'purification': Test only for purification
- 'both': Test for both
reduction –
The reduction to apply to each (group, sample) subset of the data. Possible values are:
- 'sum': sum of the values over observations
- 'mean': mean of the values over observations
- 'median': median of the values over observations
- None: use observations directly
- a callable mapping a DataFrame to its reduced counterpart
normalization –
The normalization to apply to each reduced (group, sample) subset of the data. Possible values are:
- 'sum': normalize values by their sum (yields fractions)
- 'percent': like 'sum', but scaled by 100 (yields percentages)
- 'gmean': normalize values by their geometric mean (yields contributions, which make more sense for enrichments than fractions due to the zero-sum issue; see enrichments())
- 'clr': centered log-ratio transform; like 'gmean' with an additional log transform, which makes the distribution more normal and better suited for t tests
- None: no normalization
- a value name from value_key: all values are normalized to the contribution of this value
- a callable mapping a DataFrame to its normalized counterpart
assume_counts – Only relevant for normalization=='gmean' and normalization=='clr': whether to regularize zeros by adding a pseudocount of 1 (assuming count data) or by replacing them with 1e-3 times the minimum value. If None, check whether the data are consistent with count data and assume counts accordingly, unless reads==True, in which case counts are assumed as well.
fillna – If None, observations with NA values are filtered out. Otherwise, NA values are replaced with this value.
restrict_groups – A list-like containing the groups within which the enrichment analysis is to be done. If None, all groups are included.
restrict_values – A list-like containing the values within which the enrichment analysis is to be done. If None, all values are included. Works only for categorical values.
reads – Whether to weight the values by the total count per observation.
counts_location – A string or tuple specifying where the count matrix is stored, e.g. 'X', ('raw','X'), ('raw','obsm','my_counts_key'), ('layer','my_counts_key'), … For details see counts(). The counts are only used if reads is True.

Returns:

A DataFrame containing the enrichment p-values.
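To illustrate what position_split does, the sketch below assigns each observation to a cell of a regular grid spanning its sample's bounding box, with position_split cells per dimension (or one entry per dimension when a tuple is given). The helper grid_split_samples is hypothetical and not part of tacco; using per-sample bounding boxes is an assumption, and the sketch ignores min_obs. For real analyses, the recommended route remains split_spatial_samples().

```python
import numpy as np


def grid_split_samples(coords, samples, position_split=2):
    """Hypothetical helper (not part of tacco): append a grid-cell index to each sample label."""
    coords = np.asarray(coords, dtype=float)
    samples = np.asarray(samples)
    n_dim = coords.shape[1]
    # An int means the same number of splits in every dimension;
    # a tuple gives one number of splits per dimension.
    if np.isscalar(position_split):
        splits = [int(position_split)] * n_dim
    else:
        splits = [int(s) for s in position_split]
    cell = np.zeros((len(coords), n_dim), dtype=int)
    for sample in np.unique(samples):
        mask = samples == sample
        lo = coords[mask].min(axis=0)
        hi = coords[mask].max(axis=0)
        for d in range(n_dim):
            width = (hi[d] - lo[d]) / splits[d]
            if width == 0:
                continue  # no extent along this axis: keep everything in cell 0
            idx = np.floor((coords[mask, d] - lo[d]) / width).astype(int)
            # Points on the upper boundary belong to the last cell.
            cell[mask, d] = np.minimum(idx, splits[d] - 1)
    return [f"{s}_" + "_".join(str(c) for c in row) for s, row in zip(samples, cell)]


square = [[0, 0], [1, 0], [0, 1], [1, 1]]
print(grid_split_samples(square, ["a"] * 4))          # 2 x 2 grid
print(grid_split_samples(square, ["a"] * 4, (2, 1)))  # split only along x
```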
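The default p_corr='fdr_bh' selects the Benjamini-Hochberg false discovery rate correction of statsmodels' multipletests(). As a sketch of what that procedure computes, the function below derives BH-adjusted p-values with NumPy alone; bh_adjust is a hypothetical name used only for illustration.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (the procedure behind 'fdr_bh')."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by n / i.
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: running minimum from the largest p-value downwards.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)  # restore input order, cap at 1
    return adjusted


print(bh_adjust([0.01, 0.04, 0.03, 0.5]))
```

For the same input, multipletests(pvals, method='fdr_bh')[1] should return the same adjusted values; the hand-written version only makes the ranking and running-minimum steps explicit.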
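For method='fisher', a categorical value can be tested in a group with a 2x2 contingency table: rows are "in the group" vs. "not in the group", columns are "has the value" vs. "has another value". The sketch below computes a one-sided Fisher's exact p-value from the hypergeometric distribution in pure Python. The function fisher_one_sided is hypothetical, and how tacco actually assembles its tables is an assumption here.

```python
from math import comb


def fisher_one_sided(a, b, c, d, alternative="greater"):
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Rows: in group / not in group; columns: has value / has other value.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    n_group = a + b          # observations in the group
    n_value = a + c          # observations carrying the value
    total = a + b + c + d
    k_min = max(0, n_group - (total - n_value))
    k_max = min(n_group, n_value)
    denom = comb(total, n_group)

    def pmf(k):
        # Hypergeometric probability of k value-carrying observations in the group.
        return comb(n_value, k) * comb(total - n_value, n_group - k) / denom

    ks = range(a, k_max + 1) if alternative == "greater" else range(k_min, a + 1)
    return sum(pmf(k) for k in ks)


print(fisher_one_sided(3, 1, 1, 3))          # 17/70, about 0.243
print(fisher_one_sided(3, 1, 1, 3, "less"))  # 69/70
```

scipy.stats.fisher_exact([[3, 1], [1, 3]], alternative='greater') should report the same p-value.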
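The direction parameter chooses which one-sided alternative is tested. As an illustration with the default method='mwu', the sketch below computes both one-sided Mann-Whitney U p-values by enumerating every split of the pooled data into groups of the original sizes (an exact permutation distribution, feasible only for tiny samples). This is not tacco's implementation; mapping 'enrichment' to "group values tend to be larger" and 'purification' to "smaller" is an assumption.

```python
from itertools import combinations


def u_statistic(x, y):
    # Number of (x, y) pairs with x > y; ties count as one half.
    return sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)


def exact_mwu_pvalues(x, y):
    """Exact one-sided Mann-Whitney U p-values by enumerating every way to
    split the pooled data into groups of the original sizes (tiny samples only)."""
    pooled = list(x) + list(y)
    n_x = len(x)
    u_obs = u_statistic(x, y)
    n_greater = n_less = n_total = 0
    for idx in combinations(range(len(pooled)), n_x):
        chosen = set(idx)
        xs = [pooled[i] for i in idx]
        ys = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        u = u_statistic(xs, ys)
        n_total += 1
        n_greater += u >= u_obs
        n_less += u <= u_obs
    # 'enrichment' (assumed): x tends to be larger than y; 'purification': smaller.
    return {"enrichment": n_greater / n_total, "purification": n_less / n_total}


group = [5, 6, 7]   # values in the tested group
rest = [1, 2, 3]    # values in all other groups
print(exact_mwu_pvalues(group, rest))
```

With direction='both', both one-sided p-values of every comparison would presumably enter the p_corr correction, which is one way the choice of direction can change the corrected values.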
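The reduction options correspond to ordinary per-(group, sample) aggregations. The pandas sketch below shows 'sum', 'mean', and a callable applied to each (group, sample) subset; the column names group, sample, ct1, and ct2 are made up for the example and only mimic an .obs table.

```python
import pandas as pd

# Toy stand-in for .obs: one row per observation (hypothetical columns).
obs = pd.DataFrame({
    "group":  ["A", "A", "B", "B", "A", "B"],
    "sample": ["s1", "s1", "s1", "s2", "s2", "s2"],
    "ct1":    [1.0, 0.0, 1.0, 0.0, 1.0, 1.0],
    "ct2":    [0.0, 1.0, 0.0, 1.0, 0.0, 0.0],
})
value_cols = ["ct1", "ct2"]
grouped = obs.groupby(["group", "sample"])[value_cols]

reduced_sum = grouped.agg("sum")    # like reduction='sum'
reduced_mean = grouped.agg("mean")  # like reduction='mean'
# Like passing a callable: it receives each (group, sample) subset as a DataFrame.
reduced_max = grouped.apply(lambda df: df.max())

print(reduced_sum)
```

Each result has one row per (group, sample) pair; with reduction=None, the observations themselves would be compared instead.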
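The normalization options act across the values within each reduced (group, sample) row. The sketch below spells out 'sum', 'percent', 'gmean', 'clr', and normalizing to a single value with pandas. It assumes strictly positive inputs; zero handling is what assume_counts controls. The row and column labels are illustrative only.

```python
import numpy as np
import pandas as pd

# One row per reduced (group, sample) subset, one column per value (toy data).
reduced = pd.DataFrame(
    {"ct1": [2.0, 1.0], "ct2": [6.0, 4.0], "ct3": [2.0, 5.0]},
    index=["A_s1", "B_s1"],
)

fractions = reduced.div(reduced.sum(axis=1), axis=0)  # 'sum': rows sum to 1
percent = 100 * fractions                             # 'percent'
gmean = np.exp(np.log(reduced).mean(axis=1))          # row-wise geometric mean
contributions = reduced.div(gmean, axis=0)            # 'gmean'
clr = np.log(contributions)                           # 'clr': rows sum to 0
relative = reduced.div(reduced["ct1"], axis=0)        # normalized to the value 'ct1'

print(clr.round(3))
```

Normalizing to a single value gives the same result whether it is applied to the raw reduced values or to the 'gmean' contributions, because the row-wise geometric mean cancels in the ratio.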
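The assume_counts parameter decides how zeros are made positive before the logarithms of 'gmean' and 'clr'. The sketch below shows the two regularizations described for it, plus a simple count-data check. Both helpers are hypothetical. Interpreting "the minimum value" as the smallest non-zero entry, and treating "consistent with count data" as "non-negative integers", are assumptions, not tacco's documented behavior.

```python
import numpy as np


def regularize_zeros(values, assume_counts):
    """Hypothetical helper: make all entries positive before taking logs."""
    values = np.asarray(values, dtype=float)
    if assume_counts:
        return values + 1.0  # pseudocount of 1 for count data
    # Assumed: 'minimum value' means the smallest non-zero entry
    # (fails if every entry is zero).
    positive_min = values[values > 0].min()
    return np.where(values == 0, 1e-3 * positive_min, values)


def looks_like_counts(values):
    """Hypothetical heuristic: non-negative whole numbers look like counts."""
    values = np.asarray(values, dtype=float)
    return bool(np.all(values >= 0) and np.all(values == np.round(values)))


print(regularize_zeros([0, 2, 4], assume_counts=True))   # [1. 3. 5.]
print(regularize_zeros([0, 2, 4], assume_counts=False))  # [0.002 2. 4.]
```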
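The fillna and restrict_groups parameters amount to filtering the observations before any test is run. The pandas sketch below mirrors the described behavior: drop observations with NA when fillna is None, replace NA otherwise, then keep only the requested groups. The function prepare_obs and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

obs = pd.DataFrame({
    "group": ["A", "B", "C", "A"],
    "score": [1.0, np.nan, 3.0, 4.0],
})


def prepare_obs(obs, value_col, group_col="group", fillna=None, restrict_groups=None):
    """Hypothetical helper: NA handling and group restriction before testing."""
    if fillna is None:
        obs = obs.dropna(subset=[value_col])    # filter observations with NA
    else:
        obs = obs.fillna({value_col: fillna})   # replace NA with the given value
    if restrict_groups is not None:
        obs = obs[obs[group_col].isin(restrict_groups)]
    return obs


print(prepare_obs(obs, "score"))                                     # NA row dropped
print(prepare_obs(obs, "score", fillna=0.0, restrict_groups=["A"]))  # only group A
```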
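One plausible reading of reads=True is that each observation's values (for example, per-observation annotation fractions) are scaled by that observation's total count, so observations with more reads weigh more after reduction. The sketch below shows this weighting with pandas. The total_counts Series is written by hand here, whereas tacco takes the counts from counts_location; the exact weighting scheme tacco applies is an assumption.

```python
import pandas as pd

# Per-observation fractions of two annotation values (toy data).
fractions = pd.DataFrame({"ct1": [0.5, 1.0], "ct2": [0.5, 0.0]}, index=["obs1", "obs2"])
# Hypothetical total counts per observation (e.g. row sums of the count matrix).
total_counts = pd.Series([100, 10], index=fractions.index)

weighted = fractions.mul(total_counts, axis=0)  # scale each row by its total count

print(fractions.sum())  # unweighted: ct1 1.5, ct2 0.5
print(weighted.sum())   # read-weighted: ct1 60.0, ct2 50.0
```

Without weighting, the low-count observation obs2 pulls ct1 well ahead; with weighting, the high-count observation obs1 dominates and the two values end up close.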