tacco.preprocessing.refine_reference

refine_reference(adata, annotation_key=None, counts_location=None, inplace=False, normalize=True, regularization=0.001)

Refines a reference data set by scaling profiles and annotation to match the expression data. Specifically, it uses the read model p(cga) = n(cg) p(g|a) p(a|c) for the joint probability distribution of cells c, genes g, and annotation a per read. It determines the normalization factors n(cg) such that the marginal sum_a p(cga) equals p(cg) from the expression data, i.e. n(cg) = p(cg) / sum_a p(g|a) p(a|c). It then updates the profiles p(g|a) and the annotation p(a|c) from the marginals of p(cga).
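The single refinement pass described above can be illustrated with dense NumPy arrays. This is a minimal sketch, not TACCO's implementation: the function name `refine_step` is hypothetical, it assumes profiles are stored as a genes x types matrix of p(g|a) and the annotation as a cells x types matrix of p(a|c), and it materializes the full p(cga) tensor, which is only practical for tiny inputs. It also omits the `normalize` and `regularization` options.

```python
import numpy as np

def refine_step(counts, profiles, annotation):
    """Hypothetical one-pass refinement under p(cga) = n(cg) p(g|a) p(a|c).

    counts:     (n_cells, n_genes) non-negative counts
    profiles:   (n_genes, n_types), column a holds p(g|a)
    annotation: (n_cells, n_types), row c holds p(a|c)
    """
    p_cg = counts / counts.sum()            # empirical joint p(cg)
    model_cg = annotation @ profiles.T      # sum_a p(g|a) p(a|c), per (c, g)
    # n(cg) makes sum_a p(cga) equal p(cg); set to 0 where p(cg) == 0
    with np.errstate(divide="ignore", invalid="ignore"):
        n_cg = np.where(p_cg > 0, p_cg / model_cg, 0.0)
    # Full joint p(cga), shape (cells, genes, types)
    p_cga = n_cg[:, :, None] * profiles[None, :, :] * annotation[:, None, :]
    p_ga = p_cga.sum(axis=0)                # marginal over cells
    p_ca = p_cga.sum(axis=1)                # marginal over genes
    new_profiles = p_ga / p_ga.sum(axis=0, keepdims=True)      # p(g|a)
    new_annotation = p_ca / p_ca.sum(axis=1, keepdims=True)    # p(a|c)
    return new_profiles, new_annotation, p_cga

# Toy example with random, strictly positive profiles and annotation
rng = np.random.default_rng(0)
counts = rng.poisson(5.0, size=(4, 6)).astype(float)
profiles = rng.random((6, 2))
profiles /= profiles.sum(axis=0, keepdims=True)
annotation = rng.random((4, 2))
annotation /= annotation.sum(axis=1, keepdims=True)

new_profiles, new_annotation, p_cga = refine_step(counts, profiles, annotation)
```

If the counts are already exactly explained by the reference (counts = annotation @ profiles.T), n(cg) is constant and this sketch returns the profiles and annotation unchanged, which is a quick sanity check of the marginalization logic.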

Parameters:
  • adata – An AnnData including expression data in .X and profiles in .varm and/or annotation in .obs or .obsm.

  • annotation_key – The .obs, .obsm, and/or .varm key where the annotation and profiles are and/or will be stored.

  • counts_location – A string or tuple specifying where the count matrix is stored, e.g. 'X', ('raw', 'X'), ('raw', 'obsm', 'my_counts_key'), ('layer', 'my_counts_key'), … For details see counts().

  • inplace – Whether to modify the input AnnData or return a copy.

  • normalize – Whether to normalize the reference annotation and profiles.

  • regularization – Relative factor determining a small value added to the profiles to avoid unsolvable count distributions, e.g. when for some (g,c) sum_a p(g|a) p(a|c) = 0 but p(cg) != 0. If set to 0, no regularization is applied.
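The unsolvable case mentioned for `regularization` can be reproduced on a toy reference in which no annotation category expresses a gene that was nevertheless observed. The sketch below assumes a hypothetical scheme (add `regularization` times the mean profile entry to every entry, then renormalize each column); the exact form of the addition used by TACCO is not stated here, so treat `regularize_profiles` as an illustration only.

```python
import numpy as np

def regularize_profiles(profiles, regularization=0.001):
    """Hypothetical regularization of p(g|a) (genes x types).

    Adds a constant proportional to the mean profile entry, then
    renormalizes each column so it is again a distribution over genes.
    """
    if regularization == 0:
        return profiles  # no regularization, as for regularization=0
    eps = regularization * profiles.mean()
    reg = profiles + eps
    return reg / reg.sum(axis=0, keepdims=True)

profiles = np.array([[0.5, 0.2],
                     [0.5, 0.8],
                     [0.0, 0.0]])        # gene 2 has p(g|a) = 0 for every type
annotation = np.array([[0.3, 0.7]])      # p(a|c) for a single cell
counts = np.array([[4.0, 5.0, 1.0]])     # ...yet gene 2 was observed

model = annotation @ profiles.T
unsolvable = (model == 0) & (counts > 0)   # flags gene 2: n(cg) cannot be finite

model_reg = annotation @ regularize_profiles(profiles).T
still_unsolvable = (model_reg == 0) & (counts > 0)  # all False after regularization
```

A relative rather than absolute addition keeps the perturbation on the same scale as the profile values, so a small factor barely changes well-supported genes while making every (c,g) pair explainable.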

Returns:

Returns an AnnData containing the refined reference: depending on inplace, either a reference to the original adata or a copy.