tacco.tools.split_observations

split_observations(adata, annotation_key, result_key=None, counts_location=None, mode='exact', map_all_genes=False, min_counts=None, seed=42, delta=1e-06, scaling_jobs=None, scaling_batch_size=1000, rounding_batch_size=50000, obs_index_key=None, map_obs_keys=False, map_obsm_keys=False, verbose=1)

Splits expression data with a “soft” weights annotation in .obsm into multiple “virtual” observations per input observation. For example, expression data for cell type mixtures with annotated cell type fractions is split into the expression of single-type observations with a categorical annotation.

Parameters:
  • adata – An AnnData including expression data in .X, profiles in .varm and annotation in .obsm.

  • annotation_key – The .obsm and .varm key where the annotation and profiles are stored. If the annotation_key is also available in .uns, it should contain a mapping of annotation categories from .obsm and .varm to the target ones. Such a triplet of annotations can be generated by annotate() with a reconstruction_key argument.

  • result_key – The name of the .obs annotation column of the result to contain the split annotation. If None, tries to find a sensible value automatically: If annotation_key is not in .uns, then annotation_key is used, else the name attribute of the mapping is used if available.

  • counts_location – A string or tuple specifying where the count matrix is stored, e.g. 'X', ('raw','X'), ('raw','obsm','my_counts_key'), ('layer','my_counts_key'), … For details see counts().

  • mode

    String that switches between the types of split:

    • 'exact': All counts in the input are distributed to the split observations, conserving the total number of counts per gene and bead as well as the annotation fractions; see map_all_genes.

    • 'denoise': The counts per gene and bead are ignored, and the split results from a matrix product of the mean expression profiles and the annotation. If annotation_key is also a .uns key, this can be done on a sub-category level, which provides variation within the categories.

    • 'bulk': Like 'denoise', but summing over all (sub-)categories.

  • map_all_genes – Only for mode 'exact': Whether to map counts of genes without profile information assuming equal probabilities for all profiles.

  • min_counts – Minimum count per observation to include in the split data. If None, include all non-zero observations.

  • seed – Random seed for integerizing the split count matrix. If None, directly return the non-integer valued count matrix. If mode=='denoise', a non-None value leads to plain rounding.

  • delta – The relative error target for the matrix scaling. Ignored if mode=='denoise'.

  • scaling_jobs – Number of jobs or cores to use for the matrix scaling task. Ignored if mode=='denoise'.

  • scaling_batch_size – Batch size for the matrix scaling task. Ignored if mode=='denoise'.

  • rounding_batch_size – Batch size for the rounding task. Ignored if mode=='denoise'.

  • obs_index_key – A string specifying the name of the obs column to write the old .obs.index (i.e. the cell names) to. If None, tries to guess a reasonable name.

  • map_obs_keys – List of .obs keys to map to the new data. If True or False, maps all or no .obs keys.

  • map_obsm_keys – List of .obsm keys to map to the new data. If True or False, maps all or no .obsm keys.

  • verbose – Level of verbosity, with 0 (no output), 1 (some output), …
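
The conservation property of mode 'exact' and the role of delta can be illustrated with a small, self-contained sketch. It assumes that "matrix scaling" means something like iterative proportional fitting: for a single bead, a genes × types matrix is alternately rescaled so that each gene keeps its observed count and each type receives its annotated share of the total. The function scale_to_marginals and all numbers are hypothetical and only serve as illustration; the actual implementation in this function may differ, for example in its batching, its stopping rule, and its handling of genes without profile information.

```python
def scale_to_marginals(profiles, gene_counts, type_totals, delta=1e-6, max_iter=1000):
    """Rescale a genes x types matrix until its row sums match gene_counts
    and its column sums match type_totals (iterative proportional fitting)."""
    n_genes, n_types = len(profiles), len(profiles[0])
    m = [list(row) for row in profiles]  # work on a copy
    for _ in range(max_iter):
        # Row step: each gene keeps exactly its observed count.
        for g in range(n_genes):
            row_sum = sum(m[g])
            if row_sum > 0:
                m[g] = [v * gene_counts[g] / row_sum for v in m[g]]
        # Column step: each type receives its annotated share of the counts.
        for k in range(n_types):
            col_sum = sum(m[g][k] for g in range(n_genes))
            if col_sum > 0:
                factor = type_totals[k] / col_sum
                for g in range(n_genes):
                    m[g][k] *= factor
        # The column step may disturb the row sums; stop once they are
        # within the relative error target delta.
        row_error = max(
            (abs(sum(m[g]) - gene_counts[g]) / gene_counts[g]
             for g in range(n_genes) if gene_counts[g] > 0),
            default=0.0,
        )
        if row_error < delta:
            break
    return m


# One bead with 10 counts over 3 genes, annotated as 60% type A and 40% type B.
profiles = [[0.7, 0.1], [0.2, 0.3], [0.1, 0.6]]  # genes x types, made-up values
gene_counts = [6.0, 3.0, 1.0]
fractions = [0.6, 0.4]
type_totals = [f * sum(gene_counts) for f in fractions]

split = scale_to_marginals(profiles, gene_counts, type_totals)
row_sums = [sum(row) for row in split]  # close to gene_counts
col_sums = [sum(row[k] for row in split) for k in range(len(fractions))]  # close to type_totals
```

Each column of split then plays the role of one "virtual" observation of a single type. The values are still non-integer at this point, which corresponds to what the seed parameter describes for seed=None.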

Returns:

An AnnData containing the split expression data.
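
To make the shape of the result more concrete, the following sketch mimics the idea behind mode 'denoise' in plain Python: every input observation yields one virtual observation per type with non-zero weight, whose expression is the type's mean profile scaled by that weight. The function denoise_split, the dictionary keys, and the example values are hypothetical. The real function returns an AnnData, with the categorical annotation in the .obs column named by result_key and the original observation names in the column named by obs_index_key, and its normalization may differ from this simplified version.

```python
def denoise_split(obs_names, weights, profiles, type_names):
    """Build one virtual observation per (input observation, type) pair.

    weights:  one list of per-type weights for every input observation
    profiles: genes x types matrix of mean expression per type
    """
    virtual = []
    for name, w in zip(obs_names, weights):
        for k, type_name in enumerate(type_names):
            if w[k] == 0:
                continue  # types absent from this observation yield nothing
            # Expression of this virtual observation: weight times type profile.
            expression = [w[k] * gene_row[k] for gene_row in profiles]
            virtual.append({"obs": name, "type": type_name, "expression": expression})
    return virtual


type_names = ["A", "B"]
profiles = [[0.5, 0.1], [0.3, 0.3], [0.2, 0.6]]  # genes x types, columns sum to 1
obs_names = ["bead_0", "bead_1"]
weights = [[6.0, 4.0], [0.0, 5.0]]  # made-up per-type counts per bead

virtual = denoise_split(obs_names, weights, profiles, type_names)
labels = [(v["obs"], v["type"]) for v in virtual]
```

Summing the virtual observations that belong to one input observation gives the matrix product of the profiles with that observation's weights, which is the quantity the description of 'denoise' refers to.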