tacco.tools.annotate

annotate(adata, reference, annotation_key=None, result_key=None, counts_location=None, method='OT', bisections=None, bisection_divisor=3, platform_iterations=None, normalize_to='adata', annotation_prior=None, multi_center=None, multi_center_amplitudes=True, reconstruction_key=None, max_annotation=None, min_counts_per_gene=None, min_counts_per_cell=None, min_cells_per_gene=None, min_genes_per_cell=None, remove_constant_genes=True, remove_zero_cells=True, min_log2foldchange=None, min_expression=None, remove_mito=False, n_hvg=None, skip_checks=False, assume_valid_counts=False, return_reference=False, gene_keys=None, verbose=1, **kw_args)

Annotates an AnnData using reference data.

Parameters:
  • adata – An AnnData including expression data in .X.

  • reference – Reference data to get the annotation definition from.

  • annotation_key – The .obs and/or .varm key where the annotation and/or profiles are stored in the reference. If None, it is inferred from reference, if possible.

  • result_key – The .obsm key of adata where to store the resulting annotation. If None, do not write to adata and return the annotation as DataFrame instead.

  • counts_location – A string or tuple specifying where the count matrix is stored, e.g. 'X', ('raw','X'), ('raw','obsm','my_counts_key'), ('layer','my_counts_key'), … For details see counts().

  • method

    String selecting the method to use for annotation. Can also be a callable of signature method(adata, reference, annotation_key, **kw_args) returning a DataFrame. Possible methods include:

  • bisections – If larger than 0, runs a boosted annotator using a basis annotator by iteratively running the basis annotator and removing a reconstructed fraction of the counts. The parameter gives the number of recursive bisections of the annotation. If None, defaults to method dependent values.

  • bisection_divisor – Number of parts for bisection: First, bisections times a fraction of 1/bisection_divisor of the unassigned counts of every observation is assigned. The remainder is then split evenly in bisection_divisor parts. E.g. if bisections is 2 and bisection_divisor is 3, then the assigned fractions per round of typing are 1/3,2/3*(1/3,1/3,1/3); if bisections is 3 and bisection_divisor is 2, then they are 1/2,(1/2,(1/2,1/2)/2)/2. Generally, the total number of typing rounds is bisections + bisection_divisor - 1.

  • platform_iterations – Number of platform normalization iterations before running the annotation. If 0, platform normalization is done once in the beginning, but no iteration is done. If smaller than 0, no platform normalization is performed at all. If None, defaults to method dependent values.

  • normalize_to

    To what expression the adatas should be normalized. Can be one of:

    • 'adata': normalize reference to conform to adata; the resulting annotation fractions give how many of the actual reads in adata belong to which annotation.

    • 'reference': normalize adata to conform to reference; the resulting annotation fractions give how many of the reads in adata would belong to which annotation if they were measured with the same platform effects as reference.

  • annotation_prior – A callable of signature method(adata, reference, annotation_key) which returns priors for the annotation or a Series containing the annotation prior distribution directly. This parameter is used only for methods which require such a parameter. If None, it is determined by summing the annotation in the reference data weighted with the counts from reference.X.

  • multi_center – The number of sub-categories per annotation category. If a category has fewer observations than this number, uses all the available observations individually. If None or smaller than 1, then the original categories are used.

  • multi_center_amplitudes – Whether to run k-means on amplitudes of the observation profiles or on the profiles directly.

  • reconstruction_key – The key for .varm, .obsm, and .uns where to put information for reconstructing “denoised” data: profiles, annotation, and a mapping of annotation sub-categories to annotation categories. If multi_center==1, reconstruction_key can be equal to result_key, as the annotation information is identical; a mapping is not necessary; just the profiles are additionally stored in .varm. If None, and multi_center==1, the result_key is used, else “{result_key}_mc{multi_center}”. If result_key is None, no reconstruction information is returned.

  • max_annotation – Number of different annotations to allow per observation. 1 assigns the maximum annotation, higher values assign the top annotations and distribute the remaining annotations equally on the top annotations. If None or smaller than 1, no restrictions are imposed.

  • min_counts_per_gene – The minimum number of counts genes must have in both adata and reference to be kept.

  • min_counts_per_cell – The minimum number of counts cells must have in both adata and reference to be kept.

  • min_cells_per_gene – The minimum number of cells genes must have in both adata and reference to be kept.

  • min_genes_per_cell – The minimum number of genes cells must have in both adata and reference to be kept.

  • remove_constant_genes – Whether to remove genes which do not show any variation between cells.

  • remove_zero_cells – Whether to remove cells without non-zero genes.

  • min_log2foldchange – Minimum log2-fold change a gene must have in at least one annotation category relative to the mean of the other categories to be kept.

  • min_expression – Minimum expression level relative to all expression a gene must have in at least one annotation category to be kept.

  • remove_mito – Whether to remove genes starting with “mt-” and “MT-”.

  • n_hvg – The number of highly variable genes to run on. If None, use all genes.

  • skip_checks – Whether to skip data integrity checks and save time. Only recommended for internal use - or people who really know what they are doing.

  • assume_valid_counts – Disable checking for invalid counts (e.g. non-integer or negative).

  • return_reference – Whether to return the platform normalized reference.

  • gene_keys – String or list of strings specifying additional count-like .var and .varm annotations to scale along with the platform normalization. The annotation_key is included automatically. If True, take all .var and .varm keys. This only makes sense if return_reference is True.

  • verbose – Level of verbosity, with 0 (no output), 1 (some output), …

  • **kw_args – Additional keyword arguments are forwarded to the annotation method. See the functions mentioned in the documentation of the parameter method for details.
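The arithmetic described for bisections/bisection_divisor and for max_annotation can be sketched in plain Python. This is an illustration only, not the library's implementation. The helper names bisection_fractions and restrict_annotation are hypothetical. The schedule is built to reproduce the two worked examples and the stated round count: bisections rounds that each take 1/bisection_divisor of what is still unassigned, then the rest spread evenly over the remaining bisection_divisor - 1 rounds. For max_annotation, "distribute equally" is assumed here to mean an equal additive share for every kept annotation.

```python
from fractions import Fraction


def bisection_fractions(bisections, bisection_divisor):
    """Fraction of an observation's counts assigned in each typing round.

    Hypothetical helper; mirrors the worked examples in the parameter text.
    """
    remaining = Fraction(1)
    rounds = []
    # Each of these rounds assigns 1/bisection_divisor of the unassigned counts.
    for _ in range(bisections):
        part = remaining / bisection_divisor
        rounds.append(part)
        remaining -= part
    # The rest is spread evenly over the remaining rounds, so that the total
    # number of rounds is bisections + bisection_divisor - 1.
    tail_rounds = bisection_divisor - 1
    if tail_rounds > 0:
        rounds.extend([remaining / tail_rounds] * tail_rounds)
    return rounds


def restrict_annotation(fractions, max_annotation):
    """Keep the top max_annotation entries of one observation's annotation.

    Hypothetical helper. ASSUMPTION: the weight of the dropped annotations is
    shared in equal additive parts among the kept ones.
    """
    if max_annotation is None or max_annotation < 1:
        return dict(fractions)  # no restriction imposed
    top = sorted(fractions, key=fractions.get, reverse=True)[:max_annotation]
    dropped = sum(value for key, value in fractions.items() if key not in top)
    share = dropped / len(top)
    return {
        key: (value + share if key in top else 0.0)
        for key, value in fractions.items()
    }


print([str(f) for f in bisection_fractions(2, 3)])  # ['1/3', '2/9', '2/9', '2/9']
print([str(f) for f in bisection_fractions(3, 2)])  # ['1/2', '1/4', '1/8', '1/8']
print(restrict_annotation({"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}, 2))
```

In both helpers the per-observation total is preserved: the round fractions sum to 1, and restricting the annotation only moves weight from the dropped categories to the kept ones.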

Returns:

Depending on result_key, either returns the original adata with annotation written in the corresponding .obsm key, or just the annotation as a new DataFrame.
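The two return modes can be sketched as thin wrappers around the call. This is a minimal sketch that assumes the tacco package is installed and that adata and reference are AnnData objects prepared by the caller. The wrapper names annotate_as_frame and annotate_in_place are hypothetical, and all other parameters are left at the defaults shown in the signature.

```python
def annotate_as_frame(adata, reference, annotation_key):
    """Return the annotation as a new DataFrame without writing to adata."""
    import tacco  # imported lazily; requires the tacco package

    # result_key=None: nothing is written to adata, the annotation is returned.
    return tacco.tools.annotate(
        adata, reference, annotation_key=annotation_key, result_key=None
    )


def annotate_in_place(adata, reference, annotation_key, result_key):
    """Store the annotation in .obsm[result_key] and return that entry."""
    import tacco  # imported lazily; requires the tacco package

    # With a result_key, the original adata is returned with the annotation
    # written to the corresponding .obsm key.
    annotated = tacco.tools.annotate(
        adata, reference, annotation_key=annotation_key, result_key=result_key
    )
    return annotated.obsm[result_key]
```

Passing result_key=None suits exploratory comparisons of several methods, since adata stays unchanged. Passing a result_key keeps the annotation alongside the data for downstream steps.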