tacco.tools.mix_in_silico

mix_in_silico(adata, type_key=None, topic_key=None, n_samples=30000, bead_shape=0.1, bead_size=1.0, norm_cells=False, platform_log10_mean=None, platform_log10_std=0.6, seed=42, round=True, min_counts=100, capture_rate=1.0)

Given single-cell data, create an in-silico mixed dataset. The mixtures are generated by placing the cells randomly in space, placing measurement points (“beads”) randomly in space, and convolving the cells with a spatial profile around each bead, e.g. a Gaussian. Optionally, a random log-Laplace distributed rescaling is also applied per gene.
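The mixing idea described above can be illustrated with a small standard-library sketch. It is not TACCO's implementation: the helper names (profile_weight, mix_cells), the square box with roughly one cell per unit area, and the exact form of the 'gauss' and 'disc' weights are assumptions made for illustration only.

```python
import math
import random


def profile_weight(distance, shape="gauss", size=1.0):
    """Weight of a cell at `distance` from a bead (illustrative profiles only)."""
    if shape == "gauss":
        return math.exp(-0.5 * (distance / size) ** 2)
    if shape == "disc":
        return 1.0 if distance <= size else 0.0
    raise ValueError(f"unknown shape: {shape!r}")


def mix_cells(cell_counts, cell_types, n_beads, shape="gauss", size=1.0, seed=42):
    """Mix cells into beads; returns (bead_counts, per-bead type contributions)."""
    rng = random.Random(seed)
    n_cells = len(cell_counts)
    n_genes = len(cell_counts[0])
    # Assumption: a square box holding about one cell per unit area.
    side = math.sqrt(n_cells)
    cells_xy = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n_cells)]
    beads_xy = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n_beads)]

    bead_counts, bead_types = [], []
    for bx, by in beads_xy:
        counts = [0.0] * n_genes
        contributions = {}
        for (cx, cy), expr, ctype in zip(cells_xy, cell_counts, cell_types):
            w = profile_weight(math.hypot(cx - bx, cy - by), shape, size)
            if w == 0.0:
                continue
            # Each cell adds its expression vector, scaled by the profile weight.
            for g in range(n_genes):
                counts[g] += w * expr[g]
            # Track how many counts each annotated type contributed to the bead.
            contributions[ctype] = contributions.get(ctype, 0.0) + w * sum(expr)
        bead_counts.append(counts)
        bead_types.append(contributions)
    return bead_counts, bead_types


# Toy data: four cells, two genes, two cell types.
cells = [[10, 0], [0, 10], [5, 5], [8, 2]]
types = ["A", "B", "A", "B"]
counts, contributions = mix_cells(cells, types, n_beads=3)
```

Tracking the per-type contributions next to the mixed counts mirrors the role of type_key: the annotation is carried through to the mixtures so that each bead has a known composition.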

Parameters:
  • adata – An AnnData with annotation in .obs.

  • type_key – An .obs key with categorical information to propagate through to the mixed data, e.g. cell types.

  • topic_key – An .obsm key with continuous information to propagate through to the mixed data, e.g. transcriptional topics.

  • n_samples – The number of measurement points (“beads”) which are placed randomly in space. Note that, depending on min_counts and the mixing parameters, the number of returned measurement points can be somewhat smaller than this value.

  • bead_shape

    The shape to use for determining the contributions of cells to “beads”. Can also be a list of shapes, which saves setup time compared to separate calls. Possible values:

    • 'gauss': weights decrease with distance like a Gaussian.

    • 'disc': weights are constant up to some distance and then drop to 0.

    • number: weights decrease with distance according to a tanh profile, with the sharpness of the decrease given by this number. It can be used to interpolate between 0 (disc-like) and 1 (gauss-like).

  • bead_size – Scaling factor determining the effective size of the beads/profile. A value of 1 corresponds to tightly packed cells and beads of the size of a cell.

  • norm_cells – Whether to normalize the total counts per cell in the single cell data prior to mixing.

  • platform_log10_mean – log10 of the mean of the Laplace distribution for the log-Laplace distributed per-gene platform effect. The per-gene factors are available in .var['platform_effect']. If None, no platform factors are applied.

  • platform_log10_std – log10 of the standard deviation of the Laplace distribution for the log-Laplace distributed per-gene platform effect.

  • seed – The random seed to use.

  • round – Whether to round the resulting expression matrix to integer counts after rescaling.

  • min_counts – The returned adata is filtered to have at least this number of counts per observation. If None, return all observations.

  • capture_rate – The fraction of counts to keep from a cell that is maximally covered by the bead. If 'normalized', normalize the weights per bead to sum to 1. If None, normalize the bead point spread function (PSF) to 1.
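The optional per-gene platform effect, rounding, and min_counts filtering can be sketched as follows. This is a hedged illustration, not TACCO's code: the helper names are hypothetical, and it assumes that the log10 of each per-gene factor is Laplace distributed with the given mean and standard deviation, and that rounding happens before the count filter.

```python
import math
import random


def sample_platform_effect(n_genes, log10_mean=0.0, log10_std=0.6, seed=42):
    """Draw one positive, log-Laplace distributed factor per gene (assumed model)."""
    rng = random.Random(seed)
    # A Laplace distribution with scale b has standard deviation sqrt(2) * b.
    b = log10_std / math.sqrt(2)
    factors = []
    for _ in range(n_genes):
        # The difference of two independent Exp(1) draws is Laplace(0, 1).
        laplace = rng.expovariate(1.0) - rng.expovariate(1.0)
        factors.append(10 ** (log10_mean + b * laplace))
    return factors


def apply_platform_effect(bead_counts, factors, do_round=True, min_counts=None):
    """Rescale each gene, optionally round, and drop beads below min_counts."""
    kept = []
    for counts in bead_counts:
        row = [c * f for c, f in zip(counts, factors)]
        if do_round:
            row = [round(v) for v in row]
        # min_counts=None keeps every bead, as described for the parameter above.
        if min_counts is None or sum(row) >= min_counts:
            kept.append(row)
    return kept


factors = sample_platform_effect(n_genes=3)
beads = [[120.0, 30.4, 60.2], [1.0, 0.4, 0.2]]
filtered = apply_platform_effect(beads, factors, min_counts=100)
```

Because sparse beads are removed after mixing, the number of returned observations can fall below n_samples, which is the behavior noted for that parameter.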

Returns:

Returns the mixed data as AnnData. If bead_shape is a list, returns a dictionary containing an AnnData per bead shape.
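A call using the documented signature might look like the sketch below. The wrapper name make_mixtures and the .obs key "cell_type" are hypothetical, and the parameter values are arbitrary choices for illustration; the import is placed inside the function so the snippet can be read and loaded without TACCO installed.

```python
def make_mixtures(adata, type_key="cell_type"):
    """Create in-silico mixtures for two bead shapes from an annotated AnnData."""
    import tacco.tools  # lazy import; requires TACCO to be installed when called

    return tacco.tools.mix_in_silico(
        adata,
        type_key=type_key,             # hypothetical .obs column with cell types
        n_samples=5000,
        bead_shape=["gauss", "disc"],  # a list, so a dictionary is returned
        bead_size=1.0,
        platform_log10_mean=None,      # no per-gene platform effect
        seed=42,
        min_counts=100,
    )


# Usage, assuming `adata` is a single-cell AnnData with adata.obs["cell_type"]:
# mixtures = make_mixtures(adata)
# for key, mixed in mixtures.items():
#     print(key, mixed.shape)
```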