tacco.utils.cdist

cdist(A, B=None, metric='euclidean', parallel=True)

Calculate a dense pairwise distance matrix from sparse or dense inputs. For some metrics (‘euclidean’, ‘cosine’), this is considerably faster than scipy.spatial.distance.cdist(). Most other metrics fall back to scipy.spatial.distance.cdist(). The following special distances are also supported:

  • ‘bc’: 1 - Bhattacharyya coefficient. This is analogous to the cosine distance (1 - cosine similarity), with the Bhattacharyya coefficient, a measure of the overlap of two probability distributions, in place of the cosine similarity. The input vectors are normalized to sum 1 first.

  • ‘bc2’: 1 - (Bhattacharyya coefficient)^2, the corresponding distance for the squared Bhattacharyya coefficient. The input vectors are normalized to sum 1 first.

  • ‘hellinger’: the Hellinger (Bhattacharyya) distance, defined as sqrt(1 - Bhattacharyya coefficient).

  • ‘h2’: the squared Hellinger distance; synonymous with ‘bc’.
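To make the definitions above concrete, here is a minimal dense NumPy sketch of the four special distances. It is not tacco's implementation: the function name bhattacharyya_distances is hypothetical, it assumes non-negative dense inputs whose rows have a positive sum, and it applies the sum-to-1 normalization for all four metrics (the text above states this explicitly only for ‘bc’ and ‘bc2’). It relies on the Bhattacharyya coefficient sum_i sqrt(p_i * q_i) being the dot product of the element-wise square roots of p and q, so the whole coefficient matrix is one matrix product.

```python
import numpy as np


def bhattacharyya_distances(A, B=None, metric="bc"):
    """Hypothetical dense reference for 'bc', 'bc2', 'hellinger' and 'h2'.

    Assumes non-negative dense 2d arrays whose rows each have a positive sum.
    """
    if B is None:
        B = A  # compare A with itself, like the documented default
    # Normalize every row to sum 1 so it can be read as a probability distribution.
    P = A / A.sum(axis=1, keepdims=True)
    Q = B / B.sum(axis=1, keepdims=True)
    # Bhattacharyya coefficient BC(p, q) = sum_i sqrt(p_i * q_i),
    # i.e. the dot product of the element-wise square roots.
    bc = np.sqrt(P) @ np.sqrt(Q).T
    # Rounding can push BC of identical rows slightly above 1.
    bc = np.clip(bc, 0.0, 1.0)
    if metric in ("bc", "h2"):  # squared Hellinger is the same quantity as 'bc'
        return 1.0 - bc
    if metric == "bc2":
        return 1.0 - bc**2
    if metric == "hellinger":
        return np.sqrt(1.0 - bc)
    raise ValueError(f"unsupported metric: {metric!r}")


A = np.array([[1.0, 1.0, 0.0], [0.0, 2.0, 2.0]])
B = np.array([[1.0, 0.0, 1.0]])
print(bhattacharyya_distances(A, B, "bc"))
print(bhattacharyya_distances(A, B, "hellinger"))
```

In this example each normalized row of A has a Bhattacharyya coefficient of 0.5 with B, so ‘bc’ gives 0.5 and ‘hellinger’ gives sqrt(0.5), about 0.707 (up to floating-point rounding). Because sqrt(p) has unit Euclidean norm when p sums to 1, ‘bc’ is exactly the cosine distance between sqrt(p) and sqrt(q), which is the sense in which it is a cosine equivalent.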

Parameters:
  • A – A 2d ndarray or a scipy sparse matrix.

  • B – A 2d ndarray or a scipy sparse matrix with the same second dimension as A. If None, use A.

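  • metric – A string specifying the metric to use, e.g. ‘euclidean’, ‘cosine’, one of the special distances listed above, or a metric name understood by scipy.spatial.distance.cdist().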

  • parallel – Whether to run the operation in parallel, if possible.

Returns:

An ndarray containing the distances, with one row per row of A and one column per row of B.
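One common way to get a fast ‘euclidean’ distance matrix from sparse inputs is to expand ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b, so the only pairwise work is a single (possibly sparse) matrix product. The sketch below assumes that approach; it illustrates the idea and is not tacco's actual code, and sparse_euclidean is a hypothetical name. It accepts dense arrays or scipy sparse matrices, returns a dense ndarray with one row per row of A and one column per row of B, and compares its result against scipy.spatial.distance.cdist().

```python
import numpy as np
import scipy.sparse as sp
from scipy.spatial.distance import cdist as scipy_cdist


def sparse_euclidean(A, B=None):
    """Hypothetical helper: dense Euclidean distance matrix for dense or sparse inputs.

    Uses ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, so the only pairwise
    work is a single sparse matrix product A @ B.T.
    """
    if B is None:
        B = A  # mirror the documented default: compare A with itself
    A = sp.csr_matrix(A)
    B = sp.csr_matrix(B)
    # Squared Euclidean norm of every row, as flat 1d arrays.
    a2 = np.asarray(A.multiply(A).sum(axis=1)).ravel()
    b2 = np.asarray(B.multiply(B).sum(axis=1)).ravel()
    # Dense (n_A, n_B) matrix of row-by-row dot products.
    cross = (A @ B.T).toarray()
    d2 = a2[:, None] + b2[None, :] - 2.0 * cross
    # Cancellation can leave tiny negative values; clip before the sqrt.
    np.maximum(d2, 0.0, out=d2)
    return np.sqrt(d2)


# Example: random sparse data, checked against scipy's dense implementation.
rng = np.random.default_rng(0)
dense_A = rng.random((5, 20)) * (rng.random((5, 20)) < 0.3)
dense_B = rng.random((3, 20)) * (rng.random((3, 20)) < 0.3)
D = sparse_euclidean(sp.csr_matrix(dense_A), sp.csr_matrix(dense_B))
print(D.shape)  # (5, 3)
print(np.allclose(D, scipy_cdist(dense_A, dense_B)))  # True
```

The trade-off of this expansion is numerical: for nearly identical rows the subtraction cancels, so very small distances are less accurate than with a direct difference, which is why the sketch clips negative values before taking the square root.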