tacco.utils.hash

hash(data, keys=None, hash_key=None, other=None, compress=True)

Create a collision-free hash of several categorical columns by lexicographical indexing.

Parameters:
  • data – A DataFrame.

  • keys – The names of the columns containing the categorical properties to hash. Non-categorical columns are converted to categoricals first. If None, all columns are used.

  • hash_key – The name of the column to contain the hash values. If None, a series of hash assignments is returned.

  • other – Another DataFrame that has the same key columns with the same datatypes and should get the same hash transformation as data.

  • compress – Whether to compress the lexicographical indices into contiguous 0-based hash values. This can take quite some time, but can also decrease the object size of the hash columns.

Returns:

Depending on hash_key, returns either a Series of hash assignments or the updated input data containing the hash assignments under hash_key. If other is given, returns a pair of these results, one for data and one for other.
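
The idea of lexicographical indexing described above can be illustrated with a small, self-contained pandas sketch. This is not tacco's implementation: the helper name lexicographic_hash is hypothetical, it omits the hash_key option and always returns Series, and it assumes the key columns contain no missing values and hold sortable values. Each key column is mapped to integer codes over a category list shared by both frames, the codes are combined like digits of a mixed-radix number, and the optional compression step renumbers the observed indices to contiguous 0-based values.

```python
import numpy as np
import pandas as pd


def lexicographic_hash(data, keys=None, other=None, compress=True):
    """Illustrative sketch only; hypothetical helper, not tacco.utils.hash.

    Assumes no missing values in the key columns.
    """
    if keys is None:
        keys = list(data.columns)
    frames = [data] if other is None else [data, other]
    indices = [np.zeros(len(frame), dtype=np.int64) for frame in frames]

    for key in keys:
        # One sorted category list shared by all frames, so that equal
        # values receive equal codes in data and in other.
        values = pd.concat([frame[key] for frame in frames], ignore_index=True)
        categories = sorted(set(values))
        for i, frame in enumerate(frames):
            codes = pd.Categorical(frame[key], categories=categories).codes
            # Mixed-radix update: shift the index so far by the number of
            # categories of this column, then add this column's code.
            indices[i] = indices[i] * len(categories) + codes.astype(np.int64)

    if compress:
        # Renumber the observed indices jointly to contiguous 0-based values.
        sizes = [len(ix) for ix in indices]
        _, inverse = np.unique(np.concatenate(indices), return_inverse=True)
        indices = np.split(inverse, np.cumsum(sizes)[:-1])

    result = [pd.Series(ix, index=frame.index) for ix, frame in zip(indices, frames)]
    return result[0] if other is None else tuple(result)


data = pd.DataFrame({"sample": ["a", "b", "a"], "type": ["x", "y", "x"]})
other = pd.DataFrame({"sample": ["b", "a"], "type": ["y", "y"]})

print(lexicographic_hash(data, compress=False).tolist())  # [0, 3, 0]
print(lexicographic_hash(data).tolist())                  # [0, 1, 0]
hash_data, hash_other = lexicographic_hash(data, other=other)
print(hash_data.tolist(), hash_other.tolist())            # [0, 2, 0] [2, 1]
```

In this sketch, distinct combinations of key values always map to distinct indices, which is what makes the hash collision-free. Without compression the indices can be sparse (here 0 and 3), while compression trades an extra pass over the data for a dense range of values.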