tacco.utils.hash
- hash(data, keys=None, hash_key=None, other=None, compress=True)
Create a collision-free hash of several categorical columns by lexicographical indexing.
- Parameters:
data – A DataFrame.
keys – The names of the columns containing the categorical properties to hash. Non-categorical columns are converted to categoricals first. If None, all columns are used.
hash_key – The name of the column to contain the hash values. If None, a Series of hash assignments is returned.
other – Another DataFrame, which also has the keys with the same datatypes and should get the same hash transformation as data.
compress – Whether to compress the lexicographical indices into contiguous 0-based hash values. This can take quite some time, but can also decrease the object size of the hash columns.
- Returns:
Depending on hash_key, returns either a Series of hash assignments or the updated input data containing the hash assignments under hash_key. When other is supplied, the result is a pair: one for data and one for other.
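
The idea of lexicographical indexing followed by compression can be illustrated with a small standalone sketch. This is not tacco's implementation: the helper name `lexicographic_hash` is hypothetical, it uses only pandas and NumPy, it ignores the `hash_key` and `other` options, and it assumes the key columns contain no missing values.

```python
import numpy as np
import pandas as pd


def lexicographic_hash(df, keys=None, compress=True):
    """Hypothetical helper illustrating the technique; not part of tacco."""
    keys = list(df.columns) if keys is None else list(keys)
    codes = np.zeros(len(df), dtype=np.int64)
    for key in keys:
        # Non-categorical columns are converted to categoricals first.
        cat = df[key].astype("category")
        n_categories = len(cat.cat.categories)
        # Mixed-radix (lexicographical) index: every distinct combination
        # of category codes maps to a distinct integer, so no collisions.
        codes = codes * n_categories + cat.cat.codes.to_numpy(dtype=np.int64)
    if compress:
        # Replace the sparse indices by contiguous 0-based values.
        _, codes = np.unique(codes, return_inverse=True)
    return pd.Series(codes, index=df.index, name="hash")


df = pd.DataFrame({
    "sample": ["a", "b", "b", "a"],
    "type": ["x", "y", "y", "x"],
})

raw = lexicographic_hash(df, compress=False)   # values 0, 3, 3, 0
dense = lexicographic_hash(df, compress=True)  # values 0, 1, 1, 0
```

In this toy input only two of the four possible (sample, type) combinations occur, so the uncompressed indices leave gaps, and compression maps them onto 0 and 1. Rows with identical key values always receive the same hash value.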