tacco.utils.coo_tocsr_buffered

coo_tocsr_buffered(A, blocksize=1000000, buffer_directory=None)

Converts a sparse matrix in coo format into a sparse matrix in csr format, using a buffer on hard disc to consume less memory than a purely in-memory conversion. This is slower than the out-of-place in-memory scipy version, but faster than coo_tocsr_inplace().

Parameters:
  • A – A coo_matrix. The memory of this matrix is reused in the construction of the csr matrix, so it is effectively destroyed.

  • blocksize – The number of items to read per hard disc access. This has some effect on performance, but usually the default value is fine.

  • buffer_directory – A directory containing A.col and A.data, dumped by their .tofile() method to files named “col.bin” and “data.bin”. This avoids an additional write operation if the dumped data already exists.
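
As an illustration of the file layout described for buffer_directory, the following minimal sketch dumps the col and data arrays of a small coo_matrix with NumPy's .tofile(). The example matrix and the temporary directory are assumptions made for illustration; the sketch does not call coo_tocsr_buffered itself.

```python
import os
import tempfile

import numpy as np
from scipy.sparse import coo_matrix

# Small example matrix (illustrative values, not from the documentation).
row = np.array([0, 0, 1, 2, 2])
col = np.array([0, 2, 1, 0, 2])
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
A = coo_matrix((data, (row, col)), shape=(3, 3))

# Hypothetical buffer directory; any writable directory would do.
buffer_directory = tempfile.mkdtemp()

# Dump the raw arrays under the file names given in the parameter description.
A.col.tofile(os.path.join(buffer_directory, "col.bin"))
A.data.tofile(os.path.join(buffer_directory, "data.bin"))

# .tofile() stores no dtype or shape, so reading back requires the dtype.
col_back = np.fromfile(os.path.join(buffer_directory, "col.bin"), dtype=A.col.dtype)
data_back = np.fromfile(os.path.join(buffer_directory, "data.bin"), dtype=A.data.dtype)
```

A directory prepared this way could then presumably be passed as buffer_directory. Because .tofile() writes raw bytes without metadata, the dumped files are only meaningful together with the dtypes of the original arrays.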

Returns:

A csr_matrix which reuses the memory of the input coo matrix, thereby destroying it.
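
For orientation, the sketch below shows what a coo to csr conversion produces, using the out-of-place in-memory scipy version (coo_matrix.tocsr()) as a stand-in. It is not the buffered implementation, and the example values are assumptions made for illustration.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Small example matrix (illustrative values, not from the documentation).
row = np.array([0, 0, 1, 2, 2])
col = np.array([0, 2, 1, 0, 2])
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
A = coo_matrix((data, (row, col)), shape=(3, 3))

# Out-of-place in-memory conversion: A stays valid, B is a new csr_matrix.
B = A.tocsr()

# csr stores one row pointer per row (plus one) instead of one row index per entry.
indptr = B.indptr.tolist()    # row i occupies entries indptr[i]:indptr[i + 1]
indices = B.indices.tolist()  # column index of each stored entry
values = B.data.tolist()      # value of each stored entry
```

Unlike this scipy call, coo_tocsr_buffered reuses the memory of its input, so a caller who still needs the coo matrix afterwards would have to keep a copy (for example via A.copy()) before converting.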