I wrote some tests to look a little further into trac ticket #1533. I wanted to know which sparse matrix types cannot call .toarray() with a bool dtype. Surprisingly, the only type that passed was the Lists of Lists (lil) type. Why is this?
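The original tests aren't shown in this post, so here is a hypothetical reconstruction of the kind of check involved: build a small bool matrix, convert it to every format, and record whether .toarray() succeeds and keeps the bool dtype. The helper name bool_toarray_report and the test matrix are my own assumptions, and the results depend on the installed SciPy version (formats that failed at the time of the ticket may well pass on newer releases).

```python
import numpy as np
import scipy.sparse as sp

FORMATS = ("bsr", "coo", "csc", "csr", "dia", "dok", "lil")


def bool_toarray_report(n=5):
    """Hypothetical check (not the original test code): try a bool
    .toarray() in every sparse format and record what happens."""
    idx = np.arange(n)
    # One True per row at distinct positions, so no duplicates to sum.
    base = sp.coo_matrix(
        (np.ones(n, dtype=bool), (idx, (idx + 1) % n)), shape=(n, n)
    )
    report = {}
    for fmt in FORMATS:
        try:
            dense = base.asformat(fmt).toarray()
            if dense.dtype == np.bool_:
                report[fmt] = "ok"
            else:
                report[fmt] = f"dtype changed to {dense.dtype}"
        except Exception as exc:  # record the failure instead of stopping
            report[fmt] = f"{type(exc).__name__}: {exc}"
    return report


if __name__ == "__main__":
    for fmt, outcome in bool_toarray_report().items():
        print(fmt, outcome)
```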
Lists of Lists
Looking around the sparse package, every toarray method except lil's basically does this:
```python
def toarray(self, order=None, out=None):
    """See the docstring for `spmatrix.toarray`."""
    return self.tocoo(copy=False).toarray(order=order, out=out)
```
And the coordinate list (coo) matrix's toarray is:
```python
def toarray(self, order=None, out=None):
    """See the docstring for `spmatrix.toarray`."""
    B = self._process_toarray_args(order, out)
    fortran = int(B.flags.f_contiguous)
    if not fortran and not B.flags.c_contiguous:
        raise ValueError("Output array must be C or F contiguous")
    M, N = self.shape
    coo_todense(M, N, self.nnz, self.row, self.col, self.data,
                B.ravel('A'), fortran)
    return B
```
The coo toarray calls the coo_todense function, which writes the stored values into the dense output array, but coo_todense doesn't support the bool dtype. It is a compiled C++ function defined in coo.h, part of sparsetools.
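To make the flat-buffer logic concrete, here is a Python transcription of what coo_todense appears to do, based on my reading of coo.h (a sketch, not the real routine; the name coo_todense_py is hypothetical). Each stored value is added into a flat view of the output at a row-major or column-major offset. That is also why toarray insists on a C- or F-contiguous B: B.ravel('A') is only a view of B when B is contiguous; otherwise it returns a copy and the writes would be lost.

```python
def coo_todense_py(n_row, n_col, rows, cols, data, flat_out, fortran):
    """Hypothetical Python sketch of the compiled coo_todense loop.

    flat_out plays the role of B.ravel('A'): the dense output seen as
    one flat buffer in either row-major or column-major order.
    """
    for i, j, x in zip(rows, cols, data):
        if fortran:
            k = n_row * j + i  # column-major: each column is contiguous
        else:
            k = n_col * i + j  # row-major: each row is contiguous
        flat_out[k] += x       # duplicate (i, j) entries are summed
    return flat_out


# 2 x 3 matrix with entries (0, 1) = 5, (1, 2) = 7, plus a duplicate (0, 1) = 1
print(coo_todense_py(2, 3, [0, 1, 0], [1, 2, 1], [5, 7, 1], [0] * 6, False))
print(coo_todense_py(2, 3, [0, 1, 0], [1, 2, 1], [5, 7, 1], [0] * 6, True))
```

With fortran=True the same entries land at different offsets, matching the column-major layout. Because the update is +=, duplicate (i, j) entries are summed, which is the usual coo meaning of repeated coordinates.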
But why didn't lil fail? Looking at its toarray:
```python
def toarray(self, order=None, out=None):
    """See the docstring for `spmatrix.toarray`."""
    d = self._process_toarray_args(order, out)
    for i, row in enumerate(self.rows):
        for pos, j in enumerate(row):
            d[i, j] = self.data[i][pos]
    return d
```
It doesn't use any of these compiled functions, just a plain Python double loop. Why not? Python loops are slow and compiled code is fast, so is lil taking a performance hit?
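The lil layout explains why bool is no problem: self.rows[i] holds the column indices of row i and self.data[i] the matching values, and the loop only ever assigns values into the output, so there is no type-specific compiled kernel that could be missing. Below is a sketch of the same loop on plain Python lists (the function name lil_to_dense and the fill argument are hypothetical).

```python
def lil_to_dense(shape, rows, data, fill=0):
    """Hypothetical sketch of lil's toarray loop on plain Python lists.

    rows[i] lists the column indices stored in row i and data[i] the
    matching values, mirroring lil's .rows and .data attributes.
    """
    n_row, n_col = shape
    dense = [[fill] * n_col for _ in range(n_row)]
    for i, row in enumerate(rows):
        for pos, j in enumerate(row):
            dense[i][j] = data[i][pos]  # plain assignment, no arithmetic
    return dense


# A 2 x 3 bool matrix stored in lil form
print(lil_to_dense((2, 3), [[1], [0, 2]], [[True], [True, True]], fill=False))
```

Assignment is enough here because a lil row never holds the same column twice, unlike coo, which has to sum duplicates. The loop runs once per stored element, so for a 3000 by 3000 matrix with about 5 nonzeros per row that is roughly 15,000 Python-level iterations, small next to allocating and zeroing a 9,000,000-element dense array; that may be why the Python loop costs so little.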
lil's .toarray() benchmark
I wrote some code to benchmark lil's .toarray() against the other types. A typical result for a 3000 by 3000 matrix with around 5 random nonzero values per row is:
```
$ python lil_benchmark.py
lil 0.0481379032135
dok 0.113062858582
dia 0.530215024948
csr 0.045028924942
csc 0.0446619987488
bsr 0.0462918281555
coo 0.0453071594238
```
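lil_benchmark.py itself isn't included here, so the following is only a sketch of a comparable benchmark, not the script that produced those numbers. It builds a random n by n matrix with about per_row nonzeros per row, converts it to each format outside the timed region, and times only .toarray() with timeit. The helper names and parameters are my own assumptions.

```python
import timeit

import numpy as np
import scipy.sparse as sp

FORMATS = ("lil", "dok", "dia", "csr", "csc", "bsr", "coo")


def make_matrix(n, per_row, seed=0):
    """Hypothetical helper: about per_row random nonzeros in each row."""
    rng = np.random.default_rng(seed)
    rows = np.repeat(np.arange(n), per_row)
    cols = rng.integers(0, n, size=n * per_row)
    vals = rng.random(n * per_row)
    m = sp.coo_matrix((vals, (rows, cols)), shape=(n, n))
    m.sum_duplicates()  # merge any repeated (row, col) pairs
    return m


def benchmark(n=3000, per_row=5, number=3):
    """Average seconds per .toarray() call for each format."""
    base = make_matrix(n, per_row)
    timings = {}
    for fmt in FORMATS:
        m = base.asformat(fmt)  # conversion is not part of the timing
        timings[fmt] = timeit.timeit(m.toarray, number=number) / number
    return timings


if __name__ == "__main__":
    for fmt, secs in benchmark().items():
        print(fmt, secs)
```

Calling benchmark() with the defaults mirrors the 3000 by 3000, 5-per-row setup; timeit reports seconds. One plausible reason dia does so badly on this kind of input: randomly scattered entries occupy most of the 5,999 possible diagonals, and dia stores every occupied diagonal at full length.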
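lil is barely slower at all: about 0.048 versus roughly 0.045 for csr, csc, bsr and coo, within about 8%. dia and dok, on the other hand, are much slower. Since csr, csc and bsr also convert to coo and stay fast, the extra time probably comes from the dia and dok conversions to coo before .toarray() runs. This is promising: maybe the toarray methods can be written in Python instead of tacking bool support onto the existing compiled code, without much of a performance hit.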
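As a rough illustration of that idea, here is a sketch of a dtype-agnostic coo-to-dense conversion written with NumPy alone, assuming np.add.at as the accumulation step (my choice, not something from the sparse package; the function name coo_toarray_numpy is hypothetical). np.add.at performs unbuffered in-place addition, so duplicate (row, col) pairs accumulate like the += in the compiled loop, and for bool arrays NumPy's add acts as a logical OR, so bool needs no special case. I haven't timed it against the compiled path; that would need the same benchmark.

```python
import numpy as np


def coo_toarray_numpy(shape, row, col, data, order="C"):
    """Hypothetical NumPy-only coo -> dense conversion for any dtype."""
    data = np.asarray(data)
    out = np.zeros(shape, dtype=data.dtype, order=order)
    # Unbuffered add: repeated (row, col) pairs are all applied, matching
    # coo's "sum duplicates" rule; for bool, add behaves as logical OR.
    np.add.at(out, (np.asarray(row), np.asarray(col)), data)
    return out


b = coo_toarray_numpy((2, 3), [0, 1, 0], [1, 2, 1],
                      np.array([True, True, False]))
print(b.dtype, b.tolist())
```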