I wrote some [tests](https://github.com/cowlicks/scipy-sparse-tests/blob/master/bools_suck.py) to look a little further into
[trac ticket #1533](http://projects.scipy.org/scipy/ticket/1533). I
wanted to know which sparse matrix types fail to `.toarray()` when the
dtype is bool. Surprisingly, the only type that passed was Lists of Lists
(lil). Why is this?
## Lists of Lists
Looking around in the sparse package, every `toarray` method except
lil's does essentially this:
```python
def toarray(self, order=None, out=None):
    """See the docstring for `spmatrix.toarray`."""
    return self.tocoo(copy=False).toarray(order=order, out=out)
```
The Coordinate list (coo) matrix's `toarray`, in turn, is:
```python
def toarray(self, order=None, out=None):
    """See the docstring for `spmatrix.toarray`."""
    B = self._process_toarray_args(order, out)
    fortran = int(B.flags.f_contiguous)
    if not fortran and not B.flags.c_contiguous:
        raise ValueError("Output array must be C or F contiguous")
    M,N = self.shape
    coo_todense(M, N, self.nnz, self.row, self.col, self.data,
                B.ravel('A'), fortran)
    return B
```
The coo `toarray` calls the `coo_todense` function, which fills the
dense output array `B` with the stored values, but it doesn't support the
bool dtype. `coo_todense` is a C++ function defined in [coo.h](https://github.com/cowlicks/scipy/blob/master/scipy/sparse/sparsetools/coo.h#L106).
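The scatter that `coo_todense` performs can be sketched in Python. This is a rough reading of the idea, not a transcription of `coo.h`: the name `coo_todense_sketch` is hypothetical, and it assumes that duplicate (row, col) entries are summed into the output and that the `fortran` flag selects column-major offsets into the flat `B.ravel('A')` buffer.

```python
import numpy as np

def coo_todense_sketch(M, N, row, col, data, fortran=False):
    """Scatter COO triplets into a new dense M x N array (hypothetical helper)."""
    data = np.asarray(data)
    out = np.zeros((M, N), dtype=data.dtype, order='F' if fortran else 'C')
    # out is contiguous, so ravel('A') returns a flat *view* in memory order;
    # writing to it writes into out, like the raw output pointer in coo.h.
    flat = out.ravel('A')
    for r, c, v in zip(row, col, data):
        # Column-major offset for Fortran order, row-major offset otherwise.
        idx = c * M + r if fortran else r * N + c
        flat[idx] += v  # duplicates accumulate (a logical OR for bool)
    return out
```

Written this way, nothing in the loop is specific to a numeric dtype: NumPy's `+=` on a bool array acts as a logical OR, so duplicate entries simply stay `True`.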
But why didn't lil fail? Looking at its toarray:
```python
def toarray(self, order=None, out=None):
    """See the docstring for `spmatrix.toarray`."""
    d = self._process_toarray_args(order, out)
    for i, row in enumerate(self.rows):
        for pos, j in enumerate(row):
            d[i, j] = self.data[i][pos]
    return d
```
lil's `toarray` doesn't use any of these compiled functions; it fills
the dense array with a plain Python loop. Python is slow and C++ is fast,
so is lil taking a performance hit?
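Before measuring, it helps to see why lil's loop copes with bool at all. Below, the same logic is pulled out of the class into a standalone function; the name `lil_toarray_sketch`, its argument layout and the `vectorize_rows` switch are assumptions for illustration. It only performs NumPy item assignment into an array of the requested dtype, so no compiled, per-dtype routine is involved. The optional branch replaces the inner loop with one fancy-indexed write per row, one obvious way to trim Python overhead.

```python
import numpy as np

def lil_toarray_sketch(shape, rows, data, dtype, vectorize_rows=False):
    """Densify LIL-style storage (hypothetical standalone helper).

    rows[i] lists the column indices stored in row i; data[i] holds the
    values at those positions, in the same order.
    """
    d = np.zeros(shape, dtype=dtype)
    for i, row in enumerate(rows):
        if vectorize_rows:
            if row:  # skip empty rows
                d[i, row] = data[i]  # one fancy-indexed write per row
        else:
            for pos, j in enumerate(row):
                d[i, j] = data[i][pos]  # same element-wise loop as lil
    return d
```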
## lil's .toarray() benchmark
I wrote some [code](https://github.com/cowlicks/scipy-sparse-tests/blob/master/lil_benchmark.py)
to benchmark lil's `.toarray()` performance against the other types.
A typical result for a 3000 by 3000 matrix with around 5 random nonzero
values per row is:
```
$ python lil_benchmark.py
lil
0.0481379032135
dok
0.113062858582
dia
0.530215024948
csr
0.045028924942
csc
0.0446619987488
bsr
0.0462918281555
coo
0.0453071594238
```
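lil is not much slower at all: it is less than 10% slower than csr, csc,
bsr and coo. dok and dia, however, are really slow: dok takes about 2.5
times as long as coo, and dia nearly 12 times as long. Since csr, csc and
bsr also convert to coo before calling `.toarray()` and are still fast,
the extra time is probably spent in the dok-to-coo and dia-to-coo
conversions themselves.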
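For dia, one plausible contributor (an assumption, not something the benchmark measured) is the layout itself: a diagonal format keeps one padded row per distinct offset `col - row`, so randomly scattered nonzeros touch thousands of diagonals and produce millions of stored slots that a conversion has to walk. The sketch below estimates this for the benchmark's shape; `dia_storage_estimate` is a hypothetical helper, the random layout is assumed, and it assumes each stored diagonal is padded to roughly the number of columns, as in SciPy's `dia_matrix` data array.

```python
import numpy as np

def dia_storage_estimate(row, col, n_cols):
    """Estimate slots a DIA layout would store for given COO coordinates.

    Assumes one padded row of length n_cols per distinct offset col - row.
    """
    offsets = np.unique(np.asarray(col) - np.asarray(row))
    return len(offsets), len(offsets) * n_cols

# Benchmark-like shape: 3000 x 3000, about 5 random nonzeros per row.
rng = np.random.default_rng(0)
n = 3000
row = np.repeat(np.arange(n), 5)
col = rng.integers(0, n, size=row.size)
n_diags, slots = dia_storage_estimate(row, col, n)
print(n_diags, slots, slots / row.size)  # diagonals, slots, slots per nonzero
```

None of this touches lil, whose loop only visits the values it actually stores.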
The lil result is promising: maybe the `toarray` methods could be
written in Python, without much of a performance hit, instead of trying
to tack bool support onto the existing C++ code.
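As a rough illustration of that idea (not SciPy code; `coo_toarray_py` is a hypothetical name), a dtype-agnostic dense conversion for coo data can lean on NumPy's `np.add.at`, an unbuffered scatter-add in which repeated coordinates accumulate. For bool arrays NumPy's `add` acts as a logical OR, so the same code handles bool. The timing lines mimic the benchmark's 3000 by 3000, ~5 nonzeros per row setup; the random layout is an assumption and no timings are quoted here, since they depend on the machine.

```python
import timeit
import numpy as np

def coo_toarray_py(shape, row, col, data, dtype=None):
    """Dense array from COO triplets using NumPy only (works for bool)."""
    data = np.asarray(data, dtype=dtype)
    out = np.zeros(shape, dtype=data.dtype)
    # Unbuffered scatter-add: duplicate (row, col) pairs are summed
    # (for bool, NumPy's add is a logical OR).
    np.add.at(out, (np.asarray(row), np.asarray(col)), data)
    return out

# Benchmark-like input: 3000 x 3000 with ~5 random nonzeros per row.
rng = np.random.default_rng(0)
n = 3000
row = np.repeat(np.arange(n), 5)
col = rng.integers(0, n, size=row.size)
vals = np.ones(row.size, dtype=bool)

dense = coo_toarray_py((n, n), row, col, vals)
seconds = timeit.timeit(lambda: coo_toarray_py((n, n), row, col, vals), number=10) / 10
print(dense.dtype, dense.sum(), f"{seconds:.4f} s per call")
```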