<!-- 
.. title: .toarray()
.. slug: scipysparsetoarray
.. date: 2013/04/21 23:32:19
.. tags: 
.. link: 
.. description: 
-->

I wrote some [tests](https://github.com/cowlicks/scipy-sparse-tests/blob/master/bools_suck.py) to look a little further into
[trac ticket #1533](http://projects.scipy.org/scipy/ticket/1533). I
wanted to know which sparse matrix types fail on `.toarray()` when their
dtype is bool. Surprisingly, the only type that passed was the Lists of Lists
(lil) type. Why is this?
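
A check along these lines can be sketched as follows. This is not the linked test file, just a minimal probe; the helper name `bool_toarray_report` is made up for illustration, and the outcome depends on the installed SciPy version (releases with bool support in the compiled routines should pass everywhere).

```python
import numpy as np
import scipy.sparse as sp

FORMATS = ["bsr", "coo", "csc", "csr", "dia", "dok", "lil"]

def bool_toarray_report():
    """Map each format name to True if a bool matrix survives .toarray()."""
    expected = np.eye(4, dtype=bool)
    report = {}
    for fmt in FORMATS:
        try:
            dense = sp.coo_matrix(expected).asformat(fmt).toarray()
            ok = dense.dtype == np.bool_ and (dense == expected).all()
            report[fmt] = bool(ok)
        except Exception:
            # Any failure (in the conversion or in toarray) counts as "not passing".
            report[fmt] = False
    return report

print(bool_toarray_report())
```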

## Lists of Lists
Looking around in the sparse package, every toarray method except for
lil's basically does this:

    def toarray(self, order=None, out=None):
        """See the docstring for `spmatrix.toarray`."""
        return self.tocoo(copy=False).toarray(order=order, out=out)

The Coordinate list (coo) matrix's toarray is:
    
    def toarray(self, order=None, out=None):
        """See the docstring for `spmatrix.toarray`."""
        B = self._process_toarray_args(order, out)
        fortran = int(B.flags.f_contiguous)
        if not fortran and not B.flags.c_contiguous:
            raise ValueError("Output array must be C or F contiguous")
        M,N = self.shape
        coo_todense(M, N, self.nnz, self.row, self.col, self.data,
                    B.ravel('A'), fortran)
        return B

The coo toarray calls the `coo_todense` function, which just fills a
dense matrix with the data, but it doesn't support the bool dtype. It
is a C function defined in [coo.h](https://github.com/cowlicks/scipy/blob/master/scipy/sparse/sparsetools/coo.h#L106).
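
To make the idea concrete, here is a rough pure-Python sketch of what a routine like `coo_todense` does. It is not a translation of coo.h: the name `coo_todense_sketch` is invented, and it assumes the routine writes into a flat buffer (like `B.ravel('A')` above), picks the index from the `fortran` flag, and sums duplicate entries.

```python
def coo_todense_sketch(n_row, n_col, rows, cols, values, fortran=False):
    """Scatter COO triples into a flat dense buffer, summing duplicates."""
    dense = [0] * (n_row * n_col)
    for i, j, v in zip(rows, cols, values):
        # C order stores rows contiguously; Fortran order stores columns.
        index = n_row * j + i if fortran else n_col * i + j
        dense[index] += v
    return dense

# A 2x3 matrix with a duplicate entry at (1, 0).
flat_c = coo_todense_sketch(2, 3, [0, 1, 1], [2, 0, 0], [5, 1, 2])
flat_f = coo_todense_sketch(2, 3, [0, 1, 1], [2, 0, 0], [5, 1, 2], fortran=True)
print(flat_c)  # [0, 0, 5, 3, 0, 0]
print(flat_f)  # [0, 3, 0, 0, 5, 0]
```

In compiled code the `+=` has to be generated for every supported element type, which is where a missing bool case would show up.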

But why didn't lil fail? Looking at its toarray:

    def toarray(self, order=None, out=None):
        """See the docstring for `spmatrix.toarray`."""
        d = self._process_toarray_args(order, out)
        for i, row in enumerate(self.rows):
            for pos, j in enumerate(row):
                d[i, j] = self.data[i][pos]
        return d
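
The same loop can be tried outside of SciPy. The sketch below assumes the lil layout is one list of column indices and one list of values per row; `lil_style_toarray` is a made-up name. Because each element is assigned through NumPy item assignment, nothing in the loop depends on the dtype.

```python
import numpy as np

def lil_style_toarray(shape, rows, data, dtype):
    """Fill a dense array from per-row column indices and values."""
    dense = np.zeros(shape, dtype=dtype)
    for i, row in enumerate(rows):
        for pos, j in enumerate(row):
            dense[i, j] = data[i][pos]
    return dense

# Row 0 has entries in columns 0 and 2, row 1 is empty, row 2 has column 1.
rows = [[0, 2], [], [1]]
data = [[True, True], [], [True]]
dense = lil_style_toarray((3, 3), rows, data, dtype=bool)
print(dense.dtype)
print(dense)
```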

It is not using any of these C functions. Why not? Python is slow and C
is fast, so is lil taking a performance hit?

## lil's .toarray() benchmark
I wrote some [code](https://github.com/cowlicks/scipy-sparse-tests/blob/master/lil_benchmark.py)
to benchmark lil's `.toarray()` performance against the other types.
A typical result for a 3000 by 3000 matrix with around 5 random nonzero
values per row is:

    $ python lil_benchmark.py
    lil
    0.0481379032135
    dok
    0.113062858582
    dia
    0.530215024948
    csr
    0.045028924942
    csc
    0.0446619987488
    bsr
    0.0462918281555
    coo
    0.0453071594238

It's not much slower at all. But dia and dok are really slow; this
is probably from converting them to coo before calling `.toarray()`.
This is promising: maybe the toarray methods can be written in Python,
instead of trying to tack bool support onto the existing C code, without
much of a performance hit.
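
As a rough idea of what that could look like for coo data, here is a NumPy-only sketch. It is not a proposed patch: `coo_to_dense` is a hypothetical helper, and treating duplicate bool entries with a logical OR (while summing other dtypes) is my own assumption about sensible behaviour.

```python
import numpy as np

def coo_to_dense(shape, row, col, data):
    """Build a dense array from COO triples without compiled sparsetools."""
    row = np.asarray(row, dtype=np.intp)
    col = np.asarray(col, dtype=np.intp)
    data = np.asarray(data)
    dense = np.zeros(shape, dtype=data.dtype)
    if data.dtype == np.bool_:
        # Assumption: duplicate bool entries combine with logical OR.
        np.logical_or.at(dense, (row, col), data)
    else:
        # Unbuffered add, so duplicate coordinates are summed.
        np.add.at(dense, (row, col), data)
    return dense

bools = coo_to_dense((2, 3), [0, 1, 1], [1, 0, 0], [True, False, True])
ints = coo_to_dense((2, 2), [0, 0], [1, 1], [2, 3])
print(bools)
print(ints)
```

Whether something like this keeps up with the C routine on large matrices would need the same kind of benchmark as above.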