Google Summer of Code Proposal Preparation

Improving the sparse matrix package in SciPy

Among other things, I'm preparing a proposal to improve the handling of the bool datatype in SciPy's sparse matrix package. Once these types are consistently codified, improving interactions between spmatrix objects and other NumPy/SciPy types should be easier, so fixing things like Trac ticket #1598 would come next.

So first I would need to write a specification for handling bools with other spmatrix objects and other kinds of objects like ndarrays. But what is wrong now? Here is one thing.

Bool problems: Making sparse bool matrices

Not every sparse matrix format supports bool dtypes. Try instantiating any class which inherits from _cs_matrix with an ndarray of bool dtype, and it will upcast it to int8. See my comments on Trac ticket #1533.

In [1]: import numpy as np

In [2]: import scipy.sparse as sp

In [3]: A = np.array([[True, False],[False, True]], dtype=bool)

In [4]: B = sp.csr_matrix(A)

In [5]: B.data
Out[5]: array([1, 1], dtype=int8)

But we can get a spmatrix with a bool dtype if we pass the kwarg dtype=bool:

In [6]: C = sp.csr_matrix(A, dtype=bool)

In [7]: C.data
Out[7]: array([ True,  True], dtype=bool)
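To see how far this behaviour extends beyond csr_matrix, the two constructions above can be repeated for each sparse format. The following is only a rough sketch: the helper names stored_dtype and bool_dtype_report are made up for illustration, and the dtypes it reports depend on the installed SciPy version, so no particular output is assumed.

```python
import numpy as np
import scipy.sparse as sp

# Format prefixes of the sparse matrix classes, e.g. "csr" -> sp.csr_matrix.
FORMATS = ("bsr", "coo", "csc", "csr", "dia", "dok", "lil")


def stored_dtype(cls, dense, **kwargs):
    """Return the dtype name of cls(dense, **kwargs), or the error name if it fails."""
    try:
        return str(cls(dense, **kwargs).dtype)
    except Exception as exc:  # report the failure instead of stopping the survey
        return type(exc).__name__


def bool_dtype_report():
    """Map each format to (dtype without the kwarg, dtype with dtype=bool)."""
    A = np.array([[True, False], [False, True]], dtype=bool)
    report = {}
    for name in FORMATS:
        cls = getattr(sp, name + "_matrix")
        report[name] = (stored_dtype(cls, A), stored_dtype(cls, A, dtype=bool))
    return report


if __name__ == "__main__":
    for name, (default, explicit) in bool_dtype_report().items():
        print(name, "default:", default, "| dtype=bool:", explicit)
```

A table like this could serve as a starting point for the specification: every row where the two columns differ is a format that does not keep bool by default.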

This seems inconsistent; but now we have other problems.

In [8]: C.toarray()

...

    171     """
--> 172     return _coo.coo_todense(*args)
    173 
    174 def coo_matvec(*args):


TypeError: Array of type 'byte' required.  Array of type 'bool' given

I think the type is upcast here because there is no support for bool types in the coo_todense function. This is defined in coo.h, where the element type is given by the class T, which apparently cannot be bool.

So to add support for bool, this class T needs to have support for it added. Once that is done the toarray() method should work with dtype==bool.
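Until that support exists, one possible workaround is to go through int8 (the type the error message asks for) and cast the dense result back to bool. This is a minimal sketch, not a fix: the helper name bool_toarray is hypothetical, and it assumes that astype works on a bool-typed sparse matrix in the SciPy version at hand.

```python
import numpy as np
import scipy.sparse as sp


def bool_toarray(m):
    """Densify a sparse matrix via int8, then cast the dense array to bool."""
    # Assumption: astype() itself accepts a bool-typed sparse matrix.
    return m.astype(np.int8).toarray().astype(bool)


A = np.array([[True, False], [False, True]], dtype=bool)
C = sp.csr_matrix(A, dtype=bool)
dense = bool_toarray(C)
```

The round trip costs an extra copy of the data, which is why native bool support in the underlying routine would still be the better solution.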

I don't yet understand how this SWIG wrapper works. I'm not sure where this class T is coming from. Any comments would be helpful.
