week 4 - Sparse Matrix Inequalities

This week I worked on adding support for inequalities to sparse matrices. The pull request is here (still pending). I also modified, the c++ routines used to produce a boolean output, pull request here. This is so these inequalities don't have to be cast to bool in python. Once both of these are accepted the inequalities will be using the c++ routines that produce boolean output.

These are separate pull requests but they are related and one will need to be rebased once the other is accepted. Or I may combine them.

Each of the four basic inequalities has its own particular quirks which make it more or less efficient with sparse matrices. Here we consider A to be a sparse matrix. The cases where B is a scalar, or sparse are considered. (By scalar we effectively mean a matrix the same size as A with every element as the scalar value B.

A < B

For scalar B, this operation is only efficient if B < 0 otherwise the resulting matrix will be dense.

For sparse A and B this is also efficient.

A > B

For scalar B the efficiency is the opposite of the less than operator. That is B > 0 has efficient output. Similarly to < this operation is efficient with other sparse matrices.

A <= B

This operation is pretty much useless. The only case which make this efficient is when B < 0. But this case is already covered by <, and < can be used efficiently with sparse matrices. So why use <=? I'll discuss the pros and cons of having these non-strict inequalities in a moment.

Another disadvantage of non strict inequalities like these is that they raies NotImplementedError when comparing with 0. This is because of the nature of the c++ routines. When comparing every pair of elements they don't consider cases where both are zero.

A >= B

Like <= this operation is only efficient for one case, when B > 0. And also like <=, it is not very useful.

Why have non strict inequalities?

I can't really think of uses for >= and <= but they might exist, I added them because they were easy to implement once I had done > and <. In practice, they don't slow the usage down. But removing non - strict inequalities removes 24,990 lines of code.

Implementation

The inequality operations are implemented as c++ routines, these are wrapped to provide a python interface using SWIG, then the various inequalities are added appropriately.

C++

The inequalities routines were implemented by reusing existing routines like csr_binop_csr. Overrides for inequalities had to be added to complex_wrapper.h too.

The binop routines had to altered, another class was added to their templates, T2, for the type of data out. In theory this could be something other than bool.

SWIG

To handle boolean output data a new swig macro was added to create typemaps that have npy_bool_wrapper as this output class. I then use these to instantiate the boolean operations.

Python

Here the associated python special methods corresponding to the inequalities were added in compressed.py. These operations can be used with a scalar, dense, or sparse matrix. Numpy style broadcasting is not implemented.

To produce a bool output by default for boolean comparisons, I altered the _binopt function to pass the c++ routines a matrix to use for output with a bool dtype.

Problems

When testing the inequalities with dense matrices, I could not get the proper behavior with a dense matrix on the left hand side. Previously, with != and == I modified the __bool__ method but that did not work in this case. This is because of the same problem I will be dealing with in the next stage of my proposal, interactions with numpy ufuncs.

Comments

Comments powered by Disqus