This week I worked on adding support for inequalities to sparse
matrices. The pull request is
[here](https://github.com/scipy/scipy/pull/2586) (still pending).
I also modified the C++ routines to produce boolean output (pull
request [here](https://github.com/scipy/scipy/pull/2596)). This is so
these inequalities don't have to be cast to bool in Python. Once both of
these are accepted, the inequalities will use the C++ routines that
produce boolean output.
These are separate pull requests, but they are related, and one will need
to be rebased once the other is accepted. Alternatively, I may combine them.
Each of the four basic inequalities has its own particular quirks which
make it more or less efficient with sparse matrices. Here we consider
`A` to be a sparse matrix, and `B` to be either a scalar or another
sparse matrix. (By scalar we effectively mean a matrix the same size as
`A` with every element equal to the scalar value `B`.)
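
The quirks in the sections below all come down to how the unstored zeros of `A` compare against `B`. As a minimal sketch of that rule (the helper `scalar_result_is_sparse` is hypothetical and not part of scipy), the output can only stay sparse when comparing `0` with the scalar gives `False`:

```python
import operator

# Hypothetical helper, not part of scipy: predicts whether comparing a
# sparse matrix against a scalar can give a sparse result.
def scalar_result_is_sparse(op, scalar):
    """Return True when the implicit zeros compare False against `scalar`."""
    # Every unstored entry of A is 0, so op(0, scalar) is the value that
    # fills most of the output. If it is True, the output is dense.
    return not op(0, scalar)

comparisons = [("<", operator.lt), (">", operator.gt),
               ("<=", operator.le), (">=", operator.ge)]

for name, op in comparisons:
    for scalar in (-1, 1):
        verdict = "sparse" if scalar_result_is_sparse(op, scalar) else "dense"
        print(f"A {name} {scalar}: {verdict}")
```
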
### `A < B`
For scalar `B`, this operation is only efficient if `B < 0`; otherwise the
resulting matrix will be dense.
When `B` is sparse, this operation is also efficient.
### `A > B`
For scalar `B`, the efficiency is the opposite of the less-than operator.
That is, `B > 0` gives efficient output. Similarly to `<`, this operation is
efficient with other sparse matrices.
### `A <= B`
This operation is pretty much useless. The only case which makes this
efficient is when `B < 0`. But this case is already covered by `<`, and
`<` can be used efficiently with sparse matrices. So why use `<=`? I'll
discuss the pros and cons of having these non-strict inequalities in a
moment.
Another disadvantage of non-strict inequalities like these is that they
raise `NotImplementedError` when comparing with 0. This is because of
the nature of the C++ routines. When comparing every pair of elements,
they don't consider cases where both are zero.
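
To make the both-zero problem concrete, here is a toy sketch in plain Python. It is not the actual C++ code: `compare_rows` and the `{column: value}` row layout are assumptions made for illustration. Only columns stored in at least one operand are visited, which is fine for `<` but loses the `True` that `0 <= 0` should produce:

```python
import operator

def compare_rows(row_a, row_b, op):
    """Compare two sparse rows stored as {column: value} dicts.

    Toy stand-in for a sparse binop routine: only columns stored in at
    least one operand are visited, so columns where both entries are
    zero are never passed to `op`.
    """
    result = {}
    for col in sorted(set(row_a) | set(row_b)):
        if op(row_a.get(col, 0), row_b.get(col, 0)):
            result[col] = True  # only True entries are stored
    return result

row_a = {0: 1.0, 3: -2.0}  # dense form: [1, 0, 0, -2]
row_b = {0: 5.0, 1: 4.0}   # dense form: [5, 4, 0,  0]

strict = compare_rows(row_a, row_b, operator.lt)
non_strict = compare_rows(row_a, row_b, operator.le)

print(strict)      # column 2 is absent, which is right: 0 < 0 is False
print(non_strict)  # column 2 is absent, which is wrong: 0 <= 0 is True
```

In this sketch the two results are identical, even though the dense answer for `<=` has an extra `True` in column 2.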
### `A >= B`
Like `<=` this operation is only efficient for one case, when `B > 0`.
And also like `<=`, it is not very useful.
## Why have non-strict inequalities?
I can't really think of uses for `>=` and `<=`, but they might exist. I
added them because they were easy to implement once I had done `>` and
`<`. In practice, they don't slow the usage down. But removing the
non-strict inequalities removes _24,990_ lines of code.
# Implementation
The inequality operations are implemented as C++ routines. These are
wrapped with SWIG to provide a Python interface, and the corresponding
Python special methods are then added on top.
## C++
The inequality routines were implemented by reusing existing routines
like [`csr_binop_csr`](). Overrides for inequalities had to be added to the `complex_wrapper` class in
[`complex_ops.h`](https://github.com/cowlicks/scipy/blob/master/scipy/sparse/sparsetools/complex_ops.h) too.
The binop routines had to be altered: another template parameter, `T2`,
was added for the type of the output data. In theory this could be
something other than bool.
## SWIG
To handle boolean output data, a new [SWIG
macro](https://github.com/cowlicks/scipy/blob/binop-output/scipy/sparse/sparsetools/sparsetools.i#L203)
was added to create typemaps that have `npy_bool_wrapper` as the output
class. I then use these to instantiate the boolean operations.
## Python
Here the Python special methods corresponding to the
inequalities were added in
[`compressed.py`](https://github.com/cowlicks/scipy/blob/inequalities-override/scipy/sparse/compressed.py#L231).
These operations can be used with a scalar, dense, or sparse matrix.
NumPy-style broadcasting is not implemented.
To produce a bool output by default for boolean comparisons, I altered
the
[`_binopt`](https://github.com/cowlicks/scipy/blob/binop-output/scipy/sparse/compressed.py#L755)
function to pass the C++ routines an output matrix with a bool
dtype.
# Problems
When testing the inequalities with dense matrices, I could not get the proper
behavior with a dense matrix on the left-hand side. Previously, with `!=` and
`==`, I modified the `__bool__` method, but that did not work in this case. This
is because of the same problem I will be dealing with in the next stage of my
proposal: interactions with NumPy ufuncs.
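
As a rough illustration of why the left-hand operand matters, here is a toy sketch in plain Python. `ToySparse` and `ToyDense` are hypothetical stand-ins, and the sketch assumes the dense operand's comparison accepts any right-hand operand rather than deferring. Python calls the left operand's method first and only tries the reflected method on the right operand if the left one returns `NotImplemented`:

```python
class ToySparse:
    """Stand-in for a sparse matrix that knows how to compare itself."""

    def __lt__(self, other):
        return "handled by ToySparse.__lt__"

    def __gt__(self, other):
        # Would be used for `dense < sparse` if the dense side deferred.
        return "handled by ToySparse.__gt__"


class ToyDense:
    """Stand-in for a dense array whose comparison accepts any operand."""

    def __lt__(self, other):
        # Never returns NotImplemented, so Python never tries the
        # reflected method other.__gt__(self).
        return "handled by ToyDense.__lt__"


sparse_left = ToySparse() < ToyDense()
dense_left = ToyDense() < ToySparse()

print(sparse_left)  # the sparse class controls the result
print(dense_left)   # the dense class controls the result
```

With the dense operand on the left, `ToySparse.__gt__` is never reached in this sketch, so nothing the sparse class defines can change the outcome.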