I spent this week tying up loose ends relating to the sparse package.
# DeprecationWarning
A recent
[change](https://github.com/numpy/numpy/commit/a9a470c841eeb5f0fb2c2ae9639f6c2833f03d00) to NumPy caused a `DeprecationWarning` to be raised
whenever there was (potentially) implicit casting between dtypes:

    DeprecationWarning: Implicitly casting between incompatible kinds.
    In a future numpy release, this will raise an error. Use
    casting="unsafe" if this is intentional.

This was happening a lot in the sparse test suite, where in-place
division and multiplication are tested. Since this behavior is being
deprecated, I removed the tests for the appropriate cases. But not
without some trouble first.
Recall that in Python 3 `/` is always true division, so there is always the
potential to change the type from `int` to `float`. However, because these
operations are in-place, the result can end up with the same type as the
input (for example `int`). In cases like this, where there is no difference
between the input and result types, NumPy will still raise this warning. So
simply checking for a type difference between input and result was not enough to
programmatically remove the deprecated tests (which I noticed when my
first patch did not work). Eventually I just added special cases to the
tests for different data.
# Output Type for SWIG Routines
The compressed types of sparse matrices (BSR, CSC, CSR) each have a set
of routines for performing binary operations (binop) between two sparse
matrices. These routines, however, only returned data of the same
type as the input data, so the results of boolean operations had to be
cast to `bool` at the Python level. I modified the sparsetools routines
to output bool data where appropriate.
## Implementation
This involved changes at every level of the codebase (C++, SWIG, Python), so it is a good
demonstration of how these levels work together.
### C++
The binop routines are implemented as function templates in C++, where
the template arguments are (including the new one I added):
* `I`: Always `int` for storing `nnz` (number of non-zeros), vector length
etc.
* `T`: The input data type.
* `T2`: The output data type, which I added.
Adding this new argument required minimal changes to the existing code.
Basically I just changed `T` to `T2` in a few places.
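
To make the role of `T2` concrete, here is a minimal sketch of a function template with separate input and output types. It is a simplified, dense stand-in and not the actual sparsetools code: the name `elementwise_binop` and the extra `BinOp` template argument are assumptions made for illustration.

```cpp
#include <functional>

// Simplified, hypothetical stand-in for a sparsetools binop routine.
// I  : index type (int in sparsetools)
// T  : input data type
// T2 : output data type (may differ from T, e.g. bool for comparisons)
template <class I, class T, class T2, class BinOp>
void elementwise_binop(const I n, const T a[], const T b[], T2 out[],
                       const BinOp& op)
{
    for (I i = 0; i < n; ++i) {
        out[i] = op(a[i], b[i]);  // the result is stored as T2, not T
    }
}
```

Calling it with `std::not_equal_to<int>()` and a `bool` output array deduces `T2 = bool`, while calling it with `std::plus<int>()` and an `int` output array keeps `T2 = int`, so the same template covers both cases.
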
Since the output type used here was `bool`, I also had to add a
conversion from the complex wrapper class to `bool`.
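
One way such a conversion could look is sketched below. The struct `complex_to_bool_sketch` is hypothetical and is not the wrapper class from `complex_ops.h`. It also assumes C++11 for `explicit operator bool`, which the real code may not use.

```cpp
// Hypothetical, minimal complex wrapper; not the class from complex_ops.h.
template <class c_type>
struct complex_to_bool_sketch {
    c_type real;
    c_type imag;

    // A complex number converts to true unless both parts are zero.
    explicit operator bool() const {
        return real != c_type(0) || imag != c_type(0);
    }
};
```

With this in place, `static_cast<bool>(z)` is true for any value with a non-zero real or imaginary part.
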
### SWIG
At the SWIG level, typemaps define the types of the function templates'
inputs and outputs. I had to add new typemaps to instantiate
functions with boolean output.
### Python
Since the new binop routines now output the correct dtype, I removed all
the casting to bool. However, in the cases where boolean data must be
returned, Python has to pass an empty matrix with the correct dtype to
the binop routines. So `_binopt` now checks what the operation is to
decide what kind of empty matrix it should create.
# Sparse Bool Wrapper
[Pull request](https://github.com/scipy/scipy/pull/2607). Previously the C++ routines in sparsetools defined the
bool dtype with:

    typedef npy_int8 npy_bool_wrapper;

This was so that a bool value would be one byte. But since it was stored
as an `int8` type, it would roll over when the underlying integer got to 256,
like this:

    In [2]: a = sp.csr_matrix([True, False])

    In [3]: for _ in range(8):
       ...:     a = a + a
       ...:     print(a.todense())
       ...:
    [[ True False]]
    [[ True False]]
    [[ True False]]
    [[ True False]]
    [[ True False]]
    [[ True False]]
    [[ True False]]
    [[False False]] # <----- !

So I rewrote `npy_bool_wrapper` as a class with one `char` data member
(recall that in C++ `char` is 1 byte) and added the required arithmetic
operator overloads for boolean algebra, e.g. 1 + 1 = 1.
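
The difference between the two approaches can be sketched as follows. Both `double_byte` and `bool_wrapper_sketch` are hypothetical names made up for this illustration, and the class is not the actual `npy_bool_wrapper`. The byte model uses an unsigned type so the wraparound is well-defined in every C++ standard, and `explicit operator bool` assumes C++11.

```cpp
#include <cstdint>

// Old behaviour, modelled with a plain byte: repeated doubling wraps to zero.
inline std::uint8_t double_byte(std::uint8_t x, int times)
{
    for (int i = 0; i < times; ++i) {
        x = static_cast<std::uint8_t>(x + x);  // reduced modulo 256
    }
    return x;
}

// Hypothetical sketch of a one-byte boolean wrapper.
class bool_wrapper_sketch {
public:
    char value;  // sizeof(char) is 1 by definition

    bool_wrapper_sketch(bool x = false) : value(x ? 1 : 0) {}

    explicit operator bool() const { return value != 0; }

    // Addition acts as logical OR, so 1 + 1 = 1.
    bool_wrapper_sketch operator+(const bool_wrapper_sketch& other) const {
        return bool_wrapper_sketch(value != 0 || other.value != 0);
    }

    // Multiplication acts as logical AND.
    bool_wrapper_sketch operator*(const bool_wrapper_sketch& other) const {
        return bool_wrapper_sketch(value != 0 && other.value != 0);
    }
};
```

Doubling the plain byte eight times starting from 1 reaches zero, which matches the `False` in the session above. Adding a true `bool_wrapper_sketch` to itself any number of times stays true.
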
# Build Failures
After I modified `complex_ops.h` in my pull request for sparse matrix
inequalities, people were getting build failures with the Clang and Intel
compilers. This was related to ill-defined overloads of the boolean
comparison operators. As an example, previously we had:

    bool operator !=(const c_type& B) const{
        return npy_type::real != B || npy_type::imag != c_type(0);
    }

Here `c_type` is a template argument used as the type of the
underlying `real` and `imag` numbers, so comparisons with things that
were not `c_type` were ambiguous. I therefore redefined the boolean
comparisons as template functions like:

    template <class T>
    bool operator !=(const T& B) const{
        return npy_type::real != B || npy_type::imag != T(0);
    }

This should handle all comparisons, within reason.
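
As a self-contained illustration of the templated comparison, here is a minimal sketch. The struct `complex_cmp_sketch` is hypothetical and stores `real` and `imag` directly instead of inheriting them from an `npy_type` base, so it only approximates the real wrapper.

```cpp
// Hypothetical, minimal complex wrapper used only to show the templated
// comparison; not the class from complex_ops.h.
template <class c_type>
struct complex_cmp_sketch {
    c_type real;
    c_type imag;

    // T is deduced from the right-hand side, so each scalar type gets an
    // exact match instead of relying on competing implicit conversions.
    template <class T>
    bool operator!=(const T& B) const {
        return real != B || imag != T(0);
    }
};
```

With `complex_cmp_sketch<double> z{1.0, 0.0}`, the expressions `z != 1`, `z != 1.0f` and `z != 2L` all compile, with `T` deduced as `int`, `float` and `long` respectively.
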