week-6 Ufunc Overrides

This week marked the beginning of the next stage of my proposal ambiguously named "NumPy Interactions". The first item scheduled is "Modify NumPy universal functions (ufuncs) to aware of sparse matrices (1 week)" The first thing I realized this wee is that my time allotment was too short. This is going to take a several weeks.

I'll explain the problem sparse matrices have with sparse matrices first, then describe my proposed solution.

Problem

Ufunc don't play well with objects that are not derived from ndarrays. There is some limited functionality such as __array_prepare__ and __array_wrap__ but these methods still prohibit you from accessing the array data, or changing it shape (not very useful for coercing sparse matrices into something ufuncable)

Sparse matrices cannot be used with ufuncs reliably, for example:

In [1]:
import numpy as np
import scipy.sparse as sp

a = np.random.randint(5, size=(3,3))
b = np.random.randint(5, size=(3,3))

asp = sp.csr_matrix(a)
bsp = sp.csr_matrix(b)
In [2]:
a, b
Out[2]:
(array([[0, 4, 4],
       [1, 3, 2],
       [1, 3, 1]]),
 array([[0, 1, 0],
       [0, 0, 1],
       [4, 0, 1]]))
In [3]:
np.multiply(a, b) # The right answer
Out[3]:
array([[0, 4, 0],
       [0, 0, 2],
       [4, 0, 1]])
In [4]:
np.multiply(asp, bsp).todense() # calls __mul__ which does matrix multiplication
Out[4]:
matrix([[16,  0,  8],
        [ 8,  1,  5],
        [ 4,  1,  4]], dtype=int64)
In [5]:
np.multiply(a, bsp) # Returns NotImplemented to user, bad! 
Out[5]:
NotImplemented

Returning NotImplemented to user should not happen. I'm not sure if the blame for this lies in scipy.sparse or numpy, but it should be fixed.

In [6]:
np.multiply(asp, b)
/home/blake/.virtualenvs/scipy3/lib/python3.2/site-packages/scipy/sparse/compressed.py:350: SparseEfficiencyWarning: Comparing sparse matrices using >= and <= is inefficient, try using <, >, or !=, instead.
  "try using <, >, or !=, instead.", SparseEfficiencyWarning)
Out[6]:
array([[ <3x3 sparse matrix of type '<class 'numpy.int64'>'
	with 8 stored elements in Compressed Sparse Row format>,
        <3x3 sparse matrix of type '<class 'numpy.int64'>'
	with 8 stored elements in Compressed Sparse Row format>,
        <3x3 sparse matrix of type '<class 'numpy.int64'>'
	with 8 stored elements in Compressed Sparse Row format>],
       [ <3x3 sparse matrix of type '<class 'numpy.int64'>'
	with 8 stored elements in Compressed Sparse Row format>,
        <3x3 sparse matrix of type '<class 'numpy.int64'>'
	with 8 stored elements in Compressed Sparse Row format>,
        <3x3 sparse matrix of type '<class 'numpy.int64'>'
	with 8 stored elements in Compressed Sparse Row format>],
       [ <3x3 sparse matrix of type '<class 'numpy.int64'>'
	with 8 stored elements in Compressed Sparse Row format>,
        <3x3 sparse matrix of type '<class 'numpy.int64'>'
	with 8 stored elements in Compressed Sparse Row format>,
        <3x3 sparse matrix of type '<class 'numpy.int64'>'
	with 8 stored elements in Compressed Sparse Row format>]], dtype=object)

I'm not sure what happened here either, but I think raising TypeError would be preferable.

Proposed Solution

Basically, I want to implement a mechanism that will let arguments override ufuncs. So any object that ufuncs do not work with can define the attribute __ufunc_override__ as a dictionary keyed the name of the ufunc to be overridden, and valued with the callable function which should override it.

Each time a ufunc is run it will check each argument for the __ufunc_override__ attribute. So this will add the overhead of one attribute check for each argument. The logic of calling the correct replacement function will come after this check. So it won't impact performance for normal operations that don't want to override the ufunc.

I prototyped this behavior in python. ufunc_override.py (link) defines a decorator that preforms the logic described above. Then all the ufuncs in numpy are decorated with it. So import ufunc_override imports a wrapped version of numpy.

I'm also working with a slightly modified version of scipy.sparse. With this added to the base class.

__ufunc_override__ = {'multiply':multiply}

The relavant code on this branch is here.

A little demo.

In [7]:
import ufunc_override as uo
In [8]:
asp.__ufunc_override__
Out[8]:
{'multiply': <function scipy.sparse.base.multiply>}
In [9]:
uo.multiply(asp, bsp).todense()
Out[9]:
matrix([[0, 4, 0],
        [0, 0, 2],
        [4, 0, 1]], dtype=int64)
In [10]:
uo.multiply(a, bsp)
Out[10]:
matrix([[0, 4, 0],
        [0, 0, 2],
        [4, 0, 1]], dtype=int64)
In [11]:
uo.multiply(asp, b)
Out[11]:
matrix([[0, 4, 0],
        [0, 0, 2],
        [4, 0, 1]], dtype=int64)

It works! How about we try with something a little more basic too, a bare bones class that should override the add ufunc.

In [12]:
class DemoClass(object):
    def replacement_func(*args):
        return 42 # The answer.
    __array_priority__= 66 # Some number that's > 10.1
    __ufunc_override__={'add':replacement_func}
    
foo = DemoClass()
In [13]:
uo.add(a, foo)
Out[13]:
42
In [14]:
uo.add(asp, foo)
Out[14]:
42

Comments

Comments powered by Disqus