Distarray Lightning Talk Slides

Last week I gave a lightning talk at the Austin Python User Group about the Distarray project which I've been working on at Enthought.

I briefly discussed what Distarray is, then compute the Julia set using both NumPy and Distarray. From my abstract:

Distarray is project which aims to allow NumPy arrays and operations to be easily distributed/parallelized, while maintaining a NumPy like interface. This is done by allowing the user to specify the array distribution across processes in array creation routines, these objects are the distarrays. We wrap IPython.parallel to do this.

We leverage PyTrillinos routines for parallel operations. PyTrillinos is being adapted to use the Distributed Array Protocol (another Enthought project). Here I present a demo for computing the Julia set, written 2 different ways. With pure NumPy, and with Distarray.

After the talk I had some interesting discussions about the architectural choices of distarray. In particular: Why did we choose IPython.parallel, and PyTrillinos?

IPython.parallel is problematic because its scaling is limited. Partially because it is built on zeromq (ØMQ), which lacks infiniBand support. Other backends of IPython.parallel have only been tested to scale to 10's of nodes.

From discussions with the team, there are two reasons we chose IPython.parallel.

  • It provides a unparalleled (no pun intended) level of interactivity.
  • Most jobs on supercomputers use less than 10 nodes (citation needed).

I can't speak much to the choice of PyTrillinos over something like PETSc. But PETSc does seem to have more users. In theory we could use PETSc if they decide to implement the distributed array protocol.

My slides:

Xubuntu Tweaks

My I recently upgraded my office computer to Ubuntu 13.04. Rebooting into the Unity environment was a total failure seemingly due to some graphics card/X11 issues. So I installed the xubuntu-desktop package instead of fooling around with graphics drivers/xorg.conf (relevant xkcd).

These are a few tweaks and hotkey changes that I made to match (standard?) Ubuntu hotkeys.

Ctrl + Alt + t to launch terminal

menu > Settings Manager > Keyboard > Application Shortcuts > Add

I chose to use the native xfce4-terminal so just enter that and bind it to your desired hotkey.

Application launcher takes 5 seconds to launch...

menu > Settings Manager > Keyboard > Application Shortcuts

At the bottom is xfrun4. Select it and change the command to xfrun4 --disable-server (source).

Ctrl + Alt + l lock workspace

menu > Settings Manager > Keyboard > Application Shortcuts

Change xflock4 to the desired hotkey.

Ctrl + Alt + Shift + Arrow to Move window to Arrow workspace

menu > Settings Manager > Window Manager > Keyboard

Find Move window to * workspace then bind as desired.

Wrapping Up Google Summer of Code

My Summer has been over for a while, but GSoC is officially ending this week. And what do I have to show?

  • SciPy's sparse matrices now support boolean data.
  • Sparse matrices now support boolean operations (!=, <=, etc.).
  • NumPy universal functions (ufuncs) can now be overridden for compatibility.
  • Sparse matrices uses this ufunc override functionality to do sensible things when acted on by ufuncs.
  • Support fancy indexing.
  • Axis arguments for min and max methods.
  • various other things: speed ups, small features, test coverage, bugs fixes, code clean ups, and a few backports.

But I think that I have benefited personally from this experience much more than SciPy has. I learned so much this summer, a short list of things I know vastly more about than I did before this summer:

  • PYTHON. Jeez looking at code I wrote before this summer is painful.
  • The Python and NumPy C API, before this summer I was a total C noob. Now I can scratch together a python API in C.
  • SciPy.
  • NumPy, numpy.frompyfunc is awesome.
  • Git, before this summer I could commit thing to my own repos and that was pretty much it. Git hooks are awesome.
  • In general, development workflow.

I grown attached to SciPy and am excited I can make meaningful contributions to that are actually helping people. But contributing more will have to wait untill after the GRE, grad school applications, and my senior thesis. There are a few things I would like to add down the road to incorporate into SciPy 1.0. e.g. 64bit indices, change spares matrices to emulate numpy.array rather than numpy.matrix. I would also like to see if I can apply ufunc overrides in a few other places that mix numpy.array and scipy.sparse like scikit-learn.

Here is a link to git log -p upstream/master --author="Blake Griffith" in numpy and scipy.

Fancy Indexing in Sparse Matrices

Recently I've been working on a pull request for fancy indexing in sparse matrices. Most of the work has already been done by Daniel Smith (thanks!). I've been working on fixing some bugs that were causing test failures. And cleaning up the interface.

Below is some data I will be working with throughout this post.

In [1]:
import numpy as np
import scipy.sparse as sp

A = np.asmatrix(np.arange(50).reshape(5, 10))
Asp = sp.csr_matrix(A)

Empty Indexing

Python is pretty forgiving in what it allows for indexing with slices. For example it does not raise an error when the upper bound of a slice is outside of the object. Things like list(range(3))[:9000] are perfectly legal. Here I adopted python's forgiveness when implementing how sparse matrices should be have when indexing should return something empty.

What should happen when indexing should return an empty matrix? With ndarrays, a matrix of "size" 0 is returned. Where is size is all the elements of shape multiplied:

In [2]:
A[1:1], A[1:2:-1]
(matrix([], shape=(0, 10), dtype=int64),
 matrix([], shape=(0, 10), dtype=int64))

However in scipy.sparse there is currently no way to create a size 0 matrix. There is an explicit requirement in the sparse matrix base class that both entries of shape be >= 1. So the next best thing I could do is to return a size 1 matrix with a (1,1) shape and a single zero element. This is currently supported by all sparse matrix formats. The result of fancy indexing is sometimes created using the coordinate sparse matrix format (COO). A small change in its __init__ method now has it return a (1, 1) empty COO matrix when you try to create a COO matrix which would haze size 0. There was also a change to how slices are checked to allow the start and stop arguments to be equal, and if so to return an empty matrix as described above. So now we have:

In [3]:
Asp[1:1], Asp[1:2:-1]
(<1x1 sparse matrix of type '<class 'numpy.int64'>'
	with 0 stored elements in Compressed Sparse Row format>,
 <1x1 sparse matrix of type '<class 'numpy.float64'>'
	with 0 stored elements in Compressed Sparse Row format>)

Sparse Boolean Indexing

I also add a new feature to the fancy indexing. Since Boolean sparse matrices work now, I added support for indexing with them. So this works:

In [4]:
Asp[Asp > 30] = 42
matrix([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
        [30, 42, 42, 42, 42, 42, 42, 42, 42, 42],
        [42, 42, 42, 42, 42, 42, 42, 42, 42, 42]], dtype=int64)

Supporting this was mostly a matter of allowing existing code paths to accept the sparse matrices.

However the boolean indexing is not completely analogous to NumPy. Since sparse matrices cannot be one dimensional. So there is no way to do something like.

In [5]:
bool_vector = np.random.randint(2, size=5).astype(np.bool)
matrix([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
        [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])

CSC's .nonzero method

There was still a bit of work to do with CSC. The CSC __getitem__ was simplified to just two cases. The first is the case where a sparse matrix is returned, the second is the case where a sequence of elements is returned. However there was still a bug which was causing problems. The csc_matrix.nonzero method was return its indices sorted incorrectly. NumPy states that ndarray.nonzero should sort it's indices C-style, with the last entry varying the fastest. However the indices in this case were sorted with the first index varying the fastest.

I fixed this by defining a .nonzero method in the csc_matrix class to override the method defined in the base class (spmatrix). I tried to fix this in a few other ways. Like by resorting the indices in _cs_matrix.tocoo which is used to define the base .nonzero but this caused a lot of round off problems for some reason I could not figure out.

Week-9 Ufunc Overrides

This week I'm still working on the pull request for ufunc overrides, however I think it is finally nearing completion. The implementation is working as described and all tests are passing. I'm looking forward to working on SciPy again.

The main aspects I've been working on are ufunc overrides method resolution order, and argument normalization.

Method Resolution Order (MRO)

Python functions in certain cases have a special MRO which is a bit more complicated than the default left to right. Whenever ther right argument is a subtype of the left argument its special method corresponding to the calling function is called first. So for example, add(a, b) would normally call a.__add__(b) but if b is a subtype of a, then b.__radd__(a) is called. However there are no examples in the Python standard library of a ternary or greater function from which we can copy. So we have tried to generalize this concept to a higher order.

The MRO algorithm is succinctly stated by @njsmith as:

While there remain untried inputs, pick the leftmost one which does not have a proper subtype still remaining to be checked.

Where inputs are the inputs to the ufunc. Trying an input refers to calling its __numpy_ufunc__ method (if it has one) and seeing that it does not return NotImplemented.

So Basically if a ufunc has three inputs (a, b, as) in that order, where as is a subtype of a. They will be tried in the order b > as > a.

Argument Normalization

Many of NumPy's ufuncs can take a out argument, which specifies where the output of the ufunc is assigned. The out argument can be passed as either a positional, or keyword argument. So naïvely checking a ufuncs arguments for __numpy_ufunc__ and then passing said arguments unaltered is not always sensible. So given nargs - the number of args, and nin - the number of inputs, we can check for an out positional argument and move it to the keyword arguments. The normalized arguments are then passed the __numpy_ufunc__ method.

week-8 More ufunc overrides

This week was spent implementing the API for overriding ufuncs. After a lengthy discussion, I decided that it would be best to proceed with implementing things using a duck typing mechanism, similar to the one Python uses, instead of multi-methods. Nathaniel Smith and Pauli Virtanen had a lot of great suggestions on the interface which has been totally changed since last week. __ufunc_override__ has become __numpy_ufunc__ and is now a method which takes arguments like:

def __numpy_ufunc__(self, ufunc, method, i, inputs, kwargs):


  • ufunc is the ufunc object that was called.
  • method is a string indicating which Ufunc method was called (one of "__call__", "reduce", "reduceat", "accumulate", "outer", "inner").
  • i is the index of self in inputs.
  • inputs is a tuple of the input arguments to the ufunc
  • kwargs is a dictionary containing the optional input arguments of the ufunc. The out argument is always contained in kwargs, if given.

The implementation is best described by the NEP:

  • If one of the input arguments implements __numpy_ufunc__ it is executed instead of the Ufunc.

  • If more than one of the input arguments implements __numpy_ufunc__, they are tried in the following order: subclasses before superclasses, otherwise left to right. The first __numpy_ufunc__ method returning something else than NotImplemented determines the return value of the Ufunc. This is to accommodate Python's own MRO behavior.

  • If all __numpy_ufunc__ methods of the input arguments return NotImplemented, a TypeError is raised.

  • If a __numpy_ufunc__ method raises an error, the error is propagated immediately.


There have been several models working in a limited fashion so far, but here is an example. I added this to scipy/sparse/base.py

def __numpy_ufunc__(*args, **kwargs): self = args[0] ufunc = args[1] method = args[2] i = args[3] ufunc_args = args[4] if ufunc == np.multiply and method == "__call__": other_args = tuple(j for j in ufunc_args if j is not ufunc_args[i]) return self.multiply(*other_args, **kwds) else: return NotImplemented

So __numpy_ufunc__ is now a method on all sparse matrices.

In [21]:
import scipy.sparse as sp
import numpy as np

a = np.random.randint(5, size=(3,3))
b = np.random.randint(5, size=(3,3))

asp = sp.csr_matrix(a)
bsp = sp.csr_matrix(b)
In [22]:
array([[ 2,  0, 12],
       [ 0,  0, 12],
       [ 3,  0,  9]])
In [23]:
np.multiply(asp, bsp).todense()
matrix([[ 2,  0, 12],
        [ 0,  0, 12],
        [ 3,  0,  9]], dtype=int64)
In [24]:
matrix([[ 2,  0, 12],
        [ 0,  0, 12],
        [ 3,  0,  9]], dtype=int64)
In [25]:
np.multiply(a, bsp)
matrix([[ 2,  0, 12],
        [ 0,  0, 12],
        [ 3,  0,  9]], dtype=int64)

week-7 Adding Ufunc Overrides to NumPy

This week was spent adding the Ufunc override functionality to NumPy. There is a pull request, a lot of review still needs to be done. The current implementation is still subject to change, and docs and tests need to be added. But I feel much more comfortable with ufuncs inner workings and the Python/NumPy C API now.

Implementation Changes

The code path of interest here is in numpy/core/src/umath/ufunc_object.c when ufunc_generic_call is called. Here there is a new function which checks the arguments for an overriding function. If it is found, it is called with the arguments.

A few thing which could be changed in the implementation:

  • The name of the attribute. Since numpy.dot is not a ufunc but we still want to override it. __ufunc_override__ might not be the best name.
  • The overriding function should be moved into the array API or ufunc API.
  • The keying of __ufunc_override__. Currently this dictionary is keyed with the __name__ of the ufnucs, however this probably is not unique. I'm currently changing the implementation to be keyed with the callable ufunc function, I'll push this if it is working out alright because it seems more reliable.
  • Coercing the args before passing them to the overriding function. Currently args are passed unchanged to the overriding function, however this can be a problem if the overriding function is a method that expects self to be the first argument. A possible solution is to just change the args to have the overriding arg come first. But this might be problematic for ufuncs which are not associative.
  • ??? I don't have a lot of feedback from the NumPy community yet. There could be a bigger problem I'm missing.

New Ufunc Overhead

Now _find_ufunc_override is called for every ufunc call. For standard cases where the ufunc is operating on ndarrays or scalars the overhead comes from calling:

if (PyArray_CheckExact(obj) || PyArray_IsAnyScalar(obj)) {

On each argument. This adds about 5μs to 10μs, or 4% to 5% decrease in speed. (Tested with %timeit and multiply and add). Either way this seems to be acceptably small. I'm going to try Arink Verma's setup to get a more accurate benchmarks.

week-6 Ufunc Overrides

This week marked the beginning of the next stage of my proposal ambiguously named "NumPy Interactions". The first item scheduled is "Modify NumPy universal functions (ufuncs) to aware of sparse matrices (1 week)" The first thing I realized this wee is that my time allotment was too short. This is going to take a several weeks.

I'll explain the problem sparse matrices have with sparse matrices first, then describe my proposed solution.


Ufunc don't play well with objects that are not derived from ndarrays. There is some limited functionality such as __array_prepare__ and __array_wrap__ but these methods still prohibit you from accessing the array data, or changing it shape (not very useful for coercing sparse matrices into something ufuncable)

Sparse matrices cannot be used with ufuncs reliably, for example:

In [1]:
import numpy as np
import scipy.sparse as sp

a = np.random.randint(5, size=(3,3))
b = np.random.randint(5, size=(3,3))

asp = sp.csr_matrix(a)
bsp = sp.csr_matrix(b)
In [2]:
a, b
(array([[0, 4, 4],
       [1, 3, 2],
       [1, 3, 1]]),
 array([[0, 1, 0],
       [0, 0, 1],
       [4, 0, 1]]))
In [3]:
np.multiply(a, b) # The right answer
array([[0, 4, 0],
       [0, 0, 2],
       [4, 0, 1]])
In [4]:
np.multiply(asp, bsp).todense() # calls __mul__ which does matrix multiplication
matrix([[16,  0,  8],
        [ 8,  1,  5],
        [ 4,  1,  4]], dtype=int64)
In [5]:
np.multiply(a, bsp) # Returns NotImplemented to user, bad! 

Returning NotImplemented to user should not happen. I'm not sure if the blame for this lies in scipy.sparse or numpy, but it should be fixed.

In [6]:
np.multiply(asp, b)
/home/blake/.virtualenvs/scipy3/lib/python3.2/site-packages/scipy/sparse/compressed.py:350: SparseEfficiencyWarning: Comparing sparse matrices using >= and <= is inefficient, try using <, >, or !=, instead.
  "try using <, >, or !=, instead.", SparseEfficiencyWarning)
array([[ <3x3 sparse matrix of type '<class 'numpy.int64'>'
	with 8 stored elements in Compressed Sparse Row format>,
        <3x3 sparse matrix of type '<class 'numpy.int64'>'
	with 8 stored elements in Compressed Sparse Row format>,
        <3x3 sparse matrix of type '<class 'numpy.int64'>'
	with 8 stored elements in Compressed Sparse Row format>],
       [ <3x3 sparse matrix of type '<class 'numpy.int64'>'
	with 8 stored elements in Compressed Sparse Row format>,
        <3x3 sparse matrix of type '<class 'numpy.int64'>'
	with 8 stored elements in Compressed Sparse Row format>,
        <3x3 sparse matrix of type '<class 'numpy.int64'>'
	with 8 stored elements in Compressed Sparse Row format>],
       [ <3x3 sparse matrix of type '<class 'numpy.int64'>'
	with 8 stored elements in Compressed Sparse Row format>,
        <3x3 sparse matrix of type '<class 'numpy.int64'>'
	with 8 stored elements in Compressed Sparse Row format>,
        <3x3 sparse matrix of type '<class 'numpy.int64'>'
	with 8 stored elements in Compressed Sparse Row format>]], dtype=object)

I'm not sure what happened here either, but I think raising TypeError would be preferable.

Proposed Solution

Basically, I want to implement a mechanism that will let arguments override ufuncs. So any object that ufuncs do not work with can define the attribute __ufunc_override__ as a dictionary keyed the name of the ufunc to be overridden, and valued with the callable function which should override it.

Each time a ufunc is run it will check each argument for the __ufunc_override__ attribute. So this will add the overhead of one attribute check for each argument. The logic of calling the correct replacement function will come after this check. So it won't impact performance for normal operations that don't want to override the ufunc.

I prototyped this behavior in python. ufunc_override.py (link) defines a decorator that preforms the logic described above. Then all the ufuncs in numpy are decorated with it. So import ufunc_override imports a wrapped version of numpy.

I'm also working with a slightly modified version of scipy.sparse. With this added to the base class.

__ufunc_override__ = {'multiply':multiply}

The relavant code on this branch is here.

A little demo.

In [7]:
import ufunc_override as uo
In [8]:
{'multiply': <function scipy.sparse.base.multiply>}
In [9]:
uo.multiply(asp, bsp).todense()
matrix([[0, 4, 0],
        [0, 0, 2],
        [4, 0, 1]], dtype=int64)
In [10]:
uo.multiply(a, bsp)
matrix([[0, 4, 0],
        [0, 0, 2],
        [4, 0, 1]], dtype=int64)
In [11]:
uo.multiply(asp, b)
matrix([[0, 4, 0],
        [0, 0, 2],
        [4, 0, 1]], dtype=int64)

It works! How about we try with something a little more basic too, a bare bones class that should override the add ufunc.

In [12]:
class DemoClass(object):
    def replacement_func(*args):
        return 42 # The answer.
    __array_priority__= 66 # Some number that's > 10.1
foo = DemoClass()
In [13]:
uo.add(a, foo)
In [14]:
uo.add(asp, foo)


I recently sent a pull request for adding sparse matrix boolean comparisons to SciPy. However when comparing a sparse matrix with dense matrix, with the dense matrix on the LHS, the behavior was unpredictable. This is because the __array_priority__ was not being respected. Which was fixed in NumPy by this pull request #3324 (Thanks nouiz!). This will be included in NumPy 1.8, but I would like to have it in the current stable version of NumPy, so that I can use it in my work with SciPy. So I am backporting it.

I thought this backporting process was a insightful demonstration of slightly more advanced git topics. So I'm writing this walkthrough.

Basically we make a branch off of numpy/maintenance/1.7.x, cherry pick a commit from numpy/master then submit a pull request back to numpy/maintenance/1.7.x.

  1. Assuming you already have a fork of Numpy on github. We need to update it from upstream.

    # Add upstream.
    git remote add upstream https://github.com/numpy/numpy.git
    # Get the latest updates.
    git fetch upstream
    # Make sure you are on master.
    git checkout master
    # Apply the updates locally.
    git rebase upstream/master
    # Push the updated code to your github repo.
    git push origin
  2. Next we need to make the branch we will work on. This needs to be based on the older version of numpy (not master).

    # Make a new branch based of numpy/maintenance/1.7.x, 
    # backport-3324 is our new name for the branch.
    git checkout -b backport-3324 upstream/maintenance/1.7.x
  3. Now we need to apply the changes from master to this branch using cherry-pick

    # This pull request included commits aa7a047 to c098283 (inclusive)
    # so use the .. syntax, the ^ makes the range inclusive.
    git cherry-pick aa7a047^..c098283
    # Fix any conflicts then if needed.
    git cherry-pick --continue
  4. I ran into some conflicts cherry picking here. These are resolved the same as with merge/rebase conflicts. Except here you can use git blame to see the difference between master and the backported branch to make sure nothing gets screwed up.

  5. Push your new branch to your git hub repo.

    git push -u origin backport-3324

Finally make your pull request using github.com. My pull request is here.

I wonder how long it will take for Travis ci to update it version of NumPy for testing SciPy?