week-7 Adding Ufunc Overrides to NumPy

This week was spent adding the Ufunc override functionality to NumPy. There is a pull request, a lot of review still needs to be done. The current implementation is still subject to change, and docs and tests need to be added. But I feel much more comfortable with ufuncs inner workings and the Python/NumPy C API now.

Implementation Changes

The code path of interest here is in numpy/core/src/umath/ufunc_object.c when ufunc_generic_call is called. Here there is a new function which checks the arguments for an overriding function. If it is found, it is called with the arguments.

A few thing which could be changed in the implementation:

  • The name of the attribute. Since numpy.dot is not a ufunc but we still want to override it. __ufunc_override__ might not be the best name.
  • The overriding function should be moved into the array API or ufunc API.
  • The keying of __ufunc_override__. Currently this dictionary is keyed with the __name__ of the ufnucs, however this probably is not unique. I'm currently changing the implementation to be keyed with the callable ufunc function, I'll push this if it is working out alright because it seems more reliable.
  • Coercing the args before passing them to the overriding function. Currently args are passed unchanged to the overriding function, however this can be a problem if the overriding function is a method that expects self to be the first argument. A possible solution is to just change the args to have the overriding arg come first. But this might be problematic for ufuncs which are not associative.
  • ??? I don't have a lot of feedback from the NumPy community yet. There could be a bigger problem I'm missing.

New Ufunc Overhead

Now _find_ufunc_override is called for every ufunc call. For standard cases where the ufunc is operating on ndarrays or scalars the overhead comes from calling:

if (PyArray_CheckExact(obj) || PyArray_IsAnyScalar(obj)) {
    continue;}

On each argument. This adds about 5μs to 10μs, or 4% to 5% decrease in speed. (Tested with %timeit and multiply and add). Either way this seems to be acceptably small. I'm going to try Arink Verma's setup to get a more accurate benchmarks.

Comments

Comments powered by Disqus