Week 7: Adding Ufunc Overrides to NumPy
This week was spent adding the ufunc override functionality to NumPy. There is now a pull request, but a lot of review still needs to be done. The current implementation is still subject to change, and docs and tests need to be added. But I feel much more comfortable with the inner workings of ufuncs and the Python/NumPy C API now.
Implementation Changes
The code path of interest is in numpy/core/src/umath/ufunc_object.c, when ufunc_generic_call is called. There, a new function, _find_ufunc_override, checks the arguments for an overriding function. If one is found, it is called with the arguments.
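To make the lookup concrete, here is a rough pure-Python sketch of the dispatch. It is not the C implementation. It assumes the override table is a dict stored in an argument's __ufunc_override__ attribute and keyed by the ufunc's __name__, as described below. find_ufunc_override, call_ufunc and Tagged are hypothetical names, and the attribute is only honored here because this wrapper reads it.

```python
import numpy as np


def find_ufunc_override(ufunc, args):
    """Return the override registered for `ufunc` on the first argument that has one.

    Hypothetical stand-in for the C-level check: the table is assumed to be a
    dict in the argument's `__ufunc_override__` attribute, keyed by ufunc name.
    """
    for arg in args:
        table = getattr(arg, "__ufunc_override__", None)
        if table is not None and ufunc.__name__ in table:
            return table[ufunc.__name__]
    return None


def call_ufunc(ufunc, *args):
    """Hypothetical wrapper: call the override if one is found, else the ufunc."""
    override = find_ufunc_override(ufunc, args)
    if override is not None:
        # Args are passed through unchanged, in their original order.
        return override(*args)
    return ufunc(*args)


class Tagged:
    """Toy class that claims `multiply` for itself."""
    __ufunc_override__ = {"multiply": lambda a, b: "Tagged.multiply called"}


t = Tagged()
print(call_ufunc(np.multiply, t, 2))  # Tagged.multiply called
print(call_ufunc(np.multiply, 2, t))  # Tagged.multiply called
print(call_ufunc(np.multiply, 3, 4))  # 12
```

The override fires whichever position the Tagged instance occupies, and ordinary arguments fall through to np.multiply.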
A few things that could change in the implementation:
- The name of the attribute. Since numpy.dot is not a ufunc but we still want to override it, __ufunc_override__ might not be the best name.
- The overriding function should be moved into the array API or ufunc API.
- The keying of __ufunc_override__. Currently this dictionary is keyed with the __name__ of the ufuncs, but names are probably not unique. I'm changing the implementation to key it with the callable ufunc itself instead; I'll push this if it works out, because it seems more reliable.
- Coercing the args before passing them to the overriding function. Currently the args are passed to the overriding function unchanged, which can be a problem if the overriding function is a method that expects self to be the first argument. One possible solution is to reorder the args so the overriding arg comes first, but this might be problematic for ufuncs that are not commutative, such as subtract.
- ??? I don't have a lot of feedback from the NumPy community yet, so there could be a bigger problem I'm missing.
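Two of these points are easy to see in a few lines of plain Python. The sketch is illustrative only: ordinary functions stand in for ufuncs, and make_add and move_to_front are hypothetical helpers. The first part builds two distinct callables that share the __name__ "add", so a name-keyed table keeps one entry while a callable-keyed table keeps both. The second part shows that moving the overriding argument to the front maps subtract(x, 5) and subtract(5, x) to the same argument tuple, which is why reordering loses information for a non-commutative ufunc.

```python
def make_add(label):
    """Hypothetical factory: every callable it returns has __name__ == 'add'."""
    def add(a, b):
        return (label, a + b)
    return add


numpy_add = make_add("numpy")
other_add = make_add("some other library")

# Keyed by __name__: the second registration silently overwrites the first.
by_name = {}
by_name[numpy_add.__name__] = "override for numpy add"
by_name[other_add.__name__] = "override for other add"

# Keyed by the callable itself: functions hash by identity, so both survive.
by_callable = {
    numpy_add: "override for numpy add",
    other_add: "override for other add",
}

print(len(by_name), len(by_callable))  # 1 2


def move_to_front(args, overrider):
    """Hypothetical helper: put the overriding argument first, keep the rest in order."""
    i = next(i for i, a in enumerate(args) if a is overrider)
    return (args[i],) + args[:i] + args[i + 1:]


x = object()  # stands in for an object that defines an override
# After reordering, the override cannot tell x - 5 from 5 - x.
print(move_to_front((x, 5), x) == move_to_front((5, x), x))  # True
```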
New Ufunc Overhead
Now _find_ufunc_override is called on every ufunc call. In the standard case, where the ufunc operates on ndarrays or scalars, the overhead comes from running this check on each argument:
    if (PyArray_CheckExact(obj) || PyArray_IsAnyScalar(obj)) {
        continue;
    }
This adds about 5μs to 10μs, a 4% to 5% decrease in speed (measured with %timeit on multiply and add). That seems acceptably small. I'm going to try Arink Verma's setup to get more accurate benchmarks.
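In Python terms, the fast path corresponds roughly to the predicate below. This is an approximation: np.isscalar is only a close match for PyArray_IsAnyScalar, and skip_override_check and MyArray are hypothetical names. It illustrates why the check presumably uses the exact-type test: subclasses of ndarray fail it, so they are still inspected for an override, while plain arrays and scalars skip the attribute lookup entirely.

```python
import numpy as np


def skip_override_check(obj):
    """Rough Python analogue of PyArray_CheckExact(obj) || PyArray_IsAnyScalar(obj)."""
    # `type(...) is` rather than isinstance: ndarray subclasses are not skipped,
    # since they are exactly the kind of object that may carry an override.
    return type(obj) is np.ndarray or np.isscalar(obj)


class MyArray(np.ndarray):
    """Hypothetical subclass that could define an override."""


print(skip_override_check(np.ones(3)))                # True
print(skip_override_check(2.5))                       # True
print(skip_override_check(np.float64(2.5)))           # True
print(skip_override_check(np.ones(3).view(MyArray)))  # False
print(skip_override_check([1, 2, 3]))                 # False
```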