Last week I gave a lightning talk at the Austin Python User Group about the DistArray project, which I've been working on at Enthought. I briefly discussed what DistArray is, then computed the Julia set using both NumPy and DistArray. From my abstract:
DistArray is a project that aims to make NumPy arrays and operations easy to distribute and parallelize while maintaining a NumPy-like interface. The user specifies the array distribution across processes in the array creation routines; the resulting objects are the distarrays. We wrap IPython.parallel to do this.
We leverage PyTrilinos routines for parallel operations. PyTrilinos is being adapted to use the Distributed Array Protocol (another Enthought project). Here I present a demo for computing the Julia set, written two different ways: with pure NumPy, and with DistArray.
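To make "specifying the array distribution" concrete, here is a minimal sketch in plain NumPy (deliberately not the DistArray API, whose exact signatures I won't reproduce here) of what a block distribution along axis 0 looks like: each process owns one contiguous slab of the global array.

```python
import numpy as np

def block_distribute(shape, nprocs):
    """Split axis 0 of a global array shape into contiguous blocks,
    one per process, the way a block distribution would."""
    rows = shape[0]
    # np.array_split handles the case where rows % nprocs != 0.
    indices = np.array_split(np.arange(rows), nprocs)
    return [(idx[0], idx[-1] + 1) for idx in indices]  # (start, stop) per process

# A 10x4 global array distributed over 4 hypothetical processes:
global_array = np.arange(40).reshape(10, 4)
blocks = block_distribute(global_array.shape, nprocs=4)
local_arrays = [global_array[start:stop] for start, stop in blocks]
```

The point is that the global index space is partitioned once, up front, and every process can compute which slab it owns from the global shape alone; block and cyclic distributions are just different partitioning rules over the same index space.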
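For reference, the pure-NumPy half of the demo boils down to a vectorized escape-time computation. This is my own minimal sketch, not the exact code from the talk; the grid size, iteration count, and the particular constant `c` are arbitrary choices.

```python
import numpy as np

def julia_set(c, size=200, iterations=100, bound=2.0):
    """Escape-time counts for the Julia set of z -> z**2 + c."""
    # Grid of complex starting points covering [-bound, bound]^2.
    re = np.linspace(-bound, bound, size)
    im = np.linspace(-bound, bound, size)
    z = re[np.newaxis, :] + 1j * im[:, np.newaxis]
    counts = np.zeros(z.shape, dtype=int)
    for _ in range(iterations):
        mask = np.abs(z) <= bound        # points that have not escaped yet
        counts[mask] += 1
        z[mask] = z[mask] ** 2 + c       # iterate only the surviving points
    return counts

counts = julia_set(c=-0.8 + 0.156j)
```

Because every step is an array-wide operation (a comparison, a masked update), the same structure maps naturally onto a distributed array: each process runs the identical loop on its local slab, which is what makes this a good demo for DistArray.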
After the talk I had some interesting discussions about the architectural choices behind DistArray. In particular: why did we choose IPython.parallel and PyTrilinos?
IPython.parallel is problematic because its scaling is limited, partly because it is built on ZeroMQ (ØMQ), which lacks InfiniBand support. The other backends of IPython.parallel have only been tested to scale to tens of nodes.
From discussions with the team, there are two reasons we chose IPython.parallel.
- It provides an unparalleled (no pun intended) level of interactivity.
- Most jobs on supercomputers use less than 10 nodes (citation needed).
I can't speak much to the choice of PyTrilinos over something like PETSc, though PETSc does seem to have more users. In principle we could use PETSc if they decide to implement the Distributed Array Protocol.