PM was in part inspired by ZPL, a parallel array language.
One of the key goals of the ZPL project was to provide a “What You See Is What
You Get” (WYSIWYG) approach to parallel communication, in particular with regard
to communication overhead. This was achieved using a range of operators
that explicitly invoke parallel communication on distributed arrays. For example, the @ operator moves the elements of an
array by a fixed displacement while the #
operator provides for more general element permutation. A programmer would know
that the former operation is much less expensive than the latter. To shift a
distributed array by a fixed displacement, computational nodes only need to
exchange information for edge cells that have moved into the region governed by
a different node. However, general permutation requires all nodes to request and
then receive data from arbitrary other nodes and for this exchange to be
globally synchronized. The ZPL # operator thus needs avoiding unless its use is essential. This type of explicit performance model is extremely useful for creating efficient distributed code.
Unlike ZPL, PM is not an array language. However, it keeps the
concept of explicit communicating operators (see this earlier post) and also
strives to present a WYSIWYG performance model to the programmer. PM only has
two communicating operators, @x[...] and x@[...], which provide access to the value
of x associated with, respectively, an absolute or a relative point in the domain of the
enclosing for statement (again, see this post for more information). Each of these
operations is more efficient if the subscript term is independent of the point of the
domain being processed than if it has to be computed locally for each point. For example:
const model_grid = grid(1..N,1..N),
      centre_point = [N/2,N/2],
      halo = [-1..1,-1..1]

for i in model_grid do
   cell := spin_up_model%()
   ! This is a communicating procedure - it may contain its own
   ! communicating (@) operations

   cell = @cell[centre_point]
   ! Broadcast central value to all cells - efficient

   for each time in 1..max_time_steps do
      nbd := cell @ [halo]
      ! Get neighbouring cells - efficient

      advection := compute_motion(cell,nbd)

      moved_cell := cell @ [advection]
      ! Get arbitrary cells - expensive

      cell = update_model_state(cell,nbd,moved_cell)
   endfor
endfor
Note that this example would work just as well if halo and centre_point had been dynamically computed, provided this was done outside of the for statement. It is also important to note that the use of communicating operators does not depend on data being distributed: the notation is equally applicable when the entirety of a for statement is executed on a single node.

PM thus avoids forcing the user to make early choices about which loops to parallelise. The basic rule is: “make everything a for statement unless you absolutely have to use a sequential for each”. An advantage of this approach is that for statements (which explicitly prohibit inter-element interactions, except through communicating operations) are much easier to vectorize. Similar arguments could be made for SMP systems and the avoidance of unnecessary cache updates. The PM performance model therefore applies across multiple levels of parallelism, from distributed computing to vectorization.