
Structured parallel communication


One of the PM design goals was to create a combined model for parallelisation and vectorisation. Basic PM parallel structures take the same form whether code is being distributed over a cluster or run as vector operations on a single core. This is not to say that the underlying hardware structure is invisible to the programmer; the mapping from parallel programming structures to hardware is both explicit and configurable, as will be described in a future post.

A previous post introduced the PM for statement and communicating operators. The for statement executes its body of statements concurrently for each member of an array or domain.  Communicating operators provide both communication and synchronisation by allowing a local variable to be viewed as an array over the complete ‘iteration’ domain.

In the absence of other control structures, communicating operators act as straightforward synchronisation points between invocations:

 for .. do   
      statements_a  
      .. @x ..  
      statements_b  
      .. @y ..   
      statements_c  
 endfor  

In terms of logical program flow, every invocation of the for statement body must complete statements_a and the communicating operator @x before any invocation can proceed to statements_b. Similarly, all invocations must complete statements_b and @y before any invocation can commence statements_c. The actual implementation may soften this hard conceptual lockstep, though only if this can be made transparent to the programmer. Once nested control structures come into the picture, things get a little more complicated. Here PM imposes the structured parallel communication rule: each invocation must execute the same operators in the same order on the same variables.
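The lockstep semantics can be pictured with an analogy in plain Python (not PM): each thread plays one invocation of the for-statement body, and each communicating operator is modelled as a barrier that every invocation must reach before any can continue. The invocation count and phase names here are hypothetical, purely for illustration.

```python
import threading

N = 4  # hypothetical number of invocations
at_x = threading.Barrier(N)   # stands in for @x
at_y = threading.Barrier(N)   # stands in for @y
log, lock = [], threading.Lock()

def invocation(i):
    with lock:
        log.append(("a", i))  # statements_a
    at_x.wait()               # @x: no invocation passes until all arrive
    with lock:
        log.append(("b", i))  # statements_b
    at_y.wait()               # @y
    with lock:
        log.append(("c", i))  # statements_c

threads = [threading.Thread(target=invocation, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

phases = [p for p, _ in log]
# Every invocation finished statements_a before any began statements_b,
# and likewise for statements_b versus statements_c.
assert phases.index("b") > max(i for i, p in enumerate(phases) if p == "a")
assert phases.index("c") > max(i for i, p in enumerate(phases) if p == "b")
```

Of course, a real implementation need not use literal barriers; as noted above, it only has to preserve this observable ordering.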


In practice, this rule is implemented through a set of more specific rules associated with different forms of control statement. For conditional statements, each branch must contain the same sequence of operators:
 if .. then  
      ..@x..  
      ..@y..  
 else  
      ..@x..  
      ..@y..  
 endif  
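The Python barrier analogy (again, not PM code) shows what goes wrong when branches disagree: if two invocations take different branches and the then-branch performs two collective operations while the else-branch performs only one, the mismatch surfaces as a stall at the second operator. The branch names and timeout are hypothetical.

```python
import threading

barrier = threading.Barrier(2)
broken = []

def then_branch():
    try:
        barrier.wait(timeout=1.0)  # @x
        barrier.wait(timeout=1.0)  # @y -- the other invocation never arrives
    except threading.BrokenBarrierError:
        broken.append("then")

def else_branch():
    try:
        barrier.wait(timeout=1.0)  # @x only: the operator sequences differ
    except threading.BrokenBarrierError:
        broken.append("else")

threads = [threading.Thread(target=f) for f in (then_branch, else_branch)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert broken == ["then"]  # the unmatched @y wait times out
```

By requiring identical operator sequences in every branch, PM rules out this class of mismatch statically rather than leaving it to surface at run time.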

This rule, of course, applies recursively:
 if .. then  
      if .. then  
           ..@x..  
      else  
           ..@x..  
      endif  
      ..@y..  
 else  
      ..@x..  
      ..@y..  
 endif  
With loops, the rule is relaxed slightly. Loops containing communicating operators can execute a different number of times in each invocation. When a loop in one invocation runs for longer than its counterparts in other invocations, any communicating operator that it executes will see, for each invocation in which the loop has already terminated, the values that variables held at loop exit (for a zero-iteration loop, the exit value of a variable equals its entry value). The end of any loop containing communicating operators is an implicit synchronisation point: execution only proceeds beyond the end of such a loop once all invocations have attained the loop exit conditions.
 for x in grid do  
      while abs(mean::x@{-1:1,-1:1}-x) > thresh do  
x=mean::x@{-1:1,-1:1}  
      endwhile  
 endfor  
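The relaxed loop rule can be sketched in plain Python (not PM) for a one-dimensional version of this example: each "invocation" owns one cell and iterates its own while loop, and an invocation whose loop has already exited keeps supplying its loop-exit value to the stencil reads of the others. The function name, threshold, and step cap are all hypothetical.

```python
def relax(values, thresh=1e-3, max_steps=10_000):
    values = list(values)
    active = [True] * len(values)  # which invocations are still looping
    for _ in range(max_steps):
        if not any(active):
            break                  # implicit synchronisation at loop end
        new = list(values)
        for i, still_looping in enumerate(active):
            if not still_looping:
                continue           # exited: keeps supplying its exit value
            lo, hi = max(0, i - 1), min(len(values), i + 2)
            m = sum(values[lo:hi]) / (hi - lo)  # mean over the {-1:1} stencil
            if abs(m - values[i]) > thresh:
                new[i] = m         # while-condition holds: loop body runs
            else:
                active[i] = False  # this invocation's while loop exits
        values = new
    return values

result = relax([0.0, 4.0, 0.0, 4.0])
assert max(result) - min(result) < 0.1  # cells settle near a common value
```

The `active` list plays the role the runtime plays in PM: it remembers which invocations have left the loop so that their frozen values can still feed the communicating operator.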
The most complex case of structured parallel communication concerns loops inside conditional statements. Here the rule is that each branch of the conditional statement must contain an iterative loop in the same position, relative to the overall sequence of communicating operators, and that each of these loops must contain the same sequence of operators. The loops do not have to be of the same type: one could be a while loop and the other a repeat .. until. It is also possible for the operator sequence in one loop to be an exact repetition of the sequence in a corresponding loop.

 if .. then  
      while .. do  
           .. @x ..  
           .. @y ..  
      endwhile  
 else  
      repeat  
           .. @x .. 
           .. @y ..  
           .. @x ..  
           .. @y ..  
      until ..  
 endif       
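One way to picture this matching requirement is as a check on the per-loop operator traces of the two branches. The helper below is purely illustrative (no such checker is part of PM): it accepts two loop bodies when one body's operator sequence is an exact whole-number repetition of the other's, as in the while/repeat example above.

```python
def loop_cycles_compatible(seq_a, seq_b):
    """True if one loop body's operator sequence is an exact
    repetition of the other's (or the two are identical)."""
    def is_repetition(longer, shorter):
        return (len(longer) % len(shorter) == 0
                and longer == shorter * (len(longer) // len(shorter)))
    if len(seq_a) >= len(seq_b):
        return is_repetition(seq_a, seq_b)
    return is_repetition(seq_b, seq_a)

# The example above: the while body does @x,@y; the repeat body @x,@y,@x,@y.
assert loop_cycles_compatible(["@x", "@y"], ["@x", "@y", "@x", "@y"])
# Same operators in a different order is not a repetition, so it is rejected.
assert not loop_cycles_compatible(["@x", "@y"], ["@y", "@x"])
```

Seen this way, the rule guarantees that whichever branch an invocation takes, its stream of communicating operators can be lined up, operator by operator, with every other invocation's stream.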
Structured parallel communication is designed to achieve a number of goals. As with other structuring regimes, it is hoped that it will assist the early detection of bugs. It also makes possible a range of efficient implementation approaches, ranging from vector operations to MPI collective communication (to be discussed in a future post).

Of course, there will be some parallel algorithms that do not fit into this structured model. These are facilitated through a more conventional message-passing model, modified to work in both vectorised and parallelised environments. Again, more about this at a later date.

