next up previous contents index
Next: Parallel Efficiency Up: Performance, Portability and Scalability Previous: Two-Dimensional Block Cyclic Data

BLACS as an Efficient, Portable and Adequate Message-Passing Interface

 

The total volume of data communicated by most of the ScaLAPACK driver routines for dense matrices can be approximated by the quantity C_v N^2, where N is the order of the largest matrix operand. The number of messages, however, is proportional to N and can be approximated by the quantity C_m N / NB, where NB is the logical blocking factor used in the computation. Similar to the situation described above, the "standard" constants C_v for the communication volume depend upon the computation performed and are of the same order as the floating-point operation constants C_f shown in Table 5.8. The values of the "standard" constants C_v for a few selected ScaLAPACK drivers are presented in Table 5.8. As a result, a significant portion of the ScaLAPACK software is devoted to exchanging messages between processes.
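The two quantities above can be evaluated for a given problem size. The sketch below uses placeholder values for the "standard" constants, not the driver-specific values tabulated in Table 5.8:

```python
# Rough sketch of the communication model described above.
# cv and cm are placeholder "standard" constants; the actual values
# depend on the driver routine and are given in Table 5.8.
def comm_volume(n, cv=1.0):
    """Approximate total communicated data: cv * n**2 words."""
    return cv * n * n

def message_count(n, nb, cm=1.0):
    """Approximate number of messages: cm * n / nb."""
    return cm * n / nb

# Doubling the blocking factor NB halves the message count but
# leaves the total communicated volume unchanged.
assert message_count(10000, 64) == 2 * message_count(10000, 128)
assert comm_volume(10000) == 10000 ** 2
```

The model makes the latency/bandwidth trade-off visible: NB only affects the number of messages, so a larger blocking factor reduces latency costs without changing the volume term.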

Developing an adequate message-passing interface specialized for linear algebra operations was one of the first achievements of the ScaLAPACK project. The Basic Linear Algebra Communication Subprograms (BLACS)  [50, 54] were specifically designed to facilitate the expression of the relevant communication operations. The simplicity of the BLACS interface, as well as the rigor of its specification, makes the entire ScaLAPACK software easy to port. The BLACS have been efficiently ported to machine-specific message-passing libraries such as the IBM (MPL)   and Intel (NX)   message-passing libraries, as well as to more generic interfaces such as PVM   and MPI  . The BLACS  overhead has been shown to be negligible [54].

The BLACS interface provides the user and library designer with an appropriate level of notation for linear algebra communication: the BLACS operate on typed two-dimensional arrays. The computational model consists of a one- or two-dimensional grid of processes, where each process stores matrices and vectors. The BLACS include synchronous send/receive routines to transfer a matrix or submatrix from one process to another, to broadcast submatrices, and to perform global reductions (sums, maxima, and minima). Other routines establish, change, or query the process grid.
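The process-grid notion can be illustrated without the library itself. The sketch below mimics, in plain Python, the row-major rank-to-coordinate mapping that a BLACS grid-initialization call provides; the function names here are illustrative and are not part of the BLACS API:

```python
# Illustrative sketch of a row-major two-dimensional process grid:
# each linear process rank maps to (row, col) coordinates within a
# nprow x npcol grid, as in the BLACS computational model.
def grid_coords(rank, nprow, npcol):
    """Map a linear process rank to its (row, col) grid coordinates."""
    if not 0 <= rank < nprow * npcol:
        raise ValueError("rank outside the process grid")
    return divmod(rank, npcol)   # row-major ordering

def grid_rank(row, col, npcol):
    """Inverse mapping: (row, col) back to the linear rank."""
    return row * npcol + col

# Six processes arranged as a 2 x 3 grid:
coords = [grid_coords(r, 2, 3) for r in range(6)]
# -> [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```

Every process can derive its own grid coordinates from its rank alone, which is what lets the two-dimensional block-cyclic distribution be computed locally, without communication.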

For ease of use and flexibility, the BLACS send operation is locally blocking;  that is, the return from the send operation indicates that the send buffer may be reused. Because this depends only on local information, however, it is unknown whether the matching receive operation has been called, so buffering is necessary on either the sending or the receiving process. The BLACS receive operation is globally blocking : the return from the receive operation indicates that the message has been (sent and) received. On a system natively supporting globally blocking sends, such as the IBM SP2 computer, nonblocking sends coupled with buffering are used to simulate locally blocking sends. This extra buffering operation may cause a slight performance degradation on those systems.
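The distinction can be sketched with an ordinary buffered queue: a locally blocking send returns as soon as the message is buffered, whether or not a receive has been posted, while the receive returns only once the data has actually arrived. This is a toy model in plain Python, not BLACS code:

```python
import queue

# Toy model of a locally blocking send paired with a globally
# blocking receive.  The channel buffers messages, so send() can
# return before any receive has been called.
class Channel:
    def __init__(self):
        self._buf = queue.Queue()

    def send(self, msg):
        self._buf.put(list(msg))   # copy: the sender may reuse its buffer
        # returns immediately; only local completion is guaranteed

    def recv(self):
        # blocks until a message has actually arrived
        return self._buf.get()

ch = Channel()
buf = [1.0, 2.0, 3.0]
ch.send(buf)        # returns at once; no receive has been posted yet
buf[0] = 99.0       # safe: the channel holds its own copy
assert ch.recv() == [1.0, 2.0, 3.0]
```

The copy inside `send` is exactly the extra buffering the text describes: it is what makes the sender's buffer immediately reusable, at the cost of one additional data movement.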

The BLACS broadcast and combine operations allow the selection of different virtual network topologies. This easy-to-use, built-in facility allows various message-scheduling approaches, such as a communication pipeline, to be expressed. This distinctive BLACS characteristic is necessary for achieving the highest performance levels on distributed-memory platforms.





Susan Blackford
Tue May 13 09:21:01 EDT 1997