It is well known [26][13][4] that certain algorithms based on a two-dimensional block-cyclic data distribution scheme become more efficient and scalable when appropriate communication topologies are used for the broadcast and global combine operations. For example, pipelining the broadcast operation along the rows of the process grid improves the efficiency and scalability of the LU factorization algorithm [13][4]. The BLACS topologies allow the user to optimize the communication patterns of these particular operations; alternatively, a default topology can be selected. The available BLACS topologies and the possible scopes are documented in [14]. To set this low-level information, the PBLAS provide two routines with the following FORTRAN 77 interface:
SUBROUTINE PTOPSET( ICTXT, OP, SCOPE, TOP )
SUBROUTINE PTOPGET( ICTXT, OP, SCOPE, TOP )
INTEGER            ICTXT
CHARACTER*1        OP, SCOPE, TOP
PTOPSET assigns the BLACS topology [14] TOP to the communication operation OP along the scope specified by SCOPE. PTOPGET returns in TOP the BLACS topology currently used for the communication operation OP along the scope specified by SCOPE. Examples of the use of these routines are given in Appendix B. The BLACS provide broadcast (OP=`B') and global combine (OP=`C') operations, each with its own set of available topologies. The scope refers to the group of processes involved in such a BLACS operation: it indicates whether a process row (SCOPE=`R'), a process column (SCOPE=`C'), or the entire process grid (SCOPE=`A') participates in the operation.
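The set/get contract of PTOPSET and PTOPGET can be pictured as a small table holding one topology character per (OP, SCOPE) pair. The C sketch below is a hypothetical model, not the PBLAS source: the names ptopset_model and ptopget_model are invented for illustration, the context handle ICTXT is ignored so that a single table is kept, ' ' is assumed to stand for the BLACS default topology, and the character arguments are assumed to be case-insensitive.

```c
/* Hypothetical model (not the PBLAS source) of the table consulted by
 * PTOPSET and PTOPGET: one BLACS topology character per (OP, SCOPE)
 * pair. ICTXT is ignored and a single table is kept for simplicity.
 * ' ' is assumed to denote the BLACS default topology. */

enum { N_OPS = 2, N_SCOPES = 3 };

static char topology[N_OPS][N_SCOPES] = {
    { ' ', ' ', ' ' },   /* OP = 'B': broadcast,      scopes R, C, A */
    { ' ', ' ', ' ' }    /* OP = 'C': global combine, scopes R, C, A */
};

/* Map OP to a row index: 'B' -> 0, 'C' -> 1, anything else -> -1. */
static int op_index(char op)
{
    switch (op) {
    case 'B': case 'b': return 0;
    case 'C': case 'c': return 1;
    default:            return -1;
    }
}

/* Map SCOPE to a column index: 'R' -> 0, 'C' -> 1, 'A' -> 2, else -1. */
static int scope_index(char scope)
{
    switch (scope) {
    case 'R': case 'r': return 0;
    case 'C': case 'c': return 1;
    case 'A': case 'a': return 2;
    default:            return -1;
    }
}

/* Model of PTOPSET: record TOP for (OP, SCOPE).
 * Returns 0 on success, -1 if OP or SCOPE is not a documented value. */
int ptopset_model(char op, char scope, char top)
{
    int i = op_index(op), j = scope_index(scope);
    if (i < 0 || j < 0)
        return -1;
    topology[i][j] = top;
    return 0;
}

/* Model of PTOPGET: store the topology recorded for (OP, SCOPE) in *top.
 * Returns 0 on success, -1 if OP or SCOPE is not a documented value. */
int ptopget_model(char op, char scope, char *top)
{
    int i = op_index(op), j = scope_index(scope);
    if (i < 0 || j < 0)
        return -1;
    *top = topology[i][j];
    return 0;
}
```

For example, calling ptopset_model('B', 'R', 'I') and then ptopget_model('B', 'R', &t) leaves 'I' in t, while the global combine entry for the same scope still holds the default ' '. The model also makes explicit that the letter `C' is unambiguous: as OP it selects the global combine operation, and as SCOPE it selects a process column.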
In addition, the PBLAS provide a subroutine that releases the PBLAS buffer allocated in each process's dynamic memory. Its FORTRAN 77 interface is:
SUBROUTINE PBFREEBUF()
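The text does not describe how the PBLAS buffer is managed. The C sketch below assumes one plausible scheme that would explain why a separate release routine is useful: a per-process workspace that is grown on demand, kept between calls for reuse, and freed only on request. The names pbuf_get, pbuf_size, and pbfreebuf_model are hypothetical and do not belong to the PBLAS.

```c
#include <stdlib.h>

/* Hypothetical model (not the PBLAS source) of a per-process workspace:
 * grown on demand, kept between calls for reuse, released on request. */

static void  *pb_buffer = NULL;  /* current workspace, or NULL   */
static size_t pb_size   = 0;     /* bytes currently allocated    */

/* Return a buffer of at least nbytes, reallocating only when the current
 * one is too small. Returns NULL if the allocation fails, in which case
 * the previous buffer is left untouched. */
void *pbuf_get(size_t nbytes)
{
    if (nbytes > pb_size) {
        void *p = realloc(pb_buffer, nbytes);  /* acts as malloc if NULL */
        if (p == NULL)
            return NULL;
        pb_buffer = p;
        pb_size = nbytes;
    }
    return pb_buffer;
}

/* Number of bytes currently held by the workspace. */
size_t pbuf_size(void)
{
    return pb_size;
}

/* Model of PBFREEBUF: return the workspace to the heap. Safe to call
 * repeatedly; the next pbuf_get call simply allocates afresh. */
void pbfreebuf_model(void)
{
    free(pb_buffer);
    pb_buffer = NULL;
    pb_size = 0;
}
```

Under this assumption, a request smaller than the current buffer reuses it without a new allocation, so the memory stays allocated after a computation finishes until the release routine is called.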