
The Block Mapping

 

The one-dimensional distribution scheme maps a set of blocks onto the processes. The previous section described this mapping informally, along with some of its properties. For completeness, this section gives the precise mapping that associates a matrix entry, identified by its global indices, with the coordinates of the process that owns it and with its local position within that process's memory.

Suppose we have a two-dimensional array A of size $M \times N$ to be distributed on a $1 \times P$ process grid in a block-column fashion. By convention, the array columns are numbered 1 through N and the processes are numbered 0 through P-1. First, the array is divided into contiguous blocks of NB columns with $NB \times P \ge N$. When NB does not divide N evenly, the last block of columns contains only $N \bmod NB$ columns instead of NB. By convention, these blocks are numbered starting from zero and dealt out to the processes. In other words, if we assume that process 0 receives the first block, the $p^{th}$ block is assigned to the process of coordinate (0,p). The mapping of a column of the array globally indexed by J is defined by the following analytical equation:
\[ J = p \times NB + x, \]
where J is a global column index in the array, p is the column coordinate of the process owning that column, and x is the column coordinate within the block of columns where the global array column of index J is to be found. It is then fairly easy to establish the analytical relationship between these variables. One obtains:
\[ p = \left\lfloor \frac{J-1}{NB} \right\rfloor, \qquad x = \left( (J-1) \bmod NB \right) + 1. \tag{4.3} \]
These equations determine the local information, i.e., the local index x and the process column coordinate p, corresponding to a global column identified by its global index J, and conversely. Table 4.9 illustrates this mapping layout when P=2, N=16 and NB=8. At most one block is assigned to each process.

Table 4.9: One-dimensional block-column mapping example for P=2 and N=16
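
As a minimal sketch of the mapping described above (an illustration, not ScaLAPACK code), the following Python functions convert a global column index J into the pair (p, x) and back. The function names are hypothetical. The sketch assumes the conventions stated earlier: J and x are 1-based, p is 0-based, and blocks are dealt out starting from process 0. The driver prints the layout for P=2, N=16 and NB=8.

```python
def global_to_local(J, NB):
    """Map a 1-based global column index J to (p, x).

    p is the 0-based process column owning column J,
    x is the 1-based position of the column inside its block.
    """
    p = (J - 1) // NB
    x = (J - 1) % NB + 1
    return p, x


def local_to_global(p, x, NB):
    """Inverse mapping: rebuild the global index J from (p, x)."""
    return p * NB + x


if __name__ == "__main__":
    P, N, NB = 2, 16, 8
    assert NB * P >= N  # at most one block per process
    for J in range(1, N + 1):
        p, x = global_to_local(J, NB)
        assert local_to_global(p, x, NB) == J  # round trip
        print(f"J={J:2d} -> process (0,{p}), local column x={x}")
```

With these parameters, columns 1 through 8 land on process (0,0) and columns 9 through 16 on process (0,1).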

This example of the one-dimensional block-column distribution mapping can be expressed in HPF by using the following statements:

      REAL :: A( M, N )
!HPF$ PROCESSORS PROC( 1, P )
!HPF$ DISTRIBUTE A( *, BLOCK( NB ) ) ONTO PROC

A similar example of block-row distribution can easily be constructed. For an $N \times NRHS$ array B, such an example can be expressed in HPF by using the following statements:

      REAL :: B( N, NRHS )
!HPF$ PROCESSORS PROC( P, 1 )
!HPF$ DISTRIBUTE B( BLOCK( NB ), * ) ONTO PROC

There is no real reason to always deal out the blocks starting with process 0. It is sometimes useful to start the data distribution with the process of arbitrary coordinate SRC, in which case Equation 4.3 becomes:
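\[ p = \left( SRC + \left\lfloor \frac{J-1}{NB} \right\rfloor \right) \bmod P, \qquad x = \left( (J-1) \bmod NB \right) + 1. \tag{4.4} \]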

Table 4.10 illustrates Equation 4.4 for the block-column layout when P=2, SRC=1, N=16 and NB=8.

Table 4.10: One-dimensional block-column mapping for P=2, SRC=1, N=16 and NB=8
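
The effect of the starting process SRC can be sketched in the same way. The Python function below is an illustration only, with a hypothetical name. It assumes that the owner coordinate wraps around modulo P, which is the assumption under which the P=2, SRC=1 case of Table 4.10 is well defined. Indices follow the earlier conventions (J and x 1-based, p 0-based).

```python
def global_to_local_src(J, NB, P, SRC=0):
    """Map a 1-based global column index J to (p, x) when the first
    block is given to process column SRC instead of process column 0.

    Assumption: ownership wraps around modulo P.
    """
    block = (J - 1) // NB      # 0-based index of the block holding column J
    p = (SRC + block) % P      # process column that owns this block
    x = (J - 1) % NB + 1       # 1-based position inside the block
    return p, x


if __name__ == "__main__":
    P, SRC, N, NB = 2, 1, 16, 8
    for J in range(1, N + 1):
        p, x = global_to_local_src(J, NB, P, SRC)
        print(f"J={J:2d} -> process (0,{p}), local column x={x}")
```

Under this assumption, the first block (columns 1 through 8) is owned by process (0,1) and the second block (columns 9 through 16) by process (0,0). Setting SRC=0 gives back the mapping that starts at process 0.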

This example of the one-dimensional block-column distribution mapping can be expressed in HPF by using the following statements:

      REAL :: A( M, N )
!HPF$ PROCESSORS PROC( 1, P )
!HPF$ TEMPLATE T( M, N + P*NB )
!HPF$ DISTRIBUTE T( *, BLOCK( NB ) ) ONTO PROC
!HPF$ ALIGN A( I, J ) WITH T( I, SRC*NB + J )

A similar example of block-row distribution can easily be constructed. For an $N \times NRHS$ array B, such an example can be expressed in HPF by using the following statements:

      REAL :: B( N, NRHS )
!HPF$ PROCESSORS PROC( P, 1 )
!HPF$ TEMPLATE T( N + P*NB, NRHS )
!HPF$ DISTRIBUTE T( BLOCK( NB ), * ) ONTO PROC
!HPF$ ALIGN B( I, J ) WITH T( SRC*NB + I, J )

In ScaLAPACK, the local storage convention of the one-dimensional block distributed matrix in every process's memory is assumed to be Fortran-like, that is, "column major".
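
To illustrate what column-major storage implies, the following sketch computes where local entry (i, x) sits in a one-dimensional local buffer. The leading dimension `lld` and the function name are assumptions made for this illustration. Indices are 1-based, as in Fortran, and the returned offset is 0-based.

```python
def column_major_offset(i, x, lld):
    """Return the 0-based offset of local entry (i, x) in a column-major
    buffer whose columns are lld elements apart (1-based i and x)."""
    if not 1 <= i <= lld:
        raise ValueError("row index i must lie in 1..lld")
    if x < 1:
        raise ValueError("column index x must be >= 1")
    return (i - 1) + (x - 1) * lld


if __name__ == "__main__":
    lld, ncols = 3, 2
    # Sorting entries by offset shows that memory walks down a column
    # before moving on to the next column.
    entries = [(column_major_offset(i, x, lld), (i, x))
               for x in range(1, ncols + 1)
               for i in range(1, lld + 1)]
    print([entry for _, entry in sorted(entries)])
```

Entries of the same column are contiguous in memory, while consecutive entries of the same row are `lld` elements apart.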

Determining the number of rows or columns of a global band matrix that a specific process receives is an essential task for the user. The notation LOCr() is used for block-row distributions and LOCc() is used for block-column distributions. These local quantities occur throughout the leading comments of the source code, and are reflected in the sample argument description in section 4.4.7.

With a block distribution, a matrix can be distributed unevenly. More specifically, one process in the process grid can receive an array that is smaller than those received by the other processes. It is also possible that some processes receive no data. For further information on one-dimensional block-column or block-row data distribution, please refer to section 4.4.1.

Block-Column Distribution: LOCc(N_A) denotes the number of columns that a process would receive if N_A columns of a matrix are distributed over the $P_c$ columns of its process row.

For example, let us assume that the coefficient matrix A is band symmetric of order N and has been block-column distributed on a $1 \times P_c$ process grid.

In the ideal case where the matrix is evenly distributed to all processes in the process grid, $N\_A \bmod NB\_A = 0$ and $N\_A / NB\_A = P_c$. Thus, each process receives a block of NB_A columns of the matrix A. Therefore,

LOCc(N_A) = NB_A.

However, if $N\_A < P_c \times NB\_A$, at least one of the processes in the process grid will receive a block of fewer than NB_A columns. Thus,

 if ( $N\_A < P_c \times NB\_A$ and $K = \lfloor N\_A / NB\_A \rfloor$ ) then

processes (0,0), ... , (0,K-1) receive

LOCc(N_A) = NB_A

and process (0,K) receives

LOCc(N_A) = N_A - K $\times$ NB_A.

if $K+1 < P_c$ then processes (0,K+1), ... , (0,$P_c$-1) do not receive any data.

end if

Block-Row Distribution: LOCr(M_B) denotes the number of rows that a process would receive if M_B rows of a matrix are distributed over the $P_r$ rows of its process column.

Let us assume that the N-by-NRHS right-hand-side matrix B has been block-row distributed on a $P_r \times 1$ process grid.

In the ideal case where the matrix is evenly distributed to all processes in the process grid, $M\_B \bmod MB\_B = 0$ and $M\_B / MB\_B = P_r$. Thus, each process receives a block of MB_B rows of the matrix B. Therefore,

LOCr(M_B) = MB_B.

However, if $M\_B < P_r \times MB\_B$, then at least one of the processes in the process grid will receive a block of fewer than MB_B rows. Thus,

 if ( $M\_B < P_r \times MB\_B$ and $K = \lfloor M\_B / MB\_B \rfloor$ ) then

processes (0,0), ... , (K-1,0) receive

LOCr(M_B) = MB_B

and process (K,0) receives

LOCr(M_B) = M_B - K $\times$ MB_B.

if $K+1 < P_r$ then processes (K+1,0), ... , ($P_r$-1,0) do not receive any data.

end if
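
The case analysis for the local counts can be condensed into a single helper. The sketch below is an illustration under the assumptions of this section (blocks dealt out starting from process 0, at most one block per process). The function and argument names are hypothetical. The same function serves a block-column distribution (n = N_A, nb = NB_A, nprocs = number of process columns) and a block-row distribution (n = M_B, nb = MB_B, nprocs = number of process rows).

```python
def local_count(n, nb, iproc, nprocs):
    """Number of rows or columns owned by process iproc (0-based) when n
    rows or columns are dealt out in blocks of nb, one block per process,
    starting with process 0."""
    if nb * nprocs < n:
        raise ValueError("a block distribution needs nb * nprocs >= n")
    if not 0 <= iproc < nprocs:
        raise ValueError("iproc must lie in 0..nprocs-1")
    first = iproc * nb                 # 0-based global index of the first item
    return max(0, min(nb, n - first))  # full block, partial block, or nothing


if __name__ == "__main__":
    # Even case: every process owns a full block.
    print([local_count(16, 8, p, 2) for p in range(2)])
    # Uneven case with K = 13 // 5 = 2: two full blocks, one partial, one empty.
    print([local_count(13, 5, p, 4) for p in range(4)])
```

In the uneven case the processes with index below K own nb items, process K owns n - K*nb items, and any remaining process owns none, which matches the branches listed above.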



Susan Blackford
Tue May 13 09:21:01 EDT 1997