The way in which a matrix is distributed over the processes
has a major impact on the load balance and
communication characteristics of the concurrent algorithm, and hence
largely determines its performance and scalability.
The block cyclic distribution provides a simple, yet
general-purpose way of distributing a block-partitioned matrix on
distributed memory concurrent computers.
The block cyclic data distribution is parameterized by the four
numbers ,
,
, and
, where
is the
process grid and
is the block size.
Blocks separated by a fixed stride in the column and row
directions are assigned to the same process.
Suppose we have objects indexed by the integers
.
In the block cyclic data distribution the mapping of the global index,
,
can be expressed as
,
where
is the logical
process number,
is the block number in process
, and
is the index within block
to which
is mapped.
Thus, if the number of data objects in a block is
,
the block cyclic data distribution may be written as follows:
where and
is the number of processes.
The distribution of a block-partitioned matrix can be regarded as the
tensor product of two such mappings: one that distributes
the rows of the matrix over
processes, and another that distributes
the columns over
processes. That is, the matrix element
indexed globally by
can be written as
Figure 1: Example of a block cyclic data distribution.
Figure 1 (a) shows an example of the block cyclic data
distribution, where a matrix with blocks
is distributed over a
grid.
The numbered squares represent blocks of elements, and the number indicates
at which location in the process grid the block is stored - all blocks
labeled with the same number are stored in the same process.
The slanted numbers, on the left and on the top of the matrix,
represent indices of a row of blocks and of a column of blocks, respectively.
Figure 1 (b) reflects the distribution from a process
point-of-view. Each process has
blocks.
The block cyclic data distribution is the only distribution supported by
the ScaLAPACK routines. The block cyclic data distribution can reproduce
most data distributions used in linear algebra computations.
For example, one-dimensional distributions over rows or columns
are obtained by choosing
or
to be 1.
The non-scattered decomposition (or pure block distribution) is just a
special case of the cyclic distribution in which the block size is given by
and
.
That is,
Similarly a purely scattered decomposition (or two dimensional wrapped
distribution) is another special case in which the block size is given
by ,