Before calling a PBLAS routine, it is necessary to initialize the BLACS and create the process grid. This can be done by calling the routine BLACS_GRIDINIT (see [14] for more details). The following segment of code arranges four processes into a $2 \times 2$ process grid. On platforms such as PVM [20], where the number of computational nodes available is not known a priori, it is also necessary to call the routine BLACS_SETUP, so that copies of the main program (3 in our example) can be spawned on the virtual machine. Finally, to ensure safe coexistence with other parallel libraries that use a distinct message-passing layer, such as MPI [17], the BLACS routine BLACS_GET is called to query for a default system context (see [14] for more details).
      INTEGER IAM, ICTXT, NPROCS
*
*     (...)
*
      CALL BLACS_PINFO( IAM, NPROCS )
*
      IF( NPROCS.LT.1 ) THEN
         NPROCS = 4
         CALL BLACS_SETUP( IAM, NPROCS )
      END IF
*
      CALL BLACS_GET( -1, 0, ICTXT )
      CALL BLACS_GRIDINIT( ICTXT, 'Row-major', 2, 2 )
*
*     (...)
*
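To make the effect of the 'Row-major' argument more concrete, the short Python sketch below computes the grid coordinates that such an ordering assigns to each of the four processes. It is an illustration only: it does not call the BLACS, the helper name `grid_coords` is hypothetical, and the column-major branch is included purely for comparison.

```python
def grid_coords(rank, nprow, npcol, order="Row-major"):
    """Return the (row, column) grid coordinates of process `rank`.

    Hypothetical helper that mimics the natural row-major and
    column-major orderings of a process grid; it does not call the BLACS.
    """
    if not 0 <= rank < nprow * npcol:
        raise ValueError("rank lies outside the process grid")
    if order[0].upper() == "R":
        # Row-major: consecutive ranks fill a grid row first.
        return divmod(rank, npcol)
    # Column-major: consecutive ranks fill a grid column first.
    col, row = divmod(rank, nprow)
    return row, col


if __name__ == "__main__":
    # Four processes arranged in a 2 x 2 grid, as in the text.
    for rank in range(4):
        print(rank, grid_coords(rank, 2, 2))
```

Under this ordering, processes 0 and 1 form the first grid row and processes 2 and 3 the second.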
Moreover, to convey the data distribution information to the PBLAS, the descriptors of the matrix operands must be set. The ScaLAPACK library contains a tool routine called DESCINIT for that purpose. This routine takes as arguments the 8-integer descriptor array to be initialized, as well as the 8 entries to be stored in it. On output, an error flag indicates whether an inconsistent descriptor entry was passed to the routine. DESCINIT should be called by every process in the grid.
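One consistency condition that is easy to get wrong is the local leading dimension, which must be at least the number of matrix rows stored locally by the process. The Python sketch below illustrates this arithmetic for a block-cyclic distribution. It is a sketch under stated assumptions, not the DESCINIT implementation: the function `local_count` follows the usual block-cyclic counting rule, while `check_descriptor` and the particular set of checks it performs are hypothetical.

```python
def local_count(n, nb, iproc, isrcproc, nprocs):
    """Number of rows (or columns) of a dimension of size n, split into
    blocks of size nb and dealt out cyclically starting at process
    isrcproc, that end up on process iproc (0-based coordinates)."""
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from source
    nblocks = n // nb                              # number of full blocks
    count = (nblocks // nprocs) * nb               # full rounds of blocks
    extra = nblocks % nprocs                       # leftover full blocks
    if mydist < extra:
        count += nb                                # one extra full block
    elif mydist == extra:
        count += n % nb                            # the trailing partial block
    return count


def check_descriptor(m, n, mb, nb, rsrc, csrc, lld, myrow, nprow, npcol):
    """Return a list of problems found (empty if none).

    Hypothetical checks, assumed for illustration only.
    """
    problems = []
    if m < 0 or n < 0:
        problems.append("matrix dimensions must be non-negative")
    if mb < 1 or nb < 1:
        problems.append("block sizes must be at least 1")
        return problems  # the remaining checks need valid block sizes
    if not 0 <= rsrc < nprow or not 0 <= csrc < npcol:
        problems.append("source process lies outside the grid")
        return problems
    if lld < max(1, local_count(m, mb, myrow, rsrc, nprow)):
        problems.append("local leading dimension is too small")
    return problems


if __name__ == "__main__":
    # 5 x 5 matrix, 2 x 2 blocks, 2 x 2 grid, first entries on process (0,0).
    for myrow in range(2):
        print(myrow, local_count(5, 2, myrow, 0, 2))
```

For a dimension of size 5 split into blocks of size 2 over 2 processes, the first process holds 3 rows and the second holds 2, so a local leading dimension of 3 is sufficient on every process.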
The following code fragment shows the descriptor initialization phase as well as a call to a PBLAS routine. This sample program performs the matrix multiplication $C(1:4, 1:4) \leftarrow A(1:4, 1:4) \, B(1:4, 1:4)$.
This example program is to be run on four processes arranged in a $2 \times 2$ process grid. The matrices $A$, $B$ and $C$ are $5 \times 5$ matrices partitioned into $2 \times 2$ blocks. We choose the process of coordinates $(0,0)$ to be the owner of the first entries of the matrices $A$, $B$ and $C$. The mapping of these matrices is identical to the example of Fig. 1 given in Sect. 3.2.
      INTEGER INFO, LDA, LDB, LDC, NMAX
      PARAMETER ( NMAX = 3, LDA = NMAX, LDB = NMAX, LDC = NMAX )
*
      INTEGER DESCA( 8 ), DESCB( 8 ), DESCC( 8 )
      DOUBLE PRECISION A( NMAX, NMAX ), B( NMAX, NMAX ),
     $                 C( NMAX, NMAX )
*
*     (...)
*
*     Initialize the array descriptors for the matrices A, B and C
*
      CALL DESCINIT( DESCA, 5, 5, 2, 2, 0, 0, ICTXT, LDA, INFO )
      CALL DESCINIT( DESCB, 5, 5, 2, 2, 0, 0, ICTXT, LDB, INFO )
      CALL DESCINIT( DESCC, 5, 5, 2, 2, 0, 0, ICTXT, LDC, INFO )
*
*     (...)
*
      CALL PDGEMM( 'No transpose', 'No transpose', 4, 4, 4, 1.0D+0,
     $             A, 1, 1, DESCA, B, 1, 1, DESCB, 0.0D+0,
     $             C, 1, 1, DESCC )
*
*     (...)
*
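The data layout implied by these descriptors can be checked with a small Python sketch. It assumes the standard block-cyclic rule with 1-based global indices and 0-based process coordinates; the helper names `owner` and `local_index` are hypothetical and are not part of the PBLAS or ScaLAPACK.

```python
def owner(i, j, mb, nb, rsrc, csrc, nprow, npcol):
    """Grid coordinates of the process that owns global entry (i, j)."""
    prow = ((i - 1) // mb + rsrc) % nprow
    pcol = ((j - 1) // nb + csrc) % npcol
    return prow, pcol


def local_index(i, nb, nprocs):
    """1-based local index, on its owning process, of global index i."""
    block = (i - 1) // nb                  # global block containing i
    return (block // nprocs) * nb + (i - 1) % nb + 1


if __name__ == "__main__":
    # 5 x 5 matrix, 2 x 2 blocks, 2 x 2 grid, first entries on process (0,0).
    for i in range(1, 6):
        print(" ".join("%d%d" % owner(i, j, 2, 2, 0, 0, 2, 2)
                       for j in range(1, 6)))
```

Each printed pair gives the row and column coordinates of the owning process. Global rows 1, 2 and 5 fall in process row 0, so process (0,0) stores a $3 \times 3$ local array, which is consistent with the choice NMAX = 3 in the fragment above.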
Finally, it is recommended to release the resources allocated by the BLACS and the PBLAS at the end of the program segment that uses them. Note that the routine BLACS_GRIDEXIT frees the resources associated with a particular context, while the routine BLACS_EXIT frees all BLACS resources (see [14] for more details).
      CALL PBFREEBUF()
*
      CALL BLACS_GRIDEXIT( ICTXT )
*
      CALL BLACS_EXIT( 0 )