The order of the arguments of a PBLAS routine is as follows:

1. arguments specifying options,
2. arguments defining the sizes of the distributed submatrix operands,
3. input scalar,
4. description of the input distributed matrices or vectors,
5. input scalar (associated with the input-output distributed matrix or vector),
6. description of the input-output distributed matrices or vectors,
7. output scalar.
Note that not every category is present in each of the routines. The arguments that specify options are character arguments with the names SIDE, TRANS, TRANSA, TRANSB, UPLO and DIAG.
SIDE is used by the routines as follows:

SIDE = `L'  multiply the general distributed matrix by the symmetric, Hermitian or triangular distributed matrix on the left,
     = `R'  multiply the general distributed matrix by the symmetric, Hermitian or triangular distributed matrix on the right.
TRANS, TRANSA and TRANSB are used by the routines as follows:

TRANS = `N'  operate with the distributed matrix,
      = `T'  operate with the transpose of the distributed matrix,
      = `C'  operate with the conjugate transpose of the distributed matrix.
In the real case the values `T' and `C' have the same meaning; in the complex case the value `T' is not allowed for the Hermitian rank-k and rank-2k updates (the HERK and HER2K routines).
UPLO is used by the Hermitian, symmetric, and triangular distributed matrix routines to specify whether the upper or lower triangle is being referenced, as follows:

UPLO = `U'  the upper triangle is referenced,
     = `L'  the lower triangle is referenced.
DIAG is used by the triangular distributed matrix routines to specify whether or not the distributed matrix is unit triangular, as follows:

DIAG = `U'  the distributed matrix is assumed to be unit triangular,
     = `N'  the distributed matrix is not assumed to be unit triangular.
When DIAG is supplied as `U' the diagonal elements are not referenced.
Thus, these arguments have values and meanings similar to their BLAS counterparts; TRANSA and TRANSB have the same values and meanings as TRANS, and apply to the distributed matrix operands A and B, respectively. We recommend that the equivalent lower-case characters be accepted with the same meaning.
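To make these option arguments concrete, the sketch below (our illustration, not part of the proposal; the wrapper name TRIEXA and its argument list are invented) calls the triangular solver PDTRSM with SIDE, UPLO, TRANSA and DIAG spelled out as full words, of which only the first character is significant; under the recommendation above, the lower-case equivalents would also be accepted. The local arrays and descriptors are assumed to have been set up beforehand on an initialized BLACS process grid.

      SUBROUTINE TRIEXA( M, N, ALPHA, A, IA, JA, DESCA,
     $                   B, IB, JB, DESCB )
*     Illustration only: solves
*        op( A(IA:*,JA:*) ) * X = alpha * B(IB:*,JB:*),
*     overwriting B(IB:*,JB:*) with the solution X.  The triangular
*     distributed matrix is applied from the left (SIDE), its upper
*     triangle is referenced (UPLO), it is not transposed (TRANSA)
*     and its diagonal is not assumed to be unit (DIAG).
      INTEGER            M, N, IA, JA, IB, JB
      INTEGER            DESCA( * ), DESCB( * )
      DOUBLE PRECISION   ALPHA
      DOUBLE PRECISION   A( * ), B( * )
      EXTERNAL           PDTRSM
*
      CALL PDTRSM( 'Left', 'Upper', 'No transpose', 'Non-unit', M, N,
     $             ALPHA, A, IA, JA, DESCA, B, IB, JB, DESCB )
      RETURN
      END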
The distributed submatrix operands of the Level 3 PBLAS are determined by the arguments M, N and K, which specify their sizes. These numbers may differ from the first two entries of the descriptor (M_ and N_), which specify the size of the distributed matrix containing the submatrix operand. Also required are the global starting indices IA, JA, IB, JB, IC and JC. It is permissible to call a routine with M or N equal to zero, in which case the routine exits immediately without referencing its distributed matrix arguments. If M and N are greater than zero, but K is equal to zero, the operation reduces to C(IC:*,JC:*) ← β C(IC:*,JC:*) (this applies to the GEMM, SYRK, SYR2K, HERK and HER2K routines). The input-output distributed submatrix (B(IB:*,JB:*) for the TR-routines, C(IC:*,JC:*) otherwise) is always M × N if rectangular, or N × N if square.
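For example, assuming the local arrays and descriptors of three distributed matrices have already been created (the wrapper SUBMUL below is our own illustrative sketch, not code from the model implementation), a general matrix-matrix product on submatrix operands is specified entirely by M, N, K and the global starting indices, while the containing distributed matrices are described by the descriptors:

      SUBROUTINE SUBMUL( M, N, K, ALPHA, A, IA, JA, DESCA,
     $                   B, IB, JB, DESCB, BETA, C, IC, JC, DESCC )
*     Illustration only:
*        C(IC:*,JC:*) <- alpha * A(IA:*,JA:*) * B(IB:*,JB:*)
*                        + beta * C(IC:*,JC:*),
*     where the submatrix operands are M-by-K, K-by-N and M-by-N;
*     the descriptors describe the distributed matrices that
*     contain them.
      INTEGER            M, N, K, IA, JA, IB, JB, IC, JC
      INTEGER            DESCA( * ), DESCB( * ), DESCC( * )
      DOUBLE PRECISION   ALPHA, BETA
      DOUBLE PRECISION   A( * ), B( * ), C( * )
      EXTERNAL           PDGEMM
*
      CALL PDGEMM( 'No transpose', 'No transpose', M, N, K, ALPHA,
     $             A, IA, JA, DESCA, B, IB, JB, DESCB, BETA,
     $             C, IC, JC, DESCC )
      RETURN
      END

With K equal to zero this call would reduce to C(IC:*,JC:*) ← β C(IC:*,JC:*), as noted above.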
The description of the distributed matrix operands consists of the array containing the local entries of the distributed matrix, the global row and column indices indicating the first entry of the submatrix operand, and the descriptor of the distributed matrix: for example, (A, IA, JA, DESCA).
The description of a distributed vector operand is similar to the description of a distributed matrix (X, IX, JX, DESCX), followed by a global increment INCX, which allows the selection of a matrix row or a matrix column as the vector operand. Only two increment values are currently supported by our model implementation, namely INCX=1 to select a matrix column and INCX=DESCX(1) (i.e. INCX=M_X) to select a matrix row.
The input scalars always have the dummy argument names ALPHA and BETA. Output scalars are only present in the Level 1 PBLAS and are called AMAX, ASUM, DOT, INDX and NORM2.
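As an illustration of both the increment convention and the Level 1 output scalars (again a sketch of our own; the wrapper name AMAXRC and its arguments are invented, and an initialized process grid with a valid descriptor DESCX is assumed), the calls below apply PDAMAX first to a matrix column and then to a matrix row; the output scalars receive the value and the global index of the entry of largest absolute value:

      SUBROUTINE AMAXRC( N, X, IX, JX, DESCX, CMAX, CIDX, RMAX, RIDX )
*     Illustration only: entry of largest absolute value of a matrix
*     column and of a matrix row, selected through the increment INCX.
      INTEGER            N, IX, JX, CIDX, RIDX
      INTEGER            DESCX( * )
      DOUBLE PRECISION   X( * ), CMAX, RMAX
      EXTERNAL           PDAMAX
*
*     INCX = 1 selects the matrix column X(IX:IX+N-1,JX).
      CALL PDAMAX( N, CMAX, CIDX, X, IX, JX, DESCX, 1 )
*
*     INCX = DESCX(1) (= M_X) selects the row X(IX,JX:JX+N-1).
      CALL PDAMAX( N, RMAX, RIDX, X, IX, JX, DESCX, DESCX( 1 ) )
      RETURN
      END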
We use two distributed matrix operands X and Y to describe the invalid values of the arguments:
If a routine is called with an invalid value for any of its arguments, then it must report the fact and terminate the execution of the program. In the model implementation, each routine, on detecting an error, calls a common error-handling routine PBERROR(), passing to it the current BLACS context, the name of the routine and the number of the first argument that is in error. If an error is detected in the j-th entry of a descriptor array, which is the i-th argument in the parameter list, the number passed to PBERROR() has been arbitrarily chosen to be 100 × i + j. This allows the user to distinguish an error on a descriptor entry from an error on a scalar argument. For efficiency reasons, the PBLAS routines perform only a local validity check of their argument list. If an error is detected in at least one process of the current context, the program execution is stopped.
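As a worked example of this encoding (illustration only; the decoding routine below is not part of the model implementation, which stops execution rather than returning the number to the caller), an error number of 803 would designate the 3rd entry of the descriptor passed as the 8th argument, while a number below 100 designates the offending argument directly:

      SUBROUTINE DECERR( IERR, IARG, IENTRY )
*     Illustration only: decode an error number built with the
*     100*i+j convention described above.  On return, IARG is the
*     position of the offending argument and IENTRY the descriptor
*     entry in error (0 if the argument is not a descriptor).
      INTEGER            IERR, IARG, IENTRY
      INTRINSIC          MOD
*
      IF( IERR.GT.100 ) THEN
         IARG   = IERR / 100
         IENTRY = MOD( IERR, 100 )
      ELSE
         IARG   = IERR
         IENTRY = 0
      END IF
      RETURN
      END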
A global validity check of the input arguments passed to a PBLAS routine must be performed in the higher-level calling procedure. To demonstrate the need for and cost of global checking, as well as the reason why this type of checking is not performed in the PBLAS, consider the following example: the value of a global input argument is legal but differs from one process to another; the results are then unpredictable. Detecting this kind of error situation would require a synchronization point, which could result in a significant performance degradation. Since every process must call the same routine to perform the desired operation successfully, it is natural and safe to restrict somewhat the amount of checking performed in the PBLAS routines.
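To give an idea of what such a caller-side global check might look like, the sketch below (our illustration; the function name GLOCHK is invented, and the BLACS combine operations IGAMX2D and IGAMN2D are used rather than any PBLAS facility) verifies that an integer argument, such as a global dimension, holds the same value on every process of the context. The two combines are exactly the kind of synchronization the PBLAS avoid for efficiency.

      LOGICAL FUNCTION GLOCHK( ICTXT, IVAL )
*     Illustration only: returns .TRUE. on every process if and only
*     if the integer IVAL has the same value on all processes of the
*     BLACS context ICTXT.
      INTEGER            ICTXT, IVAL
      INTEGER            IMAX, IMIN, IDUM1, IDUM2
      EXTERNAL           IGAMX2D, IGAMN2D
*
      IMAX = IVAL
      IMIN = IVAL
*     Combine over all processes, leaving the results everywhere.
      CALL IGAMX2D( ICTXT, 'All', ' ', 1, 1, IMAX, 1, IDUM1, IDUM2,
     $              -1, -1, -1 )
      CALL IGAMN2D( ICTXT, 'All', ' ', 1, 1, IMIN, 1, IDUM1, IDUM2,
     $              -1, -1, -1 )
      GLOCHK = ( IMAX.EQ.IMIN )
      RETURN
      END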
Specialized implementations may call system-specific exception-handling facilities, either via an auxiliary routine PBERROR or directly from the routine. In addition, the testing programs can take advantage of this exception-handling mechanism by simulating specific erroneous input argument lists and then verifying that particular errors are correctly detected.