BLACS Errata
[Home]
[Errata]
[MPI Errata]
[PVM Errata]
[MPL Errata]
[NX Errata]
[TESTER Errata]
STATUS SECTION:
- This file was last modified on 07/10/03, to add the Intel Fortran Compiler
workaround.
- This file was modified on 06/28/03 to switch to a more readable format.
- Because seeing past errors and problems can sometimes help in finding
new ones, the original BLACS Errata file can still be found
here.
- Version 1.1 release of the BLACS was on 5/01/97.
- The tester for all versions of this release is available
here.
- The BLACS are available in four versions:
- BLACS using MPI
- BLACS using PVM
- BLACS using IBM's MPL
- BLACS using Intel's NX
- There used to be a fifth BLACS version, CMMDBLACS, which died with the CM-5.
- Before asking for support, review the relevant errata section for your
BLACS version (available from the toolbar).
- The BLACS are no longer under active development. Minimal (and unpaid)
support is provided on a volunteer basis by researchers with other jobs
to do. Try to take this into account when submitting questions.
TESTER SECTION:
Necessary flags for compiling the BLACS tester with g77
The BLACS tester uses a large array in order to simulate dynamic memory. It
passes this array to routines that accept it as an array of differing data
types. In some cases, g77 has upgraded this from a warning to an error. To
tell g77 to allow this behavior, change
BLACS/TESTING/Makefile line 39 from:
$(F77) $(F77NO_OPTFLAGS) -c $*.f
to:
$(F77) $(F77NO_OPTFLAGS) -fno-globals -fno-f90 -fugly-complex -w -c $*.f
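For readers more used to C, the idea of one large block serving as workspace for several data types can be sketched as follows. This is only an illustration of the technique, not the tester's code: the workspace struct and the workspace_init and workspace_free functions are hypothetical names, and the layout (integers first, then doubles) is an assumption.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical layout: nint integers followed by ndbl doubles in one block. */
typedef struct {
    void   *base;   /* the single allocation that backs both views */
    int    *iwork;  /* integer workspace at the start of the block */
    double *dwork;  /* double workspace, placed after the integers */
} workspace;

/* Carve integer and double workspaces out of one allocation.
 * Returns 0 on success, -1 if the allocation fails. */
static int workspace_init(workspace *w, size_t nint, size_t ndbl)
{
    /* Round the integer region up so the doubles start on a double boundary. */
    size_t ibytes = nint * sizeof(int);
    size_t align  = sizeof(double);
    size_t doff   = (ibytes + align - 1) / align * align;

    w->base = malloc(doff + ndbl * sizeof(double));
    if (w->base == NULL)
        return -1;
    w->iwork = (int *)w->base;
    w->dwork = (double *)((char *)w->base + doff);
    return 0;
}

/* Release the block and clear the base pointer. */
static void workspace_free(workspace *w)
{
    free(w->base);
    w->base = NULL;
}
```

The Fortran tester has no such allocator, which is why it passes one array under several type signatures and why g77 needs the flags above to accept it.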
If you are compiling it with Intel's Fortran compiler, the tester will hang in
determining epsilon unless you add -fp_port to F77NO_OPTFLAGS
in your Bmake.inc file.
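The entry does not say why the epsilon search hangs. For context, a search of this general kind is sketched below in C. Its result depends on each intermediate sum being rounded to double precision before the comparison, which the volatile qualifiers force here. The name machine_epsilon is illustrative, and this is not the tester's actual routine.

```c
#include <float.h>

/* Illustrative epsilon search (not the BLACS tester's code).
 * Halves eps until adding eps/2 to 1.0 no longer changes the stored value. */
static double machine_epsilon(void)
{
    /* volatile forces every result to be stored as a 64-bit double,
     * so no wider register precision leaks into the comparison. */
    volatile double eps = 1.0;
    volatile double sum = 1.0 + eps / 2.0;

    while (sum != 1.0) {
        eps = eps / 2.0;
        sum = 1.0 + eps / 2.0;
    }
    return eps;
}
```

On a platform with IEEE double precision, this value should match DBL_EPSILON from float.h.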
MPIBLACS SECTION:
New TRANSCOMM settings available in patch
MPI-2 provides a standard way to translate communicators between C and
Fortran77. If your MPI implements these routines, set
TRANSCOMM to -DUseMpi2.
We have reports that the newer versions of LAM-MPI use this setting.
This error was last confirmed in MPICH 1.0.13 and MPICH 1.1. MPI_Abort
does not kill any other processes at all, but seems to behave much
like a local call to exit(). This causes the BLACS tester to
hang on the BLACS_ABORT test in the auxiliary test. Here is straight
MPI code demonstrating the error:
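#include "mpi.h"
int main(int narg, char **args)
{
    int i, Iam, Np;
    MPI_Init(&narg, &args);
    MPI_Comm_size(MPI_COMM_WORLD, &Np);
    MPI_Comm_rank(MPI_COMM_WORLD, &Iam);
    /* The last process aborts; all other processes should be killed too. */
    if (Iam == Np-1) MPI_Abort(MPI_COMM_WORLD, -2);
    while(1);  /* spin forever: only a working MPI_Abort ends this process */
    MPI_Finalize();
    return 0;
}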
There is an undiagnosed problem that causes some users' dwalltime00
routine to return bad values. It appears likely that there is a problem with
macro name overruns, but errors in cpp or the code have not been ruled out.
If you get bad return values from dwalltime00, overwrite
BLACS/SRC/MPI/dwalltime00_.c with:
#include "Bdef.h"
#if (INTFACE == C_CALL)
double Cdwalltime00(void)
#else
F_DOUBLE_FUNC dwalltime00_(void)
#endif
{
    return(MPI_Wtime());
}
Users of Sun's f77 compilers may need to throw the -f
flag to force 8-byte double precision scalar alignment, which
gcc-compiled BLACS expect. Therefore, add -f to the
NOPT macro in SLmake.inc and to the
F77NO_OPTFLAGS in Bmake.inc.
NOTE: this is an old entry, and may no longer be needed.
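To check whether a double handed across the Fortran/C boundary really sits on an 8-byte boundary, a test along the following lines can be dropped into C code. This is a minimal sketch: is_aligned8 and demo_buf are illustrative names and are not part of the BLACS, and the _Alignas specifier assumes a C11 compiler.

```c
#include <stdint.h>

/* Illustrative helper (not part of the BLACS):
 * returns nonzero if p sits on an 8-byte boundary. */
static int is_aligned8(const void *p)
{
    return ((uintptr_t)p % 8u) == 0;
}

/* A buffer whose start is guaranteed to be 8-byte aligned (C11 _Alignas),
 * used only to demonstrate aligned and misaligned addresses. */
static _Alignas(8) unsigned char demo_buf[16];
```

With these definitions, demo_buf and demo_buf + 8 pass the check, while demo_buf + 4 is the kind of 4-byte-aligned address that the -f flag is meant to avoid for double precision scalars.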
mpt.1.2.0.0.6beta couldn't handle 0-length segments used with
MPI_Type_indexed. To work around this problem, throw the
T3ETrError flag in your Bmake.inc of patched MPIBLACS
(as shown in the example Bmake.T3E supplied with the patch).
NOTE: this is an old entry, and may no longer be needed.
mpt.1.2.0.0.6beta couldn't handle certain reductions where you mix types
with an MPI data type. To work around this problem, apply the patch and throw
the T3EReductErr flag in your Bmake.inc
(as shown in the example Bmake.T3E supplied with the patch).
NOTE: this is an old entry, and may no longer be needed.
PVMBLACS SECTION:
PVM3.3.11 SUNMP broken.
I have no idea if this has been fixed or not. With PVM 3.3.11, your best
bet was to rig your PVM_ARCH so that it thinks it is a
SUN4SOL2.
This appears to be a compiler problem with including files within the
brackets of a routine: system files must be included before the scope of the
routine starts. Therefore, in BLACS/SRC/PVM/blacs_setup_.c, move the line:
#include "string.h"
to the second line of the file (i.e., after #include "Bdef.h").
SGI5 compiler does not rename.
The compiler does not accept the -o (renaming option) if optimization is turned
on. This breaks the compilation of the C interface. Bmake.PVM-SGI5 defaults
to using gcc. If you can't use gcc, you may be able to do a workaround like
the following in BLACS/SRC/PVM/Makefile:
Line 166 of original Makefile:
.SUFFIXES: .o .C
.c.C:
	$(CC) -c $(CCFLAGS) -o C$*.o $(BLACSDEFS) -DCallFromC $<
	mv C$*.o $*.C
SGI error workaround:
.SUFFIXES: .o .C
.c.C:
	ln -s $*.c C$*.c
	$(CC) -c $(CCFLAGS) $(BLACSDEFS) -DCallFromC C$*.c
	mv C$*.o $*.C
	rm -f C$*.c
MPLBLACS SECTION:
MP_BRECV ordering error
It appears that MP_BRECV requires that messages be received in the order they
were sent, even if all messages have been successfully sent. IBM has reported
that this is not an error, but rather perhaps an oversight in documentation.
MPL does not support receiving messages in any order except that in which they
are sent. Here is a small routine showing the problem:
      program tst
      integer k, iam, Np, ictxt, i, j
      call mpc_environ(Np, Iam)
      k = Iam + 100
      print*,'start'
      if (iam.eq.1) then
         call mp_send(Iam, 4, 0, 2, i)
         call mp_send(k, 4, 0, 3, j)
         print*,mp_status(i)
         print*,mp_status(j)
      else if (iam .eq. 0) then
         call mp_brecv(k, 4, 1, 3, j)
         call mp_brecv(k, 4, 1, 2, j)
      end if
      print*,'done'
      stop
      end
When this is run, the output is:
xtst2 -procs 2
start
start
4
4
done
So both sends complete, but the receives still hang.
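The in-order rule described above can be mimicked with a toy model in C. This sketch is not MPL code and makes no MPL calls. It only encodes the stated rule that a blocking receive can match nothing but the oldest unreceived message. The function name receives_completed_in_order is hypothetical.

```c
#include <stddef.h>

/* Toy model of in-order matching (not MPL itself).
 * sent_tags: message types in the order they were sent.
 * recv_tags: message types in the order the receives are posted.
 * Returns how many receives complete before one blocks forever
 * or the receives run out. */
static size_t receives_completed_in_order(const int *sent_tags, size_t nsent,
                                          const int *recv_tags, size_t nrecv)
{
    size_t head = 0;  /* index of the oldest unreceived message */
    size_t done = 0;  /* number of receives that have completed */

    /* A receive completes only if it asks for the message at the head. */
    while (done < nrecv && head < nsent &&
           recv_tags[done] == sent_tags[head]) {
        head++;
        done++;
    }
    return done;
}
```

In this model, sends of types 2 then 3 followed by receives of types 3 then 2 (the order used in the Fortran routine) complete zero receives, while posting the receives as 2 then 3 completes both.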
NXBLACS SECTION:
The NXBLACS use a copy optimization which is, according to strict IEEE
arithmetic rules, illegal. More precisely, doubles are sometimes used to
copy floats or integers. At implementation time, the author tested all
available NX platforms and found no errors, so the optimization was put in
even though it was known to be illegal. Unfortunately, on more recent platforms
(i.e., ASCI red with newest MPI) this causes problems. So, if you get
mysterious errors in the tester, this may be what's happening. To prevent the
BLACS from applying this illegal optimization, delete the following lines in
BLACS/SRC/NX/INTERNAL/mvcopy4.c:
long iaddr;
iaddr = (long) A;
/*
* If address is on a 8 byte boundary, and lda and m are evenly divisible by 2,
* can use double sized pointers for faster packing
*/
if ( !(iaddr % 8) && !(lda % 2) && !(m % 2) )
mvcopy8(m/2, n, (double *) A, lda/2, (double *) buff);
/*
* Otherwise, must use 4 byte packing
*/
else
You also need to delete basically the same lines from
BLACS/SRC/NX/INTERNAL/vmcopy4.c:
long iaddr;
iaddr = (long) A;
/*
* If address is on a 8 byte boundary, and lda and m are evenly divisible by 2,
* can use double sized pointers for faster packing
*/
if ( !(iaddr % 8) && !(lda % 2) && !(m % 2) )
vmcopy8(m/2, n, (double *) A, lda/2, (double *) buff);
/*
* Otherwise, must use 4 byte packing
*/
else
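For orientation, a pack routine that never touches the data through a double pointer might look like the sketch below. The name pack4, its argument order, and the column-major layout are assumptions made for illustration. This is not the BLACS source, and the real mvcopy4 may differ in detail.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 4-byte pack routine (not the BLACS source).
 * Copies an m-by-n column-major matrix A with leading dimension lda
 * into the contiguous buffer buff, one column at a time. */
static void pack4(int m, int n, const int32_t *A, int lda, int32_t *buff)
{
    for (int j = 0; j < n; j++) {
        /* memcpy moves raw bytes, so no 4-byte item is ever loaded
         * or stored as a floating-point double. */
        memcpy(buff + (size_t)j * (size_t)m,
               A + (size_t)j * (size_t)lda,
               (size_t)m * sizeof *A);
    }
}
```

For example, with m = 2, n = 2 and lda = 3, the padding element at the end of each column of A is skipped and buff receives the four matrix entries back to back.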