BLACS Errata

[Home] [Errata] [MPI Errata] [PVM Errata] [MPL Errata] [NX Errata] [TESTER Errata]

STATUS SECTION:

This file was last modified on 07/10/03 to add the Intel Fortran Compiler workaround.
It was modified on 06/28/03 to switch to a more readable format.
Because seeing past errors and problems can sometimes help in finding new ones, the original BLACS Errata file can still be found here.
Version 1.1 release of the BLACS was on 5/01/97.
The tester for all versions of this release is available here.
The BLACS are available in four versions:
1. BLACS using MPI
  - MPI users need to apply the MPIBLACS Patch.
2. BLACS using PVM
3. BLACS using IBM's MPL
4. BLACS using Intel's NX
There used to be a fifth BLACS version, CMMDBLACS, which died with the CM-5.
Before asking for support, review the relevant errata section for your BLACS version (available from the toolbar).
The BLACS are no longer under active development. Minimal (and unpaid) support is provided on a volunteer basis by researchers with other jobs to do. Try to take this into account when submitting questions.

TESTER SECTION:

Necessary flags for compiling the BLACS tester with g77
Necessary flags for compiling the BLACS tester with Intel Fortran compiler

Necessary flags for compiling the BLACS tester with g77

The BLACS tester uses a large array to simulate dynamic memory, and passes this array to routines that accept it as an array of differing data types. g77 has, in some cases, upgraded this type mismatch from a warning to an error. To tell g77 to allow this behavior, change line 39 of BLACS/TESTING/Makefile from:

        $(F77) $(F77NO_OPTFLAGS) -c $*.f

to:

        $(F77) $(F77NO_OPTFLAGS) -fno-globals -fno-f90 -fugly-complex -w -c $*.f

Necessary flags for compiling the BLACS tester with the Intel Fortran compiler

If you are compiling with Intel's Fortran compiler, the tester will hang while determining epsilon unless you add -fp_port to F77NO_OPTFLAGS in your Bmake.inc file.
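
The entry does not say why the hang occurs, but it is consistent with a well-known extended-precision pitfall. As a minimal sketch in C (not the tester's actual Fortran code), machine epsilon is often found by halving a candidate until adding half of it to 1.0 no longer changes the result. If intermediate sums are kept in registers wider than the declared type, that comparison may not behave as intended. Rounding each result at assignment, which is what -fp_port is understood to request, avoids the issue. The function name machine_epsilon and the use of volatile to force the rounding are illustrative assumptions.

```c
#include <assert.h>
#include <float.h>

/*
 * Illustrative only: find double-precision machine epsilon by repeated
 * halving.  Storing the sum in a volatile double forces it to be rounded
 * to storage precision before the comparison, mimicking rounding at
 * assignment.
 */
double machine_epsilon(void)
{
   double eps = 1.0;
   volatile double sum = 1.0 + eps / 2.0;

   while (sum > 1.0)
   {
      eps /= 2.0;
      sum = 1.0 + eps / 2.0;
   }
   assert(eps > 0.0);   /* sanity check: the loop must stop before underflow */
   return eps;
}
```

On a system with IEEE 754 doubles and round-to-nearest, machine_epsilon() should return the same value as DBL_EPSILON from float.h.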

MPIBLACS SECTION:

All users should scope the TESTER errata.
MPI-2 provides new TRANSCOMM settings (used by LAM-MPI).
Error in many implementations of MPI_Abort.
Problems compiling dwalltime00
Possible flag mismatch between gcc and Sun f77.
Old T3E errors:
1. T3E MPI error in handling zero-length segments
2. T3E MPI error in handling mixed-type reductions

New TRANSCOMM settings available in patch

MPI-2 provides a standard way to translate communicators between C and Fortran77. If your MPI implements these routines, set TRANSCOMM to -DUseMpi2. We have reports that the newer versions of LAM-MPI use this setting.

Error in many implementations of MPI_Abort.

This error was last confirmed in MPICH 1.0.13 and MPICH 1.1. MPI_Abort does not kill any other processes, but seems to behave much like a local call to exit(). This causes the BLACS tester to hang on the BLACS_ABORT test in the auxiliary test. Here is straight MPI code demonstrating the error:

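#include <stdio.h>
#include "mpi.h"
int main(int narg, char **args)
{
   int Iam, Np;

   MPI_Init(&narg, &args);
   MPI_Comm_size(MPI_COMM_WORLD, &Np);
   MPI_Comm_rank(MPI_COMM_WORLD, &Iam);
/*
 * The last process aborts; a correct MPI_Abort should also kill the others
 */
   if (Iam == Np-1) MPI_Abort(MPI_COMM_WORLD, -2);
/*
 * All other processes spin here forever if MPI_Abort acts like a local exit()
 */
   while(1);
   MPI_Finalize();
   return 0;
}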

Problems compiling dwalltime00

There is an undiagnosed problem that causes some users' dwalltime00 routine to return bad values. It appears likely that there is a problem with macro name overruns, but errors in cpp or the code have not been ruled out. If you get bad return values from dwalltime00, overwrite BLACS/SRC/MPI/dwalltime00_.c with:

#include "Bdef.h"

#if (INTFACE == C_CALL)
double Cdwalltime00(void)
#else
F_DOUBLE_FUNC dwalltime00_(void)
#endif
{
   return(MPI_Wtime());
}

Sun f77 and gcc compiler mismatch.

Users of Sun's f77 compilers may need to throw the -f flag to force 8-byte alignment of double precision scalars, which the gcc-compiled BLACS expect. To do so, add -f to the NOPT macro in SLmake.inc and to F77NO_OPTFLAGS in Bmake.inc. NOTE: this is an old entry, and may no longer be needed.
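
The mismatch concerns where a double may start in memory. As a rough sketch in C, the helper below checks whether an address falls on an 8-byte boundary, which is the property the entry says gcc-compiled code expects. The name is_aligned8 is hypothetical and not part of the BLACS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if p lies on an 8-byte boundary,
 * 0 otherwise.
 */
int is_aligned8(const void *p)
{
   assert(p != NULL);
   return ((uintptr_t) p % 8) == 0;
}
```

A double precision scalar passed from Fortran at an address for which this check fails would violate the alignment assumption described above.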

T3E MPI error in handling zero-length segments

mpt.1.2.0.0.6beta couldn't handle 0-length segments used with MPI_Type_indexed. To work around this problem, throw the T3ETrError flag in your Bmake.inc of patched MPIBLACS (as shown in the example Bmake.T3E supplied with the patch). NOTE: this is an old entry, and may no longer be needed.

T3E MPI error in handling mixed types

mpt.1.2.0.0.6beta couldn't handle certain reductions where you mix types with an MPI data type. To work around this problem, apply the patch and throw the T3EReductErr flag in your Bmake.inc (as shown in the example Bmake.T3E supplied with the patch). NOTE: this is an old entry, and may no longer be needed.

PVMBLACS SECTION:

All users should scope the TESTER errata.
PVM3.3.11 SUNMP broken.
Include file scoping problem.
SGI5 compiler does not rename.

PVM3.3.11 SUNMP broken.

I have no idea if this has been fixed or not. With PVM 3.3.11, your best bet was to rig your PVM_ARCH so that it thinks it is a SUN4SOL2.

Include file scoping problem.

This appears to be a compiler problem with including files within the braces of a routine. System files must be included before the scope of the routine begins. Therefore, in BLACS/SRC/PVM/blacs_setup_.c, move the line:

#include "string.h"

to the second line of the file (i.e., after #include "Bdef.h").

SGI5 compiler does not rename.

The compiler does not accept the -o (renaming) option if optimization is turned on. This breaks the compilation of the C interface. Bmake.PVM-SGI5 defaults to using gcc. If you can't use gcc, you may be able to use a workaround like the following in BLACS/SRC/PVM/Makefile.

Line 166 of the original Makefile:

.SUFFIXES: .o .C
.c.C:
        $(CC) -c $(CCFLAGS) -o C$*.o $(BLACSDEFS) -DCallFromC $<
        mv C$*.o $*.C

SGI error workaround:

.SUFFIXES: .o .C
.c.C:
        ln -s $*.c C$*.c
        $(CC) -c $(CCFLAGS) $(BLACSDEFS) -DCallFromC C$*.c
        mv C$*.o $*.C
        rm -f C$*.c

MPLBLACS SECTION:

All users should scope the TESTER errata.
MP_BRECV ordering error

MP_BRECV ordering error

It appears that MP_BRECV requires that messages be received in the order they were sent, even if all messages have been successfully sent. IBM has reported that this is not an error, but rather perhaps an oversight in documentation. MPL does not support receiving messages in any order other than that in which they were sent. Here is a small routine showing the problem:

      program tst
      integer k, iam, Np, ictxt, i, j

      call mpc_environ(Np, Iam)
      k = Iam + 100
      print*,'start'
      if (iam.eq.1) then
         call mp_send(Iam, 4, 0, 2, i)
         call mp_send(k,   4, 0, 3, j)
         print*,mp_status(i)
         print*,mp_status(j)
      else if (iam .eq. 0) then
         call mp_brecv(k, 4, 1, 3, j)
         call mp_brecv(k, 4, 1, 2, j)
      end if
      print*,'done'

      stop
      end

When this is run, the output is:
xtst2 -procs 2
 start
 start
 4
 4
 done

So both sends complete, but the receives still hang.

NXBLACS SECTION:

All users should scope the TESTER errata.
Illegal copy in NXBLACS.

Illegal copy in NXBLACS

The NXBLACS use a copy optimization which is, according to strict IEEE arithmetic rules, illegal. More precisely, doubles are sometimes used to copy floats or integers. At implementation time, the author tested all available NX platforms and found no errors, so the optimization was put in even though it was known to be illegal. Unfortunately, on more recent platforms (i.e., ASCI Red with the newest MPI) this causes problems. So, if you get mysterious errors in the tester, this may be what's happening. To prevent the BLACS from applying this illegal optimization, delete the following lines in BLACS/SRC/NX/INTERNAL/mvcopy4.c:

   long iaddr;

   iaddr = (long) A;
/*
 * If address is on a 8 byte boundary, and lda and m are evenly divisible by 2,
 * can use double sized pointers for faster packing
 */
   if ( !(iaddr % 8) && !(lda % 2) && !(m % 2) )
      mvcopy8(m/2, n, (double *) A, lda/2, (double *) buff);
/*
 * Otherwise, must use 4 byte packing
 */
   else

You also need to delete basically the same lines from BLACS/SRC/NX/INTERNAL/vmcopy4.c:

   long iaddr;

   iaddr = (long) A;

/*
 * If address is on a 8 byte boundary, and lda and m are evenly divisible by 2,
 * can use double sized pointers for faster packing
 */
   if ( !(iaddr % 8) && !(lda % 2) && !(m % 2) )
      vmcopy8(m/2, n, (double *) A, lda/2, (double *) buff);
/*
 * Otherwise, must use 4 byte packing
 */
   else
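
With the lines above removed, only the 4-byte path remains in each file. The following is a minimal sketch in C of what a plain 4-byte, column-major matrix-to-buffer pack might look like. It is not the BLACS source: the name pack4 and its signature are assumptions made for illustration, and it assumes int is 4 bytes. Each element is moved as an int, so no wider type is ever used to carry the data.

```c
#include <assert.h>

/*
 * Hypothetical sketch (not the BLACS source): pack the m-by-n column-major
 * matrix A, with leading dimension lda, into the contiguous buffer buff,
 * one int at a time.
 */
void pack4(int m, int n, const int *A, int lda, int *buff)
{
   int i, j;

   assert(lda >= m);   /* each column of A must hold at least m rows */
   for (j = 0; j < n; j++)
   {
      for (i = 0; i < m; i++)
         buff[j*m + i] = A[j*lda + i];
   }
}
```

For example, packing the top 2-by-2 block of a matrix stored with lda = 4 copies elements 0 and 1 of the first column and then elements 0 and 1 of the second column into four consecutive buffer entries.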