GBIS Benchmark Header File: qcd1


   ==================================================================
   ===                                                            ===
   ===           GENESIS Distributed Memory Benchmarks            ===
   ===                                                            ===
   ===                           QCD1                             ===
   ===                                                            ===
   ===    Monte-Carlo Simulation of the (3+1)-Dimensional Pure    ===
   ===                 SU(3) Lattice Gauge Theory                 ===
   ===                                                            ===
   ===              Author:    Eckardt Kehl                       ===
   ===              PALLAS GmbH                                   ===
   ===              Hermulheimer Str. 10                          ===
   ===              5040 Bruhl, GERMANY                           ===
   ===     tel.:+49-2232-18960   e-mail:karls@pallas-gmbh.de      ===
   ===                                                            ===
   ===     Copyright: PALLAS GmbH                                 ===
   ===                                                            ===
   ===          Last update: June 1993; Release: 2.2              ===
   ===                                                            ===
   ==================================================================


1. Description
--------------
This benchmark is based on a 'pure gluon' SU(3) lattice gauge theory 
simulation, using the Monte-Carlo Metropolis technique. It differs from
the QCD2 benchmark in that it uses the 'quenched' approximation which
neglects dynamical fermions. 

The simulation is defined on a four-dimensional lattice which is a
discrete approximation to continuum space-time. The basic variables
are 3 by 3 complex matrices. Four such matrices are associated with
every lattice site. The lattice update is performed using a multi-hit
Metropolis algorithm.

In the parallel version of the program, the lattice can be distributed
in any one or more of the four lattice directions. 


2. Operating Instructions
-------------------------

File I/O :

The distributed version reads an input file, "qcd1.dat", to determine
the required lattice size and number of processors. Further information 
on this is given below.
A permanent record of the benchmark run is saved in a file called "result". 
This contains information on the lattice size and the number of processes 
over which the problem is distributed in each lattice direction,
and some information on the physical solution for each iteration.
The information for each iteration is also written to standard output 
(unit 6), so that the progress of the run can be followed.


Changing problem size and numbers of processes:
-----------------------------------------------

The problem is based on a 4-dimensional space-time lattice of size:  

	N = NS**3 * NT. 

For the purposes of the benchmark, NS & NT are specified as integer powers 
of 2, so that:    NS = 2**LOGNS ,  NT = 2**LOGNT

In the parallel version of the program the number of processors (NP) over 
which the lattice is distributed is determined by the input parameter LOGP,
which is the log to base 2 of the required number of processors, 
i.e.  NP = 2**LOGP.

The specified number of processors is configured as a 4D grid internally 
within the program.

      NP = NPX * NPY * NPZ * NPT

where NPX, NPY, NPZ & NPT are all powers of two and NPT >= NPZ >= NPY >= NPX.
 
The local lattice size on each processor is then: 

      n = (NS/NPX) * (NS/NPY) * (NS/NPZ) * (NT/NPT)
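
As a rough illustration of the relations above, the following Python sketch
lists every 4D process grid that satisfies the stated constraints for a given
LOGNS, LOGNT and LOGP, together with the resulting local lattice size n. It is
not taken from the benchmark source: the function name is hypothetical, and
the requirement that no grid dimension exceeds the corresponding lattice
extent is an assumption. The rule the program actually uses to select one
grid is not described here.

```python
from itertools import product


def candidate_grids(logns, lognt, logp):
    """List (NPX, NPY, NPZ, NPT, n) for every grid allowed by the constraints.

    Hypothetical helper, not part of the benchmark source.
    """
    ns, nt = 2 ** logns, 2 ** lognt
    grids = []
    # Each grid dimension is a power of two, so iterate over the exponents.
    for ex, ey, ez, et in product(range(logp + 1), repeat=4):
        if ex + ey + ez + et != logp:
            continue  # NP = NPX * NPY * NPZ * NPT must equal 2**LOGP
        if not (ex <= ey <= ez <= et):
            continue  # NPT >= NPZ >= NPY >= NPX
        if ez > logns or et > lognt:
            continue  # assumption: at least one lattice site per process
        npx, npy, npz, npt = 2 ** ex, 2 ** ey, 2 ** ez, 2 ** et
        n = (ns // npx) * (ns // npy) * (ns // npz) * (nt // npt)
        grids.append((npx, npy, npz, npt, n))
    return grids


if __name__ == "__main__":
    # 8**3 * 16 lattice on 8 processes (LOGNS = 3, LOGNT = 4, LOGP = 3)
    for npx, npy, npz, npt, n in candidate_grids(3, 4, 3):
        print(f"NPX={npx} NPY={npy} NPZ={npz} NPT={npt}  n={n}")
```

For a fixed lattice and NP, every admissible grid gives the same n; the grids
differ only in the shape of the local lattice.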

In the sequential version of the program the lattice size is set by 
changing the values of LOGNS & LOGNT in PARAMETER statements in the 
include file qcd1.inc.

In the parallel version of the program the parameters LOGNS, LOGNT & LOGP 
are read from the input data file qcd1.dat. 
The maximum number of processes in each dimension is specified by 
PARAMETER statements in the include file `qcd1h.inc'. If any of these
values (normally 4) is exceeded, the program prints an error message and
terminates. Similarly, the maximum local lattice dimensions are specified
by PARAMETER statements in the include file `qcd1n.inc', and an error is
again reported if any of these maximum dimensions is exceeded. These
maximum values can be changed by altering the PARAMETER statements, but
care must be taken not to exceed the available node memory as a
consequence.

The node memory requirement is given very approximately by the expression:

	Node Memory (Mbyte) = (NXD+2) * (NYD+2) * (NZD+2) * (NTD+2) / 1000.

	Where NXD, NYD, NZD, NTD are the maximum local lattice dimensions

To give a rough feel for the approximate node memory requirement -

If NTD =  8 & NXD = NYD = NZD = 4, the approximate node memory required for
arrays is 2.2 Mbyte,

If NXD = NYD = NZD = NTD = 8, the approximate node memory required for
arrays is 10 Mbyte.
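
The estimate can be checked with a few lines of Python. This is only a
sketch of the approximate expression given above, not code from the
benchmark; the function name is hypothetical.

```python
def node_memory_mbyte(nxd, nyd, nzd, ntd):
    """Very approximate node memory for arrays in Mbyte (expression above)."""
    return (nxd + 2) * (nyd + 2) * (nzd + 2) * (ntd + 2) / 1000.0


if __name__ == "__main__":
    print(node_memory_mbyte(4, 4, 4, 8))  # 2.16, quoted above as 2.2 Mbyte
    print(node_memory_mbyte(8, 8, 8, 8))  # 10.0
```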
       

Suggested Problem Sizes:
------------------------

It is recommended that the benchmark is run for four standard problem
sizes with the input parameters given in the following table:

    Problem Size        LOGNS           LOGNT
       4**3 * 16          2               4 
       8**3 * 16          3               4
      16**3 * 16          4               4
      32**3 * 16          5               4
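
For orientation, the total number of lattice sites N = NS**3 * NT for each
of the four standard sizes can be worked out as in the sketch below. The
names used are illustrative only and do not appear in the benchmark source.

```python
# (LOGNS, LOGNT) pairs taken from the table above
STANDARD_SIZES = [(2, 4), (3, 4), (4, 4), (5, 4)]


def lattice_sites(logns, lognt):
    """Total number of lattice sites N = NS**3 * NT."""
    ns, nt = 2 ** logns, 2 ** lognt
    return ns ** 3 * nt


if __name__ == "__main__":
    for logns, lognt in STANDARD_SIZES:
        ns, nt = 2 ** logns, 2 ** lognt
        print(f"{ns}**3 * {nt}: N = {lattice_sites(logns, lognt)}")
```

Each step up the table multiplies the number of sites by 8, from 1024 for
the smallest lattice to 524288 for the largest.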


Compiling and Running The Benchmark:
------------------------------------

1) Choose problem size and number of processes. In the sequential
   version this is done by editing PARAMETER statements in the file
   qcd1.inc. In the distributed version the problem size and number
   of processes (LOGNS, LOGNT and LOGP) are set in the input data file
   qcd1.dat. Upper limits for the numbers of processes are set in the
   include file qcd1h.inc. Similarly the upper limits for the local
   lattice size are set in the file qcd1n.inc. These upper limits may
   be changed but care should be taken not to exceed the available
   node memory (see above).

2) To compile and link the benchmark type:   `make' for the distributed 
   version or `make slave' for the single-node version.

3) If any of the parameters in the include files are changed,
   the code has to be recompiled. The makefile automatically
   recompiles only the affected files. Type:   make

4) On some systems it may be necessary to allocate the appropriate
   resources before running the benchmark, e.g. on the iPSC/860,
   to reserve a cube of 8 processors, type:    getcube -t8

5) To run either the sequential or the distributed version of the benchmark,
   type:    qcd1

   The progress of the benchmark execution can be monitored via
   the standard output, whilst a permanent copy of the benchmark
   output is written to a file called 'result'.

6) If the run is successful and a permanent record is required, the
   file 'result' should be copied to another file before the next run
   overwrites it.
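
Step 6 can be automated with a small script such as the Python sketch below.
It is not part of the benchmark distribution; the helper name and the
time-stamped naming scheme for the copy are assumptions chosen for
illustration.

```python
import shutil
import time
from pathlib import Path


def keep_result(result_path="result", dest_dir="."):
    """Copy the benchmark output to a time-stamped file and return its path.

    Hypothetical helper; the 'result-YYYYmmdd-HHMMSS' naming is an assumption.
    """
    src = Path(result_path)
    if not src.is_file():
        raise FileNotFoundError(f"no benchmark output found at {src}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_dir) / f"{src.name}-{stamp}"
    shutil.copy2(src, dest)  # copy2 also preserves the modification time
    return dest
```

Calling keep_result() in the run directory after a successful run leaves
the original 'result' file in place and returns the path of the copy.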




Vectorization:
--------------
The program has been written completely in a vectorizable form.
The vector length equals half the lattice volume. 
The most important subroutines for vectorization are: PRO, STAPLE,
MERTRO, ADD, GATHER, SCATTER and ACCEPT.

$Id: ReadMe,v 1.2 1994/04/20 17:19:30 igl Rel igl $
High Performance Computing Centre
Submitted by Mark Papiani,
last updated on 10 Jan 1995.