P4 is a library of macros and subroutines developed at Argonne National Laboratory for programming a variety of parallel machines. The P4 system supports both the shared-memory model (based on monitors) and the distributed-memory model (using message passing). For the shared-memory model of parallel computation, P4 provides a set of primitives from which monitors can be constructed, as well as a set of useful monitors. For the distributed-memory model, P4 provides typed send and receive operations, and creation of processes according to a text file describing group and process structure. P4 is intended to be portable, simple to install and use, and efficient. It can be used to program networks of workstations, advanced parallel supercomputers like the Intel Touchstone Delta and the Alliant Campus HiPPI-based system, and single shared-memory multiprocessors. It is currently installed on most uniprocessor workstations, shared-memory multiprocessors, and several high-performance parallel machines.
Process management in the P4 system is based on a configuration file that specifies the host pool, the object file to be executed on each machine, the number of processes to be started on each host (intended primarily for multiprocessor systems), and other auxiliary information. An example of a configuration file is:
    # start one slave on each of sun2 and sun3
    local 0
    sun2 1 /home/mylogin/p4pgms/sr_test
    sun3 1 /home/mylogin/p4pgms/sr_test
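To make the layout of such a file concrete, the following Python sketch parses the kinds of lines that appear in the example: a comment, the `local` entry, and one entry per remote host. It is an illustration only and not part of P4. The names `parse_procgroup` and `HostEntry` are hypothetical, the one-entry-per-line layout is assumed, and only the fields shown in the example (host, process count, program path) are handled.

```python
from typing import List, NamedTuple, Optional


class HostEntry(NamedTuple):
    """One entry of the configuration file (hypothetical record type)."""
    host: str                # "local" or a remote host name
    nprocs: int              # number of processes to start on that host
    program: Optional[str]   # path of the object file, if one is given


def parse_procgroup(text: str) -> List[HostEntry]:
    """Parse configuration text, skipping blank lines and '#' comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            raise ValueError(f"malformed line: {raw!r}")
        program = fields[2] if len(fields) > 2 else None
        entries.append(HostEntry(fields[0], int(fields[1]), program))
    return entries


# The example file from the text, assuming one entry per line.
EXAMPLE = """\
# start one slave on each of sun2 and sun3
local 0
sun2 1 /home/mylogin/p4pgms/sr_test
sun3 1 /home/mylogin/p4pgms/sr_test
"""

entries = parse_procgroup(EXAMPLE)
total_slaves = sum(e.nprocs for e in entries)
```

Here `entries` holds three records (the `local` line and the two hosts), and `total_slaves` counts the processes requested across all of them.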
Two issues are noteworthy in regard to the process management mechanism in P4. First, there is the notion of a ``master'' process and ``slave'' processes, and multilevel hierarchies may be formed to implement what is termed a cluster model of computation. Second, the primary mode of process creation is static, via the configuration file; dynamic process creation is possible only from a statically created process, which must invoke a special P4 function that spawns a new process on the local machine. Despite these restrictions, a variety of application paradigms may be implemented in the P4 system in a fairly straightforward manner.
Message passing in the P4 system is achieved through the use of traditional send and recv primitives, parameterized almost exactly as in other message-passing systems. Several variants are provided for semantics such as heterogeneous exchange and blocking or nonblocking transfer. A significant proportion of the burden of buffer allocation and management, however, is left to the user. Apart from basic message passing, P4 also offers a variety of global operations, including broadcast, global maxima and minima, and barrier synchronization.
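The idea of typed send and recv, and of a global operation built on top of them, can be sketched in a few lines of Python. This is a single-process simulation under stated assumptions, not P4's C interface: the `Mailboxes` class, the `ANY` wildcard value, and the `global_max` helper are all hypothetical names, and the receive shown is the nonblocking variant (it returns `None` when nothing matches).

```python
from collections import deque
from typing import Any, Deque, Dict, List, Optional, Tuple

ANY = -1  # wildcard for message type or sender (a convention assumed here)


class Mailboxes:
    """In-process stand-in for a set of communicating processes."""

    def __init__(self, nprocs: int) -> None:
        self.queues: Dict[int, Deque[Tuple[int, int, Any]]] = {
            p: deque() for p in range(nprocs)
        }

    def send(self, msg_type: int, src: int, dest: int, data: Any) -> None:
        """Queue a typed message for process `dest`."""
        self.queues[dest].append((msg_type, src, data))

    def recv(self, me: int, msg_type: int = ANY,
             src: int = ANY) -> Optional[Tuple[int, int, Any]]:
        """Return the oldest queued message matching type and sender."""
        queue = self.queues[me]
        for i, (t, s, data) in enumerate(queue):
            if msg_type in (ANY, t) and src in (ANY, s):
                del queue[i]          # safe: we return immediately afterwards
                return t, s, data
        return None                   # nonblocking: nothing matched


def global_max(net: Mailboxes, local_values: List[float]) -> List[float]:
    """Gather values at process 0, reduce, then broadcast the result."""
    REDUCE, RESULT = 100, 101         # message types reserved for this sketch
    nprocs = len(local_values)
    for p in range(1, nprocs):
        net.send(REDUCE, src=p, dest=0, data=local_values[p])
    best = local_values[0]
    for _ in range(1, nprocs):
        _, _, value = net.recv(0, msg_type=REDUCE)
        best = max(best, value)
    for p in range(1, nprocs):
        net.send(RESULT, src=0, dest=p, data=best)
    return [best] + [net.recv(p, msg_type=RESULT)[2] for p in range(1, nprocs)]


net = Mailboxes(3)
DATA, CONTROL = 1, 2
net.send(DATA, src=1, dest=0, data="block A")
net.send(CONTROL, src=2, dest=0, data="stop")
first = net.recv(0, msg_type=CONTROL)   # selected by type, not arrival order
second = net.recv(0)                    # wildcard: whatever is left
third = net.recv(0)                     # nothing queued any more
maxima = global_max(net, [3, 9, 4])     # every process ends up with the maximum
```

Selecting by message type lets a receiver pick out a control message that arrived after a data message, which is the main practical benefit of typed receives.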
Shared-memory support via monitors is a facility that distinguishes P4 from other systems. However, this feature is not distributed shared memory; rather, it is a portable mechanism for shared-address-space programming on true shared-memory multiprocessors. The abstraction provided by P4 for managing data in shared memory is the monitor. The specific approach taken by P4 is described in [3]. P4 provides several useful monitors (p4_barrier_t, p4_getsub_monitor_t, p4_askfor_monitor_t) as well as a general monitor type (p4_monitor_t) to help users construct their own monitors.
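As an illustration of the monitor abstraction, the Python sketch below implements a small monitor that hands out loop subscripts, each exactly once, to competing workers. It assumes that a getsub-style monitor serves this self-scheduling purpose; it does not reproduce the interface of p4_getsub_monitor_t, and Python threads stand in for processes sharing memory. The names `GetSubMonitor` and `run_workers` are hypothetical.

```python
import threading
from typing import List, Optional


class GetSubMonitor:
    """Monitor protecting a shared counter of loop subscripts 0..n-1."""

    def __init__(self, n: int) -> None:
        self._lock = threading.Lock()   # only one caller inside at a time
        self._next = 0
        self._n = n

    def getsub(self) -> Optional[int]:
        """Return the next unclaimed subscript, or None when exhausted."""
        with self._lock:                # enter monitor; released on exit
            if self._next >= self._n:
                return None
            sub = self._next
            self._next += 1
            return sub


def run_workers(n_items: int, n_workers: int) -> List[List[int]]:
    """Let several workers claim subscripts until none remain."""
    monitor = GetSubMonitor(n_items)
    claimed: List[List[int]] = [[] for _ in range(n_workers)]

    def worker(wid: int) -> None:
        while True:
            sub = monitor.getsub()
            if sub is None:
                break
            claimed[wid].append(sub)    # each worker writes only its own list

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed


claimed = run_workers(n_items=100, n_workers=4)
all_subs = sorted(s for per_worker in claimed for s in per_worker)
```

How the subscripts are divided among the workers varies from run to run, but together the workers claim every subscript exactly once, because the counter is only read and updated inside the monitor.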
P4 also supports a variety of auxiliary and support functions for timing and for debugging. The debugging functions are essentially printing facilities that identify the source of a debugging message, and ``levels'' of debugging are provided so that the user may control the volume of debugging information that is printed. Finally, the P4 system also contains a package (ALOG) for creating logs of time-stamped events; this package is of general utility outside of P4. The timestamps are obtained from the microsecond-resolution timers available on various machines. These log files are primarily intended for use with a separate tool termed Upshot [5] that visually depicts events and their ordering from a P4 application run.
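The two facilities described here, level-controlled debugging output and time-stamped event logging, can be sketched as follows. This is a hedged Python illustration of the ideas only: the classes `DebugPrinter` and `EventLog` and their methods are hypothetical and do not reproduce the P4 or ALOG interfaces or the ALOG file format.

```python
import time
from typing import List, Tuple


class DebugPrinter:
    """Prints messages tagged with their source, filtered by debug level."""

    def __init__(self, proc_id: str, level: int) -> None:
        self.proc_id = proc_id
        self.level = level              # higher level means more output
        self.lines: List[str] = []

    def debug_print(self, msg_level: int, message: str) -> None:
        """Emit the message only if msg_level does not exceed the set level."""
        if msg_level <= self.level:
            line = f"{self.proc_id}: {message}"
            self.lines.append(line)
            print(line)


class EventLog:
    """Collects (timestamp in microseconds, event type, note) records."""

    def __init__(self) -> None:
        self.events: List[Tuple[int, int, str]] = []

    def log(self, event_type: int, note: str = "") -> None:
        # perf_counter_ns() is monotonic; divide to get microseconds.
        stamp_us = time.perf_counter_ns() // 1000
        self.events.append((stamp_us, event_type, note))


dbg = DebugPrinter(proc_id="p1_sun2", level=10)
dbg.debug_print(5, "entering solver")      # printed: 5 <= 10
dbg.debug_print(20, "inner loop detail")   # suppressed: 20 > 10

log = EventLog()
log.log(1, "send start")
log.log(2, "send end")
```

A viewer in the spirit of Upshot would then read the recorded events and draw them on a common time axis, one row per process.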
Parmacs is a project that is closely related to the P4 effort. Essentially, Parmacs is a set of macro extensions to the P4 system developed at GMD [6]. It originated in an effort to provide Fortran interfaces to the P4 system, but is now a significantly enhanced package that provides a variety of high-level abstractions, mostly dealing with global operations. Parmacs provides macros for logically configuring a set of P4 processes; for example, the macro torus produces a suitable configuration file for use by P4 that results in a logical process configuration corresponding to a 3-d torus. Other logical topologies, including general graphs, may also be implemented, and Parmacs provides macros used in conjunction with send and recv to achieve topology-specific communications within executing programs.
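To show what a logical 3-d torus configuration means for communication, the Python sketch below maps a process number to torus coordinates and lists its six neighbors with wraparound. It is an illustration only and not the Parmacs torus macro: the function names and the row-major numbering of processes are assumptions made for this example.

```python
from typing import List, Tuple

Dims = Tuple[int, int, int]


def rank_to_coords(rank: int, dims: Dims) -> Tuple[int, int, int]:
    """Map a process number to (x, y, z), assuming row-major numbering."""
    _, ny, nz = dims
    x, rest = divmod(rank, ny * nz)
    y, z = divmod(rest, nz)
    return x, y, z


def coords_to_rank(coords: Tuple[int, int, int], dims: Dims) -> int:
    """Inverse mapping; the modulo implements the torus wraparound."""
    nx, ny, nz = dims
    x, y, z = coords
    return (x % nx) * ny * nz + (y % ny) * nz + (z % nz)


def torus_neighbors(rank: int, dims: Dims) -> List[int]:
    """Return the neighbors in order +x, -x, +y, -y, +z, -z."""
    base = rank_to_coords(rank, dims)
    neighbors = []
    for axis in range(3):
        for step in (+1, -1):
            c = list(base)
            c[axis] += step
            neighbors.append(coords_to_rank((c[0], c[1], c[2]), dims))
    return neighbors


dims = (3, 3, 3)                       # 27 processes
corner = torus_neighbors(0, dims)      # [9, 18, 3, 6, 1, 2]
far = torus_neighbors(26, dims)        # [8, 17, 20, 23, 24, 25]
```

A topology-specific send would then translate a direction such as "+x" into the corresponding neighbor's process number before calling the ordinary send primitive.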