The ch_p4 device
This is the ``network of workstations'' implementation of MPICH. P4
(Portable Programs for Parallel Processors) is an older message-passing
library that was used to implement the MPICH ADI [9]. The ``ch''
in ``ch_p4'' stands for ``channel.'' The ADI is in fact implemented
in terms of a simpler ``channel'' interface, and the channel interface
is implemented in terms of P4. The layering is not strict.
The ch_p4 device is characterized by the following:
- P4 runs on Sun/SunOS, Sun/Solaris, Solaris/x86, Cray, HP, DEC 5000,
DEC Alpha, NeXT, IBM RS/6000, Linux/x86, FreeBSD, IBM 3090, SGI (IRIX 5
and 6), and others.
- The device uses process-to-process sockets for processes that are not
on the same host, and shared memory (enabled with the ``-comm=shared''
configuration flag) for processes that are on the same host.
- The user provides a list of programs, and the machines on which to
start them, in a P4 ``procgroup'' file. P4 starts remote processes using
rsh (or, optionally, a ``secure server'' that provides faster
startup). I/O and signal propagation are handled by rsh. Each P4 process
starts a ``listener'' subprocess that helps to establish
process-to-process connections if there aren't enough TCP connections
to fully connect the MPI application.
- An interesting feature of the ch_p4 device is that the user
starts up a single process, and that process starts the other MPI
processes from within MPI_Init. To do this, ch_p4 relies on the
argc and argv arguments to MPI_Init (see the sketch after this list).
- The ch_p4 device handles heterogeneous MPI applications --
applications with processes running on more than one
architecture. Data representation conversion, if needed, is
automatically performed.
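Because MPI_Init is where ch_p4 consumes its startup arguments, MPI
programs must pass argc and argv through by reference. A minimal,
entirely standard MPI program (nothing here is ch_p4-specific) looks
like this:

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int rank, size;

        /* Under ch_p4, MPI_Init examines (and strips) the extra
           command-line arguments used to start the other processes,
           which is why argc and argv are passed by reference. */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("process %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }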
While the ch_p4 device provides a way to run MPICH on networks
of workstations, it is not very friendly to users.
- The procgroup file is difficult to work with, and the
documentation is not easy to find. Fortunately, the complexity is often
hidden behind local utilities or an ``mpirun'' command (an example
procgroup file appears after this list).
- There is no concept of a ``virtual machine.'' Unlike in PVM, the
network of workstations used by an application is defined by where the
application is running, not by an infrastructure that exists before the
job and persists after it. Consequently, there are no ``ps'' or ``kill''
equivalents that understand parallel jobs, and no automatic way to
examine the state of remote nodes or perform load balancing. The lack
of such infrastructure also contributes to the signal propagation and
I/O problems described below. In some cases, the lack of persistent
machine state is a bonus, particularly when MPI programs are started
automatically by a batch system.
- Because signal propagation is managed through rsh, it is
very easy to end up with ``orphaned'' processes that don't realize the
rest of their application has gone away. These orphaned processes often
interfere with subsequent parallel jobs and are difficult to find.
- Because standard I/O relies on rsh, output from remote nodes is
often heavily buffered, and doesn't appear on the screen until well after
it is written. This can make debugging with printf very difficult.
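For concreteness, here is the general shape of a procgroup file; the
host names and path are hypothetical. The first line describes the local
host (where process 0 runs), and each remaining line names a host, the
number of processes to start there, and the full path of the executable,
with an optional remote login name at the end:

    local 0
    sun1.example.com  1  /home/user/mpi/a.out
    sun2.example.com  1  /home/user/mpi/a.out

An mpirun wrapper typically generates such a file from a machine list;
the program can also be started by hand with a flag such as -p4pg
pointing at the file, though the exact spelling of that option may vary
between MPICH versions.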