Report from the First PVM Users' Group Meeting

The First PVM Users' Group Meeting was held in Knoxville, Tennessee, on
May 10-11, 1993.  The meeting was organized by the PVM Team; in
particular, Adam Beguelin, Jack Dongarra, Al Geist, Bob Manchek, and
Vaidy Sunderam worked hard to ensure its success.  The meeting was
sponsored by the University of Tennessee and the Department of Energy,
with support from the NSF Science and Technology Center for Research on
Parallel Computation, Convex Computer Corporation, Cray Research, Inc.,
Digital Equipment Corporation, and Intel Supercomputer Systems Division;
that support was much appreciated.  The meeting was well attended, with
41 speakers and over 100 participants from 7 countries.

Executive Summary
-----------------

o PVM is in very wide use world-wide -- it is a "de facto" standard
o A number of machine vendors are providing native or optimized versions
o This first meeting had 107 enthusiastic users of PVM registered
o A cottage industry of related software has grown up around PVM

Some Meeting Highlights
-----------------------

The meeting was well planned, with a variety of contributed and
solicited talks, time for discussion, and social events.  Some of the
talks were given in parallel sessions, but these were kept on schedule
so that people could move back and forth.  The audience was made up
mainly of systems types, judging by the somewhat stronger attendance at
system software talks versus numerical and applications talks.  In his
welcoming remarks Jack Dongarra announced that 9000 copies of PVM had
been retrieved from netlib since its first release.

The vendors were well represented; interesting talks were given by Cray
and Convex.  Peter Rigsbee from Cray described the use of PVM as the
native message-passing library for their forthcoming T3D MPP machine;
Pat Estep from Convex described a special version of PVM that talks to
their communications hardware.  In addition, Weicheng Jiang (U. Tenn)
has been doing ports to multiprocessors, making use of the extended
task id in PVM 3 as well as other features.  He is working on ports to
the iPSC, the Paragon, and the CM-5.  Others are reportedly working on
nCUBE and Meiko.  What is missing is a PVM that is optimized for FDDI
(one will be out soon) and HiPPI.

Supercomputer centers and labs described their facilities and where PVM
comes into play.  In his keynote address, ``PVM Experiences at SCRI,''
Dennis Duke from Florida State University described their truly massive
collection of machines, including 40 IBM RS/6000 machines, a CM-2 from
TMC, a Cray Y-MP, and various workstations.  Most of the distributed
computing at SCRI is being done with PVM under the control of DQS
(Distributed Queueing System), a system originally developed at SCRI.
Both SCRI and Cornell use an enhanced PVM, written at IBM, to get high
performance on optical fiber.  DQS has been selected by Boeing Computer
Services as their PVM batch queueing system, as Jim Patterson later
explained.  Rod Fatoohi (NAS) also said they are looking at DQS for
their large batch of SGI machines.  Cornell uses LoadLeveler with
tcp_wrapper to restrict PVM jobs to a pool of machines.  U. Wisconsin,
naturally, uses Condor; recent developments to this batch queueing
system were described by Miron Livny.

Donna Bergmark from Cornell University provided a history of PVM at the
Cornell Theory Center.  PVM has been part of their production
environment since June 1992.
Cornell regularly holds PVM workshops, which are well received and
over-subscribed.

Other PVM talks covered applications that have been coded in PVM, with
many performance figures both good and bad.  These talks were epitomized
by that of Karl-Heinz Winkler of LANL, who uses PVM to soak up all
available cycles at the lab, ``for peanuts''.  Other points from
Karl-Heinz's talk were

o Impact of vectorization: it made him organize the code -- and, oh
  yeah, it goes faster now
o Parallel programs: they impose managerial structure on the codes...
o Powerful because it doesn't tie you to a hardware paradigm
o Gets away from parallelizing compilers
o A PVM ATM cluster connected to a frame buffer is under development
o Gets rid of the OS on the nodes
o Don't wait for parallelizing compilers
o PVM crystallized the idea of distributed computing

Software-related talks were led off by Bob Manchek, who gave an overview
of PVM 3, including technical details.  Bob stated that the main reasons
for PVM 3 were fault tolerance, scalability, and the multiprocessor
port.  Scalability is hard, though: "by the time you get 1000 nodes
loaded up, 5 hours have passed, and your host node has died -- or the
operating system has been updated underneath you."  Vaidy Sunderam, one
of the original developers of PVM, touched on the history of PVM, shared
objects, and FDDI and Ethernet performance.

Many users have converted to PVM 3, but tools are lagging behind.  There
was curiosity about what happened to vsend; it has not gone away, but it
has been replaced by an "advice" routine that tells your daemon to
negotiate virtual channels between you and other tasks.  The default
communications method, though, is daemon-to-daemon, as in PVM 2.4.  Many
feel that PVM 3's biggest advantage is debugging: you can now spawn
tasks under a debugger, letting you do parallel debugging in a
straightforward way (a small sketch of both of these features appears
below).  In fact, John Drake (ORNL) reported that he debugs his PVM
programs on a single workstation first, and then runs them on the
Paragon, where the debugger isn't quite all there yet.  By contrast, Rob
Gordon (Convex) has been working on a debugger that will talk to other
native debuggers, thus providing a central point of control.  Milon
Mackey (HP) is working on an Instant Replay style of debugger.  People
still on PVM 2.4 can use Adam Beguelin's Xab tracing tool, just
released.
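To make the two PVM 3 features just mentioned concrete -- requesting
direct task-to-task routing instead of the default daemon-to-daemon path,
and spawning a task under a debugger -- here is a minimal sketch in C.
It is not code from any talk at the meeting; it assumes the PVM 3.1-era
pvm_advise() call and the PvmTaskDebug spawn flag as we recall them, and
a hypothetical worker executable named "worker".  Check your pvm3.h for
the exact names on your installation.

    /* master.c -- a minimal sketch, assuming PVM 3.1-era calls:
     * pvm_advise() for direct routing and the PvmTaskDebug flag to
     * pvm_spawn().  The "worker" program is hypothetical.             */
    #include <stdio.h>
    #include "pvm3.h"

    int main()
    {
        int mytid = pvm_mytid();    /* enroll this process in PVM      */
        int tid;                    /* task id of the spawned worker   */
        int msgtag = 1;
        int n = 42, reply;

        /* Ask the pvmd to negotiate direct connections to other tasks
         * rather than routing every message through the daemons (the
         * PVM 2.4-style default).                                     */
        pvm_advise(PvmRouteDirect);

        /* Spawn one copy of "worker" under a debugger window; with
         * PvmTaskDefault instead, it would start normally.            */
        if (pvm_spawn("worker", (char **)0, PvmTaskDebug, "", 1, &tid) != 1) {
            fprintf(stderr, "spawn failed\n");
            pvm_exit();
            return 1;
        }

        /* Send an integer to the worker and wait for its reply.       */
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&n, 1, 1);
        pvm_send(tid, msgtag);

        pvm_recv(tid, msgtag);
        pvm_upkint(&reply, 1, 1);
        printf("task %x: worker %x replied %d\n", mytid, tid, reply);

        pvm_exit();                 /* leave PVM cleanly               */
        return 0;
    }

As we understand the PVM 3 behavior, once direct routing has been
requested, messages between the two tasks travel over the negotiated
connection; if the other end cannot cooperate, communication quietly
falls back to the daemon-to-daemon route.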
There were some imaginative applications of PVM.  Mark Bolstad of Martin
Marietta has taken AVS modules and run them with PVM across a
heterogeneous network.  Vasudha Govindan of Washington U. has
implemented the Barnes-Hut algorithm for N-body simulation in PVM, with
a locally adapted version of ParaGraph to display the load balance.  Bob
Benway from GE described using large codes with PVM in many domains.
One example took 46 hours on 9 computers; in another, a production run
on a 50K-node finite element model used to take a week, while a
100K-node model now takes 3-4 hours with PVM.  This makes their power
generation engineering (a big part of their business) more practical and
cost effective.  Some of the problems he described were:

o Updating binaries is a pain
o File I/O is troublesome
o NFS problems on spawn: too many machines banging on the network in
  parallel clogs it.  The solution was to stagger the startup of
  processes.

There are a few higher-level programming systems being built on top of
PVM, including DOME (CMU), QN (UT), and GDC (Emory).  There will also be
versions of HPF running on PVM once Applied Parallel Research has
finished their work on xHPF.  And there appear to be a number of C++
object-based systems being developed.  People at Georgia Tech have
implemented a Cthreads port of PVM, which will allow PVM to run quickly
on shared-memory multiprocessors.

Neil Lincoln gave the very well-received after-dinner talk, reminding us
all of some unfortunate, but true, old saws about the nature of hardware
and software development done under time pressure.

Finally, at the end of the meeting there was a question and answer
session with the developers of PVM.

o The newsgroup comp.parallel.pvm was announced.
o Should we have a second Users' Group meeting?  A large number of yes
  votes.
o How many have hacked pvmds?  A few.
o When will PvmInPlace arrive?  It was bumped in favor of task-task
  routing...
o Is there a compatibility library for 2.x?  One was just posted to
  comp.parallel.pvm.
o Where did probe go?  It's back.
o Differences between 3.0 and 3.1?
    - task-task routing
    - bug fixes
    - host delete is now aggressive
o Kerberos: users would like to be able to easily set the rsh used for
  starting remote daemons with an option in the hostfile rather than
  compiling it in.  This would allow sites with Kerberos to use Kerberos
  versions of rsh when needed for security.
o When will global communication operations arrive?  To be done over the
  summer; they will be a library that comes with PVM, like the group
  calls.
o Other wants:
    - unrecv, to put a message back in the queue as unread (probe
      returns the message id and you can actually unpack it while it is
      still in the message queue...)
    - reserved message tags
    - message contexts
    - setting environment variables in spawn
    - I/O
    - flush for Fortran (very nonportable)
    - PVM console arguments for adding an MPP
    - the ability to unpack in different-sized chunks than were used in
      the pack
    - support for multiple interfaces (FCS, IP, ...)

Table of PVM Products
---------------------

+------------+--------------------+-----------+----------------------------+
| Package    | Description        | Version   | Contact                    |
+------------+--------------------+-----------+----------------------------+
| Condor     | batch queuing sys. | PVM 2.4.1 | miron@cs.wisc.edu          |
| DQS        | batch queuing sys. | PVM 3.0   | Tom Greene, Jim Patterson  |
| QM         | HL prog. system    | PVM 2.4   | sept@cs.utk.edu            |
| Xab        | tracing tool       | PVM 2.4   | xab@psc.edu                |
| DOME       | HL prog. system    | PVM 3.1   | adamb@cs.cmu.edu           |
| chkpnt/rst | utility            | PVM 2.4   | jl+@cs.cmu.edu             |
| GDC        | HL prog. system    |           | Vaidy Sunderam             |
| Cray PVM   | optimized PVM      | PVM 2.4   | Peter Rigsbee              |
| Convex PVM | optimized PVM      | PVM 2.4.2 | Pat Estep                  |
| pvmdb      | debugger           | PVM 2.4.1 | Rob Gordon                 |
| IVD        | debugger           | PVM 3.0   | Milon Mackey               |
| Cthreads   | ported PVM         | PVM 2.4.2 | Vernard Martin             |
| 2.4->3.0   | translation util.  | ...       | on the net                 |
| linalg     | linear algebra sw  | PVM 3.1   | in netlib in pvm3/ex/linalg|
+------------+--------------------+-----------+----------------------------+

A list of abstracts and the slides presented are available on netlib.
For further information, send email to netlib@ornl.gov containing the
line:

    send index from pvm3/pvmug

Given the interest in this meeting, there will be another PVM Users'
Group Meeting next year at roughly the same time; the location is still
to be determined.

Adam Beguelin, Donna Bergmark, and Jack Dongarra