Report from the First PVM Users' Group Meeting

The First PVM Users' Group Meeting was held in Knoxville, Tennessee, on
May 10-11, 1993.  The meeting was organized by the PVM Team; in
particular, Adam Beguelin, Jack Dongarra, Al Geist, Bob Manchek, and
Vaidy Sunderam worked hard to ensure its success.  The meeting was
sponsored by the University of Tennessee and the Department of Energy,
with support from the NSF Science and Technology Center for Research on
Parallel Computation, Convex Computer Corporation, Cray Research, Inc.,
Digital Equipment Corporation, and Intel Supercomputer Systems Division;
that support was much appreciated.  The meeting was well attended, with
41 speakers and over 100 participants from 7 countries.

Executive Summary
-----------------

o PVM is in very wide use world-wide -- it is a "de facto" standard
o A number of machine vendors are providing native or optimized versions
o This first meeting had 107 enthusiastic users of PVM registered
o A cottage industry of related software has grown up around PVM

Some Meeting Highlights
-----------------------

The meeting was well planned, with a variety of contributed and
solicited talks, time for discussion, and social events.  Some of the
talks were given in parallel sessions, but these were kept on schedule
so that people could move back and forth.  The audience was made up
mainly of systems types, judging by the somewhat stronger attendance at
system software talks versus numerical and applications talks.  In his
welcoming remarks Jack Dongarra announced that 9000 copies of PVM had
been retrieved from netlib since its first release.

The vendors were well represented; interesting talks were given by Cray
and Convex.  Peter Rigsbee from Cray described the use of PVM as the
native message-passing library for their forthcoming T3D MPP machine;
Pat Estep from Convex described a special version of PVM that talks to
their communications hardware.  In addition, Weicheng Jiang (U. Tenn)
has been doing ports to multiprocessors, making use of the extended
task id in PVM 3 as well as other features.  He is working on ports to
the iPSC, the Paragon, and the CM-5.  Others are reportedly working on
nCUBE and Meiko.  What is missing is a PVM that is optimized for FDDI
(one will be out soon) and HiPPI.

Supercomputer centers and labs described their facilities and where PVM
comes into play.  In his keynote address, ``PVM Experiences at SCRI,''
Dennis Duke from Florida State University described their truly massive
collection of machines, including 40 IBM RS/6000 machines, a CM-2 from
TMC, a Cray Y-MP, and various workstations.  Most of the distributed
computing at SCRI is being done with PVM under the control of DQS
(Distributed Queueing System), a system originally developed at SCRI.
Both SCRI and Cornell use an enhanced PVM, written at IBM, to get high
performance on optical fiber.  DQS has been selected by Boeing Computer
Services as their PVM batch queueing system, as Jim Patterson later
explained.  Rod Fatoohi (NAS) also said they are looking at DQS for
their large batch of SGI machines.  Cornell uses LoadLeveler with
tcp_wrapper to restrict PVM jobs to a pool of machines.  U. Wisconsin,
naturally, uses Condor; recent developments to this batch queueing
system were described by Miron Livny.

Donna Bergmark from Cornell University provided a history of PVM at the
Cornell Theory Center.  PVM has been part of their production
environment since June 1992.
Cornell regularly holds PVM workshops, which are well received and
over-subscribed.

Other PVM talks covered applications that have been coded in PVM, with
many performance figures both good and bad.  These talks were epitomized
by that of Karl-Heinz Winkler of LANL, who uses PVM to soak up all
available cycles at the lab, ``for peanuts''.  Other points from
Karl-Heinz's talk were

o Impact of vectorization: it made him organize the code -- and, oh
  yeah, it goes faster now
o Parallel programs: they impose managerial structure on the codes...
o Powerful because it doesn't tie you to a hardware paradigm
o Gets away from parallelizing compilers
o A PVM ATM cluster connected to a frame buffer is under development
o Gets rid of the OS on the nodes
o Don't wait for parallelizing compilers
o PVM crystallized the idea of distributed computing

Software-related talks were led off by Bob Manchek, who gave an overview
of PVM 3, including technical details.  Bob stated that the main reasons
for PVM 3 were fault tolerance, scalability, and the multiprocessor
port.  Scalability is hard, though: "by the time you get 1000 nodes
loaded up, 5 hours have passed, and your host node has died -- or the
operating system has been updated underneath you."  Vaidy Sunderam, one
of the original developers of PVM, touched on the history of PVM, shared
objects, and FDDI and Ethernet performance.

Many users have converted to PVM 3, but tools are lagging behind.  There
was curiosity about what happened to vsend; it has not gone away, but it
has been replaced by an "advice" routine that tells your daemon to
negotiate virtual channels between you and other tasks.  The default
communications method, though, is daemon-to-daemon, as in PVM 2.4.  Many
feel that PVM 3's biggest advantage is debugging: you can now spawn
tasks under a debugger, letting you do parallel debugging in a
straightforward way (a small sketch of both of these features appears
below).  In fact, John Drake (ORNL) reported that he debugs his PVM
programs on a single workstation first, and then runs them on the
Paragon, where the debugger isn't quite all there yet.  By contrast, Rob
Gordon (Convex) has been working on a debugger that will talk to other
native debuggers, thus providing a central point of control.  Milon
Mackey (HP) is working on an Instant Replay style of debugger.  People
still on PVM 2.4 can use Adam Beguelin's Xab tracing tool, just
released.
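To make the two PVM 3 features just mentioned concrete -- requesting
direct task-to-task routing instead of the default daemon-to-daemon path,
and spawning a task under a debugger -- here is a minimal sketch in C.
It is not code from any talk at the meeting; it assumes the PVM 3.1-era
pvm_advise() call and the PvmTaskDebug spawn flag as we recall them, and
a hypothetical worker executable named "worker".  Check your pvm3.h for
the exact names on your installation.

    /* master.c -- a minimal sketch, assuming PVM 3.1-era calls:
     * pvm_advise() for direct routing and the PvmTaskDebug flag to
     * pvm_spawn().  The "worker" program is hypothetical.             */
    #include <stdio.h>
    #include "pvm3.h"

    int main()
    {
        int mytid = pvm_mytid();    /* enroll this process in PVM      */
        int tid;                    /* task id of the spawned worker   */
        int msgtag = 1;
        int n = 42, reply;

        /* Ask the pvmd to negotiate direct connections to other tasks
         * rather than routing every message through the daemons (the
         * PVM 2.4-style default).                                     */
        pvm_advise(PvmRouteDirect);

        /* Spawn one copy of "worker" under a debugger window; with
         * PvmTaskDefault instead, it would start normally.            */
        if (pvm_spawn("worker", (char **)0, PvmTaskDebug, "", 1, &tid) != 1) {
            fprintf(stderr, "spawn failed\n");
            pvm_exit();
            return 1;
        }

        /* Send an integer to the worker and wait for its reply.       */
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&n, 1, 1);
        pvm_send(tid, msgtag);

        pvm_recv(tid, msgtag);
        pvm_upkint(&reply, 1, 1);
        printf("task %x: worker %x replied %d\n", mytid, tid, reply);

        pvm_exit();                 /* leave PVM cleanly               */
        return 0;
    }

As we understand the PVM 3 behavior, once direct routing has been
requested, messages between the two tasks travel over the negotiated
connection; if the other end cannot cooperate, communication quietly
falls back to the daemon-to-daemon route.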
There were some imaginative applications of PVM.  Mark Bolstad of Martin
Marietta has taken AVS modules and run them with PVM across a
heterogeneous network.  Vasudha Govindan of Washington U. has
implemented the Barnes-Hut algorithm for N-body simulation in PVM, with
a locally adapted version of ParaGraph to display the load balance.  Bob
Benway from GE described using large codes with PVM in many domains.
One example took 46 hours on 9 computers; in another, a production run
on a 50K-node finite element model used to take a week, while a
100K-node model now takes 3-4 hours with PVM.  This makes their power
generation engineering (a big part of their business) more practical and
cost effective.  Some of the problems he described were:

o Updating binaries is a pain
o File I/O is troublesome
o NFS problems on spawn: too many machines banging on the network in
  parallel clogs it.  The solution was to stagger the startup of
  processes.

There are a few higher-level programming systems being built on top of
PVM, including DOME (CMU), QN (UT), and GDC (Emory).  There will also be
versions of HPF running on PVM once Applied Parallel Research has
finished their work on xHPF.  And there appear to be a number of C++
object-based systems being developed.  People at Georgia Tech have
implemented a Cthreads port of PVM, which will allow PVM to run quickly
on shared-memory multiprocessors.

Neil Lincoln gave the very well-received after-dinner talk, reminding us
all of some unfortunate, but true, old saws about the nature of hardware
and software development done under time pressure.

Finally, at the end of the meeting there was a question and answer
session with the developers of PVM.

o The newsgroup comp.parallel.pvm was announced.
o Should we have a second Users' Group meeting?  A large number of yes
  votes.
o How many have hacked pvmds?  A few.
o When will PvmInPlace arrive?  It was bumped in favor of task-task
  routing...
o Is there a compatibility library for 2.x?  One was just posted to
  comp.parallel.pvm.
o Where did probe go?  It's back.
o Differences between 3.0 and 3.1?
    - task-task routing
    - bug fixes
    - host delete is now aggressive
o Kerberos: users would like to be able to easily set the rsh used for
  starting remote daemons with an option in the hostfile rather than
  compiling it in.  This would allow sites with Kerberos to use Kerberos
  versions of rsh when needed for security.
o When will global communication operations arrive?  To be done over the
  summer; they will be a library that comes with PVM, like the group
  calls.
o Other wants:
    - unrecv, to put a message back in the queue as unread (probe
      returns the message id and you can actually unpack it while it is
      still in the message queue...)
    - reserved message tags
    - message contexts
    - setting environment variables in spawn
    - I/O
    - flush for Fortran (very nonportable)
    - PVM console arguments for adding an MPP
    - the ability to unpack in different-sized chunks than were used in
      the pack
    - support for multiple interfaces (FCS, IP, ...)

Table of PVM Products
---------------------

+------------+--------------------+-----------+----------------------------+
| Package    | Description        | Version   | Contact                    |
+------------+--------------------+-----------+----------------------------+
| Condor     | batch queuing sys. | PVM 2.4.1 | miron@cs.wisc.edu          |
| DQS        | batch queuing sys. | PVM 3.0   | Tom Greene, Jim Patterson  |
| QM         | HL prog. system    | PVM 2.4   | sept@cs.utk.edu            |
| Xab        | tracing tool       | PVM 2.4   | xab@psc.edu                |
| DOME       | HL prog. system    | PVM 3.1   | adamb@cs.cmu.edu           |
| chkpnt/rst | utility            | PVM 2.4   | jl+@cs.cmu.edu             |
| GDC        | HL prog. system    |           | Vaidy Sunderam             |
| Cray PVM   | optimized PVM      | PVM 2.4   | Peter Rigsbee              |
| Convex PVM | optimized PVM      | PVM 2.4.2 | Pat Estep                  |
| pvmdb      | debugger           | PVM 2.4.1 | Rob Gordon                 |
| IVD        | debugger           | PVM 3.0   | Milon Mackey               |
| Cthreads   | ported PVM         | PVM 2.4.2 | Vernard Martin             |
| 2.4->3.0   | translation util.  | ...       | on the net                 |
| linalg     | linear algebra sw  | PVM 3.1   | in netlib in pvm3/ex/linalg|
+------------+--------------------+-----------+----------------------------+

A list of abstracts and the slides presented are available on netlib.
For further information, send email to netlib@ornl.gov containing the
line:

    send index from pvm3/pvmug

Given the interest in this meeting, there will be another PVM Users'
Group Meeting next year at roughly the same time; the location is still
to be determined.

Adam Beguelin, Donna Bergmark, and Jack Dongarra