The Latency

Figure (a) shows the transfer time between two nodes for small messages (up to 1024 bytes).

Figure: (a) Latency on the CM-5, PVM3 and CMMD; (b) Bandwidth, PvmDataDefault

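We computed the latencies from the data in Figure (a) using a least-squares linear fit of transfer time against message size. The latency is the intercept of the fitted line, that is, the estimated transfer time for a zero-byte message. The results are given in the following table.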

We observe that the latency of CMMD_send_block()-CMMD_receive_block() is the lowest. The latency of pvm_psend()-pvm_precv() is higher because these routines are built on top of the CMMD routines. Moreover, pvm_psend() is much more complex than CMMD_send_block(). It uses CMMD_send_async(), and while waiting for its send operation to complete, it accepts incoming messages and puts them into a queue. This is required by the semantics of pvm_psend(): no deadlock should occur, and the buffer must be ready for reuse when the call returns.

The latency of pvm_send()-pvm_recv() is, of course, much higher than that of pvm_psend()-pvm_precv(). This is due to the data packing and unpacking and to the use of CMMD_send_noblock(). We also notice that the latency is higher with PvmDataInPlace than with PvmDataRaw, as can be seen from Figure .

With PvmDataInPlace, pvm_send() has much more "work" to do than with PvmDataRaw. Before the first data are sent, a header must be sent to tell the receiver the size of the message to expect. This header is built in the PVM space and must be sent separately because it is not contiguous with the data. In the figure, sending the header corresponds to blue arrow 1. Once the receiver has the header, it builds a PVM buffer according to the information the header contains, symbolized by the dashed black arrow. It then begins accepting the data into this buffer (blue arrows 2). This process is repeated with the next header, if there is one. In our small test program there is only one header to transmit, so pvm_send() sends one header followed by the data. The cost of this extra header is the penalty paid for short messages.

Note that if PvmDataInPlace is used to send several different, noncontiguous pieces of data, pvm_send() actually sends messages. Hence it is highly inefficient to use PvmDataInPlace instead of PvmDataRaw to send many small, noncontiguous data items.

Jack Dongarra
Thu Jul 20 07:22:58 EDT 1995