Reliability




Realizing maximum bandwidth with a low-level transport demands a simple and efficient implementation of reliable data delivery. Over a completely unreliable transport, timers and data-verification fields are needed in addition to the fields used for protocol management. The above protocol requires only two fields in the header: the message type and the protocol instance number. The type actually requires only one bit, indicating whether the message is a request or a reply. The protocol instance number is necessary in order to match a reply with its corresponding request. Remember that with the N-way protocol, multiple requests can be outstanding for the same node.

Explicit alarms can be avoided by timestamping the retained copy of each transmitted request; a request whose timestamp is older than its retransmission interval is considered stale. Note that the above protocol requires only outgoing requests to be stamped, since they are the only messages monitored for loss. Timestamps should be obtained through the most efficient means available, which often differs considerably from platform to platform. On some platforms, for example, interval timers are less expensive to read than the gettimeofday() system call [9]. The interval between successive retransmissions of a stale request can grow exponentially, with an arbitrary upper limit beyond which the node is considered to have failed.
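The timestamp-based scheme described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation discussed in this paper: the header layout (one byte for the type, four bytes for the instance number), the names PendingRequests, sent, replied and poll, and the default interval values are all hypothetical. Python's time.monotonic() stands in for whatever inexpensive clock the platform offers.

```python
import struct
import time

REQUEST, REPLY = 0, 1  # the type needs only one bit; a full byte is used for simplicity

# Assumed header layout: 1-byte message type, 4-byte protocol instance number.
HEADER = struct.Struct("!BI")


def pack_header(msg_type, instance):
    """Build the two-field protocol header."""
    return HEADER.pack(msg_type, instance)


class PendingRequests:
    """Keeps a timestamped copy of every outstanding request (hypothetical helper)."""

    def __init__(self, base_interval=0.05, max_interval=1.6, clock=time.monotonic):
        self.base_interval = base_interval  # wait before the first retransmission
        self.max_interval = max_interval    # arbitrary limit signifying node failure
        self.clock = clock                  # cheapest available time source
        self.table = {}                     # instance -> [packet, sent_at, interval]

    def sent(self, instance, packet):
        """Stamp the retained copy of a request that was just transmitted."""
        self.table[instance] = [packet, self.clock(), self.base_interval]

    def replied(self, instance):
        """Match a reply to its request; returns False for an unknown instance."""
        return self.table.pop(instance, None) is not None

    def poll(self, resend):
        """Retransmit stale requests; return instances presumed lost to node failure."""
        failed = []
        now = self.clock()
        for instance, entry in list(self.table.items()):
            packet, sent_at, interval = entry
            if now - sent_at < interval:
                continue                    # not stale yet
            if interval >= self.max_interval:
                del self.table[instance]    # limit reached: give up on this request
                failed.append(instance)
                continue
            resend(packet)
            entry[1] = now                  # restamp the copy
            entry[2] = interval * 2         # exponential growth of the interval
        return failed
```

No alarm or signal handler is involved: poll() can be called whenever the communication layer is entered anyway, for example on each send or receive, and it inspects only the stamped copies of outgoing requests.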

Maintaining data integrity requires inserting the message length and a packet-wide checksum into the message's header. Upon reception, the receiver recomputes both values and discards the message if either differs from the value carried in the header. This approach may not be necessary, however, since many network transports provide this service automatically.
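A minimal sketch of such a header check is given below. The field widths, the function names seal and unseal, and the use of CRC-32 (via zlib.crc32) as the checksum are assumptions made for illustration; the text does not prescribe a particular checksum algorithm or layout.

```python
import struct
import zlib

# Assumed layout: 1-byte type, 4-byte instance number,
# 2-byte payload length, 4-byte checksum.
SEALED_HEADER = struct.Struct("!BIHI")


def seal(msg_type, instance, payload):
    """Prepend a header carrying the payload length and a packet-wide checksum."""
    # The checksum covers the header (with its checksum field zeroed) and the payload.
    blank = SEALED_HEADER.pack(msg_type, instance, len(payload), 0)
    checksum = zlib.crc32(blank + payload)
    return SEALED_HEADER.pack(msg_type, instance, len(payload), checksum) + payload


def unseal(packet):
    """Return (type, instance, payload), or None if the packet must be discarded."""
    if len(packet) < SEALED_HEADER.size:
        return None                          # too short to hold a header
    msg_type, instance, length, checksum = SEALED_HEADER.unpack_from(packet)
    payload = packet[SEALED_HEADER.size:]
    if len(payload) != length:
        return None                          # length mismatch
    blank = SEALED_HEADER.pack(msg_type, instance, length, 0)
    if zlib.crc32(blank + payload) != checksum:
        return None                          # checksum mismatch
    return msg_type, instance, payload
```

A discarded request needs no special handling here: no reply is generated for it, so the sender's copy eventually goes stale and is retransmitted.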



Jack Dongarra
Tue Feb 7 21:45:39 EST 1995