An Active Messaging Protocol



next up previous
Next: Implementation Up: Possibilities for Active Messaging Previous: Protocol Performance

An Active Messaging Protocol

For our AM implementation, an N-way request-and-reply protocol will be used as outlined in Richard Martin's paper. He has shown that this model is capable of achieving full network bandwidth when using efficient network access methods [9]. Initially, a one-way model will be presented for clarity. This model will subsequently be extended to N ways to realize higher performance.

The request-and-reply protocol dictates that two messages be exchanged for every transaction: a request and a reply. All requests must send a reply, but no action is required of the user: the AM library automatically sends an empty acknowledgment if none is specified. The protocol also prevents any reply handler from accessing the network, effectively eliminating race conditions.

A traditional one-way request-and-reply protocol works as follows. Given V nodes, each node allocates two tables of length V for outgoing messages. The first table stores request AMs, and the second stores replies to requests made from the other nodes. Each position in each table corresponds to a fixed, unique destination address. When any message is to be sent, the buffer that corresponds to its type (a request or reply) and its destination address is examined. If that buffer is free, the message is stamped with an instance number (either 0 or 1), copied into the buffer, and injected into the network. Otherwise, the message will stall and wait for the buffer to become available. The request buffer is freed when a reply with a matching instance number is received. The reply buffer is freed when a request with a new instance number is received from the associated node. This new request means that the previous reply was received and processed by the sender; hence it is no longer of any use. Note that this protocol allows only one outstanding request and reply per node in the table. Thus, the instance number need be only a single bit.

One advantage to using a split-phase protocol is that it completely eliminates livelock and deadlock among communicating processes. Every request is automatically paired with a reply. Doing so avoids the overhead caused by code associated with the detection of such conditions.

Another advantage is that this model provides a simple and efficient form of flow control. An occupied request buffer means that the receiving node has not yet processed the request and sent the reply. Any attempt to send a request to this node will stall until a reply is received. Sophisticated buffer management and the explicit exchange of control messages are unnecessary. The benefits of this approach will become apparent upon the extension of the model.

Reliability is a problem only when communicating over local area networks (LANs), since all MPPs guarantee packet delivery and data integrity. A key item to remember is that very low rates of packet loss and corruption are being assumed. Reliability problems can be solved by using two different methods. Packet truncation and data corruption can be detected by using two fields in the message's header: a packet length and a checksum. Packet losses can be detected by timestamping the outgoing request. Upon each subsequent operation or poll of the network interface, one entry of the request table is examined. If any stale requests are found, they are retransmitted. Reply AMs are not stamped. If a reply is dropped, the resulting action is to retransmit the request. This presents a problem because the corresponding request handler could be executed twice. The solution is as follows: when an invalid request (its instance number matches that of reply buffer) is received, the corresponding reply is simply retransmitted without invoking the handler[9].

Clearly, this algorithm allows only one outstanding request and reply per node. As a result, the protocol is not capable of achieving maximum bandwidth. The goal of any communication protocol is to hide the latency of accessing the media by keeping multiple messages in the network. This is accomplished here by replicating the one-way protocol N times for each node, where N corresponds to the network depth. This value usually needs to be discovered by experimentation for each networking card, operating system, and access method. For example, memory-mapped register operations to the network device are much more efficient than traps to a kernel networking stack. Since processing an active message requires matching requests with replies and vice versa, a multibit instance number is introduced to the message. Now, for each node in the host pool, N buffers are allocated, and each is associated with one instance of the one-way protocol.



next up previous
Next: Implementation Up: Possibilities for Active Messaging Previous: Protocol Performance



Jack Dongarra
Tue Feb 7 21:45:39 EST 1995