One can improve performance on many systems by overlapping communication and computation. This is especially true on systems where communication can be executed autonomously by an intelligent communication controller. Multi-threading is one mechanism for threads achieving such overlap. While one thread is blocked, waiting for a communication to complete, another thread may execute on the same processor. This mechanism is efficient if the system supports light-weight threads that are integrated with the communication subsystem. An alternative mechanism that often gives better performance is to use nonblocking communication. A nonblocking communicationcommunication, nonblocking nonblocking post-send initiates a send operation, but does not post-send complete it. The post-send will return before the message is copied out of the send buffer. A separate complete-send complete-send call is needed to complete the communication, that is, to verify that the data has been copied out of the send buffer. With suitable hardware, the transfer of data out of the sender memory may proceed concurrently with computations done at the sender after the send was initiated and before it completed. Similarly, a nonblocking post-receive initiates a receive post-receive operation, but does not complete it. The call will return before a message is stored into the receive buffer. A separate complete-receive complete-receive is needed to complete the receive operation and verify that the data has been received into the receive buffer.
A nonblocking send can be posted whether a matching receive has been posted or not. The post-send call has local completion semantics: it returns immediately, irrespective of the status of other processes. If the call causes some system resource to be exhausted, then it will fail and return an error code. Quality implementations of MPI should ensure that this happens only in ``pathological'' cases. That is, an MPI implementation should be able to support a large number of pending nonblocking operations.
The complete-send returns when data has been copied out of the send buffer. The complete-send has non-local completion semantics. The call may return before a matching receive is posted, if the message is buffered. On the other hand, the complete-send may not return until a matching receive is posted.
There is compatibility between blocking and nonblocking communication functions. Nonblocking sends can be matched with blocking receives, and vice-versa.