To assess the native bandwidth and latency, we used CMMD_send_block() and CMMD_receive_block(). We could also have used CMMD_send_async() and CMMD_receive_async(), which would have given the same performance. The main advantage of the asynchronous routines is that they let the user overlap communication with computation. Our experiments show, however, that CMMD_send_noblock() is quite inefficient. This result is surprising for a "ping-pong" test: the receive is normally already posted, so CMMD_send_noblock() should be able to send the data without any buffering. Instead, it buffers the data systematically, which always costs an extra data copy on the sending end. Nevertheless, CMMD_send_noblock() has several advantages: it cannot lead to a deadlock (as CMMD_send_block() can), and the user can reuse the send buffer as soon as the call returns (unlike with CMMD_send_async()). One pitfall of CMMD_send_noblock() is that it can run out of message descriptors if packets pile up at the sending end.
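For readers unfamiliar with the "ping-pong" pattern, the following is a minimal sketch of such a measurement. It does not use CMMD: a POSIX socketpair between two processes stands in for the blocking send and receive, and the helper names (send_all, recv_all, pingpong_one_way_seconds, bytes_per_second) are hypothetical. The timings it produces say nothing about the machine discussed here; only the structure of the test is illustrated.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Blocking send of exactly len bytes (assumed stand-in for a blocking send). */
static int send_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Blocking receive of exactly len bytes (assumed stand-in for a blocking receive). */
static int recv_all(int fd, char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = read(fd, buf, len);
        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Bandwidth estimate: message length divided by the one-way time. */
double bytes_per_second(size_t len, double one_way_seconds)
{
    assert(one_way_seconds > 0.0);
    return (double)len / one_way_seconds;
}

/* Runs reps round trips of a len-byte message between two processes and
 * returns the average one-way time in seconds, or -1.0 on error. */
double pingpong_one_way_seconds(size_t len, int reps)
{
    int sv[2], i, ok = 1;
    struct timespec t0, t1;
    char *buf;
    pid_t pid;

    assert(len > 0 && reps > 0);
    buf = calloc(len, 1);
    if (buf == NULL)
        return -1.0;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        free(buf);
        return -1.0;
    }
    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        free(buf);
        return -1.0;
    }
    if (pid == 0) {
        /* "Pong" side: receive each message, then send it straight back. */
        close(sv[0]);
        for (i = 0; i < reps; i++)
            if (recv_all(sv[1], buf, len) != 0 || send_all(sv[1], buf, len) != 0)
                _exit(1);
        _exit(0);
    }
    /* "Ping" side: time reps round trips; only one side sends at a time. */
    close(sv[1]);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < reps && ok; i++)
        ok = send_all(sv[0], buf, len) == 0 && recv_all(sv[0], buf, len) == 0;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(sv[0]);
    waitpid(pid, NULL, 0);
    free(buf);
    if (!ok)
        return -1.0;
    /* Each round trip is two one-way transfers. */
    return ((double)(t1.tv_sec - t0.tv_sec) +
            (double)(t1.tv_nsec - t0.tv_nsec) / 1e9) / (2.0 * reps);
}
```

Under these assumptions, calling pingpong_one_way_seconds() with a very short message gives a rough latency estimate, while a long message passed through bytes_per_second() gives a bandwidth estimate. Because the two sides never send at the same time, the blocking calls cannot deadlock in this pattern.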