28 RS/6000 SP: Practical MPI Programming

Case 3. One process sends and receives; the other receives and sends

It is always safe to order the calls of MPI_(I)SEND and MPI_(I)RECV so that a send subroutine call at one process and the corresponding receive subroutine call at the other process appear in matching order.

    IF (myrank==0) THEN
       CALL MPI_SEND(sendbuf, ...)
       CALL MPI_RECV(recvbuf, ...)
    ELSEIF (myrank==1) THEN
       CALL MPI_RECV(recvbuf, ...)
       CALL MPI_SEND(sendbuf, ...)
    ENDIF

In this case, you can use either blocking or non-blocking subroutines.

Considering the previous options, performance, and the avoidance of deadlocks, the following code is recommended.

    IF (myrank==0) THEN
       CALL MPI_ISEND(sendbuf, ..., ireq1, ...)
       CALL MPI_IRECV(recvbuf, ..., ireq2, ...)
    ELSEIF (myrank==1) THEN
       CALL MPI_ISEND(sendbuf, ..., ireq1, ...)
       CALL MPI_IRECV(recvbuf, ..., ireq2, ...)
    ENDIF
    CALL MPI_WAIT(ireq1, ...)
    CALL MPI_WAIT(ireq2, ...)

2.5   Derived Data Types

As will be discussed in 3.1, “What is Parallelization?” on page 41, if the total amount of data transmitted between two processes is the same, you should transmit it in as few transmissions as possible. Suppose you want to send non-contiguous data to another process. To reduce the number of transmissions, you can first copy the non-contiguous data to a contiguous buffer, and then send it all at once. On the receiving process, you may have to unpack the data and copy it to the proper locations. This procedure may look cumbersome, but MPI provides a mechanism, called derived data types, for specifying more general, mixed, and non-contiguous data. Although derived data types are convenient, transmissions that use them might perform worse than manually packing the data, transmitting it, and unpacking it. For this reason, be aware of the performance impact when you use derived data types.
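The manual approach described above (pack into a contiguous buffer, send once, unpack on the receiver) can be sketched as follows. This is only an illustration under stated assumptions: the program name pack_row, the array sizes m and n, the row index irow, and the buffer buf are hypothetical and do not come from the text. The example uses one row of a two-dimensional array, which is non-contiguous in Fortran's column-major storage, and it must be run with at least two processes.

```fortran
PROGRAM pack_row
  IMPLICIT NONE
  INCLUDE 'mpif.h'
  ! Hypothetical sizes chosen for illustration only
  INTEGER, PARAMETER :: m = 4, n = 5, irow = 2
  REAL :: a(m, n), buf(n)
  INTEGER :: ierr, myrank, nprocs, j
  INTEGER :: istatus(MPI_STATUS_SIZE)

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
  IF (nprocs < 2) THEN
     IF (myrank == 0) PRINT *, 'Run with at least 2 processes'
     CALL MPI_FINALIZE(ierr)
     STOP
  ENDIF

  a = 0.0
  IF (myrank == 0) THEN
     ! Fill the row that will be sent
     DO j = 1, n
        a(irow, j) = REAL(j)
     ENDDO
     ! Pack: consecutive row elements are m reals apart in memory,
     ! so copy them into a contiguous buffer first
     DO j = 1, n
        buf(j) = a(irow, j)
     ENDDO
     ! One transmission for the whole row
     CALL MPI_SEND(buf, n, MPI_REAL, 1, 1, MPI_COMM_WORLD, ierr)
  ELSEIF (myrank == 1) THEN
     CALL MPI_RECV(buf, n, MPI_REAL, 0, 1, MPI_COMM_WORLD, istatus, ierr)
     ! Unpack: copy the buffer back to the proper locations
     DO j = 1, n
        a(irow, j) = buf(j)
     ENDDO
     PRINT *, 'Row received on rank 1:', (a(irow, j), j = 1, n)
  ENDIF

  CALL MPI_FINALIZE(ierr)
END PROGRAM pack_row
```

The two copy loops are the price of sending a single message; a derived data type lets MPI do the equivalent gathering and scattering without an explicit buffer in user code.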
2.5.1   Basic Usage of Derived Data Types

Suppose you want to send array elements a(4), a(5), a(7), a(8), a(10), and a(11) to another process. If you define a derived data type itype1 as shown in Figure 19 on page 29, you only need to send one element of type itype1 starting at a(4). In this figure, empty slots are skipped in the data transmission. Alternatively, you can send three elements of type itype2 starting at a(4).
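The two alternatives above can be sketched as a small program. This is a hedged illustration, not necessarily how the text goes on to define these types: it assumes that itype1 can be built with MPI_TYPE_VECTOR (three blocks of two reals, with block starts three reals apart) and that itype2 can be built as two contiguous reals whose extent is stretched to three reals with the MPI-2 routine MPI_TYPE_CREATE_RESIZED. The program name derived_basic and the names b, itmp, lb, extent, and ext3 are hypothetical. Run it with at least two processes.

```fortran
PROGRAM derived_basic
  IMPLICIT NONE
  INCLUDE 'mpif.h'
  REAL :: a(12), b(12)
  INTEGER :: ierr, myrank, nprocs, i
  INTEGER :: itype1, itmp, itype2
  INTEGER :: istatus(MPI_STATUS_SIZE)
  INTEGER(KIND=MPI_ADDRESS_KIND) :: lb, extent, ext3

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
  IF (nprocs < 2) THEN
     IF (myrank == 0) PRINT *, 'Run with at least 2 processes'
     CALL MPI_FINALIZE(ierr)
     STOP
  ENDIF

  ! itype1 (assumed construction): 3 blocks of 2 reals,
  ! block starts 3 reals apart, so every third slot is skipped
  CALL MPI_TYPE_VECTOR(3, 2, 3, MPI_REAL, itype1, ierr)
  CALL MPI_TYPE_COMMIT(itype1, ierr)

  ! itype2 (assumed construction): 2 reals followed by one empty slot,
  ! that is, 2 contiguous reals with the extent resized to 3 reals
  CALL MPI_TYPE_CONTIGUOUS(2, MPI_REAL, itmp, ierr)
  CALL MPI_TYPE_GET_EXTENT(MPI_REAL, lb, extent, ierr)
  ext3 = 3 * extent
  CALL MPI_TYPE_CREATE_RESIZED(itmp, lb, ext3, itype2, ierr)
  CALL MPI_TYPE_COMMIT(itype2, ierr)

  IF (myrank == 0) THEN
     DO i = 1, 12
        a(i) = REAL(i)
     ENDDO
     ! One element of itype1 starting at a(4)
     CALL MPI_SEND(a(4), 1, itype1, 1, 1, MPI_COMM_WORLD, ierr)
     ! Alternatively, three elements of itype2 starting at a(4)
     CALL MPI_SEND(a(4), 3, itype2, 1, 2, MPI_COMM_WORLD, ierr)
  ELSEIF (myrank == 1) THEN
     a = 0.0
     b = 0.0
     CALL MPI_RECV(a(4), 1, itype1, 0, 1, MPI_COMM_WORLD, istatus, ierr)
     CALL MPI_RECV(b(4), 3, itype2, 0, 2, MPI_COMM_WORLD, istatus, ierr)
     ! Only elements 4, 5, 7, 8, 10, and 11 are written by the receives
     PRINT *, 'via itype1:', a
     PRINT *, 'via itype2:', b
  ENDIF

  CALL MPI_TYPE_FREE(itype1, ierr)
  CALL MPI_TYPE_FREE(itmp, ierr)
  CALL MPI_TYPE_FREE(itype2, ierr)
  CALL MPI_FINALIZE(ierr)
END PROGRAM derived_basic
```

Both sends transfer the same six array elements; the difference is only whether the stride is encoded inside one large type or in the extent of a small type that is repeated three times.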