Multiprocessing by Message Passing MPI
Example 1.3 Integration with MPI Nonblocking Send
A blocking send waits until a matching receive has signaled that it is ready to receive. When the work that follows the send does not depend on the message having arrived safely, it is more efficient to use a nonblocking send, so that the work following the send statement can start right away while the send is pending. In this example, the point-to-point blocking MPI_Send used in the preceding example is replaced with the nonblocking MPI_Isend subroutine, which lets work proceed while the send is waiting for its matching receive to respond.
Example 1.3 Fortran Code
      Program Example1_3
c#######################################################################
c#
c# This is an MPI example on parallel integration
c# It demonstrates the use of :
c#
c#   * MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Finalize
c#   * MPI_Recv, MPI_Isend, MPI_Wait
c#   * MPI_ANY_SOURCE, MPI_ANY_TAG
c#
c# Dr. Kadin Tseng
c# Scientific Computing and Visualization
c# Boston University
c# 1998
c#
c#######################################################################
      implicit none
      integer n, p, proc, ierr, master
      real h, a, b, integral, pi
      integer req(1)
      include "mpif.h"   ! brings in pre-defined MPI constants, ...
      integer myid, tag, status(MPI_STATUS_SIZE)
      real my_int, integral_sum
      data master/0/     ! processor 0 collects integral sums from other processors

c**Starts MPI processes ...
      call MPI_Init(ierr)                             ! starts MPI
      call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierr)  ! get current proc id
      call MPI_Comm_size(MPI_COMM_WORLD, p, ierr)     ! get number of procs

      pi = acos(-1.0)    ! = 3.14159...
      a = 0.0            ! lower limit of integration
      b = pi/2.          ! upper limit of integration
      n = 500            ! number of increments within each process
      tag = 123          ! set the tag to identify this particular job
      h = (b-a)/n/p      ! length of increment

      my_int = integral(a,myid,h,n)
      write(*,*)'myid=',myid,', my_int=',my_int

      if(myid .eq. master) then        ! the following is serial
        integral_sum = my_int
        do proc=1,p-1
          call MPI_Recv(my_int, 1, MPI_REAL,
     &        MPI_ANY_SOURCE, MPI_ANY_TAG,    ! more efficient, less prone to deadlock
     &        MPI_COMM_WORLD, status, ierr)   ! root receives my_int from proc
          integral_sum = integral_sum + my_int
        enddo
      else
        call MPI_Isend(my_int, 1, MPI_REAL, master, tag,
     &      MPI_COMM_WORLD, req, ierr)     ! send my_int to master
        call other_work('more work')       ! because of Isend, gets here immediately
        call MPI_Wait(req, status, ierr)   ! wait for nonblocking send to complete
      endif

c**integral sums from all procs have been collected and summed ...
      if(myid .eq. master) then
        write(*,*)'The Integral =',integral_sum
      endif

      call MPI_Finalize(ierr)   ! let MPI finish up ...
      stop
      end

      subroutine other_work(header)
      implicit none
      character*(*) header
      write(*,'(a)') header
      return
      end

      real function integral(a,i,h,n)
      implicit none
      integer n, i, j
      real h, h2, aij, a, fct
      integral = 0.0             ! initialize integral
      h2 = h/2.
      do j=0,n-1                 ! sum over all "j" integrals
        aij = a + (i*n + j)*h    ! lower limit of "j" integral
        integral = integral + fct(aij+h2)*h
      enddo
      return
      end

      real function fct(x)
      implicit none
      real x
      fct = cos(x)
      end
Example 1.3 C code
#include <mpi.h>
#include <math.h>
#include <stdio.h>

float fct(float x)
{
  return cos(x);
}

/* Prototypes */
void other_work(char* header);
float integral(float a, int i, float h, int n);

int main(int argc, char* argv[])
{
/***********************************************************************
 *
 * This is an MPI example on parallel integration
 * It demonstrates the use of :
 *
 *   * MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Finalize
 *   * MPI_Recv, MPI_Isend, MPI_Wait
 *   * MPI_ANY_SOURCE, MPI_ANY_TAG
 *
 * Dr. Kadin Tseng
 * Scientific Computing and Visualization
 * Boston University
 * 1998
 *
 ***********************************************************************/
  int n, p, myid, tag, master, proc;
  float h, integral_sum, a, b, pi, my_int;
  MPI_Request req;
  MPI_Status status;

  /* Starts MPI processes ... */
  MPI_Init(&argc, &argv);                /* starts MPI */
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);  /* get current process id */
  MPI_Comm_size(MPI_COMM_WORLD, &p);     /* get number of processes */

  master = 0;
  pi = acos(-1.0);   /* = 3.14159... */
  a = 0.;            /* lower limit of integration */
  b = pi/2.;         /* upper limit of integration */
  n = 500;           /* number of increments within each process */
  tag = 123;         /* set the tag to identify this particular job */
  h = (b-a)/n/p;     /* length of increment */

  my_int = integral(a, myid, h, n);      /* 0 <= myid <= p-1 */
  printf("Process %d has the partial result of %f\n", myid, my_int);

  if (myid == master) {
    integral_sum = my_int;
    for (proc = 1; proc < p; proc++) {
      MPI_Recv(&my_int, 1, MPI_FLOAT, MPI_ANY_SOURCE, tag,
               MPI_COMM_WORLD, &status);
      /* with MPI_ANY_SOURCE, more efficient and less prone to deadlock */
      integral_sum += my_int;
    }
    printf("The Integral =%f\n", integral_sum);
  }
  else {
    MPI_Isend(&my_int, 1, MPI_FLOAT, master, tag,
              MPI_COMM_WORLD, &req);     /* send my_int to master */
    other_work("more work");             /* because of Isend, gets here immediately */
    MPI_Wait(&req, &status);             /* wait for nonblocking send to complete */
  }

  MPI_Finalize();    /* let MPI finish up ... */
  return 0;
}

void other_work(char* header)
{
  printf("%s\n", header);
}

float integral(float a, int i, float h, int n)
{
  int j;
  float h2, aij, integ;

  integ = 0.0;               /* initialize integral */
  h2 = h/2.;
  for (j = 0; j < n; j++) {  /* sum over all "j" integrals */
    aij = a + (i*n + j)*h;   /* lower limit of "j" integral */
    integ += fct(aij + h2)*h;
  }
  return integ;
}
Discussions
- With the nonblocking MPI_Isend, the work performed by other_work
proceeds without having to wait for the matching MPI_Recv to complete.
This use of a nonblocking send (or receive) to avoid
processor idling is referred to as "latency hiding." Latency is
the elapsed time for an operation, such as MPI_Isend, to complete.
- Another performance enhancement in this example is the use of
MPI_ANY_SOURCE as the source from which messages are received.
The do loop accounts for all non-master processors, but the messages
originating from them are processed in the order of their arrival
rather than in any imposed order (such as loop-index order).
This is permissible because the integration operation (essentially a
summation) satisfies both the associative and commutative rules,
and hence the order of summation does not matter.
- Since
MPI_ANY_SOURCE is used, the origin of a received
message is not known explicitly. However, the status buffer returned by
MPI_Recv contains useful information about the message. For example,
status(MPI_SOURCE) returns the source (i.e., processor
number) and status(MPI_TAG) the tag of the message in a Fortran code,
while status.MPI_SOURCE and status.MPI_TAG return the corresponding
fields in a C code.