Multiprocessing by Message Passing MPI
Example 1.3 Integration with MPI Nonblocking Send
A blocking send waits until a matching receive has signaled that it is ready to receive. When the work that follows the send does not depend on the message having arrived safely, it is more efficient to use a nonblocking send, so that the work following the send statement can start right away while the send is pending. In this example, the point-to-point blocking MPI_Send used in the preceding example is replaced with the nonblocking MPI_Isend subroutine, which lets work proceed while the send is waiting for its matching receive to respond.
Example 1.3 Fortran Code
      Program Example1_3
c#######################################################################
c#
c# This is an MPI example on parallel integration
c# It demonstrates the use of :
c#
c#   * MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Finalize
c#   * MPI_Recv, MPI_Isend, MPI_Wait
c#   * MPI_ANY_SOURCE, MPI_ANY_TAG
c#
c# Dr. Kadin Tseng
c# Scientific Computing and Visualization
c# Boston University
c# 1998
c#
c#######################################################################
      implicit none
      integer n, p, proc, ierr, master
      real h, a, b, integral, pi
      integer req(1)
      include "mpif.h"   ! brings in pre-defined MPI constants, ...
      integer myid, tag, status(MPI_STATUS_SIZE)
      real my_int, integral_sum
      data master/0/     ! processor 0 collects integral sums from other processors

c**Starts MPI processes ...
      call MPI_Init(ierr)                             ! starts MPI
      call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierr)  ! get current proc id
      call MPI_Comm_size(MPI_COMM_WORLD, p, ierr)     ! get number of procs

      pi = acos(-1.0)    ! = 3.14159...
      a = 0.0            ! lower limit of integration
      b = pi/2.          ! upper limit of integration
      n = 500            ! number of increments within each process
      tag = 123          ! set the tag to identify this particular job
      h = (b-a)/n/p      ! length of increment

      my_int = integral(a,myid,h,n)
      write(*,*)'myid=',myid,', my_int=',my_int

      if(myid .eq. master) then        ! the following is serial
        integral_sum = my_int
        do proc=1,p-1
          call MPI_Recv(my_int, 1, MPI_REAL,
     &        MPI_ANY_SOURCE, MPI_ANY_TAG,    ! more efficient, less prone to deadlock
     &        MPI_COMM_WORLD, status, ierr)   ! root receives my_int from proc
          integral_sum = integral_sum + my_int
        enddo
      else
        call MPI_Isend(my_int, 1, MPI_REAL, master, tag,
     &      MPI_COMM_WORLD, req, ierr)     ! send my_int to master
        call other_work('more work')       ! because of Isend, gets here immediately
        call MPI_Wait(req, status, ierr)   ! wait for nonblocking send to complete
      endif

c**integral sums from all procs have been collected and summed ...
      if(myid .eq. master) then
        write(*,*)'The Integral =',integral_sum
      endif

      call MPI_Finalize(ierr)   ! let MPI finish up ...
      stop
      end

      subroutine other_work(header)
      implicit none
      character*(*) header
      write(*,'(a)') header
      return
      end

      real function integral(a,i,h,n)
      implicit none
      integer n, i, j
      real h, h2, aij, a, fct
      integral = 0.0             ! initialize integral
      h2 = h/2.
      do j=0,n-1                 ! sum over all "j" integrals
        aij = a + (i*n + j)*h    ! lower limit of "j" integral
        integral = integral + fct(aij+h2)*h
      enddo
      return
      end

      real function fct(x)
      implicit none
      real x
      fct = cos(x)
      end
Example 1.3 C code
#include <mpi.h>
#include <math.h>
#include <stdio.h>

float fct(float x)
{
  return cos(x);
}

/* Prototypes */
void other_work(char* header);
float integral(float a, int i, float h, int n);

int main(int argc, char* argv[])
{
/***********************************************************************
 *
 * This is an MPI example on parallel integration
 * It demonstrates the use of :
 *
 *   * MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Finalize
 *   * MPI_Recv, MPI_Isend, MPI_Wait
 *   * MPI_ANY_SOURCE, MPI_ANY_TAG
 *
 * Dr. Kadin Tseng
 * Scientific Computing and Visualization
 * Boston University
 * 1998
 *
 ***********************************************************************/
  int n, p, myid, tag, master, proc;
  float h, integral_sum, a, b, pi, my_int;
  MPI_Request req;
  MPI_Status status;

  /* Starts MPI processes ... */
  MPI_Init(&argc, &argv);                /* starts MPI */
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);  /* get current process id */
  MPI_Comm_size(MPI_COMM_WORLD, &p);     /* get number of processes */

  master = 0;
  pi = acos(-1.0);   /* = 3.14159... */
  a = 0.;            /* lower limit of integration */
  b = pi/2.;         /* upper limit of integration */
  n = 500;           /* number of increments within each process */
  tag = 123;         /* set the tag to identify this particular job */
  h = (b-a)/n/p;     /* length of increment */

  my_int = integral(a, myid, h, n);      /* 0 <= myid <= p-1 */
  printf("Process %d has the partial result of %f\n", myid, my_int);

  if (myid == master) {
    integral_sum = my_int;
    for (proc = 1; proc < p; proc++) {
      MPI_Recv(&my_int, 1, MPI_FLOAT, MPI_ANY_SOURCE, tag,
               MPI_COMM_WORLD, &status);
      /* with MPI_ANY_SOURCE, more efficient and less prone to deadlock */
      integral_sum += my_int;
    }
    printf("The Integral =%f\n", integral_sum);
  }
  else {
    MPI_Isend(&my_int, 1, MPI_FLOAT, master, tag,
              MPI_COMM_WORLD, &req);     /* send my_int to master */
    other_work("more work");             /* because of Isend, gets here immediately */
    MPI_Wait(&req, &status);             /* wait for nonblocking send to complete */
  }

  MPI_Finalize();    /* let MPI finish up ... */
  return 0;
}

void other_work(char* header)
{
  printf("%s\n", header);
}

float integral(float a, int i, float h, int n)
{
  int j;
  float h2, aij, integ;

  integ = 0.0;               /* initialize integral */
  h2 = h/2.;
  for (j = 0; j < n; j++) {  /* sum over all "j" integrals */
    aij = a + (i*n + j)*h;   /* lower limit of "j" integral */
    integ += fct(aij + h2)*h;
  }
  return integ;
}
Discussions
- With the nonblocking MPI_Isend, the work performed by other_work
proceeds without having to wait for the matching MPI_Recv to complete.
This use of a nonblocking send (or receive) to avoid
processor idling is referred to as "latency hiding." Latency is
the elapsed time for an operation, such as MPI_Isend, to complete.
- Another performance enhancement in this example is the use of
MPI_ANY_SOURCE as the source from which messages are received.
The do loop accounts for all non-master processors, but the messages
originating from them are processed in the order of their arrival
rather than in any imposed order (such as loop-index order).
This is permissible because the integration operation (essentially a
summation) satisfies both the associative and commutative rules,
and hence the order of summation does not matter.
- Since
MPI_ANY_SOURCE is used, the origin of a received
message is not known explicitly. However, the status buffer returned by
MPI_Recv contains useful information about the message. For example,
status(MPI_SOURCE) returns the source (i.e., processor
number) and status(MPI_TAG) the tag of the message in a Fortran code,
while status.MPI_SOURCE and status.MPI_TAG return the corresponding
fields in a C code.