Demonstrates the usage of virtual topology.
In this example, we demonstrate an application of the cartesian topology by way of a simple elliptic (Laplace) equation.
Fundamentals
The Laplace equation, along with prescribed boundary
conditions, are introduced. Finite Difference Method is then
applied to discretize the PDE to form an algebraic system of equations.
Jacobi Scheme
A very simple iterative method, known as the Jacobi Scheme,
is described. A single-process computer code is shown.
This program is written in fortran 90 for its concise but clear
array representations. (Parallelism and other
improvements will be added to this code as we progress)
Parallel Jacobi Scheme
A parallel algorithm for this problem is discussed.
Simple MPI routines, without the invocations of cartesian topology,
are inserted
into the basic, single-process, code above to form the parallel
code.
SOR Scheme
The Jacobi scheme, while simple and hence desirable for
demonstration purposes, is impractical for "real" applications
due to its slow convergence. Enhancements to the basic technique
are introduced which leads to the Successive Over Relaxation
(SOR) scheme.
Parallel SOR Scheme
With the introduction of a "red-black" algorithm, we can employ
the parallel algoithm used for Jacobi to parallelize the SOR.
Scalability
We show the performance of the code for various number of processes
to demonstrate its scalability.
Click here
to download a zipped file containing the F90 Jacobi and SOR
codes, along with make files and a matlab m-file for plotting
the solution.
and is shown below in a contour plot with x pointing from left to right and y going
from bottom to top.
For the obvious reason of better load-balancing, we will divide the amount
of work, in this case proportional to the grid size, evenly
among the processes (m x m / p). For convenience, we define
m' = m/p as the number of cells in the y-direction for each
process. Next, lets rewrite Eq. 3 for a process k as follows:
The figure below depicts the grid of a typical process k as well
as part of adjoining grids of k-1, k+1.
For details, see Chapter 19.5, Relaxation Methods for Boundary Value Problems,
the Numerical Recipe .
With this red-black group identification strategy and upon applying the
five-point finite-difference stencil to a point (i,j) located at a red
cell, it is immediately apparent that the solution at the red cell depends only
on its four immediate black neighbors to the north, east, west,
and south by virtue of Eq. 6.
On the contrary, a point (i,j) located at
a black cell depends only on its north, east, west, and south red
neighbors. In other words, the finite-difference stencil in Eq. 6 effects
an uncoupling of the solution at interior cells such that solution at
the red cells depend only on solution at
the black cells and vice versa. In a typical iteration, if we first perform an
update on all red (i,j) cells, then when we perform the remaining
update on black (i,j) cells, we could use the red cells that have just
been updated. Otherwise, everything that we described about the parallel
Jacobi scheme applies equally well here; i.e., the green cells represent
the physical boundary conditions while the solutions from first and last
rows of the grid of each process are deposited into the blue cells of
respective process grids to be used as the remaining boundary conditions.
A few comments ...
Fundamentals
First, some basics:
Jacobi Scheme
While numerical techniques abound to solve PDEs such as the Laplace equation, we will focus on the
use of an iterative method as it will be shown to be readily parallelizable
and lends itself to the opportunity to apply cartesian topology. The simplest of
iterative techniques is the Jacobi scheme, which can be stated as follows:
Single process f90 code for the Jacobi Scheme
Parallel Algorithm for the Jacobi Scheme
To enable parallelism, first we need to divvy up works for individual processes;
this is known commonly as domain decomposition. Since the governing
equation is two-dimensional, typically we have a choice of using a 1D or 2D
decomposition. Here, we will focus on a 1D decomposition and
defer the discussion of a 2D decomposition for later. Assuming that p
processes will be used, we split the computational domain into p
horizontal strips, each assigned to one process, along the north-south or
y-direction. This choice is made primarily to facilitate a simpler boundary
condition (code) implementations.
Jacobi Parallel Implementation. Note that
Cartesian topology is not employed in this implementation but will be
used later in the parallel SOR example with the purpose of showing alternative
ways to solve this type of problems.
Successive Over Relaxation (SOR)
While the Jacobi iteration scheme is very simple -- and parallelizable --
its slow convergent rate however renders it impractical for any "real world"
applications. One way to speed up the convergent rate would be to "over predict"
the new solution by linear extrapolation. This leads to the Successive Over Relaxation scheme, or SOR:
Note in the above that setting wn = 1 recovers the
Jacobi scheme while wn< 1 underrelaxes the solution.
Ideally, the choice of
wn would be such that it provides the optimal rate
of convergence and is not restricted to a fixed constant.
As a matter of fact, an effective choice of wn, known as the
Chebyshev acceleration, is defined as
A Parallel SOR Red-black Scheme
To further speed up the rate of convergence, we could use u at time
level n+1 for any or all terms on the right hand side of Eq. 6 as soon as they
become available. This is the essence of the Gauss-Seidel scheme.
This requires that the solution be updated cell-by-cell via do
loop in order to make use of the
most recent solution as soon as they become available. Alternatively,
we will explore a conceptually similar red-black scheme which
will work equally well for an array-based language such as
F90 as a scalar-based language such as C or F77. This scheme can best
be understood visually by painting the interior cells alternately
in red and black to yield a Checkerboard-like pattern:
parallel F90 SOR code. Cartesian topology is employed in this
implementation.