(The Uniform Resource Locator for this World Wide Web page is
"http://scv.bu.edu/documentation/tutorials/F90/")

Fortran 90 and Multiprocessing

Course 3070

Introduction

Highlights of Fortran 90 Features

Fortran 90 Constructs

Fortran 90 Array Intrinsics

These array intrinsics may or may not perform better than the equivalent explicit do-loops. The outcome depends strongly on the individual function and on the compiler version used. A table listing the performance of these array intrinsics has been compiled.
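
One way to check this on your own system is to time an intrinsic against its do-loop equivalent. The following is only a minimal sketch: the program name, the array length and the choice of the SUM intrinsic are illustrative assumptions, not taken from the course's own performance table.

```fortran
program intrinsic_vs_loop
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000000      ! array length, an arbitrary choice
  real(dp), allocatable :: a(:)
  real(dp) :: s_intrinsic, s_loop
  integer :: i, t0, t1, t2, rate

  ! Fill the array with some data to add up.
  allocate(a(n))
  do i = 1, n
     a(i) = 1.0_dp / real(i, dp)
  end do

  call system_clock(t0, rate)

  ! Version 1: the array intrinsic
  s_intrinsic = sum(a)
  call system_clock(t1)

  ! Version 2: the equivalent explicit do-loop
  s_loop = 0.0_dp
  do i = 1, n
     s_loop = s_loop + a(i)
  end do
  call system_clock(t2)

  ! Elapsed wall-clock time is the tick difference divided by the tick rate.
  print *, 'sum intrinsic:', s_intrinsic, ' seconds:', real(t1 - t0) / real(rate)
  print *, 'explicit loop:', s_loop,      ' seconds:', real(t2 - t1) / real(rate)

  deallocate(a)
end program intrinsic_vs_loop
```

The two sums may differ in the last digits because the additions can be carried out in a different order. For a single pass over an array of this size the timings are small, so repeat the measurement several times before drawing conclusions.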

Serial Code Compilation

To compile example.f90 and produce an executable "example":
lego% f90 -o example example.f90

At the prompt, type f90 -help to get a list of all f90 compiler options.
If a Makefile is used to compile an f90 program, take care that modules are compiled before the program units that refer to them. Otherwise, compilation will fail. For an example, see here.
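
The ordering requirement can be seen even within a single source file. In the sketch below, the names constants and circle are hypothetical, chosen only for illustration. The module appears first, so the compiler has already processed it when it reaches the USE statement.

```fortran
! The module comes first so that the compiler has processed it
! before it reaches the USE statement in the main program.
module constants
  implicit none
  real, parameter :: pi = 3.1415927
end module constants

program circle
  use constants          ! requires the module to have been compiled already
  implicit none
  real :: radius, area

  radius = 2.0
  area = pi * radius**2
  print *, 'area =', area
end program circle
```

If the module and the program were kept in separate files, the same rule would apply across files: the file containing the module would have to be compiled before the file containing the program.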

Serial Code Execution

To run a job interactively:
lego% example

On the Power Challenge Array (PCA) and Origin2000 (O2K), all interactive jobs have a 10-minute (loosely speaking) cpu time limit.
To submit a batch job:
lego% bsub -q o2k-short example

See the man page of lsbatch for other batch-related commands.
lego% bqueues lists all available queues.
lego% bqueues -l gives a complete list of all single and multiprocessor queues and their respective time limits.
Here is a good summary on the hardware and queues available at Boston University.

Serial Code Tuning

Multiprocessing with Fortran 90

There are a number of methods available to help you achieve parallelism in your code. Which method is most effective depends largely on the characteristics of your code. Note that different methods can be used in different parts of the same code.

Parallel Code Compilation and Execution

There are different ways to compile source code, depending on your objectives:
  1. Use SGI's parallel mathematical libraries :
    lego% f90 -o example -mp example.f90 -lscs_mp

    At the prompt, type f90 -help to get a list of all f90 compiler options.

    If your code is f77 based, you can still link with -lscs_mp.

    See Intro to SCSL to find out whether the Lapack routine you are using is a member of the parallel library. Caveat: just because a routine is in the library doesn't mean that you will get a great speedup. Some routines are known to benefit minimally (like the SVD routines); others, however, scale very well (like LU decomposition).

  2. Use loop-level parallel directives :
    lego% f90 -o example -mp example.f90

    Here, you must include parallel directives in example.f90 for any work to be done in parallel. -mp alerts the compiler that the source file contains directives. In addition, -mp causes the mp libraries to be linked.

  3. Use apo to parallelize and compile the code automatically :
    lego% f90 -o example -apo keep example.f90

    The -apo option can also be used with the f77 compiler.
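
As a sketch of option 2, the loop below carries a loop-level directive. It is written with OpenMP-style directives, which is an assumption: the compiler described here may expect a different directive syntax under -mp, so check the f90 man page. The program name saxpy_mp and the array size are hypothetical.

```fortran
program saxpy_mp
  implicit none
  integer, parameter :: n = 100000   ! array length, an arbitrary choice
  real :: x(n), y(n), a
  integer :: i

  a = 2.0
  x = 1.0
  y = 3.0

  ! The iterations are independent of one another, so they can be
  ! divided among threads. A compiler that is not asked to honor
  ! directives treats the directive lines below as ordinary comments.
!$omp parallel do private(i) shared(a, x, y)
  do i = 1, n
     y(i) = a * x(i) + y(i)
  end do
!$omp end parallel do

  print *, 'y(1) =', y(1), '  y(n) =', y(n)
end program saxpy_mp
```

Every element of y should equal 5 whether the loop runs serially or in parallel, which makes this a convenient check that the directive has not changed the result.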

To run a job interactively with 4 processors:
  1. lego% setenv MP_SET_NUMTHREADS 4
  2. lego% example

Alternatively, you can call a Fortran-callable SGI utility library routine in your code, placed immediately after the non-executable (declaration) statements, as follows:

call mp_set_numthreads(4)

Note that interactive jobs can be run only on Tonka (an SGI Power Challenge Array) and Lego (an SGI Origin2000), and the time limit is 10 minutes (loosely speaking) per processor. A job that requires more than 10 minutes should be submitted to one of the batch queues via bsub.

To submit a multiprocessor batch job requiring 4 processors to the PCA:

lego% bsub -q pca-mp4 example

DO NOT provide "-n 4" as described in the bsub manpage to request 4 processors. Instead, use MP_SET_NUMTHREADS as in the interactive job, or insert "call mp_set_numthreads(4)" in your program to request 4 processors. Remember to link with the -mp switch and do not ask for more processors than the queue's limit.
pca-mp4 is for jobs that require up to 4 hours per processor, for a total of 16 hours of cpu time on the PCA.
o2k-mp4 is for jobs that require up to 4 hours per processor, for a total of 16 hours of cpu time on the Origin2000.
For more information on available queues and their corresponding CPU limits, see Scientific Computing Facility Technical Summary.
Click here for bsub related commands.

High Performance Fortran

On Boston University's Power Challenge Array, HPF is available through pghpf, the driver for The Portland Group's HPF compiler.

At present, we have pghpf 2.4. To use it, put the following in your .cshrc script:

if ( -d /usr/local/pghpf ) then
        setenv PGI /usr/local/pghpf-2.4
        set path = ($path $PGI/sgi/bin)
        setenv LM_LICENSE_FILE /usr/local/flexlm/licenses/license.dat
endif

For those who have Thinking Machines' CM Fortran codes and would like to convert them to HPF, the on-line documentation includes a paper, "Migrating CM FORTRAN to F90 and HPF", by Meadows and Miles. There are man pages for the pghpf compiler and for the individual HPF library routines.

For the efficiency-conscious, there is a menu-driven profiler pgprof for your applications.

Examples of source code compilation are as follows:

To run a pghpf job:

lego% example -pghpf -np 4

or

lego% setenv PGHPF_NP 4
lego% example
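
For orientation, an HPF source file might look like the sketch below. It is not taken from the course examples: the program name hpf_sketch, the array size and the BLOCK distribution are assumptions chosen for illustration. The !hpf$ lines are directives to an HPF compiler and plain comments to an ordinary Fortran compiler.

```fortran
program hpf_sketch
  implicit none
  integer, parameter :: n = 1000     ! array length, an arbitrary choice
  real :: a(n), b(n)
  integer :: i
  ! Spread a in contiguous blocks over the available processors,
  ! and place each b(i) on the same processor as a(i).
!hpf$ distribute a(block)
!hpf$ align b(:) with a(:)

  ! Data-parallel assignments: each processor works on its own block.
  forall (i = 1:n) a(i) = real(i)
  b = 2.0 * a

  ! A global reduction over the distributed array.
  print *, 'sum(b) =', sum(b)
end program hpf_sketch
```

The sum should come to 1001000 regardless of the number of processors requested at run time, since the directives affect only data placement and not the result.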

Examples

References

There are a number of Fortran 90 and HPF references available.
From book publishers :
  1. Fortran 90 Programming by Ellis, Phillips and Lahey, Addison-Wesley, 1994
  2. Migrating to Fortran 90 by J.F. Kerrigan, O'Reilly & Associates, Inc., 1993
  3. Fortran 90 Handbook by Adams, Brainerd, et al., McGraw-Hill, 1992

On the Internet:
  1. The Fortran Market maintained by Walt Brainerd
  2. Fortran 90 Tutorial by Michael Metcalf
  3. Fortran 90 and Computational Science chapter in CSEP's on-line textbook on scientific computation
  4. Portland Group's pghpf User's Guide
  5. Portland Group's pghpf HPF Reference Manual
  6. High Performance Fortran Forum's HPF Language Specifications. This document is also available in postscript form at this site.
  7. HPF web tour by Ian Foster
  8. HPF chapter in Designing and Building Parallel Programs by Ian Foster

For more information about this tutorial, and about Fortran 90, HPF and Multiprocessing, contact the course coordinator and instructor, Kadin Tseng (Email: kadin@bu.edu).