Code Porting
Below is a list of facts you need to be aware of when compiling, linking, and running programs on the Katana Cluster:
- The Katana Cluster consists of the Katana login (and compute)
node and 13 dedicated compute nodes. Each of these
nodes includes four 2.6 GHz AMD Opteron processors sharing 8 GB
of memory.
- To make full use of the 8 GB in a node, you must run a 64-bit
executable. A 32-bit executable can only accommodate 2 GB of
memory allocation. If you are using pre-built commercial or
other application packages, they may not be built as
64-bit executables. Gaussian is one such example.
- Sixty-four-bit addressing is the system default on the Katana Cluster.
To build 32-bit executables, use the "-m32" compiler option with
the GNU compilers and "-tp k8-32" with the PGI
compilers. (More details ...)
- Executables previously generated on the SCV Linux Cluster may
work on the Katana Cluster. However, we recommend that you
recompile all programs before running them on Katana.
- Two sets of compilers, PGI (default) and GNU, are available.
You can select between them through the environment
variable MPI_COMPILER. (More details ...)
- If your code mixes C with Fortran, it will most likely
require additional language-related support libraries. For Portland Group compilers,
add either -pgf77libs or -pgf90libs, depending
on whether the Fortran code is Fortran 77 or Fortran 90. (See pgcc)
- Two MPI implementations, "openmpi" (default) and "mpich", are
available. You can select between them through the environment
variable MPI_IMPLEMENTATION. (More details ...)
- MPI C and C++ programs need to include mpi.h where necessary,
while MPI Fortran 77/90/95 programs need mpif.h. No additional
header files or compiler switches are needed for C++ programs.
- MPI-2 features, such as MPI_Put and MPI_Get, are
supported only in the "openmpi" MPI implementation.
- A timing comparison of
an MPI code for a 2D Laplace solver
on the SCV computer systems (IBM pSeries, IBM Blue Gene, Intel Pentium
III Linux Cluster, and the IBM Katana Cluster) is
provided to demonstrate their relative performance.
- The wall-clock limit for batch jobs is 24 hours. (More details ...)
- You can request up to 16 processors per job. (More details ...)
- Parallel processing with OpenMP
is limited to 4 processors, the number in a single node. When submitting a
batch job, use "-pe omp N", with N between 1 and 4, to
ensure that all requested processors are in a single node.
- The distributed-memory parallel mathematical library
ScaLAPACK is available.
- Usually, there is no need to know the specific node names
assigned at runtime to a batch job. However, if a job needs
the node names, they are listed in the file named by the
environment variable $PE_HOSTFILE at runtime.
(More details ... )
- Each node has 50 GB of local scratch disk space.
It is NOT backed up, and files can be kept
there for at most 10 days.
(More details ... )
- Debugging and profiling
tools are available.
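
Because a pre-built package may turn out to be a 32-bit executable limited to 2 GB, it can help to check a binary before relying on a node's full 8 GB. The following is a minimal sketch, not a Katana-provided tool: it reads the fifth byte of the ELF header (the class byte, where 1 means 32-bit and 2 means 64-bit). The function names `elf_class_from_header` and `elf_word_size` are hypothetical and were chosen only for this illustration.

```python
ELF_MAGIC = b"\x7fELF"

def elf_class_from_header(header):
    """Return 32 or 64 from the first bytes of an ELF file, else None.

    Byte 4 of an ELF header is the class byte: 1 = 32-bit, 2 = 64-bit.
    """
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None  # not an ELF file (for example, a shell script)
    return {1: 32, 2: 64}.get(header[4])

def elf_word_size(path):
    """Open the file at `path` and report whether it is 32- or 64-bit ELF."""
    with open(path, "rb") as f:
        return elf_class_from_header(f.read(5))

if __name__ == "__main__":
    import sys
    # Usage (hypothetical script name): python check_elf.py ./a.out
    for name in sys.argv[1:]:
        print(name, elf_word_size(name))
```

A result of 32 for an application binary suggests it is subject to the 2 GB limit described in the list above. A result of None only means the file is not an ELF binary.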
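
For jobs that do need the node names, the following sketch shows one way to read them from $PE_HOSTFILE at runtime. It assumes the usual Grid Engine layout, in which the variable holds the path of a text file with one line per node and the first two fields are the host name and the number of slots. That layout is an assumption here and should be verified on Katana. The function names `parse_hostfile` and `allocated_hosts` and the node names in the usage note are hypothetical.

```python
import os

def parse_hostfile(text):
    """Return a list of (hostname, slots) pairs from hostfile contents.

    Assumes each line starts with "hostname slots"; any further fields
    are ignored, and lines with fewer than two fields are skipped.
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        hosts.append((fields[0], int(fields[1])))
    return hosts

def allocated_hosts():
    """Read the file named by $PE_HOSTFILE; return [] outside a batch job."""
    path = os.environ.get("PE_HOSTFILE")
    if not path:
        return []
    with open(path) as f:
        return parse_hostfile(f.read())

if __name__ == "__main__":
    for name, slots in allocated_hosts():
        print(name, slots)
```

Under these assumptions, a hostfile line such as "nodeA 4 queue@nodeA UNDEFINED" would be parsed as the pair ("nodeA", 4).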