Katana Compute Nodes Scratch Disks Usage
This page demonstrates the usage of the compute nodes' scratch file system
for temporary file storage in a batch job.
To access a compute node's local scratch file system, simply refer to /scratch.
The absolute path of a file on a compute node's scratch disk has the form
/net/katana-aNN/scratch/mydir/myfile, where
katana-aNN is the node name and NN is a two-digit number,
01, 02, ..., 14 (with 01 the designation for the login node).
We recommend storing files in a
user-created subdirectory (/net/katana-aNN/scratch/userID) to
keep file management simple. Because the nodes in a batch job are
assigned at runtime, their names must be read from the file named by the
environment variable $PE_HOSTFILE, which is available once the
batch job has started and its nodes have been assigned.
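The node-name lookup described above can be tried in isolation. The following is a minimal sketch in POSIX sh (not the csh used in the batch script below), assuming the host file lists one node per line with the node name in the first column. The helper name scratch_dirs, the contents of the sample host file, and the placeholder userID are hypothetical.

```shell
#!/bin/sh
# scratch_dirs HOSTFILE USERID
# Print the per-user scratch directory for every node listed in HOSTFILE.
scratch_dirs() {
    # Column 1 of each line is assumed to hold the node name.
    awk -v user="$2" '{ print "/net/" $1 "/scratch/" user }' "$1"
}

# Stand-in for $PE_HOSTFILE so the sketch runs outside a batch job.
# The columns after the node name are illustrative only.
sample_hostfile=$(mktemp)
cat > "$sample_hostfile" <<'EOF'
katana-a05 1 all.q@katana-a05 UNDEFINED
katana-a03 1 all.q@katana-a03 UNDEFINED
EOF

scratch_dirs "$sample_hostfile" userID
rm -f "$sample_hostfile"
```

Inside a running job, the same function would be called as scratch_dirs "$PE_HOSTFILE" followed by your own user ID.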
Below is an example of an MPI batch script
that directs the output of each process (or rank) to
an output file on the local node's scratch disk.
- MPI batch script
#!/bin/csh
#
# Example SGE script for running mpi jobs
#
# Submit job with the command: qsub script
#
# Note: A line of the form "#$ qsub_option" is interpreted
# by qsub as if "qsub_option" was passed to qsub on
# the commandline.
#
# Set the hard runtime (aka wallclock) limit for this job,
# default is 2 hours. Format: -l h_rt=HH:MM:SS
#$ -l h_rt=2:00:00
#
# Invoke the mpi Parallel Environment for N processors.
# There is no default value for N, it must be specified.
#$ -pe mpi 4
# Merge stderr into the stdout file, to reduce clutter.
#$ -j y
## end of qsub options
# The system supports several different implementations of MPI.
# This variable is used by the mpirun command to set up the proper
# runtime environment for the job. The allowed values are "openmpi"
# (the default) and "mpich". The runtime setting should
# match the setting in effect when the program was compiled.
setenv MPI_IMPLEMENTATION openmpi
# By default, the script is executed in the directory from which
# it was submitted with qsub. You might want to change directories
# before invoking mpirun...
# cd somewhere
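# Create a per-user directory (kadin in this example) on the scratch
# disk of every node assigned to this job.
foreach i (`awk '{print $1}' $PE_HOSTFILE`)
    mkdir -p /net/$i/scratch/kadin
end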
# MPI_PROGRAM is the name of the MPI executable
set MPI_PROGRAM = local_scratch_example
# Run executable with "mpirun -np $NSLOTS $MPI_PROGRAM arg1 arg2 ..."
# where NSLOTS is set by SGE to N as defined in "-pe mpi N"
mpirun -np $NSLOTS $MPI_PROGRAM
- Sample FORTRAN program
On the Katana login node, cd /scratch takes you to the login node's own
scratch disk. Likewise, /scratch in a batch job refers to the scratch disk
of the node on which the process is running. Here is a
FORTRAN example (C works in similar fashion) that relies on this default
convention to write each rank's output to the scratch disk of the node it runs on.
      Program local_scratch_example
      implicit none
      include "mpif.h"
      integer p, total, ierr, master, myid, my_int, dest, tag
      character*40 filename
      data master, tag, dest/0, 0 ,0/
c**Starts MPI processes ...
      call MPI_Init(ierr)                            ! starts MPI
      call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierr) ! get process id
      call MPI_Comm_size(MPI_COMM_WORLD, p, ierr)    ! get # procs
C**define and open output file for each rank on local node scratch
C**(format i1 allows single-digit ranks only, i.e. up to 10 processes)
      write(filename,"('/scratch/kadin/myoutput.dat.',i1)")myid
      open(unit=11, file=filename,form='formatted',status='unknown')
      my_int = myid                                  ! result of local proc
C**write local output to its own output file
      write(11,"('Contents of process ',i2,' is ',i8)")myid,my_int
      call MPI_Reduce(my_int, total, 1, MPI_INTEGER, MPI_SUM, dest,
     &                MPI_COMM_WORLD, ierr)
      if(myid .eq. master) then
        write(11,*)'The sum is =', total             ! writes total to master
      endif
      close(11)
      call MPI_Finalize(ierr)                        ! MPI finish up ...
      end
- Access output files on the scratch disks
After the batch
job completes, a file named batch_script_name.poJobID, where JobID is the
job's ID, is created in the current directory. This file lists the
nodes used for the run. For example,
katana:~ % more batch_script_name.po837
-catch_rsh /usr/local/...pool/katana-a05/active_jobs/837.1/pe_hostfile
katana-a05
katana-a03
katana-a09
katana-a11
With this list, you know where to find your output files. The first
node listed (katana-a05 in this example) is the job host, which is mapped to
rank 0 in an MPI program.
katana:~ % cd /net/katana-a05/scratch
It is important to type the full path rather than rely on tab completion
to finish it for you. A node's scratch disk may not be mounted
at that moment, in which case, as far as the system is
concerned, "scratch" does not exist. Typing "katana-aNN/scratch" in full
causes the system to automount katana-aNN's scratch disk.
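As a hedged illustration, the node list in the job report can be turned into the full path of each rank's output file, which also avoids any reliance on tab completion. The sketch below is written in POSIX sh and assumes one MPI process per node, with ranks assigned in the order the nodes are listed, as in this example. The function name rank_paths is hypothetical.

```shell
#!/bin/sh
# rank_paths REPORT USERID
# Print "rank full-path" for each node line in a job report file.
# Assumes one rank per node, in listed order (as in the example run above;
# not guaranteed when a node provides several slots).
rank_paths() {
    awk -v user="$2" '
        # Only lines that start with a node name are counted; the
        # "-catch_rsh ..." header line is skipped.
        /^katana-a[0-9][0-9]/ {
            printf "%d /net/%s/scratch/%s/myoutput.dat.%d\n", rank, $1, user, rank
            rank++
        }
    ' "$1"
}

# Example call: rank_paths batch_script_name.po837 kadin
```

Each printed path is complete, so it can be passed directly to cd, ls or more.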
In the example below, the output of the MPI program's ranks 0 and 2 (written to
the scratch disks of katana-a05 and katana-a09, respectively) is examined ...
katana:~ % cd /net/katana-a05/scratch/kadin
katana:~ % ls
myoutput.dat.0
katana:~ % more myoutput.dat.0
Contents of process 0 is 0
The sum is = 6
katana:~ % cd /net/katana-a09/scratch/kadin
katana:~ % ls
myoutput.dat.2
katana:~ % more myoutput.dat.2
Contents of process 2 is 2
With care, the usual ls and rm commands may be used to manage scratch files as with other files ...
katana:~ % ls -tl /net/katana-a*/scratch/kadin/myoutput*
. . .
katana:~ % rm /net/katana-a*/scratch/kadin/myoutput*
. . .
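Because scratch storage is temporary, you may want to copy results to a permanent location before removing them. The following is a minimal sketch in POSIX sh, not a site-provided tool. It names each node explicitly instead of using the katana-a* wildcard, on the assumption that a wildcard matches only scratch disks that are already mounted. The function name collect_outputs and the destination directory are hypothetical.

```shell
#!/bin/sh
# collect_outputs ROOT USERID DEST NODE...
# Copy every myoutput* file from ROOT/NODE/scratch/USERID into DEST.
collect_outputs() {
    root=$1; user=$2; dest=$3
    shift 3
    mkdir -p "$dest" || return 1
    for node in "$@"; do
        # Spelling out the node name is what lets the automounter act.
        for f in "$root/$node/scratch/$user"/myoutput*; do
            # An unmatched pattern is left as a literal string; skip it.
            [ -f "$f" ] || continue
            cp -p "$f" "$dest"/
        done
    done
}

# Example call, using the nodes from the job report shown earlier:
# collect_outputs /net kadin "$HOME/results" katana-a05 katana-a03 katana-a09 katana-a11
```

Nodes that hold no matching files are skipped silently, so the full node list from the job report can be passed as is.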