Running Jobs
Jobs using up to 4 processors that need approximately ten minutes or less of CPU time may run interactively on
the login node, katana.bu.edu. All other jobs must be
submitted to the batch system to run on the compute nodes.
Depending on the type of application, job running procedures may vary.
Instructions for several important types of serial and multiprocessing jobs,
such as MPI and OpenMP jobs, are demonstrated below.
Batch system technical summary
- The batch system is Sun Grid Engine (SGE). Common commands are
qsh, qrsh, qsub, qstat, and qdel.
Caution: Many Sun Grid Engine commands have the same names as PBS commands,
but their options or behavior may differ.
- Nodes are assigned to jobs at run time and are not known in advance.
At run time, a batch script can find the names of its assigned nodes in the
file whose path is given by the environment variable $PE_HOSTFILE.
- The processors assigned to a batch job may span multiple nodes.
Depending on the user's choice of "parallel environment" (-pe),
the four processors of an assigned node may not all be assigned
to the same job; unused processors on a node may be given to other
jobs. A node, and therefore its memory, can thus be shared by
several jobs. If memory is an important aspect of your job, choose
the parallel environment carefully.
- The maximum number of processors a user can request is 16.
- The maximum run time limit is 24 hours (-l h_rt=24:00:00). The default is 2 hours if you do not specify a limit.
- See the Technical Summary section for additional information.
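As a small illustration of the HH:MM:SS format and the 24-hour ceiling above, the POSIX sh functions below convert an h_rt value to seconds and reject anything longer than 24:00:00 before you call qsub. The names hrt_to_seconds and check_hrt are hypothetical helpers, not site-provided commands, and this is only a client-side sketch; the batch system itself enforces the real limit.

```shell
#!/bin/sh
# Hypothetical helpers: validate an h_rt value before passing it to qsub.
# They check only the HH:MM:SS form and the 24-hour maximum on this page.

# Print the number of seconds in an HH:MM:SS string; fail if malformed.
hrt_to_seconds() {
    # awk reads "08" as decimal 8, avoiding the octal pitfall of $((08)).
    echo "$1" | awk '
        /^[0-9]+:[0-9]+:[0-9]+$/ {
            split($0, t, ":")
            print t[1] * 3600 + t[2] * 60 + t[3]
            ok = 1
        }
        END { exit ok ? 0 : 1 }'
}

# Succeed only if $1 is well formed and no longer than 24:00:00.
check_hrt() {
    secs=$(hrt_to_seconds "$1") || {
        echo "h_rt must look like HH:MM:SS: $1" >&2
        return 1
    }
    if [ "$secs" -gt 86400 ]; then
        echo "h_rt $1 exceeds the 24-hour limit" >&2
        return 1
    fi
}
```

For example, check_hrt 4:00:00 && qsub -l h_rt=4:00:00 myscript submits only when the limit is acceptable.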
Using the batch system interactively 
You can "check out" processors for interactive use. Interactive batch jobs
are limited to 4 processors on a single node.
If an X Window display is not required, use qrsh;
otherwise, use qsh.
-
katana:~ % qrsh [-l h_rt=HH:MM:SS . . .]
The above command gives you a login shell on one of the batch nodes.
The optional -l h_rt argument (one of several options qrsh accepts) sets the run time limit for the shell.
If you do not specify a run time limit, the default is 2 hours.
The square brackets ([ ]) indicate that the enclosed items are optional. Do not type the brackets.
Shown below is an example that requests a 4-hour run time limit:
katana:~ % qrsh -l h_rt=4:00:00
In a qrsh interactive session, MATLAB must be started with
"matlab -nojvm -nodisplay -nosplash". A MATLAB session started without
these options will not respond to any of the exit commands,
such as "exit", "quit", Ctrl-C, or Ctrl-D.
-
katana:~ % qrsh -pe omp 4
You can also request multiple processors (up to 4) with the "-pe" option shown above. Always use the "omp" parallel environment, even for MPI applications, to ensure that all processors are allocated on the same node.
-
katana:~ % qsh [-l h_rt=HH:MM:SS -pe omp N . . .]
If an X Window display is needed (for example, for a GUI-based debugger or MATLAB), use qsh. Note the following if you
intend to "check out" multiple processors with qsh:
- MPI applications do not work with qsh.
- OpenMP applications work with qsh.
- Always select the "omp" parallel environment (-pe omp N).
Submitting a batch job
Batch jobs are submitted with the qsub command.
The general form of the command is:
katana:~ % qsub [qsub options] command [arg1 ...]
In general, command is a user-supplied shell script. Table 1 describes the most important
qsub options.
Table 1. qsub options and their definitions.
| qsub option | Description |
| -l h_rt=HH:MM:SS | Hard run time limit (also called the wallclock limit). Default is 2:00:00 (2 hours). |
| -pe parallel_environment N | Requests more than 1 processor. N is the number of processors desired (2 - 16) and parallel_environment specifies how they are allocated. See Table 2 below for parallel_environment choices. |
| -b y | Tells qsub that "command" is a binary executable rather than a shell script (see the serial-program example under Types of Applications). |
| -e errorfile | Where the job's stderr should go. Defaults to a file called "command.e" followed by the job ID, in the current working directory when qsub is run. |
| -o outputfile | Where the job's stdout should go. Defaults to a file called "command.o" followed by the job ID, in the current working directory when qsub is run. |
| -j y | Merges the error stream into the output stream. |
| -m mail_options | Controls when the batch system sends you mail: when the job begins (b), ends (e), is aborted (a), or is suspended (s), or never (n). The default is "e". |
| -hold_jid job_list | Sets up a job dependency list. job_list is a comma-separated list of job IDs and/or job names that must complete before this job can run. |
| -N name | Gives the job a name. Defaults to the basename of "command". |
| -v env_var=value | Sets the runtime environment variable env_var to value. |
Most qsub options can also be placed in the batch script
instead of on the command line, using a special form of comment: #$ <qsub option>.
The exception is when -b y is in effect: command is then a binary rather than a script,
so there is nowhere to put such comments.
See the "Types of Applications" section below for more details.
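For example, options from Table 1 can be written into the script itself. In the sketch below, the job name myjob, the 4-hour limit, and the script body are illustrative choices, not requirements. JOB_NAME and JOB_ID are environment variables that SGE sets inside a batch job; the script falls back to placeholder values so it can also be run by hand.

```shell
#!/bin/sh
#
# Illustrative batch script: qsub options embedded as "#$" lines.
# The job name "myjob" and the 4-hour limit are example values.
#
#$ -N myjob
#$ -l h_rt=4:00:00
#$ -j y
#$ -m e
#
# To the shell, the "#$" lines above are ordinary comments; only qsub
# interprets them. SGE sets JOB_NAME and JOB_ID inside a batch job; the
# defaults below apply only when the script is run outside the batch system.
JOB_LABEL="${JOB_NAME:-manual}.${JOB_ID:-0}"
echo "Starting $JOB_LABEL on $(uname -n) at $(date)"
```

Submitting this file with qsub applies all four options; an option repeated on the qsub command line takes precedence over the embedded one.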
The -pe (parallel environment, or PE) qsub
option is used to request more than one processor.
The -pe option takes two arguments: the name of a parallel environment and a number N
specifying the number of processors required by the job. Table 2 lists each of the supported
parallel environments, describing their intended purpose, processor allocation rule, and restrictions
on N.
Several PEs are available for MPI jobs. Those whose purpose is labeled
MPI in Table 2 differ only in how processors are allocated among the available nodes. Unless your application has specific
allocation requirements, we recommend the "mpi" PE for all single-threaded MPI jobs. The
PEs labeled "multi-threaded MPI" are intended for hybrid OpenMP-MPI applications. In all cases,
the second argument N to the -pe qsub option should be the total number of processors required by
the job, while the argument passed to the mpirun -np option should be N / threads_per_task.
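The N / threads_per_task rule can be computed rather than hard-coded. The function below is a hypothetical helper (mpi_task_count is not a site command): it takes N (SGE exports the -pe count as NSLOTS) and K, the OpenMP threads per MPI task, and refuses values where N is not a multiple of K.

```shell
#!/bin/sh
# Hypothetical helper: print the mpirun -np value N / K, where N is the
# total number of processors requested with -pe (SGE exports it as
# NSLOTS) and K is the number of OpenMP threads per MPI task.
mpi_task_count() {
    n=$1
    k=$2
    # Check K first so that K = 0 never reaches the division.
    if [ "$k" -le 0 ] || [ $((n % k)) -ne 0 ]; then
        echo "N=$n is not a multiple of threads_per_task=$k" >&2
        return 1
    fi
    echo $((n / k))
}
```

With -pe mpi_4_procs_per_task 12 and OMP_NUM_THREADS=4, as in the hybrid example script, mpi_task_count "$NSLOTS" "$OMP_NUM_THREADS" prints 3, the value passed to mpirun -np there. For single-threaded MPI, K is 1 and the result is simply N.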
If your application can run on multiple nodes but doesn't use MPI, you will need a specialized PE.
Send mail to help@katana.bu.edu and we'll
create an appropriate PE for you.
Table 2. The parallel-environment options.
| parallel-environment | Purpose | Allocation Rule | Allowed values of N |
| omp | Shared memory (OpenMP, pthreads, etc.) | All N requested processors on a single node | 1 - 4 |
| mpi | MPI | As few nodes as possible | 1 - 16 |
| mpi_1_task_per_node | MPI | Exactly 1 processor per node | 1 - 13 |
| mpi_2_tasks_per_node | MPI | Exactly 2 processors per node | 2, 4, 6, 8, 10, 12, 14, 16 |
| mpi_3_tasks_per_node | MPI | Exactly 3 processors per node | 3, 6, 9, 12, 15 |
| mpi_4_tasks_per_node | MPI | Exactly 4 processors per node | 4, 8, 12, 16 |
| mpi_2_procs_per_task | Multi-threaded MPI | Exactly 2 processors per node, used by a single MPI task | 2, 4, 6, 8, 10, 12, 14, 16 |
| mpi_3_procs_per_task | Multi-threaded MPI | Exactly 3 processors per node, used by a single MPI task | 3, 6, 9, 12, 15 |
| mpi_4_procs_per_task | Multi-threaded MPI | Exactly 4 processors per node, used by a single MPI task | 4, 8, 12, 16 |
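The limits in Table 2 can be checked before submitting, which avoids a rejected or mis-sized job. The sketch below encodes the table as a POSIX sh case statement; pe_allows is a hypothetical helper name, and the numbers are copied from the table rather than queried from the batch system, so they would need updating if the site configuration changes.

```shell
#!/bin/sh
# Hypothetical helper: succeed if N is an allowed processor count for
# the given parallel environment, using the limits listed in Table 2.
pe_allows() {
    pe=$1
    n=$2
    case "$pe" in
        omp)                 min=1; max=4;  step=1 ;;
        mpi)                 min=1; max=16; step=1 ;;
        mpi_1_task_per_node) min=1; max=13; step=1 ;;
        mpi_[234]_tasks_per_node | mpi_[234]_procs_per_task)
            # The digit in the name is the processors-per-node count,
            # and N must be a multiple of it, up to 16.
            step=${pe#mpi_}
            step=${step%%_*}
            min=$step; max=16 ;;
        *) echo "unknown parallel environment: $pe" >&2; return 1 ;;
    esac
    [ "$n" -ge "$min" ] && [ "$n" -le "$max" ] && [ $((n % step)) -eq 0 ]
}
```

For example, pe_allows mpi_4_procs_per_task 12 && qsub -pe mpi_4_procs_per_task 12 myscript submits only a valid combination.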
Types of Applications
Job running procedures vary depending on the type of application.
Instructions for several important types of serial and multiprocessing jobs
are demonstrated below.
- Running a serial program, a.out, on one processor
katana:~ % qsub -b y a.out
No command script is needed with -b y, which tells qsub to expect a binary
executable such as a.out. This job uses the default run time limit of 2 hours.
To run for, say, 24 hours, add the -l h_rt=24:00:00 option:
katana:~ % qsub -l h_rt=24:00:00 -b y a.out
- Running a MATLAB application on one processor.
- First, create a batch script, for example mbatch:
#!/bin/csh
matlab -nodisplay < myexample.m > myoutput
Don't forget to make mbatch executable:
katana:~ % chmod +x mbatch
- Submit the batch job.
katana:~ % qsub mbatch
- Running 4 separate single-processor programs for 6 hours on the
same compute node
katana:~ % qsub -l h_rt=6:00:00 -pe omp 4 myscript
where myscript is:
#!/bin/sh
prog1 &
prog2 &
prog3 &
prog4 &
wait
A typical use of this is running several MATLAB tasks in a single job submission.
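A slightly more defensive version of myscript waits for each background program by its process ID and reports failures, so a program that crashes shows up in the job's error output instead of going unnoticed. run_parallel is a hypothetical helper name; each argument is a complete command line run through sh -c, so redirections such as the MATLAB form shown earlier also work.

```shell
#!/bin/sh
# Hypothetical helper: start each argument as a background command line,
# wait for all of them, and fail if any of them exited with an error.
run_parallel() {
    pids=""
    for cmd in "$@"; do
        sh -c "$cmd" &
        pids="$pids $!"
    done
    failures=0
    for pid in $pids; do
        # "wait PID" returns the exit status of that particular child.
        if ! wait "$pid"; then
            echo "process $pid failed" >&2
            failures=$((failures + 1))
        fi
    done
    if [ "$failures" -gt 0 ]; then
        echo "$failures of $# commands failed" >&2
        return 1
    fi
}
```

In myscript, the four background lines and the final wait could then be replaced by run_parallel ./prog1 ./prog2 ./prog3 ./prog4, still submitted with -pe omp 4 so that each program gets its own processor.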
- Running an OpenMP program, a.out, on 4 processors
The omp parallel environment is used to run OpenMP applications. There
are two ways to set the number of threads an OpenMP
application uses. It
can either be compiled into the executable a.out (by calling
the OpenMP library function omp_set_num_threads in the source code) or be set at run time via the
OMP_NUM_THREADS environment variable.
In the first case the job can be submitted with:
katana:~ % qsub -pe omp 4 -b y a.out
In the second case the environment variable setting can be specified with the
qsub's -v option:
katana:~ % qsub -v OMP_NUM_THREADS=4 -pe omp 4 -b y a.out
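For the second case, a batch script can also derive OMP_NUM_THREADS from NSLOTS, which SGE sets to the N given with -pe, so the thread count and the number of allocated processors cannot drift apart. This is a sketch of one alternative to the -v option, not a required form; the run line for a.out is left commented out so the script can be tried without it.

```shell
#!/bin/sh
#
# Sketch of an OpenMP batch script that takes its thread count from the
# slots SGE granted, so only the "-pe omp N" line needs editing.
#
#$ -pe omp 4
#$ -l h_rt=2:00:00
#
# NSLOTS is set by SGE to N; fall back to 1 thread outside the batch system.
OMP_NUM_THREADS=${NSLOTS:-1}
export OMP_NUM_THREADS
echo "Using $OMP_NUM_THREADS OpenMP thread(s)"
# ./a.out
```

Uncomment the last line and submit the script with qsub; a.out then runs with 4 threads on the 4 processors allocated on one node.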
- Running an MPI program
MPI jobs should be submitted with:
katana:~ % qsub myscript
where myscript is an appropriately customized version of the following batch script:
#!/bin/sh
#
# Example SGE script for running mpi jobs
#
# Submit job with the command: qsub myscript
#
# Note: A line of the form "#$ qsub_option" is interpreted
# by qsub as if "qsub_option" was passed to qsub on
# the commandline.
#
# Set the hard runtime (aka wallclock) limit for this job,
# default is 2 hours. Format: -l h_rt=HH:MM:SS
#
#$ -l h_rt=2:00:00
#
# Merge stderr into the stdout file, to reduce clutter.
#
#$ -j y
#
# Invoke the mpi Parallel Environment for N processors.
# There is no default value for N, it must be specified.
#
#$ -pe mpi 4
#
# end of qsub options
# The system supports multiple implementations of MPI.
# This variable is used by the mpirun command to set up the proper
# runtime environment for the job. The allowed values are "openmpi"
# (the default) and "mpich." The runtime setting should
# match the setting in effect when the program was compiled.
#
export MPI_IMPLEMENTATION=openmpi
# By default, the script is executed in the directory from which
# it was submitted with qsub. You might want to change directories
# before invoking mpirun...
# cd SOMEWHERE
# Invoke mpirun.
# Note: $NSLOTS is set by SGE to the number of processors
# requested by the "-pe mpi N" option.
#
mpirun -np $NSLOTS mpi_program arg1 arg2 ...
In this example, the executable mpi_program must have been
previously compiled with MPI_IMPLEMENTATION set to openmpi (which is the
system default). See the programming section for information about
compiling MPI applications.
You can override the batch resource parameters predefined in myscript, such as the
number of processors and the wallclock limit, by giving them on the qsub command line:
katana:~ % qsub -pe mpi 8 -l h_rt=24:00:00 myscript
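Because the nodes for a job are only known at run time, it can help to record them in the job output. In SGE, $PE_HOSTFILE holds the path of a file with one line per assigned host. The function below (list_assigned_hosts is a hypothetical name) could be called in myscript just before mpirun. It assumes only that the first field of each line is the host name and the second is the number of slots on that host, which is the standard SGE layout; any further fields are ignored.

```shell
#!/bin/sh
# Hypothetical helper: print "host slots" for each host SGE assigned,
# followed by the total slot count. Reads $PE_HOSTFILE unless a file
# path is given as the first argument.
list_assigned_hosts() {
    hostfile=${1:-$PE_HOSTFILE}
    if [ -z "$hostfile" ] || [ ! -r "$hostfile" ]; then
        echo "no readable PE hostfile" >&2
        return 1
    fi
    # Field 1 is the host name, field 2 the slots granted on that host.
    awk '{ printf "%s %s\n", $1, $2; total += $2 }
         END { printf "total %d\n", total }' "$hostfile"
}
```

For a job submitted with -pe mpi 8, the total printed should equal $NSLOTS, and the per-host lines show how the mpi PE spread the processors across nodes.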
- Running a hybrid OpenMP-MPI program
Hybrid OpenMP-MPI jobs should be submitted with:
katana:~ % qsub myscript
where myscript is an appropriately customized version of the following batch script:
#!/bin/sh
#
# Example SGE script for running hybrid OpenMP-MPI jobs
#
# Submit job with the command: qsub myscript
#
# Note: A line of the form "#$ qsub_option" is interpreted
# by qsub as if "qsub_option" was passed to qsub on
# the commandline.
#
# Set the hard runtime (aka wallclock) limit for this job,
# default is 2 hours. Format: -l h_rt=HH:MM:SS
#
#$ -l h_rt=2:00:00
#
# Merge stderr into the stdout file, to reduce clutter.
#
#$ -j y
#
# Invoke the mpi_K_procs_per_task Parallel Environment for N processors.
# Here "K" is 2, 3, or 4, and must be specified.
# There is no default value for N, it must be specified.
#
#$ -pe mpi_4_procs_per_task 12
#
# Specify the number of threads per task, this should match "K" above.
#$ -v OMP_NUM_THREADS=4
#
# end of qsub options
# The system supports multiple implementations of MPI.
# This variable is used by the mpirun command to set up the proper
# runtime environment for the job. The allowed values are "openmpi"
# (the default) and "mpich." The runtime setting should
# match the setting in effect when the program was compiled.
#
export MPI_IMPLEMENTATION=openmpi
# By default, the script is executed in the directory from which
# it was submitted with qsub. You might want to change directories
# before invoking mpirun...
# cd SOMEWHERE
# Invoke mpirun.
# The argument to the -np option should be N / K.
#
mpirun -np 3 openmp-mpi_program arg1 arg2 ...
Batch Job Management Commands