Scientific Computing & Visualization

LSF (Load Sharing Facility) Basics

Description

LSF is the batch system used on the IBM p690 and IBM p655. LSF may be run via the command line or through a graphical user interface (GUI). For details on the command-line version see the lsfbatch man page, and for the GUI version see the xlsbatch man page.
Highlights: Highly configurable, X-Windows interface.

Availability and Setup

LSF is available on the IBM p690 (kite.bu.edu, frisbee.bu.edu, pogo.bu.edu, and domino.bu.edu) and the IBM p655 (twister.bu.edu, scrabble.bu.edu, marbles.bu.edu, crayon.bu.edu, litebrite.bu.edu, hotwheels.bu.edu, jacks.bu.edu, playdoh.bu.edu, and slinky.bu.edu).

The batch system is highly configurable and continues to be tuned. There are limits to what the system will allow us to do in terms of configuration (for example, it is not possible to move a job that has been started on one machine onto another machine in the middle of execution). Currently, the overall goals behind the configuration are:

  1. Never oversubscribe the processors.
  2. Minimize wait times.

Please also read the Usage policies and batch section of the document SCF User Information for a description of the batch system on the IBM p690 and IBM p655. A table listing all of the available batch queues is online at http://scv.bu.edu/SCV/scf-techsumm.html#QUEUES.

Using LSF

Jobs which take less than 10 minutes of CPU time may be run interactively on all of the systems. In most cases, a process reaper will kill jobs which exceed this limit. The exception to this rule is when jobs run for more than 10 minutes, but are utilizing only 25% of a single processor. This exception allows users to keep jobs running interactively or in the background when the processes don't require much CPU time (e.g., emacs, xbiff).

Jobs which require more than 10 minutes of CPU time must be submitted through the batch system using LSF. There are several ways to submit a batch job. One method is to write a short script containing your run command. Make sure that you set the execute bit for this script (see the man page for chmod if you don't know how to do this). A sample script for a single-processor job is shown below:

    #!/bin/csh -f
    progname < infile > outfile
    exit

The progname < infile > outfile line represents the command used to run your code.
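As a minimal sketch of the steps just described, the commands below write the sample script to a file and set its execute bit. They assume a Bourne-compatible shell; the file name myjob.csh is a hypothetical choice, and progname, infile, and outfile remain the placeholders used in the text.

```shell
# Write the single-processor job script to a file.
# "myjob.csh" is a hypothetical file name; progname, infile and outfile
# are placeholders for your own program and data files.
cat > myjob.csh << 'EOF'
#!/bin/csh -f
progname < infile > outfile
exit
EOF

# Set the execute bit for the file's owner, then list the file
# so the permissions can be checked by eye.
chmod u+x myjob.csh
ls -l myjob.csh
```

The quoted 'EOF' delimiter keeps the shell from expanding anything inside the script body, so the file contains exactly the lines shown.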

For a multiprocessing job using OpenMP, the number of processors is specified with the OMP_NUM_THREADS environment variable:

    #!/bin/csh -f
    setenv OMP_NUM_THREADS N
    progname < infile > outfile
    exit
where N is the number of processors required.
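
For illustration, the following sketch fills in a concrete value for N when generating the OpenMP script. It assumes a Bourne-compatible shell; NTHREADS=4 and the file name omp_job.csh are hypothetical choices, not site defaults, and the value should match the processor count of the queue you intend to use.

```shell
# NTHREADS and omp_job.csh are illustrative names, not site defaults.
NTHREADS=4

# The unquoted EOF delimiter lets the shell substitute $NTHREADS
# into the generated csh script.
cat > omp_job.csh << EOF
#!/bin/csh -f
setenv OMP_NUM_THREADS $NTHREADS
progname < infile > outfile
exit
EOF

chmod u+x omp_job.csh   # set the execute bit
cat omp_job.csh         # show the generated script
```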

For a multiprocessing job using MPI, the number of processors is specified with the -procs command-line option to poe:

    #!/bin/csh -f
    poe progname < infile > outfile -procs N
    exit
To run these scripts under LSF, use the bsub command:
    bsub -q queuename scriptname
It is important that you submit your job to the right queue. Each queue is intended to be used by jobs of a specific size (number of processors) and duration (wall clock limit). See the queue summary for a description of the queues. Alternatively, you can use the bqueues command:
    bqueues [-l] [queuename]
to get queue descriptions as well as current utilization.
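
The submission step can be wrapped in a small helper that catches a common mistake, a script without the execute bit, before calling bsub. This is only a sketch under stated assumptions: submit_job is a hypothetical function, not part of LSF, and it is written for a Bourne-compatible shell. On a machine without LSF it prints the command it would have run instead of submitting anything.

```shell
# submit_job is a hypothetical helper, not an LSF command.
# Usage: submit_job queuename scriptname
submit_job() {
    queue=$1
    script=$2

    # Refuse to submit a script that is missing or not executable.
    if [ ! -x "$script" ]; then
        echo "error: $script is missing or not executable" >&2
        return 1
    fi

    if command -v bsub >/dev/null 2>&1; then
        # LSF is installed: submit the script to the requested queue.
        bsub -q "$queue" "$script"
    else
        # No LSF here: show what would have been run.
        echo "would run: bsub -q $queue $script"
    fi
}
```

A call such as `submit_job queuename ./scriptname` then behaves like the bsub command above, with the extra check in front.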

Users on multiple projects can control which project their job is accounted to by using the -P flag to bsub:

    bsub -P project_name -q queuename scriptname
The bjobs command will show you the status of all of your pending and running jobs. To show the status of all of your jobs in a particular queue, run:
    bjobs -q queuename
To show the status of all jobs (including those of users other than yourself), run:
    bjobs -u all 

After your job has finished, LSF will send you email telling you whether or not the job has completed successfully and report the exit code if it failed. See the SCF FAQ for information about the meaning of the exit codes. The message also contains a summary of system resources used by the job.
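
Before submitting, it can help to confirm locally that your run command returns a meaningful exit status. The sketch below assumes that the exit code LSF reports is the exit status of the submitted script; run_and_report is a hypothetical helper written for a Bourne-compatible shell, not an LSF tool.

```shell
# run_and_report is a hypothetical helper, not part of LSF.
# It runs the given command, prints a one-line summary of the result,
# and returns the command's own exit status.
run_and_report() {
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "completed successfully"
    else
        echo "failed with exit code $status"
    fi
    return "$status"
}
```

For example, `run_and_report ./scriptname` runs the script once interactively (subject to the 10-minute limit described above) and reports its status.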

The Motif tool xlsmon is available for detailed monitoring of loads and jobs on the IBM p690 and IBM p655.

Additional Help/Documentation

LSF is produced by the Platform Computing Corporation and additional materials on it are available at their WWW site.

If you have questions about using the batch system, please send them to help@twister.bu.edu or, if you think they would be of general interest to the SCF community, to the scfug-l@bu.edu mailing list/newsgroup.


Document Name: lsf
Author/Maintainer: Aaron D. Fuegi (aarondf@bu.edu)
Executable: /usr/local/bin/bsub, /usr/local/bin/xlsbatch, /usr/local/bin/xlsmon
Keywords: load, sharing, batch
Machines List: IBM p690, IBM p655
Related Man Pages: lsf, lsfbatch, xlsbatch, xlsmon
Created April 4, 1996; Last Revised August 29, 2004; Last Modified 11:25 23-Jun-06
URL of this document: http://scv.bu.edu/documentation/software-help/batchsystem/lsf.html
Boston University
OIT | CCS | September 21, 2007  