SCF User Information
Overview
The Boston University Scientific Computing Facilities (SCF) consist of a collection of high-performance computers, high-speed networks, and advanced visualization facilities. These facilities are managed by the Scientific Computing and Visualization (SCV) group of the Office of Information Technology (OIT) in collaboration with the Center for Computational Science (CCS).
The SCF currently includes an IBM BlueGene system, an IBM pSeries 655, an IBM BladeCenter Katana Cluster, an Intel Pentium III Linux Cluster (the Linux Cluster), and our virtual reality/scientific visualization facilities. Your SCF login name and password automatically allow access to most of these facilities. SCV staff are available for consulting on the use of all these facilities.
General conditions of use
Your use of these machines is governed by Boston University's Conditions of Use and Policy on Computing Ethics. This document is available on the Web at http://www.bu.edu/policies/computing/ethics.html and from the Office of Information Technology, 111 Cummington Street, Boston, MA 02215. Please read this document carefully now. By using your account on the SCF and other University computers, you are agreeing to the terms and conditions set forth therein, as well as the usage policies described below.
News and announcements
We will periodically make important announcements regarding usage policies, software and hardware upgrades, downtime, etc. It is important that you read these messages on a regular basis, and we provide several ways for you to do so.
All messages are posted to the system message board and to the B.U. mailing list scfug-l. The system message board can be viewed using the program msgs. By default this command will be included in your .login startup file. If you modify this file, we suggest that you continue to include this command.
The users group mailing list, scfug-l, is used for general discussions regarding the Scientific Computing Facilities, as well as for distributing important announcements. You may have postings to scfug-l delivered directly to you by Email by sending a one-line message of the form below to majordomo@bu.edu. In the message, please specify your preferred Email address.
subscribe scfug-l your_email_address
Messages sent to the scfug-l mailing list are cross posted to the newsgroup bu.mail.scfug-l and to the CoCo bulletin board on our Web site at http://scv.bu.edu/CoCo/SCFUG/. You may read these using your preferred Web browser or news reader.
We will also regularly send individual messages to you regarding your account status and usage. These will be sent via Email to your SCF account. If you do not regularly read Email on this system, you should have your mail forwarded to a machine where you do. This may be done by creating a .forward file in your home directory containing the Email address to which your mail should be redirected.
Getting information and help
We rely heavily on the World-Wide Web to provide information about our facilities. Our home page is at URL http://scv.bu.edu/. Other SCV Web documents can be found by following links from our home page. Of particular interest will be the following documents:
If you are experiencing system problems, please send Email to "help" on the system on which you are experiencing the problem. If that is not possible, please send Email to help@scv.bu.edu.
For more information or help in using or porting applications to the IBM systems, contact Doug Sondak (sondak@bu.edu) or Kadin Tseng (kadin@bu.edu).
If you have questions regarding your computer account or resource allocations, please send Email to scfacct@bu.edu.
Allocations and Accounting
We account for all usage (batch and interactive) by all of our users on our large systems. It is the responsibility of the Principal Investigator to monitor his/her project's usage and to request an appropriate allocation of processor time, expressed in "service units" (SUs). Information on accounts and allocations, as well as forms to request resources, may be found on the Account Management Web pages at http://scv.bu.edu/accounts/.
Allocations
All projects on the system have an annual allocation (or a shorter one if the project lasts less than one year), measured in SUs. The default allocation is currently 590 SUs.
On October 1, 2003 we renormalized our SUs so that 1 SU corresponded to 1 processor hour on the now retired IBM p690. On the IBM p655 1.1 GHz machines, you will be charged 0.85 SUs per processor hour. Processor hours on the Katana Cluster 2.6 GHz AMD Opteron 2218HE blades are also charged at the rate of 1.0 SUs per hour, while the slightly slower 2.4 GHz AMD Opteron 2216HE blades are charged 0.9 SUs per processor hour. On the faster 3.0 GHz Intel Xeon E5450 blades in the Katana Cluster, the charge rate is 1.5 SUs per processor hour. Each processor hour used on the older Linux Cluster is charged 0.3 SUs. Given the processor speeds, the two Linux clusters are the most efficient use of your SUs as long as they meet your needs. Each processor hour (counted by wall clock time) used on the IBM BlueGene is charged 0.25 SUs. This system will generally only make sense to use if your code scales well to at least 256 processors.
Principal investigators can request a larger allocation during their annual project renewal or at other times by submitting a Request for Additional Processor Allocation, which can be found by following the pointers on the project management Web pages. Large allocation requests are reviewed by the SCF Allocation Committee and generally require two to four weeks for a decision.
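As a rough illustration of how the charge rates above combine, the following sketch estimates the SUs a job would consume. The rates are the ones quoted in this section; the dictionary keys and the function name are hypothetical labels chosen for this example and are not names used by the SCF accounting system.

```python
# Charge rates in SUs per processor hour, as quoted in the text above.
# The keys are hypothetical labels invented for this example.
SU_RATES = {
    "p655": 0.85,
    "katana_opteron_2218he": 1.0,
    "katana_opteron_2216he": 0.9,
    "katana_xeon_e5450": 1.5,
    "linux_cluster": 0.3,
    "bluegene": 0.25,  # counted by wall clock time
}


def su_charge(system: str, processors: int, hours: float) -> float:
    """Estimate the SUs charged for `processors` processors used for `hours` hours."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    if hours < 0:
        raise ValueError("hours must not be negative")
    return SU_RATES[system] * processors * hours


if __name__ == "__main__":
    # Example: a 256-processor Blue Gene job running for 5 wall clock hours.
    print(su_charge("bluegene", 256, 5))
    # Example: how many such jobs fit in the default 590 SU allocation.
    print(590 // su_charge("bluegene", 256, 5))
```

This is only an estimate for planning purposes; the monthly reports and the acctool utility remain the authoritative record of usage.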
Reporting
Each month we send principal investigators a summary of usage and remaining allocations for their projects; this report also gives /project file systems allocation and usage information for those projects with allocations on any of the /project file systems. Individual researchers are sent a summary of their own usage for all the projects with which they are associated. Individuals may also review the details of their recent usage, /project file systems disk usage, and monthly summary information using the password protected Web pages which may be found under the project management Web pages.
In addition to Emailed reports and individual usage Web pages, we have developed a utility called "acctool" to help you keep track of your CPU usage. Type "acctool -help" on any of the machines to get more information.
Project accounting
All usage accounting is based on projects. For most researchers, those who only belong to one project, the fact that the accounting is project-based will be inconsequential. However, researchers who are associated with multiple projects must pay special attention to assure that their usage is properly attributed to the correct project. The procedures for doing this are described below.
Each account on the system has been assigned a default project. This is the project which will be charged if no further action is taken. Projects are implemented as UNIX groups on all systems. The command "groups" shows all of your projects; the first one listed is your default project. To change your default project, please visit http://scv.bu.edu/accounts/, follow the link which begins "Individuals can see", and then complete and submit the appropriate Web form. Your default project will then be changed the next time the system configuration files are updated, generally overnight.
To change your current project immediately but temporarily, you may use the "newgrp" command on any of our systems. This command will start a new Unix shell associated with the project you specify. All interactive commands issued from this shell will be accounted to the new project. Batch jobs will be accounted to your default project unless you use a batch-system-specific method to override this behavior. On the pSeries, use the -P project_name command line argument to bsub; on the Linux Cluster, the option to qsub is -W group_list=project_name. See the documentation for the various batch systems for details.
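For researchers who script their submissions, the two batch overrides described above can be assembled as follows. This is a minimal sketch: it only builds the argument list and does not run anything, and the function name, the project name "myproject", and the job names are hypothetical placeholders. Only the -P option to bsub and the -W group_list= option to qsub come from the text.

```python
from typing import List


def submit_command(batch_system: str, project: str, job: str) -> List[str]:
    """Build (but do not run) a submission command charged to `project`.

    batch_system: "lsf" for the pSeries (bsub) or "pbs" for the Linux Cluster (qsub).
    project:      the project (UNIX group) to charge; hypothetical in this example.
    job:          the program or job script to submit.
    """
    if batch_system == "lsf":
        return ["bsub", "-P", project, job]
    if batch_system == "pbs":
        return ["qsub", "-W", f"group_list={project}", job]
    raise ValueError(f"unknown batch system: {batch_system!r}")


if __name__ == "__main__":
    print(" ".join(submit_command("lsf", "myproject", "progname")))
    print(" ".join(submit_command("pbs", "myproject", "myjob.sh")))
```

The resulting list could be passed to subprocess.run on a login machine; check the batch system documentation before relying on it, since other options (such as a queue name on the pSeries) are normally needed as well.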
Configuration
The Boston University Blue Gene is a single rack system, containing 1024 compute nodes. Each compute node contains a dual core 32-bit 700 MHz PowerPC 440 processor with 512 MB of main memory. Our Blue Gene has a peak performance of 5.7 Teraflops. The login machines are levi.bu.edu and lee.bu.edu.
The IBM pSeries 655 is a 72-processor system composed of nine nodes, named Twister, Scrabble, Marbles, Crayon, Litebrite, Hotwheels, Jacks, Playdoh, and Slinky. Users with accounts on the IBM pSeries systems must use ssh to log in to twister.bu.edu. Passwords are shared across the Scientific Computing Facilities, so if you already have an account and password on our other systems, you will have the same login and password on this system.
In December 2007 we made the Katana Cluster available. The Katana Cluster is made up of machines of a number of different configurations; please consult the Cluster web page for details. The Katana Cluster runs the BULinux 5.0 operating system. The login machine via ssh is katana.bu.edu.
The Intel Pentium III Linux Cluster consists of 30 Intel Pentium III compute nodes, and each node has 2 shared-memory processors for a total of 60 processors. The machine to log in to via ssh is cootie.bu.edu.
Please see our Technical Summary Web page for more information on the configurations of all the machines in the SCF.
File Systems
You have one home directory on the IBM p655 systems and a separate shared one for the Katana Cluster, Linux Cluster, and Blue Gene. All home directories are backed up nightly.
If you accidentally remove a file, you should request that it be restored by sending email to help@scv.bu.edu. Please specify exactly what files you deleted, what machine and file system those files were on, and at what time you deleted them.
The /scratch file systems are available for people who need a large amount of storage for a short period of time. Files are automatically purged after 10 days. If there is a critical shortage of scratch space, it may be necessary to purge files which are less than 10 days old. Files which have been "touched" but not modified will be treated as old and removed immediately. The scratch partitions are NOT BACKED UP.
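To see which of your scratch files are approaching the 10-day limit, a short script along the following lines may help. It is a sketch under a stated assumption: it judges age by each file's modification time, whereas the exact criteria used by the purge process are not described here. The function name and the example path are hypothetical.

```python
import os
import time
from pathlib import Path

PURGE_AGE_DAYS = 10  # purge age stated in the text above


def files_older_than(root, days=PURGE_AGE_DAYS, now=None):
    """Return a sorted list of files under `root` not modified in the last `days` days.

    Assumption: age is measured by modification time (st_mtime); the real
    purge process may use different criteria.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    old = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if path.lstat().st_mtime < cutoff:
                    old.append(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return sorted(old)


if __name__ == "__main__":
    # Hypothetical location; substitute your own directory under /scratch.
    for path in files_older_than("/scratch"):
        print(path)
```

Passing a smaller value for days (for example 7) lists files that will become eligible for purging within the next few days, which leaves time to copy them elsewhere.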
If you are creating files in /scratch using the tar utility, please see our Frequently Asked Questions Web page for additional information.
Each machine has its own /scratch partition, but these partitions can be accessed from all the machines in the same cluster by using the full path. On the pSeries machines, those paths are of the form: /hostname/scratch; for example /frisbee/scratch. On the Linux Cluster, the path is of the form /net/nodexxx/scratch, where xxx is the node number. Also, note on the Linux Cluster that Skate and Cootie are exceptions and their /scratch partitions can NOT be accessed from other machines in the cluster.
Each machine also has its own /tmp and /var/tmp file systems. These file systems are used to store temporary files created by system programs such as compilers and editors. Users should never store files in these directories. They are not backed up. Old files are removed nightly and whenever the sysadmins feel that it is necessary.
Principal investigators may also request more permanent disk space on the /project and/or /projectnb file systems for their projects. This disk space is similar to /scratch, but is allocated specifically to individual projects and is not automatically purged. The difference between the two file systems is that /project is backed up nightly while /projectnb is not backed up at all. More information can be found on our Project Disk Space (http://scv.bu.edu/computation/storage/proj-diskspace.html) Web page.
We also encourage users with large amounts of data that does not need to be online at all times to use our mass storage facility to archive their data. This system has a very large amount of space available.
Usage policies and batch
Certain machines in each cluster have been designated for a particular set of functions. The following machines are available for interactive work: twister.bu.edu on the IBM pSeries systems, katana.bu.edu on the Katana Cluster, skate.bu.edu and cootie.bu.edu on the Linux Cluster, and levi.bu.edu and lee.bu.edu on the IBM Blue Gene. General interactive login sessions are allowed only on these machines.
IBM pSeries Batch System:
The batch system on the IBM pSeries machines is the Load Sharing Facility (LSF) software. There are a number of different batch queues for various types of jobs. These include "short" and "long" queues for jobs of different running times and separate queues for 1, 4, and 8 processor jobs. In general, the short queues will run with a higher priority than the long queues. It is very important that you submit your job to the appropriate batch queue and you should always specify a queue name when submitting a job, e.g., bsub -q p4-short progname.
Please look at our Technical Summary (http://scv.bu.edu/computation/tech-summary.html) Web page for more information about the queue structure.
All long running jobs must be submitted through the batch system. A system process monitors the processor consumption of all running jobs and will automatically terminate any job which is not running under the batch system and uses more than 10 minutes of processor time.
There is a very nice X-Windows interface to the batch system, using the command "xlsbatch". You may also use traditional Unix-style commands. To submit a job, use the command "bsub". To see the jobs that are queued, use "bjobs". To see the queue parameters, use the command "bqueues". You may remove or kill a job with the command "bkill". For more information on the batch system, see LSF Basics.
Katana Cluster Batch System:
The batch system on the Katana Cluster is the Sun Grid Engine.
Jobs on the Katana Cluster are limited to a maximum of 16 processors and a wall clock time limit of 24 hours. Note that the default limit is 2 hours; to run for longer, you must request a longer limit when submitting the job.
Linux Cluster Batch System:
The batch system on the Linux Cluster is the open source version of the Portable Batch System (PBS). Unlike LSF, this system uses no queues and is instead based on resource requests. The command to submit batch jobs to PBS is qsub.
Jobs on the Linux Cluster are limited to a maximum of 16 nodes and a wall clock time limit of 24 hours.
Blue Gene Batch System:
The batch system used on the Blue Gene is IBM's LoadLeveler. The current limitation is that all jobs must use a partition of exactly 32, 128, 512 or 1024 (the entire machine) nodes and no job may run for more than 5 hours of wall-clock time. 1024-node jobs are only allowed to run in off-hours.
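Because only four partition sizes are permitted, a job that needs some other number of nodes has to round up to the next allowed size. The sketch below shows that rounding together with the 5-hour limit check. The partition sizes and time limit are taken from the paragraph above; the function names are hypothetical, and nothing here interacts with LoadLeveler itself.

```python
# Allowed partition sizes (in nodes) and wall clock limit, from the text above.
BLUEGENE_PARTITIONS = (32, 128, 512, 1024)
MAX_WALLCLOCK_HOURS = 5.0


def smallest_partition(nodes_needed: int) -> int:
    """Return the smallest allowed partition with at least `nodes_needed` nodes."""
    if nodes_needed < 1:
        raise ValueError("nodes_needed must be at least 1")
    for size in BLUEGENE_PARTITIONS:
        if nodes_needed <= size:
            return size
    raise ValueError("the machine has only 1024 compute nodes")


def check_job(nodes_needed: int, hours: float) -> int:
    """Validate a planned job and return the partition size it would occupy."""
    if hours > MAX_WALLCLOCK_HOURS:
        raise ValueError("jobs may not exceed 5 hours of wall clock time")
    return smallest_partition(nodes_needed)


if __name__ == "__main__":
    # A job needing 100 nodes must occupy a 128-node partition.
    print(check_job(100, 4.5))
```

Keep in mind that a job is placed on the whole partition, so rounding up from, say, 130 nodes to 512 leaves most of the partition idle.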
Software
Information regarding the software available on the systems can be found on our "Software Packages" Web page (http://scv.bu.edu/documentation/software-help/).
Note that for some software packages, you may need to add specific directories to your execution path or correctly set particular environment variables in order to use them. This is usually done by modifying your .login or .cshrc file. Please refer to the documentation on the Web page referenced above for the specific details.