Project Disk Space
We are happy to make available additional disk storage space on the Blue Gene, IBM pSeries systems, Katana Cluster, and Linux Cluster computer systems. This disk space is intended to be used primarily by projects which need fast access to large data sets.
Usage
The /project and /project2 file systems are specifically configured to provide high speed access for large data sets. In some ways, this disk space is similar to the /scratch file systems which are mounted on each of the individual machines. However, unlike /scratch, space on these file systems is allocated explicitly to individual research projects. These file systems are backed up. If you need files restored, please send mail to help@scv.bu.edu with all pertinent details (the full pathname of the files and the time and date they were deleted).
Additionally, we have introduced the /projectnb1 and /projectnb2 file systems, which are similar to the file systems above except that they are not backed up. Because of backup constraints, we expect to be able to add space that is not backed up much more readily than space that is backed up. Applications for large quantities of space that is not backed up are therefore much more likely to be approved, and we highly encourage this option if it works for your data. The /projectnb# partitions are ideal for:
1) Data which exists elsewhere and is copied to the SCF disk system for high performance access during computation.
2) Data which can be easily regenerated.
3) Data which is needed for only a short time.
4) Newly generated data which will be copied to another system for storage.
Configuration
The disk space resides on four separate partitions: /project, /project2, /projectnb1 and /projectnb2. As of June 2003, there is no longer any access speed advantage to using one partition over another on any system.
The hardware which is currently being used to implement the /project file systems is a fiber-channel RAID disk array.
Allocation
Each project that wishes to use any of the various /project file systems must apply for a disk allocation. We expect typical allocations to be at least 100MB. Smaller allocations would normally be more appropriate for your home directory. The application form, which must be submitted by the Principal Investigator or Administrative Contact for the project, may be found with the rest of the account management pages at http://scv.bu.edu/accounts/ on our Web site. Projects needing 5GB or less should probably apply for space on either /project or /project2. Projects needing more space than this should seriously consider whether some or all of it does not need to be backed up and, if so, request that portion of their allocation on /projectnb1 or /projectnb2.
Unlike other file systems, this space is allocated to projects, not to individual users. However, the principal investigator may specify a limit on the amount of disk space to be used by the individual researchers on the project. For example, if a project with five members has an allocation of 500MB, the principal investigator may choose to limit each member to 100MB. By default, any individual account associated with the project may use the full project allocation.
When an allocation is granted, a subdirectory will be created for the project under the appropriate /project, /project2, /projectnb1, or /projectnb2 directory. This subdirectory will have the same name as the project and will be writable by any member of the project. The structure of, and access to, the files and subdirectories created under the project's directory are entirely at the discretion of the project members. The Unix "group" file permission mechanism can be used to control access for the project (see the man page for "chmod" for more details).
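As an illustration of the group permission mechanism mentioned above, the following is a minimal sketch in C, not an official tool. The function name share_dir_with_group is hypothetical. It assumes the directory is already owned by the project's Unix group, and it relies on the common (Linux and most Unix systems) behavior that the setgid bit on a directory makes new files inherit the directory's group.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: give the owning group full access to `dir`,
 * remove all access for users outside the group, and set the setgid bit
 * so that files created inside inherit the directory's group.
 * Returns 0 on success, -1 on error with errno set. */
int share_dir_with_group(const char *dir)
{
    struct stat st;

    if (stat(dir, &st) != 0)
        return -1;                      /* errno set by stat */
    if (!S_ISDIR(st.st_mode)) {
        errno = ENOTDIR;
        return -1;
    }

    mode_t mode = st.st_mode & 07777;   /* keep only the permission bits */
    mode |= S_IRWXG | S_ISGID;          /* group rwx plus setgid */
    mode &= ~(mode_t)S_IRWXO;           /* no access for "other" */

    return chmod(dir, mode);
}
```

Calling share_dir_with_group on a subdirectory of the project directory has roughly the effect of the shell command "chmod g+rwxs,o-rwx" on that directory. Whether to remove access for other users is a choice for the project members.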
Allocation enforcement
As mentioned above, this disk space is allocated to projects rather than individuals. Unfortunately, the operating system provides no mechanism to enforce group or project disk allocations, so we rely heavily on the "honor system" to keep usage within allocation quotas. This is particularly true since the change in the system in June 2003, because the Unix quota system is not effective on the new file systems. We therefore run our own system to monitor group quotas (based on the project's allocation) and individual quotas (set by Principal Investigators). Several times a day, the system takes a snapshot of disk usage and sends email notification to the PI and individual researchers of projects that are over quota. The individuals involved in the project must then reduce their usage as soon as possible or access will be disabled.
Principal Investigators should note that it is your responsibility to ensure that your research associates remain under quota. You will receive email notification when your project or individuals on the project are over their quota.
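The comparison that such a snapshot has to make can be sketched as follows. This is only an illustration of the rules stated above, not the actual monitoring system. The struct, the function name, and the convention that a limit of 0 means "no individual limit set" are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical record of one member's usage at snapshot time (megabytes). */
struct member_usage {
    const char   *user;
    unsigned long used_mb;
    unsigned long limit_mb;  /* 0 = no limit set by the PI (assumed convention) */
};

/* Returns 1 if the project's total usage exceeds its allocation, else 0.
 * Stores in *members_over the number of members above their individual
 * limit; a member with no limit may use the full project allocation. */
int project_over_quota(const struct member_usage *members, size_t count,
                       unsigned long allocation_mb, size_t *members_over)
{
    unsigned long total = 0;
    size_t over = 0;

    for (size_t i = 0; i < count; i++) {
        unsigned long limit = members[i].limit_mb != 0
                                  ? members[i].limit_mb
                                  : allocation_mb;
        if (members[i].used_mb > limit)
            over++;
        total += members[i].used_mb;
    }

    if (members_over != NULL)
        *members_over = over;
    return total > allocation_mb;
}
```

For the five-member, 500MB project used as an example earlier, a member using 120MB against a 100MB limit is counted as over quota even if the project total is still under 500MB.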
We sincerely seek your cooperation in keeping within your disk allocation. Since the operating system does not provide us with any gentle enforcement mechanisms, we are forced to resort to rather harsh measures to prevent any individual from unfairly interfering with the use of the system by others. If a project remains over its allocation for more than one week, the project's allocation will automatically be revoked by the system and the project will lose all access to its files on the appropriate /project partition. At that point, and only by special arrangement, you will have two weeks to copy your files to tape or archival storage. After those two weeks, your files will be permanently deleted.
Programming considerations
The /project file systems have been configured to provide fast access to large files. The best performance is obtained by reading or writing large chunks of data at once. A minimum suggested size is 128 KB. In C one should use the Unix read and write system calls with large buffers, avoiding the fread/fwrite family of routines which do an additional layer of buffering. In Fortran one should read and write large arrays to get the best performance. In all cases, using unformatted reads and writes gives the best performance, by as much as a factor of 50.
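The advice for C can be sketched as follows. This is a minimal example under stated assumptions, not a tuned implementation. The 1 MB chunk size is an assumed value chosen to be well above the 128 KB minimum, and the function names write_all and copy_in_large_chunks are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumed transfer size: 1 MB, well above the suggested 128 KB minimum. */
#define CHUNK_SIZE ((size_t)1 << 20)

/* Write exactly len bytes to fd, retrying after partial writes and EINTR.
 * Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy src to dst with one large read and write per chunk, using the
 * read/write system calls directly instead of fread/fwrite.
 * Returns the number of bytes copied, or -1 on error. */
long long copy_in_large_chunks(const char *src, const char *dst)
{
    long long total = -1;
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0640);
    char *buf = malloc(CHUNK_SIZE);

    if (out >= 0 && buf != NULL) {
        total = 0;
        for (;;) {
            ssize_t n = read(in, buf, CHUNK_SIZE);
            if (n == 0)
                break;                  /* end of file */
            if (n < 0) {
                if (errno == EINTR)
                    continue;
                total = -1;
                break;
            }
            if (write_all(out, buf, (size_t)n) != 0) {
                total = -1;
                break;
            }
            total += n;
        }
    }

    close(in);
    if (out >= 0 && close(out) != 0)
        total = -1;
    free(buf);
    return total;
}
```

The same pattern applies when a program produces its own data: fill a large buffer in memory, then pass the whole buffer to write_all in one call rather than issuing many small writes.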