Bugaboo

Bugaboo is intended for jobs that require access to large storage systems (terabytes of data, for example).

More generally, Bugaboo is a general purpose system appropriate for serial and parallel jobs, as well as jobs that require long walltimes. A few nodes provide 48000 MB of memory; most nodes have either 16000 MB or 24000 MB.

To log in to Bugaboo, connect to the head node (login server) bugaboo.westgrid.ca using an ssh (secure shell) client.

The following gives a brief overview of the rules for batch jobs on the Bugaboo facility:

  • The default walltime is 24 hours.
  • The maximum walltime is 122 days (4 months).
  • A user's running and submitted jobs together may request at most 120000 processor-hours (5000 processor-days), where processor-hours are the product of requested processors and walltime; for example, a job requesting 100 processors for 48 hours counts as 4800 processor-hours. Any additional jobs that exceed this limit will be queued under blocked jobs.
  • The maximum number of jobs that a user may have queued to run is 500. In particular, this means that the maximum size of an array job is 500 as well.
  • A maximum of 10 jobs per user are considered for scheduling at any time. These jobs are listed under eligible jobs in the output of the showq command. There is no limit on the number of running jobs per user other than the processor-hours limit mentioned above. All other jobs are listed under blocked jobs; these jobs move to the eligible queue when the number of eligible jobs drops below 10.

Two nodes have been set aside for interactive work:

  • debugging of programs
  • testing of programs
  • short runs of programs
  • steered computations
  • and more

In short, these nodes are for all types of work that require interaction with programs and for which waiting in the batch queue is too cumbersome. To run on the interactive nodes, a job script must be submitted with the -I argument to qsub, e.g.

bach@bugaboo:~> qsub -I jobscript.pbs
qsub: waiting for job 6481270.b0 to start
qsub: job 6481270.b0 ready
bach@b402:~>

The system will read the job requirements (number of processors, walltime, memory, etc.) from the PBS section of the submission script, just as it does for batch jobs. However, it will not process any commands from the body of the submission script. Instead, it waits until the resources become available on the interactive nodes (this can take up to 3 minutes) and then opens a normal shell on one of the interactive nodes. At this point any interactive command can be entered and executed, including parallel programs run with "mpiexec myMPIprog"; the system will use as many processors as were requested in the job submission script.

Currently interactive jobs are limited to a maximum walltime of 2 hours; the default walltime is 10 minutes. The two interactive nodes are the same as the 12-core nodes of the Bugaboo cluster with 24GB of memory.
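
As an illustration, a minimal jobscript.pbs for such an interactive session might contain only a PBS header along these lines (the resource values are examples, not requirements):

#!/bin/bash
#PBS -l walltime=01:00:00
#PBS -l procs=4
#PBS -l pmem=2000mb
# The body of the script is ignored for interactive (-I) jobs;
# a shell is opened on one of the interactive nodes instead.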

All other programs must be submitted to the queuing system using the qsub command (without the -I argument) and a proper submission script.

In addition to the login server, there is another server (file server), bugaboo-fs.westgrid.ca, which should be used for data transfers to or from the Bugaboo cluster. Bugaboo-fs has a better network connection (10GigE interface) to the internet, and using it also takes load off the login server. The gcp command can be used to efficiently transfer files between Bugaboo and other WestGrid sites.
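
For example, a file can be copied to Bugaboo with a standard tool such as scp (the user name and file name below are placeholders):

scp results.tar.gz your_username@bugaboo-fs.westgrid.ca:scratch/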

Specifying Memory Requirements

Similar to requesting a specific number of processors for a job using the -l procs=N syntax, each job also must specify how much memory it requires to run. This can be done in two ways, either by specifying the total amount of memory used by the job (summed over all processes) or by specifying the amount of memory per process. The syntax for the latter method (memory per process) is:

#PBS -l pmem=2000mb

which would request 2000 MB per process, e.g., 20000 MB in total for a 10-processor job. Alternatively, the total amount of memory can be specified with:

#PBS -l mem=8000mb

which would request 8000 MB in total for the job, e.g., 800 MB per process for a 10-processor job. It is possible to use the gb unit to specify memory amounts in gigabytes (GB), but this is strongly discouraged: always specify, e.g., 2000mb instead of 2gb. Since 2gb is interpreted as 2048 MB, this small difference has a large effect on how efficiently jobs pack onto the nodes and therefore on waiting times. If neither pmem nor mem is specified, the system assigns 256 MB per process (corresponding to a -l pmem=256mb specification). Jobs may get terminated if they use significantly more memory than specified! The system will send an email to the owner of a job when the job uses more memory than assigned. For that reason it is important that a valid email address is specified using the

#PBS -M email@address.ca

syntax in the job submission script.
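
Putting these pieces together, the memory-related part of a submission script for a 10-processor job might look as follows (the email address is a placeholder):

#PBS -l procs=10
#PBS -l pmem=2000mb
#PBS -M your_name@example.ca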

The current memory usage of a running job can be determined using the qstat -f <jobid> command (substitute the actual job ID for <jobid>); the memory usage is shown in the output of the command under resources_used.vmem. This is the total amount used by the job, i.e., it must be divided by the number of processors to obtain the per-process usage.
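
For example (the job ID is illustrative):

qstat -f 6481270.b0 | grep resources_used.vmem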

Bugaboo uses Lustre, a high-performance cluster file system, to provide storage for /home (containing users' home directories; /home is backed up daily) and /global/scratch (space for files that do not require backup, e.g., files that can be regenerated, temporary storage, etc.; typically files associated with running jobs). Note that, for convenience, there is a shortcut ~/scratch (and, for historical reasons, ~/data as well) in the home directory that points to the /global/scratch file space. That is, files written to the ~/scratch directory tree are actually written to the /global/scratch filesystem and consequently are not backed up (the same applies to files written to the ~/data directory tree).

There is additional currently unallocated space available (to a total of 2.4 PB) that can be assigned to either /home or /global/scratch, depending on future usage requirements.

Both /home and /global/scratch are global filesystems, i.e., they are accessible from the head node and all compute nodes. There is also a local filesystem /scratch on each compute node. For each job running on a particular compute node the system creates a unique directory, $TMPDIR, in the /scratch filesystem. This can be used by the job for temporary files that it generates. The system removes $TMPDIR and all its contents after the job completes.
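
A job script fragment along the following lines illustrates the typical pattern of staging data through $TMPDIR (the program and file names are placeholders):

cd $PBS_O_WORKDIR
cp input.dat $TMPDIR            # copy input to node-local scratch
cd $TMPDIR
myprog input.dat > output.dat   # run using the fast local disk
cp output.dat $PBS_O_WORKDIR    # copy results back before the job ends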

On the 8-core nodes (E5430 processors) there are also two 146GB SATA 10000 RPM drives per node for local storage. These are in a RAID 0 (striped) configuration for extra performance. About 248 GB is available to users as /scratch on each compute node. The 12-core nodes (X5650 processors) have two 300GB drives instead with about 450GB of disk space available to users in /scratch.

Personal usage of each of the file systems can be checked with the lfs quota command, e.g.

lfs quota -u $USER /global/scratch

Disk quotas for user inewton (uid 987654):
     Filesystem     kbytes       quota       limit  grace   files    quota    limit  grace
/global/scratch  946289640  1073741824  1181116006      -  524566  1000000  1100000      -

The number under kbytes shows your current usage of the file system in kB. The quota column shows your quota, i.e., the maximum amount of data in kB that you are allowed to store in this file system. The limit column specifies how much you can temporarily exceed your quota; in that case the grace column tells you how much time you have to reduce your usage to get under your quota again. Under files the number of files you currently store in the file system is displayed; right beside that number the quota for the number of files is shown. The second limit and grace fields are interpreted analogously to those described before, except that they refer to the number of files.

 

Storage Information on Bugaboo

Directory path: /home
Size: 115 TB
Quota: 300 GB, 1M files
Command to check quota: lfs quota -u your_username /home
Purpose: For files that require backup, e.g., code that you have written yourself.
Backup policy: Daily backup

Directory path: /global/scratch
Size: 570 TB
Quota: 1 TB, 1M files
Command to check quota: lfs quota -u your_username /global/scratch
Purpose: For files that do not require backup, e.g., files that can be regenerated by rerunning a job, source code downloaded from a website, etc.
Backup policy: No backup

Directory path: /scratch (local to each compute node)
Size: depends on compute node, see above
Purpose: For temporary files generated by jobs.
Backup policy: No backup; files are removed after job completion.

Program Information on Bugaboo

Parallel Programs

MPI programs are run using the mpiexec command, which in general has the form:

mpiexec -n # cmd cmdargs

where # is the number of processes to be used and cmd is the name of the MPI program to be run, with cmdargs being the arguments of that program. The head node (bugaboo) can be used for short test runs and for debugging purposes. All long-running programs and programs that use a large number of processes must be submitted to the queueing system using qsub - see the Running Jobs web page. Within the "unified environment" there are two supported ways of requesting processors within a PBS submission script:

1. -l procs=N
2. -l nodes=n:ppn=m

The first method requests N processors for the job; the scheduler will assign the next available N processors to the job. This method results in the shortest waiting time in the queue and is the recommended method if there is no particular reason to use method 2. The second method requests exactly m processors on each of n nodes, i.e., a total of N = m*n processors (see the example below). Because the scheduler has to find n nodes that each have at least m idle processors, this method leads to longer waiting times than method 1. Be aware that -l nodes=n is completely equivalent to -l nodes=n:ppn=1, which in almost all cases is undesirable. DO NOT USE -l nodes=n UNLESS THIS IS ESSENTIAL FOR YOUR JOB. Ask support@westgrid.ca if in doubt about the best strategy.
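
As an illustration, a request along the following lines asks for two complete 12-core nodes, i.e., 24 processors in total (the numbers are only an example):

#PBS -l nodes=2:ppn=12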

Here is a sample job submission script for Bugaboo: 
#!/bin/bash 
#PBS -r n 
#PBS -l walltime=48:00:00 
#PBS -l procs=42 
#PBS -l pmem=1600mb 
#PBS -m bea 
#PBS -M johann_bach@nowhere.ca

cd $PBS_O_WORKDIR 
echo "prog started at: `date`" 
mpiexec prog 
echo "prog finished at: `date`"

Compiling and Running Programs

After the expansion in the summer of 2011 the Bugaboo cluster has nodes with slightly different processors. For that reason it is important, when using the Intel compiler, NOT to use the -xHost option and NOT to use the -fast option (which implies -xHost). Instead it is recommended to use the -O3 -xSSE4.1 -axSSE4.2 options when the highest level of optimization is desired. Programs compiled this way will run on all Bugaboo nodes.

Bugaboo uses the so-called "unified environment", i.e., you can use the generic compilers cc, c++, f77, f90, mpicc, mpicxx, mpif77, mpif90 to compile programs. By default these generic compilers will use the Intel compilers. However, this default can be changed by loading so-called "modules" - ask support@westgrid.ca for details.
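
For example, a C program could be compiled with a generic compiler and the recommended optimization flags (the program name is a placeholder):

cc -O3 -xSSE4.1 -axSSE4.2 -o myprog myprog.c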

All installed software libraries can be linked at compile time by adding -lname_of_library to the compiler options, e.g., -lblas for linking with the BLAS library and -lfftw3 for linking with the FFTW library, version 3. Some libraries depend on other libraries, e.g., to link with the LAPACK library you also need the BLAS library, and to link with the SCALAPACK library -lscalapack -lblacs -llapack -lblas is required. These dependencies are listed on the WestGrid software web page. Under no circumstances is it necessary to specify a path to library directories (-L/directory options) or to set the LD_LIBRARY_PATH environment variable. In fact, this is strongly discouraged, as it may actually prevent the generation of a working executable.
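
For example, a Fortran program that uses LAPACK could be compiled and linked as follows (the program name is a placeholder):

f90 -O3 -xSSE4.1 -axSSE4.2 -o myprog myprog.f90 -llapack -lblas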

All programs can be run just by typing the name of the program, i.e., without specifying the full path (directory) of the program - all programs are in your PATH. If you find a program that you cannot run this way, please report this as a bug to support@westgrid.ca. Specifying the full path of a program is strongly discouraged and we do not guarantee that this will continue to work (e.g., the path may change when a new version is installed).