QuickStart Guide to Bugaboo
About this QuickStart Guide
This QuickStart guide gives a brief overview of the WestGrid Bugaboo facility, highlighting some of the features that distinguish it from other WestGrid resources. It is intended to be read by new WestGrid account holders and by current users considering whether to move to the Bugaboo system. For more detailed information about the Bugaboo hardware and performance characteristics, available software, usage policies and how to log in and run jobs, follow the links given below.
Introduction
Bugaboo is a Dell blade cluster with 1280 cores connected with Infiniband, running the Scientific Linux operating system. It is intended for jobs that require access to large storage systems (terabytes of data, for example).
Hardware
Processors
The Bugaboo cluster consists of 10 chassis, each containing 16 8-core blades, for a total of 1280 cores. Each blade (compute node) has two sockets, each holding an Intel Xeon E5430 quad-core processor running at 2.66 GHz. Each blade has 16 GB of memory, shared among the 8 cores on that node.
Interconnect
The compute nodes are connected with Infiniband using a 288-port QLogic switch. Connections between nodes within a chassis are non-blocking, but connections that span chassis are subject to "2 to 1" blocking: the maximum bandwidth for communication between nodes in different chassis is only half that for nodes within the same chassis.
Storage
Bugaboo uses Lustre, a high-performance cluster file system, to provide storage for /home (containing users' home directories) and /global/scratch (space for temporary storage, typically associated with running jobs). The size, quota and backup policy for each of these file systems are shown in this table:
| File system | Size | Quota | Backup policy |
| /home | 115 TB | 300 GB | Daily backup |
| /global/scratch | 200 TB | 1 TB | No backup |
There is an additional 115 TB that can be assigned to either /home or /global/scratch, depending on future usage requirements.
There are also two 146 GB SATA 10000 RPM drives per node for local storage. These are in a RAID 0 (striped) configuration for extra performance. About 248 GB is available to users as /scratch on each compute node.
Software
A list of the installed software on Bugaboo is available on the WestGrid software web page. The usual numerical libraries (e.g., BLAS, LAPACK, SCALAPACK, GSL, FFTW) are available, along with several simulation packages, such as GROMACS, NAMD and LAMMPS for molecular dynamics and SIESTA, ABINIT and DIRAC for electronic structure calculations. GCC, Intel and Open64 compilers are available.
There are other software packages installed on Bugaboo that are not listed on the WestGrid software web page, e.g., PETSC, SLATEC, ARPACK, PARPACK, etc. Please write to support@westgrid.ca if there is software that you would like to have installed.
Using Bugaboo
Logging in and using the login server
To log in to Bugaboo, connect to the head node (login server) bugaboo.westgrid.ca using an ssh (secure shell) client. For more information about connecting and setting up your environment, see the QuickStart Guide for New Users. The login server is used for such tasks as editing and compiling source code, performing short test runs for debugging and submitting batch jobs.
Compiling and running programs
Bugaboo uses the so-called "unified environment", i.e., you can use the generic compilers cc, c++, f77, f90, mpicc, mpicxx, mpif77, mpif90 to compile programs. By default these generic compilers will use the Intel compilers. However, this default can be changed by loading so-called "modules" - ask support@westgrid.ca for details.
All installed software libraries can be linked at compile time by adding -lname_of_library to the compiler options, e.g., -lblas for linking with the BLAS library and -lfftw3 for linking with version 3 of the FFTW library. Some libraries depend on other libraries: to link with the LAPACK library you also need the BLAS library, and to link with the SCALAPACK library you need -lscalapack -lblacs -llapack -lblas. These dependencies are listed on the WestGrid software web page. It is never necessary to specify a path to library directories (-L/directory options) or to set the LD_LIBRARY_PATH environment variable. In fact, doing so is strongly discouraged, as it may prevent the generation of a working executable.
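As a sketch of the linking rules above, the following assembles the command line for an MPI program that uses SCALAPACK and prints it rather than running it. The file names prog.c and prog are hypothetical; the library order is the one quoted in the text.

```shell
#!/bin/bash
# Hypothetical source file and executable name, for illustration only.
SRC="prog.c"
EXE="prog"

# SCALAPACK together with the libraries it depends on, in the order
# given in the text. No -L options and no LD_LIBRARY_PATH are needed.
LIBS="-lscalapack -lblacs -llapack -lblas"

# Assemble the command; mpicc is the generic MPI C compiler wrapper.
COMPILE_CMD="mpicc -o $EXE $SRC $LIBS"

# Dry run: print the command instead of executing it.
echo "$COMPILE_CMD"
```

On Bugaboo itself the printed command could be typed directly at the prompt of the login server.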
All programs can be run just by typing the name of the program, i.e., without specifying the full path (directory) of the program - all programs are in your PATH. If you find a program that you cannot run this way, please report this as a bug to support@westgrid.ca. Specifying the full path of a program is strongly discouraged and we do not guarantee that this will continue to work (e.g., the path may change when a new version is installed).
Running parallel programs
MPI programs are run using the mpiexec command, which in general has the form
mpiexec -n # cmd cmdargs
where # is the number of processes to be used, cmd is the name of the MPI program to be run, and cmdargs are the arguments of that program.
The head node (bugaboo) can be used for short test runs and for debugging purposes. All long-running programs and programs that use a large number of processes must be submitted to the queueing system using qsub; see the Running Jobs web page.
Within the "unified environment" there are two supported ways of requesting processors within a PBS submission script:
- -l procs=N
- -l nodes=n:ppn=m
The first method requests N processors for the job. The scheduler will assign the next available N processors to the job. This method results in the shortest waiting time in the queue and is recommended unless there is a particular reason to use the second method.
The second method requests exactly m processors on each of n nodes, i.e., a total of N = m*n processors. Because the scheduler has to find n nodes that each have at least m idle processors, this method leads to longer waiting times than the first method. Be aware that -l nodes=n is equivalent to -l nodes=n:ppn=1, which is undesirable in almost all cases. DO NOT USE -l nodes=n UNLESS THIS IS ESSENTIAL FOR YOUR JOB. If in doubt about the best strategy, ask support@westgrid.ca.
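The arithmetic behind the second method can be checked with a small helper. This is only an illustrative sketch: the function name total_procs is hypothetical and is not part of PBS or of the Bugaboo environment. It computes the total N = m*n for a nodes=n:ppn=m request and treats a missing ppn as ppn=1, as described above.

```shell
#!/bin/bash
# Hypothetical helper: total processors implied by "nodes=n[:ppn=m]".
total_procs() {
    local spec="$1"
    local nodes ppn
    nodes="${spec#nodes=}"      # strip the leading "nodes="
    nodes="${nodes%%:*}"        # keep only the text before the first ":"
    case "$spec" in
        *:ppn=*) ppn="${spec##*:ppn=}" ;;   # explicit processors per node
        *)       ppn=1 ;;                   # nodes=n alone means ppn=1
    esac
    echo $(( nodes * ppn ))
}

total_procs "nodes=4:ppn=8"   # prints 32
total_procs "nodes=4"         # prints 4
```

The second call shows why a bare nodes=n request is rarely what is wanted: it asks for only one processor on each of the n nodes.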
Here is a sample job submission script for Bugaboo:
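#!/bin/bash
# Do not rerun the job automatically if it is interrupted.
#PBS -r n
# Request 48 hours of walltime.
#PBS -l walltime=48:00:00
# Request 42 processors, placed wherever the scheduler finds them.
#PBS -l procs=42
# Request 1600 MB of memory per process.
#PBS -l pmem=1600m
# Address for e-mail notifications about the job.
#PBS -M johann_bach@nowhere.ca
# Run from the directory the job was submitted from.
cd "$PBS_O_WORKDIR"
echo "prog started at: $(date)"
mpiexec prog
echo "prog finished at: $(date)"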
Batch job policies
The following gives a brief overview of the rules for batch jobs on the Bugaboo facility:
- the default walltime is 72 hours;
- the maximum walltime is 4 months;
- the running and submitted jobs of a user may together request at most 24576 processor-hours (1024 processor-days), where the processor-hours of a job are the number of requested processors multiplied by its walltime in hours; additional jobs that exceed this limit are queued as blocked jobs.
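As a worked example of the processor-hour limit, the sketch below applies it to the sample script above, which requests 42 processors for 48 hours. The function name proc_hours is hypothetical and serves only to illustrate the arithmetic.

```shell
#!/bin/bash
# Processor-hour limit quoted in the batch job policies.
LIMIT=24576

# Hypothetical helper: processors multiplied by walltime in hours.
proc_hours() {
    echo $(( $1 * $2 ))
}

# The sample script requests 42 processors for 48 hours.
used=$(proc_hours 42 48)
echo "one job: $used processor-hours"

# Integer division gives the number of such jobs that fit under the limit.
echo "jobs of this size within the limit: $(( LIMIT / used ))"
```

One such job accounts for 2016 processor-hours, so twelve of them fit under the limit and a thirteenth would be blocked until earlier jobs finish.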
File transfers
In addition to the login server, there is a separate file server, bugaboo-fs.westgrid.ca, which you should use for data transfers to or from the Bugaboo cluster. This takes load off the login server. The gcp command can be used to transfer files efficiently between Bugaboo and other WestGrid sites.
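A transfer from a local machine to the file server might look like the following sketch, which uses the standard scp client rather than gcp. The user name, the file name and the per-user directory under /global/scratch are all assumptions made for illustration; substitute your own values. The command is printed rather than executed.

```shell
#!/bin/bash
# Hypothetical values: replace with your own user name and paths.
REMOTE_USER="username"
LOCAL_FILE="results.tar.gz"
REMOTE_DIR="/global/scratch/$REMOTE_USER"   # assumed directory layout

# File server named in the text; use it instead of the login server.
FILE_SERVER="bugaboo-fs.westgrid.ca"

# Assemble the scp command line.
TRANSFER_CMD="scp $LOCAL_FILE $REMOTE_USER@$FILE_SERVER:$REMOTE_DIR/"

# Dry run: print the command instead of executing it.
echo "$TRANSFER_CMD"
```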
Updated 2010-02-22.