Programming on the WestGrid Glacier Cluster

Introduction

Glacier has been retired and is no longer available.

Documentation

This page deals with compilation, debugging and optimization of serial and parallel programs on the WestGrid Glacier Cluster. Especially if you are new to programming in a UNIX/Linux HPC environment, please start at the main WestGrid programming page for a more general introduction. On that page you will also find links to details about programming on other WestGrid machines.

More advanced programmers may want to refer to vendor supplied documentation:

  • For Intel compiler, debugger and mathematical library documentation, start at the Intel Software Development page and follow the link according to the language of interest. Choose the Linux version when there is a choice. Once on the language-specific compiler page, scroll down to the Product Documentation section for Getting Started and User's Guides.
  • For GCC (GNU Compiler Collection) documentation see gcc.gnu.org/onlinedocs/.
  • Portland Group compiler, debugger and profiling tools documentation is available from the Portland Group web site. The Fortran, C and C++ compilers are treated in a single User's Guide.
  • At the Absoft web site at www.absoft.com there is documentation for their latest compilers. Please be aware, however, that references to new products, such as the Fx2 debugger, are not relevant to the Pro Fortran 8.2 product on the Glacier Cluster.

For compiler options not presented here, details are available through the UNIX man command: man f90, man ifort, man icc, etc.

Hardware Considerations

The main computational nodes of the Glacier Cluster have two 3.0 GHz processors each. As such, they are suitable for serial programs and for distributed-memory parallel jobs, typically programmed with MPI. OpenMP can be used, but is limited to the two processors within a node, so other WestGrid resources, such as Breezy, are more suitable for OpenMP programs.

Compiler Recommendation

See the programming table on the WestGrid software page for a comparison of the compilers available on the various WestGrid computers. The table also lists the specific version numbers of the compilers on the Glacier Cluster.

There are several Fortran compilers available on Glacier, including Absoft (f90), Intel (ifort) and Portland Group (pgf77, pgf90) products. For Fortran 77 there is also the GNU compiler, g77. For C, there are Intel (icc), Portland Group (pgcc) and GNU (cc, gcc) compilers. Similarly, C++ is supported by Intel (icc, icpc), Portland Group (pgCC) and GNU (g++) products.

One compiler is not universally better than another. If you are linking with mathematical libraries, you may find that more of them have been compiled with the Intel compiler, so that might be a good first choice. However, if you are able to successfully build your program with more than one compiler, you are encouraged to compare the performance of the resulting executables before doing a lot of production runs.
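For example, a quick way to compare two builds is to time the same test case with each executable (a sketch; the executable and input file names below are placeholders):

time ./diffuse_intel < diffuse.in > /dev/null
time ./diffuse_pgi < diffuse.in > /dev/null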

Feedback to support@westgrid.ca would be appreciated if you experiment with the compilers and find that one works significantly better or worse for your code.


Compiling Serial Code

Introduction

In the compilation discussion that follows, two examples are shown for each language. One example illustrates compiler flags to use when developing new code or debugging. A second example shows optimization options that could be tried for production code. It is advisable to test that the non-optimized and production code give similar numerical results. If the answers are sensitive to the changes introduced by the optimization flags, this may indicate a problem with the stability of the algorithm you are using.
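For example, one way to carry out such a check is to build the code both with and without optimization and compare the output files (a sketch using the Fortran example from the next section; the file names are illustrative):

ifort -g -O0 diffuse.f writeppm.f -o diffuse_debug
ifort -fast diffuse.f writeppm.f -o diffuse_opt
./diffuse_debug < diffuse.in > out_debug
./diffuse_opt < diffuse.in > out_opt
diff out_debug out_opt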

Note that the examples shown here are for the Intel compiler. As this page is developed, options for other compilers may be added.

Fortran

The Intel compiler will accept source code files ending in .f or .f90 as fixed-form or free-form source code files, respectively. Source code ending in .F or .F90 is also accepted, but will be preprocessed by fpp before compilation.
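One consequence is that preprocessor macros can be used in .F and .F90 files. For example (a sketch; the file name and macro are hypothetical):

ifort -DDEBUG solver.F90 -o solver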

Example with debugging options (-CB for array bounds checking):

ifort -g -fpe0 -O0 -CB -traceback diffuse.f writeppm.f -o diffuse

Note that O0 in the above is the letter "oh" followed by the number "zero".

Example with an optimization option:

ifort -fast diffuse.f writeppm.f -o diffuse

Caution regarding use of -fast in makefiles: The -fast option in the above example is equivalent to -O3, -ipo and -static. The -ipo option calls for interprocedural optimization. This leads to an error if -fast is used to link routines that have been compiled individually with the -c flag (as is often done in makefiles). This problem can be avoided by compiling two or more routines together, or by using -O3 instead of -fast in your makefile.
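For example, in a makefile the separate compilation steps can use -O3 instead (shown here as shell commands, using the file names from the example above):

ifort -c -O3 diffuse.f
ifort -c -O3 writeppm.f
ifort -O3 diffuse.o writeppm.o -o diffuse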

C

There are several C compilers available on Glacier, including Intel (icc), Portland Group (pgcc) and GNU (cc, gcc).

Example with debugging options:

icc -g pi.c -o pi

Example with an optimization option:

icc -O3 pi.c -o pi

C++

There are several C++ compilers available on Glacier, including Intel (icc, icpc), Portland Group (pgCC) and GNU (g++).

Example with debugging options:

icpc -g pi.cxx -lm -o pi

Example with an optimization option:

icpc -O3 pi.cxx -lm -o pi


Running Serial Code

Interactive Runs

The Glacier Cluster head nodes may be used for short interactive runs during program development.

For longer runs the regular production batch queue should be used, as described in the section on batch jobs below.

To run a compiled program interactively through an ssh window on the login node, just type its name with any required arguments at the UNIX shell prompt. File redirection commands can be added if desired. For example, to run a program named diffuse, with input taken from diffuse.in and output (that would normally go to the screen) sent to a file diffuse.out, type:

diffuse < diffuse.in > diffuse.out

Batch Runs

Production runs or long test jobs are submitted to a batch queue, as described elsewhere.

For serial jobs, an example job script is shown below. Replace the program name, diffuse, with the name of your executable.

#!/bin/bash
#PBS -S /bin/bash

# Script for running large memory serial job, diffuse, on glacier
# 2005-07-22 DSP

cd $PBS_O_WORKDIR

echo "Current working directory is `pwd`"

echo "Starting run at: `date`"
./diffuse
echo "Job finished at: `date`"

It is recommended that you record the performance characteristics of your code for a series of test runs so that you can estimate the run time (walltime) of a long job more accurately. Similarly, you will need to know how your program's memory requirements scale as you increase the problem size. This kind of information is used during the batch job submission to ensure that your program is run on a node with appropriate hardware and runtime limits.
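These estimates are supplied as resource requests when the job is submitted. For example (a sketch; the script name, walltime and memory values are placeholders for your own estimates):

qsub -l walltime=24:00:00,mem=1500mb diffuse.pbs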


Parallel Programming

Introduction

The Glacier environment can be used for interactive development of parallel programs by running them on nodes reserved for short debugging jobs: use the qsub option -W x="QOS:debug" and request at most 2 nodes (4 CPUs) and a walltime of at most 10 minutes.
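For example, a short MPI test job could be submitted to the debugging nodes with a command like the following (assuming a job script named pn.pbs, as in the MPI batch example later on this page):

qsub -W x="QOS:debug" -l nodes=2:ppn=2,walltime=00:10:00 pn.pbs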

Basic commands for compiling MPI or OpenMP-based parallel programs are given in the following sections.

Message Passing Interface (MPI)

Compiling

To compile parallel MPI code with the Intel compilers, automatically linking the correct MPI libraries, use the wrapper scripts mpif77, mpif90, mpicc or mpiCC, according to the language. Add additional compiler options for debugging or optimization as given in the preceding section on serial code.

Some examples for the Intel compiler:

export PATH=/global/software/mpich-1.2.7/ssh/bin:$PATH
mpif90 -fast diffuse.f writeppm.f -o diffuse
mpicc -O3 pi.c -lm -o pi
mpiCC -O3 pi.cxx -lm -o pi

Be careful to use the correct version of the MPI wrapper scripts. If necessary, change your PATH variable so that the MPICH directory appears first, as shown in the above example (for a bash shell environment). Alternatively, use the full path, such as /global/software/mpich-1.2.7/ssh/bin/mpif90 instead of just mpif90. You can verify which compiler and libraries are used by the wrapper scripts by using the -show flag. For example:

mpicc -show

To compile parallel MPI code with the Portland Group compilers, automatically linking the correct MPI libraries, add -Mmpi to the command line, adding additional compiler options for debugging or optimization as one would for serial code.

Some examples for the Portland Group compilers:

pgf90 -fast -Mmpi diffuse.f writeppm.f -o diffuse
pgcc -Mmpi -fast pi.c -lm -o pi
pgCC -Mmpi -fast -Minline=levels:10 pi.cxx -lm -o pi

Running

If your program allows, compare the results with a single processor to those from a two-processor run. Gradually increase the number of processors to see how performance scales. After you have learned the characteristics of your code, please do not run with more processors than can be efficiently used, as the system is typically very busy.
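For example, a simple scaling study can be carried out by submitting the same job script (such as the pn.pbs example below) with increasing processor counts and comparing the reported run times:

qsub -l nodes=1:ppn=1 pn.pbs
qsub -l nodes=1:ppn=2 pn.pbs
qsub -l nodes=2:ppn=2 pn.pbs
qsub -l nodes=4:ppn=2 pn.pbs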

MPI jobs are run by submitting a script to the TORQUE batch job handling system with the qsub command.

Here is an example script to run an MPI program, pn, on the Glacier Cluster. If the script file is named pn.pbs, submit the job with qsub -l nodes=4:ppn=2 pn.pbs, for example, to request that 8 processors be used (4 separate nodes with 2 processors on each node).

Note that the -mpich-p4-no-shmem flag is required on the mpiexec command line whenever ppn=2 is used, but it will also work if ppn is not specified or if ppn=1.

#!/bin/bash
#PBS -S /bin/bash

# Torque script for running MPI sample program pn on glacier
# 2009-02-21 DSP

MPIEXEC="/global/software/bin/mpiexec"

cd $PBS_O_WORKDIR
echo "Current working directory is `pwd`"

echo "Node file: $PBS_NODEFILE :"
echo "---------------------"
cat $PBS_NODEFILE
echo "---------------------"
PBS_NP=`/bin/awk 'END {print NR}' $PBS_NODEFILE`
echo "Running on $PBS_NP processors."

echo "Starting run at: `date`"
${MPIEXEC} -mpich-p4-no-shmem -np $PBS_NP ./pn
echo "Job finished at: `date`"

The form "./pn" is used to ensure that the program can be run even if "." (the current directory) is not in your PATH.

Source code for the pn program itself is available in the file pn.f.

OpenMP

Compiling

To use the Intel compilers with a program containing OpenMP directives, add a -openmp flag to the compilation.

Some examples:

ifort -openmp -fast diffuse.f writeppm.f -o diffuse
icc -openmp -O3 pi.c -lm -o pi
icpc -openmp -O3 pi.cxx -lm -o pi

Running

See the documentation on job submission for details on queues and the syntax for requesting nodes.

For OpenMP batch jobs submitted with qsub, the environment variable OMP_NUM_THREADS should be set to the number of processors that TORQUE assigns to your job, as shown in the following script:

#!/bin/bash
#PBS -S /bin/bash
#PBS -l nodes=1:ppn=2

# Script for running OpenMP sample program pi on 2 processors on glacier
# 2005-08-03 DSP

cd $PBS_O_WORKDIR

echo "Current working directory is `pwd`"

# Note: The OMP_NUM_THREADS should match the number of processors requested.

echo "Node file: $PBS_NODEFILE :"
echo "---------------------"
cat $PBS_NODEFILE
echo "---------------------"
PBS_NP=`/bin/awk 'END {print NR}' $PBS_NODEFILE`
echo "Running on $PBS_NP processors."

export OMP_NUM_THREADS=$PBS_NP

echo "Starting run at: `date`"
./pi
echo "Job finished at: `date`"


Debugging

Introduction

The Intel idb and GNU gdb debuggers are available on the Glacier Cluster for use from character-based terminals. A graphical front end, ddd, can be used with gdb.

While debugging, it may be convenient to submit jobs to the nodes reserved for short debugging jobs by using qsub option -W x="QOS:debug" and requesting at most 2 nodes (4 CPUs) and a walltime of at most 10 minutes.

Regardless of the debugger being used, start by adding a -g flag to the compilation.
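For example, a gdb session to track down a crash might look like the following (a sketch; the source and input file names are placeholders):

ifort -g -O0 -CB -traceback diffuse.f writeppm.f -o diffuse
gdb ./diffuse
(gdb) run < diffuse.in
(gdb) backtrace
(gdb) quit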

To obtain more informative messages in the case of a run-time error when running code compiled with version 8 of the Intel Fortran compiler, set the environment variable NLSPATH to define the location of the compiler's error message catalog. For example, if using the bash shell:

export NLSPATH="/global/software/intel/fortran-8.0/lib/ifcore_msg.cat"

This setting is not necessary when using version 9 of the Intel compiler (now the default), but codes compiled with version 8 will need to be recompiled to avoid having to set NLSPATH.

Please write to support@westgrid.ca for help with debugging.

Linking with Installed Libraries

Introduction

See the Mathematical Libraries and Applications section of the WestGrid Software page for a description of some of the optimized linear algebra and Fourier transform libraries that can be linked with your code.
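In general, linking against one of these installed libraries amounts to adding -L and -l options to the compile line. For example (the library name and path here are purely illustrative; see the software page for the actual values):

icc -O3 myprog.c -L/global/software/somelib/lib -lsomelib -o myprog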

C++ Libraries

Boost, an eclectic collection of C++ libraries, is available in /global/software/boost-1.31.0.
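Since most of Boost 1.31 consists of header-only libraries, it is often sufficient to add the installation directory to the include path (a sketch; the exact subdirectory containing the boost headers may differ, and myprog.cxx is a placeholder):

icpc -O3 -I/global/software/boost-1.31.0 myprog.cxx -o myprog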

Improving Performance

Introduction

We encourage you to have your code reviewed by a WestGrid analyst. Please write to support@westgrid.ca.