
Programming on WestGrid Systems


Preliminaries

This page gives an overview of programming on WestGrid systems.

The material presented here does not include tutorials designed to teach you programming in any particular language. The intention is to present basic usage instructions for compilers and the debugging and optimization tools available on the WestGrid systems.

Documentation

Especially if you are new to programming in a UNIX/Linux HPC environment, you might like to review the material presented here and then rely heavily on WestGrid support for questions.

In addition, more advanced programmers may want to refer directly to vendor-supplied documentation:

  • For Intel compiler, debugger and mathematical library documentation, start at the Intel Software Documentation Library page and follow a link according to the language, library or tool of interest. Choose the Linux version when there is a choice.
  • For GCC (GNU Compiler Collection) documentation see gcc.gnu.org/onlinedocs/.
  • Portland Group compiler, debugger and profiling tools documentation is available here. The Fortran, C and C++ compilers are treated in a single User's Guide.

For compiler options not presented here, details are available through the UNIX man command: man gfortran, man ifort, man icc, etc.

Where to compile and test

Developing software is a cyclic process involving editing, compilation and testing. Editing and compilation are normally carried out in an interactive session on a login node (except possibly for Bugaboo, where you are more actively encouraged to reserve a compute node for interactive work). Short tests of your software may also be possible on login nodes if the tests don't require too much memory or too many processors.

However, more detailed testing of your code on login nodes may be problematic. For example, your code may require dedicated processors for accurate timing tests, multiple nodes for testing parallel programs using MPI, a dedicated multi-core node for OpenMP-based parallel programs, or more memory than is available on a login server. If testing your code is likely to have a significant impact on other users when run on a login node, you should use the compute nodes for testing, either through a regular batch job or in an interactive session for which one or more compute nodes have been reserved through the batch system.

A number of the WestGrid systems have special provision for short interactive or batch sessions using compute nodes. See the QuickStart Guide for the system you are proposing to use for details, along with the section on Working Interactively on the WestGrid Running Jobs page.
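As a hedged illustration (queue names, resource limits and the exact options accepted differ between systems; check the relevant QuickStart Guide), an interactive session on a compute node can typically be requested from the TORQUE batch system with the -I flag:

qsub -I -l walltime=01:00:00,nodes=1:ppn=1

When the session starts you are given a shell prompt on a compute node, where you can compile and run short tests; type exit to end the session and release the node.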

Languages

The main languages used for scientific programming, Fortran (77 and 90), C and C++, are available on all the WestGrid systems. Other languages, such as Java or MATLAB, can be used, but are not recommended for performance reasons and may not be as well supported. Scripting, whether using shell commands or higher-level languages such as Perl or Python, is often used to automate pre- or post-processing tasks, but will not be covered here.

Editors and the X Window System

A number of text editors are available for use in developing programs. Enter editor in the search box on the WestGrid software page for a list.

NEdit may be of particular interest to users coming from a Microsoft Windows PC or Apple Macintosh background, as the keyboard shortcuts are the same as found in word-processing programs on those types of computer. It also has options to number each line and show the column position of the cursor. These are simple but helpful features when trying to relate the source code to an error message after a program has crashed.

We do not currently offer an integrated environment, such as Microsoft's Visual Studio, for programming on WestGrid systems. Many users will use simple character-based terminal windows for creating and editing program files. However, there are significant advantages to using a workstation on which an X Window display server program has been installed. Most UNIX and Linux environments will already have this support included. For Mac OS X computers, a program called X11 can be downloaded from Apple. For Microsoft Windows PCs, a free alternative is Cygwin-X. Commercial programs, such as X-Win32, WinaXe and Exceed, are also available.

When using a workstation that supports X Windows, the emacs and nedit editors support syntax highlighting, with language keywords marked in colour or boldface, for example. Comment lines are also indicated. Exactly which features are highlighted depends on the language being used.

Compilers

Both Intel and GCC compilers are available on all WestGrid clusters. Other compilers, such as those from the Portland Group, are available on some systems but are less commonly used. Our expectation is that in most circumstances the Intel compilers will produce faster code, but feedback to support@westgrid.ca would be appreciated if you experiment with other compilers.

For the names and version numbers of the compilers available for each language, as well as equivalent lists for other compilers and programming tools available on WestGrid computers, see the programming section of the WestGrid software page. Select the language or tool of interest and then the Software Versions tab.

Note that on some systems where there is more than one version of a given compiler installed, the module command is typically used to set up the environment for a particular compiler.  A general introduction to the module command is given on a separate page.
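For example (the module names shown here are only illustrative; run module avail to see what is actually installed on a given system), an Intel compiler environment might be set up with:

module avail intel      # list the Intel-related modules installed on this system
module load intel       # set up the environment for the default Intel compilers
module list             # confirm which modules are currently loaded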

As mentioned above, one can get detailed information about the options available for a particular compiler, using the man command. For example, for the Intel C++ compiler, icpc, type:

man icpc

Programming tools

Besides the basic editors, compilers, debuggers and performance programming tools, there are a few other programming aids available on the system. Among these are make (streamlines the edit and compile cycle) and version control software (including git, mercurial and subversion). See the programming section of the WestGrid software page for help in locating these utilities.
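As a small sketch of how make streamlines the edit-and-compile cycle (the file names and flags are simply borrowed from the serial Fortran example later on this page, not a recommendation), a minimal Makefile might look like the following. Note that the indented command lines must begin with a tab character.

# Rebuild the diffuse example only when a source file has changed.
# Type "make" to build and "make clean" to remove the executable.
FC     = ifort
FFLAGS = -O3

diffuse: diffuse.f writeppm.f
	$(FC) $(FFLAGS) diffuse.f writeppm.f -o diffuse

clean:
	rm -f diffuse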

Hardware considerations

Some WestGrid systems (Bugaboo and Jasper) are composed of compute nodes with two different types of processor. On these systems, Intel compiler optimizations such as -fast and -xHost should be avoided in order to ensure that your code runs on both types of processor chip.

It is recommended that you read the QuickStart Guide for each specific system that you use to check for additional programming considerations related to compilers or linking with installed libraries.  Also, it is a good idea to recompile your code when moving to a new site.

Serial Programming

Introduction

Serial code is that which can make use of only a single processor core. Researchers with small memory serial jobs are usually directed to Hermes, Bugaboo, Jasper, or Orcinus.  Serial jobs requiring more than a few GB of memory, up to about 250 GB, can be run on Breezy.  Larger jobs can be run on Hungabee.

Compiling Serial Code

In the compilation discussion that follows, two examples are shown for each language. One illustrates compiler flags to use when developing new code or debugging. The second shows optimization options that could be tried for production code. It is advisable to test that the non-optimized and production code give similar numerical results: sensitivity of the answers to the changes introduced by the optimization flags may indicate a problem with the stability of the algorithm you are using.

Note that the examples shown here are for the Intel compilers.

Fortran

Although g77 and gfortran are available, better results are generally expected with the Intel Fortran compiler, which is called ifort.

By default, the Intel compiler will interpret your source code as fixed-form or free-form according to the file suffix. Source code files ending in .f, .for or .ftn are treated as the older fixed-form Fortran style, whereas files with names ending in .f90 are treated as free-form. Source code ending in .F, .FOR, .FTN or .FPP (all fixed-form) or .F90 (free-form) is also accepted, but will be preprocessed by fpp before compilation.

Example with debugging options (-CB for array bounds checking):

ifort -g -fpe0 -O0 -CB diffuse.f writeppm.f -o diffuse

Note that O0 in the above is the letter "oh" followed by the number "zero".

Example with an optimization option:

ifort -O3 diffuse.f writeppm.f -o diffuse

Note that in the Bugaboo QuickStart Guide it is recommended to use -O3 -xSSSE3 -axSSE4.2,SSE4.1 as a good combination of optimization options, taking into account that there are two different types of hardware making up the Bugaboo computing environment.
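Applied to the Fortran example above, that recommendation would look something like this (shown only as an illustration; check the Bugaboo QuickStart Guide for the currently recommended flags):

ifort -O3 -xSSSE3 -axSSE4.2,SSE4.1 diffuse.f writeppm.f -o diffuse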

C

The C compilers available on most systems are those from Intel (icc) and the GNU Compiler Collection (cc, gcc).  Faster code is expected from icc.

Example with a debugging option:

icc -g pi.c -o pi

Example with an optimization option:

icc -O3 pi.c -o pi

C++

The C++ compilers available on most systems are those from Intel (icc, icpc) and the GNU Compiler Collection (g++). Code generated by the Intel compiler is expected to be faster than that from g++, but you might like to try both.

The Intel compiler accepts C++ source code files ending in .C, .cc, .cp, .cpp, .cxx and .c++ . Files with a .c suffix will be treated as C source code if called as icc or as C++ code if called as icpc.

Example with a debugging option:

icpc -g pi.cxx -lm -o pi

Example with an optimization option:

icpc -O3 pi.cxx -lm -o pi

 

Running Serial Code

Interactive Runs

As mentioned in the introduction, the login server on most systems may be used for short interactive runs during program development and porting if the hardware resources are sufficient.  For longer or more demanding runs, the regular production batch queue should be used, as described in the section on batch jobs below.

To run a compiled program interactively through an ssh window on the login node, just type its name with any required arguments at the UNIX shell prompt. File redirection commands can be added if desired. For example, to run a program named diffuse, with input taken from diffuse.in and output (that normally goes to the screen) sent to a file diffuse.out, type:

diffuse < diffuse.in > diffuse.out

Batch Runs

Production runs should be submitted as a batch job script to a TORQUE queue with the qsub command as described on the Running Jobs pages.

For serial jobs, a bare-bones example job script is shown below. Replace the program name, diffuse, with the name of your executable.

#!/bin/bash
#PBS -S /bin/bash

cd $PBS_O_WORKDIR

echo "Current working directory is `pwd`"

echo "Starting run at: `date`"
./diffuse

It is recommended that you record the performance characteristics of your code for a series of test runs so that you can estimate the run time (walltime) of a long job more accurately. Similarly, you will need to know how your program's memory requirements scale as you increase the problem size. This kind of information is used during the batch job submission to ensure that your program is run on a node with appropriate hardware and runtime limits.
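For example (a sketch only: the script name diffuse.pbs is hypothetical and the resource values are placeholders to be replaced with your own estimates), walltime and memory requests are typically passed to TORQUE when the job is submitted:

qsub -l walltime=12:00:00,mem=2gb diffuse.pbs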

Parallel Programming

Introduction

On WestGrid systems there are several ways to shorten the time it takes for a calculation by using two or more processors in parallel. The simplest way, which is often the most efficient, is to submit several independent serial jobs at the same time (perhaps running the same program with different input data sets). This will reduce the total time for a series of calculations, but will not speed up an individual job.

If the turnaround time for a single job is of paramount importance or if a problem is too big to be run on a single machine, parallel programming techniques are needed to allow multiple processors to work simultaneously on that single job. Among the approaches one can use, the two that are discussed here are those based on message passing, as implemented by MPI (Message Passing Interface) and those based on shared memory using the OpenMP interface.

The preparation and submission of parallel jobs depends on which technique you select and the target platform. If your application is suited to the submission of multiple independent small jobs, see the section on serial jobs above. At the other extreme, OpenMP jobs using many processors and many GB of RAM may run successfully only on the Breezy and Hungabee systems. OpenMP jobs are restricted to processors sharing the same memory space, which limits them to just 12 cores on many WestGrid systems, including the Bugaboo, Grex, Hermes, Lattice, Nestor, Orcinus and Parallel clusters. MPI is more flexible, as MPI jobs can use relatively large numbers of processors at all the WestGrid computational sites, but it generally requires more programming effort.

In practice, the number of processors one can realistically use depends on other factors, such as how fast efficiency drops off as the number of processors is increased, the policies in place at the various WestGrid sites, the system load, and how long you are willing to have a job sit in an input queue waiting for processors to become available.

There are numerous on-line tutorials for parallel programming, such as those listed at the MPICH and OpenMP home sites.

Some details on compiling and running OpenMP and MPI-based parallel programs are given below.

OpenMP

To use OpenMP in a batch mode on many of the WestGrid systems, you need to:

  • modify your source code to add OpenMP directives,
  • compile your code, linking with appropriate flags,
  • prepare a TORQUE (PBS) job submission script that references your program, and
  • submit the TORQUE (PBS) script with qsub, making sure to request just a single node and to control the number of threads used.  Some brief notes are given below.

The details of the above steps depend on the language, compiler and WestGrid site being used.
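As an illustration of the first step in the list above, here is a hedged sketch of a small C program containing a single OpenMP directive (this is not the pi example referred to elsewhere on this page; it simply sums an array, with the loop iterations shared among the available threads):

/* openmp_sum.c - illustrative example: sum an array in parallel.
   Compile with, for example, "icc -openmp openmp_sum.c -o openmp_sum". */
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void)
{
    static double a[N];
    double sum = 0.0;
    int i;

    for (i = 0; i < N; i++)
        a[i] = 1.0;

    /* Each thread handles part of the loop; the reduction clause
       combines the per-thread partial sums into "sum". */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f using up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}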

Compiling OpenMP code

To compile a program containing OpenMP directives with Intel compilers, add a -openmp flag to the compilation.  Here are some generic examples for Fortran, C and C++, respectively:

ifort -openmp -O3 diffuse.f writeppm.f -o diffuse
icc -openmp -O3 pi.c -lm -o pi
icpc -openmp -O3 pi.cxx -lm -o pi

Running OpenMP code

Long tests or production jobs should be submitted to a TORQUE queue with the qsub command as described on the Running Jobs pages.  Options for specifying the number of processors, memory and run time are mentioned there.

For OpenMP jobs, the environment variable OMP_NUM_THREADS should be set to the number of processors assigned to your job by TORQUE when submitting batch jobs with qsub. This is shown in the following script: 

#!/bin/bash
#PBS -S /bin/bash
#PBS -l nodes=1:ppn=2

# Script for running an OpenMP sample program, pi,
# on two processor cores.

cd $PBS_O_WORKDIR

echo "Current working directory is `pwd`"

# Calculate the number of cores requested of the batch system.
# On most WestGrid systems, one can use "CORES=$PBS_NUM_PPN" instead.

CORES=`/bin/awk 'END {print NR}' $PBS_NODEFILE`
echo "Running on $CORES cores."

# Note: The OMP_NUM_THREADS should match the number of cores requested.
export OMP_NUM_THREADS=$CORES

echo "Starting run at: `date`"
./pi

 

Message Passing Interface (MPI)

To use MPI in a batch mode on many of the WestGrid systems, you need to:

  • modify your source code to add calls to MPI routines,
  • compile your code, linking with the MPI libraries,
  • prepare a TORQUE (PBS) job submission script that references your program, and
  • submit the TORQUE (PBS) script with qsub.

The details of the above steps depend on the language, compiler and WestGrid site being used.
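As an illustration of the first step in the list above, here is a hedged sketch of a minimal C program using MPI calls (the file name mpi_hello.c is hypothetical): each process in the job reports its rank.

/* mpi_hello.c - illustrative example: each MPI process reports its rank.
   Compile with, for example, "mpicc mpi_hello.c -o mpi_hello". */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                /* start the MPI environment     */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's number         */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of MPI processes */

    printf("Hello from process %d of %d\n", rank, size);

    MPI_Finalize();                        /* shut MPI down cleanly         */
    return 0;
}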

Compiling MPI code

Fortran-based MPI code is typically compiled using the wrapper script mpif90. For C and C++, you can use mpicc and mpiCC, respectively.

Add debugging or optimization options, as appropriate, similar to what was shown for serial compilation above. 

On most WestGrid systems, these wrappers will call the Intel compilers (ifort, icc or icpc, for Fortran, C or C++, respectively). To check exactly what commands are executed by these scripts, add a -show argument. For example,

mpif90 -show

To compile an MPI Fortran program, diffuse.f, with the default compiler, type:

mpif90 -O3 diffuse.f -o diffuse

Similarly, to compile an MPI C program, pi.c, linking with the standard math library, type:

mpicc -O3 pi.c -lm -o pi

For a C++ program, the command line would look like:

mpiCC -O3 pi.C -lm -o pi

Running MPI code

If your program allows, compare the results with a single processor to those from a two-processor run. Gradually increase the number of processors to see how performance scales. After you have learned the characteristics of your code, please do not run with more processors than can be efficiently used, as the systems are typically very busy.

Long tests or production jobs should be submitted to a TORQUE queue with the qsub command as described on the Running Jobs pages.  Options for specifying the number and distribution of processors, memory and run time are mentioned there.

Here is an example of a script to run an MPI program, pn, using 2 processor cores. If the script file is named pn.pbs, submit the job with qsub pn.pbs.

#!/bin/bash
#PBS -S /bin/bash
#PBS -l procs=2

# Script for running a parallel MPI job, pn.
# 2010-01-12 DSP

cd $PBS_O_WORKDIR

echo "Current working directory is `pwd`"

# Calculate the number of cores requested of the batch system.
# On most WestGrid systems, one can use "CORES=$PBS_NP" instead.

CORES=`/bin/awk 'END {print NR}' $PBS_NODEFILE`
echo "Running on $CORES cores."

echo "Starting run at: `date`"
mpiexec ./pn

Note the use of the -l procs=nn resource request in the above example.  This request allows the batch system to spread the job out on multiple nodes. While appropriate on many WestGrid systems, there are others (Breezy, Lattice and Parallel) where the use of procs is specifically discouraged.  As noted in the QuickStart Guides for those systems, instead of -l procs=nn, one should use -l nodes=x:ppn=y, where y is usually chosen to match the number of cores per node (8, 12 or 24, depending on the system).
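For example, on a system with 12-core nodes, a request for 24 cores spread over two whole nodes would look like the following (the numbers are illustrative; match ppn to the core count of the target system):

#PBS -l nodes=2:ppn=12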

Note that mpiexec will start the same number of MPI processes as cores assigned to your job, unless you override this with an additional argument.
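If you do need a different number of processes, mpiexec implementations generally accept a -n argument specifying how many to start. For example, to launch just two MPI processes:

mpiexec -n 2 ./pn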

In the above script, the form "./pn" is used to ensure that the program can be run even if "." (the current directory) is not in your command PATH.

Source code for the pn sample program itself, pn.f, is available here.

Debugging

Introduction

One of the strengths of the WestGrid environment for debugging is the wide variety of compilers and debuggers available. Some users get their code running on one system and then tend to blame the compiler if it doesn't behave the same way on a different platform. However, the compilers get a lot more careful scrutiny than the typical user's code. It is almost always hidden errors such as non-standard usage, uninitialized variables, array bounds problems, type mismatches, or a failure to understand the numerical limitations of the researcher's program, rather than compiler problems, that lead to inconsistency in behaviour across systems. So, you are encouraged to try your code on more than one of the WestGrid systems and with more than one type of compiler.

Here are some other general guidelines for code development and debugging.

  • Keep a baseline version of your code, along with both input and output in a separate directory from the development version on which you are currently working.
  • Make changes in relatively small steps.
  • Use diff, xdiff, or xxdiff to compare versions of files.
  • Use IMPLICIT NONE in Fortran.
  • Depending on the language, make liberal use of WRITE, printf or cout statements. Saving debugging data to one or more files may encourage you to keep the output statements in your code, where they may come in handy a little later on. Comment out the debugging statements rather than deleting them.
  • Use visualization tools to look at more than just a small sample of the output.
  • Don't be shy about showing your code to someone else, such as the WestGrid support analysts. Write to support@westgrid.ca .

Regardless of the system being used, adding a -g flag to the compilation is a minimum prerequisite for using a debugger.
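As a sketch of the usual workflow (gdb is shown because it is widely available on Linux systems; the debuggers installed on a particular WestGrid system may differ), compile with -g and then run the program under the debugger:

icc -g -O0 pi.c -o pi        # include debugging symbols, turn off optimization
gdb ./pi                     # start the debugger
(gdb) run                    # run the program until it crashes or exits
(gdb) backtrace              # show the call stack at the point of failure

Within gdb, run starts the program and backtrace shows where it stopped, which is often enough to locate the offending source line.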

See the section above on serial code development for links to specific information about the debuggers available on the various WestGrid systems.

Improving Performance

Introduction

For many systems, one uses a -pg compiler flag to instrument code to collect statistics for later analysis with the gprof utility.
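As a sketch of the typical gprof cycle (gcc is used here since it certainly supports -pg, and the file name is borrowed from the serial C example; check the man page of your chosen compiler for its equivalent profiling flag):

gcc -pg -O2 pi.c -lm -o pi          # build an instrumented executable
./pi                                # run normally; this writes a gmon.out file
gprof ./pi gmon.out > profile.txt   # turn gmon.out into a readable report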

For profiling MPI code, one can use the MPE profiling libraries.

For a discussion of performance profiling for OpenMP programs using the ssrun command on nexus, see the program development tools section of the course notes developed at the University of Alberta. The ssrun and perfex commands for SGI systems are also discussed in more recent workshop notes.

