Programming on the WestGrid Nexus System
Table of Contents
- Introduction - Scope of documentation and links to other programming references.
- Compiling Serial Code - Basic compilation instructions.
- Running Serial Code - Running interactive and batch jobs.
- Parallel Programming - Programming with MPI and OpenMP.
- Debugging - Overview of debuggers available on the Nexus system.
- Linking with Installed Libraries - Using optimized linear algebra, FFT or other libraries.
- Optimization - Tips and tools for improving performance of your code.
Introduction
Documentation
This page deals with compilation, debugging and optimization of serial and parallel programs on the WestGrid Nexus system, including the arcturus, helios, australis, borealis and aurora machines. If you are new to programming in a UNIX/Linux HPC environment, please start with the main WestGrid programming page for a more general introduction. On that page you will also find links to details about programming on other WestGrid machines.
More advanced programmers may want to refer to vendor supplied documentation:
- For SGI MIPS compiler and debugger documentation, start with the MIPS Compiling and Performance Tuning Guide.
References for specific languages are the MIPSpro C++ Programmer's Guide and Fortran 77 Programmer's Guide.
For debugging help see the ProDev WorkShop: Debugger User's Guide and dbx User's Guide.
Tools and advice on improving performance are described in the Origin 2000 and Onyx2 Performance Tuning and Optimization Guide and the SpeedShop User's Guide.
- For GCC (GNU Compiler Collection) documentation see gcc.gnu.org/onlinedocs/.
For compiler options not presented here, details are available through the UNIX man command: man cc, man CC, man g++, man f77, man f90, etc.
Hardware Considerations
The Nexus system consists of an 8-processor login node (nexus), a 256-processor node known as arcturus, and some older machines: helios (32 CPUs), australis (60 CPUs), borealis (62 CPUs) and aurora (32 CPUs). The system is intended for parallel jobs or large-memory serial jobs. OpenMP or MPI parallel programming techniques may be used.
Compiler Recommendation
See the programming table on the WestGrid software page for a comparison of the compilers available on the various WestGrid computers. The table also lists the specific version numbers of the compilers on the Nexus system.
Both SGI and GCC compilers are available on the Nexus system. Our expectation is that the SGI MIPS compilers will produce faster code, but feedback to support@westgrid.ca would be appreciated if you experiment with both compilers.
Note the use of the -64 flag in the examples with the SGI MIPS compilers below. This flag requests a 64-bit object format.
When using the GCC compilers, recommended options are -mabi=64 -O3, with -mabi=64 being required to generate 64-bit executables.
Compiling Serial Code
Introduction
Two compilation examples are shown for each language below. The first illustrates compiler flags to use when developing new code or debugging; the second shows optimization options that should be used for production code. It is advisable to check that the non-optimized and production builds give similar numerical results. If the answers are sensitive to the changes introduced by the optimization flags, this may indicate a problem with the stability of the algorithm you are using.
Fortran
Although g77 is available, better results are generally expected with the SGI MIPS Fortran compiler, which can be called as f77 or f90 depending on the desired default language level.
There is also an f95 command, which is the NAGWare Fortran 95 compiler (licensed for U of A researchers only or for those external users who have an equivalent site license at their home institution - check with WestGrid support with questions about eligibility).
Example with debugging options: (See man debug_group on nexus for an explanation of the options.)
Example with an optimization option:
C
Although gcc is available, better results are expected with the SGI MIPS compiler (cc).
C language files are expected to have a .c suffix.
Example with debugging options:
Example with an optimization option:
C++
Although g++ is available, better results are expected with the SGI MIPS compiler (CC).
The CC compiler accepts C++ source code files ending in .c, .C, .i, .c++, .C++, .cc, .cxx, .CXX, .CC, .cpp, and .CPP.
Example with debugging options:
Example with an optimization option:
Running Serial Code
Interactive Runs
The nexus login machine may be used for short interactive runs (less than an hour, using at most 2 CPUs) during program development.
For longer runs the regular production batch queue should be used, as described in the section on batch jobs below.
To run a compiled program interactively through an ssh window on the login node, just type its name with any required arguments at the UNIX shell prompt. File redirection can be added if desired. For example, to run a program named diffuse, with input taken from diffuse.in and output (that normally goes to the screen) sent to a file diffuse.out, type:
./diffuse < diffuse.in > diffuse.out
Batch Runs
Production runs or long test jobs are submitted to a batch queue, as described elsewhere.
For serial jobs requiring a large amount of memory, request a number of processors corresponding to the memory required, using the ncpus resource parameter. The number of CPUs required per GB of RAM depends on the machine. See the description of job queue limits for a table showing the hardware limits of the various machines. For example, helios has 16 GB of RAM shared by 32 processors (0.5 GB/processor), whereas arcturus has 256 GB of RAM shared by 256 processors (1 GB/processor). So, if a program, diffuse, requiring 16 GB of RAM is run on arcturus, one should ask for 16 processors, but if it is run on helios, all 32 processors are required. A job script for submitting the job to helios should be similar to the following:
#PBS -S /usr/bin/bash
#PBS -q helios
#PBS -l ncpus=32
# Script for running large memory serial job, diffuse, on helios
cd $PBS_O_WORKDIR
echo "Current working directory is `pwd`"
echo "Starting run at: `date`"
./diffuse
echo "Job finished at: `date`"
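The processor-count arithmetic described before the script can be sketched as a small shell function. This is only an illustration: cpus_for_memory is a hypothetical helper, not a WestGrid tool, and it assumes whole-GB values and that memory is shared evenly across a machine's processors, as in the helios and arcturus figures quoted above.

```shell
# Hypothetical helper (not a WestGrid tool): number of CPUs to request so that
# a serial job's memory requirement is covered.
# Arguments: memory needed (GB), machine RAM (GB), machine CPU count.
cpus_for_memory() {
    local need_gb=$1 node_ram_gb=$2 node_cpus=$3
    # ceil(need_gb * node_cpus / node_ram_gb) using integer arithmetic
    echo $(( (need_gb * node_cpus + node_ram_gb - 1) / node_ram_gb ))
}

cpus_for_memory 16 16 32     # helios: 16 GB RAM shared by 32 CPUs
cpus_for_memory 16 256 256   # arcturus: 256 GB RAM shared by 256 CPUs
```

With the figures from the text, the two calls print 32 and 16, matching the ncpus values discussed above. The queue limits table remains the authority on what each machine actually allows.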
It is recommended that you record the performance characteristics of your code for a series of test runs so that you can estimate the run time (walltime) of a long job more accurately. Similarly, you will need to know how your program's memory requirements scale as you increase the problem size. This kind of information is used during the batch job submission to ensure that your program is run on a node with appropriate hardware and runtime limits.
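One possible way to keep such a record is a small wrapper that logs the elapsed time of each test run. The sketch below is an assumption-laden illustration: timed_run and the RUNLOG variable are hypothetical names, and it assumes the date command on your system supports the +%s format.

```shell
# Hypothetical helper: run a command, then append its elapsed wall time
# (whole seconds) and the command line to a log file.
# RUNLOG is an assumed variable name; the log defaults to timings.log.
timed_run() {
    local log=${RUNLOG:-timings.log}
    local start end status
    start=$(date +%s)          # assumes date supports +%s
    "$@"
    status=$?
    end=$(date +%s)
    echo "$(( end - start )) s: $*" >> "$log"
    return $status
}

# Example call with the diffuse program from the text (commented out here
# because it needs the real executable):
# timed_run ./diffuse
```

Collecting these lines for several problem sizes gives a rough basis for the walltime estimate requested at job submission. Memory use is not captured by this sketch and would have to be measured separately.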
Parallel Programming
Introduction
OpenMP or MPI programs can be run on the shared-memory computers available through nexus.westgrid.ca on up to 256 processors, using up to 256 GB of RAM. The number of processors on each computer, memory limits and the names of the queues to which jobs can be submitted are summarized in a table elsewhere. Local policy dictates a minimum number of processors allowed for each queue. Jobs that do not scale well to large numbers of processors should not be run on the larger machines. What is considered reasonable depends on factors such as system load, so please feel free to write to support@westgrid.ca to discuss this.
The Nexus environment can be used for interactive development of parallel programs by running them directly on the login machine, nexus.westgrid.ca. Longer runs can be submitted to job queues for nexus or the other SGI computers.
Basic commands for compiling MPI or OpenMP-based parallel programs are given in the following sections.
Message Passing Interface (MPI)
Compiling
See the serial code section for examples of compiler options for development and production code. For parallel MPI code just add a flag, -lmpi.
Some examples:
cc -O3 -64 pi.c -lm -lmpi -o pi
CC -O3 -64 pi.C -lm -lmpi -lmpi++ -o pi
Running
If your program allows, compare the results with a single processor to those from a two-processor run. Gradually increase the number of processors to see how performance scales. After you have learned the characteristics of your code, please do not run with more processors than can be efficiently used, as the system is typically very busy.
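When comparing timings from such a series of runs, speedup (one-processor time divided by N-processor time) and efficiency (speedup divided by N) are convenient measures. The following is a minimal sketch; scaling is a hypothetical helper name and the numbers are illustrative only, not measurements from Nexus.

```shell
# Hypothetical helper: parallel speedup and efficiency from measured run times.
# Arguments: one-processor time, N-processor time (same units), N.
scaling() {
    awk -v t1="$1" -v tn="$2" -v n="$3" 'BEGIN {
        speedup = t1 / tn
        printf "speedup %.2f, efficiency %.2f\n", speedup, speedup / n
    }'
}

# Illustrative numbers only: 100 s on 1 processor, 30 s on 4 processors.
scaling 100 30 4
```

For these example values the function prints "speedup 3.33, efficiency 0.83". An efficiency that drops well below 1 as processors are added suggests the extra processors are not being used effectively.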
MPI jobs are run by submitting a script to the TORQUE batch job handling system with the qsub command. Here is an example of a script to run an MPI program, pn, using 2 processors on nexus. If the script file is named pn.pbs, submit the job with:
qsub pn.pbs
The script itself is:
#PBS -S /usr/bin/bash
#PBS -q nexus
#PBS -l ncpus=2
# Script for running MPI sample program pn on nexus
cd $PBS_O_WORKDIR
echo "Current working directory is `pwd`"
# Note: NCPUS is a variable set by TORQUE to match the ncpus request above.
echo "Starting run at: `date`"
mpirun -np $NCPUS ./pn
echo "Job finished at: `date`"
The form "./pn" is used to ensure that the program can be run even if "." (the current directory) is not in your PATH.
Source code for the pn program itself is pn.f.
OpenMP
Compiling
To compile a program containing OpenMP directives, add a -mp flag to the compilation.
Some examples:
cc -mp -O3 -64 pi.c -lm -o pi
CC -mp -O3 -64 pi.C -lm -o pi
Running
See the documentation on job submission for details on queues and the syntax for requesting nodes.
For OpenMP jobs, the environment variable OMP_NUM_THREADS should be set to the number of processors assigned to your job by TORQUE when submitting batch jobs with qsub. This is shown in the following script:
#PBS -S /usr/bin/bash
#PBS -q nexus
#PBS -l ncpus=2
# Script for running OpenMP sample program pi on nexus
cd $PBS_O_WORKDIR
echo "Current working directory is `pwd`"
# Note, NCPUS is a variable set by TORQUE to match the ncpus requested above.
export OMP_NUM_THREADS=$NCPUS
echo "Starting run at: `date`"
./pi
echo "Job finished at: `date`"
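If the same commands are also used outside of a batch job, for example in a short interactive test, NCPUS may not be set. A hedged variant of the export line with a fallback is sketched here; set_omp_threads is a hypothetical function name, and the default of 1 thread is an assumption chosen for illustration, not a site policy.

```shell
# Use the TORQUE-provided NCPUS when it is set; otherwise fall back to 1 thread.
# The fallback value of 1 is an assumption for illustration, not a site policy.
set_omp_threads() {
    export OMP_NUM_THREADS="${NCPUS:-1}"
}

set_omp_threads
echo "Running with $OMP_NUM_THREADS OpenMP thread(s)"
```

Inside a batch job this behaves like the export line in the script above, since NCPUS is set there to match the ncpus request.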
Debugging
Introduction
The dbx debugger is available on the Nexus system for use from character-based terminals. There is also a graphical debugger called cvd which can be accessed if you have an X Window display server running.
Manuals for these debuggers are the dbx User's Guide and ProDev WorkShop: Debugger User's Guide.
Regardless of the debugger being used, add a -g flag to the compilation command line.
For general comments on debugging, see the main WestGrid programming page.
Please write to support@westgrid.ca for help with debugging.
Linking with Installed Libraries
Introduction
See the Mathematical Libraries and Applications section of the WestGrid Software page for a description of some of the optimized linear algebra and Fourier transform libraries that can be linked with your code.
Improving Performance
Introduction
We encourage you to have your code reviewed by a WestGrid analyst. Please write to support@westgrid.ca.
For a discussion of performance profiling for OpenMP programs using the ssrun command on nexus, see the program development tools section of the course notes developed at the University of Alberta. The ssrun and perfex commands for SGI systems are also discussed in more recent workshop notes.
Updated 2007-09-13
