GROMACS on WestGrid Systems

Introduction

GROMACS software

GROMACS is a molecular dynamics program (along with attendant utilities) designed for simulations of large molecules, such as proteins. This WestGrid GROMACS web page includes instructions on how to submit GROMACS jobs but is not a tutorial on the GROMACS software suite itself. Visit www.gromacs.org for a detailed description of program features and instructions on such things as input file structure and command line options.

Although many researchers choose to maintain their own versions of GROMACS, the software has also been installed in publicly accessible directories on several WestGrid systems. Both serial and parallel versions of the main GROMACS executable, mdrun, are available. On some systems both single and double precision versions are available. The most common usage appears to be single-precision parallel runs, so the example scripts here illustrate that case.

Steps in a GROMACS analysis

After preparation of the input files (conf.gro for coordinates, topol.top for topology and grompp.mdp for the main parameters), the basic steps in using GROMACS are to run the preprocessor grompp and then run mdrun, the main program, which performs the energy minimization or molecular dynamics calculation. The simulation may be extended with subsequent mdrun steps, perhaps using tpbconv to prepare the input from the files generated by a previously terminated run. When using a parallel version of mdrun, an important consideration is that the number of processors used for mdrun must match the number specified in the grompp step. One way to manage the sequence of steps is to use the scripts discussed in the job chaining section below.
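The preprocess-then-run sequence can be sketched as a small shell helper. This is only an illustration for GROMACS 3.x, where grompp and mdrun each accept a -np option. The function name gromacs3_commands, the output file name topol.tpr and the use of mpiexec -n as the launcher are assumptions, not part of the WestGrid installation. The commands are printed rather than run.

```shell
#!/bin/bash
# Print the grompp and mdrun commands for a GROMACS 3.x parallel run.
# gromacs3_commands is a hypothetical helper: it only echoes the commands,
# using one variable so the processor count is the same in both steps.
gromacs3_commands() {
    local nprocs="$1"
    # Preprocess: combine parameters, coordinates and topology into a run input file.
    echo "grompp -np ${nprocs} -f grompp.mdp -c conf.gro -p topol.top -o topol.tpr"
    # Run: the same count goes to the MPI launcher (assumed) and to mdrun.
    echo "mpiexec -n ${nprocs} mdrun_mpi -np ${nprocs} -s topol.tpr"
}

gromacs3_commands 4
```

Taking the processor count from a single variable is one simple way to keep the grompp and mdrun steps consistent.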

Other programs in the GROMACS suite are used for analyzing the results, but these are not discussed here. Additional visualization programs may also be needed. If you would like additional software to be installed, please contact WestGrid support at support@westgrid.ca.

Parallel performance issues

The number of processors that can be effectively used depends on which WestGrid system is being used and on the GROMACS options chosen. For example, performance using the PME (particle mesh Ewald) treatment of long-range electrostatic interactions may not scale as well with the number of processors as other options do. If the PME method is appropriate for the molecular system you are studying, go ahead and use it, but adjust the number of processors requested to maintain a reasonable level of parallel efficiency. What constitutes "reasonable" is a point for discussion.
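One common way to put a number on parallel efficiency is to compare the time on one processor with the time on N processors: efficiency = T(1) / (N * T(N)). The sketch below computes this with awk. The function name parallel_efficiency and the timings are made up for illustration, and this is not a WestGrid-defined threshold or tool.

```shell
#!/bin/bash
# Parallel efficiency = T(1) / (N * T(N)), where T(n) is the wall-clock time
# on n processors. parallel_efficiency is a hypothetical helper; awk is used
# because the shell itself only does integer arithmetic.
parallel_efficiency() {
    local t_serial="$1" nprocs="$2" t_parallel="$3"
    awk -v t1="$t_serial" -v n="$nprocs" -v tn="$t_parallel" \
        'BEGIN { printf "%.2f\n", t1 / (n * tn) }'
}

# Made-up timings: 1000 s on 1 processor, 160 s on 8 processors.
parallel_efficiency 1000 8 160
```

With these made-up timings the result is 0.78, so roughly 78% of the requested processor time is doing useful work. Repeating the measurement for a few processor counts on a short test run shows where the efficiency starts to drop.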

Parallelization has improved with GROMACS 4, which adds the option of dedicating processors to PME calculations. Users of GROMACS 3 should study these changes in the GROMACS manual. Also, for GROMACS 4.0.7 (available on Checkers) and later releases, a tuning utility called g_tune_pme can be used to optimize parallel performance on a job-by-job basis.
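As a rough illustration of dedicating processors to PME, GROMACS 4.x mdrun accepts a -npme option giving the number of PME-only processes. The helper name pme_split_command and the file name topol.tpr below are assumptions, and the command is only printed. The best split depends on the molecular system and the cluster, which is what g_tune_pme is meant to help determine.

```shell
#!/bin/bash
# Print an mdrun command line that reserves some MPI processes for PME
# (GROMACS 4.x). pme_split_command is a hypothetical helper; -npme is the
# mdrun option that sets the number of PME-only processes.
# The command is only echoed, not run.
pme_split_command() {
    local npme="$1" tpr="$2"
    echo "mpiexec mdrun_mpi -npme ${npme} -s ${tpr}"
}

# Example: dedicate 2 of the processes in the job to PME.
pme_split_command 2 topol.tpr
```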

Long simulations

Long simulations will have to be broken up into several jobs. Consequently, users will have to be familiar with restarting runs using files written by a previous run. A strategy for chaining jobs together to perform a long simulation is presented below.

Batch job submission

As with other jobs on WestGrid systems, anything more than a small debugging run should be submitted for batch scheduling. This is done by embedding the GROMACS commands in a script that is submitted using the qsub command. Details of scheduling and job management are explained on the Running Jobs page, and examples of job submission are shown in the following sections for some WestGrid systems.

Running GROMACS on WestGrid Systems

GROMACS is run in basically the same way on each WestGrid system on which it has been installed, with the main mdrun program run as an MPI parallel code. (See the Running Jobs page and QuickStart guides for more information.) However, due to differences in such factors as installation location, versions available, walltime limits and processor and network performance, separate instruction pages have been prepared for some of the sites. Click on the system name in the first column of the table below for comments and example scripts for running GROMACS on the corresponding system. For the older Matrix and Glacier clusters, there are separate pages for these system-specific instructions. To get some basic documentation in place, some short comments are given below for the newer Checkers cluster.

 

System          GROMACS installation directory
Breezy          (Not available)
Bugaboo         The default version (4.5.4) is already in the default PATH (/usr/local/bin).
                Use module load gromacs/<version> for older versions 4.5.1 and 4.0.7.
Checkers        For version 4.5.4: module load gromacs/4.5.4 or module load gromacs
                Other versions available through modules are 4.0.5, 4.0.7, 4.5.1 and 4.5.3.
Glacier         /global/software/gromacs-3.2/intel-fftw-2.1/i686-pc-linux-gnu/bin
                /global/software/gromacs-3.2/gcc-fftw-2.1-single_p/i686-pc-linux-gnu/bin
                /global/software/gromacs-3.2/single_p_pgi-5.2-fftw-2.1/i686-pc-linux-gnu/bin
                /global/software/gromacs-3.3/gcc-fftw-3.1-single
                /global/software/gromacs-3.3/gcc-fftw-3.1-double
Grex            (Not available)
Hermes/Nestor   /global/software/gromacs-4.0.7-single
                /global/software/gromacs-4.0.7-double
Lattice         See /global/software/gromacs
Orcinus         See /global/software/gromacs or use module avail to see available versions
                (use module load gromacs/<version>).
Snowpatch       /usr/local/bin

 

Please note that on some of the systems, GROMACS has been built with the option to use a suffix to distinguish serial from MPI versions and single from double precision versions. Look in the directory corresponding to the version you want to use to see what convention has been used on the particular system you are using. On several systems mdrun and mdrun_mpi are used for the single precision serial and MPI versions, respectively.  The double precision serial and MPI versions on most systems are mdrun_d and mdrun_mpi_d, respectively.
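The suffix convention described above can be written as a short helper that builds the executable name from the desired build type. The function name mdrun_name and its argument values are illustrative assumptions. Since not every system follows this convention, check the installation directory before relying on it.

```shell
#!/bin/bash
# Build the mdrun executable name from the naming convention described above:
# "_mpi" marks the MPI build and "_d" marks double precision.
# mdrun_name is a hypothetical helper, not something installed on WestGrid.
mdrun_name() {
    local parallel="$1" precision="$2"
    local name="mdrun"
    if [ "$parallel" = "mpi" ]; then
        name="${name}_mpi"
    fi
    if [ "$precision" = "double" ]; then
        name="${name}_d"
    fi
    echo "$name"
}

mdrun_name mpi single      # single precision, MPI
mdrun_name serial double   # double precision, serial
```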

Running GROMACS on Checkers

The sample batch job script below illustrates some of the key features necessary for running GROMACS on Checkers.

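#!/bin/bash
#PBS -S /bin/bash
# Request 2 processors and 12 hours of walltime.
#PBS -l procs=2
#PBS -l walltime=12:00:00

# Start in the directory from which the job was submitted.
cd "$PBS_O_WORKDIR"

# Set up the default GROMACS environment.
module load gromacs

# Run the single-precision MPI version of mdrun.
mpiexec mdrun_mpi -v -s em0a -o em1 -c em1 -g emlog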

In particular, note the use of the module command to set up the GROMACS environment.  Use module show gromacs to see the changes to your environment that are made by the module command and which version of GROMACS is being configured. For more information about modules, click here.

On Checkers the mpiexec command does not require an argument telling it how many processors to use. It will determine this information automatically from the TORQUE (PBS) environment in which it is running.
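If you want to confirm how many processors the job was given (for example, to record it in a log), TORQUE writes one line per assigned processor to the file named by the PBS_NODEFILE environment variable. The sketch below counts those lines. The function name count_job_procs and the fallback value of 1 outside a batch job are assumptions made for illustration.

```shell
#!/bin/bash
# Report how many processors TORQUE assigned to the job.
# TORQUE lists one line per assigned processor in the file named by
# $PBS_NODEFILE. count_job_procs is a hypothetical helper; outside a batch
# job the variable is unset, so the function falls back to 1.
count_job_procs() {
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        # tr strips the padding that some wc implementations add.
        wc -l < "$PBS_NODEFILE" | tr -d ' '
    else
        echo 1
    fi
}

echo "Processors available to this job: $(count_job_procs)"
```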

Note the mdrun suffix _mpi to indicate that it is an MPI version that is being run.  By default it is the single precision mdrun.  The double precision versions are mdrun_d and mdrun_mpi_d.

A tuning aid for optimizing GROMACS parameters for improved parallel performance, g_tune_pme, has been installed for use with the GROMACS 4.0.7 installation on Checkers. See http://www.mpibpc.mpg.de/home/grubmueller/projects/MethodAdvancements/Gromacs/Download/PosterHuenfeld2009.pdf for a description of how this utility works.

Chaining GROMACS jobs

Note: The instructions here apply only to GROMACS 3.x and are largely obsolete. The basic idea of using the -W depend= option on the qsub command line still applies, but the scripts described are not directly applicable to GROMACS 4.x. For information about extending GROMACS jobs using checkpoint files from previous runs, see the pages for extending completed and incomplete runs on the GROMACS web site.
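For GROMACS 4.x, the dependency idea can be sketched without the chain scripts: submit the same job script several times, making each job wait for the previous one to finish successfully. This is a minimal sketch, not the WestGrid chain script. The names submit_chain, QSUB and gromacs_job.pbs are hypothetical, and it assumes that qsub prints the new job's identifier on standard output, as TORQUE does.

```shell
#!/bin/bash
# Submit the same job script several times so that each job starts only after
# the previous one has finished successfully (qsub -W depend=afterok:<jobid>).
# submit_chain, QSUB and the script name are hypothetical; set QSUB to a
# different command to try the loop without a scheduler.
submit_chain() {
    local script="$1" njobs="$2"
    local qsub_cmd="${QSUB:-qsub}"
    local jobid="" i
    for (( i = 1; i <= njobs; i++ )); do
        if [ -z "$jobid" ]; then
            # First job in the chain has no dependency.
            jobid="$($qsub_cmd "$script")" || return 1
        else
            # Later jobs wait for the previous job to exit without error.
            jobid="$($qsub_cmd -W "depend=afterok:${jobid}" "$script")" || return 1
        fi
        echo "job ${i}: ${jobid}"
    done
}

# Example (inside the job script, GROMACS 4.x can continue from a checkpoint
# with something like: mpiexec mdrun_mpi -s topol.tpr -cpi state.cpt):
# submit_chain gromacs_job.pbs 3
```

Using afterok rather than a plain ordering means the chain stops if one segment fails, which avoids running later segments from a bad state.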

Walltime limits on some WestGrid systems are as short as one day, so long simulations will often have to be broken up into several shorter jobs. Consequently, users will have to be familiar with restarting runs using files written by a previous run. The GROMACS tpbconv utility is used between such runs to prepare output from one run for use in the next. Some scripts have been prepared to show how the process of doing this can be streamlined.

A master script, chain, is used to call other scripts to run grompp, mdrun and tpbconv as needed. System-specific versions of all these scripts are documented on the pages for running GROMACS on Matrix and Glacier. To distinguish among the scripts, the system name is added to the script name: on Matrix use chain_matrix, on Glacier use chain_glacier, and so on.

The general form of the chain command is:

chain_system_name run_base_name start_sequence_number end_sequence_number picoseconds_per_run number_of_processors

Often, defaults can be used for most of the arguments.

On Matrix one might use:

chain_matrix dppc 0 5

to set up a sequence of five mdrun jobs to run one after another. Later on, if the simulation is to be extended by three more 10-picosecond steps (6, 7 and 8), say, one could use:

chain_matrix dppc 6 8 10

See /usr/apps/examples/gromacs/chain/README on Matrix and /global/software/gromacs-3.3/job_examples/chain/README on Glacier for more instructions. To run an example, copy all the files from the chain directory to a directory of your own. Then, run the chain_matrix, chain_glacier, ... script, depending on which system you are using.

Please note that the scripts as currently written have to be in a directory that is on your command PATH to work (or you could add "." to your PATH).

 


Updated 2011-11-09.