GROMACS on WestGrid Systems

Introduction

GROMACS software

GROMACS is a molecular dynamics program (along with attendant utilities) designed for simulations of large molecules, such as proteins. This WestGrid GROMACS web page explains how to submit GROMACS jobs but is not a tutorial on the GROMACS software suite itself. Visit www.gromacs.org for a detailed description of program features and instructions on such things as input file structure and command line options.

Although many researchers choose to maintain their own versions of GROMACS, the software has also been installed in publicly accessible directories on several WestGrid systems. Both serial and parallel versions of the main GROMACS executable, mdrun, are available. On some systems both single and double precision versions are available. The most common usage appears to be single-precision parallel runs, so the example scripts here illustrate that case.

Steps in a GROMACS analysis

After preparation of the input files (conf.gro for coordinates, topol.top for topology and grompp.mdp for the main parameters), the basic steps in using GROMACS are to run the preprocessor grompp and then run the main simulation program mdrun, which performs both energy minimization and molecular dynamics. The simulation may be extended with subsequent mdrun steps, perhaps using tpbconv to prepare the input from the files generated by a previously terminated run. When using a parallel version of mdrun, an important consideration is that the number of processors used for mdrun must match the number specified in the grompp step. One way to manage the sequence of steps is to use the scripts discussed in the job chaining section below.
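The sequence above can be sketched as a short shell script. This is a minimal sketch rather than a WestGrid-supplied script. It assumes GROMACS 3.x conventions, in which both grompp and mdrun accept an -np option. The output name topol.tpr, the default NP value and the GMX_DRY_RUN switch are illustrative assumptions. By default the script only prints the commands it would run.

```shell
#!/bin/bash
# Sketch of the grompp -> mdrun sequence (assumes GROMACS 3.x-style -np options).
NP=${NP:-4}                       # processor count; must be the same in both steps
GMX_DRY_RUN=${GMX_DRY_RUN:-1}     # hypothetical switch: 1 = print commands only

run() {
    # Print the command; execute it only when dry-run mode is switched off.
    echo "$*"
    if [ "$GMX_DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# Preprocess coordinates, topology and parameters into a run input file.
run grompp -np "$NP" -f grompp.mdp -c conf.gro -p topol.top -o topol.tpr

# Run the simulation on the same number of processors.
run mpiexec -n "$NP" mdrun -np "$NP" -s topol.tpr
```

Setting GMX_DRY_RUN=0 would execute the commands instead of printing them. With GROMACS 4 the -np options are no longer used, so check the manual for the version you are running before adapting this.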

Other programs in the GROMACS suite are used for analyzing the results, but these are not discussed here. Additional visualization programs may also be needed. If you would like additional software to be installed, please contact WestGrid support at support@westgrid.ca.

Parallel performance issues

The number of processors that can be used effectively depends on which WestGrid system is being used and on the GROMACS options chosen. For example, runs using the PME (particle mesh Ewald) treatment of long-range electrostatic interactions may not scale as well with the number of processors as runs using other options. If the PME method is appropriate for the molecular system you are studying, go ahead and use it, but adjust the number of processors requested to maintain a reasonable level of parallel efficiency. What constitutes "reasonable" is a point for discussion.

Parallelization has improved with GROMACS 4, which adds the option of dedicating processors to PME calculations. Users of GROMACS 3 should study these changes in the GROMACS manual. Also, GROMACS 4.0.7 (available on Checkers) and later releases include a tuning utility called g_tune_pme that can be used to optimize parallel performance on a job-by-job basis.

Long simulations

Long simulations will have to be broken up into several jobs. Consequently, users will have to be familiar with restarting runs using files written by a previous run. A strategy for chaining jobs together to perform a long simulation is presented below.

Batch job submission

Like other jobs on WestGrid systems, GROMACS jobs that are more than small debugging runs should be submitted for batch scheduling. This is done by embedding the GROMACS commands in a script that is submitted using the qsub command. Details of scheduling and job management are explained on the Running Jobs page, and examples of job submission are shown in the following sections for some WestGrid systems.

Running GROMACS on WestGrid Systems

GROMACS is run in basically the same way on each WestGrid system on which it has been installed, with the main mdrun program run as an MPI parallel code. (See the Running Jobs page and QuickStart guides for more information.) However, due to differences in such factors as installation location, versions available, walltime limits, and processor and network performance, separate instruction pages have been prepared for some of the sites. Click on the system name in the first column of the table below for comments and example scripts for running GROMACS on the corresponding system. For the older Matrix and Glacier clusters, there are separate pages for these system-specific instructions. To get some basic documentation in place, some short comments are given below for the newer Checkers cluster.

 

System      GROMACS installation directory
---------   ------------------------------------------------------------------------------
Bugaboo     /usr/local/bin
Checkers    /global/scratch/software/gromacs/gromacs-4.0.7/bin (use module load gromacs)
            /global/scratch/software/gromacs/gromacs-4.0.5/bin (use module load gromacs/4.0.5)
Cortex      (Not available)
Glacier     /global/software/gromacs-3.2/intel-fftw-2.1/i686-pc-linux-gnu/bin
            /global/software/gromacs-3.2/gcc-fftw-2.1-single_p/i686-pc-linux-gnu/bin
            /global/software/gromacs-3.2/single_p_pgi-5.2-fftw-2.1/i686-pc-linux-gnu/bin
            /global/software/gromacs-3.3/gcc-fftw-3.1-single
            /global/software/gromacs-3.3/gcc-fftw-3.1-double
Matrix      /usr/apps/gromacs405/bin
            /usr/apps/gromacs332/bin
Nexus       /usr/global/gromacs-3.1.4
            /usr/global/gromacs-3.2
Orcinus     /global/software/gromacs/4.0.5/intel/bin (use module load gromacs)
Robson      /usr/local/gromacs-3.2.1/bin
            /usr/local/gromacs-3.3.2/bin
Snowpatch   /usr/local/bin

 

Please note that on some of the systems, GROMACS has been built with the option to use a suffix to distinguish serial from MPI versions and single from double precision versions. Look in the directory corresponding to the version you want to use to see what convention has been used on the particular system you are using.
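One way to check the naming convention is to list the mdrun executables in the installation directory. The following is a minimal sketch. The function name list_mdrun_variants is illustrative, and the sketch assumes the executables all have names beginning with mdrun.

```shell
#!/bin/bash
# list_mdrun_variants DIR: print the names of all files in DIR whose names start with "mdrun".
list_mdrun_variants() {
    local f
    for f in "$1"/mdrun*; do
        # Skip the unexpanded pattern when nothing matches.
        if [ -e "$f" ]; then
            basename "$f"
        fi
    done
}
```

On Checkers, for example, one could run list_mdrun_variants /global/scratch/software/gromacs/gromacs-4.0.7/bin and read the serial or MPI and precision suffixes from the names printed.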

Running GROMACS on Checkers

The sample batch job script below illustrates some of the key features necessary for running GROMACS on Checkers.

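#!/bin/bash
#PBS -S /bin/bash
# Request 2 processors and 12 hours of walltime.
#PBS -l procs=2
#PBS -l walltime=12:00:00

# Start in the directory from which the job was submitted.
cd "$PBS_O_WORKDIR"

# Set up the environment for the default GROMACS version.
module load gromacs

# Run the single-precision Open MPI build of mdrun.
mpiexec mdrun_s_ompi -v -s em0a -o em1 -c em1 -g emlog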

In particular, note the use of the module command to set up the GROMACS environment. Use module show gromacs to see the changes that the module command makes to your environment and which version of GROMACS is being configured. For more information about modules, see the WestGrid documentation on modules.

On Checkers the mpiexec command does not require an argument telling it how many processors to use. It will determine this information automatically from the TORQUE (PBS) environment in which it is running.
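To confirm how many processors TORQUE has assigned, a job script can count the lines in the node file. This is a minimal sketch, assuming the standard TORQUE variable PBS_NODEFILE is set inside the job and lists one line per assigned processor. The function name count_procs is illustrative.

```shell
#!/bin/bash
# count_procs FILE: print the number of processor slots listed in a TORQUE node file
# (one line per assigned processor).
count_procs() {
    local n
    n=$(wc -l < "$1")
    echo $((n))        # arithmetic expansion strips any padding from the wc output
}

# Inside a batch job, report the allocation before starting mdrun.
if [ -n "${PBS_NODEFILE:-}" ]; then
    echo "Running on $(count_procs "$PBS_NODEFILE") processors"
fi
```

Outside a batch job PBS_NODEFILE is normally unset, so the script prints nothing.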

Note the suffixes on the mdrun executable name: _s indicates that a single precision version is being run and _ompi indicates that it is an Open MPI version.

A tuning aid to optimize GROMACS parameters for improved parallel performance, g_tune_pme, has been installed for use with the default (4.0.7) Checkers GROMACS installation. See http://www.mpibpc.mpg.de/home/grubmueller/projects/MethodAdvancements/Gromacs/Download/PosterHuenfeld2009.pdf for a description of how this utility works.

Chaining GROMACS jobs

Walltime limits on some WestGrid systems are as short as one day, so long simulations will often have to be broken up into several shorter jobs. Consequently, users will have to be familiar with restarting runs using files written by a previous run. The GROMACS tpbconv utility is used between such runs to prepare output from one run for use in the next. Some scripts have been prepared to show how this process can be streamlined.
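As an illustration of the tpbconv step between runs, the sketch below prints a command that extends a simulation by a given number of picoseconds. The basename_N file naming and the function name extend_cmd are hypothetical conventions chosen for this sketch. The options shown (-s, -f, -e, -extend, -o) should be checked against tpbconv -h for the GROMACS version you are using.

```shell
#!/bin/bash
# extend_cmd BASE PREV PS: print a tpbconv command that prepares run PREV+1
# from the output of run PREV, extending the simulation by PS picoseconds.
# The BASE_N.* file naming is a hypothetical convention for this sketch.
extend_cmd() {
    local base=$1 prev=$2 ps=$3
    local next=$((prev + 1))
    echo "tpbconv -s ${base}_${prev}.tpr -f ${base}_${prev}.trr" \
         "-e ${base}_${prev}.edr -extend ${ps} -o ${base}_${next}.tpr"
}

extend_cmd dppc 5 10
```

The function only prints the command, so it can be reviewed before it is run. The resulting run input file would then be passed to mdrun with the -s option.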

A master script, chain, is used to call other scripts to run grompp, mdrun and tpbconv as needed. System-specific versions of all these scripts are documented on the pages for running GROMACS on Matrix and Glacier. To distinguish among the scripts, the system name is added to the script name. For example, the chain script is named chain_matrix on Matrix and chain_glacier on Glacier.

The general form of the chain command is:

chain_system_name run_base_name start_sequence_number end_sequence_number picoseconds_per_run number_of_processors

Often, defaults can be used for most of the arguments.

On Matrix one might use:

chain_matrix dppc 0 5

to set up a sequence of five mdrun jobs to run one after another. Later on, if the simulation is to be extended by three more 10-picosecond steps (6, 7 and 8), say, one could use:

chain_matrix dppc 6 8 10

See /usr/apps/examples/gromacs/chain/README on Matrix and /global/software/gromacs-3.3/job_examples/chain/README on Glacier for more instructions. To run an example, copy all the files from the chain directory to a directory of your own. Then, run the chain_matrix, chain_glacier, ... script, depending on which system you are using.

Please note that the scripts as currently written have to be in a directory that is on your command PATH to work (or you could add "." to your PATH).
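On systems without a chain script, a similar effect can be approximated with TORQUE job dependencies. The following is an illustrative sketch, not the WestGrid chain script. The helper submit_chain is hypothetical, and it assumes a job script that is able to restart from the files left by the previous job. It relies on the qsub option -W depend=afterok:jobid, which holds a job until the named job has finished successfully.

```shell
#!/bin/bash
# submit_chain SCRIPT N: submit SCRIPT N times, holding each job until the
# previous one in the chain has completed successfully.
submit_chain() {
    local script=$1 count=$2 prev="" jobid i
    for ((i = 1; i <= count; i++)); do
        if [ -z "$prev" ]; then
            # First job in the chain has no dependency.
            jobid=$(qsub "$script") || return 1
        else
            # Later jobs wait for the previous job to exit without error.
            jobid=$(qsub -W depend=afterok:"$prev" "$script") || return 1
        fi
        echo "submitted job $i: $jobid"
        prev=$jobid
    done
}
```

For example, submit_chain run.pbs 5 would queue five dependent jobs, where run.pbs is a placeholder name for your own job script.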

 


Updated 2010-02-11.