Gaussian on WestGrid
Table of Contents
- Introduction
- User Responsibilities
- Migrating from Checkers to Grex
- System Characteristics and Limitations
- Using Gaussian
Introduction
WestGrid has acquired a full commercial license for Gaussian 09 (G09). Gaussian software is available for use by all approved WestGrid account holders, subject to some license restrictions (see User Responsibilities below).
Note: In addition to using WestGrid, University of Alberta users may apply to use Gaussian 03 or Gaussian 09 on machines operated by Academic Information and Communication Technologies. Contact research.support@ualberta.ca for more information. Similarly, University of Calgary researchers may apply to use Gaussian 03 on local U of C resources by contacting support@hpc.ucalgary.ca.
User Responsibilities
Due to licensing restrictions, researchers must agree to certain conditions in order to use Gaussian software on WestGrid systems. Follow the directions given here to apply.
Users are expected to be generally familiar with Gaussian capabilities, input file format and the use of restart files. Also, please read the Efficiency Considerations section of the Gaussian 09 Online Manual in order to learn about memory and disk requirements of different types of analyses.
There is also system-specific information given in the following sections that is important for effective use of Gaussian.
- Please note that inappropriate use of the memory-related parameters can cause jobs to fail or prevent the batch scheduler from using the system efficiently.
Migrating from Checkers to Grex
On May 6, 2011, WestGrid moved the Gaussian license from the Checkers cluster to the Grex cluster. The license will be available at both sites until May 31, 2011. Gaussian will be removed from Checkers on May 31, 2011, after which it will be available only on Grex. Grex is configured very similarly to Checkers, and most submission scripts that worked on Checkers should also work on Grex without modification. However, all Gaussian users should be aware of several differences between the two clusters. The most important are:
- Grex has 12 cores per node (Checkers has 8). This means that Gaussian jobs can now run in parallel on up to 12 processors.
- Grex has 48 GB of memory per node (Checkers has 16 GB). This means that Gaussian jobs can now use up to 48 GB of memory per job.
- The local scratch space on each node is 200 GB on Grex, compared to 400 GB on Checkers. By default, Gaussian uses this scratch space for temporary data. Please contact support@westgrid.ca for advice if your Gaussian job requires more scratch space than is available in the local scratch.
- Use /global/scratch for data associated with running jobs such as Gaussian restart files - DO NOT use /home. /global/scratch is a high performance filesystem which is designed to handle the heavy I/O associated with running Gaussian jobs.
- Gaussian 03 (G03) is not available on Grex but, as noted in the Introduction above, may still be available on local resources at some institutions.
System Characteristics and Limitations
Gaussian is available to approved WestGrid users on grex.westgrid.ca. See the Grex QuickStart Guide for an overview of the Grex cluster.
Parallel job limitations
The Linda environment is not available on this system, so a parallel job is restricted to at most the twelve processor cores on a single node.
File system issues
By default, if the Gaussian environment is initialized by an appropriate module command (described below), the Gaussian scratch directory will be automatically assigned to use the most appropriate file system.
To avoid overloading the file server, do not include a %rwf directive of the form:
%rwf=read_write_file.wf
in your Gaussian command file. Including such a directive places the frequently accessed temporary "read/write file" in the job's current working directory. Leaving the directive out of the input file puts the rwf file in a job-specific temporary directory on a local file system of the execution node. This yields better performance for the Gaussian job and takes the load off the file server. It is often necessary to include a %chk directive in order to save the checkpoint file, but it is never necessary to save the rwf file.
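As a pre-submission check, a short shell function can flag input files that still contain a %rwf directive. This is a minimal sketch, not a WestGrid-provided tool: the function name has_rwf_directive and the file name your_g09_commands.com are illustrative assumptions, and it assumes the directive starts a line and may be written in any letter case.

```shell
# Hypothetical helper (not part of Gaussian or WestGrid tooling):
# succeeds if the given Gaussian input file contains a %rwf directive.
has_rwf_directive() {
    # -q: quiet, only the exit status matters
    # -i: match %rwf, %RWF, %Rwf, ... (case-insensitivity is an assumption)
    grep -qi '^[[:space:]]*%rwf' "$1"
}

# Example use before submitting a job; the file name is a placeholder.
input=your_g09_commands.com
if [ -f "$input" ] && has_rwf_directive "$input"; then
    echo "Warning: $input contains a %rwf directive; consider removing it." >&2
fi
```

The check only looks for the directive; it does not edit the input file.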
Using Gaussian
Job Submission
Gaussian is available to WestGrid users only on grex.westgrid.ca. Like other jobs on WestGrid systems, Gaussian jobs are run by submitting an appropriate script for batch scheduling using the qsub command. A sample script, gaussian.pbs, is shown further down on this page. For example, to submit a serial Gaussian job with a time limit of 168 hours (one week), use:
qsub -l walltime=168:00:00 gaussian.pbs
See the Grex QuickStart Guide and the Running Jobs page for more information about submitting jobs on Grex.
Job Time Limit and Restart (Checkpoint) Files
At the time of writing, the maximum time limit for Gaussian jobs on Grex is 21 days (504 hours). The -l flag with the walltime option, as shown on the qsub command above, is used to request a specific limit on the elapsed time for the job. The format is qsub -l walltime=hhh:mm:ss for a given number of hours (hhh), minutes (mm) and seconds (ss).
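Because the walltime is given in hours rather than days, it can help to convert before submitting. The following is a small sketch; the function name days_to_walltime is an illustrative assumption, and it handles whole days only.

```shell
# Hypothetical helper: convert a whole number of days to the hhh:mm:ss
# walltime format expected by "qsub -l walltime=...".
days_to_walltime() {
    printf '%d:00:00\n' "$(( $1 * 24 ))"
}

days_to_walltime 7    # prints 168:00:00 (one week)
days_to_walltime 21   # prints 504:00:00 (the maximum quoted above)
```

The result can be passed straight to qsub, for example qsub -l walltime=$(days_to_walltime 7) gaussian.pbs.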
To avoid losing work if a long job is interrupted, it is recommended that you specify a checkpoint file in your Gaussian command file, using the %chk directive:
%chk=restart.chk
If you underestimated the time required for the job or if it was stopped due to a system problem (other than a disk failure!), you may be able to use the restart.chk file to continue the calculation in a subsequent job.
Matching TORQUE and Gaussian Memory Limits
The Gaussian command file directive %mem=[Gaussian_memory] can be used to increase the internal memory allocation for the Gaussian program, where, for example, Gaussian_memory=1600MB. At least for G03, the amount of memory actually used by Gaussian is significantly more than the amount requested with %mem. If the batch request does not account for this, the scheduler can assign jobs to nodes that do not have sufficient memory, which can lead to job failures and conflicts with other users' jobs.
The amount of extra memory used varies from about 300 MB to 700 MB, increasing with the amount requested. For example, if you use %mem=1600MB, you should tell TORQUE that your job needs 2000 MB, but if you request %mem=3000MB, TORQUE should be told that the job needs 3700 MB or more.
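The two examples above are consistent with a simple rule of thumb: add about 25% of the %mem value, but no less than 300 MB and no more than 700 MB. The sketch below encodes that rule. The function name torque_mem_mb and the 25% figure are assumptions chosen to match the numbers in the text, not a documented formula, so treat the result as a starting point and adjust it if your jobs are killed for exceeding memory.

```shell
# Hypothetical helper: estimate the TORQUE mem request (in MB) for a
# given Gaussian %mem value (in MB).
# Assumption: overhead is about 25% of %mem, clamped to 300-700 MB.
torque_mem_mb() {
    gauss_mb=$1
    overhead=$(( gauss_mb / 4 ))
    if [ "$overhead" -lt 300 ]; then overhead=300; fi
    if [ "$overhead" -gt 700 ]; then overhead=700; fi
    echo "$(( gauss_mb + overhead ))"
}

torque_mem_mb 1600   # prints 2000
torque_mem_mb 3000   # prints 3700
```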
The mem option is used with the -l flag on the qsub command line to tell TORQUE how much memory the job requires. It can be combined with the walltime option, separated by a comma, as in this example:
qsub -l walltime=168:00:00,mem=2000MB gaussian.pbs
Running Parallel Gaussian Jobs
Some of the analyses available in the Gaussian suite support parallel processing. If you have not previously run parallel Gaussian jobs, please ask your colleagues for advice on which kinds of analyses work well in parallel, or do some short test runs to compare the elapsed time as you increase the number of processors from 1 to 2 (or up to 12). No more than 12 processors may be used for a single Gaussian job on Grex.
To request a parallel calculation, use the %nproc directive in your Gaussian command file. For example:
%nproc=2
As with memory, it is not sufficient to tell only Gaussian how many processors you wish to use. TORQUE must also be told, so that the batch system assigns your job the correct number of processors. This is done using the nodes=1:ppn=... option of the -l flag on the qsub command line. For example:
qsub -l nodes=1:ppn=2 gaussian.pbs
In the example, ppn stands for processors per node. Two processors were requested. The number of nodes requested should always be one for Gaussian jobs. Please ensure that a colon, not a comma, is used to separate the nodes=1 from the ppn.
Administrators have noticed that users sometimes forget to add the memory and processor directives on the qsub command line. As noted on the Running Jobs page, you can add these directives to the batch job script instead, with lines of the form:
#PBS -l mem=2000MB
#PBS -l nodes=1:ppn=2
This is also illustrated in the sample job below.
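Since %nproc in the Gaussian input and ppn in the TORQUE request must agree, a quick consistency check before submission can catch mismatches. This is a sketch only: the function name check_procs is an assumption, and it assumes that %nproc appears at the start of a line in the input file and that ppn is set on a #PBS line in the job script rather than on the qsub command line. A missing value is treated as 1 (a serial job), which is also an assumption.

```shell
# Hypothetical helper: compare %nproc in a Gaussian input file with
# ppn in a TORQUE job script. Returns non-zero on a mismatch.
check_procs() {
    input=$1
    script=$2
    # First %nproc=N value in the Gaussian input (any letter case;
    # also matches longer spellings such as %nprocshared).
    nproc=$(sed -n 's/^[[:space:]]*%[Nn][Pp][Rr][Oo][Cc][A-Za-z]*=\([0-9][0-9]*\).*/\1/p' "$input" | head -n 1)
    # First ppn=N value found on a "#PBS" line of the job script.
    ppn=$(sed -n 's/^#PBS.*ppn=\([0-9][0-9]*\).*/\1/p' "$script" | head -n 1)
    # Treat a missing directive as a serial job (assumption).
    nproc=${nproc:-1}
    ppn=${ppn:-1}
    if [ "$nproc" -ne "$ppn" ]; then
        echo "Mismatch: %nproc=$nproc but ppn=$ppn" >&2
        return 1
    fi
    echo "OK: %nproc and ppn both equal $nproc"
}

# Example use (file names are placeholders):
#   check_procs your_g09_commands.com gaussian.pbs && qsub gaussian.pbs
```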
Sample TORQUE Script for Running G09
The script below is an example of what gaussian.pbs might look like. Modify the lines containing your_g09_commands.com to reference your own input file of Gaussian commands.
#PBS -S /bin/bash
#PBS -l mem=2000MB
#PBS -l nodes=1:ppn=2
# Adjust the mem and ppn above to match the requirements of your job
# Sample Gaussian job script
cd $PBS_O_WORKDIR
echo "Current working directory is `pwd`"
echo "Running on `hostname`"
echo "Starting run at: `date`"
# Set up the Gaussian environment using the module command:
module load gaussian
# Run g09
g09 < your_g09_commands.com
Note the use of the module command to set up the environment for running Gaussian. General information about modules is available on the WestGrid modules page.
Using formchk
To run formchk interactively, for example to convert a binary checkpoint file to a formatted checkpoint file suitable for transfer to another system, first initialize the Gaussian environment in the same way as in the batch job example above:
module load gaussian
formchk input.chk output.fchk
Converting files from Gaussian 03
If you have checkpoint files generated using Gaussian 03, these can be converted for use with Gaussian 09 on Grex using a utility, c8609, supplied with Gaussian 09. This command-line utility accepts one argument, the file name or path of the file to be converted. Note that the conversion occurs in place, overwriting the input file. If you would like to retain the original file, make a copy of it before running c8609. As in the batch job example above, the module command should be run first to set up your environment.
Example of using c8609 interactively on Grex to convert an old checkpoint file, water.chk, to a format suitable for Gaussian 09:
module load gaussian
cp water.chk water.chk.old
c8609 water.chk
Updated 2012-01-24.
