Table of Contents
- Introduction - Basic description of the batch job handling software on WestGrid systems.
- File Systems - Where to start jobs and store files.
- Submitting Batch Jobs with qsub - Introduction to the qsub command for reserving processors and running batch jobs.
- Summary of Job Commands - List and description of commonly used commands.
- Sample Batch Job Scripts - Example batch job scripts for serial and parallel cases.
- TORQUE Directives in Batch Job Scripts and qsub Options - TORQUE directives in batch job scripts as an alternative to qsub arguments.
- Environment Variables in Batch Job Scripts - TORQUE-related environment variables in batch job scripts.
- Monitoring Jobs - Monitoring of waiting and running jobs.
- Deleting Jobs - Using qdel to delete jobs.
- Job Priorities - Factors influencing job priorities.
- Job Accounting - Accessing usage statistics.
- Working Interactively - Using qsub to reserve processors for interactive work.
Introduction
A great majority of the computational work on WestGrid systems is carried out through non-interactive batch processing. Job scripts containing commands to be executed are submitted from a login server to a batch job handling system, which queues the requests, allocates processors, and starts and manages the jobs. It is usually not necessary (and in some cases not permitted) to log on to the compute nodes directly. A user can (and should) monitor his or her jobs periodically as they run, but does not have to remain logged in the entire time. The batch job system can also be used to reserve processors for interactive work, as explained later. Through the use of grid-oriented software it is possible to submit jobs from remote workstations, but this capability is not discussed here, as it is not yet widely supported.
The system software that handles your batch jobs consists of two pieces: a resource manager (TORQUE) and a scheduler (Moab). Documentation for these packages is available through Adaptive Computing. However, typical users will not need to study those details. Together, TORQUE and Moab provide a suite of commands for submitting jobs, altering some of the properties of waiting jobs (such as reordering or deleting them), monitoring their progress and killing ones that are having problems or are no longer needed. Only the most commonly used commands are mentioned here.
The documentation in this Running Jobs section includes a description of the general features of job scripts, how to submit them for execution and how to monitor their progress. As some of the commands, sample scripts and usage policies vary from one WestGrid system to another, please consult the QuickStart Guide for the particular system on which you are working for additional details.
File Systems
An important aspect of running jobs is choosing appropriate directories: the directory from which to start a job, a directory for any temporary files that the program may need to read or write while it is running, and a directory for the final output that needs to be saved for post-processing and analysis. In addition to the home directory, which is the default working directory when you log in, most WestGrid systems provide large-capacity, high-performance file systems that are intended to be used for temporary storage by running programs. Please consult the QuickStart Guide for the particular system on which you are working for specifics.
For long-term storage of output results, see the Silo QuickStart Guide and the main storage page.
Submitting Batch Jobs with qsub
A batch job script is a text file of commands for a specified UNIX shell (typically bash or tcsh) to interpret, similar to what you could execute by typing directly at a keyboard. The job is submitted to an input queue using the qsub command. How long a job waits in the input queue depends on factors such as system load and the priority assigned to the job. When appropriate resources become available to run a job, it is removed from the input queue and started on one or more assigned processors. A job will be terminated if it exceeds its allotted time limit or, on some systems, if it exceeds memory limits. By default, the standard output and error streams from the job are directed to files in the directory from which the job was submitted.
In the simplest case, a batch job file called batch.pbs could be submitted with:
qsub batch.pbs
The batch job system needs a description of the job requirements. These can be specified using qsub command-line options or by using special directive lines in the batch job script itself. The most commonly-used qsub flag, -l (letter ell), is used to specify various resources required by the job, such as the maximum run time, the number of processors needed and the amount of memory required. For a full list of qsub options one can use man qsub. However, the qsub manual page just gives a generic description that doesn't take into account some of the specific hardware and policies at the various WestGrid sites. By choosing combinations of resource requests that cannot be satisfied, it is quite possible to submit jobs that will sit in the input queue "forever", without a warning message. This can happen, for example, if you try to use the ncpus parameter, instead of nodes, to request processors on one of the distributed clusters, or in some situations if you request more memory than is physically available.
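For example, a serial job needing at most 24 hours of run time and 2 GB of memory might be submitted with a command like the following (the values shown are purely illustrative):
qsub -l walltime=24:00:00,mem=2gb batch.pbs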
Explanations for the resource-related parameters used above and other qsub options are given in a later section.
If the qsub command successfully processes the script and has queued a job for execution, it will return a JOBID in the form of a number followed by the name of a server. Generally, the server portion of the JOBID can be ignored. The JOBID number can be used in other commands involved in monitoring or deleting the job.
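For example, a successful submission might return a line like the following, in which 12134 is the JOBID number and the server name is hypothetical:
12134.pbs-server.westgrid.ca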
If no UNIX shell is explicitly specified in the job script, the user's login shell will be used to interpret the script when it is run on a compute node assigned by the batch system. When the script is executed, the program runs as a child process of the script. When the program finishes, execution returns to the next line of the script. When the last line of the script has been completed, the job terminates. Jobs can be aborted using the qdel command, which is discussed later in these notes.
Summary of Job Commands
See the handout of common PBS script commands, environment variables, job monitoring commands, and cluster and group information.
Sample Batch Job Scripts
In this section, several examples of batch job scripts are shown. The simplest case is that of running a serial program. Describing the resource requirements of a parallel program is somewhat more complicated, and there are differences depending on whether one is running a distributed-memory (MPI-based) parallel program or one, such as an OpenMP-based program, that needs to share memory on a single compute node.
Script Example for a Serial Program
Here is an example of a script for a job to run a serial program named diffuse.
#!/bin/bash
#PBS -S /bin/bash
# Script for running serial program, diffuse.
cd $PBS_O_WORKDIR
echo "Current working directory is `pwd`"
echo "Running on hostname `hostname`"
echo "Starting run at: `date`"
./diffuse
echo "Program diffuse finished with exit code $? at: `date`"
Note the back ticks (`) used in some of the informational lines in the script: a command enclosed in back ticks is executed by the shell and its output is substituted into the surrounding line.
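The $(...) form of command substitution is equivalent to back ticks in bash and is often easier to read; for example, the date line above could also be written as:
echo "Starting run at: $(date)"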
Next we discuss some additional considerations for parallel jobs.
Script Example for an MPI-based Parallel Program
Here is an example of a script for a job to run an MPI-based parallel program named mpi_diffuse.
#!/bin/bash
#PBS -S /bin/bash
# Sample script for running an MPI-based parallel program, mpi_diffuse.
cd $PBS_O_WORKDIR
echo "Current working directory is `pwd`"
echo "Node file: $PBS_NODEFILE :"
echo "---------------------"
cat $PBS_NODEFILE
echo "---------------------"
# On many WestGrid systems a variable PBS_NP is automatically
# assigned the number of cores requested of the batch system
# and one could use
# echo "Running on $PBS_NP cores."
# On systems where $PBS_NP is not available, one could use:
CORES=`/bin/awk 'END {print NR}' $PBS_NODEFILE`
echo "Running on $CORES cores."
echo "Starting run at: `date`"
# On most WestGrid systems, mpiexec will automatically start
# a number of MPI processes equal to the number of cores
# requested. The -n argument can be used to explicitly
# use a specific number of cores.
mpiexec -n ${CORES} ./mpi_diffuse < mpi_diffuse.in
echo "Program mpi_diffuse finished with exit code $? at: `date`"
For a parallel job on one of the distributed-memory clusters, the qsub command could take the form:
qsub -l procs=16,pmem=2gb,walltime=72:00:00 mpi_diffuse.pbs
in which 16 processors are requested (procs parameter), using at most 2 GB of memory per process (pmem parameter) and running for at most 72 hours (walltime parameter). A table in a following section gives more information about the various resource parameters. Note (2011-08-16): there appears to be a bug in qsub such that the procs parameter must appear first in the list of resource parameters.
For most WestGrid systems, using the procs parameter is the recommended way to specify the number of processors to use for MPI-based parallel jobs. However, on the Lattice and Parallel clusters, intended for large parallel jobs, complete nodes should be used unless memory contention would dictate otherwise. Since there are 8 cores per node on Lattice, a ppn (processors per node) parameter of 8 will request that all the processors on a node be used. Also, it is recommended that you ask for 10-11 GB of memory per node requested, using the mem parameter. So, a typical job submission on Lattice would look like:
qsub -l nodes=4:ppn=8,mem=40gb,walltime=72:00:00 mpi_diffuse.pbs
On the Parallel cluster, the nodes have 12 cores per node and there is more memory available per node. A typical job submission command on Parallel is:
qsub -l nodes=4:ppn=12,mem=88gb,walltime=72:00:00 mpi_diffuse.pbs
Another special case is the Hungabee system, for which the command line for running a parallel MPI program has a special parameter (dplace) to distribute MPI tasks properly. See the Hungabee QuickStart Guide for more information.
Script Example for an OpenMP-based Parallel Program
Here is an example of a script for a job to run an OpenMP-based parallel program named openmp_diffuse. This differs from the distributed-memory MPI-based parallel case in that the job must be restricted to a single node and the number of OpenMP threads must be matched to the number of cores requested by setting the OMP_NUM_THREADS environment variable.
As with the MPI-based parallel case discussed above, there is a special command for running OpenMP-based programs on Hungabee. So, the Hungabee QuickStart Guide should be consulted for details if you are working on that system.
#!/bin/bash
#PBS -S /bin/bash
# Sample script for running an OpenMP-based parallel program, openmp_diffuse.
cd $PBS_O_WORKDIR
echo "Current working directory is `pwd`"
echo "Running on `hostname`"
# On many WestGrid systems a variable PBS_NUM_PPN is automatically
# assigned the number of cores requested of the batch system
# on a single node (as is appropriate for OpenMP-based programs) and one could use
# echo "Running on $PBS_NUM_PPN cores."
# export OMP_NUM_THREADS=$PBS_NUM_PPN
# On systems where $PBS_NUM_PPN is not available, one could use:
CORES=`/bin/awk 'END {print NR}' $PBS_NODEFILE`
echo "Running on $CORES cores."
echo "Starting run at: `date`"
# Set the number of threads that OpenMP can use to the number of cores requested
export OMP_NUM_THREADS=$CORES
./openmp_diffuse < openmp_diffuse.in
echo "Program openmp_diffuse finished with exit code $? at: `date`"
An OpenMP job must run on a single compute node. Consequently, the TORQUE procs parameter should not be used, as this might result in the batch scheduler assigning more than one node. Instead, use nodes=1 along with a ppn parameter to specify the number of cores required on that single node. For example:
qsub -l nodes=1:ppn=12,mem=20gb,walltime=72:00:00 openmp_diffuse.pbs
Here 12 processor cores are requested (ppn parameter), using a total of at most 20 GB of memory (mem parameter) and running for at most 72 hours (walltime parameter). The table in the next section gives more information about the various resource parameters.
TORQUE Directives in Batch Job Scripts and qsub Options
Although qsub arguments can be used to define job characteristics, an alternative is to include these in the batch job scripts themselves. Special comment lines near the beginning of a script are used to specify TORQUE directives that are to be processed by qsub. It is primarily a personal preference as to whether they are specified in the batch script or on the command line, although debugging job submission problems is generally easier if they are in the batch job file.
TORQUE evolved from software called PBS (Portable Batch System). Consequences of that history are that the TORQUE directive lines begin with #PBS, some environment variables contain "PBS" (such as $PBS_O_WORKDIR in the example script above) and the script files themselves typically have a .pbs suffix (although that is not required).
Note: All the #PBS lines must occur before the first executable statement in the script, that is, before the first line that is not blank or a comment.
Here is an example script containing some TORQUE directives.
# Script for running serial program, diffuse.
#PBS -l walltime=72:00:00
#PBS -l mem=2000mb
#PBS -r n
cd $PBS_O_WORKDIR
echo "Current working directory is `pwd`"
echo "Starting run at: `date`"
./diffuse
echo "Program diffuse finished with exit code $? at: `date`"
For each of the #PBS lines there is a corresponding qsub command line argument (just leave off the #PBS). If both command line arguments and #PBS lines in the script request the same type of resource, the command line argument is used. So, for example, if one wanted to do a short test run of the above script, diffuse.pbs, one could override the walltime in the script by submitting it with:
qsub -l walltime=00:30:00 diffuse.pbs
Some of the available TORQUE directives are described below, with those related to resource requests (#PBS -l resource) being the most important. Each directive is shown with a description and an example.

#PBS -S shell
The -S flag is used to specify the UNIX shell that will be used to execute the script on the compute node to which the job is assigned. If this directive is omitted, the user's login shell will be used (typically bash or tcsh). Note that the path to a given shell may vary from system to system. Example:
#PBS -S /bin/sh

#PBS -l resource
The -l flag is used to specify the resources required by the job, such as the maximum run time (walltime), the number of processors (procs, or nodes and ppn) and the amount of memory (mem or pmem). Several resources may be requested in a single comma-separated list, as shown in the qsub examples earlier on this page. Example:
#PBS -l walltime=72:00:00,nodes=1:ppn=8,mem=10gb

#PBS -r [y/n]
The -r flag, followed by a y or n for yes or no, respectively, is used to indicate whether or not a job is restartable. Example:
#PBS -r n

#PBS -N job_name
Use -N followed by up to 15 characters (starting with a letter and containing no spaces) to assign a name to your job. If no -N option is given, a default job name will be constructed from the name of the job script (truncated, if necessary, to 15 characters). Example:
#PBS -N diffuse_run_1

#PBS -q queue
On some systems there is more than one queue that can be selected using the -q argument to qsub, but on most WestGrid systems the default is appropriate and the queue argument is not needed. See the Running Jobs documentation for the individual systems for information about the queues that are available. Example:
#PBS -q pre

#PBS -o file, #PBS -e file, #PBS -j [eo/oe]
These options control how the standard output (-o option) and standard error (-e option) streams from the job are directed to files and whether they are combined (-j option). The -j option is used to request that the standard output and standard error be joined into a single intermixed stream. With -j eo, the two streams become the standard error stream and with -j oe, they become the standard output stream for the job. The combined stream can then be directed to a single file with the -o or -e option, as appropriate. If neither the -o nor the -e option is used, the standard error and output streams are directed to two separate files, with the file names being of the form job_name.[eo]job_number. For example, to direct the streams to separate named files:
#PBS -o diffuse_job.out
#PBS -e diffuse_job.err
or, to combine them into a single stream written to the standard error file:
#PBS -j eo
#PBS -e diffuse_job.err

#PBS -M email, #PBS -m event
If you would like email notification about events related to your job, use the -M option to specify an email address and the -m option to choose which events trigger an email. The event parameter can be one or more of the following:
a - Send mail when the job is aborted
b - Send mail when the job begins
e - Send mail when the job ends
n - Do not send mail
The options can be combined, such as in -m ea. Example:
#PBS -M unknown@university.ca
#PBS -m bea
Environment Variables in Batch Job Scripts
When a batch job is started, a number of environment variables are created that can be used in the batch job script. A few of the most commonly used variables are described here.
PBS_O_WORKDIR
This is the directory from which the job was submitted. If submitting multiple jobs from different directories, it may be convenient to use
cd $PBS_O_WORKDIR
as shown in the example scripts above, so that input files can be referenced relative to the job submission directory, rather than using absolute paths.

PBS_JOBID
This is a unique job ID assigned to the job. By using this variable in the name of an output file, one can keep output from different jobs in the same directory without conflicts. When monitoring jobs, or diagnosing problems after a failed run, it is also convenient to have files tagged with the JOBID. For example:
JOBINFO=diffuse.${PBS_JOBID}
echo "Starting run at: `date`" > $JOBINFO
echo "Current working directory is `pwd`" >> $JOBINFO
...

PBS_NODEFILE
The value of this variable is the name of a temporary file containing a list of the compute nodes to which the job has been assigned (with nodes repeated if more than one processor per node is used on multi-node systems). This variable is not useful on the shared-memory computers, since the batch job system sees these as single nodes and the PBS_NODEFILE file contains just a single line (even if multiple processors have been requested). On the distributed clusters, however, counting the lines in PBS_NODEFILE gives the number of processors available to an MPI-based parallel job. That number can then be specified on the mpiexec command to make sure that the MPI processes are started on the nodes that the batch system has reserved for the job. On most WestGrid systems, mpiexec will use the correct number of processors by default, but echoing the contents of PBS_NODEFILE in a job script is still useful for debugging.
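As an alternative to the awk command used in the sample scripts above, a minimal way to count the processors listed in the node file is with wc:
CORES=$(wc -l < $PBS_NODEFILE)
echo "Running on $CORES cores."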
Monitoring Jobs
There are a variety of commands available for monitoring jobs to see whether they are waiting in an input queue or have started running. The main command for this purpose is showq (or showq -u your_user_name to see just your own jobs). The qstat -f job_id command is useful for seeing the CPU time and memory usage of a particular job. Obtaining accurate information about running jobs, such as the memory usage and load balance of parallel jobs, is not always straightforward and may be system specific.
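For example, to list your own jobs and then examine one of them in detail (the JOBID 12134 is illustrative):
showq -u your_user_name
qstat -f 12134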
Visit the Job Monitoring Page for more detailed information.
Deleting Jobs
The qdel command can be used to delete unwanted jobs. Specify the JOBID of one or more jobs as the argument(s) to qdel. The JOBID can be obtained using one of the monitoring commands described in the previous section, such as showq -u your_username. Example:
qdel 12134
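Several jobs can be deleted at once by listing multiple JOBIDs:
qdel 12134 12135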
Job Priorities
A large number of factors determine the priority of a given job waiting to run in an input queue. The main principle for determining priority is called Fairshare, in which each user is assigned a target percentage of the resources. Priorities are adjusted to guide the scheduler toward these targets so that over time on a busy system, each user would get a "fair" share of the available computing resources. Additional details of the priority calculation will be given as time permits.
Job Accounting
A summary of the resources used by a job will be emailed to you if you request notification with the -M and -m directives, for example:
#PBS -M unknown@university.ca
#PBS -m bea
Alternatively, an epilogue script can be run to include information about the job in its standard output. To do this, create a file (for example, epilogue.script) in your home directory with the following content:
#!/bin/sh
echo "Epilogue Args:"
echo "Job ID: $1"
echo "User ID: $2"
echo "Group ID: $3"
echo "Job Name: $4"
echo "Session ID: $5"
echo "Resource List: $6"
echo "Resources Used: $7"
echo "Queue Name: $8"
echo "Account String: $9"
echo ""
exit 0
Make sure that the script is executable.
chmod u+x epilogue.script
We have found that the script will not execute if the file permissions include group write permission, so we suggest you also use:
chmod go-rwx epilogue.script
Then include the following line among your batch job options:
#PBS -l epilogue=/home/myUserID/epilogue.script
The name of the script and its location are arbitrary. They just need to be specified.
To see statistics regarding your usage of WestGrid systems, log in to the Compute Canada database site.
Working Interactively
The qsub command is usually used for submitting batch jobs, but it can also be used to reserve processors for interactive work. Use the -I option to request an interactive job:
qsub -I
In this case, qsub will not exit as it usually does, but will indicate that it is waiting for a processor to be assigned to your job. When a processor on a compute node becomes available, a message is printed to indicate that the job is "available" and a shell prompt will appear. You may then run your program and other commands interactively for a duration up to the default walltime limit. This limit varies from system to system, but is usually at least three hours.
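A typical interactive submission looks something like the following (the JOBID and server name are illustrative, and the exact messages vary by system):
qsub -I
qsub: waiting for job 12134.pbs-server to start
qsub: job 12134.pbs-server ready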
If you decide it is taking too long for a processor to become available, you can abort the request by typing control-C (hold down the control key while typing the letter C). You might then like to try again with a shorter walltime to improve the chances that the scheduler will be able to start your job quickly. For example, to request 30 minutes, type:
qsub -I -l walltime=00:30:00
It is important to kill any processes you have started during your interactive session, before the walltime limit has been reached. Type exit at the prompt to end your interactive session and the corresponding job.
If you need to run graphical applications during your interactive session and you have turned on X Window tunnelling in your ssh terminal session, then on most WestGrid systems the X Window support can be extended to your interactive job by adding a -X argument to the qsub command line:
qsub -I -X -l walltime=00:30:00
It is possible to work interactively with more than one processor, although the procedures for doing this vary from system to system and it is easy to accidentally start processes on nodes to which your job has not been assigned. Please contact support@westgrid.ca for more information before trying this.
On all of the WestGrid systems there is some provision for doing interactive work on the login servers or through nodes assigned with qsub -I. Refer to the login banner for guidelines, or look at the system-specific QuickStart Guides for more information.
Updated:
2014-01-16 - Corrected an error in the description of the qsub procs=16 example.
2015-03-03 - Added note about file permissions for epilogue script.
2015-09-30 - Added file resource parameter note.