
Parallel

Parallel is intended for multi-node parallel applications that can run in a relatively short time (less than 3 days) and can take advantage of its InfiniBand interconnect or special GPU-based nodes. It can also be used for applications that have license restrictions that prevent them from being run elsewhere.

Unlike most WestGrid systems, a separate request is required to obtain a WestGrid account on Parallel. If you think the software you would like to run is appropriate for the Parallel cluster, please write to accounts@westgrid.ca with a subject line of the form "Parallel account request (your_username)" with a request for an account and a mention of the software you propose to use.

To log in to Parallel, connect to parallel.westgrid.ca using an ssh (secure shell) client.

As on other WestGrid systems, batch jobs are handled by a combination of TORQUE and Moab software. For more information about submitting jobs, see Running Jobs.

Unlike most other WestGrid systems, we prefer that the syntax "-l nodes=xx:ppn=12" be used rather than "-l procs=yyy" when requesting processor resources on Parallel. Parallel is used almost exclusively for large parallel jobs that use whole nodes. Requesting whole nodes can improve the performance of some jobs and minimizes the impact of a misbehaving job or hardware failure. Since there are 12 cores per node on Parallel, a ppn (processors per node) value of 12 requests all the processors on a node. We also recommend that you ask for 22-23 GB of memory for each node requested, using the mem parameter (the mem value is the total for the job, so multiply by the number of nodes). A typical job submission on Parallel would look like:

qsub -l nodes=4:ppn=12,mem=88gb,walltime=72:00:00 parallel_diffuse.pbs
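The resource string in the example above can be derived from the node count alone. As a minimal sketch, the helper below builds a whole-node request. The function name parallel_resources and the choice of 22 GB per node (the lower end of the recommended range) are illustrative assumptions, not part of any WestGrid tool.

```shell
#!/bin/bash
# Hypothetical helper: print a whole-node resource string for "qsub -l".
# Assumes 12 cores per node and 22 GB of memory per node, as described above.
parallel_resources() {
    local nodes=$1
    local walltime=${2:-72:00:00}     # defaults to the 72-hour maximum
    local mem_per_node_gb=22          # assumption: lower end of the 22-23 GB range
    local mem_gb=$(( nodes * mem_per_node_gb ))
    echo "nodes=${nodes}:ppn=12,mem=${mem_gb}gb,walltime=${walltime}"
}

# 4 nodes x 22 GB = 88 GB, matching the example above
parallel_resources 4
```

The output could then be passed to qsub, for example: qsub -l "$(parallel_resources 4)" parallel_diffuse.pbs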

The following limits are in place for batch jobs submitted to the default queue (that is, if no queue is specified on the qsub command):

Resource: Policy or Limit
Maximum walltime (but see below for other comments related to walltime): 72 hours
Suggested maximum memory resource request, mem, per node: 23 GB
Maximum number of running jobs for a single user: 64
Maximum cores (sum for all jobs) for a single user: 3072 (for example, 256 12-core nodes)
Maximum jobs in Idle queue: 1000
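As a rough illustration of how these limits apply to a single request, the sketch below checks a node count and a walltime in hours before submission. The function name check_request is hypothetical, and the check covers one job only: the 3072-core limit applies to the sum over all of a user's jobs, so passing this check does not guarantee that the scheduler will start the job.

```shell
#!/bin/bash
# Hypothetical pre-submission check against the default-queue limits listed above.
# Looks at a single job; it does not account for the user's other running jobs.
check_request() {
    local nodes=$1 hours=$2
    local cores=$(( nodes * 12 ))     # 12 cores per node on Parallel
    if (( cores > 3072 )); then
        echo "too many cores: ${cores} > 3072"
        return 1
    fi
    if (( hours > 72 )); then
        echo "walltime too long: ${hours} h > 72 h"
        return 1
    fi
    echo "ok: ${cores} cores for ${hours} h"
}

check_request 256 72            # exactly at the core and walltime limits
check_request 257 72 || true    # one node over the core limit
```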

 

Some nodes have a maximum walltime limit of 24 hours and a few are restricted to just 3 hours. In particular, most of the GPU-enabled nodes accessed with -q gpu have a 24-hour limit.

The login node can be used for short testing and debugging sessions. If you are unsure how much memory your calculations require, you are testing parallel code, or you need to run tests that last more than a few minutes, you should not do your testing on the login node. Instead, see the Working Interactively section of the Running Jobs page for a method to reserve processors for interactive use, using qsub -I.

If you do not need to work interactively but just need to run short batch jobs to test your software, specifying a short walltime will increase your chances of getting a quick response.

There are a couple of nodes reserved for jobs less than 3 hours that can be accessed by using -q interactive and an appropriately short time limit. For example:

qsub -q interactive -l walltime=03:00:00 job_script.pbs

If you require GPU-enabled nodes, these are available through the interactive queue if you specify the gpus resource along with the nodes and processors you require. For example:

qsub -q interactive -l walltime=03:00:00,nodes=1:ppn=12:gpus=3 job_script.pbs

Please see the GPU Computations page for more information about running programs that require GPUs. That page mentions using -q gpu to request nodes with GPUs. However, for short tests with two nodes or fewer, you are probably better off using -q interactive instead of -q gpu.

If you require exclusive access to a node while testing, you can add naccesspolicy=singlejob, as shown here:

qsub -l walltime=03:00:00,naccesspolicy=singlejob job_script.pbs

Storage Information on Parallel

Directory path: /home
Size: 20 TB (shared with Breezy and Lattice)
Quota: 50 GB, with a 200,000-file limit, for each individual home directory.
Command to check quota: Write to support@westgrid.ca with a subject line of the form "Disk quota for user your_user_name requested for Parallel"
Purpose: Use your home directory for files that you want to save for longer than 30 days.
Backup policy: Users are responsible for their own backups.

Directory path: /global/scratch/user_name
Size: 300 TB (shared with Breezy and Lattice)
Quota: 450 GB, with a 200,000-file limit. If you need an increased quota, please write to support@westgrid.ca
Command to check quota: Write to support@westgrid.ca with a subject line of the form "Disk quota for user your_user_name requested for Parallel"
Purpose: Run jobs from your directory in /global/scratch.
Backup policy: Please note that /global/scratch is intended only for files associated with running jobs or waiting for post-processing. Files older than 30 days are subject to deletion.
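For a rough local estimate of how close a directory is to the size and file-count limits above, something like the following sketch may help. The function names count_files and usage_gb are hypothetical, the count includes regular files only, and the figures may differ from what the quota system records, so this does not replace the official quota check described above.

```shell
#!/bin/bash
# Hypothetical helpers for a local usage estimate; not the official quota check.

# Count regular files under a directory (compare with the 200,000-file limit).
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Report total size in whole GB, rounded down (compare with the 50 GB or 450 GB quota).
usage_gb() {
    du -sk "$1" | awk '{ printf "%d\n", $1 / 1024 / 1024 }'
}

dir=${1:-.}    # directory to inspect; defaults to the current directory
echo "${dir}: $(usage_gb "$dir") GB used, $(count_files "$dir") files"
```

On a large directory tree both commands can take a while, so it is better to run them occasionally than in a loop.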

2015-11-09 - Changed quota checking procedure.