Orcinus

Orcinus is intended for both serial and parallel applications.

To log in to Orcinus, connect to orcinus.westgrid.ca using an ssh (secure shell) client.
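
For example, from a terminal on your own machine (replace your_user_name with your WestGrid user name):

ssh your_user_name@orcinus.westgrid.ca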

On most WestGrid systems, the TORQUE qsub command is used to submit jobs, as explained on the Running Jobs page. TORQUE is also used on Orcinus, but Moab's msub command can be used to submit batch jobs as well.
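
For example, a batch script (here called myjob.pbs, a placeholder name) can be submitted with either command:

qsub myjob.pbs
msub myjob.pbs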

The Moab showq command can be used to monitor jobs. By default, showq shows information about all the jobs on the system. For a large cluster such as Orcinus, this can be a long list, so you may prefer to use showq in the form

showq -u username

to limit the output to information about the jobs belonging to the given username. As on the Glacier cluster, an alternative is to use the qsort utility, which defaults to showing just your own jobs. The default queue limits are:

  • pmem = 756mb
  • walltime = 03:00:00 (that is, 3 hours)

These values are used if your batch job submission script or qsub/msub command-line arguments do not specify a memory or elapsed-time (walltime) limit.

The maximum walltime limit is 240:00:00 (10 days).

The maximum number of jobs that a user may have queued to run is 1000.
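
As a sketch of how to override the defaults, a serial job script might begin with directives such as the following; the memory and walltime values are illustrative (and must respect the maxima above), and myprog is a placeholder for your own program:

#!/bin/bash
# Request 2 GB of memory per process instead of the 756mb default
#PBS -l pmem=2gb
# Request 24 hours of walltime instead of the 3-hour default (maximum is 240:00:00)
#PBS -l walltime=24:00:00

# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR
./myprog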

Selecting processor type

Although Orcinus is composed of two different types of nodes, jobs are never assigned to a mixture of node types. The partition resource parameter can be specified on the qsub command line or in a #PBS directive in the job script to explicitly select either the 8-core or 12-core nodes. To choose the older 8-core 3 GHz nodes, use:

#PBS -l partition=DDR

To select the newer 12-core 2.67 GHz nodes, use:

#PBS -l partition=QDR
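
As a sketch, an MPI job intended to run entirely on the 12-core QDR nodes might combine the partition request with a matching processors-per-node request. The node count, walltime, program name and MPI launch command shown here are illustrative and depend on the MPI module you load:

#!/bin/bash
# Select the 12-core 2.67 GHz nodes and use all 12 cores on each of 2 nodes
#PBS -l partition=QDR
#PBS -l nodes=2:ppn=12
#PBS -l walltime=10:00:00

cd $PBS_O_WORKDIR
# The launch command may differ depending on the MPI library in use
mpiexec ./myprog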

Some interactive work is allowed on the Orcinus login machines for editing files, compiling programs, limited debugging, etc.

In addition, there are two 8-core compute nodes reserved for short debugging jobs. To access these nodes, use the -l qos=debug resource request on the qsub/msub command line or in directives in your batch job submission script:

#PBS -l qos=debug
#PBS -l walltime=mm:ss

Note: for a serial job (using a single core), the maximum walltime limit is 45:00 (45 minutes). Jobs using up to 2 compute nodes, for a total of 16 cores, are allowed through the debug quality-of-service request, but there is a limit of 14400 processor-seconds in total: the number of processors times the walltime limit, in seconds, must not exceed 14400. As an example, a 16-core job may therefore specify a walltime of at most 15 minutes.
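
For instance, a 16-core debug job that uses the full 14400 processor-second allowance could be requested as follows (the node/core layout shown is one possible choice):

#PBS -l qos=debug
# 2 debug nodes x 8 cores = 16 cores; 16 cores x 900 s = 14400 processor-seconds
#PBS -l nodes=2:ppn=8
#PBS -l walltime=15:00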

Another possibility is to run an interactive batch job, in which the batch system reserves processors for interactive use. At the moment this interactive job feature is enabled only on the Seawolf1 login node.

To start an interactive batch job session, use a qsub command of the form:

qsub -I -l walltime=mm:ss,qos=debug

where, as above, the processors*walltime limit must not exceed 14400 processor-seconds.
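
For example, a single-core interactive session of up to 30 minutes (within both the 45-minute serial limit and the processor-second limit) could be started from Seawolf1 with:

qsub -I -l walltime=30:00,qos=debug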

Orcinus uses Lustre, a massively parallel distributed file system, for /home (the location of all users' home directories) and /global/scratch (temporary space usually reserved for running jobs).

To check your current usage, consult the daily reports listing the top 150 users, located here:

/global/system/info/DU_home_info
/global/system/info/DU_scratch_info
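
Assuming the reports are plain-text listings (an assumption here), you can pick out your own entry with grep, replacing your_user_name with your actual user name:

grep your_user_name /global/system/info/DU_home_info
grep your_user_name /global/system/info/DU_scratch_info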

With regard to local storage, the Phase One blades have 98 GB of /scratch disk space, and the Phase Two blades have 218 GB.

Storage Information on Orcinus

/home
    Size: 86 TB
    Quota: 250 GB per user, 250,000 files per user
    Command to check quota: lfs quota -u your_user_name /home
    Purpose: all users' home directories
    Backup policy: backed up twice per week

/global/scratch
    Size: 342 TB
    Quota: 500 GB per user, 200,000 files per user
    Command to check quota: lfs quota -u your_user_name /global/scratch
    Purpose: temporary space, usually reserved for running jobs
    Backup policy: no backup