Hermes/Nestor QuickStart Guide
About this QuickStart Guide
This QuickStart guide provides a brief overview of the Hermes and Nestor clusters, indicating their role within WestGrid and highlighting some of the features that distinguish them from other WestGrid resources. It is intended to be read by new WestGrid account holders and by current users considering these systems.
For more detailed information about the hardware and performance characteristics, available software, usage policies and how to log in and run jobs, follow the links given below.
This system is also documented as part of the University of Victoria's Research Computing Facility.
Introduction
Hermes is a capacity cluster geared towards serial jobs. The Hermes cluster consists of 84 nodes (672 cores). Nestor is a capability cluster consisting of 288 nodes (2304 cores) geared towards large parallel jobs. The two clusters share infrastructure such as resource management, job scheduling, networked storage, and service nodes.
Both hermes.westgrid.ca and nestor.westgrid.ca are aliases for a common pool of head nodes named litaiNN.westgrid.ca. The Litai in Greek mythology were the personification of prayers from mortals to the gods. The choice of destination hostname when connecting to this facility has no bearing on what kind of jobs may be run: one may log in to hermes.westgrid.ca and submit a parallel job to the Nestor cluster.
Hardware
Nodes
Each node in the Nestor and Hermes clusters is an IBM iDataplex server with eight 2.67-GHz Xeon X5550 cores and 24 GB of RAM.
Interconnect
Hermes nodes use two bonded Gigabit Ethernet links (2 Gbit/s aggregate bandwidth) to reach the NFS and GPFS filesystems.
Nestor nodes share data with each other and the GPFS filesystem using a high-speed InfiniBand interconnect (4X QDR non-blocking connections giving a 40 Gbit/s signal rate with a 32 Gbit/s data rate).
Storage
The majority of storage on the clusters is deployed through the General Parallel File System (GPFS), a high-performance clustered file system that provides both fast data access and fault tolerance in cluster participants. The GPFS storage is provided by the storage subsystem (Pleiades). This subsystem is composed of:
- Spinning Disk storage: 900 TB of raw storage space
- Backup tape: 500 TB of backup storage space
The spinning disk storage is a DDN system structured into 45 tiers. Each tier provides 16 TB of usable storage, for 720 TB in total. The remaining 180 TB of raw capacity is used for RAID.
The following table summarizes the current usage of different tiers:
| Number of Tiers | Usable storage (TB) | Usage |
|---|---|---|
| 12 | 192 | GPFS |
| 29 | 464 | dCache (for ATLAS project) |
| 1 | 16 | NFS (for ATLAS project) |
| 1 | 16 | TSM for system backup |
| 2 | 32 | Spare |
The total amount of usable spinning disk is 720 TB including the file system overhead. 180TB of disk space is managed by four storage nodes and shared amongst multiple uses, including home directories, software, and scratch space.
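The capacity figures above can be cross-checked with a few lines of shell arithmetic. This is only a sketch: the numbers come from this guide, while the variable names are illustrative.

```shell
#!/bin/sh
# Figures quoted in this guide; variable names are illustrative.
raw_tb=900              # raw spinning-disk capacity
tiers=45                # number of DDN tiers
usable_per_tier_tb=16   # usable storage per tier

usable_tb=$(( tiers * usable_per_tier_tb ))
raid_tb=$(( raw_tb - usable_tb ))

# Sum the per-usage tier counts from the table (GPFS, dCache, NFS, TSM, spare).
tier_sum=0
for n in 12 29 1 1 2; do
    tier_sum=$(( tier_sum + n ))
done

echo "Usable: ${usable_tb} TB, RAID: ${raid_tb} TB, tiers in table: ${tier_sum}"
```

The tier counts in the table add up to the 45 tiers of the DDN system, and the usable total matches the 720 TB quoted above.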
Disk usage is monitored, but automatic disk quotas (i.e. those reported by the "quota" command) are not implemented, and there are no current plans to implement them. The per-user limits are given below.
Key file spaces available include:
/home
- /home/username is your home directory (assigned to the HOME environment variable).
- Only essential data should be stored here, such as source code and processed results.
- Backed up nightly; once the active copy of a file is deleted, its most recent backup is kept for 180 days.
- Quota: 300GB per user.
/global/scratch
- This is your work area for jobs.
- Please create a subdirectory of your choice (it is recommended you use /global/scratch/username) and use this for data sets and job processing.
- This file area is not backed up.
- Quota: 1TB per user.
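Creating the recommended scratch subdirectory might look like the sketch below. The function name prepare_scratch is hypothetical, and the optional argument exists only so the snippet can be tried on a machine that has no /global/scratch.

```shell
#!/bin/sh
# prepare_scratch is a hypothetical helper, not part of the cluster software.
# It creates <root>/<username> and prints the resulting path.
prepare_scratch() {
    root="${1:-/global/scratch}"        # default is the path named in this guide
    user_name="${USER:-$(id -un)}"      # fall back to id if USER is unset
    work_dir="${root}/${user_name}"
    mkdir -p "$work_dir" || return 1    # -p: no error if it already exists
    printf '%s\n' "$work_dir"
}
```

On the cluster, `cd "$(prepare_scratch)"` would create the directory if needed and change into it. Since this area is not backed up, results worth keeping should be copied elsewhere.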
/global/software
- This is where most software of user interest is installed, such as applications, analysis frameworks and support libraries.
- A list of such software is available below; for an up-to-date list, run ls /global/software.
Each compute node has a 250GB drive with about 225GB available for local, non-persistent scratch space for the lifetime of the job. This is roughly 28GB of scratch space per core.
Software
See the main WestGrid software page for comparative tables listing the installed software on Hermes, Nestor and other WestGrid systems, including information about the operating system and compilers.
Some of the software installed includes:
- Intel Cluster Suite, including C, C++ and Fortran compilers
- GROMACS
- GADGET
- NetCDF
- FFTW
Using Hermes and Nestor
To log in to Hermes and Nestor, connect to hermes.westgrid.ca or nestor.westgrid.ca using an ssh (secure shell) client. For more information about connecting and setting up your environment, see Setting up Your Computer.
As on other WestGrid systems, batch jobs are handled by a combination of TORQUE and Moab software. For general information about submitting jobs, see Running Jobs.
Jobs are routed according to the resources requested, so that specifying a queue should for the most part be unnecessary. Jobs that request one node (or make no specific request) will be queued for Hermes nodes; jobs that request more than one node will be queued for execution on Nestor.
Queues may be explicitly requested using the -q <queue> notation on the qsub command line. The general-use queues are:
- hermes - general Hermes-appropriate jobs
- nestor - general Nestor-appropriate jobs
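The routing rule described above (one node goes to Hermes, more than one node goes to Nestor) can be sketched as a small helper for cases where a queue is named explicitly. The function name queue_for_nodes and the script name job.pbs are hypothetical, and the snippet only prints the qsub command rather than running it.

```shell
#!/bin/sh
# queue_for_nodes is a hypothetical helper that mirrors the routing rule
# described in this guide; the scheduler applies this rule on its own.
queue_for_nodes() {
    nodes="${1:-1}"          # no specific request is treated as one node
    if [ "$nodes" -le 1 ]; then
        echo hermes
    else
        echo nestor
    fi
}

# Build, but do not run, a qsub command line for a four-node job.
# job.pbs is a placeholder for your own submission script.
queue=$(queue_for_nodes 4)
echo "qsub -q ${queue} job.pbs"
```

In most cases the -q option can simply be omitted, since the scheduler makes the same choice from the resources requested.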
Wall time specification
Wall time is the amount of real time in which a job runs, regardless of the amount of CPU used or other factors. In other words, it is the elapsed time that would be recorded on a wall clock.
By default, jobs have a wall time of one minute. This encourages users to specify a more realistic wall time. A typical approach is to estimate the run time and multiply it by three. To specify a wall time for a job, include the following directive at the top of the submission script (this example is for a 24-hour job):
#PBS -l walltime=24:00:00
Wall times let the scheduler prioritize and queue jobs according to how long resources will be occupied, and to some extent let users predict when their queued jobs may run.
The maximum walltime on Nestor and Hermes is 72 hours (3 days). For more information about the scheduling policies on Nestor and Hermes please check the Nestor/Hermes Job Scheduling page.
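As a rough illustration of the "estimate and multiply by three" habit combined with the 72-hour maximum, the sketch below turns an estimate in whole hours into a walltime directive. The function name walltime_request is hypothetical, and working in whole hours is a simplifying assumption.

```shell
#!/bin/sh
# walltime_request is a hypothetical helper: it triples an estimate given in
# whole hours and caps the result at the 72-hour maximum stated in this guide.
walltime_request() {
    estimate_hours="$1"
    max_hours=72
    hours=$(( estimate_hours * 3 ))
    if [ "$hours" -gt "$max_hours" ]; then
        hours=$max_hours
    fi
    printf '%02d:00:00\n' "$hours"
}

# An 8-hour estimate yields the 24-hour directive used as the example above.
echo "#PBS -l walltime=$(walltime_request 8)"
```

A job whose tripled estimate exceeds the cap has to fit within 72 hours, for example by being split into shorter runs.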
Processor specification
One may request a specific number of processors; processors chosen by the scheduler may be on any node. In this example, two processors are requested:
#PBS -l procs=2
One may also request multiple processors on a single node:
#PBS -l nodes=1:ppn=4
Finally, one may also request multiple processors on multiple nodes:
#PBS -l nodes=4:ppn=8
Jobs requesting 8 cores or fewer should run on hermes, and the others should run on nestor.
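Inside a running job it is often useful to know how many processors were actually granted. The sketch below relies on PBS_NODEFILE and PBS_O_WORKDIR, which are standard TORQUE environment variables rather than something this guide documents. The function name count_procs is hypothetical, and the fallback to 1 is an assumption made so the script also runs outside the batch system.

```shell
#!/bin/sh
#PBS -l nodes=4:ppn=8
#PBS -l walltime=24:00:00

# count_procs is a hypothetical helper. TORQUE normally writes one line per
# allocated processor to the file named by PBS_NODEFILE.
count_procs() {
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        # The arithmetic expansion strips any padding wc adds to its output.
        echo $(( $(wc -l < "$PBS_NODEFILE") ))
    else
        echo 1   # assumed fallback when not running under TORQUE
    fi
}

# Start in the directory the job was submitted from, if TORQUE recorded it.
cd "${PBS_O_WORKDIR:-$PWD}"

nprocs=$(count_procs)
echo "Running on ${nprocs} processor(s)"
```

The resulting count can then be passed to whatever parallel launcher the application uses.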
Memory specification
Each node has 24GB of memory, of which 1-2GB is used for the OS, depending on the image used. This leaves roughly 22GB of memory for jobs. The default amount of memory per job is 1024MB. To specify more, a resource directive like the following may be used (this example requests 2GB):
#PBS -l mem=2048mb
The mem parameter is the total memory limit for a job. For a parallel job, the pmem parameter can be used to specify a per-process memory requirement. For example:
#PBS -l procs=10,mem=20gb,pmem=2gb
This example requests 10 processors, with 2GB of memory per process, and 20GB total memory.
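The relationship between the two limits in the example is simple multiplication, which the following sketch makes explicit. The function name total_mem_gb is hypothetical, and treating mem as exactly procs times pmem is an assumption that matches the example above rather than a rule stated by this guide.

```shell
#!/bin/sh
# total_mem_gb is a hypothetical helper: processors times per-process memory.
total_mem_gb() {
    echo $(( $1 * $2 ))
}

procs=10      # processors requested
pmem_gb=2     # memory per process, in GB
mem_gb=$(total_mem_gb "$procs" "$pmem_gb")

# Assemble the resource directive used in the example above.
resource_line="#PBS -l procs=${procs},mem=${mem_gb}gb,pmem=${pmem_gb}gb"
echo "$resource_line"
```

Keeping the two values consistent in this way avoids requesting a total that is smaller than the sum of the per-process limits.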
Updated 2011-11-08.
