Nov. 25, 2015: Glacier has been decommissioned and is no longer available.
Glacier is a legacy, 32-bit system and, as such, new WestGrid accounts are NOT created on this machine by default. If you would like an account on this machine, please contact email@example.com and provide a brief description of your project along with some justification for wanting to use this system. Glacier is most suitable for serial processing and parallel jobs which do not require a fast interconnect fabric and can use a 32-bit architecture.
To log in to Glacier, connect to glacier.westgrid.ca using an ssh (secure shell) client.
As on other WestGrid systems, batch jobs are handled by a combination of TORQUE and Moab software. For more information about submitting jobs, see Running Jobs. Please note that the maximum walltime limit per job on Glacier is 240 hours.
The maximum number of jobs that a user may have queued to run is 1000.
To facilitate testing and debugging, a couple of Glacier nodes are reserved for short jobs (less than 10 minutes). To request these nodes, add the debug Quality of Service (QOS) resource request to your job script.
#PBS -l qos=debug,walltime=00:10:00
To improve the startup time of parallel jobs use the parallel QOS request:
#PBS -l qos=parallel
Please note the following details regarding the QOS requests:
- debug QOS : Maximum 4 CPUs ; maximum walltime 10 minutes; uses nodes ice1_1 and ice1_2 - see associated memory limits below.
Note that Glacier nodes have only 2 CPUs, so, if you need to debug with the maximum of 4 CPUs, add nodes=2:ppn=2 to the -l resource list.
- parallel QOS: Minimum 4 CPUs ; maximum walltime 240 hours; uses all nodes except those reserved for QOS:debug.
- normal QOS: Maximum walltime 240 hours; uses all nodes except those reserved for QOS:debug.
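Putting the debug QOS details together, a test job that uses the maximum of 4 CPUs might look like the sketch below. The job name and the final echo line are placeholders (assumptions, not taken from this page); replace them with your own program. When the script is run outside a batch job, the PBS variables are unset, so it falls back to the current directory and a processor count of 1.

```shell
#!/bin/bash
#PBS -N debug_test
#PBS -l qos=debug,walltime=00:10:00
#PBS -l nodes=2:ppn=2

# TORQUE sets PBS_O_WORKDIR to the directory where qsub was run;
# fall back to the current directory when it is unset.
cd "${PBS_O_WORKDIR:-$PWD}"

# PBS_NODEFILE lists one line per allocated processor.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    NPROCS=$(wc -l < "$PBS_NODEFILE")
else
    NPROCS=1
fi
NPROCS=$((NPROCS + 0))  # normalize any whitespace in the wc output

# Placeholder for the real program to be debugged.
echo "Debug job running on $NPROCS processor(s)"
```

With nodes=2:ppn=2 the request matches the two reserved debug nodes, each of which has 2 CPUs.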
A default memory limit of 768 MB is assigned to each job. To override this value use the mem resource request on the qsub command line or batch job script. For example:
#PBS -l mem=1024mb
The maximum value of the mem parameter for a single processor is:
2007mb for the 756 nodes (90% of the cluster) in racks 1-9 (ice1_1,...,ice54_14) and
4005mb for the 84 nodes (10% of the cluster) in rack 10 (ice55_1,...,ice60_14).
The mem parameter is the total memory limit for a job. For a parallel job, the pmem parameter can be used to specify a per-process memory requirement. For example:
#PBS -l nodes=10,mem=20gb,pmem=2gb
means that the submitted job needs 10 processors and 20gb of memory in total, with 2gb of RAM per process. Since 2gb (2048 MB) exceeds the 2007mb limit, this job can be executed only on nodes ice55_1,...,ice60_14. One might expect such a job to wait in the input queue for a longer time than a job that could run on one of the smaller memory nodes.
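The comparison against the per-processor limits can be sketched as a small shell function. The function name glacier_node_class and its return labels are hypothetical and are not part of any WestGrid tool; only the two limits (2007 MB and 4005 MB) come from the text above. The sketch ignores QOS restrictions such as the nodes reserved for debug jobs.

```shell
#!/bin/bash
# Per-processor memory limits quoted above, in MB.
SMALL_NODE_MAX_MB=2007   # racks 1-9  (ice1_1,...,ice54_14)
LARGE_NODE_MAX_MB=4005   # rack 10    (ice55_1,...,ice60_14)

# Hypothetical helper: report which node class can satisfy a
# per-process memory request (pmem) given in MB.
glacier_node_class() {
    local pmem_mb=$1
    if [ "$pmem_mb" -le "$SMALL_NODE_MAX_MB" ]; then
        echo "any"       # fits on every node
    elif [ "$pmem_mb" -le "$LARGE_NODE_MAX_MB" ]; then
        echo "rack10"    # fits only on the larger-memory nodes
    else
        echo "none"      # exceeds the limit on every node
    fi
}

glacier_node_class 1024            # prints: any
glacier_node_class $((2 * 1024))   # prints: rack10 (the pmem=2gb case)
glacier_node_class 4096            # prints: none
```

Requesting pmem=2000mb instead of pmem=2gb would keep a job eligible for the 756 smaller-memory nodes, which is likely to shorten its wait in the queue.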
Storage space is provided through IBM's General Parallel File System (GPFS) - a high-performance shared-disk file system that can provide fast data access from all nodes. A Storage Area Network (SAN) with almost 14 TB of disk space connected directly to 8 storage nodes (moraine1,...,moraine8) is used to fulfill I/O requests from all nodes.
In addition to the listed file systems, each compute node has an approximately 35 GB local partition for temporary files associated with running jobs. On the compute nodes, you can access this temporary storage area as either /scratch or /tmp - both directory references point to the same space. For jobs using many small files (a few MB each, say) use this local scratch storage (/scratch or /tmp) instead of /global/scratch, as the latter is optimized for large files.
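One way to follow this advice is to do all small-file work in a private directory on the local partition and copy back only a combined result. The sketch below is illustrative: the file names, the loop standing in for a real program, and the RESULT_DIR variable are assumptions, not part of the Glacier setup. It uses /tmp, which refers to the same space as /scratch on the compute nodes.

```shell
#!/bin/bash
# Destination for final results (hypothetical; defaults to the
# submission directory, or the current directory outside a batch job).
RESULT_DIR=${RESULT_DIR:-${PBS_O_WORKDIR:-$PWD}}

# Create a private working directory on node-local storage.
WORKDIR=$(mktemp -d /tmp/job_XXXXXX)

# Stand-in for a program that writes many small files.
for i in 1 2 3; do
    echo "chunk $i" > "$WORKDIR/part_$i.txt"
done

# Combine the small files into a single file and copy only that
# file back to shared storage.
cat "$WORKDIR"/part_*.txt > "$WORKDIR/combined.txt"
cp "$WORKDIR/combined.txt" "$RESULT_DIR/"

# Clean up the local partition so later jobs have space.
rm -rf "$WORKDIR"
```

Because the local partition is private to each compute node, files left there are not visible from the login node, so anything worth keeping must be copied out before the job ends.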
Storage Information on Glacier
| Directory path | Size | Quota | Command to check quota | Purpose | Backup Policy |
| /global/home | | | | Disk space is limited, so please use this file system to store only your essential data (source code, processed results if "small" in size, etc.). | We back up the /global/home file system with a 14-day expiration policy (backup frequency: every 36 hours). |
| /global/scratch | 8.6 TB | Although the quota command itself may not report this, there is a storage limit of 100 GB for your data in /global/scratch. | | File system designed for fast-changing "large" data sets and work area. | We do not back up this file system. |