Lattice QuickStart Guide

About this QuickStart Guide

This QuickStart Guide gives a brief overview of the WestGrid Lattice facility, highlighting some of the features that distinguish it from other WestGrid resources. It is intended for new WestGrid account holders and for current users considering whether to move to the Lattice system. For more detailed information about the Lattice hardware and performance characteristics, available software, usage policies, and how to log in and run jobs, follow the links given below.

Introduction

Lattice is an HP AlphaServer cluster running Tru64 UNIX. In April 2008 the Lattice system underwent major changes when some of the older machines were retired. Lattice was formerly used for parallel MPI jobs over a high-end Quadrics interconnect, but support for multi-node parallel jobs has since been removed. Parallel jobs using up to four processors can still be run within a single node. Lattice is also the only WestGrid system with Gaussian and ABAQUS licenses for external users (with some restrictions).

Hardware

Processors

Lattice is a cluster of 36 HP ES45 compute nodes, along with a login node and other machines for job management and file serving. Each compute node has four Alpha EV6.8CB (21264C) CPUs and 4 to 8 GB of memory. The CPUs are clocked at 1.25 GHz and have 16 MB of cache.
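As a quick worked example of the figures above, the compute partition totals 36 × 4 = 144 CPUs, and a node's 4 to 8 GB works out to 1 to 2 GB per CPU if the memory is split evenly. The short Python snippet below just restates that arithmetic; the even per-CPU split is a planning assumption, not a scheduler rule.

```python
# Aggregate compute capacity of Lattice, from the hardware figures above:
# 36 ES45 compute nodes, 4 CPUs per node, 4 to 8 GB of memory per node.
NODES = 36
CPUS_PER_NODE = 4
MEM_PER_NODE_GB = (4, 8)  # smallest and largest node configurations

total_cpus = NODES * CPUS_PER_NODE  # 144 CPUs in total

# Memory per CPU, assuming a node's memory is shared evenly by its 4 CPUs.
mem_per_cpu_gb = tuple(m / CPUS_PER_NODE for m in MEM_PER_NODE_GB)

print(f"Total compute CPUs: {total_cpus}")
print(f"Memory per CPU: {mem_per_cpu_gb[0]:g} to {mem_per_cpu_gb[1]:g} GB")
```

This prints "Total compute CPUs: 144" and "Memory per CPU: 1 to 2 GB".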

Interconnect

Multi-node jobs are no longer allowed on Lattice; parallel jobs must run within a single node, using up to four processors.
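Because every job must now fit on one node with at most four processors, a TORQUE resource request on Lattice is limited to one node. The hypothetical Python helper below is a minimal sketch that only builds and validates such a request string (`nodes=1:ppn=N`); check the Running Jobs page for the exact options accepted on Lattice.

```python
MAX_PPN = 4  # CPUs per ES45 compute node


def torque_nodes_request(ppn: int) -> str:
    """Build a TORQUE nodes/ppn resource string for a single-node job.

    Hypothetical helper: nodes is fixed at 1 because multi-node jobs are
    no longer allowed, and ppn is limited to the 4 CPUs in one node.
    """
    if not 1 <= ppn <= MAX_PPN:
        raise ValueError(f"ppn must be between 1 and {MAX_PPN}, got {ppn}")
    return f"nodes=1:ppn={ppn}"
```

For example, `torque_nodes_request(4)` returns `nodes=1:ppn=4`, which could be passed to qsub as `qsub -l nodes=1:ppn=4 job.pbs` (where `job.pbs` is a placeholder for your job script), while a request for 8 processors raises `ValueError`.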

Storage

There is approximately 3 TB of disk space allocated for home directories and global scratch space. Each user has a subdirectory in both /scratch and /scratch2.

In addition, each compute node has a local scratch directory, called /local_scratch, with approximately 200 GB of storage space. As noted below, it is very important for performance reasons to use /local_scratch for I/O-intensive programs. Note that these local scratch partitions are shared among all users of a given node, so you are not guaranteed that all of the space will be available for any given run.
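Because /local_scratch is shared by all jobs on a node, a job that writes large files may want to check the free space before starting. A minimal sketch using Python's standard `shutil.disk_usage` follows; the function name is hypothetical, and the result is only a snapshot, since another job on the same node can use up the space immediately afterward.

```python
import shutil


def has_enough_scratch(path: str, needed_bytes: int) -> bool:
    """Return True if the file system holding `path` currently has at
    least `needed_bytes` free.

    /local_scratch is shared by every job on the node, so this is only a
    point-in-time check, not a reservation.
    """
    free = shutil.disk_usage(path).free
    return free >= needed_bytes
```

For example, `has_enough_scratch("/local_scratch", 50 * 1024**3)` returns False when less than 50 GiB is currently free (the 50 GiB figure is illustrative).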

Software

See the main WestGrid software page for tables showing the installed application software on Lattice and other WestGrid systems, as well as information about the operating system, compilers, and mathematical and graphical libraries.

Using Lattice

To log in to Lattice, connect to lattice.westgrid.ca using an ssh (secure shell) client. For more information about connecting and setting up your environment, see the QuickStart Guide for New Users.

As on other WestGrid systems, batch jobs are handled by a combination of the TORQUE and Moab software. For more information about submitting jobs, see Running Jobs. Historically, Lattice has had more than one queue: the g03 queue, which allowed a walltime of up to 503 hours (nearly 3 weeks), and the default queue, with a walltime limit of 168 hours (1 week). However, this arrangement is subject to review once the April 2008 reorganization of Lattice is complete.
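A requested walltime can be checked against these queue limits before submitting. The sketch below converts a TORQUE-style `HH:MM:SS` walltime to hours and compares it with the historical limits quoted above. Because those limits are under review, the `QUEUE_LIMITS_HOURS` table is an assumption you would need to update, and the helper names are hypothetical.

```python
# Historical Lattice walltime limits in hours. These are subject to review
# after the April 2008 reorganization, so confirm them before relying on them.
QUEUE_LIMITS_HOURS = {
    "default": 168,  # 1 week
    "g03": 503,      # nearly 3 weeks
}


def parse_walltime(walltime: str) -> float:
    """Convert a TORQUE-style HH:MM:SS walltime string to hours."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return hours + minutes / 60 + seconds / 3600


def fits_queue(walltime: str, queue: str = "default") -> bool:
    """True if the requested walltime is within the queue's historical limit."""
    return parse_walltime(walltime) <= QUEUE_LIMITS_HOURS[queue]
```

For example, `fits_queue("200:00:00")` is False for the default queue, while `fits_queue("200:00:00", "g03")` is True.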

Access to the /scratch and /scratch2 file systems on Lattice also changed in April 2008: they are now accessed via NFS over a slower network connection than before. It is therefore very important for I/O-intensive programs to use the /local_scratch file system for the I/O associated with batch jobs. Copy input files to /local_scratch as part of your job script, and copy output from /local_scratch to your directory in /scratch or /scratch2 once the run has completed.
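A minimal Python sketch of this staging pattern follows: copy inputs from /scratch to a node-local directory, run the program there, then copy the results back. The helper names, the use of Python to drive the job step, and the idea of a per-run directory under /local_scratch are assumptions for illustration; the same steps could equally be written as shell commands in the job script.

```python
import shutil
import subprocess
from pathlib import Path


def stage_in(inputs, local_dir):
    """Copy input files (e.g. from /scratch) into the node-local directory."""
    local = Path(local_dir)
    local.mkdir(parents=True, exist_ok=True)
    for src in map(Path, inputs):
        shutil.copy2(src, local / src.name)
    return local


def stage_out(local_dir, outputs, dest_dir):
    """Copy the named output files from local scratch to the shared directory."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for name in outputs:
        shutil.copy2(Path(local_dir) / name, dest / name)


def run_staged(command, inputs, outputs, local_dir, dest_dir):
    """Stage in, run `command` with local_dir as its working directory,
    then stage out.

    check=True raises CalledProcessError if the program fails, so outputs
    are only copied back after a successful run.
    """
    local = stage_in(inputs, local_dir)
    subprocess.run(command, cwd=local, check=True)
    stage_out(local, outputs, dest_dir)
```

A call might look like `run_staged(["./my_program"], ["/scratch/<user>/input.dat"], ["output.dat"], "/local_scratch/<user>/run1", "/scratch/<user>/results")`, where the program name and all paths are placeholders. All of the program's heavy I/O then happens on the local disk, and the slower NFS-mounted /scratch is touched only twice.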


Updated 2008-04-07.