Checkers QuickStart Guide

About this QuickStart Guide

This QuickStart guide gives a brief overview of the WestGrid Checkers facility, highlighting some of the features that distinguish it from other WestGrid resources. It is intended to be read by new WestGrid account holders and by current users considering whether to move to the Checkers system. For more detailed information about the Checkers hardware and performance characteristics, available software, usage policies and how to log in and run jobs, follow the links given below.

Introduction

Checkers is an SGI Altix XE320-based cluster with 1280 cores connected by InfiniBand, running the Scientific Linux operating system.

Hardware

Processors

The Checkers facility consists of a login node, a storage server and 160 compute nodes providing 1280 processor cores for computations. Each 8-core compute node has 2 sockets, each containing an Intel Xeon L5420 quad-core processor running at 2.5 GHz. Each compute node has 16 GB of memory (DDR2 RAM) that is shared among the 8 cores on that node.
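The node figures above determine how much memory each process can expect. The following is a minimal sketch of that arithmetic, using only the numbers quoted in this section; the function names are illustrative and not part of any Checkers tool. The result is an upper bound, since the operating system also uses some of each node's memory.

```python
# Figures taken from the hardware description above.
COMPUTE_NODES = 160
CORES_PER_NODE = 8          # 2 sockets x 4 cores per socket
MEMORY_PER_NODE_GB = 16


def total_cores(nodes=COMPUTE_NODES, cores_per_node=CORES_PER_NODE):
    """Total number of compute cores in the cluster."""
    return nodes * cores_per_node


def memory_per_process_gb(processes_on_node, memory_per_node_gb=MEMORY_PER_NODE_GB):
    """Upper bound on memory per process when a node runs the given
    number of processes (hypothetical helper, for illustration only)."""
    if not 1 <= processes_on_node <= CORES_PER_NODE:
        raise ValueError("processes_on_node must be between 1 and 8")
    return memory_per_node_gb / processes_on_node


if __name__ == "__main__":
    print("Total cores:", total_cores())
    print("GB per process, 8 per node:", memory_per_process_gb(8))
    print("GB per process, 4 per node:", memory_per_process_gb(4))
```

A job that needs more than about 2 GB per process would therefore have to run fewer than 8 processes per node.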

Interconnect

The compute nodes are connected with double data rate (DDR) InfiniBand (providing an aggregate 20 Gb/s) with 2:1 blocking, using a 288-port Voltaire Grid Director ISR 2012 switch.

Storage

Storage is provided by an SGI IS10k storage array, consisting of 100 SATA drives of 1 TB each, configured as RAID 6 (8+2, dual parity). This gives about 80 TB of usable space. The storage is managed by an 8-core SGI Altix 450 NFS server.
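The usable capacity quoted above follows from the 8+2 layout: in each group of 10 drives, 8 hold data and 2 hold parity. A minimal sketch of the calculation is shown here. It assumes all 100 drives belong to 8+2 groups and ignores file system overhead, which is why the guide says "about" 80 TB; the function name is illustrative.

```python
def raid6_usable_tb(drives, drive_tb, data_per_group=8, parity_per_group=2):
    """Usable capacity of a RAID 6 array built from equal-sized groups.

    Assumes every drive sits in a complete data+parity group; drives
    left over after forming whole groups are not counted.
    """
    group_size = data_per_group + parity_per_group
    groups = drives // group_size
    return groups * data_per_group * drive_tb


if __name__ == "__main__":
    print("Usable TB:", raid6_usable_tb(drives=100, drive_tb=1))
```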

There are also two 250 GB SATA 7200 RPM drives per node for local storage. Just under 400 GB is available to users as /scratch on each compute node.

Software

A list of the installed software on Checkers is not yet available. Please write to support@westgrid.ca if there is software that you would like installed.

Using Checkers

To log in to Checkers, connect to checkers.westgrid.ca using an SSH (secure shell) client. For more information about connecting and setting up your environment, see the QuickStart Guide for New Users.

The following limits are currently enforced on the system:

  • Default walltime is 3 hours;
  • Maximum walltime is 21 days;
  • Maximum number of running jobs per user is 64;
  • Maximum number of queued jobs gaining priority over time is 5 per user;
  • Maximum of 30720 processor-hours per job, calculated as the number of processors multiplied by the walltime requested in the job.
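The processor-hour limit interacts with the walltime limit, so it can help to check a planned request before submitting it. The following is a rough sketch of that check based only on the limits listed above. It does not query the scheduler, it does not cover the per-user job count limits, and the function names are hypothetical.

```python
import math

# Limits quoted in this guide.
TOTAL_CORES = 1280
DEFAULT_WALLTIME_HOURS = 3
MAX_WALLTIME_HOURS = 21 * 24       # 21 days = 504 hours
MAX_PROCESSOR_HOURS = 30720


def processor_hours(processors, walltime_hours):
    """Processor-hours as defined above: processors times requested walltime."""
    return processors * walltime_hours


def check_request(processors, walltime_hours=DEFAULT_WALLTIME_HOURS):
    """Return a list of limits the request would exceed (empty if none)."""
    problems = []
    if processors < 1 or processors > TOTAL_CORES:
        problems.append("processor count must be between 1 and %d" % TOTAL_CORES)
    if walltime_hours > MAX_WALLTIME_HOURS:
        problems.append("walltime exceeds %d hours" % MAX_WALLTIME_HOURS)
    if processor_hours(processors, walltime_hours) > MAX_PROCESSOR_HOURS:
        problems.append("request exceeds %d processor-hours" % MAX_PROCESSOR_HOURS)
    return problems


def max_processors_for(walltime_hours):
    """Largest processor count allowed for a given walltime."""
    by_budget = math.floor(MAX_PROCESSOR_HOURS / walltime_hours)
    return min(TOTAL_CORES, by_budget)


if __name__ == "__main__":
    print(check_request(64, 504))            # 64 processors for 21 days
    print(max_processors_for(504))           # full 21-day walltime
    print(max_processors_for(24))            # one day
```

Under these figures, a job requesting the full 21 days can use at most 60 processors, and a job using all 1280 cores can request at most 24 hours.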

(to be continued...)


Updated 2009-07-27.