Matrix QuickStart Guide
About this QuickStart Guide
This QuickStart guide gives a brief overview of the WestGrid Matrix facility, highlighting some of the features that distinguish it from other WestGrid resources. It is intended to be read by new WestGrid account holders and by current users considering whether to move to the Matrix system. For more detailed information about the Matrix hardware and performance characteristics, available software, usage policies and how to log in and run jobs, follow the links given below.
Introduction
Matrix is a Linux Opteron cluster with an Infiniband interconnect. It is restricted to parallel jobs and is intended for highly parallel applications that can take advantage of the high-capacity Infiniband interconnect.
Hardware
Processors
Matrix is a cluster consisting of 128 nodes. Each node has two Opteron CPUs (2.40 GHz) and 2 GB of memory.
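The per-node figures above imply the aggregate capacity of the cluster, which can help when estimating how much memory a job can use. The following is a minimal Python sketch of that arithmetic. It assumes that each Opteron CPU counts as a single processor for scheduling purposes, which this guide does not state explicitly, and the variable names are illustrative only.

```python
# Figures taken from the Processors paragraph above.
NODES = 128
CPUS_PER_NODE = 2
MEMORY_PER_NODE_GB = 2

# Assumption: each CPU is counted as one processor.
total_processors = NODES * CPUS_PER_NODE
total_memory_gb = NODES * MEMORY_PER_NODE_GB

# Memory available to each process when both CPUs on a node are in use.
memory_per_processor_gb = MEMORY_PER_NODE_GB / CPUS_PER_NODE

print(f"{total_processors} processors, {total_memory_gb} GB of memory in total")
print(f"{memory_per_processor_gb:g} GB of memory per processor")
```

Under these assumptions, a job that runs one process per CPU should plan for roughly 1 GB per process, less whatever the operating system itself needs.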
Interconnect
All nodes are connected by both Gigabit Ethernet and an Infiniband interconnect. Infiniband provides much lower latency and higher bandwidth than commodity networking, which makes these nodes suitable for demanding message-passing applications.
Storage
There is 4 TB of disk space allocated for home directories and global scratch space. Each user has a subdirectory in /scratch. A 10 GB quota per user is enforced for files in /home/users, and a 350 GB limit applies in /scratch.
Software
See the main WestGrid software page for tables showing the installed software on Matrix and other WestGrid systems, including information about the operating system and compilers.
Using Matrix
To log in to Matrix, connect to matrix.westgrid.ca using an ssh (secure shell) client. For more information about connecting and setting up your environment, see the QuickStart Guide for New Users.
Users developing their own parallel programs should refer to the Matrix programming page.
The Matrix login node may be used for short interactive runs during development. There are also two nodes reserved for jobs of 3 hours or less. These can be accessed for interactive work using the technique described on the Matrix Running Jobs page. Production runs should be submitted as batch jobs.
As on other WestGrid systems, batch jobs are handled by a combination of TORQUE and Moab software. For more information about submitting jobs, see the general Running Jobs page and some Matrix-specific notes.
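As a rough illustration of what a TORQUE batch script contains, the Python sketch below assembles one from a node count and a walltime. The `#PBS -l nodes=...:ppn=...` and `#PBS -l walltime=...` directives and the `PBS_O_WORKDIR` variable are standard TORQUE features, but the function name `render_job_script`, the `mpiexec ./my_program` launch line and the choice of `ppn=2` (one process per CPU) are assumptions made for this example. The Matrix-specific notes remain the authority on which options are actually required.

```python
def render_job_script(nodes, hours, command, ppn=2):
    """Return the text of a simple TORQUE batch script.

    ppn=2 mirrors the two CPUs in each Matrix node. `command` is a
    placeholder for whatever launch line the Matrix notes recommend.
    """
    if nodes < 1 or hours < 1:
        raise ValueError("nodes and hours must be positive integers")
    lines = [
        "#!/bin/bash",
        f"#PBS -l nodes={nodes}:ppn={ppn}",     # processors requested
        f"#PBS -l walltime={hours:02d}:00:00",  # run-time limit, HH:MM:SS
        "cd $PBS_O_WORKDIR",                    # directory the job was submitted from
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: 4 nodes (8 processors) for 12 hours.
script = render_job_script(nodes=4, hours=12, command="mpiexec ./my_program")
print(script)
```

The printed text could be saved to a file and submitted with `qsub`, the TORQUE submission command.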
The maximum walltime for Matrix jobs is 3 days. The running jobs of a given user may use a combined total of at most 64 processors at any one time. Because 8- to 16-processor jobs generally wait in the input queue for a shorter time than larger jobs, you will get better throughput by submitting several smaller jobs to run at the same time than by submitting a series of 64-processor jobs one after another.
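To make the per-user limit concrete, the short Python sketch below works out how many jobs of a given size can run at the same time under the 64-processor cap. The constant and function names are illustrative, and the calculation says nothing about queue waiting times, which depend on the load on the system.

```python
MAX_PROCESSORS_PER_USER = 64  # limit on a user's running jobs, from the text above
MAX_WALLTIME_HOURS = 3 * 24   # 3-day walltime limit expressed in hours


def concurrent_jobs(processors_per_job):
    """Number of equal-sized jobs that fit under the per-user processor cap."""
    if not 1 <= processors_per_job <= MAX_PROCESSORS_PER_USER:
        raise ValueError("job size must be between 1 and 64 processors")
    return MAX_PROCESSORS_PER_USER // processors_per_job


for size in (8, 16, 64):
    print(f"{size}-processor jobs: up to {concurrent_jobs(size)} running at once")
```

For example, eight 8-processor jobs or four 16-processor jobs can run side by side, whereas a 64-processor job uses the whole allowance by itself.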
Updated 2009-08-10.