MATLAB at WestGrid

Introduction

About MATLAB

MATLAB is a general-purpose high-level programming package for numerical work such as linear algebra, signal processing and other calculations involving matrices or vectors of data. Visualization tools are also included for presentation of results. The basic MATLAB package is extended through add-on components including SIMULINK, and the Image Processing, Optimization, Neural Network, Signal Processing, Spline and Wavelet Toolboxes.

MATLAB Licensing on WestGrid

There are three methods of accessing MATLAB on WestGrid systems, distinguished by the type of licensing involved: license-free standalone applications, donated academic licenses and the WestGrid Distributed Computing Server license.

For users with access to a MATLAB compiler license, either their own or provided by their institutions, on a machine with an architecture and system similar to a target WestGrid machine, it may be possible to create a standalone application from their code using the MATLAB compiler.  The application can then be run on a WestGrid system without using any licenses at run time.

On some WestGrid systems there is a "normal" MATLAB distribution, which is run using donated academic licenses from individual institutions, including the University of Calgary, University of British Columbia and Simon Fraser University. Under the terms of those licenses, MATLAB jobs can be run by University of Calgary researchers only on the Lattice or Terminus clusters hosted at Calgary, by UBC researchers only on the UBC machines and by SFU researchers only on the SFU machines.

However, in late 2009, WestGrid purchased a new 64-worker "consortium" license for the MATLAB Distributed Computing Server.  The consortium license allows researchers from Canadian academic institutions who have licensed the Parallel Computing Toolbox (or have access to it through a local server) to submit jobs to a WestGrid cluster, even if it is not located at their home institution.  As of this writing (2011-04-03), Orcinus is the only WestGrid cluster on which the Distributed Computing Server workers run.  However, we have obtained permission to install the Distributed Computing Server on other clusters and may do so in the future.

The notes below relate to creating and running standalone applications and to using the Distributed Computing Server license.  Instructions for using MATLAB under the donated academic licenses are given on a separate page.

Creating and running standalone applications

Introduction

If you (or your institution) own a MATLAB compiler license running on a Linux machine with an architecture similar to a WestGrid compute node you may be able to create a standalone application from your code.  Such an application may then be run as a serial (or in some cases, a single-node parallel) job on an appropriate WestGrid system without using any licenses at run time.  Two cases in which this approach may be useful are when there is a need to run many copies of the code simultaneously and when you need access to machines with a large amount of memory (such as Breezy, which has 256 GB per compute node).  Note that explicit parallel processing commands, such as matlabpool, are not supported by the MATLAB compiler.

For several WestGrid institutions, there are local machines on which the MATLAB compiler is available to researchers from the corresponding institution.  For example, University of Calgary researchers may use MATLAB on the WestGrid Lattice cluster to create applications that can then be deployed to other WestGrid clusters, such as Hermes or Breezy.  In a similar manner, SFU researchers may use the compiler on Snowpatch, UBC researchers may use Orcinus, etc.  Please contact support@westgrid.ca to discuss your particular situation.

Creating a standalone application

The MATLAB mcc command is used to compile source code (.m files) into a standalone executable.  There are two important considerations to keep in mind when creating an executable that can be run in the WestGrid batch-oriented environment.  One is that there is no graphical display attached to your session and the other is that the number of threads used by the standalone application has to be controlled.

For example, with code mycode.m in a source directory src, and with the compiled files being written to a directory called deploy, the following mcc command line (at the Linux shell prompt) could be used:

mkdir deploy
cd src
mcc -R -nodisplay \
    -R -nojvm \
    -R -singleCompThread \
    -m -v -w enable \
    -d ../deploy \
    mycode.m

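Note that the -singleCompThread option has been included in order to limit the executable to just one computational thread.  The -nodisplay and -nojvm options deal with the lack of a graphical display by starting the application without a display and without the Java virtual machine.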

In the deploy directory, an executable mycode will be created along with a script run_mycode.sh.  These two files should be copied to the target machine where the code is to be run.
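Before copying, it can be worth confirming that both files were actually produced and are executable. The following is a minimal sketch, assuming the directory and file names used in the example above; the function name check_deploy is purely illustrative and is not part of MATLAB or the WestGrid software.

```shell
#!/bin/sh
# check_deploy DIR APP
# Verify that DIR holds the compiled executable APP and its wrapper
# script run_APP.sh, both with execute permission.
# (check_deploy is a hypothetical helper name used for illustration.)
check_deploy() {
    dir=$1
    app=$2
    status=0
    for f in "$app" "run_${app}.sh"; do
        if [ ! -x "$dir/$f" ]; then
            echo "missing or not executable: $dir/$f" >&2
            status=1
        fi
    done
    return $status
}
```

For the example above, this would be called as check_deploy deploy mycode from the directory that contains deploy, followed by the file transfer if the check succeeds.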

Running a standalone application

After the standalone executable mycode and corresponding script run_mycode.sh have been transferred to a directory on the target system (such as Hermes or Breezy) on which they will be run, a batch job script needs to be created in the same directory.  Here is an example batch job script.

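#!/bin/bash
#PBS -S /bin/bash

# MATLAB Compiler Runtime directory; must match the MATLAB release used to compile.
MCR=/global/software/matlab/mcr/v714

echo "Running on host: $(hostname)"

# Run from the directory in which qsub was invoked.
cd "$PBS_O_WORKDIR"
echo "Current working directory is $(pwd)"

echo "Starting run at: $(date)"
./run_mycode.sh "$MCR"
echo "Job finished at: $(date)"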
The job is then submitted as any ordinary WestGrid batch job with the qsub command. See the Running Jobs page for more information.  If the above script is called matlab.pbs, it could be submitted using:

qsub -l walltime=72:00:00,mem=6gb matlab.pbs

The specified walltime and total memory (mem) limits should be adjusted to appropriate values for your particular run.
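When many copies of the code are to be run at once, one job script per run can be generated in a loop. The following is a minimal sketch under several assumptions that are not part of these notes: that mycode accepts a run number as a command-line argument (the wrapper script passes any arguments given after the MCR directory on to the application), that the hypothetical file names job_N.pbs are acceptable, and that the same resource limits suit every run. The qsub commands are only printed, so that they can be inspected before anything is submitted.

```shell
#!/bin/sh
# Generate one batch job script per run and print the matching qsub command.
# Assumptions: mycode takes a run number as its only argument, and
# job_N.pbs is an illustrative naming scheme.
MCR=/global/software/matlab/mcr/v714   # must match the compiling MATLAB release
NRUNS=3                                # number of copies to run

# make_job_script N: write job_N.pbs in the current directory, print its name.
make_job_script() {
    n=$1
    script="job_${n}.pbs"
    # Unquoted EOF: ${MCR} and ${n} are expanded now, while the escaped
    # \$ forms are written literally and expanded when the job runs.
    cat > "$script" <<EOF
#!/bin/bash
#PBS -S /bin/bash
cd "\$PBS_O_WORKDIR"
echo "Starting run ${n} at: \$(date)"
./run_mycode.sh ${MCR} ${n}
echo "Run ${n} finished at: \$(date)"
EOF
    echo "$script"
}

i=1
while [ "$i" -le "$NRUNS" ]; do
    job=$(make_job_script "$i")
    # Remove the leading "echo" to submit the job for real.
    echo qsub -l walltime=72:00:00,mem=6gb "$job"
    i=$((i + 1))
done
```

Command-line arguments reach a compiled MATLAB program as character strings, so in this sketch mycode.m would need to convert the run number (for example with str2double) before using it numerically.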

An important part of the above script is the location of the MATLAB Compiler Runtime (MCR) directory.  This directory contains files necessary for the standalone application to run.  The version of the MCR files specified (v714 in the example, which corresponds to MATLAB R2010b) must match the version of MATLAB used to compile the code. 

A complete list of the MATLAB distributions and the corresponding compiler and MCR versions is given on the MathWorks web site.  The most recent versions are listed below, along with the corresponding installation directory to which the MCR variable should be set in the example script.  Not all systems have all versions installed, so check the /global/software/matlab/mcr directory on the system you are proposing to use. If the MCR version you need has not been installed, please write to support@westgrid.ca to request that it be installed, or use a different version of MATLAB for your compilation.

 

MATLAB Release   Compiler Version   MCR Version   MCR directory
R2009b           4.11               7.11          /global/software/matlab/mcr/v711
R2009bSP1        4.12               7.12          Not installed
R2010a           4.13               7.13          /global/software/matlab/mcr/v713
R2010b           4.14               7.14          /global/software/matlab/mcr/v714
R2011a           4.15               7.15          /global/software/matlab/mcr/v715
R2011b           4.16               7.16          /global/software/matlab/mcr/v716
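The lookup in the table can also be scripted, so that a job script fails early when the MCR directory is missing. The following is a minimal sketch that encodes only the entries listed above; the function name mcr_dir is illustrative, and the table would need extending for releases not shown here.

```shell
#!/bin/sh
# mcr_dir RELEASE
# Print the MCR directory for a MATLAB release, following the table above.
# Returns non-zero for releases that are not listed or not installed.
# (mcr_dir is a hypothetical helper name used for illustration.)
mcr_dir() {
    case $1 in
        R2009b) echo /global/software/matlab/mcr/v711 ;;
        R2010a) echo /global/software/matlab/mcr/v713 ;;
        R2010b) echo /global/software/matlab/mcr/v714 ;;
        R2011a) echo /global/software/matlab/mcr/v715 ;;
        R2011b) echo /global/software/matlab/mcr/v716 ;;
        *) echo "no MCR directory listed for release '$1'" >&2
           return 1 ;;
    esac
}

# Example: choose the directory for code compiled with R2010b, then confirm
# that it is actually installed on the system where this is run.
MCR=$(mcr_dir R2010b)
if [ -d "$MCR" ]; then
    echo "Using MCR at $MCR"
else
    echo "MCR directory $MCR is not installed here" >&2
fi
```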

 

Using MATLAB with the consortium license

Introduction

If you own a Parallel Computing Toolbox license and would like to get started using the MATLAB Distributed Computing Server environment on Orcinus, please contact WestGrid technical support.  Researchers from several WestGrid institutions (including U of A, U of C, UBC and SFU) can avoid purchasing Parallel Computing Toolbox licenses, as these are available on local servers.  Other institutions may also provide this support.

Basic instructions will be added to this web site as researchers begin to use the consortium license. In the meantime, to get a feeling for what is involved in setting up and using MATLAB through the Distributed Computing Server, you could refer to the MathWorks web site at http://www.mathworks.com/products/parallel-computing/ and to the links below to instructions at other Canadian consortia.  There are differences in the details due to different batch scheduling environments and other factors.  For example, WestGrid does not allow the passwordless SSH keys that are mentioned or implied in those documents.  Job submission does require the use of SSH keys, but these must be protected by a pass phrase.

https://www.sharcnet.ca/help/index.php/Using_MATLAB

http://wiki.ace-net.ca/index.php/Matlab_Instructions 

Using public key authentication to connect to Orcinus

The MATLAB Parallel Computing Toolbox uses the SSH (secure shell) network protocol to log in to Orcinus to execute commands (such as qsub for submitting batch jobs).  Similarly, the SCP (secure copy) protocol is used for transferring files back and forth between Orcinus and the system on which the Parallel Computing Toolbox is running. It would be very cumbersome to be prompted for a password every time MATLAB needed to execute a remote command or transfer a file.  This can be avoided by using public key authentication.  With this method, you have to enter the pass phrase for your SSH key only once during a session in which you are submitting MATLAB jobs.

Before attempting to use the Parallel Computing Toolbox for remote MATLAB job submission on Orcinus, you should set up public key authentication and verify that you can use it to connect to Orcinus with an SSH client and a (secure copy) file transfer client.  The details of how you do that depend on what type of system (Linux, Microsoft Windows, Mac OS X, ...) you are using.

In brief, for Linux and Macintosh systems, you generate keys with ssh-keygen (making sure that you use a pass phrase) and transfer the public key to the .ssh/authorized_keys file on Orcinus.  Then, whenever you want to submit MATLAB jobs, you run commands like ssh-agent /bin/bash and ssh-add key_file (which should prompt you for the pass phrase associated with your ssh key) before starting MATLAB.  On Microsoft Windows systems the idea is similar, in that you need to generate ssh keys and install the public key in .ssh/authorized_keys on Orcinus and then request that ssh/scp connections use the installed keys. However, the key generation and management software does not come pre-installed.  Typically, PuTTY is used, as described here.
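For Linux and Macintosh systems, the steps just described can be sketched as follows. The key file name and the Orcinus login address are placeholders (assumptions, not values given in these notes), and the function names are illustrative; each step is wrapped in a function so that nothing runs until it is called.

```shell
#!/bin/sh
# Sketch of the Linux / Mac OS X key setup described above.
# KEY_FILE and REMOTE are placeholders: substitute your own key file name,
# your WestGrid user name and the Orcinus login host name.
KEY_FILE="$HOME/.ssh/westgrid_key"
REMOTE="username@orcinus.westgrid.ca"

# One-time step: create the key pair. ssh-keygen prompts for a pass phrase;
# do not leave it empty.
generate_key() {
    ssh-keygen -t rsa -f "$KEY_FILE"
}

# One-time step: append the public key to .ssh/authorized_keys on Orcinus.
# This connection still asks for your WestGrid password.
install_key() {
    ssh "$REMOTE" 'mkdir -p .ssh && chmod 700 .ssh && cat >> .ssh/authorized_keys && chmod 600 .ssh/authorized_keys' < "${KEY_FILE}.pub"
}

# Per-session step: call this inside a shell started with
# "ssh-agent /bin/bash". ssh-add prompts once for the pass phrase; MATLAB
# started from the same shell can then use the key without further prompts.
load_key() {
    ssh-add "$KEY_FILE"
}
```

A typical session would then consist of running ssh-agent /bin/bash, calling load_key, checking that ssh to Orcinus no longer prompts, and starting MATLAB from that same shell.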

Distributed Computing Server - basic concepts

As mentioned above, the MATLAB Parallel Computing Toolbox is used to control job submission when using the Distributed Computing Server installation of MATLAB on a WestGrid cluster, such as Orcinus. Details of these MathWorks products are available on their web site, including a user's guide for the Parallel Computing Toolbox.  There is also an administrator's guide for the Distributed Computing Server, but most end users will not need to look at that.

The figure below illustrates the relationships among the various software and hardware components involved in using the Parallel Computing Toolbox on your computer to submit a batch job on an Orcinus login node, which, in turn, will run workers under the Distributed Computing Server license on the Orcinus compute nodes.

 

[Figure: MATLAB on Orcinus]

Some of the details of the interaction among the various components shown above are either largely hidden from view or require only a one-time setup.  For example, the SSH interactions in the diagram are taken care of by the public-key authentication setup described in the previous section.  However, a basic understanding of what is going on "under the hood" is helpful, so it is discussed below. For additional details, see the MathWorks web site, particularly the section on Using the Generic Scheduler Interface in the Parallel Computing Toolbox User's Guide.

The Generic Scheduler Interface is a way of describing to MATLAB where you want to run your job (on a remote cluster in most cases) and sending the necessary commands to the batch job scheduler on that remote system.  This is done by defining something called a scheduler object in your MATLAB session.  The scheduler object is a structure with several components, most of which will be the same from job to job.  Specific examples of how to set up the scheduler object for submitting jobs to a WestGrid cluster will be given later in these notes.

One of the most important components of the scheduler object is a reference to the MATLAB code, called a submit function, that is used to construct the batch job script (or a series of scripts if you are submitting a number of tasks at the same time), send that script to the remote cluster, construct the TORQUE qsub command line that is used to submit jobs to the batch job system on the cluster for execution and then actually run the qsub command.  The submit function can also be used to copy any data files that are needed for the job from your local machine to the remote cluster, although if you have a large data set that is referenced by several different jobs, you could manually copy that to the cluster ahead of time.

When the batch job script that is created by the submit function is actually executed on the compute nodes of the cluster, a Distributed Computing Server worker will be started up. Some environment variables that are defined in the batch job are used by the MATLAB worker to locate such things as the directory where data needed for the job is located.

You may use one of two different submit functions, depending on whether your MATLAB calculations are essentially a number of independent (serial) tasks or whether you have a parallel calculation in which different workers need to communicate and exchange data as the calculation proceeds.  You may not need to know the details of the submit function code if you are using an institutional server to submit your jobs.  However, if you are submitting the jobs directly from your own computer, you will have to edit a few lines of the sample submit functions that MathWorks provides and install the submit functions where they can be found by your MATLAB session.  Until these notes are more self-contained, please contact support@westgrid.ca for more specific advice on editing and installing the submit functions.

Examples

Some examples are given at https://www.aict.ualberta.ca/units/research/numstatsserver/pbs#matlabWestgrid .


Updated 2011-09-20.