Parallel System Status

2015-02-23 - 02:19 PST Testing

Breezy/Lattice/Parallel Filesystem down

Update 2015-03-24: /global/scratch is undergoing file system checks. We estimate another 2-3 days will be needed to run the check tools, during which time systems may continue to be unstable.

One of the fileservers supplying Breezy/Lattice/Parallel is experiencing hardware problems.

System staff and the vendor are investigating.

Update 2015-02-23: /global/scratch will be undergoing maintenance on Wednesday February 25 between 10:00 AM and 12:00 PM (noon). Access to /global/scratch will not be possible during this time.

2015-01-31 - 05:09 PST Online

Parallel back in service after a power failure in Calgary

Update Friday evening, 2015-01-30: Parallel is in production again after recovery from a power failure on Friday morning. All running jobs were lost during the outage.  Sorry for the inconvenience.

2015-01-30 - 11:56 PST Offline

Power failure in Calgary

All systems in Calgary are down.

2014-11-21 - 13:50 PST Online

Parallel fully operational

The Parallel cluster is fully operational.

2014-09-30 - 10:19 PDT Conditions

Shared filesystem still having problems

See the "lattice" status notice for details.  (The problem is with the shared storage system which is used by parallel, lattice and breezy).

2014-09-30 - 10:17 PDT Online

System fully operational

The outage finished on September 30, 2014 at 11:17 MDT.

2014-09-05 - 16:39 PDT Conditions

Breezy - Lattice - Parallel file system problems

Original notice 2014-08-25: An IBRIX file system segment has become unavailable, affecting approximately 2% of "scratch". Scheduling has been resumed using a temporary scratch file system (/scratch2) while systems staff and the vendor investigate further.

Update 2014-09-05: Scheduling on Parallel will be paused for about 4 hours, starting Monday, September 8, to reduce the heat load during air conditioner maintenance.

Additional notes: Jobs whose job scripts reference /global/scratch have been placed in a UserHold state and will not run until the file system problem has been resolved. If you do not want to wait and choose to resubmit your jobs using /scratch2 instead, please delete (qdel JOBID) any corresponding duplicate /global/scratch-based jobs that are still waiting.
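As a rough illustration, the commands below sketch one way to find and clean up duplicate jobs. They assume the standard Torque utilities qstat and qsub are available alongside the qdel command named above; the script name myjob.pbs, the edited copy myjob_scratch2.pbs and the job ID 12345 are placeholders, not real values from the system.

    # List your queued and held jobs to identify the /global/scratch-based ones
    qstat -u $USER

    # Confirm that your kept copy of the job script references /global/scratch
    # (myjob.pbs is a hypothetical filename)
    grep '/global/scratch' myjob.pbs

    # After resubmitting a copy edited to use /scratch2, delete the held original
    # (12345 is a placeholder job ID)
    qsub myjob_scratch2.pbs
    qdel 12345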

/global/scratch has been mounted on the login node to allow copying of selected files to /scratch2. Note that /scratch2 has less space than /global/scratch, so copy only the files that are necessary to continue running your jobs. Also, /global/scratch has not been mounted on the compute nodes, so you should ensure that your job scripts do not contain any references to files in /global/scratch.
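For example, a session on the login node might look roughly like the following; the directory names under $USER and the ~/jobscripts location are hypothetical, and the point is simply to copy individual files rather than whole directories and to check job scripts before resubmitting.

    # Create a matching directory on the temporary scratch file system
    mkdir -p /scratch2/$USER/myrun

    # Copy only the input files the job actually needs
    cp /global/scratch/$USER/myrun/input.dat /scratch2/$USER/myrun/

    # Check remaining space on /scratch2 before copying anything large
    df -h /scratch2

    # Verify that no job script still refers to /global/scratch
    grep -rl '/global/scratch' ~/jobscripts/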
