|Post date||System Status||Update Notes|
|2015-02-23 - 02:19 PST||Testing||
Breezy/Lattice/Parallel file system down
Update 2015-03-24: /global/scratch is undergoing file system checks. We estimate the checks will take another 2-3 days, during which time the systems may continue to be unstable.
Original notice: One of the file servers supplying Breezy/Lattice/Parallel is experiencing hardware problems. Systems staff and the vendor are investigating.
Update 2015-02-23: /global/scratch will be undergoing maintenance on Wednesday February 25 between 10:00 AM and 12:00 PM (noon). Access to /global/scratch will not be possible during this time.
|2015-01-31 - 05:09 PST||Online||
Parallel back in service after a power failure in Calgary
Update Friday evening, 2015-01-30: Parallel is in production again after recovery from a power failure on Friday morning. All running jobs were lost during the outage. Sorry for the inconvenience.
|2015-01-30 - 11:56 PST||Offline||
Power failure in Calgary
All systems in Calgary are down.
|2014-11-21 - 13:50 PST||Online||
Parallel fully operational
The Parallel cluster is fully operational.
|2014-09-30 - 10:19 PDT||Conditions||
Shared filesystem still having problems
See the "lattice" status notice for details. (The problem is with the shared storage system which is used by parallel, lattice and breezy).
|2014-09-30 - 10:17 PDT||Online||
System fully operational
Finished on September 30, 2014 - 11:17 MDT
|2014-09-05 - 16:39 PDT||Conditions||
Breezy - Lattice - Parallel file system problems
Original notice 2014-08-25: An IBRIX file system segment has become unavailable, affecting approximately 2% of "scratch". Scheduling has been resumed using a temporary scratch file system (/scratch2) while systems staff and the vendor investigate further.
Update 2014-09-05: Scheduling on Parallel will be paused for about 4 hours, starting Monday, September 8, to reduce the heat load during air conditioner maintenance.
Additional notes: Jobs whose job scripts reference /global/scratch have been placed in a UserHold state and will not run until the file system problem has been resolved. If you do not want to wait and choose to resubmit your jobs using /scratch2 instead, please delete (qdel JOBID) any corresponding duplicate /global/scratch-based jobs that are still waiting.
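As a rough illustration only, resubmitting under /scratch2 and removing a held duplicate might look like the following, assuming a Torque/PBS scheduler (suggested by the qdel command above); the script name and job ID are placeholders:

    # List your jobs to identify held /global/scratch-based duplicates
    # (qstat -u is standard in Torque/PBS).
    qstat -u $USER

    # Submit the edited copy of your job script that uses /scratch2
    # ("myjob_scratch2.pbs" is a hypothetical file name).
    qsub myjob_scratch2.pbs

    # Delete the waiting duplicate that still references /global/scratch
    # (replace 1234567 with the real job ID reported by qstat).
    qdel 1234567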
/global/scratch has been mounted on the login node to allow copying of selected files to /scratch2. Note that /scratch2 has less space than /global/scratch, so you should copy only the files that are necessary to continue running your jobs. Also, /global/scratch has not been mounted on the compute nodes, so you should ensure that your job scripts do not contain any references to files in /global/scratch.
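A minimal sketch of that copy-and-check workflow, assuming standard Linux tools on the login node; the project and directory names are illustrative only:

    # Check free space on /scratch2 before copying anything large.
    df -h /scratch2

    # Copy only the files your jobs actually need
    # ("myproject" is a hypothetical directory name).
    rsync -av /global/scratch/$USER/myproject/inputs/ /scratch2/$USER/myproject/inputs/

    # Verify that no job script still references the old file system.
    grep -rn "/global/scratch" ~/myproject/jobs/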