System Notices
24 Aug 2010: Bugaboo unavailable 13:00 - 16:00 (Pacific)
Because of a power outage in the building where the Bugaboo cluster is located, this system will be unavailable from 13:00 to 16:00 (Pacific) on Tuesday, Aug. 24, 2010. All jobs running at that time will have to be terminated.
17 August 2010: UBC Chiller Maintenance Completed
Maintenance on the UBC Chemistry Chiller unit has been successfully completed. However, we are still running tests on the IB fabric, so scheduling will NOT resume until late tomorrow afternoon.
August 11 - Checkers cluster back in production at UofA
All the WestGrid systems affected by Monday's storm are back in production. That includes the IBM systems Cortex, Synapse, Adenine, Guanine and Bigfoot; the SGI systems Nexus, Australis, Arcturus and Helios; and the Checkers cluster.
August 9, 2010 - UofA WestGrid resources unavailable due to power outage
Due to severe weather conditions in Edmonton tonight (Monday, August 9), the General Services Building and the computer room lost power between 21:00 and 22:20.
The Checkers cluster experienced a hard power outage after its UPS ran out of power. There are several disk failures on the DDN array; we are waiting for the controllers to verify filesystem integrity.
All the SGI and IBM machines were powered off to keep the temperature down in the center. We will assess the damage and restart all possible services in the morning. All running jobs on the systems at UofA were lost.
Sorry for the inconvenience.
Breezy outage
Due to a storage-related issue, the Breezy cluster is currently closed and offline.
We hope to have all issues resolved this afternoon.
UPDATE: Cluster is up and running again, and accepting jobs.
03 August 2010 - UBC Orcinus Maintenance Scheduled
UBC WestGrid - Scheduling Resumed
UBC WestGrid - Power Failure Update
Glacier is back up. Orcinus should be up in about 15 more minutes. Here's our most recent MOTD:
July 27, 2010 (2:50PM):
Last evening there was a major power outage on the UBC campus. As a result, both Glacier and Orcinus went down. We have restored the file system and access to the head nodes, but we have not yet restarted scheduling. Here's why:
Because of excessive disk space usage, we have enabled quotas on both clusters. Details to follow.
We will enable scheduling as soon as people have cleaned up enough space on both /global/{home,scratch} for us to continue processing new jobs. Please contact support@westgrid.ca if you have any questions.
27 July 2010 - Power Outage at UBC
Power outage at UBC campus: Orcinus/Glacier down
04:35 Tuesday 27 July 2010
UBC had a power outage and Glacier went down; we were forced to shut down Orcinus too (file system).
July 16 - 18 - Power outage at UofA WestGrid site
There will be a power outage in the data center.
Checkers, Bigfoot, Adenine and Guanine are all affected.
July 19, all machines back in production
