System Notices

Edmonton - Saskatoon OC192 outage

TICKET INFORMATION:
Subject: Edmonton - Saskatoon OC192 outage
Category: Outage
Ticket ID: 20100310-001
Start Time: 2010-03-10 03:22 EST (2010-03-10 08:22 UTC)
End Time: 2010-03-10 06:33 EST (2010-03-10 11:33 UTC)
TICKET HISTORY:
== Updated: Thomas on 2010-03-10 07:11 EST (2010-03-10 12:11 UTC) ==
The OC192 circuit was restored at the end time shown above.
The provider reported that a local power failure at one of
the AMP sites caused this outage.
== Created: Thomas on 2010-03-10 04:19 EST(2010-03-10 09:19 UTC) ==
The Edmonton-Saskatoon OC192 went down at the time
shown above. The cause of the outage is unknown. The
provider is being contacted. The following core links and
lightpaths are affected.
Core Calgary-Winnipeg
Cybera - Edmonton
ECONET Edm - Mon
ECONET Edm - Sas
NRNet VCTR - SASK
Neptune300 VCTR-SASK
SRNet backup SASK - RGNA via EDMN
TRIUMF UBC - UofA
TRIUMF Van - Tor
WestGrid Cal - Sas
CANARIE NOC
Operations and Engineering
Email: eng@canarie.ca
Weekdays: 08:00-17:00 EST(UTC-5)
+1.613.944.5612
7/24 pager: +1.613.944.5611
http://www.canarie.ca/canet4/

Mar 8, 2010 - Snowpatch Upgraded

The Snowpatch cluster has been upgraded to a newer operating system. All software packages that were previously available have been recompiled and, where a newer version was available, upgraded. In all likelihood users will need to recompile their own programs as well. Please send email to support@westgrid.ca if you need help.

Mar 8, 2010 - Bugaboo /global/scratch available again

Bugaboo has been rebooted and the /global/scratch directory is available again.

Monday, March 8, 2010 - Bugaboo /global/scratch unavailable due to file system problem

There is a problem with the /global/scratch file system on Bugaboo. Until the problem is resolved, you cannot access files in that file system. Sorry for the inconvenience.

Mar 21 6 AM MST - Power outage UofA WestGrid site.

On Sunday, March 21 at 6 AM MST, all UofA WestGrid resources will be unavailable due to a power outage in our data center.

The Checkers cluster, the IBM SMPs (Cortex, Dendrite, Synapse, Adenine, Guanine, Bigfoot), and the SGI SMPs (Nexus, Arcturus, Australis, Borealis, Helios, Corona) will all be affected.

Mar 8, 2010 - Snowpatch system upgrade

snowpatch.westgrid.ca will be unavailable on Mar 8, 2010.

Snowpatch will be completely reinstalled: the operating system will be upgraded to Scientific Linux 5.3 and all applications will be recompiled.

All running jobs will be terminated. Applications that users have compiled themselves on Snowpatch will most likely need to be recompiled after Snowpatch comes back into production. Please contact support@westgrid.ca if you require help with recompiling your software.

We expect that Snowpatch will be available again on Mar 9.

Sunday, Feb. 21: UBC Orcinus - Data Center Cooling Issue

At ~11:30 PM the Chemistry Datacenter suffered a chiller issue that was not resolved by UBC Plant Operations until ~1:00 AM. As a result, many compute nodes in the cluster rebooted and many jobs running at that time were lost. We are working to restore normal operations as quickly as possible. In the meantime, please resubmit your jobs. We are sorry for this inconvenience.

Sat. Feb. 20: Bugaboo /global/scratch file system unavailable

The /global/scratch filesystem cannot currently be accessed from the head node (bugaboo) of the Bugaboo facility. Any process that attempts to access that filesystem, or files in it, will hang. The vendor has been contacted and is looking at the issue. The /global/scratch filesystem can still be accessed from all the compute nodes and from bugaboo-fs, so as a workaround bugaboo-fs can be used to access files. Running jobs that access the /global/scratch filesystem will run as usual.
