System Notices

Wed, June 23 - UBC Orcinus Network Issue

At 2:30PM, we lost our connection between the Glacier and Orcinus clusters. As a result, we've temporarily lost our connection to all network file systems on Orcinus. We've created a trouble ticket with UBC Network Operations and will update this message as further details emerge.

Fri, Jun 18, 13:30 (Pacific): Snowpatch and Robson available again

The air conditioning unit has been repaired and the Snowpatch and Robson clusters have been powered on again.

Fri, Jun 18, 11:15 (Pacific): Snowpatch and Robson unavailable

Because of an air conditioning (AC) failure in the machine room, the Snowpatch and Robson clusters had to be powered off. All currently running jobs were terminated. At this point we have no timeline for when the AC system will be functioning again.

The Calgary - Edmonton 10G maintenance

TICKET INFORMATION:

Subject: The Calgary - Edmonton 10G maintenance
Category: Scheduled maintenance
Ticket ID: 20100607-001
Start Time: 2010-06-22 02:00 MDT (2010-06-22 08:00 UTC)
End Time: 2010-06-22 06:00 MDT (2010-06-22 12:00 UTC)

== Created: Thomas on 2010-06-07 08:47 EDT (2010-06-07 12:47 UTC) ==

A DWDM system upgrade has been scheduled at the date and time shown
above. The Calgary - Edmonton 10G circuit will be affected and
unavailable for approximately 15 minutes.

The affected lightpaths are:
Core Link Edmonton - Calgary
Core Link Cal-Win
ECONET VCTR2-EDMN1
NRNet VCTR - EDMN
NRNet VCTR - SASK
Neptune300 VCTR-SASK
RDC EDMN1-CLGR2
SRNet backup SASK - RGNA via EDMN
TRIUMF UBC - UofA
WestGrid 10G Cal - Sas
WestGrid Cal - Edm

CANARIE NOC
Operations and Engineering
Email: eng@canarie.ca
Weekdays: 08:00-17:00 EDT (UTC-4)
+1.613.944.5612
7/24 pager: +1.613.944.5611
http://www.canarie.ca/canet4/

June 1, 2010: Glacier, Emergency Air-Conditioner Maintenance (Part 1)

IT Services has asked us to suspend all new job scheduling for chassis 1 - 24 (ice1_1 - ice24_14) for the entire day.  We will resume scheduling new jobs once the repairs have been completed.  Please note that, in the very near future, we will also be required to stop scheduling new jobs for chassis 25 - 60 (ice25_1 - ice60_13).  However, at this time, we do not know exactly when this will happen.  More information will follow.  Also note that this will not affect any running jobs.

Silo/Hopper Down Monday, May 17, 2010 0900-1300 CST

Silo and Hopper will be down for a short period on Monday, May 17, 2010, from 0900 to 1300 CST (Saskatoon time) to bring new storage online.

Please contact support@westgrid.ca with any questions or concerns.

May 16, 2010: Bugaboo not accessible

Due to a complete power outage on the Burnaby campus of SFU, the Bugaboo cluster cannot be accessed on May 16 from about 7:30am to 5:30pm (all times Pacific). The Bugaboo cluster itself will not go down (it is in the only building on campus that is not affected by the power outage); however, all networking will be down, so the cluster cannot be reached. Jobs are expected to continue running through the outage.

May 14 - 17, 2010: Snowpatch, Robson, Gridstore unavailable

Due to a complete power outage on the whole SFU campus, the Gridstore storage facility and the Snowpatch and Robson clusters need to be shut down on May 14 around 18:00 (Pacific). The systems will be brought back up on May 17 around 10:00 am (Pacific). All jobs that are still running on May 14 at 18:00 will be terminated.

UBC Orcinus - Maintenance Completed

As of 4:00PM (PDT), all chiller maintenance has been completed.  Orcinus has been returned to normal production.

Silo/Hopper Down Tuesday, May 11, 2010 0900-1700 CST

Silo and Hopper will be down for routine maintenance on Tuesday, May 11, 2010 from 0900-1700 CST (Saskatoon time).  Regular maintenance, including firmware updates and GPFS updates, will be applied at this time.

Please contact support@westgrid.ca with any questions or concerns about this planned downtime.

 
