Grex System Status

Post date System Status: Update Notes
2017-07-21 - 14:53 PDT Online

Brief power outage on Grex

Another brief power outage occurred on Grex on Jul 21 at 4:30 PM local time. 105 of the nodes were rebooted, and as a result the jobs running on them were lost.

2017-07-17 - 07:11 PDT Online

Brief power outage on Sunday

At about 6 AM on Sunday, Jul 16th, Grex had a brief power outage, which led to a reboot of all compute nodes. All running jobs were lost. The system is now fully operational.

2017-07-06 - 10:47 PDT Online

System fully operational

System fully operational

2017-06-19 - 07:44 PDT Conditions

Grex Lustre filesystem runs on half of OSS

User access and Lustre are restored; however, one of the Lustre servers was taken out of service for further investigation. Most of the running jobs should not have been affected.

2017-05-30 - 12:15 PDT Online

Grex is fully operational

Grex is fully operational.

2017-05-29 - 14:15 PDT Conditions

Intermittent problems with Grex internal Ethernet switch

Grex's internal GigE switch is intermittently unavailable due to what appears to be a firmware issue. This affects the availability of internal Grex services such as the Torque and Moab servers and LDAP authentication. Running and queued jobs and data should not be affected, but logins to the system and commands like showq, qstat and qsub might intermittently fail. We are working on resolving the issue.

2017-05-21 - 17:24 PDT Conditions

Grex is open to test access

Our work on the Grex storage update is almost complete. The system is open for access and running jobs, for now in test mode. More status updates and documentation will follow. Please contact support@westgrid.ca if you experience problems using the system or accessing your data. Please CLICK HERE for more details on the new filesystem.
