Grex System Status

Post date System Status: Update Notes
2017-02-07 - 10:59 PST Online

Lustre file system is working again.

The Lustre file system on Grex is working again. Some running jobs may have expired while the file system was unavailable.

2017-02-06 - 09:12 PST Testing

Lustre file system is having problems

The Lustre filesystem that serves /global/scratch failed again: another OSS server rebooted at 9:30 AM CST.
Files located on the OSTs on that server are unavailable. The job queue is stopped.

2017-02-05 - 08:15 PST Conditions

Lustre file system is having problems

The Lustre filesystem that serves /global/scratch has been restored, and the filesystem is available.

However, there still appears to be an issue with one of Lustre's object storage servers. We will be working on resolving it.

The system is available for user access and job queues are enabled.

2017-02-04 - 17:33 PST Testing

Lustre file system is unavailable

The Lustre filesystem that serves /global/scratch became unresponsive on Grex. We are working on restoring it and identifying the cause.

2017-01-16 - 11:10 PST Online

Grex Home filesystem is back

Most of the running jobs should not have been affected.

2017-01-16 - 08:20 PST Conditions

Grex Home filesystem unavailable due to high NFS load/utilization

The Home filesystem on Grex is intermittently unavailable due to high NFS load/utilization. This may affect logging in to the login nodes as well as the performance of running jobs. We are working on resolving the issue.

2016-09-08 - 12:51 PDT Online

Grex outage of Sept 7-8

Sept. 8, 2016: Our outage is finished. The system is available for connections and is accepting batch jobs. During the outage, there were some alerts about jobs being terminated by Moab; however, most of these jobs should still be running. Note that some of the older software modules that were not used on Grex have been retired. If any software you need is missing, please contact us at support@westgrid.ca!
