|Post date||System Status:||Update Notes|
|2013-10-31 - 17:04 PDT||Conditions||
Parallel available for testing
Parallel logins have been re-enabled after the long outage for file system and other system upgrades. The system is still being tested, so it should not be regarded as in full production yet. Most old Open MPI-based code will work without recompilation if module load openmpi/old is added to job scripts. Old GPU-based code will need module load cuda/4.1 to work without recompilation. We encourage you to log in to test and recompile your code as necessary, reporting problems to email@example.com.
For some additional background on the upgrades and current issues, please click here.
|2013-10-22 - 05:30 PDT||Offline||
Maintenance on Parallel continues due to unresolved file system issues. Click here for additional details. Sorry for the inconvenience.
|2013-10-13 - 22:18 PDT||Offline||
Ongoing file system checks have prolonged last week's maintenance window. Sorry for the inconvenience. It is not known when Parallel will be opened to users. This message will be updated when more information becomes available.
|2013-09-27 - 14:29 PDT||Downtime Scheduled||
Maintenance scheduled October 7 to October 14
Parallel will be down for a major system upgrade starting on Monday, October 7 at 8:30 AM and will be out of service until Monday, October 14.
The TORQUE and Moab batch scheduling system will be upgraded during that period. Please ensure that jobs submitted in the last few days before the maintenance have a walltime parameter that is sufficiently small to allow the jobs to finish before the system goes down. Any jobs waiting to run on the morning of October 7 will be deleted from the queue and will have to be resubmitted when the system comes back up.
Logins will not be available during the maintenance window, so please make sure to retrieve any files you need prior to October 7.
Due to changes in shared libraries, you may have to recompile and test software after the upgrade. Major packages in /global/software will be run on a test cluster in the week before the upgrade and recompiled if necessary, but for some of the less frequently used software we will have to rely on researchers to run their own tests and report any problems encountered.
Sorry for the inconvenience.
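As a rough illustration of the walltime advice above, the following Python sketch estimates the longest walltime a job could request and still finish before the shutdown. It is not an official tool: the function names, the 30-minute safety margin, and the example start time are assumptions made for illustration, and it assumes the job starts running at the time given rather than waiting in the queue.

```python
from datetime import datetime, timedelta

# Maintenance start taken from the notice: Monday, October 7 (2013), 8:30 AM local time.
MAINTENANCE_START = datetime(2013, 10, 7, 8, 30)


def max_walltime(job_start, maintenance_start=MAINTENANCE_START,
                 margin=timedelta(minutes=30)):
    """Return the longest walltime that ends `margin` before the maintenance.

    The margin is an assumed safety buffer, not a value from the notice.
    """
    remaining = maintenance_start - job_start - margin
    # A non-positive result means there is no room left before the shutdown.
    return max(remaining, timedelta(0))


def format_walltime(duration):
    """Format a timedelta as HH:MM:SS; hours are allowed to exceed 24."""
    total_seconds = int(duration.total_seconds())
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    # Hypothetical example: a job expected to start on October 5 at 20:00.
    limit = max_walltime(datetime(2013, 10, 5, 20, 0))
    print(f"#PBS -l walltime={format_walltime(limit)}")
```

For the hypothetical start time shown, the script prints a directive requesting 36:00:00. A job that starts later than expected would need a correspondingly smaller value.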
|2013-09-23 - 15:56 PDT||Online||
2013-09-23 Job scheduling resumed on Parallel
Scheduling has been resumed on Lattice, Parallel and Breezy after addressing a file system problem that affected quotas. Resolution of the problem required rebooting file servers and resulted in failure of some jobs to write output files. Please check output carefully and resubmit any suspect jobs. Sorry for the inconvenience.
|2013-09-19 - 20:01 PDT||Conditions||
2013-09-18 Job scheduling paused on Parallel
Scheduling has been paused on Lattice, Parallel and Breezy pending a solution to a problem affecting IBRIX file system administrative operations. This does not yet appear to have impacted running jobs, except in an isolated case in which a storage quota was incorrectly reported as being exceeded. Normal service will be resumed as soon as possible. Please accept our apologies for the service interruption.
|2013-08-12 - 21:15 PDT||Online||
System functioning normally
File system problems reported July 31 have been resolved.