CHPC OUTAGE: No Remote Access to CHPC Resources - UPDATED
Date Published: August 23, 2023
8/24/2023 at 7:30pm
The /scratch/ucgd/lustre file system is back online and mounted on all redwood compute and interactive nodes.
In addition, rw[079,106,114,119,171,020,022,132,202] have been returned to service. The redwood nodes that remain down are rw[010,187], which have memory issues, and rw[072,134,183].
                           8/23/2023 at 5:45pm
                        Redwood is back in service:
                        - All of the interactive nodes are up.
- /scratch/ucgd/lustre is still down. DDN Support has been engaged to work on the issues; for now, the mount of this space has been removed from all nodes.
- A number of compute nodes are still being worked on:
  - rw[010,106,114,119,171,187,020,022,132,202] - memory issues are being reported, so additional testing is being done overnight.
  - rw[079,072,134,183] - these nodes are either not responding or have other issues that will require further work to diagnose.
 
8/23/2023 at 10:00am
                     Most, but not all, of the CHPC resources are once again accessible.  The notable resources
                        that are not ready for use are the redwood cluster, including the interactive nodes,
                        and the PE /scratch/ucgd/lustre file system.
                     At this time CHPC staff is working on identifying additional resources that are not
                        accessible and working to bring them back online.  Once we complete this process, we will send out a notification of the resources that
                           need additional work.
                     If you notice any other CHPC resource that is not accessible, please send a report
                        to helpdesk@chpc.utah.edu 
                     8/22/2023 at 6:45pm 
                            
                        
At about 3pm there was a widespread disruption of campus IT services that is being attributed to humidity issues in the datacenter. You can monitor the current status at https://uofu.status.io/. At this time there is no estimated time for resolution.
These issues resulted in an outage of remote access to CHPC resources. The outage will continue until the campus-level event has been addressed. CHPC staff has been actively working to identify the impact on CHPC hardware, and so far we have found issues with some network equipment in the PE. We are working with support to get those addressed.
Our current view is that once the campus issues are fully resolved, the general environment should be in good shape, but the PE depends on us addressing the issues we have found.