
Downtime - Tuesday March 15, 2016 starting at 6:30 a.m.

Date Posted: February 29, 2016

UPDATE:

The all clear (except for a few straggling nodes with hardware issues) was announced at 3:32 a.m. on March 16.



During the downtime, electrical work (replacement of breakers) will be done in the DDC room that houses CHPC resources. This will require us to power down everything that is not on tier 3 (generator-backed) power.

The electrical work is estimated to take between 4 and 8 hours. Along with the electrical work, we will perform routine hardware maintenance, including software and firmware updates and OS updates and patches. In addition, the Intel compilers and Lmod (modules) will be updated to newer versions.

While some of this maintenance can be done while the power is out, much of it will have to take place once the power is restored. If the electrical work is completed on schedule, we expect to return the clusters to service during the evening. However, if the electrical work runs past this window, the clusters may not be returned to service until the next day. We will provide updates if this happens.

Systems affected: 

  • All compute clusters (apexarch, ash, ember, kingspeak, lonepeak and tangent) will be down. This includes the compute and interactive nodes, as well as the frisco01-07, atmos01-04, meteo01-23 and WX1-4 servers.
    • Reservations have been set in the batch queues to drain all work from all clusters by 6:30 a.m. on 3/15.
    • All existing FastX sessions will be terminated.
    • Note that the standalone astro01-astro05 and the meso boxes will remain on during the downtime.
  • Kachina and Swasey will be down in the morning to receive updates and new network cards, which requires a reboot. Because they are on tier 3 power, they will be returned to service as soon as this process is done.
  • The fileservers will remain up during the downtime, with the exception of homerfs in the protected environment (PE), which will be down for about an hour starting at 8 a.m. so that network cards can be added.
  • Each VM, in both the general and the PE farms, will be down for about 5 minutes at some point during the day, because each must be rebooted for the OS updates to take effect. In addition, any VM in the PE that accesses homerfs will lose that access during the homerfs outage described above.

A reminder along with any additional information will be sent out a few days before the downtime.



Last Updated: 6/10/21