Maintenance window Wednesday 2017-01-04 -- FINISHED


We start at 0900 hours.

This time we will:

  • Upgrade Slurm and other system software on Milou, Fysast1 and Tintin
  • Upgrade firmware on Milou and Fysast1

The firmware upgrade requires power cycling, so Slurm queues are stopped. Queued jobs will start after the maintenance.

Login nodes on Fysast1, Milou, and Tintin will be rebooted once during the day (we will warn an hour ahead). Slurm commands such as sbatch and jobinfo will be unavailable at times.

We will not stop Slurm queues on Tintin, Irma and Bianca. Maintenance on Irma and Lupus will be done next week, January 11th. 

This page will be updated during the maintenance, to keep you informed about our progress.

We plan to finish before evening (today, Wednesday).

Update at 1120 hours

Maintenance work continues.

Slurm has already been upgraded. Login nodes will probably be restarted at 1300 hours.

Update at 1420 hours

Login nodes are upgraded and have restarted successfully.

Firmware upgrade continues.

Update at 1700 hours

Most nodes on Milou and Fysast1 are successfully upgraded and back in production. The remaining nodes will be released later, when their upgrades are completed.

Update on Thursday at 1020 hours

We need to make a second change to the Slurm installation, meaning that Slurm commands will not be available all the time. Jobs will keep running.

We now expect to finish the maintenance sometime this afternoon.

Update on Thursday at 1530 hours

We are still working on the Slurm upgrade and still expect to finish before evening.

Update on Thursday at 1640 hours

Slurm has been upgraded on Tintin, Milou and Fysast1. Nodes are still rebooting and are expected to be back in production within 2 hours.

The firmware upgrade failed on 4 of 26 chassis. This has been reported to the manufacturer for further troubleshooting. As a consequence, 32 Milou nodes are out of production until this is solved. For the remaining chassis, the firmware upgrade was successful.
