Many Irma compute nodes lost electric power -- FIXED

2017-09-25

Three racks of Irma's compute nodes lost power because an automatic fuse tripped.

Some jobs were lost as a result. We are very sorry about that. Please rerun any affected jobs.

It looks like nodes i[167-250] were affected.
 

So what was the cause? It looks like an Ethernet switch died, possibly from a short circuit. This tripped the automatic fuse, which in turn took down more switches along with the compute nodes.

We have reported the fault to our support vendor. Until the faulty Ethernet switch has been repaired or replaced, Irma will run with fewer compute nodes.

Update at 0950 hours

Now only nodes i[179-226] are down.
