
Kusari Maintenance 2: Electric Boogaloo

#maintenance #downtime #five day downtime only at KYUN DOT HOST #subaru #sowwy
~2 min read by naphtha, 2023-10-12

i like my uptime low, down low, down low

One of the disks in the RAID 5 failed a few days ago and is still pending replacement, and today I started bumping into the same errors I was experiencing back when the first RAID failed.

At this point I was ready to just refund everyone and discontinue Kusari because I can't just keep losing customer data this frequently and forcing them to reinstall everything, but luckily the datacenter admin managed to restore the RAID and we're back up.

The DC admin suggested it was a problem with the server hardware, and I can't help but agree. The SSDs are brand-new enterprise drives at 100% health, all the same model, and the RAID controller firmware is the exact version recommended by Dell. They have no reason to fail this early. Maybe cosmic rays[fact checked by snopes]? We decided to move to an HP server.

VMs on the current server are currently shut down and being backed up, because the chances of data loss are very high if we just move the SSDs to the new server and try to reuse the RAID. I've disabled all user actions on VMs, so you can't start your servers. We figured the less drive activity there is, the lower the chance the RAID fails again (and the higher the chance the backup completes faster).

ETA until your VMs are back up is ~30 hours max (hopefully) (update: it DEFINITELY did NOT take 5 days, that did not happen, only an insane person would believe it did). As always, your server renewal date will be extended by however long the downtime lasts, plus a few days for the inconvenience.