Dallas Datacentre - Power Outage Sunday 26th May 2024 13:28:00

SUMMARY (latest update 1 Jun 14:30 EST)

On Monday 27th May, the datacentre we use in Dallas experienced a power outage due to extreme storms that knocked out power for much of Texas. This was compounded by two generator failures and by the difficulty datacentre staff had working through the weather on Memorial Day. The sudden power loss caused significant data corruption and multiple hardware failures, which we have been working through. Apologies for the delays in restoring service and in customer support response during this time. The current remaining issues are as follows:

Some isolated virtual servers have MySQL corruption. If your virtual server has issues, please raise a ticket to help us find the problem faster.

For the server ns65v, all missing domains are now restored. We will be replacing the system with a new server using hardware NVMe RAID as soon as possible, and migrating away from the affected datacentre. We will send a further update regarding these long-term fixes as soon as we can.

Update Sat 1 June: Issues on da75 have been resolved.

Update @ 18:30 EST May 31st - Some database tables on the da75 server have corrupted indexes, which affect some operations. To prevent data loss we are currently fixing the affected databases, which requires rebuilding the MySQL install. This is in progress.

Update @ 15:50 EST May 29th - All accounts are now re-created on ns65v. Home directory files and email data were retrieved from the old server. However, due to the hardware failure, we needed to restore database contents from a backup taken on May 23rd / 24th, before the power outage. Apologies for the extended outage here.

Update @ 11:50 EST May 29th - Some msnode instances may still have issues. We are investigating.

Update @ 09:50 EST May 29th - The restore of ns65v is ongoing.

Update @ 23:00 EST May 28th - The ns65v server has been reinstalled and accounts are being restored from backups. Home directory and email data for 98% of accounts was stored on a separate partition from the failed disk, and is therefore unaffected. However, database data needs to be restored from backups for all accounts, which is taking some time. The failed disk was also sent to a specialist recovery firm earlier today, but the recovery has not yet begun because continued severe storms in the Dallas area have caused power outages to that firm's own systems. Please note this is unrelated to our operations - at this time the supply to all 3 of our Dallas datacentres is operating normally.

Update @ 14:00 EST May 28th - We are continuing to work on the ns65v server. The primary OS storage disk has failed and the system is being reinstalled. Once this is complete we will be able to assess further. We do have backups stored on an external server. If you need a copy of your account (we can also deploy this to a new server directly), please contact us.

da77venom.stableserver.net is operating normally. Crashed database tables have been repaired.

Update @ 10:05 EST May 28th - The server da77venom.stableserver.net is now back online.

Update @ 09:16 EST May 28th: We have restored services on most of the affected systems. There are 2 servers with ongoing issues and we are working on getting them up and running.

Update @ 06:45 EST May 28th: We are continuing to work through filesystem issues and replace failed hardware.

Update @ 16:44 EST Monday 27th May: The datacentre experienced a full power outage due to storms in the area. Recovery efforts are ongoing but are hampered by the scale of the outage, with limited datacentre staff available to repair essential systems. Many services are restored and we are working to repair filesystems and restore connectivity to one rack which suffered multiple hardware failures. Apologies for the extended downtime. We are working as fast as possible to restore all systems.

The following servers are currently being affected by the issue:

We are currently investigating the matter and are trying to rectify it as soon as possible.

da75venom.stableserver.net - we've identified the root cause of the issues with some databases, and are currently working on a fix.

Issues with venomx92.stableserver.net have been resolved.

Issues with venomx90.stableserver.net have been resolved.

Update: The issue has recurred. We are investigating.

Update #2: The issue has been resolved.

Issues with pls5.stableserver.net have been resolved.

We have resolved the issues with the following servers - da71venom and da72venom.

Our datacentre experienced a total power loss, which affected all connected services. Power has been successfully restored, and we are in the process of recovering all servers impacted by the outage. However, one network segment remains offline due to a hardware failure. We are currently working on bringing replacement hardware online to resolve this issue as quickly as possible.

We have resolved problems with the following servers: ns75venom.stableserver.net, ns71venom.stableserver.net, da79venom.stableserver.net.

We're working on bringing back the remaining servers.