So, things fell down. Most things got back up. Here’s an overview of a very intense week.
Most accidents are a series of unfortunate (and usually preventable) events. This server crash was no different. The first preventable issue, which did not show itself until most of the shouting was over, happened when I screwed up a line in a backup script. I then made it worse by copying and pasting that section of the script into another section that backed up a different site.
Then, literally years later, Ubuntu added a series of updates to their stable repository. There is a complex back story, but the short version is that these updates could not be installed automatically because they were part of an operating system upgrade. I forced them to install anyway, and this took my webserver offline. While trying to fix that, I broke the operating system.
Since I have backup scripts that run daily, and my server is set up to keep the various websites separate from the operating system, I was not too worried. I wiped the server's boot partitions, reinstalled the operating system, upgraded it, and began rebuilding the software needed to run the kind of server I need. Then I began restoring the websites from backups.
Restoring was not 100% successful because a few sites had not been backed up correctly. A couple of them had a database access error, and for another batch, the backups contained the wrong databases.
I have spent the rest of the week getting the restored sites stable, upgrading and securing them where necessary, and recreating the broken services: either completely, or at least far enough to have the infrastructure in place to rebuild their content. Some things are still not fully working, but for the most part, 14 sites or services are back online.