Darwin's Theories Blog

New Theories for a New Time

Server Hardware Crash

2021-05-06
Usually-reliable Mac fail

On Tuesday afternoon (May 4, 2021), the computer that hosts darwinsys.com, ianonevs.com, and a bunch of other small web sites died, taking all the sites offline. The machine had been hosted with Clarity Hosting at their Front St data center in Toronto. While their pricing and service are top-notch, this was not the first time I’d had to travel to the data center, far from home, to remove or replace hardware, recover from software booboos, etc. So I decided to give up, take the path of least resistance, and move to the cloud. I went to the data center, retrieved the server, extracted the hard drive (which wasn’t the source of the hardware failure), set up a new virtual machine in the cloud, and started reinstalling stuff. The process is largely complete. However, some of the sites remained unreachable for a short time afterwards.

One of the reasons is an engineering compromise. When you ask your browser to visit darwinsys.com or ianonevs.com, the browser doesn’t know the actual numeric IP address for the site. It looks up the address in the DNS, the Domain Name System. DNS is a huge distributed database, too big (and changing too quickly) for any one computer to hold. So when you look up 'darwinsys.com', the browser first asks your own computer if it knows, then your ISP’s DNS server, which (if it doesn’t know) will ask the ".com" DNS servers, which in turn point it to darwinsys.com’s name server. The "if it doesn’t know" is the key to the problem. Because every step takes time, servers will 'cache' or hold onto the information for "a while". The "a while" is usually set by the domain administrator, and may be anywhere from minutes to hours to days. Setting it too low will result in a lot of load on a busy server. Setting it too high will result in less load, but if the address has to change suddenly, there is no way to void the cached copies, so servers along the way will continue to give everyone the outdated information for up to a day or two. And that, my friends, is why some of the domains I run appeared down - they were working, but you just couldn’t see them. Now you should be able to.
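
For the curious, here is a rough sketch in Python of the caching behaviour described above. It is a toy model, not a real DNS client: the `TtlCache` class, the `demo` function, the one-day "a while", and the addresses (taken from the ranges reserved for documentation) are all made up for illustration. It just shows how a cache keeps handing out the old address after a site moves, until its copy expires.

```python
import time


class TtlCache:
    """Toy model of a caching DNS server (hypothetical, for illustration only)."""

    def __init__(self, upstream, ttl_seconds, clock=time.monotonic):
        self.upstream = upstream        # function: name -> current address
        self.ttl_seconds = ttl_seconds  # the "a while" chosen by the administrator
        self.clock = clock              # injectable so the demo can fake time
        self.entries = {}               # name -> (address, expiry time)

    def lookup(self, name):
        now = self.clock()
        cached = self.entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]            # still fresh: answer without asking upstream
        address = self.upstream(name)   # expired or unknown: ask upstream
        self.entries[name] = (address, now + self.ttl_seconds)
        return address


def demo():
    # Made-up documentation addresses, not the real ones for the site.
    records = {"darwinsys.com": "192.0.2.10"}
    now = [0.0]                         # fake clock, in seconds
    cache = TtlCache(lambda name: records[name],
                     ttl_seconds=86400,
                     clock=lambda: now[0])

    first = cache.lookup("darwinsys.com")       # cache is empty, so it asks upstream

    records["darwinsys.com"] = "198.51.100.20"  # the site moves to a new machine
    now[0] = 3600                               # one hour later
    stale = cache.lookup("darwinsys.com")       # cache still holds the old address

    now[0] = 86400 + 1                          # just over a day later
    fresh = cache.lookup("darwinsys.com")       # copy has expired, so it asks again
    return first, stale, fresh


if __name__ == "__main__":
    print(demo())
```

The middle lookup is the one that bit me: the record had already changed, but the cache had no reason to ask again until its copy expired.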