We’ve had some bad downtime this week. On Wednesday (2018-05-30) something got wedged and games stopped being recorded. Jim and I were both busy and not paying close attention, so I didn’t notice until Thursday, when I happened to pull up the page during some downtime while my hair was being cut. I ended up remotely debugging and fixing the problem on my phone, which was a pain, but it worked and made me feel like some sort of elite hacker.

Today (2018-06-02) our SSL cert expired for some reason, so things weren’t working until Jim fixed it.

Also today, we hit our 2,147,483,647th game played! If you don’t recognize that number, it’s the largest 32-bit signed integer (0x7fffffff if you’re into hexadecimal notation). That means no more games can be added, because the id number that identifies each game can’t get any bigger. This was kind of a stupid oversight on our part and is the reason you are seeing the euphemistic “The server is undergoing maintenance” message when you finish a game. When we started this site in 2005 we didn’t think it would ever be popular enough for a number that big to come into play. Last year I even read an article about this exact thing happening to someone else and felt pretty smug that we weren’t that dumb. Oops.
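
(If you want to see the limit in action: our game ids live in a plain integer column in our PostgreSQL database, and an integer there tops out at exactly that number. The snippet below is just an illustration, not our actual schema.)

    -- Illustration only (not our real schema): PostgreSQL's integer type
    -- maxes out at 2^31 - 1 = 2,147,483,647.
    SELECT 2147483647::integer;  -- works: the last representable value
    SELECT 2147483648::integer;  -- ERROR: integer out of range
    -- An INSERT that tries to hand out the next game id hits the same wall.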

The annoying thing is that all those games take up a lot of space. We have our database on two disks (volumes, technically): a 2TB SSD that holds the main high score tables and is 77% full, and a 4TB hard disk that is 95% full. The only way to permanently fix the issue is to rewrite the data, which means we need roughly double the space we currently have. That means buying more disks, which will take a couple of days (I don’t think there’s anywhere local I can buy them, so we’ll have to mail-order them from Amazon).

The long and short of it is that high scores are currently down and it’ll take us a few days to get back up and running again. The message is true, though: the scores are being written out to a different disk, and when the db is alive again we’ll import them all. If you read that article I mentioned, you might have noticed they had a quick fix to delay the inevitable. We might do that tonight and get things kind of working, but we’re going to have to do a permanent fix soon-ish, so you’ll probably be seeing more of the “undergoing maintenance” message in the next week.
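
(For the curious, the kind of stopgap I mean looks something like the sketch below. I’m not claiming this is exactly what that article did, and the table/sequence names here are made up, but the idea is to hand out ids from the untouched negative half of the signed 32-bit range.)

    -- Hypothetical stopgap sketch (names invented): point the id sequence at
    -- the unused negative half of the signed 32-bit range.
    ALTER SEQUENCE games_id_seq MINVALUE -2147483648;
    ALTER SEQUENCE games_id_seq RESTART WITH -2147483648;
    -- New games then count up from -2,147,483,648, which buys roughly another
    -- 2 billion ids (assuming nothing in the app chokes on negative ids)
    -- before the real fix, a wider column type, is unavoidable.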

Update (2018-06-03):

Our crusty old server decided to die in the middle of the night. Because it’s housed in a satellite office of our hosting provider, there was no one on staff to reboot it. I had anticipated this and made a backup of the machine about a week ago. Jim and I spent Sunday morning getting the backup restored onto the shiny new server (currently hosted at my house), which is what the main site is running on now. The blog and the forum are still running on the old server. We’ll be moving those to the new server when the new disks arrive (Amazon says Wednesday).

Update (2018-06-05):

We’ve copied the database to another computer with enough space and updated the DB to the latest version (PostgreSQL 10, if you are curious). We’re currently converting the id column that ran out of room into a representation that can hold bigger numbers (an ALTER TABLE, for you SQL nerds). This is unfortunately a slow process due to the size of the data. We started it last night and it looks like it’s maybe 30% done. During this time the DB is completely offline, and that is causing the server code to…not be happy. It can’t authenticate users (because users are stored in the DB, too), so it’s not even saving games. Sorry about that. When the conversion is complete we’ll bring the DB back online and scores should immediately start working again.
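
(The conversion boils down to something like the statement below; “games” and “id” stand in for the real table and column names. Widening an integer column to bigint makes PostgreSQL rewrite the entire table and rebuild its indexes while holding an exclusive lock, which is why the DB has to be offline and why it’s taking so long.)

    -- Roughly what the conversion amounts to ("games"/"id" are placeholders):
    ALTER TABLE games ALTER COLUMN id TYPE bigint;
    -- PostgreSQL rewrites every row and rebuilds the table's indexes under an
    -- ACCESS EXCLUSIVE lock, so the table is unusable until it finishes.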

Update (2018-06-08):

The disks arrived and are all installed. We’re still converting the id columns. It’s very slow because the database has to more or less rebuild itself completely. Also, we had a bug in the conversion script: once it got about halfway done (after about 20 hours) it died and reset all the progress. :-/ That’s fixed now, though, so this time it should work (fingers crossed).

Update (2018-06-10):

The database id column conversion finally finished. It took 79 hours to run. I’ve pointed the site to the server where the database is temporarily housed and will begin copying the data back to where it is supposed to be. This temporary database is not stored on SSDs, so things might be slow; we’re not sure the disks can handle the full Green Felt load. It might still be a few days before everything is smooth.