4 Hours of downtime, sigh…
Posted by david on 13 Sep 2016 at 02:24 pm | Tagged as: Uncategorized
Our aging server stopped working last night at about 7pm. We couldn’t fix it remotely so we had to ask someone in the datacenter where the server lives to hard reboot the machine for us. Because of some logistical issues (no one at the datacenter at that time), it took about 4 hours to get the machine rebooted. We don’t have any redundancy, so during that time we were just off the air.
Oddly, since the server was rebooted the disks have been acting differently (in a good way) and I haven’t seen any of the “502 gateway timeout” errors that have been plaguing us for the last few weeks. We’ve been working on that (rather unsuccessfully so far), so it’s a little disappointing to see it magically fix itself. To investigate the 502s we gather a bunch of metrics and use them to build neat graphs so that we can see what’s going on:
This graph shows both the outage and today’s lack of “502 Gateway timeout” errors—the outage is the huge 4 hour gap between the 2 vertical dashed green lines and the red line shows the 502 errors. Notice today it’s nice and flat (ahhhhh), while yesterday there were a bunch of ugly spikes during peak hours (US/Pacific time).
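For anyone curious what “gathering metrics” for a graph like this can look like, here is a minimal sketch in Python. It is only an illustration and not the actual Green Felt setup: the log format (a common access-log style), the function name count_502_per_hour, and the sample lines are all assumptions made up for the example. It counts 502 responses per hour, which is the kind of number the red line would be drawn from.

```python
import re
from collections import Counter

# Assumed log format (hypothetical, common access-log style), e.g.:
# 1.2.3.4 - - [12/Sep/2016:18:59:01 -0700] "GET /game HTTP/1.1" 502 173
LINE_RE = re.compile(
    r'\[(\d{2}/\w{3}/\d{4}):(\d{2}):\d{2}:\d{2} [^\]]*\] "[^"]*" (\d{3})'
)


def count_502_per_hour(lines):
    """Return a Counter mapping (date, hour) to the number of 502 responses."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group(3) == "502":
            date, hour = match.group(1), int(match.group(2))
            counts[(date, hour)] += 1
    return counts


# Made-up sample lines, purely for demonstration.
sample = [
    '1.2.3.4 - - [12/Sep/2016:18:59:01 -0700] "GET /game HTTP/1.1" 502 173',
    '1.2.3.4 - - [12/Sep/2016:18:59:30 -0700] "GET /game HTTP/1.1" 200 5120',
    '5.6.7.8 - - [12/Sep/2016:18:10:02 -0700] "GET /scores HTTP/1.1" 502 173',
    '5.6.7.8 - - [13/Sep/2016:09:00:00 -0700] "GET / HTTP/1.1" 200 900',
]

for (date, hour), n in sorted(count_502_per_hour(sample).items()):
    print(f"{date} {hour:02d}:00  502s: {n}")
```

Plotting those per-hour counts over a couple of days would make peak-hour spikes, and a flat day like today, easy to spot.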
New Server
On the plus side, we have obtained a fancy new server. It’s got roughly 6 times the number of processor cores (with each core being twice the speed), and more than 30 times the memory. We’re (slowly) getting it ready. The old server was built in such a way that it was hard to move the programs around and keep everything working. We’re taking the more modern approach with the new one, but it means a lot of thought and planning up front so that everything is smooth (and possibly redundant) in the future. Jim and I are both pretty busy with our day jobs right now, so all this is happening in our spare time.
You guys do a great job, many thanks
I was starting to panic, did get some craft work done however, thanks as always for an awesome site.
Well thanks for getting it rebooted but there are problems. I tried to play Forty Thieves (game 1359826389) and had TWO identical cards (two of hearts). Tried to report the bug and it said I had to sign in, but I was already signed in (At this time I am number 2 on the Canfield leader board). Could not get past the page that said I had to sign in. Is there a difference between logging in and signing in? Is the server not working? Is the Forty Thieves software broken?
Forty Thieves is a two deck game, so there will be two of every card. The sign-in part is trickier: we use off-the-shelf forum software and the logins are tied to the main Green Felt login system. It’s supposed to be seamless, but it looks like something is confused. You might try logging out of Green Felt and then logging back in.
Once again, just want to thank you guys so much for all your hard work to keep this site going. It is my main source of stress relief! Thank you, thank you.
I visit daily. Tonight I have a little sign on every page/tab that I open in either chrome or explorer that has “4 hours of downtime, sigh…”
It is a little rectangle that appears on the top third of the right hand side, not just on Greenfelt pp, but every page, pdfs opened etc.
I couldn’t figure out where it would come from, googled it, and found it here http://blog.greenfelt.net/2016/09/13/4-hours-of-downtime-sigh/:
Why?
thanks,
Kathy
The only thing I can think of is that at some point you clicked on the RSS feed link and got subscribed. I don’t think chrome does anything with RSS, but Explorer might. Unfortunately I’m not familiar enough with Explorer to know where to look.
Thanks, guys. It’s a miracle to me that you develop and run so many online games that are challenging and yet easy to move around in. Now that I learn that the two of you also have day jobs, I am amazed. Bless you in both areas.
Thanks for all the good times.
Greystone
What is “Moderation” in the context of these comments?
Greystone
The comments on this blog require approval before they appear for other people to see. We turned this feature on because we were having trouble with spam at one point.
Honestly – totally appreciate your efforts, especially since you both have day jobs. Damn the torpedoes, full steam ahead!