The other day we were having our earnings announcement (can you get more visible?), and I walked into work at around 8:30am. I'd been getting paged every so often by our site monitors that something was intermittently failing or slow.
We looked around (my ops engineers and I) and saw a significant jump in incoming traffic: something north of 50 Mb/s, when we normally run about 3-5 Mb/s. The servers were still fine, because the switches were absorbing the attack. Unfortunately, it was only 8:30am, and our peak traffic loads hit at around 10am. Looking at the switches, we were running at about 98% of capacity. The big question: did we have the capacity to live through the earnings announcement?
As the load climbs, we see more pages from the monitors that some places are slow or can't connect... The earnings announcement is at 2pm PST. By noon we're pretty well maxed out on the switches, but we don't want to take downtime because of the earnings...
1pm: we attempt a content push, and it fails. This is the content that will be used for the announcement at 2. Great. We then attempt to reboot one switch, hoping against hope we can bring it back before the announcement. The good news: it boots pretty fast (10 min). The bad news: the other switch succumbs to the load and dies (reboots). Switch 1 has some issues because switch 2 never really lets go. Great.
1:30pm: I'm getting calls from all kinds of people, VPs, you name it. If we blow this, we're in serious trouble. One nice thing is that the incoming traffic has started to fall off; the DoS is blowing over (but we're still semi-offline: some of the sites are fine, others not so much). We finally restart both switches and get them to split the load as they're supposed to...
1:45pm: we attempt the content push again. It finally goes through. Lots of sweat and hard work to get everything restarted, but in the end we get it all out there at *1:56pm*. Yikes. Way too close for me. I go back to my office and finalize the plans to move to the new network. We've tested that one against much heavier attacks, and it's proven to be more resilient. Gotta get there soon, before I blow a blood vessel.
Sometimes I think network hardware providers are behind DoS attacks. I've been hit 3 times, and each one caused me to buy new network gear (newer stuff handles attacks better; there's no doubt about it).
