This is a guest column from Peter Eisengrein, SVP Network Operations & Design, Evolve IP
Last week one of the nation's largest carriers experienced an outage that affected tens, if not hundreds, of thousands of Voice over IP users, maybe more. At least one carrier employee dubbed the outage "catastrophic," yet the news media shrugged. Even social media, which is not exactly a reliable news source but is at least a quick indicator of newsworthy events, hardly noticed.
How can it be that a "catastrophic" outage that is so far-reaching never made the news? Perhaps it is because it was a busy news week covering an actual catastrophe, the tragic Boston Marathon bombing. If this had been Google or Facebook or Twitter, however, it probably would have made headlines. The people who were affected by the outage certainly noticed, though. Maybe we have just become inured to "typical" outages that are not caused by nefarious acts of hacking, and maybe vast network outages are the new normal.
Unofficially, the outage was "the result of a DNS issue" that prevented calls from the carrier's PSTN gateways from completing for nearly two hours. (The official reason for outage, or RFO, had not been released at the time of writing.) The same source that called the outage "catastrophic" also suggested that this DNS issue may actually have been a denial of service attack. At the moment that is pure hearsay, and even if it is true, it seems unlikely to be included in the RFO. Why? If you were to Google that carrier + DDoS, you would find that it has a complete business practice focused on DDoS protection.
DNS is a particularly curious cause, since some (many?) of the carrier's customers and service providers connect via IP addresses, not hostnames, and therefore do not need DNS services. So perhaps this had more to do with the routing of calls within the carrier's network than with access routes to competitive VoIP providers and enterprises. Whatever the root cause is determined to be, it is clear that there is still work to be done to prevent these kinds of problems. Is this the new normal? I don't think so. A distributed VoIP network is still not infallible, and problems with core components and services such as DNS can have a significant impact, but it offers a greater level of fault tolerance than traditional services ever could. And it will only get better as we learn from these outages.
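To make the point about IP addresses versus hostnames concrete, here is a minimal sketch in Python of how one might check which peer targets depend on DNS at all. The function name `needs_dns` and the sample peer list are hypothetical, chosen only for illustration; they say nothing about how the carrier in question actually configures its connections.

```python
import ipaddress


def needs_dns(target: str) -> bool:
    """Return True if the target is a hostname rather than a literal IP address."""
    try:
        # ip_address() accepts IPv4 and IPv6 literals and raises ValueError otherwise.
        ipaddress.ip_address(target)
        return False
    except ValueError:
        return True


# Hypothetical peer targets (documentation-reserved addresses and domain).
peers = ["203.0.113.10", "sip.example.net", "2001:db8::5"]

# Only these peers would be exposed to a DNS resolution failure.
dns_dependent = [p for p in peers if needs_dns(p)]
```

In this sketch only the hostname entry ends up in `dns_dependent`; peers reached by a literal address never perform a lookup, which is why a DNS failure alone would not be expected to cut them off.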