Amazon EC2 Outage: What The Experts Tell Us

The recent outage of Amazon’s EC2 service affected a number of leading-edge companies, some of which had used the fact that they housed little to no infrastructure on their premises as a selling point to investors. The cloud computing market is in what we could call a post-evangelism phase, where there seems to be universal agreement that the cloud has a role in most organizations, at least for some tasks if not all. The concept of hosted solutions isn’t new: many companies outsource payroll or tax preparation, for example.

But when the largest company providing cloud infrastructure has a major outage in one of its availability zones lasting days, it’s time to sit back and reflect on the challenges of moving wholesale to the cloud without thinking the concept fully through.

To get a sense of what some of the major players in the space had to say, I reached out to a number of people for their feedback. Here are some of the responses; as I get more, I will update this piece.

Thomas Howe, with consulting firm Embrase, produces an event with TMC called Cloud Communications Expo. When I asked him whether companies should relocate to the cloud, he had this to say:

“Well, in general, they need to run in the cloud. The question isn’t is the cloud safe, the question is – is the cloud safer than what I can do? For nearly all companies, Amazon wins that battle.” When asked how to limit the damage from such outages, he suggested working with more than one vendor, such as Amazon and Rackspace.

I reached out to John Engates, CTO of Rackspace Hosting, and he had this to say: “Companies should do their diligence with regard to the promises made by their cloud provider and the SLAs in place to back them up. Transparency is key. If the provider is not transparent with their architecture, it’s impossible to fully understand if what you’re building is going to be resilient and highly available on top of that provider.”

He echoed Howe’s sentiment, explaining that cloud outages get major media attention even though outages happen all the time in corporate data centers. He further likened a cloud outage to an airline crash: people will continue to fly after such a disaster, but a thorough investigation generally results in many lessons learned for the entire industry. In a corporate data center, he said, that kind of investigation happens in isolation, if at all.

Another important point he made is that resources can and should be considered disposable: rather than protecting a single server with high-availability components, as you would in a typical data center, you should build a group or cluster of servers to handle the job. He says the companies that used this approach across multiple data centers or multiple clouds generally took the Amazon outage in stride.
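To make the disposable-server idea a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical and made up for illustration: the server names, the zone labels and the `route` helper do not correspond to any Amazon or Rackspace API. It only shows the principle that, when no single machine is special, losing a whole zone simply shrinks the pool that traffic is spread across.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    zone: str  # hypothetical "provider/zone" label, not a real identifier


def healthy_servers(pool, failed_zones):
    """Return the servers whose zone is not currently failing."""
    return [s for s in pool if s.zone not in failed_zones]


def route(request_id, pool, failed_zones):
    """Pick a surviving server for a request, spreading load by request id."""
    candidates = healthy_servers(pool, failed_zones)
    if not candidates:
        raise RuntimeError("no healthy servers left in any zone")
    return candidates[request_id % len(candidates)]


# Illustrative pool spread over two providers and three zones.
pool = [
    Server("web-1", "provider-a/zone-1"),
    Server("web-2", "provider-a/zone-2"),
    Server("web-3", "provider-b/zone-1"),
]

# Simulate losing one zone: requests still land on the remaining servers.
failed = {"provider-a/zone-1"}
survivors = healthy_servers(pool, failed)
chosen = route(7, pool, failed)
```

In a real deployment the list of failed zones would come from health checks rather than a hard-coded set, and replacement servers would be started automatically, but the design choice is the same: protect the service, not the individual server.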

I also reached out to Joe Staples, CMO of contact center solutions provider Interactive Intelligence, as his company recently started to offer hosted services in addition to premises-based hardware and software. Joe agreed that cloud environments are generally more secure and reliable than corporate data centers.

When asked how customers can limit the damage from such outages, he responded, “Develop a solid business continuity plan. Look for alternate providers that can deliver basic services in the case of a primary outage. Ask the question, how long could we be without this service? If it is a critical application, then customers should spend the added money to ensure they have a solid business continuity plan to keep them up and running.”
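The question “how long could we be without this service?” can be turned into rough arithmetic. The Python sketch below is only an illustration: the 99.9% availability figure is an assumed example, not a number from any provider’s SLA, and the two-provider calculation assumes the providers fail independently, which is optimistic because real outages can be correlated.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability):
    """Hours per year a service may be down at a given availability (0..1)."""
    return (1.0 - availability) * HOURS_PER_YEAR


def combined_availability(*availabilities):
    """Availability when any one of several independent providers is enough."""
    all_down = 1.0
    for a in availabilities:
        all_down *= (1.0 - a)  # probability that every provider is down at once
    return 1.0 - all_down


# Assumed example figure: a single provider at 99.9% availability.
single = allowed_downtime_hours(0.999)  # roughly 8.76 hours per year

# Two such providers, either of which can carry the service alone.
dual = allowed_downtime_hours(combined_availability(0.999, 0.999))
dual_seconds = dual * 3600  # roughly 31.5 seconds per year
```

If nearly nine hours a year without the application is acceptable, a single provider may be enough; if it is not, the gap between the two results is a rough measure of what the added money for an alternate provider buys.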

The general consensus here is that putting an application in the cloud will generally make it more secure than running it in a typical data center, but if you want it to be more resilient, you need to architect a cloud-based solution that can withstand outages in specific data center locations or even across an entire cloud vendor.

And as more and more companies begin to seriously consider the move to cloud-based services, the timing of this outage could even be considered fortunate in hindsight.
