Skype Explains the Perfect Storm

Tom Keating : VoIP & Gadgets Blog

So we now know that Windows Update (as I accurately theorized) was the culprit behind the Skype outage. Many people have been asking me what made this latest Microsoft Update and its series of patches different from previous Microsoft patches, such that it caused the Skype meltdown. The short answer is that a "Perfect Storm" of events happened in combination and brought the Skype network to its knees.

The Skype Heartbeat explained it very well as follows:

2. What was different about this set of Microsoft update patches?

In short – there was nothing different about this set of Microsoft patches. During a joint call soon after problems were detected, Skype and Microsoft engineers went through the list of patches that had been pushed out. We ruled each one out as a possible cause for Skype’s problems. We also walked through the standard Windows Update process to understand it better and to ensure that nothing in the process had changed from the past (and nothing had). The Microsoft team was fantastic to work with, and after going through the potential causes, it appeared clearer than ever to us that our software’s P2P network management algorithm was not tuned to take into account a combination of high load and supernode rebooting.

3. How come previous Microsoft update patches didn’t cause disruption?

That’s because the update patches were not the cause of the disruption. In previous instances where a large number of supernodes in the P2P network were rebooted, other factors of a “perfect storm” had not been present. That is, there had not been such a combination of high usage load during supernode rebooting. As a result, P2P network resources were allocated efficiently and self-healing worked fast enough to overcome the challenge.

4. Has the bug been fixed? Should Skype users worry about future Microsoft Update patches and reboots?

Yes, the bug has been squashed. The parameters of the P2P network have been tuned to be smarter about how similar situations should be handled. Once we found the algorithmic fix to ensure continued operation in the face of high numbers of client reboots, the efforts focused squarely on stabilising the P2P core. The fix means that we’ve tuned Skype’s P2P core so that it can cope with simultaneous P2P network load and core size changes similar to those that occurred on August 16. We’d like to reassure our users across the globe that we’ve done everything we need to do to make sure this doesn’t happen again. We’ve already introduced a number of improvements to our software to ensure our users will not be similarly affected – in the unlikely possibility of this combination of events recurring.
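
To make the "perfect storm" idea concrete, here is a toy Python sketch of why neither high load nor mass supernode reboots is a problem alone, while the two together can be. This is not Skype's actual algorithm, which has not been published. The function name, the capacity figure, the promotion rate, and the rule that overloaded supernodes drop out are all assumptions made up for illustration.

```python
def steps_to_recover(supernodes, rebooting, load, capacity=100,
                     promoted_per_step=5, max_steps=50):
    """Toy model (hypothetical): return the step at which the remaining
    supernodes can carry the load, or None if the network never recovers."""
    active = supernodes - rebooting
    for step in range(max_steps + 1):
        if active * capacity >= load:
            return step  # enough capacity, network is healthy
        # Assumption: overload knocks out one more supernode for every
        # four supernodes' worth of excess load ...
        shortfall = load - active * capacity
        dropped = shortfall // (4 * capacity)
        # ... while self-healing promotes a fixed number of new supernodes.
        active = max(active + promoted_per_step - dropped, 0)
    return None  # self-healing could not keep up


print(steps_to_recover(100, rebooting=50, load=4000))  # reboots, low load: 0
print(steps_to_recover(100, rebooting=0, load=9000))   # high load, no reboots: 0
print(steps_to_recover(100, rebooting=50, load=6000))  # moderate load: 3
print(steps_to_recover(100, rebooting=50, load=9000))  # both at once: None
```

In this made-up model, a mass reboot during a quiet period or a busy period without reboots needs no recovery at all, and a moderate overload heals in a few steps. Only when heavy load and mass reboots coincide do supernodes drop out faster than self-healing can replace them, which is the kind of combination Skype describes above.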

