Stratus Technologies Helps Boost the Efficiency and Fault Tolerance of NFV

As telcos become software telcos, they are beginning to shed some of their bespoke, proprietary hardware in exchange for NFV software running on OTS computers from white box companies, Dell, HP, Cisco, etc. This move follows similar trends in many markets – thus we’ve seen opinion pieces about how software is eating the world.

It’s all about abstraction really. Software running on smartphones for example can replace fixed-function remote controls. A great example is home stereo equipment provider Sonos and their app versus a hardware remote. Users gain familiarity with the software interface and can operate it on various types of hardware running iOS, Android or any other OS. To some degree this brings with it hardware commoditization – then again, I tread lightly with the idea as it seems to hold little meaning in Cupertino.

In another example, the cloud or something like PaaS can be thought of as mainframe abstraction. You no longer have to deal with the underlying hardware; you access it through abstractions provided by various software interfaces. Yes, a compiler is another example.

A common concern as a network operator moves to software is how to ensure the hardware solution has the necessary fault tolerance (FT) or at least high availability (HA). One option in the world of stateful HA is 1+1 redundancy, in which every active system has a dedicated standby, which means utilization rates of 45-50% or less. The challenge, of course, is that software telcos are supposed to become much nimbler and lower their costs in order to compete with OTT providers. OTS hardware is relatively cheap, but if you don’t need to double the cost, your telco is in a better financial situation.

Enter Stratus Technologies, one of the pioneers of the fault-tolerant server business back in the eighties along with Tandem Computers. In fact, Stratus was launched in Massachusetts along with many of the big computer companies of the day like DEC, Wang and Prime.

Over the years, the company went through a bewildering array of acquisitions and spinoffs – suffice it to say, Stratus Technologies is still focused on providing HA solutions. Notice I wrote “solutions.” The point here is the company has taken the best of its fault tolerant, HA technology everRun and placed it in software.

There are a number of benefits to what Stratus has done. Because they have rewritten their technology to take advantage of OTS servers, companies can leverage HA at far lower price points. Moreover, there is tremendous flexibility afforded by the software-only approach. Just as you can assign a priority to an application running on an OS, you can easily, and in an automated fashion, determine how to distribute network functions and whether each needs to be fault tolerant or just HA.


The result is Stratus now considers themselves to be a software defined availability company – an apt name for what they are doing. Moreover, their solution doesn’t require a rewrite of code. The service continuity has fully stateful resiliency and availability – all compatible with currently running applications.

I mentioned at the start that typical stateful fault tolerant systems have 45% utilization or so. Stratus says they have been able to get this number up considerably – as high as 80% in fact.


Ali Kafel, Sr. Director & Head of Telecom Business Development at the company, had this to say about their technology in a recent phone conversation and email exchange:

The Stratus Cloud Technology provides fully stateful FT at much higher utilization because we don’t do 1+1 (like we did in HW) but instead N+k De-Clustered Redundancy where K is much smaller than N (in some small cases just 1). In this scenario, each primary VM has a shadow secondary VM (on a different server) that uses significantly less resources (about 6-10% of the primary).

This means there is no need to dedicate the same number of servers for just the backups. In fact, all the servers (including the K servers) will have secondary VMs as well as primary VMs, with reserve capacity to stand up the secondary to become primary if needed. This sophisticated level of resiliency automation is what enables our solution to dramatically increase utilization, and raise the efficiency of redundancy from traditional telecom levels of 45% to levels of 80%+, while providing uncompromised reliability, at a fraction of the cost.
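To get a feel for why N+k changes the math, here is a back-of-the-envelope sketch. The model is my own simplification, not something Stratus provided: I assume identical servers, that each primary VM carries a shadow costing a fixed fraction of its resources, and that after losing k servers the remaining N must still hold every primary plus its shadow. The function names and the 10+1 example are hypothetical.

```python
def one_plus_one_utilization(active_servers: int) -> float:
    """1+1: every active server has a dedicated, equally sized standby."""
    if active_servers <= 0:
        raise ValueError("active_servers must be positive")
    return active_servers / (2 * active_servers)


def n_plus_k_utilization(n: int, k: int, shadow_fraction: float) -> float:
    """Share of total capacity doing primary work in an N+k pool.

    Assumed model (an illustration, not Stratus's published math):
    n + k identical servers; each primary has a shadow that costs
    `shadow_fraction` of the primary; after k servers fail, the n
    survivors must still fit all primaries and their shadows.
    """
    if n <= 0 or k < 0 or shadow_fraction < 0:
        raise ValueError("n must be positive; k and shadow_fraction non-negative")
    # Largest primary load that fits on n servers alongside its shadows.
    primary_load = n / (1.0 + shadow_fraction)
    return primary_load / (n + k)


if __name__ == "__main__":
    print(f"1+1: {one_plus_one_utilization(10):.0%}")
    for shadow in (0.06, 0.10):
        result = n_plus_k_utilization(10, 1, shadow)
        print(f"10+1 with {shadow:.0%} shadows: {result:.0%}")
```

Under these assumptions, a 10+1 pool with shadows at 6-10% lands somewhere in the 80s percent, which is at least consistent with the 80%+ figure quoted above. Real deployments will differ, since the sketch ignores things like uneven VM sizes and placement constraints.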

The concept here is formally called the Stratus Automated Virtualized Resilience Layer. It currently works with KVM in OpenStack environments and, Ali explained, it may eventually support VMware. It can coexist with VMware today. Moreover, there can be a common orchestrator and different levels of resiliency based upon function, as I alluded to above. For example, DDoS protection functions can be designated FT and voice just HA (isn’t it amazing that voice can be a lower priority than something else these days?).
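The per-function resiliency idea can be pictured as a simple policy table that an orchestrator consults. The sketch below is purely illustrative: the names (`Resilience`, `POLICY`, `resilience_for`) and the VNF labels are my own inventions, not the Stratus or OpenStack API.

```python
from enum import Enum


class Resilience(Enum):
    FT = "fault tolerant"
    HA = "high availability"


# Hypothetical policy table: which protection level each function gets.
POLICY = {
    "ddos-protection": Resilience.FT,
    "voice": Resilience.HA,
}


def resilience_for(vnf_name: str, default: Resilience = Resilience.HA) -> Resilience:
    """Look up the protection level for a VNF, falling back to a default."""
    return POLICY.get(vnf_name, default)


if __name__ == "__main__":
    for name in ("ddos-protection", "voice", "some-new-vnf"):
        print(name, "->", resilience_for(name).value)
```

Keeping the policy as data rather than code is the point: changing a function from HA to FT becomes a one-line edit that automation can apply.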

The idea here is telcos have very complicated environments – the more automation you can apply to deploying VNFs, the better.


One other point Ali made is that when the secondary host takes over, it automatically spins up a third host. With many traditional FT solutions, when a redundant system fails, it can take minutes, hours or even days to get another system up and running, and a second failure in the meantime could be disastrous.
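Here is a toy sketch of that takeover-then-re-protect behavior as I understand it from the conversation. It is not Stratus code: the `ProtectedVM` type, the `fail_over` function and the host names are hypothetical, and picking the first healthy host is a stand-in for whatever placement logic a real system would use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProtectedVM:
    name: str
    primary: str    # host running the active VM
    secondary: str  # host running the shadow VM


def fail_over(vm: ProtectedVM, failed_host: str, healthy_hosts: List[str]) -> ProtectedVM:
    """Promote the shadow if needed, then place a new shadow elsewhere."""
    if failed_host == vm.primary:
        vm.primary = vm.secondary  # the shadow takes over as primary
    elif failed_host != vm.secondary:
        return vm  # this VM was not affected by the failure
    # Re-protect right away: the new shadow must sit on a different host.
    candidates = [h for h in healthy_hosts if h not in (vm.primary, failed_host)]
    if not candidates:
        raise RuntimeError(f"no spare host available to re-protect {vm.name}")
    vm.secondary = candidates[0]
    return vm


if __name__ == "__main__":
    firewall = ProtectedVM("firewall", primary="host-a", secondary="host-b")
    fail_over(firewall, failed_host="host-a", healthy_hosts=["host-b", "host-c"])
    print(firewall)
```

The design point is that redundancy is restored as part of the failover itself, so the window in which a second failure would hurt is kept short.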

Finally, he said, “One of the other key benefits that Telcos like about this solution is that they can take any application, like a traditional enterprise-grade firewall that may not be redundant and deploy it on the Stratus software NFVI with immediate and simplified FT, with no complex code development, testing and support.” He further exclaimed, “This gives them tremendous agility in their partner/vendor system because it immediately opens up VNFs that they otherwise would not have considered!”

I started this piece by talking a lot about abstraction. Supercomputers have become abstracted thanks to Linux, and Android devices have also become generally abstracted and commoditized. In another example, RAID systems abstract the hard drives but add value through software that turns a series of generally unreliable spinning disks into a more reliable solution.

In fact, in some ways, RAID or a Clustered File System is very similar to what Stratus showed me. I’d propose in fact we could describe their HA technology when applied to NFV as Redundant Efficient Managed Virtual Network Functions or REMVNF. After all, who doesn’t like more acronyms? Especially when they are six letters long?

John Donovan, Senior Executive VP of AT&T, said roughly that as AT&T becomes a software company, they need to boost their utilization from 45% to 80-90%. Technology like what we’ve seen from Stratus will certainly help them and other carriers meet these goals in order to more effectively compete with OTT and other emerging threats.

To learn more, there is a sponsored webinar on the matter on May 26, 2015 (archive available as well if you miss it).
