How to meet the challenges of virtual network functions assurance

Next Generation Communications Blog

By: Kevin Landry, Product Marketing Manager, Alcatel-Lucent

As they move into the cloud era, network operators need a service-aware network operations tool to assure virtual network functions (VNFs). They’ll need it to efficiently perform a variety of network operations tasks, including:

  • Service impact assessment
  • Fault localization
  • Identification of the true root cause (as distinct from symptomatic faults)
  • Corrective action to resolve problems fast

As described in a vEPC post about converging NMS and VNF manager functions within the ETSI Management and Orchestration (MANO) architecture, operators need to evolve their network operations tools for NFV through tighter coupling of the NMS and VNF manager functions. Specifically for VNF assurance, the blog states “Troubleshooting is simplified because traditional NMS faults/events are correlated with VNF related events/faults. The VNFM provides lifecycle management and automates the self-healing of VNFs.”

In addition to the ETSI MANO architecture, progress has been made on the ETSI specification that defines NFV Service Quality Metrics. The specification strives to enable better engineering of VNF user service quality, more efficient fault localization and mitigation, and faster identification of the true root cause of service impairments, so that proper corrective actions can be taken promptly.

As NFV service quality metrics and traditional network service performance are continuously monitored, a network operations tool will need a service-aware infrastructure relationship model. With such a model, the tool can innately correlate events to the true root cause of service-impacting problems, without operators having to develop and pre-configure volumes of custom handling policy rules and scripts. The model will also allow operators to assess the service impact of network events under investigation more rapidly, and to speed fault isolation and resolution.
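To make the idea of a relationship model concrete, here is a minimal Python sketch. It is not how any particular product works: the entity names, the `DEPENDS_ON` map, and the rule "an alarm is a root cause if nothing it depends on is also alarmed" are all assumptions chosen for illustration.

```python
# Hypothetical relationship model: each entity lists what it directly depends on.
DEPENDS_ON = {
    "service:vpn-42": ["vnf:vEPC-1"],
    "vnf:vEPC-1": ["vm:7", "vm:8"],
    "vm:7": ["host:a"],
    "vm:8": ["host:a"],
    "host:a": [],
}


def upstream(entity, model):
    """Return every entity that `entity` depends on, directly or indirectly."""
    seen = set()
    stack = list(model.get(entity, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(model.get(node, []))
    return seen


def correlate(alarmed, model):
    """Group alarmed entities under assumed root causes.

    An alarmed entity is treated as a root cause when none of the entities
    it depends on is also alarmed. Every other alarm is attached, as a
    symptom, to each root cause found upstream of it.
    """
    alarmed = set(alarmed)
    roots = {e for e in alarmed if not (upstream(e, model) & alarmed)}
    grouped = {r: [] for r in roots}
    for entity in alarmed - roots:
        for root in upstream(entity, model) & roots:
            grouped[root].append(entity)
    return {root: sorted(symptoms) for root, symptoms in grouped.items()}


if __name__ == "__main__":
    alarms = ["service:vpn-42", "vnf:vEPC-1", "vm:7", "host:a"]
    print(correlate(alarms, DEPENDS_ON))
```

Because the correlation comes from the model itself, no per-alarm rule has to be written: adding a new VM to `DEPENDS_ON` is enough for its alarms to be grouped under the right host.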

To make this more advanced fault management meaningful for network operators, assurance visualization will provide intuitive views that show how a multitude of events and key quality indicators (KQIs) relate to each other, with clear visibility into the root cause of problems. It will also give operators a timeline of events and state changes in the network, which gives a better indication of cause and possible effects.

This blog is the second in a series that discusses the evolution of network and service assurance. The first blog gives a general overview of how network operations tools can be more efficient.


VNF configurations will be far more dynamic than those of physical network functions (PNFs). This presents new challenges for network operations tools, which must keep pace with the many events generated by highly dynamic network state changes and elastic scaling.

Manual processes that piece together assurance data from disparate views will not be sufficient to keep pace in this highly dynamic NFV environment. And traditional real-time-only monitoring and assurance views will not be effective when a VNF could be here one moment, then scaled down and gone the next. This means that both current and historical events and state information must be intelligently processed with near real-time performance, and at large scale.

Consider how much more meaningful assurance views would be for network operators if they made it easy to see how all the network events and MANO-related KQIs relate to each other. For example, wouldn’t it be more insightful for operators troubleshooting a service performance issue to have a timeline that shows the service-impacting threshold crossing alerts (TCAs) as well as any orchestration or network events that occurred in the same general timeframe?
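A timeline like this can be sketched in a few lines of Python. The `Event` fields, the source labels, and the five-minute window below are illustrative assumptions, not part of any ETSI definition or product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: float     # seconds since an arbitrary epoch (assumed unit)
    source: str   # assumed labels: "tca", "orchestration", "network"
    text: str


def timeline(events):
    """Return all events in chronological order, whatever their source."""
    return sorted(events, key=lambda e: e.ts)


def nearby(tca, events, window_s=300.0):
    """Non-TCA events within `window_s` seconds before or after the TCA."""
    return [
        e for e in timeline(events)
        if e.source != "tca" and abs(e.ts - tca.ts) <= window_s
    ]


if __name__ == "__main__":
    tca = Event(1000.0, "tca", "latency threshold crossed on service X")
    events = [
        tca,
        Event(5000.0, "network", "link flap on port 3"),
        Event(940.0, "orchestration", "VNF scaled in"),
        Event(1100.0, "network", "route change"),
    ]
    for e in nearby(tca, events):
        print(e.ts, e.source, e.text)
```

Keeping historical events in the same list is what lets the scale-in at 940 seconds show up next to the alert, even though the scaled-in VNF instance may no longer exist when the operator looks.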


As VNF deployments increase, network operations tools will need to evolve with new NFV service quality metric definitions and provide intelligence for correlating the multitude of different events coming from the various types of NFV infrastructure and MANO elements. For troubleshooting and root-cause analysis that works in coordination with VNF lifecycle management, operators need service-aware visibility into, and traceability across, each layer that can affect service quality.

For operations to be effective in a highly dynamic environment, where network services depend on both VNFs and PNFs for underlying network infrastructure, there must be a service-aware understanding of the relationships between services and these VNFs and PNFs. Equally important, there must be a mapping of how service quality events triggered by virtual machines, VNFs, and orchestration layers impact or trigger changes in dependent layers.
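As a rough illustration of such a mapping, the Python sketch below lifts an event on a virtual machine, VNF, or PNF up to the services that depend on it. The inventory contents and identifier scheme are hypothetical.

```python
# Hypothetical inventory: which network functions each service relies on,
# and which virtual machines host each VNF. PNFs have no hosting VMs.
SERVICE_USES = {
    "volte": {"vnf:ims", "pnf:router-1"},
    "broadband": {"pnf:router-1"},
    "iot": {"vnf:vepc"},
}
VNF_HOSTED_ON = {
    "vnf:ims": {"vm:1", "vm:2"},
    "vnf:vepc": {"vm:3"},
}


def impacted_services(element):
    """Services that depend on `element` (a VM, VNF, or PNF identifier)."""
    # Lift a VM-level event to the VNFs hosted on that VM.
    functions = {vnf for vnf, vms in VNF_HOSTED_ON.items() if element in vms}
    # VNF and PNF identifiers are matched against services directly.
    functions.add(element)
    return sorted(
        service for service, used in SERVICE_USES.items() if used & functions
    )


if __name__ == "__main__":
    print(impacted_services("vm:1"))
    print(impacted_services("pnf:router-1"))
```

The same lookup answers both directions of the operator's question: which services a failing layer touches, and whether an event touches any service at all.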

For example, when there are issues with virtual network provisioning latency, reliability, or diversity compliance, these conditions may trigger actions within the orchestration layer. That raises the questions network operators care about most:

  • How will these actions impact service quality?
  • And then how will they impact the virtual network?
  • How will the VNF manager react?

Without a network operations tool that can provide this type of intelligence for assuring VNFs, operators will not have the visibility needed to understand whether a problem is within the scope of their control. This type of information would be highly valuable not only for troubleshooting but also, more broadly, for clarifying accountability for a localized problem across organizational groups, from IT to the different network domain groups.

Operators require a unified network operations tool that has evolved with the intelligence to meet all of these new NFV-related assurance challenges. This tool must possess a service-aware model that is unified with NFV lifecycle management. It must scale and perform to keep pace with huge volumes of events that reflect the continual state of flux across service-quality-impacting layers. (For more examples of service quality metrics that provide requirements for assuring virtual networks, please refer to the ETSI specification for defining NFV Service Quality Metrics.)


Operators deploying NFV require advanced fault management that provides both current and historical visibility for root-cause analysis, so that active faults can be correlated with past ones as the state of the network changes. This historical fault correlation is essential for pinpointing the root cause of problems in a highly dynamic virtualized network, where MANO-triggered corrective actions could make intermittently recurring, customer-impacting issues difficult to investigate.

Network and service assurance tools in the cloud/NFV era must also scale to track the full history of related service-impacting events, so network operators can perform both real-time troubleshooting and trend analysis.

Tools also need the intelligence to detect recurring problems. Specifically, operators require a tool that can help them assess whether automated corrective resolutions are succeeding or failing. If they are failing, the tool should show whether the failures are persistent or intermittent, and whether there is an actionable probable cause in the network infrastructure within the scope of the network operator’s control. And amongst the high volumes of events, there will also be a need to suppress (or filter out) events that do not require an action by the network operations team.
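One simple way to picture that assessment is sketched below in Python. The input format (one boolean per polling interval after the corrective action), the category names, and the event dictionary layout are assumptions made for this example only.

```python
def classify_resolution(samples_after_action):
    """Classify an automated corrective action from post-action fault samples.

    `samples_after_action` is a list of booleans, one per polling interval
    after the action ran; True means the fault was observed in that interval.
    """
    if not samples_after_action:
        return "unknown"        # nothing observed yet, so no verdict
    if not any(samples_after_action):
        return "resolved"       # fault never came back
    if all(samples_after_action):
        return "persistent"     # fault present in every interval
    return "intermittent"       # fault comes and goes


def actionable(events, suppressed_types):
    """Drop events whose type has been marked as needing no operator action."""
    return [e for e in events if e["type"] not in suppressed_types]


if __name__ == "__main__":
    print(classify_resolution([False, False, False]))
    print(classify_resolution([True, False, True]))
    events = [
        {"type": "vm-migrated", "id": 1},
        {"type": "link-down", "id": 2},
    ]
    print(actionable(events, suppressed_types={"vm-migrated"}))
```

A real tool would need the event history discussed above to build those samples, since the VNF instance that raised the original fault may have been replaced by the corrective action itself.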

The following video demo offers a deeper dive into an advanced fault management application from Alcatel-Lucent. 
