Self-Healing and Resilience in Future 5G Cognitive Autonomous Networks 26-28 November Santa Fe, Argentina J. Ali-Tolppa, S. Kocsis, B. Schultz, L. Bodrog, M. Kajo Nokia Bell Labs [email protected]

Page 1: Self-Healing and Resilience in Future 5G Cognitive

Self-Healing and Resilience in Future 5G Cognitive Autonomous Networks

26-28 November

Santa Fe, Argentina

J. Ali-Tolppa, S. Kocsis, B. Schultz, L. Bodrog, M. Kajo

Nokia Bell Labs

[email protected]

Page 2: Self-Healing and Resilience in Future 5G Cognitive

Resilience

• “An ability to recover from or adjust easily to misfortune or change” Merriam-Webster Dictionary

Robustness

• “Capability of performing without failure under a wide range of conditions” Merriam-Webster Dictionary

Page 3: Self-Healing and Resilience in Future 5G Cognitive

Why is resiliency important in 5G?

• 5G is by nature dynamic and complex

→ Unforeseen circumstances are bound to happen

• Use cases requiring ultra-high reliability (URLLC)

Robustness (redundancy etc.) alone is no longer enough!

Page 4: Self-Healing and Resilience in Future 5G Cognitive

How to design for resilience?

• Monitor and adapt

• Decoupling, modularity

Focus

Common core principles

Page 6: Self-Healing and Resilience in Future 5G Cognitive

Self-Healing in Radio Access Networks

Page 7: Self-Healing and Resilience in Future 5G Cognitive

Detecting Anomalies without Labelled Training Data

Which are anomalous?

[Figure: Example 1, Example 2, Example 3, Example 4; labels: "Meaningless question", "Red", "Green"]

Page 8: Self-Healing and Resilience in Future 5G Cognitive

Anomaly Detection

Feature selection

[Figure: three panels, each with a different relevant feature: color; shape; color and shape]

Page 9: Self-Healing and Resilience in Future 5G Cognitive

Radio Access Network Anomaly Detection

Feature and context selection

• Input features typically include Performance Management (PM) Key Performance Indicators (KPIs) and Fault Management (FM) alarms, but additional inputs, e.g. log analysis, can be used as well

• Is the whole input space profiled, including cross-correlations, or only selected projections of it (single KPIs, selected KPI pairs etc.)?

• The context needs to be decided, e.g. will the profiling be done per network function or per group of network functions, with hourly or diurnal profiles for network-traffic-dependent KPIs, etc.

• In our work we used PM data only and created diurnal profiles for traffic-dependent KPIs and cross-correlations for selected KPI pairs.

Page 10: Self-Healing and Resilience in Future 5G Cognitive

Radio Access Network Anomaly Detection

Simple time-context dependent profiling of a timeseries
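
A time-context dependent profile of this kind could look roughly like the sketch below. It is not the authors' implementation: it assumes the context is the hour of day, that the profile stores a mean and standard deviation per hour, and that the anomaly level is the distance from the profiled mean in standard deviations. The function names and the training values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_diurnal_profile(samples):
    """Build {hour_of_day: (mean, std)} from (hour_of_day, kpi_value) samples."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {hour: (mean(vals), pstdev(vals)) for hour, vals in by_hour.items()}


def anomaly_level(profile, hour, value, min_std=1e-9):
    """Distance of a KPI value from the profile of its hour, in standard deviations."""
    mu, sigma = profile[hour]
    return abs(value - mu) / max(sigma, min_std)


# Hypothetical training data for a traffic-dependent KPI:
# low values at 03:00, high values at 12:00, over three days.
training = [(3, 10.0), (3, 12.0), (3, 11.0),
            (12, 100.0), (12, 104.0), (12, 96.0)]
profile = build_diurnal_profile(training)

# The same KPI value is normal at noon but highly anomalous at night.
noon_level = anomaly_level(profile, 12, 100.0)
night_level = anomaly_level(profile, 3, 100.0)
print(noon_level, night_level)
```

The point of the time context is visible in the last lines: a single global threshold would treat the value 100 the same at any hour, whereas the per-hour profile flags it only at night.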

Page 11: Self-Healing and Resilience in Future 5G Cognitive

Radio Access Network Anomaly Detection

Cross-correlation profiling with clustering

• First, a clustering algorithm is applied, which omits the most probable outliers to clean up the data

• Correlation is modelled only inside the clusters

• Can also model non-normal multivariate distributions
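
The three bullets above can be illustrated with a small sketch. The slides do not name the clustering algorithm or the correlation model, so everything here is an assumption: plain k-means with hand-picked initial centroids, an outlier rule that trims the members farthest from each centroid, a per-cluster mean and covariance, and the Mahalanobis distance to the closest cluster as the anomaly level. The KPI pair (load, throughput) and all values are hypothetical.

```python
from math import sqrt


def dist2(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, init_centroids, iterations=20):
    """Plain k-means on 2-D points; returns (labels, centroids)."""
    centroids = list(init_centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(len(centroids)), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels, centroids


def fit_cluster(members, centroid, trim=0.2, reg=1e-9):
    """Drop the `trim` fraction farthest from the centroid (assumed outlier rule),
    then fit the mean and covariance of the remaining KPI pairs."""
    keep = max(2, round(len(members) * (1 - trim)))
    kept = sorted(members, key=lambda p: dist2(p, centroid))[:keep]
    n = len(kept)
    mx = sum(x for x, _ in kept) / n
    my = sum(y for _, y in kept) / n
    sxx = sum((x - mx) ** 2 for x, _ in kept) / n + reg  # reg avoids a singular matrix
    syy = sum((y - my) ** 2 for _, y in kept) / n + reg
    sxy = sum((x - mx) * (y - my) for x, y in kept) / n
    return mx, my, sxx, sxy, syy


def mahalanobis(point, model):
    """Distance of a point from a cluster, taking the in-cluster correlation into account."""
    mx, my, sxx, sxy, syy = model
    dx, dy = point[0] - mx, point[1] - my
    det = sxx * syy - sxy ** 2
    return sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)


def anomaly_level(point, models):
    return min(mahalanobis(point, m) for m in models)


# Hypothetical (load, throughput) pairs: a low-traffic regime, a high-traffic regime, one outlier.
training = [(10, 5), (12, 6.5), (14, 6.8), (16, 8.3), (18, 8.9),
            (100, 50), (104, 52.5), (108, 53.6), (112, 56.4), (116, 57.8),
            (60, 90)]
labels, centroids = kmeans(training, [(10.0, 5.0), (100.0, 50.0)])
models = [fit_cluster([p for p, lab in zip(training, labels) if lab == c], centroids[c])
          for c in range(len(centroids))]

consistent = anomaly_level((110.0, 55.0), models)  # follows the in-cluster correlation
broken = anomaly_level((110.0, 40.0), models)      # high load but unusually low throughput
print(consistent, broken)
```

Because each regime gets its own model, the combined profile is a mixture rather than a single normal distribution, which is one way to read the last bullet.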

Page 12: Self-Healing and Resilience in Future 5G Cognitive

Diagnosis

Anomaly event detection and diagnosis

• Anomalous timeframes are detected by applying the DBSCAN algorithm to the anomaly levels of selected features against their profiles

• By aggregating the selected feature (KPI) values in the anomaly event timeframe, the event is represented as an anomaly pattern

– The diagnosis feature set can be, and often is, different from the one used in detection!

• The root causes of the detected anomaly patterns are diagnosed against a diagnosis knowledge base

[Figure: KPI1, KPI2 and KPI3 plotted over time with an anomalous timeframe marked; the average value of each KPI within the timeframe forms the anomaly pattern]
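
The detection and diagnosis steps on this slide could be sketched as follows. The slide does not say how DBSCAN is applied to the anomaly levels, so this sketch assumes that the time indices of samples above a threshold are clustered along the time axis. The threshold, the DBSCAN parameters, the KPI values and the knowledge base patterns are all hypothetical, and nearest-pattern matching is only one possible way to diagnose against a knowledge base.

```python
from math import dist
from statistics import mean


def dbscan(points, eps, min_samples, distance):
    """Textbook DBSCAN; returns one label per point, with -1 meaning noise."""
    labels = [None] * len(points)
    cluster = -1

    def neighbours(i):
        return [j for j in range(len(points)) if distance(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point previously marked as noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbours(j)
            if len(nearby) >= min_samples:  # core point: keep expanding the cluster
                queue.extend(nearby)
    return labels


# Hypothetical anomaly level per time step (e.g. the maximum over the detection features).
levels = [0.2, 0.1, 0.3, 4.0, 5.1, 4.4, 0.2, 0.1, 3.5, 0.2, 0.3, 0.1]
candidates = [t for t, level in enumerate(levels) if level > 3.0]
labels = dbscan(candidates, eps=1, min_samples=3, distance=lambda a, b: abs(a - b))

# The first cluster is the anomalous timeframe; the isolated spike at t=8 is noise.
event = [t for t, lab in zip(candidates, labels) if lab == 0]
timeframe = (min(event), max(event))

# Diagnosis features (hypothetical, normalised KPI values); they need not match the detection features.
kpis = {
    "KPI1": [1.0, 1.0, 1.0, 0.2, 0.1, 0.3, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "KPI2": [1.0] * 12,
    "KPI3": [1.0] * 12,
}
pattern = tuple(mean(series[timeframe[0]:timeframe[1] + 1]) for series in kpis.values())

# Hypothetical knowledge base: root cause -> previously diagnosed anomaly pattern.
knowledge_base = {
    "radio attenuation": (0.2, 1.0, 1.0),
    "backhaul misconfiguration": (1.0, 0.3, 0.4),
}
diagnosis = min(knowledge_base, key=lambda cause: dist(pattern, knowledge_base[cause]))
print(timeframe, pattern, diagnosis)
```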

Page 13: Self-Healing and Resilience in Future 5G Cognitive

Diagnosis

Active learning assisted diagnosis

a) A human operator provides the machine with his own interpretation of the data

– By attaching labels to anomaly points or clusters while considering information from step b)

b) The machine provides the operator with a structured view of the data

– By clustering the data points while taking into account information from step a)

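[Figure: loop between operator and machine; the machine restructures and passes on a structured view of the data, the operator rethinks and passes back an interpretation of the data]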
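
A toy version of the loop between steps a) and b) is sketched below. It is only an illustration under strong assumptions: the machine's "structured view" is reduced to grouping each anomaly pattern with the nearest operator-labelled pattern within a fixed radius, and the operator is simulated by a list of true labels. The function names, the radius and the patterns are hypothetical.

```python
from math import dist


def restructure(patterns, labelled, radius):
    """Machine step: group each pattern with the nearest labelled pattern if it is
    within `radius`; the rest stay 'unknown' and are presented to the operator."""
    view = {}
    for idx, pattern in enumerate(patterns):
        best = min(labelled, key=lambda i: dist(pattern, patterns[i]), default=None)
        if best is not None and dist(pattern, patterns[best]) <= radius:
            view[idx] = labelled[best]
        else:
            view[idx] = "unknown"
    return view


def operator_labels(view, oracle):
    """Operator step (simulated): label the first pattern the machine could not place."""
    for idx, label in view.items():
        if label == "unknown":
            return {idx: oracle[idx]}
    return {}


# Hypothetical anomaly patterns and the labels a human operator would give them.
patterns = [(0.2, 1.0), (0.25, 0.95), (1.0, 0.3), (0.95, 0.35), (0.22, 1.02)]
oracle = ["radio attenuation", "radio attenuation",
          "backhaul misconfiguration", "backhaul misconfiguration",
          "radio attenuation"]

labelled = {}
queries = 0
view = restructure(patterns, labelled, radius=0.2)
while "unknown" in view.values():
    labelled.update(operator_labels(view, oracle))      # a) operator interprets
    queries += 1
    view = restructure(patterns, labelled, radius=0.2)  # b) machine restructures

print(queries, view)
```

In this toy run the operator labels two patterns and the structure of the data carries those labels to the remaining three, which is the saving the loop is meant to provide.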

Page 14: Self-Healing and Resilience in Future 5G Cognitive

Holistic Self-Healing

Across domains and management areas in mobile networks

Coordination is required between the self-healing actions of, for example:

• Network Management (NM): Management automation aggregated on a (Virtual) Network Function (V)NF level

• Quality of Experience (QoE) driven management: Optimizing the end-to-end customer experience at the application and individual subscriber level

• VNF and Service Orchestration

In a complex system, improving the resilience of only one part or level of organization can sometimes (unintentionally) introduce fragility in another. To improve the resilience, it is often necessary to work in more than one domain and scale at a time.

- A. Zolli, A. M. Healy, “Resilience – Why Things Bounce Back”

Page 15: Self-Healing and Resilience in Future 5G Cognitive

Knowledge Cloud

Transferring diagnosis knowledge

• Collecting the diagnosis knowledge base is a significant effort

• It would be desirable to be able to diagnose previously unforeseen problems

• Both issues could be mitigated by sharing diagnosis knowledge between self-healing function deployments

• However, translating, i.e. generalizing and re-applying, diagnosis knowledge from other deployments is a difficult problem

– Transfer learning methods

Page 16: Self-Healing and Resilience in Future 5G Cognitive

Demonstration in SON Experimental System

Page 17: Self-Healing and Resilience in Future 5G Cognitive

Evaluation Results

• Fault injection in a testbed

– Radio attenuation

– Backhaul misconfiguration

• Background traffic increased until the optimization functions could no longer remedy the problem

• Our solution detected and diagnosed both conditions before they led to a service degradation

Page 18: Self-Healing and Resilience in Future 5G Cognitive

Conclusion

• In 5G, networks are becoming ever more complex and dynamic

• At the same time, new use cases are requiring increased reliability

• We need intelligent resilient networks that can react to unforeseen problems and adapt to changes in their context

• A step in this direction is the self-healing method presented in this paper, based on anomaly detection and diagnosis

• We need methods to share knowledge and coordinate the self-healing actions across management domains, areas and deployments

– Standardized interfaces not only for sharing data, but also for sharing knowledge and machine learning models

Page 19: Self-Healing and Resilience in Future 5G Cognitive

Thank you