SUCCESS D4.6 v1.0 Page 1 (97)

SUCCESS D4.6 v1.0
Description of Available Components for SW Functions, Infrastructure and Related Documentation, V3

The research leading to these results has received funding from the European Union’s Horizon 2020 Research and Innovation Programme, under Grant Agreement no 700416.

Project Name: SUCCESS
Contractual Delivery Date: 30.04.2018
Actual Delivery Date: 30.04.2018
Contributors: RWTH, EDD, P3E, P3C, LMF, ENG, KTH
Workpackage: WP4 – Securing Smart Infrastructure
Security: PU
Nature: R
Version: 1.0
Total number of pages: 97

Abstract: This document describes in detail the functionality of the components of the Critical Infrastructure Security Analytics Network and communications parts of the SUCCESS Security Solution and the interfaces in the SUCCESS Security Solution. In addition, it defines a set of security countermeasures to threats and describes the way that the countermeasures are executed by means of the components in the SUCCESS Security Solution. Hence, this deliverable maps the SUCCESS countermeasures to functionality of the components and the interfaces between the components.

Keyword list: Security, communication, Critical Infrastructure, Architecture, Threat, Countermeasure, Security Monitoring Solution

Disclaimer: All information provided reflects the status of the SUCCESS project at the time of writing and may be subject to change.


Page 2: SUCCESS D4.6 v28

Executive Summary

This document has two parts. The first part defines a set of security incidents, which correspond to groups of cyber-security threats, and defines countermeasures to mitigate these incidents, together with a mapping onto the components and functions that implement the countermeasures. The focus is on the countermeasures related to the use cases to be demonstrated in the SUCCESS field trials. The detection of security incidents for Critical Infrastructures and the implementation of the mitigating countermeasures are performed by the SUCCESS Security Solution, the architecture of which is described in deliverable D4.3 [6].

The second part gives a detailed description of the components in the SUCCESS Security Solution which have been developed by SUCCESS WP4, as instantiated in the infrastructure developed by the SUCCESS project itself.

The Critical Infrastructure Security Analytics Network (CI-SAN) is a wide-area security analytics network, intended to perform security analytics over Critical Infrastructures on a continent-wide basis. The CI-SAN is formed from two types of nodes: Security Analytics Nodes (SA Nodes) and Security Data Concentrators (SDCs). The SA Nodes perform the security analytics. The distributed SDCs act as information collecting and distributing agents towards the Critical Infrastructures. The SUCCESS project has three field trial sites in different European countries. Its instantiation of the CI-SAN has an SDC for each of these field trial sites and a single SA Node.

This document also describes in detail the Breakout Gateway (BR-GW), a new 5G mobile communications node being developed in SUCCESS, which allows mobile core network functionality to be implemented on the edge of the network (e.g. in a cloud system located on the radio mast). The Breakout Gateway supports Data Centric Security (checking of packet integrity without decoding the packets) and Generic Bootstrap Architecture (authentication performed locally on the BR-GW). Countermeasures are also implemented through the BR-GW.

Other components which form part of the SUCCESS Security Solution and are being developed by WP3 of the SUCCESS project are the Critical Infrastructure Security Operations Centre (CI-SOC), which is described in D3.6 [4], and the Next-generation Open Real-time Smart Meter (NORM), which is described in D3.8 [5].

This version of this deliverable documents the current status of the design of the SUCCESS Security Solution. It is the third and final version of this deliverable, updating and replacing D4.4 and D4.5.

Page 3: SUCCESS D4.6 v28

Authors

Partner | Name | e-mail
RWTH Aachen University (RWTH) | Padraic McKeever | [email protected]
RWTH Aachen University (RWTH) | Gianluca Lipari | [email protected]
OY L M ERICSSON AB (LMF) | Patrik Salmela | [email protected]
ERICSSON GmbH (EDD) | Dhruvin Patel | [email protected]
ERICSSON GmbH (EDD) | Zain Mehdi | [email protected]
ERICSSON GmbH (EDD) | Frank Sell | [email protected]
ERICSSON GmbH (EDD) | Robert Farac | [email protected]
P3 ENERGY & STORAGE GmbH (P3E) | Manuel Allhoff | [email protected]
Engineering – Ingegneria Informatica SPA (ENG) | Antonello Corsi | [email protected]
Engineering – Ingegneria Informatica SPA (ENG) | Giampaolo Fiorentino | [email protected]
P3 Communications GmbH (P3C) | Panagiotis Paschalidis | [email protected]
KTH Royal Institute of Technology (KTH) | György Dán | [email protected]
KTH Royal Institute of Technology (KTH) | Peiyue Zhao | [email protected]

Page 4: SUCCESS D4.6 v28

Table of Contents

1. Introduction ... 7
  1.1 How to Read This Document ... 7
2. List of Security Incidents and Outline of Countermeasures ... 9
3. Mapping of Threats to Security Incidents and Countermeasures ... 14
4. SUCCESS Components and Interfaces Description ... 18
  4.1 SUCCESS Security Solution Components ... 18
  4.2 Communication Components ... 20
    4.2.1 State-of-the-Art of Mobile Networks (4G/LTE) ... 20
    4.2.2 SUCCESS Security Solution based on 5G Mobile Networks ... 21
    4.2.3 Communication Solution for SUCCESS Security Solution ... 21
    4.2.4 Break-out Gateway (BR-GW) ... 22
    4.2.5 5G communication Test Lab Setup ... 28
    4.2.6 Communication and Computing Resource Orchestrator for Resilience ... 29
  4.3 Security Monitoring Components ... 30
    4.3.1 Critical Infrastructure Security Operations Centre ... 30
    4.3.2 Critical Infrastructure Security Analytics Network ... 30
  4.4 SUCCESS API ... 41
    4.4.1 Definition and Motivation ... 41
    4.4.3 Discussion on Data Models describing Cyber-Security Incidents ... 44
  4.5 Data Model for the SUCCESS API ... 48
  4.6 Interfaces in the SUCCESS Security Solution ... 50
    4.6.1 I1 between NORM and BR-GW or CI-SOC ... 50
    4.6.2 I2 between BR-GW and CI-SOC ... 52
    4.6.3 I3 between Critical Infrastructure Security Operations Centre and SDC ... 54
    4.6.4 I4 between SDC instances ... 56
    4.6.5 I5 between SDC and SA Node ... 57
    4.6.6 I6 between SA Node/SDC and Critical Infrastructure-External Data Sources ... 58
    4.6.7 I7 between CI-SOC and Critical Infrastructure-internal data sources ... 58
5. Conclusion ... 59
6. References ... 60
7. List of Abbreviations ... 62
8. List of Figures ... 64
9. List of Tables ... 65
A. Descriptions of Security Incidents and Countermeasures ... 66
  A.1 Incident/countermeasure CS-1 ... 66
    A.1.1 Description ... 66

Page 5: SUCCESS D4.6 v28

    A.1.2 Related SW Functions ... 69
    A.1.3 Related Infrastructure ... 69
  A.2 Incident-countermeasure CS-2 ... 70
    A.2.1 Description ... 70
    A.2.2 Related SW Functions ... 70
    A.2.3 Related Infrastructure ... 70
  A.3 Incident-countermeasure CS-3 ... 72
    A.3.1 Description ... 72
    A.3.2 Related SW Functions ... 72
    A.3.3 Related Infrastructure ... 72
  A.4 Incident-countermeasure CS-4 ... 74
    A.4.1 Description ... 74
    A.4.2 Related SW Functions ... 74
    A.4.3 Related Infrastructure ... 74
  A.5 Incident-countermeasure CS-5 ... 75
    A.5.1 Description ... 75
    A.5.2 Related SW Functions ... 75
    A.5.3 Related Infrastructure ... 76
  A.6 Incident-countermeasure CS-6 ... 77
    A.6.1 Description ... 77
    A.6.2 Related SW Functions ... 77
    A.6.3 Related Infrastructure ... 77
  A.7 Incident-countermeasure PS-1 ... 78
    A.7.1 Description ... 78
    A.7.2 Related SW Functions ... 78
    A.7.3 Related Infrastructure ... 78
  A.8 Incident-countermeasure PS-2 ... 79
    A.8.1 Description ... 79
    A.8.2 Related SW Functions ... 79
    A.8.3 Related Infrastructure ... 79
  A.9 Incident-countermeasure PS-3 ... 80
    A.9.1 Description ... 80
    A.9.2 Related SW Functions ... 80
    A.9.3 Related Infrastructure ... 80
  A.10 Incident-countermeasure PS-4 ... 81
    A.10.1 Description ... 81
    A.10.2 Related SW Functions ... 81
    A.10.3 Related Infrastructure ... 81
  A.11 Incident-countermeasure PS-5 ... 82
    A.11.1 Description ... 82

Page 6: SUCCESS D4.6 v28

    A.11.2 Related SW Functions ... 82
    A.11.3 Related Infrastructure ... 82
B. Interfaces SUCCESS API I3 ... 83
C. SUCCESS API Data Models ... 87
  C.1 IODEF Data Model ... 87
  C.2 IDMEF Data Model ... 93

Page 7: SUCCESS D4.6 v28

1. Introduction

The architecture of the SUCCESS Security Solution is described in D4.3 [6], which also provides the background of SUCCESS and motivates the need for the SUCCESS Security Solution. D4.3 describes the architecture from the perspectives of the Critical Infrastructures, communications and security. This conceptual architecture is not specific to the SUCCESS project but may be instantiated in different particular systems.

The SUCCESS project has implemented an instantiation of the SUCCESS Security Solution and demonstrated it in the SUCCESS project’s field trials in Ireland, Italy and Romania [8] [9], where the components of the SUCCESS infrastructure work together to detect security threats and incidents affecting the Electricity Distribution Grid’s management and communication systems and to execute countermeasures which mitigate these threats or incidents.

Several of the components used in the SUCCESS Security Solution are described in this document:

• the Critical Infrastructure Security Analytics Network (CI-SAN), which is a network made up of instances of two types of sub-components: the Security Data Concentrators (SDC), which locally collect and aggregate data on DSO/TSO level, and the Security Analytics Nodes (SA Node), which collect data from several SDC instances across Europe. In the case of the SUCCESS project’s instantiation of the SUCCESS Security Solution, only one SA Node instance will be instantiated. This single instance is therefore at the apex of the SA Node hierarchy and is the Co-ordinating Critical Infrastructure Security Analytics Centre.
• the Critical Infrastructure Security Operations Centre (CI-SOC), which detects security incidents and applies countermeasures on DSO level;
• the Breakout Gateway (BR-GW), which is located at the edge of the mobile communication network (in the eNodeB, i.e. the radio base station; see Ch. 4.2.1 below for details of the mobile communications network) and implements mobile core network functionality there, as well as detecting cyber-attacks on the data communications and implementing countermeasures.

This document also describes the interfaces between all the SUCCESS components. Other SUCCESS Security Solution components are described in [4] (CI-SOC) and [5] (NORM).

In addition, this document defines a set of security countermeasures to security incidents resulting from identified threats and describes the way that the countermeasures are executed by means of the components in the SUCCESS Security Solution. Hence, this deliverable maps the SUCCESS countermeasures onto the components’ functionality and the interfaces between the components.

The SUCCESS field trials will implement use cases to demonstrate countermeasures and verify that the SUCCESS Security Solution mitigates cyber-threats to smart grid infrastructures. All the use cases fall into the category of incident and countermeasure CS-1 (Device behaving suspiciously in terms of communication pattern or content), discussed in Annex A.1.

1.1 How to Read This Document

This version of this deliverable documents the current status of the design of the SUCCESS Security Solution. It is the third and final version of this deliverable, updating and replacing D4.4 and D4.5. This document is an output of Task 4.2 of SUCCESS, which is concerned with architecture definition.

Before reading this document, the reader should read D4.3 [6], which motivates the SUCCESS project and describes the SUCCESS Security Solution. The analysis of cyber-security threats to the critical infrastructures made in D1.2 [1] should be consulted to understand the purpose of the countermeasures presented in this document. Further details of the CI-SOC, which is only briefly described in this document, are available in D3.6 [4], and the NORM Smart Meter Gateway is described in D3.8 [5].

Note on terminology: SUCCESS uses the following terms:

Page 8: SUCCESS D4.6 v28

• A threat is a way to perform an attack (cyber-attack or physical attack) on the system. The system being considered in SUCCESS is the electrical grid and its related systems, i.e. the grid itself and its electrical components, the associated communications networks and the systems managing the grid and the communications. In the context of SUCCESS, the components in the SUCCESS Security Solution can themselves be the subject of attacks and are therefore also part of the system which is under threat.
• An attack is the manifestation of a threat, i.e. the hypothetical threat is made real by an attacker.
• A security incident is a suspected attack (cyber or physical) which has been detected by the SUCCESS Security Solution.
• A countermeasure is the procedure followed to mitigate the verified security incident.

The structure of this document is as follows:

• Ch. 2 comprises a list of security incidents and applicable countermeasures.
• Ch. 3 describes the mapping from the threats identified in D1.2 [1] to the identified incidents and their countermeasures.
• Ch. 4 describes the environment of the SUCCESS project and the components used, with sub-chapters detailing the communication components and the Critical Infrastructure Security Analytics Network. SUCCESS is concentrating on how mobile 5G communications can be applied to Critical Infrastructure communications to provide enhanced security functionality. The 5G communication features used by SUCCESS are described in Ch. 4.2. The Critical Infrastructure Security Operations Centre (CI-SOC) and the Critical Infrastructure Security Analytics Network (CI-SAN) are described in Chs. 4.3.1 and 4.3.2, respectively.
  o The CI-SOC monitors and generates countermeasures for particular local Critical Infrastructures (electricity distribution grids in the case of the SUCCESS Security Solution). Full details of CI-SOC are given in [4].
  o CI-SAN is intended to monitor a large area and to alert authorities and/or Critical Infrastructure operators in case of security incidents. In the SUCCESS project’s instantiation of the SUCCESS Security Solution, CI-SAN comprises a central instance (SA Node) and several decentralised instances (SDC), which interwork (i.e. are able to connect, communicate or exchange data) with the local CI-SOCs. In contrast to the CI-SOC, CI-SAN will not directly initiate countermeasures. Rather, the cyber-attacks detected by CI-SAN will be made known to the authorities and/or the operator of the Critical Infrastructure (e.g. DSO or TSO), who is responsible for initiating the countermeasures. The supporting DSO/TSO SCADA system is not included in the scope of the SUCCESS project.
• The interfaces between the components of the SUCCESS Security Solution are outlined in Ch. 4.6.
• Annex A includes a set of sub-sections describing in detail the security incidents, outlining countermeasures for them and explaining how they will be implemented by the components of the SUCCESS Monitoring Solution. The countermeasures dealt with in this Annex are introduced in Ch. 3.
• Annex B gives detailed definitions of the interfaces described in Ch. 4.

The document covers two different, but related, broad subjects: 1) details of the components and interfaces in the SUCCESS Security Solution and 2) details of the threats and countermeasures. Readers interested in the SUCCESS Security Solution, its components and interfaces are recommended to begin with Chapter 4. Readers interested in the details of the threats and countermeasures are directed to Chapters 2 and 3.

Page 9: SUCCESS D4.6 v28

2. List of Security Incidents and Outline of Countermeasures

SUCCESS deliverable D4.3 [6] discusses the security configuration and measures that should be taken to provide good basic security for the smart grid, and which can already eliminate many of the attacks and threats identified in deliverable D1.2 [1] against the communications, the utility network and the physical devices. However, such preventive measures cannot be considered sufficient to prevent all attacks. The ability to detect attacks and initiate countermeasures is also needed. The basic security configuration and measures on the one hand, and active attack detection and mitigation on the other, complement each other to provide defence in depth, making it more difficult for an attacker to succeed.

The basic requirement for identifying security incidents is monitoring of the network and the nodes in it. In effect, the power grid can only be monitored through the communications network, so the network being monitored is a hybrid: it comprises both the power grid and the communication network it uses, and its nodes comprise equipment in the power grid and equipment in the communication network.

The ideal case is that attack attempts are identified, and blocked, as they happen, e.g. when an attacker tries to gain access to a node in the system. This means that the node itself has to raise an alarm when it notices events such as an intrusion attempt, multiple failed login attempts or port scans. The alarm gets the attention of security personnel, who can monitor the situation and take action if needed. Ways to minimise the attack surface include hardening the nodes, requiring all installed software to be signed, and using a hardware root of trust, which can verify signed software, detect modifications to the system etc. Periodic remote attestation could also be performed to verify the state of the nodes. These precautions are good (but not guaranteed) ways of tackling external hackers, but they are less effective against insider attacks by attackers who have some level of authorised access to the nodes or the system.

If an attacker is able to gain access to a node undetected, indications of the attacker's presence may include devices not responding or not operating as expected, e.g. deviations from regular communication patterns or from the expected message types and content. However, identifying the root cause of the incident might not be trivial, as multiple threats could produce similar results. For example, a node may stop responding because of a physical attack on the node, a network-based attack on the node (e.g. a Denial of Service (DoS) attack) or on the network itself, or a natural disaster or accident that has destroyed the node or rendered it useless. Sometimes what is identified as an incident turns out to be regular operation, i.e. a false positive. It is therefore often advisable to analyse and verify the incident before reacting to it, although some incidents might require immediate action, in which case a full analysis might not be feasible.

Once a security incident has been detected and verified, it should be mitigated by following a pre-defined procedure consisting of a number of steps or parts. This procedure is referred to as a countermeasure; it is used for responding to the incident and minimising its effect. This deliverable outlines multiple incidents and the countermeasures that could be applied to resolve them. Some of these will also be implemented in the trial sites, with the focus being on the incident and countermeasure types outlined in CS-1, described in Annex A.1, which contains trial-specific subsections describing the particular scenarios.
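As an illustration of the node-level alarms mentioned earlier in this chapter (e.g. multiple failed login attempts), the following is a minimal sketch of a sliding-window detector. It is not taken from the SUCCESS implementation: the class name FailedLoginMonitor, the threshold of three failures and the 60-second window are assumptions chosen only for the example.

```python
from collections import defaultdict, deque


class FailedLoginMonitor:
    """Hypothetical detector: alarm when one source fails to log in too often."""

    def __init__(self, max_failures=3, window_seconds=60.0):
        # Both parameters are illustrative assumptions, not SUCCESS values.
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # source -> recent failure timestamps

    def record_failure(self, source, timestamp):
        """Record one failed login and return True if an alarm should be raised."""
        recent = self._failures[source]
        recent.append(timestamp)
        # Discard failures that are older than the sliding window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) >= self.max_failures


monitor = FailedLoginMonitor(max_failures=3, window_seconds=60.0)
alarms = [monitor.record_failure("10.0.0.7", t) for t in (0.0, 10.0, 20.0, 200.0)]
print(alarms)  # [False, False, True, False]
```

The third failure within the window raises the alarm; the fourth arrives after the earlier failures have expired, so no alarm is raised. In practice such an alarm would be one input to the analysis and verification step rather than a trigger for an immediate reaction.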
Countermeasures will be chosen as a combination of atomic actions, depending on the exact nature of the security incident. Normally, countermeasures are implemented sequentially, again depending on the specific nature of the incident. Therefore, we categorise the major types of security incidents in Ch. 3 below. This categorisation will be further detailed in the project deliverable D1.4 [2].

The set of security incidents is defined generically and is not expected to need frequent modification, while the countermeasures associated with them might be revised over time. The reason is that cyber-attacks are constantly evolving, and thus may require evolved countermeasures, but the effect they have on ICT systems such as converged IT/OT systems (as represented by the security incident) remains constant, e.g. loss of service. Hence, we focus on the incidents rather than on specific individual countermeasures. Furthermore, the defined countermeasures are more of a blueprint and might in many cases need to be adjusted for the particular incident at hand. This means that the countermeasure library will grow over time as new variations of the incidents occur and suitable countermeasures are defined.
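The idea of building a countermeasure from atomic actions that are chosen according to the incident and then carried out in sequence can be sketched as follows. This is only an illustration under stated assumptions: the action functions, the incident fields ("device", "backup_available") and the selection rule are hypothetical and are not the SUCCESS countermeasure library.

```python
from typing import Callable, Dict, List

Action = Callable[[Dict], str]


# Hypothetical atomic actions; each returns a log line instead of acting on a device.
def isolate_device(incident: Dict) -> str:
    return f"isolate {incident['device']} from network"


def enable_backup(incident: Dict) -> str:
    return f"enable backup for {incident['device']}"


def reinstall_device(incident: Dict) -> str:
    return f"reinstall {incident['device']}"


def compose_countermeasure(incident: Dict) -> List[Action]:
    """Select atomic actions according to the nature of the incident."""
    actions: List[Action] = [isolate_device]
    if incident.get("backup_available"):
        actions.append(enable_backup)
    actions.append(reinstall_device)
    return actions


def execute(incident: Dict) -> List[str]:
    # The selected actions are carried out one after another, in order.
    return [action(incident) for action in compose_countermeasure(incident)]


log = execute({"device": "meter-17", "backup_available": True})
print(log)
```

Keeping the actions atomic means that a new variation of an incident can be handled by composing existing actions differently, which fits the blueprint character of the countermeasures described above.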

Page 10: SUCCESS D4.6 v28

Table 1 and Table 2 contain countermeasures to incidents related to cyber security and physical security, respectively. The tables provide a high-level description of the countermeasures, while a more detailed description is given in Annex A. The last column of the tables identifies the threat groups from deliverable D1.2 [1] to which the incident and countermeasure apply.

The Ponemon 2017 Cost of a Data Breach Study [28], which also includes organizations from the energy sector, shows that detecting a breach in an organization takes around 200 days. The study considers a breach to be “an event in which an individual’s name and a medical record and/or a financial record or debit card is potentially put at risk”. Even though this definition does not match exactly how a breach in the SUCCESS system would be defined, it is close enough for the detection time to serve as a ballpark estimate of what could happen in, for example, a smart grid. This means that, from the time an attacker manages to gain access to the system, it can potentially take months until the intrusion is detected.

How fast an attack is detected depends strongly on how the attacker behaves. An attacker who obtains credentials to a system but only monitors it is less likely to be detected promptly than an attacker who modifies messages on the path without proper credentials. The latter should be detected immediately, as the modified messages will not pass integrity checks.

The Ponemon study also identifies factors that increase or reduce the cost of a breach. Threat sharing is among the top five cost-reducing factors, which supports the approach of SUCCESS.

Even in a system which applies proper security protocols and configurations, e.g. as described in D4.3, an attacker could still potentially gain access through hacking, social engineering, or a disgruntled employee. The attacker could then use the gained access to perform malicious activities, e.g. a man-in-the-middle (MITM) attack in order to disrupt the system, monitor traffic, or even modify it. Ideally, a security incident should be detected before the actual attack (the MITM attack in our example) is launched, i.e. when the attacker tries to gain access to the system. However, this might not always be possible: a sufficiently skilled attacker might find a vulnerability that allows access without the system or the detection system noticing it. The next phase in which the attack could be detected is when the attacker uses the system, e.g. by acting as a MITM. However, as discussed above in relation to the Ponemon study, detecting this activity is not trivial, and it might go undetected for a very long time.

A MITM attack is not itself an example of a security incident for which we define a countermeasure. Rather, we define countermeasures for the sub-attacks needed to carry out a MITM attack. Examples are the physical breach of a device casing, which could be part of the setup phase in which the attacker gains access to the system, or unusual types of data being communicated in the system, which might be due to an attacker modifying the traffic as a MITM.

The term “device” is to be understood as any physical device in the system, such as NORM, on which a (potentially virtualised) function is being run, or as any part of a bigger system that sends, receives or handles communication.

The scope of the countermeasures described in this chapter and detailed in Annex A is the whole of the SUCCESS Security Solution, i.e. including components and functions which are not described in Ch. 2 above, e.g. NORM (see [5]) or Double Virtualisation (see [3]).
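The observation that messages modified in transit without proper credentials fail integrity checks can be illustrated with a keyed message authentication code. The sketch below uses HMAC-SHA256 from the Python standard library; this is an assumption made for the example, since this chapter does not state which integrity mechanism the SUCCESS components use, and the key and message contents are invented.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check the tag in constant time to avoid timing side channels."""
    return hmac.compare_digest(sign(key, message), tag)


# Illustrative values only; a real key would be provisioned securely.
key = b"shared-secret-for-illustration-only"
original = b"meter=17;reading=230.1V"
tag = sign(key, original)

# An attacker on the path changes the payload but cannot recompute the tag
# without knowing the key.
tampered = b"meter=17;reading=999.9V"

print(verify(key, original, tag))  # True
print(verify(key, tampered, tag))  # False
```

An attacker who has obtained valid credentials could compute correct tags, which is why the passive or credentialed attacker described above is much harder to detect than one who only tampers with traffic.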


Table 1: Cyber-security related incidents and countermeasures

CS-1: Device behaving suspiciously
Countermeasure:
1. Move the data or applications to another physical/logical zone
2. Perform remote attestation to verify device state
3. If state is OK, red-flag the device
   a. Temporarily disconnect device from the grid
   b. Investigate
Targeted threat groups: T000, T100, T200, T300, T400, T410, T420, T700

CS-2: Remote attestation fails (after step 2 of CS-1 above)
Countermeasure:
1. Disconnect device from the grid
2. Send maintenance unit to location
3. Reset/Reinstall device & re-bootstrap (new credentials & revoke old ones)
Targeted threat groups: T000, T100, T200, T300, T400, T410, T420, T700

CS-3: Unauthorised messages
Countermeasure:
1. Identify device or network segment where data is originating from
2. If device, perform CS-1
3. If network segment, it means there is an unauthorised node in the network segment
4. Isolate network segment
5. Investigate
Targeted threat groups: T000, T100, T200, T300,

CS-4: Virus detected in device
Countermeasure:
1. Re-deploy VMs running on device
2. Isolate device/functions from network
3. Enable backup device if available
4. Reinstall device to remove malware
5. Verify peers not infected
6. Update malware definitions in all nodes as soon as possible
Targeted threat groups: T000, T200, T700

CS-5: DoS suspicions
Countermeasure:
1. Block DoS traffic at edge of network by updating firewall rules and using SDN for re-routing DoS traffic
2. Move VMs running on targeted node to other location
3. Enable backup node if available
4. Do load-balancing if possible
5. Analyse suspected DoS traffic, verify attack
Targeted threat groups: T000, T300, T410, T700

CS-6: Security algorithm deemed insecure
Countermeasure:
1. Remotely configure affected nodes to deprecate insecure algorithm and enable alternative algorithm
2. Optionally select and review proper alternative algorithm
Targeted threat groups: T100, T200, T300
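The way CS-1 escalates to CS-2 in Table 1 can be illustrated with a small sketch. This is only an illustration of the decision logic as the table states it, not part of the SUCCESS implementation; the function name and the action strings are hypothetical.

```python
from typing import List


def respond_to_suspicious_device(attestation_ok: bool) -> List[str]:
    """Return the ordered actions for a CS-1 incident.

    Hypothetical helper: the steps mirror Table 1, and the result of the
    remote attestation is passed in rather than measured here.
    """
    actions = [
        "CS-1.1 move data/applications to another physical/logical zone",
        "CS-1.2 perform remote attestation",
    ]
    if attestation_ok:
        # CS-1 step 3: the device state is OK, but the device is red-flagged.
        actions += [
            "CS-1.3 red-flag device",
            "CS-1.3a temporarily disconnect device from the grid",
            "CS-1.3b investigate",
        ]
    else:
        # Attestation failed: continue with the CS-2 countermeasure.
        actions += [
            "CS-2.1 disconnect device from the grid",
            "CS-2.2 send maintenance unit to location",
            "CS-2.3 reset/reinstall device and re-bootstrap with new credentials",
        ]
    return actions


if __name__ == "__main__":
    for step in respond_to_suspicious_device(attestation_ok=False):
        print(step)
```

Keeping the attestation result as an input keeps the sketch independent of how attestation is actually performed on a given device.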


Table 2: Physical security related incidents and countermeasures

PS-1: Perimeter breached
Countermeasure:
1. Send security personnel to the location to investigate and repair breach (infrastructure device)
Targeted threat groups: T100, T300, T500

PS-2: Device casing breached
Countermeasure:
1. Send security personnel to the location to investigate and repair breach (infrastructure device)
2. Perform remote attestation of device state
3. Send maintenance unit to location to
   a. Reset device & re-bootstrap (new credentials & revoke old ones)
   b. Repair device or
   c. Replace device
Targeted threat groups: T100, T300, T400, T500

PS-3: Communication link unavailable
Countermeasure:
1. Re-configure network to route device via secondary access (if available)
2. If secondary access is not available: move data and/or applications to another physical/logical zone
3. Send maintenance unit to location to
   a. Reset network connection & unit
   b. Repair network connection & unit or
   c. Replace network connection unit
Targeted threat groups: T300, T500

PS-4: Device power unavailable
Countermeasure:
1. Enable backup power
2. If step 1 is not possible: move data and/or applications to another physical/logical zone
3. Send maintenance unit to location to
   a. Reset power supply unit
   b. Repair power supply unit or
   c. Replace power supply unit
Targeted threat groups: T300, T500

PS-5: Device unavailable
Countermeasure:
1. Enable backup node if available
2. Move data and/or applications to another physical/logical zone (to backup node if available)
3. Send maintenance unit to location to
   a. Reset device
   b. Repair device or
   c. Replace device
Targeted threat groups: T300, T500


A more detailed description of the incident-countermeasure pairs is given in Annex A. The given countermeasures are generic countermeasures for a specific type of incident. They serve as a template and should be adapted to the real use-cases and specific incidents for actual use. Concrete countermeasures for the incidents implemented in the trial sites are given in subchapters where applicable.

All security incidents should also be propagated upwards to the SDC to help provide a better understanding of security incidents on a pan-European level. This includes providing the SDC with all relevant information pertaining to the incident, so that the SA Node can correlate potential distributed attacks in a wide area, and also providing DSOs with information on individual ongoing or recent incidents. This way, the DSOs can be prepared and learn about possible future incident types and patterns.

A mapping showing how the countermeasures map to the threats identified in deliverable D1.2 [1] is shown in Table 3 in Ch. 3.


3. Mapping of Threats to Security Incidents and Countermeasures

Table 1 in deliverable D1.2 [1] contains the identified threats to the system. The following table indicates how those threats are mapped to the identified incidents and their countermeasures. The table is provided to give an overview of how the threats are covered by the defined countermeasures and the system security that should be in place.

Green means the threat is covered by the countermeasure, orange that the threat is partially covered, while white indicates that it is not covered; e.g., a DoS related countermeasure would not cover an eavesdropping threat. In addition, physical security threats are generally not applicable to cyber security incidents and countermeasures and vice versa. This is indicated by N/A (Not Applicable). Some countermeasures marked N/A might nevertheless be usable against some attacks, and such cells can therefore also carry colour coding. The threat label is also colour-coded to indicate how well the threat is covered by the countermeasures.

The “Covered by default security” column indicates which threats should be covered by standard ICT security for smart-grid systems and thus should not be an issue (indicated by ♦) or where it is not clear cut (indicated by ●). However, as each system is individually installed and configured, misconfigurations, human error and negligence could of course result in systems that are not optimally secured. Likewise, new security threats and, e.g., protocol vulnerabilities might lessen the system security over time.

An example is mitm attacks, which should not have an impact (with the exception of performing DoS when being on the path), since all communicating entities should have certificates issued by a trusted Certification Authority (CA), possibly the DSO or TSO CA. An attacker should therefore not be able to obtain a valid certificate, which would be needed to successfully become a man-in-the-middle. Likewise, eavesdropping should not be an issue, as all communication should be sufficiently protected. Of course, this does not mean that these threats should be considered unimportant.


Table 3: Threat to Countermeasure Mapping

Columns: Threat, Covered by Default Security, CS-1, CS-2, CS-3, CS-4, CS-5, CS-6, PS-1, PS-2, PS-3, PS-4, PS-5

T001 N/A N/A N/A N/A N/A
T002 N/A N/A N/A N/A N/A
T003 N/A N/A N/A N/A N/A
T004 N/A N/A N/A N/A N/A
T005 N/A N/A N/A N/A N/A
T101 ♦ N/A N/A N/A N/A N/A
T102 ♦ N/A N/A N/A N/A N/A
T103 ♦ N/A N/A N/A N/A N/A
T104 ♦ N/A N/A N/A N/A N/A
T105 ♦ N/A N/A N/A N/A N/A
T106 ♦ N/A N/A N/A N/A N/A
T107 ● N/A N/A N/A N/A N/A
T108 ● N/A N/A N/A N/A N/A
T109 ♦ N/A N/A N/A N/A N/A
T110 N/A N/A N/A N/A N/A
T111 N/A N/A N/A N/A N/A
T112 ♦ N/A N/A N/A N/A N/A
T113 ♦ N/A N/A N/A N/A N/A
T201 N/A N/A N/A N/A N/A
T202 N/A N/A N/A N/A N/A
T203 N/A N/A N/A N/A N/A
T204 N/A N/A N/A N/A N/A
T205 N/A N/A N/A N/A N/A
T206 N/A N/A N/A N/A N/A
T207 N/A N/A N/A N/A N/A
T208 ♦ N/A N/A N/A N/A N/A


T209 ♦ N/A N/A N/A N/A N/A T210 N/A N/A N/A N/A N/A T211 N/A N/A N/A N/A N/A T212 N/A N/A N/A N/A N/A T213 N/A N/A N/A N/A N/A T301 ♦ N/A N/A N/A N/A N/A T302 N/A N/A N/A N/A N/A T303 N/A N/A N/A N/A N/A T304 N/A N/A N/A N/A N/A T305 N/A N/A N/A N/A N/A T306 N/A N/A N/A N/A N/A T307 N/A N/A N/A N/A N/A N/A T401 N/A N/A N/A N/A N/A T402 N/A N/A N/A N/A N/A T403 N/A N/A N/A N/A N/A T404 ♦ N/A N/A N/A N/A N/A T405 N/A N/A N/A N/A N/A T410 ♦ N/A N/A N/A N/A N/A T420 N/A N/A N/A N/A N/A T501 N/A N/A N/A N/A N/A N/A T502 ● N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A T503 ● N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A T504 ● N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A T505 N/A N/A N/A N/A N/A N/A T506 N/A N/A N/A N/A N/A N/A T601 ● N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A T602 ♦ N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A T603 ♦ N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A T604 ♦ N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A


T605 ♦ N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
T606 ♦ N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
T607 ♦ N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
T701 ● N/A N/A N/A N/A N/A
T702 N/A N/A N/A N/A N/A
T703 N/A N/A N/A N/A N/A
T704 N/A N/A N/A N/A N/A
T705 N/A N/A N/A N/A N/A
T706 N/A N/A N/A N/A N/A
T707 N/A N/A N/A N/A N/A

Exceptions and special cases amongst the threats:

T107: If the system is properly secured, i.e. traffic is encrypted, only message frequency and potentially message size can be analysed. This can give some information to the attacker, but it is limited.

T108: The threat of war driving depends on the access technology used. 4G/5G should not have issues, while WLAN might.

T420: Could potentially be marked as green, but it depends on how the threat “Compromising confidential information” is interpreted. CS-1, CS-2 and PS-2 cover both the case where data is sent over the network (unusual behaviour) and the case where it is accessed via the physical device (device casing breached), so those oranges could be seen as resulting in an overall green.

T502-T504: When an employee steals information from the employer, it is not something that can easily be fully protected against, as has been seen with even the US National Security Agency (NSA) having its information leaked. The key to preventing or minimising such incidents lies in proper security guidelines, authorisation and access control. We do not present a specific countermeasure to these types of incidents as the response is more or less the same regardless of industry: trying to minimise the spread of the leaked/stolen information and charging the perpetrator.

T601: Very much related to T502-T504 above. An employee stealing information is a general issue in all areas and as such difficult to fully protect against.

T602, T603, T607: These are results of human error and, like T601, quite generic problems in all fields.

T604: An unreliable source would not be connected to the system (it depends on how one interprets the meaning of the threat).

T605, T606: Critical data should always be backed up, so this is a non-issue.

T607: A device that gets lost would have its credentials revoked from the system.

T701, T705-T707: Once more, related to stealing or human error.


4. SUCCESS Components and Interfaces Description

The Smart Grid environment which SUCCESS addresses comprises the electrical grids, communications infrastructures and management applications of utilities. In particular, the SUCCESS project will perform field trials in Italy, Romania and Ireland, testing the SUCCESS Security Solution in the real Smart Grid infrastructures in these countries. In addition, simulated grids will be used for tests in laboratory environments. Some details of these grids are given in D5.1 [8]. The field trials and laboratory setups are instantiations of the SUCCESS Security Solution described in D5.2 [9].

The focus of this chapter is on the instantiation of the SUCCESS Security Solution [6] developed by the SUCCESS project. These are the components developed by the SUCCESS project, which will be part of the field trial and laboratory infrastructures, and the interfaces between the components. The underlying electricity grids, communications networks and cloud systems hosting these components in the SUCCESS field trials are described in [8] and [9].

4.1 SUCCESS Security Solution Components

Figure 1: SUCCESS Components in their Environment


The components being used by the SUCCESS project are shown in Figure 1. The components are located in, and form part of, the Critical Infrastructure’s equipment in the field (e.g. Smart Meters or EV charging stations in the electricity grid), the communications network or cloud systems. A utility may have its own mobile and fixed infrastructure networks, e.g. fibre networks, in addition to using public mobile networks. The components which are being used by the SUCCESS project’s field trials are:

- the Next-generation Open Real-time Smart Meter (NORM), which is produced by SUCCESS and will be described in detail in deliverable D3.8 [5];
- a new Phasor Measurement Unit, which is produced by SUCCESS, will run as one of the components integrated in the NORM and is described in detail in D3.8 [5];
- Smart Meters of various types which will be present in the field trials and laboratory (if they are connected to the Smart Meter Gateway of NORM, they are also considered components of NORM);
- Electric Vehicles (EVs) and corresponding charging stations which will be part of the Irish field trial; the EV charging station hosts the NORM.

SUCCESS is focussing on using mobile communications, particularly 5G communications. This means that, as regards communications, SUCCESS reuses the security features of the 5G mobile network (described in Ch. 4.2 below) but also extends them with the Breakout Gateway. Please note, however, that 5G mobile communications is just one infrastructural component in the SUCCESS Security Solution described in Ch. 4.2 below, and that the SUCCESS Security Solution itself does not require 5G mobile communications but can use other IP-based communications networks (public or private), albeit with the consequence that no 5G-Mobile-specific functions will then be available. Indeed, because 5G technologies are still under development and are not yet commercially available, the SUCCESS trial sites are expected to use communications solutions other than 5G mobile, and the use of 5G mobile technologies in SUCCESS is expected to be limited to laboratory tests.

SUCCESS is developing a new Breakout Gateway (BR-GW) which allows mobile core network functionality and application functionality to be executed in the mobile radio network, reducing communication latency and enabling dedicated security functionality.
Moreover, BR-GW has the potential to increase the resilience of local microgrids by maintaining communication support for them during local power outages as well as during massive blackouts.

Additionally, with respect to securing the NORM to CI-SOC communications and guaranteeing authentication while incurring only a minimal communication overhead, a solution based on using a Physically Unclonable Function (PUF) has been considered. By introducing PUFs, SUCCESS is providing NORMs with hardware security features which inherently guarantee that the NORM’s identity cannot be tampered with or faked. A PUF uniquely identifies the device, its uniqueness being bound to the device’s specific hardware construction characteristics, possibly combined with environmental or explicitly introduced randomness. In contrast to the Trusted Platform Module (TPM) [27], which is a standard for a secure crypto processor used to secure hardware by integrating cryptographic keys into devices, PUFs (particularly the ones relying on intrinsic randomness) can be added to the NORM infrastructure without requiring any change to the manufacturing/design process of the NORM. Further details on the PUF implementation of SUCCESS are provided in deliverable D3.8 [5].

The components grouped as Applications in Figure 1 perform functions related to managing the underlying electricity grids. Figure 1 shows the Applications being developed in SUCCESS.

In SUCCESS, security attacks or incidents will be detected and countermeasures initiated by Critical Infrastructure Security Operations Centres (CI-SOCs), which gather data from a set of NORMs and search for specific patterns with machine learning methods.

The components of the SUCCESS infrastructure work together to implement the SUCCESS Security Solution, whose architecture is described in D4.3 [6], with a focus on the vulnerabilities introduced by smart devices in electrical distribution grids.


4.2 Communication Components

This chapter describes the hardware and software elements of the communications network, with an emphasis on the security of the transmission of information and commands between the nodes of the electrical network. Figure 2 shows the current cellular network technology (LTE) that can be utilised for machine type communications, in our case smart grid communication (for example wide area monitoring and control). LTE is the 4th generation of mobile communication technology and has been standardised by the 3rd Generation Partnership Project (3GPP) community. Initial standards were specified in release 8, aiming to meet the demands of data traffic over mobile communication networks. Further enhancements are being made in LTE to satisfy new requirements for machine type communication.

Figure 2: LTE for Smart Grid Communication

4.2.1 State-of-the-Art of Mobile Networks (4G/LTE)

Today’s mobile communication networks consist of two major components, the core network and the radio access network. The radio network consists of the Evolved Universal Terrestrial Radio Access Network (E-UTRAN) and the User Equipment (UE), which is a device which provides authentication of end users. E-UTRAN is the air interface part of the LTE network. It consists of the eNodeBs (base stations), which provide radio related functions. These functions include radio resource management, scheduling, QoS, ciphering/deciphering of the user plane and control plane data, and compression and decompression of the user plane packet headers [11].

The core network is generally located geographically far away from the radio network, and it enables authentication, mobility, and connection to other non-3GPP networks. In LTE, the core network is known as the Evolved Packet Core (EPC) and it includes the following components.
- Mobility Management Entity (MME): The MME is the key control node for the LTE access network, controlling high-level operations such as authentication by means of signalling messages. It is responsible for user authentication and for regulating security parameters. It is also involved in the bearer activation/deactivation process.
- Serving Gateway (S-GW): The S-GW routes and forwards user data packets. It is the node that terminates the interface towards E-UTRAN, acts as the mobility anchor for the user plane during inter-eNodeB handovers and is responsible for compatibility between LTE and other 3GPP technologies.
- Packet Data Network Gateway (P-GW): The P-GW provides connectivity from the UE to external Packet Data Networks (PDNs) using the SGi interface. The UE may be simultaneously connected to more than one P-GW for accessing multiple PDNs. The interface between the S-GW and the P-GW has two slightly different implementations, namely S5 if the two gateways are in the same network, and S8 if they are in different networks.
- Home Subscriber Server (HSS): The HSS is a centralised database containing the information about all the mobile subscribers in the network. The MME interacts with the HSS to authenticate all the 3GPP devices in the network.

[Figure 2 labels: Power Grid, Radio access, Wireline access, eNB, Core Network (MME, HSS, S-GW, P-GW), Internet/VPN, SCADA, Control & Monitoring Applications, Application data plane]


The current mobile network infrastructure in Figure 2 shows that the traffic which is generated by, and sent to, the end users is transmitted through both the radio network and the core network.

4.2.2 SUCCESS Security Solution based on 5G Mobile Networks

The SUCCESS communication solution is based on the evolving LTE infrastructure and 5G next generation networks as explained in D4.3 [6]. This chapter provides a detailed description of the different components which enable local edge processing.

The main driving force behind the development of 5G is to support a wide range of use-cases in business areas such as augmented reality, logistic tracking, energy meters, cell automation, smart factory, connected vehicles, and transportation. In the future, smart grids and industry use-cases will require a sophisticated ICT infrastructure supporting ultra-low latency, high availability, maximum reliability, and devices with significantly longer battery life. These requirements set the scene for the next generation of wireless access: 5G systems that are set for commercial availability from 2019. The road to 5G is an evolution, and LTE is adopting and implementing many of the requirements of the new use cases as development approaches the 5G standard.

Key enablers for 5G, including Network Slicing, end-to-end security, enhanced Radio Access and Software Defined Networking, will enable the use of 5G technology in the SUCCESS solution. These components are future enhancements to LTE networks which have not yet been deployed.

- Network Slicing: dedicated network resources in mobile radio and core networks ensure the necessary priority for the specific performance requirements of the use cases.

- End-to-end Security: security services including identity management and pervasive integrity protection over the entire network.
- Network Function Virtualisation: virtualisation of networking hardware in software, which allows software to be operated at the network edge or in the network core and maximises the scalability and reliability of the communication network.
- Enhanced Radio Access: new spectrum, new encoding methods, and adaptable bandwidth offerings support use cases with huge numbers of devices and extremely short round-trip times.
- Software-Defined Networking: software-based configuration of mobile transport and core networks, including capacity provisioning, enables the mobile network operator to configure the network depending on service requirements, independently of the underlying physical and logical network architecture.

This next generation cellular infrastructure aims to fulfil the communication requirements for machine-to-machine (M2M) type communications involving use cases such as smart grid communications. As explained in D4.3 [6], smart grid communications can be broadly divided into two categories, massive M2M type communication (Advanced Metering Infrastructure, wide area monitoring) and critical M2M type communication (real time monitoring and grid control, FLISR applications). Enhancements of the current LTE network are being investigated to support both of these use cases.

4.2.3 Communication Solution for SUCCESS Security Solution

This and the following sub-chapters provide a detailed description of the communication solution including the functional components. A Breakout Gateway function (BR-GW) is described which enables local edge processing and allows implementation of real-time countermeasures.

The 5G communication solution aims to provide seamless connectivity while at the same time ensuring the security of the communication. It can interwork (i.e. connect, communicate, or exchange data) with any non-3GPP access network (non-cellular network), which might be the communication network in certain smart grid scenarios. 3GPP technology such as the Generic Bootstrapping Architecture (GBA) [14] can be utilised to authenticate applications running over 3GPP and non-3GPP networks by leveraging the trust of SIM card subscription data. Figure 3 shows an overall view of the 3GPP and non-3GPP network domains.

In SUCCESS, Data Centric Security is applied for enabling end-to-end validation of data integrity against data manipulation and false data injection threats. These threats are deemed critical for smart grid applications such as power grid state estimation [13] and voltage control [10] [16].


A detailed description of these components as part of BR-GW is provided in Ch. 4.2.4.2.

Figure 3: 3GPP domain and extension of 3GPP security features

4.2.4 Break-out Gateway (BR-GW)

The BR-GW is a new functional unit being proposed, designed, and implemented as part of this project. The breakout functionality diverts the user plane traffic away from the core network and allows it to be processed near the base station on a virtualised instance.

4.2.4.1 Motivation for Breakout Gateway

The upcoming smart grid infrastructure will contain a huge number of smart meters streaming data in real time, and the ICT infrastructure must be designed to support such functionality. One of the major requirements is to enable local edge processing for the realisation of real-time countermeasures.

Current mobile networks are not designed to support local edge processing. Data traffic generated by, or sent to, end users is transmitted through core network functions which are usually not placed near radio networks. This adds latency to the reaction time of any service running on the network. To address the use cases described above in Ch. 4.2.2, current 5G network technology enablers are under thorough investigation [12].

Specifically, this project is developing breakout functionality enabling edge processing for real time applications. This will be hosted in a cloud environment located at the edge of the communications network. Such a breakout gateway can enable and host distributed data processing and will perform real time countermeasures at the network edge, reducing the impact of local failures and reducing response time.

Figure 4: Breakout Gateway

[Figure 3 labels: Mobile Network, BSF (Bootstrapping Server Function), NAF (Network Application Function), SCADA/Utility Control Centre, Sensor Network, 3GPP Gateway, 3GPP enabled Power System Meters, Smart Meters based on Non 3GPP Network, Existing 3GPP Features (SIM), Extended 3GPP Security Features (SIM+GBA)]

[Figure 4 labels: Power Grid, Radio access, eNB, BR-GW, Edge Cloud, SDN Controller, SDN Control Plane, Core Network, HSS, Internet/VPN, Control & Monitoring]


The BR-GW acts as a local gateway based on the use of Software Defined Networking (SDN), which is an integral part of the 5G network architecture [12]. Advantages of this network layout include:

- Data can be processed on decentralised computing nodes, close to the sensors and actuators of the network.
- Higher data speeds between measurement and control units of the smart grid system, as this solution bypasses many other network elements of the mobile system.
- Better support for the huge numbers of IoT devices in future applications, as most of their communication can be managed outside of the mobile core network.

The Breakout Gateway has two kinds of impact on the communication network: functional and architectural. The communication architecture with the Breakout Gateway is shown in Figure 4, which shows how the BR-GW will be used as an integral part of the communication network. The functionality of the BR-GW, including the details of its different functions, is described in Ch. 4.2.4.2.


4.2.4.2 Functionality of the Breakout Gateway

Figure 5: Conceptual diagram of Breakout Gateway

The conceptual diagram in Figure 5 shows the essential functions of the BR-GW. These include the local breakout based on SDN technology, GBA, Data-Centric Security, and other vital security applications such as the CI-SOC in SUCCESS. The functional impact of the BR-GW on the communication network is detailed in the following description of the functional enhancements in 5G networks.

Local Breakout

In 5G mobile networks, the user plane (also called data plane) refers to the user data traffic and the control plane refers to the signalling traffic, which is used for routing the user plane traffic. The Breakout Gateway is based on the principle of separating the user plane from the control plane. This is realised by the use of SDN technology, allowing dynamic configuration of the network based on the application requirements. Rather than being a physical node, the BR-GW is a virtualised instance which will enable edge processing.

Software-Defined Networking (SDN)

SDN is based on the principle of separating the control plane and the user plane. It aims to provide a logically centralised network intelligence and a standardised interface for software development to control network resources and the flow of network traffic. Basic SDN components include the SDN controller, network elements and applications used to control network resources. The SDN controller interacts with applications via standardised Application Programming Interfaces (APIs) on one side and with network elements via standard protocols such as OpenFlow [17] on the other. The applications can implement network services like routing, security, and bandwidth management, with different behaviour for traffic in different flows, i.e. from different sources, responding to real-time demand changes in the network.
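As an illustration of how an SDN controller could steer user plane traffic either to the local breakout or to the core network, the following minimal sketch models a flow table with priority-ordered match rules. It is not the OpenFlow API and not the SUCCESS implementation; the class names, action labels and IP prefixes are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List


@dataclass(frozen=True)
class FlowRule:
    priority: int      # higher priority rules are matched first
    dst_network: str   # destination prefix the rule matches
    action: str        # e.g. "local_breakout" or "forward_to_core"


class FlowTable:
    """Toy model of the rules an SDN controller installs in a switch."""

    def __init__(self, default_action: str = "forward_to_core") -> None:
        self.rules: List[FlowRule] = []
        self.default_action = default_action

    def install(self, rule: FlowRule) -> None:
        # Keep the table sorted so that lookup can stop at the first match.
        self.rules.append(rule)
        self.rules.sort(key=lambda r: r.priority, reverse=True)

    def lookup(self, dst_ip: str) -> str:
        addr = ip_address(dst_ip)
        for rule in self.rules:
            if addr in ip_network(rule.dst_network):
                return rule.action
        # Traffic without a matching rule follows the normal path to the core.
        return self.default_action


if __name__ == "__main__":
    table = FlowTable()
    # Hypothetical prefix of the edge cloud hosting the smart grid application.
    table.install(FlowRule(100, "10.0.5.0/24", "local_breakout"))
    print(table.lookup("10.0.5.17"))    # destined for the edge application
    print(table.lookup("192.0.2.10"))   # any other destination
```

Because the rules live in a table that the controller can change at run time, the same mechanism could also be used to re-route or block traffic as part of a countermeasure.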
Network Function Virtualisation (NFV)

NFV is the virtualisation of network functions that were historically performed by dedicated hardware appliances. This new approach eliminates the need for proprietary network-services devices because it decouples network functions from the underlying hardware so that the functions can be hosted on VMs running on industry standard servers. The goal of NFV is to transform the way that network operators architect networks by allowing consolidation of multiple network equipment types onto industry standards-based platforms [29].

[Figure 5 labels: Breakout Gateway, Data Centric Security, GBA, SDN for local breakout, Other Security Applications, Ericsson Cloud]


Figure 6: NFV Transformation [29]

An NFV specification is developed and maintained by the ETSI Industry Specification Group (ETSI ISG), which is “working intensely to develop the required standards for NFV as well as sharing their experiences of NFV implementation and testing.” Examples of NFV use cases specific to telecom operators include WAN acceleration, voice over Long-Term Evolution (VoLTE), user data consolidation, and carrier-grade network address translation (NAT).

Data Centric Security (DCS)

DCS enables data integrity checking of the end-to-end information flow in any communication network. Smart meter critical data that are used for power grid state estimation or voltage control [10][13][16] could, if manipulated, disrupt the output of the power grid applications. Such attacks are commonly referred to as False Data Injection attacks. Data integrity of the measurements being communicated over any network should be protected and checked at the entry and exit points.

DCS technology is used to ensure the integrity of the message data sent between a sender and a receiver. The procedure is shown in Figure 7:

1. The message sender generates a hash (Hash 1) of the message data using the SHA-256 algorithm and sends the hash value to a DCS Signing Authority.
2. The DCS Signing Authority generates a Keyless Signature using a technology based on the blockchain principle and returns the signature to the message sender.
3. The message sender sends the message data and the Keyless Signature to the message receiver.
4. The message receiver generates a hash (Hash 2) of the received message data and sends it, along with the Keyless Signature, to the DCS Signing Authority.
5. The DCS Signing Authority verifies the Keyless Signature and the received Hash 2 with the blockchain publication file and authenticates the transaction, informing the message receiver of the result.

This procedure uses the Keyless Signature to prove that the transaction has taken place, that the data has not been tampered with and that the sender is the true sender of the data. In order to optimise performance, a distributed DCS Signing Authority architecture can be deployed, in which versions of the blockchain publication file are kept locally and periodically synchronised with the central version.

In the SUCCESS Security Solution, the message sender can be situated in the NORM and the message receiver in the CI-SOC, which is instantiated as an application in the BR-GW. In addition, the same procedure is also used to verify the integrity of the NORM firmware. In this case, the NORM (periodically or on request of the CI-SOC) generates a hash value based on its firmware and, using the same procedure as above, authenticates the firmware with the CI-SOC.
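The message flow of the DCS procedure can be sketched as follows. This is a minimal illustration of the hash, sign and verify round trip only. The signing authority here is a mock that uses an HMAC with a private key as a stand-in, which is explicitly not the Keyless Signature technology used in SUCCESS; all class and function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def sha256(data: bytes) -> bytes:
    """Hash the message data with SHA-256, as in the DCS procedure."""
    return hashlib.sha256(data).digest()


class MockSigningAuthority:
    """Stand-in for the DCS Signing Authority.

    Assumption: an HMAC over the submitted hash replaces the keyless,
    blockchain-based signature, purely to make the flow runnable.
    """

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)

    def sign(self, digest: bytes) -> bytes:
        return hmac.new(self._key, digest, hashlib.sha256).digest()

    def verify(self, digest: bytes, signature: bytes) -> bool:
        return hmac.compare_digest(self.sign(digest), signature)


def send(authority: MockSigningAuthority, data: bytes) -> Tuple[bytes, bytes]:
    """Sender side: compute Hash 1, obtain a signature, ship data + signature."""
    hash_1 = sha256(data)
    signature = authority.sign(hash_1)
    return data, signature


def receive(authority: MockSigningAuthority, data: bytes, signature: bytes) -> bool:
    """Receiver side: compute Hash 2 and ask the authority to verify it."""
    hash_2 = sha256(data)
    return authority.verify(hash_2, signature)


if __name__ == "__main__":
    authority = MockSigningAuthority()
    data, signature = send(authority, b"voltage=230.1;meter=norm-01")
    print("untampered:", receive(authority, data, signature))
    print("tampered:  ", receive(authority, data + b"0", signature))
```

The same round trip applies to the firmware check described above if the message data is replaced by the firmware image.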


Figure 7: Data Centric Security (message flow between Sender, DCS Signing Authority and Receiver: Hash (hash-1), Signature, Signature + Data, Signature + Hash (hash-2), result OK / NOT OK)

Generic Bootstrapping Architecture (GBA)

Generic Bootstrapping Architecture (GBA) is a 3GPP authentication architecture, standardised in 3GPP TS 33.220 [14]. It uses mobile SIM-based authentication to provide security functions even outside the mobile network domain. One example is the authentication of a mobile device towards an enterprise server.

Figure 8: GBA functions applied at Breakout Gateway (elements shown: Enterprise Application, NAF, BR-GW, eNB, MME, P-GW, HSS, BSF, Core Network)

The reason for using GBA for authentication is to leverage the trust placed in mobile communication operators to enable authentication of smart grid applications running over 3GPP or non-3GPP networks. Compared to current state-of-the-art solutions such as Public Key Infrastructure and digital certificates, GBA provides a novel solution to authenticate a power grid application running on remote measurement and control devices. GBA provides a means to use mobile network subscriber data, which a mobile network operator keeps in its mobile network, to authenticate mobile network users for application services. These other services need not necessarily be under the control of the mobile network operator. In fact, the ability to use mobile network subscriber data to authenticate 3rd party services is a key feature of GBA. Further, GBA supports inter-operator authentication. GBA can be applied to local breakout services, so that security functions can also be provided to localised enterprise functions. Figure 3 and Figure 8 show the setup of GBA in the mobile network. The Network Application Function (NAF) and the Bootstrapping Server Function (BSF) are the two main new components included in GBA.


In mobile communication networks, the HSS is the main database containing information used to authenticate User Equipment in the network. The BSF sits in front of the HSS, protects the HSS from direct access by 3rd parties and provides the information GBA needs to authenticate an application or user. The NAF is connected to the BSF via the Zn interface [10]. The NAF’s main function is to verify the authenticity of application messages generated by mobile phones (UEs). The NAF acts as message authenticator for any service application. This authentication is done by utilising subscriber information stored in the HSS via the BSF. NAF functionality can be embedded to authenticate smart grid applications generating messages from smart meters connected via 3GPP or non-3GPP networks. Figure 8 shows the realisation of the NAF function in the BR-GW.

BR-GW in Two Domains: The BR-GW will be instantiated as a virtual instance in the distributed edge cloud, which enables the connection to a local instance of the CI-SOC in case this is also instantiated on the same edge cloud system, as is the case in the SUCCESS field trials. Conceptually, the BR-GW is explained above; however, its functionality can be divided into two domains, the Telco domain and the DSO domain, as shown in Figure 9. The functionality in the Telco domain is controlled using SDN mechanisms and is responsible for local breakout, which provides connectivity layer functionality. The DSO domain will have services and applications installed on the BR-GW virtual machine to enable security functions for threat detection and countermeasure implementation. The encrypted traffic from the NORM in the DSO domain can be decrypted only in the BR-GW DSO domain in order to analyse the data. The DSO domain will be part of the DSO, and the Telco will have neither control over nor access to the data in the DSO domain. However, there will be information exchange between both domains in order to enable countermeasure implementation.
This information exchange is done via the I2 interface. The applications and services running in the DSO domain will be able to communicate with the CI-SAN on an encrypted channel via the I3 interface. This will allow reporting of incidents to CI-SOC, which can take appropriate countermeasures against attacks.

D4.3 applies the concepts of the Trusted Execution Environment (TEE) and the Trusted Platform Module (TPM). A TEE provides isolated execution of code and ensures the integrity and confidentiality of critical applications. In the proposed BR-GW, a TEE can be utilised to host and process services which require access to the security-critical data of the meters. These services could thus have access to DSO session keys, which they can use to access the protected meter data of NORMs in a secure and isolated way inside the TEE. A TPM can be used to store these keys, which are utilised for secure communication. Recently, work has been done towards virtualised Trusted Platform Modules (vTPM) [18], which make TPMs available to virtualised entities as well. These concepts, combined with the BR-GW function in a virtualised environment, could provide a secure solution for the hosting service to enable security functions.

Figure 9: Breakout gateway logical domains (Telco domain: SDN to enable local breakout; DSO domain: services to enable security functions; VPN-enabled / encrypted links from the Breakout Gateway to NORM and CI-SAN)


4.2.5 5G communication Test Lab Setup

Figure 10: Test lab setup

The complete test lab with all its components is shown in Figure 10. A live 5G-ready network is deployed at the state-of-the-art research facility of the ACS lab at RWTH Aachen University. The flight rack shown in the figure consists of a Baseband unit (BBU), routers, switches, a firewall and servers. These servers host the Ericsson Cloud infrastructure. The BR-GW, based on NFV and SDN technology, is hosted on the Ericsson Cloud. The Radio Resource unit (RRU) is connected to the flight rack and provides radio transmission. The DCS and CI-SOC components of the SUCCESS solution are hosted on the Ericsson Cloud BR-GW. The functionality is tested with hardware-in-the-loop simulation of a local grid control application. The basic test setup, with the DCS component integrated and a local grid application, was shown as a live demonstration at the first review. The current setup is integrated with CI-SOC, and the interfaces with the NORM (I1 interface) and CI-SAN (I3 interface) have been realised and tested. The CI-SAN is hosted at the P3 Aachen site and is connected over a VPN realising the I3 interface. The Irish trial site is connected over a VPN with the ACS lab, and hence with the BR-GW, for trial purposes. The details are described in D5.2.

In summary, the test setup incorporates the following components:
• 5G-ready flight rack, eNodeB and core network
• LTE modems to connect with the live mobile network
• NORM device for testing


• CI-SOC instance hosted on the cloud
• Raspberry Pis that realise smart meters and are connected to the RTDS

The test cases are described in detail in D4.8 [7], and the final version of the test cases will be released in D4.9 at the end of the project.

4.2.6 Communication and Computing Resource Orchestrator for Resilience

Because critical infrastructures such as electricity will in future rely on the computing and communication infrastructure provided by 5G mobile networks, it is imperative that these networks are resilient by design to cyber-attacks and component failures. Cyber-attacks could take the form of denial-of-service attacks and possibly software or hardware compromise caused by advanced persistent threats. System component failures could affect the wireless access, the mobile backhaul, and the virtual machines, hypervisors and servers in the 5G edge cloud. Cyber-attacks and failures would make a set of the edge cloud servers unsuitable for hosting smart grid application processes, either by affecting the computing resources directly or by disrupting communication with the computing resources. In order to make smart grid applications resilient to attacks and failures, it is essential to migrate the processes to suitable computing resources, on demand, under the coordination of a central resource management entity.

Coordination in the proposed architecture is done by the Communication and Computing Resource Orchestrator (CCRO). The CCRO makes a set of the edge servers available for hosting virtualised smart grid application processes, and initiates the migration of the smart grid application processes upon the occurrence of cyber-attacks and component failures. Figure 11 shows a conceptual view of the CCRO and its relation to other system components.

Figure 11: A conceptual view of the CCRO

The CCRO communicates with and coordinates four system components.
• The 5G system state monitor monitors the state information of the 5G mobile network (e.g., component status, including failures and potential cyber-attacks), and exchanges state information with the CCRO via real-time communication.
• The database stores the historical state information of the 5G mobile network, which is used by the CCRO to compute the reliability statistics of the mobile network components.
• The CCRO communicates with the edge cloud resource manager to migrate application processes. The CCRO also provides the edge cloud resource manager with information about the use of the communication and computing resources of the 5G edge cloud.

Figure 12 shows the workflow of the CCRO. The CCRO periodically receives state information from the 5G system state monitor. If the state of the 5G mobile network has stayed unchanged since the last update, the CCRO processes the state information and updates its database. The historical data in the database is used to compute the statistical data of the 5G components (e.g., the failure rate and the mean time between failures). Based on the statistical data about the system components, the CCRO updates the optimal placement of the smart grid application processes depending on the state of the 5G system, according to the Resilient Application Placement (RAP) algorithm. The RAP algorithm minimises the long-term operation cost of the edge cloud, while taking the delay performance into consideration, and is described in detail in D2.5 [3].
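The reliability statistics mentioned above (failure rate and mean time between failures) can be derived from the historical state records, as in the following minimal sketch. The record format (a time-ordered list of timestamp and up/down samples per component), the function name `reliability_stats` and the definition of MTBF as total observed uptime divided by the number of failures are assumptions made for illustration; the source does not specify how the CCRO stores or evaluates its history.

```python
from typing import List, Optional, Tuple

# One sample of the historical state: (timestamp in seconds, component is up?)
Sample = Tuple[float, bool]


def reliability_stats(history: List[Sample]) -> Tuple[int, float, Optional[float]]:
    """Return (failures, failure_rate_per_second, mtbf_seconds).

    Assumes the samples are sorted by timestamp and that the state holds
    from one sample until the next one.
    """
    failures = 0
    uptime = 0.0
    for (t0, up0), (t1, up1) in zip(history, history[1:]):
        if up0:
            uptime += t1 - t0      # component was up during [t0, t1)
            if not up1:
                failures += 1      # up -> down transition counts as a failure
    if failures == 0 or uptime == 0.0:
        # No failure observed: the rate is zero and the MTBF is undefined.
        return failures, 0.0, None
    return failures, failures / uptime, uptime / failures


# Hypothetical history of one edge server.
history = [(0, True), (100, False), (130, True), (330, False), (350, True), (400, True)]
failures, rate, mtbf = reliability_stats(history)
```

For the hypothetical history above the component is up for 350 seconds in total and fails twice, so the sketch yields an MTBF of 175 seconds. Statistics of this kind would be the input to the placement step; the RAP algorithm itself is described in D2.5 and is not reproduced here.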


If the state of the 5G mobile network has changed since the last update (e.g., in case of a failure or a cyber-attack, or in case of the recovery of system components), the CCRO instructs the edge cloud resource manager to migrate the application processes. After that, the CCRO processes the state information, updates the database, and updates the placement of the application processes.

Figure 12: The workflow of the CCRO

4.3 Security Monitoring Components

4.3.1 Critical Infrastructure Security Operations Centre

The Critical Infrastructure Security Operations Centre (CI-SOC) implements a set of countermeasures to mitigate security incidents specific to smart meters. CI-SOC forwards the data it receives from NORM (directly or via the BR-GW) to the SDC after filtering and aggregating them. CI-SOC is responsible for anonymising the collected data, if necessary, before passing it to the SDC. Typically, the data at DSO/TSO level are subject to various legal restrictions due to privacy issues, so that an anonymisation step at the Critical Infrastructure level is crucial before sharing the data. In addition, CI-SOC informs SDC of incidents which have been identified by CI-SOC and of the selected countermeasures. Details of the CI-SOC are available in deliverable D3.6 [4].

4.3.2 Critical Infrastructure Security Analytics Network

The Critical Infrastructure Security Analytics Network (CI-SAN) represents a novel approach to monitoring the security status of critical infrastructures. Moreover, CI-SAN is able to fulfil requirements of European legal regulations. More details about the legal regulations and the role of CI-SAN are described in Deliverable 3.3 version 3 of the SUCCESS project.

4.3.2.1 Architecture and Function Description

CI-SAN gathers data from sources distributed across Europe to obtain a comprehensive view of the security status of critical infrastructures.
CI-SAN is able to analyse data gathered from different types of critical infrastructures, such as the electricity, gas and water infrastructures. The scope of its analytics is therefore wide in geographical terms and is not limited to a particular type of critical infrastructure but spans different CI types. In the case of electricity grids, CI-SAN obtains information provided by DSOs and TSOs across Europe, identifies common patterns characterising cyber-attacks, and then shares these patterns with the DSOs and TSOs. Thereby, the DSOs and TSOs obtain information that cannot be derived locally. CI-SAN is shown in Figure 13 and consists of two components:


• several instances of regional Security Data Concentrators (SDC), which locally collect and aggregate data on critical infrastructure level (e.g. DSO/TSO level in the case of electricity grids), and
• Security Analytics Node (SA Node) instances, which collect data from several SDC instances and perform data analysis on a national or regional level. The SA Node instances also form an international network (not shown in Figure 13 but discussed in D4.3 [6]) in order to share information and co-ordinate. SA Node (1) evaluates the data with regard to common patterns describing cyber-attacks and (2) shares the information with SDC instances.

We refer to Figure 1 on page 18 for an overview of the infrastructure implemented in the SUCCESS project and to Figure 22 on page 50 for an overview of the SUCCESS Security Solution, that is, how the components in the SUCCESS infrastructure interwork. Figure 13 depicts the concept of CI-SAN as instantiated in the SUCCESS project, with a single SA Node instance interworking via SDC instances with a number of different critical infrastructures, each of which has its own CI-SOC. CI-SAN receives data from the CI-SOC over Interface I3. The CI-SOC is hosted at the critical infrastructure sites (see Ch. 4.3.1 for more details about CI-SOC) and resorts to the Breakout Gateway to obtain data from NORMs. SDC acts as an agent which gathers data from critical infrastructures. An instance of SDC receives data from:
• SA Node over Interface I5,
• CI-SOC over Interface I3,
• other SDC instances over Interface I4 (see Ch. 4.3.2.3.3 for more details).

Figure 13: CI-SAN and critical infrastructures in the SUCCESS Security Solution

SA Node is responsible for the European view of critical infrastructures. At this level, data from external data sources, such as social media, are also used by the analytics network for a comprehensive view. Thereby, society-related trends, which are discussed in public on social media platforms, can be automatically identified.
These discussions potentially include relevant information about the stable operation of the critical infrastructure. The term CI type is used in Figure 13 to denote different types of utilities (e.g. electrical grids, water grid, gas grid, waste water treatment infrastructure) or other CIs (e.g. transport, government computer networks etc.). SA Node obtains information from several decentralised Security Data


Concentrator (SDC) instances, which gather and aggregate data from one type of critical infrastructure in their region (e.g. from the DSO/TSO utilities of type electrical grid in the region). The rationale for establishing SDC instances is that they collect and aggregate locally bounded data (or regionally bounded data; the area which an SDC covers can be adapted to local needs). Hence, SDC reduces traffic by ensuring that only significant security-related information is shared with the SA Node. The SDC instances share information with the SA Node, so that the SA Node can produce an aggregate analysis on a European level. Thus, detection of patterns that cannot be evaluated on local critical infrastructure operator level becomes possible. The SA Node subsequently shares the analysis results with the CI-SOC instances via the SDC instances. Thereby, CI-SOC instances can confirm the patterns by comparing them with the data available in their own network. SA Node searches for unexpected and significant patterns in data which stem from various sources. If such a pattern has been detected successfully, SA Node:

• precisely visualises the results for reasonable human-machine interactions. The visualisation concept depends on the structure of the data (e.g. streaming or event-based data transmission) which will be transmitted to the SA Node, and visualisation will take place at the SA Node. In particular, time resolution and the amount of data will determine the underlying visualisation approach. Human operators of the SA Node will potentially screen and analyse incoming traffic and directly intervene if necessary, for example by manually sending warning messages.
• notifies the responsible authorities such that proper countermeasures can be initiated in a downstream process by the controller.

4.3.2.2 CI-SAN for the SUCCESS Trial Sites

Figure 14 gives an architectural overview of an instantiation of CI-SAN for the SUCCESS trial sites. There are three trial sites, each of which has a CI-SOC instance. For operational reasons, the three CI-SOC instances will not be located inside the information infrastructure of the SUCCESS critical infrastructure partner hosting the trial site in the respective country, but at the lab of RWTH Aachen. Each CI-SOC instance will interact with its own SDC instance. In this setup, the SDC instances effectively act as national information gathering instances of the CI-SAN. There is a single SA Node instance covering the three trial sites; it is effectively an international instance of the SA Node and is the apex of the CI-SAN, so that it acts as a Co-ordinating Critical Infrastructure Security Analytics Centre.

Figure 14: CI-SAN and Utilities in the SUCCESS Security Monitoring Solution


4.3.2.3 SDC Description

SDC instances act as agents hosted either at critical infrastructures, such as energy providers, or outside the critical infrastructures. In either case, the SDC interworks with the CI-SOC of the critical infrastructure. Thereby, SDC instances enable bidirectional communication between CI-SAN and the critical infrastructures. On the one hand, they act as sensors that gather critical infrastructure-related data from CI-SOC. On the other hand, SDC instances receive messages about security-related issues from SA Node, which they forward to CI-SOC. CI-SOC displays the messages to the proper authorities. Thereby, the security of critical infrastructures in Europe is increased by an early-warning system. However, the introduction of additional components brings a risk that the critical infrastructure can be attacked through these components in CI-SAN; these risks are examined in D1.2 [1].

SDC instances are designed to be as lightweight as possible and do not contain computational power for analysis on their own. However, the flexibility of the SUCCESS architecture allows this property of SDC to be modified in future instantiations. For example, SDC instances may be able to analyse the data they collect from the CI-SOCs if required. The default functionality of SDC instances is to aggregate data to reduce traffic with the SA Node and, if necessary, to make data anonymous.

4.3.2.3.1 Data Flow

Figure 15: SDC Data Flow

SDC receives data over the CI-SAN API (I3, see Chapter 4.6.3 for more details), where data are obtained from CI-SOC. On the one hand, data can describe the analysis results of CI-SOC. On the other hand, data can be measured values from critical infrastructure-related data sources which are provided by CI-SOC. The Anonymisation Module obtains all data coming from the CI-SAN API. The module performs a validation check on incoming data. If the data is valid, the module makes the data anonymous, if necessary.
In general, the CI-SAN API already provides anonymous data. However, further anonymisation steps may be required if data is first aggregated and then shared Europe-wide. Data is received from various CI-SOC instances, each of which anonymises it locally on the CI information level. However, aggregating the data of various CI-SOC instances across Europe increases the amount and the value of the shared information. Aggregation on the European information level may thereby make de-anonymisation easier, owing to the growth in new information. Moreover, CI-SAN anonymisation strategies may only cover country-specific legal regulations. As SDC raises data to the European information level, it is likely that additional legal regulations have to be taken into account as well. Afterwards, data is saved to the Cache Database via the Database Connector Module. Next, the Aggregation Module fetches the pre-processed data from the Cache Database and aggregates them to reduce the network traffic between SDC and SA Node. Finally, the SA Node Transmitter Module sends the aggregated data via I5 to the upper CI-SAN level. Proper configuration files are


locally stored for the Anonymisation, Database Connector, Aggregation and SA Node Transmitter Modules. Thereby, a customised application of the modules is possible. SDC can also receive messages from SA Node. Messages may contain information about identified attacks on a critical infrastructure. The SA Node Receiver Module obtains messages from I5 and forwards them to the CI-SOC Transmitter Module, which in turn forwards them to the CI-SAN API, where CI-SOC can request them.

4.3.2.3.2 Components

SDC instances comprise various components which fulfil different tasks.

4.3.2.3.2.1 Cache Database and Database Connector Module

The Cache Database caches received messages for downstream processing. Messages can be received from both directions: from the European level via the SA Node API and from the critical infrastructure level via the CI-SAN API. The Cache Database is operated by the Database Connector Module. Resorting to the Database Connector Module makes it possible to exchange the technology used for database access flexibly. If the Cache Database is realised with other technologies in the future (for example because the current technical realisation exhibits vulnerabilities), only the Database Connector Module has to be adapted. All other software functions rely on the Database Connector Module and do not have direct access to the Cache Database. The rationale for the Cache Database is that received messages have to be collected for the Aggregation Module. The Aggregation Module periodically gathers and removes data from the Cache Database.

4.3.2.3.2.2 Anonymisation Module

In general, data received by the CI-SAN API is already anonymised. The reason is that operators of critical infrastructures do not disclose non-anonymised data to the European information level.
However, as the CI-SOC only provides an anonymisation strategy for the critical infrastructure information level, further anonymisation steps may be required for the European information level. SDC raises data to the European information level, and it is likely that further legal regulations have to be taken into account for that level. CI-SAN must not rely on the local anonymisation strategy of CI-SOC instances, and therefore SDC applies its own anonymisation strategy before raising data to the European information level. Besides the anonymisation, the Anonymisation Module performs an integrity check of all incoming traffic. The exact rules can be defined in a flexible way, for example to fulfil country-specific requirements, by resorting to a configuration file for each individual SDC instance.

4.3.2.3.2.3 Aggregation Module

The purpose of the Aggregation Module is to reduce the network traffic between SDC and SA Node instances, so that as little data as possible needs to be sent after the aggregation procedure. For that, a configuration file describes which data should be aggregated and how aggregation should be performed. Thereby, a flexible application of different aggregation methods is possible. The default method is aggregation by time. Here, the Aggregation Module periodically requests new data from the Cache Database. For each request, the important information of the gathered data is collected and summarised for post-processing. Which information is important needs to be defined separately for each data source. Chapter 4.3.2.6.1 gives an example of an aggregation strategy for data obtained from the control room software of an energy provider.

4.3.2.3.2.4 Data Transmitter and Interfaces

SDC uses three interfaces to communicate with the environment: CI-SAN API, I4 and I5. The CI-SAN API is used to transfer messages bidirectionally between CI-SOC and SDC. The CI-SOC Transmitter Module is used to send messages to CI-SOC.
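The time-based aggregation performed by the Aggregation Module (section 4.3.2.3.2.3) can be illustrated with a minimal sketch. The in-memory `Cache` class stands in for the Cache Database behind the Database Connector Module, and the `CONFIG` dictionary stands in for the per-source configuration file. The field names, the available reduction methods and the structure of the summary are hypothetical and are not taken from the SUCCESS implementation.

```python
from statistics import mean

# Hypothetical per-source configuration: which fields to keep and how to summarise them.
CONFIG = {"voltage": "mean", "alarm": "max"}
REDUCERS = {"mean": mean, "max": max, "min": min, "sum": sum}


class Cache:
    """Toy in-memory stand-in for the Cache Database."""

    def __init__(self):
        self._rows = []

    def put(self, row):
        self._rows.append(row)

    def drain(self):
        # Gather and remove all cached rows, as the Aggregation Module does periodically.
        rows, self._rows = self._rows, []
        return rows


def aggregate(cache, config):
    """Summarise everything cached since the last call; return None if nothing is new."""
    rows = cache.drain()
    if not rows:
        return None
    summary = {"count": len(rows)}
    for field, method in config.items():
        values = [row[field] for row in rows if field in row]
        if values:
            summary[field] = REDUCERS[method](values)
    return summary


cache = Cache()
for row in ({"voltage": 229.0, "alarm": 0},
            {"voltage": 231.0, "alarm": 1},
            {"voltage": 230.0, "alarm": 0}):
    cache.put(row)

first = aggregate(cache, CONFIG)    # one summary instead of three messages
second = aggregate(cache, CONFIG)   # nothing new since the last period
```

In a deployment, `aggregate` would be called once per aggregation period and a non-empty summary would be handed to the SA Node Transmitter Module. Keeping the reduction rules in configuration rather than in code reflects the requirement that the important information is defined separately for each data source.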
These messages stem from SA Node and are received by the SA Node Receiver Module. I5 can be used by the SA Node Transmitter Module to send messages to SA Node. The SDC Transceiver Module is used to send messages obtained from CI-SOCs to other SDCs and, in the reverse direction, to receive messages from other SDCs via I4.

4.3.2.3.3 Communication Channels

As stated above, SDC instances are able to communicate with each other. This communication can be established at a local or national level. This communication channel is useful in case an SA Node is inaccessible due to malfunction, maintenance or an attack. Upon such an incident, an SDC shares threat information received from the CI-SOCs with other SDC instances.


Threat information can be an emergency notification from a CI-SOC stating that it is being attacked, together with the attack pattern. The receiving SDC instances forward this information to their CI-SOCs, so that they are alerted. This functionality mitigates the impact of an attack on, or a malfunction of, the SA Node, since the SDC instances are still able to communicate with each other. However, the functionality of the SA Node is not restored: neither is the aggregated data processed at European level, nor are social media data analysed. Potential computational capabilities of the SDC instances, as mentioned in Ch. 4.3.2.3, would enhance the effect of this communication, as analysis of aggregated data at SDC level could also be enabled in this case and more information could be shared among the SDC instances.

4.3.2.4 SA Node Description

SA Node is designed as a scalable, highly available Big Data platform (see Chapter 4.3.2.5 for more details about the Big Data implementation). It is intended to receive and process massive amounts of data in almost real time. Moreover, it stores information about the attacks in a data lake for downstream analysis. Thereby, in-depth analysis of attacks becomes possible. As shown in Figure 13, SA Node is linked to three instances of SDC for the SUCCESS trial sites. SA Node receives information from all SDC instances, processes it and sends data insights back to the SDC instances.

4.3.2.4.1 Data Flow

Figure 16: SA Node Data Flow

Figure 16 shows the data flow within SA Node. Data is obtained from the internal interface I5, which is used by SDC instances distributed across Europe. Potentially, a very large number of SDC instances will contribute data to SA Node. Received data is stored in a Data Lake, which is accessed through the Database Connector Module. The Attack Identifier Module accesses the Data Lake for in-depth analysis concerning cyber-attacks on critical infrastructures.
Identified attacks are saved in the Event Database and sent back to SDC via I5 and the SDC Transmitter Module. Moreover, the attacks are graphically visualised on a map of Europe via the Visualization Module. The Social Media Analysis Module processes data received via I6, which comes from social media platforms. The module searches for novel social trends affecting the operation of critical infrastructures. Identified insights are stored in the Event Database.

4.3.2.4.2 Components

The SA Node consists of different modules that interwork with each other.

4.3.2.4.2.1 Data Lake

The Data Lake stores all messages coming via I5. The rationale is that downstream analysis only becomes possible if data is kept available over a sufficient time period. For instance, advanced persistent threats (APT) last over a long time period. Questions about the attack vectors of identified APTs can be addressed only if historical data about attacks is available. Furthermore,


messages describing an identified attack may refer to a particular subset of historical data. Thereby, a clear picture of the attack and its impact on the attacked system can be shared with third parties. Third parties can use this information for a more precise and effective search in their systems.

4.3.2.4.2.2 Event Database

The Event Database contains all significant events identified by CI-SAN. On the one hand, these events can comprise attacks which have been identified with the data obtained from SDC via I5. Stored attacks can be linked to data in the Data Lake to reconstruct the data on which the attack identification was based. On the other hand, the insights of the Social Media Analysis Module are stored in the Event Database as well. Here, data is obtained via I6.

4.3.2.4.2.3 Database Connector Module

The Database Connector Module is responsible for writing to and reading from both the Data Lake and the Event Database. Resorting to this module allows a flexible migration to new technical implementations of the Data Lake and the Event Database, as in such a case only the Database Connector Module needs to be adapted.

4.3.2.4.2.4 Data Transmitter and Interfaces

SA Node has two interfaces to communicate with its environment. I5 realises two-way communication with SDC instances distributed across Europe. SA Node receives data from SDC instances for its analysis. Moreover, SA Node sends data regarding identified attacks to SDC. The transmission from SA Node to SDC is realised by the SDC Transmitter Module. Furthermore, I6 is a generic interface that describes data stemming from social media platforms. Data from social media platforms is analysed by SA Node and the insights are shared with the SDC instances via I5.

4.3.2.4.2.5 Attack Identifier Module

The purpose of the Attack Identifier Module is to identify coordinated cyber-attacks on critical infrastructures.
Coordinated cyber-attacks are attacks on critical infrastructure instantiations with a close time dependency and similar configuration in attacked hard- or software. For instance, in case of energy providers, coordinated cyber-attacks describe multiple attacks on the SCADA system of DSOs across Europe at the same time. The DSOs share a common hard- and software configuration for their SCADA system. The rationale for focusing on the same hard- and software configuration is that the SCADA systems offer the same attack vectors due to similar vulnerabilities. Therefore, by exploiting the attack vectors in a synchronized way, a coordinated attack becomes possible. Coordinated attacks may have a tremendous impact in particular on energy grids, as they may lead to grid instabilities. The basic asset for the identification process is the data gathered across Europe and stored in the Data Lake. Machine learning methods are used to derive patterns from the data describing coordinated cyber-attacks. For that, clusters of critical infrastructures with similar hardware and software configurations are the focus of this activity. Due to confidentiality of how patterns are derived in the Attack Identifier Module, no details about the applied algorithms are shared. Various experiments are performed to evaluate the performance of the implemented algorithm in the Attack Identifier Module (see Chapter 4.3.2.6.1 for details). Identified coordinated attacks are stored in the Event Database through the Database Connector Module. 4.3.2.4.2.6 Social Media Analysis Module The Social Media Analysis Module obtains data via I6 which comprise free text of various social media platforms. Relevant topics discussed in public on social media platforms can be automatically identified. These discussions potentially include relevant information about the stable operation of critical infrastructures. Insights of the social media analysis are stored in the Event Database. 
For example, it can be evaluated whether Twitter tweets can be automatically classified as appeals for similar behaviour affecting the energy grid. One example of such an appeal is the hashtag “earthhour”. People following this hashtag typically want to make a mark for the environment by switching off lights for a particular hour. This coherent behaviour, while well-meaning, can cause instabilities in the energy grid. By identifying such trends in social media, the energy provider can prepare appropriate counteractions in advance. More details about this experiment can be found in Chapter 4.3.2.6.2.

4.3.2.4.2.7 Visualization Module

The Visualization Module presents the results for human-machine interaction. In general, the graphical user interface (GUI) comprises

a list of SDC instances which are registered in CI-SAN. Technical details about each SDC, for example the critical infrastructures it covers and its cluster of hardware and software configuration, are included as well;
a map of Europe on which the location of each SDC instance is marked. A colour scheme indicates the security status of the SDC instance. An SDC instance can (1) operate in a secure way when no cyber-attack has been detected or (2) be under attack in case of a coordinated cyber-attack; and
a list of relevant free text messages obtained from the social media platforms. The messages cover topics which have been identified as relevant by the Social Media Analysis Module.

The Visualization Module obtains all data from the Event Database. The Visualization Module is still under development.

4.3.2.5 Implementation

CI-SAN is implemented in Python 2.7, taking advantage of various packages such as Flask, hashlib, mllib, numpy, pandas, scikit-learn and scipy. Altogether 7380 lines of code have been written so far. Moreover, SA Node takes advantage of Big Data technologies. Here, Hortonworks, version 2.6.1.0, is used as the Hadoop distribution. Among others, HDFS, Hive, Kafka, MapReduce, Spark2, YARN and Zookeeper are installed on the Big Data cluster. The Big Data cluster runs on Ubuntu 16.04.

4.3.2.6 Experiments

To evaluate the performance of CI-SAN, various experiments have been performed. This chapter gives a short overview of these experiments.

4.3.2.6.1 Identification of Coordinated Attacks

The purpose of CI-SAN is to identify cyber-attacks against critical infrastructures distributed across Europe. In particular, coordinated attacks are of high interest for operators such as energy providers, as they may cause instabilities in the grid. Coordinated attacks are cyber-attacks which are performed within a strict time frame on targets with similar hardware and software configurations, as these offer similar vulnerabilities.
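The definition of a coordinated attack given above (same configuration cluster, strict time frame) can be illustrated with a deliberately naive sketch. It is not the confidential algorithm of the Attack Identifier Module. The alert tuple layout, the function name `find_coordinated`, the fixed time buckets and the thresholds are all assumptions made for illustration only.

```python
from collections import defaultdict

def find_coordinated(alerts, window_s=60, min_instances=2):
    """Naive illustration: flag (cluster, time bucket) pairs in which at least
    `min_instances` distinct SDC instances reported an attack.

    `alerts` is an iterable of (sdc_id, cluster, unix_timestamp) tuples;
    this layout is hypothetical and not taken from the deliverable.
    """
    buckets = defaultdict(set)
    for sdc_id, cluster, timestamp in alerts:
        # Alerts of the same cluster that fall into the same fixed-size
        # time bucket are treated as "close in time".
        buckets[(cluster, int(timestamp // window_s))].add(sdc_id)
    return {key: sorted(ids) for key, ids in buckets.items()
            if len(ids) >= min_instances}

alerts = [
    ("sdc-01", "cluster-1", 1000.0),
    ("sdc-02", "cluster-1", 1012.0),
    ("sdc-03", "cluster-1", 1019.0),
    ("sdc-11", "cluster-2", 1015.0),   # different configuration cluster
    ("sdc-01", "cluster-1", 5000.0),   # isolated, much later
]
coordinated = find_coordinated(alerts, window_s=60, min_instances=3)
```

Fixed buckets keep the sketch short but would miss attacks that straddle a bucket boundary; a sliding window would be the more careful choice.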
In this experiment, the performance of CI-SAN in identifying coordinated attacks is evaluated. The energy grid is used as an example of a critical infrastructure. Datasets are chosen according to how they occur at common energy providers. Thereby, realistic scenarios for the application of CI-SAN are covered. Here, a short overview of the most important insights is given. A detailed description of the experiments is out of scope of the deliverable.

4.3.2.6.1.1 Datasets

Two datasets are chosen for the experiment. The first dataset is the KDD data set [36], which was introduced in 1999 at the Third International Knowledge Discovery and Data Mining Tools Competition and models intrusions into a military network environment. KDD has been chosen because it has become a well-analysed and well-established dataset for research in intrusion detection. The data set contains a wide variety of cyber-attacks (probing attacks, Denial-of-Service, Remote to Local and User to Root). KDD consists of approximately 4.9 million data points. KDD’s features describe (1) basic TCP/IP properties such as the connection source and target, (2) additional information about the targeted hosts such as whether the login attempt was successful, and (3) features which are computed in two-second time windows. The latter contain, for example, information about the number of errors that appeared when connecting to the host. The KDD data set has been criticised: for instance, for the sake of privacy, synthesized data is used which is supposed to be similar to real data [44]. However, experimental validations show that the traffic workload is not similar to the traffic in real networks. Moreover, analyses found that the data set includes many replicated entries. Therefore, these replicates are deleted before the downstream analysis [44].


The rationale for choosing KDD is that it describes cyber-attacks on a network environment in a generic way. In particular, KDD provides basic TCP/IP information (see feature set (1) of the KDD data set), which energy providers can also obtain about their own networks. The analysis of KDD is restricted to this basic TCP/IP information to model a realistic scenario in which energy providers can offer CI-SAN similar information about their network environment.

The second dataset comprises log files from the PSI control room software. The software monitors system events such as user logins, logouts, failed logins and the creation of new users. The PSI control room software is widely used by energy providers. PSI provided example log files for performing the experiments in the SUCCESS project.

4.3.2.6.1.2 Method

A simulator for network traffic based on KDD and the PSI log files was developed. With it, the network environment of various SDC instances can be simulated in a customized way, so that the performance of CI-SAN can be evaluated for different scenarios. In particular, coordinated attacks on infrastructures covered by SDC instances can be simulated. The simulator is parametrized in such a way that coordinated attacks happen simultaneously on several SDC instances in predefined time frames. Altogether, 14 SDC instances have been instantiated. These 14 SDC instances are grouped into three clusters of hardware and software configuration (cluster 1 contains 10 SDC instances; cluster 2 and cluster 3 contain two SDC instances each). Three coordinated attacks were simulated with different ratios of SDC instances of cluster 1 under cyber-attack. For each coordinated attack, various numbers of malicious TCP/IP packets have been used. Each scenario has been repeated 100 times.
A background signal for cyber-attacks on a network is simulated as well. Here, the firewall traffic of the SUCCESS project partner ASM Terni was used as ground truth. Thereby, a realistic parameterization of the simulator for an energy provider becomes possible. In particular, malicious connections to ASM range from 33 to 980 per minute, while the median (average) number of connections is 69 (77) per minute. Figure 17 gives the distribution.

Figure 17 Malicious connections on the network of the energy provider ASM Terni

CI-SAN obtains data from both data sources, KDD and PSI log files, via the CI-SOC API. Next, SDC instances aggregate the data sources. Linking both data sources is based on common time stamps. Aggregation of the PSI log files is performed by counting, e.g., the number of failed logins during a predefined time frame. Next, SDC sends the data to SA Node. The SA Node instances combine machine learning techniques with statistical approaches to identify coordinated attacks. For confidentiality reasons, no details about the implemented algorithms are provided in this deliverable.

4.3.2.6.1.3 Results and Discussion

The algorithm identifies coordinated attacks better when a high number of malicious TCP/IP packets is associated with each cyber-attack and when a high number of attacked SDC instances belong to the same cluster of hardware and software configuration. To validate the algorithm, two experimental results are described. First, for a high number of 2000 malicious TCP/IP packets associated with each attack of a coordinated attack, the algorithm obtains a true positive rate of 72.1% if all SDC instances of cluster 1 are under a coordinated attack. As it is not reasonable to assume that 2000 malicious TCP/IP packets are associated with each attack of a coordinated attack, the performance of the algorithm for fewer TCP/IP packets is evaluated as well. Second, for only 5 malicious TCP/IP packets associated with each attack and for only 50% of the SDC instances in cluster 1 under coordinated attack, the algorithm obtains a true positive rate of 12.88%. This behaviour is expected, as it is easier to identify coordinated attacks if each attack clearly stands out from the background signal. Conversely, the algorithm obtains a higher false positive rate when only a low number of packets is associated with each attack of a coordinated attack. In detail, for 2000 malicious TCP/IP packets and all SDC instances of cluster 1 under attack, the algorithm obtains a false positive rate of less than 0.01%. For only 5 TCP/IP packets associated with each attack of a coordinated attack, the false positive rate increases to 8.93% for 50% of the SDC instances of cluster 1 under attack.
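The two figures of merit quoted above can be made concrete with a short sketch. It assumes that the percentages are the true positive rate and false positive rate in their usual sense; the deliverable does not spell out the definitions. The function name `rates` and the example labels are purely illustrative and are not data from the experiment.

```python
def rates(truth, predicted):
    """Return (true positive rate, false positive rate) for boolean labels.

    truth[i] is True if time frame i really contained a coordinated attack,
    predicted[i] is True if the detector flagged it.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    tpr = tp / (tp + fn) if tp + fn else 0.0   # share of real attacks found
    fpr = fp / (fp + tn) if fp + tn else 0.0   # share of benign frames flagged
    return tpr, fpr

# Made-up example: 4 attacked time frames, 6 benign ones.
truth = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]
tpr, fpr = rates(truth, predicted)
```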
The experiment gives a proof-of-concept of the developed system and in particular of the applied algorithm to detect unexpected patterns in network traffic. For that, as described above, high and low numbers of TCP/IP packets for attacks associated with coordinated attacks were analysed. As expected, the algorithm performs better for a high number of malicious TCP/IP packets. However, a high number of malicious TCP/IP packets might be an unrealistic assumption. The experiments show that the algorithm still detects a share of the coordinated attacks for a low number of malicious TCP/IP packets.

4.3.2.6.2 Identification of relevant Twitter Tweets

Text classification is about assigning a text to one or more classes or categories. Social media platforms such as Twitter have tremendously increased the amount of publicly available text messages for analysis in the past few years. The rationale of such analysis is to gain insights about relevant topics early and in an automated way. Of particular importance for the energy grid is the identification of calls for coherent behaviour in the grid. Such coherent behaviour is, for example, caused by people following calls on social media for saving energy and therefore removing load from the grid, such as Earth Hour, Zero Emission Day or the International Dark Sky Week (see details below). The coherent removal of load from the grid leads to a synchronized reduction of the power demand, which may lead to an unstable grid status. In this experiment, the classification of Twitter tweets in general, and in particular with respect to calls for coherent behaviour, is analysed. Here, a short overview of the most important insights is given. A detailed description of the experiments is out of scope of the deliverable.

4.3.2.6.2.1 Datasets

The SemEval Workshop in 2016 provides a set of tweets. The task is to classify the tweets into positive, negative or neutral expressed opinions.
Tweets were downloaded between July and October 2015 via the Twitter API and cover popular socially relevant topics. Altogether 30,632 tweets are available for analysis. Tweets were post-processed using the platform Amazon Mechanical Turk. We refer to this data set as the SemEval data set. Tweets about calls for coherent behaviour in the energy grid were downloaded via the Twitter API. Tweets are classified according to whether they contain such a call or not. With these calls, people typically want to make a mark for the environment. For the analysis, historical tweets of different calls were used:


#earthhour was initiated by the WWF. Participants switch off lights for one hour in a predefined time frame to save energy.
#ZeDay stands for Zero Emission Day. People are invited to forgo the use of fossil fuels.
#idsw stands for International Dark Sky Week. People are invited to reduce all kinds of light pollution for a particular week.

We refer to this data set as the Hashtag data set. For a background signal, the Twitter API was used for one week to download tweets. Altogether 251,337 tweets were used in this experiment.

4.3.2.6.2.2 Method

For the text classification, several pre-processing steps are performed for each tweet. Tweets are transformed to lower case, and consecutive repetitions of a symbol within a word are reduced to a maximum of three, that is, the word “yesssss!” is transformed to “yesss!”. A Bag-of-Words model is used to describe the tweet’s text. For that, only words that occur at least n times in the entire dataset are used as the overall vocabulary. A parameter analysis for n is performed in this experiment as well. A multinomial Bayes classifier is used to assign the tweets’ texts to classes. The classifier is applied separately to the SemEval and the Hashtag data set. Various improvements were implemented and evaluated for the classifier. For confidentiality reasons, no details about these improvements are given in this deliverable. Among others, a simple measure, the error rate, is used to evaluate the performance of the classifier. For a given set of classified tweets, the error rate is defined as the ratio between the number of incorrectly classified tweets and the total number of analysed tweets. The lower the error rate, the more precisely the classifier works.

4.3.2.6.2.3 Results and Discussion

For the experiment with the SemEval data set, the classifier obtains an error rate of 48.21% if the vocabulary is defined as words that occur at least 4 times (n>=4). In other words, approximately one in two tweets gets correctly classified.
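The pipeline described in the Method chapter (lower-casing, capping repeated symbols at three, a Bag-of-Words vocabulary with a minimum count n, a multinomial Bayes classifier and the error rate) can be sketched with the Python standard library as follows. This is a minimal sketch, not the project's implementation: the confidential improvements are not reproduced, a plain multinomial naive Bayes with add-one smoothing is assumed, whitespace tokenisation is assumed, and all names and the toy tweets are hypothetical.

```python
import math
import re
from collections import Counter

def preprocess(tweet):
    """Lower-case the tweet and cap runs of a repeated symbol at three."""
    return re.sub(r"(\S)\1{3,}", r"\1\1\1", tweet.lower())

def tokenize(tweet):
    return preprocess(tweet).split()   # assumption: whitespace tokenisation

def build_vocabulary(tweets, n):
    """Keep only words that occur at least n times in the whole data set."""
    counts = Counter(tok for tweet in tweets for tok in tokenize(tweet))
    return {word for word, count in counts.items() if count >= n}

class MultinomialNaiveBayes:
    def fit(self, tweets, labels, vocabulary):
        self.vocabulary = vocabulary
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels))
                          for c in self.classes}
        self.word_counts = {c: Counter() for c in self.classes}
        for tweet, label in zip(tweets, labels):
            self.word_counts[label].update(
                w for w in tokenize(tweet) if w in vocabulary)
        self.totals = {c: sum(self.word_counts[c].values())
                       for c in self.classes}
        return self

    def predict(self, tweet):
        words = [w for w in tokenize(tweet) if w in self.vocabulary]

        def log_score(c):
            # Add-one (Laplace) smoothing over the vocabulary.
            denominator = self.totals[c] + len(self.vocabulary)
            return self.log_prior[c] + sum(
                math.log((self.word_counts[c][w] + 1) / denominator)
                for w in words)

        return max(self.classes, key=log_score)

def error_rate(true_labels, predicted_labels):
    """Incorrectly classified tweets divided by all analysed tweets."""
    wrong = sum(1 for t, p in zip(true_labels, predicted_labels) if t != p)
    return wrong / len(true_labels)

# Toy data, invented for illustration only.
train_tweets = [
    "Switch off your lights tonight #earthhour",
    "Lights off for one hour #earthhour",
    "Join us and switch off lights",
    "Great match tonight",
    "Traffic is terrible tonight",
    "Watching a great film",
]
train_labels = ["call", "call", "call", "other", "other", "other"]
vocabulary = build_vocabulary(train_tweets, n=2)
model = MultinomialNaiveBayes().fit(train_tweets, train_labels, vocabulary)

test_tweets = ["Please switch off the lights", "A great evening", "Lights tonight"]
test_labels = ["call", "other", "call"]
predictions = [model.predict(t) for t in test_tweets]
test_error = error_rate(test_labels, predictions)
```

Raising n shrinks the vocabulary, which is the parameter the text reports tuning (n>=4 for SemEval).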
For the experiment with the Hashtag data set, the classifier obtains an error rate of 1.81%. Here, a vocabulary with words that occur at least 5 times (n>=5) is used. The classifier performs better if rare words are removed from the vocabulary. Deeper analysis shows that the classifier performs better in identifying tweets that do not contain a call for coherent behaviour (negative class, error rate of 2.56%) than tweets that contain a call (positive class, error rate of 21.77%). However, in a real-world scenario, one can assume that plenty of tweets with calls for coherent behaviour will occur. Due to the mass of tweets, the classifier’s worse performance on tweets which contain such a call does not have a significant impact. If a tweet containing a call is missed, plenty of further tweets are still available for analysis. The experiments show a proof-of-concept for the automated classification of tweets. In particular, the experiment with the Hashtag data set may play an important role for energy providers as well as other critical infrastructure operators. An automated analysis of the Twitter stream becomes possible. In case of a significant increase of tweets with calls for coherent behaviour, an alert message can be transmitted to the controller of the critical infrastructure.

4.3.2.7 Relationship between SDC and CI-SOC

CI-SOC obtains data from various data sources inside the critical infrastructure. In the case of energy grids, the data stem from the utility and include, for instance, meter data, IT-related data such as firewall and SCADA log files, and energy-related data such as log files from the control room. These data can indicate that measurements for certain components exceed the user-defined thresholds at a critical level. CI-SOC is part of the critical infrastructure and part of the critical infrastructure’s information system. Before data can leave the domain of the critical infrastructure, it must be anonymised.
Hence, CI-SOC anonymises data before sending it to SDC. SDC and CI-SOC work in a 1:n relationship. This can be a 1:1 relationship or an SDC instance may cover several CI-SOC instances; for example, if separate utilities provide electricity, water and gas in an area, there might be a single SDC responsible for the area. The relationship between SDC and CI-SOCs is flexible and can be designed to accommodate local needs and circumstances.


SDC receives data from the local CI-SOC instances which contains information about the identified security incidents, the initiated countermeasures and measurements from the NORM devices (Interface 3, see Ch. 4.6.3). CI-SAN does not initiate countermeasures itself. In this respect, CI-SAN differs from CI-SOC, which autonomously performs countermeasure selection and initiation based on algorithms and the Countermeasures Knowledgebase Database (see Ch. 4.3.1). CI-SOC visualises the results at the local operator’s level.

4.4 SUCCESS API

4.4.1 Definition and Motivation

The SUCCESS Security Solution concerns three separate parts, as shown in Figure 18:

the Critical Infrastructures themselves;
the Security Operations Centres on CI-level which monitor the CI, analyse the CI data, detect security incidents, initiate countermeasures on a per-CI level and communicate about the anonymised data, incidents and countermeasures;
the Security Analytics Network which gathers and aggregates data from the CI-SOCs, analyses the data, detects security incidents, and communicates about the data and the incidents.

The two interfaces between these three parts of the SUCCESS Security Solution represent the major external interfaces defined in, and published by, the SUCCESS project. They are referred to as the SUCCESS API to highlight that they are meant to be supported by any system which implements the SUCCESS Security Solution. Referring to Figure 22, the SUCCESS API comprises:

I1 (interface between CIs and CI-SOC), called the CI-SOC API
I3 (interface between CI-SOC and SDC), called the CI-SAN API

The other interfaces shown in Figure 22 may be considered to be internal to one of the three parts and are not published in a fully-defined way by SUCCESS. All the SUCCESS interfaces, both the SUCCESS API and the others, are described in Ch. 0.

Figure 18: SUCCESS Security Solution

The CI-SOC API allows the CIs to pass data to the CI-SOC for security analysis. The data sources are CI-internal data sources and the CI-SOC also belongs inside the information domain of the CI operator. Hence, the CI-SOC API defines the interworking between the CI-SOC and the CI-internal data sources, such as smart meters, smart meter gateways, communications equipment, SCADA systems, firewalls, etc.

The CI-SAN API defines the interworking between the CI-SOC (which is part of the information domain of the CI) and the CI-SAN, which is a separate information domain. Hence, any personal data passed over this interface must be anonymised before leaving the CI-SOC. The CI-SAN API supports passing information about (1) security incidents, (2) triggering and advising of countermeasures, and (3) further payload between the SUCCESS components. Payload data is data about the grid status (obtained by NORM) or the IT infrastructure (obtained from, e.g., firewall log files).

The SUCCESS API is published for use by Critical Infrastructure operators (DSOs/TSOs). The main purpose in exposing the SUCCESS API is to allow interoperability of different components and a flexible, downstream implementation of additional threats and countermeasures. The API definition will be made openly available. However, the actual data passed on the API will be restricted and subject to security controls. In addition, no data related to private persons will be passed on the SUCCESS API. Data will be anonymised and private data protected.

The SUCCESS API provides unified definitions of how grid-state data can be made available between DSOs and the organisation(s) operating a CI-SAN comprising national or international SA Nodes and national or regional SDCs for gathering and aggregating security data. Analysis and comparison of these data at the different levels can reveal abnormalities which may be caused by physical or cyber-attacks. Therefore, the communication via the SUCCESS API implements the holistic security approach of SUCCESS, which includes multiple tiers for the detection of a security incident and the initiation of countermeasures. Chapter 4.5 gives an informal overview of the SUCCESS API’s features.
Each subchapter of Chapter 4.5 motivates the interface for the considered software component. The paragraphs “Communication Channels” give a more detailed description of each interface. There, the interfaces are distinguished in particular by sender, receiver and format. Appendix C gives an example of a formal definition of one RESTful service of the SUCCESS API. Interfaces I4, I5 and I6 describe interfaces between components of CI-SAN and are therefore not part of the official, publicly available SUCCESS API. Therefore, there are no subparagraphs “Communication Channel” for these interfaces in this document. However, the SUCCESS consortium plans to make documentation, similar to that of the remaining interfaces, available for internal usage in an additional confidential deliverable. Thereby, the SUCCESS consortium ensures that the CI-SAN components can be used by all consortium members. By not making interfaces I4, I5 and I6 publicly available, we provide quality assurance for all services that are provided by CI-SAN, as we ensure that only proprietary data processing services offer data to CI-SAN. Hence, CI-SAN provides data sovereignty for the end-users.

4.4.2 Documentation of the SUCCESS API

The CI-SAN API is designed for multiple use-cases, and therefore the CI-SAN uses the REST architecture as well as a Publish/Subscribe mechanism.

RESTful API – Request/Response Protocol

The CI-SAN API follows the REST architecture. REST services typically use HTTP, such that the services of the CI-SAN API are triggered by the HTTP methods GET, POST, PUT and DELETE. For each method, the services answer with a well-defined response: with GET, a resource is retrieved; with POST, a resource is created; with PUT, a resource is altered; and with DELETE, a resource is removed.
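The mapping of the four HTTP methods to resource operations can be sketched without any web framework. The following in-memory dispatcher is an illustration only: the `/incidents` path, the `IncidentStore` class and the payload fields are hypothetical and are not part of the published CI-SAN API; the status codes are the standard HTTP ones from Python's `http.HTTPStatus`.

```python
from http import HTTPStatus

class IncidentStore:
    """Toy resource collection showing GET/POST/PUT/DELETE semantics."""

    def __init__(self):
        self._items = {}
        self._next_id = 1

    def handle(self, method, path, body=None):
        parts = [p for p in path.split("/") if p]
        if not parts or parts[0] != "incidents" or len(parts) > 2:
            return HTTPStatus.NOT_FOUND, None
        if len(parts) == 1:
            if method != "POST":
                return HTTPStatus.METHOD_NOT_ALLOWED, None
            # POST on the collection creates a new resource.
            item_id = str(self._next_id)
            self._next_id += 1
            self._items[item_id] = dict(body or {})
            return HTTPStatus.CREATED, {"id": item_id}
        item_id = parts[1]
        if item_id not in self._items:
            return HTTPStatus.NOT_FOUND, None
        if method == "GET":        # retrieve the resource
            return HTTPStatus.OK, self._items[item_id]
        if method == "PUT":        # alter (replace) the resource
            self._items[item_id] = dict(body or {})
            return HTTPStatus.OK, self._items[item_id]
        if method == "DELETE":     # remove the resource
            del self._items[item_id]
            return HTTPStatus.NO_CONTENT, None
        return HTTPStatus.METHOD_NOT_ALLOWED, None

store = IncidentStore()
status_created, created = store.handle("POST", "/incidents", {"severity": "high"})
url = "/incidents/" + created["id"]
status_read, incident = store.handle("GET", url)
status_updated, updated = store.handle("PUT", url, {"severity": "low"})
status_deleted, _ = store.handle("DELETE", url)
status_missing, _ = store.handle("GET", url)
```

Each call carries everything needed to process it, which is the stateless property REST relies on.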
The documentation of the CI-SAN API differentiates between the HTTP verbs and gives the request’s header, which contains meta-information about the REST call, as well as the request’s body, which contains the payload. The payload is usually given in JSON or XML format. The documentation also covers the response message of the service. The response message always contains the HTTP status code of the request. Examples of HTTP status codes are 200 for OK, 201 for Created and 202 for Accepted. Due to the REST architecture, the CI-SAN API provides the following features:

Client-Server-Model,
Stateless,
Caching,
Uniform Interface,
Layered System, and
Code on Demand.

These features lead to various advantages for the CI-SAN API. First, encapsulation allows client and server to be developed independently. In particular, the server implementation usually becomes more lightweight. Second, by using stateless communication, that is, each operation is self-contained, the API of the services becomes clear and well-defined. Finally, REST is a resource-oriented architecture, that is, altering the resource leads to changes in the client’s state. Each resource has to be identifiable by a Uniform Resource Identifier (URI).

MQTT – Publish/Subscribe Protocol

MQTT is the short form of Message Queuing Telemetry Transport and is based on a publish/subscribe architecture. It consists of a centralised broker which relays all communication between endpoints. The MQTT protocol is built on TCP; therefore, a connection is established between an endpoint and the broker before any communication happens. An endpoint can be a Publisher, a Subscriber or both. A Subscriber subscribes to a topic with the broker. The Publisher publishes a message on the topic. All Subscribers of that topic receive the published message. Figure 19 depicts the MQTT architecture.
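The publish/subscribe pattern described above can be illustrated with a toy in-process broker. This is not an MQTT implementation: there is no TCP connection, no Connect/Ack handshake, no QoS and no topic wildcards. The class name `ToyBroker` and the topic string are hypothetical.

```python
from collections import defaultdict

class ToyBroker:
    """In-memory stand-in for a broker: routes messages by exact topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # A subscriber registers interest in a topic with the broker.
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # The publisher never addresses subscribers directly; the broker
        # forwards the message to everyone subscribed to the topic.
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(topic, payload)
        return len(callbacks)

broker = ToyBroker()
received = []
broker.subscribe("grid/norm-1/voltage", lambda t, p: received.append((t, p)))

delivered = broker.publish("grid/norm-1/voltage", b"230.1")
undelivered = broker.publish("grid/norm-2/voltage", b"229.8")  # no subscriber
```

Because publisher and subscriber only share a topic name, either side can be replaced without changing the other.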

Figure 19 MQTT architecture (diagram labels: Publisher (source), Broker (Public/Private Server), Subscriber (sink); messages: Connect, Connect Ack, Subscribe (topic), Subscribe Ack, Publish (topic, data))

MQTT has many advantages over other protocols being used in industry. It is a lightweight protocol with as little as 2 bytes of overhead. Moreover, MQTT provides reduced latency, better responsiveness, lower battery drain and lower bandwidth consumption.


It is a promising protocol for the SUCCESS architecture; therefore, MQTT is implemented for the interfaces I1, which describes the communication between NORM and BR-GW (CI-SOC API), and I2, which describes the communication between NORM and CI-SOC. All standard security features, such as SSL, PGP or OAuth, can be used to secure the communication.

4.4.3 Discussion on Data Models describing Cyber-Security Incidents

To support interoperability, to provide transparency to the user and thereby to increase the acceptance of the solution, CI-SAN makes use of widely adopted standards for describing cyber-security incidents and threats. For readability, in this document the term "Security Data Concentrator" (SDC) refers to the software and hardware element of the SUCCESS architecture as well as to the critical infrastructure (CI) the SDC instance is covering. The critical infrastructure is of interest as it is typically either under attack or at risk of being attacked. It is always clear from the context which meaning is used. In the same way, the term “attack” and the term “incident” are used interchangeably when an incident represents only one attack and not multiple ones.

4.4.3.1 Related Studies

Various data models have been proposed by the research community [21]. For example, Structured Threat Information eXpression (STIX) is a community-driven structured language to define cyber threat information. For that, STIX provides an architecture tying together cyber threat information including Cyber Observables, Indicators, Incidents, Adversary Tactics, Exploit Targets, Courses of Action, Cyber Attack Campaigns and Cyber Threat Actors. Moreover, Cyber Observable eXpression (CybOX) is a standardized schema for specifying events that are observable in the operational domain. CybOX uses objects describing cyber information. These objects can be, for example, E-Mail Message objects, Unix Process objects or Disk objects.
Both STIX and CybOX are developed by the United States Department of Homeland Security (DHS) and are used by various research projects such as CES-21 (see [24] for more details) or the ECOSSIAN project (see [19] for more details).

4.4.3.2 Data Model Requirements

The prospective data model of the CI-SAN API should be expressive enough for a comprehensive exchange of security incident information. In other words, the data should cover contextual data about incidents to facilitate an immediate response by CERTs that are or could be under attack. Contextual data could comprise the attack description, the method and strategy of the attacker, the countermeasures performed and any related recommendations. The CI-SAN API interfaces defined in the SUCCESS Security Solution (SSS) (see Ch. 4.5) support passing information about cyber-security incidents. The exchanged information is classified into two main types of messages: report messages and alert messages. They are defined as follows:

A report message describes a confirmed intrusion that has been identified by CI-SOC or SA Node. The intrusion-related information is then exchanged between CI-SOC, SDC and SA Node.
An alert message carries initial information about an attack that has occurred, as a warning for the SDC(s) with a high probability of being attacked as well.

For the SUCCESS Security Solution (SSS), the IODEF format has been selected to represent the information of the incident report message. To represent the alert message, IDMEF has been selected. The following chapters explain the IODEF and IDMEF data formats and the rationale for choosing them.

4.4.3.3 IODEF Format

The Incident Object Description and Exchange Format (IODEF) is a standard developed by the IETF and specified in Request for Comments (RFC) 5070 [19] [20]. IODEF is based on XML and is used to share incident information of common interest among Computer Emergency Response Teams (CERTs), which typically have an operational watch-and-warning responsibility over organizations. Cybersecurity information includes attack patterns, platform information, vulnerabilities and weaknesses, countermeasure instructions, computer event logs, and severity assessments. The IODEF format is also used for the representation of data and statistics needed to respond effectively to a security incident. The data model encompasses information about hosts, networks and the services running on systems. Moreover, it provides information about the attack methodology and can contain forensic evidence [21]. The XML representation leads to a lightweight architecture. Moreover, JSON-based formats are used in the CI-SAN API for ad-hoc communication. IODEF with JSON is a human-readable format which is also designed for machine processing. See the detailed description of each interface (Ch. 4.5.5, Ch. 4.5.6) for more details about the specific JSON messages. The IODEF format has a very expansive and complex structure, which makes it hard for CERTs to generate IODEF documents manually, as is typically the case nowadays. However, the complexity of an IODEF document is proportional to the number of fields it actually needs. IODEF is an object-oriented representation where fields are either classes or attributes. They are defined as follows:

An IODEF class is a description of a group of subclasses with common properties. For example, the IODEF-Document class contains attributes like version and sub-classes like Incident and AdditionalData.
An IODEF attribute is a single element which stores data. For example, the version attribute of the IODEF document stores a single value, the version number of the IODEF specification to which a certain IODEF document conforms.

Figure 20 The IODEF document class

An example of the IODEF attributes, classes and the relationships of each class with other classes is shown in Figure 20. The example shows the IODEF-Document class, which is the top-level class in the IODEF data model. The IODEF-Document class has required and optional attributes of different data types, such as version and xml:lang. The IODEF-Document class also has various aggregated classes of information for different uses. Instances of the subclasses are created zero, one or many times. In some cases, an instance of a subclass must be present exactly once as a mandatory class. An IODEF document as shown in Figure 21 must include at least an Incident class, which contains a purpose attribute and three mandatory-to-implement elements. These elements are:

GenerationTime class, which describes the time at which the incident document was generated,
IncidentID class, which identifies the IODEF document along with the name attribute, and
at least one Contact class, which contains the role and type attributes and describes contact information for organizations and personnel involved in the incident.

xml:lang attribute, which defines the supported language,
version attribute, which describes the version of the IODEF format [30].
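The mandatory elements and attributes listed above can be assembled into a minimal document with Python's standard `xml.etree.ElementTree`. This is a sketch under stated assumptions: the element and attribute names follow the text, but the namespace URI, the `version` value "2.00", the `purpose`, `role` and `type` values and the `Email`/`EmailTo` children of Contact are assumptions that should be checked against the IODEF specification and schema actually in use. The identifiers and addresses are invented.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

IODEF_NS = "urn:ietf:params:xml:ns:iodef-2.0"   # assumption, verify against the spec
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def q(tag):
    """Qualify a tag name with the IODEF namespace."""
    return "{%s}%s" % (IODEF_NS, tag)

def minimal_iodef(incident_id, issuer, contact_email, generated_at):
    """Build a minimal IODEF-Document as an XML string (illustrative only)."""
    ET.register_namespace("", IODEF_NS)   # serialise IODEF as default namespace
    doc = ET.Element(q("IODEF-Document"), {"version": "2.00", XML_LANG: "en"})
    incident = ET.SubElement(doc, q("Incident"), {"purpose": "reporting"})

    incident_id_el = ET.SubElement(incident, q("IncidentID"), {"name": issuer})
    incident_id_el.text = incident_id

    ET.SubElement(incident, q("GenerationTime")).text = generated_at.isoformat()

    contact = ET.SubElement(incident, q("Contact"),
                            {"role": "creator", "type": "organization"})
    email = ET.SubElement(contact, q("Email"))         # assumed child elements
    ET.SubElement(email, q("EmailTo")).text = contact_email
    return ET.tostring(doc, encoding="unicode")

xml_text = minimal_iodef(
    incident_id="2018-0001",                  # hypothetical identifier
    issuer="ci-soc.example.org",              # hypothetical issuer name
    contact_email="soc@example.org",
    generated_at=datetime(2018, 4, 30, 12, 0, tzinfo=timezone.utc),
)
```

Parsing `xml_text` back with `ET.fromstring` and validating it against the official schema would be the natural next step before exchanging such a document.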


Figure 21 Minimal IODEF-Document Class

4.4.3.4 IDMEF Format

The Intrusion Detection Message Exchange Format (IDMEF) is a standard data format designed to report alerts about events deemed suspicious [31]. The main use of IDMEF is to share information with a focus on intrusion detection events [32]. IDMEF was originally designed to be encoded in XML. The availability of tools for processing XML documents makes IDMEF a good choice. The IDMEF data format is mainly targeted at automated alert notification systems. The data model encompasses information on the possible sources and targets of the event and on the services that caused the event, if this information is available. Moreover, it provides an assessment of an event, its impact, the actions taken in response, as well as the confidence, which is an estimate of the validity and accuracy of the information provided [31]. The IDMEF data format opens the door for cooperation among the different IDSs used by CERTs and harmonizes the variation in the levels of information provided by different data types, such as network traffic, OS logs and application logs [34].

The IDMEF data model addresses several problems that appear when representing alert information. One example is the heterogeneity of alert messages. Each time an alert message is generated, different information is provided depending on availability. Alert messages can be generated with very little information, such as origin, destination, name and time of the event, or with more information if available, such as ports or services, processes and user information. Moreover, IDMEF is flexible enough to cover complex alerts that aggregate simpler alerts. IDMEF is a complex structure that is designed for automatic processing [33]. The complexity of the IDMEF format is due to the number of classes and attributes that form the structure of the model.
An IDMEF message must include at least:

- an Alert class, which contains a messageid attribute as a unique identifier for the alert and three mandatory-to-implement elements. These elements are:
  - Analyzer class, which describes the analyzer or probe that sent the message/alert,
  - CreateTime class, which describes the date at which the message/alert was created, and
  - Classification class, which allows a human operator to understand what a particular alert stands for.

- the version attribute, which describes the version of the IDMEF format [31].

4.4.3.5 Proposed Data Model for the SUCCESS API

IODEF has been chosen as the core data model used by CI-SAN for collecting structured event-based incident data. The rationale is that IODEF is an open standard evaluated by various CERTs; it is vendor-neutral in origin and, because it relies on XML, flexible enough to allow extensions and the grouping of events. The XML representation leads to a lightweight architecture. The decision for the data model was based on the number of features it offers.

Messages describing an incident, its countermeasures, or any further information can be provided in separate IODEF documents when this information is available. This can be realized by using the IODEF class RelatedActivity. Each Incident IODEF document has a unique identifier specified in the IncidentID class. The value of the RelatedActivity class should refer to the IncidentID of the incident document which contains further information.

Another property of the IODEF format is that the model can be extended to include additional information that is hard to represent in the main model. Although IODEF is a rich model of attributes and classes, there may be cases in which information relevant to the IODEF message does not fit any of the proposed IODEF attributes and classes. In such cases, IODEF can be extended by the AdditionalData class of type EXTENSION to represent this information [30].

IODEF also supports evidence, that is, information that substantiates a security incident. Evidence could include, for example, data from a syslog file, a data dump taken from the incident database of an intrusion detection system, or any other data that contributed to raising the alert or that could increase the confidence in the event. The evidence could be a summary report in PDF, raw data related to the detected attack in CSV, or an image. IODEF can carry an entire file as an evidence item or a link to that file [30].

For the exchange of alert messages, IDMEF has been chosen as the core model for the CI-SAN. An IDMEF message transmits technical information about an intrusion (or possible intrusion) between SDC(s) and/or between SA Node(s) and SDC(s), which in turn forward these alert messages to CI-SOC for further actions. The IDMEF data model is designed in a structured way for the easy representation of the alert message [35]. IDMEF provides a variety of classes for a better representation of the alert message.
For instance, the ToolAlert class provides further information related to the tools used in the attack or to malicious programs such as Trojan horses. The ToolAlert class helps in grouping previously sent alerts together by listing their IDs in the aggregate class alertident and by specifying, in the name class, the name of the tool that caused these alerts. This information helps the receiver of the alert message with control and supervision [31]. Since IDMEF is widely used by many IDSs installed by CERTs and by many automated alert notification systems, the received alert messages are easier to understand and to process further [34].

IODEF is based on IDMEF and is backward compatible with it [37]. Due to their related nature, data generated in IDMEF format can easily be converted to an IODEF document if needed. For example, through various extension mechanisms, an IODEF document can include or reference an IDMEF alert message as initial incident details. In this way, implementation compatibility problems can be avoided in the future.

Using both the IODEF and the IDMEF format results in full coverage of all information elements that describe a security incident. The well-defined structure of these formats gives semantics to these elements. IODEF and IDMEF are not competing but complementary formats [37][38]. In general, the lifetime of an IODEF message is much longer than that of an IDMEF message, which is used only once [39]. Two examples in plain language, which make the difference between IODEF and IDMEF messages clear, are as follows:

- IDMEF: At 14h01, someone tried a login with a wrong password on the Apache web server.
- IODEF: In September, we have been attacked many times by a Chinese IP address where a Man-in-the-Middle attack was used to penetrate our network. We found a solution which we present in this document.

The IDMEF message in the first example contains information about a failed login attempt by a user who could be an attacker. The information in the IDMEF message is used to warn the receiver about a possible attack that could happen in the future. In the second example, the IODEF message contains information about a confirmed incident composed of several attacks. Details like the time of the incident, the attacker's IP address, the method of the attack, and the actions applied to mitigate the attack have been provided.

Although IDMEF is a rich format with a wide range of attributes and classes, it has some limitations. For instance, elements describing attacks and vulnerabilities are not present in IDMEF, in contrast to IODEF, since these elements provide information about a confirmed intrusion, which contradicts the definition of the alert message sent in IDMEF format. In addition, enumerations are static in the IDMEF format, which is not compatible with the speed at which cybersecurity evolves. IODEF has solved this issue by allowing customized values: the enumerated attribute is set to ext-value, and the corresponding ext- attribute of the same class (such as ext-purpose or ext-action) is filled with the new value. Furthermore, IODEF provides the AdditionalData class as well as the Description class for every significant section of the message, whereas the IDMEF format offers only one shared additional data class linked to the alert.

Regardless of the potential of each of the two formats for representing alerts and incident reports, the different types of messages exchanged in CI-SAN require more than one data model and data format for better classification and further processing.

4.5 Data Model for the SUCCESS API

From previous projects and experience in the field of security and intrusion prevention and detection, the attack information that has been found to indicate malicious action is mainly: IP addresses, domains and URLs, attack methods, events, indicators, how they were discovered, and the assessment of the effects on the victim. The CI-SAN API has its own customized data models, based on the IODEF and the IDMEF data model, for each type of exchanged message. The proposed models are explained in the following sections.

SDC instances receive IODEF messages about incidents against different CIs. Specifying which CI a document belongs to is not covered by IODEF and IDMEF. Therefore, the IODEF model is extended to add information about the CI in a customized field that is mandatory for the CI-SAN API. The additional information is added by using the AdditionalData class, which indicates the concerned CI. IDMEF provides this information in the same manner. IODEF and IDMEF use a number of simple and complex types. The data types of the CI-SAN API data models can be found in [30] and [31].

The SUCCESS API IODEF and IDMEF data models include the attributes and classes listed below. The models differentiate between mandatory and non-mandatory fields:

- Mandatory fields must be provided for the IODEF message to be considered a valid message. Mandatory attributes are marked in bold.
- All other, non-mandatory fields will be provided when available. For example, some of these fields are more related to the type of the attack/incident or depend more on the context. Non-mandatory fields are marked in italic.

The IODEF data model is defined as follows:

version
xml:lang
Incident
  purpose
  ext-purpose
  GenerationTime
  Description
  IncidentID
    id
    name
  RelatedActivity
    IncidentID
      id
      name
  Contact
    type
    role
  EventData
    Description
    DetectTime
    Flow
      System
        Description
        category
        Node
          Location
          Address
            category
            AddressValue
    Record
      RecordData
        Description
        RecordItem
  Assessment
    IncidentCategory
  History
    HistoryItem
      action
      ext-action
      Description
      DateTime
      AdditionalData
  AdditionalData

Details about the IODEF data model can be found in Appendix C.1.

The IDMEF data model is defined as follows:

version
Alert
  messageid
  Analyzer
    name
    analyzerid
  CreateTime
    ntpstamp
  Classification
    text
  DetectTime
    ntpstamp
  Source
    ident
    Node
      location
      Address
        category
        address
  Target
    ident
    Node
      location
      name
      Address
        category
        address
  Assessment
    Impact
      severity
  AdditionalData

Details about the IDMEF data model can be found in Appendix C.2.
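As an illustration of the IDMEF data model above, the following sketch builds an alert with the listed classes and attributes. It follows the element and attribute conventions of [31] (for example ntpstamp as a hexadecimal NTP timestamp); the XML namespace is omitted for brevity. The function names, the ident values and the AdditionalData meaning label "concerned-ci" are assumptions made for this example and are not defined by the SUCCESS API.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NTP_UNIX_OFFSET = 2208988800  # seconds from 1900-01-01 (NTP epoch) to 1970-01-01


def ntpstamp(moment):
    """Encode a timezone-aware datetime as a hexadecimal NTP timestamp."""
    unix = moment.timestamp()
    whole = int(unix)
    fraction = int((unix - whole) * 2**32)
    return f"0x{whole + NTP_UNIX_OFFSET:08x}.0x{fraction:08x}"


def _add_endpoint(alert, role, ident, ip_address, location):
    """Append a Source or Target with one Node and one IPv4 address."""
    endpoint = ET.SubElement(alert, role, {"ident": ident})
    node = ET.SubElement(endpoint, "Node")
    ET.SubElement(node, "location").text = location
    address = ET.SubElement(node, "Address", {"category": "ipv4-addr"})
    ET.SubElement(address, "address").text = ip_address


def build_alert(messageid, analyzerid, text, source_ip, target_ip,
                severity, concerned_ci, when):
    """Build an IDMEF-Message with the fields of the data model above."""
    message = ET.Element("IDMEF-Message", {"version": "1.0"})
    alert = ET.SubElement(message, "Alert", {"messageid": messageid})
    ET.SubElement(alert, "Analyzer", {"analyzerid": analyzerid})
    for tag in ("CreateTime", "DetectTime"):
        stamp = ET.SubElement(alert, tag, {"ntpstamp": ntpstamp(when)})
        stamp.text = when.isoformat()
    _add_endpoint(alert, "Source", "src-1", source_ip, "unknown")
    _add_endpoint(alert, "Target", "tgt-1", target_ip, "unknown")
    ET.SubElement(alert, "Classification", {"text": text})
    assessment = ET.SubElement(alert, "Assessment")
    ET.SubElement(assessment, "Impact", {"severity": severity})
    # Hypothetical label for the customized field naming the concerned CI.
    extra = ET.SubElement(alert, "AdditionalData",
                          {"type": "string", "meaning": "concerned-ci"})
    ET.SubElement(extra, "string").text = concerned_ci
    return message
```

Calling `ET.tostring(build_alert(...))` yields the XML text of the alert. Carrying the concerned CI in AdditionalData mirrors the extension described in Ch. 4.5, but the exact field layout is defined in Appendix C.2 and may differ from this sketch.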


4.6 Interfaces in the SUCCESS Security Solution

As depicted in Figure 22, the SUCCESS Security Solution (see D4.3 [6] for more details) comprises various devices, which exchange information through the depicted interfaces. These interfaces define which data is shared.

Figure 22: SUCCESS Security Solution. Interfaces between the elements are denoted as I1…I8

4.6.1 I1 between NORM and BR-GW or CI-SOC

The I1 interface provides connectivity between NORM and BR-GW or, in configurations without BR-GW, between NORM and CI-SOC. All information exchanged between NORM and the other components (SA Node, power system applications) passes over this interface. The CI-SOC will handle the real-time information on various metrics that are measured by the smart meter and are proxied or processed by NORM, such as voltage levels, angles between voltages and frequency. This data will also be sent through the BR-GW. Details of the data are included in D3.8 [5]. Moreover, NORM will send PMU data towards CI-SOC. This data will be sent to the BR-GW, and the BR-GW will transmit it further to the CI-SOC. NORM will also send metadata and security-related data, such as tampering detection reports, access control statistics, credentials and challenge/response updates, to the BR-GW, where they will be processed by the BR-GW functions and further in CI-SOC and SA Node.

4.6.1.1 Communication Channels

4.6.1.1.1 Channel 1


Sender: NORM
Receiver: BR-GW
Short Description: metadata information
Format: JSON
Content: NORM will send its metadata information to BR-GW.

The metadata information from the NORM includes the mobile network security information, handshakes, IP address assignment, authentication and authorization. This information is always exchanged between the NORM (mobile user equipment) and the mobile network and with the BR-GW. This channel is not exposed to the user.

4.6.1.1.2 Channel 2

Sender: NORM
Receiver: BR-GW
Short Description: Data Centric Security check
Format: JSON
Content: NORM will send firmware and data signatures over this channel to BR-GW so that the BR-GW can detect any anomalies.

Use case 1: Integrity check of the NORM data
MQTT Topic: NORM/DSOSMC/SECAGENT
{
  "norm_ip": string,      // IP address of the NORM
  "request_id": string,   // alpha-numeric unique id of the hash request
  "timestamp": number,    // timestamp when the data is sent to UI-SOC
  "norm_data": string,    // NORM data (i.e. encryptionTimestamp encryptedData, as outlined at the end of step 4 description)
  "hashed_data": string   // hashed NORM data
}

Use case 2: Integrity check of the NORM firmware
Request:
MQTT Topic: NORM/DCSAGENT/DSOSMC
BRGW→DCSAgent - NORM: checkFirmwareStatusRequest
{
  "norm_ip": string,      // IP address of the NORM
  "request_id": string,   // alpha-numeric unique id of the request
  "request_ts": number    // timestamp when the request is generated
}
Response:
MQTT Topic: NORM/DSOSMC/DCSAGENT
{
  "norm_ip": string,      // IP address of the NORM
  "request_id": string,   // alpha-numeric id of the request
  "request_ts": number,   // timestamp when the request is generated
  "response_ts": number,  // timestamp when the response is generated
  "norm_data": string     // hash value of the firmware and the certificate generated based on the hash value
}

4.6.1.1.3 Channel 3

Sender: BR-GW
Receiver: NORM
Short Description: Action taken on NORM
Format: JSON
Content: BR-GW will take action / send commands to the NORM over this channel.

4.6.2 I2 between BR-GW and CI-SOC

Once CI-SOC has completed the incident countermeasures analysis and has identified, with some probability, the part of the network and the devices with suspicious behaviour, the CI-SOC component is responsible for selecting the best countermeasure among the available ones in order to resolve the identified violations as far as possible. One possible countermeasure is to communicate information on compromised devices and detected pattern anomalies to the BR-GW. The same is possible in the other direction: the BR-GW can detect security anomalies and send alarms about them to the CI-SOC. Hence, there is a bidirectional exchange of data over interface I2, as shown in Figure 22. The functions described in Ch. 4.2.4, such as Data Centric Security, will notify CI-SOC over this interface.

4.6.2.1 Communication Channels

4.6.2.1.1 Channel 1

Sender: CI-SOC
Receiver: BR-GW
Short Description: Information Forward and/or actions from CI-SOC
Format: JSON
Content: CI-SOC will talk to BR-GW for any actions / commands that need to be forwarded to NORM.
Use case 1: CI-SOC wants to run the integrity check on NORM firmware via BR-GW
MQTT Topic: BRGW/DCSAGENT/DSOSMC/FIRMWARE
CI-SOC→BRGW: checkFirmwareIntegrity Request
{
  "norm_ip": string,      // IP address of the NORM
  "request_id": string,   // alpha-numeric id of the request, "INTEGRITY_VERIFICATION_<random string>"
  "request_ts": number    // timestamp when the request is generated
}
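The request of use case 1 can be sketched as follows. The topic and the field names are taken from the description above; the function names, the length of the random suffix and the choice of seconds since the Unix epoch for request_ts are assumptions, since the text does not fix them.

```python
import json
import secrets
import time

# Topic named in the description of use case 1.
FIRMWARE_CHECK_TOPIC = "BRGW/DCSAGENT/DSOSMC/FIRMWARE"


def build_firmware_check_request(norm_ip, now=None):
    """Build a checkFirmwareIntegrity request for the given NORM."""
    suffix = secrets.token_hex(8)  # assumed form of the random string
    return {
        "norm_ip": norm_ip,
        "request_id": f"INTEGRITY_VERIFICATION_{suffix}",
        # Assumption: timestamps are seconds since the Unix epoch.
        "request_ts": int(now if now is not None else time.time()),
    }


def encode_for_mqtt(message):
    """Serialize a message to the UTF-8 JSON payload sent over MQTT."""
    return json.dumps(message, separators=(",", ":")).encode("utf-8")
```

The resulting payload would then be published on FIRMWARE_CHECK_TOPIC with whichever MQTT client the deployment uses. The sender should remember the request_id so that the later response can be matched to the request.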


Use case 2: CI-SOC wants to block NORM traffic
MQTT Topic: DSOSMC/BRGW/COUNTERMEASURE/BLOCKNORM
CI-SOC→BRGW: blockNORM Request
{
  "norm_ip": string,      // IP address of the NORM
  "request_id": string,   // alpha-numeric id of the request
  "request_ts": number    // timestamp when the request is generated
}
BRGW→CI-SOC: blockNORM Response
{
  "norm_ip": string,      // IP address of the NORM
  "request_id": string,   // alpha-numeric id of the request
  "request_ts": number,   // timestamp when the request is generated
  "blocked": boolean      // true if NORM traffic was successfully blocked, false otherwise
}

4.6.2.1.2 Channel 2

Sender: BR-GW
Receiver: CI-SOC
Short Description: Security breach notification
Format: JSON
Content: DCS running on BR-GW will notify CI-SOC of any detected anomalies on NORM. This type of information will be sent to CI-SOC over this channel.

MQTT Topic: BRGW/DSOSMC/DCSAGENT
BRGW→DSO-SMC: checkFirmwareIntegrity Response
{
  "norm_ip": string,              // IP address of the NORM
  "request_id": string,           // alpha-numeric id of the request
  "request_ts": number,           // timestamp when the request is generated
  "response_ts": number,          // timestamp when the response is generated
  "norm_data": string,
  "verification_status": string   // possible values are OK or NOT OK
}
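On the receiving side, the checkFirmwareIntegrity response can be checked before it is acted upon. The following sketch assumes that the receiver keeps a set of pending request ids; the function name and the error handling are hypothetical, while the field names and the two status values come from the message description above.

```python
import json

# Expected JSON type of each field of the checkFirmwareIntegrity response.
RESPONSE_FIELDS = {
    "norm_ip": str,
    "request_id": str,
    "request_ts": (int, float),
    "response_ts": (int, float),
    "norm_data": str,
    "verification_status": str,
}


def parse_firmware_check_response(payload, pending_requests):
    """Return True if the firmware was verified, False if it was not.

    Raises ValueError if the payload is malformed or does not answer one
    of the pending requests.
    """
    message = json.loads(payload)
    if not isinstance(message, dict):
        raise ValueError("response is not a JSON object")
    for field, expected in RESPONSE_FIELDS.items():
        value = message.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"field {field!r} is missing or has the wrong type")
    if message["request_id"] not in pending_requests:
        raise ValueError("response does not match a pending request")
    if message["verification_status"] not in ("OK", "NOT OK"):
        raise ValueError("unknown verification_status")
    pending_requests.discard(message["request_id"])
    return message["verification_status"] == "OK"
```

A result of False would be the trigger for a countermeasure such as the blockNORM request of use case 2; how the two are coupled is a design decision outside this sketch.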


4.6.2.1.3 Channel 3

Sender: BR-GW
Receiver: CI-SOC
Short Description: Information Forward
Format: JSON
Content: BR-GW will be able to forward any data / messages received from the NORM to CI-SOC.

4.6.2.1.4 Channel 4

CI-SOC can receive the following info from the Edge Cloud:

- Cloud resources status;
- Virtual functions topology, describing the relationships among virtual functions and physical grid resources (physical network segments).

SDC can send instructions for the relocation of virtual instances to the Edge Cloud, e.g. which virtual instance will be migrated, shut down etc.

Sender: Edge Cloud
Receiver: CI-SOC
Short Description: Edge Cloud events
Format: JSON
Content: The edge cloud continuously monitors its CPU resources and sends the measurements to SDC.

4.6.3 I3 between Critical Infrastructure Security Operations Centre and SDC

4.6.3.1 Description

Countermeasures related to attacks through the Neighbourhood Area Network (NAN) zone are addressed by CI-SOC. CI-SOC matches and correlates potential new and old incidents (with high impact and risks detected within WP1) with the most suitable security actions and countermeasures for the specific application scenario related to NAN-level smart devices. CI-SOC selects countermeasures from a countermeasures repository that will be incrementally populated. CI-SOC will share with SDC information about the detected threats and incidents, the initiated countermeasures, as well as pre-processed data received from NORM.

At TSO/DSO level, SDC and CI-SOC interwork in a 1:1 relationship, where CI-SOC provides one among many data sources for SDC. This holds for the SUCCESS project's instantiation of the SUCCESS Security Solution; the architecture of the SUCCESS Security Solution also supports a 1:n relationship.

In more detail, the data exchange between CI-SOC and SDC is bidirectional. On the one hand, CI-SOC analyses grid-related measurements (voltages, power and frequencies) and sends potential anomalies to SDC. This information allows a meta-analysis to be carried out by CI-SAN at pan-European level, which may also locate the detected security incident if NORM data is associated with its position obtained from a Geographic Information System (GIS) of the DSO.
On the other hand, CI-SAN analyses data obtained from DSOs/TSOs across Europe and combines them with external data sources such as social data (e.g. Twitter streams) to identify new trends in the energy sector. This information is shared with CI-SOC. Also, if CI-SAN marks a specific set of NORMs as being corrupted, then the SA Node informs the CI-SOC about these compromised NORMs. Interface I3 is part of the SUCCESS API along with I1.


4.6.3.2 Communication Channels

See Appendix B for a formal definition of Channel 3, Channel 5 and Channel 6. For the remaining channels, see Deliverable 3.4.

4.6.3.2.1 Channel 1

Sender: SDC
Receiver: CI-SOC
Short Description: Attack found
Exchanged data: Report message
Content: SDC has received an alarm message describing one or more attacks/incidents detected by SA Node, which it then forwards to CI-SOC. Each incident can be described by an IODEF message. The IODEF document must contain at least all mandatory attributes and classes of the IODEF model (see Chapter 4.4.3). The IODEF message is exchanged via the HTTP protocol.

4.6.3.2.2 Channel 2

Sender: SDC
Receiver: CI-SOC
Short Description: Twitter results
Format: IDMEF
Content: SDC receives security-related trends which were derived from social media channels by SA Node. SDC forwards them to CI-SOC. CI-SOC informs its authorities about the finding via its dashboard. The finding is described in JSON format, including at least a free-text description of the social media trend.

4.6.3.2.3 Channel 3

Sender: CI-SOC
Receiver: SDC
Short Description: Attack identified
Format: IODEF
Content: CI-SOC informs SDC about all detected threats or incidents which CI-SOC finds in the field, in particular among the monitored NORM devices. These incidents are described in IODEF messages. The IODEF document must contain at least all mandatory attributes and classes of the IODEF model (see Chapter 4.4.3). The IODEF message is exchanged via the HTTP protocol.
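The exchange of an IODEF message via HTTP, as used by Channel 1 and Channel 3, can be sketched with the Python standard library. The endpoint URL, the use of the POST method and the content type are assumptions for illustration; the formal definition of the channels is given in Appendix B.

```python
import urllib.request


def build_iodef_post(endpoint_url, iodef_xml):
    """Prepare an HTTP POST request that carries an IODEF document.

    endpoint_url is a hypothetical receiver address; iodef_xml is the
    serialized IODEF document as bytes.
    """
    return urllib.request.Request(
        endpoint_url,
        data=iodef_xml,
        headers={"Content-Type": "application/xml; charset=utf-8"},
        method="POST",
    )
```

The request is only prepared here. Sending it would be done with `urllib.request.urlopen(request, timeout=10)`, after which the sender would inspect the HTTP status code returned by the receiver.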


4.6.3.2.4 Channel 4

Sender: CI-SOC
Receiver: SDC
Short Description: Countermeasure applied
Format: IODEF
Content: CI-SOC informs SDC about all applied countermeasures and actions performed. The countermeasure information is provided in the IODEF format described in Chapter 4.4.3.

4.6.3.2.5 Channel 5

Sender: SDC
Receiver: CI-SOC
Short Description: Alert message initiated
Format: IDMEF
Content: SDC has received an alert message from SA Node, which it then forwards to CI-SOC. The alert message notifies CI-SOC when an attack has been detected against another SDC. Attacks against SDCs with a similar hardware and software configuration are of interest to the CI-SOC because vulnerabilities are likely to be shared. The alert message is described in IDMEF format. The IDMEF message contains all mandatory fields (see Chapter 4.4.3). The IDMEF message is exchanged via the HTTP protocol.

4.6.3.2.6 Channel 6

Sender: CI-SOC
Receiver: SDC
Short Description: Input measured values description
Format: MQTT
Content: CI-SOC provides SDC with critical infrastructure related data; SDC in turn forwards these data to SA Node for analysis. Data sources can be IT-related systems such as firewall log files from the DSO/TSO, SCADA log files, meter data, and any utility-internal related data.

4.6.4 I4 between SDC instances

The interface I4 shall enable the exchange of attack information between SDC instances. In the current instantiation of the SUCCESS architecture within the project, this communication is proposed to be activated when the SA Node is unavailable. The interface then serves as an alternative way for the SDC instances to share information regarding detected attacks and identified attack patterns from the CI-SOC level. The interface I4 can be established at a local or national level, as shown in Figure 24.
The dotted lines imply that communication may or may not be established between instances from different regions according to a set of predefined criteria such as type of attack, critical infrastructure peer type, HW and SW configuration etc. For comparison, Figure 23 depicts the CI-SAN when the SA Node is fully operational. Note that each SDC shares with the SA Node instance a larger amount of data than with the rest of the SDC instances over I4. This is because the SA Node uses the SDC data to detect threats on a European level, whereas the SDC instances simply forward attack notifications and patterns created at CI level by a CI-SOC.
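How such predefined criteria could decide whether an attack notification is forwarded over I4 is sketched below. The policy itself is hypothetical: the document only names the kinds of criteria (type of attack, critical infrastructure peer type, HW and SW configuration), so the class names, the fields and the rule for peers in other regions are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SdcPeer:
    name: str
    region: str
    ci_type: str        # critical infrastructure peer type, e.g. "DSO"
    hw_sw_profile: str  # label summarizing the HW and SW configuration


@dataclass(frozen=True)
class AttackNotice:
    attack_type: str
    origin: SdcPeer


def should_forward(notice, peer, sa_node_available, cross_region_attack_types):
    """Decide whether notice is sent to peer over I4 (assumed policy)."""
    # I4 is only used as a fallback while the SA Node is unavailable.
    if sa_node_available or peer == notice.origin:
        return False
    # Peers in the same region always receive the notification.
    if peer.region == notice.origin.region:
        return True
    # Across regions, forward only selected attack types to similar peers.
    similar = (peer.ci_type == notice.origin.ci_type
               or peer.hw_sw_profile == notice.origin.hw_sw_profile)
    return notice.attack_type in cross_region_attack_types and similar
```

In this sketch the dotted lines of Figure 24 correspond to the last rule: a cross-region link is used only when the attack type is on the agreed list and the peer is similar to the origin.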


Figure 23: SUCCESS CI-SAN when SA Node is available

Figure 24: SUCCESS in case of SA Node unavailability: I4 between all SDC

4.6.5 I5 between SDC and SA Node

Interface 5 (I5) in Figure 13 comprises the exchange between the SA Node and its corresponding SDC instances. As described in Ch. 4.3.2.3, the data on the SDC are aggregated and anonymised data originating from the critical infrastructure information level. SDC transmits them to the European information level. Thus, Interface 5 separates the European information level, which comprises the SA Nodes, from the multiple critical infrastructure information levels, whose top layer is embodied in the SDC instance and which include, among others, the respective CI-SOC instances for each critical infrastructure.

The SA Node data domain can be considered as “public” within the CI-SAN system, since the SA Node aggregates and processes the data and patterns from all SDCs, including open data from external sources. If multiple SA Nodes form a network, the access to “public” data may be restricted according to the applicable privacy and data sharing regulations between countries.

The data that the SDC shares with the SA Node include pattern-specific data, even if the CI-SOC cannot detect any significant abnormalities, and of course any identified attacks/threats. The SA Node will share through I5 any identified patterns, weighted with the results from external data sources (see Ch. 4.6.6). Since these data comprise information that is “public” in the CI-SAN sense, the SA Node part of I5 does not need a further anonymisation process. The data flow between SDC and SA Node can be summarized with three use cases:

- Identified cyber-attack(s) and applied countermeasure(s): sent from SDC(s) to SA Node
- Alert about an increased probability of cyber-attack: sent from SA Node to SDC(s)
- Alarm about detected single/coordinated attack(s): sent from SA Node to SDC(s)
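The three use cases can be captured in a small lookup table, for instance to reject messages that arrive from the wrong direction. The directions are taken from the list above. The format column is an assumption inferred from the I3 channel descriptions (incident and alarm reports in IODEF, alerts in IDMEF), and all names in the sketch are hypothetical.

```python
from enum import Enum


class Endpoint(Enum):
    SDC = "SDC"
    SA_NODE = "SA Node"


# use case -> (sender, receiver, assumed message format)
I5_FLOWS = {
    "identified_attack_and_countermeasure": (Endpoint.SDC, Endpoint.SA_NODE, "IODEF"),
    "alert_increased_probability": (Endpoint.SA_NODE, Endpoint.SDC, "IDMEF"),
    "alarm_detected_attack": (Endpoint.SA_NODE, Endpoint.SDC, "IODEF"),
}


def is_allowed(use_case, sender, receiver):
    """Check that a message of the given use case flows in the listed direction."""
    flow = I5_FLOWS.get(use_case)
    return flow is not None and flow[:2] == (sender, receiver)
```

For example, `is_allowed("alarm_detected_attack", Endpoint.SA_NODE, Endpoint.SDC)` holds, whereas the same use case in the opposite direction is rejected.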


4.6.6 I6 between SA Node/SDC and Critical Infrastructure-External Data Sources

Interface 6 in Figure 13 comprises the communication content between SA Node and data sources which are external to the Critical Infrastructures. External data sources may vary significantly in the type of data that they provide. For example, it has been found that social media can provide meaningful insights for the secure operation of critical infrastructure (see Chapter 4.3.2.6.2). Here, I6 delivers data from social media to be analysed by SA Node. Moreover, Interface 6 ensures that the data from these sources are processed and filtered adequately. Social media analysis is performed by the SA Node and the results are shared with SDC instances via Interface 5 (see Ch. 4.6.5).

4.6.7 I7 between CI-SOC and Critical Infrastructure-internal data sources

CI-SOC can further receive data from internal data sources via Interface 7 in Figure 13. These data are called Critical Infrastructure-internal since they come from within the Critical Infrastructure’s data domain. Thus, the data are subject to the data privacy and security restrictions applicable inside the Critical Infrastructure’s data domain. Such data can potentially stem from (1) energy-related sources such as SCADA systems and control room software; and (2) IT-security sources such as SIEM systems, firewalls or antivirus scanners. The functionality of Interface 7 focuses on extracting the necessary content, similar to the functionality of Interface 6. CI-SAN obtains the desired data and forwards them via I3 to SDC instances (see Chapter 4.6.3.2.6 for details).


5. Conclusion

This deliverable describes several components and interfaces of the SUCCESS Security Solution: the Critical Infrastructure Security Analytics Network (CI-SAN) and the Breakout Gateway. All of these components are involved in the detection of security incidents and the implementation of countermeasures to mitigate the incidents. The set of countermeasures to the cyber-security threats applicable to electrical grids has been outlined in this deliverable. The SUCCESS Security Solution is demonstrated in three field trials in electrical grids in Ireland, Italy and Romania. Each field trial implements use cases where cyber-attacks are emulated and the relevant countermeasure is applied using the components of the SUCCESS Security Solution. The descriptions of the components and interfaces result from the implementation, testing and integration of the components and from the use of the components in the field trials.


SUCCESS D4.6 v1.0 Page 60 (97) 6. References 1. SUCCESS D1.2 v1.0, “Identification of existing threats V1”, April 2017 2. SUCCESS D1.4 v1.0, “Threat Classification and Risk Analysis”, October 2017 3. SUCCESS D2.5 v1.0, “The Resilience by Design Concept V2”, April 2018 4. SUCCESS D3.6 v1.0, “Information Security Management Components and Documentation, V3”, April 2018 5. SUCCESS D3.8 v1.0, “Next Generation Smart Meter, V2”, April 2018 6. SUCCESS D4.3 v1.0, “Solution Architecture and Solution Description, V3”, April 2018 7. SUCCESS D4.8 v1.0, “Integration and Validation Plan - Test and certification specifications, V2”, July 2017 8. SUCCESS D5.1 v1.0, “Trial Site Planning”, April 2017 9. SUCCESS D5.2 v1.0, “Operational Trial Results, V1”, April 2018 10. Angioni, Andrea, et al. "Coordinated voltage control in distribution grids with LTE based communication infrastructure”, Environment and Electrical Engineering (EEEIC), 2015 IEEE 15th International Conference on. IEEE, 2015. 11. Dahlman, Erik, Stefan Parkvall, and Johan Skold, “4G: LTE/LTE-advanced for mobile broadband”, Academic press, 2013. 12. Dohler, Mischa, and Takehiro Nakamura, “5G Mobile and Wireless Communications Technology”, Eds. Afif Osseiran, et al. Cambridge University Press, 2016. 13. Pau, Marco, et al. "Low voltage system state estimation based on smart metering infrastructure”, Applied Measurements for Power Systems (AMPS), 2016 IEEE International Workshop on. IEEE, 2016. 14. 3rd Generation Partnership Project “Technical Specification Group Services and System Aspects; Generic Authentication Architecture (GAA); Generic Bootstrapping Architecture (GBA)” Release 13, 2016 15. ECOSSIAN D1.2 V1.0,” Requirements Report “, March 2015, stored at http://ecossian.eu/downloads/D1.2-Requirements-PU-M09.pdf (accessed 20161223) 16. Teixeira, André, Dán, György, Sandberg, Henrik, Berthier, Robin, Bobba, Rakesh B. 
and Valdes, Alfonso, “Security of Smart Distribution Grids: Data Integrity Attacks on Integrated Volt/VAR Control and Countermeasures',' in Proc. of American Control Conference (ACC), Jun. 2014 17. Nick McKeown, Tom Anderson, Hari Balakrishnan, Guru Parulkar, Larry Peterson, Jennifer Rexford, Scott Shenker, and Jonathan Turner. 2008. OpenFlow: enabling innovation in campus networks. SIGCOMM Comput. Commun. Rev. 38, 2 (March 2008), 69-74. DOI=http://dx.doi.org/10.1145/1355734.1355746 18. Perez, Ronald, Reiner Sailer, and Leendert van Doorn. "vTPM: virtualizing the trusted platform module." Proc. 15th Conf. on USENIX Security Symposium. 2006 19. Danyliw, R., Meijer, J. and Demchenko Y.: The Incident Object Description Exchange Format, 2007. https://www.ietf.org/rfc/rfc5070.txt, last access: 19th June 2017 20. Farnham, G. and Leune, K.: Tools and Standards for Cyber Threat Intelligence Projects. GIAC (GCPM) Gold Certification, 2013. 21. ENISA (European Union Agency for Network and Information Security): Detect, SHARE, Protect – Solutions for Improving Threat and Data Exchange among CERTs, 2013. 22. Settanni, G. , Skopik, F. , Shovgenya, Y. , Fiedler, R. , Carolan, M. , Conroy, D. , Boettinger, K. ,Gall, M. , Brost, G. , Ponchel, C. , Haustein, M. , Kaufmann, H. , Theuerkauf, K. , Olli, P. “A collaborative cyber incident management system for European interconnected critical infrastructures” Journal of Information Security and Applications, May 2016 23. https://stixproject.github.io/, last access: 19th June 2017


24. http://nationalinterest.org/blog/the-buzz/how-california-protecting-its-critical-infrastructure-cyber-18366, last access: 19th June 2017
25. Moriarty, K.: Real-time Inter-network Defense (RID), 2012, https://tools.ietf.org/html/rfc6545, last access: 19th June 2017
26. Kreps, J., Narkhede, N. and Rao, J.: Kafka: a Distributed Messaging System for Log Processing, Proceedings of 6th International Workshop on Networking Meets Databases (NetDB), 2012
27. http://www.trustedcomputinggroup.org/trusted-platform-module-tpm-summary
28. 2017 Ponemon Cost of Data Breach Study, https://www.ibm.com/security/data-breach
29. http://cloudpages.ericsson.com/hubfs/Content-Offers/Ericsson%20NFV-Telco%20Final.pdf
30. Roman Danyliw, “The Incident Object Description Exchange Format Version 2”, RFC 7970, 2016, https://datatracker.ietf.org/doc/rfc7970/, last access: 19th March 2018
31. Benjamin Feinstein, David Curry and Herve Debar, “The Intrusion Detection Message Exchange Format (IDMEF)”, RFC 4765, 2007, https://www.ietf.org/rfc/rfc4765.txt, last access: 19th March 2018
32. Jessica Steinberger, Anna Sperotto, Mario Golling and Harald Baier, “How to Exchange Security Events? Overview and Evaluation of Formats and Protocols”, 2015 IFIP/IEEE International Symposium on Integrated Network Management (IM2015), https://www.dasec.h-da.de/wp-content/uploads/2012/02/IMa2015.pdf, last access: 19th March 2018
33. Intrusion_Detection_Message_Exchange_Format, In Wikipedia, 2017, https://en.wikipedia.org/wiki/Intrusion_Detection_Message_Exchange_Format, last access: 19th March 2018
34. Cristina Hoepers, Nandamudi L. Vijaykumar and Antonio Monte, “HIDEF: a Data Exchange Format for Information Collected in Honeypots and Honeynets”, 2008
35. SECEF Project, http://www.secef.net/en/idmef-format/, last access: 19th March 2018
36. Irvine, “KDD Cup 1999 Data”, 1999, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html, last access: 19th March 2018
37. J. Arvidsson, Telia CERT, A. Cormack, JANET-CERT, Y. Demchenko, TERENA, J. Meijer and SURFnet, “TERENA's Incident Object Description and Exchange Format Requirements”, 2001
38. Yuri Demchenko, Roman Danyliw, “Extended Incident Handling BOF (inch)”, 2010, https://www.ietf.org/ietf-ftp/01dec/inch.txt, last access: 19th March 2018
39. Yuri Demchenko, “IODEF Design principles and IODEF Data Model Overview”, 2002
40. “XML Schema Part 2: Datatypes Second Edition”, 2004, https://www.w3.org/TR/xmlschema-2/, last access: 19th March 2018
41. “Extensible Markup Language (XML) 1.0 (Fifth Edition)”, 2013, https://www.w3.org/TR/2008/REC-xml-20081126/, last access: 23rd March 2018
42. A. Phillips and M. Davis, “Tags for Identifying Languages”, 2009, https://tools.ietf.org/pdf/rfc5646.pdf, last access: 19th March 2018
43. Roman Danyliw, Takeshi Takahashi, “Incident Object Description Exchange Format v2 (IODEF)”, 2016, https://www.iana.org/assignments/iodef2/iodef2.xhtml#incident-purpose, last access: 19th March 2018
44. Tavallaee, M., et al., “A detailed analysis of the KDD CUP 99 data set”, IEEE Symposium on Computational Intelligence for Security and Defense Applications, 2009


7. List of Abbreviations
3GPP      3rd Generation Partnership Project, mobile communications standardisation body
4G        4th Generation mobile communications system
5G        5th Generation mobile communications system
API       Application Programming Interface
BR-GW     Breakout Gateway
BSF       Bootstrapping Server Function
CA        Certificate Authority
CI-SAN    Critical Infrastructure Security Analytics Network
CI-SOC    Critical Infrastructure Security Operations Centre
CS        Cyber-security
DCS       Data Centric Security
DoS       Denial of Service
DDoS      Distributed Denial of Service
DSO       Distribution System Operator
EPC       Evolved Packet Core, core network of LTE system
E-UTRAN   Evolved Universal Terrestrial Radio Access Network, radio access network of LTE system
EV        Electric Vehicle
FLISR     Fault Location, Isolation, and Service Restoration
FIWARE    Future Internet Ware, open source components, development environment, platform
GBA       Generic Bootstrapping Architecture
GE        Generic Enabler
HSS       Home Subscriber Server, part of EPC network
ICT       Information and Communication Technology
LTE       Long Term Evolution (4th Generation mobile communications system)
M2M       Machine-to-Machine
MME       Mobility Management Entity, part of EPC network
NAF       Network Application Function
NAN       Neighbourhood Area Network
NFV       Network Function Virtualisation
NORM      Next-generation Open Real-time Smart Meter
P-GW      Packet Data Network Gateway, part of EPC network
PMU       Phasor Measurement Unit
PUF       Physically Unclonable Function
PS        Physical security
QoS       Quality of Service
SA Node   Security Analytics Node
SCADA     Supervisory Control and Data Acquisition
SDC       Security Data Concentrator


SDN       Software Defined Network
S-GW      Serving Gateway, part of EPC network
SSS       SUCCESS Security Solution
TEE       Trusted Execution Environment
TPM       Trusted Platform Module
TSO       Transmission System Operator
UE        User Equipment, handset in EPC network
USM       Unbundled Smart Meter
UUID      Universally Unique Identifier
VM        Virtual Machine


8. List of Figures
Figure 1: SUCCESS Components in their Environment
Figure 2: LTE for Smart Grid Communication
Figure 3: 3GPP domain and extension of 3GPP security features
Figure 4: Breakout Gateway
Figure 5: Conceptual diagram of Breakout Gateway
Figure 6: NFV Transformation [29]
Figure 7: Data Centric Security
Figure 8: GBA functions applied at Breakout Gateway
Figure 9: Breakout gateway logical domains
Figure 10: Test lab setup
Figure 11: A conceptual view of the CCRO
Figure 12: The workflow of the CCRO
Figure 13: CI-SAN and critical infrastructures in the SUCCESS Security Solution
Figure 14: CI-SAN and Utilities in the SUCCESS Security Monitoring Solution
Figure 15: SDC Data Flow
Figure 16: SA Node Data Flow
Figure 17: Malicious connections on the network of the energy provider ASM Terni
Figure 18: SUCCESS Security Solution
Figure 19: MQTT architecture
Figure 20: The IODEF document class
Figure 21: Minimal IODEF-Document Class
Figure 22: SUCCESS Security Solution. Interfaces between the elements are denoted as I1…I8
Figure 23: SUCCESS CI-SAN when SA Node is available
Figure 24: SUCCESS in case of SA Node unavailability: I4 between all SDC

Page 65: SUCCESS D4.6 v28

9. List of Tables
Table 1: Cyber-security related incidents and countermeasures
Table 2: Physical security related incidents and countermeasures
Table 3: Threat to Countermeasure Mapping
Table 4: Fields of IODEF-Document
Table 5: Fields of Incident
Table 6: Fields of IncidentID
Table 7: Fields of RelatedActivity
Table 8: Fields of Contact
Table 9: Fields of EventData
Table 10: Fields of Flow
Table 11: Fields of System
Table 12: Fields of Node
Table 13: Fields of Address
Table 14: Fields of Record
Table 15: Fields of RecordData
Table 16: Fields of Assessment
Table 17: Fields of History
Table 18: Fields of HistoryItem
Table 19: Fields of IDMEF-Message
Table 20: Fields of Alert
Table 21: Fields of Analyzer
Table 22: Fields of CreateTime
Table 23: Fields of DetectTime
Table 24: Fields of Source
Table 25: Fields of Target
Table 26: Fields of Node
Table 27: Fields of Address
Table 28: Fields of Classification
Table 29: Fields of Assessment
Table 30: Fields of Impact

Page 66: SUCCESS D4.6 v28

A. Descriptions of Security Incidents and Countermeasures

A.1 Incident/countermeasure CS-1

A.1.1 Description

Incident name: Device behaving suspiciously in terms of communication pattern or content

Since suspicious traffic patterns and suspicious message content both trigger the same countermeasure, both types of incident are covered by one countermeasure, CS-1. However, as the two incident types differ, the description in this chapter is divided into two sub-chapters accordingly. Incidents of this type and the corresponding countermeasures will also be implemented at the trial sites, and this chapter contains sub-chapters specific to the trial site use cases.

A.1.1.1 Suspicious communication patterns

Incident characteristics: The threat, or incident, to which this countermeasure is applied is a device in the network behaving suspiciously with respect to its communication pattern. In practice, this means that the communication pattern does not match the typical one for a device of this type, e.g. with respect to the frequency of messages, the size of messages or the communication peer: too many or too few packets, sent too often or too seldom, to expected or unexpected peers.

Root cause: Such suspicious behaviour, if indeed erroneous, stems from the device malfunctioning, which can be the result of a bug, a configuration error or tampering with the device. Furthermore, an attacker on the path between the communication end-points could modify the traffic pattern, as could an attacker who has compromised the device and modified its configuration. The software of devices in the network should be updated regularly to remove discovered vulnerabilities or other unwanted features and to add new functionality that adds value. Software here means anything from firmware to operating system, drivers, applications and configurations.
Even when these are updated or modified with good intent, the change can result in unwanted behaviour due to bugs or erroneous configurations. Hardware faults and breakdowns are also possible and can result in similar behaviour.

Incident identification: Incidents in which the communication pattern is modified are identified by having the monitoring centre verify the traffic in the network against accepted/expected traffic patterns. This might be done as a collaborative effort, with nodes in the system reporting suspicious behaviour. When the traffic differs from the expected pattern by more than a set threshold, this indicates that the incident has occurred and that the appropriate countermeasures should be taken. An example of a suspicious traffic pattern is a high number of commands being received at the corresponding local devices (NORM). This would indicate that the commands may be malicious and have to be invalidated before being executed.

A.1.1.2 Suspicious message content

Incident characteristics: The threat, or incident, to which this countermeasure is applied is a device in the network behaving suspiciously with respect to the content of its communication. The device may provide data that is outside the normal range and thresholds defined for the device type, or even data of an unexpected type. The data acquired by the device may also be inconsistent with the data acquired from other devices in the power grid, e.g. the frequency, voltage angles or voltage level range acquired from the NORM device at a specific network location is not consistent with the data acquired at other locations in the power network. Furthermore, messages with invalid integrity or confidentiality protection are also a sign of something being wrong.
It should be noted that, to avoid manipulation of private data, NORM in principle sends only non-private data, which is usually grid-related rather than personal.

Root cause: Such suspicious behaviour, if indeed erroneous, stems from the node or its credentials having been compromised. This means that an attacker has managed to gain control of the device or its credentials, and fake data is now transmitted to the different actors, e.g. to the DSO. This is typically made possible by security flaws in the device software that the attacker can abuse to gain unauthorised access. This links back to the previous sub-chapter and the need to update the software of the device. The alternative is that the attacker generates traffic without being able to provide valid integrity and/or confidentiality protection and authentication of the data and its end-point. Unauthenticated or unauthorised messages with invalid integrity or confidentiality protection are trivial to identify and act as an alarm that someone is trying to attack the system. They signal that an attacker is either generating their own traffic or modifying traffic sent in the network. This specific attack type is dealt with in CS-3.

Assuming the communications and physical security measures discussed in deliverable D4.3 [6] have been implemented, the only way an attacker can produce legitimate-looking data is by gaining access to the device and/or its credentials. With end-to-end security applied, an attacker will not be able to inject data or generate valid-looking messages without approved credentials. When the attacker has access to valid credentials, identifying the incident or the attacker's actions might be anything from easy to impossible, depending on how aggressive the attacker is. Hardware faults and malfunctions can result in similar behaviour, as can an expired certificate, but these are not the main focus of this work, as we are dealing here with attacks rather than device failures.

Incident identification: When the attacker has access to the target node and its credentials, they can either modify the behaviour of the node, as discussed earlier, or modify the content of the communication. In the latter case, the monitoring centre will notice the incident either by analysing the received data against the expected data or by comparing the consistency of data acquired from different electrical grid points, which may also include commands such as an order to disconnect a local breaker.
When the statistical deviations of selected parameters (frequency, voltage level and voltage angle pattern) exceed the thresholds (signalisation and alarm levels), this indicates that the incident has occurred and that the appropriate countermeasures should be taken.

A.1.1.3 Countermeasure

The analysis of the devices’ traffic is performed by CI-SOC and SDC at the DSO/TSO level. As described in Ch. 4.3.1 and Ch. 4.3.2, CI-SOC provides certain data features, such as RAM usage, to SDC. SDC correlates this information with information from other sources (see Chs. 4.6.5 and 4.6.6) to extract new findings about the system’s status.

If the affected device is part of the (edge) cloud, the VMs deployed on it as part of the double virtualisation solution should, as a precaution, be moved to another location to maintain the functionality they provide and to prevent the attack from spreading. Functional VMs should be re-deployed, i.e. they start from a clean state without any potentially harmful modifications that exist in the suspiciously behaving device. Data VMs should be migrated so that the data they store is not lost. In addition, the network needs to be re-configured to route the associated traffic via the new VM instances/locations rather than the original, potentially infected, ones.

After the incident identification, remote attestation should be performed on the suspiciously behaving device. This verifies the state of the device and can reveal indications that the device has been compromised. In CS-1, we assume the attestation does not fail, i.e. the device state is OK. Thus, the device should be red-flagged, indicating that there is an unresolved issue with it, and it (preferably only the affected functional VM) should be isolated from the network pending further investigation.
If it is suspected that an attacker has gained access to valid credentials and uses them to modify traffic, the credentials in question (identified as those used to integrity-protect the suspicious messages) should be revoked and the entity owning them should be assigned new credentials. Furthermore, an investigation should be performed to identify how the credentials were compromised.

A.1.1.4 Relation to threats

T000: The incident and countermeasure are applicable when the source of the DoS attack is a node in the system. DoS attacks by external sources are covered by CS-5.
T100: All threats except T111, which is more related to physical security (EM), can be handled. T107 is not covered here, but should not be a big issue in a system with a proper security configuration. Only the traffic pattern and message sizes are available to the attacker if the content is deemed to need confidentiality protection.
T200: Whenever malware modifies the behaviour of the targeted host, it will be covered by this incident/countermeasure. Malware that has a control channel to the attacker for gathering data or reporting back can also be identified based on untypical communication patterns.

Page 68: SUCCESS D4.6 v28

T300: From this threat group, the communication with the target device that is needed to modify SW will be noticed as untypical communication. In addition, T301 will modify data and be caught either here or in CS-3. T307 relates to physical security and is handled there. In addition, the HW manipulation might lead to suspicious network behaviour and could be caught here as well.
T400: Most threats in this group result in changed traffic patterns and will be caught here. A solution for T405 is being worked on in the project, but T405 could potentially also be caught by comparing values reported from across the network and identifying discrepancies.
T410: This will be identified as an abnormal traffic pattern. In addition, password-protected systems should by default limit the number of re-try attempts to counter these types of attacks.
T420: If the data is sent over the network, it will be identified as an unusual traffic pattern. If the data is accessed physically from the device, the physical security incident/countermeasures will be alerted to the attack, and device logs should be able to provide information about actions in the device.
T700: Only some of the threats are covered here: T701 if data is sent out from the system, T702 potentially if the virus is transferred over the network, T703 and T704 assuming the device starts to behave strangely, and T705-T707 if the information is sent over the network.

A.1.1.5 Concrete use case from Irish trial

The incident description that results in countermeasure CS-1 being applied covers many different threats and attacks. One concrete example is given in deliverable D5.1 [8], Ch. 2.1.2, regarding the Irish trial. The use case is an attacker, e.g. a rogue TSO employee or one operating under duress, who wants to destabilise the network by issuing a TSO command for load interrupt of EV chargers, possibly on a massive scale. To increase the effect, the attacker could even toggle the command (on-off-on-…) to create big fluctuations. When commands of this type are issued, regardless of whether they are genuine, the DSO must react to them instantaneously: there is no time for human interaction, so the response must be automated.

The countermeasure to this type of attack is for the DSO to verify that the network is indeed in a state that warrants the received commands. Basically, the DSO verifies from its own network state information that the TSO has sent the commands because of a real issue in the network. The NORMs either provide, or can be queried by the DSO about, the evolution of the grid frequency, from which aspects related to grid stability at distribution level can be inferred. The DSO can thus verify that the received command is indeed a response to instability in the network and that the command should be executed. If the NORMs do not report any instability (based on the grid frequency measured locally by NORM), the DSO can decide to block the received commands. One thing to note is that this countermeasure would be applied whenever the TSO issues such a request, which could have a significant impact on the traffic level received at the DSO.

The commands could be spoofed, i.e. the attacker does not have access to the TSO and its systems. In this case, the DSO will notice this from the fact that the message does not have a valid signature, does not come from an authenticated end-point, is not encrypted with the TSO/session key, or all of the above. However, if the command is received from the TSO, e.g. because a disgruntled employee with suitable access rights sent it, it would not be possible to detect the attack from the message itself. In this case, the solution described above would be used for detecting the attack.

The countermeasure would also include steps in addition to blocking the malicious commands. An alarm should be raised at the DSO that the TSO has issued a malicious command.
This would result in an investigation of where the command originated and who issued or authorised it. This partly corresponds to the remote attestation step and the investigation in the generic CS-1 countermeasure. In addition, SDC should be made aware of the attack attempt, as it could be part of a larger-scale attack targeting multiple networks. For example, an attack that targets two neighbouring networks in suitable synchronisation could be even more devastating than a single occurrence. The identified source of the command, if provided via double virtualisation, could be re-deployed at a new location to counter cases where malware or an attacker in the device is the issuer of the command. Remote attestation could also be performed to rule out malicious configurations and alterations of the suspected node. However, as the source of the attack is outside the DSO domain, the attestation of the node should be done by the TSO after it has been informed of the incident by the DSO.
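The DSO-side plausibility check described in this use case can be illustrated with a minimal Python sketch. It is not taken from the SUCCESS implementation: the nominal frequency of 50 Hz, the 0.2 Hz deviation threshold, the `TsoCommand` type and the `decide` function are all assumptions made for illustration. The sketch blocks commands that fail authentication, executes authenticated commands only when the NORM frequency measurements confirm instability, and otherwise blocks the command and signals that an alarm should be raised.

```python
from dataclasses import dataclass
from statistics import median

NOMINAL_HZ = 50.0        # assumed nominal grid frequency
MAX_DEVIATION_HZ = 0.2   # assumed instability threshold, not from the deliverable


@dataclass
class TsoCommand:
    """Hypothetical representation of a received TSO command."""
    action: str            # e.g. "load_interrupt"
    signature_valid: bool  # outcome of an authentication check done elsewhere


def grid_is_unstable(norm_frequencies_hz):
    """True if the median NORM frequency deviates from nominal beyond the threshold."""
    if not norm_frequencies_hz:
        # Without measurements the DSO cannot confirm any instability.
        return False
    return abs(median(norm_frequencies_hz) - NOMINAL_HZ) > MAX_DEVIATION_HZ


def decide(command, norm_frequencies_hz):
    """Return "block_spoofed", "execute" or "block_and_alarm" for a command."""
    if not command.signature_valid:
        # Spoofed command: invalid signature or unauthenticated end-point.
        return "block_spoofed"
    if grid_is_unstable(norm_frequencies_hz):
        # The command is consistent with the locally observed grid state.
        return "execute"
    # Authenticated but unwarranted: block it and raise an alarm at the DSO.
    return "block_and_alarm"
```

Using the median rather than the mean is a design choice of this sketch: a single manipulated NORM reading then has little influence on the decision.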

Page 69: SUCCESS D4.6 v28

A.1.1.6 Concrete use case from Romanian and Italian trials

The correct measurement and reporting of grid data is essential for any system to be able to use it efficiently for control and optimisation. In the case of cyber-penetration at the level of measurement data, the message content might be modified. The incident description that results in countermeasure CS-1 being applied covers many different threats and attacks. One concrete example is an attacker who wants to destabilise the network by sending malicious or wrong grid data to the DSO. When suspicious reports of this type are received by the CI-SOC, they are assessed for consistency with the corresponding data received from other NORM devices, and an abnormal evolution can lead to alarm flags being raised. The data scrutinised for grid consistency are the voltage levels and frequency values, which are both essential for grid assessment and non-private by nature, and therefore appropriate for analysis on a massive scale, i.e. from any smart meter, e.g. from each NORM device.

A.1.2 Related SW Functions

From the communication perspective, the data centric security function in BR-GW described in Ch. 4.2.4 can be utilised to verify the state of the device. During device deployment, the data centric security function generates a signature based on device parameters. This signature is taken as the baseline for verifying the state of the device in real time. Whenever the signature differs from the baseline signature taken during deployment, the BR-GW can block the traffic originating from that device. It can also notify CI-SOC about the status of the infected device.

CI-SOC performs online monitoring of network traffic patterns, deduction of usual behaviour and detection of suspicious behaviour (e.g. through threshold alerting when normal traffic levels deduced from historic monitoring data are exceeded).
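Threshold alerting of this kind could, for example, derive a band of normal traffic levels from historic monitoring data. The following Python sketch is an assumption about how such a check might look, not the CI-SOC implementation; the function names, the use of mean and standard deviation, and the factor `k` are hypothetical. The band is two-sided because A.1.1.1 treats both too many and too few packets as suspicious.

```python
from statistics import mean, stdev


def normal_band(history, k=3.0):
    """Return (low, high) bounds as mean -/+ k standard deviations of historic counts."""
    if len(history) < 2:
        raise ValueError("need at least two historic samples")
    centre = mean(history)
    spread = stdev(history)
    return centre - k * spread, centre + k * spread


def is_suspicious(current, history, k=3.0):
    """True if the current message count falls outside the band of usual behaviour."""
    low, high = normal_band(history, k)
    return current < low or current > high


# Example: messages per interval previously observed for one device type.
history = [100, 102, 98, 101, 99]
alerts = [count for count in (101, 500, 10) if is_suspicious(count, history)]
```

In this example only the counts 500 and 10 end up in `alerts`; the value 101 lies inside the band deduced from the history.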
CI-SOC also performs online monitoring of the consistency of data acquired from the local devices of the power network (with a focus on NORM devices), deduction of usual behaviour and detection of suspicious behaviour (e.g. through threshold alerting when a certain inconsistency level deduced from historic monitoring data is exceeded, or when a suspicious number of commands that may disconnect the prosumer is observed). When suspicious behaviour is detected, this information is forwarded to SDC, which executes a meta-analysis. CI-SOC itself should potentially also react to this incident by moving VMs according to double virtualisation, performing remote attestation on suspiciously behaving nodes, etc.

A.1.3 Related Infrastructure

Once CI-SOC has detected a security violation, it can first move VMs according to double virtualisation, if possible. It tells BR-GW that one or more NORMs are compromised and must be disconnected and, optionally, that their data has to be discarded. SDC at TSO/DSO level receives data from CI-SOC and combines them with further data sources such as firewall logs (see Figure 23) to extract new findings about the system's status. SDC can therefore generate a holistic view of the system in order to red-flag devices that can no longer be trusted. After red-flagging the device, SDC shares the potential finding with the SA Node. The SA Node then searches at a pan-European level for significant patterns that indicate an attack. All new findings obtained by the SA Node or SDC can potentially lead to a new strategy for red-flagging devices. However, how to proceed with these findings and how to implement a new red-flagging strategy is outside the scope of the SUCCESS project.

The cloud environment that applies the Double Virtualisation logic can be used to move the attacked virtual instances to another physical or logical entity. The moving process can be fully automated. For example, attacked data can be moved to another server or IP network.
The implementation method used for the Double Virtualisation infrastructure also decouples applications from data: the data coming from the field devices is stored in a dedicated virtual instance that manages only data storage. In this way, the migration of both applications and virtual devices is easier and faster, since it does not also require the migration of a large number of historical datasets.
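As an illustration of the cross-device consistency monitoring attributed to CI-SOC in A.1.2, the Python sketch below compares each NORM reading with the median of all readings. It is a simplified assumption rather than the project's algorithm: the function name, the device identifiers and the tolerance are hypothetical, and the readings are assumed to be directly comparable (e.g. grid frequency reported for the same time interval).

```python
from statistics import median


def inconsistent_devices(readings, tolerance):
    """Return the sorted IDs whose reading differs from the median by more than tolerance.

    readings maps a device ID to one measured value, e.g. frequency in Hz.
    """
    if not readings:
        return []
    centre = median(readings.values())
    return sorted(
        device for device, value in readings.items()
        if abs(value - centre) > tolerance
    )


# Example: four NORM devices report grid frequency; one report is implausible.
frequency_hz = {"norm-1": 50.01, "norm-2": 49.99, "norm-3": 50.60, "norm-4": 50.00}
suspects = inconsistent_devices(frequency_hz, tolerance=0.1)
```

Here only `norm-3` is reported as inconsistent. Devices flagged in this way would be candidates for the follow-up steps of CS-1, such as remote attestation and red-flagging.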

Page 70: SUCCESS D4.6 v28

A.2 Incident/countermeasure CS-2

A.2.1 Description

Incident name: Remote attestation fails

Incident characteristics: This incident is an alternative branch of the CS-1 flow and relates to the situation where the remote attestation fails for the suspiciously behaving device. This indicates that there is indeed something seriously wrong with the node.

Root cause: A failed remote attestation means that the state of the node is not acceptable, i.e. the node has been modified in some unacceptable way. This could be caused by a malicious action or by a fault or bug in the system.

Incident identification & countermeasure: Once the remote attestation fails, the node is deemed to be compromised and needs to be isolated from the network immediately. A maintenance unit needs to be sent to the node to reset and re-bootstrap the device. Re-bootstrapping means that the device gets new credentials, which is necessary because the device may have been compromised by an attacker and its credentials may have been revealed. The old credentials of the device need to be revoked from the system. The credentials of any VMs run on the device also need to be updated.

A.2.1.1 Relation to threats

The same threats are covered as for CS-1, as this countermeasure is an extension of it.
T000: The incident and countermeasure are applicable when the source of the DoS attack is a node in the system. DoS attacks by external sources are covered by CS-5.
T100: All threats except T111, which is more related to physical security (EM), can be handled. T107 is not covered here, but should not be a big issue in a system with a proper security configuration. Only the traffic pattern and message sizes are available to the attacker if the content is deemed to need confidentiality protection.
T200: Whenever malware modifies the behaviour of the targeted host, it will be covered by this incident/countermeasure.
Also, malware that has a control channel to the attacker for gathering data or reporting back can be identified based on untypical communication patterns. T300: From this threat group, the communication needed to the target device for modifying SW will be noticed as untypical communication. In addition, T301 will modify data and be caught either here or in CS-3. T307 relates to physical security and is handled there. In addition, the HW manipulation might lead to suspicious network behaviour and could be caught here as well. T400: Most threats in this group result in changed traffic patterns and will be caught here. A solution for T405 is being worked on in the project, but could potentially also be caught by comparing values reported from across the network and identifying discrepancies. T410: This will be identified as an abnormal traffic pattern. In addition, password protected systems should as default have limits for re-try attempts to counter these types of attacks. T420: If the data is sent over the network, it will be identified as unusual traffic pattern. In case the data is accessed physically from the device, the physical security incident/countermeasures will be alerted to the attack and device logs should be able to provide information of actions in the device. 700: Only some of the threats are covered here; T701 if data is sent out from the system, T702 potentially if the virus is transferred over the network, T703, T704 assuming the device starts to behave strangely and T705-T707 if the information is sent over the network. A.2.2 Related SW Functions In case of failed attestation, BR-GW can put the device in its black list and will not accept traffic from the device anymore. CI-SOC performs remote attestation. If remote attestation fails, BR- GW informs CI-SOC about the failed node attestation and BR-GW blacklists the device. A.2.3 Related Infrastructure Breakout Gateway, CI-SOC.

CI-SOC shows the DSO operator a set of possible countermeasures that can be put in place:
• Send a maintenance unit to reset and re-bootstrap the device.
• Disconnect the device from the grid by sending information to BR-GW.
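The CS-2 reaction described above can be sketched in a few lines. This is a minimal illustration only: the `Gateway` class, the function name and the action labels are hypothetical and are not part of the SUCCESS components, and the real BR-GW and CI-SOC interfaces may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    """Hypothetical stand-in for the BR-GW blacklist (not the real component)."""
    blacklist: set = field(default_factory=set)

    def accepts(self, device_id: str) -> bool:
        # Traffic from blacklisted devices is no longer accepted.
        return device_id not in self.blacklist


def handle_attestation_result(gw, device_id, attestation_ok, vm_ids):
    """Return the ordered CS-2 follow-up actions for one device.

    The action labels are illustrative names, not SUCCESS message types.
    """
    if attestation_ok:
        return []  # CS-2 only applies when remote attestation fails
    gw.blacklist.add(device_id)  # isolate the node immediately
    actions = [
        ("notify_ci_soc", device_id),
        ("dispatch_maintenance", device_id),  # reset and re-bootstrap
        ("revoke_credentials", device_id),    # old credentials may be leaked
    ]
    # VMs that ran on the device also need updated credentials.
    actions += [("update_vm_credentials", vm) for vm in vm_ids]
    return actions


gw = Gateway()
actions = handle_attestation_result(gw, "meter-17", False, ["vm-a", "vm-b"])
```

In this sketch the blacklist entry is made before any other action is queued, reflecting the requirement that the node be isolated immediately.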

A.3 Incident-countermeasure CS-3

A.3.1 Description

Incident name: Unauthorised messages

Incident characteristics: The network sees unauthorised messages, meaning messages that lack a (valid) signature or are of the wrong type considering who is sending or receiving them.

Root cause: Someone tries to send unauthorised messages, possibly with the intent to disrupt the system or gain information about it.

Incident identification & countermeasure: Either the monitoring centre notices the unauthorised messages, or a node in the system reports the unauthorised messages it sees to the monitoring centre. The first step is to try to identify the source of the messages, which could be either a node registered in the system or an unauthorised node attached to the system. Based on the source address of the unauthorised messages, the node and the network segment in which it resides can be identified. For attacks that include spoofing of the source address, a more detailed analysis of the network behaviour and of the suspicious flow needs to be done to identify the network segment and/or node from which the messages originate.

If the messages originate from a node in the system, countermeasure CS-1 should be applied to that node to verify its state. If the source of the messages is not part of the system, there is an unauthorised node in the network, and the network segment from which the node is sending should be isolated to minimise the damage. Isolation might here mean isolating just the traffic of the one unauthorised device, so as not to disrupt the operation of the rest of the system that resides in or close to that network segment. Finally, the network segment should be investigated to find the unauthorised node or its point of entry. The investigation should result in identifying the node and/or its point of entry to the network, removing the node from the network, and taking pre-emptive measures to prevent similar future attacks.

A.3.1.1 Relation to threats

T000: Depending on the DoS attack, the DoS messages will be categorised as unauthorised messages, e.g. because they are of an unexpected type or come from an unexpected source.

T100: Since mutual authentication should be used, an attacker on the path will be noticed if he attempts a man-in-the-middle or similar attack in which he impersonates a peer. T107 is not covered here, but should not be a big issue in a system with proper security configuration. If the content is deemed to need confidentiality protection, only the traffic pattern and message sizes are available to the attacker. T111 is also not covered here, as discussed in CS-1 and CS-2.

T200: If the malware is capable of using the credentials of the device, it will be able to send authorised messages and will therefore not be caught here. However, CS-1 and CS-2 at least partly cover those cases.

T300: T301 will result in unauthorised messages and will be caught. T301-T307 might also be identified here, depending on how they are executed towards the device, i.e. whether the device expects authenticated messages.

A.3.2 Related SW Functions

The GBA function of the Breakout Gateway, described in Ch. 4.2.4, can be used to authenticate application messages based on the SIM card authentication performed by the cellular network provider. Furthermore, the NMF function can check for unauthorised transmission of packets on a network address. When such events are detected, BR-GW can take action to isolate the device from the network.

A.3.3 Related Infrastructure

Breakout Gateway, CI-SOC, SDC.

If an attack is detected by CI-SOC, it can attempt to move VMs according to double virtualisation. If this does not succeed, an alarm is raised at the CI-SOC, and CI-SOC informs BR-GW to block certain devices/traffic. If CI-SOC cannot detect the attack, the decision is left to SDC. SDC has a wider scope of knowledge and, using a machine learning algorithm, can decide whether to disconnect the node.
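The source-identification step of CS-3 can be illustrated with a small triage function. It is a sketch under stated assumptions: the function name, the dictionary layouts and the action labels are hypothetical, the addresses are documentation examples, and spoofed source addresses (which the text says need deeper flow analysis) are deliberately not handled.

```python
import ipaddress


def triage_unauthorised_message(source_ip, registered_nodes, segments):
    """Map the source address of an unauthorised message to a CS-3 action.

    registered_nodes: {ip: node name} for nodes registered in the system.
    segments: {segment name: CIDR string} describing the network segments.
    Both structures are assumptions made for this example.
    """
    addr = ipaddress.ip_address(source_ip)
    # Find the network segment in which the source address resides.
    segment = next(
        (name for name, cidr in segments.items()
         if addr in ipaddress.ip_network(cidr)),
        None,
    )
    if source_ip in registered_nodes:
        # Registered node: verify its state with countermeasure CS-1.
        return {"segment": segment, "action": "apply_CS-1",
                "target": registered_nodes[source_ip]}
    # Unknown source: isolate only this device's traffic, then investigate.
    return {"segment": segment, "action": "isolate_traffic",
            "target": source_ip}


segments = {"substation-a": "10.0.1.0/24", "substation-b": "10.0.2.0/24"}
nodes = {"10.0.1.5": "rtu-5"}
known = triage_unauthorised_message("10.0.1.5", nodes, segments)
unknown = triage_unauthorised_message("10.0.2.99", nodes, segments)
```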

A.4 Incident-countermeasure CS-4

A.4.1 Description

Incident name: Virus detected in node

Incident characteristics: A virus is found in a node, either because antivirus software in the node identifies the virus or because the monitoring centre identifies a communication pattern of the node that matches the pattern of a virus infection. Alternatively, the investigation in countermeasure CS-1 might show that there is new malware on the loose which has affected the node.

Root cause: Malware has found its way to a node.

Incident identification & countermeasure: If the affected device is part of the (edge) cloud, the VMs deployed on it as part of the double virtualisation solution should be re-deployed to prevent the malware from spreading. Both functional and data VMs might carry the virus, so just moving them is not an option; instead, fresh instances need to be initialised. If the malware is suspected to reside in the physical host, the VMs should be re-deployed on another physical device. The device (if device contamination is suspected) and/or the infected VMs (regardless of whether the device or the VM is contaminated) should be isolated from the network to minimise the risk of further infections and damage. If available, a backup node should be enabled to maintain full system operation.

The infected device and/or VMs should be cleaned of the malware, which is most securely done by reinstalling the device and terminating the VMs. For data VMs, the stored data could optionally be backed up so that it can be taken into use after verifying that it is not contaminated. After the device is cleaned and the VMs are terminated and re-deployed, the peers also need to be verified as not infected. If the malware is new, i.e. not part of the known malware definitions, it should be added, and the definitions should be updated across the system.

A.4.1.1 Relation to threats

T000: If the DoS is initiated by malware in the device, it will be caught here.

T200: T204, T206, T208, T209, T210 and T213 might not be detected by anti-virus or similar tools. However, device logs should record at least some of these types of events.

T700: CS-4 mainly covers T702 and T704.

A.4.2 Related SW Functions

If NORM detects any kind of virus or suspicious activity, it can inform BR-GW so that BR-GW can act on it accordingly. In addition, the Network Monitoring Function described in Ch. 4.2.4 can detect DoS attacks initiated from a device in the system. Once such an event is detected, BR-GW can block traffic originating from that device. DoS attacks originating outside the devices of the system are covered in CS-5. Furthermore, the action taken can be notified to the CI-SOC. If a successful infection is detected by CI-SOC, it should initiate the countermeasure and report the incident to SDC for information sharing purposes.

A.4.3 Related Infrastructure

Breakout Gateway, CI-SOC, SDC.
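The ordering of the CS-4 steps can be made explicit with a small planning function. This is a hedged sketch: the function name, its parameters and the step wording are hypothetical, and the assumption that the device is reinstalled only when host contamination is suspected is one reading of the text above, not a rule stated in it.

```python
def plan_malware_response(vms, host_suspected, backup_available, malware_is_new):
    """Return an ordered list of CS-4 steps (illustrative wording only)."""
    steps = []
    # Fresh instances are needed because moved VMs could carry the virus.
    target = "another physical device" if host_suspected else "the same device"
    for vm in vms:
        steps.append(f"initialise fresh instance of {vm} on {target}")
    # Isolate whatever is suspected to be contaminated.
    if host_suspected:
        steps.append("isolate device from network")
    for vm in vms:
        steps.append(f"isolate infected {vm} from network")
    if backup_available:
        steps.append("enable backup node")
    # Cleaning: reinstall the device (assumed: only if suspected), end old VMs.
    if host_suspected:
        steps.append("reinstall device")
    for vm in vms:
        steps.append(f"terminate infected {vm}")
    steps.append("verify peers are not infected")
    if malware_is_new:
        steps.append("add malware to definitions and update them across the system")
    return steps


steps = plan_malware_response(["data-vm"], host_suspected=True,
                              backup_available=False, malware_is_new=True)
```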

A.5 Incident-countermeasure CS-5

A.5.1 Description

Incident name: DoS suspicion

Incident characteristics: The system, or a node in the system, is targeted in a (D)DoS attack. This typically means that a huge number of service requests (anything from an echo request (ping) to service-specific API requests) are sent to the victim to exhaust its resources. The attack can be identified by analysing the traffic (abnormal number of (similar) requests to the target) or the state of the node (resources exhausted). The node can report this itself, or some other node in the network can report the abnormal traffic volume.

Root cause: An attacker wants to disable parts of the system or the whole system and launches a (D)DoS attack.

Incident identification & countermeasure: When the (D)DoS attack has been identified, the associated flows and flow types (e.g. based on message type, protocol, source address and port, destination address and port) should be blocked at the edge of the network, if possible (it might not be possible to distinguish attack flows from legitimate flows), so that the attack flow does not have to be processed and forwarded by more nodes than necessary. This can be done by configuring firewalls to block those flows/flow types and, if the network supports SDN, by installing flow rules into SDN switches to re-route the identified DoS traffic.

In addition, if the affected node is part of the (edge) cloud, the VMs deployed on it as part of the double virtualisation solution should be moved to another location to maintain the functionality they provide and to prevent the attack from taking down that part of the service/network. The VMs should be migrated so that no data they store is lost. The network also needs to be re-configured to route the associated traffic, except the DoS traffic, via the new VM locations rather than the original, attacked one. Especially if the attacked node has services not running on VMs, i.e. services that cannot be migrated, a backup node for the attacked node should be enabled to maintain operation of the system. Furthermore, depending on the attack, the load in the network should be distributed to maintain operation. The attack flows should be analysed to verify that it is actually an attack (a natural disaster or similar event could also generate a lot of traffic in the network).

A.5.1.1 Relation to threats

T000: CS-5 deals with external DoS sources, while CS-1, CS-2 and CS-4 also deal with cases where the DoS source is part of the system.

T300: T305 might be identified as a DoS attack.

T410: Brute forcing access to a system can be seen as a form of DoS attack, as it means the system is bombarded with access requests.

T700: CS-5 mainly relates to T703.

A.5.2 Related SW Functions

The Data centric security function described in Ch. 4.2.4 can enable detection of malware insertion at device level. The BR-GW can verify the device integrity by regular checks of the device’s log; this enables checking for any unauthorised software insertion (malware) at device level. Furthermore, signatures can be updated based on the firmware updates on the device. After detecting changes in the device’s log using the Data centric security function, the BR-GW can block the traffic originating from the device. After such an incident, the peers of the infected device can be verified by the Data centric security function. In case of a DoS attack on BR-GW itself, SDN technology can handle it and will block the traffic.

The CI-SOC can gather information about traffic from NORM and BR-GW. This information can then be analysed using the analytic module, allowing construction of an immune-based memory which allows classification of, and dynamic reflection of changes in, the analysed data. The CI-SOC sends this information to SDC so that a countermeasure can eventually be put in place to block the illegitimate traffic flows.

If the SA Node is targeted in a (D)DoS attack, it is incapable of receiving the public data from the SDC and external sources and of sending back data with identified threat patterns. The fallback operation mode of the SDC, explained in Ch. 4.3.2.3.3, should be triggered in this case, so that the information exchange between them is not compromised. The SDC instances send all their public data to their adjacent SDC instances, so that an aggregate analysis at SDC level is possible.

If an SDC is under (D)DoS attack, it red-flags itself to make clear to other SDC instances that they should avoid sending data. Only the communication path with the SA Node is preserved, and the data exchange with it may become feasible. The SDC is able to receive information and threat patterns of its adjacent SDC instances indirectly through the SA Node.

A.5.3 Related Infrastructure

Breakout Gateway, CI-SOC, SDC. Double Virtualisation infrastructure in the cloud environment.
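Blocking by flow type, as described in the CS-5 countermeasure, amounts to matching each flow against rules over protocol, source address and port, and destination address and port. The sketch below illustrates that matching only; `FlowRule`, `filter_flows` and the flow dictionary keys are hypothetical names, and a real deployment would express such rules in firewall or SDN switch configuration rather than in application code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlowRule:
    """A block rule; fields left as None act as wildcards."""
    protocol: Optional[str] = None
    src_addr: Optional[str] = None
    src_port: Optional[int] = None
    dst_addr: Optional[str] = None
    dst_port: Optional[int] = None

    def matches(self, flow: dict) -> bool:
        # A flow matches when every non-wildcard field equals the flow's value.
        # Note: a rule with all fields None matches (and blocks) every flow.
        return all(
            wanted is None or flow.get(name) == wanted
            for name, wanted in vars(self).items()
        )


def filter_flows(flows, rules):
    """Keep only the flows that no block rule matches."""
    return [f for f in flows if not any(r.matches(f) for r in rules)]


flows = [
    {"protocol": "icmp", "src_addr": "203.0.113.9", "dst_addr": "10.0.1.5"},
    {"protocol": "tcp", "src_addr": "10.0.1.7", "src_port": 40000,
     "dst_addr": "10.0.1.5", "dst_port": 443},
]
# Block the identified attack flow type: ICMP towards the victim node.
rules = [FlowRule(protocol="icmp", dst_addr="10.0.1.5")]
kept = filter_flows(flows, rules)
```

The wildcard semantics show why the text warns that attack flows cannot always be separated from legitimate ones: the coarser the rule, the more legitimate traffic it also blocks.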

A.6 Incident-countermeasure CS-6

A.6.1 Description

Incident name: Security algorithm deemed insecure

Incident characteristics: Academia, a standardisation organisation or another authority states that a security algorithm deployed and used in the network is insecure and should be deprecated.

Root cause: Some entity has found a weakness in either an algorithm or a specific implementation of an algorithm.

Incident identification & countermeasure: The incident is identified through a reliable publication of the affected algorithm and proof of the weakness. If the weakness is serious and could affect operations, the affected algorithm needs to be deprecated system-wide and replaced by another one. Already when designing and deploying a system, multiple accepted security algorithms should be defined, so that there are alternatives to fall back on in these types of scenarios. In addition to deprecating the affected algorithm and configuring the system to use one of the existing alternatives, a suitable new replacement algorithm could optionally be identified and analysed. Once the new algorithm is approved, it can be installed in all places where the old (vulnerable) algorithm was used.

A.6.1.1 Relation to threats

T100: CS-6 deals with T100 types of threats where the attacker tries to exploit security algorithm weaknesses to get access to information or even modify it, i.e. everything except T110 and T111.

T200: If the malware, or the injection of it, takes advantage of weak security algorithms, the threats are covered here.

T300: If the manipulation is based on broken security algorithms, the threats are covered here.

A.6.2 Related SW Functions

The cryptographic library that implements and provides the vulnerable algorithm needs to be updated, and/or the node(s) using it need to be re-configured so that they no longer use the affected algorithm and the replacement algorithm is taken into use.

A.6.3 Related Infrastructure

Any node in the system that uses cryptographic functions might be affected, depending on whether it has been configured to use the affected library or vulnerable algorithm. For NORM, in specific cases where the deployed security algorithm needs more computational power or needs changes in the HW that assists the security functionality (e.g. the hardware needed to implement the PUF), the new algorithms may also require hardware upgrades. The Unbundled Smart Meter (USM) concept allows both hardware and software upgrades to be made without affecting the hardware and functionality of the Smart Meter and of the PMU, the components which communicate with the Energy Gateway.
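The fall-back idea in CS-6 (define several accepted algorithms up front, then deprecate one by configuration) can be illustrated with message digests from Python's standard `hashlib`. The accepted list, the function names and the choice of digests are assumptions made for this example; the source does not state which algorithms SUCCESS components actually use.

```python
import hashlib

# Assumed, illustrative preference order of accepted digest algorithms.
ACCEPTED_DIGESTS = ["sha256", "sha3_256", "blake2b"]


def select_digest(accepted, deprecated):
    """Pick the first accepted algorithm that has not been deprecated."""
    for name in accepted:
        if name not in deprecated and name in hashlib.algorithms_available:
            return name
    raise RuntimeError("no accepted algorithm left; approve a replacement first")


def fingerprint(data: bytes, deprecated=frozenset()):
    """Hash data with the currently selected algorithm and report its name."""
    name = select_digest(ACCEPTED_DIGESTS, deprecated)
    return name, hashlib.new(name, data).hexdigest()


before = fingerprint(b"meter reading")
# Hypothetical scenario: sha256 is declared insecure and deprecated system-wide.
after = fingerprint(b"meter reading", deprecated={"sha256"})
```

Returning the algorithm name alongside the digest lets a receiver tell which algorithm produced a value while nodes are being re-configured at different times.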

A.7 Incident-countermeasure PS-1

A.7.1 Description

Incident name: Perimeter breached

Incident characteristics: Unauthorised access to the site of an infrastructure device or to other company premises. The site surveillance equipment, e.g. a motion sensor, notices non-approved physical action or presence.

Root cause: An unauthorised person accesses the site and triggers at least one piece of the monitoring equipment surveilling the site.

Incident identification & countermeasure: An alarm is raised at the monitoring centre indicating the site and what has been observed. Security personnel are dispatched to the site to verify that everything is OK or to handle any situation that has occurred. Any breached or broken protection mechanism (doors, windows, locks etc.) needs to be repaired or replaced.

A.7.1.1 Relation to threats

T100: If the attacker tries to gain access to a communication channel by physically inserting himself on the path, it can be covered here.

T300: PS-1 mainly relates to T307, but could also be applied for T301-T303, T305 and T306 if the exploit requires physical access to the device.

T500: This is the main threat area covered by PS-1. Whether the alarms are triggered depends on whether the attacker has access rights.

A.7.2 Related SW Functions

None.

A.7.3 Related Infrastructure

Any device that is being monitored against unauthorised access.

A.8 Incident-countermeasure PS-2

A.8.1 Description

Incident name: Device casing breached

Incident characteristics: The casing of a node or end-device is opened without authorisation. A sensor inside the device monitors the state of the case and signals the monitoring centre whenever it is opened.

Root cause: Someone tries to open the case, possibly to modify the device, e.g. to make it report wrongful meter readings.

Incident identification & countermeasure: An alarm is raised at the monitoring centre indicating the node where the casing has been breached. If the incident is at an infrastructure node, security personnel are dispatched to the site as in PS-1. For end-devices this step can be skipped. Then, remote attestation is performed to verify the device state. Once security personnel have verified site integrity and safety, a maintenance person can be sent to the site if deemed necessary (based on visual observation and/or the remote attestation result). The maintenance person should reset and re-bootstrap the node to generate new credentials for it, get those credentials properly configured in the backend/system, and have the old credentials revoked. If required, the device should be repaired or even replaced, and the casing sensor should be re-applied. The same maintenance steps should also be taken for end-devices. Maintenance fees should fall on the owner of the end-device if the device is located on the owner's premises.

A.8.1.1 Relation to threats

T100: Here we mainly deal with T111, and the EM part of it (assuming the casing protects against EM radiation). T105 and T112 are also relevant when considering attempts to report modified usage statistics etc.

T300: PS-2 mainly relates to T307, but could also be applied for T301-T303, T304, T305 and T306 if the exploit requires physical access to the device.

T400: Relates to cases where the attacker would need physical access to the device to alter its measurements: T404.

T500: PS-2 deals mainly with T501 and T504, but could also cover T506.

A.8.2 Related SW Functions

At the level of NORM, a micro-switch is being considered to monitor the NORM housing. Its state should be read by the Energy Gateway of NORM and sent as an alert message to the Security Agent running in the Energy Gateway, which in turn needs to forward the event to the CI-SOC.

A.8.3 Related Infrastructure

As presented in Ch. A.8.2, the NORM housing will have a micro-switch to detect any opening of the housing and intrusion aimed at accessing the NORM hardware. The CI-SOC will get the alarm and initiate the countermeasure.

A.9 Incident-countermeasure PS-3

A.9.1 Description

Incident name: Communication link unavailable

Incident characteristics: The node’s communication link is unavailable, i.e. the node cannot be reached and cannot communicate with others. The node appears to be unavailable, so it may not be possible to identify the incident right away as a communication link failure.

Root cause: For these incident characteristics, multiple incidents are possible (PS-3 to PS-5), and the reason for the device not being available could be either HW based or the result of a cyber-attack. The latter should be covered and protected against by the cyber security countermeasures discussed earlier with respect to suspicious behaviour, DoS and malware incidents. For PS-3, the node is not reachable because its communication link is not functioning, e.g. due to a physical attack on it or a breakdown.

Incident identification & countermeasure: The node does not communicate according to its communication pattern (i.e. a message is expected but not seen) and cannot be connected to, e.g. to perform remote attestation. If available, an attempt can be made to enable the secondary/backup communication link of the node, e.g. by reconfiguring the SDN network. If a secondary access is available but cannot be enabled, the incident is likely related to PS-4 or PS-5, and the countermeasures for those incidents could be tried.

Since the node is not reachable, its services cannot be used. Therefore, if the node is part of the (edge) cloud, the VMs deployed on it as part of the double virtualisation solution should be re-deployed in another location to maintain the functionality they provide. These new instances of the VMs will naturally not have all the up-to-date data, as that has been “lost” with the currently unavailable VMs. In addition to re-deploying the VMs, the network needs to be re-configured to route the associated traffic via the new VM instances rather than the original, now unavailable, ones. Next, a maintenance unit should be sent to the site to fix the issue. Initially a reset of the communications unit could help; otherwise it needs to be repaired or even replaced.

A.9.1.1 Relation to threats

T300: Mainly relating to T301, T306 and T307.

T500: PS-3 relates to T501 and T506.

A.9.2 Related SW Functions

CI-SOC will issue an alert that the device is unavailable.

A.9.3 Related Infrastructure

Virtual resources (application or data) related to the device with an unavailable link can be migrated, and a new link can be established. Any device in the system might have the connectivity issue.

A.10 Incident-countermeasure PS-4

A.10.1 Description

Incident name: Device power unavailable

Incident characteristics: The node’s power is not available and the node cannot operate. The node appears to be unavailable, so it may not be possible to identify the incident right away as a loss of device power.

Root cause: For these incident characteristics, multiple incidents are possible (PS-3 to PS-5), and the reason for the device not being available could be either HW based or the result of a cyber-attack. The latter should be covered and protected against by the cyber security countermeasures discussed earlier with respect to suspicious behaviour, DoS and malware incidents. For PS-4, the node is not reachable because it has lost its power, e.g. due to a physical attack on it or a breakdown.

Incident identification & countermeasure: The node does not communicate according to its communication pattern (i.e. a message is expected but not seen) and cannot be connected to, e.g. to perform remote attestation. As the power is out, enabling the secondary communications link as in PS-3 will not work. Next, an attempt can be made to enable backup power, if available. If backup power is available but cannot be enabled, or enabling it does not solve the problem, the incident is most likely of type PS-5 and could be handled accordingly.

If backup power is not available, or the device does not recover when it is enabled, the services running on the node cannot be used. Therefore, if the node is part of the (edge) cloud, the VMs deployed on it as part of the double virtualisation solution should be re-deployed in another location to maintain the functionality they provide. These new instances of the VMs will naturally not have all the up-to-date data, as that has been “lost” with the currently unavailable VMs. In addition to re-deploying the VMs, the network needs to be re-configured to route the associated traffic via the new VM instances rather than the original, now unavailable, ones. Next, a maintenance unit should be sent to the site to fix the issue. Initially a reset of the power supply unit could help; otherwise it needs to be repaired or even replaced.

A.10.1.1 Relation to threats

T300: Mainly relating to T306 and T307.

T500: PS-4 relates to T501 and T506.

A.10.2 Related SW Functions

CI-SOC will issue an alert that the device is unavailable.

A.10.3 Related Infrastructure

At NORM level there is, for economic reasons, usually no backup supply that would allow operation when the network voltage is missing. Such a backup solution may exist only in resilient nodes, where a power supply for critical loads is in place, based on UPS or other resilient architectures that include energy storage. In case of power loss of the Cloud servers, virtual resources can be migrated to another server/zone.

A.11 Incident-countermeasure PS-5

A.11.1 Description

Incident name: Device unavailable

Incident characteristics: The node is malfunctioning and will not operate. The node appears to be unavailable. However, the exact cause of the unavailability might not be clear without further investigation.

Root cause: For these incident characteristics, multiple incidents are possible (PS-3 to PS-5), and the reason for the device not being available could be either HW based or the result of a cyber-attack. The latter should be covered and protected against by the cyber security countermeasures discussed earlier with respect to suspicious behaviour, DoS and malware incidents. For PS-5, the node is not reachable because it is malfunctioning.

Incident identification & countermeasure: The node does not communicate according to its communication pattern (i.e. a message is expected but not seen) and cannot be connected to, e.g. to perform remote attestation. As the node is malfunctioning, enabling the secondary communications link as in PS-3 or backup power as in PS-4 will not work. To maintain optimal network operation, a backup node should be enabled, if available, to take on the role of the faulty unit.

If the unavailable node is part of the (edge) cloud, the VMs deployed on it as part of the double virtualisation solution should be re-deployed in another location to maintain the functionality they provide. If a backup node is available and has been enabled successfully, it is a natural location for re-deploying the VMs. These new instances of the VMs will naturally not have all the up-to-date data, as that has been “lost” with the currently unavailable VMs. In addition to re-deploying the VMs, the network needs to be re-configured to route the associated traffic via the new VM instances rather than the original, now unavailable, ones. Next, a maintenance unit should be sent to the site to take care of the node. An attempt could be made to reset and re-bootstrap the node, but if it does not recover, the node has to be repaired or even replaced.

A.11.1.1 Relation to threats

T300: Mainly relating to T306 and T307.

T500: PS-5 relates to T501 and T506.

A.11.2 Related SW Functions

CI-SOC will issue an alert that the device is unavailable.

A.11.3 Related Infrastructure

In case of a malfunctioning cloud host, virtual resources can be migrated to another server/zone. The incident could happen to any node in the system.
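Incidents PS-3 to PS-5 look identical at first (the node is silent), and the text distinguishes them by which recovery attempt succeeds. That elimination logic can be sketched as below. The function name and the three-valued inputs are assumptions for illustration: `True` means enabling the backup restored the node, `False` means the backup exists but enabling it did not help, and `None` means no such backup exists.

```python
def triage_unreachable_node(backup_link, backup_power):
    """Classify a silent node as PS-3, PS-4 or PS-5 where possible.

    backup_link / backup_power: True (enabling it restored the node),
    False (available, but enabling it did not help), None (not available).
    """
    if backup_link is True:
        return "PS-3"          # the primary communication link was the problem
    if backup_power is True:
        return "PS-4"          # the node had lost its power
    if backup_power is False:
        return "PS-5"          # power did not help: node is most likely faulty
    if backup_link is False:
        return "PS-4 or PS-5"  # link backup failed, no backup power to test
    return "undetermined"      # nothing to try remotely; inspect on site


def recovery_steps(in_edge_cloud):
    """Common follow-up when the node cannot be brought back remotely."""
    steps = []
    if in_edge_cloud:
        steps.append("re-deploy VMs in another location")
        steps.append("re-route traffic to the new VM instances")
    steps.append("send maintenance unit to the site")
    return steps


verdict = triage_unreachable_node(backup_link=False, backup_power=False)
```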

B. Interfaces SUCCESS API I3

This chapter documents the RESTful SUCCESS API I3 (see Chapter 4.6.3 for details). It covers all messages received by SDC, that is, Channel 3, Channel 4 and Channel 6 in Chapter 4.6.3. An example for Channel 3, an IODEF message, is given below. The example follows the SUCCESS API data model described in Chapter 4.5.

Endpoint URL: [/rest/api/{version}/new/{table}]

Parameters

| Name | Type | Example | Description |
|---|---|---|---|
| version | String | v0.1 | The version of the API to use. |
| table | String | cisoc | The endpoint which pushes the message. |

Example: Channel 3 IODEF message

Request

Header

    Accept: application/xml
    Content-Type: application/xml

Body (IODEF Document)

    {
      "version": "2.0",
      "xml:lang": "en",
      "Incident": [
        {
          "purpose": "ext-value",
          "ext-purpose": "countermeasure",
          "IncidentID": {
            "id": 2,
            "name": "CI-SOC1"
          },
          "RelatedActivity": {
            "IncidentID": {
              "id": "1",
              "name": "CI-SOC1"
            }
          },
          "GenerationTime": "2018-01-18T09:00:00-05:00",
          "Contact": {
            "type": "organization",
            "role": "creator"
          },
          "EventData": {
            "DetectTime": "2018-01-18T09:00:00-05:00",
            "Flow": {
              "System": {
                "category": "target",
                "Description": "37cdc21ab826f120c62a7e9b43faed32"
              }
            }
          },
          "History": [
            {
              "HistoryItem": {
                "action": "ext-value",
                "ext-action": "1",
                "Description": "Update puf",
                "DateTime": "2018-01-10T07:00:00-02:00",
                "AdditionalData": [
                  { "name": "CmID", "dtype": "string", "text": "1" },
                  { "name": "CmDescription", "dtype": "string", "text": "Re-sync PUF challenge" },
                  { "name": "duration", "dtype": "real", "text": "0.133505" },
                  { "name": "status", "dtype": "boolean", "text": true },
                  { "name": "logtext", "dtype": "string", "text": "{'msg': 'update ok', 'code': 200}" },
                  { "name": "index", "dtype": "integer", "text": 0 },
                  { "name": "count", "dtype": "integer", "text": 1 }
                ]
              }
            },
            {
              "HistoryItem": {
                "action": "ext-value",
                "ext-action": "2",
                "Description": "Update puf2",
                "DateTime": "2018-01-11T07:00:00-02:00",
                "AdditionalData": [
                  { "name": "CmID", "dtype": "string", "text": "1" },
                  { "name": "CmDescription", "dtype": "string", "text": "Re-sync PUF challenge" },
                  { "name": "duration", "dtype": "real", "text": "0.133508" },
                  { "name": "status", "dtype": "boolean", "text": true },
                  { "name": "logtext", "dtype": "string", "text": "{'msg': 'update ok', 'code': 200}" },
                  { "name": "index", "dtype": "integer", "text": 1 },
                  { "name": "count", "dtype": "integer", "text": 2 }
                ]
              }
            }
          ],
          "AdditionalData": {
            "name": "CI",
            "dtype": "string",
            "text": "Energy"
          }
        }
      ]
    }

Response 201 (application/json) Created

Header

    HTTP/1.0 201 CREATED
    Content-Type: text/html; charset=utf-8
    Content-Length: 76
    Server: Werkzeug/0.12.2 Python/2.7.12
    Date: Fri, 23 Mar 2018 13:48:25 GMT

Body

    Feature “cisoc” successfully tramsitted to the European Information Level!
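As an illustration of how a client might address this endpoint, the following sketch builds (but does not send) a POST request with Python's standard `urllib.request`. The host name `sdc.example.org`, the function name and the shortened document are assumptions made for the example; only the path template and the two header values are taken from the listing above, and the JSON serialisation mirrors the example body rather than a confirmed wire format.

```python
import json
import urllib.request

# Hypothetical host; the source only specifies the path template.
BASE_URL = "https://sdc.example.org"


def build_push_request(document: dict, table: str, version: str = "v0.1"):
    """Build a POST request for /rest/api/{version}/new/{table}."""
    url = f"{BASE_URL}/rest/api/{version}/new/{table}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        # Header values copied from the example request above.
        headers={"Accept": "application/xml",
                 "Content-Type": "application/xml"},
    )


# Shortened document using only fields that appear in the example body.
doc = {
    "version": "2.0",
    "xml:lang": "en",
    "Incident": [{
        "purpose": "ext-value",
        "ext-purpose": "countermeasure",
        "IncidentID": {"id": 2, "name": "CI-SOC1"},
        "GenerationTime": "2018-01-18T09:00:00-05:00",
        "Contact": {"type": "organization", "role": "creator"},
        "AdditionalData": {"name": "CI", "dtype": "string", "text": "Energy"},
    }],
}
req = build_push_request(doc, table="cisoc")
# urllib.request.urlopen(req) would send it; omitted because the host is fictional.
```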

C. SUCCESS API Data Models

C.1 IODEF Data Model

Table 4: Fields of IODEF-Document
Name | Type | Cardinality | Description
version | String | 1 | The version of the IODEF specification to which this IODEF document conforms, e.g. version="2.0".
xml:lang | ENUM | 1 | The language identifier of the IODEF document, e.g. xml:lang="en". The language identifier is covered in Section 2.12 of [40]; its values and form are described in [42]. The interpretation of the code is described in Section 6 of [30].
Incident | Class | 1..* | The Incident class contains information related to a single incident. (See Table 5 for further details.)

Table 5: Fields of Incident
Name | Type | Cardinality | Description
purpose | ENUM | Required | The purpose attribute describes the rationale for the information in the IODEF document. The commonly used values are reporting and countermeasure, e.g. purpose="reporting". These values are described in the "Incident-purpose" of [43].
ext_purpose | String | Optional | The ext_purpose attribute is a mechanism for extending the purpose attribute with new values, e.g. ext-purpose="countermeasure".
IncidentID | Class | 1 | The IncidentID class represents the incident tracking number assigned to a single incident in the IODEF document. (See Table 6 for further details.)
GenerationTime | DATETIME | 1..* | The GenerationTime attribute represents the time when the content of the IODEF document was generated, e.g. 2018-01-18T09:00:00-05:00. The DATETIME data type is implemented in the data model as an "xs:dateTime" type per Section 3.2.7 of [40].
Contact | Class | 1..* | The Contact class contains information about the parties involved in the incident. (See Table 8 for further details.)
Description | ML_STRING | 0..1 | The Description attribute describes the incident, e.g. Large bot-net. The ML_STRING data type is a character string of type "xs:string" whose language may be specified by the xml:lang attribute. See Section 2.4 of [30] for more details.
RelatedActivity | Class | 0..1 | The RelatedActivity class lists the incident tracking numbers of previously observed incidents that are related to the current incident described in the IODEF document. (See Table 7 for further details.)
EventData | Class | 0..* | The EventData class is a container class that organizes data about the events that occurred during an incident. (See Table 9 for further details.)
History | Class | 0..* | The History class is the log of the actions (countermeasures) performed by the involved parties while handling the incident. (See Table 17 for further details.)
AdditionalData | EXTENSION | 1 | The AdditionalData class is a mechanism to extend the data model with information that does not fit into the designed data model. Here, AdditionalData describes the critical infrastructure (CI) the incident belongs to. The class is of the EXTENSION data type, whose name and dtype attributes describe the CI information: the name attribute is set to CI and the dtype attribute to string, e.g. <AdditionalData name="CI" dtype="string">Energy</AdditionalData>. The EXTENSION data type and the possible dtype values are described in Section 2.16 of [30].

Table 6: Fields of IncidentID
Name | Type | Cardinality | Description
name | String | Required | The name attribute describes the sender of the document, e.g. name="CI-SOC1".
Id | String | Optional | The identifier string value of the incident, e.g. 123. Note: The name and the Id together compose a unique identifier describing the incident. Moreover, "Id" is not an attribute of the class; rather, it is the string value of the IncidentID class.

Table 7: Fields of RelatedActivity
Name | Type | Cardinality | Description
IncidentID | Class | 1..* | The incident tracking number of a related incident, e.g. the tuple (id="122" name="CI-SOC1") represents the tracking number of a related incident. (See Table 6 for further details.)

Table 8: Fields of Contact
Name | Type | Cardinality | Description
role | ENUM | Required | The role attribute specifies the role the contact fulfils, e.g. role="creator". These values are maintained in the "Contact-role" per Section 10.2 of [30].
type | ENUM | Required | The type attribute specifies the type of the described contact, e.g. type="organization". These values are maintained in the "Contact-type" per Section 10.2 of [30].

Table 9: Fields of EventData
Name | Type | Cardinality | Description
Description | ML_STRING | 0..* | The Description attribute describes each event (single or coordinated attack) involved in the incident, e.g. These hosts are compromised and acting as bots.
Flow | Class | 0..* | The Flow class contains a description of the systems or networks involved in the incident. (See Table 10 for further details.)
DetectTime | DATETIME | 0..1 | The DetectTime attribute represents the time when the event was detected, e.g. 2018-01-01T01:15:45+00:00.
Record | Class | 0..1 | The Record class provides supporting information about the events involved in the incident. The data provided in the Record class is often the output of monitoring tools (logs or other audit data). (See Table 14 for further details.)
Assessment | Class | 0..1 | The Assessment class describes the technical and non-technical repercussions of the event on the victim. (See Table 16 for further details.)

Table 10: Fields of Flow
Name | Type | Cardinality | Description
System | Class | 1..* | The System class describes a system or a network involved in the event. (See Table 11 for further details.)

Table 11: Fields of System
Name | Type | Cardinality | Description
Description | ML_STRING | 0..* | The Description attribute describes the system involved in the incident, e.g. Target in head office.
category | ENUM | Optional | The category attribute specifies the role the host or network played in the incident, e.g. category="target". These values are maintained in the "System-category" per Section 10.2 of [30].
Node | Class | 1 | The Node class represents a specific node that is part of the event. (See Table 12 for further details.)

Table 12: Fields of Node
Name | Type | Cardinality | Description
Address | Class | 0..* | The Address class represents a hardware (Layer 2), network (Layer 3), or application (Layer 7) address of the node. (See Table 13 for further details.)
Location | ML_STRING | 0..* | The Location attribute is a description of the physical location of the node (geohash value), e.g. "-78.75", "20.4".

Table 13: Fields of Address
Name | Type | Cardinality | Description
category | ENUM | Required | The category attribute specifies the type of the address, e.g. category="ipv4-addr". These values are maintained in the "Address-category" per Section 10.2 of [30].
address | String | Required | The address string value represents the address whose semantics are determined by the category attribute, e.g. 192.0.2.200. Note: "address" is not an attribute of the class; rather, it is the string value of the Address class.

Table 14: Fields of Record
Name | Type | Cardinality | Description
RecordData | Class | 1..* | The RecordData class provides a description of, and a reference to, log or evidence data from a monitoring tool. (See Table 15 for further details.)

Table 15: Fields of RecordData
Name | Type | Cardinality | Description
Description | ML_STRING | 0..* | The Description attribute describes the data provided in the RecordItem attribute, e.g. Web-server logs.
RecordItem | EXTENSION | 0..* | The RecordItem attribute contains log, audit, or forensic data to support the incident described in the document (this could be an IDMEF alert message). The RecordItem class is of type EXTENSION, and the dtype attribute indicates the type of the item, e.g. <RecordItem dtype="string">text from the log file</RecordItem>. A URL can also be specified here, e.g. <RecordItem dtype="url">http://mylogs.example.com/logs/httpd_access</RecordItem>.
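
The Cardinality column used in the tables above can be checked mechanically. The following Python sketch parses cardinality strings such as "1", "0..1", "0..*" and "1..*" and reports violations for a nested dictionary. The dictionary layout (repeated elements held in lists) and the function names are assumptions made for illustration; they are not defined by this document. The two specifications are copied from the EventData and Flow tables.

```python
def parse_cardinality(card):
    """Turn a cardinality string ("1", "0..1", "0..*", "1..*") into (min, max)."""
    if ".." in card:
        low, high = card.split("..")
    else:
        low = high = card
    # A max of None means "unbounded" (the "*" case).
    return int(low), (None if high == "*" else int(high))


def check_cardinality(obj, spec):
    """Return a list of violations of spec (field name -> cardinality) in dict obj."""
    problems = []
    for field, card in spec.items():
        low, high = parse_cardinality(card)
        value = obj.get(field)
        if value is None:
            n = 0                      # field absent
        elif isinstance(value, list):
            n = len(value)             # repeated element
        else:
            n = 1                      # single element
        if n < low or (high is not None and n > high):
            problems.append(f"{field}: found {n}, expected {card}")
    return problems


# Cardinalities copied from the EventData and Flow tables.
EVENT_DATA = {"Description": "0..*", "Flow": "0..*", "DetectTime": "0..1",
              "Record": "0..1", "Assessment": "0..1"}
FLOW = {"System": "1..*"}

event = {
    "DetectTime": "2018-01-01T01:15:45+00:00",
    "Flow": [{"System": [{"category": "target"}]}],
}
print(check_cardinality(event, EVENT_DATA))     # this event satisfies the table
print(check_cardinality({"System": []}, FLOW))  # a Flow needs at least one System
```

Fields marked Required or Optional in the tables could be handled the same way by treating them as "1" and "0..1".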

Table 16: Fields of Assessment
Name | Type | Cardinality | Description
IncidentCategory | ML_STRING | 0..* | A free-form text description categorizing the type of incident, e.g. DoS-attack.

Table 17: Fields of History
Name | Type | Cardinality | Description
HistoryItem | Class | 1..* | The HistoryItem class contains a particular action performed as part of a specific countermeasure. (See Table 18 for further details.)

Table 18: Fields of HistoryItem
Name | Type | Cardinality | Description
action | ENUM | Required | The action attribute specifies the category of the action documented in the history log entry. Here it is used to carry the ID of the action: the action attribute is set to action="ext-value", and the action ID is given in the ext-action attribute.
ext-action | String | Required | The ext-action attribute contains the identifier of the performed action, e.g. "1". This attribute is normally optional in the IODEF format, but it is mandatory within the HistoryItem class here in order to identify the action.
Description | ML_STRING | 0..* | The Description attribute is a free-form text describing the action performed, e.g. Update puf.
DateTime | DATETIME | 1 | The DateTime attribute is the timestamp of the action performed, e.g. 2017-10-20 06:20:52.811722+00:00.
AdditionalData | EXTENSION | 0..* | The AdditionalData class provides extra information about the action performed: the identifier of the countermeasure the action belongs to (CmID), the name of the countermeasure (CmDescription), the action duration in seconds (duration), the result of the action (status), the log text of the action (logtext), the current index of the action within the countermeasure, in the range [0, count) (index), and the number of all actions in the countermeasure (count). The AdditionalData class is of the EXTENSION data type and uses the name and dtype attributes to describe the extra information, e.g. <AdditionalData name="CmDescription" dtype="string">Re-sync PUF challenge</AdditionalData>. The EXTENSION data type and the possible dtype values are described in Section 2.16 of [30].

C.2 IDMEF Data Model

Table 19: Fields of IDMEF-Message
Name | Type | Cardinality | Description
version | String | Required | The version attribute specifies the version of the IDMEF-Message, e.g. version="1.0".
Alert | Class | 1 | The Alert class is the alert message that corresponds to one or more events detected by the sender. (See Table 20 for further details.)

Table 20: Fields of Alert
Name | Type | Cardinality | Description
messageid | String | Optional | The messageid attribute is the unique identifier of the alert message, e.g. messageid="abc123". An alert is uniquely identified by the pair (analyzerid, messageid).
Analyzer | Class | 1 | The Analyzer class is the sender that originated the alert message, or the source that detected or suspected an attack. (See Table 21 for further details.)
CreateTime | Class | 1 | The CreateTime attribute represents the time the alert was created. (See Table 22 for further details.)
DetectTime | Class | 0..1 | The DetectTime attribute represents the date and time when the event(s) producing the alert were detected by the analyzer. (See Table 23 for further details.)
Source | Class | 0..* | The Source class represents the source(s) of the event(s) that generated the alert. (See Table 24 for further details.)
Target | Class | 0..* | The Target class represents the target(s) of the incident(s) or attack(s) leading up to the alert. (See Table 25 for further details.)
Classification | Class | 1 | The Classification class contains information that allows the receiver of the alert to determine what it is. (See Table 28 for further details.)
Assessment | Class | 0..* | The Assessment class provides the analyzer's assessment of the event. (See Table 29 for further details.)
AdditionalData | EXTENSION | 0..* | The AdditionalData class is a mechanism to extend the data model with information that does not fit into the designed data model. Here, AdditionalData describes the critical infrastructure (CI) the incident belongs to. The class is of the EXTENSION data type, whose name and dtype attributes describe the CI information: the name attribute is set to CI and the dtype attribute to string, e.g. <AdditionalData name="CI" dtype="string">Energy</AdditionalData>. More details on the use of the EXTENSION data type can be found in Section 5 of [31].

Table 21: Fields of Analyzer
Name | Type | Cardinality | Description
name | String | Optional | The name attribute is the name of the analyzer, e.g. name="SA Node 1".
analyzerid | String | Optional | The analyzerid attribute is the unique identifier of the analyzer, e.g. analyzerid="SA1". This field is only partially optional: it must be provided whenever the 'ident' attribute is used in other classes.

Table 22: Fields of CreateTime
Name | Type | Cardinality | Description
ntpstamp | DATETIME | Required | The timestamp at which the alert was created, e.g. 2018-01-09T08:12:32-01:00. The DATETIME data type is described in Section 3.2.6 of [31].

Table 23: Fields of DetectTime
Name | Type | Cardinality | Description
ntpstamp | DATETIME | Required | The timestamp at which the event was detected, e.g. 2018-01-09T08:12:32-01:00. The DATETIME data type is described in Section 3.2.6 of [31].

Table 24: Fields of Source
Name | Type | Cardinality | Description
ident | String | Optional | The ident attribute is the unique identifier of the source of the event, e.g. ident="S1-001".
Node | Class | 0..1 | The Node class represents the host or device that appears to be causing the events (network address, network name, etc.). (See Table 26 for further details.)

Table 25: Fields of Target
Name | Type | Cardinality | Description
ident | String | Optional | The ident attribute is the unique identifier of the target of the event, e.g. ident="T1-001".
Node | Class | 0..1 | The Node class represents the host or device at which the event(s) are being directed (network address, network name, etc.). (See Table 26 for further details.)

Table 26: Fields of Node
Name | Type | Cardinality | Description
ident | String | Optional | The ident attribute is the unique identifier of the node, e.g. ident="PC1".
location | String | 0..1 | The location attribute provides the location of the node, e.g. "-78.75", "20.4".
Address | Class | 0..* | The Address class provides the network or hardware address of the equipment. (See Table 27 for further details.)

Table 27: Fields of Address
Name | Type | Cardinality | Description
category | ENUM | Optional | The category attribute specifies the type of address represented, e.g. category="ipv4-addr". The default value is "unknown". More details can be found in Section 10 of [31].
address | String | 1 | The address attribute represents the IP address. The format of this data is determined by the category attribute, e.g. 192.0.2.200.

Table 28: Fields of Classification
Name | Type | Cardinality | Description
text | String | Required | The text attribute explains and identifies the alert message, e.g. Loadmodule attack detected.

Table 29: Fields of Assessment
Name | Type | Cardinality | Description
Impact | Class | 1 | The Impact class is the assessment of the impact of the event on the target(s). (See Table 30 for further details.)

Table 30: Fields of Impact
Name | Type | Cardinality | Description
severity | ENUM | 1 | The severity attribute estimates the relative severity of the event, e.g. severity="high". The permitted values are described in Section 10 of [31].
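
The Analyzer table states that analyzerid is only partially optional: it must be provided whenever the ident attribute is used in other classes. As a minimal sketch, that rule and the cardinality-1 elements of Alert (Analyzer, CreateTime, Classification) could be checked as shown below. The dictionary layout and the names check_alert and uses_ident are assumptions made for illustration and are not part of the SUCCESS API.

```python
# Elements with cardinality 1 in the Alert table.
REQUIRED_ALERT_FIELDS = ("Analyzer", "CreateTime", "Classification")


def uses_ident(node):
    """Return True if any nested dictionary in node carries an "ident" key."""
    if isinstance(node, dict):
        return "ident" in node or any(uses_ident(v) for v in node.values())
    if isinstance(node, list):
        return any(uses_ident(v) for v in node)
    return False


def check_alert(alert):
    """Return a list of problems found in an Alert dictionary (hypothetical layout)."""
    problems = [f"missing {name}" for name in REQUIRED_ALERT_FIELDS if name not in alert]
    analyzer = alert.get("Analyzer", {})
    # Look for ident in every class other than Analyzer itself.
    others = {key: value for key, value in alert.items() if key != "Analyzer"}
    if uses_ident(others) and not analyzer.get("analyzerid"):
        problems.append("analyzerid is required because ident is used")
    return problems


alert = {
    "messageid": "abc123",
    "Analyzer": {"name": "SA Node 1"},
    "CreateTime": {"ntpstamp": "2018-01-09T08:12:32-01:00"},
    "Source": [{"ident": "S1-001"}],
    "Classification": {"text": "Loadmodule attack detected"},
}
print(check_alert(alert))                # reports the missing analyzerid
alert["Analyzer"]["analyzerid"] = "SA1"
print(check_alert(alert))                # no problems remain
```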