Managing Cyber Threat Activities through Formal Modeling of CTI Data By Zafar Iqbal (Registration No: 2012-NUST-PhD-IT-35) Thesis Supervisor: Dr. Zahid Anwar Department of Computing School of Electrical Engineering and Computer Science, National University of Sciences & Technology (NUST), Islamabad, Pakistan. (2020)


Managing Cyber Threat Activities through Formal

Modeling of CTI Data

By

Zafar Iqbal

(Registration No: 2012-NUST-PhD-IT-35)

A thesis submitted to the National University of Sciences and Technology, Islamabad,

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy in

Information Technology

Thesis Supervisor: Dr. Zahid Anwar

Department of Computing

School of Electrical Engineering and Computer Science,

National University of Sciences & Technology (NUST)

Islamabad, Pakistan

(2020)

Abstract

Cyber-attacks launched by nation-states, organizations, and individuals within and across borders are on the rise. Modern-day adversaries change signatures and use multiple malware families to launch attacks. Such attacks are termed Advanced Persistent Threats (APTs). Although a large amount of cyber threat data regarding these APTs is available online, its questionable veracity and large volume make timely analysis of APTs a challenging task for security analysts. Moreover, it has been observed that APTs launched against one organization subsequently succeed with high probability against other similar organizations. Organizations therefore need to accumulate cyber threat data and share it with their peers. Furthermore, this data should incorporate information regarding the various phases of cyber threat management (CTM), namely cyber threat prevention, detection, and response. In this regard, a few efforts have been made towards the structuring and sharing of cyber threat data. Noteworthy among these is the Structured Threat Information Expression (STIX). Unfortunately, the current state of structured data is poor. Structured reports are not appropriately formatted, use incorrect vocabulary, wrongly label threat data, or leave out key components, all of which curtail their usefulness for CTM. The solution presented in this thesis to address these problems is organized into three formal sub-frameworks, namely STIXGEN, SCERM, and A2CS. Each sub-framework is designed to achieve one of three distinct thesis goals.

The STIX Generation (STIXGEN) framework is proposed, and its prototype developed, to automatically generate distinct, threat-relevant, and error-free structured data. A comprehensive STIX dataset of well-known APTs has been generated and shared with the community for the benefit of researchers.

The Structured threat data Cleansing, Evaluation, and Refinement (SCERM) framework has been developed to acquire STIX reports from STIXGEN and other resources, uplift Cyber Threat Intelligence (CTI) data by refining incomplete or missing components, and valuate it for the different phases of CTM. During SCERM’s evaluation, it is observed that current STIX reports have limited information on prevention and almost none for the response phase of CTM. The results further demonstrate that SCERM significantly enriches STIX reports: the improvement is 73% for prevention and 100% for response.

Subsequently, the APTs Analysis and Classification System (A2CS) has been developed for automatic analysis of APTs. It employs ontology modeling and semantic rules to analyze APTs, identify their missing artifacts, and infer the tactics, techniques, and procedures (TTPs) being employed. A2CS takes refined structured data as input from SCERM and extracts both high-level and low-level artifacts according to the various attacker and defender models. It then maps this data onto the ontology, which helps in identifying the missing artifacts of APTs and in inferring high-level TTPs from low-level artifacts.

Overall, the proposed solution generates refined, distinct, error-free, and properly labeled structured threat data, valuates it for the different phases of CTM, and employs different attacker and defender models for automated analysis of APTs, identification of missing artifacts, and inference of high-level artifacts.


Acknowledgment

All praise and thanks be to Allah Almighty, Who showered His countless blessings and bestowed the intellect, strength, and resources upon me to complete this thesis.

I owe my deepest gratitude to my supervisor, Dr. Zahid Anwar, whose ever-present support and guidance enabled me to complete my thesis well within the stipulated time. Despite his prolonged commitments to a series of foreign assignments, he always remained available to nourish my stray ideas with his valuable experience and strong technical background, for which I am highly indebted to him. This dissertation would not have been completed without his guidance and encouragement.

I am also heartily thankful to my co-supervisor, Dr. Yousra Javed, and to my guidance committee members, Dr. Rafia Mumtaz, Dr. Asad Waqar Malik, Dr. Hassan Islam, and Dr. Shahzad Saleem, for their effective supervision, encouragement, and guidance.

This thesis would not have been possible without the love, prayers, and support of my parents and my wife, who shared my responsibilities and independently managed all domestic affairs, thus enabling me to stay focused on my research.

I am also grateful to all members of the NUST administration, particularly Dr. Osman Hasan (Principal SEECS), Dr. Sharifullah Khan (Senior HoD, Dept. of Computing (DoC), SEECS), Dr. Rafia Mumtaz (HoD IT, SEECS), Dr. Rabia Irfan (PhD Coordinator DoC), Mr. Zahid Aslam Raja (OIC Exams (PG), SEECS), Mr. Muhammad Banaras (DD Monitoring at HQ NUST), Mr. Ejaz Ahmed (DoC Secretary), and Mr. Muhammad Adnan Bhatti (Personnel Assistant of SHOD DoC), for their kind support and guidance in administrative affairs. I am also thankful to all those who remembered me in their prayers during all phases of my PhD.


In the name of Allah, the most Gracious, the most Merciful.

I dedicate my work to my parents, my wife, and all my family members, whose sacrifices, love, and prayers enabled me to reach this stage.


List of Publications

Journal Publications

1. Zafar Iqbal and Zahid Anwar, “SCERM - A Novel Framework for Automated Management of Cyber Threat Response Activities”, Future Generation Computer Systems, Volume 108, July 2020, Pages 687-708, Elsevier.

2. Zafar Iqbal and Zahid Anwar, “Ontology Generation of Advanced Persistent Threats and their Automated Analysis”, NUST Journal of Engineering Sciences, Volume 9, No. 2 (2016), Pages 68-75.

Conference Publications

1. Zafar Iqbal, Zahid Anwar, and Rafia Mumtaz, “STIXGEN - A Novel Framework for Automatic Generation of Structured Cyber Threat Information”, in 2018 International Conference on Frontiers of Information Technology (FIT), Pages 241-246, IEEE, 2018.


Table of Contents

1 Introduction
1.1 Cyber Attack - A Global Risk
1.2 Cyber Attacks and Worldwide Expenditures
1.3 Cyber Threat Data
1.3.1 Classification of Data
1.3.2 Structuring of Data
1.4 Cyber Threat Management
1.4.1 Shared Responsibility
1.4.2 Cyber Threat Strategies
1.5 Present Security Solutions
1.6 Research Motivation
1.7 Research Questions
1.8 Proposed Framework
1.8.1 STIXGEN - STIX Generator
1.8.2 SCERM - Structured threat data Cleansing, Evaluation, and Refinement
1.8.3 A2CS - APTs Analysis and Classification System
1.9 Results
1.10 Contributions
1.11 Thesis Organization

2 Background
2.1 Introduction
2.2 Cyber Security Solutions
2.2.1 Intrusion Detection System
2.2.2 Security Information and Event Management System
2.2.3 Ontology
2.3 Cyber Threat Analysis Models
2.3.1 Cyber Kill Chain
2.3.2 Pyramid of Pain
2.3.3 MITRE ATT&CK
2.3.4 Diamond Model
2.4 Structured Threat Intelligence Solutions
2.4.1 STIX Use Cases
2.4.2 STIX-Shifter
2.4.3 STIXViz

3 Related Work
3.1 Introduction
3.2 Overview
3.3 Advanced Persistence Threats
3.3.1 Models
3.3.2 Tactics, Techniques, and Procedures
3.3.3 Advanced Persistence Threats Exploit Humans
3.4 Cyber Threat Data
3.4.1 Structuring of Cyber Threat Data
3.4.2 Structured Threat Data Generation
3.4.3 Cyber Threat Intelligence Quality Testing
3.5 Cyber Preparation Assessment
3.6 Machine learning based systems
3.7 Cyber Threat Scoring System
3.8 Graph-Based Ranking Systems
3.9 Reputation-Based Security Systems
3.10 Inference or Ontology-Based Security Systems
3.11 Conclusion

4 Automatic Generation of Structured Threat Data
4.1 Introduction
4.2 Research Approach and Contributions
4.3 STIXGEN System Model
4.4 STIXGEN Design and Architecture
4.5 Case Study
4.5.1 Retail Industry - APTs Selection
4.5.2 Data Entry
4.5.3 STIX Encoder
4.5.4 Analysis of the Generated STIX
4.5.5 Comparison of the POS APTs
4.6 STIXGEN Evaluation
4.6.1 Accuracy
4.6.2 Effectiveness

5 Cyber Threat Response Activities
5.1 Introduction
5.2 Research Approach and Contributions
5.3 Design
5.3.1 Formal Model of STIX Architecture - SAM
5.3.2 Modeling of the Use Case - Managing Cyber-Threat Response Activities
5.3.3 Cyber threat Prevention and Response Model
5.3.4 Cyber threat Detection
5.4 Architecture and Implementation
5.4.1 Preprocessing
5.4.2 Valuation
5.4.3 Refinement
5.5 Case Study
5.5.1 APT Selection
5.5.2 A Brief Description of the Report
5.5.3 Signal Boosting
5.5.4 Valuation of the TG-3390 Boosted STIX Report
5.5.5 Refinement
5.5.6 Valuation Comparison
5.6 SCERM Evaluation
5.6.1 Dataset Selection and Evaluation Setup
5.6.2 Current State of the STIX Reports for Cyber Threat Management
5.6.3 Effectiveness of the Proposed Solution
5.6.4 Efficiency

6 APTs Analysis and Classification System
6.1 Introduction
6.2 Research Approach and Contributions
6.3 A2CS Architecture
6.4 Analysis via Reasoning
6.4.1 Identification of Missing Artifacts
6.4.2 Tactics, Techniques and Procedure (TTPs) Analysis

7 Discussion

8 Conclusions and Future Research Directions
8.1 Conclusions
8.2 Future Research Directions

Bibliography

Appendices

A STIX Dataset and Source Code
A.1 STIX Dataset
A.2 Source Code and Dataset


List of Figures

1.1 Advanced Cyber Threats Management Challenges
1.2 Proposed Solution
2.1 Intrusion Detection System
2.2 Security Information and Event Management System
2.3 SIEM Search Mechanism
2.4 Cyber Kill Chain
2.5 Pyramid of Pain
2.6 MITRE ATT&CK Vs Cyber Kill Chain
2.7 Diamond Model
2.8 POS STIX
3.1 Overview of Related work
4.1 STIXGEN Flow Diagram
4.2 STIXGEN’s Database Schema
4.3 Backoff APT and Security Blogs
4.4 POS STIX: POS’s STIX Report generated by STIXGEN
4.5 Alina POS APT
4.6 JackPOS APT
4.7 BackOff POS APT
4.8 CenterPOS APT
4.9 ProPOS APT
4.10 TTP employed
4.11 Protocol Employed
4.12 Operating System Employed
4.13 Folder Path Employed
4.14 Encryption Evolution
4.15 Observables for CTM
4.16 IBM X-Force Exchange vs STIXGEN
4.17 IBM X-Force Exchange Textual Report
4.18 IBM X-Force Exchange vs STIXGEN
5.1 Campaign and its Related Components
5.2 Formal Depiction of the Campaign Components
5.3 COA Stage
5.4 COA and its Relations
5.5 High level Architecture Diagram of SCERM
5.6 IBM X-Force STIX XML
5.7 STIX-1: IBM X-Force STIX
5.8 IBM Text Report
5.9 IBM STIX Description Portion
5.10 STIX-2: Boosted STIX Report
5.11 Valuation Report for Cyber Threat Prevention
5.12 Valuation Report for Cyber Threat Detection
5.13 Valuation Report for Cyber Threat Response
5.14 STIX Valuation for CTM
5.15 TG-3390 Techniques, Mitigation and Detection
5.16 SCERM’s Refined STIX Report
5.17 Valuation Comparison - Boosted vs Refined STIX Reports
5.18 Current State of STIX Repositories for CTM
5.19 Evaluation of RAW STIX Reports for CTM
5.20 Evaluation of STIX Repositories for CTM
5.21 Statistical Comparison
5.22 SCERM Efficiency
6.1 Combined Ontology of CKC and POP
6.2 A2CS Flow Diagram
6.3 Concepts Extraction and Mapping
6.4 Identification of Missing Artifacts
6.5 Correlation of JackPOS and BackOff APTs
6.6 Summary of Correlation Results
6.7 Ontology of Rule-1
6.8 Ontology of Rule-2
6.9 Ontology of Rule-3
A.1 Financial APT’s STIX
A.2 Cyber Espionage APT’s STIX
A.3 MITRE APT’s STIX
A.4 POS APT’s STIX
A.5 Ransomware APT’s STIX
A.6 GitHub link


List of Tables

4.1 Comparison of APTs
4.2 Comparison of STIX Generators
5.1 Component Selection
5.2 SCERM’s Variables and their purpose
5.3 Levels of Impact, Efficacy, and Confidence for Course of Action
5.4 COAs Producers and their Strength
5.5 Variables for Prevention and Response phases
5.6 Indicator Efficacy
5.7 Variables for Detection phase
5.8 STIX Valuation for Prevention Phase
5.9 STIX Valuation for Detection Phase
5.10 STIX Valuation for Response Phase
5.11 STIX Dataset
5.12 Qualitative Comparison
5.13 Participants Details
5.14 SCERM Evaluation Survey


List of Abbreviations

AI Artificial Intelligence

APT Advanced Persistent Threats

ACT Activities

A2CS APTs Analysis and Classification System

CKC Cyber Kill Chain

CTM Cyber Threat Management

CTI Cyber Threat Intelligence

COA Course of Actions

LM Lockheed Martin

Mgmt Management

OWL Web Ontology Language

STIX Structured Threat Information eXpression

POP Pyramid of Pain

POS Point of Sales

RESP Response

SWRL Semantic Web Rule Language

SPARQL SPARQL Protocol and RDF Query Language

SQWRL Semantic Query Enhanced Web Rule Language


STIXGEN STIX Generation

SCERM Structured threat data Cleansing, Evaluation, and Refinement

SAM STIX Architecture based formal Model

SDO STIX Domain Objects

SRO STIX Relationship Objects

SEM Security Event Management

SIM Security Information Management

SIEM Security Information and Event Management

TTP Tactics Techniques and Procedures

TAXII Trusted Automated eXchange of Indicator Information

TA Threat Actor


Chapter 1

Introduction

This chapter first highlights the cyber attacks launched in the last decade and then discusses worldwide spending on cyber security. After that, the classification and structuring of cyber threat data are described. Then, the organizational roles of different individuals according to cyber threat strategies, and their related cyber threat indicators, are reviewed. Afterwards, the security solutions presently employed to handle current cyber threats are presented. Subsequently, the research problem is introduced and the research motivation is built. Finally, the chapter concludes by summarizing the research contributions and their outcomes.

1.1 Cyber Attack - A Global Risk

In this era of information technology, cyber attacks [1] [2] [3] are becoming a global risk. According to the World Economic Forum’s (WEF) Global Risks Report (GRR) 2018 [4], cyber attacks were the third most probable global risk for 2018.

According to a security survey [5], security incidents around the world have risen to 42.8M, growing 66% each year since 2009. In 2014, the average reported loss was up 34% compared to 2013, and 86% of the cyber-attacks involved in these losses were launched by nation-states. Some governments have made cyber-attack campaigns part of their military strategy and have built their own cyber armies. According to an ISACA report [6], cyber criminals are trying their best to attack individuals, organizations, and states. A majority of these attacks target the government, financial, healthcare, and marketing industries. As reported in Symantec’s ISTR of April 2017 [7], 106 new families of ransomware were discovered in 2016, more than three times the number seen in the previous year. The “WannaCry” [8] ransomware attack, launched in May 2017, swiftly spread to 150 countries and damaged more than 0.3M computers. In June 2017, “NotPetya” [9] was launched against Ukraine and other countries; it caused an estimated loss [10] of $200M to $300M in the third quarter (Q3) alone to the shipping giant Maersk and $300M to another shipping company, FedEx. In May 2017, two billion phone records [11] were stolen from a Chinese firm, the Du Caller group. In September 2017, Equifax [12], a US-based company, endured a data breach in which the personal and credit card data of 145M customers (44% of the US population) was stolen. In November 2017, the data of 57M customers and drivers was stolen in the Uber [12] data breach, and the company paid $100,000 to hackers to delete the stolen data. In the same year (2017), the SonicWall Capture Labs threat network detected 2,855 ransomware signatures [13], a 101.2% increase over the 1,419 identified in 2016.

In another cyber-attack [14], the data of 30M US-based Facebook users was stolen by the UK-based firm Cambridge Analytica (CA). This data was later used to attract voters for Trump’s 2016 US presidential election campaign. Moreover, attacks with a subversive purpose, in particular those launched during the 2016 US presidential elections [15], represent a new form of highly sophisticated cyber-attack. As stated in the McAfee Labs threats report of March 2018 [16], the healthcare sector faced a 210% increase in publicly disclosed security incidents in 2017 compared to the previous year. A closer analysis of the above attacks and others such as Zeus [17] and BackOff Point of Sales (POS) [18] shows that, over time, these attacks have succeeded against many analogous organizations. This shows that a cyber-attack launched against one organization can easily be reused against parallel organizations because of their similar IT infrastructure. Therefore, the collection and timely sharing of CTI data are very important for the prevention and detection of, and response to, cyber-attacks.

1.2 Cyber Attacks and Worldwide Expenditures

In the past, cyber-attacks were launched against individual users for fun and damage, but nowadays they are launched against business chains, industries, and nations for financial and political gain. It seems that the Internet landscape has become a theater of binary warfare.

Although worldwide information security spending is increasing every year [19] and reached $124 billion in 2019, record-breaking data breaches continue to occur globally. According to Gartner Inc. [19], this outlay was approximately $96 billion in 2018, 8% more than in 2017. Major data breaches took place in 2017. In May 2017, two billion phone records [11] were stolen from a Chinese firm. In the same year, the US-based company Equifax [12] suffered a data breach in which the data of 145M customers was stolen.

Due to the proliferation of cyber incidents, Cyber Threat Management (CTM) is emerging as a systematic approach for the timely prevention and detection of, and response to, these incidents, because its activities involve identifying threats, understanding their nature, and applying appropriate actions.

1.3 Cyber Threat Data

Multiple organizations continuously share a large volume of CTI data for CTM. For example, the MITRE Corporation [20], a non-profit organization, currently provides CTI data for about 94 different threat groups. This CTI data consists of various indicators such as TTPs, network and host artifacts, IPs, and DNS information. Likewise, HAILATAXII [21] is an open-source repository which holds about 1,107,066 indicators. Similarly, IBM X-Force Exchange [22] shares machine-readable indicators for security tools such as IDS, IPS, and firewalls. The Financial Services Information Sharing and Analysis Center (FS-ISAC) [23] is an industry consortium which regularly provides CTI data to safeguard the financial domain from cyber threats. Similarly, the Research and Education Networks Information Sharing and Analysis Center (REN-ISAC) [24] produces a large amount of CTI data for incident response teams, researchers, and the education community.

Although a massive volume of cyber threat data is available on different security blogs, it has become a great challenge for security analysts to decide which data is required for cyber threat management. In this regard, multiple models are available. Details are provided in the following subsections.


1.3.1 Classification of Data

In the recent past, multiple models have been presented for the classification of cyber threat data. Among these, the Cyber Kill Chain (CKC) [25] and the Pyramid of Pain (POP) [26] are prominent. The CKC model shows an analyst how a perpetrator may use different phases such as Reconnaissance, Weaponization, Delivery, Exploitation, Installation, and Exfiltration to launch Advanced Persistent Threats (APTs), while the POP details how signatures and artifacts available at different attack levels can be used by defenders to protect their network from APTs. The POP model further notes that publicly available cyber threat data generally concerns atomic and computed indicators, namely IPs, Domain Names, and Hash Values, while data on higher-level artifacts such as File names, Registry entries, Protocols used, Obfuscation methods, and TTPs, which is more relevant to decision-making, is mostly missing. The model also conveys that perpetrators can change atomic indicators with little effort, whereas higher-level artifacts are hard to change because perpetrators have invested considerable time and money in developing them.
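The split described above, between indicators that are easy to change and higher-level artifacts that are costly to change, can be sketched in Python as follows. This is only an illustration: the type labels (`ip`, `registry_entry`, and so on) and the function names are assumptions made for this sketch and are not part of the POP model or of any STIX vocabulary.

```python
# Illustrative two-tier grouping based on the POP discussion above.
# All labels and function names are hypothetical, chosen for this sketch.
LOW_LEVEL = {"ip", "domain_name", "hash_value"}
HIGH_LEVEL = {
    "file_name",
    "registry_entry",
    "protocol",
    "obfuscation_method",
    "ttp",
}


def classify(artifact_type: str) -> str:
    """Return the tier of an artifact type, or 'unknown' if it is not listed."""
    kind = artifact_type.strip().lower()
    if kind in LOW_LEVEL:
        return "low-level"
    if kind in HIGH_LEVEL:
        return "high-level"
    return "unknown"


def missing_high_level(report_types):
    """List the high-level artifact types that a report does not contain."""
    present = {t.strip().lower() for t in report_types}
    return sorted(HIGH_LEVEL - present)


if __name__ == "__main__":
    report = ["ip", "hash_value", "protocol"]
    print([classify(t) for t in report])
    print(missing_high_level(report))
```

A report that carries only IPs and hash values would be flagged as lacking every high-level artifact type, which is the situation the POP discussion identifies as common in publicly available data.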

1.3.2 Structuring of Data

In the last decade, a massive volume of cyber threat data has been published on different security blogs; however, this data is generally scattered as well as unstructured [27]. Several efforts such as IODEF, STIX [28], and YARA [29] have been put forward by government and industry to convert unstructured data [30] into a structured, machine-readable format. Among these, STIX [28] is a de facto standard [31]. STIX is a community-based effort which not only structures cyber threat data but also enables sharing, visualization, and analysis capabilities. STIX has several components, such as Campaign, Tactics, Techniques, and Procedures (TTPs), Exploits, Indicators, Observables, Incidents, Course of Actions (COAs), and Threat Actors, to represent cyber threat data.

1.3.2.1 Present State of Structured Data

Although STIX is a remarkable effort for structuring and sharing CTI data, its adoption is slow because STIX generation is a manual process. Moreover, it has been noticed that publicly available STIX reports are few and mostly contain erroneous, misplaced, and meaningless data. Although sharing and structuring of CTI is very important, it is paramount that the data being shared be meaningful, threat-relevant [32], properly placed, and error-free.

1.3.2.2 Generation of Structured Data

Many cyber threat analysis tools are publicly available, such as Bro [33], Splunk [34], and STIXViz [35]. Bro is a log analysis tool, Splunk is used to search, visualize, and analyze logs generated by different sensors, and STIXViz supports visual analysis of STIX reports. However, no tool is available to generate distinct, threat-relevant, and error-free structured data.

1.3.2.3 Valuation of Structured Data

Several challenges in the current state of CTI data [36] [37] hinder the automation of CTM. Cyber analysts encounter a lot of sketchy, erroneous, and redundant CTI data [38], a lack of novel information, and the absence of a standardized vocabulary. CTI data producers do not always follow the standards when publishing information, or they republish, in part or in whole, threat information that they or another source published previously. Frequently, very sparse new terms are buried in a lot of extraneous information that is of little use to the threat analyst. The same information is also published using semantically similar terminology because of the lack of a standardized vocabulary. Moreover, currently available CTI data contains very limited information for CTM.

1.4 Cyber Threat Management

Cyber Threat Management (CTM) involves prevention, detection, and response to

cyber-attacks by identifying and understanding threats and applying appropriate ac-

tions.

1.4.1 Shared Responsibility

Cyber Threat Management is a shared responsibility undertaken by multiple stakeholders within an organization [39], such as the Chief Executive Officer (CEO), the Chief Information Security Officer (CISO), and the Security Administrator (SA), each of whom consumes specialized components of cyber threat intelligence data in order to perform their duties effectively. For example, the CEO is generally interested in understanding whether a prevalent cyber attack is relevant to the organization’s primary business and in determining whether the threat actor is a competitor or someone seeking to conduct extortion. A CISO, on the other hand, wants to know whether the organization can resist the attack and, if not, determines the COAs needed to safeguard the organization. Accordingly, the SA applies the identified COAs.

1.4.2 Cyber Threat Strategies

Cyber Threat Management has several strategies, which can be grouped into three phases, namely cyber threat prevention, detection, and response. These phases are continuous and concurrent processes, each of which requires a separate team with focused tasks and expertise. For cyber threat prevention, the CISO studies and audits the organizational network, analyzes assets and operational procedures, and identifies the exploits and their COAs. Afterwards, the SA implements the COAs in the form of patch updates and defines policies to prevent cyber attacks. Despite these preventive measures, the prevention team cannot stop all advanced, sophisticated, multi-stage, and targeted attacks. The responsibility for tracing these attacks therefore lies with the detection team. This team studies emerging attacks using the corresponding indicators and observables, which signify behavioral signatures, and correlates them with the log files of the organizational network to determine the nature of a suspected cyber attack. Once the attack is identified, the CISO studies the appropriate COAs to mitigate it. Once these are approved, the SA implements the COAs in the form of software installation and defines policies to stop or limit ongoing cyber attacks.

1.5 Present Security Solutions

Nowadays, several security tools are used for CTM, such as Antivirus, Intrusion Detection Systems (IDS), and Security Information and Event Management Systems (SIEM). Virus Total [40] is an antivirus service that employs signatures to identify malware. Bro [33] is an IDS that takes log files as input and allows rules to be written to detect intrusions. Splunk [34] is a SIEM that correlates low-level artifacts, such as log files, for intrusion detection. These tools do not process structured data directly but mostly examine low-level attack artifacts such as log files. Moreover, neither these tools nor any others allow more than very limited valuation and refinement of structured threat feeds.

Present-day cyber-attacks are dynamic, stealthy [41], and persistent, and cannot be blocked by legacy security mechanisms.

1.6 Research Motivation

At present, cyber threat management faces many challenges, as shown in Figure 1.1. For example, present-day APTs are prolonged, customised, and targeted; therefore, most of the time they remain undetected by conventional security solutions. These attacks have diverse goals: some are launched for financial gain, for example, Zeus and Carbanak; some aim at political gain and sabotage, like the Naikon and Stuxnet APTs; and others seek personal information, for instance, PoSeidon and BlackPOS.

Although a substantial amount of cyber threat data is available in the literature and in online repositories, most of it is unstructured and distributed, so it cannot readily be processed by machines or by humans. Owing to the high adaptivity, large volume, and unstructured nature of this data, analyzing information about cyber incidents is a challenging task for security analysts.

Although multiple efforts are being carried out to analyze APTs and to structure CTI data for CTM, none of these has been successful so far. The literature review revealed that W3C recommends ontological modeling for understanding systems and studying their components. Moreover, ontologies are developed to share, reuse, and analyze domain knowledge. Therefore, this research began with ontological modeling. Multiple security models, such as the Cyber Kill Chain (CKC) and the Pyramid of Pain (POP), were studied and analysed. Each of these models was found to have its own pros and cons, so it was felt that they must be integrated to obtain a good security solution. Accordingly, security concepts were taken from these models and an ontology model for CTM was developed. Moreover, semantic rules were written for the automatic analysis of APTs, such as identification of their missing artifacts and inference of the Tactics, Techniques, and Procedures being employed.

Figure 1.1: Advanced Cyber Threats Management Challenges

While collecting CTI data for our proposed ontology model, we studied whether present CTI data contains information on the components of the proposed integrated security model and, if so, what its quality is. We then downloaded CTI data from security blogs and mapped it onto our proposed model. We found that most of the available CTI data is unstructured and lacks security concepts that are necessary for CTM. Furthermore, much of it is erroneous, irrelevant, and wrongly labeled.

This research aims to enable effective CTM by performing automatic analysis of APTs, identification of their missing artifacts, and inference of the Tactics, Techniques, and Procedures through various attacker and defender models. All of these tasks are present-day challenges. Owing to the large volume and unstructured nature of CTI data, artifacts cannot feasibly be identified and extracted by machines or by humans. Therefore, for the automatic analysis of APTs, CTI data must be in a structured form. Moreover, this data must be error-free, threat-relevant, and distinct; otherwise, it would lead to wrong conclusions. Furthermore, it was learned during the research that most publicly available CTI data is wrongly labeled, has incomplete artifacts, and is distributed over different security blogs. Therefore, for effective CTM, threat data must be collected from various security blogs, boosted, and refined. Presently, these tasks cannot be accomplished because no algorithms or frameworks are available that automatically generate, refine, valuate, and analyse structured CTI data. Although some manual tools are available to generate structured CTI data, these tools are inherently difficult to use and error-prone.

Therefore, a need is felt for a CTM framework consisting of three stages. The first stage must automatically generate error-free, properly labeled, and threat-relevant CTI data in a structured format. The second stage should evaluate the quality of the input structured data for the various phases of CTM, namely cyber threat prevention, detection, and response. Moreover, this stage must be able to boost and refine the structured CTI data using input from various analysts and security blogs. The third stage should take the refined structured CTI data as input and extract both high- and low-level artifacts according to the various attacker and defender models. Finally, this stage needs to deduce the required TTPs from the previously extracted indicators through formal models.

This research takes all the aforesaid problems as a challenge and develops a framework that generates refined, distinct, error-free, and properly labeled structured threat data. It also valuates the structured CTI data for the different phases of CTM. Furthermore, the framework employs different security models for automatic analysis of APTs, identification of their missing artifacts, and inference of the TTPs.

1.7 Research Questions

This research focuses on addressing the following questions.

(1) Does currently available cyber threat intelligence data follow the NIST guidelines of timely, relevant, specific, accurate, and actionable threat intelligence?

(2) Is it possible to quantitatively measure the quality of CTI data produced by cyber threat sources and ultimately rank them?

(3) What level of refinement of CTI data can be achieved for cyber threat prevention, detection, and response activities?

(4) If ontological modeling of cyber threat data is performed according to existing solutions, will it help to understand and defend against cyber attacks?

(5) Can formal rules be devised that aid machines in the automated analysis of cyber attacks and in their prevention, detection, and response?

1.8 Proposed Framework

The proposed framework can be divided into three sub-frameworks, namely STIXGEN, SCERM, and A2CS, as shown in Figure 1.2. Each of these sub-frameworks fulfills distinct yet closely related research goals to assist security teams in the analysis of advanced cyber threats and in their prevention, detection, and response activities. The salient features of these sub-frameworks are as follows.

Figure 1.2: Proposed Solution

1.8.1 STIXGEN - STIX Generator

Although STIX is a remarkable effort for structuring and sharing CTI, it is underutilized because of a largely manual STIX generation process, which is inherently difficult and error-prone. This research treats these deficits as barriers to STIX utilization, and they have motivated this research work. STIXGEN is therefore designed according to the STIX standard so that it generates meaningful, properly placed, and error-free structured data. It is thus expected to increase the sharing of structured CTI between peer organizations.

1.8.2 SCERM - Structured threat data Cleansing, Evaluation, and Refinement

During this research, it was realised that the identification and prioritization of CTI data for CTM cannot be meaningfully accomplished without a formal model of threat intelligence components, their connectivity, and their dependencies. Therefore, SCERM is proposed, which boosts, refines, and valuates STIX reports for CTM. The prototype produces valuation scores for STIX reports and a list of extracted components for every phase of CTM. SCERM thus provides a starting point for CTM teams in the prevention, detection, and response of cyber threats.

1.8.3 A2CS - APTs Analysis and Classification System

Given the importance of the CKC and POP models, a combined ontology of both models is developed. The proposed A2CS framework accepts both structured and unstructured CTI data as input and extracts the CTI data related to the CKC and POP models. A2CS then maps this data onto the integrated ontology of the two models, which helps an analyst identify the missing artifacts of APTs and infer the high-level TTPs from the low-level artifacts.

1.9 Results

For a thorough assessment of the proposed framework, CTI data of real-life APTs is used. For example, for a comprehensive assessment of STIXGEN, multiple APTs [42] are selected and their STIX reports are generated both with STIXGEN and with state-of-the-art online tools. The results of the proposed framework were found to be better than those of the other tools and to be distinct, relevant, and error-free.

Likewise, SCERM is evaluated using publicly available STIX repositories such as Schemas-test [43], IBM X-Force Exchange [22], and HAILATAXII [21]. These repositories were analyzed, valuated, and prioritized for the different phases of the CTM life-cycle. The evaluation results highlight that publicly available STIX reports contain limited information for cyber threat prevention and almost none for the response phase of CTM. The valuation results demonstrate that the SCERM system significantly augments the STIX reports.

Similarly, to evaluate the A2CS framework, two well-known Point of Sale (POS) APTs are selected and correlated. The results generated by the proposed system indicate that most of the phases of these APTs, such as Weaponization, Host Artifacts, Network Artifacts, and TTPs, are common.

1.10 Contributions

During our research, we developed three novel sub-frameworks. Each of them fulfills distinct yet closely related research goals to assist security teams in the analysis of advanced cyber threats and in their prevention, detection, and response activities.

Currently, threat data is error-prone and lacks CTI that is important for CTM. Threat analysts therefore hesitate to use it. Our first sub-framework takes CTI data as input and produces properly labeled, error-free, and threat-relevant structured threat data for CTM.

It was learned during the research that most publicly available CTI data is wrongly labeled, has incomplete artifacts, and lacks important indicators for cyber threat prevention, detection, and response. For effective CTM, there is therefore a need for a sub-framework that boosts, refines, and evaluates structured CTI data. However, these tasks cannot be meaningfully accomplished without a formal model of threat intelligence components, their connectivity, and their dependencies. A novel sub-framework is therefore proposed for the valuation of structured data, which formally models the STIX architecture on the basis of the STIX use case Managing cyber threat response activities.

It is expected that the proposed framework will enhance user confidence in structured CTI data, and hence the quality and usage of structured reports for CTM will increase. Moreover, it can be used by students and analysts to generate good-quality STIX reports in a simple and effective way.

1.11 Thesis Organization

The rest of the thesis is organized as follows. Chapter 2 briefly describes the background knowledge of key domain concepts, namely Present Security Solutions, Cyber Kill Chain, Pyramid of Pain, Structured Threat Intelligence Solutions, and STIXViz. Chapter 3 presents a comprehensive literature review that describes research contributions in the domain of APT analysis, CTI data analysis and structuring, and other associated areas. Chapter 4 details how distinct, threat-relevant, and error-free structured data is automatically generated. Chapter 5 formally models the STIX architecture and valuates STIX reports for different phases of cyber threat management. Chapter 6 describes how ontological modeling and semantic rules are used for APT analysis. Chapter 7 provides answers to the research questions raised in Chapter 1. Finally, Chapter 8 concludes this thesis and provides future research directions. Moreover, a comprehensive STIX dataset is provided for researchers in Appendix A.1.

Chapter 2

Background

2.1 Introduction

This chapter briefly describes various security solutions, standards, and techniques that are employed in the sub-frameworks proposed in this thesis. For example, the Pyramid of Pain (POP), Cyber Kill Chain (CKC), and Ontologies are chosen for cyber threat analysis and are made part of the A2CS framework. Similarly, the STIX standard, its Use Cases, and MITRE ATT&CK are employed in the STIXGEN and SCERM frameworks for the analysis, refinement, and valuation of CTI data for CTM. We do not assume that readers have prior knowledge of these concepts. For ease of reading and better understanding, the following subsections briefly discuss Present security solutions, Ontology, Pyramid of Pain, Cyber Kill Chain, and State-of-the-Art solutions for sharing and visualization of Structured Threat Intelligence. References are provided for further reading.

2.2 Cyber Security Solutions

Presently, several security solutions are used for cyber threat prevention, detection, and response. These solutions can be divided into three main categories, namely Intrusion Detection Systems (IDS), Security Information and Event Management Systems (SIEM), and Ontology-based systems. Details of these are provided in the following subsections.

2.2.1 Intrusion Detection System

Intrusion Detection Systems (IDSs) are primarily signature based. These systems use atomic and computed indicators of previously known attacks to detect an imminent attack. By deployment, there are two types of IDS: Host-based IDSs (HIDSs) and Network-based IDSs (NIDSs). HIDSs are installed and operate on a single machine, while NIDSs monitor the whole network, as can be seen in Figure 2.1. By detection technique, IDSs are further divided into Signature-based, Anomaly-based, and Rule-based IDSs. Details of these are provided in the ensuing subsections.

Figure 2.1: Intrusion Detection System

2.2.1.1 Signature-Based IDSs (SIDSs)

SIDSs employ specific attack patterns, called signatures, to detect cyber attacks. These IDSs generally search logs and network traffic for attack signatures and raise an alarm when a match is found. Although these systems are accurate and generate few false alarms, they cannot detect zero-day cyber attacks.
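As an illustration of signature matching, the following minimal Python sketch searches log lines for known patterns and checks file hashes against a known-bad set. The signature names, patterns, and sample payload are hypothetical and chosen only for the example; real SIDSs use far richer signature languages.

```python
import hashlib

# Hypothetical signature store: substring patterns and known-bad file hashes.
SIGNATURES = {
    "sqli-union": "union select",
    "path-traversal": "../../",
}
KNOWN_BAD_SHA256 = {hashlib.sha256(b"malicious payload").hexdigest()}


def match_signatures(log_line: str) -> list[str]:
    """Return the names of all signatures found in a log line."""
    lowered = log_line.lower()
    return [name for name, pattern in SIGNATURES.items() if pattern in lowered]


def is_known_bad(file_bytes: bytes) -> bool:
    """Check a file's SHA-256 digest against the known-bad hash set."""
    return hashlib.sha256(file_bytes).hexdigest() in KNOWN_BAD_SHA256
```

A payload that differs from the stored sample by even a few bytes yields a different hash and is not flagged, which mirrors the zero-day limitation described above.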

2.2.1.2 Anomaly-Based Intrusion Detection System

Anomaly-based IDSs are designed to analyze the behavior of network traffic against a baseline profile. The baseline profile is a detailed description of normal network behavior, usually specified by the administrator. These IDSs classify behavior on the network as normal or abnormal with reference to the baseline. A poorly defined baseline profile reduces the detection ability of these systems.
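The baseline idea can be sketched in a few lines of Python. This is a simplified illustration that assumes the baseline is a single numeric metric (for example, requests per minute) summarized by its mean and standard deviation; the three-sigma threshold is an assumption for the example, not a value taken from this thesis.

```python
from statistics import mean, stdev


def build_baseline(samples: list[float]) -> tuple[float, float]:
    """Summarize normal behavior as (mean, standard deviation)."""
    return mean(samples), stdev(samples)


def is_anomalous(value: float, baseline: tuple[float, float], threshold: float = 3.0) -> bool:
    """Flag a value that deviates from the baseline by more than `threshold` sigmas."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical observations of normal traffic (requests per minute).
baseline = build_baseline([100, 102, 98, 101, 99])
```

If the samples used to build the baseline are unrepresentative, both functions still run but the classification degrades, which is the weakness noted above.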

2.2.1.3 Rule-Based Intrusion Detection System

In Rule-based IDSs, an intrusion is detected by observing events on the network. Rules are applied to decide whether an activity is an intrusion or not. The malware detection capability of such systems depends greatly on the rules, and defining the correlation rules is the biggest challenge. Furthermore, analysts need to consider numerous logs because they do not know in advance which log will be relevant. Keeping track of all of this requires considerable expertise. Customized protocols used by the perpetrator make writing rules even more difficult. Given all of these challenges, writing rules manually is not practically feasible.
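A rule-based check can be pictured as a list of named predicates evaluated against each event. The sketch below is only illustrative: the event fields (`dst_port`, `src_ip`, `bytes_out`), the rule names, and the thresholds are hypothetical and do not correspond to any particular IDS rule language.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # returns True when the event is suspicious


# Hypothetical rules; real deployments need many more, which is the difficulty noted above.
RULES = [
    Rule("ssh-from-external",
         lambda e: e.get("dst_port") == 22 and not e.get("src_ip", "").startswith("10.")),
    Rule("large-upload", lambda e: e.get("bytes_out", 0) > 50_000_000),
]


def evaluate(event: dict) -> list[str]:
    """Return the names of all rules that the event triggers."""
    return [rule.name for rule in RULES if rule.condition(event)]
```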

2.2.2 Security Information and Event Management System

A Security Information and Event Management (SIEM) system is a software-based security solution developed for cyber threat detection, investigation, and response. The connectivity of a SIEM with various host- and network-based devices is shown in Figure 2.2, while its working details are provided in the following subsection.

Figure 2.2: Security Information and Event Management System

2.2.2.1 SIEM Working Principle

First, SIEM tools collect log files produced by various applications, systems, and network devices such as Proxy servers, Active Directory servers, Routers, Switches, Email servers, Access points, Database servers, and different Vulnerability scanners. The SIEM then parses these logs and correlates events. If malicious activity is detected, it generates alerts.
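The collect, parse, correlate, and alert steps can be sketched as follows. This is a toy illustration under stated assumptions: the four-field log format (`timestamp source user action`), the brute-force correlation rule, and its threshold and window are all hypothetical.

```python
from collections import defaultdict


def parse(line: str) -> dict:
    """Parse a 'timestamp source user action' line into an event dict."""
    timestamp, source, user, action = line.split()
    return {"ts": int(timestamp), "source": source, "user": user, "action": action}


def correlate(events: list[dict], threshold: int = 3, window: int = 60) -> list[str]:
    """Alert when a user accumulates `threshold` failed logins within `window` seconds,
    regardless of which device reported them."""
    failures = defaultdict(list)
    alerts = []
    for event in sorted(events, key=lambda e: e["ts"]):
        if event["action"] != "login_failed":
            continue
        recent = [t for t in failures[event["user"]] if event["ts"] - t < window]
        recent.append(event["ts"])
        failures[event["user"]] = recent
        if len(recent) >= threshold:
            alerts.append(f"possible brute force against {event['user']} at {event['ts']}")
    return alerts


# Hypothetical log lines collected from three different devices.
logs = [
    "100 proxy alice login_failed",
    "110 ad alice login_failed",
    "120 mail alice login_failed",
    "130 ad bob login_ok",
]
events = [parse(line) for line in logs]
```

The point of the example is that no single device sees three failures; the alert only emerges once events from all sources are correlated.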

A SIEM has two main modules, called Security Event Management (SEM) and Security Information Management (SIM). The SEM is responsible for real-time monitoring of events and their correlation; once suspicious activity is detected, it generates an alert and takes measures accordingly. The SIM is responsible for the storage and reporting of data. SIEMs provide fast search based on big-data indexing techniques, as can be seen in Figure 2.3.

Figure 2.3: SIEM Search Mechanism

A SIEM also correlates event data with asset, user, vulnerability, and threat data for user as well as cyber security event monitoring. A number of SIEM solutions are available in the market. According to Gartner [44], the best SIEM tools of 2019 are Elasticsearch/Logstash/Kibana (ELK), LogPoint SIEM, Splunk Enterprise Security (ES), LogRhythm SIEM, ManageEngine SIEM, SolarWinds Log & Event Manager (LEM), and Splunk SIEM.

2.2.3 Ontology

In the last decade, the Web has become an important means of information sharing. However, in order to utilize the Web to its full extent, it was felt that information must be not only understandable by humans but also readable by machines. Therefore, the World Wide Web Consortium (W3C) introduced the concept of the semantic web and developed standards and tools to shape information in such a way that both computers and people can consume it and work in a cooperative manner. In this regard, ontologies were introduced, which act as a key building block of the semantic web.

An ontology is a graph model that represents domain knowledge, through which developers and machines can exchange domain information with each other and with other experts. In the last few years, researchers have focused on how an ontology and a linked knowledgebase can be constructed from structured and unstructured data sources and how an attack can be inferred using the knowledgebase.

Ontologies are developed in the form of concepts, axioms, data values, and their relationships, and are designed for sharing formally represented knowledge. The Web Ontology Language (OWL) [45], developed by the W3C, is the W3C recommendation for ontology design and management and a de facto standard of the semantic web. Formally, an ontology is defined as O = {C, I, R, A}, where:

C : Set of the domain’s Concepts.

I : Set of the domain’s Objects.

R : Set of Relationships between Concepts and Objects.

A : Set of Axioms holding among Concepts, Objects, and their Relationships.
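The four-part definition above can be mirrored by a small Python data structure. This is a plain-Python stand-in for illustration only, not an OWL implementation; the class name, the two axiom functions, and the toy individuals are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Ontology:
    """Plain-Python stand-in for O = {C, I, R, A}."""
    concepts: set[str] = field(default_factory=set)                            # C
    objects: dict[str, str] = field(default_factory=dict)                      # I: object -> concept
    relationships: set[tuple[str, str, str]] = field(default_factory=set)      # R: (subject, relation, object)
    axioms: list[Callable[["Ontology"], bool]] = field(default_factory=list)   # A

    def satisfies_axioms(self) -> bool:
        """True when every axiom holds for the current contents."""
        return all(axiom(self) for axiom in self.axioms)


def objects_are_typed(o: Ontology) -> bool:
    """Axiom: every object is an instance of a declared concept."""
    return all(concept in o.concepts for concept in o.objects.values())


def relationships_link_known_objects(o: Ontology) -> bool:
    """Axiom: relationships only connect declared objects."""
    return all(s in o.objects and t in o.objects for s, _, t in o.relationships)


# Toy content; the names are made up for the example.
toy = Ontology(
    concepts={"Malware", "IPAddress"},
    objects={"ExampleMalware": "Malware", "203.0.113.7": "IPAddress"},
    relationships={("ExampleMalware", "communicatesWith", "203.0.113.7")},
    axioms=[objects_are_typed, relationships_link_known_objects],
)
```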

2.2.3.1 Rule-based Reasoning

Because OWL alone offers limited support for deducing new knowledge through user-defined rules, the Semantic Web Rule Language (SWRL), a W3C Member Submission, was introduced as an extension of OWL. SWRL rules are simple and are built from OWL concepts and properties. SWRL provides a number of built-in data handling operations, such as arithmetic, comparison, date, time, and many others. Each rule has two parts, an antecedent (body) and a consequent (head):

antecedent ⇒ consequent

Whenever the conditions in the body of the rule are true, the conditions in the head must also hold. For example:

hasClass(?x, ?z) ∧ hasClass(?y, ?z) ⇒ hasSameClass(?x, ?y)

According to this rule, if Ali is studying in class seven and Aslam is also studying in class seven, then we can infer that both are in the same class.
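To make the inference step concrete, the rule above can be emulated in plain Python by applying it once over a set of hasClass facts. This is only a sketch of the rule's logic, not a SWRL engine; the function name and the third student are hypothetical.

```python
def infer_same_class(has_class: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Apply hasClass(?x, ?z) AND hasClass(?y, ?z) => hasSameClass(?x, ?y).

    Variables may bind to the same individual, so reflexive pairs such as
    (Ali, Ali) are inferred as well.
    """
    return {
        (x, y)
        for x, zx in has_class
        for y, zy in has_class
        if zx == zy
    }


facts = {("Ali", "seven"), ("Aslam", "seven"), ("Sara", "eight")}
inferred = infer_same_class(facts)
```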

2.2.3.2 Querying the Inferred Knowledge

The OWL and SWRL languages are based on the Open World Assumption and therefore do not support closure. Moreover, OWL does not support operations such as counting, aggregation, and negation. To overcome these gaps, the Semantic Query-Enhanced Web Rule Language (SQWRL) and the SPARQL Protocol and RDF Query Language (SPARQL) were developed. SQWRL queries can be used side by side with SWRL rules. To count the students of class seven, the following SQWRL query can be used:

student(?x, ?z) → sqwrl:count(?x)
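The counting behaviour of such a query can be mimicked in Python over a set of known student facts. This sketch only mirrors the idea of counting over the currently asserted facts; it is not SQWRL, and the optional `class_name` filter is an assumption added for the example.

```python
from typing import Optional


def count_students(student_facts: set[tuple[str, str]], class_name: Optional[str] = None) -> int:
    """Count distinct students, optionally restricted to one class."""
    return len({s for s, c in student_facts if class_name is None or c == class_name})


student_facts = {("Ali", "seven"), ("Aslam", "seven"), ("Sara", "eight")}
```

Counting is meaningful only over the facts that are actually asserted, which is why a query layer on top of the open-world languages is needed for it.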

The main advantage of ontological modeling is its ability to define a semantic model of data together with its domain knowledge. Besides this, ontologies are also used to link various types of semantic knowledge. Furthermore, ontologies are used not only to represent already shared knowledge; new domain knowledge can also be added to them. It can therefore be concluded that ontological modeling provides data representation, addition, searching, and reasoning capabilities.

2.3 Cyber Threat Analysis Models

Cyber attacks are increasing every year. Several security models have been proposed for the prevention, detection, and response of cyber attacks, such as the Cyber Kill Chain (CKC) [25], Pyramid of Pain (POP) [26], MITRE ATT&CK [20], and the Diamond model [46]. The CKC is an attacker model, whereas the POP is a defender model. The CKC describes the various phases of a cyber attack, whereas the POP model guides the security analyst on how signatures and artifacts at various attack levels can be used for the prevention, detection, and response of cyber attacks. Likewise, MITRE ATT&CK is a knowledgebase that provides CTI data on real cyber attacks. Similarly, the Diamond model describes how cyber attackers launch cyber attacks and guides analysts in analyzing them. Details of these models are provided in the following subsections.

2.3.1 Cyber Kill Chain

The Kill Chain is a military concept [25] used for structuring an attack. It is a stage-based model that describes the different phases of an attack. The authors in [47], from an American global aerospace, defense, security, and advanced technology company, have applied this concept to the Information Security (IS) domain to combat advanced threats. According to the authors, a malware campaign may be divided into seven different phases, as shown in Figure 2.4. In the Reconnaissance phase, the perpetrator collects information about the target through the web, social media, and other publicly available sources.

Figure 2.4: Cyber Kill Chain

Then, in the Weaponization phase, the perpetrator analyzes the data collected in the Reconnaissance phase and decides which attack method should be used, who in the organization should be targeted, and which OS and technologies should be targeted. In the Delivery phase of the CKC, the perpetrator sends the malware payload to the target. Once delivered, the malware exploits vulnerabilities on the target machine to execute the perpetrator’s code (Exploitation). The malware is then installed on the target machine (Installation) and establishes a communication channel with the adversary’s Command and Control (C2) server. Finally, during the Exfiltration phase, the perpetrator collects the desired data, encrypts it, and sends it to the C2.
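The seven phases described above form an ordered sequence, which can be captured with an enumeration. The sketch below is illustrative: the phase names follow this section, while the helper `missing_phases` is a hypothetical function showing how gaps in the observed evidence for an attack could be listed.

```python
from enum import IntEnum


class KillChainPhase(IntEnum):
    RECONNAISSANCE = 1
    WEAPONIZATION = 2
    DELIVERY = 3
    EXPLOITATION = 4
    INSTALLATION = 5
    COMMAND_AND_CONTROL = 6
    EXFILTRATION = 7


def missing_phases(observed: set[KillChainPhase]) -> list[KillChainPhase]:
    """List phases up to the latest observed one for which no artifact was seen."""
    if not observed:
        return []
    latest = max(observed)
    return [p for p in KillChainPhase if p <= latest and p not in observed]


# Hypothetical case: artifacts were found for three phases only.
observed = {
    KillChainPhase.DELIVERY,
    KillChainPhase.INSTALLATION,
    KillChainPhase.COMMAND_AND_CONTROL,
}
```

Because an attack that reached a later phase must have passed through the earlier ones, the returned list points the analyst to phases whose artifacts are still to be found.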

2.3.2 Pyramid of Pain

The Pyramid of Pain (POP) is a cyber threat defender model [26] and a cyber threat hunting framework. The model describes the efficacy of several indicators, such as Hash values, IP addresses, Domain Names (DNs), Network artifacts, Host artifacts, Tools, and TTPs, and places them at different levels of the pyramid according to their efficacy, as shown in Figure 2.5. It emphasizes that acting on low-level CTI data such as hash values, IPs, and DNs causes little damage to the adversary, while denying high-level CTI data such as host and network artifacts, tools, and TTPs is more painful because these are hard to change. POP is used in our work to rank the indicator components provided in STIX reports.

Figure 2.5: Pyramid of Pain
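Ranking indicators by their pyramid level can be sketched as a simple sort. The level order below follows the list given in this section; the numeric ranks, the relative order of network and host artifacts, and the sample indicator values are assumptions made for the example, not the scoring used by the proposed framework.

```python
# Pyramid levels from least to most painful for the adversary to change.
POP_LEVELS = ["hash", "ip", "domain", "network_artifact", "host_artifact", "tool", "ttp"]
POP_RANK = {name: rank for rank, name in enumerate(POP_LEVELS, start=1)}


def rank_indicators(indicators: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Sort (level, value) indicators so the most painful ones come first."""
    return sorted(indicators, key=lambda item: POP_RANK[item[0]], reverse=True)


# Hypothetical indicators extracted from a report.
indicators = [
    ("ip", "203.0.113.7"),
    ("ttp", "memory scraping of card data"),
    ("hash", "abc123"),
]
ranked = rank_indicators(indicators)
```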

2.3.3 MITRE ATT&CK

MITRE ATT&CK is a publicly available knowledgebase provided by the MITRE Corporation. It shares Tactics, Techniques, and Procedures (TTPs) information about real-world cyber attacks in twelve different classes, namely Initial Access, Execution, Persistence, Privilege Escalation, Defense Evasion, Credential Access, Discovery, Lateral Movement, Collection, Command and Control, Exfiltration, and Impact. Furthermore, it provides indicators for cyber threat prevention, detection, and response. ATT&CK provides a basis for the development of security models. Its relation to the CKC model can be seen in Figure 2.6.

It can be seen from the figure that the tactics presented in MITRE ATT&CK relate to the last four phases of the CKC, namely Exploitation, Installation, Command and Control, and Exfiltration.

2.3.4 Diamond Model

The Diamond model captures attacker capabilities, such as the use of Malware, Exploits, and Certificates, against some infrastructure of a victim, as shown in Figure 2.7. It states that all these activities are events and that these events have atomic features. Moreover, it shows how the security analyst can discover information about the attacker, its infrastructure and capabilities, and the victim by moving between the edges and vertices of the diamond.

Figure 2.6: MITRE ATT&CK Vs Cyber Kill Chain

Figure 2.7: Diamond Model

2.4 Structured Threat Intelligence Solutions

There are several commercial and community-based efforts, such as Kaspersky, Crowdstrike, AlienVault, Checkpoint Metasploit, IntelliStore, Kres, and chat rooms, which mainly aim to provide structured cyber threat data. Governments are pushing for the structuring and sharing of cyber data, and multiple efforts are in progress to express unstructured cyber data in a structured, machine-understandable format. The National Vulnerability Database (NVD) [48] and IBM X-Force provide XML and JSON feeds that deliver CTI related to cyber attacks and vulnerabilities at various levels of detail. There are numerous solutions for structuring and sharing cyber threat data, such as IODF by IBM [49] [50], CIF [51] [52], OpenIOC [53], MAEC [54], Trusted Automated eXchange of Indicator Information (TAXII) [55], CAPEC [56], STIX [57], ThreatConnect [58], IDMEF [59], Soltra Edge [60], CRITS [61], EclecticIQ [62], Malware Information Sharing Platform (MISP) [63], CybOX [64], and VERIS [65].

Among these, MISP and STIX are comprehensive, community-based efforts developed for structuring CTI data for different collaborating nodes. MISP employs a flat model for CTI data structuring. In MISP, every new entry is called an object and has multiple attributes such as threat level, organization, date, and comments. Attributes are defined by Category and Type fields. The Category field indicates what the attribute represents, such as financial fraud, targeting data, or network activity, while the Type field describes the Category, such as Host name, IP address, Port number, File/Folder name, email, and DNS.

STIX, in contrast, is a graph of nodes and edges with two types of objects: STIX Domain Objects (SDOs) and STIX Relationship Objects (SROs). STIX objects describe various aspects of cyber threat data, and SDOs are connected through SROs to represent it. STIX covers a wider extent than MISP and aims at becoming the standard. Therefore, it is used in our research.

2.4.0.1 STIX Domain Objects (SDO)

The STIX Domain Objects [66] are the nodes of the STIX graph. Their details are as follows.

• Observables: These are stateful properties that belong to the computer or the network. They hold information about registry keys, files (hash, name, and size), ports, and protocols used by the attacker for data exfiltration.

• Indicators: These represent the presence of an APT at the target machine or network. Indicators carry information such as Files Watchlist, Protocol Watchlist, and Port Watchlist, each of which has one or more observables.

• Incidents: They detail the victims, the assets affected, and the impact of the cyber attack.

• Tactics, Techniques and Procedures (TTPs): These depict the behavior or the strategy of a cyber attacker.

• ThreatActor (TA): This component describes a malicious actor that launches a cyber attack.

• Exploit Targets: These describe the weaknesses of a target system and its network.

• Campaign: This component is a collection of instances of the Actor’s presumed intents, which can be observed through TTPs, incidents, indicators, and exploits across organizations.

• Course of Actions (COA): These are the specific measures for the prevention, detection or response to a cyber attack.

• Attack Pattern: It is a type of TTP that is used by TAs to compromise targets.

• Identity: It is used to represent information related to individuals, organizations, and groups, such as contact information and sectors.

• Malware: It is a type of TTP or software used to compromise the target’s data.

• Tool: These are tools that are used by attackers to perform cyber attacks.

• Report: These are collections of the CTI data about various STIX domain objects.

2.4.0.2 STIX Relationship Objects (SRO)

The STIX Relationship Objects [66] are the edges of the STIX graph. STIX defines two types of SROs. Details are as follows.

• Relationship: It describes an SDO’s relationship with itself or with another SDO. Examples of relationships are uses, mitigates, targets, and indicates.

• Sighting: It is a count that indicates how many times an SDO is observed.
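
To make the node and edge view concrete, the sketch below builds two SDOs and one SRO as plain Python dictionaries and follows the edge between them. It assumes STIX 2.x style JSON field names (`type`, `id`, `relationship_type`, `source_ref`, `target_ref`) and deliberately omits several properties that the specification requires, such as timestamps, so the objects are illustrative rather than valid STIX. All names and values are hypothetical.

```python
import json
import uuid


def make_id(object_type):
    """Build a STIX-style identifier of the form <type>--<UUID>."""
    return f"{object_type}--{uuid.uuid4()}"


# Two SDOs (nodes). Names and the pattern are hypothetical.
indicator = {
    "type": "indicator",
    "id": make_id("indicator"),
    "name": "Suspicious exfiltration address",
    "pattern": "[ipv4-addr:value = '198.51.100.7']",
}
malware = {
    "type": "malware",
    "id": make_id("malware"),
    "name": "example-pos-malware",
}

# One SRO (edge) connecting the two nodes.
relationship = {
    "type": "relationship",
    "id": make_id("relationship"),
    "relationship_type": "indicates",
    "source_ref": indicator["id"],
    "target_ref": malware["id"],
}

bundle = {"type": "bundle", "id": make_id("bundle"),
          "objects": [indicator, malware, relationship]}


def neighbours(bundle, node_id):
    """Follow relationship edges that start at node_id."""
    by_id = {o["id"]: o for o in bundle["objects"]}
    return [
        (o["relationship_type"], by_id[o["target_ref"]]["name"])
        for o in bundle["objects"]
        if o["type"] == "relationship" and o["source_ref"] == node_id
    ]


print(neighbours(bundle, indicator["id"]))
print(json.dumps(relationship, indent=2))
```

Keeping the edge as a separate object, rather than nesting the malware inside the indicator, is what allows the same SDO to take part in many relationships.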


2.4.1 STIX Use Cases

STIX provides four high-level use cases for cyber threat management [67]: (1) cyber-threat analysis, (2) specifying indicator patterns, (3) managing cyber threat response activities, and (4) CTI sharing.

Among these, “managing cyber threat response activities” is the most important use case, as it characterizes the significance of the different STIX components according to the cyber threat management life-cycle. The use case asserts that not all STIX components are equally important for every phase of cyber threat management; rather, certain components are more relevant to a particular phase. For example, exploits and their COAs are necessary for cyber threat prevention, indicators and observables are essential for cyber threat detection, while indicators, observables and their respective COAs are important for the cyber threat response phase.
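
The phase-to-component association stated above can be written down as a small lookup table. The following Python sketch is only an illustration: the dictionary keys, the function name `components_for`, and the sample report are assumptions made for the example, and the CVE entry is a placeholder rather than a real identifier.

```python
# STIX components that the use case associates with each CTM phase.
PHASE_COMPONENTS = {
    "prevention": {"exploit_target", "course_of_action"},
    "detection": {"indicator", "observable"},
    "response": {"indicator", "observable", "course_of_action"},
}


def components_for(phase, report):
    """Keep only the report components that are relevant to the given phase."""
    wanted = PHASE_COMPONENTS[phase]
    return {name: items for name, items in report.items() if name in wanted}


# Hypothetical, heavily simplified STIX report.
report = {
    "indicator": ["port watchlist"],
    "observable": ["tcp/4444"],
    "exploit_target": ["CVE-XXXX-YYYY"],   # placeholder, not a real CVE
    "course_of_action": ["block port at the perimeter"],
    "threat_actor": ["unknown"],
}

print(sorted(components_for("detection", report)))
print(sorted(components_for("prevention", report)))
```

Filtering a report this way hands each phase only the components it can act on.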

2.4.2 STIX-Shifter

STIX-Shifter [68] is a Python-based open-source library. It uses the STIX Patterning module to connect with various cyber threat products that have data repositories. The STIX Patterning module takes STIX patterns as input and searches the connected data repository; if a match is found, it returns the identified pattern. Afterwards, STIX-Shifter converts the identified pattern into STIX format.
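
The pattern-in, matches-out workflow can be illustrated with a toy matcher. The sketch below does not use the STIX-Shifter API. It is a stand-in, written with the Python standard library only, that understands just the simplest STIX pattern form (a single equality comparison) and searches an in-memory list that plays the role of the data repository. The function names and the repository contents are hypothetical.

```python
import re

# Matches only the simplest pattern form: [object-type:property = 'value']
SIMPLE_PATTERN = re.compile(r"^\[\s*([\w-]+):([\w.]+)\s*=\s*'([^']*)'\s*\]$")


def parse_simple_pattern(pattern):
    """Split a single-comparison STIX pattern into (object type, property, value)."""
    match = SIMPLE_PATTERN.match(pattern.strip())
    if match is None:
        raise ValueError(f"unsupported pattern: {pattern}")
    return match.groups()


def search(pattern, records):
    """Return the records from a toy data repository that satisfy the pattern."""
    obj_type, prop, value = parse_simple_pattern(pattern)
    return [r for r in records if r.get("type") == obj_type and r.get(prop) == value]


# Hypothetical repository contents.
repository = [
    {"type": "ipv4-addr", "value": "198.51.100.7"},
    {"type": "ipv4-addr", "value": "192.0.2.10"},
    {"type": "domain-name", "value": "example.com"},
]

hits = search("[ipv4-addr:value = '198.51.100.7']", repository)
print(hits)
```

A real deployment would translate the pattern into the query language of each connected product instead of filtering a Python list.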

2.4.3 STIXViz

STIXViz is a graphical tool [35] designed and developed by the STIX project. It is implemented in JavaScript and HTML on top of NW.js. STIXViz is designed for the visual analysis of STIX reports as node-link graphs and provides multiple views, namely graph, tree, and timeline. The graph view provides a force-directed graph of the STIX components, the tree view shows the STIX components in a tree structure, and the timeline view displays time-stamped STIX components. The tree view of Point of Sale (POS) APTs is shown in Figure 2.8.


Figure 2.8: POS STIX

Since we represent STIX reports as a graph-based structure, in SCERM we will vi-

sualize these structures throughout the thesis using STIXViz graphs.


Chapter 3

Related Work

3.1 Introduction

The research work shared in the thesis is innovative and comprises closely meshed research disciplines. This work includes APT analysis and classification, structured threat data generation, and the boosting, evaluation, and refinement of CTI data for CTM. It is therefore important that, before sharing this state-of-the-art research work with the reader, a brief description of the related works and, where appropriate, their critical comparison with the research work be presented.

3.2 Overview

Due to the novelty of this research work, there is a lack of literature directly related to

this domain. Research is being carried out on closely related domains. To understand

our research, it is necessary to grasp these associated research domains. Therefore,

various closely related research domains are thoroughly studied and are made part of

this thesis, as can be seen in Figure 3.1. For example, this research is mainly related to

APTs. Therefore, a section is provided to understand APTs, their TTPs, and analysis

models as shown in the first branch of the taxonomy tree shown in Figure 3.1. Similarly,

the second branch of this diagram shows the taxonomy of cyber threat data, which is

essential for APT analysis. Therefore, a section is added that describes the

importance and quality of structured CTI data. Moreover, this section shares publicly

available tools for the generation of structured CTI data. Next, in the third branch


of the figure, a brief overview of various standards and systems is presented which

are designed to assess the cyber threat preparation of an organization. Subsequently,

existing research contributions regarding cyber threat scoring and ranking are shared,

as shown in the fourth branch of the taxonomy tree. Likewise, the last branch of this

figure presents an overview of various security systems.

Figure 3.1: Overview of Related work

3.3 Advanced Persistent Threats

Presently, advanced cyber attacks are prolonged, customised, and targeted. These at-

tacks employ multiple malware and obfuscation techniques to avoid detection. In this

regard, various frameworks and models are proposed to understand advanced persis-

tent threats. However, there is a lack of work on advanced cyber threat prevention,

detection, and response. Details of various APTs analysis models, TTPs, and their hu-

man exploitation techniques are shared in the following subsections.

3.3.1 Models

Cyber threat analysis is the process in which an analyst studies various cyber attacks

and identifies their indicators for cyber threat prevention, detection, and response.

With the ever-increasing number of data breaches due to cyber-attacks, timely diagnosis of attack vectors is of paramount importance. In [69], the authors present a framework to model APT attacks by using the Intrusion Kill Chain (IKC), which is similar to the Lockheed Martin (LM) Kill Chain. The researchers in [70] classify APT attacks into five different phases, from malware delivery to data exfiltration. They do not discuss the Reconnaissance and Weaponization phases of APTs.

In [71], the authors present the analysis of different attacks and on the basis of these,

they describe an attack process model. The model has eight different steps and some of

these are similar to CKC. The authors in [72] present a computer attack taxonomy that has five components, namely Target, Carrier, Vulnerability, Privilege Escalation, and Firing Source.

All of these research works study APTs from different angles. Therefore, to get

maximum benefits from these, it is required to use these works holistically. Accord-

ingly, these are combined for analysis, boosting, valuation, and refinement of APTs for

different phases of CTM.

3.3.2 Tactics, Techniques, and Procedures

Research shows that the tactics and techniques in multiple APTs remain the same or are used with small changes. Therefore, if analysts know the general technique of APTs, they can detect multiple APTs easily. McAfee [70] reports that, while analyzing a single Command and Control server (used by Operation Shady Rat), its researchers found a single organization that had hacked almost 71 companies from 31 diverse industries in different countries. In [73], the researchers developed a technique to identify patterns in DNS to infer whether an attack is generated by an algorithm or by a human. This technique can be employed to detect domain fluxing. The authors

in [74] provide a survey of various obfuscation techniques that are being employed by

APTs such as Dead-Code insertion, Subroutine Reordering, Register Reassignment, Instruc-

tion Substitution, Code Transposition, and Code Integration. Moreover, they also predict

future trends of obfuscation techniques such as JavaScript and Emulation of virtual pro-

cessors.

Eric et al. [39] present a layered taxonomy model to classify cyber threat sharing

platforms. The proposed model defines five layers namely Transport, Session, Indicators,

Intelligence and 5Ws. The Transport layer is the first layer, it provides communication

between different organizations to share the CTI data. The Session Layer is the second


layer of the model which provides authentication and authorization services. The third

layer is the Indicators layer that shares information about patterns or observables that

show the presence of the cyber attack within a network. The Intelligence layer is the

fourth layer that describes the COAs i.e. when and what to do. The topmost layer 5W’s

illustrates the actors, techniques, procedures, and victims. Moreover, authors map

information sharing technologies such as STIX, IODEF, and YARA to the proposed

taxonomy. They highlight that STIX has a broader range of terms like TTPs, Indicators,

and Course of Actions than others.

All of these research works highlight that TTPs are very important for attackers because they invest more time and money in them [26]. Therefore, for effective cyber threat management, the prevention, detection, and response of TTPs are of paramount importance. For this reason, our research work considers TTPs the topmost indicator for the analysis and valuation of APTs.

3.3.3 Advanced Persistent Threats Exploit Humans

Presently, attackers extensively use social engineering through channels such as email, Facebook, LinkedIn, and blogs for the Reconnaissance and Delivery phases of a cyber

attack. In [75], the authors describe that social media is widely used for target re-

connaissance and the delivery of malware. They also present the taxonomy of social

engineering that classifies cyber attack characteristics and attack scenarios. In [76], the

author presents different techniques, which can be used to send malicious codes to

victim machines. Spear phishing and web-based click hijacking are mostly used for

malware delivery. The authors in [77] describe that the Reconnaissance and Delivery phases of APTs are successful because of human manipulation. They highlight some famous examples of APTs that use human manipulation for delivery: Stuxnet uses USB drives; Duqu uses infected MS Word files via email; Red October uses infected MS Word and Excel documents via spear-phishing; Operation Aurora uses infected web sites; Operation Shady Rat uses infected MS Word, Excel, and PDF documents via spear-phishing; and the RSA attack uses MS Excel attachments within spear-phishing emails.

All of the above research works describe that APTs widely exploit humans through social engineering techniques. Our research therefore pays particular attention to the human aspects of APT indicators, given the paramount importance attributed to them in these works.

3.4 Cyber Threat Data

Cyber threat data provides information regarding the context, tactics, techniques, and procedures (TTPs), indicators, impact, and remedial actions of cyber attacks. Examples include IPs, domain names, hash values, filenames, registry entries, protocols used, obfuscation methods, and TTPs. This data is used for the prevention, detection, and response of cyber attacks. Although a large amount of cyber threat data is publicly available, most of it is unstructured and distributed, and it cannot be readily processed by machines or by humans. Due to its large volume and unstructured nature, analyzing this information about cyber incidents is a challenging task for security analysts. In the following subsections, a brief description of different structuring techniques is shared first. Then, various solutions for structured CTI generation are described. Finally, the state of CTI data is presented.

3.4.1 Structuring of Cyber Threat Data

Multiple efforts are being carried out to express unstructured information in a structured and machine-understandable format. In this regard, a few efforts have been made for the structuring and sharing of CTI data. IBM X-Force [22] and the National Vulnerability Database (NVD) [48] provide XML feeds that give information regarding cyber-attacks and vulnerabilities with varying degrees of detail. There exist multiple standards for threat information exchange, such as CIF, IODF by IBM, CRITS by Community, OpenIOC, STIX, TAXII, and CybOX.

Furthermore, researchers decompose multiple CTI structuring formats such as

IODEF, STIX [28] and YARA [29] according to the various layers of the proposed taxon-

omy model [39] and explore interoperability between these. Moreover, they conclude

that STIX is a promising standard, which provides broader concepts of CTM namely

the TTPs, exploits, indicators, observables, COAs, threat actors and incidents. Clemens et al. [31] conducted a survey to examine cyber threat intelligence platforms. During the study, they analyze and compare 22 threat intelligence platforms and identify STIX as a de-facto standard, which not only structures CTI data but also provides visualization and analysis capabilities.

Sara et al. [78] present a cyber threat analytic platform called STIX Analyzer, which is built on a Web Ontology Language (OWL) ontology, CVEs, CybOX, and STIX. STIX Analyzer is developed to analyze cyber attacks. It acquires STIX repositories, extracts the STIX components (TTPs, exploits, indicators, observables, and incidents), and populates the ontology. Then it performs inferencing by employing the Semantic Web Rule Language (SWRL) to identify the exploits and to perform risk analysis within the network.

Multiple efforts are being carried out for the structuring of CTI data; they compete with each other and have many things in common. However, research shows that STIX and YARA are the most prominent. We have chosen to formalise the STIX format for our research because of its popularity.

3.4.2 Structured Threat Data Generation

The manual process of sifting through tons of log data to pinpoint APTs tactics and

techniques is a challenging job. Efforts are required for the automatic detection of APTs

techniques. Accordingly, several tools are developed for expressing unstructured CTI

into the structured and machine-understandable format. A few of these tools are as

follows.

3.4.2.1 STIX Data Generator

STIX Data Generator (SDG) [79], developed by the Cosive team, offers “Random” and “Selected” modes for STIX generation. In both of these modes, SDG does not take CTI data from the user for STIX generation but uses test data only. Moreover, whether we select a single CTI parameter (URL) or several parameters (URL, IP, DNs), the generated STIX remains the same.

3.4.2.2 Python-STIX Library

The Python-STIX library [80], developed by the MITRE Corporation, provides an API for the creation and parsing of a STIX XML report. It is a console-based solution that requires programmer-level expertise for data entry and STIX generation. These requirements limit the utilization of STIX and, because the process is manual, errors in the generated STIX reports are always possible.

3.4.2.3 IBM X-Force Exchange

IBM has a CTI platform named “X-Force Exchange” [22], which allows organizations to consume and share threat intelligence and to benefit from the contributions of IBM’s experts. It provides CTI in textual as well as in STIX format. Besides human contributions, it is also supported by machine-generated CTI. It provides a free API, which gives limited programmatic access for non-commercial use.

Presently, there is a lack of easy-to-use frameworks that produce and share distinct, error-free, and threat-relevant CTI data in a structured form. These problems are the motivation for our research work. Therefore, we developed a sub-framework that structures CTI data in STIX format.

3.4.3 Cyber Threat Intelligence Quality Testing

In [81], researchers introduce an Intelligence Quotient Test (tiq-test), which employs multiple tests to measure the novelty, life span, population, and uniqueness of CTI data. The Novelty test details how often a threat feed updates itself. The Aging test measures the life span of an indicator, i.e. how long it stays on a feed. The Population test shows the user how the population distribution of the CTI feed compares with the user’s own data. The Uniqueness test highlights how many unique indicators are present on a CTI feed. The Overlap test checks how many threat indicators are repeated on different threat intelligence feeds.
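
Several of these tests reduce to set operations over feed snapshots. The Python sketch below shows simplified versions of the Uniqueness, Overlap, and Novelty ideas. It is not the tiq-test implementation, which is written in R; the function names, the exact metric definitions, and the sample feeds are assumptions made for illustration.

```python
def uniqueness(feed, other_feeds):
    """Fraction of a feed's indicators that appear on no other feed."""
    if not feed:
        return 0.0
    seen_elsewhere = set().union(*other_feeds) if other_feeds else set()
    return len(feed - seen_elsewhere) / len(feed)


def overlap(feed_a, feed_b):
    """Number of indicators that two feeds have in common."""
    return len(feed_a & feed_b)


def novelty(yesterday, today):
    """Indicators added to and removed from a feed between two snapshots."""
    return {"added": today - yesterday, "removed": yesterday - today}


# Hypothetical feed snapshots (sets of IP indicators).
feed_a = {"198.51.100.7", "192.0.2.10", "203.0.113.5", "203.0.113.9"}
feed_b = {"192.0.2.10", "203.0.113.5"}
feed_a_next_day = {"198.51.100.7", "192.0.2.10", "203.0.113.77"}

print(overlap(feed_a, feed_b))
print(uniqueness(feed_a, [feed_b]))
print(novelty(feed_a, feed_a_next_day))
```

In this toy data, half of the indicators on the first feed are unique to it, and one day later it has gained one indicator and dropped two.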

Roland et al. [82] present FeedRank, an algorithm for the ranking of cyber threat in-

telligence feeds (CTIF). The FeedRank valuates the CTIFs according to the novelty of

their provided information and the reuse of their contents by other CTIFs. It performs

the temporal correlation of the feed’s contents to identify the dishonest feeds among

the real feeds. In [83], the authors use a triangulation study for the analysis and classification of cyber threat data sources. This study comprises a literature review, a quantitative analysis of expert-level discussion on Twitter, and a data sources survey. In total, 68 publicly available cyber threat data sources are analyzed and classified based on Information type, Timelines, Integrability, Originality, Type of source, and Trustworthiness. In [84], the authors propose the Enhanced Cyber Attribution Framework (NEON). At first, NEON gets cyber threat data from various sources such as security blogs, social media, honey pots, incident detection systems, and network forensics. Then, it correlates the input data. Subsequently, NEON employs a game theory approach to propose an optimal security response.

These efforts describe several characteristics of the CTI data and compare different

threat feeds, however, these do not discuss the valuation of CTI data for CTM. On the

other hand, our research not only explores several traits of CTI data but it also valuates

the structured CTI data for CTM.

3.5 Cyber Preparation Assessment

The cyber threat landscape is continuously changing. Attackers are using new tactics

and techniques, which enable them to target a wide range of organizations within and

across borders. Therefore, organizations are required to define their strategies

for cyber threat management. Cyber Prep 2.0 [85] is a threat-oriented methodology

presented by MITRE Corporation to identify the threat levels faced by an organiza-

tion. It defines five classes of cyber threats that are formulated on the attacker’s inten-

tion such as cyber Vandalism, Incursion, Breach, Organizational Disruption, Espionage and

Cyber Supported Strategic Extended Disruption. Similarly, it describes five correspond-

ing classes of organizational preparation according to the expertise of an attacker such

as inexperienced, average resourced, experienced and well-resourced attacker. These classes

guide an organization to prepare its business risk management framework, define its

cybersecurity methodology and designate inconsistency between its risk management

framework and methodologies.

Mark et al. [86] present an Operational Threat Assessment framework called OTA.

It describes the process of collecting information regarding the system under assess-

ment from classified and open-source documents. Afterwards, it identifies threats and

vulnerabilities of the system. Subsequently, it confirms remedial actions accordingly.

The OTA framework employs a generic threat matrix (GTM) to identify the threat level faced by an organization. The GTM defines two types of threat attributes, namely commitment and resource. The commitment attributes describe the threat in terms of intensity, stealth, and duration, while the resource attributes cover people, knowledge, and access. These attributes represent eight different threat levels, 1 to 8, which range from the most dangerous to the least capable threat. In [87], researchers present a Cyber Threat Intelligence capability model (CTI-CM). This model describes various capabilities required of cyber threat experts, such as analytical component capability (ACC), contextual response capability (CRC), and experiential practice (EPC). The ACC is associated with the management of the analytical aspects of CTI. The CRC is related to managing business and security to respond to APTs. The EPC is a capability that belongs to solutions formulation.

Anoop and Ximming [88] present a security risk analysis model for enterprise networks that uses probabilistic attack graphs. This model describes how several vulnerabilities may be combined to attack a network. It measures the security risk of the enterprise network by using the Common Vulnerability Scoring System (CVSS) [89]. At first, it accumulates the vulnerabilities in a probabilistic attack graph, which represents all the attack paths that allow network penetration; then it propagates the likelihood of a cyber attack through the graph. With this attack graph technique, the security assessment can become highly complex if the network is too large. Another limitation of the proposed model is that it relies solely on the availability of vulnerability scores (CVSS), so if a zero-day vulnerability is employed then the assessment will fail. In contrast, in our research, the refinement phase discovers appropriate preventive actions from similar attacks that have been seen before, even if the information about the particular vulnerabilities does not exist in the report.
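
The propagation step of a probabilistic attack graph can be sketched as follows. This is a simplified illustration under stated assumptions, not the exact model of [88]: each node's exploit probability is taken to be its CVSS base score divided by 10, exploits are treated as independent, a node is reachable when at least one predecessor is reached, and the graph is acyclic. The function name, the node names, and the scores are hypothetical.

```python
def propagate(graph, exploit_prob, entry_nodes):
    """Cumulative probability that an attacker reaches each node.

    graph maps each node to the list of its predecessor nodes and must be
    acyclic. A node is reached if at least one predecessor is reached
    (OR semantics) and the node's own vulnerability is then exploited.
    """
    cumulative = {}

    def reach(node):
        if node in cumulative:
            return cumulative[node]
        if node in entry_nodes:
            p_pre = 1.0  # the attacker can always attempt an entry node
        else:
            p_none = 1.0
            for parent in graph[node]:
                p_none *= 1.0 - reach(parent)
            p_pre = 1.0 - p_none
        cumulative[node] = exploit_prob[node] * p_pre
        return cumulative[node]

    for node in graph:
        reach(node)
    return cumulative


# Hypothetical network: two exposed servers lead to one database host.
graph = {"web": [], "mail": [], "db": ["web", "mail"]}
# Assumed conversion: CVSS base score divided by 10.
exploit_prob = {"web": 0.8, "mail": 0.5, "db": 0.6}

risk = propagate(graph, exploit_prob, entry_nodes={"web", "mail"})
print({node: round(p, 3) for node, p in risk.items()})
```

The sketch also makes the stated limitation visible: a node without a score, as for a zero-day vulnerability, has no entry in `exploit_prob` and cannot be assessed.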

Comparing Cyber Prep, OTA, and our research shows that Cyber Prep employs qualitative metrics only, while OTA and our research use qualitative as well as quantitative metrics. OTA identifies the threats faced by an organization and proposes remedies accordingly; however, it does not valuate the CTI data for CTM. In contrast, our research performs valuation and refinement of CTI data for different phases of CTM.


3.6 Machine Learning Based Systems

In [81], researchers introduce a Machine Learning based Security project (MLSec), which employs machine learning techniques to measure the novelty, life span, population, and uniqueness of CTI data. These tests are written in the R language. MLSec takes low-level artifacts such as IP addresses and domain names in a structured format, namely .csv.

In [90], researchers present a supervised machine learning (SML) approach for the automatic

extraction of high-level threat intelligence from unstructured data sources. It

uses Natural Language Processing (NLP) based learning of a Named Entity Recogni-

tion (NER) model for extraction of high-level CTI data from the textual content. Sub-

sequently, the proposed solution removes data redundancy and provides CTI in STIX

format. Moreover, it ranks the text sources according to the novelty and quality of their

shared data.

In [91], the authors propose a machine learning based framework, namely the Data

Breach Investigation Framework (DBIF). It detects cyber attacks on the basis of identified

cyber threat indicators. DBIF receives cyber threat investigation reports as input, in-

dexes them, and prepares a TTP dictionary. Afterwards, the DBIF framework is trained

on extracted TTPs for detection of cyber attacks.

In [92], researchers share a framework called the Artificial Intelligence (AI) based Cyber Threat Framework, which is designed to detect AI-based cyber attacks. This framework is based on the Cyber Kill Chain (CKC), which is employed to understand various cyber-attacks and to select defensive strategies. It divides the CKC stages into three phases, namely Planning, Intrusion, and Execution. The Planning phase consists of the first two stages of CKC, namely Reconnaissance and Weaponization. This phase is responsible for researching the target and identifying a weaponized deliverable. Intrusion is the second phase of the proposed framework and consists of three stages of the CKC, namely Delivery, Exploitation, and Installation. This phase describes the delivery, exploitation, and installation of the malicious code. The Execution phase comprises the Command and Control and Exfiltration stages of the CKC. This phase details the paths and objectives of the threat actor.
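
The grouping of CKC stages into the three phases of this framework can be captured in a small lookup structure. The Python sketch below only restates the grouping described in the text; the variable and function names are hypothetical.

```python
# Grouping of CKC stages into the three phases described above.
PHASES = {
    "Planning": ["Reconnaissance", "Weaponization"],
    "Intrusion": ["Delivery", "Exploitation", "Installation"],
    "Execution": ["Command and Control", "Exfiltration"],
}

# Reverse lookup: CKC stage -> framework phase.
STAGE_TO_PHASE = {stage: phase for phase, stages in PHASES.items() for stage in stages}


def phase_of(stage):
    """Return the framework phase that a CKC stage belongs to."""
    return STAGE_TO_PHASE[stage]


print(phase_of("Delivery"))
print(phase_of("Exfiltration"))
```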

These efforts describe several characteristics of CTI data; however, they do not address the valuation and refinement of CTI data to a sufficient extent for cyber threat management. In contrast, this research work not only explores several traits of CTI data but also valuates and refines the structured data for cyber threat management.

3.7 Cyber Threat Scoring System

Peter et al. present the Common Vulnerability Scoring System [89] to measure the risk associated with computer vulnerabilities. It is a comprehensive system that consists of the base, temporal, and environmental metric groups. The base metric group describes a vulnerability’s inherent characteristics, the temporal metric group describes the characteristics that change over time, and the environmental metric group describes the characteristics that change with respect to the environment.

TISA [93] is a scoring and analysis model for threat intelligence, which uses natural

language processing (NLP) and machine learning techniques for scoring and analysis

of threats. It is designed to identify and prioritize the CTI indicators. Similarly, the Common Weakness Scoring System (CWSS) [94] is a community-based effort designed to identify the weaknesses of software. CWSS is simple in operation: it first offers quantitative measurements of the software’s weaknesses and then prioritizes them. CWSS differs from CVSS in many aspects, one of which is the usage scenario. For example, CVSS is used to assess already identified vulnerabilities, whereas CWSS can be used earlier.

All of these efforts aim to assess the risk associated with software vulnerabilities. In contrast, our research work assesses the structured data and valuates it for the CTM life-cycle, which improves cyber threat prevention, detection, and response results.

3.8 Graph-Based Ranking Systems

Hassan and Lise [95] present the FutureRank algorithm for predicting the future citations of research articles. This algorithm combines information regarding authors, publications, and citations to predict the future ranking of scientific articles. The FutureRank algorithm is based on a number of assumptions. For example, important publications are cited by other important publications, authors of high repute produce high-quality publications, recently published publications will be cited more in the future, and, among old publications, those that have been cited recently are more useful. The arXiv (High Energy Physics Theory (hep-th) from 1993 to 2003) dataset is used to evaluate the FutureRank algorithm.

Lawrence et al. [96] share a web page ranking technique called PageRank, which is based on the graph of web links and does not consider the contents of the actual web pages. Web pages have forward links (out-edges) and backward links (in-edges) to other web pages. The proposed algorithm is based on the assumption that the rank of a web page is high if the sum of the ranks of its backlinks (in-edges) is high, and that backlinks from important web pages count for more than backlinks from average web pages.
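
The backlink assumption behind PageRank can be illustrated with a short power-iteration sketch in Python. The damping factor of 0.85, the fixed number of iterations, the handling of pages without forward links, and the four-page example graph are assumptions chosen for this illustration and are not taken from [96].

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict mapping page -> list of linked pages."""
    pages = list(out_links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in out_links.items():
            if targets:
                # A page shares its rank equally among its forward links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without forward links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web graph.
web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = pagerank(web)
print(max(ranks, key=ranks.get))
print({page: round(score, 3) for page, score in ranks.items()})
```

Page C receives backlinks from three pages and ends up with the highest rank, while page D, which has no backlinks, ends up with the lowest.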

Wenzheng et al. [97] present a structural diversity model to find the most persuasive

users. Social networks are the most economical and rapid way of marketing. The com-

pany selects the most influential users on the social network and gives them product

samples without any cost. These users then endorse the offered product to their social

media friends. According to the proposed model, a user is more likely to accept a prod-

uct recommendation if more of his friends with diverse contexts suggest the product to

him. For the evaluation of their proposed model, they use datasets from four real social

networks namely NetHEPT - arXiv High Energy Physics theory section, NetPHY - arXiv

Physics section, Facebook - Online social network and DBLP - Computer Science Bibliography.

In a similar work, Wang et al. [98] present a conformity-based model to find the
top-k most persuasive users. This model is based on emotional conformity, i.e., how closely
a user follows the sentiment of the original user when retweeting.
Sentiment is encoded as -1 (negative, opposite to the original user), 0 (neutral), or
1 (positive, the same sentiment as the original user). To evaluate the proposed
model, a dataset from a famous Chinese social media platform is collected.

Zahid et al. [99] present a dynamic cybersecurity solution for a power grid. A power

grid is a network that delivers electricity from power stations to consumers.
As these systems have been modernized and computer networks have become core components
of them, their security has become critical. Although perfect security
of these systems is ideal, it is not possible due to budgetary constraints. The
proposed solution determines which security devices are most critical, so that
spending can be directed to them. It takes electrical network configurations with budgetary
constraints and security schemes as input, identifies the critical devices, and selects
the best scheme for maximum security.

Abel and Allan [100] present a systematic review to assess the use of Open Source
Intelligence (OSINT) for identifying threats and exploits on social networks for remedial
purposes. They retrieved and reviewed eighteen research papers. Eleven
of the eighteen papers quantitatively recognized social media vulnerabilities arising
from users’ ignorance, while three studies qualitatively identified a small set of Personally
Identifiable Information (PII) that users are required to provide for social media
interactions.

The above-stated ranking systems generally address the ranking of published papers,
web sites, and social media users, but none of them considers CTI data refinement
and valuation for different phases of CTM, which is the core theme of our research.

3.9 Reputation-Based Security Systems

According to the McAfee corporation [101], organizations nowadays rely on
reputation-based security systems. These systems provide reputation scores for dynamic
policy decisions. Traditional security systems, such as whitelist and blacklist
systems, are static, whereas reputation-based systems learn and update the reputation score of
indicators or observables over time. System confidence is built through data volume,
longevity, and trustworthiness.

Tayson [102] proposes a reputation-based security system. The central entity, the
Intelligence Head Quarter (IHQ), receives raw cyber threat data from multiple sensors. It
combines and processes the input data, prepares a threat list, and then shares it with
all the sensors, which collect cyber data accordingly. The collected data may include
IPs, ASNs, ports, DN, CIDR blocks, and payload. Subsequently, the IHQ gets the collected data
from the sensors, processes it, and computes a reputation score for indicators or observables.
Afterwards, the IHQ prepares an updated threat list and shares it with the sensors
for further data collection. This data cycle between the sensors and the IHQ repeats, which
matures the system and enables anomaly and attack detection on the basis of regular
patterns.

Allan and Christopher [103] present a system called TIC, which receives threat indicators
from a data source and shares them through a graphical user interface with the cyber
threat analyst. The analyst evaluates each indicator on the basis of several characteristics
and calculates the TIC score accordingly. Then, the computed TIC score is
shared with the data source and saved into a TIC server for future processing. All of
these systems perform CTI data evaluation on the basis of indicators’ occurrence and
do not consider the efficacy of indicators for CTM.

In contrast to these systems, our solution evaluates indicators on the basis of their

efficacy and ranks them according to the POP model and the STIX use case on manag-

ing cyber threat response activities.

3.10 Inference or Ontology-Based Security Systems

Few efforts have been made on ontologies, and these are generally based on the representation
of cyber-attack attributes in a taxonomical structure. In [104], the authors suggest
countermeasures based on the cost of the metrics. The researchers in [105] describe
nine different metrics, namely Input Validation, Authentication, Authorization, Configuration
and Installation, Sensitive Data, Session Management, Cryptography, Exception Management,
and Auditing and Logging. They suggest a metric-based model for malware classification.
In [106], the authors present an ontology-based framework for cyber threat
analysis. Initially, SWRL rules are written to verify the proposed framework
in the domain of digital banking. After that, the authors implement a Java-based
inference engine to enhance performance. The researchers believe that the proposed
framework can be employed in business and commercial operations. The paper [107]
is an extension of the authors’ previous work [105]. The research mostly focuses on
the extracted metrics, attacks against them, and countermeasures to prevent these attacks.

The authors in [108] present a model that takes security logs as input and employs
storytelling techniques to generate cyber threat reports. This model comprises four
layers, namely Preprocessing, Extraction, Inference, and Storytelling. The Preprocessing
layer takes log messages as input and parses them. Then, the Extraction layer extracts
the date, time, IPs, and port numbers. After that, the Inference layer employs Snort rules
to identify the TA and the aim of the cyber attack. Subsequently, the Storytelling layer
generates the story of the cyber attack from the extracted CTI data.
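The Extraction layer described above can be pictured with a small regular-expression sketch. The log line format, the field names, and the function `extract_fields` are hypothetical choices made for this illustration; the actual log formats and parsing rules used in [108] are not given here.

```python
import re

# Assumed log layout: "<date> <time> ... <src_ip>:<src_port> -> <dst_ip>:<dst_port>"
LOG_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2}).*?"
    r"(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<src_port>\d+)\s*->\s*"
    r"(?P<dst_ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<dst_port>\d+)"
)


def extract_fields(line):
    """Return date, time, IPs and ports from one log line, or None if no match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["src_port"] = int(fields["src_port"])
    fields["dst_port"] = int(fields["dst_port"])
    return fields


# Made-up example line using documentation-only IP ranges.
sample_line = "2019-03-14 08:15:02 DROP TCP 192.0.2.10:51522 -> 198.51.100.7:22"
fields = extract_fields(sample_line)
```

The resulting dictionary is the kind of structured record that a later inference step could match against detection rules.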


The researchers in [109] present a CTI exchange framework that employs blockchain,
semantic web technologies, and the STIX standard. It defines different participant
roles, such as producers, consumers, and owners, and assigns incentives to each
role. The proposed framework is a smart marketplace that treats CTI data
as a digital asset. This marketplace incentivizes the sharing of CTI data through its reasoning
capabilities, which range from the participants’ roles to the inference of new CTI data.

In [110], the researchers present the idea of extracting security concepts from
text, comparing these with monitoring sensors’ logs, and then generating security alerts with
the help of a reasoner. In our view, using heterogeneous
sources (text and IDS logs) is a worthy idea, although the ontology (taken from [111]) is
very basic and does not give a holistic view of an attack. The authors in [27] present
a framework for extracting vulnerability and cyber-attack related information
from web text and then comparing it with Wikitology. A model is proposed in [112]
that takes unstructured text as input, automatically extracts entities and concepts
from it, and then passes these to DBpedia Spotlight. At DBpedia, these concepts
are matched and assigned corresponding class values. The authors in [113] present a
Maximum Entropy model for automatic labeling of text.

In [114], the authors propose a cyber-attack analysis model that groups cyber
attacks based on infringement information such as time, Command and Control IPs, protocols,
exploit site, malware, distribution site, attack vulnerability, domain names, file names,
registry entries, strings, API sequences, and service names used by malicious codes.

The authors in [115] present a threat intelligence system to learn attack patterns and
TA behaviors. The proposed system is evaluated by employing a cloud-based honeypot
called Kippo together with the Elasticsearch stack and Kibana. Kippo is used
to collect various event logs. The Elastic stack is employed for cyber threat
event search, whereas Kibana is an open-source visualization dashboard for
Elasticsearch. In the proposed system, several cyber attack events are identified,
such as Root trying auth none, Root trying auth password, Root failed with a password, Login
attempt failed, Channel open failed, Root authenticated with a password, Connection Lost, and
Unauthorized login.

All of the aforesaid works are worthy contributions from the point of view of data retrieval,
and these efforts are complementary to our work.


3.11 Conclusion

In the literature review, a case is made that APTs are complex cyber attacks. It is observed
that although a massive volume of CTI data is publicly available, most
of the data has quality issues. Hence, APT analysis is a challenging task. Although
tools are publicly available to generate structured CTI data, the data they produce
is redundant, erroneous, threat-irrelevant, and does not properly follow threat analysis
models, especially those related to CKC and POP. All of these issues
motivate our research. During the literature review, the STIX format is
selected for the analysis of structured CTI data. Then, a tool named STIXGEN is
developed to generate error-free and threat-relevant structured CTI data in STIX format.
Subsequently, it is found that most of the CTI data is not suitable for the different phases of
CTM. Therefore, a sub-framework called SCERM is developed, which boosts, refines,
and valuates structured CTI data for the detection, prevention, and response phases
of CTM. Afterwards, it is established that ontological modeling is an appropriate way to
analyze domain knowledge. Therefore, a combined ontology of CKC and POP
is developed for APT analysis and effective CTM.


Chapter 4

Automatic Generation of Structured

Threat Data

4.1 Introduction

Presently, a large volume of CTI data regarding APTs is publicly available. However,
due to the large volume and distributed nature of the data, the identification and
collection of the data for CTM is challenging. It is observed during the research that
APTs launched against an organization subsequently succeed with high probability
against other similar organizations. Therefore, organizations need to
compile and share CTI data with peers in a structured form for timely prevention,
detection, and response to cyber attacks. However, publicly available solutions
for structured data generation are manual and often produce erroneous and redundant
CTI data. To overcome these problems, this chapter presents a sub-framework
named STIXGEN, which takes CTI data as input and produces properly
labeled, error-free, and threat-relevant structured threat data for CTM. For this purpose,
the comprehensive “Structured Threat Information eXpression (STIX)” format is used.

4.2 Research Approach and Contributions

We regard the aforesaid deficits as a barrier to the use of structured data, and these
shortcomings motivate our research work. We designed the STIXGEN sub-framework
to overcome the issues of CTI collection, structuring, and sharing, and developed a
prototype of it as a proof of concept. The prototype is a lightweight
application built with Microsoft Visual Basic .NET and a Microsoft Access 2010 database. It
takes CTI data as input and generates a STIX report as output. The methodology of the
STIXGEN sub-framework is presented in detail in the following sections.

4.3 STIXGEN System Model

Our methodology aims to develop a sub-framework for the generation of error-free and
threat-relevant STIX reports. During our literature review, we found that a large
volume of CTI is available, but it is mostly unstructured. A few efforts like
OpenIOC [53] and STIX have been made towards the standardization of cyber threat data by
governments, but they are slow in adoption. Among these, we found STIX to be the most
comprehensive. We surveyed different security blogs, gathered STIX reports, and checked their quality.
We found that publicly available STIX reports are few and contain erroneous and incomplete
information. Therefore, threat analysts hesitate to use threat data. Our proposed
sub-framework STIXGEN generates threat-relevant, properly placed, and error-free structured
data. Therefore, we expect that it will increase user confidence in structured
CTI data and, hence, the quality and usage of structured CTI data for CTM.

To describe our proposed sub-framework, we have selected a well-known family of
APTs, i.e., retail industry APTs [116]. According to Illusive Networks [117], the global
retail industry makes about $20 trillion in sales per year, with millions of dollars passing through
online and credit card based payment methods. This large annual revenue makes the
retail industry attractive to attackers. The detailed description of our proposed
sub-framework and its prototype is presented in the following section.


4.4 STIXGEN Design and Architecture

The design and architecture of STIXGEN revolve around the STIX standard, as shown
in Figure 4.1.

Figure 4.1: STIXGEN Flow Diagram

The threat analyst gathers APT data related to the different STIX components, namely
campaigns, TTPs, indicators, observables, incidents, COAs, exploits, and TAs, and feeds
it into a database. The important entities of the STIX schema are highlighted
in Figure 4.2. In line with the STIX requirements, we have created a separate table for each
STIX component. The STIX encoder retrieves CTI data from the database, encodes it
according to the STIX standard, and generates a STIX report, which can
then be shared with peer organizations for cyber threat prevention, detection, and
response.

Figure 4.2: STIXGEN’s Database Schema


4.5 Case Study

A case study is provided for a better understanding of STIXGEN with a real-world
example. For this purpose, we have selected well-known APTs of the retail industry.
First, we briefly describe the retail industry’s APTs; then we describe how the
user feeds CTI data into STIXGEN and generates STIX reports. Subsequently, an analysis of
the generated STIX is presented.

4.5.1 Retail Industry - APTs Selection

The retail industry comprises individuals and companies involved in the selling
of goods and services to end-users. Earlier, a cash register was used for record-keeping;
it has been replaced by electronic devices such as the “Point of Sale (POS)
terminal”. These terminals are used for the payment of goods through credit
and debit cards. The POS system reads the user’s financial data from credit and debit
cards and saves it on a central server. POS APTs are launched to steal the user’s financial
data from the POS terminals and the central servers. POS APTs have more than
a dozen variants [116]. We selected some of these variants to describe the working of
STIXGEN. The detailed description of the STIXGEN sub-framework and its prototype
is presented in the ensuing sections.

4.5.2 Data Entry

First, a threat analyst scans different security blogs to gather CTI data related to
renowned POS APTs such as Alina [118], JackPOS [119], BackOff POS [18], CenterPOS
[120], and ProPOS [121]. After data collection, the threat analyst extracts CTI data related
to STIX components from the security blogs and feeds it into the database through an entry
form. Part of the CTI data related to the BackOff APT, collected from three different
security blogs, namely SecureBox, Symantec, and RSA, can be seen in Figure 4.3. It can
be seen that SecureBox provides information regarding Campaign and TTPs
only, whereas Symantec and RSA share indicator information.


Figure 4.3: Backoff APT and Security Blogs

4.5.3 STIX Encoder

STIX Encoder is the heart of STIXGEN. It retrieves CTI data from the database,
processes the information, and generates a STIX report. Part of the STIX encoding
algorithm is shown in Algorithm 4.1.

STIX Encoder performs the following operations:

1. First, it adds the header information (including the namespaces).

2. Then, the encoder connects to the database, as can be seen in line 5.

3. Next, it reads the Campaign table and retrieves the campaign’s ID and title, as can be seen in lines 7 to 10.

4. Accordingly, it fetches CTI data, namely TTPs, indicators, incidents, TAs, observables, exploits, and COAs, from their corresponding tables and adds it to the STIX report, as can be seen in lines 12 to 44.

5. Similarly, for the next campaign, the encoder repeats steps 3 and 4.

6. This process continues until all campaigns are processed and the final STIX report is generated, as can be seen in line 47.

7. In the end, the database is closed, as can be seen in line 49.

In this way, a combined STIX of the POS family having five different APTs is generated
through STIXGEN. The next section provides an analysis of the generated STIX.


Algorithm 4.1: STIX Generation

 1: Input: CTIData
 2: Output: STIXReport
 3: ▷ Connect to Database
 4: Connect(DB)
 5: if DatabaseConnection ≡ successful then
 6:     Read(Campaign_Table)
 7:     for all Record of Campaign_Table do
 8:         CampaignID ≡ Campaign_Table.CampaignID
 9:         CampaignTitle ≡ Campaign_Table.CampaignName
10:         ▷ Adding TTP details
11:         for all Record of TTP_Table do
12:             if CampaignID ≡ TTP_Table.CampaignID then
13:                 WriteInStix(TTP_Table.TTP_Name)
14:                 ▷ Add related Exploits and COAs
15:                 WriteInStix(Exploit_Table.Exploit_Value)
16:                 WriteInStix(COA_Table.COA_Name)
17:             end if
18:         end for
19:         ▷ Adding Indicator details
20:         for all Record of Indicator_Table do
21:             if (CampaignID ≡ Indicator_Table.CampaignID) then
22:                 WriteInStix(Indicator_Table.IndicatorName)
23:                 WriteInStix(Indicator_Table.IndicatorValue)
24:                 ▷ Add related TTPs, COAs and Observables
25:                 WriteInStix(TTP_Table.TTP_Value)
26:                 WriteInStix(COA_Table.COA_Name)
27:                 WriteInStix(Observable_Table.Observable_Name)
28:             end if
29:         end for
30:         ▷ Adding Incident details
31:         for all Record of Incident_Table do
32:             if (CampaignID ≡ Incident_Table.CampaignID) then
33:                 WriteInStix(Incident_Table.Incident_Name)
34:                 WriteInStix(Incident_Table.Incident_Value)
35:             end if
36:         end for
37:         ▷ Adding ThreatActor details
38:         for all Record of ThreatActor_Table do
39:             if CampaignID ≡ ThreatActor_Table.CampaignID then
40:                 WriteInStix(ThreatActor_Table.ThreatActor_Name)
41:             end if
42:         end for
43:     end for
44: end if
45: ▷ Generating STIX Report
46: Generate(STIXReport)
47: ▷ Closing Database
48: Close(DB)
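The per-campaign loop of Algorithm 4.1 can be sketched in Python as follows. This is a simplified illustration, not the STIXGEN prototype itself, which is built with Visual Basic .NET and Microsoft Access. The sketch assumes an in-memory SQLite database, hypothetical table and column names, and a flat XML layout that is not schema-valid STIX; exploits, COAs, and observables are omitted for brevity.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical component tables mapped to the XML element used for each row.
COMPONENT_TABLES = {
    "ttp": "TTP",
    "indicator": "Indicator",
    "incident": "Incident",
    "threat_actor": "ThreatActor",
}


def build_demo_db():
    """Create a tiny in-memory database with one campaign (made-up rows)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE campaign (campaign_id INTEGER PRIMARY KEY, name TEXT)")
    for table in COMPONENT_TABLES:
        db.execute(f"CREATE TABLE {table} (campaign_id INTEGER, name TEXT)")
    db.execute("INSERT INTO campaign VALUES (1, 'BackOff POS')")
    db.execute("INSERT INTO ttp VALUES (1, 'Memory Scraping')")
    db.execute("INSERT INTO ttp VALUES (1, 'Key Logging')")
    db.execute("INSERT INTO indicator VALUES (1, 'HTTP POST')")
    db.commit()
    return db


def encode_report(db):
    """Walk every campaign and attach the component rows that reference it."""
    root = ET.Element("Report")
    campaigns = db.execute(
        "SELECT campaign_id, name FROM campaign ORDER BY campaign_id"
    ).fetchall()
    for campaign_id, title in campaigns:
        node = ET.SubElement(root, "Campaign", id=str(campaign_id), title=title)
        for table, tag in COMPONENT_TABLES.items():
            rows = db.execute(
                f"SELECT name FROM {table} WHERE campaign_id = ? ORDER BY rowid",
                (campaign_id,),
            )
            for (name,) in rows:
                ET.SubElement(node, tag).text = name
    return ET.tostring(root, encoding="unicode")


db = build_demo_db()
try:
    report_xml = encode_report(db)
finally:
    db.close()  # mirrors the final Close(DB) step
```

Filtering in the query (`WHERE campaign_id = ?`) replaces the explicit `if CampaignID ≡ ...` comparison inside each inner loop of the pseudocode; the effect is the same, but the database does the matching.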


4.5.4 Analysis of the Generated STIX

A STIXViz snapshot of the generated STIX can be seen in Figure 4.4. In the figure, five

different POS APTs namely Alina POS, JackPOS, BackOff POS, CenterPOS, and ProPOS

can be seen from left to right. Analysis details of aforesaid APTs are provided in the

following subsections while their comparisons are provided in the next section.

Figure 4.4: POS STIX : POS’s STIX Report generated by STIXGEN

4.5.4.1 Alina POS APT

Figure 4.5 provides a close-up snapshot of the Alina APT. This APT was publicly disclosed
in May 2013. In this APT, attackers generally access the target system through Remote
Desktop Login and install the malware. After installation, the malware identifies the desired processes,
scans their memory, and extracts payment card data. Afterwards, it obfuscates the extracted
data with an XOR function and then transmits it to the Command and Control
server via HTTP Post. It is believed that this APT was launched by Black Atlas Operation’s
actors against several bars and restaurants in the US.

Figure 4.5: Alina POS APT


4.5.4.2 JackPOS APT

A zoomed-in snapshot of the JackPOS APT STIX can be seen in Figure 4.6. JackPOS is
generally installed through a Fake Java Update. Like Alina, this APT employs the Memory
Scraping technique for data stealing. It performs Base64 encoding on the stolen data
and then transmits it to the Command and Control server using HTTP Post. It has been
launched against several countries, such as the US, India, and Spain.

Figure 4.6: JackPOS APT

4.5.4.3 BackOff POS APT

A close-up snapshot of the BackOff POS APT is shown in Figure 4.7. This APT was first
identified in July 2014. The actor behind this APT uses Remote Desktop
Applications and Brute-force login techniques for its delivery. Moreover, it employs
the Memory Scraping and Key-Logging techniques for data extraction. This APT
compromised more than 1000 businesses in the US, including Target stores [122], and it stole
millions of users’ personal data. Furthermore, this APT employs RC4 and
Base64 encoding to obscure the stolen data.

4.5.4.4 CenterPOS APT

Figure 4.8 shares a close-up STIXViz snapshot of the CenterPOS APT. This APT was
discovered in Sep 2015. Like its predecessors, it employs the Memory Scraping technique
for data stealing. It uses the HTTP protocol for data exfiltration. This APT employs
the Triple-DES standard to encrypt the stolen data. It is supposed that CenterPOS was
launched by the actors of the Black Atlas Operation against several countries.


Figure 4.7: BackOff POS APT

Figure 4.8: CenterPOS APT

4.5.4.5 ProPOS APT

A zoomed-in snapshot of the ProPOS APT’s STIX is shared in Figure 4.9. It was
discovered in Dec 2015. This APT employs the Memory Scraping technique for data stealing.
ProPOS applies Base64 encoding and XORing to obscure the stolen
data.

Figure 4.9: ProPOS APT


4.5.5 Comparison of the POS APTs

In this section, a detailed comparison of the Alina, JackPOS, BackOff POS, CenterPOS,
and ProPOS APTs is shared. These APTs are correlated in multiple ways, and STIXViz
screenshots of each scenario are shared to show the reader why all of these APTs
are kept under the common umbrella of a single family. Details are provided in the
following subsections.

4.5.5.1 Tactics Techniques and Procedures

Generally, POS APTs employ several techniques to steal user data, such as Memory
Scraping, Key Logging, Network Sniffing, and Cameras. It can be seen from
Figure 4.10 that all selected POS APTs, namely Alina POS, JackPOS, BackOff POS,
CenterPOS, and ProPOS, employ the Memory Scraping technique for data stealing. The
BackOff POS APT is the only one that additionally employs the Key Logging technique.
Therefore, it can be inferred that the aforesaid APTs belong to the same family.

Figure 4.10: TTP employed

4.5.5.2 Protocol Analysis

It is learned through various security blogs that POS APTs normally use HTTP
POST and GET, FTP, and DNS for the exfiltration of stolen data to Command and Control
servers. Figure 4.11 highlights that HTTP POST is employed by all
five of the aforesaid APTs.


Figure 4.11: Protocol Employed

4.5.5.3 Operating System Analysis

It is identified through the literature that POS terminals run on different variants of the
Unix and Windows operating systems. It is also found that the development and
maintenance of POS applications for the various variants of the Windows OS are easier
than for Unix. Consequently, there are more Windows-based POS terminals than
Unix-based ones. This also means that Windows-based POS devices attract more attention from cyber
criminals. This assumption is supported by the generated STIX, as can be seen in
Figure 4.12. This figure highlights that all the aforementioned APTs are designed to
target Windows-based terminals.

Figure 4.12: Operating System Employed

4.5.5.4 Folder Analysis

APTs create folders on the victim machine for their installation and temporary storage

of stolen information. Figure 4.13 indicates that selected POS APTs use the same folder

for the installation and data storage.


Figure 4.13: Folder Path Employed

4.5.5.5 Encryption Analysis

Generally, retail industry attackers employ multiple techniques to obscure the
stolen data. Earlier POS APTs employ simple obfuscation techniques
such as XORing and Base64 encoding, whereas recent APTs use encryption
techniques such as RC4 and DES. It can be seen in Figure 4.14 that
Alina and JackPOS are earlier POS variants that employ XORing and Base64 encoding, respectively.
The BackOff APT is a mid-period APT that employs the RC4 encryption
technique. Similarly, CenterPOS is a recent APT that employs Triple-DES to
obscure the stolen data.
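The difference between the early obfuscation techniques and real encryption can be seen from how easily an analyst reverses them. The following sketch is only an illustration: the placeholder payload, the single-byte key, and all function names are assumptions made here and are not taken from any of the analysed samples.

```python
import base64


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte with a repeating key; applying it twice restores the input."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def obfuscate(data: bytes, key: bytes) -> bytes:
    """XOR followed by Base64, the kind of layering described above."""
    return base64.b64encode(xor_bytes(data, key))


def deobfuscate(blob: bytes, key: bytes) -> bytes:
    """Undo the Base64 layer (no secret needed), then the XOR layer."""
    return xor_bytes(base64.b64decode(blob), key)


def brute_force_single_byte(blob: bytes, known_prefix: bytes):
    """Recover a one-byte XOR key by trying all 256 candidates."""
    raw = base64.b64decode(blob)
    for candidate in range(256):
        if xor_bytes(raw, bytes([candidate])).startswith(known_prefix):
            return candidate
    return None


placeholder = b"PLACEHOLDER-TRACK-DATA"  # stands in for exfiltrated data
key = b"\xaa"                            # hypothetical single-byte key
blob = obfuscate(placeholder, key)
recovered = deobfuscate(blob, key)
guessed_key = brute_force_single_byte(blob, b"PLACEHOLDER")
```

Base64 involves no key at all, and a short XOR key falls to an exhaustive search as soon as a fragment of the expected plaintext format is known, which is why the later move to RC4 and Triple-DES is a meaningful step for the attacker.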

Figure 4.14: Encryption Evolution

4.5.5.6 Comparison Outcomes

Correlation results of the aforesaid APTs are shown in Table 4.1. It can be noted from
the correlation results for TTPs, protocols, OS used, and folder created that the POS APTs
JackPOS, BackOff POS, CenterPOS, and ProPOS are variants of the Alina POS
APT.


Table 4.1: Comparison of APTs

APT | TTP | Protocol | OS | Folder | Encryption
Alina | Memory Scraping | HTTP POST | Windows | %APPDATA% | XOR Key
JackPOS | Memory Scraping | HTTP POST | Windows | %APPDATA% | Base64
BackOff | Memory Scraping, Key Logging | HTTP POST, FTP | Windows | %APPDATA% | RC4
CenterPOS | Memory Scraping | HTTP POST | Windows | %APPDATA% | Triple DES
ProPOS | Memory Scraping | HTTP POST | Windows | %APPDATA% | RC4 and XOR Key

It can be further affirmed through the encryption analysis results that the selected APTs
belong to the same family and that Alina is the predecessor of the remaining four APTs.
Therefore, it can be concluded that structuring CTI data is an effective way to
classify APTs.
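The correlation argument above amounts to intersecting attribute values across the five APTs. A minimal sketch of that idea, using the values of Table 4.1, is given below; the dictionary layout and the function name `shared_values` are assumptions made for illustration and are not part of STIXGEN or STIXViz.

```python
# Attribute values transcribed from Table 4.1.
TABLE_4_1 = {
    "Alina": {
        "ttp": {"Memory Scraping"}, "protocol": {"HTTP POST"},
        "os": {"Windows"}, "folder": {"%APPDATA%"}, "encryption": {"XOR Key"},
    },
    "JackPOS": {
        "ttp": {"Memory Scraping"}, "protocol": {"HTTP POST"},
        "os": {"Windows"}, "folder": {"%APPDATA%"}, "encryption": {"Base64"},
    },
    "BackOff": {
        "ttp": {"Memory Scraping", "Key Logging"}, "protocol": {"HTTP POST", "FTP"},
        "os": {"Windows"}, "folder": {"%APPDATA%"}, "encryption": {"RC4"},
    },
    "CenterPOS": {
        "ttp": {"Memory Scraping"}, "protocol": {"HTTP POST"},
        "os": {"Windows"}, "folder": {"%APPDATA%"}, "encryption": {"Triple DES"},
    },
    "ProPOS": {
        "ttp": {"Memory Scraping"}, "protocol": {"HTTP POST"},
        "os": {"Windows"}, "folder": {"%APPDATA%"}, "encryption": {"RC4", "XOR Key"},
    },
}


def shared_values(table):
    """For each attribute, keep only the values common to every APT."""
    shared = {}
    for attributes in table.values():
        for name, values in attributes.items():
            if name in shared:
                shared[name] = shared[name] & values
            else:
                shared[name] = set(values)
    return shared


common = shared_values(TABLE_4_1)
```

Every attribute except the encryption column keeps a value shared by all five APTs, which matches the reading that the family differs mainly in how the stolen data is obscured.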

4.5.5.7 Cyber Threat Management through Observables

Multiple observables can be identified in Figure 4.15, such as Protocols: HTTP and FTP;
Domain Names: Jackkk[.]com and Sobra[.]ws; Files: Epson.exe, Wnhelp.exe, javaw.exe,
NTProvider.exe, windefender.exe, and driver.sys; and APIs: Process32First() and
Process32Next().

Figure 4.15: Observables for CTM

These observables can be employed in the cyber threat prevention, detection, and
response phases of CTM. For example, the detection team can monitor outbound
HTTP traffic to check whether data is being stolen. The prevention team can place
the Command and Control domain names under observation at the firewall to
check whether a machine tries to connect to the Command and Control server. Similarly, file and
folder names can be added to the antivirus software to block POS attacks.
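How such observables could be checked against monitoring events is sketched below. The domain and file names come from Figure 4.15; the event dictionary layout and the function names `refang` and `match_event` are hypothetical and only serve the illustration.

```python
# Observables listed in Figure 4.15, lower-cased for case-insensitive matching.
OBSERVABLES = {
    "domains": {"jackkk.com", "sobra.ws"},
    "files": {
        "epson.exe", "wnhelp.exe", "javaw.exe",
        "ntprovider.exe", "windefender.exe", "driver.sys",
    },
}


def refang(domain: str) -> str:
    """Turn a defanged domain such as 'Jackkk[.]com' into a comparable form."""
    return domain.replace("[.]", ".").lower()


def match_event(event: dict) -> list:
    """Return the observables that an event (assumed dict layout) matches."""
    hits = []
    domain = refang(event.get("domain", ""))
    if domain in OBSERVABLES["domains"]:
        hits.append(("domain", domain))
    file_name = event.get("file", "").lower()
    if file_name in OBSERVABLES["files"]:
        hits.append(("file", file_name))
    return hits


suspicious = match_event({"domain": "Jackkk[.]com", "file": "Epson.exe"})
benign = match_event({"domain": "example.org", "file": "notepad.exe"})
```

Matching on file names alone is prone to false positives, since a name such as javaw.exe is also used by the legitimate Java runtime; in practice a name match would be combined with the folder path or a file hash.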


4.6 STIXGEN Evaluation

The evaluation of the STIXGEN sub-framework is based on accuracy and effectiveness.
First, we collected a variety of text-based threat reports and generated their
STIX reports both with the state-of-the-art IBM X-Force Exchange tool and with the STIXGEN
prototype. Then, we compared these STIX reports component by component. Next, we present
a comparative analysis of the features offered by different state-of-the-art STIX generator
tools. Finally, we provide a comprehensive STIX dataset [123] on GitHub, so that
researchers and analysts can use it for their research.

4.6.1 Accuracy

We randomly collected 10 different text reports from the IBM X-Force Exchange threat
repository and generated their STIX reports both by using IBM X-Force Exchange (export
option) and by employing our proposed STIXGEN prototype. Then we compared the
resulting STIX dataset based on the correctness and accuracy of the generated
components. A bar chart of the 10 APTs versus the number of indicators generated by
IBM X-Force Exchange and STIXGEN can be seen in Figure 4.16. We chose to show
the “indicator” component here, as we consider it the most relevant. There are
three bars for each APT in the graph: the first bar represents the indicator components present
in the input text report, the second bar shows the indicators generated by our proposed
framework STIXGEN, and the third bar represents the indicators produced by IBM X-Force
Exchange.

The BackOff APT shown in the graph is a well-known POS APT. According to
IBM X-Force Exchange’s text report, part of which is given in Figure 4.17, it has five
different indicators, including HTTP Post, FTP, Beacons after every 45 sec, and the MD5 hash
927AE15DBF549BD60EDCDEAFB49B829E. It can be observed from the graph that the
number of indicators in the input text report and in STIXGEN’s output (the first
and second bars of the graph) is exactly the same, which indicates 100% accuracy
of STIXGEN for this report. In contrast, the STIX exported by IBM X-Force Exchange shows 49
indicators, which contradicts IBM’s own input text report. Details are provided in the
ensuing paragraphs.

Figure 4.16: IBM X-Force Exchange vs STIXGEN

Upon close examination of the STIXViz snapshot of IBM X-Force Exchange’s STIX
in Figure 4.18(a), it can be observed that the 49 indicator components carry only
two distinct titles, “Contained in XFE Collection” and “Malware risk high”, which are
repeated throughout. Moreover, none of these indicators match the actual indicators
present in IBM X-Force Exchange’s text report (Figure 4.17).

Figure 4.17: IBM X-Force Exchange Textual Report

On the basis of these outcomes, it can be seen that the STIX generated by IBM
X-Force Exchange has a greater number of components than the input text report, and
many of the generated components contain dummy, irrelevant, and erroneous information.
In contrast, the STIX reports generated by STIXGEN have exactly the same number of indicator
components as the input IBM X-Force Exchange text reports, and these components are
distinct, relevant to the text report, and error-free; see Figure
4.18(b).
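The comparison above counts indicators; a slightly stricter check compares the indicator values themselves. The sketch below is one possible way to do this and is not the evaluation procedure used in this chapter. The four source indicators are those named in the text, the 25/24 split of the two IBM titles across the 49 indicators is an assumption, and the function name `indicator_overlap` is hypothetical.

```python
def indicator_overlap(source, generated):
    """Return (precision, recall) of generated indicators against the source report."""
    source_set = set(source)
    matched = [item for item in generated if item in source_set]
    precision = len(matched) / len(generated) if generated else 0.0
    recall = len(set(matched)) / len(source_set) if source_set else 0.0
    return precision, recall


# Indicators named in the text for the BackOff report.
source_indicators = [
    "HTTP Post",
    "FTP",
    "Beacons after every 45 sec",
    "MD5 Hash 927AE15DBF549BD60EDCDEAFB49B829E",
]

# STIXGEN reproduces the indicators entered by the analyst.
stixgen_indicators = list(source_indicators)

# 49 indicators with two repeating titles; the exact split is assumed here.
ibm_indicators = ["Contained in XFE Collection"] * 25 + ["Malware risk high"] * 24

stixgen_scores = indicator_overlap(source_indicators, stixgen_indicators)
ibm_scores = indicator_overlap(source_indicators, ibm_indicators)
```

Under these assumptions the STIXGEN output scores 1.0 on both measures, while the exported STIX scores 0.0 on both, because none of its titles appear in the text report.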


(a) IBM X-Force Exchange (b) STIXGEN

Figure 4.18: IBM X-Force Exchange vs STIXGEN

4.6.2 Effectiveness

STIX is a new and evolving standard devised for structuring and sharing CTI. A
few positive efforts, namely Cosive STIX Data Generator (CSDG), IBM X-Force Exchange,
and the Python-STIX Library, have been made towards the structuring of cyber threat data.
However, many of these do not take CTI as input from the user and often produce
erroneous STIX reports containing dummy, unrelated, and repeated information.
A comparison of different STIX generation tools is shown in Table 4.2, which
shows that STIXGEN is easier to use than its competitors.

Table 4.2: Comparison of STIX Generators

Feature/Tools | STIXGEN | CSDG | Python-Lib | IBM X-Force Exchange
GUI/Console | GUI | GUI | Console | GUI
User Input Data | Yes | No | Yes (programming required) | Yes
Skills Required | Operator | Operator | Programming required | Operator

STIXGEN provides a Graphical User Interface (GUI), which takes CTI from the user
and produces error-free and threat-relevant STIX reports. In contrast, CSDG does not take CTI
from the user but uses dummy data for STIX generation, so it is hard to say whether it
produces correct STIX reports. Similarly, IBM X-Force Exchange has a CTI repository from
which one can select data for STIX generation. Like STIXGEN, the Python STIX Library
takes CTI directly from the user. However, it is a console-based solution, which relies on other tools
to feed it the data components and their connections.


Chapter 5

Cyber Threat Response Activities

5.1 Introduction

The previous chapter presented STIXGEN, a novel sub-framework designed to
automatically generate distinct, error-free, and threat-relevant structured CTI data.
During the research it was learned that most publicly available CTI data is wrongly
labeled, has incomplete artifacts, and is missing important indicators for cyber
threat prevention, detection, and response. Effective CTM therefore needs a
sub-framework that boosts, refines, and evaluates the structured CTI data.
Accordingly, a formal sub-framework called SCERM is developed that ranks, boosts, and
refines the structured CTI data for CTM. This chapter describes SCERM in detail.

5.2 Research Approach and Contributions

Our observation in this research is that the identification and prioritization of CTI
data for different phases of cyber threat management cannot be meaningfully
accomplished without a formal model of threat intelligence components, their
connectivity, and their dependencies. Therefore, SCERM is proposed for the valuation
of structured data. It formally models the STIX architecture [66] on the basis of the
STIX use case Managing cyber threat response activities [67]. The use case
characterizes the significance of different STIX components according to the cyber
threat management life-cycle. It asserts that not all STIX components are equally
important for every phase of cyber threat management; rather, certain components are
more relevant to a particular phase. For example, exploits and their COAs are
necessary for cyber threat prevention, indicators and observables are essential for
cyber threat detection, while indicators, observables and their respective COAs are
important for the cyber threat response phase.

As part of our solution we developed a prototype of SCERM, which boosts, refines, and
valuates STIX reports for cyber threat management. The Boosting module remaps wrongly
placed contents to a data model of STIX components where required. The Refinement
module then identifies and augments incomplete or missing artifacts. Subsequently, the
Valuation component evaluates the refined CTI data and produces valuation reports.
Each report comprises a valuation score (vScore) and a list of extracted components
for every phase of cyber threat management. The valuation and refinement processes
are repeated until the STIX report improves to a threshold suitable for use in cyber
threat management. SCERM thus provides a starting point for cyber threat management
teams and categorizes STIX reports based on their benefit for the prevention,
detection, and response phases of cyber threat management, or a combination thereof.

5.3 Design

This section provides a detailed description of the formal model used in the SCERM

system. The STIX Architecture based formal Model (SAM) is presented first, followed

by the formalization of the use case managing cyber threat response activities. The STIX

formal model is used to derive individual tests for different phases of cyber threat man-

agement namely cyber threat prevention, detection and response. Details are provided

in the following subsections.

5.3.1 Formal Model of STIX Architecture - SAM

The STIX architecture [66] describes cyber threat concepts as autonomous and reusable

constructs. The reason for the popularity of the STIX is that it objectively defines differ-

ent aspects of the cyber-threat that answer questions such as “what happened”, “how

the incident occurred”, “what vulnerabilities were exploited” and “who did it”. At the

same time, it establishes connections between these aspects. Based on our literature


review and study we have concluded that any valuation criterion must measure the

presence of these aspects as well as the associated connections. This valuation will

have to consider which aspects are more important to particular phases of the cyber

threat management and the confidence level of the reporting source regarding the CTI.

STIX is primarily designed to qualitatively model cyber threat data. The subjective

nature of descriptions of components’ properties and their contexts makes it difficult

to perform quantitative measurement of the different aspects of the threat. Particularly

the current STIX architecture cannot valuate the efficacy of STIX reports for different

phases of cyber threat management. Therefore, an alternative model called SAM is

developed, which considers characteristics of the STIX domain and relationship objects

in a quantitative fashion. This model is employed by SCERM to valuate STIX reports

for cyber threat management.

5.3.1.1 Modelling of Campaign Component

SAM defines the domain and relationship objects present in the STIX architecture [66]
as variables in a mathematical relation. The variables campaign, TTP, incident, TA,
COA, ExploitTarget, indicator, and observables represent the STIX domain objects. The
following variables are employed for the selection of the aforesaid components: CC
(Campaign Component), CrCr (Campaign related Component), TTPC (TTP Component), TTPrCs
(TTP related Component), EC (Exploit Component), ErCt (Exploit related Component),
IndC (Indicator Component), IndrCu (Indicator related Component), IncC (Incident
Component), IncrCv (Incident related Component), COAC (Course of Action Component),
COArCw (Course of Action related Component), ObsC (Observable Component), ObsrCx
(Observable related Component), TAC (ThreatActor Component), and TArCy (ThreatActor
related Component). vScore is a variable that stores the ranking score for a STIX
report.

Multiple functions, such as COA ranking (CRF(coa, p)), indicator ranking
(IRF(indicator, |observable|)), producer strength (PS(p)), COA mass (CM(coa)),
indicator efficacy (IE(indicator)), and indicator mass (IM(indicator)), are introduced
to measure different characteristics of the aforesaid components. Similarly, j and k
are iterators used to iterate over the components during the calculation of vScore.
Since the STIX architecture [66] is relatively large, with several domain objects and
complex interrelations, we explain the modeling with the help of the campaign
component and its related components; the rest follow similarly. A campaign may be
associated with one or more other campaigns, may use related TTPs, may have related
incidents, and may be attributed to a TA, as shown in Figure 5.1.

Figure 5.1: Campaign and its Related Components

Consider the following. Let $campaign_j$ be a campaign run in the attack
($campaign_j \in Campaign$), where the cardinality of Campaign is
$|Campaign| = m_{camp}$. The symbol $\in$ denotes the belongs to relationship, whereas
the symbol $\ni$ denotes the owns or has a member relationship between components.

Similarly, let $ttp_k$ be a TTP employed in this attack ($ttp_k \in TTP$), where
$|TTP| = n_{ttp}$. Then the cardinality of the campaign-TTP relationship can be
formally expressed as in Equation 5.1.

$$\sum_{j=0}^{m_{camp}} \sum_{k=0}^{n_{ttp}} campaign_j \ni ttp_k \tag{5.1}$$
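One way to read Equation 5.1 is as a count of the campaign-TTP ownership links found in a report. The Python sketch below takes that reading as an assumption (each true campaign ∋ ttp relation contributes 1); the dictionary layout and the identifiers are hypothetical and not part of SAM.

```python
# Hypothetical report fragment: each campaign maps to the set of TTPs it owns.
campaign_ttps = {
    "campaign-0": {"ttp-0", "ttp-1"},
    "campaign-1": {"ttp-1"},
    "campaign-2": set(),
}
all_ttps = ["ttp-0", "ttp-1", "ttp-2"]


def count_relations(owner_to_members, members):
    """Double sum over owners j and members k of [owner_j owns member_k]."""
    return sum(
        1
        for owned in owner_to_members.values()  # outer sum over campaigns
        for member in members                   # inner sum over TTPs
        if member in owned                      # relation campaign_j owns ttp_k
    )


campaign_ttp_links = count_relations(campaign_ttps, all_ttps)
print(campaign_ttp_links)
```

Under this reading, a campaign with no related TTPs simply adds nothing to the term.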

The first summation describes the range of Campaign, whereas the second sum-

mation is used to represent the number of related TTPs. Other relations are modeled

similarly. A portion of the model that illustrates the four relationships of the campaign

can be seen in Figure 5.2.

On the left-hand side, we have the Campaign component and the arrows show the


Figure 5.2: Formal Depiction of the Campaign Components

relations to the several related components on the right-hand side of this figure. Each

relation is labeled by the formalism depicting the cardinality. The Valuation process

considers one or more of these components or their relationships by using selection

variables. One of these selection variables namely CrCr can be seen in this figure. The

next subsection describes the selection process in greater detail.

5.3.1.2 Component Selection

In SAM, the inclusion or exclusion of a STIX component is controlled by a single
Boolean variable. A TRUE value indicates that the component is included and a FALSE
value indicates that it is excluded. In Figure 5.2, the Campaign component appears
because its control variable, CC, is set to TRUE. Similarly, the relationships with
other components are controlled via a vector of Boolean variables. For instance, CrCr
controls the campaign component's relationships to the associated campaign, TTP,
incident, and threatactor. The subscript r indicates the index within the vector:
CrC0 controls the associated campaign relation, while CrC1, CrC2, and CrC3 control the
TTP, incident, and threatactor relations, respectively. The complete Karnaugh map of
all the variable values of Campaign and its related components is shown in Table 5.1.

Table 5.1: Component Selection

r    CrCr      Component status
0    CrC0=0    Campaign dropped
0    CrC0=1    Campaign selected
1    CrC1=0    TTP dropped
1    CrC1=1    TTP selected
2    CrC2=0    Incident dropped
2    CrC2=1    Incident selected
3    CrC3=0    Threatactor dropped
3    CrC3=1    Threatactor selected

Accordingly, the details of all the variables employed in the SAM valuation model for
the selection of different STIX components and their relationships are given in Table
5.2.

Table 5.2: SCERM’s Variables and their purpose

Variable   Purpose (to add/drop components)   Variable   Purpose (to add/drop related components of)
CC         Campaign                           CrCr       Campaign
TTPC       TTP                                TTPrCs     TTP
EC         Exploit                            ErCt       Exploit
IndC       Indicator                          IndrCu     Indicator
IncC       Incident                           IncrCv     Incident
COAC       COA                                COArCw     COA
ObsC       Observable                         ObsrCx     Observable
TAC        Actor                              TArCy      Actor

Note that we have employed the letter r to distinguish between the variable con-

trolling the component e.g. CC and the vector variable controlling the relationship e.g.

CrCr.

5.3.1.3 Valuation Score

The SAM model detailed so far is employed to calculate the vScore variable in
Equation 5.2, which depicts the efficacy of a STIX report. The right-hand side lists
all STIX components and their relations as additive terms.

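$$
\begin{aligned}
vScore ={} & CC \cdot \sum_{j=0}^{m_{camp}} Campaign_j \\
& + CrC_r \cdot \sum_{j=0}^{m_{camp}} \sum_{k=0}^{n_{assCamp}} Campaign_j \ni AssociatedCampaign_k \\
& + CrC_r \cdot \sum_{j=0}^{m_{camp}} \sum_{k=0}^{n_{ttp}} Campaign_j \ni TTP_k \\
& + CrC_r \cdot \sum_{j=0}^{m_{camp}} \sum_{k=0}^{n_{inc}} Campaign_j \ni Incident_k \\
& + CrC_r \cdot \sum_{j=0}^{m_{camp}} \sum_{k=0}^{n_{ta}} Campaign_j \ni TA_k \\
& + TTPC \cdot \sum_{j=0}^{m_{ttp}} TTP_j \\
& + TTPrC_s \cdot \sum_{j=0}^{m_{ttp}} \sum_{k=0}^{n_{rttp}} TTP_j \ni RelatedTTP_k \\
& + TTPrC_s \cdot \sum_{j=0}^{m_{ttp}} \sum_{k=0}^{n_{exploit}} TTP_j \ni ExploitTarget_k \\
& + EC \cdot \sum_{j=0}^{m_{exploit}} ExploitTarget_j \\
& + ErC_t \cdot \sum_{j=0}^{m_{exploit}} \sum_{k=0}^{n_{rExplt}} ExploitTarget_j \ni RelatedExploitTarget_k \\
& + ErC_t \cdot \sum_{j=0}^{m_{exploit}} \sum_{k=0}^{n_{coa}} CRF(ExploitTarget_j, COA_k) \\
& + IndC \cdot \sum_{j=0}^{m_{ind}} Indicator_j \\
& + IndrC_u \cdot \sum_{j=0}^{m_{ind}} \sum_{k=0}^{n_{rInd}} Indicator_j \ni RelatedIndicator_k \\
& + IndrC_u \cdot \sum_{j=0}^{m_{ind}} \sum_{k=0}^{n_{camp}} Indicator_j \ni Campaign_k \\
& + IndrC_u \cdot \sum_{j=0}^{m_{ind}} \sum_{k=0}^{n_{ttp}} Indicator_j \ni TTP_k \\
& + IndrC_u \cdot \sum_{j=0}^{m_{ind}} IRF(Indicator_j, |Observable|) \\
& + IndrC_u \cdot \sum_{j=0}^{m_{ind}} \sum_{k=0}^{n_{coa}} CRF(Indicator_j, COA_k) \\
& + IncC \cdot \sum_{j=0}^{m_{inc}} Incident_j \\
& + IncrC_v \cdot \sum_{j=0}^{m_{inc}} \sum_{k=0}^{n_{rInc}} Incident_j \ni RelatedIncident_k \\
& + IncrC_v \cdot \sum_{j=0}^{m_{inc}} \sum_{k=0}^{n_{ttp}} Incident_j \ni TTP_k \\
& + IncrC_v \cdot \sum_{j=0}^{m_{inc}} \sum_{k=0}^{n_{coaTaken}} CRF(Incident_j, COATaken_k) \\
& + IncrC_v \cdot \sum_{j=0}^{m_{inc}} \sum_{k=0}^{n_{coaReq}} CRF(Incident_j, COARequested_k) \\
& + IncrC_v \cdot \sum_{j=0}^{m_{inc}} \sum_{k=0}^{n_{Ind}} Incident_j \ni Indicator_k \\
& + IncrC_v \cdot \sum_{j=0}^{m_{inc}} \sum_{k=0}^{n_{obs}} Incident_j \ni Observable_k \\
& + IncrC_v \cdot \sum_{j=0}^{m_{inc}} \sum_{k=0}^{n_{ta}} Incident_j \ni TA_k \\
& + COAC \cdot \sum_{j=0}^{m_{coa}} CRF(COA_j, Nil) \\
& + COArC_w \cdot \sum_{j=0}^{m_{coa}} \sum_{k=0}^{n_{rcoa}} CRF(COA_j, RelatedCOA_k) \\
& + COArC_w \cdot \sum_{j=0}^{m_{coa}} \sum_{k=0}^{n_{parObs}} COA_j \ni ParameterObservable_k \\
& + ObsC \cdot \sum_{j=0}^{m_{obs}} Observable_j \\
& + ObsrC_x \cdot \sum_{j=0}^{m_{obs}} \sum_{k=0}^{n_{subObs}} Observable_j \ni SubObservable_k \\
& + TAC \cdot \sum_{j=0}^{m_{ta}} TA_j \\
& + TArC_y \cdot \sum_{j=0}^{m_{ta}} \sum_{k=0}^{n_{rTA}} TA_j \ni RelatedTA_k \\
& + TArC_y \cdot \sum_{j=0}^{m_{ta}} \sum_{k=0}^{n_{camp}} TA_j \ni Campaign_k \\
& + TArC_y \cdot \sum_{j=0}^{m_{ta}} \sum_{k=0}^{n_{ttp}} TA_j \ni TTP_k
\end{aligned}
\tag{5.2}
$$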

Every relation is multiplied by a selection variable that includes or excludes it.
SAM employs several functions for the assessment of CTI data: the course of action
ranking function (CRF(coa, p)), the indicator ranking function
(IRF(indicator, |observable|)), producer strength (PS(p)), COA mass (CM(coa)),
indicator efficacy (IE(indicator)), and indicator mass (IM(indicator)). These are
explained in the following subsections.
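The gating effect of the selection variables in Equation 5.2 can be sketched in Python. This is an illustration under assumptions: each term is reduced to a precomputed number, and the term names and values are hypothetical rather than taken from SCERM.

```python
def v_score(terms, selectors):
    """Sum the terms of a SAM-style equation, each gated by a Boolean selector.

    terms: maps a term name to its already-computed value.
    selectors: maps the same names to True (include) or False (exclude);
    a term without a selector is treated as excluded.
    """
    return sum(value for name, value in terms.items() if selectors.get(name, False))


# Hypothetical precomputed term values for one STIX report.
terms = {
    "campaign": 2,             # CC term
    "campaign_ttp": 3,         # one CrC_r term
    "indicator_irf": 28,       # IndrC_u term that uses IRF
    "incident_coa_taken": 14,  # IncrC_v term that uses CRF
}

# Select only the indicator-ranking term, as a detection-style valuation would.
detection_only = {"indicator_irf": True}
everything = {name: True for name in terms}

print(v_score(terms, detection_only), v_score(terms, everything))
```

Setting a selector to FALSE removes its term from the sum without changing the other terms, which is how the full equation is reduced to a phase-specific one.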

5.3.2 Modeling of the Use Case - Managing Cyber-Threat Response

Activities

STIX provides four high-level use cases for cyber threat management [67]: (1)
cyber-threat analysis, (2) specifying indicator patterns, (3) managing cyber threat
response activities, and (4) CTI sharing. Of these, “managing cyber threat response
activities” is the most important use case, as it expresses the significance of
different STIX components across the cyber threat management life-cycle. We have
utilized the formal model of the STIX architecture [66] (Equation 5.2) to derive
individual tests for the valuation of the cyber threat management phases. Details are
provided in the ensuing subsections.

5.3.3 Cyber threat Prevention and Response Model

According to the STIX use case “managing cyber threat response activities” [67], the
cyber threat prevention team studies different preventive COAs for the identified
threat and selects suitable measures. It then applies these COAs, e.g. software
update, patch installation, or firewall rules implementation, for cyber threat
prevention. Once the cyber threat has been detected, the response team takes
corrective measures such as blocking the data exfiltration channel and restoring the
systems. It is important to note that both the prevention and response phases of
cyber threat management use the COAs. The STIX standard defines various key
properties or fields of the COA, such as title, stage, type, description, impact,
cost, efficacy, and confidence. To valuate the COAs for the prevention and response
phases, we thoroughly studied the aforesaid properties of the COA component and its
relational bonds. Details of these are provided in the following subsections.

5.3.3.1 Course of Action - Stage and Type

The stage property distinguishes whether the COA belongs to cyber threat prevention

or response. The default enumeration for the stage property is “COAStageVocab”. If


stage is set to Remedy then the COA is designed for cyber threat prevention and if its

value is Response then the COA is defined for cyber threat response, as can be seen in

Figure 5.3.

Figure 5.3: COA Stage

The stage is further specified by the type property, which states the class of the
COA. The type property is implemented through the vocabulary
“CourseOfActionTypeVocab-1.0”, which defines multiple classes of COA such as patching,
hardening, redirection, public or logical address restriction, eradication, and
perimeter or host blocking.

5.3.3.2 Course of Action - Impact, Efficacy, and Confidence

The STIX standard provides several properties, such as impact, efficacy, and
confidence, to describe the COA. (1) The impact property describes the repercussions
of implementing the COA. (2) The efficacy property states the effectiveness of the COA
in achieving its intended goals. (3) The confidence property gives the analyst's level
of trust in the assigned impact and efficacy scores. The STIX standard uses the
enumeration “HighMediumLowVocab”, which defines the vocabulary for the levels of these
properties: unknown, none, low, medium, and high.

To measure the strength of a COA, the following procedure is adopted. (1) First, the
aforesaid qualitative vocabulary levels are converted into the quantitative values 0
to 3 for the valuation of the COA, with unknown and none both mapped to 0, as can be
seen in Table 5.3. (2) Then four functions, namely CM(coa), I(coa), E(coa), and
C(coa, string), are introduced. The I(coa) (Equation 5.3) and E(coa) (Equation 5.4)
functions take coa as input and extract the impact and efficacy levels, respectively,
which range from 0 to 3 according to Table 5.3.

Table 5.3: Levels of Impact, Efficacy, and Confidence for Course of Action

Enumeration Vocabulary Values    Assigned Numerical Values
High                             3
Medium                           2
Low                              1
None or Unknown                  0

$$I(coa) \mapsto \{0, 1, 2, 3\}, \quad \text{where } 0, 1, 2, \text{ and } 3 \text{ are impact levels} \tag{5.3}$$

$$E(coa) \mapsto \{0, 1, 2, 3\}, \quad \text{where } 0, 1, 2, \text{ and } 3 \text{ are efficacy levels} \tag{5.4}$$

The C(coa, “impact or efficacy”) function takes the coa as well as a string argument
as input. When the caller passes “impact” as the string, C(coa, “impact”) gives the
confidence score for the impact of the subject COA, as can be seen in Equation 5.5.
The returned confidence score ranges from 0 to 3.

$$C(coa, \text{``impact''}) \mapsto \{0, 1, 2, 3\}, \quad \text{where } 0, 1, 2, \text{ and } 3 \text{ are confidence levels} \tag{5.5}$$

On the other hand, when “efficacy” is passed, the C(coa, “efficacy”) function produces
a confidence score for the effectiveness of the COA, as can be seen in Equation 5.6.

$$C(coa, \text{``efficacy''}) \mapsto \{0, 1, 2, 3\}, \quad \text{where } 0, 1, 2, \text{ and } 3 \text{ are confidence levels} \tag{5.6}$$

(3) CM(coa) is the main function. It calls the aforesaid I(coa), E(coa), and
C(coa, string) functions and adds the impact, efficacy, and confidence scores they
produce, as shown in Equation 5.7.

$$CM(coa) = I(coa) + C(coa, \text{``impact''}) + E(coa) + C(coa, \text{``efficacy''}) \tag{5.7}$$
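The COA mass of Equation 5.7 can be sketched in Python as follows. The level mapping follows Table 5.3; the dictionary field names are hypothetical and chosen for illustration only, not STIX syntax.

```python
# Numeric levels from Table 5.3; "None" and "Unknown" both map to 0.
LEVELS = {"High": 3, "Medium": 2, "Low": 1, "None": 0, "Unknown": 0}


def coa_mass(coa):
    """CM(coa) = I(coa) + C(coa, "impact") + E(coa) + C(coa, "efficacy")."""
    return (
        LEVELS[coa["impact"]]
        + LEVELS[coa["impact_confidence"]]
        + LEVELS[coa["efficacy"]]
        + LEVELS[coa["efficacy_confidence"]]
    )


# Hypothetical COA record; the field names are illustrative.
patch = {
    "impact": "Low",
    "impact_confidence": "High",
    "efficacy": "High",
    "efficacy_confidence": "Medium",
}
print(coa_mass(patch))
```

With four properties each in the range 0 to 3, the mass ranges from 0 to 12.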


5.3.3.3 Course of Action and its Associations

According to the STIX architecture [66], there are three producers of the COA namely

the victim, indicator and the exploit target components, which can be seen in Figure 5.4.

Figure 5.4: COA and its Relations

Producers are components that convey COA details for the threat under considera-

tion. Upon close examination of the figure, different types of relational bonds between

the COA and its producer components can be observed. These bonds are labeled as

COA taken, COA requested, suggested COA, potential COA, and related COA. The strength

of these bonds can be judged on the basis of the experience and the knowledge of the

analyst who authored the producer component.

The most reliable and trustworthy producer is the victim himself, because he faced,

analyzed, and responded to the cyber attack. Therefore, the bond “COA taken” is con-

sidered as the highest level and is given a value 5, as can be seen in Table 5.4.

Table 5.4: COAs Producers and their Strength

Producer     Bonding          Producer Strength
Incident     COA Taken        5 or Highest
Incident     COA Requested    4 or Medium-high
Indicator    Suggested COA    3 or Medium
Exploit      Potential COA    2 or Medium-low
COA          Related COA      1 or Low
COA          Nil              0 or Nil

In contrast, the requested COA has the second-highest, or medium-high, bond level
because the victim identified it after analysing and observing the actual cyber attack
but could not apply it. It is therefore considered the second-best remedial action for
the cyber attack and is assigned a value of 4. The “suggested COA” is considered a
medium bond because it is suggested by an expert after study and analysis of the cyber
attack; hence, it is assigned a value of 3. The “potential COA”, produced on the basis
of common-sense knowledge, is more of an estimate and is of a medium-low level, with a
value of 2. The related COA is considered a weak bond because some producers
generically associate certain defense mechanisms with each other without considering
the cyber attack scenario. For example, firewalls and IDSes are commonly associated
with network defence, even though each has its own specific utilization for the exact
network attack in question. Hence, if a related COA is mentioned in the STIX, the
proposed model assigns it a low-level value of 1. Similarly, the value “Nil or 0” is
assigned to a COA that does not have any association with a producer.

Next, a function PS(p) is introduced to measure the strength of a COA's producer, as
can be seen in Equation 5.8. It takes the producer as input and returns the producer
strength score according to Table 5.4.

$$PS(p) \mapsto \{0, 1, 2, 3, 4, 5\}, \quad \text{where } p \in \text{producer of } coa \text{ and } 0, 1, 2, 3, 4, 5 \text{ are producer strength scores} \tag{5.8}$$
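A minimal Python sketch of PS(p) is a lookup keyed by the relational bond, with the scores taken from Table 5.4. Representing a missing producer as None is an assumption made here for illustration.

```python
# Producer strength per relational bond, following Table 5.4.
PRODUCER_STRENGTH = {
    "COA Taken": 5,
    "COA Requested": 4,
    "Suggested COA": 3,
    "Potential COA": 2,
    "Related COA": 1,
}


def producer_strength(bond):
    """PS(p): return the bond's strength, or 0 (Nil) when the COA has no producer."""
    if bond is None:
        return 0
    return PRODUCER_STRENGTH[bond]


print(producer_strength("COA Taken"), producer_strength(None))
```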

5.3.3.4 Ranking of a Course of Action

To rank the COA component, a CRF function is introduced, which can be seen in Equa-

tion 5.9. It accepts coa and its producer (p) as input arguments and passes these to

the CM(coa) (sec. 5.3.3.2) and PS(p) (sec. 5.3.3.3) functions, respectively. The CM(coa)

function produces the mass score of a COA, while PS(p) function returns the producer

strength score. Finally, these scores are added (CM(coa) + PS(p)) to produce the rank-

ing score of the COA.

$$CRF(coa, p) = CM(coa) + PS(p), \quad \text{where } coa \in COA \text{ and } p \in \text{the producer of the COA} \tag{5.9}$$
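As a worked illustration of Equations 5.7 to 5.9, consider a hypothetical COA (not taken from the evaluated dataset) whose impact is Low with High confidence and whose efficacy is High with Medium confidence, reported by an incident through the COA Taken bond:

```latex
\begin{aligned}
CM(coa) &= I(coa) + C(coa, \text{``impact''}) + E(coa) + C(coa, \text{``efficacy''}) = 1 + 3 + 3 + 2 = 9 \\
PS(p) &= 5 \\
CRF(coa, p) &= CM(coa) + PS(p) = 9 + 5 = 14
\end{aligned}
```

The same COA with no producer (Nil) would score 9 + 0 = 9, so the bond to a victim-reported incident raises its rank by 5.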


The STIX use case “managing cyber threat response activities” [67], the COA component
properties, and its relational bonding form the basis for valuating STIX reports for
the prevention and response phases of cyber threat management. The valuation metric
is formalized for all the relations enumerated in the SAM model that have COA
components, and is given in Equation 5.10 for the cyber threat prevention and response
phases.

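$$
\begin{aligned}
vScore ={} & COAC \cdot \sum_{j=0}^{m_{coa}} CRF(COA_j, Nil) \\
& + COArC_w \cdot \sum_{j=0}^{m_{coa}} \sum_{k=0}^{n_{rcoa}} CRF(COA_j, RelatedCOA_k) \\
& + IndrC_u \cdot \sum_{j=0}^{m_{ind}} \sum_{k=0}^{n_{coa}} CRF(Indicator_j, COA_k) \\
& + IncrC_v \cdot \sum_{j=0}^{m_{inc}} \sum_{k=0}^{n_{coaReq}} CRF(Incident_j, COARequested_k) \\
& + IncrC_v \cdot \sum_{j=0}^{m_{inc}} \sum_{k=0}^{n_{coaTaken}} CRF(Incident_j, COATaken_k) \\
& + ErC_t \cdot \sum_{j=0}^{m_{exploit}} \sum_{k=0}^{n_{coa}} CRF(ExploitTarget_j, COA_k)
\end{aligned}
\tag{5.10}
$$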

To automate the relations selection procedure for the cyber threat prevention and

response phases, it is required to set the component selection variables in the SAM

equation (Equation 5.2) according to Table 5.5.

Table 5.5: Variables for Prevention and Response phases

Variables        Values
CC, CrCr         0, r = 0
TTPC, TTPrCs     0, s = 0
EC, ErCt         0, t = 2
IndC, IndrCu     0, u = 5
IncC, IncrCv     0, v = 3, 4
COAC, COArCw     1, w = 1
ObsC, ObsrCx     0, x = 0
TAC, TArCy       0, y = 0

The first column of the table lists the component selection variables, while the
second column gives the values that automatically include or exclude each STIX
component and thereby reduce the SAM equation for the prevention and response phases.
The detailed procedure for the inclusion or exclusion of a STIX component is provided
in Section 5.3.1.2.

5.3.4 Cyber threat Detection

According to the STIX use case “managing cyber threat response activities” [67], once
threat indicators have been defined, the cyber threat detection team collects and
monitors the indicators and their observables in its cyber environment in order to
detect the cyber attack.

The use case suggests that, for cyber threat detection, the indicators and their
observables, such as IPs, port numbers, protocols, hashes, file or folder names, APIs,
and registry entries used by the attacker, are key components. These are forensic
artifacts of the cyber attack and are important for identifying the occurrence of the
attack on the host or within the network. The response team studies these and takes
remedial actions to block or respond to the cyber attack. In fact, cyber threat
detection and response is not possible without these components. The STIX standard
defines several properties or fields of the indicator component, such as title, type,
description, valid time position, observables, indicated TTP, likely impact,
confidence, and sighting. To valuate the indicator component for the cyber threat
detection phase, we thoroughly studied the aforesaid properties and the indicator
classification model, the POP [26]. The ensuing subsections describe how we
formalized the key properties of the indicator component to measure its strength and
how the POP's levels are formalized into efficacy scores.

5.3.4.1 Indicator - likely impact and confidence

The STIX standard provides likely impact and confidence properties to describe the in-

dicator component. (1) The likely impact property describes the probable impact of

the indicator if it occurred. (2) The confidence property provides the level of trust of

the correctness of the provided indicator. The STIX standard defines the “HighMedi-

umLowVocab” enumeration. It states various levels of the likely impact and confidence

properties such as unknown, none, low, medium and high.

To measure the strength of the indicator, the following steps are applied. (1) The
aforesaid vocabulary levels for likely impact and confidence are quantified in the
range 0 to 3, with 0 being the lowest and 3 being the highest, in the same way as the
levels of the COA component properties were mapped in Section 5.3.3.2. (2) Next, the
IM(indicator) function is employed, which takes the indicator component information as
input and forwards it to the LI(indicator) and C(indicator) functions. The
LI(indicator) function produces the likely impact level, and the C(indicator) function
gives the confidence level for the impact of the subject indicator. Subsequently,
these scores are added to produce the indicator mass score, as can be seen in Equation
5.11.

$$IM(indicator) = LI(indicator) + C(indicator) \tag{5.11}$$

5.3.4.2 Formalization of POP indicator levels as efficacy scores

The POP model emphasises that not all indicators are equally important for cyber
attack detection. It classifies the indicators on the basis of their efficacy and
places them at different levels of the pyramid. Moreover, this model suggests that the
higher an indicator is in the pyramid, the more useful it is for cyber threat
management, because denying it causes more damage to the adversary: the adversary
invests more resources and time in indicators that are higher in the pyramid, so they
are harder to change. For example, responding to low-level indicators, namely hashes,
IPs, and DNs, will cause minor damage, while responding to high-level indicators such
as host and network artifacts, tools, and TTPs will cause more pain to the adversary.
Efficacy scores are assigned to the indicators according to the levels of the POP, as
shown in Table 5.6.

A low-level indicator is assigned a lower score than a higher-level indicator. For
example, a score of 5 is assigned to the exploit watchlist, which is at a higher level
than the hash watchlist, which is assigned a score of 1. All indicators are assigned
scores in this fashion. Next, the IE(indicator) function is introduced, which takes
the indicator as an input argument and returns the indicator efficacy score according
to Table 5.6, as can be seen in Equation 5.12.

$$IE(indicator) \mapsto \{1, 2, 3, 4, 5\}, \quad \text{where } 1, 2, 3, 4, \text{ and } 5 \text{ are indicator efficacy scores} \tag{5.12}$$

Table 5.6: Indicator Efficacy

Indicator             Efficacy Score
Exploit Watchlist     5
APIs Watchlist        4
Folders Watchlist     4
Files Watchlist       4
Registry Watchlist    4
Mutex Watchlist       4
Data Staged           4
Protocol Watchlist    3
Port Watchlist        3
DN Watchlist          2
IP Watchlist          2
Hash Watchlist        1

5.3.4.3 Ranking of an Indicator

The Indicator Ranking Function (IRF(indicator, |observable|)) is proposed to rank the
indicator component of the STIX reports, as can be seen in Equation 5.13. It takes
indicator and |observable| as input and forwards indicator to the indicator mass
(IM(indicator)) (sec. 5.3.4.1) and indicator efficacy (IE(indicator)) (sec. 5.3.4.2)
functions. These functions return the indicator mass and efficacy scores,
respectively, which are added, and the sum is multiplied by |observable|. The result
is then returned to the caller.

$$IRF(indicator, |observable|) = \{IM(indicator) + IE(indicator)\} \times |observable| \tag{5.13}$$
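The indicator ranking of Equation 5.13 can be sketched in Python as below. The level mapping follows Section 5.3.4.1 and the efficacy scores are a subset of Table 5.6; the dictionary field names and the sample indicator are hypothetical.

```python
# Levels for likely impact and confidence (0 to 3), as in Section 5.3.4.1.
LEVELS = {"High": 3, "Medium": 2, "Low": 1, "None": 0, "Unknown": 0}

# Efficacy scores per indicator type, following Table 5.6 (subset shown).
EFFICACY = {
    "Exploit Watchlist": 5,
    "Registry Watchlist": 4,
    "Port Watchlist": 3,
    "IP Watchlist": 2,
    "Hash Watchlist": 1,
}


def indicator_mass(indicator):
    """IM(indicator) = LI(indicator) + C(indicator)."""
    return LEVELS[indicator["likely_impact"]] + LEVELS[indicator["confidence"]]


def indicator_rank(indicator, observable_count):
    """IRF = (IM(indicator) + IE(indicator)) * |observable|."""
    efficacy = EFFICACY[indicator["type"]]
    return (indicator_mass(indicator) + efficacy) * observable_count


# Hypothetical indicator; field names are illustrative, not STIX syntax.
ip_indicator = {"type": "IP Watchlist", "likely_impact": "High", "confidence": "Medium"}
print(indicator_rank(ip_indicator, 4))
```

Because the sum is multiplied by the number of observables, an indicator without any observable ranks 0 regardless of its mass and efficacy.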

The STIX use case “managing cyber threat response activities” [67], the indicator
component's properties, and the indicator classification model, the POP, form the
basis for valuating STIX reports for the cyber threat detection phase of cyber threat
management. The valuation metric is formalized as Equation 5.14 for the relation
enumerated in the SAM model that uses the indicator and observable components.

$$vScore = IndrC_u \cdot \sum_{j=0}^{m_{ind}} IRF(Indicator_j, |Observable|) \tag{5.14}$$


To automate the relations selection procedure for the cyber threat detection phase,

it is required to set the component selection variables in the SAM equation (Equation

5.2) according to Table 5.7.

Table 5.7: Variables for Detection phase

Variables        Values
CC, CrCr         0, 0
TTPC, TTPrCs     0, 0
EC, ErCt         0, 0
IndC, IndrCu     0, u = 4
IncC, IncrCv     0, 0
COAC, COArCw     0, 0
ObsC, ObsrCx     0, 0

The table shows the component selection variables and their values to reduce the

SAM equation (Equation 5.2) for the cyber threat detection phase. The detailed proce-

dure for the inclusion or exclusion of a STIX component is already provided in section:

5.3.1.2.

5.4 Architecture and Implementation

The high-level architecture of the SCERM system is shown in Figure 5.5, while
pseudocode detailing the connectivity of the various modules and their submodules is
provided in Algorithm 5.1.

The SCERM system is composed of three main modules: (1) Preprocessing, (2) Valuation,
and (3) Refinement. The Preprocessing module consists of the Parser and Booster
submodules. The Parser accepts STIX reports as input, then extracts the desired STIX
components and stores them in the graph database. Afterwards, the Booster submodule
retrieves distinct components and saves them into the Distinct Component List (DCL).
Then, the Booster function identifies misplaced components and places them under a
Native Component List (NCL[]).

Figure 5.5: High level Architecture Diagram of SCERM

Afterwards, the Valuation module takes the database as input, formally evaluates the
STIX model, and generates valuation scores for the different phases of CTM. These
scores are communicated to the analyst. Subsequently, the Refinement module gets the
components and identifies the incomplete ones. The Crawler then crawls a prepared
dataset called PD[][], retrieves the missing components, and saves them into a list
called the Comprehensive Component List (CCL[]). The Valuation module then valuates
the refined STIXs, and the cyclic feedback process repeats until the STIX converges to
an optimum or desired valuation score determined by the analyst. A detailed
description of each module is provided in the ensuing subsections.
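The cyclic feedback between Valuation and Refinement can be sketched in Python as follows. The loop condition mirrors the `vScore ≤ dvScore` test of Algorithm 5.1; the `valuate` and `refine` callables are hypothetical stand-ins for the two modules, and the `max_rounds` bound is an added safeguard that is not part of SCERM as described.

```python
def refine_until(report, valuate, refine, desired_score, max_rounds=10):
    """Alternate valuation and refinement until the score exceeds the target.

    valuate and refine are caller-supplied callables (hypothetical stand-ins for
    the Valuation and Refinement modules). max_rounds is an added safeguard so
    the loop ends even if refinement stops improving the report.
    """
    score = valuate(report)
    rounds = 0
    while score <= desired_score and rounds < max_rounds:
        report = refine(report)
        score = valuate(report)
        rounds += 1
    return report, score, rounds


# Toy stand-ins: a report is a list of components and its score is the count.
toy_report = ["indicator"]
final_report, final_score, used = refine_until(
    toy_report,
    valuate=len,
    refine=lambda r: r + ["observable"],
    desired_score=3,
)
print(final_score, used)
```

In this toy run each refinement adds one component, so the loop stops as soon as the score passes the desired value.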

5.4.1 Preprocessing

The Preprocessing module comprises the Parser and the Signal Booster submodules. It
accepts STIX reports as input. These reports are available in XML, JSON, or other
structured formats and are used by the cyber threat teams as part of a continuous
threat management process. Part of the Preprocessing algorithm is shown in Algorithm
5.2, which performs the following operations. (1) First, it initializes the variables.
(2) Then, a connection with a graph database (DB) is created for reading and writing
the STIX components' information, as can be seen in line 6. (3) Next, the Parser
function reads the STIX reports by using a combination of regular expression pattern
matching and tag recognition, as can be seen in line 8. It further extracts the
desired STIX components and stores them in the graph database. The graph database is a
collection of nodes and edges. In SCERM, STIX domain objects (SDOs) are defined as
nodes of the database, while STIX relationship objects are defined as edges. (4)
Afterwards, the Booster function reads the DB, retrieves and saves components information


Algorithm 5.1: SCERM
 1: Input = STIX Report
 2: Output = Refined STIX Graph and Valuation Reports for CTM
 3: ▷ Variables:
 4: CL[] := Component List
 5: DCL[] := Distinct Component List, have unique Components
 6: SDO := STIX Domain Object
 7: NCL[] := Native Component List, a list of SDO
 8: PD[] := Prepared Data Set of Blog Reports
 9: ICL[] := Incomplete Component List
10: CCL[] := Completed Component List, list after crawling
11: dvScore := Desired Valuation Score
12: vScore := Valuation Score of STIX Report for CTM
13: DB := Database
14: Connect(DB)
15: ▷ ----------------------------------------
16: ▷ Module-1: Preprocessing
17: ▷ ----------------------------------------
18: ▷ Sub-module: Parser - Parses STIX Report, saves extracted components in Graph Database
19: DB := Parser(STIX Report)
20: ▷ Sub-module: Booster - Stores distinct components in DCL array
21: CL[] := reading(DB)
22: DCL[iterator i++] = Distinct(CL[iterator j++])
23: ▷ Sub-module: Remapper - Remaps wrongly placed components under their native SDOs
24: Remapper(NCL[iterator i++], DCL[iterator j++])
25: save(DB, NCL[])
26: ▷ ----------------------------------------
27: ▷ Cyclic Feedback Process Repeats Until The STIX Converges to a Desired Valuation Score
28: while vScore ≤ dvScore do
29:     ▷ ----------------------------------------
30:     ▷ Module-2: Valuation
31:     ▷ ----------------------------------------
32:     ▷ Sub-module: Valuator - Performs evaluation of STIX Graph
33:     vScore = Valuator(NCL[])
34:     ▷ ----------------------------------------
35:     ▷ Module-3: Refinement
36:     ▷ ----------------------------------------
37:     ▷ Sub-module: Component Analyzer - Identifies incomplete components, stores them in Array
38:     ICL[iterator i++] = ComponentAnalyzer(NCL[iterator j++])
39:     ▷ Sub-module: Crawler - Crawls prepared dataset (PD[][]) and extracts required components
40:     if Crawler(ICL[iterator i++], PD[iterator j++]) then
41:         ▷ Sub-module: Adder - Adds crawled components in Array
42:         Adder(CCL[iterator i++], PD[iterator j++])
43:     end if
44:     CCL[iterator k++] = Crawler(ICL[iterator i++], PD[iterator j++])
45:     ▷ ----------------------------------------
46: end while
47: ▷ ----------------------------------------
48: ▷ Gets and saves retrieved components into STIX Graph DB
49: save(DB, CCL[])

into a list called CL[]. Next, this list is traversed and the distinct components are saved into a list named the distinct component list (DCL[]), as can be seen in lines 12 to 20. (5) Subsequently, in lines 22 to 36, the Booster function retrieves each component from DCL[] in turn and saves it into a variable called misplaced component. The Booster assumes that the selected component is misplaced and compares it with the components already stored in the component dictionary (CD[][]). The dictionary comprises SDOs and their related components. If the components are found to be equal, the Booster function further verifies whether the types of both components are the same. If they differ, this indicates that the selected component was


wrongly placed in the STIX report. Afterwards, the Booster function places the misplaced component under a native component list (NCL[]), which can be seen in line 30. (6) This process continues until all components are processed. (7) In the end, the remapped component information from NCL[] is saved into the DB for the valuation and refinement processes.

Algorithm 5.2: Preprocessing
 1: Input := STIX Report
 2: Output := A Boosted STIX Report
 3: ▷ Variables:
 4: CD[][] := A Dictionary to boost the components
 5: SDO = STIX Domain Objects
 6: dcl index := 0
 7: ▷ Connecting with Database
 8: Connect(DB)
 9: ▷ STIX report parsing and saving extracted components into a Graph Database
10: DB := Parser(STIX Report)
11: ▷ STIX Booster: Gets distinct components from the Graph Database
12: CL[] := reading(DB)
13: for each (component in CL[]) do
14:     for each (component in DCL[]) do
15:         if (DCL[iterator i] = CL[iterator j]) then
16:             continue
17:         end if
18:         DCL[dcl index++] = CL[iterator j]
19:     end for
20: end for
21: ▷ STIX Remapper: Remaps wrongly placed components under their native SDOs
22: for each component in DCL[] do
23:     ▷ Assuming component is misplaced
24:     misplaced component := DCL[iterator]
25:     ▷ Validating above assumption
26:     for each (SDO (s) in CD[][]) do
27:         for each (related component (rc) of selected SDO) do
28:             if (misplaced component = CD[s,rc]) then
29:                 if (misplaced component.Type != CD[s,rc].Type) then
30:                     NCL[] := selected component()
31:                     Delete(misplaced component)
32:                 end if
33:             end if
34:         end for
35:     end for
36: end for
37: save(DB, NCL[])
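The deduplication and remapping steps of the Booster can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the thesis implementation: the `Component` class, the `distinct` and `remap` helpers, and the flat dictionary that maps a component's text to its native SDO type are hypothetical, and the sketch returns the full corrected list rather than only the remapped entries.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Component:
    value: str  # text of the component, e.g. a TTP or COA title
    type: str   # SDO type under which the report filed it


def distinct(components: List[Component]) -> List[Component]:
    """Keep the first occurrence of each component, preserving order."""
    seen = set()
    unique = []
    for c in components:
        if c not in seen:
            seen.add(c)
            unique.append(c)
    return unique


def remap(components: List[Component], dictionary: Dict[str, str]) -> List[Component]:
    """Re-file a component under its native SDO type when the dictionary
    knows the component and disagrees with the type used in the report."""
    remapped = []
    for c in components:
        native = dictionary.get(c.value.lower())
        if native is not None and native != c.type:
            remapped.append(Component(c.value, native))
        else:
            remapped.append(c)
    return remapped


# Hypothetical dictionary and component list for illustration only.
cd = {"spearphishing emails": "TTP", "remote access trojans": "TTP"}
cl = [
    Component("Spearphishing emails", "Description"),
    Component("Spearphishing emails", "Description"),
    Component("abc123", "Observable"),
]
ncl = remap(distinct(cl), cd)
print(ncl)
```

Here the duplicate entry is dropped first, and the remaining "Spearphishing emails" entry is moved from the description to the TTP type, while the observable is left untouched.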

5.4.2 Valuation

The Valuation module takes the database as input and retrieves the STIX components. It formally evaluates the STIX model to generate valuation scores for cyber threat prevention, detection, and response. These scores are communicated to the analyst to aid in prioritizing the intelligence. Subsequently, the Valuation module triggers the Refinement module for possible STIX refinement and to increase the valuation score.


5.4.3 Refinement

The Refinement module consists of three submodules, namely (1) the Component Analyzer, (2) the Crawler, and (3) the Adder. The refinement algorithm is shown in Algorithm 5.3. It performs the following operations. (1) The Component Analyzer retrieves, analyzes, and processes the component information. (2) At first, it extracts the components from the graph database that are necessary for the different phases of cyber threat management. Then it saves these identified components into a list called the “component list” (CL[]). Based on expert feedback from the security community, we have determined that these components are TTPs, exploits, indicators, observables, and COAs.

Algorithm 5.3: Refinement
 1: Input := Boosted STIX Graph
 2: Output := Refined STIX Graph and Report
 3: CL[] := read(DB)
 4: ▷ Component Analyzer: Identifies incomplete components
 5: for each component (C) in CL do
 6:     if Ci of SDO(i).Table ∉ corresponding SDO(j).Table then
 7:         ICL(iterator)(0) := Ci.Title
 8:         ICL(iterator)(1) := SDO(j).Name
 9:     end if
10: end for
11: ▷ Crawler: Crawls prepared dataset (PD[][]) and extracts required components
12: for each component (C) in ICL do
13:     for each component (C) in PD do
14:         if ICL(iterator i)(0).PD(iterator j)(0) then
15:             if ICL(iterator i)(1) = PD(iterator j)(1) then
16:                 ▷ adding and saving ICL and PD components in a single list
17:                 CCL(iterator k)(0) := ICL(iterator j)(0)
18:                 CCL(iterator k)(1) := ICL(iterator j)(1)
19:                 CCL(iterator k)(2) := PD(iterator j)(2)
20:                 iterator k++
21:             end if
22:         end if
23:         iterator j++
24:     end for
25:     iterator i++
26: end for
27: ▷ Gets and saves retrieved components into the STIX Graph DB
28: save(DB, CCL[])

(3) Then, the Component Analyzer identifies incomplete components and saves their title and the name of the required component (SDO) into an incomplete component list (ICL[]). This list holds incomplete artifact information such as a TTP without an exploit, or an exploit or an indicator without a COA. (4) Afterwards, the Crawler function proceeds to process the required list of components (ICL[]), as can be seen in lines 11 to 26. It crawls a prepared dataset of curated blog reports called PD[][], retrieves the missing artifacts and components, and saves them into a list named the comprehensive component list (CCL[]), which can be seen in lines 17 to 19. (5) Finally, the retrieved information is fused into the STIX graph database, which can be seen in line 28. Accordingly, the refined component


information is made available to the analyst as well as to the Valuation module. The Valuation module then evaluates the refined STIX reports, and the cyclic feedback process repeats until the STIX converges to an optimum or desired valuation score determined by the analyst, or until the score stops improving.
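The Component Analyzer and Crawler steps can be sketched in Python under a few assumptions. The `REQUIRED` mapping encodes the incompleteness examples given in the text (a TTP without an exploit, an exploit or an indicator without a COA); the record layout, the function names, and the case-insensitive title match are hypothetical choices made for this illustration, and the sample rows are not taken from the actual prepared dataset.

```python
from typing import Dict, List, Set, Tuple

# SDO type -> SDO type that must accompany it (taken from the examples in the text).
REQUIRED: Dict[str, str] = {"TTP": "Exploit", "Exploit": "COA", "Indicator": "COA"}


def find_incomplete(components: List[dict]) -> List[Tuple[str, str]]:
    """Return (title, required SDO name) pairs for incomplete components.

    Each component is a dict with 'title', 'type', and 'related', where
    'related' is the set of SDO types already linked to it.
    """
    icl = []
    for c in components:
        needed = REQUIRED.get(c["type"])
        if needed is not None and needed not in c["related"]:
            icl.append((c["title"], needed))
    return icl


def crawl(icl: List[Tuple[str, str]], prepared: List[Tuple[str, str, str]]) -> List[Tuple[str, str, str]]:
    """Match incomplete entries against prepared rows (title, SDO name, value)."""
    ccl = []
    for title, needed in icl:
        for p_title, p_sdo, p_value in prepared:
            # Assumed matching rule: same title (ignoring case) and same SDO name.
            if p_title.lower() == title.lower() and p_sdo == needed:
                ccl.append((title, needed, p_value))
    return ccl


# Illustrative data only.
related_ok: Set[str] = {"Exploit"}
components = [
    {"title": "Spearphishing", "type": "TTP", "related": related_ok},
    {"title": "Hash watchlist", "type": "Indicator", "related": set()},
]
prepared = [
    ("hash watchlist", "COA", "Block hashes at firewalls and gateways"),
    ("Port watchlist", "COA", "Block ports at the firewall"),
]
icl = find_incomplete(components)
ccl = crawl(icl, prepared)
print(icl, ccl)
```

Only the indicator lacks its required COA in this example, so it is the single entry that the Crawler completes from the prepared rows.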

5.5 Case Study

We consider a recent APT to demonstrate the working of the proposed system, SCERM, in performing valuation and refinement. On the basis of the outcomes, we indicate to the user which STIX report is better suited to the prevention, detection, and response phases of cyber threat management. In the subsequent subsections, we first briefly introduce the selected STIX report and summarize the details of its components. Then, we explain how signal boosting is performed by remapping the CTI data. Next, the boosted STIX report is valuated for the different phases of cyber threat management. After that, we describe how CTI data from security blogs is used for the refinement of the STIX report. Subsequently, we perform the valuation of the refined STIX report. In the end, a comparison of the refined and raw STIX reports is provided on the basis of their valuation scores.

5.5.1 APT Selection

We selected a high-impact APT meant for cyber-espionage and attributed to the threat group “TG-3390” [124], which also goes under the aliases Goblin Panda, APT27, Emissary Panda, Hellsing, Cycledek, and Bronze Union. Since 2013, the APT has been launched against various sectors such as aerospace, pharmacy, intelligence, energy, nuclear, and defense to steal high-value information. To demonstrate the valuation and refinement functionality of SCERM precisely, a holistic STIX report comprising CTI components beneficial for all three phases of cyber threat management was desired. For this purpose, a number of cybersecurity blogs were scanned, and a reasonably good sample that reports incidents attributed to “TG-3390” was retrieved from the IBM X-Force threat exchange.


5.5.2 A Brief Description of the Report

Threat incident reports provided by IBM are made available in textual form as well as in STIX XML and JSON formats. A small portion of the XML-based STIX report for TG-3390, retrieved from the IBM X-Force threat exchange, can be seen in Figure 5.6.

Figure 5.6: IBM X-Force STIX XML

To help the reader correlate the XML tags with the STIX components, the component labels have been highlighted and annotated in the figure. Figure 5.7 shows the same portion of the STIX report in a visual format, displayed using the STIXViz tool. The reader will notice the STIX components TTPs, cybox, and indicators defined as XML tags in Figure 5.6, and the icons representing the same components in Figure 5.7.

During visual analysis of the STIX report, multiple STIX components such as TTPs, indicators, and observables are identified. Details of these components are as follows. (1) There are 120 TTPs in the IBM STIX file, which can be divided into five groups on the basis of their titles: heuristic, trojan, virus, worm, and spyware. Twelve of the 120 TTPs can be identified in Figure 5.7.

(2) 98 indicators are observed in the STIX report, which can be divided equally into two types: (a) indicators with the title “Contained in XFE Collection” and (b) indicators with the title “Malware risk high”. Five of the 98 indicators are shown in Figure 5.7. (3) Similarly, there are 49 observables in the STIX report, each with a title consisting of “XFE Observable


Figure 5.7: STIX-1: IBM X-Force STIX

for” concatenated with a different hash value. Four of the 49 observables can be seen in Figure 5.7. This STIX report, like numerous other structured threat data provided in threat feeds, exhibits a high level of noise. For instance, a small portion of the input IBM TG-3390 text report can be seen in Figure 5.8. The report presents two important concepts, Remote Access Trojans and Spearphishing emails, as TTPs; however, neither of these TTPs is present in the STIX file (Figure 5.7) that is produced.

It is important to notice that the IBM text report also highlights some COAs, such as Keep applications, OS, antivirus and associated files up-to-date and block all URL, hash, and IP based IoCs at the firewall, IDS, routers, but these are also missing from the STIX report. These are just a couple of illustrative examples. In total, 7 concepts show a discrepancy: either they are not reflected in both the text report and the structured STIX output, or they lack the proper labels. In the next subsection, we show how SCERM consolidates these noise discrepancies and boosts the intelligence signal of the structured report.


Figure 5.8: IBM Text Report

5.5.3 Signal Boosting

During our research, we observed that STIX reports are not appropriately formatted, use incorrect vocabulary, and either miss key components or carry erroneously labeled elements, which reduces their usefulness for effective cyber threat management. For example, the TG-3390 STIX report selected for the case study has important CTI data under the description tag. By zooming into the description tag in Figure 5.6, the CTI data related to important STIX components such as TTPs, indicators, observables, and COAs can be identified, as shown in Figure 5.9.

This CTI data is exactly the same as that in the IBM text report (Figure 5.8). The Signal Booster retrieves the CTI data from the description tag of the STIX report and remaps it under the appropriate STIX component’s tag. For example, the CTI items Remote Access Trojans and Spearphishing emails are placed under the TTP tag, Keep applications, OS, antivirus and associated files up-to-date is placed under the exploit component, and Block hashes at the firewall, IDS, routers is placed under the observable component. Then the updated information is stored in the shared graph database. The updated STIX report, generated from the boosted component information, has meaningful, threat-relevant, and distinct CTI data, as shown in Figure 5.10.

Upon close examination of the updated STIX report, the following component information can be observed. (1) There are 2 TTPs, namely Remote Access Trojan and Spear


Figure 5.9: IBM STIX Description Portion

Figure 5.10: STIX-2 : Boosted STIX Report

phishing. (2) The indicator labeled as “Hash watchlist” can be identified. It has several

“Hashes” as observables, which can be used for cyber threat detection. (3) There are

multiple COAs such as Keep application, software and antivirus update and Block hashes at

firewalls and gateways, which can be used for cyber threat prevention and response.


5.5.4 Valuation of the TG-3390 Boosted STIX Report

The Valuation module retrieves the boosted STIX report components’ information from the graph database for valuation and prioritization. Then, it evaluates the STIX model and automatically generates valuation reports, scores, and a graph for the different phases of cyber threat management. These reports provide key STIX components such as TTPs, exploits, indicators, observables, and their corresponding COAs to users in filtered form for every cyber threat management phase. The valuation details of the IBM STIX report for the different phases of cyber threat management are provided in the ensuing subsections.

5.5.4.1 Valuation for Cyber Threat Prevention

As discussed earlier, the STIX components TTPs, exploit targets, and COAs are important for cyber threat prevention. The Valuation module retrieves these boosted STIX components from the graph database. Then it generates a valuation report as well as a valuation score for the prevention phase of cyber threat management. For TG-3390, the valuation report informs the analyst that this APT employs a Spearphishing TTP. The TTP uses an email attachment as an exploit, which can be seen in Figure 5.11. The report further indicates that the analyst can safeguard the organization from this exploit by employing two COAs, namely use up-to-date-antivirus and use up-to-date OS and applications.

Figure 5.11: Valuation Report for Cyber Threat Prevention

With reference to TG-3390, the valuation score (vScore) for the cyber threat prevention phase is shown in Table 5.8, while the calculation details of vScore are as follows:

• The aforementioned COAs are potential remedies for the spearphishing email exploits; hence each of them receives a producer strength score PS(p) of 2 (Table 5.4).


• The impact score I(coa) of the first COA, “use up-to-date antivirus”, with a high level of confidence is 4, and its efficacy score E(coa) with a medium level of confidence is 3. Substituting the impact and efficacy scores into Equation 5.7 yields a COA Mass score CM(coa) of 7. According to Equation 5.9, the COA’s ranking score is computed as CM(coa) + PS(p), which is 9.

• The impact score I(coa) of the second COA, “use up-to-date OS and applications”, with a medium level of confidence is 5, while its efficacy score E(coa) with a high level of confidence is 6. Substituting these scores into Equation 5.7 results in a COA Mass score of 11. According to Equation 5.9, the COA’s ranking score is calculated as CM(coa) + PS(p), that is, 13 in this case. The procedure of this calculation can be seen in Table 5.8.

• The overall valuation score (vScore) of the IBM STIX report for the prevention

phase of cyber threat management is the sum of the individual ranking scores of

all COAs. In this case, for the two COAs, this computes to 22 (Equation 5.10).

Table 5.8: STIX Valuation for Prevention Phase

Component                               PS(p)    CM(coa)    CRF(coa,p)
coa1                                    2        7          9
coa2                                    2        11         13
vScore = CRF(coa1, p) + CRF(coa2, p)                        22
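The arithmetic behind Table 5.8 can be reproduced with a short Python sketch. It assumes, as the worked numbers above imply, that the COA Mass score of Equation 5.7 is the plain sum of the impact and efficacy scores; the function names `coa_mass` and `coa_rank` are introduced here only for illustration.

```python
def coa_mass(impact: int, efficacy: int) -> int:
    """COA Mass score CM(coa), assumed here to be impact plus efficacy."""
    return impact + efficacy


def coa_rank(producer_strength: int, impact: int, efficacy: int) -> int:
    """COA ranking score CRF(coa, p) = CM(coa) + PS(p)."""
    return coa_mass(impact, efficacy) + producer_strength


# Scores quoted in the text for the two prevention COAs of TG-3390.
coas = [
    {"name": "use up-to-date antivirus", "ps": 2, "impact": 4, "efficacy": 3},
    {"name": "use up-to-date OS and applications", "ps": 2, "impact": 5, "efficacy": 6},
]
ranks = [coa_rank(c["ps"], c["impact"], c["efficacy"]) for c in coas]
v_score = sum(ranks)  # overall prevention vScore
print(ranks, v_score)  # [9, 13] 22
```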

5.5.4.2 Valuation for Cyber Threat Detection

In order to detect the cyber attack within the victim network, the indicators and their observables are used. For TG-3390, the detection report informs the analyst that this APT has 49 hash values that can be used for its detection, as shown in Figure 5.12.

For TG-3390, the calculation of the valuation score (vScore) for the detection phase of CTM is shown in Table 5.9, and its details are as follows. (1) Indicator Mass score (IM(indicator)): the likely impact score of the indicator (hash watchlist) combined with a medium level of confidence, (LI(indicator) + C(indicator)), is 3 (Equation


Figure 5.12: Valuation Report for Cyber Threat Detection

5.11) and is called the Indicator Mass score. (2) The Hash Watchlist indicator (Figure 5.10) has an efficacy score (IE(indicator)) of 1 (Table 5.6). According to Equation 5.14, the indicator’s ranking score is calculated through the IRF(indicator, |observables|) function as {IM(indicator) + IE(indicator)} × |observables|, which is (3 + 1) × 49 = 196 in this scenario. (3) The final valuation score (vScore) of the IBM STIX report for cyber threat detection, with its 49 observables, is this ranking score IRF(indicator, |observables|), which is 196 here.

Table 5.9: STIX Valuation for Detection Phase

Component                                 IM(indicator)    IE(indicator)    |observable|    IRF(indicator, |observable|)
Hash Watchlist                            3                1                49              196
vScore = IRF(indicator, |observable|)                                                       196
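The detection score in Table 5.9 follows directly from the ranking formula quoted above. The Python sketch below restates it; the function name `indicator_rank` is an illustrative assumption, while the input values are those given in the text.

```python
def indicator_rank(indicator_mass: int, indicator_efficacy: int, n_observables: int) -> int:
    """Indicator ranking score: (IM(indicator) + IE(indicator)) * |observables|."""
    return (indicator_mass + indicator_efficacy) * n_observables


# Hash Watchlist indicator of the boosted TG-3390 report: IM = 3, IE = 1, 49 hashes.
detection_v_score = indicator_rank(3, 1, 49)
print(detection_v_score)  # 196
```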

5.5.4.3 Valuation for Cyber Threat Response

In order to respond to the cyber attack, the indicators, observables, and their COAs are used. For TG-3390, the response report is shown in Figure 5.13. It illustrates that, to stop this APT, the hash values should be blocked at firewalls and gateways. For the case study, the valuation score (vScore) for the response phase of cyber threat management is shown in Table 5.10, and its details are as follows.

• The COA is produced from the indicator component (Figure 5.4); therefore, it has a producer strength score of 3 (Table 5.4).


Figure 5.13: Valuation Report for Cyber Threat Response

• The impact score of the COA with a medium level of confidence is 3, while the COA’s efficacy score with a low level of confidence is 2. On the basis of these scores, the COA Mass score (Equation 5.7) is computed, which is 5 in this case.

Table 5.10: STIX Valuation for Response Phase

Component                PS(p)    CM(coa)    CRF(coa,p)
coa                      3        5          8
vScore = CRF(coa, p)                         8

• According to Equation 5.9, the COA’s ranking score is calculated as CM(coa) + PS(p), that is, 8 in this scenario. Since there is a single COA, the overall valuation score (vScore) (Equation 5.10) of TG-3390 for the response phase of cyber threat management is also 8.
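The response-phase calculation can be written out as a short worked derivation. It assumes, consistently with the numbers reported above, that the COA Mass score is the sum of the impact and efficacy scores.

```latex
\begin{align*}
CM(coa)     &= I(coa) + E(coa) = 3 + 2 = 5 \\
CRF(coa, p) &= CM(coa) + PS(p) = 5 + 3 = 8 \\
vScore      &= CRF(coa, p) = 8
\end{align*}
```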

5.5.4.4 Valuation Graph

For the TG-3390 IBM STIX report, a pie graph is generated to provide a relative comparison of the different phases of cyber threat management, as shown in Figure 5.14. It displays two items, namely the cyber threat phases and their valuation scores. In the graph, the value of every cyber threat management phase is displayed as a percentage of the total and is represented by an angle of the circle. Both the valuation score (vScore) and the relative share of every phase are shown.

At a glance, the reader can see that the boosted STIX report (Figure 5.10) provides the highest amount of information, 196 (87%), shown as a blue slice in the graph, for the detection phase of cyber threat management.

It can further be noticed that the valuation score for the prevention phase of cyber threat management is 22 (10%), whereas for the response phase the valuation score is 8 (3%). The refinement for the TG-3390 case study is shown


Figure 5.14: STIX Valuation for CTM

in the ensuing paragraphs.
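The shares displayed in the pie graph can be recomputed from the three phase scores of the boosted report. The following Python sketch does so to one decimal place; the function name `phase_shares` is an illustrative assumption, and the whole-number percentages quoted in the text are approximations of these values.

```python
from typing import Dict


def phase_shares(scores: Dict[str, int]) -> Dict[str, float]:
    """Express every phase score as a percentage of the total score."""
    total = sum(scores.values())
    return {phase: round(100 * s / total, 1) for phase, s in scores.items()}


# vScores of the boosted TG-3390 report for the three CTM phases.
scores = {"prevention": 22, "detection": 196, "response": 8}
shares = phase_shares(scores)
print(sum(scores.values()), shares)
# 226 {'prevention': 9.7, 'detection': 86.7, 'response': 3.5}
```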

5.5.5 Refinement

MITRE ATT&CK is a knowledge base [125] that provides details about real-world cyber attacks and guides security teams on how to prevent, detect, and respond to a cyber attack. According to the information extracted from the ATT&CK knowledge base (Figure 5.15), the Remote Access TTP can be mitigated by several techniques such as Use of IPS, Properly Configure Firewalls and Proxies, and Application Whitelisting.

Figure 5.15: TG-3390 Techniques, Mitigation and Detection

Moreover, the Crawler module identified a new indicator, Port Watchlist, its Port observables such as 50, 80, and 443, and their remedies for cyber threat detection and response. Subsequently, the Adder module fuses the newly retrieved components and generates a refined STIX graph accordingly, which can be seen in Figure 5.16.

Then, the refined STIX report is made available to the analyst and is also looped back to the Valuation module. The Valuation module processes the refined STIX report and generates valuation reports for cyber threat prevention, detection, and response.


Figure 5.16: SCERM’s Refined STIX Report

5.5.6 Valuation Comparison

To provide the valuation comparison of the TG-3390’s boosted and refined STIX re-

ports, a pie graph is generated, which can be seen in Figure 5.17.

Figure 5.17: Valuation Comparison - Boosted vs Refined STIX Reports

It can be observed from the graph that the refined STIX report has enhanced valuation scores for all three phases of cyber threat management. A detailed comparison of these scores is as follows. (1) The valuation score for the prevention phase is increased by 13%, while the response phase score is increased by 12%. (2) Although the overall share of the detection phase appears reduced, this is because the CTI share of the other two phases has increased in a greater proportion. The CTI data provided by the refined STIX report for the detection phase has in fact increased, as can be judged from the valuation score, which has become 223 in the case of the refined STIX


report, whereas that of the initial boosted STIX report was 196.

5.6 SCERM Evaluation

Our evaluation is based on measuring the effectiveness and efficiency of the SCERM framework. The effectiveness is measured in terms of the accuracy and usability of the system. The efficiency, on the other hand, is evaluated with reference to processor and memory utilization. The evaluation outcomes confirm the usefulness of the SCERM framework for cyber threat management. In the subsequent sections, a brief description of the datasets selected for the evaluation is provided first. Then, the current state of STIX reports for cyber threat management is presented. Next, the effectiveness and efficiency results of SCERM are presented. Finally, a comparative analysis of SCERM is provided.

5.6.1 Dataset Selection and Evaluation Setup

To demonstrate the current state of the STIX reports for cyber threat management and to evaluate the SCERM system, three different datasets are selected, which can be seen in Table 5.11. These datasets are retrieved from the STIX repositories Schemas-test [43], HAILATAXII [21], and IBM X-Force Exchange [22].

Table 5.11: STIX Dataset

Dataset #    STIX Repository          |STIXs|
1            IBM X-Force Exchange     25
2            HAILATAXII               25
3            Schemas-test             25
Total                                 75

According to IBM, the X-Force Exchange holds cyber threat data from 270M devices and 25 billion items of CTI data from cyber attacks and security blogs. The Schemas-test repository provides a corpus for testing the STIX schemas and comprises about 4788 STIX reports. HAILATAXII is an open-source threat feed that provides CTI data in STIX format. Currently, it has about 1,107,066 cyber threat indicators. For our experiment, 25 STIX reports are randomly selected from every repository to


demonstrate the current state of the STIX reports for cyber threat management. Subsequently, 27 of the 75 STIX reports, those with a greater number of STIX components related to cyber threat management, are selected for the evaluation of the SCERM system.
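The random selection of 25 reports per repository can be sketched in Python as follows. The report identifiers, the repository sizes used in the example, the fixed seed, and the function name `sample_reports` are all hypothetical; the sketch only illustrates drawing an equal-sized random sample without replacement from each repository.

```python
import random
from typing import Dict, List


def sample_reports(
    repositories: Dict[str, List[str]], per_repo: int = 25, seed: int = 0
) -> Dict[str, List[str]]:
    """Draw per_repo reports at random, without replacement, from every repository."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return {name: rng.sample(reports, per_repo) for name, reports in repositories.items()}


# Placeholder report identifiers standing in for the real repository contents.
repositories = {
    "IBM X-Force Exchange": [f"xfe-{i}" for i in range(100)],
    "HAILATAXII": [f"hail-{i}" for i in range(100)],
    "Schemas-test": [f"schema-{i}" for i in range(100)],
}
selected = sample_reports(repositories)
total = sum(len(reports) for reports in selected.values())
print(total)  # 75
```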

5.6.2 Current State of the STIX Reports for Cyber Threat Management

During our research, we learned that, most of the time, STIX reports do not contain CTI data for the different phases of cyber threat management. This hypothesis motivated us to develop a framework for the valuation, boosting, and refinement of STIX reports for cyber threat management. To convince the security community, a component evaluation of the STIX datasets for cyber threat management is presented first.

In order to evaluate the current state of the STIX reports for the different phases of cyber threat management, an experiment is performed, whose details are as follows. At first, 75 STIX reports (Table 5.11) are retrieved. Then the pre-processing module, with limited functionality, is employed to extract the key components for the different phases of cyber threat management. These components and their associated cyber threat management phases are as follows. (1) The COA components having a remedy option (Figure 5.3) are selected for the prevention phase. (2) The indicators and observables are picked for the detection phase. (3) The COAs having a response option are chosen for the response phase of cyber threat management. Subsequently, a Frequency Distribution test is applied and a histogram is generated to show the valuation scores of the STIX reports for the different phases of cyber threat management, which can be seen in Figure 5.18.

Figure 5.18: Current State of STIX Repositories for CTM


In the graph, the STIX repositories are shown on the x-axis, while the valuation scores for the different phases of cyber threat management are provided on the y-axis. There are three bars for every repository. These bars are drawn with horizontal brick, diagonal brick, and zig-zag patterns, and represent the sum of the valuation scores obtained by the underlying STIX repository for the cyber threat prevention, detection, and response phases, respectively. A detailed description of the graph is as follows. (1) The Schemas-test repository does not provide any CTI for the prevention, detection, and response phases of cyber threat management. In this repository, a number of STIX reports are found that have inappropriate and incomplete information. (2) The HAILATAXII repository provides CTI data for the detection phase of cyber threat management. It does not share any information about cyber threat prevention and response. Similar to the Schemas-test repository, HAILATAXII has various STIX reports with missing, incomplete, and inappropriate information. (3) The IBM STIX reports provide more CTI data for the detection phase than for the prevention phase. In this repository, several STIX reports are identified that have inappropriate and redundant CTI. This section presented a high-level valuation of STIX reports. The next section describes the working of the proposed system in greater depth.

5.6.3 Effectiveness of the Proposed Solution

In order to evaluate the effectiveness of our proposed system (SCERM), 27 STIX reports are selected from the STIX dataset (Table 5.11) and processed through all three stages of the SCERM system, i.e., pre-processing, valuation, and refinement. After each stage, the valuation scores are plotted to track the usefulness of the STIX reports for the cyber threat management process. A detailed description of these valuation and refinement procedures is as follows. To allow a fair comparison, the valuation score (vScore) of the raw STIX reports is evaluated for the prevention, detection, and response phases of cyber threat management before any processing by SCERM is attempted. This can be seen in Figure 5.19.

The raw STIX reports are shown on the x-axis, while the vScores for the three different phases of cyber threat management are provided on the y-axis. The dashed downward diagonal, zig-zag, and diagonal brick pattern bars depict the prevention, detection,


Figure 5.19: Evaluation of RAW STIX Reports for CTM

and response phases, respectively. It can be seen that the valuation score for the detec-

tion phase is relatively higher overall than for the prevention and response phases of

cyber threat management. The highest valuation scores for cyber threat management

detection and response were achieved by Operation Monsoon as 2616 and Dark Hotel as

505 respectively. The overall average prevention and detection scores are measured to

be 34 and 505 respectively for the raw STIX samples considered.

Next, the Preprocessor performs the boosting operation on this sample and the valuation is repeated. In this experiment, we consider not only the individual valuation scores (vScores) of the APTs but also the sum of the vScores over all the APTs combined. This keeps the presentation clear when comparing the raw and boosted STIX reports for each of the prevention, detection, and response phases. The comparative analysis can be seen in Figure 5.20.

The dashed downward diagonal pattern bar presents the vScore values of the raw STIX reports, whereas the zig-zag pattern bar depicts the vScore values of the boosted STIX reports. It can be seen that boosting results in only a minor increase in the valuation scores. In this case, the detection score increases by only 3% and the prevention score remains the same.

Afterwards, the boosted STIX reports are refined and the valuation procedure is repeated. A sum of vScores is calculated over all the refined APTs combined for the prevention, detection, and response phases. The comparison of the refined, boosted, and raw STIX reports for CTM can be seen in Figure 5.20. The diagonal brick pattern bar describes the vScore values of the refined STIX reports.

Figure 5.20: Evaluation of STIX Repositories for CTM

The vScores after the refinement procedure are visibly improved compared to the scores of the raw and the boosted reports. Specifically, the improvement in the prevention phase is 73% and in the response phase is 100%. This is due to the zero vScore of the raw and the boosted STIX reports, whereas the detection scores remain unchanged.
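The aggregation described above can be sketched as follows. This is a minimal illustration rather than SCERM's implementation: the per-report scores are made-up placeholders, and `share_gained` is only one assumed reading of the reported percentages (the fraction of the final summed vScore that a stage contributed), chosen because under this reading a zero starting score yields exactly 100%. The thesis does not state the formula.

```python
from typing import Dict, List

PHASES = ("prevention", "detection", "response")


def total_vscores(reports: List[Dict[str, float]]) -> Dict[str, float]:
    """Sum the per-report vScores for each CTM phase."""
    return {p: sum(r.get(p, 0.0) for r in reports) for p in PHASES}


def share_gained(before: float, after: float) -> float:
    """Percentage of the final score added by a processing stage (assumed metric)."""
    if after == 0:
        return 0.0
    return 100.0 * (after - before) / after


# Hypothetical scores for two reports, before and after refinement.
raw = [
    {"prevention": 20, "detection": 300, "response": 0},
    {"prevention": 7, "detection": 200, "response": 0},
]
refined = [
    {"prevention": 60, "detection": 300, "response": 25},
    {"prevention": 40, "detection": 200, "response": 15},
]

before, after = total_vscores(raw), total_vscores(refined)
gains = {p: share_gained(before[p], after[p]) for p in PHASES}
```

With these placeholder values, a phase whose summed score does not change yields a gain of zero, and a phase that starts at zero yields a gain of 100%.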

5.6.3.1 Comparative Analysis

Due to SCERM’s novelty, we were unable to find competing tools for direct comparison. We therefore performed both a qualitative and a statistical analysis against the closest CTM systems, tools, and algorithms available today. These include Virus Total [40], Bro [33], Splunk [34], the Machine Learning Based Security project (MLSec) [81], FeedRank [82], TISA [93], and SML [90]. Because the MLSec project is open source and offers similar features such as boosting and refinement, it was selected for the statistical comparison of the results.

5.6.3.1.1 Qualitative Comparison A detailed qualitative comparison of the aforesaid machine learning based solutions with SCERM is provided in Table 5.12. Virus Total receives URLs and malware-infected files as input and employs both signature-based and machine learning techniques to detect zero-day threats. Bro and Splunk are designed for the analysis of log files. MLSec and FeedRank are designed to analyze CTI feeds. FeedRank correlates CTI feeds and ranks them according to their correlation score. TISA employs supervised and semi-supervised machine learning techniques to extract CTI data from structured and unstructured textual reports, whereas SML uses a supervised machine learning technique to obtain CTI artifacts from unstructured text reports. It can be seen from the table that most of the tools perform boosting of low-level artifacts such as Hash, IP, and DNs, while boosting of high-level artifacts such as Network Artifacts, Host Artifacts, and TTPs is performed only by SML and SCERM. Similarly, a few of the tools perform Remapping and Refinement of low-level artifacts. However, none of the tools except SCERM performs refinement of high-level artifacts or valuation of CTI data for the different phases of cyber threat management.

Table 5.12: Qualitative Comparison

CTM Solution | Input | Boosting (Low-Level) | Boosting (High-Level) | Remapping | Refinement (Low-Level) | Refinement (High-Level) | Valuation for CTM
Virus Total | URL and Log Files | Yes | No | No | No | No | No
Bro | Log Files | Yes | No | Yes | No | No | No
Splunk | Log Files | Yes | No | Yes | Yes | No | No
MLSec | CTI Feeds | Yes | No | No | Yes | No | No
FeedRank | CTI Feeds | Yes | No | No | Yes | No | No
TISA | Structured / Unstructured Textual Data | Yes | No | Yes | Yes | No | No
SML | Unstructured Textual Data | Yes | Yes | No | No | No | No
SCERM | Structured Textual Data (STIX Reports) | Yes | Yes | Yes | Yes | Yes | Yes

5.6.3.1.2 Statistical Comparison To conduct a fair comparison of the efficacy achieved by SCERM’s boosting and refinement with competing machine learning tools, we selected the MLSec Project. MLSec provides the Uniqueness and Enrichment tests, which may be directly compared to the Boosting and Refinement functions of SCERM. In the experiment, the MLSec software is downloaded from the provider’s website and installed. Then, CTI data from Attack.Mitre is extracted and labeled to test the efficacy of both systems. MLSec accepts data in .csv format, while SCERM receives .xml or .json files as input. Therefore, the extracted data is encoded in both .csv and .xml files without any loss of information. Afterwards, the Uniqueness and Enrichment tests of MLSec are performed on the .csv file. Similarly, the .xml file is processed through SCERM. The results are shown in Figure 5.21, where the test names are shown on the x-axis and the corresponding scores are provided on the y-axis.
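The requirement that the same data be encoded in .csv and .xml "without any loss of information" can be checked with a round trip, sketched below. The field names, the sample records, and the flat XML layout are illustrative assumptions only; in particular, the XML here is a generic element tree and not the STIX schema that SCERM consumes.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["type", "value", "label"]  # hypothetical column names

# Placeholder records using documentation-reserved addresses and domains.
records = [
    {"type": "ipv4", "value": "198.51.100.7", "label": "malicious"},
    {"type": "domain", "value": "example.test", "label": "malicious"},
]


def to_csv(rows):
    """Serialize the records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of plain dictionaries."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def to_xml(rows):
    """Serialize the records as a flat XML document (one element per field)."""
    root = ET.Element("indicators")
    for row in rows:
        item = ET.SubElement(root, "indicator")
        for field in FIELDS:
            ET.SubElement(item, field).text = row[field]
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Parse the flat XML document back into a list of dictionaries."""
    root = ET.fromstring(text)
    return [{f: item.findtext(f) for f in FIELDS} for item in root.findall("indicator")]
```

If both round trips return the original records, the two input files carry the same information, so any difference in the tools' results cannot be attributed to the encoding.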

It can be observed from Figure 5.21 that the results produced by SCERM are more accurate than those of MLSec. Manual analysis identified 50 unique malicious IPs in the extracted CTI data. SCERM extracted all 50 IPs, while MLSec extracted only 40. Upon investigation of this behaviour, it was observed that MLSec was unable to identify the CTI data for boosting that was ambiguously labeled in the input file by the provider. Similarly, manual investigation of the refinement results showed that there were 20 Domain Names in the input data and that SCERM used them to identify and extract 20 additional IPs during refinement.

Figure 5.21: Statistical Comparison

To further confirm the effectiveness of the SCERM system, it is shared with domain experts. They performed multiple tests and confirmed the efficacy of the generated results. Moreover, they endorsed that the valuation, prioritization, and extraction of STIX components such as TTPs, indicators, exploits, observables, and their COAs are not feasible to perform manually. The details and outcomes of this experiment are provided in the next section.

5.6.3.2 User Study

A study is carried out to verify the effectiveness of the SCERM system from the user’s viewpoint in terms of cyber threat management. The prototype of the proposed framework is provided to the participants with all the prerequisite configuration details and sample STIX reports. They are asked to use the SCERM system and share their results. A summary of the participants’ demographics is provided in Table 5.13. All of them belong to the information security domain and have between 1 and 5 years of experience.

Table 5.14 provides the users’ feedback regarding the SCERM system. It reveals that 100% of the participants feel that the current state of structured threat data is poor and that there is a need for a tool that performs data boosting, refinement, and evaluation. 80% of the participants acknowledged that SCERM is easy to use. 90% of the users agree that SCERM’s results are accurate, efficient, and easy to understand compared to the manual method. A few of the suggestions concern automating the generation of the boosting dictionary “CD[][]” (see Algorithm 5.2) and the curated list “PD[][]” (see Algorithm 5.3). It is worth mentioning that 100% of the users confirm that the automatic analysis of STIX reports and the extraction of key components by SCERM for the cyber threat prevention, detection, and response phases allow them to perform cyber threat management efficiently.

Table 5.13: Participants Details

User details | Count
Total Participants | 20
Education | Graduate, Postgraduate
Expertise: Cyber Threat Analysis | 12 (60%)
Expertise: Software Development | 8 (40%)
Experience | 1 to 5 years
Age | 22 to 35 years
Knowledge of: Advanced Threats | 16 (80%)
Knowledge of: Structured cyber threat intelligence | 8 (40%)

Table 5.14: SCERM Evaluation Survey

Survey Questions | Results
Is there a current need of automatic boosting, enhancement and quality testing of the structured threat intelligence. | 100%
How you compare SCERM and other tools which you used for CTM: |
  The SCERM is easier to use. | 80%
  Its results are more accurate than others. | 90%
  Its outcomes are easy to understand. | 100%
  It provides additional outcomes. | 100%
During SCERM’s experiment, how you perceive: |
  The directory generation process of components remapping module is a simple procedure. | 80%
  The curated list preparation for refinement is an easy task. | 70%
The automatic analysis of STIX reports and key components extraction for different phases of CTM allow me to perform CTM more efficiently. | 100%

5.6.4 Efficiency

To study the algorithmic efficiency of SCERM, CPU utilization during processing and memory usage are analyzed and found to be quite low. An intuitive explanation is given first, followed by the details of the experiment. The SCERM design is based on simple and concise scripts that are extensible and do not rely on a particular platform or technology.

To enhance efficiency, SCERM is designed to perform certain functions in an offline mode, such as (1) the parsing of input reports and (2) the preparation of the Booster’s component dictionary and the Refinement dataset. As CTI for new threats emerges, it can be added to the database incrementally. Moreover, the algorithms presented in section 5 have a running time that is polynomial in the size of the input. In our implementation, constant parts are pre-computed, array elements are carefully referenced, conditional statements are properly terminated, the database is normalized, and the code avoids redundant computations.

The efficiency measurement is performed by processing different sets of STIX reports (Table 5.11). These reports are provided offline and are processed in batches. To test the efficiency of the SCERM system, it is deployed on an Intel(R) Pentium(R) machine with CPU B950 @ 2.10GHz and 6 GB of RAM. The OS of the machine is Windows 7 Ultimate, 64-bit. Only minor increases in processor and memory utilization are observed when the number of STIX reports and their sizes are varied.

5.6.4.1 Processor utilization

The efficiency testing of the SCERM system in terms of CPU utilization is performed as follows. (1) First, 2500 STIX reports are imported and 5 different sets are composed. These sets comprise 500, 1000, 1500, 2000, and 2500 STIX reports. (2) Next, each set of STIX reports is processed through the SCERM system, and the processor usage and execution time are measured, as can be seen in Figure 5.22 (a). The sets of STIX reports are shown on the x-axis, while the CPU utilization percentage and the execution time are provided on the y-axis. The solid line shows the CPU utilization, while the dotted line depicts the execution time of the SCERM software.

It can be noticed that the CPU execution time increases in proportion to the number of input STIX reports, whereas the CPU utilization percentage increases only slightly. It is important to highlight that, at the time of writing, the Attack.Mitre [20] knowledge base contains 100 intrusion activities (groups), whereas during testing SCERM was run on 2500 reports and no degradation in CPU or memory usage was observed (Figure 5.22). It is therefore reasonable to say that SCERM is an efficient tool for the workload of an IT enterprise.

(a) CPU Utilization (b) Memory Utilization

Figure 5.22: SCERM Efficiency
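A measurement loop of this kind can be sketched as follows. This is an assumed harness, not the one used in the thesis: `process_report` is a hypothetical stand-in for SCERM's stages, the reports are synthetic strings, CPU time is measured instead of the CPU utilization percentage, and `tracemalloc` only tracks memory allocated by the Python interpreter.

```python
import time
import tracemalloc


def process_report(report: str) -> int:
    """Placeholder for per-report work (hypothetical; stands in for SCERM's stages)."""
    return len(report.split())


def measure(batch_sizes, make_report):
    """Process one batch per size and record CPU seconds and peak traced memory."""
    results = []
    for n in batch_sizes:
        reports = [make_report(i) for i in range(n)]
        tracemalloc.start()
        start = time.process_time()
        for report in reports:
            process_report(report)
        cpu_seconds = time.process_time() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results.append({"reports": n, "cpu_seconds": cpu_seconds, "peak_bytes": peak_bytes})
    return results


# Same batch sizes as in the experiment; the report content is synthetic.
results = measure(
    [500, 1000, 1500, 2000, 2500],
    lambda i: f"<report id='{i}'>indicator observable</report>",
)
```

Plotting `cpu_seconds` and `peak_bytes` against `reports` gives curves of the same kind as Figure 5.22, although the absolute values depend entirely on the machine and on the real processing code.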

5.6.4.2 Memory Utilization

The proposed framework SCERM performs three main operations, namely boosting, valuation, and refinement, and all of these operations consume memory. Figure 5.22 (b) presents the memory usage of the SCERM framework. The sets of STIX reports are shown on the x-axis, while the memory usage is provided on the y-axis. The figure shows that memory usage increases slightly with the number of input files.

Chapter 6

APTs Analysis and Classification System

6.1 Introduction

This chapter presents the procedure and techniques adopted by the APTs Analysis and Classification System (A2CS) for the automatic analysis of APTs, the identification of their missing artifacts, and the inferencing of the Tactics, Techniques and Procedures being employed. In the A2CS sub-framework, a combined ontology of the CKC and POP models is developed. SWRL rules are written for the analysis of APTs and the identification of their missing artifacts. Furthermore, a case study of the Point of Sale (POS) system is presented to demonstrate the working of the A2CS.

6.2 Research Approach and Contributions

In the recent past, several models related to cyber attack analysis have been proposed, of which two are of particular interest and are the most popular: the CKC [25] and the POP [26]. The CKC guides an analyst regarding how a perpetrator uses different phases such as Reconnaissance, Weaponization, Delivery, Exploitation, Installation, and Exfiltration to launch a cyber attack. It further guides security analysts regarding how various signatures and artifacts available at different attack levels can be used to defend their network from advanced cyber attacks. The POP model, in turn, describes the efficacy of indicators, namely Hash values, IP addresses, DNs, Network artifacts, Host artifacts, Tools, and TTPs. It places these indicators at different levels of the pyramid. Moreover, it states that the treatment of low-level artifacts such as hash values, IPs, and DNs causes less damage to the attacker, while the treatment of high-level artifacts such as host and network artifacts, tools, and TTPs causes more damage.
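The ordering of indicators described above can be captured with an ordered enumeration, as in the sketch below. The numeric values are an assumption used only to express the ordering in which the text lists the indicators; the class and function names are hypothetical and do not come from the A2CS implementation.

```python
from enum import IntEnum


class PainLevel(IntEnum):
    """POP indicator types, ordered from least to most damaging to the attacker."""
    HASH_VALUE = 1
    IP_ADDRESS = 2
    DOMAIN_NAME = 3
    NETWORK_ARTIFACT = 4
    HOST_ARTIFACT = 5
    TOOL = 6
    TTP = 7


def is_high_level(level: PainLevel) -> bool:
    """Hash values, IPs, and DNs are low-level; everything above them is high-level."""
    return level >= PainLevel.NETWORK_ARTIFACT


def most_painful(levels):
    """Return the indicator type whose treatment hurts the attacker most."""
    return max(levels)
```

For example, `most_painful([PainLevel.IP_ADDRESS, PainLevel.TOOL])` returns `PainLevel.TOOL`, reflecting that denying a tool costs the attacker more than blocking an address.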

So far, the CKC and POP have been theoretical models and have not been used in real security solutions. These models are complementary to each other, and the cyber attack picture cannot be seen holistically if either of them is left out. For these reasons, a combined ontology of both models is developed, which can be seen in Figure 6.1. In the proposed ontology, 45 classes, 44 objects, and 10 data properties are developed. The blue circles in the figure depict entities of CKC, the orange circles are associated with POP, and the green entities are common to both.

Figure 6.1: Combined Ontology of CKC and POP

First, real examples of well-known Point of Sale (POS) APTs are selected for the demonstration of the A2CS. Afterwards, various security blogs are scanned to gather CTI data related to these APTs. Although a significant amount of CTI data is found, the following challenges are faced. (1) Converting the extracted CTI data into a structured form and developing its connections and relationships is a challenging task. It is learned that an ontology is the best way to develop and analyse such relationships. Accordingly, the extracted CTI data is mapped onto the CKC and POP models. (2) The second problem with CTI data is that it generally contains low-level artifacts, while the high-level artifacts related to most of the APTs are missing. Therefore, incomplete artifacts are first identified in the CTI data. Then, high-level artifacts are deduced through a combination of low-level artifacts.

6.3 A2CS Architecture

The A2CS architecture can be seen in Figure 6.2. Details of its various modules are presented using the POS APTs JackPOS and BackOff. These APTs are selected from a large family of POS APTs [116], which comprises Reedum, Fsyna, Dexter, Treasure hunt, Posfind, Alina, Poseidon, JackPOS, and BackOff. The A2CS system fetches web reports from the internet and forwards them to the Parser module.

Figure 6.2: A2CS Flow Diagram

The Parser parses the data and extracts the entities and concepts. Next, the Mapper module correlates these extracted concepts with the different phases of CKC and POP. An example is shown in Figure 6.3. The outputs of the Mapper module are as follows:

• Installation/Host Artifacts: These artifacts are registry entries, file names, or folder names. For example, during the installation phase, JackPOS creates the files %Temp% svchost.exe, java.exe, javaw.exe, and javcpl.exe, and BackOff creates the files javaw.txt, Log.txt, Local.dat, and winserv.exe.

• Network Artifacts: These artifacts are related to the Command and Control (C2) or Domain Name. Both malware families use the HTTP protocol and hard-coded domain names to communicate with the C2.

• TTPs: The BackOff malware uses both Memory Scraping and Keystroke Logging techniques for data stealing, while JackPOS uses only the Memory Scraping technique.

The Mapper module then feeds this extracted information into the knowledge base. Next, the Reasoner module executes the rules over the knowledge base. The next section gives details of the reasoning module.
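The Mapper step can be sketched as a lookup from the kind of an extracted concept to one of the three output categories listed above. This is a simplified illustration under stated assumptions: the `Concept` tuple, the `kind` labels, and the nested-dictionary knowledge base are hypothetical, whereas the actual system stores the result in an ontology.

```python
from collections import namedtuple

# Hypothetical shape of one concept produced by the Parser.
Concept = namedtuple("Concept", ["apt", "kind", "value"])

# Assumed mapping from concept kind to the Mapper's output categories.
CATEGORY_BY_KIND = {
    "file": "Installation/Host Artifacts",
    "registry": "Installation/Host Artifacts",
    "protocol": "Network Artifacts",
    "domain": "Network Artifacts",
    "technique": "TTPs",
}


def map_concepts(concepts):
    """Group extracted concepts per APT and category; unknown kinds are skipped."""
    knowledge_base = {}
    for concept in concepts:
        category = CATEGORY_BY_KIND.get(concept.kind)
        if category is None:
            continue
        knowledge_base.setdefault(concept.apt, {}).setdefault(category, []).append(concept.value)
    return knowledge_base


extracted = [
    Concept("JackPOS", "file", "javaw.exe"),
    Concept("JackPOS", "protocol", "HTTP"),
    Concept("JackPOS", "technique", "Memory Scraping"),
    Concept("BackOff", "file", "winserv.exe"),
    Concept("BackOff", "technique", "Memory Scraping"),
    Concept("BackOff", "technique", "Keystroke Logging"),
    Concept("BackOff", "email", "unmapped"),
]
knowledge_base = map_concepts(extracted)
```

The resulting structure groups the artifacts of each APT by category, which is the form the later correlation and reasoning steps operate on.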

Figure 6.3: Concepts Extraction and Mapping

6.4 Analysis via Reasoning

During the research, various methods for the analysis of APTs, such as Time analysis, Common Artifacts analysis, and TTPs analysis, are employed for the evaluation of the proposed sub-framework. Risk analysis, Dependency analysis, and Complexity analysis are planned as future work.

6.4.1 Identification of Missing Artifacts

Our studies show that the high-level artifacts of APTs are generally missing. In this research, two techniques are developed for the identification of these missing artifacts. With the first technique, called Time analysis, A2CS fetches information regarding various aspects of the APT from multiple reports with different dates and times and combines them in the ontology knowledgebase. For example, in our case of information retrieval regarding the BackOff APT, the relevant Host artifacts are retrieved from the Symantec portal, whereas the Network artifacts are extracted from IBM X-Force, as shown in Figure 6.4. This is important because threat sources usually specialize in particular aspects of threat reporting.

Figure 6.4: Identification of Missing Artifacts

The second technique is called Common Artifacts analysis. It concerns the augmentation and enrichment of information about an incomplete APT using information about known or previously studied APTs of the same family. For example, JackPOS is a recent successor of BackOff and is therefore not as well studied as the latter. Our knowledgebase already contained information regarding the BackOff APT’s stealing methods and affected device. When the reasoning module correlated the artifacts of both, it concluded that, since both attack the same domain, i.e., the retail industry, and directly affect the terminal, JackPOS may be employing a stealing method similar to the one used by BackOff.
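The reasoning in this example can be sketched in a few lines. The dictionaries, key names, and the `infer_missing` helper below are hypothetical simplifications; in A2CS this inference is carried out by the reasoner over the ontology, not by code of this form.

```python
def infer_missing(target, known, shared_keys=("domain", "affected_device")):
    """Suggest attributes missing from `target`, taken from `known`,
    but only when both APTs agree on every key in `shared_keys`."""
    for key in shared_keys:
        if target.get(key) is None or target.get(key) != known.get(key):
            return {}
    return {k: v for k, v in known.items() if k not in target and k != "name"}


# Illustrative knowledge about the two APTs.
backoff = {
    "name": "BackOff",
    "domain": "retail",
    "affected_device": "POS terminal",
    "stealing_method": ["Memory Scraping", "Keystroke Logging"],
}
jackpos = {"name": "JackPOS", "domain": "retail", "affected_device": "POS terminal"}

candidates = infer_missing(jackpos, backoff)
```

The returned values are candidates only, in line with the wording "may be employing"; they still have to be confirmed against later reports on the less studied APT.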

A number of queries are developed in the Semantic Query-Enhanced Web Rule Language (SQWRL) for the identification of missing artifacts. A sample of these queries follows.

Query-1 (Equation 6.1) correlates file and folder names and identifies the common ones.

\begin{equation}
\begin{aligned}
&\mathit{Attacker}(?AT) \wedge \mathit{APT}(?AP) \wedge \mathit{launch}(?AT, ?AP) \,\wedge \\
&\mathit{producesHostArtifacts}(?AP, ?HA) \wedge \mathit{createFile}(?HA, ?CF) \rightarrow \\
&\mathit{sqwrl}{:}\mathit{select}(?AP, ?CF) \wedge \mathit{sqwrl}{:}\mathit{orderBy}(?CF)
\end{aligned}
\tag{6.1}
\end{equation}

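Similarly, Query-2 (Equation 6.2) is designed to find information regarding the stealing methods used by the APTs.

\begin{equation}
\begin{aligned}
&\mathit{Attacker}(?AT) \wedge \mathit{APT}(?AP) \wedge \mathit{Weaponization}(?WP) \,\wedge \\
&\mathit{Perform}(?AT, ?WP) \wedge \mathit{StealingMethod}(?WP, ?SM) \,\wedge \\
&\mathit{launch}(?AT, ?AP) \rightarrow \\
&\mathit{sqwrl}{:}\mathit{select}(?AP, ?SM) \wedge \mathit{sqwrl}{:}\mathit{orderBy}(?AP)
\end{aligned}
\tag{6.2}
\end{equation}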
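To make the effect of Query-1 concrete, the sketch below evaluates the same join over a small in-memory fact base. It is an illustrative analogue only: the fact layout and the individual names (`attacker1`, `ha_jackpos`, and so on) are hypothetical, only a subset of the file names is used, and the actual queries run in an SQWRL engine over the ontology.

```python
# Facts stored per property as (subject, object) pairs.
facts = {
    "launch": [("attacker1", "JackPOS"), ("attacker2", "BackOff")],
    "producesHostArtifacts": [("JackPOS", "ha_jackpos"), ("BackOff", "ha_backoff")],
    "createFile": [
        ("ha_jackpos", "svchost.exe"),
        ("ha_jackpos", "javaw.exe"),
        ("ha_backoff", "winserv.exe"),
        ("ha_backoff", "javaw.txt"),
    ],
}


def query_created_files(facts):
    """Select (APT, created file) pairs for launched APTs, ordered by file name."""
    launched = {apt for _attacker, apt in facts["launch"]}
    rows = [
        (apt, created)
        for apt, host_artifact in facts["producesHostArtifacts"]
        if apt in launched
        for artifact, created in facts["createFile"]
        if artifact == host_artifact
    ]
    return sorted(rows, key=lambda row: row[1])


rows = query_created_files(facts)
```

Ordering by file name places identical or similar names from different APTs next to each other, which is what makes common and partially matching host artifacts easy to spot.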

The correlation of the JackPOS and BackOff APTs generated by our proposed system is shown in Figure 6.5. Dotted lines indicate partially matched artifacts, while fully matched artifacts are represented by solid lines.

Figure 6.5: Correlation of JackPOS and BackOff APTs

The correlation results are summarized in Figure 6.6. These results indicate that most of the phases, such as Weaponization, Host Artifacts, Network Artifacts, and TTPs, are common to JackPOS and BackOff.

The results demonstrate that the two APTs have 53% of their artifacts in common. On the basis of these results, the A2CS declares that both APTs belong to the same family. The main difference between the APTs is in their Delivery phase, i.e., JackPOS focuses more on the Delivery phase than BackOff. An analyst who wants to block these APTs should focus on deploying controls to mitigate their Delivery phase.

6.4.2 Tactics, Techniques and Procedure (TTPs) Analysis

In cyber-attack analysis, the role of the TTPs is to identify individual patterns of behavior. Identifying these behaviors allows the identification and characterization of the general behavior of an attacker. If an organization can block the general APT behavior, it can cause much more pain to the attacker. If data about the low-level indicators is available in the knowledgebase, then A2CS can predict the TTPs on the basis of its ontological design and inferencing rules. Several SWRL rules are developed for inferencing of the TTPs. Rule-1 can be seen in Equation 6.3 and its ontology is shown in Figure 6.7. This rule infers that if the APT targets a vulnerability of type Remote Desktop Login, then the delivery method of the malware will be Manual Planting.

Figure 6.6: Summary of Correlation Results

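Figure 6.7: Ontology of Rule-1

\begin{equation}
\begin{aligned}
&\mathit{Attacker}(?AT) \wedge \mathit{APT}(?AP) \wedge \mathit{TTP}(?TT) \wedge \mathit{Vulnerability}(?VUL) \,\wedge \\
&\mathit{Delivery}(?DV) \wedge \mathit{launch}(?AT, ?AP) \,\wedge \\
&\mathit{hasTTP}(?AP, ?TT) \wedge \mathit{hasDeliveryVector}(?AP, ?DV) \,\wedge \\
&\mathit{targetVulnerability}(?AP, ?VUL) \,\wedge \\
&\mathit{hasVulType}(?VUL, \text{Remote Desktop Login}) \rightarrow \\
&\mathit{hasSource}(?DV, \text{Manual Planting})
\end{aligned}
\tag{6.3}
\end{equation}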
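The effect of Rule-1 can be illustrated with a plain function over a dictionary of known facts. This is a hedged analogue, not the SWRL rule itself: the keys `target_vulnerability_types` and `delivery_source` are hypothetical names standing in for the `hasVulType` and `hasSource` properties, and the type and class checks of the rule body are omitted.

```python
def apply_rule_1(apt: dict) -> dict:
    """If the APT targets a 'Remote Desktop Login' vulnerability,
    infer 'Manual Planting' as the source of its delivery vector."""
    inferred = dict(apt)  # leave the input untouched
    if "Remote Desktop Login" in apt.get("target_vulnerability_types", []):
        inferred["delivery_source"] = "Manual Planting"
    return inferred


# Illustrative input with no delivery information.
known = {"name": "ExampleAPT", "target_vulnerability_types": ["Remote Desktop Login"]}
result = apply_rule_1(known)
```

An APT record that lacks delivery information thus gains an inferred delivery source, while records that target other vulnerability types are returned unchanged.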

Similarly, Rule-2 is provided in Equation 6.4 and its ontology is shown in Figure 6.8. It states that if the RAM Scrapping technique is used and the APT belongs to the POS family, then the aim of the perpetrator will be to steal credit card and personally identifiable information (PII) data.

Figure 6.8: Ontology of Rule-2

\begin{equation}
\begin{aligned}
&\mathit{Attacker}(?AT) \wedge \mathit{APT}(?AP) \wedge \mathit{TTP}(?TT) \,\wedge \\
&\mathit{belongsTo}(?AP, \text{POS Family}) \,\wedge \\
&\mathit{StealingMethod}(?WP, \text{Ram Scrapping}) \rightarrow \\
&\mathit{hasAim}(?TT, \text{Credit Card and PII})
\end{aligned}
\tag{6.4}
\end{equation}

Likewise, Rule-3 is shown in Equation 6.5 and its ontology can be seen in Figure 6.9. This rule states that if the RAM Scrapping technique and a Browser are used in an attack, then the perpetrator will be interested in stealing Banking Credentials and PII.

Figure 6.9: Ontology of Rule-3

\begin{equation}
\begin{aligned}
&\mathit{Attacker}(?AT) \wedge \mathit{APT}(?AP) \wedge \mathit{TTP}(?TT) \,\wedge \\
&\mathit{Weaponization}(?WP) \wedge \mathit{uses}(?AP, \text{Browser}) \,\wedge \\
&\mathit{StealingMethod}(?WP, \text{Ram Scrapping}) \rightarrow \\
&\mathit{hasAim}(?TT, \text{Credit Card and PII})
\end{aligned}
\tag{6.5}
\end{equation}
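Rules of this form can be applied by simple forward chaining, as in the sketch below. It is a simplified, assumed model: facts are flattened to (property, value) pairs about a single APT, variables and class atoms are dropped, and the consequents are copied from Equations 6.4 and 6.5 as printed. An SWRL reasoner performs the actual inference in A2CS.

```python
# Each rule: (name, set of required facts, fact to infer).
RULES = [
    ("Rule-2",
     {("belongsTo", "POS Family"), ("StealingMethod", "Ram Scrapping")},
     ("hasAim", "Credit Card and PII")),
    ("Rule-3",
     {("uses", "Browser"), ("StealingMethod", "Ram Scrapping")},
     ("hasAim", "Credit Card and PII")),
]


def infer(facts):
    """Add rule conclusions until no rule produces a new fact."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for _name, required, conclusion in RULES:
            if required <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts


pos_facts = {("belongsTo", "POS Family"), ("StealingMethod", "Ram Scrapping")}
inferred = infer(pos_facts)
```

Keeping the rules as data makes it straightforward to add further rules without changing the inference loop.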

The inferencing results are meaningful. They indicate that anyone belonging to an organization that deals with credit cards or online accounts must be careful about these APTs and should try to safeguard their systems from these information stealing techniques.

Chapter 7

Discussion

This research work uses the strengths of ontologies, formal methods, various security models, and structuring languages to generate, analyze, and rank structured CTI data for the different phases of cyber threat management. In particular, the focus of the research work was to (1) develop a sub-framework for the generation of structured, error-free, distinct, and threat-relevant CTI data; (2) develop a formal model to measure the quality of structured reports and to boost and refine the CTI data; and (3) build a combined ontology of the CKC and POP models and then identify missing artifacts of APTs and detect high-level artifacts through the analysis and correlation of low-level artifacts. This chapter discusses and answers the research questions raised in section 1.7.

Question # 1 : Does currently available cyber threat intelligence data follow NIST guidelines of timely, relevant, specific, accurate, and actionable threat intelligence for CTM?

Answer : To analyse the current state of CTI data, 75 structured reports are retrieved from three publicly available repositories, namely Schemas-test [43], HAILATAXII [21], and IBM X-Force Exchange [22]. Then, key components for the different phases of cyber threat management are extracted and analysed. It is identified that most of the time STIX reports contain inappropriate, incomplete, and redundant CTI data. The Schemas-test repository does not provide any CTI for the prevention, detection, and response phases of cyber threat management. The HAILATAXII repository outlines CTI data for the detection phase of cyber threat management only. Like the Schemas-test repository, HAILATAXII has various STIX reports with missing, incomplete, and inappropriate CTI data. The IBM STIX reports provide more CTI data for the detection phase than for the prevention phase. All of these outcomes confirm that most of the time STIX reports contain inappropriate, incomplete, and redundant CTI data for CTM.

Question # 2 : Is it possible to quantitatively measure the quality of CTI data

produced by cyber threat sources and ultimately rank them?

Answer : During this research, various efforts are studied, most of which describe CTI data as subjective in nature. This subjectivity makes it difficult to perform quantitative measurements of the different aspects of the threat data. Therefore, an alternative model called SAM is developed, which considers the characteristics of the STIX domain and relationship objects in a quantitative fashion. Furthermore, we valuated structured threat reports for the prevention, detection, and response phases of CTM.

Question # 3 : What level of CTI data’s refinement can be achieved for cyber

threat prevention, detection and response?

Answer : To answer this question, a case study (section 5.5) is presented, which demonstrates the Refinement and other functionalities of SCERM. The outcomes (section 5.5.6) reveal that the SCERM prototype significantly refined the STIX reports. Moreover, a number of publicly available STIX reports are retrieved and processed through SCERM. The valuation results reveal that the updated STIX reports are significantly enhanced after the Refinement procedure, as can be seen in Figure 5.20. The enhancement in the prevention phase is 73%, while in the response phase it is 100%.

Question # 4 : If ontological modeling of cyber threat data according to existing

solutions is performed, will it help to understand and defend cyber attacks?

Answer : A sub-framework named A2CS (Chapter 6) is developed that is based on a combined ontology of the attacker and defender models. In the proposed ontology, 45 classes, 44 objects, and 10 data properties are developed. A2CS extracts CTI data from various blogs and structured reports and maps it onto the combined ontology. Subsequently, this sub-framework helps an analyst to identify missing or incomplete CTI data and to infer the TTPs.

Question # 5 : Can formal rules be devised such that they can aid machines in

automated analysis of cyber attacks, their prevention, detection, and response?

Answer : To answer this question, several rules are developed (section 6.4.2) in the Semantic Web Rule Language (SWRL) that help an analyst to predict high-level artifacts from low-level artifacts. Moreover, multiple queries are developed (section 6.4.1) in the Semantic Query-Enhanced Web Rule Language (SQWRL) for the identification of missing artifacts. Then, two well-known APTs, namely JackPOS and BackOff, are selected for the inferencing of missing artifacts and the analysis of TTPs. Various security blogs are scanned and CTI data is extracted. Subsequently, this CTI data is mapped onto the combined ontology and the aforesaid rules and queries are applied. The results demonstrate that the two APTs have 53% of their artifacts in common. On the basis of these results, the A2CS declares that both APTs belong to the same family. The main difference between the APTs is in their Delivery phase, i.e., JackPOS focuses more on the Delivery phase than BackOff. An analyst who wants to block these APTs should focus on deploying controls to mitigate their Delivery phase.

Although STIX is a well-defined language that provides multiple properties such as impact, efficacy, and confidence to describe the usefulness of the shared CTI, most of the shared STIX reports do not employ these properties. Therefore, shared reports do not contribute much to CTM. It is important to highlight that the available STIX reports mostly provide indicators and their observables and share fewer COAs, which are instrumental components for the prevention and response phases of CTM.

It is very encouraging that governments and the security industry are working hard to make the structuring and sharing of CTI a standard, routine process. It is therefore expected that security firms will ultimately be more vigilant about the authenticity and completeness of structured CTI data for the various phases of CTM. Accordingly, the contribution of our framework regarding structuring, boosting, valuation, refinement, and APT analysis will increase significantly.

Chapter 8

Conclusions and Future Research Directions

8.1 Conclusions

The outcome of this thesis is a security framework that consists of three sub-frameworks, namely STIXGEN, SCERM, and A2CS. Each of these sub-frameworks is developed to achieve a set of goals pursued in this research. A brief synopsis of the novel contributions made by these sub-frameworks is provided in the following subsections.

It is learned during the research that presently no online tool is available that automatically generates distinct, meaningful, and error-free structured CTI from text. The STIXGEN prototype is the first tool that is designed according to the STIX standard in such a way that it generates threat-relevant, properly placed, and error-free structured data. Therefore, we feel that it will increase STIX utilization and the sharing of structured CTI between peer organizations. Moreover, it can be used by students and analysts to generate good quality STIX reports in a simple and effective way.

According to our study, structured CTI reports usually do not contain CTI data for the prevention, detection, and response phases of CTM. Where some data is available, it carries inappropriate, incomplete, wrongly placed, and redundant information. The available structured reports therefore cannot be used for CTM as they stand. Moreover, no tool is available to measure the quality of publicly available structured data. The SCERM prototype is the first tool that valuates structured reports according to the phases of CTM, namely prevention, detection, and response. It first performs CTI data cleansing and remapping. It then identifies the CTI data that is missing for CTM and refines the structured reports accordingly. The proposed SCERM sub-framework will thereby enhance user confidence in structured CTI data and increase the quality and usage of structured CTI data for CTM.
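The cleansing and gap-identification steps described above can be illustrated with a minimal Python sketch. It assumes a report has already been remapped into a plain dictionary from CTM phase name to a list of raw CTI entries; that layout, the helper names `cleanse` and `missing_phases`, and the sample data are hypothetical and do not reproduce the actual SCERM implementation or its scoring.

```python
CTM_PHASES = ("prevention", "detection", "response")

def cleanse(entries):
    """Drop empty and duplicate entries (case-insensitive), keeping first-seen order."""
    seen, kept = set(), []
    for entry in entries:
        text = entry.strip()
        key = text.lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(text)
    return kept

def missing_phases(report):
    """Return the CTM phases for which the report holds no usable entries."""
    return [p for p in CTM_PHASES if not cleanse(report.get(p, []))]

# Hypothetical report: phase name -> raw CTI entries already mapped to that phase.
sample_report = {
    "detection": ["198.51.100.7", "198.51.100.7 ", ""],
    "response": [""],
}
print(missing_phases(sample_report))
```

In this sample, the detection phase keeps one entry after cleansing, while prevention and response are flagged as missing and would be candidates for refinement.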

The Pyramid of Pain and the Cyber Kill Chain are emerging and promising models for network defense. The two models complement each other, and the full picture of a cyber attack cannot be seen without both of them. To the best of our knowledge, both models are theoretical and no combined ontology of them had previously been developed. We therefore developed a combined ontology of the two models for the identification of missing artifacts and the inference of TTPs. We tested the proposed solution using data from real-world APTs and found that a large percentage of APTs have several behaviors in common.
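As a rough illustration of how a combined view of the two models can expose missing artifacts, the following Python sketch tags each observed artifact with a Cyber Kill Chain phase and a Pyramid of Pain level, then lists the phases and (phase, level) cells that have no evidence. The phase and level names follow the published models; the `Artifact` type, the helper functions, and the sample data are hypothetical, and the sketch is not the OWL ontology and SWRL rules used in A2CS.

```python
from dataclasses import dataclass
from itertools import product

# Cyber Kill Chain phases and Pyramid of Pain levels, as named in the two models.
CKC_PHASES = [
    "reconnaissance", "weaponization", "delivery", "exploitation",
    "installation", "command_and_control", "actions_on_objectives",
]
POP_LEVELS = [
    "hash_values", "ip_addresses", "domain_names",
    "network_host_artifacts", "tools", "ttps",
]

@dataclass(frozen=True)
class Artifact:
    value: str
    phase: str   # one of CKC_PHASES
    level: str   # one of POP_LEVELS

def uncovered_phases(artifacts):
    """Kill chain phases for which no artifact was observed."""
    seen = {a.phase for a in artifacts}
    return [p for p in CKC_PHASES if p not in seen]

def missing_cells(artifacts):
    """(phase, level) combinations with no observed artifact."""
    covered = {(a.phase, a.level) for a in artifacts}
    return [cell for cell in product(CKC_PHASES, POP_LEVELS) if cell not in covered]

# Hypothetical observations for a single APT report.
observed = [
    Artifact("bad.example.com", "delivery", "domain_names"),
    Artifact("198.51.100.7", "command_and_control", "ip_addresses"),
    Artifact("memory scraping", "actions_on_objectives", "ttps"),
]
print(uncovered_phases(observed))
```

An ontology with rules can additionally infer likely TTPs from the artifacts that are present, which this set-based sketch does not attempt.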

8.2 Future Research Directions

The research presented in this thesis can be extended in several new directions. A few of the research ideas that our research group is currently working on are briefly explained below.

Currently, STIXGEN exists as a prototype that generates structured threat data. A useful extension of the proposed framework is to upgrade the STIXGEN application for online users, so that security analysts can use it for STIX generation and more structured CTI becomes available to draw the bigger picture of cyber threats.

We now discuss some limitations of our work and our plans for addressing them in future work. The NIST standard [126] defines good-quality threat intelligence as timely, relevant, specific, accurate, and actionable. Presently, the SCERM framework directly considers the relevance, specificity, and actionability of shared CTI data. Regarding accuracy, we assume that the sources of CTI data are trustworthy, and SCERM consumes the provided impact, efficacy, and confidence scores of COAs and indicators without verification. We also used data from well-reputed threat vendors for our experiments. However, it is entirely plausible for low-integrity threat sources to lie or report false threat data to mislead others. Consider the obvious advantage to a malicious actor of poisoning the well by injecting fake indicators about his next imminent APT into the CTI community to avoid detection. In the future, we plan to develop a methodology to measure the accuracy of the provided CTI data. We will explore ranking threat sources according to a majority decision and keeping track of their previous credibility.
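One possible reading of the majority-decision idea is sketched below in Python, under stated assumptions: each source reports a boolean verdict per indicator, the majority verdict is taken as the reference, and a source's credibility is the fraction of its reports that agree with the majority. The function names, the tuple layout, and the feed names are hypothetical; the thesis does not specify the eventual methodology.

```python
from collections import Counter, defaultdict

def majority_verdicts(reports):
    """reports: list of (source, indicator, verdict) tuples. Ties yield no verdict."""
    votes = defaultdict(Counter)
    for _source, indicator, verdict in reports:
        votes[indicator][verdict] += 1
    result = {}
    for indicator, counter in votes.items():
        (top, top_n), *rest = counter.most_common()
        if rest and rest[0][1] == top_n:
            continue  # tie: no majority decision for this indicator
        result[indicator] = top
    return result

def rank_sources(reports):
    """Rank sources by the share of their reports that match the majority verdict."""
    majority = majority_verdicts(reports)
    agree, total = Counter(), Counter()
    for source, indicator, verdict in reports:
        if indicator not in majority:
            continue
        total[source] += 1
        agree[source] += int(verdict == majority[indicator])
    scores = {s: agree[s] / total[s] for s in total}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical reports: True means "indicator is malicious".
reports = [
    ("feedA", "198.51.100.7", True),
    ("feedB", "198.51.100.7", True),
    ("feedC", "198.51.100.7", False),
    ("feedA", "bad.example.com", True),
    ("feedB", "bad.example.com", True),
    ("feedC", "bad.example.com", True),
]
print(rank_sources(reports))
```

A plain majority vote can itself be manipulated by colluding sources, so a practical scheme would likely weight votes by each source's tracked credibility from earlier rounds.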

Similarly, timely sharing of threat data is critical for effective CTM. Historically, the same cyber-attack has often been launched against multiple organizations almost simultaneously. Firms are generally reluctant to share data about breaches because they feel it may damage their reputation and lower their stock price. To encourage timely data sharing, we will examine temporal aspects of threat feeds and develop procedures to rank the data sources based on timely sharing.
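A timeliness ranking of this kind could, for example, compare when each source reported an indicator against the earliest report of that indicator by any source. The Python sketch below is one such procedure under that assumption; the `rank_by_timeliness` function, the tuple layout, and the sample feeds are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def rank_by_timeliness(sightings):
    """sightings: list of (source, indicator, reported_at) with datetime values.

    Returns (source, mean lag in hours) pairs, smallest lag first.
    """
    first_seen = {}
    for _source, indicator, ts in sightings:
        if indicator not in first_seen or ts < first_seen[indicator]:
            first_seen[indicator] = ts
    lags = defaultdict(list)
    for source, indicator, ts in sightings:
        lag_hours = (ts - first_seen[indicator]).total_seconds() / 3600
        lags[source].append(lag_hours)
    ranked = ((source, mean(values)) for source, values in lags.items())
    return sorted(ranked, key=lambda kv: (kv[1], kv[0]))

# Hypothetical sightings of two indicators by two feeds.
sightings = [
    ("feedA", "hash-1", datetime(2019, 1, 1, 0)),
    ("feedB", "hash-1", datetime(2019, 1, 1, 6)),
    ("feedA", "hash-2", datetime(2019, 1, 2, 12)),
    ("feedB", "hash-2", datetime(2019, 1, 2, 0)),
]
print(rank_by_timeliness(sightings))
```

Mean lag only rewards speed on indicators a source actually shares; a fuller procedure would also account for indicators a source never reports.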

For the refinement phase of SCERM, we plan to explore other related datasets similar to ATTACK.MITRE, such as the IMPACT [127] dataset, to enhance threat feeds. Merging CTI reports that address different incidents related to the same APT may allow for better refinement of CTI data.

Finally, in this work, we have mostly focused on understanding the big picture

of the cyber threat landscape for CTM. Actually investigating the presence of these

threats in an enterprise environment will require considering user data. In the future,

we will correlate users’ traffic behaviors with the CTI data for cyber threat prevention,

detection, and response.

Ontologies are developed for knowledge sharing and reuse, and they enable software systems to analyze and reason over the shared knowledge. In the A2CS sub-framework, we developed a combined ontology of the CKC and POP models and wrote SWRL rules for APT analysis and the identification of missing artifacts. An attractive direction in the ontology engineering domain is therefore to develop an Intrusion Detection System (IDS) on the basis of the proposed A2CS sub-framework.


Bibliography

[1] R. Walters, “Cyber attacks on us companies in 2014,” The Heritage Foundation,

vol. 4289, pp. 1–5, 2014.

[2] A. Mohaisen and O. Alrawi, “Unveiling zeus: automated classification of mal-

ware samples,” in Proceedings of the 22nd International Conference on World Wide

Web. ACM, 2013, pp. 829–832.

[3] Krebs, “Home depot hit by same malware as target,” www.krebsonsecurity.

com/, accessed: 2018-1-5.

[4] World Economic Forum, “The global risks report 2018,” http://www3.weforum.org/docs/WEF_GRR18_Report.pdf/, accessed: 2018-8-30.

[5] PricewaterhouseCoopers, “The Global State of Information Security Survey

2015,” https://www.pwc.ru/en/publications/information-security-survey1.

html/, accessed: 2016-06-22.

[6] Information Systems Audit and Control Association, “2015 advanced persistent

threat awareness - third annual study,” https://www.isaca.org//, accessed:

2016-04-17.

[7] Symantec, “Internet security threat report,” https://www.symantec.com/

content/dam/symantec/docs/reports/istr-22-2017-en.pdf/, accessed: 2018-21-

3.

[8] McAfee, “Wannacry,” https://ics-cert.kaspersky.com/tag/wannacry/, ac-

cessed: 2018-30-13.

[9] N. Lomas, “Uk accuses russia of 2017’s notpetya ransomware attacks,” https://techcrunch.com/2018/02/15/uk-accuses-russia-of-2017s-notpetya-ransomware-attacks/, accessed: 2018-30-3.

[10] Techrepublic, “Notpetya ransomware outbreak

cost merck,” https://www.techrepublic.com/article/

notpetya-ransomware-outbreak-cost-merck-more-than-300m-per-quarter/,

accessed: 2018-30-4.

[11] A. Coburn, J. Daffron, A. Smith, J. Bordeau, E. Leverett, S. Sweeney, and T. Har-

vey, “Cyber-risk outlook,” 2018.

[12] CheckPoint, “2018 security report,” www.checkpoint.com/, accessed: 2018-22-8.

[13] SonicWall, “2018 sonicwall cyber threat report,” www.cdn.sonicwall.com/, ac-

cessed: 2018-21-8.

[14] K. Wagner, “Facebook suspended accounts for strategic communi-

cation laboratories,” https://www.recode.net/2018/3/17/17132646/

facebook-cambridge-analytica-suspended-trump-election-campaign/, ac-

cessed: 2018-21-4.

[15] E. Tillett, “Dhs official : Election systems in 21 states were targeted in russia cyber

attacks,” https://firenewsfeed.com/news/178859/, accessed: 2018-21-3.

[16] McAfee Lab, “2018 mcafee labs threat report,” www.mcafee.com/, accessed:

2018-31-8.

[17] KasperSky, “Zeus malware,” https://usa.kaspersky.com/resource-center/

threats/zeus-virus/, accessed: 2017-12-1.

[18] US-CERT, “Backoff point-of-sale malware,” https://www.us-cert.gov/ncas/

alerts/TA14-212A/, accessed: 2017-1-4.

[19] Gartner, “Gartner forecasts worldwide security spending,” https://www.

gartner.com/newsroom/id/3836563/, accessed: 2018-30-3.

[20] MITRE ATT&CK, “Adversarial tactics, techniques & common knowledge,” accessed: 2016-05-2.


[21] Open Source, “Hailataxii : Open source cyber threat intelligence feeds,” www.

hailataxii.com/, accessed: 2018-2-1.

[22] IBM, “Ibm x-force exchange,” www.exchange.xforce.ibmcloud.com/, accessed:

2017-10-10.

[23] FS-ISAC, “Safeguarding the global financial system by reducing cyber-risk (fs-

isac),” https://www.fsisac.com/, accessed: 2019-06-11.

[24] REN-ISAC, “Research and education networks information sharing and analysis

center (ren-isac),” https://www.ren-isac.net/, accessed: 2019-06-11.

[25] A. J. Hebert, “Compressing the kill chain,” Air Force Magazine, vol. 86, no. 3, pp.

50–50, 2003.

[26] D. Bianco, “The pyramid of pain,” https://detect-respond.blogspot.com/2013/

03/the-pyramid-of-pain.html/, accessed: 2016-06-22.

[27] V. Mulwad, W. Li, A. Joshi, T. Finin, and K. Viswanathan, “Extracting infor-

mation about security vulnerabilities from web text,” in Proceedings of the 2011

IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent

Technology-Volume 03. IEEE Computer Society, 2011, pp. 257–260.

[28] MITRE, “Sharing threat intelligence just got a lot easier,” www.oasis-open.

github.io, accessed: 2018-31-12.

[29] YARA, “Yara,” www.plusvic.github.io/, accessed: 2018-16-7.

[30] T. D. Wagner, “Sharing cyber intelligence in trusted environments - a literature

review,” School of Computing, Telecommunications and Networks Faculty of

Computing, Engineering and the Built Environment, Birmingham City Univer-

sity.

[31] C. Sauerwein, C. Sillaber, A. Mussmann, and R. Breu, “Threat intelligence shar-

ing platforms: An exploratory study of software vendors and research perspec-

tives,” 2017.

[32] H. Dalziel, How to define and build an effective cyber threat intelligence capability.

Syngress, 2014.


[33] BRO, “Bro - the bro network security monitor,” https://www.bro.org/, accessed:

2017-6-6.

[34] Splunk Corporation, “Splunk - log management and analysis,” https://www.

splunk.com/, accessed: 2017-6-5.

[35] MITRE, “Stix visualization tool,” https://github.com/STIXProject/stix-viz/, ac-

cessed: 2017-10-2.

[36] M. S. Abu, S. R. Selamat, A. Ariffin, and R. Yusof, “Cyber threat intelligence:

Issue and challenges,” Indonesian Journal of Electrical Engineering and Computer

Science, vol. 10, no. 1, pp. 371–379, 2018.

[37] W. Tounsi and H. Rais, “A survey on technical threat intelligence in the age of

sophisticated cyber attacks,” Computers & security, vol. 72, pp. 212–233, 2018.

[38] Z. Iqbal, Z. Anwar, and R. Mumtaz, “Stixgen-a novel framework for automatic

generation of structured cyber threat information,” in 2018 International Confer-

ence on Frontiers of Information Technology (FIT). IEEE, 2018, pp. 241–246.

[39] E. W. Burger, M. D. Goodman, P. Kampanakis, and K. A. Zhu, “Taxonomy model

for cyber threat intelligence information exchange technologies,” in Proceedings

of the 2014 ACM Workshop on Information Sharing & Collaborative Security. ACM,

2014, pp. 51–60.

[40] Virus Total, “Virus total,” www.virustotal.com/, accessed: 2019-24-2.

[41] I. You and K. Yim, “Malware obfuscation techniques: A brief survey,” in 2010

International conference on broadband, wireless computing, communication and appli-

cations. IEEE, 2010, pp. 297–300.

[42] Z. Iqbal and Z. Anwar, “Ontology generation of advanced persistent threats and

their automated analysis,” NUST Journal of Engineering Sciences, vol. 9, no. 2, pp.

68–75, 2016.

[43] MITRE, “Stixproject schema test,” www.github.com/, accessed: 2017-20-3.

[44] Gartner, “Best security information and event management software of 2019,”

https://www.gartner.com/, accessed: 2019-05-11.


[45] H. McGuinness, “Owl web ontology language overview,” W3C recommendation,

vol. 10, no. 10, p. 2004, 2004.

[46] S. Caltagirone, A. Pendergast, and C. Betz, “The diamond model of intrusion

analysis,” Center For Cyber Intelligence Analysis and Threat Research Hanover

Md, Tech. Rep., 2013.

[47] E. M. Hutchins, M. J. Cloppert, and R. M. Amin, “Intelligence-driven computer

network defense informed by analysis of adversary campaigns and intrusion

kill chains,” Leading Issues in Information Warfare & Security Research, vol. 1, no. 1,

p. 80, 2011.

[48] NIST, “National vulnerability database,” www.nvd.nist.gov/, accessed: 2017-09-

5.

[49] IBM, “Input output definition file (iodf),” www.ibm.com/, accessed: 2017-8-3.

[50] T. Takahashi, K. Landfield, T. Millar, and Y. Kadobayashi, “Iodef-extension

to support structured cybersecurity information,” draft-ietf-mile-sci-05. txt, IETF

draft, 2012.

[51] Iovin and Gabriel, “Collective intelligence framework,” www.github.com/, ac-

cessed: 2017-21-8.

[52] Leimeister and J. Marco, “Collective intelligence,” Business & Information Systems

Engineering, vol. 2, no. 4, pp. 245–248, 2010.

[53] FireEye, “Openioc back to the basics,” https://www.fireeye.com/blog/

threat-research/2013/10/openioc-basics.html/, accessed: 2017-25-10.

[54] MITRE Corporation, “Malware attribute enumeration and characterization

(maec),” www.maecproject.github.io/, accessed: 2018-5-4.

[55] OASIS, “Trusted automated exchange of indicator information (taxii),” www.

taxiiproject.github.io/, accessed: 2018-5-4.

[56] MITRE Corporation, “Common attack pattern enumeration and classification,”

www.capec.mitre.org/, accessed: 2018-5-6.


[57] M. Apoorva, R. Eswarawaka, and P. V. B. Reddy, “A latest comprehensive study

on structured threat information expression (stix) and trusted automated ex-

change of indicator information (taxii),” in Proceedings of the 5th international con-

ference on frontiers in intelligent computing: theory and applications. Springer, 2017,

pp. 477–482.

[58] Threatconnect, “Threatconnect, inc.(2015),” URL: http://www. informationweek.

com/whitepaper, 2017.

[59] H. Debar, D. Curry, and B. Feinstein, “The intrusion detection message exchange

format (idmef),” 2007.

[60] Soltra, “Soltra edge,” www.soltra.com/en/, accessed: 2017-16-7.

[61] Open Source, “Collaborative research into threats,” www.github.com/crits/, ac-

cessed: 2017-12-7.

[62] EclecticIQ, “Eclecticiq platform,” www.eclecticiq.com/, accessed: 2017-13-7.

[63] C. Wagner, A. Dulaunoy, G. Wagener, and A. Iklody, “Misp: The design and im-

plementation of a collaborative threat intelligence sharing platform,” in Proceed-

ings of the 2016 ACM on Workshop on Information Sharing and Collaborative Security.

ACM, 2016, pp. 49–56.

[64] OASIS, “Cyber observable expression (cybox) archive,” www.cyboxproject.

github.io/, accessed: 2018-21-12.

[65] Verizon, “2017 data breach investigations report,” 2015.

[66] MITRE, “Stix architecture,” https://stixproject.github.io/about/, accessed:

2017-2-9.

[67] MITRE Corporation, “Stix use cases,” https://www.stixproject.github.io/, ac-

cessed: 2017-24-11.

[68] Open Source, “Stix shifter,” https://pypi.org/project/stix-shifter/2.5.5/, ac-

cessed: 2020-4-15.


[69] P. Bhatt, E. T. Yano, and P. Gustavsson, “Towards a framework to detect multi-

stage advanced persistent threats attacks,” in 2014 IEEE 8th International Sympo-

sium on Service Oriented System Engineering. IEEE, 2014, pp. 390–395.

[70] McAfee, “Combating advanced persistent threats, white pa-

per,” https://www.mcafee.com/mx/resources/white-papers/

wp-combat-advanced-persist-threats.pdf/, accessed: 2016-21-4.

[71] M. Gadelrab, A. A. El Kalam, and Y. Deswarte, “Execution patterns in automatic

malware and human-centric attacks,” in 2008 Seventh IEEE International Sympo-

sium on Network Computing and Applications. IEEE, 2008, pp. 29–36.

[72] M. S. Gadelrab, E. Kalam, A. Abou, and Y. Deswarte, “Defining categories to

select representative attack test-cases,” in Proceedings of the 2007 ACM workshop

on Quality of protection. ACM, 2007, pp. 40–42.

[73] Yadav, Sandeep et al., “Detecting algorithmically generated malicious domain

names,” in Proceedings of the 10th ACM SIGCOMM conference on Internet measure-

ment, 2010, pp. 48–61.

[74] I. You and K. Yim, “Malware obfuscation techniques: A brief survey,” in 2010

International conference on broadband, wireless computing, communication and appli-

cations. IEEE, 2010, pp. 297–300.

[75] K. Krombholz, H. Hobel, M. Huber, and E. Weippl, “Advanced social engineer-

ing attacks,” Journal of Information Security and applications, vol. 22, pp. 113–122,

2015.

[76] Ponemon.Institute, “2014 state of endpoint risk. white paper,” Dec 2013.

[77] M. Bere, F. Bhunu-Shava, A. Gamundani, and I. Nhamu, “How advanced per-

sistent threats exploit humans,” International Journal of Computer Science Issues

(IJCSI), vol. 12, no. 6, p. 170, 2015.

[78] S. Qamar, Z. Anwar, M. A. Rahman, E. Al-Shaer, and B.-T. Chu, “Data-driven

analytics for cyber-threat intelligence and information sharing,” Computers & Se-

curity, vol. 67, pp. 35–58, 2017.


[79] Cosive, “Stix data generator,” https://generator.cosive.com/, accessed: 2017-14-

12.

[80] Open Source, “Python - stix,” https://github.com/STIXProject/python-stix/,

accessed: 2018-5-1.

[81] Open.Source, “Threat intelligence quotient test,” github.com/mlsecproject/

tiq-test/, accessed: 2016-20-12.

[82] R. Meier, C. Scherrer, D. Gugelmann, V. Lenders, and L. Vanbever, “Feedrank:

A tamper-resistant method for the ranking of cyber threat intelligence feeds,”

in 2018 10th International Conference on Cyber Conflict (CyCon). IEEE, 2018, pp.

321–344.

[83] C. Sauerwein, I. Pekaric, M. Felderer, and R. Breu, “An analysis and classifica-

tion of public information security data sources used in research and practice,”

Computers & security, vol. 82, pp. 140–155, 2019.

[84] N. Pitropakis, E. Panaousis, A. Giannakoulias, G. Kalpakis, R. D. Rodriguez, and

P. Sarigiannidis, “An enhanced cyber attack attribution framework,” in Interna-

tional Conference on Trust and Privacy in Digital Business. Springer, 2018, pp.

213–228.

[85] D. Bodeau and R. Graubart, “Cyber prep 2.0: Motivating organizational cyber

strategies in terms of threat preparedness,” MITRE, Bedford, MA, USA, Tech. Rep,

pp. 15–0797, 2017.

[86] M. Mateski, C. M. Trevino, C. K. Veitch, J. Michalski, J. M. Harris, S. Maruoka,

and J. Frye, “Cyber threat metrics,” Sandia National Laboratories, 2012.

[87] B. Shin and P. B. Lowry, “A review and theoretical explanation of the ‘cyberthreat-intelligence (cti) capability’ that needs to be fostered in information security practitioners and how this can be accomplished,” Computers & Security, p. 101761, 2020.

[88] A. Singhal and X. Ou, “Security risk analysis of enterprise networks using prob-

abilistic attack graphs,” in Network Security Metrics. Springer, 2017, pp. 53–73.


[89] P. Mell, K. Scarfone, and S. Romanosky, “A complete guide to the common vul-

nerability scoring system version 2.0,” in Published by FIRST-Forum of Incident

Response and Security Teams, vol. 1, 2007, p. 23.

[90] Y. Ghazi, Z. Anwar, R. Mumtaz, S. Saleem, and A. Tahir, “A supervised machine

learning based approach for automatically extracting high-level threat intelli-

gence from unstructured sources,” in 2018 International Conference on Frontiers

of Information Technology (FIT). IEEE, 2018, pp. 129–134.

[91] U. Noor, Z. Anwar, A. W. Malik, S. Khan, and S. Saleem, “A machine learning

framework for investigating data breaches based on semantic analysis of adver-

sary’s attack patterns in threat intelligence repositories,” Future Generation Com-

puter Systems, vol. 95, pp. 467–487, 2019.

[92] N. Kaloudi and J. Li, “The ai-based cyber threat landscape: A survey,” ACM

Computing Surveys (CSUR), vol. 53, no. 1, pp. 1–34, 2020.

[93] CyberInt, “Threat intelligence scoring and analysis,” www.cyberint.com/

wp-content/uploads/, accessed: 2018-21-5.

[94] MITRE, “Common weakness scoring system,” www.cwe.mitre.org/cwss/, ac-

cessed: 2018-12-6.

[95] H. Sayyadi and L. Getoor, “Futurerank: Ranking scientific articles by predicting

their future pagerank,” in Proceedings of the 2009 SIAM International Conference on

Data Mining. SIAM, 2009, pp. 533–544.

[96] L. Page, S. Brin, R. Motwani, and T. Winograd, “The pagerank citation ranking:

Bringing order to the web.” Stanford InfoLab, Tech. Rep., 1999.

[97] W. Xu, W. Liang, X. Lin, and J. X. Yu, “Finding top-k influential users in social

networks under the structural diversity model,” Information Sciences, vol. 355,

pp. 110–126, 2016.

[98] Q. Wang, Y. Jin, S. Cheng, and T. Yang, “Conformrank: A conformity-based rank

for finding top-k influential users,” Physica A: Statistical Mechanics and its Appli-

cations, vol. 474, pp. 39–48, 2017.


[99] Z. Anwar, M. Montanari, A. Gutierrez, and R. H. Campbell, “Budget con-

strained optimal security hardening of control networks for critical cyber-

infrastructures,” International Journal of Critical Infrastructure Protection, vol. 2, no.

1-2, pp. 13–25, 2009.

[100] A. Yeboah-Ofori and A. Brimicombe, “Cyber intelligence & osint: Developing

mitigation techniques against cybercrime threats on social media a systematic re-

view july 2017,” International Journal of Cyber-Security and Digital Forensics, vol. 7,

no. 1, pp. 87–99, 2018.

[101] J. Barnett, “Reputation: The foundation of effective threat protection,” McAfee,

White Paper, vol. 11, 2010.

[102] T. Macaulay, “System and method for generating and refining cyber threat intelligence data,” Aug. 25 2015, US Patent 9,118,702.

[103] A. Thomson and C. D. Coleman, “Apparatuses, methods and systems for a cyber threat confidence rating visualization and editing user interface,” Mar. 14 2017, US Patent 9,596,256.

[104] I. Kotenko, O. Polubelova, I. Saenko, and E. Doynikova, “The ontology of metrics

for security evaluation and decision support in siem systems,” in 2013 Interna-

tional Conference on Availability, Reliability and Security. IEEE, 2013, pp. 638–645.

[105] M. S. Geramiparvar and N. Modiri, “Presenting a metric-based model for mal-

ware detection and classification,” International Journal of Computer & Information

Technology (IJOCIT), vol. 2, pp. 528–539, 2014.

[106] V. Vassilev, V. Sowinski-Mydlarz, P. Gasiorowski, K. Ouazzane, A. Phipps

et al., “Intelligence graphs for threat intelligence and security policy valida-

tion of cyber systems,” in Proc. Int. Conf. on Artificial Intelligence and Applications

(ICAIA2020). Advances in Intelligent Systems and Computing, Springer, 2020.

[107] M. S. Geramiparvar and N. Modiri, “An approach to counteracting the common

cyber-attacks according to the metric-based model,” International Journal of Com-

puter Science and Network Security (IJCSNS), vol. 16, no. 1, p. 81, 2016.


[108] N. Afzaliseresht, Y. Miao, S. Michalska, Q. Liu, and H. Wang, “From logs to

stories: Human-centred data mining for cyber threat intelligence,” IEEE Access,

vol. 8, pp. 19 089–19 099, 2020.

[109] R. Riesco, X. Larriva-Novo, and V. Villagra, “Cybersecurity threat intelligence

knowledge exchange based on blockchain,” Telecommunication Systems, vol. 73,

no. 2, pp. 259–288, 2020.

[110] S. More, M. Matthews, A. Joshi, and T. Finin, “A knowledge-based approach to

intrusion detection modeling,” in 2012 IEEE Symposium on Security and Privacy

Workshops. IEEE, 2012, pp. 75–81.

[111] J. Undercoffer, A. Joshi, and J. Pinkston, “Modeling computer attacks: An on-

tology for intrusion detection,” in International Workshop on Recent Advances in

Intrusion Detection. Springer, 2003, pp. 113–135.

[112] A. Joshi, R. Lal, T. Finin, and A. Joshi, “Extracting cybersecurity related linked

data from text,” in 2013 IEEE Seventh International Conference on Semantic Comput-

ing. IEEE, 2013, pp. 252–259.

[113] R. Corinne, L. Jones et al., “Adversarial tactics, techniques, and common knowl-

edge,” arXiv, 2013.

[114] K. Son, Kim et al., “Cyber-attack group analysis method based on association

of cyber-attack information.” KSII Transactions on Internet & Information Systems,

vol. 14, no. 1, 2020.

[115] H. Al-Mohannadi, I. Awan, and J. Al Hamar, “Analysis of adversary activities

using cloud-based web services to enhance cyber threat intelligence,” Service Ori-

ented Computing and Applications, pp. 1–13, 2020.

[116] Symantec, “Attacks on point of sales systems,” https://

www.symantec.com/content/dam/symantec/docs/white-papers/

attacks-on-point-of-sale-systems-en.pdf/, accessed: 2016-12-8.

[117] Illusive, “Retail industry under attack,” https://cdn2.hubspot.net/hubfs/

725085/Fact Sheets/RetailIndustryUnderAttack.pdf/, accessed: 2016-31-7.


[118] Panda Media Center, “Alina, the latest pos malware,” https://www.

pandasecurity.com/mediacenter/pandalabs/alina-pos-malware/, accessed:

2019-11-6.

[119] EnigmaSoft, “Jackpos,” https://www.enigmasoftware.com/jackpos-removal/,

accessed: 2019-11-6.

[120] FireEye, “Centerpos: An evolving pos threat,” https://www.fireeye.com/blog/

threat-research/2016/01/, accessed: 2019-9-6.

[121] CISCO, “Threat spotlight: Holiday greetings from pro pos,” https://blog.

talosintelligence.com/2015/12/pro-pos.html, accessed: 2019-10-6.

[122] Target Corporation, “Target confirms unauthorized access to payment card

data in u.s. stores,” https://corporate.target.com/press/releases/2013/12/, ac-

cessed: 2016-12-6.

[123] Z. Iqbal and Z. Anwar, “Stix dataset,” https://github.com/zafarabbasi/

STIXDataset/, created: 2018-5-7.

[124] Secureworks, “Threat group tg 3390,” www.secureworks.com/research/, ac-

cessed: 2017-20-12.

[125] L. Obrst, P. Chase, and R. Markeloff, “Developing an ontology of the cyber secu-

rity domain.” in STIDS, 2012, pp. 49–56.

[126] Chris Johnson NIST, “Nist sp 800-150:guide to cyber threat information shar-

ing,” https://csrc.nist.gov/, accessed: 2018-09-5.

[127] IMPACT, “Information marketplace for policy and analysis of cyber-risk &

trust,” https://www.impactcybertrust.org/, accessed: 2019-27-12.

[128] SANS, “Killing advanced threats in their tasks - an intelligence approach to attack prevention,” 2014.

[129] D. Bianco, “What do you get when you cross a pyramid

with a chain,” https://detect-respond.blogspot.com/2013/03/

what-do-you-get-when-you-cross-pyramid.html/, accessed: 2016-06-22.


[130] A. Oltramari, L. F. Cranor, R. J. Walls, and P. D. McDaniel, “Building an ontology

of cyber security.” in STIDS. Citeseer, 2014, pp. 54–61.

[131] A. Oltramari, L. F. Cranor, R. J. Walls, and P. McDaniel, “Computational ontol-

ogy of network operations,” in MILCOM 2015-2015 IEEE Military Communica-

tions Conference. IEEE, 2015, pp. 318–323.

[132] A. Razzaq, K. Latif, H. F. Ahmad, A. Hur, Z. Anwar, and P. C. Bloodsworth, “Se-

mantic security against web application attacks,” Information Sciences, vol. 254,

pp. 19–38, 2014.

[133] Z. Syed, A. Padia, T. Finin, L. Mathews, and A. Joshi, “Uco: A unified cyberse-

curity ontology,” in Workshops at the Thirtieth AAAI Conference on Artificial Intelli-

gence, 2016.

[134] B. Schneier, “Structured threat information expression (stix) 1.x archive website,”

https://stixproject.github.io/, accessed: 2017-23-10.

[135] G. White and K. Harrison, “State and community information sharing and anal-

ysis organizations,” 2017.

[136] NIST, “Comparing stix 1.x/cybox 2.x with stix 2,” https://oasis-open.github.io/

cti-documentation/stix/compare/, accessed: 2017-31-12.

[137] Symantec, “Infostealer malumpos,” https://www.symantec.com/security-center/writeup/2015-060806-3221-99/, accessed: 2017-6-5.

[138] Panda Security, “Pos and credit cards: In the line of fire with punkeypos,”

https://www.pandasecurity.com/mediacenter/malware/punkeypos/, ac-

cessed: 2017-6-5.

[139] Trend Micro, “Fighterpos a new one-man pos malware cam-

paign,” https://www.trendmicro.com/vinfo/us/security/news/

cybercrime-and-digital-threats/fighterpos-one-man-pos-malware-campaign/,

accessed: 2017-7-5.


[140] Z. Iqbal and Z. Anwar, “Ontology generation of advanced persistent threats and

their automated analysis,” NUST Journal of Engineering Sciences, vol. 9, no. 2, pp.

68–75, 2016.


Appendices


Appendix A

STIX Dataset and Source Code

A.1 STIX Dataset

Using STIXGEN, we generated meaningful STIXs for some of the most recent, hard-to-comprehend APTs from prominent domains and industries (Retail Industry APTs [1], Financial Industry APTs [2], Ransomware, Cyber Espionage APTs – Nation-state APTs [124], and Attack.MITRE – Credential Stealing APTs [20]) from 2016 to 2018, using well-established threat sources. We published these STIXs [123] so that the security community can conduct further research. The generated STIXs not only describe independent attacks but also depict families of attacks and show their natural evolution. To the best of our knowledge, this is the first contribution of its kind to give researchers access to meaningful and error-free STIXs.

A.2 Source Code and Dataset

A GitHub link to the source code and executable is provided, as can be seen in Figure

A.6.


Figure A.1: Financial APT’s STIX

Figure A.2: Cyber Espionage APT’s STIX


Figure A.3: MITRE APT’s STIX

Figure A.4: POS APT’s STIX


Figure A.5: Ransomware APT’s STIX

Figure A.6: GitHub link
