16
Cyber Threat Intelligence Modeling Based on Heterogeneous Graph Convolutional Network Jun Zhao 1,2 , Qiben Yan 3,* , Xudong Liu 1,2,* , Bo Li 1,2,* , Guangsheng Zuo 1,2 1 School of Computer Science and Engineering, Beihang University, Beijing, China 2 Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing, China 3 Computer Science and Engineering, Michigan State University, East Lansing, Michigan, USA Abstract Cyber Threat Intelligence (CTI), as a collection of threat in- formation, has been widely used in industry to defend against prevalent cyber attacks. CTI is commonly represented as In- dicator of Compromise (IOC) for formalizing threat actors. However, current CTI studies pose three major limitations: first, the accuracy of IOC extraction is low; second, isolated IOC hardly depicts the comprehensive landscape of threat events; third, the interdependent relationships among hetero- geneous IOCs, which can be leveraged to mine deep security insights, are unexplored. In this paper, we propose a novel CTI framework, HINTI, to model the interdependent relation- ships among heterogeneous IOCs to quantify their relevance. Specifically, we first propose multi-granular attention based IOC recognition method to boost the accuracy of IOC extrac- tion. We then model the interdependent relationships among IOCs using a newly constructed heterogeneous information network (HIN). To explore intricate security knowledge, we propose a threat intelligence computing framework based on graph convolutional networks for effective knowledge dis- covery. Experimental results demonstrate that our proposed IOC extraction approach outperforms existing state-of-the-art methods, and HINTI can model and quantify the underlying relationships among heterogeneous IOCs, shedding new light on the evolving threat landscape. 1 Introduction Nowadays, we are witnessing a rapid growth of sophisti- cated cyber attacks (e.g., zero-day attack, advanced persis- tent threat) [34]. Such attacks can effortlessly bypass tra- ditional defenses such as firewalls and intrusion detection systems (IDS), breach critical infrastructures, and cause dev- astating catastrophes [7, 20, 39]. To combat these emerg- ing threats, security experts proposed Cyber Threat Intel- ligence (CTI) that consists of a collection of Indicators of Compromise (IOCs). Different from the well-known secu- rity databases (e.g., CVE 1 , ExploitDB 2 ), CTI can facilitate organizations to proactively release more comprehensive and valuable threat warnings (e.g., malicious IPs, malicious DNS, malware and attack patterns, etc.) when a system encounters suspicious outsider or insider threats [23]. In recent years, CTI has been increasingly adopted by se- curity researchers and industries to share and capitalize on their discoveries, as well as by security firms to analyze the threat landscape using the deluge of data [5, 30]. The orig- inal CTI extraction and analysis require extensive manual inspection of the attack event descriptions, which becomes rather time-consuming given the enormous volume of threat description data. Recent studies have proposed automated methods to extract CTI in the form of Indicator of Compro- mise (IOC) from unstructured security-related texts [4, 22]. Most of existing IOC extraction methods, such as CleanMX 3 , PhishTank 4 , IOC Finder 5 , and Gartner peer insight 6 , follow the OpenIOC [10] standard and extract particular types of IOCs (e.g., malicious IP, malware, file Hash, etc) by lever- aging a set of regular expressions. However, such extraction approaches face three major limitations. First, the accuracy of IOC extraction is low, which inevitably leads to the omission of critical threat objects [22]. Second, isolated IOC hardly depicts the comprehensive landscape of threat events, making it virtually impossible for CTI subscribers to gain a complete picture into the incoming threat. Third, there is a lack of an effective computing framework to efficiently measure the interactive relationships among heterogeneous IOCs. To combat these limitations, HINTI, a cyber threat intel- ligence framework based on heterogeneous information net- work (HIN), is proposed to model and analyze CTIs. Specifi- cally, HINTI proposes a multi-granular attention based IOC recognition approach to boost the accuracy of IOC extraction. 1 http://cve.mitre.org/ 2 https://www.exploit-db.com/ 3 http://list.clean-mx.com 4 https://www.phishtank.com 5 https://www.fireeye.com/services/freeware/ioc-finder.html 6 https://www.gartner.com/reviews/market/security-threat-intelligence- services USENIX Association 23rd International Symposium on Research in Attacks, Intrusions and Defenses 241

Cyber Threat Intelligence Modeling Based on Heterogeneous ... · 2.1Cyber Threat Intelligence Cyber Threat Intelligence (CTI) extracted from security-related data is structured information

  • Upload
    others

  • View
    26

  • Download
    1

Embed Size (px)

Citation preview

Page 1: Cyber Threat Intelligence Modeling Based on Heterogeneous ... · 2.1Cyber Threat Intelligence Cyber Threat Intelligence (CTI) extracted from security-related data is structured information

Cyber Threat Intelligence Modeling Based on Heterogeneous GraphConvolutional Network

Jun Zhao12 Qiben Yan3 Xudong Liu12 Bo Li12 Guangsheng Zuo12

1 School of Computer Science and Engineering Beihang University Beijing China2 Beijing Advanced Innovation Center for Big Data and Brain Computing Beihang University Beijing China

3 Computer Science and Engineering Michigan State University East Lansing Michigan USA

Abstract

Cyber Threat Intelligence (CTI) as a collection of threat in-formation has been widely used in industry to defend againstprevalent cyber attacks CTI is commonly represented as In-dicator of Compromise (IOC) for formalizing threat actorsHowever current CTI studies pose three major limitationsfirst the accuracy of IOC extraction is low second isolatedIOC hardly depicts the comprehensive landscape of threatevents third the interdependent relationships among hetero-geneous IOCs which can be leveraged to mine deep securityinsights are unexplored In this paper we propose a novelCTI framework HINTI to model the interdependent relation-ships among heterogeneous IOCs to quantify their relevanceSpecifically we first propose multi-granular attention basedIOC recognition method to boost the accuracy of IOC extrac-tion We then model the interdependent relationships amongIOCs using a newly constructed heterogeneous informationnetwork (HIN) To explore intricate security knowledge wepropose a threat intelligence computing framework based ongraph convolutional networks for effective knowledge dis-covery Experimental results demonstrate that our proposedIOC extraction approach outperforms existing state-of-the-artmethods and HINTI can model and quantify the underlyingrelationships among heterogeneous IOCs shedding new lighton the evolving threat landscape

1 Introduction

Nowadays we are witnessing a rapid growth of sophisti-cated cyber attacks (eg zero-day attack advanced persis-tent threat) [34] Such attacks can effortlessly bypass tra-ditional defenses such as firewalls and intrusion detectionsystems (IDS) breach critical infrastructures and cause dev-astating catastrophes [7 20 39] To combat these emerg-ing threats security experts proposed Cyber Threat Intel-ligence (CTI) that consists of a collection of Indicators ofCompromise (IOCs) Different from the well-known secu-

rity databases (eg CVE1 ExploitDB2) CTI can facilitateorganizations to proactively release more comprehensive andvaluable threat warnings (eg malicious IPs malicious DNSmalware and attack patterns etc) when a system encounterssuspicious outsider or insider threats [23]

In recent years CTI has been increasingly adopted by se-curity researchers and industries to share and capitalize ontheir discoveries as well as by security firms to analyze thethreat landscape using the deluge of data [5 30] The orig-inal CTI extraction and analysis require extensive manualinspection of the attack event descriptions which becomesrather time-consuming given the enormous volume of threatdescription data Recent studies have proposed automatedmethods to extract CTI in the form of Indicator of Compro-mise (IOC) from unstructured security-related texts [4 22]Most of existing IOC extraction methods such as CleanMX3PhishTank4 IOC Finder5 and Gartner peer insight6 followthe OpenIOC [10] standard and extract particular types ofIOCs (eg malicious IP malware file Hash etc) by lever-aging a set of regular expressions However such extractionapproaches face three major limitations First the accuracy ofIOC extraction is low which inevitably leads to the omissionof critical threat objects [22] Second isolated IOC hardlydepicts the comprehensive landscape of threat events makingit virtually impossible for CTI subscribers to gain a completepicture into the incoming threat Third there is a lack of aneffective computing framework to efficiently measure theinteractive relationships among heterogeneous IOCs

To combat these limitations HINTI a cyber threat intel-ligence framework based on heterogeneous information net-work (HIN) is proposed to model and analyze CTIs Specifi-cally HINTI proposes a multi-granular attention based IOCrecognition approach to boost the accuracy of IOC extraction

1httpcvemitreorg2httpswwwexploit-dbcom3httplistclean-mxcom4httpswwwphishtankcom5httpswwwfireeyecomservicesfreewareioc-finderhtml6httpswwwgartnercomreviewsmarketsecurity-threat-intelligence-

services

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 241

Then HINTI leverages HIN to model the interdependent re-lationships among heterogeneous IOCs which can depict amore comprehensive picture of threat events Moreover wepropose a novel CTI computing framework to quantify theinterdependent relationships among IOCs which helps un-cover novel security insights In short the main contributionsof this paper are summarized as follows

bull Multi-granular Attention based IOC RecognitionWe propose multi-granular attention based IOC recogni-tion approach to automatically extract cyber threat ob-jects from multi-source threat texts which can learn thesignificance of features with different scales Our modeloutperforms the state-of-the-art methods in terms of IOCrecognition accuracy and recall In total we extract over397730 IOCs from the unstructured threat descriptions

bull Heterogeneous Threat Intelligence Modeling Wemodel different types of IOCs using heterogeneous infor-mation network (HIN) which introduces various meta-paths to capture the interdependent relationships amongheterogeneous IOCs while depicting a more comprehen-sive landscape of cyber threat events

bull Threat Intelligence Computing Framework We arethe first to present the concept of cyber threat intelligencecomputing and design a general computing frameworkas shown in Figure 5 The framework first utilizes aweight-learning based node similarity measure to quan-tify the interdependent relationships between heteroge-neous IOCs and then leverages attention mechanismbased heterogeneous graph convolutional networks toembed the IOCs and their interactive relations

bull Threat Intelligence Prototype System To evaluate theeffectiveness of HINTI we implement a CTI prototypesystem Our system has identified 1262258 relation-ships among 6 types of IOCs including attackers vul-nerabilities malicious files attack types devices andplatforms based on which we further assess the real-world applicability of HINTI using three real-world ap-plications IOC significance ranking attack preferencemodeling and vulnerability similarity analysis

2 Background

21 Cyber Threat IntelligenceCyber Threat Intelligence (CTI) extracted from security-related data is structured information used to proactively resistcyber attacks CTI consists of reasoning context mechanismindicators implications and actionable advice about an ex-isting or evolving cyber attack that can be used to createpreventive measures in advance [30] CTI allows subscribersto expand their visibility into the fast-growing threat land-scape and enable early identification and prevention of a

cyber threat Take WannaCry virus as an example if securityguards can timely capture the threat intelligence that indicatesldquoWannacry permeates port 445 to attack systems the mali-cious intrusion can be easily blocked by locking down port445 which is the most direct and effective way of combatingWannaCry virus [7]

Meanwhile social media (eg Blog Twitter) has increas-ingly become an effective medium for exchanging and spread-ing cyber security information on which security experts andorganizations often post their discoveries to reach a wideraudience promptly [32] These posts usually include a mag-nitude of valuable security-related information [25 26] suchas the warnings regarding latest vulnerabilities hacking toolsdata breaches and existing or upcoming software patchesproviding one of the main raw materials for extracting CTIs

Early CTI extraction requires extensive manual inspec-tion of the threat descriptions which becomes rather time-consuming given the enormous volume of such descrip-tions To facilitate the automatic generation and sharing ofCTI a large volume of methods and frameworks are es-tablished such as IODEF [13] STIX [4] TAXII [36] Ope-nIOC [10] and CyBox [28] CleanMX PhishTank IOC Finderand [2223146] The majority of existing methods and frame-works leverage regular expressions to extract IOCs whichmay suffer from a low accuracy due to their inability in pre-defining a comprehensive set of the IOC rules

22 Motivation

The main goal of this research is to address the limitationsof the existing CTI analytics frameworks by modeling theinterdependent relationships among heterogeneous IOCs Asa motivating example given a security-related post ldquoLastweek Lotus exploited CVE-2017-0143 vulnerability to affecta larger number of Vista SP2 and Win7 SP devices in IranCVE-2017-0143 is a remote code execution vulnerability in-cluding a malicious file SMBbatrdquo Most of the existing CTIframeworks can extract specific IOCs but neglect the rela-tionships among them as shown in Figure 1 It is obviousthat such IOCs could not draw a comprehensive picture ofthe threat landscape let alone quantifying their interactiverelationships for in-depth security investigation

Different from the existing CTI frameworks HINTI aimsto implement a computational CTI framework which can notonly extract IOCs efficiently but also model and quantify therelationships between them Here we use the motivating ex-ample to illustrate how HINTI works step-by-step in practiceas follows

(i) First the security-related post is annotated by the B-I-O sequence tagging method [43] as shown in Figure 2where B-X indicates that the element of type X is located atthe beginning of the fragment I-X means that the elementbelonging to type X is located in the middle of the fragmentand O represents a non-essential element of other types In this

242 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

Figure 1 An example of extracted IOCs without any relationsamong them

research we annotated 30000 such training samples from5000 threat description texts which are the raw materialsused to build our IOC extraction model

Figure 2 An annotation example with the B-I-O taggingmethod

(ii) The labeled training samples are then fed into the pro-posed neural network architecture as shown in Figure 6 totrain our proposed IOC extraction model As a result HINTIhas the ability to accurately identify and extract IOCs (egLotus SMBbat) using the proposed multi-granular attentionbased IOC extraction method (see Section 41 for details)

(iii) HINTI then utilizes the syntactic dependency parser[6] (eg subject-predicate-object attributive clause etc) toextract associated relationships between IOCs each of whichis represented as a triple (IOCirelation IOC j) In this moti-vating example HINTI extracts the relationship triples involv-ing (LotusexploitCV E minus 2017minus 0143) (CV E minus 2017minus0143a f f ectVistaSP2) etc Note that the extracted rela-tional triples can be incrementally pooled into an HIN tomodel the interactions among IOCs for depicting a morecomprehensive threat landscape Figure 3 shows a miniaturegraphic representation describing interactive relations amongIOCs extracted from the example Compared with Figure 1 itis obvious that HINTI can depict a more intuitive and compre-hensive threat landscape than the previous approaches In thispaper we mainly consider 9 relationships (R1simR9) among 6different types of IOCs (see Section 42 for details)

(iv) Finally HINTI integrates a CTI computing framework

Figure 3 A miniature of a constructed CTI includes attackervulnerability malicious file attack type device and platformwhich describes a particular threat an attacker utilizes CVE-2017-0143 vulnerability to invade Vista SP2 and Win7 SP1devices CVE-2017-0143 is a remote code execution vulnera-bility that involves a malicious file ldquoSMBbat

based on heterogeneous graph convolutional networks (seeSection 43) to effectively quantify the relationships amongIOCs for knowledge discovery Particularly our proposedCTI computing framework characterizes IOCs and their re-lationships in a low-dimensional embedding space based onwhich CTI subscribers can use any classification (eg SVMNaive Bayes) or clustering algorithms (K-Means DBSCAN)to gain new threat insights such as predicting which attack-ers are likely to intrude their systems and identifying whichvulnerabilities belong to the same category without the expertknowledge In this work we mainly explore three real-worldapplications to verify the effectiveness and efficiency of theCTI computing framework IOC significance ranking (seeSection 61) attack preference modeling (see Section 62)and vulnerability similarity analysis (see Section 63)

23 Preliminaries

In this paper we use heterogeneous information net-work (HIN) to model the relationships among IOCs Here wefirst introduce the preliminary knowledge about HIN

Definition 1 Heterogeneous Information Network ofThreat Intelligence (HINTI) is defined as a directed graphG = (VET ) with an object type mapping function ϕ VrarrMand a link type mapping function Ψ ErarrR Each object visin V belongs to one particular object type in the object typeset M ϕ(v) isinM and each link e isin E belongs to a particularrelation type in the relation type set R Ψ(e)isinR T denotesthe types of nodes and relationships

In this paper we focus on 6 common types of IOCs at-tacker (A) vulnerability (V) device (D) platform (P) mali-cious file (F) and attack type (T) and the links connectingdifferent objects represent different semantic relationshipsTo better understand the object types and relationship types inHINTI it is imperative to provide the meta-level (ie schema-level) description of the network Consequently we introduce

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 243

(a) Network schema (b) Network instance

Figure 4 Network schema and instance of HIN containing 6 types of IOCs (a) The network schema of HIN which depicts

the relationship template among different types of IOCs such as Devicebelongminusminusminusrarr Plat f orm (b) An instance of network schema

which describes the concrete relationships between IOCs by following a network schema eg O f f ice 2012belongminusminusminusrarrWindows

the network schema [37] for describing the meta-level rela-tionships

Definition 2 Network Schema The network schema ofHINTI denoted as HS = (AR) is a meta template for G =(VET ) with the object type mapping ϕ VrarrM and the linktype mapping Φ ErarrR It is a directed graph of object typesM with edges representing relations from R

The schema of HINTI specifies type constraints on the sets ofIOCs and their relationships Figure 4 (a) shows the networkschema of HINTI which defines the relationship templatesamong IOCs to effectively guide the semantic exploration inHINTI For example for a relationship that describes ldquoat-tackers invade devices the semantic schema can be writtenas attacker invademinusminusminusrarrdevice Figure 4 (b) presents a concreteinstance of the network schema

Definition 3 Meta-path A meta-path [37] P is a path se-quence defined on a network schema S = (NR) and is repre-

sented in the form of N1R1minusrarrN2

R2minusrarr middotmiddot middot RiminusrarrNi+1 which definesa composite relation R = R1 R2 middot middot middot Ri+1 where denotesthe composition operator on relations A meta-path P is asymmetric path when the relation R defined by the path issymmetric (ie P is equal to Pminus1)

Table 1 displays the meta-paths considered in HINTI Forexample the relationship ldquothe attackers (A) exploit the samevulnerability (V) can be described by a length-2 meta-path

attackerexploitminusminusminusminusrarr vulnerability

exploitminusminusminusminusminusminusrarr attacker denoted asAVAT (P4) which means that the two attackers exploit thesame vulnerability Similarly AV DPDTV T AT (P17) portraysa close relationship between IOCs that ldquotwo attackers wholeverage the same vulnerability invade the same type of deviceand ultimately destroy the same type of platform

Table 1 Meta-paths used in HINTI

ID Meta-path

P1 Attacker-Attacker

P2 Device-Device

P3 Vulnerability-Vulnerability

P4 Attacker-Vulnerability-Attacker

P5 Attacker-Device-Attacker

P6 Device-File-Device

P7 Device-Platform-Device

P8 Vulnerability-File-Vulnerability

P9 Vulnerability-Type-Vulnerability

P10 Vulnerability-Device-Vulnerability

P11 Vulnerability-Platform-Vulnerability

P12 Attacker-Device-Platform-Device-Attacker

P13 Attacker-Vul-Device-Vul-Attacker

P14 Attacker-Vul-Platform-Vul-Attacker

P15 Attacker-Vul-Type-Vul-Attacker

P16 Vul-Device-Platform-Device-Vul

P17 Attacker-Vul-Device-Platform-Device-Vul-Attacker

3 Architecture Overview of HINTI

HINTI as a cyber threat intelligence extraction and comput-ing framework is capable of effectively extracting IOCs fromthreat-related descriptions and formalizing the relationshipsamong heterogeneous IOCs to demystify new threat insightsAs shown in Figure 5 HINTI consists of four major compo-nents including

bull Data Collection and IOC Recognition We first de-

244 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

Figure 5 The overall architecture of HINTI HINTI consists of four major components (a) collecting security-related data andextracting threat objects (ie IOCs) (b) modeling interdependent relationships among IOCs into a heterogeneous informationnetwork (c) embedding nodes into a low-dimensional vector space using weight-learning based similarity measure and (d)computing threat intelligence based on graph convolutional networks and knowledge mining

velop a data collection system to automatically capturesecurity-related data from blogs hacker forum posts se-curity news and security bulletins The system utilizesa breadth-first search to collect the HTML source codeand then leverages Xpath (XML Path language) to ex-tract threat-related descriptions After that we utilize amulti-granular attention based IOC recognition methodto extract IOC from the collected threat-related descrip-tions (see Section 41 for details)

bull Relation Extraction and IOC modeling HINTI ad-dresses the challenge of CTI modeling by leveragingheterogeneous information networks which can natu-rally depict the interdependent relationships betweenheterogeneous IOCs As an example Figure 4 shows amodel that capture the interactive relationships among at-tacker vulnerability malicious file attack type platformand device (see Section 42 for details)

bull Meta-path Design and Similarity Measure Meta-path is an effective tool to express the semantic rela-tions among IOCs in constructed HIN For instance

attackerexploitminusminusminusminusrarr vulnerability

exploitminusminusminusminusminusminusrarr attacker indi-cates that two attackers are related by exploiting the samevulnerability We design 17 types of meta-paths (SeeTable 1) to describe the interdependent relationships be-tween IOCs With these meta-paths we present a weight-learning based node similarity computing approach toquantify and embed the relationships as the premise forthreat intelligence computing

bull Threat Computing and Knowledge Mining In thiscomponent an effective threat intelligence computingframework is proposed which can quantify and measure

the relevance among IOCs by leveraging graph convolu-tional network (GCN) Our proposed threat intelligencecomputing framework could reveal richer security knowl-edge within a more comprehensive threat landscape

4 Methodology

41 Multi-granular Attention Based IOC Ex-traction

Extracting IOCs from multi-source threat texts is one of themajor tasks of threat intelligence analytics and the qualityof the extracted IOCs significantly influences the analysisresults of cyber threats Recently Bidirectional Long Short-Term Memory+Conditional Random Fields (BiLSTM+CRF)model [15] has demonstrated excellent performance in textchunking and Named-entity Recognition (NER) Howeverdirectly applying this model to IOC extraction is unlikely tosucceed since threat texts usually contain a large number ofthreat objects with different grams and irregular structuresConsequently we need an efficient method to learn the dis-criminative characteristics of IOCs with different sizes In thispaper we propose a multi-granular attention based IOC extrac-tion method which can extract threat objects with differentgranularity Particularly Figure 6 presents the proposed IOCextraction framework which leverages the multi-granular at-tention mechanism to characterize IOCs Different from thetraditional BiLSTM+CRF model we introduce new word-embedding features with different granularities to capture thecharacteristics of IOCs with different sizes Furthermore weutilize a self-attention mechanism to learn the importance ofthe features to improve the accuracy of IOC extraction

Our proposed method takes a threat description sentenceX = (x1x2 middot middot middot xi) as input where xi represents i-th word

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 245

Figure 6 The framework of multi-granular IOC extraction

in X We first chunk the sentence into n-gram componentsincluding char-level 1-gram 2-gram and 3-gram which arethe inputs of our trained model written as follows

e jxi=V j

embedding(xi) (1)

where V jembedding transforms the chunk with granularity j into

a vector space and xi is the i-th word in a sentence X Thusthe threat description sentence Xi can be vectorized as follows

rarrh j

i = LST M f orward([e jx0e j

x1 middot middot middot e j

xi])

larrh j

i = LST Mbackward([e jx0e j

x1 middot middot middot e j

xi])

(2)

whererarrh j

i andlarrh j

i are the embedded features learned byforward LSTM and backward LSTM respectively Let O bethe output of Bi-LSTM which is a weighted sum of embeddedfeatures with weights corresponding to the importance ofdifferent features

O = H middotW T (3)

where H =j

sum~βiσ(h

j1h

j2 middot middot middot h

ji ) h j

i = (rarrh j

i +larrh j

i ) ~βi is theweight vector to represent the importance of h j

i in whichj and i are the segmentation granularity of sentences and thecorresponding index of the chunk W is the parameter matrix

Given a security-related sentence X = (x1x2 middot middot middot xi) itscorresponding threat object sequence Y = (y1 y2 middot middot middot yi) andits output of Bi-LSTM O we can compute the overall labelscore of X and Y as follows

S(X Y ) =n

sumi=0

(Ayiyi+1 +Oiyi) (4)

where Ayiyi+1 is the state transition matrix in CRF model andOiyi as the output of Bi-LSTM hidden layer (calculated byEq (3)) represents the label score of i-th word corresponding

to the type yi Next we utilize so f tmax function to normalizethe overall label score

p(Y |X) =eS(X Y )

sumyisinYX

eS(X Y )(5)

We design an objective function to maximize the proba-bility p(Y |X) to achieve the highest label score for differentIOCs which can be written as follows

argmax log(p(Y |X)) = argmax (S(X Y )minus

log( sumyisinYX

eS(X y))) (6)

By solving the objective function above we assign correctlabels to the n-gram components according to which we canidentify the IOCs with different lengths Our multi-granularattention based IOC extraction method is capable of identify-ing different types of IOCs and its evaluation is presented inSection 5

42 Cyber Threat Intelligence ModelingCTI modeling is an important step to explore the intricaterelationship between heterogeneous IOCs In our work HINis introduced to group different types of IOCs into a graphto explore their interactive relationships In this section weportray the main principle of threat intelligence modeling

To model the intricate interdependent relationships amongIOCs we define the following 9 relationships among 6 typesof IOCs as follows

bull R1 To depict the relation of an attacker and the ex-ploited vulnerability we construct the attacker-exploit-vulnerability matrix A For each element Ai j isin 01Ai j=1 indicates attacker i exploits vulnerability j

bull R2 To depict the relation of an attacker and a devicewe build the attacker-invade-device matrix D For eachelement Di j isin 01 Di j=1 indicates attacker i invadesdevice j

bull R3 Two attacker can cooperate to attack a target Tostudy the relationship of attacker-attacker we constructthe attacker-cooperate-attacker matrix C For each ele-ment Ci j isin 01 Ci j=1 indicates there exists a cooper-ative relationship between attacker i and j

bull R4 To describe the relation of a vulnerability and theaffected device we build the vulnerability-affect-devicematrix M For each element Mi j isin 01 Mi j=1 indi-cates vulnerability i affects device j

bull R5 A vulnerability is often labeled as a specific attacktype by Common Vulnerabilities and Exposures (CVE)

246 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

system7 To explore the relation of vulnerability-attacktype we build the vulnerability-belong-attack type ma-trix G where each element Gi j isin 01 denotes if vul-nerability i belongs to an attack type j

bull R6 A vulnerability often involves one or more maliciousfiles To describe the relation of vulnerability-file webuild the vulnerability-include-file matrix F For eachelement Fi j isin 01 Fi j=1 denotes that vulnerability iincludes malicious file j

bull R7 A malicious file often targets a specific device Weestablish the file-target-device matrix T to explore therelation of file-device For each element Ti j isin 01Ti j=1 indicates malicious file i targets device j

bull R8 Oftentimes a vulnerability evolves from anotherTo study the relationship of vulnerability-vulnerabilitywe build the vulnerability-evolve-vulnerability matrixE where each element Ei j isin 01 indicates if vulnera-bility i evolves from vulnerability j

bull R9 To depict the relation device-platform that a de-vice belongs to a platform we build the device-belong-platform matrix P where each element Pi j isin 01 il-lustrates if device i belongs to platform j

Based on the above 9 types of relationships HINTIleverages the syntactic dependency parser [6] (eg subject-predicate-object attributive clause etc) to automatically ex-tract the 9 relationships among IOCs from threat descriptionseach of which is represented as a triple (IOCirelation IOC j)For instance given a security-related description ldquoOn May12 2017 WannaCry exploited the MS17-010 vulnerabilityto affect a larger number of Windows devices which is aransomware attack via encrypted disks Using the syntacticdependency parser we can extract the following triples (Wan-naCry exploit MS17-010) (MS17-010 affect Windows de-vice) (WannaCry is ransomware) Such triples extractedfrom various data sources can be incrementally assembledinto HINTI to model the relationships among IOCs whichcould offer a more comprehensive threat landscape that de-scribes the threat context Particularly we further define 17types of meta-paths shown in Table 1 to probe into the interde-pendent relationships over attackers vulnerabilities maliciousfiles attack types devices and platforms HINTI is able toconvey a richer context of threat events by scrutinizing 17types of meta-paths and reveal the in-depth threat insightsbehind the heterogeneous IOCs (see Section 6 for details)

43 Threat Intelligence ComputingIn this section we illustrate the concept of threat intelligencecomputing and design a general threat intelligence computing

7httpcvemitreorg

framework based on heterogeneous graph convolutional net-works which quantifies and measures the relevance betweenIOCs by analyzing meta-path based semantic similarity Herewe first provide a formal definition of threat intelligence com-puting based on heterogeneous graph convolutional networks

Definition 4 Threat Intelligence Computing Based onHeterogeneous Graph Convolutional Networks Given thethreat intelligence graph G = (VE) the meta-path set M =P1P2 middot middot middot Pi The threat intelligence computing i) com-putes the similarity between IOCs based on meta-path Pi togenerate corresponding adjacency matrix Ai ii) constructsthe feature matrix of nodes Xi by embedding attribute in-formation of IOCs into a latent vector space iii) conductsgraph convolution GCN(AiXi) to quantify the interdependentrelationships between IOCs by following meta-path Pi andembeds them into a low-dimensional space

The threat intelligence computing aims to model the seman-tic relationships between IOCs and measure their similaritybased on meta-paths which can be used for advanced secu-rity knowledge discovery such as threat object classificationthreat type matching threat evolution analysis etc Intuitivelythe objects connected by the most significant meta-paths tendto bear more similarity [37] In this paper we propose aweight-learning based threat intelligence similarity measurewhich uses self-attention to improve the performance of simi-larity measurement between any two IOCs This method canbe formalized as below

Definition 5 Weight-learning based Node Similar-ity Measure Given a set of symmetric meta-path setP = [Pm]

Mprime

m=1 the similarity S(hih j) between any two IOCshi and h j is defined as

S(hih j) =Mprime

summ

rarrw

2middot | hirarr j isin Pm || hirarri isin Pm |+ | h jrarr j isin Pm |

(7)

where hirarr j isin hm is a path instance between IOC hi andh j following meta-path Pm hirarri isin Pm is that between IOCinstance hi and hi and h jrarr j isin Pm is that between IOC in-stance h j and h j where | hirarr j isin Pm |=CPm(i j) | hirarri isinPm |=CPm(i i) | h jrarr j isin Pm |=CPm( j j) and CPm is thecommuting matrix based on meta-path Pm defined below

rarrw =

[w1 wm wMprime ] denote the meta-path weights wherewm is the weight of meta-paths Pm and M

primeis the number of

meta-pathsS(hih j) is defined in two parts (1) the semantic overlap

in the numerator which describes the number of meta-pathbetween IOC instance hi and h j (2) and the semantic broad-ness in the denominator which depicts the number of totalmeta-paths between themselves The larger number of meta-path between IOC instance hi and h j the more similar thetwo IOCs are which is normalized by the semantic broadness

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 247

of denominator Moreover different from existing similaritymeasures [37] we propose an attention mechanism basedsimilarity measure method by introducing the weight vec-torrarrw = [w1 wm wMprime ] which is a trainable coefficient

vector to learn the importance of different meta-paths forcharacterizing IOCs

Obviously it is computationally expensive to measure thesimilarity among IOCs in the constructed heterogeneousgraph as it usually requires to randomly walk a larger numberof nodes in the graph Fortunately in our work it is unneces-sary to walk through the entire graph as we prescribe a limitby introducing predefined meta-paths and we only focus onthe symmetrical meta-paths presented in Table 1 To calcu-late the similarity between IOCs under different meta-pathinstances we need to compute the corresponding commutingmatrices [37] following the meta-paths

Given a meta-path set P = sumMprime

m A1A2 middot middot middot Al+1 themeta-path based commuting matrix can be defined as CP =UA1A2 UA2A3 middot middot middot UAAl+1 where CP(i j) represents the prob-ability of object iisinA1 reaching object j isinAl+1 under the pathP and is a connection operation These symmetric meta-paths not only efficiently reduce the complexity of walkingbut also ensures that the commuting matrix can be easily de-composed which greatly reduces the computational costs Inaddition the symmetric meta-paths in the graph G allow us toleverage the pairwise random-walk [37] to further acceleratethe computation

With Eq (7) and pairwise random-walk we can obtain thesimilarity embedding between any two IOCs hi and h j undera meta-path set P Based on the low-dimensional similarityembedding we derive a weighted adjacent matrix betweenIOCs denoted as Ai isin RNtimesN where N is the number of aspecific type of IOC in G Meanwhile to utilize the attributedinformation of nodes we train a word2vec model [24] toembed the attribute information of nodes into a feature matrixXi isin RNtimesd where N is the number of IOCs in Ai and d is thedimension of node feature With the learned adjacency matrixAi and its feature matrix Xi we can leverage the classicalGCN [18] to characterize the relationship between IOC hiand h j Particularly the layer-wise propagation rule of GCNcan be defined as below

H(l+1) = σ(Dminus12 ADminus

12 H(l)W (l)) (8)

where A = A+ IN is the adjacency matrix of IOCs with self-connections IN is the identity matrix Dii =sum j Ai j and W (l) isa l-th layer trainable weight matrix σ(middot) denotes an activationfunction such as relu H(l) isin RNtimesd is the matrix of activationin the l-th layer We perform graph convolution [18] on Aiand Xi to generate the embedding Z between IOCs belongingto type i

Z = f (XiAi) = σ(Ai middot relu(AiXiW(0)i )W (1)

i ) (9)

where W (0)i isin RdtimesH is an input-to-hidden weight matrix for a

hidden layer with H feature mapsW (1)i isinRHtimesF is a hidden-to-

output weight matrix with F feature maps in the output layerXi isin RNtimesd N is the number of a specific type of IOCs d is thedimension of their corresponding features and σ is anotheractivation function such as sigmoid Ai = Dminus

12 AiDminus

12 can be

calculated offline Here we leverage the cross-entropy loss tooptimize the performance of our proposed threat intelligenceframework written as follows

Loss(Yl f Zl f ) =minussumlisinYl

F

sumf=1

Yl f middot lnZl f (10)

where Yl is the set of node indices that have labels Yl f is thereal label and Zl f is a corresponding label that our modelpredicts Based on Eq (10) we conduct stochastic gradientdescent to continuously optimize the neural network weightsW (0)

i W (1)i and

rarrw to reduce the loss and build a general

threat intelligence computing framework Using this frame-work security organizations are able to mine richer securityknowledge hidden in the interdependent relationships amongIOCs

5 Experimental Evaluation

51 Dataset and SettingsWe develop a threat data collector to automatically collectcyber threat data from a set of sources including 73 inter-national security blogs (eg fireeye cloudflare) hacker fo-rum posts (eg Blackhat Hack5) security bulletins (egMicrosoft Cisco) CVE detail description and ExploitDB Acomplete list of data sources is presented in the Baidu cloud8We set up a daemon to collect the newly generated securityevents every day So far more than 245786 security-relateddata describing threat events have been collected For trainingand evaluating our proposed IOC extraction method 30000samples from 5000 texts are annotated by utilizing the B-I-Osequence tagging method (see Section 22 for the example)and an annotation example is shown in Figure 2

For 30000 labeled samples we randomly select 60 ofsamples as a training set 20 of samples as a verificationset and the rest of the samples as our test set Based on thedata sets we comprehensively evaluate the performance ofHINTI for extracting IOCs and threat intelligence computingWe run all of the experiments on 16 cores Intel(R) Core(TM)i7-6700 CPU 340GHz with 64GB RAM and 4times NVIDIATesla K80 GPU The software programs are executed on theTensorFlow-GPU framework on Ubuntu 1604

52 Evaluation of IOC ExtractionA set of experiments are conducted to evaluate the sensitiv-ity of different parameters in the multi-granular based IOC

8httpspanbaiducoms1J631WMYY_T_awa8aY5xy3A

248 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

extraction model We mainly consider 8 hyper-parametersthat seriously impact the performance of the model as shownin Table 2 More specifically Embedding_dim is one ofthe most important factors that determine the generaliza-tion capability of the model Here we fix other parame-ters while fine-tuning the embedding size in the range of(50100150200250300350400) Experimental resultsshow that the accuracy of extracted IOC achieves the bestwhen Embedding_dim=300 Learning_rate is another ma-jor factor for determining the stride of gradient descent inminimizing the loss function which determines whetherthe model can find a global optimal solution We fix otherparameters to fine-tune the Learning_rate in the rangeof (000100050010050105) and the performancereaches the best when the Learning_rate=0001 Similarlywe fine-tune the other hyper-parameters with 5000 epochsand the hyper-parameters allowing our model to perform op-timally are recorded in Table 2

Table 2 Hyperparameters setting in the multi-granular basedIOC extraction method

Parameter value Parameter Value

Embedding_dim 300 Hidden_dim 128

Sequence_length 500 Epoch_num 5000

Learning_rate 0001 Batch_size 64

Dropout_rate 05 Optimizer Adam

Table 3 Performance of IOC extraction wrt IOC types

IOC Type Precision Recall Micro-F1

IP 9956 9952 9954File 9436 9688 9560Type 9986 9981 9983

Email 9932 9987 9949Device 9326 9278 9302Vender 9307 9445 9424Version 9698 9799 9748Domain 9658 9589 9623Software 8825 8931 8878Function 9503 9559 9531Platform 9431 9257 9343Malware 8976 9123 9049

Vulnerability 9925 9873 9899Other 9829 9842 9835

In this paper we extract 13 major types of IOCs and the

Figure 7 Performance of IOC extraction using embeddingfeatures with different granularity

performance is presented in Table 3 Overall our IOC ex-traction method demonstrates excellent performance in termsof precision recall and Micro-F1 (ie micro-averaged F1-score) for most types of IOCs such as function maliciousIP and device However we observe a performance degra-dation when recognizing software and malware This can beattributed to the fact that most software and malware is namedby random strings such as md5 hash Moreover we find thatthe number of training samples impacts the performance ofthe model Specifically the performance becomes unsatisfac-tory (eg Software Malware) when the number of a certaintype of training samples is insufficient (ie less than 5000)

In order to verify the effectiveness of multi-granular embed-ding features we assess the performance of IOC extractionwith features of different granularity including char-level1-gram 2-gram 3-gram and multi-granular features The ex-perimental results are demonstrated in Figure 7 from whichwe can observe that the proposed multi-granular embeddingfeature outperforms others since it leverages the attentionmechanism to simultaneously learn multi-granular features tocharacterize different patterns of IOCs

Next to verify the effectiveness of the proposed IOC ex-traction method we compare it with the state-of-the-art en-tity recognition approaches including general NER toolsNLTK NER9 and Stanford NER10 professional IOC extrac-tion method Stucco [16] and iACE [22] and popular en-tity recognition approaches CRF [21] BiLSTM and BiL-STM+CRF [15] The experimental results of different meth-ods on real-world data are demonstrated in Table 4

The results indicate that our proposed IOC extraction out-performs the state-of-the-art entity recognition methods andtools in terms of precision recall and Micro-F1 and its im-provement can be attributed to the following factors Firstcompared with Standford NER and NLTK NER the NLP tools

9httpwwwnltkorgbookch07html10httpsstanfordnlpgithubioCoreNLPnerhtml

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 249

Table 4 Performance of threat entity recognition usingdifferent methods

Method Accuracy Precision Micro-F1

NLTK NER 6945 6851 6749

Stanford NER 6835 6674 6858

iACE 9214 9126 9225

Stucco 9116 9221 9147

CRF 9264 9180 9265

BiLSTM 9478 9521 9435

BiLSTM+CRF 9638 9642 9627

Multi-granular 9859 9872 9869

trained with general news corpora our method uses a security-related training corpus collected and labeled by ourselves asa data source for training our model Second different fromthe rule-based extraction approaches (eg iACE and Stucco)our proposed deep learning based method provides an end-to-end system with more advanced features to represent variousIOCs Third comparing to RNN-based methods (eg BiL-STM and BiLSTM+CRF) our proposed method brings inmulti-granular embedding sizes (char-level 1-gram 2-gramand 3-gram) to simultaneously learn the characteristics of vari-ous sizes and types of IOCs which can identify more complexand irregular IOCs Last but not the least our method imple-ments an attention mechanism to learn the weights of featureswith various scales to effectively characterize different typesof IOCs further enhancing the IOC recognition accuracy

6 Application of Threat Intelligence Comput-ing

Our proposed threat intelligence computing framework basedon heterogeneous graph convolutional networks can be usedto mine novel security knowledge behind heterogeneous IOCsIn this section we evaluate its effectiveness and applicabilityusing three real-world applications profiling and ranking forCTIs attack preference modeling and vulnerability similarityanalysis

61 Threat Profiling and Significance Rankingof IOCs

Due to the disparity in the significance of threats it is im-portant to derive the threat profile and rank the significanceof IOCs for demystifying the landscape of threats Howevermost of the existing CTIs are incapable of modeling the asso-ciated relationships between heterogeneous IOCs

Different from isolated CTIs HINTI leverages HIN to

model the interdependent relationships among IOCs withtwo characteristics first the isolated IOCs can be integratedinto a graph-based HIN to clearly display the associated rela-tionships among IOCs which is capable of directly depictingthe basic threat profile For example Figure 3 depicts a threatprofiling sample an attacker utilizes CVE-2017-0143 vulnera-bility to invade Vista SP2 and Win7 SP1 devices belonging tothe Microsoft platform and CVE-2017-0143 is a remote codeexecution vulnerability that uses a ldquoSMBbat malicious fileSecond the significance of IOCs in HINTI can be naturallyranked based on the proposed threat intelligence computingframework

Table 5 shows the top 5 authoritative ranking score [35] ofvulnerability attacker attack type and platform from whichsecurity experts can gain a clear insight into the impact ofeach IOC Degree centrality [33] which describes the numberof links incident upon a node is widely used in evaluatingthe importance of a node in a graph It can used to quantifythe immediate risk of a node that connects with other nodesfor delivering network flows such as virus spreading Heredegree centrality can be utilized in verifying the effectivenessof the proposed threat intelligence computing framework inranking the importance of IOCs It is worth noting that bothour ranking method and degree centrality work regardless ofthe time of attacks We compute the degree centrality rank-ing of IOCs based on the fact that the node with a higherdegree centrality is more important than a node with a lowerone For instance if the degree centrality of a vulnerabilityis higher it indicates that this vulnerability is exploited bymore attackers or it affects more devices The ranking resultof degree centrality shown in Table 5 is consistent with theranking result based on the proposed threat intelligence com-puting framework demonstrating the capability of the CTIcomputing framework in ranking the importance of differenttypes of IOCs

62 Attack Preference Modeling

Attack preference modeling is meaningful for security organi-zations to gain insight into the attack intention of attackersbuild attack portraits and develop personalized defense strate-gies Here we leverage HINTI to integrate different types ofIOCs and their interdependent relationships to comprehen-sively depict the picture of attack events which helps modelthe attack preferences With the proposed threat intelligencecomputing framework we model attack preferences by clus-tering the embedded attackersrsquo vectors

In this task each malicious IP address is treated as an in-truder and its attack preferences are mainly reflected in threefeatures including the platforms it destroys (including Win-dows Linux Unix ASP Android Apache etc) the industriesit invades (eg education finance government Internet ofThings and Industrial control system etc) and the exploittypes it employs (eg DOS Buffer overflow Execute code

250 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

Table 5 The significance ranking of different types of IOCs (CV E1 CVE-2017-0146 CV E2 CVE-2006-5911 CV E3 CVE-2008-6543 CV E4 CVE-2012-1199 CV E4 CVE-2006-4985 AR Authoritative Ranking DC Degree Centrality value)

Vulnerability Attacker Platform Attack Type

No AR DC Monicker AR DC Category AR DC Exploit_type AR DC

CV E1 02713 7643 Meatsploit 02764 549 PHP 04562 17865 Webapps 05494 11648

CV E2 02431 7124 GSR team 01391 327 windows 02242 13793 DOS 01772 8741

CV E3 02132 6833 Ihsan 00698 279 Linux 00736 8792 Overflow 01533 7652

CV E4 01826 6145 Techsa 00695 247 Linux86 00623 8147 CSRF 00966 5433

CV E5 01739 5637 Aurimma 00622 204 ASP 00382 5027 SQL 00251 2171

(a) AV DV T AT (P13) (b) AV DPDTV T AT (P17) (c) AV PV T AT (P14)

Figure 8 The performance of attack preference modeling with different meta-paths in which the preference of attacker i isreduced to a two-dimensional space (xiyi) and each cluster represents a group with a specific attack preference

Sql injection XSS Gain information etc)Specifically we first utilize our proposed threat intelligence

computing framework to embed each attacker into a low-dimensional vector space and then perform DBSCAN algo-rithm on the embedded vector to cluster attackers with thesame preferences into corresponding groups Figure 8 showsthe top 3 clustering results under different types of meta-paths in which the meta-path AV DPDTV T AT (P17) performsthe best performance with compact and well-separated clus-ters indicating that it contains richer semantic relationshipsfor characterizing attack preferences than other meta-paths

To verify the effectiveness of attack preference modelingwe identify 5297 distinct attackers (each unique IP address istreated as an attacker) who have submitted at least 10 cyberattacks For these attackers five cybersecurity researchersconsisting of three doctoral and two master students spentabout fortnight to manually annotate their attack preferencesfrom three perspectives the platforms they destroyed the in-dustries they attacked and the attack types they exploited Toensure the accuracy of data labeling we test the consistencyof the tags for the 5297 attackers and remove the sampleswith ambiguous tags As a result we obtain 3000 sampleswith consistent tags Based on the labeled samples we furtherevaluate the performance of different meta-paths on model-

Table 6 Performance of modeling attack preference withdifferent meta-paths

Metapath Accuracy Precision Micro-F1

P1 7431 7622 7525

P4 7116 7327 7216

P5 6915 7143 7027

P12 7214 7646 7424

P13 7965 8131 8047

P14 7748 7934 7840

P15 8017 7976 7996

P17 8139 8172 8155

ing attack preferences In the attack modeling scenario weonly focus on the meta-paths that both the start node and theend node are attackers The experimental results are demon-strated in Table 6 Obviously different meta-paths displaydifferent abilities in characterizing the attack preferences ofcyber intruders The performance when using P17 is more

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 251

superior than the one with other meta-paths which indicatesthat P17 holds more valuable information that characterizesthe attack preferences of cybercriminals since P17 includesthe semantics information of P1P4P5 and P12 sim P15

In addition we compare the capabilities of our proposedcomputing framework with those of other state-of-the-art em-bedding methods in terms of attack preference modelingOur analysis result shows that the accuracy of attack pref-erence modeling reaches 081 which outperforms the exist-ing popular models Node2vec (with precision of 071) [1]metapaht2vec (with precision of 073) [11] and HAN (withprecision of 076) [42] The performance improvement canbe attributed to the following characteristics First our com-puting framework utilizes weight-learning to learn the signifi-cance of different meta-paths for evaluating the similarity be-tween attackers Second the proposed computing frameworkleverages GCN to learn the structural information betweenattackers to obtain more discriminative structural features thatimproves the performance of attack preference modeling

63 Vulnerability Similarity AnalysisVulnerability classification or clustering is crucial for conduct-ing vulnerability trend analysis the correlation analysis ofincidents and exploits and the evaluation of countermeasuresThe traditional vulnerability analysis relies on the manualinvestigation of the source codes which requires expert ex-pertise and consumes considerable efforts In this sectionwe propose an unsupervised vulnerability similarity analy-sis method based on the proposed threat intelligence com-puting framework which can automatically group similarvulnerabilities into corresponding communities Particularlythe vulnerability-related IOCs are first embedded into a low-dimensional vector space using CTI computing frameworkThen the DBSCAN algorithm is performed on the embed-ded vector space to cluster vulnerabilities into correspondingcommunities The clustering results are presented in Figure 9

Figure 9 (c) shows all vulnerabilities are clustered into 12clusters using meta-path V DPDTV T (P16) which is very closeto the classification standard (ie 13) recommended by CVEdetails an authoritative database that publishes vulnerabilityinformation By manually analyzing the training samples wefind that HTTP Response Splitting vulnerability does not ap-pear in our dataset Therefore our cluster number (ie 12)is consistent with CVE Details11 To further validate the ef-fectiveness of threat intelligence computing framework forvulnerability clustering we randomly select 100 vulnerabili-ties from each cluster for manual inspection to measure theconsistency of the vulnerability types in each cluster and theresults are presented in Table 7 Obviously the clustering per-formance of cluster 8 (ie File Inclusion) and cluster 10 (ieDirectory Traversal) is remarkably worse than other clustersTo explain the reason we examine our training data and the

11httpswwwcvedetailscom

Table 7 Accuracy of vulnerability clustering

Cluster ID Vulnerability type Accuracy

cluster1 Denial of Service 8012

cluster2 XSS 8353

cluster3 Execute Code 8150

cluster4 Overflow 7650

cluster5 Gain Privilege 9156

cluster6 Bypass Something 7174

cluster7 CSRF 9327

cluster8 File Inclusion 6172

cluster9 Gain Informa 7042

cluster10 Directory Traversal 6949

cluster11 Memory Corruption 8156

cluster12 SQL Injection 8067

average 7851

computing framework We found that the proportion of thesetwo types of vulnerabilities is too small (cluster 8 is 34and cluster 10 is 42) making our computing frameworkvery likely to be under-fit with insufficient data Howeverthe proposed computing framework performs well on mosttypes of vulnerabilities in an unsupervised manner especiallygiven sufficient samples (eg cluster 7 is 176 and cluster 5is 157)

In addition by examining the clustering results we havean observation that the vulnerabilities in the same cluster arelikely to have evolutionary relationships For instance CVE-2018-0802 an office zero-day vulnerability is evolved fromthe CVE-2017-11882 They both include EQNEDT32exe fileused to edit the formula in Office software which allowsremote attackers to execute arbitrary codes by constructinga malformed font name The modeling and computation ofinterdependent relationships among IOCs in HINTI facilitatethe discovery of such intricate connections between vulnera-bilities

In summary HINTI is capable of depicting a more compre-hensive threat landscape and the proposed CTI computingframework has the ability to bring novel security insightstoward different real-world security applications Howeverthere are still numerous opportunities for enhancing thesesecurity applications Specifically for attack preference mod-eling although each individual IP address is treated as anattacker we cannot determine whether it belongs to a realattacker or is disguised by a proxy Fortunately even if thereal attack address cannot be captured understanding the at-tack preferences of these IP proxies which are widely usedin cybercrime is also meaningful for gaining insight into thecyber threats For vulnerability similarity analysis data imbal-

252 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

(a) AV DV T AT (P13) (b) V DV T (P10) (c) V DPDTV T (P16)

Figure 9 Illustration of the vulnerability similarity analysis based on different meta-paths in which vulnerability i can be reducedinto a two-dimensional space (xiyi) and each cluster indicates a particular type of vulnerability

ance issue affects the performance of model and inadequatetraining samples often result in model underfitting as shownin the case of cluster 8 and cluster 10

7 Related Work

Cyber Treat Intelligence An increasing number of securityvendors and researchers start exploring CTI for protectingsystem security and defending against new threat vectors [28]Existing CTI extraction tools such as IBM X-Force12 Threatcrowd13 Openctiio14 AlienVault15 CleanMX16 and Phish-Tank17 use regular expression to synthesize IOC from thedescriptive texts However these methods often producehigh false positive rate by misjudging legitimate entities asIOCs [22]

Recently Balzarotti et al [2] develop a system to extractIOCs from web pages and identify malicious URLs fromJavaScript codes Sabottke et al [31] propose to detect po-tential vulnerability exploits by extracting and analyzing thetweets that contain ldquoCVErdquo keyword Liao et al [22] present atool iACE for automatically extracting IOCs which excelsat processing technology articles Nevertheless iACE identi-fies IOCs from a single article which does not consider therich semantic information from multi-source texts Zhao etal [46] define different ontologies to describe the relationshipbetween entities based on expert knowledge Numerous popu-lar CTI platforms including IODEF [9] STIX [3] TAXII [40]OpenIOC [13] and CyBox [19] focus on extracting and shar-ing CTI Yet none of the existing approaches could uncoverthe interdependent relations among CTIs extracted from multi-source texts let alone quantifying CTIsrsquo relevance and miningvaluable threat intelligence hidden behind the isolated CTIs

12httpsexchangexforceibmcloudcom13httpswwwthreatcrowdorg14httpsdemoopenctiio15 httpsotxalienvaultcom16httplistclean-mxcom17httpswwwphishtankcom

Heterogeneous Information Network Real-world systemsoften contain a large number of interacting multi-typed ob-jects which can naturally be expressed as a heterogeneousinformation network (HIN) HIN as a conceptual graph repre-sentation can effectively fuse information and exploit richersemantics in interacting objects and links [37] HIN has beenapplied to network traffic analysis [38] public social mediadata analysis [45] and large-scale document analysis [41]Recent applications of HIN include mobile malware detec-tion [14] and opioid user identification [12] In this paper forthe first time we use HIN for CTI modelingGraph Convolutional Network Graph convolutional net-works (GCN) [17] has become an effective tool for address-ing the task of machine learning on graphs such as semi-supervised node classification [17] event classification [29]clustering [8] link prediction [27] and recommended sys-tem [44] Given a graph GCN can directly conduct the con-volutional operation on the graph to learn the nonlinear em-bedding of nodes In our work to discern and reveal the inter-active relationships between IOCs we utilize GCN to learnmore discriminative representation from attributes and graphstructure simultaneously which is the premise for threat intel-ligence computing

8 Discussion

Data Availability The proposed framework assumes thatsufficient threat description data can be obtained for generat-ing comprehensive and the latest CTIs Fortunately with thegrowing prosperity of social media an increasing number ofsecurity-related data (eg blogs posts news and open secu-rity databases) can be collected effortlessly To automaticallycollect security-related data we develop a crawler system tocollect threat description data from 73 international securitysources (eg blogs hacker forum posts security bulletinsetc) providing sufficient raw materials for generating cyberthreat intelligenceModel Extensibility In this paper 6 types of IOCs and 9

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 253

types of relationships are modeled in HINTI However ourproposed framework is extensible in which more types ofIOCs and relationships can be enrolled to represent richer andmore comprehensive threat information such as maliciousdomains phishing Emails attack tools their interactions etcHigh-level Semantic Relations In view of the computa-tional complexity of the model our threat intelligence comput-ing method focues on utilizing the meta-paths to quantify thesimilarity between IOCs while ignoring the influence of themeta-graph on it which inevitably misses characterizing somehigh-level semantic information Nevertheless the proposedcomputing framework introduces an attention mechanism tolearn the signification of different meta-paths to character-ize IOCs and their interactive relationships which effectivelycompensates for the performance degradation caused by ig-noring the meta-graphsSecurity Knowledge Reasoning Although our proposedframework exhibits promising results in CTI extraction andmodeling computing how to implement advanced securityknowledge reasoning and prediction is still an open problemeg it remains challenging to predict whether a vulnerabilitycould potentially affect a particular type of devices in thefuture We will investigate this problem in the future

9 Conclusion

This work explores a new direction of threat intelligence com-puting which aims to uncover new knowledge in the relation-ships among different threat vectors We propose HINTI acyber threat intelligence framework to model and quantify theinterdependent relationships among different types of IOCsby leveraging heterogeneous graph convolutional networksWe develop a multi-granular attention mechanism to learnthe importance of different features and model the interde-pendent relationships among IOCs using HIN We proposethe concept of threat intelligence computing and present ageneral intelligence computing framework based on graphconvolutional networks Experimental results demonstratethat the proposed multi-granular attention based IOC extrac-tion method outperforms the existing state-of-the-art methodsThe proposed threat intelligence computing framework caneffectively mine security knowledge hidden in the interdepen-dent relationships among IOCs which enables crucial threatintelligence applications such as threat profiling and rankingattack preference modeling and vulnerability similarity analy-sis We would like to emphasize that the knowledge discoveryamong interdependent CTIs is a new field that calls for acollaborative effort from security experts and data scientists

In future we plan to develop a predicative and reasoningmodel based on HINTI and explore preventative countermea-sures to protect cyber infrastructure from future threats Wealso plan to add more types of IOCs and relations to depicta more comprehensive threat landscape Moreover we willleverage both meta-paths and meta-graphs to characterize the

IOCs and their interactions to further improve the embeddingperformance and to strike a balance between the accuracyand computational complexity of the model We will alsoinvestigate the feasibility of security knowledge predictionbased on HINTI to infer the potential future relationshipsbetween the vulnerabilities and devices

Acknowledgement

We would like to thank our shepherd Tobias Fiebig andthe anonymous reviewers for providing valuable feedbackon our work We also thank Hao Peng and Lichao Sunfor their feedback on the early version of this work Thiswork was supported in part by National Science Founda-tion grants CNS1950171 CNS-1949753 It was also sup-ported by the NSFC for Innovative Research Group ScienceFund Project (62141003) National Key RampD Program China(2018YFB0803 503) the 2018 joint Research Foundationof Ministry of Education China Mobile (MCM20180507)and the Opening Project of Shanghai Trusted Industrial Con-trol Platform (TICPSH202003020-ZC) Any opinions find-ings and conclusions or recommendations expressed in thismaterial do not necessarily reflect the views of any fundingagencies

References

[1] Jure Leskovec Aditya Grover node2vec Scalable fea-ture learning for networks In Acm Sigkdd InternationalConference on Knowledge Discovery Data Mining2016

[2] Marco Balduzzi Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications In WWW 2016

[3] Sean Barnum Standardizing cyber threat intelligenceinformation with the structured threat information ex-pression (stix) Mitre Corporation 111ndash22 2012

[4] Eric W Burger Michael D Goodman Panos Kam-panakis and Kevin A Zhu Taxonomy model for cyberthreat intelligence information exchange technologiesIn Proceedings of the 2014 ACM Workshop on Infor-mation Sharing amp Collaborative Security pages 51ndash602014

[5] Onur Catakoglu Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications The web conference pages 333ndash3432016

[6] Danqi Chen and Christopher D Manning A fast and ac-curate dependency parser using neural networks pages740ndash750 2014

254 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

[7] Qian Chen and Robert A Bridges Automated behav-ioral analysis of malware a case study of wannacry ran-somware In 16th IEEE ICMLA pages 454ndash460 2017

[8] Weilin Chiang Xuanqing Liu Si Si Yang Li SamyBengio and Chojui Hsieh Cluster-gcn An efficient al-gorithm for training deep and large graph convolutionalnetworks knowledge discovery and data mining pages257ndash266 2019

[9] Roman Danyliw Jan Meijer and Yuri Demchenko Theincident object description exchange format Interna-tional Journal of High Performance Computing Appli-cations 50701ndash92 2007

[10] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 2017

[11] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 ACM 2017

[12] Yujie Fan Yiming Zhang Yanfang Ye and Xin Li Au-tomatic opioid user detection from twitter Transductiveensemble built on different meta-graph based similari-ties over heterogeneous information network In IJCAIpages 3357ndash3363 2018

[13] Fireeye Openioc httpswwwfireeyecomblogthreat-research201310openioc-basicshtml Accessed January 20 2020

[14] Shifu Hou Yanfang Ye Yangqiu Song and Melih Ab-dulhayoglu Hindroid An intelligent android malwaredetection system based on structured heterogeneous in-formation network In Proceedings of the 23rd ACMSIGKDD International Conference on Knowledge Dis-covery and Data Mining pages 1507ndash1515 2017

[15] Zhiheng Huang Wei Xu and Kai Yu Bidirectional lstm-crf models for sequence tagging arXiv1508019912015

[16] Michael D Iannacone Shawn Bohn Grant NakamuraJohn Gerth Kelly MT Huffer Robert A Bridges Erik MFerragut and John R Goodall Developing an ontologyfor cyber security knowledge graphs CISR 1512 2015

[17] Thomas Kipf and Max Welling Semi-supervised clas-sification with graph convolutional networks arXivLearning 2016

[18] Thomas N Kipf and Max Welling Semi-supervisedclassification with graph convolutional networks arXivpreprint arXiv160902907 2016

[19] Tero Kokkonen Architecture for the cyber security situ-ational awareness system In Internet of Things SmartSpaces and Next Generation Networks and Systemspages 294ndash302 Springer 2016

[20] Mehmet Necip Kurt Yasin Yılmaz and Xiaodong WangDistributed quickest detection of cyber-attacks in smartgrid IEEE Transactions on Information Forensics andSecurity 13(8)2015ndash2030 2018

[21] John Lafferty Andrew Mccallum and Fernando PereiraConditional random fields Probabilistic models for seg-menting and labeling sequence data ICML pages 282ndash289 2001

[22] Xiaojing Liao Yuan Kan Xiao Feng Wang Li Zhouand Raheem Beyah Acing the ioc game Toward auto-matic discovery and analysis of open-source cyber threatintelligence In ACM Sigsac Conference on ComputerCommunications Security 2016

[23] Rob Mcmillan Open threat intelligencehttpwwwgartnercomdoc2487216definition-threat-intelligence AccessedJanuary 20 2020

[24] Tomas Mikolov Kai Chen Greg Corrado and JeffreyDean Efficient estimation of word representations invector space arXiv preprint arXiv13013781 2013

[25] Sudip Mittal Prajit Kumar Das Varish Mulwad Anu-pam Joshi and Tim Finin Cybertwitter Using twitterto generate alerts for cybersecurity threats and vulnera-bilities In Proceedings of the 2016 IEEE Advances inSocial Networks Analysis and Mining pages 860ndash8672016

[26] Eric Nunes Ahmad Diab Andrew Gunn EricssonMarin Vineet Mishra Vivin Paliath John RobertsonJana Shakarian Amanda Thart and Paulo ShakarianDarknet and deepnet mining for proactive cybersecuritythreat intelligence In 2016 IEEE ISI pages 7ndash12 2016

[27] Shirui Pan Ruiqi Hu Guodong Long Jing Jiang LinaYao and Chengqi Zhang Adversarially regularizedgraph autoencoder for graph embedding pages 2609ndash2615 2018

[28] P Pawlinski P Jaroszewski P Kijewski L SiewierskiP Jacewicz P Zielony and R Zuber Actionable infor-mation for security incident response European UnionAgency for Network and Information Security Herak-lion Greece 2014

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 255

[29] Hao Peng Jianxin Li Qiran Gong Yangqiu SongYuanxing Ning Kunfeng Lai and Philip S Yu Fine-grained event categorization with heterogeneous graphconvolutional networks In Proceedings of the 28th In-ternational Joint Conference on Artificial Intelligencepages 3238ndash3245 AAAI Press 2019

[30] Sara Qamar Zahid Anwar Mohammad Ashiqur Rah-man Ehab Al-Shaer and Bei-Tseng Chu Data-drivenanalytics for cyber-threat intelligence and informationsharing Computers amp Security 6735ndash58 2017

[31] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In USENIXSecurity 2015

[32] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In 24thUSENIX Security pages 1041ndash1056 2015

[33] Deepak Sharma and Avadhesha Surolia Degree Cen-trality Springer New York 2013

[34] Saurabh Singh Pradip Kumar Sharma Seo Yeon MoonDaesung Moon and Jong Hyuk Park A comprehensivestudy on apt attacks and countermeasures for futurenetworks and communications challenges and solutionsJournal of Supercomputing pages 1ndash32 2016

[35] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks Principles and methodologiesSynthesis Lectures on Data Mining and Knowledge Dis-covery 3(2)1ndash159 2012

[36] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks a structural analysis approachACM SIGKDD 14(2)20ndash28 2013

[37] Yizhou Sun Jiawei Han Xifeng Yan Philip S Yu andTianyi Wu Pathsim Meta path-based top-k similaritysearch in heterogeneous information networks Proceed-ings of the VLDB Endowment 4(11)992ndash1003 2011

[38] Yizhou Sun Brandon Norick Jiawei Han Xifeng YanPhilip S Yu and Xiao Yu Pathselclus Integrating meta-path selection with user-guided object clustering in het-erogeneous information networks ACM Transactionson TKDD 7(3)11 2013

[39] Wiem Tounsi and Helmi Rais A survey on technicalthreat intelligence in the age of sophisticated cyber at-tacks Computers amp Security 72212ndash233 2018

[40] Thomas D Wagner Esther Palomar Khaled Mahbuband Ali E Abdallah Towards an anonymity supportedplatform for shared cyber threat intelligence risks andsecurity of internet and systems pages 175ndash183 2017

[41] Chenguang Wang Yangqiu Song Haoran Li MingZhang and Jiawei Han Text classification with het-erogeneous information network kernels In 13th AAAI2016

[42] Xiao Wang Houye Ji Chuan Shi Bai Wang Peng CuiP Yu and Yanfang Ye Heterogeneous graph attentionnetwork 2019

[43] Jie Yang Shuailong Liang and Yue Zhang Design chal-lenges and misconceptions in neural sequence labelingarXiv Computation and Language 2018

[44] Rex Ying Ruining He Kaifeng Chen Pong Eksombat-chai William L Hamilton and Jure Leskovec Graphconvolutional neural networks for web-scale recom-mender systems In Proceedings of the 24th ACMSIGKDD International Conference on Knowledge Dis-covery amp Data Mining pages 974ndash983 ACM 2018

[45] Jiawei Zhang Xiangnan Kong and Philip S Yu Trans-ferring heterogeneous links across location-based socialnetworks In The 7th ACM international conference onWeb search and data mining pages 303ndash312 2014

[46] Yishuai Zhao Bo Lang and Ming Liu Ontology-basedunified model for heterogeneous threat intelligence inte-gration and sharing In 2017 11th IEEE InternationalConference on Anti-counterfeiting Security and Identi-fication (ASID) pages 11ndash15 2017

256 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

  • Introduction
  • Background
    • Cyber Threat Intelligence
    • Motivation
    • Preliminaries
      • Architecture Overview of HINTI
      • Methodology
        • Multi-granular Attention Based IOC Extraction
        • Cyber Threat Intelligence Modeling
        • Threat Intelligence Computing
          • Experimental Evaluation
            • Dataset and Settings
            • Evaluation of IOC Extraction
              • Application of Threat Intelligence Computing
                • Threat Profiling and Significance Ranking of IOCs
                • Attack Preference Modeling
                • Vulnerability Similarity Analysis
                  • Related Work
                  • Discussion
                  • Conclusion
Page 2: Cyber Threat Intelligence Modeling Based on Heterogeneous ... · 2.1Cyber Threat Intelligence Cyber Threat Intelligence (CTI) extracted from security-related data is structured information

Then HINTI leverages HIN to model the interdependent re-lationships among heterogeneous IOCs which can depict amore comprehensive picture of threat events Moreover wepropose a novel CTI computing framework to quantify theinterdependent relationships among IOCs which helps un-cover novel security insights In short the main contributionsof this paper are summarized as follows

bull Multi-granular Attention based IOC RecognitionWe propose multi-granular attention based IOC recogni-tion approach to automatically extract cyber threat ob-jects from multi-source threat texts which can learn thesignificance of features with different scales Our modeloutperforms the state-of-the-art methods in terms of IOCrecognition accuracy and recall In total we extract over397730 IOCs from the unstructured threat descriptions

bull Heterogeneous Threat Intelligence Modeling Wemodel different types of IOCs using heterogeneous infor-mation network (HIN) which introduces various meta-paths to capture the interdependent relationships amongheterogeneous IOCs while depicting a more comprehen-sive landscape of cyber threat events

bull Threat Intelligence Computing Framework We arethe first to present the concept of cyber threat intelligencecomputing and design a general computing frameworkas shown in Figure 5 The framework first utilizes aweight-learning based node similarity measure to quan-tify the interdependent relationships between heteroge-neous IOCs and then leverages attention mechanismbased heterogeneous graph convolutional networks toembed the IOCs and their interactive relations

bull Threat Intelligence Prototype System To evaluate theeffectiveness of HINTI we implement a CTI prototypesystem Our system has identified 1262258 relation-ships among 6 types of IOCs including attackers vul-nerabilities malicious files attack types devices andplatforms based on which we further assess the real-world applicability of HINTI using three real-world ap-plications IOC significance ranking attack preferencemodeling and vulnerability similarity analysis

2 Background

21 Cyber Threat IntelligenceCyber Threat Intelligence (CTI) extracted from security-related data is structured information used to proactively resistcyber attacks CTI consists of reasoning context mechanismindicators implications and actionable advice about an ex-isting or evolving cyber attack that can be used to createpreventive measures in advance [30] CTI allows subscribersto expand their visibility into the fast-growing threat land-scape and enable early identification and prevention of a

cyber threat Take WannaCry virus as an example if securityguards can timely capture the threat intelligence that indicatesldquoWannacry permeates port 445 to attack systems the mali-cious intrusion can be easily blocked by locking down port445 which is the most direct and effective way of combatingWannaCry virus [7]

Meanwhile social media (eg Blog Twitter) has increas-ingly become an effective medium for exchanging and spread-ing cyber security information on which security experts andorganizations often post their discoveries to reach a wideraudience promptly [32] These posts usually include a mag-nitude of valuable security-related information [25 26] suchas the warnings regarding latest vulnerabilities hacking toolsdata breaches and existing or upcoming software patchesproviding one of the main raw materials for extracting CTIs

Early CTI extraction requires extensive manual inspec-tion of the threat descriptions which becomes rather time-consuming given the enormous volume of such descrip-tions To facilitate the automatic generation and sharing ofCTI a large volume of methods and frameworks are es-tablished such as IODEF [13] STIX [4] TAXII [36] Ope-nIOC [10] and CyBox [28] CleanMX PhishTank IOC Finderand [2223146] The majority of existing methods and frame-works leverage regular expressions to extract IOCs whichmay suffer from a low accuracy due to their inability in pre-defining a comprehensive set of the IOC rules

22 Motivation

The main goal of this research is to address the limitationsof the existing CTI analytics frameworks by modeling theinterdependent relationships among heterogeneous IOCs Asa motivating example given a security-related post ldquoLastweek Lotus exploited CVE-2017-0143 vulnerability to affecta larger number of Vista SP2 and Win7 SP devices in IranCVE-2017-0143 is a remote code execution vulnerability in-cluding a malicious file SMBbatrdquo Most of the existing CTIframeworks can extract specific IOCs but neglect the rela-tionships among them as shown in Figure 1 It is obviousthat such IOCs could not draw a comprehensive picture ofthe threat landscape let alone quantifying their interactiverelationships for in-depth security investigation

Different from the existing CTI frameworks HINTI aimsto implement a computational CTI framework which can notonly extract IOCs efficiently but also model and quantify therelationships between them Here we use the motivating ex-ample to illustrate how HINTI works step-by-step in practiceas follows

(i) First the security-related post is annotated by the B-I-O sequence tagging method [43] as shown in Figure 2where B-X indicates that the element of type X is located atthe beginning of the fragment I-X means that the elementbelonging to type X is located in the middle of the fragmentand O represents a non-essential element of other types In this

242 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

Figure 1 An example of extracted IOCs without any relationsamong them

research we annotated 30000 such training samples from5000 threat description texts which are the raw materialsused to build our IOC extraction model

Figure 2 An annotation example with the B-I-O taggingmethod

(ii) The labeled training samples are then fed into the pro-posed neural network architecture as shown in Figure 6 totrain our proposed IOC extraction model As a result HINTIhas the ability to accurately identify and extract IOCs (egLotus SMBbat) using the proposed multi-granular attentionbased IOC extraction method (see Section 41 for details)

(iii) HINTI then utilizes the syntactic dependency parser[6] (eg subject-predicate-object attributive clause etc) toextract associated relationships between IOCs each of whichis represented as a triple (IOCirelation IOC j) In this moti-vating example HINTI extracts the relationship triples involv-ing (LotusexploitCV E minus 2017minus 0143) (CV E minus 2017minus0143a f f ectVistaSP2) etc Note that the extracted rela-tional triples can be incrementally pooled into an HIN tomodel the interactions among IOCs for depicting a morecomprehensive threat landscape Figure 3 shows a miniaturegraphic representation describing interactive relations amongIOCs extracted from the example Compared with Figure 1 itis obvious that HINTI can depict a more intuitive and compre-hensive threat landscape than the previous approaches In thispaper we mainly consider 9 relationships (R1simR9) among 6different types of IOCs (see Section 42 for details)

(iv) Finally HINTI integrates a CTI computing framework

Figure 3 A miniature of a constructed CTI includes attackervulnerability malicious file attack type device and platformwhich describes a particular threat an attacker utilizes CVE-2017-0143 vulnerability to invade Vista SP2 and Win7 SP1devices CVE-2017-0143 is a remote code execution vulnera-bility that involves a malicious file ldquoSMBbat

based on heterogeneous graph convolutional networks (seeSection 43) to effectively quantify the relationships amongIOCs for knowledge discovery Particularly our proposedCTI computing framework characterizes IOCs and their re-lationships in a low-dimensional embedding space based onwhich CTI subscribers can use any classification (eg SVMNaive Bayes) or clustering algorithms (K-Means DBSCAN)to gain new threat insights such as predicting which attack-ers are likely to intrude their systems and identifying whichvulnerabilities belong to the same category without the expertknowledge In this work we mainly explore three real-worldapplications to verify the effectiveness and efficiency of theCTI computing framework IOC significance ranking (seeSection 61) attack preference modeling (see Section 62)and vulnerability similarity analysis (see Section 63)

23 Preliminaries

In this paper we use heterogeneous information net-work (HIN) to model the relationships among IOCs Here wefirst introduce the preliminary knowledge about HIN

Definition 1 Heterogeneous Information Network ofThreat Intelligence (HINTI) is defined as a directed graphG = (VET ) with an object type mapping function ϕ VrarrMand a link type mapping function Ψ ErarrR Each object visin V belongs to one particular object type in the object typeset M ϕ(v) isinM and each link e isin E belongs to a particularrelation type in the relation type set R Ψ(e)isinR T denotesthe types of nodes and relationships

In this paper we focus on 6 common types of IOCs at-tacker (A) vulnerability (V) device (D) platform (P) mali-cious file (F) and attack type (T) and the links connectingdifferent objects represent different semantic relationshipsTo better understand the object types and relationship types inHINTI it is imperative to provide the meta-level (ie schema-level) description of the network Consequently we introduce

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 243

(a) Network schema (b) Network instance

Figure 4 Network schema and instance of HIN containing 6 types of IOCs (a) The network schema of HIN which depicts

the relationship template among different types of IOCs such as Devicebelongminusminusminusrarr Plat f orm (b) An instance of network schema

which describes the concrete relationships between IOCs by following a network schema eg O f f ice 2012belongminusminusminusrarrWindows

the network schema [37] for describing the meta-level rela-tionships

Definition 2 Network Schema The network schema ofHINTI denoted as HS = (AR) is a meta template for G =(VET ) with the object type mapping ϕ VrarrM and the linktype mapping Φ ErarrR It is a directed graph of object typesM with edges representing relations from R

The schema of HINTI specifies type constraints on the sets ofIOCs and their relationships Figure 4 (a) shows the networkschema of HINTI which defines the relationship templatesamong IOCs to effectively guide the semantic exploration inHINTI For example for a relationship that describes ldquoat-tackers invade devices the semantic schema can be writtenas attacker invademinusminusminusrarrdevice Figure 4 (b) presents a concreteinstance of the network schema

Definition 3 Meta-path A meta-path [37] P is a path se-quence defined on a network schema S = (NR) and is repre-

sented in the form of N1R1minusrarrN2

R2minusrarr middotmiddot middot RiminusrarrNi+1 which definesa composite relation R = R1 R2 middot middot middot Ri+1 where denotesthe composition operator on relations A meta-path P is asymmetric path when the relation R defined by the path issymmetric (ie P is equal to Pminus1)

Table 1 displays the meta-paths considered in HINTI Forexample the relationship ldquothe attackers (A) exploit the samevulnerability (V) can be described by a length-2 meta-path

attackerexploitminusminusminusminusrarr vulnerability

exploitminusminusminusminusminusminusrarr attacker denoted asAVAT (P4) which means that the two attackers exploit thesame vulnerability Similarly AV DPDTV T AT (P17) portraysa close relationship between IOCs that ldquotwo attackers wholeverage the same vulnerability invade the same type of deviceand ultimately destroy the same type of platform

Table 1 Meta-paths used in HINTI

ID Meta-path

P1 Attacker-Attacker

P2 Device-Device

P3 Vulnerability-Vulnerability

P4 Attacker-Vulnerability-Attacker

P5 Attacker-Device-Attacker

P6 Device-File-Device

P7 Device-Platform-Device

P8 Vulnerability-File-Vulnerability

P9 Vulnerability-Type-Vulnerability

P10 Vulnerability-Device-Vulnerability

P11 Vulnerability-Platform-Vulnerability

P12 Attacker-Device-Platform-Device-Attacker

P13 Attacker-Vul-Device-Vul-Attacker

P14 Attacker-Vul-Platform-Vul-Attacker

P15 Attacker-Vul-Type-Vul-Attacker

P16 Vul-Device-Platform-Device-Vul

P17 Attacker-Vul-Device-Platform-Device-Vul-Attacker

3 Architecture Overview of HINTI

HINTI as a cyber threat intelligence extraction and comput-ing framework is capable of effectively extracting IOCs fromthreat-related descriptions and formalizing the relationshipsamong heterogeneous IOCs to demystify new threat insightsAs shown in Figure 5 HINTI consists of four major compo-nents including

bull Data Collection and IOC Recognition We first de-

244 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

Figure 5 The overall architecture of HINTI HINTI consists of four major components (a) collecting security-related data andextracting threat objects (ie IOCs) (b) modeling interdependent relationships among IOCs into a heterogeneous informationnetwork (c) embedding nodes into a low-dimensional vector space using weight-learning based similarity measure and (d)computing threat intelligence based on graph convolutional networks and knowledge mining

velop a data collection system to automatically capturesecurity-related data from blogs hacker forum posts se-curity news and security bulletins The system utilizesa breadth-first search to collect the HTML source codeand then leverages Xpath (XML Path language) to ex-tract threat-related descriptions After that we utilize amulti-granular attention based IOC recognition methodto extract IOC from the collected threat-related descrip-tions (see Section 41 for details)

bull Relation Extraction and IOC modeling HINTI ad-dresses the challenge of CTI modeling by leveragingheterogeneous information networks which can natu-rally depict the interdependent relationships betweenheterogeneous IOCs As an example Figure 4 shows amodel that capture the interactive relationships among at-tacker vulnerability malicious file attack type platformand device (see Section 42 for details)

bull Meta-path Design and Similarity Measure Meta-path is an effective tool to express the semantic rela-tions among IOCs in constructed HIN For instance

attackerexploitminusminusminusminusrarr vulnerability

exploitminusminusminusminusminusminusrarr attacker indi-cates that two attackers are related by exploiting the samevulnerability We design 17 types of meta-paths (SeeTable 1) to describe the interdependent relationships be-tween IOCs With these meta-paths we present a weight-learning based node similarity computing approach toquantify and embed the relationships as the premise forthreat intelligence computing

bull Threat Computing and Knowledge Mining In thiscomponent an effective threat intelligence computingframework is proposed which can quantify and measure

the relevance among IOCs by leveraging graph convolu-tional network (GCN) Our proposed threat intelligencecomputing framework could reveal richer security knowl-edge within a more comprehensive threat landscape

4 Methodology

41 Multi-granular Attention Based IOC Ex-traction

Extracting IOCs from multi-source threat texts is one of themajor tasks of threat intelligence analytics and the qualityof the extracted IOCs significantly influences the analysisresults of cyber threats Recently Bidirectional Long Short-Term Memory+Conditional Random Fields (BiLSTM+CRF)model [15] has demonstrated excellent performance in textchunking and Named-entity Recognition (NER) Howeverdirectly applying this model to IOC extraction is unlikely tosucceed since threat texts usually contain a large number ofthreat objects with different grams and irregular structuresConsequently we need an efficient method to learn the dis-criminative characteristics of IOCs with different sizes In thispaper we propose a multi-granular attention based IOC extrac-tion method which can extract threat objects with differentgranularity Particularly Figure 6 presents the proposed IOCextraction framework which leverages the multi-granular at-tention mechanism to characterize IOCs Different from thetraditional BiLSTM+CRF model we introduce new word-embedding features with different granularities to capture thecharacteristics of IOCs with different sizes Furthermore weutilize a self-attention mechanism to learn the importance ofthe features to improve the accuracy of IOC extraction

Our proposed method takes a threat description sentenceX = (x1x2 middot middot middot xi) as input where xi represents i-th word

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 245

Figure 6 The framework of multi-granular IOC extraction

in X We first chunk the sentence into n-gram componentsincluding char-level 1-gram 2-gram and 3-gram which arethe inputs of our trained model written as follows

e jxi=V j

embedding(xi) (1)

where V jembedding transforms the chunk with granularity j into

a vector space and xi is the i-th word in a sentence X Thusthe threat description sentence Xi can be vectorized as follows

rarrh j

i = LST M f orward([e jx0e j

x1 middot middot middot e j

xi])

larrh j

i = LST Mbackward([e jx0e j

x1 middot middot middot e j

xi])

(2)

whererarrh j

i andlarrh j

i are the embedded features learned byforward LSTM and backward LSTM respectively Let O bethe output of Bi-LSTM which is a weighted sum of embeddedfeatures with weights corresponding to the importance ofdifferent features

O = H middotW T (3)

where H =j

sum~βiσ(h

j1h

j2 middot middot middot h

ji ) h j

i = (rarrh j

i +larrh j

i ) ~βi is theweight vector to represent the importance of h j

i in whichj and i are the segmentation granularity of sentences and thecorresponding index of the chunk W is the parameter matrix

Given a security-related sentence X = (x1x2 middot middot middot xi) itscorresponding threat object sequence Y = (y1 y2 middot middot middot yi) andits output of Bi-LSTM O we can compute the overall labelscore of X and Y as follows

S(X Y ) =n

sumi=0

(Ayiyi+1 +Oiyi) (4)

where Ayiyi+1 is the state transition matrix in CRF model andOiyi as the output of Bi-LSTM hidden layer (calculated byEq (3)) represents the label score of i-th word corresponding

to the type yi Next we utilize so f tmax function to normalizethe overall label score

p(Y |X) =eS(X Y )

sumyisinYX

eS(X Y )(5)

We design an objective function to maximize the proba-bility p(Y |X) to achieve the highest label score for differentIOCs which can be written as follows

argmax log(p(Y |X)) = argmax (S(X Y )minus

log( sumyisinYX

eS(X y))) (6)

By solving the objective function above we assign correctlabels to the n-gram components according to which we canidentify the IOCs with different lengths Our multi-granularattention based IOC extraction method is capable of identify-ing different types of IOCs and its evaluation is presented inSection 5

42 Cyber Threat Intelligence ModelingCTI modeling is an important step to explore the intricaterelationship between heterogeneous IOCs In our work HINis introduced to group different types of IOCs into a graphto explore their interactive relationships In this section weportray the main principle of threat intelligence modeling

To model the intricate interdependent relationships amongIOCs we define the following 9 relationships among 6 typesof IOCs as follows

bull R1 To depict the relation of an attacker and the ex-ploited vulnerability we construct the attacker-exploit-vulnerability matrix A For each element Ai j isin 01Ai j=1 indicates attacker i exploits vulnerability j

bull R2 To depict the relation of an attacker and a devicewe build the attacker-invade-device matrix D For eachelement Di j isin 01 Di j=1 indicates attacker i invadesdevice j

bull R3 Two attacker can cooperate to attack a target Tostudy the relationship of attacker-attacker we constructthe attacker-cooperate-attacker matrix C For each ele-ment Ci j isin 01 Ci j=1 indicates there exists a cooper-ative relationship between attacker i and j

bull R4 To describe the relation of a vulnerability and theaffected device we build the vulnerability-affect-devicematrix M For each element Mi j isin 01 Mi j=1 indi-cates vulnerability i affects device j

bull R5 A vulnerability is often labeled as a specific attacktype by Common Vulnerabilities and Exposures (CVE)

246 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

system7 To explore the relation of vulnerability-attacktype we build the vulnerability-belong-attack type ma-trix G where each element Gi j isin 01 denotes if vul-nerability i belongs to an attack type j

bull R6 A vulnerability often involves one or more maliciousfiles To describe the relation of vulnerability-file webuild the vulnerability-include-file matrix F For eachelement Fi j isin 01 Fi j=1 denotes that vulnerability iincludes malicious file j

bull R7 A malicious file often targets a specific device Weestablish the file-target-device matrix T to explore therelation of file-device For each element Ti j isin 01Ti j=1 indicates malicious file i targets device j

bull R8 Oftentimes a vulnerability evolves from anotherTo study the relationship of vulnerability-vulnerabilitywe build the vulnerability-evolve-vulnerability matrixE where each element Ei j isin 01 indicates if vulnera-bility i evolves from vulnerability j

bull R9 To depict the relation device-platform that a de-vice belongs to a platform we build the device-belong-platform matrix P where each element Pi j isin 01 il-lustrates if device i belongs to platform j

Based on the above 9 types of relationships HINTIleverages the syntactic dependency parser [6] (eg subject-predicate-object attributive clause etc) to automatically ex-tract the 9 relationships among IOCs from threat descriptionseach of which is represented as a triple (IOCirelation IOC j)For instance given a security-related description ldquoOn May12 2017 WannaCry exploited the MS17-010 vulnerabilityto affect a larger number of Windows devices which is aransomware attack via encrypted disks Using the syntacticdependency parser we can extract the following triples (Wan-naCry exploit MS17-010) (MS17-010 affect Windows de-vice) (WannaCry is ransomware) Such triples extractedfrom various data sources can be incrementally assembledinto HINTI to model the relationships among IOCs whichcould offer a more comprehensive threat landscape that de-scribes the threat context Particularly we further define 17types of meta-paths shown in Table 1 to probe into the interde-pendent relationships over attackers vulnerabilities maliciousfiles attack types devices and platforms HINTI is able toconvey a richer context of threat events by scrutinizing 17types of meta-paths and reveal the in-depth threat insightsbehind the heterogeneous IOCs (see Section 6 for details)

43 Threat Intelligence ComputingIn this section we illustrate the concept of threat intelligencecomputing and design a general threat intelligence computing

7httpcvemitreorg

framework based on heterogeneous graph convolutional net-works which quantifies and measures the relevance betweenIOCs by analyzing meta-path based semantic similarity Herewe first provide a formal definition of threat intelligence com-puting based on heterogeneous graph convolutional networks

Definition 4 Threat Intelligence Computing Based onHeterogeneous Graph Convolutional Networks Given thethreat intelligence graph G = (VE) the meta-path set M =P1P2 middot middot middot Pi The threat intelligence computing i) com-putes the similarity between IOCs based on meta-path Pi togenerate corresponding adjacency matrix Ai ii) constructsthe feature matrix of nodes Xi by embedding attribute in-formation of IOCs into a latent vector space iii) conductsgraph convolution GCN(AiXi) to quantify the interdependentrelationships between IOCs by following meta-path Pi andembeds them into a low-dimensional space

The threat intelligence computing aims to model the seman-tic relationships between IOCs and measure their similaritybased on meta-paths which can be used for advanced secu-rity knowledge discovery such as threat object classificationthreat type matching threat evolution analysis etc Intuitivelythe objects connected by the most significant meta-paths tendto bear more similarity [37] In this paper we propose aweight-learning based threat intelligence similarity measurewhich uses self-attention to improve the performance of simi-larity measurement between any two IOCs This method canbe formalized as below

Definition 5 Weight-learning based Node Similar-ity Measure Given a set of symmetric meta-path setP = [Pm]

Mprime

m=1 the similarity S(hih j) between any two IOCshi and h j is defined as

S(hih j) =Mprime

summ

rarrw

2middot | hirarr j isin Pm || hirarri isin Pm |+ | h jrarr j isin Pm |

(7)

where hirarr j isin hm is a path instance between IOC hi andh j following meta-path Pm hirarri isin Pm is that between IOCinstance hi and hi and h jrarr j isin Pm is that between IOC in-stance h j and h j where | hirarr j isin Pm |=CPm(i j) | hirarri isinPm |=CPm(i i) | h jrarr j isin Pm |=CPm( j j) and CPm is thecommuting matrix based on meta-path Pm defined below

rarrw =

[w1 wm wMprime ] denote the meta-path weights wherewm is the weight of meta-paths Pm and M

primeis the number of

meta-pathsS(hih j) is defined in two parts (1) the semantic overlap

in the numerator which describes the number of meta-pathbetween IOC instance hi and h j (2) and the semantic broad-ness in the denominator which depicts the number of totalmeta-paths between themselves The larger number of meta-path between IOC instance hi and h j the more similar thetwo IOCs are which is normalized by the semantic broadness

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 247

of denominator Moreover different from existing similaritymeasures [37] we propose an attention mechanism basedsimilarity measure method by introducing the weight vec-torrarrw = [w1 wm wMprime ] which is a trainable coefficient

vector to learn the importance of different meta-paths forcharacterizing IOCs

Obviously it is computationally expensive to measure thesimilarity among IOCs in the constructed heterogeneousgraph as it usually requires to randomly walk a larger numberof nodes in the graph Fortunately in our work it is unneces-sary to walk through the entire graph as we prescribe a limitby introducing predefined meta-paths and we only focus onthe symmetrical meta-paths presented in Table 1 To calcu-late the similarity between IOCs under different meta-pathinstances we need to compute the corresponding commutingmatrices [37] following the meta-paths

Given a meta-path set P = sumMprime

m A1A2 middot middot middot Al+1 themeta-path based commuting matrix can be defined as CP =UA1A2 UA2A3 middot middot middot UAAl+1 where CP(i j) represents the prob-ability of object iisinA1 reaching object j isinAl+1 under the pathP and is a connection operation These symmetric meta-paths not only efficiently reduce the complexity of walkingbut also ensures that the commuting matrix can be easily de-composed which greatly reduces the computational costs Inaddition the symmetric meta-paths in the graph G allow us toleverage the pairwise random-walk [37] to further acceleratethe computation

With Eq (7) and pairwise random-walk we can obtain thesimilarity embedding between any two IOCs hi and h j undera meta-path set P Based on the low-dimensional similarityembedding we derive a weighted adjacent matrix betweenIOCs denoted as Ai isin RNtimesN where N is the number of aspecific type of IOC in G Meanwhile to utilize the attributedinformation of nodes we train a word2vec model [24] toembed the attribute information of nodes into a feature matrixXi isin RNtimesd where N is the number of IOCs in Ai and d is thedimension of node feature With the learned adjacency matrixAi and its feature matrix Xi we can leverage the classicalGCN [18] to characterize the relationship between IOC hiand h j Particularly the layer-wise propagation rule of GCNcan be defined as below

H(l+1) = σ(Dminus12 ADminus

12 H(l)W (l)) (8)

where A = A+ IN is the adjacency matrix of IOCs with self-connections IN is the identity matrix Dii =sum j Ai j and W (l) isa l-th layer trainable weight matrix σ(middot) denotes an activationfunction such as relu H(l) isin RNtimesd is the matrix of activationin the l-th layer We perform graph convolution [18] on Aiand Xi to generate the embedding Z between IOCs belongingto type i

Z = f (XiAi) = σ(Ai middot relu(AiXiW(0)i )W (1)

i ) (9)

where W (0)i isin RdtimesH is an input-to-hidden weight matrix for a

hidden layer with H feature mapsW (1)i isinRHtimesF is a hidden-to-

output weight matrix with F feature maps in the output layerXi isin RNtimesd N is the number of a specific type of IOCs d is thedimension of their corresponding features and σ is anotheractivation function such as sigmoid Ai = Dminus

12 AiDminus

12 can be

calculated offline Here we leverage the cross-entropy loss tooptimize the performance of our proposed threat intelligenceframework written as follows

Loss(Yl f Zl f ) =minussumlisinYl

F

sumf=1

Yl f middot lnZl f (10)

where Yl is the set of node indices that have labels Yl f is thereal label and Zl f is a corresponding label that our modelpredicts Based on Eq (10) we conduct stochastic gradientdescent to continuously optimize the neural network weightsW (0)

i W (1)i and

rarrw to reduce the loss and build a general

threat intelligence computing framework Using this frame-work security organizations are able to mine richer securityknowledge hidden in the interdependent relationships amongIOCs

5 Experimental Evaluation

51 Dataset and SettingsWe develop a threat data collector to automatically collectcyber threat data from a set of sources including 73 inter-national security blogs (eg fireeye cloudflare) hacker fo-rum posts (eg Blackhat Hack5) security bulletins (egMicrosoft Cisco) CVE detail description and ExploitDB Acomplete list of data sources is presented in the Baidu cloud8We set up a daemon to collect the newly generated securityevents every day So far more than 245786 security-relateddata describing threat events have been collected For trainingand evaluating our proposed IOC extraction method 30000samples from 5000 texts are annotated by utilizing the B-I-Osequence tagging method (see Section 22 for the example)and an annotation example is shown in Figure 2

For 30000 labeled samples we randomly select 60 ofsamples as a training set 20 of samples as a verificationset and the rest of the samples as our test set Based on thedata sets we comprehensively evaluate the performance ofHINTI for extracting IOCs and threat intelligence computingWe run all of the experiments on 16 cores Intel(R) Core(TM)i7-6700 CPU 340GHz with 64GB RAM and 4times NVIDIATesla K80 GPU The software programs are executed on theTensorFlow-GPU framework on Ubuntu 1604

52 Evaluation of IOC ExtractionA set of experiments are conducted to evaluate the sensitiv-ity of different parameters in the multi-granular based IOC

8httpspanbaiducoms1J631WMYY_T_awa8aY5xy3A

248 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

extraction model We mainly consider 8 hyper-parametersthat seriously impact the performance of the model as shownin Table 2 More specifically Embedding_dim is one ofthe most important factors that determine the generaliza-tion capability of the model Here we fix other parame-ters while fine-tuning the embedding size in the range of(50100150200250300350400) Experimental resultsshow that the accuracy of extracted IOC achieves the bestwhen Embedding_dim=300 Learning_rate is another ma-jor factor for determining the stride of gradient descent inminimizing the loss function which determines whetherthe model can find a global optimal solution We fix otherparameters to fine-tune the Learning_rate in the rangeof (000100050010050105) and the performancereaches the best when the Learning_rate=0001 Similarlywe fine-tune the other hyper-parameters with 5000 epochsand the hyper-parameters allowing our model to perform op-timally are recorded in Table 2

Table 2 Hyperparameters setting in the multi-granular basedIOC extraction method

Parameter value Parameter Value

Embedding_dim 300 Hidden_dim 128

Sequence_length 500 Epoch_num 5000

Learning_rate 0001 Batch_size 64

Dropout_rate 05 Optimizer Adam

Table 3 Performance of IOC extraction wrt IOC types

IOC Type Precision Recall Micro-F1

IP 9956 9952 9954File 9436 9688 9560Type 9986 9981 9983

Email 9932 9987 9949Device 9326 9278 9302Vender 9307 9445 9424Version 9698 9799 9748Domain 9658 9589 9623Software 8825 8931 8878Function 9503 9559 9531Platform 9431 9257 9343Malware 8976 9123 9049

Vulnerability 9925 9873 9899Other 9829 9842 9835

In this paper we extract 13 major types of IOCs and the

Figure 7 Performance of IOC extraction using embeddingfeatures with different granularity

performance is presented in Table 3 Overall our IOC ex-traction method demonstrates excellent performance in termsof precision recall and Micro-F1 (ie micro-averaged F1-score) for most types of IOCs such as function maliciousIP and device However we observe a performance degra-dation when recognizing software and malware This can beattributed to the fact that most software and malware is namedby random strings such as md5 hash Moreover we find thatthe number of training samples impacts the performance ofthe model Specifically the performance becomes unsatisfac-tory (eg Software Malware) when the number of a certaintype of training samples is insufficient (ie less than 5000)

In order to verify the effectiveness of multi-granular embed-ding features we assess the performance of IOC extractionwith features of different granularity including char-level1-gram 2-gram 3-gram and multi-granular features The ex-perimental results are demonstrated in Figure 7 from whichwe can observe that the proposed multi-granular embeddingfeature outperforms others since it leverages the attentionmechanism to simultaneously learn multi-granular features tocharacterize different patterns of IOCs

Next to verify the effectiveness of the proposed IOC ex-traction method we compare it with the state-of-the-art en-tity recognition approaches including general NER toolsNLTK NER9 and Stanford NER10 professional IOC extrac-tion method Stucco [16] and iACE [22] and popular en-tity recognition approaches CRF [21] BiLSTM and BiL-STM+CRF [15] The experimental results of different meth-ods on real-world data are demonstrated in Table 4

The results indicate that our proposed IOC extraction out-performs the state-of-the-art entity recognition methods andtools in terms of precision recall and Micro-F1 and its im-provement can be attributed to the following factors Firstcompared with Standford NER and NLTK NER the NLP tools

9httpwwwnltkorgbookch07html10httpsstanfordnlpgithubioCoreNLPnerhtml

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 249

Table 4 Performance of threat entity recognition usingdifferent methods

Method Accuracy Precision Micro-F1

NLTK NER 6945 6851 6749

Stanford NER 6835 6674 6858

iACE 9214 9126 9225

Stucco 9116 9221 9147

CRF 9264 9180 9265

BiLSTM 9478 9521 9435

BiLSTM+CRF 9638 9642 9627

Multi-granular 9859 9872 9869

trained with general news corpora our method uses a security-related training corpus collected and labeled by ourselves asa data source for training our model Second different fromthe rule-based extraction approaches (eg iACE and Stucco)our proposed deep learning based method provides an end-to-end system with more advanced features to represent variousIOCs Third comparing to RNN-based methods (eg BiL-STM and BiLSTM+CRF) our proposed method brings inmulti-granular embedding sizes (char-level 1-gram 2-gramand 3-gram) to simultaneously learn the characteristics of vari-ous sizes and types of IOCs which can identify more complexand irregular IOCs Last but not the least our method imple-ments an attention mechanism to learn the weights of featureswith various scales to effectively characterize different typesof IOCs further enhancing the IOC recognition accuracy

6 Application of Threat Intelligence Comput-ing

Our proposed threat intelligence computing framework basedon heterogeneous graph convolutional networks can be usedto mine novel security knowledge behind heterogeneous IOCsIn this section we evaluate its effectiveness and applicabilityusing three real-world applications profiling and ranking forCTIs attack preference modeling and vulnerability similarityanalysis

61 Threat Profiling and Significance Rankingof IOCs

Due to the disparity in the significance of threats it is im-portant to derive the threat profile and rank the significanceof IOCs for demystifying the landscape of threats Howevermost of the existing CTIs are incapable of modeling the asso-ciated relationships between heterogeneous IOCs

Different from isolated CTIs HINTI leverages HIN to

model the interdependent relationships among IOCs withtwo characteristics first the isolated IOCs can be integratedinto a graph-based HIN to clearly display the associated rela-tionships among IOCs which is capable of directly depictingthe basic threat profile For example Figure 3 depicts a threatprofiling sample an attacker utilizes CVE-2017-0143 vulnera-bility to invade Vista SP2 and Win7 SP1 devices belonging tothe Microsoft platform and CVE-2017-0143 is a remote codeexecution vulnerability that uses a ldquoSMBbat malicious fileSecond the significance of IOCs in HINTI can be naturallyranked based on the proposed threat intelligence computingframework

Table 5 shows the top 5 authoritative ranking score [35] ofvulnerability attacker attack type and platform from whichsecurity experts can gain a clear insight into the impact ofeach IOC Degree centrality [33] which describes the numberof links incident upon a node is widely used in evaluatingthe importance of a node in a graph It can used to quantifythe immediate risk of a node that connects with other nodesfor delivering network flows such as virus spreading Heredegree centrality can be utilized in verifying the effectivenessof the proposed threat intelligence computing framework inranking the importance of IOCs It is worth noting that bothour ranking method and degree centrality work regardless ofthe time of attacks We compute the degree centrality rank-ing of IOCs based on the fact that the node with a higherdegree centrality is more important than a node with a lowerone For instance if the degree centrality of a vulnerabilityis higher it indicates that this vulnerability is exploited bymore attackers or it affects more devices The ranking resultof degree centrality shown in Table 5 is consistent with theranking result based on the proposed threat intelligence com-puting framework demonstrating the capability of the CTIcomputing framework in ranking the importance of differenttypes of IOCs

62 Attack Preference Modeling

Attack preference modeling is meaningful for security organi-zations to gain insight into the attack intention of attackersbuild attack portraits and develop personalized defense strate-gies Here we leverage HINTI to integrate different types ofIOCs and their interdependent relationships to comprehen-sively depict the picture of attack events which helps modelthe attack preferences With the proposed threat intelligencecomputing framework we model attack preferences by clus-tering the embedded attackersrsquo vectors

In this task each malicious IP address is treated as an in-truder and its attack preferences are mainly reflected in threefeatures including the platforms it destroys (including Win-dows Linux Unix ASP Android Apache etc) the industriesit invades (eg education finance government Internet ofThings and Industrial control system etc) and the exploittypes it employs (eg DOS Buffer overflow Execute code

250 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

Table 5 The significance ranking of different types of IOCs (CV E1 CVE-2017-0146 CV E2 CVE-2006-5911 CV E3 CVE-2008-6543 CV E4 CVE-2012-1199 CV E4 CVE-2006-4985 AR Authoritative Ranking DC Degree Centrality value)

Vulnerability Attacker Platform Attack Type

No AR DC Monicker AR DC Category AR DC Exploit_type AR DC

CV E1 02713 7643 Meatsploit 02764 549 PHP 04562 17865 Webapps 05494 11648

CV E2 02431 7124 GSR team 01391 327 windows 02242 13793 DOS 01772 8741

CV E3 02132 6833 Ihsan 00698 279 Linux 00736 8792 Overflow 01533 7652

CV E4 01826 6145 Techsa 00695 247 Linux86 00623 8147 CSRF 00966 5433

CV E5 01739 5637 Aurimma 00622 204 ASP 00382 5027 SQL 00251 2171

(a) AV DV T AT (P13) (b) AV DPDTV T AT (P17) (c) AV PV T AT (P14)

Figure 8 The performance of attack preference modeling with different meta-paths in which the preference of attacker i isreduced to a two-dimensional space (xiyi) and each cluster represents a group with a specific attack preference

Sql injection XSS Gain information etc)Specifically we first utilize our proposed threat intelligence

computing framework to embed each attacker into a low-dimensional vector space and then perform DBSCAN algo-rithm on the embedded vector to cluster attackers with thesame preferences into corresponding groups Figure 8 showsthe top 3 clustering results under different types of meta-paths in which the meta-path AV DPDTV T AT (P17) performsthe best performance with compact and well-separated clus-ters indicating that it contains richer semantic relationshipsfor characterizing attack preferences than other meta-paths

To verify the effectiveness of attack preference modelingwe identify 5297 distinct attackers (each unique IP address istreated as an attacker) who have submitted at least 10 cyberattacks For these attackers five cybersecurity researchersconsisting of three doctoral and two master students spentabout fortnight to manually annotate their attack preferencesfrom three perspectives the platforms they destroyed the in-dustries they attacked and the attack types they exploited Toensure the accuracy of data labeling we test the consistencyof the tags for the 5297 attackers and remove the sampleswith ambiguous tags As a result we obtain 3000 sampleswith consistent tags Based on the labeled samples we furtherevaluate the performance of different meta-paths on model-

Table 6 Performance of modeling attack preference withdifferent meta-paths

Metapath Accuracy Precision Micro-F1

P1 7431 7622 7525

P4 7116 7327 7216

P5 6915 7143 7027

P12 7214 7646 7424

P13 7965 8131 8047

P14 7748 7934 7840

P15 8017 7976 7996

P17 8139 8172 8155

ing attack preferences In the attack modeling scenario weonly focus on the meta-paths that both the start node and theend node are attackers The experimental results are demon-strated in Table 6 Obviously different meta-paths displaydifferent abilities in characterizing the attack preferences ofcyber intruders The performance when using P17 is more

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 251

superior than the one with other meta-paths which indicatesthat P17 holds more valuable information that characterizesthe attack preferences of cybercriminals since P17 includesthe semantics information of P1P4P5 and P12 sim P15

In addition we compare the capabilities of our proposedcomputing framework with those of other state-of-the-art em-bedding methods in terms of attack preference modelingOur analysis result shows that the accuracy of attack pref-erence modeling reaches 081 which outperforms the exist-ing popular models Node2vec (with precision of 071) [1]metapaht2vec (with precision of 073) [11] and HAN (withprecision of 076) [42] The performance improvement canbe attributed to the following characteristics First our com-puting framework utilizes weight-learning to learn the signifi-cance of different meta-paths for evaluating the similarity be-tween attackers Second the proposed computing frameworkleverages GCN to learn the structural information betweenattackers to obtain more discriminative structural features thatimproves the performance of attack preference modeling

63 Vulnerability Similarity AnalysisVulnerability classification or clustering is crucial for conduct-ing vulnerability trend analysis the correlation analysis ofincidents and exploits and the evaluation of countermeasuresThe traditional vulnerability analysis relies on the manualinvestigation of the source codes which requires expert ex-pertise and consumes considerable efforts In this sectionwe propose an unsupervised vulnerability similarity analy-sis method based on the proposed threat intelligence com-puting framework which can automatically group similarvulnerabilities into corresponding communities Particularlythe vulnerability-related IOCs are first embedded into a low-dimensional vector space using CTI computing frameworkThen the DBSCAN algorithm is performed on the embed-ded vector space to cluster vulnerabilities into correspondingcommunities The clustering results are presented in Figure 9

Figure 9 (c) shows all vulnerabilities are clustered into 12clusters using meta-path V DPDTV T (P16) which is very closeto the classification standard (ie 13) recommended by CVEdetails an authoritative database that publishes vulnerabilityinformation By manually analyzing the training samples wefind that HTTP Response Splitting vulnerability does not ap-pear in our dataset Therefore our cluster number (ie 12)is consistent with CVE Details11 To further validate the ef-fectiveness of threat intelligence computing framework forvulnerability clustering we randomly select 100 vulnerabili-ties from each cluster for manual inspection to measure theconsistency of the vulnerability types in each cluster and theresults are presented in Table 7 Obviously the clustering per-formance of cluster 8 (ie File Inclusion) and cluster 10 (ieDirectory Traversal) is remarkably worse than other clustersTo explain the reason we examine our training data and the

11httpswwwcvedetailscom

Table 7 Accuracy of vulnerability clustering

Cluster ID Vulnerability type Accuracy

cluster1 Denial of Service 8012

cluster2 XSS 8353

cluster3 Execute Code 8150

cluster4 Overflow 7650

cluster5 Gain Privilege 9156

cluster6 Bypass Something 7174

cluster7 CSRF 9327

cluster8 File Inclusion 6172

cluster9 Gain Informa 7042

cluster10 Directory Traversal 6949

cluster11 Memory Corruption 8156

cluster12 SQL Injection 8067

average 7851

computing framework We found that the proportion of thesetwo types of vulnerabilities is too small (cluster 8 is 34and cluster 10 is 42) making our computing frameworkvery likely to be under-fit with insufficient data Howeverthe proposed computing framework performs well on mosttypes of vulnerabilities in an unsupervised manner especiallygiven sufficient samples (eg cluster 7 is 176 and cluster 5is 157)

In addition by examining the clustering results we havean observation that the vulnerabilities in the same cluster arelikely to have evolutionary relationships For instance CVE-2018-0802 an office zero-day vulnerability is evolved fromthe CVE-2017-11882 They both include EQNEDT32exe fileused to edit the formula in Office software which allowsremote attackers to execute arbitrary codes by constructinga malformed font name The modeling and computation ofinterdependent relationships among IOCs in HINTI facilitatethe discovery of such intricate connections between vulnera-bilities

In summary HINTI is capable of depicting a more compre-hensive threat landscape and the proposed CTI computingframework has the ability to bring novel security insightstoward different real-world security applications Howeverthere are still numerous opportunities for enhancing thesesecurity applications Specifically for attack preference mod-eling although each individual IP address is treated as anattacker we cannot determine whether it belongs to a realattacker or is disguised by a proxy Fortunately even if thereal attack address cannot be captured understanding the at-tack preferences of these IP proxies which are widely usedin cybercrime is also meaningful for gaining insight into thecyber threats For vulnerability similarity analysis data imbal-

252 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

(a) AV DV T AT (P13) (b) V DV T (P10) (c) V DPDTV T (P16)

Figure 9 Illustration of the vulnerability similarity analysis based on different meta-paths in which vulnerability i can be reducedinto a two-dimensional space (xiyi) and each cluster indicates a particular type of vulnerability

ance issue affects the performance of model and inadequatetraining samples often result in model underfitting as shownin the case of cluster 8 and cluster 10

7 Related Work

Cyber Treat Intelligence An increasing number of securityvendors and researchers start exploring CTI for protectingsystem security and defending against new threat vectors [28]Existing CTI extraction tools such as IBM X-Force12 Threatcrowd13 Openctiio14 AlienVault15 CleanMX16 and Phish-Tank17 use regular expression to synthesize IOC from thedescriptive texts However these methods often producehigh false positive rate by misjudging legitimate entities asIOCs [22]

Recently Balzarotti et al [2] develop a system to extractIOCs from web pages and identify malicious URLs fromJavaScript codes Sabottke et al [31] propose to detect po-tential vulnerability exploits by extracting and analyzing thetweets that contain ldquoCVErdquo keyword Liao et al [22] present atool iACE for automatically extracting IOCs which excelsat processing technology articles Nevertheless iACE identi-fies IOCs from a single article which does not consider therich semantic information from multi-source texts Zhao etal [46] define different ontologies to describe the relationshipbetween entities based on expert knowledge Numerous popu-lar CTI platforms including IODEF [9] STIX [3] TAXII [40]OpenIOC [13] and CyBox [19] focus on extracting and shar-ing CTI Yet none of the existing approaches could uncoverthe interdependent relations among CTIs extracted from multi-source texts let alone quantifying CTIsrsquo relevance and miningvaluable threat intelligence hidden behind the isolated CTIs

12httpsexchangexforceibmcloudcom13httpswwwthreatcrowdorg14httpsdemoopenctiio15 httpsotxalienvaultcom16httplistclean-mxcom17httpswwwphishtankcom

Heterogeneous Information Network Real-world systemsoften contain a large number of interacting multi-typed ob-jects which can naturally be expressed as a heterogeneousinformation network (HIN) HIN as a conceptual graph repre-sentation can effectively fuse information and exploit richersemantics in interacting objects and links [37] HIN has beenapplied to network traffic analysis [38] public social mediadata analysis [45] and large-scale document analysis [41]Recent applications of HIN include mobile malware detec-tion [14] and opioid user identification [12] In this paper forthe first time we use HIN for CTI modelingGraph Convolutional Network Graph convolutional net-works (GCN) [17] has become an effective tool for address-ing the task of machine learning on graphs such as semi-supervised node classification [17] event classification [29]clustering [8] link prediction [27] and recommended sys-tem [44] Given a graph GCN can directly conduct the con-volutional operation on the graph to learn the nonlinear em-bedding of nodes In our work to discern and reveal the inter-active relationships between IOCs we utilize GCN to learnmore discriminative representation from attributes and graphstructure simultaneously which is the premise for threat intel-ligence computing

8 Discussion

Data Availability The proposed framework assumes thatsufficient threat description data can be obtained for generat-ing comprehensive and the latest CTIs Fortunately with thegrowing prosperity of social media an increasing number ofsecurity-related data (eg blogs posts news and open secu-rity databases) can be collected effortlessly To automaticallycollect security-related data we develop a crawler system tocollect threat description data from 73 international securitysources (eg blogs hacker forum posts security bulletinsetc) providing sufficient raw materials for generating cyberthreat intelligenceModel Extensibility In this paper 6 types of IOCs and 9

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 253

types of relationships are modeled in HINTI However ourproposed framework is extensible in which more types ofIOCs and relationships can be enrolled to represent richer andmore comprehensive threat information such as maliciousdomains phishing Emails attack tools their interactions etcHigh-level Semantic Relations In view of the computa-tional complexity of the model our threat intelligence comput-ing method focues on utilizing the meta-paths to quantify thesimilarity between IOCs while ignoring the influence of themeta-graph on it which inevitably misses characterizing somehigh-level semantic information Nevertheless the proposedcomputing framework introduces an attention mechanism tolearn the signification of different meta-paths to character-ize IOCs and their interactive relationships which effectivelycompensates for the performance degradation caused by ig-noring the meta-graphsSecurity Knowledge Reasoning Although our proposedframework exhibits promising results in CTI extraction andmodeling computing how to implement advanced securityknowledge reasoning and prediction is still an open problemeg it remains challenging to predict whether a vulnerabilitycould potentially affect a particular type of devices in thefuture We will investigate this problem in the future

9 Conclusion

This work explores a new direction of threat intelligence com-puting which aims to uncover new knowledge in the relation-ships among different threat vectors We propose HINTI acyber threat intelligence framework to model and quantify theinterdependent relationships among different types of IOCsby leveraging heterogeneous graph convolutional networksWe develop a multi-granular attention mechanism to learnthe importance of different features and model the interde-pendent relationships among IOCs using HIN We proposethe concept of threat intelligence computing and present ageneral intelligence computing framework based on graphconvolutional networks Experimental results demonstratethat the proposed multi-granular attention based IOC extrac-tion method outperforms the existing state-of-the-art methodsThe proposed threat intelligence computing framework caneffectively mine security knowledge hidden in the interdepen-dent relationships among IOCs which enables crucial threatintelligence applications such as threat profiling and rankingattack preference modeling and vulnerability similarity analy-sis We would like to emphasize that the knowledge discoveryamong interdependent CTIs is a new field that calls for acollaborative effort from security experts and data scientists

In future we plan to develop a predicative and reasoningmodel based on HINTI and explore preventative countermea-sures to protect cyber infrastructure from future threats Wealso plan to add more types of IOCs and relations to depicta more comprehensive threat landscape Moreover we willleverage both meta-paths and meta-graphs to characterize the

IOCs and their interactions to further improve the embeddingperformance and to strike a balance between the accuracyand computational complexity of the model We will alsoinvestigate the feasibility of security knowledge predictionbased on HINTI to infer the potential future relationshipsbetween the vulnerabilities and devices

Acknowledgement

We would like to thank our shepherd Tobias Fiebig andthe anonymous reviewers for providing valuable feedbackon our work We also thank Hao Peng and Lichao Sunfor their feedback on the early version of this work Thiswork was supported in part by National Science Founda-tion grants CNS1950171 CNS-1949753 It was also sup-ported by the NSFC for Innovative Research Group ScienceFund Project (62141003) National Key RampD Program China(2018YFB0803 503) the 2018 joint Research Foundationof Ministry of Education China Mobile (MCM20180507)and the Opening Project of Shanghai Trusted Industrial Con-trol Platform (TICPSH202003020-ZC) Any opinions find-ings and conclusions or recommendations expressed in thismaterial do not necessarily reflect the views of any fundingagencies

References

[1] Jure Leskovec Aditya Grover node2vec Scalable fea-ture learning for networks In Acm Sigkdd InternationalConference on Knowledge Discovery Data Mining2016

[2] Marco Balduzzi Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications In WWW 2016

[3] Sean Barnum Standardizing cyber threat intelligenceinformation with the structured threat information ex-pression (stix) Mitre Corporation 111ndash22 2012

[4] Eric W Burger Michael D Goodman Panos Kam-panakis and Kevin A Zhu Taxonomy model for cyberthreat intelligence information exchange technologiesIn Proceedings of the 2014 ACM Workshop on Infor-mation Sharing amp Collaborative Security pages 51ndash602014

[5] Onur Catakoglu Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications The web conference pages 333ndash3432016

[6] Danqi Chen and Christopher D Manning A fast and ac-curate dependency parser using neural networks pages740ndash750 2014

254 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

[7] Qian Chen and Robert A Bridges Automated behav-ioral analysis of malware a case study of wannacry ran-somware In 16th IEEE ICMLA pages 454ndash460 2017

[8] Weilin Chiang Xuanqing Liu Si Si Yang Li SamyBengio and Chojui Hsieh Cluster-gcn An efficient al-gorithm for training deep and large graph convolutionalnetworks knowledge discovery and data mining pages257ndash266 2019

[9] Roman Danyliw Jan Meijer and Yuri Demchenko Theincident object description exchange format Interna-tional Journal of High Performance Computing Appli-cations 50701ndash92 2007

[10] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 2017

[11] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 ACM 2017

[12] Yujie Fan Yiming Zhang Yanfang Ye and Xin Li Au-tomatic opioid user detection from twitter Transductiveensemble built on different meta-graph based similari-ties over heterogeneous information network In IJCAIpages 3357ndash3363 2018

[13] Fireeye Openioc httpswwwfireeyecomblogthreat-research201310openioc-basicshtml Accessed January 20 2020

[14] Shifu Hou Yanfang Ye Yangqiu Song and Melih Ab-dulhayoglu Hindroid An intelligent android malwaredetection system based on structured heterogeneous in-formation network In Proceedings of the 23rd ACMSIGKDD International Conference on Knowledge Dis-covery and Data Mining pages 1507ndash1515 2017

[15] Zhiheng Huang Wei Xu and Kai Yu Bidirectional lstm-crf models for sequence tagging arXiv1508019912015

[16] Michael D Iannacone Shawn Bohn Grant NakamuraJohn Gerth Kelly MT Huffer Robert A Bridges Erik MFerragut and John R Goodall Developing an ontologyfor cyber security knowledge graphs CISR 1512 2015

[17] Thomas Kipf and Max Welling Semi-supervised clas-sification with graph convolutional networks arXivLearning 2016

[18] Thomas N Kipf and Max Welling Semi-supervisedclassification with graph convolutional networks arXivpreprint arXiv160902907 2016

[19] Tero Kokkonen Architecture for the cyber security situ-ational awareness system In Internet of Things SmartSpaces and Next Generation Networks and Systemspages 294ndash302 Springer 2016

[20] Mehmet Necip Kurt Yasin Yılmaz and Xiaodong WangDistributed quickest detection of cyber-attacks in smartgrid IEEE Transactions on Information Forensics andSecurity 13(8)2015ndash2030 2018

[21] John Lafferty Andrew Mccallum and Fernando PereiraConditional random fields Probabilistic models for seg-menting and labeling sequence data ICML pages 282ndash289 2001

[22] Xiaojing Liao Yuan Kan Xiao Feng Wang Li Zhouand Raheem Beyah Acing the ioc game Toward auto-matic discovery and analysis of open-source cyber threatintelligence In ACM Sigsac Conference on ComputerCommunications Security 2016

[23] Rob Mcmillan Open threat intelligencehttpwwwgartnercomdoc2487216definition-threat-intelligence AccessedJanuary 20 2020

[24] Tomas Mikolov Kai Chen Greg Corrado and JeffreyDean Efficient estimation of word representations invector space arXiv preprint arXiv13013781 2013

[25] Sudip Mittal Prajit Kumar Das Varish Mulwad Anu-pam Joshi and Tim Finin Cybertwitter Using twitterto generate alerts for cybersecurity threats and vulnera-bilities In Proceedings of the 2016 IEEE Advances inSocial Networks Analysis and Mining pages 860ndash8672016

[26] Eric Nunes Ahmad Diab Andrew Gunn EricssonMarin Vineet Mishra Vivin Paliath John RobertsonJana Shakarian Amanda Thart and Paulo ShakarianDarknet and deepnet mining for proactive cybersecuritythreat intelligence In 2016 IEEE ISI pages 7ndash12 2016

[27] Shirui Pan Ruiqi Hu Guodong Long Jing Jiang LinaYao and Chengqi Zhang Adversarially regularizedgraph autoencoder for graph embedding pages 2609ndash2615 2018

[28] P Pawlinski P Jaroszewski P Kijewski L SiewierskiP Jacewicz P Zielony and R Zuber Actionable infor-mation for security incident response European UnionAgency for Network and Information Security Herak-lion Greece 2014

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 255

[29] Hao Peng Jianxin Li Qiran Gong Yangqiu SongYuanxing Ning Kunfeng Lai and Philip S Yu Fine-grained event categorization with heterogeneous graphconvolutional networks In Proceedings of the 28th In-ternational Joint Conference on Artificial Intelligencepages 3238ndash3245 AAAI Press 2019

[30] Sara Qamar Zahid Anwar Mohammad Ashiqur Rah-man Ehab Al-Shaer and Bei-Tseng Chu Data-drivenanalytics for cyber-threat intelligence and informationsharing Computers amp Security 6735ndash58 2017

[31] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In USENIXSecurity 2015

[32] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In 24thUSENIX Security pages 1041ndash1056 2015

[33] Deepak Sharma and Avadhesha Surolia Degree Cen-trality Springer New York 2013

[34] Saurabh Singh Pradip Kumar Sharma Seo Yeon MoonDaesung Moon and Jong Hyuk Park A comprehensivestudy on apt attacks and countermeasures for futurenetworks and communications challenges and solutionsJournal of Supercomputing pages 1ndash32 2016

[35] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks Principles and methodologiesSynthesis Lectures on Data Mining and Knowledge Dis-covery 3(2)1ndash159 2012

[36] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks a structural analysis approachACM SIGKDD 14(2)20ndash28 2013

[37] Yizhou Sun Jiawei Han Xifeng Yan Philip S Yu andTianyi Wu Pathsim Meta path-based top-k similaritysearch in heterogeneous information networks Proceed-ings of the VLDB Endowment 4(11)992ndash1003 2011

[38] Yizhou Sun Brandon Norick Jiawei Han Xifeng YanPhilip S Yu and Xiao Yu Pathselclus Integrating meta-path selection with user-guided object clustering in het-erogeneous information networks ACM Transactionson TKDD 7(3)11 2013

[39] Wiem Tounsi and Helmi Rais A survey on technicalthreat intelligence in the age of sophisticated cyber at-tacks Computers amp Security 72212ndash233 2018

[40] Thomas D Wagner Esther Palomar Khaled Mahbuband Ali E Abdallah Towards an anonymity supportedplatform for shared cyber threat intelligence risks andsecurity of internet and systems pages 175ndash183 2017

[41] Chenguang Wang Yangqiu Song Haoran Li MingZhang and Jiawei Han Text classification with het-erogeneous information network kernels In 13th AAAI2016

[42] Xiao Wang Houye Ji Chuan Shi Bai Wang Peng CuiP Yu and Yanfang Ye Heterogeneous graph attentionnetwork 2019

[43] Jie Yang Shuailong Liang and Yue Zhang Design chal-lenges and misconceptions in neural sequence labelingarXiv Computation and Language 2018

[44] Rex Ying Ruining He Kaifeng Chen Pong Eksombat-chai William L Hamilton and Jure Leskovec Graphconvolutional neural networks for web-scale recom-mender systems In Proceedings of the 24th ACMSIGKDD International Conference on Knowledge Dis-covery amp Data Mining pages 974ndash983 ACM 2018

[45] Jiawei Zhang Xiangnan Kong and Philip S Yu Trans-ferring heterogeneous links across location-based socialnetworks In The 7th ACM international conference onWeb search and data mining pages 303ndash312 2014

[46] Yishuai Zhao Bo Lang and Ming Liu Ontology-basedunified model for heterogeneous threat intelligence inte-gration and sharing In 2017 11th IEEE InternationalConference on Anti-counterfeiting Security and Identi-fication (ASID) pages 11ndash15 2017

256 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

  • Introduction
  • Background
    • Cyber Threat Intelligence
    • Motivation
    • Preliminaries
      • Architecture Overview of HINTI
      • Methodology
        • Multi-granular Attention Based IOC Extraction
        • Cyber Threat Intelligence Modeling
        • Threat Intelligence Computing
          • Experimental Evaluation
            • Dataset and Settings
            • Evaluation of IOC Extraction
              • Application of Threat Intelligence Computing
                • Threat Profiling and Significance Ranking of IOCs
                • Attack Preference Modeling
                • Vulnerability Similarity Analysis
                  • Related Work
                  • Discussion
                  • Conclusion
Page 3: Cyber Threat Intelligence Modeling Based on Heterogeneous ... · 2.1Cyber Threat Intelligence Cyber Threat Intelligence (CTI) extracted from security-related data is structured information

Figure 1 An example of extracted IOCs without any relationsamong them

research we annotated 30000 such training samples from5000 threat description texts which are the raw materialsused to build our IOC extraction model

Figure 2 An annotation example with the B-I-O taggingmethod

(ii) The labeled training samples are then fed into the pro-posed neural network architecture as shown in Figure 6 totrain our proposed IOC extraction model As a result HINTIhas the ability to accurately identify and extract IOCs (egLotus SMBbat) using the proposed multi-granular attentionbased IOC extraction method (see Section 41 for details)

(iii) HINTI then utilizes the syntactic dependency parser[6] (eg subject-predicate-object attributive clause etc) toextract associated relationships between IOCs each of whichis represented as a triple (IOCirelation IOC j) In this moti-vating example HINTI extracts the relationship triples involv-ing (LotusexploitCV E minus 2017minus 0143) (CV E minus 2017minus0143a f f ectVistaSP2) etc Note that the extracted rela-tional triples can be incrementally pooled into an HIN tomodel the interactions among IOCs for depicting a morecomprehensive threat landscape Figure 3 shows a miniaturegraphic representation describing interactive relations amongIOCs extracted from the example Compared with Figure 1 itis obvious that HINTI can depict a more intuitive and compre-hensive threat landscape than the previous approaches In thispaper we mainly consider 9 relationships (R1simR9) among 6different types of IOCs (see Section 42 for details)

(iv) Finally HINTI integrates a CTI computing framework

Figure 3 A miniature of a constructed CTI includes attackervulnerability malicious file attack type device and platformwhich describes a particular threat an attacker utilizes CVE-2017-0143 vulnerability to invade Vista SP2 and Win7 SP1devices CVE-2017-0143 is a remote code execution vulnera-bility that involves a malicious file ldquoSMBbat

based on heterogeneous graph convolutional networks (seeSection 43) to effectively quantify the relationships amongIOCs for knowledge discovery Particularly our proposedCTI computing framework characterizes IOCs and their re-lationships in a low-dimensional embedding space based onwhich CTI subscribers can use any classification (eg SVMNaive Bayes) or clustering algorithms (K-Means DBSCAN)to gain new threat insights such as predicting which attack-ers are likely to intrude their systems and identifying whichvulnerabilities belong to the same category without the expertknowledge In this work we mainly explore three real-worldapplications to verify the effectiveness and efficiency of theCTI computing framework IOC significance ranking (seeSection 61) attack preference modeling (see Section 62)and vulnerability similarity analysis (see Section 63)

23 Preliminaries

In this paper we use heterogeneous information net-work (HIN) to model the relationships among IOCs Here wefirst introduce the preliminary knowledge about HIN

Definition 1 Heterogeneous Information Network ofThreat Intelligence (HINTI) is defined as a directed graphG = (VET ) with an object type mapping function ϕ VrarrMand a link type mapping function Ψ ErarrR Each object visin V belongs to one particular object type in the object typeset M ϕ(v) isinM and each link e isin E belongs to a particularrelation type in the relation type set R Ψ(e)isinR T denotesthe types of nodes and relationships

In this paper we focus on 6 common types of IOCs at-tacker (A) vulnerability (V) device (D) platform (P) mali-cious file (F) and attack type (T) and the links connectingdifferent objects represent different semantic relationshipsTo better understand the object types and relationship types inHINTI it is imperative to provide the meta-level (ie schema-level) description of the network Consequently we introduce

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 243

(a) Network schema (b) Network instance

Figure 4 Network schema and instance of HIN containing 6 types of IOCs (a) The network schema of HIN which depicts

the relationship template among different types of IOCs such as Devicebelongminusminusminusrarr Plat f orm (b) An instance of network schema

which describes the concrete relationships between IOCs by following a network schema eg O f f ice 2012belongminusminusminusrarrWindows

the network schema [37] for describing the meta-level rela-tionships

Definition 2 Network Schema The network schema ofHINTI denoted as HS = (AR) is a meta template for G =(VET ) with the object type mapping ϕ VrarrM and the linktype mapping Φ ErarrR It is a directed graph of object typesM with edges representing relations from R

The schema of HINTI specifies type constraints on the sets ofIOCs and their relationships Figure 4 (a) shows the networkschema of HINTI which defines the relationship templatesamong IOCs to effectively guide the semantic exploration inHINTI For example for a relationship that describes ldquoat-tackers invade devices the semantic schema can be writtenas attacker invademinusminusminusrarrdevice Figure 4 (b) presents a concreteinstance of the network schema

Definition 3 Meta-path A meta-path [37] P is a path se-quence defined on a network schema S = (NR) and is repre-

sented in the form of N1R1minusrarrN2

R2minusrarr middotmiddot middot RiminusrarrNi+1 which definesa composite relation R = R1 R2 middot middot middot Ri+1 where denotesthe composition operator on relations A meta-path P is asymmetric path when the relation R defined by the path issymmetric (ie P is equal to Pminus1)

Table 1 displays the meta-paths considered in HINTI Forexample the relationship ldquothe attackers (A) exploit the samevulnerability (V) can be described by a length-2 meta-path

attackerexploitminusminusminusminusrarr vulnerability

exploitminusminusminusminusminusminusrarr attacker denoted asAVAT (P4) which means that the two attackers exploit thesame vulnerability Similarly AV DPDTV T AT (P17) portraysa close relationship between IOCs that ldquotwo attackers wholeverage the same vulnerability invade the same type of deviceand ultimately destroy the same type of platform

Table 1 Meta-paths used in HINTI

ID Meta-path

P1 Attacker-Attacker

P2 Device-Device

P3 Vulnerability-Vulnerability

P4 Attacker-Vulnerability-Attacker

P5 Attacker-Device-Attacker

P6 Device-File-Device

P7 Device-Platform-Device

P8 Vulnerability-File-Vulnerability

P9 Vulnerability-Type-Vulnerability

P10 Vulnerability-Device-Vulnerability

P11 Vulnerability-Platform-Vulnerability

P12 Attacker-Device-Platform-Device-Attacker

P13 Attacker-Vul-Device-Vul-Attacker

P14 Attacker-Vul-Platform-Vul-Attacker

P15 Attacker-Vul-Type-Vul-Attacker

P16 Vul-Device-Platform-Device-Vul

P17 Attacker-Vul-Device-Platform-Device-Vul-Attacker

3 Architecture Overview of HINTI

HINTI as a cyber threat intelligence extraction and comput-ing framework is capable of effectively extracting IOCs fromthreat-related descriptions and formalizing the relationshipsamong heterogeneous IOCs to demystify new threat insightsAs shown in Figure 5 HINTI consists of four major compo-nents including

bull Data Collection and IOC Recognition We first de-

244 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

Figure 5 The overall architecture of HINTI HINTI consists of four major components (a) collecting security-related data andextracting threat objects (ie IOCs) (b) modeling interdependent relationships among IOCs into a heterogeneous informationnetwork (c) embedding nodes into a low-dimensional vector space using weight-learning based similarity measure and (d)computing threat intelligence based on graph convolutional networks and knowledge mining

velop a data collection system to automatically capturesecurity-related data from blogs hacker forum posts se-curity news and security bulletins The system utilizesa breadth-first search to collect the HTML source codeand then leverages Xpath (XML Path language) to ex-tract threat-related descriptions After that we utilize amulti-granular attention based IOC recognition methodto extract IOC from the collected threat-related descrip-tions (see Section 41 for details)

bull Relation Extraction and IOC modeling HINTI ad-dresses the challenge of CTI modeling by leveragingheterogeneous information networks which can natu-rally depict the interdependent relationships betweenheterogeneous IOCs As an example Figure 4 shows amodel that capture the interactive relationships among at-tacker vulnerability malicious file attack type platformand device (see Section 42 for details)

bull Meta-path Design and Similarity Measure Meta-path is an effective tool to express the semantic rela-tions among IOCs in constructed HIN For instance

attackerexploitminusminusminusminusrarr vulnerability

exploitminusminusminusminusminusminusrarr attacker indi-cates that two attackers are related by exploiting the samevulnerability We design 17 types of meta-paths (SeeTable 1) to describe the interdependent relationships be-tween IOCs With these meta-paths we present a weight-learning based node similarity computing approach toquantify and embed the relationships as the premise forthreat intelligence computing

bull Threat Computing and Knowledge Mining In thiscomponent an effective threat intelligence computingframework is proposed which can quantify and measure

the relevance among IOCs by leveraging graph convolu-tional network (GCN) Our proposed threat intelligencecomputing framework could reveal richer security knowl-edge within a more comprehensive threat landscape

4 Methodology

41 Multi-granular Attention Based IOC Ex-traction

Extracting IOCs from multi-source threat texts is one of themajor tasks of threat intelligence analytics and the qualityof the extracted IOCs significantly influences the analysisresults of cyber threats Recently Bidirectional Long Short-Term Memory+Conditional Random Fields (BiLSTM+CRF)model [15] has demonstrated excellent performance in textchunking and Named-entity Recognition (NER) Howeverdirectly applying this model to IOC extraction is unlikely tosucceed since threat texts usually contain a large number ofthreat objects with different grams and irregular structuresConsequently we need an efficient method to learn the dis-criminative characteristics of IOCs with different sizes In thispaper we propose a multi-granular attention based IOC extrac-tion method which can extract threat objects with differentgranularity Particularly Figure 6 presents the proposed IOCextraction framework which leverages the multi-granular at-tention mechanism to characterize IOCs Different from thetraditional BiLSTM+CRF model we introduce new word-embedding features with different granularities to capture thecharacteristics of IOCs with different sizes Furthermore weutilize a self-attention mechanism to learn the importance ofthe features to improve the accuracy of IOC extraction

Our proposed method takes a threat description sentenceX = (x1x2 middot middot middot xi) as input where xi represents i-th word

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 245

Figure 6 The framework of multi-granular IOC extraction

in X We first chunk the sentence into n-gram componentsincluding char-level 1-gram 2-gram and 3-gram which arethe inputs of our trained model written as follows

e jxi=V j

embedding(xi) (1)

where V jembedding transforms the chunk with granularity j into

a vector space and xi is the i-th word in a sentence X Thusthe threat description sentence Xi can be vectorized as follows

rarrh j

i = LST M f orward([e jx0e j

x1 middot middot middot e j

xi])

larrh j

i = LST Mbackward([e jx0e j

x1 middot middot middot e j

xi])

(2)

whererarrh j

i andlarrh j

i are the embedded features learned byforward LSTM and backward LSTM respectively Let O bethe output of Bi-LSTM which is a weighted sum of embeddedfeatures with weights corresponding to the importance ofdifferent features

O = H middotW T (3)

where H =j

sum~βiσ(h

j1h

j2 middot middot middot h

ji ) h j

i = (rarrh j

i +larrh j

i ) ~βi is theweight vector to represent the importance of h j

i in whichj and i are the segmentation granularity of sentences and thecorresponding index of the chunk W is the parameter matrix

Given a security-related sentence X = (x1x2 middot middot middot xi) itscorresponding threat object sequence Y = (y1 y2 middot middot middot yi) andits output of Bi-LSTM O we can compute the overall labelscore of X and Y as follows

S(X Y ) =n

sumi=0

(Ayiyi+1 +Oiyi) (4)

where Ayiyi+1 is the state transition matrix in CRF model andOiyi as the output of Bi-LSTM hidden layer (calculated byEq (3)) represents the label score of i-th word corresponding

to the type yi Next we utilize so f tmax function to normalizethe overall label score

p(Y |X) =eS(X Y )

sumyisinYX

eS(X Y )(5)

We design an objective function to maximize the proba-bility p(Y |X) to achieve the highest label score for differentIOCs which can be written as follows

argmax log(p(Y |X)) = argmax (S(X Y )minus

log( sumyisinYX

eS(X y))) (6)

By solving the objective function above we assign correctlabels to the n-gram components according to which we canidentify the IOCs with different lengths Our multi-granularattention based IOC extraction method is capable of identify-ing different types of IOCs and its evaluation is presented inSection 5

42 Cyber Threat Intelligence ModelingCTI modeling is an important step to explore the intricaterelationship between heterogeneous IOCs In our work HINis introduced to group different types of IOCs into a graphto explore their interactive relationships In this section weportray the main principle of threat intelligence modeling

To model the intricate interdependent relationships amongIOCs we define the following 9 relationships among 6 typesof IOCs as follows

bull R1 To depict the relation of an attacker and the ex-ploited vulnerability we construct the attacker-exploit-vulnerability matrix A For each element Ai j isin 01Ai j=1 indicates attacker i exploits vulnerability j

bull R2 To depict the relation of an attacker and a devicewe build the attacker-invade-device matrix D For eachelement Di j isin 01 Di j=1 indicates attacker i invadesdevice j

bull R3 Two attacker can cooperate to attack a target Tostudy the relationship of attacker-attacker we constructthe attacker-cooperate-attacker matrix C For each ele-ment Ci j isin 01 Ci j=1 indicates there exists a cooper-ative relationship between attacker i and j

bull R4 To describe the relation of a vulnerability and theaffected device we build the vulnerability-affect-devicematrix M For each element Mi j isin 01 Mi j=1 indi-cates vulnerability i affects device j

bull R5 A vulnerability is often labeled as a specific attacktype by Common Vulnerabilities and Exposures (CVE)

246 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

system7 To explore the relation of vulnerability-attacktype we build the vulnerability-belong-attack type ma-trix G where each element Gi j isin 01 denotes if vul-nerability i belongs to an attack type j

bull R6 A vulnerability often involves one or more maliciousfiles To describe the relation of vulnerability-file webuild the vulnerability-include-file matrix F For eachelement Fi j isin 01 Fi j=1 denotes that vulnerability iincludes malicious file j

bull R7 A malicious file often targets a specific device Weestablish the file-target-device matrix T to explore therelation of file-device For each element Ti j isin 01Ti j=1 indicates malicious file i targets device j

bull R8 Oftentimes a vulnerability evolves from anotherTo study the relationship of vulnerability-vulnerabilitywe build the vulnerability-evolve-vulnerability matrixE where each element Ei j isin 01 indicates if vulnera-bility i evolves from vulnerability j

bull R9 To depict the relation device-platform that a de-vice belongs to a platform we build the device-belong-platform matrix P where each element Pi j isin 01 il-lustrates if device i belongs to platform j

Based on the above 9 types of relationships HINTIleverages the syntactic dependency parser [6] (eg subject-predicate-object attributive clause etc) to automatically ex-tract the 9 relationships among IOCs from threat descriptionseach of which is represented as a triple (IOCirelation IOC j)For instance given a security-related description ldquoOn May12 2017 WannaCry exploited the MS17-010 vulnerabilityto affect a larger number of Windows devices which is aransomware attack via encrypted disks Using the syntacticdependency parser we can extract the following triples (Wan-naCry exploit MS17-010) (MS17-010 affect Windows de-vice) (WannaCry is ransomware) Such triples extractedfrom various data sources can be incrementally assembledinto HINTI to model the relationships among IOCs whichcould offer a more comprehensive threat landscape that de-scribes the threat context Particularly we further define 17types of meta-paths shown in Table 1 to probe into the interde-pendent relationships over attackers vulnerabilities maliciousfiles attack types devices and platforms HINTI is able toconvey a richer context of threat events by scrutinizing 17types of meta-paths and reveal the in-depth threat insightsbehind the heterogeneous IOCs (see Section 6 for details)

43 Threat Intelligence ComputingIn this section we illustrate the concept of threat intelligencecomputing and design a general threat intelligence computing

7httpcvemitreorg

framework based on heterogeneous graph convolutional net-works which quantifies and measures the relevance betweenIOCs by analyzing meta-path based semantic similarity Herewe first provide a formal definition of threat intelligence com-puting based on heterogeneous graph convolutional networks

Definition 4 Threat Intelligence Computing Based onHeterogeneous Graph Convolutional Networks Given thethreat intelligence graph G = (VE) the meta-path set M =P1P2 middot middot middot Pi The threat intelligence computing i) com-putes the similarity between IOCs based on meta-path Pi togenerate corresponding adjacency matrix Ai ii) constructsthe feature matrix of nodes Xi by embedding attribute in-formation of IOCs into a latent vector space iii) conductsgraph convolution GCN(AiXi) to quantify the interdependentrelationships between IOCs by following meta-path Pi andembeds them into a low-dimensional space

The threat intelligence computing aims to model the seman-tic relationships between IOCs and measure their similaritybased on meta-paths which can be used for advanced secu-rity knowledge discovery such as threat object classificationthreat type matching threat evolution analysis etc Intuitivelythe objects connected by the most significant meta-paths tendto bear more similarity [37] In this paper we propose aweight-learning based threat intelligence similarity measurewhich uses self-attention to improve the performance of simi-larity measurement between any two IOCs This method canbe formalized as below

Definition 5 Weight-learning based Node Similar-ity Measure Given a set of symmetric meta-path setP = [Pm]

Mprime

m=1 the similarity S(hih j) between any two IOCshi and h j is defined as

S(hih j) =Mprime

summ

rarrw

2middot | hirarr j isin Pm || hirarri isin Pm |+ | h jrarr j isin Pm |

(7)

where hirarr j isin hm is a path instance between IOC hi andh j following meta-path Pm hirarri isin Pm is that between IOCinstance hi and hi and h jrarr j isin Pm is that between IOC in-stance h j and h j where | hirarr j isin Pm |=CPm(i j) | hirarri isinPm |=CPm(i i) | h jrarr j isin Pm |=CPm( j j) and CPm is thecommuting matrix based on meta-path Pm defined below

rarrw =

[w1 wm wMprime ] denote the meta-path weights wherewm is the weight of meta-paths Pm and M

primeis the number of

meta-pathsS(hih j) is defined in two parts (1) the semantic overlap

in the numerator which describes the number of meta-pathbetween IOC instance hi and h j (2) and the semantic broad-ness in the denominator which depicts the number of totalmeta-paths between themselves The larger number of meta-path between IOC instance hi and h j the more similar thetwo IOCs are which is normalized by the semantic broadness

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 247

of denominator Moreover different from existing similaritymeasures [37] we propose an attention mechanism basedsimilarity measure method by introducing the weight vec-torrarrw = [w1 wm wMprime ] which is a trainable coefficient

vector to learn the importance of different meta-paths forcharacterizing IOCs

Obviously it is computationally expensive to measure thesimilarity among IOCs in the constructed heterogeneousgraph as it usually requires to randomly walk a larger numberof nodes in the graph Fortunately in our work it is unneces-sary to walk through the entire graph as we prescribe a limitby introducing predefined meta-paths and we only focus onthe symmetrical meta-paths presented in Table 1 To calcu-late the similarity between IOCs under different meta-pathinstances we need to compute the corresponding commutingmatrices [37] following the meta-paths

Given a meta-path set P = sumMprime

m A1A2 middot middot middot Al+1 themeta-path based commuting matrix can be defined as CP =UA1A2 UA2A3 middot middot middot UAAl+1 where CP(i j) represents the prob-ability of object iisinA1 reaching object j isinAl+1 under the pathP and is a connection operation These symmetric meta-paths not only efficiently reduce the complexity of walkingbut also ensures that the commuting matrix can be easily de-composed which greatly reduces the computational costs Inaddition the symmetric meta-paths in the graph G allow us toleverage the pairwise random-walk [37] to further acceleratethe computation

With Eq (7) and pairwise random-walk we can obtain thesimilarity embedding between any two IOCs hi and h j undera meta-path set P Based on the low-dimensional similarityembedding we derive a weighted adjacent matrix betweenIOCs denoted as Ai isin RNtimesN where N is the number of aspecific type of IOC in G Meanwhile to utilize the attributedinformation of nodes we train a word2vec model [24] toembed the attribute information of nodes into a feature matrixXi isin RNtimesd where N is the number of IOCs in Ai and d is thedimension of node feature With the learned adjacency matrixAi and its feature matrix Xi we can leverage the classicalGCN [18] to characterize the relationship between IOC hiand h j Particularly the layer-wise propagation rule of GCNcan be defined as below

H(l+1) = σ(Dminus12 ADminus

12 H(l)W (l)) (8)

where A = A+ IN is the adjacency matrix of IOCs with self-connections IN is the identity matrix Dii =sum j Ai j and W (l) isa l-th layer trainable weight matrix σ(middot) denotes an activationfunction such as relu H(l) isin RNtimesd is the matrix of activationin the l-th layer We perform graph convolution [18] on Aiand Xi to generate the embedding Z between IOCs belongingto type i

Z = f (XiAi) = σ(Ai middot relu(AiXiW(0)i )W (1)

i ) (9)

where W (0)i isin RdtimesH is an input-to-hidden weight matrix for a

hidden layer with H feature mapsW (1)i isinRHtimesF is a hidden-to-

output weight matrix with F feature maps in the output layerXi isin RNtimesd N is the number of a specific type of IOCs d is thedimension of their corresponding features and σ is anotheractivation function such as sigmoid Ai = Dminus

12 AiDminus

12 can be

calculated offline Here we leverage the cross-entropy loss tooptimize the performance of our proposed threat intelligenceframework written as follows

Loss(Yl f Zl f ) =minussumlisinYl

F

sumf=1

Yl f middot lnZl f (10)

where Yl is the set of node indices that have labels Yl f is thereal label and Zl f is a corresponding label that our modelpredicts Based on Eq (10) we conduct stochastic gradientdescent to continuously optimize the neural network weightsW (0)

i W (1)i and

rarrw to reduce the loss and build a general

threat intelligence computing framework Using this frame-work security organizations are able to mine richer securityknowledge hidden in the interdependent relationships amongIOCs

5 Experimental Evaluation

51 Dataset and SettingsWe develop a threat data collector to automatically collectcyber threat data from a set of sources including 73 inter-national security blogs (eg fireeye cloudflare) hacker fo-rum posts (eg Blackhat Hack5) security bulletins (egMicrosoft Cisco) CVE detail description and ExploitDB Acomplete list of data sources is presented in the Baidu cloud8We set up a daemon to collect the newly generated securityevents every day So far more than 245786 security-relateddata describing threat events have been collected For trainingand evaluating our proposed IOC extraction method 30000samples from 5000 texts are annotated by utilizing the B-I-Osequence tagging method (see Section 22 for the example)and an annotation example is shown in Figure 2

For 30000 labeled samples we randomly select 60 ofsamples as a training set 20 of samples as a verificationset and the rest of the samples as our test set Based on thedata sets we comprehensively evaluate the performance ofHINTI for extracting IOCs and threat intelligence computingWe run all of the experiments on 16 cores Intel(R) Core(TM)i7-6700 CPU 340GHz with 64GB RAM and 4times NVIDIATesla K80 GPU The software programs are executed on theTensorFlow-GPU framework on Ubuntu 1604

52 Evaluation of IOC ExtractionA set of experiments are conducted to evaluate the sensitiv-ity of different parameters in the multi-granular based IOC

8httpspanbaiducoms1J631WMYY_T_awa8aY5xy3A

248 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

extraction model We mainly consider 8 hyper-parametersthat seriously impact the performance of the model as shownin Table 2 More specifically Embedding_dim is one ofthe most important factors that determine the generaliza-tion capability of the model Here we fix other parame-ters while fine-tuning the embedding size in the range of(50100150200250300350400) Experimental resultsshow that the accuracy of extracted IOC achieves the bestwhen Embedding_dim=300 Learning_rate is another ma-jor factor for determining the stride of gradient descent inminimizing the loss function which determines whetherthe model can find a global optimal solution We fix otherparameters to fine-tune the Learning_rate in the rangeof (000100050010050105) and the performancereaches the best when the Learning_rate=0001 Similarlywe fine-tune the other hyper-parameters with 5000 epochsand the hyper-parameters allowing our model to perform op-timally are recorded in Table 2

Table 2 Hyperparameters setting in the multi-granular basedIOC extraction method

Parameter value Parameter Value

Embedding_dim 300 Hidden_dim 128

Sequence_length 500 Epoch_num 5000

Learning_rate 0001 Batch_size 64

Dropout_rate 05 Optimizer Adam

Table 3 Performance of IOC extraction wrt IOC types

IOC Type Precision Recall Micro-F1

IP 9956 9952 9954File 9436 9688 9560Type 9986 9981 9983

Email 9932 9987 9949Device 9326 9278 9302Vender 9307 9445 9424Version 9698 9799 9748Domain 9658 9589 9623Software 8825 8931 8878Function 9503 9559 9531Platform 9431 9257 9343Malware 8976 9123 9049

Vulnerability 9925 9873 9899Other 9829 9842 9835

In this paper we extract 13 major types of IOCs and the

Figure 7 Performance of IOC extraction using embeddingfeatures with different granularity

performance is presented in Table 3 Overall our IOC ex-traction method demonstrates excellent performance in termsof precision recall and Micro-F1 (ie micro-averaged F1-score) for most types of IOCs such as function maliciousIP and device However we observe a performance degra-dation when recognizing software and malware This can beattributed to the fact that most software and malware is namedby random strings such as md5 hash Moreover we find thatthe number of training samples impacts the performance ofthe model Specifically the performance becomes unsatisfac-tory (eg Software Malware) when the number of a certaintype of training samples is insufficient (ie less than 5000)

In order to verify the effectiveness of multi-granular embed-ding features we assess the performance of IOC extractionwith features of different granularity including char-level1-gram 2-gram 3-gram and multi-granular features The ex-perimental results are demonstrated in Figure 7 from whichwe can observe that the proposed multi-granular embeddingfeature outperforms others since it leverages the attentionmechanism to simultaneously learn multi-granular features tocharacterize different patterns of IOCs

Next to verify the effectiveness of the proposed IOC ex-traction method we compare it with the state-of-the-art en-tity recognition approaches including general NER toolsNLTK NER9 and Stanford NER10 professional IOC extrac-tion method Stucco [16] and iACE [22] and popular en-tity recognition approaches CRF [21] BiLSTM and BiL-STM+CRF [15] The experimental results of different meth-ods on real-world data are demonstrated in Table 4

The results indicate that our proposed IOC extraction out-performs the state-of-the-art entity recognition methods andtools in terms of precision recall and Micro-F1 and its im-provement can be attributed to the following factors Firstcompared with Standford NER and NLTK NER the NLP tools

9httpwwwnltkorgbookch07html10httpsstanfordnlpgithubioCoreNLPnerhtml

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 249

Table 4 Performance of threat entity recognition usingdifferent methods

Method Accuracy Precision Micro-F1

NLTK NER 6945 6851 6749

Stanford NER 6835 6674 6858

iACE 9214 9126 9225

Stucco 9116 9221 9147

CRF 9264 9180 9265

BiLSTM 9478 9521 9435

BiLSTM+CRF 9638 9642 9627

Multi-granular 9859 9872 9869

trained with general news corpora our method uses a security-related training corpus collected and labeled by ourselves asa data source for training our model Second different fromthe rule-based extraction approaches (eg iACE and Stucco)our proposed deep learning based method provides an end-to-end system with more advanced features to represent variousIOCs Third comparing to RNN-based methods (eg BiL-STM and BiLSTM+CRF) our proposed method brings inmulti-granular embedding sizes (char-level 1-gram 2-gramand 3-gram) to simultaneously learn the characteristics of vari-ous sizes and types of IOCs which can identify more complexand irregular IOCs Last but not the least our method imple-ments an attention mechanism to learn the weights of featureswith various scales to effectively characterize different typesof IOCs further enhancing the IOC recognition accuracy

6 Application of Threat Intelligence Comput-ing

Our proposed threat intelligence computing framework basedon heterogeneous graph convolutional networks can be usedto mine novel security knowledge behind heterogeneous IOCsIn this section we evaluate its effectiveness and applicabilityusing three real-world applications profiling and ranking forCTIs attack preference modeling and vulnerability similarityanalysis

61 Threat Profiling and Significance Rankingof IOCs

Due to the disparity in the significance of threats it is im-portant to derive the threat profile and rank the significanceof IOCs for demystifying the landscape of threats Howevermost of the existing CTIs are incapable of modeling the asso-ciated relationships between heterogeneous IOCs

Different from isolated CTIs HINTI leverages HIN to

model the interdependent relationships among IOCs withtwo characteristics first the isolated IOCs can be integratedinto a graph-based HIN to clearly display the associated rela-tionships among IOCs which is capable of directly depictingthe basic threat profile For example Figure 3 depicts a threatprofiling sample an attacker utilizes CVE-2017-0143 vulnera-bility to invade Vista SP2 and Win7 SP1 devices belonging tothe Microsoft platform and CVE-2017-0143 is a remote codeexecution vulnerability that uses a ldquoSMBbat malicious fileSecond the significance of IOCs in HINTI can be naturallyranked based on the proposed threat intelligence computingframework

Table 5 shows the top 5 authoritative ranking score [35] ofvulnerability attacker attack type and platform from whichsecurity experts can gain a clear insight into the impact ofeach IOC Degree centrality [33] which describes the numberof links incident upon a node is widely used in evaluatingthe importance of a node in a graph It can used to quantifythe immediate risk of a node that connects with other nodesfor delivering network flows such as virus spreading Heredegree centrality can be utilized in verifying the effectivenessof the proposed threat intelligence computing framework inranking the importance of IOCs It is worth noting that bothour ranking method and degree centrality work regardless ofthe time of attacks We compute the degree centrality rank-ing of IOCs based on the fact that the node with a higherdegree centrality is more important than a node with a lowerone For instance if the degree centrality of a vulnerabilityis higher it indicates that this vulnerability is exploited bymore attackers or it affects more devices The ranking resultof degree centrality shown in Table 5 is consistent with theranking result based on the proposed threat intelligence com-puting framework demonstrating the capability of the CTIcomputing framework in ranking the importance of differenttypes of IOCs

62 Attack Preference Modeling

Attack preference modeling is meaningful for security organi-zations to gain insight into the attack intention of attackersbuild attack portraits and develop personalized defense strate-gies Here we leverage HINTI to integrate different types ofIOCs and their interdependent relationships to comprehen-sively depict the picture of attack events which helps modelthe attack preferences With the proposed threat intelligencecomputing framework we model attack preferences by clus-tering the embedded attackersrsquo vectors

In this task each malicious IP address is treated as an in-truder and its attack preferences are mainly reflected in threefeatures including the platforms it destroys (including Win-dows Linux Unix ASP Android Apache etc) the industriesit invades (eg education finance government Internet ofThings and Industrial control system etc) and the exploittypes it employs (eg DOS Buffer overflow Execute code

250 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

Table 5 The significance ranking of different types of IOCs (CV E1 CVE-2017-0146 CV E2 CVE-2006-5911 CV E3 CVE-2008-6543 CV E4 CVE-2012-1199 CV E4 CVE-2006-4985 AR Authoritative Ranking DC Degree Centrality value)

Vulnerability Attacker Platform Attack Type

No AR DC Monicker AR DC Category AR DC Exploit_type AR DC

CV E1 02713 7643 Meatsploit 02764 549 PHP 04562 17865 Webapps 05494 11648

CV E2 02431 7124 GSR team 01391 327 windows 02242 13793 DOS 01772 8741

CV E3 02132 6833 Ihsan 00698 279 Linux 00736 8792 Overflow 01533 7652

CV E4 01826 6145 Techsa 00695 247 Linux86 00623 8147 CSRF 00966 5433

CV E5 01739 5637 Aurimma 00622 204 ASP 00382 5027 SQL 00251 2171

(a) AV DV T AT (P13) (b) AV DPDTV T AT (P17) (c) AV PV T AT (P14)

Figure 8 The performance of attack preference modeling with different meta-paths in which the preference of attacker i isreduced to a two-dimensional space (xiyi) and each cluster represents a group with a specific attack preference

Sql injection XSS Gain information etc)Specifically we first utilize our proposed threat intelligence

computing framework to embed each attacker into a low-dimensional vector space and then perform DBSCAN algo-rithm on the embedded vector to cluster attackers with thesame preferences into corresponding groups Figure 8 showsthe top 3 clustering results under different types of meta-paths in which the meta-path AV DPDTV T AT (P17) performsthe best performance with compact and well-separated clus-ters indicating that it contains richer semantic relationshipsfor characterizing attack preferences than other meta-paths

To verify the effectiveness of attack preference modelingwe identify 5297 distinct attackers (each unique IP address istreated as an attacker) who have submitted at least 10 cyberattacks For these attackers five cybersecurity researchersconsisting of three doctoral and two master students spentabout fortnight to manually annotate their attack preferencesfrom three perspectives the platforms they destroyed the in-dustries they attacked and the attack types they exploited Toensure the accuracy of data labeling we test the consistencyof the tags for the 5297 attackers and remove the sampleswith ambiguous tags As a result we obtain 3000 sampleswith consistent tags Based on the labeled samples we furtherevaluate the performance of different meta-paths on model-

Table 6 Performance of modeling attack preference withdifferent meta-paths

Metapath Accuracy Precision Micro-F1

P1 7431 7622 7525

P4 7116 7327 7216

P5 6915 7143 7027

P12 7214 7646 7424

P13 7965 8131 8047

P14 7748 7934 7840

P15 8017 7976 7996

P17 8139 8172 8155

ing attack preferences In the attack modeling scenario weonly focus on the meta-paths that both the start node and theend node are attackers The experimental results are demon-strated in Table 6 Obviously different meta-paths displaydifferent abilities in characterizing the attack preferences ofcyber intruders The performance when using P17 is more

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 251

superior than the one with other meta-paths which indicatesthat P17 holds more valuable information that characterizesthe attack preferences of cybercriminals since P17 includesthe semantics information of P1P4P5 and P12 sim P15

In addition we compare the capabilities of our proposedcomputing framework with those of other state-of-the-art em-bedding methods in terms of attack preference modelingOur analysis result shows that the accuracy of attack pref-erence modeling reaches 081 which outperforms the exist-ing popular models Node2vec (with precision of 071) [1]metapaht2vec (with precision of 073) [11] and HAN (withprecision of 076) [42] The performance improvement canbe attributed to the following characteristics First our com-puting framework utilizes weight-learning to learn the signifi-cance of different meta-paths for evaluating the similarity be-tween attackers Second the proposed computing frameworkleverages GCN to learn the structural information betweenattackers to obtain more discriminative structural features thatimproves the performance of attack preference modeling

63 Vulnerability Similarity AnalysisVulnerability classification or clustering is crucial for conduct-ing vulnerability trend analysis the correlation analysis ofincidents and exploits and the evaluation of countermeasuresThe traditional vulnerability analysis relies on the manualinvestigation of the source codes which requires expert ex-pertise and consumes considerable efforts In this sectionwe propose an unsupervised vulnerability similarity analy-sis method based on the proposed threat intelligence com-puting framework which can automatically group similarvulnerabilities into corresponding communities Particularlythe vulnerability-related IOCs are first embedded into a low-dimensional vector space using CTI computing frameworkThen the DBSCAN algorithm is performed on the embed-ded vector space to cluster vulnerabilities into correspondingcommunities The clustering results are presented in Figure 9

Figure 9 (c) shows all vulnerabilities are clustered into 12clusters using meta-path V DPDTV T (P16) which is very closeto the classification standard (ie 13) recommended by CVEdetails an authoritative database that publishes vulnerabilityinformation By manually analyzing the training samples wefind that HTTP Response Splitting vulnerability does not ap-pear in our dataset Therefore our cluster number (ie 12)is consistent with CVE Details11 To further validate the ef-fectiveness of threat intelligence computing framework forvulnerability clustering we randomly select 100 vulnerabili-ties from each cluster for manual inspection to measure theconsistency of the vulnerability types in each cluster and theresults are presented in Table 7 Obviously the clustering per-formance of cluster 8 (ie File Inclusion) and cluster 10 (ieDirectory Traversal) is remarkably worse than other clustersTo explain the reason we examine our training data and the

11httpswwwcvedetailscom

Table 7 Accuracy of vulnerability clustering

Cluster ID Vulnerability type Accuracy

cluster1 Denial of Service 8012

cluster2 XSS 8353

cluster3 Execute Code 8150

cluster4 Overflow 7650

cluster5 Gain Privilege 9156

cluster6 Bypass Something 7174

cluster7 CSRF 9327

cluster8 File Inclusion 6172

cluster9 Gain Informa 7042

cluster10 Directory Traversal 6949

cluster11 Memory Corruption 8156

cluster12 SQL Injection 8067

average 7851

computing framework We found that the proportion of thesetwo types of vulnerabilities is too small (cluster 8 is 34and cluster 10 is 42) making our computing frameworkvery likely to be under-fit with insufficient data Howeverthe proposed computing framework performs well on mosttypes of vulnerabilities in an unsupervised manner especiallygiven sufficient samples (eg cluster 7 is 176 and cluster 5is 157)

In addition by examining the clustering results we havean observation that the vulnerabilities in the same cluster arelikely to have evolutionary relationships For instance CVE-2018-0802 an office zero-day vulnerability is evolved fromthe CVE-2017-11882 They both include EQNEDT32exe fileused to edit the formula in Office software which allowsremote attackers to execute arbitrary codes by constructinga malformed font name The modeling and computation ofinterdependent relationships among IOCs in HINTI facilitatethe discovery of such intricate connections between vulnera-bilities

In summary HINTI is capable of depicting a more compre-hensive threat landscape and the proposed CTI computingframework has the ability to bring novel security insightstoward different real-world security applications Howeverthere are still numerous opportunities for enhancing thesesecurity applications Specifically for attack preference mod-eling although each individual IP address is treated as anattacker we cannot determine whether it belongs to a realattacker or is disguised by a proxy Fortunately even if thereal attack address cannot be captured understanding the at-tack preferences of these IP proxies which are widely usedin cybercrime is also meaningful for gaining insight into thecyber threats For vulnerability similarity analysis data imbal-

252 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

(a) AV DV T AT (P13) (b) V DV T (P10) (c) V DPDTV T (P16)

Figure 9 Illustration of the vulnerability similarity analysis based on different meta-paths in which vulnerability i can be reducedinto a two-dimensional space (xiyi) and each cluster indicates a particular type of vulnerability

ance issue affects the performance of model and inadequatetraining samples often result in model underfitting as shownin the case of cluster 8 and cluster 10

7 Related Work

Cyber Treat Intelligence An increasing number of securityvendors and researchers start exploring CTI for protectingsystem security and defending against new threat vectors [28]Existing CTI extraction tools such as IBM X-Force12 Threatcrowd13 Openctiio14 AlienVault15 CleanMX16 and Phish-Tank17 use regular expression to synthesize IOC from thedescriptive texts However these methods often producehigh false positive rate by misjudging legitimate entities asIOCs [22]

Recently Balzarotti et al [2] develop a system to extractIOCs from web pages and identify malicious URLs fromJavaScript codes Sabottke et al [31] propose to detect po-tential vulnerability exploits by extracting and analyzing thetweets that contain ldquoCVErdquo keyword Liao et al [22] present atool iACE for automatically extracting IOCs which excelsat processing technology articles Nevertheless iACE identi-fies IOCs from a single article which does not consider therich semantic information from multi-source texts Zhao etal [46] define different ontologies to describe the relationshipbetween entities based on expert knowledge Numerous popu-lar CTI platforms including IODEF [9] STIX [3] TAXII [40]OpenIOC [13] and CyBox [19] focus on extracting and shar-ing CTI Yet none of the existing approaches could uncoverthe interdependent relations among CTIs extracted from multi-source texts let alone quantifying CTIsrsquo relevance and miningvaluable threat intelligence hidden behind the isolated CTIs

12httpsexchangexforceibmcloudcom13httpswwwthreatcrowdorg14httpsdemoopenctiio15 httpsotxalienvaultcom16httplistclean-mxcom17httpswwwphishtankcom

Heterogeneous Information Network Real-world systemsoften contain a large number of interacting multi-typed ob-jects which can naturally be expressed as a heterogeneousinformation network (HIN) HIN as a conceptual graph repre-sentation can effectively fuse information and exploit richersemantics in interacting objects and links [37] HIN has beenapplied to network traffic analysis [38] public social mediadata analysis [45] and large-scale document analysis [41]Recent applications of HIN include mobile malware detec-tion [14] and opioid user identification [12] In this paper forthe first time we use HIN for CTI modelingGraph Convolutional Network Graph convolutional net-works (GCN) [17] has become an effective tool for address-ing the task of machine learning on graphs such as semi-supervised node classification [17] event classification [29]clustering [8] link prediction [27] and recommended sys-tem [44] Given a graph GCN can directly conduct the con-volutional operation on the graph to learn the nonlinear em-bedding of nodes In our work to discern and reveal the inter-active relationships between IOCs we utilize GCN to learnmore discriminative representation from attributes and graphstructure simultaneously which is the premise for threat intel-ligence computing

8 Discussion

Data Availability The proposed framework assumes thatsufficient threat description data can be obtained for generat-ing comprehensive and the latest CTIs Fortunately with thegrowing prosperity of social media an increasing number ofsecurity-related data (eg blogs posts news and open secu-rity databases) can be collected effortlessly To automaticallycollect security-related data we develop a crawler system tocollect threat description data from 73 international securitysources (eg blogs hacker forum posts security bulletinsetc) providing sufficient raw materials for generating cyberthreat intelligenceModel Extensibility In this paper 6 types of IOCs and 9

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 253

types of relationships are modeled in HINTI However ourproposed framework is extensible in which more types ofIOCs and relationships can be enrolled to represent richer andmore comprehensive threat information such as maliciousdomains phishing Emails attack tools their interactions etcHigh-level Semantic Relations In view of the computa-tional complexity of the model our threat intelligence comput-ing method focues on utilizing the meta-paths to quantify thesimilarity between IOCs while ignoring the influence of themeta-graph on it which inevitably misses characterizing somehigh-level semantic information Nevertheless the proposedcomputing framework introduces an attention mechanism tolearn the signification of different meta-paths to character-ize IOCs and their interactive relationships which effectivelycompensates for the performance degradation caused by ig-noring the meta-graphsSecurity Knowledge Reasoning Although our proposedframework exhibits promising results in CTI extraction andmodeling computing how to implement advanced securityknowledge reasoning and prediction is still an open problemeg it remains challenging to predict whether a vulnerabilitycould potentially affect a particular type of devices in thefuture We will investigate this problem in the future

9 Conclusion

This work explores a new direction of threat intelligence com-puting which aims to uncover new knowledge in the relation-ships among different threat vectors We propose HINTI acyber threat intelligence framework to model and quantify theinterdependent relationships among different types of IOCsby leveraging heterogeneous graph convolutional networksWe develop a multi-granular attention mechanism to learnthe importance of different features and model the interde-pendent relationships among IOCs using HIN We proposethe concept of threat intelligence computing and present ageneral intelligence computing framework based on graphconvolutional networks Experimental results demonstratethat the proposed multi-granular attention based IOC extrac-tion method outperforms the existing state-of-the-art methodsThe proposed threat intelligence computing framework caneffectively mine security knowledge hidden in the interdepen-dent relationships among IOCs which enables crucial threatintelligence applications such as threat profiling and rankingattack preference modeling and vulnerability similarity analy-sis We would like to emphasize that the knowledge discoveryamong interdependent CTIs is a new field that calls for acollaborative effort from security experts and data scientists

In future we plan to develop a predicative and reasoningmodel based on HINTI and explore preventative countermea-sures to protect cyber infrastructure from future threats Wealso plan to add more types of IOCs and relations to depicta more comprehensive threat landscape Moreover we willleverage both meta-paths and meta-graphs to characterize the

IOCs and their interactions to further improve the embeddingperformance and to strike a balance between the accuracyand computational complexity of the model We will alsoinvestigate the feasibility of security knowledge predictionbased on HINTI to infer the potential future relationshipsbetween the vulnerabilities and devices

Acknowledgement

We would like to thank our shepherd Tobias Fiebig andthe anonymous reviewers for providing valuable feedbackon our work We also thank Hao Peng and Lichao Sunfor their feedback on the early version of this work Thiswork was supported in part by National Science Founda-tion grants CNS1950171 CNS-1949753 It was also sup-ported by the NSFC for Innovative Research Group ScienceFund Project (62141003) National Key RampD Program China(2018YFB0803 503) the 2018 joint Research Foundationof Ministry of Education China Mobile (MCM20180507)and the Opening Project of Shanghai Trusted Industrial Con-trol Platform (TICPSH202003020-ZC) Any opinions find-ings and conclusions or recommendations expressed in thismaterial do not necessarily reflect the views of any fundingagencies

References

[1] Jure Leskovec Aditya Grover node2vec Scalable fea-ture learning for networks In Acm Sigkdd InternationalConference on Knowledge Discovery Data Mining2016

[2] Marco Balduzzi Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications In WWW 2016

[3] Sean Barnum Standardizing cyber threat intelligenceinformation with the structured threat information ex-pression (stix) Mitre Corporation 111ndash22 2012

[4] Eric W Burger Michael D Goodman Panos Kam-panakis and Kevin A Zhu Taxonomy model for cyberthreat intelligence information exchange technologiesIn Proceedings of the 2014 ACM Workshop on Infor-mation Sharing amp Collaborative Security pages 51ndash602014

[5] Onur Catakoglu Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications The web conference pages 333ndash3432016

[6] Danqi Chen and Christopher D Manning A fast and ac-curate dependency parser using neural networks pages740ndash750 2014

254 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

[7] Qian Chen and Robert A Bridges Automated behav-ioral analysis of malware a case study of wannacry ran-somware In 16th IEEE ICMLA pages 454ndash460 2017

[8] Weilin Chiang Xuanqing Liu Si Si Yang Li SamyBengio and Chojui Hsieh Cluster-gcn An efficient al-gorithm for training deep and large graph convolutionalnetworks knowledge discovery and data mining pages257ndash266 2019

[9] Roman Danyliw Jan Meijer and Yuri Demchenko Theincident object description exchange format Interna-tional Journal of High Performance Computing Appli-cations 50701ndash92 2007

[10] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 2017

[11] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 ACM 2017

[12] Yujie Fan Yiming Zhang Yanfang Ye and Xin Li Au-tomatic opioid user detection from twitter Transductiveensemble built on different meta-graph based similari-ties over heterogeneous information network In IJCAIpages 3357ndash3363 2018

[13] Fireeye Openioc httpswwwfireeyecomblogthreat-research201310openioc-basicshtml Accessed January 20 2020

[14] Shifu Hou Yanfang Ye Yangqiu Song and Melih Ab-dulhayoglu Hindroid An intelligent android malwaredetection system based on structured heterogeneous in-formation network In Proceedings of the 23rd ACMSIGKDD International Conference on Knowledge Dis-covery and Data Mining pages 1507ndash1515 2017

[15] Zhiheng Huang Wei Xu and Kai Yu Bidirectional lstm-crf models for sequence tagging arXiv1508019912015

[16] Michael D Iannacone Shawn Bohn Grant NakamuraJohn Gerth Kelly MT Huffer Robert A Bridges Erik MFerragut and John R Goodall Developing an ontologyfor cyber security knowledge graphs CISR 1512 2015

[17] Thomas Kipf and Max Welling Semi-supervised clas-sification with graph convolutional networks arXivLearning 2016

[18] Thomas N Kipf and Max Welling Semi-supervisedclassification with graph convolutional networks arXivpreprint arXiv160902907 2016

[19] Tero Kokkonen Architecture for the cyber security situ-ational awareness system In Internet of Things SmartSpaces and Next Generation Networks and Systemspages 294ndash302 Springer 2016

[20] Mehmet Necip Kurt Yasin Yılmaz and Xiaodong WangDistributed quickest detection of cyber-attacks in smartgrid IEEE Transactions on Information Forensics andSecurity 13(8)2015ndash2030 2018

[21] John Lafferty Andrew Mccallum and Fernando PereiraConditional random fields Probabilistic models for seg-menting and labeling sequence data ICML pages 282ndash289 2001

[22] Xiaojing Liao Yuan Kan Xiao Feng Wang Li Zhouand Raheem Beyah Acing the ioc game Toward auto-matic discovery and analysis of open-source cyber threatintelligence In ACM Sigsac Conference on ComputerCommunications Security 2016

[23] Rob Mcmillan Open threat intelligencehttpwwwgartnercomdoc2487216definition-threat-intelligence AccessedJanuary 20 2020

[24] Tomas Mikolov Kai Chen Greg Corrado and JeffreyDean Efficient estimation of word representations invector space arXiv preprint arXiv13013781 2013

[25] Sudip Mittal Prajit Kumar Das Varish Mulwad Anu-pam Joshi and Tim Finin Cybertwitter Using twitterto generate alerts for cybersecurity threats and vulnera-bilities In Proceedings of the 2016 IEEE Advances inSocial Networks Analysis and Mining pages 860ndash8672016

[26] Eric Nunes Ahmad Diab Andrew Gunn EricssonMarin Vineet Mishra Vivin Paliath John RobertsonJana Shakarian Amanda Thart and Paulo ShakarianDarknet and deepnet mining for proactive cybersecuritythreat intelligence In 2016 IEEE ISI pages 7ndash12 2016

[27] Shirui Pan Ruiqi Hu Guodong Long Jing Jiang LinaYao and Chengqi Zhang Adversarially regularizedgraph autoencoder for graph embedding pages 2609ndash2615 2018

[28] P Pawlinski P Jaroszewski P Kijewski L SiewierskiP Jacewicz P Zielony and R Zuber Actionable infor-mation for security incident response European UnionAgency for Network and Information Security Herak-lion Greece 2014

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 255

[29] Hao Peng Jianxin Li Qiran Gong Yangqiu SongYuanxing Ning Kunfeng Lai and Philip S Yu Fine-grained event categorization with heterogeneous graphconvolutional networks In Proceedings of the 28th In-ternational Joint Conference on Artificial Intelligencepages 3238ndash3245 AAAI Press 2019

[30] Sara Qamar Zahid Anwar Mohammad Ashiqur Rah-man Ehab Al-Shaer and Bei-Tseng Chu Data-drivenanalytics for cyber-threat intelligence and informationsharing Computers amp Security 6735ndash58 2017

[31] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In USENIXSecurity 2015

[32] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In 24thUSENIX Security pages 1041ndash1056 2015

[33] Deepak Sharma and Avadhesha Surolia Degree Cen-trality Springer New York 2013

[34] Saurabh Singh Pradip Kumar Sharma Seo Yeon MoonDaesung Moon and Jong Hyuk Park A comprehensivestudy on apt attacks and countermeasures for futurenetworks and communications challenges and solutionsJournal of Supercomputing pages 1ndash32 2016

[35] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks Principles and methodologiesSynthesis Lectures on Data Mining and Knowledge Dis-covery 3(2)1ndash159 2012

[36] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks a structural analysis approachACM SIGKDD 14(2)20ndash28 2013

[37] Yizhou Sun Jiawei Han Xifeng Yan Philip S Yu andTianyi Wu Pathsim Meta path-based top-k similaritysearch in heterogeneous information networks Proceed-ings of the VLDB Endowment 4(11)992ndash1003 2011

[38] Yizhou Sun Brandon Norick Jiawei Han Xifeng YanPhilip S Yu and Xiao Yu Pathselclus Integrating meta-path selection with user-guided object clustering in het-erogeneous information networks ACM Transactionson TKDD 7(3)11 2013

[39] Wiem Tounsi and Helmi Rais A survey on technicalthreat intelligence in the age of sophisticated cyber at-tacks Computers amp Security 72212ndash233 2018

[40] Thomas D Wagner Esther Palomar Khaled Mahbuband Ali E Abdallah Towards an anonymity supportedplatform for shared cyber threat intelligence risks andsecurity of internet and systems pages 175ndash183 2017

[41] Chenguang Wang Yangqiu Song Haoran Li MingZhang and Jiawei Han Text classification with het-erogeneous information network kernels In 13th AAAI2016

[42] Xiao Wang Houye Ji Chuan Shi Bai Wang Peng CuiP Yu and Yanfang Ye Heterogeneous graph attentionnetwork 2019

[43] Jie Yang Shuailong Liang and Yue Zhang Design chal-lenges and misconceptions in neural sequence labelingarXiv Computation and Language 2018

[44] Rex Ying Ruining He Kaifeng Chen Pong Eksombat-chai William L Hamilton and Jure Leskovec Graphconvolutional neural networks for web-scale recom-mender systems In Proceedings of the 24th ACMSIGKDD International Conference on Knowledge Dis-covery amp Data Mining pages 974ndash983 ACM 2018

[45] Jiawei Zhang Xiangnan Kong and Philip S Yu Trans-ferring heterogeneous links across location-based socialnetworks In The 7th ACM international conference onWeb search and data mining pages 303ndash312 2014

[46] Yishuai Zhao Bo Lang and Ming Liu Ontology-basedunified model for heterogeneous threat intelligence inte-gration and sharing In 2017 11th IEEE InternationalConference on Anti-counterfeiting Security and Identi-fication (ASID) pages 11ndash15 2017

256 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

  • Introduction
  • Background
    • Cyber Threat Intelligence
    • Motivation
    • Preliminaries
      • Architecture Overview of HINTI
      • Methodology
        • Multi-granular Attention Based IOC Extraction
        • Cyber Threat Intelligence Modeling
        • Threat Intelligence Computing
          • Experimental Evaluation
            • Dataset and Settings
            • Evaluation of IOC Extraction
              • Application of Threat Intelligence Computing
                • Threat Profiling and Significance Ranking of IOCs
                • Attack Preference Modeling
                • Vulnerability Similarity Analysis
                  • Related Work
                  • Discussion
                  • Conclusion
Page 4: Cyber Threat Intelligence Modeling Based on Heterogeneous ... · 2.1Cyber Threat Intelligence Cyber Threat Intelligence (CTI) extracted from security-related data is structured information

(a) Network schema (b) Network instance

Figure 4 Network schema and instance of HIN containing 6 types of IOCs (a) The network schema of HIN which depicts

the relationship template among different types of IOCs such as Devicebelongminusminusminusrarr Plat f orm (b) An instance of network schema

which describes the concrete relationships between IOCs by following a network schema eg O f f ice 2012belongminusminusminusrarrWindows

the network schema [37] for describing the meta-level rela-tionships

Definition 2 Network Schema The network schema ofHINTI denoted as HS = (AR) is a meta template for G =(VET ) with the object type mapping ϕ VrarrM and the linktype mapping Φ ErarrR It is a directed graph of object typesM with edges representing relations from R

The schema of HINTI specifies type constraints on the sets ofIOCs and their relationships Figure 4 (a) shows the networkschema of HINTI which defines the relationship templatesamong IOCs to effectively guide the semantic exploration inHINTI For example for a relationship that describes ldquoat-tackers invade devices the semantic schema can be writtenas attacker invademinusminusminusrarrdevice Figure 4 (b) presents a concreteinstance of the network schema

Definition 3 Meta-path A meta-path [37] P is a path se-quence defined on a network schema S = (NR) and is repre-

sented in the form of N1R1minusrarrN2

R2minusrarr middotmiddot middot RiminusrarrNi+1 which definesa composite relation R = R1 R2 middot middot middot Ri+1 where denotesthe composition operator on relations A meta-path P is asymmetric path when the relation R defined by the path issymmetric (ie P is equal to Pminus1)

Table 1 displays the meta-paths considered in HINTI Forexample the relationship ldquothe attackers (A) exploit the samevulnerability (V) can be described by a length-2 meta-path

attackerexploitminusminusminusminusrarr vulnerability

exploitminusminusminusminusminusminusrarr attacker denoted asAVAT (P4) which means that the two attackers exploit thesame vulnerability Similarly AV DPDTV T AT (P17) portraysa close relationship between IOCs that ldquotwo attackers wholeverage the same vulnerability invade the same type of deviceand ultimately destroy the same type of platform

Table 1 Meta-paths used in HINTI

ID Meta-path

P1 Attacker-Attacker

P2 Device-Device

P3 Vulnerability-Vulnerability

P4 Attacker-Vulnerability-Attacker

P5 Attacker-Device-Attacker

P6 Device-File-Device

P7 Device-Platform-Device

P8 Vulnerability-File-Vulnerability

P9 Vulnerability-Type-Vulnerability

P10 Vulnerability-Device-Vulnerability

P11 Vulnerability-Platform-Vulnerability

P12 Attacker-Device-Platform-Device-Attacker

P13 Attacker-Vul-Device-Vul-Attacker

P14 Attacker-Vul-Platform-Vul-Attacker

P15 Attacker-Vul-Type-Vul-Attacker

P16 Vul-Device-Platform-Device-Vul

P17 Attacker-Vul-Device-Platform-Device-Vul-Attacker

3 Architecture Overview of HINTI

HINTI as a cyber threat intelligence extraction and comput-ing framework is capable of effectively extracting IOCs fromthreat-related descriptions and formalizing the relationshipsamong heterogeneous IOCs to demystify new threat insightsAs shown in Figure 5 HINTI consists of four major compo-nents including

bull Data Collection and IOC Recognition We first de-

244 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

Figure 5 The overall architecture of HINTI HINTI consists of four major components (a) collecting security-related data andextracting threat objects (ie IOCs) (b) modeling interdependent relationships among IOCs into a heterogeneous informationnetwork (c) embedding nodes into a low-dimensional vector space using weight-learning based similarity measure and (d)computing threat intelligence based on graph convolutional networks and knowledge mining

velop a data collection system to automatically capturesecurity-related data from blogs hacker forum posts se-curity news and security bulletins The system utilizesa breadth-first search to collect the HTML source codeand then leverages Xpath (XML Path language) to ex-tract threat-related descriptions After that we utilize amulti-granular attention based IOC recognition methodto extract IOC from the collected threat-related descrip-tions (see Section 41 for details)

bull Relation Extraction and IOC modeling HINTI ad-dresses the challenge of CTI modeling by leveragingheterogeneous information networks which can natu-rally depict the interdependent relationships betweenheterogeneous IOCs As an example Figure 4 shows amodel that capture the interactive relationships among at-tacker vulnerability malicious file attack type platformand device (see Section 42 for details)

bull Meta-path Design and Similarity Measure Meta-path is an effective tool to express the semantic rela-tions among IOCs in constructed HIN For instance

attackerexploitminusminusminusminusrarr vulnerability

exploitminusminusminusminusminusminusrarr attacker indi-cates that two attackers are related by exploiting the samevulnerability We design 17 types of meta-paths (SeeTable 1) to describe the interdependent relationships be-tween IOCs With these meta-paths we present a weight-learning based node similarity computing approach toquantify and embed the relationships as the premise forthreat intelligence computing

bull Threat Computing and Knowledge Mining In thiscomponent an effective threat intelligence computingframework is proposed which can quantify and measure

the relevance among IOCs by leveraging graph convolu-tional network (GCN) Our proposed threat intelligencecomputing framework could reveal richer security knowl-edge within a more comprehensive threat landscape

4 Methodology

41 Multi-granular Attention Based IOC Ex-traction

Extracting IOCs from multi-source threat texts is one of themajor tasks of threat intelligence analytics and the qualityof the extracted IOCs significantly influences the analysisresults of cyber threats Recently Bidirectional Long Short-Term Memory+Conditional Random Fields (BiLSTM+CRF)model [15] has demonstrated excellent performance in textchunking and Named-entity Recognition (NER) Howeverdirectly applying this model to IOC extraction is unlikely tosucceed since threat texts usually contain a large number ofthreat objects with different grams and irregular structuresConsequently we need an efficient method to learn the dis-criminative characteristics of IOCs with different sizes In thispaper we propose a multi-granular attention based IOC extrac-tion method which can extract threat objects with differentgranularity Particularly Figure 6 presents the proposed IOCextraction framework which leverages the multi-granular at-tention mechanism to characterize IOCs Different from thetraditional BiLSTM+CRF model we introduce new word-embedding features with different granularities to capture thecharacteristics of IOCs with different sizes Furthermore weutilize a self-attention mechanism to learn the importance ofthe features to improve the accuracy of IOC extraction

Our proposed method takes a threat description sentenceX = (x1x2 middot middot middot xi) as input where xi represents i-th word

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 245

Figure 6 The framework of multi-granular IOC extraction

in X We first chunk the sentence into n-gram componentsincluding char-level 1-gram 2-gram and 3-gram which arethe inputs of our trained model written as follows

e jxi=V j

embedding(xi) (1)

where V jembedding transforms the chunk with granularity j into

a vector space and xi is the i-th word in a sentence X Thusthe threat description sentence Xi can be vectorized as follows

rarrh j

i = LST M f orward([e jx0e j

x1 middot middot middot e j

xi])

larrh j

i = LST Mbackward([e jx0e j

x1 middot middot middot e j

xi])

(2)

whererarrh j

i andlarrh j

i are the embedded features learned byforward LSTM and backward LSTM respectively Let O bethe output of Bi-LSTM which is a weighted sum of embeddedfeatures with weights corresponding to the importance ofdifferent features

O = H middotW T (3)

where H =j

sum~βiσ(h

j1h

j2 middot middot middot h

ji ) h j

i = (rarrh j

i +larrh j

i ) ~βi is theweight vector to represent the importance of h j

i in whichj and i are the segmentation granularity of sentences and thecorresponding index of the chunk W is the parameter matrix

Given a security-related sentence X = (x1x2 middot middot middot xi) itscorresponding threat object sequence Y = (y1 y2 middot middot middot yi) andits output of Bi-LSTM O we can compute the overall labelscore of X and Y as follows

S(X Y ) =n

sumi=0

(Ayiyi+1 +Oiyi) (4)

where Ayiyi+1 is the state transition matrix in CRF model andOiyi as the output of Bi-LSTM hidden layer (calculated byEq (3)) represents the label score of i-th word corresponding

to the type yi Next we utilize so f tmax function to normalizethe overall label score

p(Y |X) =eS(X Y )

sumyisinYX

eS(X Y )(5)

We design an objective function to maximize the proba-bility p(Y |X) to achieve the highest label score for differentIOCs which can be written as follows

argmax log(p(Y |X)) = argmax (S(X Y )minus

log( sumyisinYX

eS(X y))) (6)

By solving the objective function above we assign correctlabels to the n-gram components according to which we canidentify the IOCs with different lengths Our multi-granularattention based IOC extraction method is capable of identify-ing different types of IOCs and its evaluation is presented inSection 5

42 Cyber Threat Intelligence ModelingCTI modeling is an important step to explore the intricaterelationship between heterogeneous IOCs In our work HINis introduced to group different types of IOCs into a graphto explore their interactive relationships In this section weportray the main principle of threat intelligence modeling

To model the intricate interdependent relationships amongIOCs we define the following 9 relationships among 6 typesof IOCs as follows

bull R1 To depict the relation of an attacker and the ex-ploited vulnerability we construct the attacker-exploit-vulnerability matrix A For each element Ai j isin 01Ai j=1 indicates attacker i exploits vulnerability j

bull R2 To depict the relation of an attacker and a devicewe build the attacker-invade-device matrix D For eachelement Di j isin 01 Di j=1 indicates attacker i invadesdevice j

bull R3 Two attacker can cooperate to attack a target Tostudy the relationship of attacker-attacker we constructthe attacker-cooperate-attacker matrix C For each ele-ment Ci j isin 01 Ci j=1 indicates there exists a cooper-ative relationship between attacker i and j

bull R4 To describe the relation of a vulnerability and theaffected device we build the vulnerability-affect-devicematrix M For each element Mi j isin 01 Mi j=1 indi-cates vulnerability i affects device j

bull R5 A vulnerability is often labeled as a specific attacktype by Common Vulnerabilities and Exposures (CVE)

246 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

system7 To explore the relation of vulnerability-attacktype we build the vulnerability-belong-attack type ma-trix G where each element Gi j isin 01 denotes if vul-nerability i belongs to an attack type j

bull R6 A vulnerability often involves one or more maliciousfiles To describe the relation of vulnerability-file webuild the vulnerability-include-file matrix F For eachelement Fi j isin 01 Fi j=1 denotes that vulnerability iincludes malicious file j

bull R7 A malicious file often targets a specific device Weestablish the file-target-device matrix T to explore therelation of file-device For each element Ti j isin 01Ti j=1 indicates malicious file i targets device j

bull R8 Oftentimes a vulnerability evolves from anotherTo study the relationship of vulnerability-vulnerabilitywe build the vulnerability-evolve-vulnerability matrixE where each element Ei j isin 01 indicates if vulnera-bility i evolves from vulnerability j

bull R9 To depict the relation device-platform that a de-vice belongs to a platform we build the device-belong-platform matrix P where each element Pi j isin 01 il-lustrates if device i belongs to platform j

Based on the above 9 types of relationships HINTIleverages the syntactic dependency parser [6] (eg subject-predicate-object attributive clause etc) to automatically ex-tract the 9 relationships among IOCs from threat descriptionseach of which is represented as a triple (IOCirelation IOC j)For instance given a security-related description ldquoOn May12 2017 WannaCry exploited the MS17-010 vulnerabilityto affect a larger number of Windows devices which is aransomware attack via encrypted disks Using the syntacticdependency parser we can extract the following triples (Wan-naCry exploit MS17-010) (MS17-010 affect Windows de-vice) (WannaCry is ransomware) Such triples extractedfrom various data sources can be incrementally assembledinto HINTI to model the relationships among IOCs whichcould offer a more comprehensive threat landscape that de-scribes the threat context Particularly we further define 17types of meta-paths shown in Table 1 to probe into the interde-pendent relationships over attackers vulnerabilities maliciousfiles attack types devices and platforms HINTI is able toconvey a richer context of threat events by scrutinizing 17types of meta-paths and reveal the in-depth threat insightsbehind the heterogeneous IOCs (see Section 6 for details)

43 Threat Intelligence ComputingIn this section we illustrate the concept of threat intelligencecomputing and design a general threat intelligence computing

7httpcvemitreorg

framework based on heterogeneous graph convolutional net-works which quantifies and measures the relevance betweenIOCs by analyzing meta-path based semantic similarity Herewe first provide a formal definition of threat intelligence com-puting based on heterogeneous graph convolutional networks

Definition 4 Threat Intelligence Computing Based onHeterogeneous Graph Convolutional Networks Given thethreat intelligence graph G = (VE) the meta-path set M =P1P2 middot middot middot Pi The threat intelligence computing i) com-putes the similarity between IOCs based on meta-path Pi togenerate corresponding adjacency matrix Ai ii) constructsthe feature matrix of nodes Xi by embedding attribute in-formation of IOCs into a latent vector space iii) conductsgraph convolution GCN(AiXi) to quantify the interdependentrelationships between IOCs by following meta-path Pi andembeds them into a low-dimensional space

The threat intelligence computing aims to model the seman-tic relationships between IOCs and measure their similaritybased on meta-paths which can be used for advanced secu-rity knowledge discovery such as threat object classificationthreat type matching threat evolution analysis etc Intuitivelythe objects connected by the most significant meta-paths tendto bear more similarity [37] In this paper we propose aweight-learning based threat intelligence similarity measurewhich uses self-attention to improve the performance of simi-larity measurement between any two IOCs This method canbe formalized as below

Definition 5 Weight-learning based Node Similar-ity Measure Given a set of symmetric meta-path setP = [Pm]

Mprime

m=1 the similarity S(hih j) between any two IOCshi and h j is defined as

S(hih j) =Mprime

summ

rarrw

2middot | hirarr j isin Pm || hirarri isin Pm |+ | h jrarr j isin Pm |

(7)

where hirarr j isin hm is a path instance between IOC hi andh j following meta-path Pm hirarri isin Pm is that between IOCinstance hi and hi and h jrarr j isin Pm is that between IOC in-stance h j and h j where | hirarr j isin Pm |=CPm(i j) | hirarri isinPm |=CPm(i i) | h jrarr j isin Pm |=CPm( j j) and CPm is thecommuting matrix based on meta-path Pm defined below

rarrw =

[w1 wm wMprime ] denote the meta-path weights wherewm is the weight of meta-paths Pm and M

primeis the number of

meta-pathsS(hih j) is defined in two parts (1) the semantic overlap

in the numerator which describes the number of meta-pathbetween IOC instance hi and h j (2) and the semantic broad-ness in the denominator which depicts the number of totalmeta-paths between themselves The larger number of meta-path between IOC instance hi and h j the more similar thetwo IOCs are which is normalized by the semantic broadness

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 247

of denominator Moreover different from existing similaritymeasures [37] we propose an attention mechanism basedsimilarity measure method by introducing the weight vec-torrarrw = [w1 wm wMprime ] which is a trainable coefficient

vector to learn the importance of different meta-paths forcharacterizing IOCs

Obviously it is computationally expensive to measure thesimilarity among IOCs in the constructed heterogeneousgraph as it usually requires to randomly walk a larger numberof nodes in the graph Fortunately in our work it is unneces-sary to walk through the entire graph as we prescribe a limitby introducing predefined meta-paths and we only focus onthe symmetrical meta-paths presented in Table 1 To calcu-late the similarity between IOCs under different meta-pathinstances we need to compute the corresponding commutingmatrices [37] following the meta-paths

Given a meta-path set P = sumMprime

m A1A2 middot middot middot Al+1 themeta-path based commuting matrix can be defined as CP =UA1A2 UA2A3 middot middot middot UAAl+1 where CP(i j) represents the prob-ability of object iisinA1 reaching object j isinAl+1 under the pathP and is a connection operation These symmetric meta-paths not only efficiently reduce the complexity of walkingbut also ensures that the commuting matrix can be easily de-composed which greatly reduces the computational costs Inaddition the symmetric meta-paths in the graph G allow us toleverage the pairwise random-walk [37] to further acceleratethe computation

With Eq (7) and pairwise random-walk we can obtain thesimilarity embedding between any two IOCs hi and h j undera meta-path set P Based on the low-dimensional similarityembedding we derive a weighted adjacent matrix betweenIOCs denoted as Ai isin RNtimesN where N is the number of aspecific type of IOC in G Meanwhile to utilize the attributedinformation of nodes we train a word2vec model [24] toembed the attribute information of nodes into a feature matrixXi isin RNtimesd where N is the number of IOCs in Ai and d is thedimension of node feature With the learned adjacency matrixAi and its feature matrix Xi we can leverage the classicalGCN [18] to characterize the relationship between IOC hiand h j Particularly the layer-wise propagation rule of GCNcan be defined as below

H(l+1) = σ(Dminus12 ADminus

12 H(l)W (l)) (8)

where A = A+ IN is the adjacency matrix of IOCs with self-connections IN is the identity matrix Dii =sum j Ai j and W (l) isa l-th layer trainable weight matrix σ(middot) denotes an activationfunction such as relu H(l) isin RNtimesd is the matrix of activationin the l-th layer We perform graph convolution [18] on Aiand Xi to generate the embedding Z between IOCs belongingto type i

Z = f (XiAi) = σ(Ai middot relu(AiXiW(0)i )W (1)

i ) (9)

where W (0)i isin RdtimesH is an input-to-hidden weight matrix for a

hidden layer with H feature mapsW (1)i isinRHtimesF is a hidden-to-

output weight matrix with F feature maps in the output layerXi isin RNtimesd N is the number of a specific type of IOCs d is thedimension of their corresponding features and σ is anotheractivation function such as sigmoid Ai = Dminus

12 AiDminus

12 can be

calculated offline Here we leverage the cross-entropy loss tooptimize the performance of our proposed threat intelligenceframework written as follows

Loss(Yl f Zl f ) =minussumlisinYl

F

sumf=1

Yl f middot lnZl f (10)

where Yl is the set of node indices that have labels Yl f is thereal label and Zl f is a corresponding label that our modelpredicts Based on Eq (10) we conduct stochastic gradientdescent to continuously optimize the neural network weightsW (0)

i W (1)i and

rarrw to reduce the loss and build a general

threat intelligence computing framework Using this frame-work security organizations are able to mine richer securityknowledge hidden in the interdependent relationships amongIOCs

5 Experimental Evaluation

51 Dataset and SettingsWe develop a threat data collector to automatically collectcyber threat data from a set of sources including 73 inter-national security blogs (eg fireeye cloudflare) hacker fo-rum posts (eg Blackhat Hack5) security bulletins (egMicrosoft Cisco) CVE detail description and ExploitDB Acomplete list of data sources is presented in the Baidu cloud8We set up a daemon to collect the newly generated securityevents every day So far more than 245786 security-relateddata describing threat events have been collected For trainingand evaluating our proposed IOC extraction method 30000samples from 5000 texts are annotated by utilizing the B-I-Osequence tagging method (see Section 22 for the example)and an annotation example is shown in Figure 2

For 30000 labeled samples we randomly select 60 ofsamples as a training set 20 of samples as a verificationset and the rest of the samples as our test set Based on thedata sets we comprehensively evaluate the performance ofHINTI for extracting IOCs and threat intelligence computingWe run all of the experiments on 16 cores Intel(R) Core(TM)i7-6700 CPU 340GHz with 64GB RAM and 4times NVIDIATesla K80 GPU The software programs are executed on theTensorFlow-GPU framework on Ubuntu 1604

52 Evaluation of IOC ExtractionA set of experiments are conducted to evaluate the sensitiv-ity of different parameters in the multi-granular based IOC

8httpspanbaiducoms1J631WMYY_T_awa8aY5xy3A

248 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

extraction model We mainly consider 8 hyper-parametersthat seriously impact the performance of the model as shownin Table 2 More specifically Embedding_dim is one ofthe most important factors that determine the generaliza-tion capability of the model Here we fix other parame-ters while fine-tuning the embedding size in the range of(50100150200250300350400) Experimental resultsshow that the accuracy of extracted IOC achieves the bestwhen Embedding_dim=300 Learning_rate is another ma-jor factor for determining the stride of gradient descent inminimizing the loss function which determines whetherthe model can find a global optimal solution We fix otherparameters to fine-tune the Learning_rate in the rangeof (000100050010050105) and the performancereaches the best when the Learning_rate=0001 Similarlywe fine-tune the other hyper-parameters with 5000 epochsand the hyper-parameters allowing our model to perform op-timally are recorded in Table 2

Table 2 Hyperparameters setting in the multi-granular basedIOC extraction method

Parameter value Parameter Value

Embedding_dim 300 Hidden_dim 128

Sequence_length 500 Epoch_num 5000

Learning_rate 0001 Batch_size 64

Dropout_rate 05 Optimizer Adam

Table 3 Performance of IOC extraction wrt IOC types

IOC Type Precision Recall Micro-F1

IP 9956 9952 9954File 9436 9688 9560Type 9986 9981 9983

Email 9932 9987 9949Device 9326 9278 9302Vender 9307 9445 9424Version 9698 9799 9748Domain 9658 9589 9623Software 8825 8931 8878Function 9503 9559 9531Platform 9431 9257 9343Malware 8976 9123 9049

Vulnerability 9925 9873 9899Other 9829 9842 9835

In this paper we extract 13 major types of IOCs and the

Figure 7 Performance of IOC extraction using embeddingfeatures with different granularity

performance is presented in Table 3 Overall our IOC ex-traction method demonstrates excellent performance in termsof precision recall and Micro-F1 (ie micro-averaged F1-score) for most types of IOCs such as function maliciousIP and device However we observe a performance degra-dation when recognizing software and malware This can beattributed to the fact that most software and malware is namedby random strings such as md5 hash Moreover we find thatthe number of training samples impacts the performance ofthe model Specifically the performance becomes unsatisfac-tory (eg Software Malware) when the number of a certaintype of training samples is insufficient (ie less than 5000)

In order to verify the effectiveness of multi-granular embed-ding features we assess the performance of IOC extractionwith features of different granularity including char-level1-gram 2-gram 3-gram and multi-granular features The ex-perimental results are demonstrated in Figure 7 from whichwe can observe that the proposed multi-granular embeddingfeature outperforms others since it leverages the attentionmechanism to simultaneously learn multi-granular features tocharacterize different patterns of IOCs

Next to verify the effectiveness of the proposed IOC ex-traction method we compare it with the state-of-the-art en-tity recognition approaches including general NER toolsNLTK NER9 and Stanford NER10 professional IOC extrac-tion method Stucco [16] and iACE [22] and popular en-tity recognition approaches CRF [21] BiLSTM and BiL-STM+CRF [15] The experimental results of different meth-ods on real-world data are demonstrated in Table 4

The results indicate that our proposed IOC extraction out-performs the state-of-the-art entity recognition methods andtools in terms of precision recall and Micro-F1 and its im-provement can be attributed to the following factors Firstcompared with Standford NER and NLTK NER the NLP tools

9httpwwwnltkorgbookch07html10httpsstanfordnlpgithubioCoreNLPnerhtml

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 249

Table 4 Performance of threat entity recognition usingdifferent methods

Method Accuracy Precision Micro-F1

NLTK NER 6945 6851 6749

Stanford NER 6835 6674 6858

iACE 9214 9126 9225

Stucco 9116 9221 9147

CRF 9264 9180 9265

BiLSTM 9478 9521 9435

BiLSTM+CRF 9638 9642 9627

Multi-granular 9859 9872 9869

trained with general news corpora our method uses a security-related training corpus collected and labeled by ourselves asa data source for training our model Second different fromthe rule-based extraction approaches (eg iACE and Stucco)our proposed deep learning based method provides an end-to-end system with more advanced features to represent variousIOCs Third comparing to RNN-based methods (eg BiL-STM and BiLSTM+CRF) our proposed method brings inmulti-granular embedding sizes (char-level 1-gram 2-gramand 3-gram) to simultaneously learn the characteristics of vari-ous sizes and types of IOCs which can identify more complexand irregular IOCs Last but not the least our method imple-ments an attention mechanism to learn the weights of featureswith various scales to effectively characterize different typesof IOCs further enhancing the IOC recognition accuracy

6 Application of Threat Intelligence Comput-ing

Our proposed threat intelligence computing framework basedon heterogeneous graph convolutional networks can be usedto mine novel security knowledge behind heterogeneous IOCsIn this section we evaluate its effectiveness and applicabilityusing three real-world applications profiling and ranking forCTIs attack preference modeling and vulnerability similarityanalysis

61 Threat Profiling and Significance Rankingof IOCs

Due to the disparity in the significance of threats it is im-portant to derive the threat profile and rank the significanceof IOCs for demystifying the landscape of threats Howevermost of the existing CTIs are incapable of modeling the asso-ciated relationships between heterogeneous IOCs

Different from isolated CTIs HINTI leverages HIN to

model the interdependent relationships among IOCs withtwo characteristics first the isolated IOCs can be integratedinto a graph-based HIN to clearly display the associated rela-tionships among IOCs which is capable of directly depictingthe basic threat profile For example Figure 3 depicts a threatprofiling sample an attacker utilizes CVE-2017-0143 vulnera-bility to invade Vista SP2 and Win7 SP1 devices belonging tothe Microsoft platform and CVE-2017-0143 is a remote codeexecution vulnerability that uses a ldquoSMBbat malicious fileSecond the significance of IOCs in HINTI can be naturallyranked based on the proposed threat intelligence computingframework

Table 5 shows the top 5 authoritative ranking score [35] ofvulnerability attacker attack type and platform from whichsecurity experts can gain a clear insight into the impact ofeach IOC Degree centrality [33] which describes the numberof links incident upon a node is widely used in evaluatingthe importance of a node in a graph It can used to quantifythe immediate risk of a node that connects with other nodesfor delivering network flows such as virus spreading Heredegree centrality can be utilized in verifying the effectivenessof the proposed threat intelligence computing framework inranking the importance of IOCs It is worth noting that bothour ranking method and degree centrality work regardless ofthe time of attacks We compute the degree centrality rank-ing of IOCs based on the fact that the node with a higherdegree centrality is more important than a node with a lowerone For instance if the degree centrality of a vulnerabilityis higher it indicates that this vulnerability is exploited bymore attackers or it affects more devices The ranking resultof degree centrality shown in Table 5 is consistent with theranking result based on the proposed threat intelligence com-puting framework demonstrating the capability of the CTIcomputing framework in ranking the importance of differenttypes of IOCs

62 Attack Preference Modeling

Attack preference modeling is meaningful for security organi-zations to gain insight into the attack intention of attackersbuild attack portraits and develop personalized defense strate-gies Here we leverage HINTI to integrate different types ofIOCs and their interdependent relationships to comprehen-sively depict the picture of attack events which helps modelthe attack preferences With the proposed threat intelligencecomputing framework we model attack preferences by clus-tering the embedded attackersrsquo vectors

In this task each malicious IP address is treated as an in-truder and its attack preferences are mainly reflected in threefeatures including the platforms it destroys (including Win-dows Linux Unix ASP Android Apache etc) the industriesit invades (eg education finance government Internet ofThings and Industrial control system etc) and the exploittypes it employs (eg DOS Buffer overflow Execute code

250 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

Table 5 The significance ranking of different types of IOCs (CV E1 CVE-2017-0146 CV E2 CVE-2006-5911 CV E3 CVE-2008-6543 CV E4 CVE-2012-1199 CV E4 CVE-2006-4985 AR Authoritative Ranking DC Degree Centrality value)

Vulnerability Attacker Platform Attack Type

No AR DC Monicker AR DC Category AR DC Exploit_type AR DC

CV E1 02713 7643 Meatsploit 02764 549 PHP 04562 17865 Webapps 05494 11648

CV E2 02431 7124 GSR team 01391 327 windows 02242 13793 DOS 01772 8741

CV E3 02132 6833 Ihsan 00698 279 Linux 00736 8792 Overflow 01533 7652

CV E4 01826 6145 Techsa 00695 247 Linux86 00623 8147 CSRF 00966 5433

CV E5 01739 5637 Aurimma 00622 204 ASP 00382 5027 SQL 00251 2171

(a) AV DV T AT (P13) (b) AV DPDTV T AT (P17) (c) AV PV T AT (P14)

Figure 8 The performance of attack preference modeling with different meta-paths in which the preference of attacker i isreduced to a two-dimensional space (xiyi) and each cluster represents a group with a specific attack preference

Sql injection XSS Gain information etc)Specifically we first utilize our proposed threat intelligence

computing framework to embed each attacker into a low-dimensional vector space and then perform DBSCAN algo-rithm on the embedded vector to cluster attackers with thesame preferences into corresponding groups Figure 8 showsthe top 3 clustering results under different types of meta-paths in which the meta-path AV DPDTV T AT (P17) performsthe best performance with compact and well-separated clus-ters indicating that it contains richer semantic relationshipsfor characterizing attack preferences than other meta-paths

To verify the effectiveness of attack preference modelingwe identify 5297 distinct attackers (each unique IP address istreated as an attacker) who have submitted at least 10 cyberattacks For these attackers five cybersecurity researchersconsisting of three doctoral and two master students spentabout fortnight to manually annotate their attack preferencesfrom three perspectives the platforms they destroyed the in-dustries they attacked and the attack types they exploited Toensure the accuracy of data labeling we test the consistencyof the tags for the 5297 attackers and remove the sampleswith ambiguous tags As a result we obtain 3000 sampleswith consistent tags Based on the labeled samples we furtherevaluate the performance of different meta-paths on model-

Table 6 Performance of modeling attack preference withdifferent meta-paths

Metapath Accuracy Precision Micro-F1

P1 7431 7622 7525

P4 7116 7327 7216

P5 6915 7143 7027

P12 7214 7646 7424

P13 7965 8131 8047

P14 7748 7934 7840

P15 8017 7976 7996

P17 8139 8172 8155

ing attack preferences In the attack modeling scenario weonly focus on the meta-paths that both the start node and theend node are attackers The experimental results are demon-strated in Table 6 Obviously different meta-paths displaydifferent abilities in characterizing the attack preferences ofcyber intruders The performance when using P17 is more

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 251

superior than the one with other meta-paths which indicatesthat P17 holds more valuable information that characterizesthe attack preferences of cybercriminals since P17 includesthe semantics information of P1P4P5 and P12 sim P15

In addition we compare the capabilities of our proposedcomputing framework with those of other state-of-the-art em-bedding methods in terms of attack preference modelingOur analysis result shows that the accuracy of attack pref-erence modeling reaches 081 which outperforms the exist-ing popular models Node2vec (with precision of 071) [1]metapaht2vec (with precision of 073) [11] and HAN (withprecision of 076) [42] The performance improvement canbe attributed to the following characteristics First our com-puting framework utilizes weight-learning to learn the signifi-cance of different meta-paths for evaluating the similarity be-tween attackers Second the proposed computing frameworkleverages GCN to learn the structural information betweenattackers to obtain more discriminative structural features thatimproves the performance of attack preference modeling

63 Vulnerability Similarity AnalysisVulnerability classification or clustering is crucial for conduct-ing vulnerability trend analysis the correlation analysis ofincidents and exploits and the evaluation of countermeasuresThe traditional vulnerability analysis relies on the manualinvestigation of the source codes which requires expert ex-pertise and consumes considerable efforts In this sectionwe propose an unsupervised vulnerability similarity analy-sis method based on the proposed threat intelligence com-puting framework which can automatically group similarvulnerabilities into corresponding communities Particularlythe vulnerability-related IOCs are first embedded into a low-dimensional vector space using CTI computing frameworkThen the DBSCAN algorithm is performed on the embed-ded vector space to cluster vulnerabilities into correspondingcommunities The clustering results are presented in Figure 9

Figure 9 (c) shows all vulnerabilities are clustered into 12clusters using meta-path V DPDTV T (P16) which is very closeto the classification standard (ie 13) recommended by CVEdetails an authoritative database that publishes vulnerabilityinformation By manually analyzing the training samples wefind that HTTP Response Splitting vulnerability does not ap-pear in our dataset Therefore our cluster number (ie 12)is consistent with CVE Details11 To further validate the ef-fectiveness of threat intelligence computing framework forvulnerability clustering we randomly select 100 vulnerabili-ties from each cluster for manual inspection to measure theconsistency of the vulnerability types in each cluster and theresults are presented in Table 7 Obviously the clustering per-formance of cluster 8 (ie File Inclusion) and cluster 10 (ieDirectory Traversal) is remarkably worse than other clustersTo explain the reason we examine our training data and the

11httpswwwcvedetailscom

Table 7 Accuracy of vulnerability clustering

Cluster ID Vulnerability type Accuracy

cluster1 Denial of Service 8012

cluster2 XSS 8353

cluster3 Execute Code 8150

cluster4 Overflow 7650

cluster5 Gain Privilege 9156

cluster6 Bypass Something 7174

cluster7 CSRF 9327

cluster8 File Inclusion 6172

cluster9 Gain Informa 7042

cluster10 Directory Traversal 6949

cluster11 Memory Corruption 8156

cluster12 SQL Injection 8067

average 7851

computing framework We found that the proportion of thesetwo types of vulnerabilities is too small (cluster 8 is 34and cluster 10 is 42) making our computing frameworkvery likely to be under-fit with insufficient data Howeverthe proposed computing framework performs well on mosttypes of vulnerabilities in an unsupervised manner especiallygiven sufficient samples (eg cluster 7 is 176 and cluster 5is 157)

In addition by examining the clustering results we havean observation that the vulnerabilities in the same cluster arelikely to have evolutionary relationships For instance CVE-2018-0802 an office zero-day vulnerability is evolved fromthe CVE-2017-11882 They both include EQNEDT32exe fileused to edit the formula in Office software which allowsremote attackers to execute arbitrary codes by constructinga malformed font name The modeling and computation ofinterdependent relationships among IOCs in HINTI facilitatethe discovery of such intricate connections between vulnera-bilities

In summary HINTI is capable of depicting a more compre-hensive threat landscape and the proposed CTI computingframework has the ability to bring novel security insightstoward different real-world security applications Howeverthere are still numerous opportunities for enhancing thesesecurity applications Specifically for attack preference mod-eling although each individual IP address is treated as anattacker we cannot determine whether it belongs to a realattacker or is disguised by a proxy Fortunately even if thereal attack address cannot be captured understanding the at-tack preferences of these IP proxies which are widely usedin cybercrime is also meaningful for gaining insight into thecyber threats For vulnerability similarity analysis data imbal-

252 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

(a) AV DV T AT (P13) (b) V DV T (P10) (c) V DPDTV T (P16)

Figure 9 Illustration of the vulnerability similarity analysis based on different meta-paths in which vulnerability i can be reducedinto a two-dimensional space (xiyi) and each cluster indicates a particular type of vulnerability

ance issue affects the performance of model and inadequatetraining samples often result in model underfitting as shownin the case of cluster 8 and cluster 10

7 Related Work

Cyber Treat Intelligence An increasing number of securityvendors and researchers start exploring CTI for protectingsystem security and defending against new threat vectors [28]Existing CTI extraction tools such as IBM X-Force12 Threatcrowd13 Openctiio14 AlienVault15 CleanMX16 and Phish-Tank17 use regular expression to synthesize IOC from thedescriptive texts However these methods often producehigh false positive rate by misjudging legitimate entities asIOCs [22]

Recently Balzarotti et al [2] develop a system to extractIOCs from web pages and identify malicious URLs fromJavaScript codes Sabottke et al [31] propose to detect po-tential vulnerability exploits by extracting and analyzing thetweets that contain ldquoCVErdquo keyword Liao et al [22] present atool iACE for automatically extracting IOCs which excelsat processing technology articles Nevertheless iACE identi-fies IOCs from a single article which does not consider therich semantic information from multi-source texts Zhao etal [46] define different ontologies to describe the relationshipbetween entities based on expert knowledge Numerous popu-lar CTI platforms including IODEF [9] STIX [3] TAXII [40]OpenIOC [13] and CyBox [19] focus on extracting and shar-ing CTI Yet none of the existing approaches could uncoverthe interdependent relations among CTIs extracted from multi-source texts let alone quantifying CTIsrsquo relevance and miningvaluable threat intelligence hidden behind the isolated CTIs

12httpsexchangexforceibmcloudcom13httpswwwthreatcrowdorg14httpsdemoopenctiio15 httpsotxalienvaultcom16httplistclean-mxcom17httpswwwphishtankcom

Heterogeneous Information Network Real-world systemsoften contain a large number of interacting multi-typed ob-jects which can naturally be expressed as a heterogeneousinformation network (HIN) HIN as a conceptual graph repre-sentation can effectively fuse information and exploit richersemantics in interacting objects and links [37] HIN has beenapplied to network traffic analysis [38] public social mediadata analysis [45] and large-scale document analysis [41]Recent applications of HIN include mobile malware detec-tion [14] and opioid user identification [12] In this paper forthe first time we use HIN for CTI modelingGraph Convolutional Network Graph convolutional net-works (GCN) [17] has become an effective tool for address-ing the task of machine learning on graphs such as semi-supervised node classification [17] event classification [29]clustering [8] link prediction [27] and recommended sys-tem [44] Given a graph GCN can directly conduct the con-volutional operation on the graph to learn the nonlinear em-bedding of nodes In our work to discern and reveal the inter-active relationships between IOCs we utilize GCN to learnmore discriminative representation from attributes and graphstructure simultaneously which is the premise for threat intel-ligence computing

8 Discussion

Data Availability The proposed framework assumes thatsufficient threat description data can be obtained for generat-ing comprehensive and the latest CTIs Fortunately with thegrowing prosperity of social media an increasing number ofsecurity-related data (eg blogs posts news and open secu-rity databases) can be collected effortlessly To automaticallycollect security-related data we develop a crawler system tocollect threat description data from 73 international securitysources (eg blogs hacker forum posts security bulletinsetc) providing sufficient raw materials for generating cyberthreat intelligenceModel Extensibility In this paper 6 types of IOCs and 9

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 253

types of relationships are modeled in HINTI However ourproposed framework is extensible in which more types ofIOCs and relationships can be enrolled to represent richer andmore comprehensive threat information such as maliciousdomains phishing Emails attack tools their interactions etcHigh-level Semantic Relations In view of the computa-tional complexity of the model our threat intelligence comput-ing method focues on utilizing the meta-paths to quantify thesimilarity between IOCs while ignoring the influence of themeta-graph on it which inevitably misses characterizing somehigh-level semantic information Nevertheless the proposedcomputing framework introduces an attention mechanism tolearn the signification of different meta-paths to character-ize IOCs and their interactive relationships which effectivelycompensates for the performance degradation caused by ig-noring the meta-graphsSecurity Knowledge Reasoning Although our proposedframework exhibits promising results in CTI extraction andmodeling computing how to implement advanced securityknowledge reasoning and prediction is still an open problemeg it remains challenging to predict whether a vulnerabilitycould potentially affect a particular type of devices in thefuture We will investigate this problem in the future

9 Conclusion

This work explores a new direction of threat intelligence com-puting which aims to uncover new knowledge in the relation-ships among different threat vectors We propose HINTI acyber threat intelligence framework to model and quantify theinterdependent relationships among different types of IOCsby leveraging heterogeneous graph convolutional networksWe develop a multi-granular attention mechanism to learnthe importance of different features and model the interde-pendent relationships among IOCs using HIN We proposethe concept of threat intelligence computing and present ageneral intelligence computing framework based on graphconvolutional networks Experimental results demonstratethat the proposed multi-granular attention based IOC extrac-tion method outperforms the existing state-of-the-art methodsThe proposed threat intelligence computing framework caneffectively mine security knowledge hidden in the interdepen-dent relationships among IOCs which enables crucial threatintelligence applications such as threat profiling and rankingattack preference modeling and vulnerability similarity analy-sis We would like to emphasize that the knowledge discoveryamong interdependent CTIs is a new field that calls for acollaborative effort from security experts and data scientists

In future we plan to develop a predicative and reasoningmodel based on HINTI and explore preventative countermea-sures to protect cyber infrastructure from future threats Wealso plan to add more types of IOCs and relations to depicta more comprehensive threat landscape Moreover we willleverage both meta-paths and meta-graphs to characterize the

IOCs and their interactions to further improve the embeddingperformance and to strike a balance between the accuracyand computational complexity of the model We will alsoinvestigate the feasibility of security knowledge predictionbased on HINTI to infer the potential future relationshipsbetween the vulnerabilities and devices

Acknowledgement

We would like to thank our shepherd Tobias Fiebig andthe anonymous reviewers for providing valuable feedbackon our work We also thank Hao Peng and Lichao Sunfor their feedback on the early version of this work Thiswork was supported in part by National Science Founda-tion grants CNS1950171 CNS-1949753 It was also sup-ported by the NSFC for Innovative Research Group ScienceFund Project (62141003) National Key RampD Program China(2018YFB0803 503) the 2018 joint Research Foundationof Ministry of Education China Mobile (MCM20180507)and the Opening Project of Shanghai Trusted Industrial Con-trol Platform (TICPSH202003020-ZC) Any opinions find-ings and conclusions or recommendations expressed in thismaterial do not necessarily reflect the views of any fundingagencies

References

[1] Jure Leskovec Aditya Grover node2vec Scalable fea-ture learning for networks In Acm Sigkdd InternationalConference on Knowledge Discovery Data Mining2016

[2] Marco Balduzzi Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications In WWW 2016

[3] Sean Barnum Standardizing cyber threat intelligenceinformation with the structured threat information ex-pression (stix) Mitre Corporation 111ndash22 2012

[4] Eric W Burger Michael D Goodman Panos Kam-panakis and Kevin A Zhu Taxonomy model for cyberthreat intelligence information exchange technologiesIn Proceedings of the 2014 ACM Workshop on Infor-mation Sharing amp Collaborative Security pages 51ndash602014

[5] Onur Catakoglu Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications The web conference pages 333ndash3432016

[6] Danqi Chen and Christopher D Manning A fast and ac-curate dependency parser using neural networks pages740ndash750 2014

254 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

[7] Qian Chen and Robert A Bridges Automated behav-ioral analysis of malware a case study of wannacry ran-somware In 16th IEEE ICMLA pages 454ndash460 2017

[8] Weilin Chiang Xuanqing Liu Si Si Yang Li SamyBengio and Chojui Hsieh Cluster-gcn An efficient al-gorithm for training deep and large graph convolutionalnetworks knowledge discovery and data mining pages257ndash266 2019

[9] Roman Danyliw Jan Meijer and Yuri Demchenko Theincident object description exchange format Interna-tional Journal of High Performance Computing Appli-cations 50701ndash92 2007

[10] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 2017

[11] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 ACM 2017

[12] Yujie Fan Yiming Zhang Yanfang Ye and Xin Li Au-tomatic opioid user detection from twitter Transductiveensemble built on different meta-graph based similari-ties over heterogeneous information network In IJCAIpages 3357ndash3363 2018

[13] Fireeye Openioc httpswwwfireeyecomblogthreat-research201310openioc-basicshtml Accessed January 20 2020

[14] Shifu Hou Yanfang Ye Yangqiu Song and Melih Ab-dulhayoglu Hindroid An intelligent android malwaredetection system based on structured heterogeneous in-formation network In Proceedings of the 23rd ACMSIGKDD International Conference on Knowledge Dis-covery and Data Mining pages 1507ndash1515 2017

[15] Zhiheng Huang Wei Xu and Kai Yu Bidirectional lstm-crf models for sequence tagging arXiv1508019912015

[16] Michael D Iannacone Shawn Bohn Grant NakamuraJohn Gerth Kelly MT Huffer Robert A Bridges Erik MFerragut and John R Goodall Developing an ontologyfor cyber security knowledge graphs CISR 1512 2015

[17] Thomas Kipf and Max Welling Semi-supervised clas-sification with graph convolutional networks arXivLearning 2016

[18] Thomas N Kipf and Max Welling Semi-supervisedclassification with graph convolutional networks arXivpreprint arXiv160902907 2016

[19] Tero Kokkonen Architecture for the cyber security situ-ational awareness system In Internet of Things SmartSpaces and Next Generation Networks and Systemspages 294ndash302 Springer 2016

[20] Mehmet Necip Kurt Yasin Yılmaz and Xiaodong WangDistributed quickest detection of cyber-attacks in smartgrid IEEE Transactions on Information Forensics andSecurity 13(8)2015ndash2030 2018

[21] John Lafferty Andrew Mccallum and Fernando PereiraConditional random fields Probabilistic models for seg-menting and labeling sequence data ICML pages 282ndash289 2001

[22] Xiaojing Liao Yuan Kan Xiao Feng Wang Li Zhouand Raheem Beyah Acing the ioc game Toward auto-matic discovery and analysis of open-source cyber threatintelligence In ACM Sigsac Conference on ComputerCommunications Security 2016

[23] Rob Mcmillan Open threat intelligencehttpwwwgartnercomdoc2487216definition-threat-intelligence AccessedJanuary 20 2020

[24] Tomas Mikolov Kai Chen Greg Corrado and JeffreyDean Efficient estimation of word representations invector space arXiv preprint arXiv13013781 2013

[25] Sudip Mittal Prajit Kumar Das Varish Mulwad Anu-pam Joshi and Tim Finin Cybertwitter Using twitterto generate alerts for cybersecurity threats and vulnera-bilities In Proceedings of the 2016 IEEE Advances inSocial Networks Analysis and Mining pages 860ndash8672016

[26] Eric Nunes Ahmad Diab Andrew Gunn EricssonMarin Vineet Mishra Vivin Paliath John RobertsonJana Shakarian Amanda Thart and Paulo ShakarianDarknet and deepnet mining for proactive cybersecuritythreat intelligence In 2016 IEEE ISI pages 7ndash12 2016

[27] Shirui Pan Ruiqi Hu Guodong Long Jing Jiang LinaYao and Chengqi Zhang Adversarially regularizedgraph autoencoder for graph embedding pages 2609ndash2615 2018

[28] P Pawlinski P Jaroszewski P Kijewski L SiewierskiP Jacewicz P Zielony and R Zuber Actionable infor-mation for security incident response European UnionAgency for Network and Information Security Herak-lion Greece 2014

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 255

[29] Hao Peng Jianxin Li Qiran Gong Yangqiu SongYuanxing Ning Kunfeng Lai and Philip S Yu Fine-grained event categorization with heterogeneous graphconvolutional networks In Proceedings of the 28th In-ternational Joint Conference on Artificial Intelligencepages 3238ndash3245 AAAI Press 2019

[30] Sara Qamar Zahid Anwar Mohammad Ashiqur Rah-man Ehab Al-Shaer and Bei-Tseng Chu Data-drivenanalytics for cyber-threat intelligence and informationsharing Computers amp Security 6735ndash58 2017

[31] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In USENIXSecurity 2015

[32] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In 24thUSENIX Security pages 1041ndash1056 2015

[33] Deepak Sharma and Avadhesha Surolia Degree Cen-trality Springer New York 2013

[34] Saurabh Singh Pradip Kumar Sharma Seo Yeon MoonDaesung Moon and Jong Hyuk Park A comprehensivestudy on apt attacks and countermeasures for futurenetworks and communications challenges and solutionsJournal of Supercomputing pages 1ndash32 2016

[35] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks Principles and methodologiesSynthesis Lectures on Data Mining and Knowledge Dis-covery 3(2)1ndash159 2012

[36] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks a structural analysis approachACM SIGKDD 14(2)20ndash28 2013

[37] Yizhou Sun Jiawei Han Xifeng Yan Philip S Yu andTianyi Wu Pathsim Meta path-based top-k similaritysearch in heterogeneous information networks Proceed-ings of the VLDB Endowment 4(11)992ndash1003 2011

[38] Yizhou Sun Brandon Norick Jiawei Han Xifeng YanPhilip S Yu and Xiao Yu Pathselclus Integrating meta-path selection with user-guided object clustering in het-erogeneous information networks ACM Transactionson TKDD 7(3)11 2013

[39] Wiem Tounsi and Helmi Rais A survey on technicalthreat intelligence in the age of sophisticated cyber at-tacks Computers amp Security 72212ndash233 2018

[40] Thomas D Wagner Esther Palomar Khaled Mahbuband Ali E Abdallah Towards an anonymity supportedplatform for shared cyber threat intelligence risks andsecurity of internet and systems pages 175ndash183 2017

[41] Chenguang Wang Yangqiu Song Haoran Li MingZhang and Jiawei Han Text classification with het-erogeneous information network kernels In 13th AAAI2016

[42] Xiao Wang Houye Ji Chuan Shi Bai Wang Peng CuiP Yu and Yanfang Ye Heterogeneous graph attentionnetwork 2019

[43] Jie Yang Shuailong Liang and Yue Zhang Design chal-lenges and misconceptions in neural sequence labelingarXiv Computation and Language 2018

[44] Rex Ying Ruining He Kaifeng Chen Pong Eksombat-chai William L Hamilton and Jure Leskovec Graphconvolutional neural networks for web-scale recom-mender systems In Proceedings of the 24th ACMSIGKDD International Conference on Knowledge Dis-covery amp Data Mining pages 974ndash983 ACM 2018

[45] Jiawei Zhang Xiangnan Kong and Philip S Yu Trans-ferring heterogeneous links across location-based socialnetworks In The 7th ACM international conference onWeb search and data mining pages 303ndash312 2014

[46] Yishuai Zhao Bo Lang and Ming Liu Ontology-basedunified model for heterogeneous threat intelligence inte-gration and sharing In 2017 11th IEEE InternationalConference on Anti-counterfeiting Security and Identi-fication (ASID) pages 11ndash15 2017

256 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

  • Introduction
  • Background
    • Cyber Threat Intelligence
    • Motivation
    • Preliminaries
      • Architecture Overview of HINTI
      • Methodology
        • Multi-granular Attention Based IOC Extraction
        • Cyber Threat Intelligence Modeling
        • Threat Intelligence Computing
          • Experimental Evaluation
            • Dataset and Settings
            • Evaluation of IOC Extraction
              • Application of Threat Intelligence Computing
                • Threat Profiling and Significance Ranking of IOCs
                • Attack Preference Modeling
                • Vulnerability Similarity Analysis
                  • Related Work
                  • Discussion
                  • Conclusion
Page 5: Cyber Threat Intelligence Modeling Based on Heterogeneous ... · 2.1Cyber Threat Intelligence Cyber Threat Intelligence (CTI) extracted from security-related data is structured information

Figure 5 The overall architecture of HINTI HINTI consists of four major components (a) collecting security-related data andextracting threat objects (ie IOCs) (b) modeling interdependent relationships among IOCs into a heterogeneous informationnetwork (c) embedding nodes into a low-dimensional vector space using weight-learning based similarity measure and (d)computing threat intelligence based on graph convolutional networks and knowledge mining

velop a data collection system to automatically capturesecurity-related data from blogs hacker forum posts se-curity news and security bulletins The system utilizesa breadth-first search to collect the HTML source codeand then leverages Xpath (XML Path language) to ex-tract threat-related descriptions After that we utilize amulti-granular attention based IOC recognition methodto extract IOC from the collected threat-related descrip-tions (see Section 41 for details)

bull Relation Extraction and IOC modeling HINTI ad-dresses the challenge of CTI modeling by leveragingheterogeneous information networks which can natu-rally depict the interdependent relationships betweenheterogeneous IOCs As an example Figure 4 shows amodel that capture the interactive relationships among at-tacker vulnerability malicious file attack type platformand device (see Section 42 for details)

bull Meta-path Design and Similarity Measure Meta-path is an effective tool to express the semantic rela-tions among IOCs in constructed HIN For instance

attackerexploitminusminusminusminusrarr vulnerability

exploitminusminusminusminusminusminusrarr attacker indi-cates that two attackers are related by exploiting the samevulnerability We design 17 types of meta-paths (SeeTable 1) to describe the interdependent relationships be-tween IOCs With these meta-paths we present a weight-learning based node similarity computing approach toquantify and embed the relationships as the premise forthreat intelligence computing

bull Threat Computing and Knowledge Mining In thiscomponent an effective threat intelligence computingframework is proposed which can quantify and measure

the relevance among IOCs by leveraging graph convolu-tional network (GCN) Our proposed threat intelligencecomputing framework could reveal richer security knowl-edge within a more comprehensive threat landscape

4 Methodology

41 Multi-granular Attention Based IOC Ex-traction

Extracting IOCs from multi-source threat texts is one of themajor tasks of threat intelligence analytics and the qualityof the extracted IOCs significantly influences the analysisresults of cyber threats Recently Bidirectional Long Short-Term Memory+Conditional Random Fields (BiLSTM+CRF)model [15] has demonstrated excellent performance in textchunking and Named-entity Recognition (NER) Howeverdirectly applying this model to IOC extraction is unlikely tosucceed since threat texts usually contain a large number ofthreat objects with different grams and irregular structuresConsequently we need an efficient method to learn the dis-criminative characteristics of IOCs with different sizes In thispaper we propose a multi-granular attention based IOC extrac-tion method which can extract threat objects with differentgranularity Particularly Figure 6 presents the proposed IOCextraction framework which leverages the multi-granular at-tention mechanism to characterize IOCs Different from thetraditional BiLSTM+CRF model we introduce new word-embedding features with different granularities to capture thecharacteristics of IOCs with different sizes Furthermore weutilize a self-attention mechanism to learn the importance ofthe features to improve the accuracy of IOC extraction

Our proposed method takes a threat description sentenceX = (x1x2 middot middot middot xi) as input where xi represents i-th word

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 245

Figure 6 The framework of multi-granular IOC extraction

in X We first chunk the sentence into n-gram componentsincluding char-level 1-gram 2-gram and 3-gram which arethe inputs of our trained model written as follows

e jxi=V j

embedding(xi) (1)

where V jembedding transforms the chunk with granularity j into

a vector space and xi is the i-th word in a sentence X Thusthe threat description sentence Xi can be vectorized as follows

rarrh j

i = LST M f orward([e jx0e j

x1 middot middot middot e j

xi])

larrh j

i = LST Mbackward([e jx0e j

x1 middot middot middot e j

xi])

(2)

whererarrh j

i andlarrh j

i are the embedded features learned byforward LSTM and backward LSTM respectively Let O bethe output of Bi-LSTM which is a weighted sum of embeddedfeatures with weights corresponding to the importance ofdifferent features

O = H middotW T (3)

where H =j

sum~βiσ(h

j1h

j2 middot middot middot h

ji ) h j

i = (rarrh j

i +larrh j

i ) ~βi is theweight vector to represent the importance of h j

i in whichj and i are the segmentation granularity of sentences and thecorresponding index of the chunk W is the parameter matrix

Given a security-related sentence X = (x1x2 middot middot middot xi) itscorresponding threat object sequence Y = (y1 y2 middot middot middot yi) andits output of Bi-LSTM O we can compute the overall labelscore of X and Y as follows

S(X Y ) =n

sumi=0

(Ayiyi+1 +Oiyi) (4)

where Ayiyi+1 is the state transition matrix in CRF model andOiyi as the output of Bi-LSTM hidden layer (calculated byEq (3)) represents the label score of i-th word corresponding

to the type yi Next we utilize so f tmax function to normalizethe overall label score

p(Y |X) =eS(X Y )

sumyisinYX

eS(X Y )(5)

We design an objective function to maximize the proba-bility p(Y |X) to achieve the highest label score for differentIOCs which can be written as follows

argmax log(p(Y |X)) = argmax (S(X Y )minus

log( sumyisinYX

eS(X y))) (6)

By solving the objective function above we assign correctlabels to the n-gram components according to which we canidentify the IOCs with different lengths Our multi-granularattention based IOC extraction method is capable of identify-ing different types of IOCs and its evaluation is presented inSection 5

42 Cyber Threat Intelligence ModelingCTI modeling is an important step to explore the intricaterelationship between heterogeneous IOCs In our work HINis introduced to group different types of IOCs into a graphto explore their interactive relationships In this section weportray the main principle of threat intelligence modeling

To model the intricate interdependent relationships amongIOCs we define the following 9 relationships among 6 typesof IOCs as follows

bull R1 To depict the relation of an attacker and the ex-ploited vulnerability we construct the attacker-exploit-vulnerability matrix A For each element Ai j isin 01Ai j=1 indicates attacker i exploits vulnerability j

bull R2 To depict the relation of an attacker and a devicewe build the attacker-invade-device matrix D For eachelement Di j isin 01 Di j=1 indicates attacker i invadesdevice j

bull R3 Two attacker can cooperate to attack a target Tostudy the relationship of attacker-attacker we constructthe attacker-cooperate-attacker matrix C For each ele-ment Ci j isin 01 Ci j=1 indicates there exists a cooper-ative relationship between attacker i and j

bull R4 To describe the relation of a vulnerability and theaffected device we build the vulnerability-affect-devicematrix M For each element Mi j isin 01 Mi j=1 indi-cates vulnerability i affects device j

bull R5 A vulnerability is often labeled as a specific attacktype by Common Vulnerabilities and Exposures (CVE)

246 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

system7 To explore the relation of vulnerability-attacktype we build the vulnerability-belong-attack type ma-trix G where each element Gi j isin 01 denotes if vul-nerability i belongs to an attack type j

bull R6 A vulnerability often involves one or more maliciousfiles To describe the relation of vulnerability-file webuild the vulnerability-include-file matrix F For eachelement Fi j isin 01 Fi j=1 denotes that vulnerability iincludes malicious file j

bull R7 A malicious file often targets a specific device Weestablish the file-target-device matrix T to explore therelation of file-device For each element Ti j isin 01Ti j=1 indicates malicious file i targets device j

bull R8 Oftentimes a vulnerability evolves from anotherTo study the relationship of vulnerability-vulnerabilitywe build the vulnerability-evolve-vulnerability matrixE where each element Ei j isin 01 indicates if vulnera-bility i evolves from vulnerability j

bull R9 To depict the relation device-platform that a de-vice belongs to a platform we build the device-belong-platform matrix P where each element Pi j isin 01 il-lustrates if device i belongs to platform j

Based on the above 9 types of relationships HINTIleverages the syntactic dependency parser [6] (eg subject-predicate-object attributive clause etc) to automatically ex-tract the 9 relationships among IOCs from threat descriptionseach of which is represented as a triple (IOCirelation IOC j)For instance given a security-related description ldquoOn May12 2017 WannaCry exploited the MS17-010 vulnerabilityto affect a larger number of Windows devices which is aransomware attack via encrypted disks Using the syntacticdependency parser we can extract the following triples (Wan-naCry exploit MS17-010) (MS17-010 affect Windows de-vice) (WannaCry is ransomware) Such triples extractedfrom various data sources can be incrementally assembledinto HINTI to model the relationships among IOCs whichcould offer a more comprehensive threat landscape that de-scribes the threat context Particularly we further define 17types of meta-paths shown in Table 1 to probe into the interde-pendent relationships over attackers vulnerabilities maliciousfiles attack types devices and platforms HINTI is able toconvey a richer context of threat events by scrutinizing 17types of meta-paths and reveal the in-depth threat insightsbehind the heterogeneous IOCs (see Section 6 for details)

43 Threat Intelligence ComputingIn this section we illustrate the concept of threat intelligencecomputing and design a general threat intelligence computing

7httpcvemitreorg

framework based on heterogeneous graph convolutional net-works which quantifies and measures the relevance betweenIOCs by analyzing meta-path based semantic similarity Herewe first provide a formal definition of threat intelligence com-puting based on heterogeneous graph convolutional networks

Definition 4 Threat Intelligence Computing Based onHeterogeneous Graph Convolutional Networks Given thethreat intelligence graph G = (VE) the meta-path set M =P1P2 middot middot middot Pi The threat intelligence computing i) com-putes the similarity between IOCs based on meta-path Pi togenerate corresponding adjacency matrix Ai ii) constructsthe feature matrix of nodes Xi by embedding attribute in-formation of IOCs into a latent vector space iii) conductsgraph convolution GCN(AiXi) to quantify the interdependentrelationships between IOCs by following meta-path Pi andembeds them into a low-dimensional space

The threat intelligence computing aims to model the seman-tic relationships between IOCs and measure their similaritybased on meta-paths which can be used for advanced secu-rity knowledge discovery such as threat object classificationthreat type matching threat evolution analysis etc Intuitivelythe objects connected by the most significant meta-paths tendto bear more similarity [37] In this paper we propose aweight-learning based threat intelligence similarity measurewhich uses self-attention to improve the performance of simi-larity measurement between any two IOCs This method canbe formalized as below

Definition 5 Weight-learning based Node Similar-ity Measure Given a set of symmetric meta-path setP = [Pm]

Mprime

m=1 the similarity S(hih j) between any two IOCshi and h j is defined as

S(hih j) =Mprime

summ

rarrw

2middot | hirarr j isin Pm || hirarri isin Pm |+ | h jrarr j isin Pm |

(7)

where hirarr j isin hm is a path instance between IOC hi andh j following meta-path Pm hirarri isin Pm is that between IOCinstance hi and hi and h jrarr j isin Pm is that between IOC in-stance h j and h j where | hirarr j isin Pm |=CPm(i j) | hirarri isinPm |=CPm(i i) | h jrarr j isin Pm |=CPm( j j) and CPm is thecommuting matrix based on meta-path Pm defined below

rarrw =

[w1 wm wMprime ] denote the meta-path weights wherewm is the weight of meta-paths Pm and M

primeis the number of

meta-pathsS(hih j) is defined in two parts (1) the semantic overlap

in the numerator which describes the number of meta-pathbetween IOC instance hi and h j (2) and the semantic broad-ness in the denominator which depicts the number of totalmeta-paths between themselves The larger number of meta-path between IOC instance hi and h j the more similar thetwo IOCs are which is normalized by the semantic broadness

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 247

of denominator Moreover different from existing similaritymeasures [37] we propose an attention mechanism basedsimilarity measure method by introducing the weight vec-torrarrw = [w1 wm wMprime ] which is a trainable coefficient

vector to learn the importance of different meta-paths forcharacterizing IOCs

Obviously it is computationally expensive to measure thesimilarity among IOCs in the constructed heterogeneousgraph as it usually requires to randomly walk a larger numberof nodes in the graph Fortunately in our work it is unneces-sary to walk through the entire graph as we prescribe a limitby introducing predefined meta-paths and we only focus onthe symmetrical meta-paths presented in Table 1 To calcu-late the similarity between IOCs under different meta-pathinstances we need to compute the corresponding commutingmatrices [37] following the meta-paths

Given a meta-path set P = sumMprime

m A1A2 middot middot middot Al+1 themeta-path based commuting matrix can be defined as CP =UA1A2 UA2A3 middot middot middot UAAl+1 where CP(i j) represents the prob-ability of object iisinA1 reaching object j isinAl+1 under the pathP and is a connection operation These symmetric meta-paths not only efficiently reduce the complexity of walkingbut also ensures that the commuting matrix can be easily de-composed which greatly reduces the computational costs Inaddition the symmetric meta-paths in the graph G allow us toleverage the pairwise random-walk [37] to further acceleratethe computation

With Eq (7) and pairwise random-walk we can obtain thesimilarity embedding between any two IOCs hi and h j undera meta-path set P Based on the low-dimensional similarityembedding we derive a weighted adjacent matrix betweenIOCs denoted as Ai isin RNtimesN where N is the number of aspecific type of IOC in G Meanwhile to utilize the attributedinformation of nodes we train a word2vec model [24] toembed the attribute information of nodes into a feature matrixXi isin RNtimesd where N is the number of IOCs in Ai and d is thedimension of node feature With the learned adjacency matrixAi and its feature matrix Xi we can leverage the classicalGCN [18] to characterize the relationship between IOC hiand h j Particularly the layer-wise propagation rule of GCNcan be defined as below

H(l+1) = σ(Dminus12 ADminus

12 H(l)W (l)) (8)

where A = A+ IN is the adjacency matrix of IOCs with self-connections IN is the identity matrix Dii =sum j Ai j and W (l) isa l-th layer trainable weight matrix σ(middot) denotes an activationfunction such as relu H(l) isin RNtimesd is the matrix of activationin the l-th layer We perform graph convolution [18] on Aiand Xi to generate the embedding Z between IOCs belongingto type i

Z = f (XiAi) = σ(Ai middot relu(AiXiW(0)i )W (1)

i ) (9)

where W (0)i isin RdtimesH is an input-to-hidden weight matrix for a

hidden layer with H feature mapsW (1)i isinRHtimesF is a hidden-to-

output weight matrix with F feature maps in the output layerXi isin RNtimesd N is the number of a specific type of IOCs d is thedimension of their corresponding features and σ is anotheractivation function such as sigmoid Ai = Dminus

12 AiDminus

12 can be

calculated offline Here we leverage the cross-entropy loss tooptimize the performance of our proposed threat intelligenceframework written as follows

Loss(Yl f Zl f ) =minussumlisinYl

F

sumf=1

Yl f middot lnZl f (10)

where Yl is the set of node indices that have labels Yl f is thereal label and Zl f is a corresponding label that our modelpredicts Based on Eq (10) we conduct stochastic gradientdescent to continuously optimize the neural network weightsW (0)

i W (1)i and

rarrw to reduce the loss and build a general

threat intelligence computing framework Using this frame-work security organizations are able to mine richer securityknowledge hidden in the interdependent relationships amongIOCs

5 Experimental Evaluation

51 Dataset and SettingsWe develop a threat data collector to automatically collectcyber threat data from a set of sources including 73 inter-national security blogs (eg fireeye cloudflare) hacker fo-rum posts (eg Blackhat Hack5) security bulletins (egMicrosoft Cisco) CVE detail description and ExploitDB Acomplete list of data sources is presented in the Baidu cloud8We set up a daemon to collect the newly generated securityevents every day So far more than 245786 security-relateddata describing threat events have been collected For trainingand evaluating our proposed IOC extraction method 30000samples from 5000 texts are annotated by utilizing the B-I-Osequence tagging method (see Section 22 for the example)and an annotation example is shown in Figure 2

For 30000 labeled samples we randomly select 60 ofsamples as a training set 20 of samples as a verificationset and the rest of the samples as our test set Based on thedata sets we comprehensively evaluate the performance ofHINTI for extracting IOCs and threat intelligence computingWe run all of the experiments on 16 cores Intel(R) Core(TM)i7-6700 CPU 340GHz with 64GB RAM and 4times NVIDIATesla K80 GPU The software programs are executed on theTensorFlow-GPU framework on Ubuntu 1604

52 Evaluation of IOC ExtractionA set of experiments are conducted to evaluate the sensitiv-ity of different parameters in the multi-granular based IOC

8httpspanbaiducoms1J631WMYY_T_awa8aY5xy3A

248 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

extraction model We mainly consider 8 hyper-parametersthat seriously impact the performance of the model as shownin Table 2 More specifically Embedding_dim is one ofthe most important factors that determine the generaliza-tion capability of the model Here we fix other parame-ters while fine-tuning the embedding size in the range of(50100150200250300350400) Experimental resultsshow that the accuracy of extracted IOC achieves the bestwhen Embedding_dim=300 Learning_rate is another ma-jor factor for determining the stride of gradient descent inminimizing the loss function which determines whetherthe model can find a global optimal solution We fix otherparameters to fine-tune the Learning_rate in the rangeof (000100050010050105) and the performancereaches the best when the Learning_rate=0001 Similarlywe fine-tune the other hyper-parameters with 5000 epochsand the hyper-parameters allowing our model to perform op-timally are recorded in Table 2

Table 2 Hyperparameters setting in the multi-granular basedIOC extraction method

Parameter value Parameter Value

Embedding_dim 300 Hidden_dim 128

Sequence_length 500 Epoch_num 5000

Learning_rate 0001 Batch_size 64

Dropout_rate 05 Optimizer Adam

Table 3 Performance of IOC extraction wrt IOC types

IOC Type Precision Recall Micro-F1

IP 9956 9952 9954File 9436 9688 9560Type 9986 9981 9983

Email 9932 9987 9949Device 9326 9278 9302Vender 9307 9445 9424Version 9698 9799 9748Domain 9658 9589 9623Software 8825 8931 8878Function 9503 9559 9531Platform 9431 9257 9343Malware 8976 9123 9049

Vulnerability 9925 9873 9899Other 9829 9842 9835

In this paper we extract 13 major types of IOCs and the

Figure 7 Performance of IOC extraction using embeddingfeatures with different granularity

performance is presented in Table 3 Overall our IOC ex-traction method demonstrates excellent performance in termsof precision recall and Micro-F1 (ie micro-averaged F1-score) for most types of IOCs such as function maliciousIP and device However we observe a performance degra-dation when recognizing software and malware This can beattributed to the fact that most software and malware is namedby random strings such as md5 hash Moreover we find thatthe number of training samples impacts the performance ofthe model Specifically the performance becomes unsatisfac-tory (eg Software Malware) when the number of a certaintype of training samples is insufficient (ie less than 5000)

In order to verify the effectiveness of multi-granular embed-ding features we assess the performance of IOC extractionwith features of different granularity including char-level1-gram 2-gram 3-gram and multi-granular features The ex-perimental results are demonstrated in Figure 7 from whichwe can observe that the proposed multi-granular embeddingfeature outperforms others since it leverages the attentionmechanism to simultaneously learn multi-granular features tocharacterize different patterns of IOCs

Next to verify the effectiveness of the proposed IOC ex-traction method we compare it with the state-of-the-art en-tity recognition approaches including general NER toolsNLTK NER9 and Stanford NER10 professional IOC extrac-tion method Stucco [16] and iACE [22] and popular en-tity recognition approaches CRF [21] BiLSTM and BiL-STM+CRF [15] The experimental results of different meth-ods on real-world data are demonstrated in Table 4

The results indicate that our proposed IOC extraction out-performs the state-of-the-art entity recognition methods andtools in terms of precision recall and Micro-F1 and its im-provement can be attributed to the following factors Firstcompared with Standford NER and NLTK NER the NLP tools

9httpwwwnltkorgbookch07html10httpsstanfordnlpgithubioCoreNLPnerhtml

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 249

Table 4 Performance of threat entity recognition usingdifferent methods

Method Accuracy Precision Micro-F1

NLTK NER 6945 6851 6749

Stanford NER 6835 6674 6858

iACE 9214 9126 9225

Stucco 9116 9221 9147

CRF 9264 9180 9265

BiLSTM 9478 9521 9435

BiLSTM+CRF 9638 9642 9627

Multi-granular 9859 9872 9869

trained with general news corpora our method uses a security-related training corpus collected and labeled by ourselves asa data source for training our model Second different fromthe rule-based extraction approaches (eg iACE and Stucco)our proposed deep learning based method provides an end-to-end system with more advanced features to represent variousIOCs Third comparing to RNN-based methods (eg BiL-STM and BiLSTM+CRF) our proposed method brings inmulti-granular embedding sizes (char-level 1-gram 2-gramand 3-gram) to simultaneously learn the characteristics of vari-ous sizes and types of IOCs which can identify more complexand irregular IOCs Last but not the least our method imple-ments an attention mechanism to learn the weights of featureswith various scales to effectively characterize different typesof IOCs further enhancing the IOC recognition accuracy

6 Application of Threat Intelligence Comput-ing

Our proposed threat intelligence computing framework basedon heterogeneous graph convolutional networks can be usedto mine novel security knowledge behind heterogeneous IOCsIn this section we evaluate its effectiveness and applicabilityusing three real-world applications profiling and ranking forCTIs attack preference modeling and vulnerability similarityanalysis

61 Threat Profiling and Significance Rankingof IOCs

Due to the disparity in the significance of threats it is im-portant to derive the threat profile and rank the significanceof IOCs for demystifying the landscape of threats Howevermost of the existing CTIs are incapable of modeling the asso-ciated relationships between heterogeneous IOCs

Different from isolated CTIs HINTI leverages HIN to

model the interdependent relationships among IOCs withtwo characteristics first the isolated IOCs can be integratedinto a graph-based HIN to clearly display the associated rela-tionships among IOCs which is capable of directly depictingthe basic threat profile For example Figure 3 depicts a threatprofiling sample an attacker utilizes CVE-2017-0143 vulnera-bility to invade Vista SP2 and Win7 SP1 devices belonging tothe Microsoft platform and CVE-2017-0143 is a remote codeexecution vulnerability that uses a ldquoSMBbat malicious fileSecond the significance of IOCs in HINTI can be naturallyranked based on the proposed threat intelligence computingframework

Table 5 shows the top 5 authoritative ranking score [35] ofvulnerability attacker attack type and platform from whichsecurity experts can gain a clear insight into the impact ofeach IOC Degree centrality [33] which describes the numberof links incident upon a node is widely used in evaluatingthe importance of a node in a graph It can used to quantifythe immediate risk of a node that connects with other nodesfor delivering network flows such as virus spreading Heredegree centrality can be utilized in verifying the effectivenessof the proposed threat intelligence computing framework inranking the importance of IOCs It is worth noting that bothour ranking method and degree centrality work regardless ofthe time of attacks We compute the degree centrality rank-ing of IOCs based on the fact that the node with a higherdegree centrality is more important than a node with a lowerone For instance if the degree centrality of a vulnerabilityis higher it indicates that this vulnerability is exploited bymore attackers or it affects more devices The ranking resultof degree centrality shown in Table 5 is consistent with theranking result based on the proposed threat intelligence com-puting framework demonstrating the capability of the CTIcomputing framework in ranking the importance of differenttypes of IOCs

62 Attack Preference Modeling

Attack preference modeling is meaningful for security organi-zations to gain insight into the attack intention of attackersbuild attack portraits and develop personalized defense strate-gies Here we leverage HINTI to integrate different types ofIOCs and their interdependent relationships to comprehen-sively depict the picture of attack events which helps modelthe attack preferences With the proposed threat intelligencecomputing framework we model attack preferences by clus-tering the embedded attackersrsquo vectors

In this task each malicious IP address is treated as an in-truder and its attack preferences are mainly reflected in threefeatures including the platforms it destroys (including Win-dows Linux Unix ASP Android Apache etc) the industriesit invades (eg education finance government Internet ofThings and Industrial control system etc) and the exploittypes it employs (eg DOS Buffer overflow Execute code

250 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

Table 5 The significance ranking of different types of IOCs (CV E1 CVE-2017-0146 CV E2 CVE-2006-5911 CV E3 CVE-2008-6543 CV E4 CVE-2012-1199 CV E4 CVE-2006-4985 AR Authoritative Ranking DC Degree Centrality value)

Vulnerability Attacker Platform Attack Type

No AR DC Monicker AR DC Category AR DC Exploit_type AR DC

CV E1 02713 7643 Meatsploit 02764 549 PHP 04562 17865 Webapps 05494 11648

CV E2 02431 7124 GSR team 01391 327 windows 02242 13793 DOS 01772 8741

CV E3 02132 6833 Ihsan 00698 279 Linux 00736 8792 Overflow 01533 7652

CV E4 01826 6145 Techsa 00695 247 Linux86 00623 8147 CSRF 00966 5433

CV E5 01739 5637 Aurimma 00622 204 ASP 00382 5027 SQL 00251 2171

(a) AV DV T AT (P13) (b) AV DPDTV T AT (P17) (c) AV PV T AT (P14)

Figure 8 The performance of attack preference modeling with different meta-paths in which the preference of attacker i isreduced to a two-dimensional space (xiyi) and each cluster represents a group with a specific attack preference

Sql injection XSS Gain information etc)Specifically we first utilize our proposed threat intelligence

computing framework to embed each attacker into a low-dimensional vector space and then perform DBSCAN algo-rithm on the embedded vector to cluster attackers with thesame preferences into corresponding groups Figure 8 showsthe top 3 clustering results under different types of meta-paths in which the meta-path AV DPDTV T AT (P17) performsthe best performance with compact and well-separated clus-ters indicating that it contains richer semantic relationshipsfor characterizing attack preferences than other meta-paths

To verify the effectiveness of attack preference modelingwe identify 5297 distinct attackers (each unique IP address istreated as an attacker) who have submitted at least 10 cyberattacks For these attackers five cybersecurity researchersconsisting of three doctoral and two master students spentabout fortnight to manually annotate their attack preferencesfrom three perspectives the platforms they destroyed the in-dustries they attacked and the attack types they exploited Toensure the accuracy of data labeling we test the consistencyof the tags for the 5297 attackers and remove the sampleswith ambiguous tags As a result we obtain 3000 sampleswith consistent tags Based on the labeled samples we furtherevaluate the performance of different meta-paths on model-

Table 6 Performance of modeling attack preference withdifferent meta-paths

Metapath Accuracy Precision Micro-F1

P1 7431 7622 7525

P4 7116 7327 7216

P5 6915 7143 7027

P12 7214 7646 7424

P13 7965 8131 8047

P14 7748 7934 7840

P15 8017 7976 7996

P17 8139 8172 8155

ing attack preferences In the attack modeling scenario weonly focus on the meta-paths that both the start node and theend node are attackers The experimental results are demon-strated in Table 6 Obviously different meta-paths displaydifferent abilities in characterizing the attack preferences ofcyber intruders The performance when using P17 is more

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 251

superior than the one with other meta-paths which indicatesthat P17 holds more valuable information that characterizesthe attack preferences of cybercriminals since P17 includesthe semantics information of P1P4P5 and P12 sim P15

In addition we compare the capabilities of our proposedcomputing framework with those of other state-of-the-art em-bedding methods in terms of attack preference modelingOur analysis result shows that the accuracy of attack pref-erence modeling reaches 081 which outperforms the exist-ing popular models Node2vec (with precision of 071) [1]metapaht2vec (with precision of 073) [11] and HAN (withprecision of 076) [42] The performance improvement canbe attributed to the following characteristics First our com-puting framework utilizes weight-learning to learn the signifi-cance of different meta-paths for evaluating the similarity be-tween attackers Second the proposed computing frameworkleverages GCN to learn the structural information betweenattackers to obtain more discriminative structural features thatimproves the performance of attack preference modeling

63 Vulnerability Similarity AnalysisVulnerability classification or clustering is crucial for conduct-ing vulnerability trend analysis the correlation analysis ofincidents and exploits and the evaluation of countermeasuresThe traditional vulnerability analysis relies on the manualinvestigation of the source codes which requires expert ex-pertise and consumes considerable efforts In this sectionwe propose an unsupervised vulnerability similarity analy-sis method based on the proposed threat intelligence com-puting framework which can automatically group similarvulnerabilities into corresponding communities Particularlythe vulnerability-related IOCs are first embedded into a low-dimensional vector space using CTI computing frameworkThen the DBSCAN algorithm is performed on the embed-ded vector space to cluster vulnerabilities into correspondingcommunities The clustering results are presented in Figure 9

Figure 9 (c) shows all vulnerabilities are clustered into 12clusters using meta-path V DPDTV T (P16) which is very closeto the classification standard (ie 13) recommended by CVEdetails an authoritative database that publishes vulnerabilityinformation By manually analyzing the training samples wefind that HTTP Response Splitting vulnerability does not ap-pear in our dataset Therefore our cluster number (ie 12)is consistent with CVE Details11 To further validate the ef-fectiveness of threat intelligence computing framework forvulnerability clustering we randomly select 100 vulnerabili-ties from each cluster for manual inspection to measure theconsistency of the vulnerability types in each cluster and theresults are presented in Table 7 Obviously the clustering per-formance of cluster 8 (ie File Inclusion) and cluster 10 (ieDirectory Traversal) is remarkably worse than other clustersTo explain the reason we examine our training data and the

11httpswwwcvedetailscom

Table 7 Accuracy of vulnerability clustering

Cluster ID Vulnerability type Accuracy

cluster1 Denial of Service 8012

cluster2 XSS 8353

cluster3 Execute Code 8150

cluster4 Overflow 7650

cluster5 Gain Privilege 9156

cluster6 Bypass Something 7174

cluster7 CSRF 9327

cluster8 File Inclusion 6172

cluster9 Gain Informa 7042

cluster10 Directory Traversal 6949

cluster11 Memory Corruption 8156

cluster12 SQL Injection 8067

average 7851

computing framework We found that the proportion of thesetwo types of vulnerabilities is too small (cluster 8 is 34and cluster 10 is 42) making our computing frameworkvery likely to be under-fit with insufficient data Howeverthe proposed computing framework performs well on mosttypes of vulnerabilities in an unsupervised manner especiallygiven sufficient samples (eg cluster 7 is 176 and cluster 5is 157)

In addition by examining the clustering results we havean observation that the vulnerabilities in the same cluster arelikely to have evolutionary relationships For instance CVE-2018-0802 an office zero-day vulnerability is evolved fromthe CVE-2017-11882 They both include EQNEDT32exe fileused to edit the formula in Office software which allowsremote attackers to execute arbitrary codes by constructinga malformed font name The modeling and computation ofinterdependent relationships among IOCs in HINTI facilitatethe discovery of such intricate connections between vulnera-bilities

In summary HINTI is capable of depicting a more compre-hensive threat landscape and the proposed CTI computingframework has the ability to bring novel security insightstoward different real-world security applications Howeverthere are still numerous opportunities for enhancing thesesecurity applications Specifically for attack preference mod-eling although each individual IP address is treated as anattacker we cannot determine whether it belongs to a realattacker or is disguised by a proxy Fortunately even if thereal attack address cannot be captured understanding the at-tack preferences of these IP proxies which are widely usedin cybercrime is also meaningful for gaining insight into thecyber threats For vulnerability similarity analysis data imbal-

252 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

(a) AV DV T AT (P13) (b) V DV T (P10) (c) V DPDTV T (P16)

Figure 9 Illustration of the vulnerability similarity analysis based on different meta-paths in which vulnerability i can be reducedinto a two-dimensional space (xiyi) and each cluster indicates a particular type of vulnerability

ance issue affects the performance of model and inadequatetraining samples often result in model underfitting as shownin the case of cluster 8 and cluster 10

7 Related Work

Cyber Treat Intelligence An increasing number of securityvendors and researchers start exploring CTI for protectingsystem security and defending against new threat vectors [28]Existing CTI extraction tools such as IBM X-Force12 Threatcrowd13 Openctiio14 AlienVault15 CleanMX16 and Phish-Tank17 use regular expression to synthesize IOC from thedescriptive texts However these methods often producehigh false positive rate by misjudging legitimate entities asIOCs [22]

Recently Balzarotti et al [2] develop a system to extractIOCs from web pages and identify malicious URLs fromJavaScript codes Sabottke et al [31] propose to detect po-tential vulnerability exploits by extracting and analyzing thetweets that contain ldquoCVErdquo keyword Liao et al [22] present atool iACE for automatically extracting IOCs which excelsat processing technology articles Nevertheless iACE identi-fies IOCs from a single article which does not consider therich semantic information from multi-source texts Zhao etal [46] define different ontologies to describe the relationshipbetween entities based on expert knowledge Numerous popu-lar CTI platforms including IODEF [9] STIX [3] TAXII [40]OpenIOC [13] and CyBox [19] focus on extracting and shar-ing CTI Yet none of the existing approaches could uncoverthe interdependent relations among CTIs extracted from multi-source texts let alone quantifying CTIsrsquo relevance and miningvaluable threat intelligence hidden behind the isolated CTIs

12httpsexchangexforceibmcloudcom13httpswwwthreatcrowdorg14httpsdemoopenctiio15 httpsotxalienvaultcom16httplistclean-mxcom17httpswwwphishtankcom

Heterogeneous Information Network Real-world systemsoften contain a large number of interacting multi-typed ob-jects which can naturally be expressed as a heterogeneousinformation network (HIN) HIN as a conceptual graph repre-sentation can effectively fuse information and exploit richersemantics in interacting objects and links [37] HIN has beenapplied to network traffic analysis [38] public social mediadata analysis [45] and large-scale document analysis [41]Recent applications of HIN include mobile malware detec-tion [14] and opioid user identification [12] In this paper forthe first time we use HIN for CTI modelingGraph Convolutional Network Graph convolutional net-works (GCN) [17] has become an effective tool for address-ing the task of machine learning on graphs such as semi-supervised node classification [17] event classification [29]clustering [8] link prediction [27] and recommended sys-tem [44] Given a graph GCN can directly conduct the con-volutional operation on the graph to learn the nonlinear em-bedding of nodes In our work to discern and reveal the inter-active relationships between IOCs we utilize GCN to learnmore discriminative representation from attributes and graphstructure simultaneously which is the premise for threat intel-ligence computing

8 Discussion

Data Availability The proposed framework assumes thatsufficient threat description data can be obtained for generat-ing comprehensive and the latest CTIs Fortunately with thegrowing prosperity of social media an increasing number ofsecurity-related data (eg blogs posts news and open secu-rity databases) can be collected effortlessly To automaticallycollect security-related data we develop a crawler system tocollect threat description data from 73 international securitysources (eg blogs hacker forum posts security bulletinsetc) providing sufficient raw materials for generating cyberthreat intelligenceModel Extensibility In this paper 6 types of IOCs and 9

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 253

types of relationships are modeled in HINTI However ourproposed framework is extensible in which more types ofIOCs and relationships can be enrolled to represent richer andmore comprehensive threat information such as maliciousdomains phishing Emails attack tools their interactions etcHigh-level Semantic Relations In view of the computa-tional complexity of the model our threat intelligence comput-ing method focues on utilizing the meta-paths to quantify thesimilarity between IOCs while ignoring the influence of themeta-graph on it which inevitably misses characterizing somehigh-level semantic information Nevertheless the proposedcomputing framework introduces an attention mechanism tolearn the signification of different meta-paths to character-ize IOCs and their interactive relationships which effectivelycompensates for the performance degradation caused by ig-noring the meta-graphsSecurity Knowledge Reasoning Although our proposedframework exhibits promising results in CTI extraction andmodeling computing how to implement advanced securityknowledge reasoning and prediction is still an open problemeg it remains challenging to predict whether a vulnerabilitycould potentially affect a particular type of devices in thefuture We will investigate this problem in the future

9 Conclusion

This work explores a new direction of threat intelligence com-puting which aims to uncover new knowledge in the relation-ships among different threat vectors We propose HINTI acyber threat intelligence framework to model and quantify theinterdependent relationships among different types of IOCsby leveraging heterogeneous graph convolutional networksWe develop a multi-granular attention mechanism to learnthe importance of different features and model the interde-pendent relationships among IOCs using HIN We proposethe concept of threat intelligence computing and present ageneral intelligence computing framework based on graphconvolutional networks Experimental results demonstratethat the proposed multi-granular attention based IOC extrac-tion method outperforms the existing state-of-the-art methodsThe proposed threat intelligence computing framework caneffectively mine security knowledge hidden in the interdepen-dent relationships among IOCs which enables crucial threatintelligence applications such as threat profiling and rankingattack preference modeling and vulnerability similarity analy-sis We would like to emphasize that the knowledge discoveryamong interdependent CTIs is a new field that calls for acollaborative effort from security experts and data scientists

In future we plan to develop a predicative and reasoningmodel based on HINTI and explore preventative countermea-sures to protect cyber infrastructure from future threats Wealso plan to add more types of IOCs and relations to depicta more comprehensive threat landscape Moreover we willleverage both meta-paths and meta-graphs to characterize the

IOCs and their interactions to further improve the embeddingperformance and to strike a balance between the accuracyand computational complexity of the model We will alsoinvestigate the feasibility of security knowledge predictionbased on HINTI to infer the potential future relationshipsbetween the vulnerabilities and devices

Acknowledgement

We would like to thank our shepherd Tobias Fiebig andthe anonymous reviewers for providing valuable feedbackon our work We also thank Hao Peng and Lichao Sunfor their feedback on the early version of this work Thiswork was supported in part by National Science Founda-tion grants CNS1950171 CNS-1949753 It was also sup-ported by the NSFC for Innovative Research Group ScienceFund Project (62141003) National Key RampD Program China(2018YFB0803 503) the 2018 joint Research Foundationof Ministry of Education China Mobile (MCM20180507)and the Opening Project of Shanghai Trusted Industrial Con-trol Platform (TICPSH202003020-ZC) Any opinions find-ings and conclusions or recommendations expressed in thismaterial do not necessarily reflect the views of any fundingagencies

References

[1] Jure Leskovec Aditya Grover node2vec Scalable fea-ture learning for networks In Acm Sigkdd InternationalConference on Knowledge Discovery Data Mining2016

[2] Marco Balduzzi Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications In WWW 2016

[3] Sean Barnum Standardizing cyber threat intelligenceinformation with the structured threat information ex-pression (stix) Mitre Corporation 111ndash22 2012

[4] Eric W Burger Michael D Goodman Panos Kam-panakis and Kevin A Zhu Taxonomy model for cyberthreat intelligence information exchange technologiesIn Proceedings of the 2014 ACM Workshop on Infor-mation Sharing amp Collaborative Security pages 51ndash602014

[5] Onur Catakoglu Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications The web conference pages 333ndash3432016

[6] Danqi Chen and Christopher D Manning A fast and ac-curate dependency parser using neural networks pages740ndash750 2014

254 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

[7] Qian Chen and Robert A Bridges Automated behav-ioral analysis of malware a case study of wannacry ran-somware In 16th IEEE ICMLA pages 454ndash460 2017

[8] Weilin Chiang Xuanqing Liu Si Si Yang Li SamyBengio and Chojui Hsieh Cluster-gcn An efficient al-gorithm for training deep and large graph convolutionalnetworks knowledge discovery and data mining pages257ndash266 2019

[9] Roman Danyliw Jan Meijer and Yuri Demchenko Theincident object description exchange format Interna-tional Journal of High Performance Computing Appli-cations 50701ndash92 2007

[10] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 2017

[11] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 ACM 2017

[12] Yujie Fan Yiming Zhang Yanfang Ye and Xin Li Au-tomatic opioid user detection from twitter Transductiveensemble built on different meta-graph based similari-ties over heterogeneous information network In IJCAIpages 3357ndash3363 2018

[13] Fireeye Openioc httpswwwfireeyecomblogthreat-research201310openioc-basicshtml Accessed January 20 2020

[14] Shifu Hou Yanfang Ye Yangqiu Song and Melih Ab-dulhayoglu Hindroid An intelligent android malwaredetection system based on structured heterogeneous in-formation network In Proceedings of the 23rd ACMSIGKDD International Conference on Knowledge Dis-covery and Data Mining pages 1507ndash1515 2017

[15] Zhiheng Huang Wei Xu and Kai Yu Bidirectional lstm-crf models for sequence tagging arXiv1508019912015

[16] Michael D Iannacone Shawn Bohn Grant NakamuraJohn Gerth Kelly MT Huffer Robert A Bridges Erik MFerragut and John R Goodall Developing an ontologyfor cyber security knowledge graphs CISR 1512 2015

[17] Thomas Kipf and Max Welling Semi-supervised clas-sification with graph convolutional networks arXivLearning 2016

[18] Thomas N Kipf and Max Welling Semi-supervisedclassification with graph convolutional networks arXivpreprint arXiv160902907 2016

[19] Tero Kokkonen Architecture for the cyber security situ-ational awareness system In Internet of Things SmartSpaces and Next Generation Networks and Systemspages 294ndash302 Springer 2016

[20] Mehmet Necip Kurt Yasin Yılmaz and Xiaodong WangDistributed quickest detection of cyber-attacks in smartgrid IEEE Transactions on Information Forensics andSecurity 13(8)2015ndash2030 2018

[21] John Lafferty Andrew Mccallum and Fernando PereiraConditional random fields Probabilistic models for seg-menting and labeling sequence data ICML pages 282ndash289 2001

[22] Xiaojing Liao Yuan Kan Xiao Feng Wang Li Zhouand Raheem Beyah Acing the ioc game Toward auto-matic discovery and analysis of open-source cyber threatintelligence In ACM Sigsac Conference on ComputerCommunications Security 2016

[23] Rob Mcmillan Open threat intelligencehttpwwwgartnercomdoc2487216definition-threat-intelligence AccessedJanuary 20 2020

[24] Tomas Mikolov Kai Chen Greg Corrado and JeffreyDean Efficient estimation of word representations invector space arXiv preprint arXiv13013781 2013

[25] Sudip Mittal Prajit Kumar Das Varish Mulwad Anu-pam Joshi and Tim Finin Cybertwitter Using twitterto generate alerts for cybersecurity threats and vulnera-bilities In Proceedings of the 2016 IEEE Advances inSocial Networks Analysis and Mining pages 860ndash8672016

[26] Eric Nunes Ahmad Diab Andrew Gunn EricssonMarin Vineet Mishra Vivin Paliath John RobertsonJana Shakarian Amanda Thart and Paulo ShakarianDarknet and deepnet mining for proactive cybersecuritythreat intelligence In 2016 IEEE ISI pages 7ndash12 2016

[27] Shirui Pan Ruiqi Hu Guodong Long Jing Jiang LinaYao and Chengqi Zhang Adversarially regularizedgraph autoencoder for graph embedding pages 2609ndash2615 2018

[28] P Pawlinski P Jaroszewski P Kijewski L SiewierskiP Jacewicz P Zielony and R Zuber Actionable infor-mation for security incident response European UnionAgency for Network and Information Security Herak-lion Greece 2014

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 255

[29] Hao Peng Jianxin Li Qiran Gong Yangqiu SongYuanxing Ning Kunfeng Lai and Philip S Yu Fine-grained event categorization with heterogeneous graphconvolutional networks In Proceedings of the 28th In-ternational Joint Conference on Artificial Intelligencepages 3238ndash3245 AAAI Press 2019

[30] Sara Qamar Zahid Anwar Mohammad Ashiqur Rah-man Ehab Al-Shaer and Bei-Tseng Chu Data-drivenanalytics for cyber-threat intelligence and informationsharing Computers amp Security 6735ndash58 2017

[31] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In USENIXSecurity 2015

[32] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In 24thUSENIX Security pages 1041ndash1056 2015

[33] Deepak Sharma and Avadhesha Surolia Degree Cen-trality Springer New York 2013

[34] Saurabh Singh Pradip Kumar Sharma Seo Yeon MoonDaesung Moon and Jong Hyuk Park A comprehensivestudy on apt attacks and countermeasures for futurenetworks and communications challenges and solutionsJournal of Supercomputing pages 1ndash32 2016

[35] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks Principles and methodologiesSynthesis Lectures on Data Mining and Knowledge Dis-covery 3(2)1ndash159 2012

[36] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks a structural analysis approachACM SIGKDD 14(2)20ndash28 2013

[37] Yizhou Sun Jiawei Han Xifeng Yan Philip S Yu andTianyi Wu Pathsim Meta path-based top-k similaritysearch in heterogeneous information networks Proceed-ings of the VLDB Endowment 4(11)992ndash1003 2011

[38] Yizhou Sun Brandon Norick Jiawei Han Xifeng YanPhilip S Yu and Xiao Yu Pathselclus Integrating meta-path selection with user-guided object clustering in het-erogeneous information networks ACM Transactionson TKDD 7(3)11 2013

[39] Wiem Tounsi and Helmi Rais A survey on technicalthreat intelligence in the age of sophisticated cyber at-tacks Computers amp Security 72212ndash233 2018

[40] Thomas D Wagner Esther Palomar Khaled Mahbuband Ali E Abdallah Towards an anonymity supportedplatform for shared cyber threat intelligence risks andsecurity of internet and systems pages 175ndash183 2017

[41] Chenguang Wang Yangqiu Song Haoran Li MingZhang and Jiawei Han Text classification with het-erogeneous information network kernels In 13th AAAI2016

[42] Xiao Wang Houye Ji Chuan Shi Bai Wang Peng CuiP Yu and Yanfang Ye Heterogeneous graph attentionnetwork 2019

[43] Jie Yang Shuailong Liang and Yue Zhang Design chal-lenges and misconceptions in neural sequence labelingarXiv Computation and Language 2018

[44] Rex Ying Ruining He Kaifeng Chen Pong Eksombat-chai William L Hamilton and Jure Leskovec Graphconvolutional neural networks for web-scale recom-mender systems In Proceedings of the 24th ACMSIGKDD International Conference on Knowledge Dis-covery amp Data Mining pages 974ndash983 ACM 2018

[45] Jiawei Zhang Xiangnan Kong and Philip S Yu Trans-ferring heterogeneous links across location-based socialnetworks In The 7th ACM international conference onWeb search and data mining pages 303ndash312 2014

[46] Yishuai Zhao Bo Lang and Ming Liu Ontology-basedunified model for heterogeneous threat intelligence inte-gration and sharing In 2017 11th IEEE InternationalConference on Anti-counterfeiting Security and Identi-fication (ASID) pages 11ndash15 2017

256 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

  • Introduction
  • Background
    • Cyber Threat Intelligence
    • Motivation
    • Preliminaries
      • Architecture Overview of HINTI
      • Methodology
        • Multi-granular Attention Based IOC Extraction
        • Cyber Threat Intelligence Modeling
        • Threat Intelligence Computing
          • Experimental Evaluation
            • Dataset and Settings
            • Evaluation of IOC Extraction
              • Application of Threat Intelligence Computing
                • Threat Profiling and Significance Ranking of IOCs
                • Attack Preference Modeling
                • Vulnerability Similarity Analysis
                  • Related Work
                  • Discussion
                  • Conclusion
Page 6: Cyber Threat Intelligence Modeling Based on Heterogeneous ... · 2.1Cyber Threat Intelligence Cyber Threat Intelligence (CTI) extracted from security-related data is structured information

Figure 6 The framework of multi-granular IOC extraction

in X We first chunk the sentence into n-gram componentsincluding char-level 1-gram 2-gram and 3-gram which arethe inputs of our trained model written as follows

e jxi=V j

embedding(xi) (1)

where V jembedding transforms the chunk with granularity j into

a vector space and xi is the i-th word in a sentence X Thusthe threat description sentence Xi can be vectorized as follows

rarrh j

i = LST M f orward([e jx0e j

x1 middot middot middot e j

xi])

larrh j

i = LST Mbackward([e jx0e j

x1 middot middot middot e j

xi])

(2)

whererarrh j

i andlarrh j

i are the embedded features learned byforward LSTM and backward LSTM respectively Let O bethe output of Bi-LSTM which is a weighted sum of embeddedfeatures with weights corresponding to the importance ofdifferent features

O = H middotW T (3)

where H =j

sum~βiσ(h

j1h

j2 middot middot middot h

ji ) h j

i = (rarrh j

i +larrh j

i ) ~βi is theweight vector to represent the importance of h j

i in whichj and i are the segmentation granularity of sentences and thecorresponding index of the chunk W is the parameter matrix

Given a security-related sentence X = (x1x2 middot middot middot xi) itscorresponding threat object sequence Y = (y1 y2 middot middot middot yi) andits output of Bi-LSTM O we can compute the overall labelscore of X and Y as follows

S(X Y ) =n

sumi=0

(Ayiyi+1 +Oiyi) (4)

where Ayiyi+1 is the state transition matrix in CRF model andOiyi as the output of Bi-LSTM hidden layer (calculated byEq (3)) represents the label score of i-th word corresponding

to the type yi Next we utilize so f tmax function to normalizethe overall label score

p(Y |X) =eS(X Y )

sumyisinYX

eS(X Y )(5)

We design an objective function to maximize the proba-bility p(Y |X) to achieve the highest label score for differentIOCs which can be written as follows

argmax log(p(Y |X)) = argmax (S(X Y )minus

log( sumyisinYX

eS(X y))) (6)

By solving the objective function above we assign correctlabels to the n-gram components according to which we canidentify the IOCs with different lengths Our multi-granularattention based IOC extraction method is capable of identify-ing different types of IOCs and its evaluation is presented inSection 5

42 Cyber Threat Intelligence ModelingCTI modeling is an important step to explore the intricaterelationship between heterogeneous IOCs In our work HINis introduced to group different types of IOCs into a graphto explore their interactive relationships In this section weportray the main principle of threat intelligence modeling

To model the intricate interdependent relationships amongIOCs we define the following 9 relationships among 6 typesof IOCs as follows

bull R1 To depict the relation of an attacker and the ex-ploited vulnerability we construct the attacker-exploit-vulnerability matrix A For each element Ai j isin 01Ai j=1 indicates attacker i exploits vulnerability j

bull R2 To depict the relation of an attacker and a devicewe build the attacker-invade-device matrix D For eachelement Di j isin 01 Di j=1 indicates attacker i invadesdevice j

bull R3 Two attacker can cooperate to attack a target Tostudy the relationship of attacker-attacker we constructthe attacker-cooperate-attacker matrix C For each ele-ment Ci j isin 01 Ci j=1 indicates there exists a cooper-ative relationship between attacker i and j

bull R4 To describe the relation of a vulnerability and theaffected device we build the vulnerability-affect-devicematrix M For each element Mi j isin 01 Mi j=1 indi-cates vulnerability i affects device j

bull R5 A vulnerability is often labeled as a specific attacktype by Common Vulnerabilities and Exposures (CVE)

246 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

system7 To explore the relation of vulnerability-attacktype we build the vulnerability-belong-attack type ma-trix G where each element Gi j isin 01 denotes if vul-nerability i belongs to an attack type j

bull R6 A vulnerability often involves one or more maliciousfiles To describe the relation of vulnerability-file webuild the vulnerability-include-file matrix F For eachelement Fi j isin 01 Fi j=1 denotes that vulnerability iincludes malicious file j

bull R7 A malicious file often targets a specific device Weestablish the file-target-device matrix T to explore therelation of file-device For each element Ti j isin 01Ti j=1 indicates malicious file i targets device j

bull R8 Oftentimes a vulnerability evolves from anotherTo study the relationship of vulnerability-vulnerabilitywe build the vulnerability-evolve-vulnerability matrixE where each element Ei j isin 01 indicates if vulnera-bility i evolves from vulnerability j

bull R9 To depict the relation device-platform that a de-vice belongs to a platform we build the device-belong-platform matrix P where each element Pi j isin 01 il-lustrates if device i belongs to platform j

Based on the above 9 types of relationships HINTIleverages the syntactic dependency parser [6] (eg subject-predicate-object attributive clause etc) to automatically ex-tract the 9 relationships among IOCs from threat descriptionseach of which is represented as a triple (IOCirelation IOC j)For instance given a security-related description ldquoOn May12 2017 WannaCry exploited the MS17-010 vulnerabilityto affect a larger number of Windows devices which is aransomware attack via encrypted disks Using the syntacticdependency parser we can extract the following triples (Wan-naCry exploit MS17-010) (MS17-010 affect Windows de-vice) (WannaCry is ransomware) Such triples extractedfrom various data sources can be incrementally assembledinto HINTI to model the relationships among IOCs whichcould offer a more comprehensive threat landscape that de-scribes the threat context Particularly we further define 17types of meta-paths shown in Table 1 to probe into the interde-pendent relationships over attackers vulnerabilities maliciousfiles attack types devices and platforms HINTI is able toconvey a richer context of threat events by scrutinizing 17types of meta-paths and reveal the in-depth threat insightsbehind the heterogeneous IOCs (see Section 6 for details)

43 Threat Intelligence ComputingIn this section we illustrate the concept of threat intelligencecomputing and design a general threat intelligence computing

7httpcvemitreorg

framework based on heterogeneous graph convolutional net-works which quantifies and measures the relevance betweenIOCs by analyzing meta-path based semantic similarity Herewe first provide a formal definition of threat intelligence com-puting based on heterogeneous graph convolutional networks

Definition 4 Threat Intelligence Computing Based onHeterogeneous Graph Convolutional Networks Given thethreat intelligence graph G = (VE) the meta-path set M =P1P2 middot middot middot Pi The threat intelligence computing i) com-putes the similarity between IOCs based on meta-path Pi togenerate corresponding adjacency matrix Ai ii) constructsthe feature matrix of nodes Xi by embedding attribute in-formation of IOCs into a latent vector space iii) conductsgraph convolution GCN(AiXi) to quantify the interdependentrelationships between IOCs by following meta-path Pi andembeds them into a low-dimensional space

The threat intelligence computing aims to model the seman-tic relationships between IOCs and measure their similaritybased on meta-paths which can be used for advanced secu-rity knowledge discovery such as threat object classificationthreat type matching threat evolution analysis etc Intuitivelythe objects connected by the most significant meta-paths tendto bear more similarity [37] In this paper we propose aweight-learning based threat intelligence similarity measurewhich uses self-attention to improve the performance of simi-larity measurement between any two IOCs This method canbe formalized as below

Definition 5 Weight-learning based Node Similar-ity Measure Given a set of symmetric meta-path setP = [Pm]

Mprime

m=1 the similarity S(hih j) between any two IOCshi and h j is defined as

S(hih j) =Mprime

summ

rarrw

2middot | hirarr j isin Pm || hirarri isin Pm |+ | h jrarr j isin Pm |

(7)

where hirarr j isin hm is a path instance between IOC hi andh j following meta-path Pm hirarri isin Pm is that between IOCinstance hi and hi and h jrarr j isin Pm is that between IOC in-stance h j and h j where | hirarr j isin Pm |=CPm(i j) | hirarri isinPm |=CPm(i i) | h jrarr j isin Pm |=CPm( j j) and CPm is thecommuting matrix based on meta-path Pm defined below

rarrw =

[w1 wm wMprime ] denote the meta-path weights wherewm is the weight of meta-paths Pm and M

primeis the number of

meta-pathsS(hih j) is defined in two parts (1) the semantic overlap

in the numerator which describes the number of meta-pathbetween IOC instance hi and h j (2) and the semantic broad-ness in the denominator which depicts the number of totalmeta-paths between themselves The larger number of meta-path between IOC instance hi and h j the more similar thetwo IOCs are which is normalized by the semantic broadness

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 247

of denominator Moreover different from existing similaritymeasures [37] we propose an attention mechanism basedsimilarity measure method by introducing the weight vec-torrarrw = [w1 wm wMprime ] which is a trainable coefficient

vector to learn the importance of different meta-paths forcharacterizing IOCs

Obviously it is computationally expensive to measure thesimilarity among IOCs in the constructed heterogeneousgraph as it usually requires to randomly walk a larger numberof nodes in the graph Fortunately in our work it is unneces-sary to walk through the entire graph as we prescribe a limitby introducing predefined meta-paths and we only focus onthe symmetrical meta-paths presented in Table 1 To calcu-late the similarity between IOCs under different meta-pathinstances we need to compute the corresponding commutingmatrices [37] following the meta-paths

Given a meta-path set P = sumMprime

m A1A2 middot middot middot Al+1 themeta-path based commuting matrix can be defined as CP =UA1A2 UA2A3 middot middot middot UAAl+1 where CP(i j) represents the prob-ability of object iisinA1 reaching object j isinAl+1 under the pathP and is a connection operation These symmetric meta-paths not only efficiently reduce the complexity of walkingbut also ensures that the commuting matrix can be easily de-composed which greatly reduces the computational costs Inaddition the symmetric meta-paths in the graph G allow us toleverage the pairwise random-walk [37] to further acceleratethe computation

With Eq (7) and pairwise random-walk we can obtain thesimilarity embedding between any two IOCs hi and h j undera meta-path set P Based on the low-dimensional similarityembedding we derive a weighted adjacent matrix betweenIOCs denoted as Ai isin RNtimesN where N is the number of aspecific type of IOC in G Meanwhile to utilize the attributedinformation of nodes we train a word2vec model [24] toembed the attribute information of nodes into a feature matrixXi isin RNtimesd where N is the number of IOCs in Ai and d is thedimension of node feature With the learned adjacency matrixAi and its feature matrix Xi we can leverage the classicalGCN [18] to characterize the relationship between IOC hiand h j Particularly the layer-wise propagation rule of GCNcan be defined as below

H(l+1) = σ(Dminus12 ADminus

12 H(l)W (l)) (8)

where A = A+ IN is the adjacency matrix of IOCs with self-connections IN is the identity matrix Dii =sum j Ai j and W (l) isa l-th layer trainable weight matrix σ(middot) denotes an activationfunction such as relu H(l) isin RNtimesd is the matrix of activationin the l-th layer We perform graph convolution [18] on Aiand Xi to generate the embedding Z between IOCs belongingto type i

Z = f (XiAi) = σ(Ai middot relu(AiXiW(0)i )W (1)

i ) (9)

where W (0)i isin RdtimesH is an input-to-hidden weight matrix for a

hidden layer with H feature mapsW (1)i isinRHtimesF is a hidden-to-

output weight matrix with F feature maps in the output layerXi isin RNtimesd N is the number of a specific type of IOCs d is thedimension of their corresponding features and σ is anotheractivation function such as sigmoid Ai = Dminus

12 AiDminus

12 can be

calculated offline Here we leverage the cross-entropy loss tooptimize the performance of our proposed threat intelligenceframework written as follows

Loss(Yl f Zl f ) =minussumlisinYl

F

sumf=1

Yl f middot lnZl f (10)

where Yl is the set of node indices that have labels Yl f is thereal label and Zl f is a corresponding label that our modelpredicts Based on Eq (10) we conduct stochastic gradientdescent to continuously optimize the neural network weightsW (0)

i W (1)i and

rarrw to reduce the loss and build a general

threat intelligence computing framework Using this frame-work security organizations are able to mine richer securityknowledge hidden in the interdependent relationships amongIOCs

5 Experimental Evaluation

51 Dataset and SettingsWe develop a threat data collector to automatically collectcyber threat data from a set of sources including 73 inter-national security blogs (eg fireeye cloudflare) hacker fo-rum posts (eg Blackhat Hack5) security bulletins (egMicrosoft Cisco) CVE detail description and ExploitDB Acomplete list of data sources is presented in the Baidu cloud8We set up a daemon to collect the newly generated securityevents every day So far more than 245786 security-relateddata describing threat events have been collected For trainingand evaluating our proposed IOC extraction method 30000samples from 5000 texts are annotated by utilizing the B-I-Osequence tagging method (see Section 22 for the example)and an annotation example is shown in Figure 2

For 30000 labeled samples we randomly select 60 ofsamples as a training set 20 of samples as a verificationset and the rest of the samples as our test set Based on thedata sets we comprehensively evaluate the performance ofHINTI for extracting IOCs and threat intelligence computingWe run all of the experiments on 16 cores Intel(R) Core(TM)i7-6700 CPU 340GHz with 64GB RAM and 4times NVIDIATesla K80 GPU The software programs are executed on theTensorFlow-GPU framework on Ubuntu 1604

52 Evaluation of IOC ExtractionA set of experiments are conducted to evaluate the sensitiv-ity of different parameters in the multi-granular based IOC

8httpspanbaiducoms1J631WMYY_T_awa8aY5xy3A

248 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

extraction model We mainly consider 8 hyper-parametersthat seriously impact the performance of the model as shownin Table 2 More specifically Embedding_dim is one ofthe most important factors that determine the generaliza-tion capability of the model Here we fix other parame-ters while fine-tuning the embedding size in the range of(50100150200250300350400) Experimental resultsshow that the accuracy of extracted IOC achieves the bestwhen Embedding_dim=300 Learning_rate is another ma-jor factor for determining the stride of gradient descent inminimizing the loss function which determines whetherthe model can find a global optimal solution We fix otherparameters to fine-tune the Learning_rate in the rangeof (000100050010050105) and the performancereaches the best when the Learning_rate=0001 Similarlywe fine-tune the other hyper-parameters with 5000 epochsand the hyper-parameters allowing our model to perform op-timally are recorded in Table 2

Table 2 Hyperparameters setting in the multi-granular basedIOC extraction method

Parameter value Parameter Value

Embedding_dim 300 Hidden_dim 128

Sequence_length 500 Epoch_num 5000

Learning_rate 0001 Batch_size 64

Dropout_rate 05 Optimizer Adam

Table 3 Performance of IOC extraction wrt IOC types

IOC Type Precision Recall Micro-F1

IP 9956 9952 9954File 9436 9688 9560Type 9986 9981 9983

Email 9932 9987 9949Device 9326 9278 9302Vender 9307 9445 9424Version 9698 9799 9748Domain 9658 9589 9623Software 8825 8931 8878Function 9503 9559 9531Platform 9431 9257 9343Malware 8976 9123 9049

Vulnerability 9925 9873 9899Other 9829 9842 9835

In this paper we extract 13 major types of IOCs and the

Figure 7 Performance of IOC extraction using embeddingfeatures with different granularity

performance is presented in Table 3 Overall our IOC ex-traction method demonstrates excellent performance in termsof precision recall and Micro-F1 (ie micro-averaged F1-score) for most types of IOCs such as function maliciousIP and device However we observe a performance degra-dation when recognizing software and malware This can beattributed to the fact that most software and malware is namedby random strings such as md5 hash Moreover we find thatthe number of training samples impacts the performance ofthe model Specifically the performance becomes unsatisfac-tory (eg Software Malware) when the number of a certaintype of training samples is insufficient (ie less than 5000)

In order to verify the effectiveness of multi-granular embed-ding features we assess the performance of IOC extractionwith features of different granularity including char-level1-gram 2-gram 3-gram and multi-granular features The ex-perimental results are demonstrated in Figure 7 from whichwe can observe that the proposed multi-granular embeddingfeature outperforms others since it leverages the attentionmechanism to simultaneously learn multi-granular features tocharacterize different patterns of IOCs

Next to verify the effectiveness of the proposed IOC ex-traction method we compare it with the state-of-the-art en-tity recognition approaches including general NER toolsNLTK NER9 and Stanford NER10 professional IOC extrac-tion method Stucco [16] and iACE [22] and popular en-tity recognition approaches CRF [21] BiLSTM and BiL-STM+CRF [15] The experimental results of different meth-ods on real-world data are demonstrated in Table 4

The results indicate that our proposed IOC extraction out-performs the state-of-the-art entity recognition methods andtools in terms of precision recall and Micro-F1 and its im-provement can be attributed to the following factors Firstcompared with Standford NER and NLTK NER the NLP tools

9httpwwwnltkorgbookch07html10httpsstanfordnlpgithubioCoreNLPnerhtml

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 249

Table 4 Performance of threat entity recognition usingdifferent methods

Method Accuracy Precision Micro-F1

NLTK NER 6945 6851 6749

Stanford NER 6835 6674 6858

iACE 9214 9126 9225

Stucco 9116 9221 9147

CRF 9264 9180 9265

BiLSTM 9478 9521 9435

BiLSTM+CRF 9638 9642 9627

Multi-granular 9859 9872 9869

trained with general news corpora our method uses a security-related training corpus collected and labeled by ourselves asa data source for training our model Second different fromthe rule-based extraction approaches (eg iACE and Stucco)our proposed deep learning based method provides an end-to-end system with more advanced features to represent variousIOCs Third comparing to RNN-based methods (eg BiL-STM and BiLSTM+CRF) our proposed method brings inmulti-granular embedding sizes (char-level 1-gram 2-gramand 3-gram) to simultaneously learn the characteristics of vari-ous sizes and types of IOCs which can identify more complexand irregular IOCs Last but not the least our method imple-ments an attention mechanism to learn the weights of featureswith various scales to effectively characterize different typesof IOCs further enhancing the IOC recognition accuracy

6 Application of Threat Intelligence Comput-ing

Our proposed threat intelligence computing framework basedon heterogeneous graph convolutional networks can be usedto mine novel security knowledge behind heterogeneous IOCsIn this section we evaluate its effectiveness and applicabilityusing three real-world applications profiling and ranking forCTIs attack preference modeling and vulnerability similarityanalysis

61 Threat Profiling and Significance Rankingof IOCs

Due to the disparity in the significance of threats it is im-portant to derive the threat profile and rank the significanceof IOCs for demystifying the landscape of threats Howevermost of the existing CTIs are incapable of modeling the asso-ciated relationships between heterogeneous IOCs

Different from isolated CTIs HINTI leverages HIN to

model the interdependent relationships among IOCs withtwo characteristics first the isolated IOCs can be integratedinto a graph-based HIN to clearly display the associated rela-tionships among IOCs which is capable of directly depictingthe basic threat profile For example Figure 3 depicts a threatprofiling sample an attacker utilizes CVE-2017-0143 vulnera-bility to invade Vista SP2 and Win7 SP1 devices belonging tothe Microsoft platform and CVE-2017-0143 is a remote codeexecution vulnerability that uses a ldquoSMBbat malicious fileSecond the significance of IOCs in HINTI can be naturallyranked based on the proposed threat intelligence computingframework

Table 5 shows the top 5 authoritative ranking score [35] ofvulnerability attacker attack type and platform from whichsecurity experts can gain a clear insight into the impact ofeach IOC Degree centrality [33] which describes the numberof links incident upon a node is widely used in evaluatingthe importance of a node in a graph It can used to quantifythe immediate risk of a node that connects with other nodesfor delivering network flows such as virus spreading Heredegree centrality can be utilized in verifying the effectivenessof the proposed threat intelligence computing framework inranking the importance of IOCs It is worth noting that bothour ranking method and degree centrality work regardless ofthe time of attacks We compute the degree centrality rank-ing of IOCs based on the fact that the node with a higherdegree centrality is more important than a node with a lowerone For instance if the degree centrality of a vulnerabilityis higher it indicates that this vulnerability is exploited bymore attackers or it affects more devices The ranking resultof degree centrality shown in Table 5 is consistent with theranking result based on the proposed threat intelligence com-puting framework demonstrating the capability of the CTIcomputing framework in ranking the importance of differenttypes of IOCs

62 Attack Preference Modeling

Attack preference modeling is meaningful for security organi-zations to gain insight into the attack intention of attackersbuild attack portraits and develop personalized defense strate-gies Here we leverage HINTI to integrate different types ofIOCs and their interdependent relationships to comprehen-sively depict the picture of attack events which helps modelthe attack preferences With the proposed threat intelligencecomputing framework we model attack preferences by clus-tering the embedded attackersrsquo vectors

In this task each malicious IP address is treated as an in-truder and its attack preferences are mainly reflected in threefeatures including the platforms it destroys (including Win-dows Linux Unix ASP Android Apache etc) the industriesit invades (eg education finance government Internet ofThings and Industrial control system etc) and the exploittypes it employs (eg DOS Buffer overflow Execute code

250 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

Table 5 The significance ranking of different types of IOCs (CV E1 CVE-2017-0146 CV E2 CVE-2006-5911 CV E3 CVE-2008-6543 CV E4 CVE-2012-1199 CV E4 CVE-2006-4985 AR Authoritative Ranking DC Degree Centrality value)

Vulnerability Attacker Platform Attack Type

No AR DC Monicker AR DC Category AR DC Exploit_type AR DC

CV E1 02713 7643 Meatsploit 02764 549 PHP 04562 17865 Webapps 05494 11648

CV E2 02431 7124 GSR team 01391 327 windows 02242 13793 DOS 01772 8741

CV E3 02132 6833 Ihsan 00698 279 Linux 00736 8792 Overflow 01533 7652

CV E4 01826 6145 Techsa 00695 247 Linux86 00623 8147 CSRF 00966 5433

CV E5 01739 5637 Aurimma 00622 204 ASP 00382 5027 SQL 00251 2171

(a) AV DV T AT (P13) (b) AV DPDTV T AT (P17) (c) AV PV T AT (P14)

Figure 8 The performance of attack preference modeling with different meta-paths in which the preference of attacker i isreduced to a two-dimensional space (xiyi) and each cluster represents a group with a specific attack preference

Sql injection XSS Gain information etc)Specifically we first utilize our proposed threat intelligence

computing framework to embed each attacker into a low-dimensional vector space and then perform DBSCAN algo-rithm on the embedded vector to cluster attackers with thesame preferences into corresponding groups Figure 8 showsthe top 3 clustering results under different types of meta-paths in which the meta-path AV DPDTV T AT (P17) performsthe best performance with compact and well-separated clus-ters indicating that it contains richer semantic relationshipsfor characterizing attack preferences than other meta-paths

To verify the effectiveness of attack preference modelingwe identify 5297 distinct attackers (each unique IP address istreated as an attacker) who have submitted at least 10 cyberattacks For these attackers five cybersecurity researchersconsisting of three doctoral and two master students spentabout fortnight to manually annotate their attack preferencesfrom three perspectives the platforms they destroyed the in-dustries they attacked and the attack types they exploited Toensure the accuracy of data labeling we test the consistencyof the tags for the 5297 attackers and remove the sampleswith ambiguous tags As a result we obtain 3000 sampleswith consistent tags Based on the labeled samples we furtherevaluate the performance of different meta-paths on model-

Table 6 Performance of modeling attack preference withdifferent meta-paths

Metapath Accuracy Precision Micro-F1

P1 7431 7622 7525

P4 7116 7327 7216

P5 6915 7143 7027

P12 7214 7646 7424

P13 7965 8131 8047

P14 7748 7934 7840

P15 8017 7976 7996

P17 8139 8172 8155

ing attack preferences In the attack modeling scenario weonly focus on the meta-paths that both the start node and theend node are attackers The experimental results are demon-strated in Table 6 Obviously different meta-paths displaydifferent abilities in characterizing the attack preferences ofcyber intruders The performance when using P17 is more

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 251

superior than the one with other meta-paths which indicatesthat P17 holds more valuable information that characterizesthe attack preferences of cybercriminals since P17 includesthe semantics information of P1P4P5 and P12 sim P15

In addition we compare the capabilities of our proposedcomputing framework with those of other state-of-the-art em-bedding methods in terms of attack preference modelingOur analysis result shows that the accuracy of attack pref-erence modeling reaches 081 which outperforms the exist-ing popular models Node2vec (with precision of 071) [1]metapaht2vec (with precision of 073) [11] and HAN (withprecision of 076) [42] The performance improvement canbe attributed to the following characteristics First our com-puting framework utilizes weight-learning to learn the signifi-cance of different meta-paths for evaluating the similarity be-tween attackers Second the proposed computing frameworkleverages GCN to learn the structural information betweenattackers to obtain more discriminative structural features thatimproves the performance of attack preference modeling

63 Vulnerability Similarity AnalysisVulnerability classification or clustering is crucial for conduct-ing vulnerability trend analysis the correlation analysis ofincidents and exploits and the evaluation of countermeasuresThe traditional vulnerability analysis relies on the manualinvestigation of the source codes which requires expert ex-pertise and consumes considerable efforts In this sectionwe propose an unsupervised vulnerability similarity analy-sis method based on the proposed threat intelligence com-puting framework which can automatically group similarvulnerabilities into corresponding communities Particularlythe vulnerability-related IOCs are first embedded into a low-dimensional vector space using CTI computing frameworkThen the DBSCAN algorithm is performed on the embed-ded vector space to cluster vulnerabilities into correspondingcommunities The clustering results are presented in Figure 9

Figure 9 (c) shows all vulnerabilities are clustered into 12clusters using meta-path V DPDTV T (P16) which is very closeto the classification standard (ie 13) recommended by CVEdetails an authoritative database that publishes vulnerabilityinformation By manually analyzing the training samples wefind that HTTP Response Splitting vulnerability does not ap-pear in our dataset Therefore our cluster number (ie 12)is consistent with CVE Details11 To further validate the ef-fectiveness of threat intelligence computing framework forvulnerability clustering we randomly select 100 vulnerabili-ties from each cluster for manual inspection to measure theconsistency of the vulnerability types in each cluster and theresults are presented in Table 7 Obviously the clustering per-formance of cluster 8 (ie File Inclusion) and cluster 10 (ieDirectory Traversal) is remarkably worse than other clustersTo explain the reason we examine our training data and the

11httpswwwcvedetailscom

Table 7 Accuracy of vulnerability clustering

Cluster ID Vulnerability type Accuracy

cluster1 Denial of Service 8012

cluster2 XSS 8353

cluster3 Execute Code 8150

cluster4 Overflow 7650

cluster5 Gain Privilege 9156

cluster6 Bypass Something 7174

cluster7 CSRF 9327

cluster8 File Inclusion 6172

cluster9 Gain Informa 7042

cluster10 Directory Traversal 6949

cluster11 Memory Corruption 8156

cluster12 SQL Injection 8067

average 7851

computing framework We found that the proportion of thesetwo types of vulnerabilities is too small (cluster 8 is 34and cluster 10 is 42) making our computing frameworkvery likely to be under-fit with insufficient data Howeverthe proposed computing framework performs well on mosttypes of vulnerabilities in an unsupervised manner especiallygiven sufficient samples (eg cluster 7 is 176 and cluster 5is 157)

In addition by examining the clustering results we havean observation that the vulnerabilities in the same cluster arelikely to have evolutionary relationships For instance CVE-2018-0802 an office zero-day vulnerability is evolved fromthe CVE-2017-11882 They both include EQNEDT32exe fileused to edit the formula in Office software which allowsremote attackers to execute arbitrary codes by constructinga malformed font name The modeling and computation ofinterdependent relationships among IOCs in HINTI facilitatethe discovery of such intricate connections between vulnera-bilities

In summary HINTI is capable of depicting a more compre-hensive threat landscape and the proposed CTI computingframework has the ability to bring novel security insightstoward different real-world security applications Howeverthere are still numerous opportunities for enhancing thesesecurity applications Specifically for attack preference mod-eling although each individual IP address is treated as anattacker we cannot determine whether it belongs to a realattacker or is disguised by a proxy Fortunately even if thereal attack address cannot be captured understanding the at-tack preferences of these IP proxies which are widely usedin cybercrime is also meaningful for gaining insight into thecyber threats For vulnerability similarity analysis data imbal-

252 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

(a) AV DV T AT (P13) (b) V DV T (P10) (c) V DPDTV T (P16)

Figure 9 Illustration of the vulnerability similarity analysis based on different meta-paths in which vulnerability i can be reducedinto a two-dimensional space (xiyi) and each cluster indicates a particular type of vulnerability

ance issue affects the performance of model and inadequatetraining samples often result in model underfitting as shownin the case of cluster 8 and cluster 10

7 Related Work

Cyber Treat Intelligence An increasing number of securityvendors and researchers start exploring CTI for protectingsystem security and defending against new threat vectors [28]Existing CTI extraction tools such as IBM X-Force12 Threatcrowd13 Openctiio14 AlienVault15 CleanMX16 and Phish-Tank17 use regular expression to synthesize IOC from thedescriptive texts However these methods often producehigh false positive rate by misjudging legitimate entities asIOCs [22]

Recently Balzarotti et al [2] develop a system to extractIOCs from web pages and identify malicious URLs fromJavaScript codes Sabottke et al [31] propose to detect po-tential vulnerability exploits by extracting and analyzing thetweets that contain ldquoCVErdquo keyword Liao et al [22] present atool iACE for automatically extracting IOCs which excelsat processing technology articles Nevertheless iACE identi-fies IOCs from a single article which does not consider therich semantic information from multi-source texts Zhao etal [46] define different ontologies to describe the relationshipbetween entities based on expert knowledge Numerous popu-lar CTI platforms including IODEF [9] STIX [3] TAXII [40]OpenIOC [13] and CyBox [19] focus on extracting and shar-ing CTI Yet none of the existing approaches could uncoverthe interdependent relations among CTIs extracted from multi-source texts let alone quantifying CTIsrsquo relevance and miningvaluable threat intelligence hidden behind the isolated CTIs

12httpsexchangexforceibmcloudcom13httpswwwthreatcrowdorg14httpsdemoopenctiio15 httpsotxalienvaultcom16httplistclean-mxcom17httpswwwphishtankcom

Heterogeneous Information Network Real-world systemsoften contain a large number of interacting multi-typed ob-jects which can naturally be expressed as a heterogeneousinformation network (HIN) HIN as a conceptual graph repre-sentation can effectively fuse information and exploit richersemantics in interacting objects and links [37] HIN has beenapplied to network traffic analysis [38] public social mediadata analysis [45] and large-scale document analysis [41]Recent applications of HIN include mobile malware detec-tion [14] and opioid user identification [12] In this paper forthe first time we use HIN for CTI modelingGraph Convolutional Network Graph convolutional net-works (GCN) [17] has become an effective tool for address-ing the task of machine learning on graphs such as semi-supervised node classification [17] event classification [29]clustering [8] link prediction [27] and recommended sys-tem [44] Given a graph GCN can directly conduct the con-volutional operation on the graph to learn the nonlinear em-bedding of nodes In our work to discern and reveal the inter-active relationships between IOCs we utilize GCN to learnmore discriminative representation from attributes and graphstructure simultaneously which is the premise for threat intel-ligence computing

8 Discussion

Data Availability The proposed framework assumes thatsufficient threat description data can be obtained for generat-ing comprehensive and the latest CTIs Fortunately with thegrowing prosperity of social media an increasing number ofsecurity-related data (eg blogs posts news and open secu-rity databases) can be collected effortlessly To automaticallycollect security-related data we develop a crawler system tocollect threat description data from 73 international securitysources (eg blogs hacker forum posts security bulletinsetc) providing sufficient raw materials for generating cyberthreat intelligenceModel Extensibility In this paper 6 types of IOCs and 9

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 253

types of relationships are modeled in HINTI However ourproposed framework is extensible in which more types ofIOCs and relationships can be enrolled to represent richer andmore comprehensive threat information such as maliciousdomains phishing Emails attack tools their interactions etcHigh-level Semantic Relations In view of the computa-tional complexity of the model our threat intelligence comput-ing method focues on utilizing the meta-paths to quantify thesimilarity between IOCs while ignoring the influence of themeta-graph on it which inevitably misses characterizing somehigh-level semantic information Nevertheless the proposedcomputing framework introduces an attention mechanism tolearn the signification of different meta-paths to character-ize IOCs and their interactive relationships which effectivelycompensates for the performance degradation caused by ig-noring the meta-graphsSecurity Knowledge Reasoning Although our proposedframework exhibits promising results in CTI extraction andmodeling computing how to implement advanced securityknowledge reasoning and prediction is still an open problemeg it remains challenging to predict whether a vulnerabilitycould potentially affect a particular type of devices in thefuture We will investigate this problem in the future

9 Conclusion

This work explores a new direction of threat intelligence com-puting which aims to uncover new knowledge in the relation-ships among different threat vectors We propose HINTI acyber threat intelligence framework to model and quantify theinterdependent relationships among different types of IOCsby leveraging heterogeneous graph convolutional networksWe develop a multi-granular attention mechanism to learnthe importance of different features and model the interde-pendent relationships among IOCs using HIN We proposethe concept of threat intelligence computing and present ageneral intelligence computing framework based on graphconvolutional networks Experimental results demonstratethat the proposed multi-granular attention based IOC extrac-tion method outperforms the existing state-of-the-art methodsThe proposed threat intelligence computing framework caneffectively mine security knowledge hidden in the interdepen-dent relationships among IOCs which enables crucial threatintelligence applications such as threat profiling and rankingattack preference modeling and vulnerability similarity analy-sis We would like to emphasize that the knowledge discoveryamong interdependent CTIs is a new field that calls for acollaborative effort from security experts and data scientists

In future we plan to develop a predicative and reasoningmodel based on HINTI and explore preventative countermea-sures to protect cyber infrastructure from future threats Wealso plan to add more types of IOCs and relations to depicta more comprehensive threat landscape Moreover we willleverage both meta-paths and meta-graphs to characterize the

IOCs and their interactions to further improve the embeddingperformance and to strike a balance between the accuracyand computational complexity of the model We will alsoinvestigate the feasibility of security knowledge predictionbased on HINTI to infer the potential future relationshipsbetween the vulnerabilities and devices

Acknowledgement

We would like to thank our shepherd Tobias Fiebig andthe anonymous reviewers for providing valuable feedbackon our work We also thank Hao Peng and Lichao Sunfor their feedback on the early version of this work Thiswork was supported in part by National Science Founda-tion grants CNS1950171 CNS-1949753 It was also sup-ported by the NSFC for Innovative Research Group ScienceFund Project (62141003) National Key RampD Program China(2018YFB0803 503) the 2018 joint Research Foundationof Ministry of Education China Mobile (MCM20180507)and the Opening Project of Shanghai Trusted Industrial Con-trol Platform (TICPSH202003020-ZC) Any opinions find-ings and conclusions or recommendations expressed in thismaterial do not necessarily reflect the views of any fundingagencies

References

[1] Jure Leskovec Aditya Grover node2vec Scalable fea-ture learning for networks In Acm Sigkdd InternationalConference on Knowledge Discovery Data Mining2016

[2] Marco Balduzzi Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications In WWW 2016

[3] Sean Barnum Standardizing cyber threat intelligenceinformation with the structured threat information ex-pression (stix) Mitre Corporation 111ndash22 2012

[4] Eric W Burger Michael D Goodman Panos Kam-panakis and Kevin A Zhu Taxonomy model for cyberthreat intelligence information exchange technologiesIn Proceedings of the 2014 ACM Workshop on Infor-mation Sharing amp Collaborative Security pages 51ndash602014

[5] Onur Catakoglu Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications The web conference pages 333ndash3432016

[6] Danqi Chen and Christopher D Manning A fast and ac-curate dependency parser using neural networks pages740ndash750 2014

254 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

[7] Qian Chen and Robert A Bridges Automated behav-ioral analysis of malware a case study of wannacry ran-somware In 16th IEEE ICMLA pages 454ndash460 2017

[8] Weilin Chiang Xuanqing Liu Si Si Yang Li SamyBengio and Chojui Hsieh Cluster-gcn An efficient al-gorithm for training deep and large graph convolutionalnetworks knowledge discovery and data mining pages257ndash266 2019

[9] Roman Danyliw Jan Meijer and Yuri Demchenko Theincident object description exchange format Interna-tional Journal of High Performance Computing Appli-cations 50701ndash92 2007

[10] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 2017

[11] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 ACM 2017

[12] Yujie Fan Yiming Zhang Yanfang Ye and Xin Li Au-tomatic opioid user detection from twitter Transductiveensemble built on different meta-graph based similari-ties over heterogeneous information network In IJCAIpages 3357ndash3363 2018

[13] Fireeye Openioc httpswwwfireeyecomblogthreat-research201310openioc-basicshtml Accessed January 20 2020

[14] Shifu Hou Yanfang Ye Yangqiu Song and Melih Ab-dulhayoglu Hindroid An intelligent android malwaredetection system based on structured heterogeneous in-formation network In Proceedings of the 23rd ACMSIGKDD International Conference on Knowledge Dis-covery and Data Mining pages 1507ndash1515 2017

[15] Zhiheng Huang Wei Xu and Kai Yu Bidirectional lstm-crf models for sequence tagging arXiv1508019912015

[16] Michael D Iannacone Shawn Bohn Grant NakamuraJohn Gerth Kelly MT Huffer Robert A Bridges Erik MFerragut and John R Goodall Developing an ontologyfor cyber security knowledge graphs CISR 1512 2015

[17] Thomas Kipf and Max Welling Semi-supervised clas-sification with graph convolutional networks arXivLearning 2016

[18] Thomas N Kipf and Max Welling Semi-supervisedclassification with graph convolutional networks arXivpreprint arXiv160902907 2016

[19] Tero Kokkonen Architecture for the cyber security situ-ational awareness system In Internet of Things SmartSpaces and Next Generation Networks and Systemspages 294ndash302 Springer 2016

[20] Mehmet Necip Kurt Yasin Yılmaz and Xiaodong WangDistributed quickest detection of cyber-attacks in smartgrid IEEE Transactions on Information Forensics andSecurity 13(8)2015ndash2030 2018

[21] John Lafferty Andrew Mccallum and Fernando PereiraConditional random fields Probabilistic models for seg-menting and labeling sequence data ICML pages 282ndash289 2001

[22] Xiaojing Liao Yuan Kan Xiao Feng Wang Li Zhouand Raheem Beyah Acing the ioc game Toward auto-matic discovery and analysis of open-source cyber threatintelligence In ACM Sigsac Conference on ComputerCommunications Security 2016

[23] Rob Mcmillan Open threat intelligencehttpwwwgartnercomdoc2487216definition-threat-intelligence AccessedJanuary 20 2020

[24] Tomas Mikolov Kai Chen Greg Corrado and JeffreyDean Efficient estimation of word representations invector space arXiv preprint arXiv13013781 2013

[25] Sudip Mittal Prajit Kumar Das Varish Mulwad Anu-pam Joshi and Tim Finin Cybertwitter Using twitterto generate alerts for cybersecurity threats and vulnera-bilities In Proceedings of the 2016 IEEE Advances inSocial Networks Analysis and Mining pages 860ndash8672016

[26] Eric Nunes Ahmad Diab Andrew Gunn EricssonMarin Vineet Mishra Vivin Paliath John RobertsonJana Shakarian Amanda Thart and Paulo ShakarianDarknet and deepnet mining for proactive cybersecuritythreat intelligence In 2016 IEEE ISI pages 7ndash12 2016

[27] Shirui Pan Ruiqi Hu Guodong Long Jing Jiang LinaYao and Chengqi Zhang Adversarially regularizedgraph autoencoder for graph embedding pages 2609ndash2615 2018

[28] P Pawlinski P Jaroszewski P Kijewski L SiewierskiP Jacewicz P Zielony and R Zuber Actionable infor-mation for security incident response European UnionAgency for Network and Information Security Herak-lion Greece 2014

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 255

[29] Hao Peng Jianxin Li Qiran Gong Yangqiu SongYuanxing Ning Kunfeng Lai and Philip S Yu Fine-grained event categorization with heterogeneous graphconvolutional networks In Proceedings of the 28th In-ternational Joint Conference on Artificial Intelligencepages 3238ndash3245 AAAI Press 2019

[30] Sara Qamar Zahid Anwar Mohammad Ashiqur Rah-man Ehab Al-Shaer and Bei-Tseng Chu Data-drivenanalytics for cyber-threat intelligence and informationsharing Computers amp Security 6735ndash58 2017

[31] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In USENIXSecurity 2015

[32] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In 24thUSENIX Security pages 1041ndash1056 2015

[33] Deepak Sharma and Avadhesha Surolia Degree Cen-trality Springer New York 2013

[34] Saurabh Singh Pradip Kumar Sharma Seo Yeon MoonDaesung Moon and Jong Hyuk Park A comprehensivestudy on apt attacks and countermeasures for futurenetworks and communications challenges and solutionsJournal of Supercomputing pages 1ndash32 2016

[35] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks Principles and methodologiesSynthesis Lectures on Data Mining and Knowledge Dis-covery 3(2)1ndash159 2012

[36] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks a structural analysis approachACM SIGKDD 14(2)20ndash28 2013

[37] Yizhou Sun Jiawei Han Xifeng Yan Philip S Yu andTianyi Wu Pathsim Meta path-based top-k similaritysearch in heterogeneous information networks Proceed-ings of the VLDB Endowment 4(11)992ndash1003 2011

[38] Yizhou Sun Brandon Norick Jiawei Han Xifeng YanPhilip S Yu and Xiao Yu Pathselclus Integrating meta-path selection with user-guided object clustering in het-erogeneous information networks ACM Transactionson TKDD 7(3)11 2013

[39] Wiem Tounsi and Helmi Rais A survey on technicalthreat intelligence in the age of sophisticated cyber at-tacks Computers amp Security 72212ndash233 2018

[40] Thomas D Wagner Esther Palomar Khaled Mahbuband Ali E Abdallah Towards an anonymity supportedplatform for shared cyber threat intelligence risks andsecurity of internet and systems pages 175ndash183 2017

[41] Chenguang Wang Yangqiu Song Haoran Li MingZhang and Jiawei Han Text classification with het-erogeneous information network kernels In 13th AAAI2016

[42] Xiao Wang Houye Ji Chuan Shi Bai Wang Peng CuiP Yu and Yanfang Ye Heterogeneous graph attentionnetwork 2019

[43] Jie Yang Shuailong Liang and Yue Zhang Design chal-lenges and misconceptions in neural sequence labelingarXiv Computation and Language 2018

[44] Rex Ying Ruining He Kaifeng Chen Pong Eksombat-chai William L Hamilton and Jure Leskovec Graphconvolutional neural networks for web-scale recom-mender systems In Proceedings of the 24th ACMSIGKDD International Conference on Knowledge Dis-covery amp Data Mining pages 974ndash983 ACM 2018

[45] Jiawei Zhang Xiangnan Kong and Philip S Yu Trans-ferring heterogeneous links across location-based socialnetworks In The 7th ACM international conference onWeb search and data mining pages 303ndash312 2014

[46] Yishuai Zhao Bo Lang and Ming Liu Ontology-basedunified model for heterogeneous threat intelligence inte-gration and sharing In 2017 11th IEEE InternationalConference on Anti-counterfeiting Security and Identi-fication (ASID) pages 11ndash15 2017

256 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

  • Introduction
  • Background
    • Cyber Threat Intelligence
    • Motivation
    • Preliminaries
      • Architecture Overview of HINTI
      • Methodology
        • Multi-granular Attention Based IOC Extraction
        • Cyber Threat Intelligence Modeling
        • Threat Intelligence Computing
          • Experimental Evaluation
            • Dataset and Settings
            • Evaluation of IOC Extraction
              • Application of Threat Intelligence Computing
                • Threat Profiling and Significance Ranking of IOCs
                • Attack Preference Modeling
                • Vulnerability Similarity Analysis
                  • Related Work
                  • Discussion
                  • Conclusion
Page 7: Cyber Threat Intelligence Modeling Based on Heterogeneous ... · 2.1Cyber Threat Intelligence Cyber Threat Intelligence (CTI) extracted from security-related data is structured information

system7 To explore the relation of vulnerability-attacktype we build the vulnerability-belong-attack type ma-trix G where each element Gi j isin 01 denotes if vul-nerability i belongs to an attack type j

bull R6 A vulnerability often involves one or more maliciousfiles To describe the relation of vulnerability-file webuild the vulnerability-include-file matrix F For eachelement Fi j isin 01 Fi j=1 denotes that vulnerability iincludes malicious file j

bull R7 A malicious file often targets a specific device Weestablish the file-target-device matrix T to explore therelation of file-device For each element Ti j isin 01Ti j=1 indicates malicious file i targets device j

bull R8 Oftentimes a vulnerability evolves from anotherTo study the relationship of vulnerability-vulnerabilitywe build the vulnerability-evolve-vulnerability matrixE where each element Ei j isin 01 indicates if vulnera-bility i evolves from vulnerability j

bull R9 To depict the relation device-platform that a de-vice belongs to a platform we build the device-belong-platform matrix P where each element Pi j isin 01 il-lustrates if device i belongs to platform j

Based on the above 9 types of relationships HINTIleverages the syntactic dependency parser [6] (eg subject-predicate-object attributive clause etc) to automatically ex-tract the 9 relationships among IOCs from threat descriptionseach of which is represented as a triple (IOCirelation IOC j)For instance given a security-related description ldquoOn May12 2017 WannaCry exploited the MS17-010 vulnerabilityto affect a larger number of Windows devices which is aransomware attack via encrypted disks Using the syntacticdependency parser we can extract the following triples (Wan-naCry exploit MS17-010) (MS17-010 affect Windows de-vice) (WannaCry is ransomware) Such triples extractedfrom various data sources can be incrementally assembledinto HINTI to model the relationships among IOCs whichcould offer a more comprehensive threat landscape that de-scribes the threat context Particularly we further define 17types of meta-paths shown in Table 1 to probe into the interde-pendent relationships over attackers vulnerabilities maliciousfiles attack types devices and platforms HINTI is able toconvey a richer context of threat events by scrutinizing 17types of meta-paths and reveal the in-depth threat insightsbehind the heterogeneous IOCs (see Section 6 for details)

43 Threat Intelligence ComputingIn this section we illustrate the concept of threat intelligencecomputing and design a general threat intelligence computing

7httpcvemitreorg

framework based on heterogeneous graph convolutional net-works which quantifies and measures the relevance betweenIOCs by analyzing meta-path based semantic similarity Herewe first provide a formal definition of threat intelligence com-puting based on heterogeneous graph convolutional networks

Definition 4 Threat Intelligence Computing Based onHeterogeneous Graph Convolutional Networks Given thethreat intelligence graph G = (VE) the meta-path set M =P1P2 middot middot middot Pi The threat intelligence computing i) com-putes the similarity between IOCs based on meta-path Pi togenerate corresponding adjacency matrix Ai ii) constructsthe feature matrix of nodes Xi by embedding attribute in-formation of IOCs into a latent vector space iii) conductsgraph convolution GCN(AiXi) to quantify the interdependentrelationships between IOCs by following meta-path Pi andembeds them into a low-dimensional space

The threat intelligence computing aims to model the seman-tic relationships between IOCs and measure their similaritybased on meta-paths which can be used for advanced secu-rity knowledge discovery such as threat object classificationthreat type matching threat evolution analysis etc Intuitivelythe objects connected by the most significant meta-paths tendto bear more similarity [37] In this paper we propose aweight-learning based threat intelligence similarity measurewhich uses self-attention to improve the performance of simi-larity measurement between any two IOCs This method canbe formalized as below

Definition 5 Weight-learning based Node Similar-ity Measure Given a set of symmetric meta-path setP = [Pm]

Mprime

m=1 the similarity S(hih j) between any two IOCshi and h j is defined as

S(hih j) =Mprime

summ

rarrw

2middot | hirarr j isin Pm || hirarri isin Pm |+ | h jrarr j isin Pm |

(7)

where hirarr j isin hm is a path instance between IOC hi andh j following meta-path Pm hirarri isin Pm is that between IOCinstance hi and hi and h jrarr j isin Pm is that between IOC in-stance h j and h j where | hirarr j isin Pm |=CPm(i j) | hirarri isinPm |=CPm(i i) | h jrarr j isin Pm |=CPm( j j) and CPm is thecommuting matrix based on meta-path Pm defined below

rarrw =

[w1 wm wMprime ] denote the meta-path weights wherewm is the weight of meta-paths Pm and M

primeis the number of

meta-pathsS(hih j) is defined in two parts (1) the semantic overlap

in the numerator which describes the number of meta-pathbetween IOC instance hi and h j (2) and the semantic broad-ness in the denominator which depicts the number of totalmeta-paths between themselves The larger number of meta-path between IOC instance hi and h j the more similar thetwo IOCs are which is normalized by the semantic broadness

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 247

of denominator Moreover different from existing similaritymeasures [37] we propose an attention mechanism basedsimilarity measure method by introducing the weight vec-torrarrw = [w1 wm wMprime ] which is a trainable coefficient

vector to learn the importance of different meta-paths forcharacterizing IOCs

Obviously it is computationally expensive to measure thesimilarity among IOCs in the constructed heterogeneousgraph as it usually requires to randomly walk a larger numberof nodes in the graph Fortunately in our work it is unneces-sary to walk through the entire graph as we prescribe a limitby introducing predefined meta-paths and we only focus onthe symmetrical meta-paths presented in Table 1 To calcu-late the similarity between IOCs under different meta-pathinstances we need to compute the corresponding commutingmatrices [37] following the meta-paths

Given a meta-path set P = sumMprime

m A1A2 middot middot middot Al+1 themeta-path based commuting matrix can be defined as CP =UA1A2 UA2A3 middot middot middot UAAl+1 where CP(i j) represents the prob-ability of object iisinA1 reaching object j isinAl+1 under the pathP and is a connection operation These symmetric meta-paths not only efficiently reduce the complexity of walkingbut also ensures that the commuting matrix can be easily de-composed which greatly reduces the computational costs Inaddition the symmetric meta-paths in the graph G allow us toleverage the pairwise random-walk [37] to further acceleratethe computation

With Eq (7) and pairwise random-walk we can obtain thesimilarity embedding between any two IOCs hi and h j undera meta-path set P Based on the low-dimensional similarityembedding we derive a weighted adjacent matrix betweenIOCs denoted as Ai isin RNtimesN where N is the number of aspecific type of IOC in G Meanwhile to utilize the attributedinformation of nodes we train a word2vec model [24] toembed the attribute information of nodes into a feature matrixXi isin RNtimesd where N is the number of IOCs in Ai and d is thedimension of node feature With the learned adjacency matrixAi and its feature matrix Xi we can leverage the classicalGCN [18] to characterize the relationship between IOC hiand h j Particularly the layer-wise propagation rule of GCNcan be defined as below

H(l+1) = σ(Dminus12 ADminus

12 H(l)W (l)) (8)

where A = A+ IN is the adjacency matrix of IOCs with self-connections IN is the identity matrix Dii =sum j Ai j and W (l) isa l-th layer trainable weight matrix σ(middot) denotes an activationfunction such as relu H(l) isin RNtimesd is the matrix of activationin the l-th layer We perform graph convolution [18] on Aiand Xi to generate the embedding Z between IOCs belongingto type i

Z = f (XiAi) = σ(Ai middot relu(AiXiW(0)i )W (1)

i ) (9)

where W (0)i isin RdtimesH is an input-to-hidden weight matrix for a

hidden layer with H feature mapsW (1)i isinRHtimesF is a hidden-to-

output weight matrix with F feature maps in the output layerXi isin RNtimesd N is the number of a specific type of IOCs d is thedimension of their corresponding features and σ is anotheractivation function such as sigmoid Ai = Dminus

12 AiDminus

12 can be

calculated offline Here we leverage the cross-entropy loss tooptimize the performance of our proposed threat intelligenceframework written as follows

Loss(Yl f Zl f ) =minussumlisinYl

F

sumf=1

Yl f middot lnZl f (10)

where Yl is the set of node indices that have labels Yl f is thereal label and Zl f is a corresponding label that our modelpredicts Based on Eq (10) we conduct stochastic gradientdescent to continuously optimize the neural network weightsW (0)

i W (1)i and

rarrw to reduce the loss and build a general

threat intelligence computing framework Using this frame-work security organizations are able to mine richer securityknowledge hidden in the interdependent relationships amongIOCs

5 Experimental Evaluation

51 Dataset and SettingsWe develop a threat data collector to automatically collectcyber threat data from a set of sources including 73 inter-national security blogs (eg fireeye cloudflare) hacker fo-rum posts (eg Blackhat Hack5) security bulletins (egMicrosoft Cisco) CVE detail description and ExploitDB Acomplete list of data sources is presented in the Baidu cloud8We set up a daemon to collect the newly generated securityevents every day So far more than 245786 security-relateddata describing threat events have been collected For trainingand evaluating our proposed IOC extraction method 30000samples from 5000 texts are annotated by utilizing the B-I-Osequence tagging method (see Section 22 for the example)and an annotation example is shown in Figure 2

For 30000 labeled samples we randomly select 60 ofsamples as a training set 20 of samples as a verificationset and the rest of the samples as our test set Based on thedata sets we comprehensively evaluate the performance ofHINTI for extracting IOCs and threat intelligence computingWe run all of the experiments on 16 cores Intel(R) Core(TM)i7-6700 CPU 340GHz with 64GB RAM and 4times NVIDIATesla K80 GPU The software programs are executed on theTensorFlow-GPU framework on Ubuntu 1604

52 Evaluation of IOC ExtractionA set of experiments are conducted to evaluate the sensitiv-ity of different parameters in the multi-granular based IOC

8httpspanbaiducoms1J631WMYY_T_awa8aY5xy3A

248 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

extraction model We mainly consider 8 hyper-parametersthat seriously impact the performance of the model as shownin Table 2 More specifically Embedding_dim is one ofthe most important factors that determine the generaliza-tion capability of the model Here we fix other parame-ters while fine-tuning the embedding size in the range of(50100150200250300350400) Experimental resultsshow that the accuracy of extracted IOC achieves the bestwhen Embedding_dim=300 Learning_rate is another ma-jor factor for determining the stride of gradient descent inminimizing the loss function which determines whetherthe model can find a global optimal solution We fix otherparameters to fine-tune the Learning_rate in the rangeof (000100050010050105) and the performancereaches the best when the Learning_rate=0001 Similarlywe fine-tune the other hyper-parameters with 5000 epochsand the hyper-parameters allowing our model to perform op-timally are recorded in Table 2

Table 2 Hyperparameters setting in the multi-granular basedIOC extraction method

Parameter value Parameter Value

Embedding_dim 300 Hidden_dim 128

Sequence_length 500 Epoch_num 5000

Learning_rate 0001 Batch_size 64

Dropout_rate 05 Optimizer Adam

Table 3 Performance of IOC extraction wrt IOC types

IOC Type Precision Recall Micro-F1

IP 9956 9952 9954File 9436 9688 9560Type 9986 9981 9983

Email 9932 9987 9949Device 9326 9278 9302Vender 9307 9445 9424Version 9698 9799 9748Domain 9658 9589 9623Software 8825 8931 8878Function 9503 9559 9531Platform 9431 9257 9343Malware 8976 9123 9049

Vulnerability 9925 9873 9899Other 9829 9842 9835

In this paper we extract 13 major types of IOCs and the

Figure 7 Performance of IOC extraction using embeddingfeatures with different granularity

performance is presented in Table 3 Overall our IOC ex-traction method demonstrates excellent performance in termsof precision recall and Micro-F1 (ie micro-averaged F1-score) for most types of IOCs such as function maliciousIP and device However we observe a performance degra-dation when recognizing software and malware This can beattributed to the fact that most software and malware is namedby random strings such as md5 hash Moreover we find thatthe number of training samples impacts the performance ofthe model Specifically the performance becomes unsatisfac-tory (eg Software Malware) when the number of a certaintype of training samples is insufficient (ie less than 5000)

In order to verify the effectiveness of multi-granular embed-ding features we assess the performance of IOC extractionwith features of different granularity including char-level1-gram 2-gram 3-gram and multi-granular features The ex-perimental results are demonstrated in Figure 7 from whichwe can observe that the proposed multi-granular embeddingfeature outperforms others since it leverages the attentionmechanism to simultaneously learn multi-granular features tocharacterize different patterns of IOCs

Next to verify the effectiveness of the proposed IOC ex-traction method we compare it with the state-of-the-art en-tity recognition approaches including general NER toolsNLTK NER9 and Stanford NER10 professional IOC extrac-tion method Stucco [16] and iACE [22] and popular en-tity recognition approaches CRF [21] BiLSTM and BiL-STM+CRF [15] The experimental results of different meth-ods on real-world data are demonstrated in Table 4

The results indicate that our proposed IOC extraction out-performs the state-of-the-art entity recognition methods andtools in terms of precision recall and Micro-F1 and its im-provement can be attributed to the following factors Firstcompared with Standford NER and NLTK NER the NLP tools

9httpwwwnltkorgbookch07html10httpsstanfordnlpgithubioCoreNLPnerhtml

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 249

Table 4 Performance of threat entity recognition usingdifferent methods

Method Accuracy Precision Micro-F1

NLTK NER 6945 6851 6749

Stanford NER 6835 6674 6858

iACE 9214 9126 9225

Stucco 9116 9221 9147

CRF 9264 9180 9265

BiLSTM 9478 9521 9435

BiLSTM+CRF 9638 9642 9627

Multi-granular 9859 9872 9869

trained with general news corpora our method uses a security-related training corpus collected and labeled by ourselves asa data source for training our model Second different fromthe rule-based extraction approaches (eg iACE and Stucco)our proposed deep learning based method provides an end-to-end system with more advanced features to represent variousIOCs Third comparing to RNN-based methods (eg BiL-STM and BiLSTM+CRF) our proposed method brings inmulti-granular embedding sizes (char-level 1-gram 2-gramand 3-gram) to simultaneously learn the characteristics of vari-ous sizes and types of IOCs which can identify more complexand irregular IOCs Last but not the least our method imple-ments an attention mechanism to learn the weights of featureswith various scales to effectively characterize different typesof IOCs further enhancing the IOC recognition accuracy

6 Application of Threat Intelligence Comput-ing

Our proposed threat intelligence computing framework basedon heterogeneous graph convolutional networks can be usedto mine novel security knowledge behind heterogeneous IOCsIn this section we evaluate its effectiveness and applicabilityusing three real-world applications profiling and ranking forCTIs attack preference modeling and vulnerability similarityanalysis

61 Threat Profiling and Significance Rankingof IOCs

Due to the disparity in the significance of threats it is im-portant to derive the threat profile and rank the significanceof IOCs for demystifying the landscape of threats Howevermost of the existing CTIs are incapable of modeling the asso-ciated relationships between heterogeneous IOCs

Different from isolated CTIs HINTI leverages HIN to

model the interdependent relationships among IOCs withtwo characteristics first the isolated IOCs can be integratedinto a graph-based HIN to clearly display the associated rela-tionships among IOCs which is capable of directly depictingthe basic threat profile For example Figure 3 depicts a threatprofiling sample an attacker utilizes CVE-2017-0143 vulnera-bility to invade Vista SP2 and Win7 SP1 devices belonging tothe Microsoft platform and CVE-2017-0143 is a remote codeexecution vulnerability that uses a ldquoSMBbat malicious fileSecond the significance of IOCs in HINTI can be naturallyranked based on the proposed threat intelligence computingframework

Table 5 shows the top 5 authoritative ranking score [35] ofvulnerability attacker attack type and platform from whichsecurity experts can gain a clear insight into the impact ofeach IOC Degree centrality [33] which describes the numberof links incident upon a node is widely used in evaluatingthe importance of a node in a graph It can used to quantifythe immediate risk of a node that connects with other nodesfor delivering network flows such as virus spreading Heredegree centrality can be utilized in verifying the effectivenessof the proposed threat intelligence computing framework inranking the importance of IOCs It is worth noting that bothour ranking method and degree centrality work regardless ofthe time of attacks We compute the degree centrality rank-ing of IOCs based on the fact that the node with a higherdegree centrality is more important than a node with a lowerone For instance if the degree centrality of a vulnerabilityis higher it indicates that this vulnerability is exploited bymore attackers or it affects more devices The ranking resultof degree centrality shown in Table 5 is consistent with theranking result based on the proposed threat intelligence com-puting framework demonstrating the capability of the CTIcomputing framework in ranking the importance of differenttypes of IOCs

62 Attack Preference Modeling

Attack preference modeling is meaningful for security organi-zations to gain insight into the attack intention of attackersbuild attack portraits and develop personalized defense strate-gies Here we leverage HINTI to integrate different types ofIOCs and their interdependent relationships to comprehen-sively depict the picture of attack events which helps modelthe attack preferences With the proposed threat intelligencecomputing framework we model attack preferences by clus-tering the embedded attackersrsquo vectors

In this task each malicious IP address is treated as an in-truder and its attack preferences are mainly reflected in threefeatures including the platforms it destroys (including Win-dows Linux Unix ASP Android Apache etc) the industriesit invades (eg education finance government Internet ofThings and Industrial control system etc) and the exploittypes it employs (eg DOS Buffer overflow Execute code

250 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

Table 5 The significance ranking of different types of IOCs (CV E1 CVE-2017-0146 CV E2 CVE-2006-5911 CV E3 CVE-2008-6543 CV E4 CVE-2012-1199 CV E4 CVE-2006-4985 AR Authoritative Ranking DC Degree Centrality value)

Vulnerability Attacker Platform Attack Type

No AR DC Monicker AR DC Category AR DC Exploit_type AR DC

CV E1 02713 7643 Meatsploit 02764 549 PHP 04562 17865 Webapps 05494 11648

CV E2 02431 7124 GSR team 01391 327 windows 02242 13793 DOS 01772 8741

CV E3 02132 6833 Ihsan 00698 279 Linux 00736 8792 Overflow 01533 7652

CV E4 01826 6145 Techsa 00695 247 Linux86 00623 8147 CSRF 00966 5433

CV E5 01739 5637 Aurimma 00622 204 ASP 00382 5027 SQL 00251 2171

(a) AV DV T AT (P13) (b) AV DPDTV T AT (P17) (c) AV PV T AT (P14)

Figure 8 The performance of attack preference modeling with different meta-paths in which the preference of attacker i isreduced to a two-dimensional space (xiyi) and each cluster represents a group with a specific attack preference

Sql injection XSS Gain information etc)Specifically we first utilize our proposed threat intelligence

computing framework to embed each attacker into a low-dimensional vector space and then perform DBSCAN algo-rithm on the embedded vector to cluster attackers with thesame preferences into corresponding groups Figure 8 showsthe top 3 clustering results under different types of meta-paths in which the meta-path AV DPDTV T AT (P17) performsthe best performance with compact and well-separated clus-ters indicating that it contains richer semantic relationshipsfor characterizing attack preferences than other meta-paths

To verify the effectiveness of attack preference modelingwe identify 5297 distinct attackers (each unique IP address istreated as an attacker) who have submitted at least 10 cyberattacks For these attackers five cybersecurity researchersconsisting of three doctoral and two master students spentabout fortnight to manually annotate their attack preferencesfrom three perspectives the platforms they destroyed the in-dustries they attacked and the attack types they exploited Toensure the accuracy of data labeling we test the consistencyof the tags for the 5297 attackers and remove the sampleswith ambiguous tags As a result we obtain 3000 sampleswith consistent tags Based on the labeled samples we furtherevaluate the performance of different meta-paths on model-

Table 6 Performance of modeling attack preference withdifferent meta-paths

Metapath Accuracy Precision Micro-F1

P1 7431 7622 7525

P4 7116 7327 7216

P5 6915 7143 7027

P12 7214 7646 7424

P13 7965 8131 8047

P14 7748 7934 7840

P15 8017 7976 7996

P17 8139 8172 8155

ing attack preferences In the attack modeling scenario weonly focus on the meta-paths that both the start node and theend node are attackers The experimental results are demon-strated in Table 6 Obviously different meta-paths displaydifferent abilities in characterizing the attack preferences ofcyber intruders The performance when using P17 is more

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 251

superior than the one with other meta-paths which indicatesthat P17 holds more valuable information that characterizesthe attack preferences of cybercriminals since P17 includesthe semantics information of P1P4P5 and P12 sim P15

In addition we compare the capabilities of our proposedcomputing framework with those of other state-of-the-art em-bedding methods in terms of attack preference modelingOur analysis result shows that the accuracy of attack pref-erence modeling reaches 081 which outperforms the exist-ing popular models Node2vec (with precision of 071) [1]metapaht2vec (with precision of 073) [11] and HAN (withprecision of 076) [42] The performance improvement canbe attributed to the following characteristics First our com-puting framework utilizes weight-learning to learn the signifi-cance of different meta-paths for evaluating the similarity be-tween attackers Second the proposed computing frameworkleverages GCN to learn the structural information betweenattackers to obtain more discriminative structural features thatimproves the performance of attack preference modeling

63 Vulnerability Similarity AnalysisVulnerability classification or clustering is crucial for conduct-ing vulnerability trend analysis the correlation analysis ofincidents and exploits and the evaluation of countermeasuresThe traditional vulnerability analysis relies on the manualinvestigation of the source codes which requires expert ex-pertise and consumes considerable efforts In this sectionwe propose an unsupervised vulnerability similarity analy-sis method based on the proposed threat intelligence com-puting framework which can automatically group similarvulnerabilities into corresponding communities Particularlythe vulnerability-related IOCs are first embedded into a low-dimensional vector space using CTI computing frameworkThen the DBSCAN algorithm is performed on the embed-ded vector space to cluster vulnerabilities into correspondingcommunities The clustering results are presented in Figure 9

Figure 9 (c) shows all vulnerabilities are clustered into 12clusters using meta-path V DPDTV T (P16) which is very closeto the classification standard (ie 13) recommended by CVEdetails an authoritative database that publishes vulnerabilityinformation By manually analyzing the training samples wefind that HTTP Response Splitting vulnerability does not ap-pear in our dataset Therefore our cluster number (ie 12)is consistent with CVE Details11 To further validate the ef-fectiveness of threat intelligence computing framework forvulnerability clustering we randomly select 100 vulnerabili-ties from each cluster for manual inspection to measure theconsistency of the vulnerability types in each cluster and theresults are presented in Table 7 Obviously the clustering per-formance of cluster 8 (ie File Inclusion) and cluster 10 (ieDirectory Traversal) is remarkably worse than other clustersTo explain the reason we examine our training data and the

11httpswwwcvedetailscom

Table 7 Accuracy of vulnerability clustering

Cluster ID Vulnerability type Accuracy

cluster1 Denial of Service 8012

cluster2 XSS 8353

cluster3 Execute Code 8150

cluster4 Overflow 7650

cluster5 Gain Privilege 9156

cluster6 Bypass Something 7174

cluster7 CSRF 9327

cluster8 File Inclusion 6172

cluster9 Gain Informa 7042

cluster10 Directory Traversal 6949

cluster11 Memory Corruption 8156

cluster12 SQL Injection 8067

average 7851

computing framework We found that the proportion of thesetwo types of vulnerabilities is too small (cluster 8 is 34and cluster 10 is 42) making our computing frameworkvery likely to be under-fit with insufficient data Howeverthe proposed computing framework performs well on mosttypes of vulnerabilities in an unsupervised manner especiallygiven sufficient samples (eg cluster 7 is 176 and cluster 5is 157)

In addition by examining the clustering results we havean observation that the vulnerabilities in the same cluster arelikely to have evolutionary relationships For instance CVE-2018-0802 an office zero-day vulnerability is evolved fromthe CVE-2017-11882 They both include EQNEDT32exe fileused to edit the formula in Office software which allowsremote attackers to execute arbitrary codes by constructinga malformed font name The modeling and computation ofinterdependent relationships among IOCs in HINTI facilitatethe discovery of such intricate connections between vulnera-bilities

In summary HINTI is capable of depicting a more compre-hensive threat landscape and the proposed CTI computingframework has the ability to bring novel security insightstoward different real-world security applications Howeverthere are still numerous opportunities for enhancing thesesecurity applications Specifically for attack preference mod-eling although each individual IP address is treated as anattacker we cannot determine whether it belongs to a realattacker or is disguised by a proxy Fortunately even if thereal attack address cannot be captured understanding the at-tack preferences of these IP proxies which are widely usedin cybercrime is also meaningful for gaining insight into thecyber threats For vulnerability similarity analysis data imbal-

252 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

(a) AV DV T AT (P13) (b) V DV T (P10) (c) V DPDTV T (P16)

Figure 9 Illustration of the vulnerability similarity analysis based on different meta-paths in which vulnerability i can be reducedinto a two-dimensional space (xiyi) and each cluster indicates a particular type of vulnerability

ance issue affects the performance of model and inadequatetraining samples often result in model underfitting as shownin the case of cluster 8 and cluster 10

7 Related Work

Cyber Treat Intelligence An increasing number of securityvendors and researchers start exploring CTI for protectingsystem security and defending against new threat vectors [28]Existing CTI extraction tools such as IBM X-Force12 Threatcrowd13 Openctiio14 AlienVault15 CleanMX16 and Phish-Tank17 use regular expression to synthesize IOC from thedescriptive texts However these methods often producehigh false positive rate by misjudging legitimate entities asIOCs [22]

Recently Balzarotti et al [2] develop a system to extractIOCs from web pages and identify malicious URLs fromJavaScript codes Sabottke et al [31] propose to detect po-tential vulnerability exploits by extracting and analyzing thetweets that contain ldquoCVErdquo keyword Liao et al [22] present atool iACE for automatically extracting IOCs which excelsat processing technology articles Nevertheless iACE identi-fies IOCs from a single article which does not consider therich semantic information from multi-source texts Zhao etal [46] define different ontologies to describe the relationshipbetween entities based on expert knowledge Numerous popu-lar CTI platforms including IODEF [9] STIX [3] TAXII [40]OpenIOC [13] and CyBox [19] focus on extracting and shar-ing CTI Yet none of the existing approaches could uncoverthe interdependent relations among CTIs extracted from multi-source texts let alone quantifying CTIsrsquo relevance and miningvaluable threat intelligence hidden behind the isolated CTIs

12httpsexchangexforceibmcloudcom13httpswwwthreatcrowdorg14httpsdemoopenctiio15 httpsotxalienvaultcom16httplistclean-mxcom17httpswwwphishtankcom

Heterogeneous Information Network Real-world systemsoften contain a large number of interacting multi-typed ob-jects which can naturally be expressed as a heterogeneousinformation network (HIN) HIN as a conceptual graph repre-sentation can effectively fuse information and exploit richersemantics in interacting objects and links [37] HIN has beenapplied to network traffic analysis [38] public social mediadata analysis [45] and large-scale document analysis [41]Recent applications of HIN include mobile malware detec-tion [14] and opioid user identification [12] In this paper forthe first time we use HIN for CTI modelingGraph Convolutional Network Graph convolutional net-works (GCN) [17] has become an effective tool for address-ing the task of machine learning on graphs such as semi-supervised node classification [17] event classification [29]clustering [8] link prediction [27] and recommended sys-tem [44] Given a graph GCN can directly conduct the con-volutional operation on the graph to learn the nonlinear em-bedding of nodes In our work to discern and reveal the inter-active relationships between IOCs we utilize GCN to learnmore discriminative representation from attributes and graphstructure simultaneously which is the premise for threat intel-ligence computing

8 Discussion

Data Availability The proposed framework assumes thatsufficient threat description data can be obtained for generat-ing comprehensive and the latest CTIs Fortunately with thegrowing prosperity of social media an increasing number ofsecurity-related data (eg blogs posts news and open secu-rity databases) can be collected effortlessly To automaticallycollect security-related data we develop a crawler system tocollect threat description data from 73 international securitysources (eg blogs hacker forum posts security bulletinsetc) providing sufficient raw materials for generating cyberthreat intelligenceModel Extensibility In this paper 6 types of IOCs and 9

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 253

types of relationships are modeled in HINTI However ourproposed framework is extensible in which more types ofIOCs and relationships can be enrolled to represent richer andmore comprehensive threat information such as maliciousdomains phishing Emails attack tools their interactions etcHigh-level Semantic Relations In view of the computa-tional complexity of the model our threat intelligence comput-ing method focues on utilizing the meta-paths to quantify thesimilarity between IOCs while ignoring the influence of themeta-graph on it which inevitably misses characterizing somehigh-level semantic information Nevertheless the proposedcomputing framework introduces an attention mechanism tolearn the signification of different meta-paths to character-ize IOCs and their interactive relationships which effectivelycompensates for the performance degradation caused by ig-noring the meta-graphsSecurity Knowledge Reasoning Although our proposedframework exhibits promising results in CTI extraction andmodeling computing how to implement advanced securityknowledge reasoning and prediction is still an open problemeg it remains challenging to predict whether a vulnerabilitycould potentially affect a particular type of devices in thefuture We will investigate this problem in the future

9 Conclusion

This work explores a new direction of threat intelligence com-puting which aims to uncover new knowledge in the relation-ships among different threat vectors We propose HINTI acyber threat intelligence framework to model and quantify theinterdependent relationships among different types of IOCsby leveraging heterogeneous graph convolutional networksWe develop a multi-granular attention mechanism to learnthe importance of different features and model the interde-pendent relationships among IOCs using HIN We proposethe concept of threat intelligence computing and present ageneral intelligence computing framework based on graphconvolutional networks Experimental results demonstratethat the proposed multi-granular attention based IOC extrac-tion method outperforms the existing state-of-the-art methodsThe proposed threat intelligence computing framework caneffectively mine security knowledge hidden in the interdepen-dent relationships among IOCs which enables crucial threatintelligence applications such as threat profiling and rankingattack preference modeling and vulnerability similarity analy-sis We would like to emphasize that the knowledge discoveryamong interdependent CTIs is a new field that calls for acollaborative effort from security experts and data scientists

In future we plan to develop a predicative and reasoningmodel based on HINTI and explore preventative countermea-sures to protect cyber infrastructure from future threats Wealso plan to add more types of IOCs and relations to depicta more comprehensive threat landscape Moreover we willleverage both meta-paths and meta-graphs to characterize the

IOCs and their interactions to further improve the embeddingperformance and to strike a balance between the accuracyand computational complexity of the model We will alsoinvestigate the feasibility of security knowledge predictionbased on HINTI to infer the potential future relationshipsbetween the vulnerabilities and devices

Acknowledgement

We would like to thank our shepherd Tobias Fiebig andthe anonymous reviewers for providing valuable feedbackon our work We also thank Hao Peng and Lichao Sunfor their feedback on the early version of this work Thiswork was supported in part by National Science Founda-tion grants CNS1950171 CNS-1949753 It was also sup-ported by the NSFC for Innovative Research Group ScienceFund Project (62141003) National Key RampD Program China(2018YFB0803 503) the 2018 joint Research Foundationof Ministry of Education China Mobile (MCM20180507)and the Opening Project of Shanghai Trusted Industrial Con-trol Platform (TICPSH202003020-ZC) Any opinions find-ings and conclusions or recommendations expressed in thismaterial do not necessarily reflect the views of any fundingagencies

References

[1] Jure Leskovec Aditya Grover node2vec Scalable fea-ture learning for networks In Acm Sigkdd InternationalConference on Knowledge Discovery Data Mining2016

[2] Marco Balduzzi Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications In WWW 2016

[3] Sean Barnum Standardizing cyber threat intelligenceinformation with the structured threat information ex-pression (stix) Mitre Corporation 111ndash22 2012

[4] Eric W Burger Michael D Goodman Panos Kam-panakis and Kevin A Zhu Taxonomy model for cyberthreat intelligence information exchange technologiesIn Proceedings of the 2014 ACM Workshop on Infor-mation Sharing amp Collaborative Security pages 51ndash602014

[5] Onur Catakoglu Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications The web conference pages 333ndash3432016

[6] Danqi Chen and Christopher D Manning A fast and ac-curate dependency parser using neural networks pages740ndash750 2014

254 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

[7] Qian Chen and Robert A Bridges Automated behav-ioral analysis of malware a case study of wannacry ran-somware In 16th IEEE ICMLA pages 454ndash460 2017

[8] Weilin Chiang Xuanqing Liu Si Si Yang Li SamyBengio and Chojui Hsieh Cluster-gcn An efficient al-gorithm for training deep and large graph convolutionalnetworks knowledge discovery and data mining pages257ndash266 2019

[9] Roman Danyliw Jan Meijer and Yuri Demchenko Theincident object description exchange format Interna-tional Journal of High Performance Computing Appli-cations 50701ndash92 2007

[10] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 2017

[11] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 ACM 2017

[12] Yujie Fan Yiming Zhang Yanfang Ye and Xin Li Au-tomatic opioid user detection from twitter Transductiveensemble built on different meta-graph based similari-ties over heterogeneous information network In IJCAIpages 3357ndash3363 2018

[13] Fireeye Openioc httpswwwfireeyecomblogthreat-research201310openioc-basicshtml Accessed January 20 2020

[14] Shifu Hou Yanfang Ye Yangqiu Song and Melih Ab-dulhayoglu Hindroid An intelligent android malwaredetection system based on structured heterogeneous in-formation network In Proceedings of the 23rd ACMSIGKDD International Conference on Knowledge Dis-covery and Data Mining pages 1507ndash1515 2017

[15] Zhiheng Huang Wei Xu and Kai Yu Bidirectional lstm-crf models for sequence tagging arXiv1508019912015

[16] Michael D Iannacone Shawn Bohn Grant NakamuraJohn Gerth Kelly MT Huffer Robert A Bridges Erik MFerragut and John R Goodall Developing an ontologyfor cyber security knowledge graphs CISR 1512 2015

[17] Thomas Kipf and Max Welling Semi-supervised clas-sification with graph convolutional networks arXivLearning 2016

[18] Thomas N Kipf and Max Welling Semi-supervisedclassification with graph convolutional networks arXivpreprint arXiv160902907 2016

[19] Tero Kokkonen Architecture for the cyber security situ-ational awareness system In Internet of Things SmartSpaces and Next Generation Networks and Systemspages 294ndash302 Springer 2016

[20] Mehmet Necip Kurt Yasin Yılmaz and Xiaodong WangDistributed quickest detection of cyber-attacks in smartgrid IEEE Transactions on Information Forensics andSecurity 13(8)2015ndash2030 2018

[21] John Lafferty Andrew Mccallum and Fernando PereiraConditional random fields Probabilistic models for seg-menting and labeling sequence data ICML pages 282ndash289 2001

[22] Xiaojing Liao Yuan Kan Xiao Feng Wang Li Zhouand Raheem Beyah Acing the ioc game Toward auto-matic discovery and analysis of open-source cyber threatintelligence In ACM Sigsac Conference on ComputerCommunications Security 2016

[23] Rob Mcmillan Open threat intelligencehttpwwwgartnercomdoc2487216definition-threat-intelligence AccessedJanuary 20 2020

[24] Tomas Mikolov Kai Chen Greg Corrado and JeffreyDean Efficient estimation of word representations invector space arXiv preprint arXiv13013781 2013

[25] Sudip Mittal Prajit Kumar Das Varish Mulwad Anu-pam Joshi and Tim Finin Cybertwitter Using twitterto generate alerts for cybersecurity threats and vulnera-bilities In Proceedings of the 2016 IEEE Advances inSocial Networks Analysis and Mining pages 860ndash8672016

[26] Eric Nunes Ahmad Diab Andrew Gunn EricssonMarin Vineet Mishra Vivin Paliath John RobertsonJana Shakarian Amanda Thart and Paulo ShakarianDarknet and deepnet mining for proactive cybersecuritythreat intelligence In 2016 IEEE ISI pages 7ndash12 2016

[27] Shirui Pan Ruiqi Hu Guodong Long Jing Jiang LinaYao and Chengqi Zhang Adversarially regularizedgraph autoencoder for graph embedding pages 2609ndash2615 2018

[28] P Pawlinski P Jaroszewski P Kijewski L SiewierskiP Jacewicz P Zielony and R Zuber Actionable infor-mation for security incident response European UnionAgency for Network and Information Security Herak-lion Greece 2014

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 255

[29] Hao Peng Jianxin Li Qiran Gong Yangqiu SongYuanxing Ning Kunfeng Lai and Philip S Yu Fine-grained event categorization with heterogeneous graphconvolutional networks In Proceedings of the 28th In-ternational Joint Conference on Artificial Intelligencepages 3238ndash3245 AAAI Press 2019

[30] Sara Qamar Zahid Anwar Mohammad Ashiqur Rah-man Ehab Al-Shaer and Bei-Tseng Chu Data-drivenanalytics for cyber-threat intelligence and informationsharing Computers amp Security 6735ndash58 2017

[31] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In USENIXSecurity 2015

[32] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In 24thUSENIX Security pages 1041ndash1056 2015

[33] Deepak Sharma and Avadhesha Surolia Degree Cen-trality Springer New York 2013

[34] Saurabh Singh Pradip Kumar Sharma Seo Yeon MoonDaesung Moon and Jong Hyuk Park A comprehensivestudy on apt attacks and countermeasures for futurenetworks and communications challenges and solutionsJournal of Supercomputing pages 1ndash32 2016

[35] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks Principles and methodologiesSynthesis Lectures on Data Mining and Knowledge Dis-covery 3(2)1ndash159 2012

[36] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks a structural analysis approachACM SIGKDD 14(2)20ndash28 2013

[37] Yizhou Sun Jiawei Han Xifeng Yan Philip S Yu andTianyi Wu Pathsim Meta path-based top-k similaritysearch in heterogeneous information networks Proceed-ings of the VLDB Endowment 4(11)992ndash1003 2011

[38] Yizhou Sun Brandon Norick Jiawei Han Xifeng YanPhilip S Yu and Xiao Yu Pathselclus Integrating meta-path selection with user-guided object clustering in het-erogeneous information networks ACM Transactionson TKDD 7(3)11 2013

[39] Wiem Tounsi and Helmi Rais A survey on technicalthreat intelligence in the age of sophisticated cyber at-tacks Computers amp Security 72212ndash233 2018

[40] Thomas D Wagner Esther Palomar Khaled Mahbuband Ali E Abdallah Towards an anonymity supportedplatform for shared cyber threat intelligence risks andsecurity of internet and systems pages 175ndash183 2017

[41] Chenguang Wang Yangqiu Song Haoran Li MingZhang and Jiawei Han Text classification with het-erogeneous information network kernels In 13th AAAI2016

[42] Xiao Wang Houye Ji Chuan Shi Bai Wang Peng CuiP Yu and Yanfang Ye Heterogeneous graph attentionnetwork 2019

[43] Jie Yang Shuailong Liang and Yue Zhang Design chal-lenges and misconceptions in neural sequence labelingarXiv Computation and Language 2018

[44] Rex Ying Ruining He Kaifeng Chen Pong Eksombat-chai William L Hamilton and Jure Leskovec Graphconvolutional neural networks for web-scale recom-mender systems In Proceedings of the 24th ACMSIGKDD International Conference on Knowledge Dis-covery amp Data Mining pages 974ndash983 ACM 2018

[45] Jiawei Zhang Xiangnan Kong and Philip S Yu Trans-ferring heterogeneous links across location-based socialnetworks In The 7th ACM international conference onWeb search and data mining pages 303ndash312 2014

[46] Yishuai Zhao Bo Lang and Ming Liu Ontology-basedunified model for heterogeneous threat intelligence inte-gration and sharing In 2017 11th IEEE InternationalConference on Anti-counterfeiting Security and Identi-fication (ASID) pages 11ndash15 2017

256 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

  • Introduction
  • Background
    • Cyber Threat Intelligence
    • Motivation
    • Preliminaries
      • Architecture Overview of HINTI
      • Methodology
        • Multi-granular Attention Based IOC Extraction
        • Cyber Threat Intelligence Modeling
        • Threat Intelligence Computing
          • Experimental Evaluation
            • Dataset and Settings
            • Evaluation of IOC Extraction
              • Application of Threat Intelligence Computing
                • Threat Profiling and Significance Ranking of IOCs
                • Attack Preference Modeling
                • Vulnerability Similarity Analysis
                  • Related Work
                  • Discussion
                  • Conclusion
Page 8: Cyber Threat Intelligence Modeling Based on Heterogeneous ... · 2.1Cyber Threat Intelligence Cyber Threat Intelligence (CTI) extracted from security-related data is structured information

of denominator Moreover different from existing similaritymeasures [37] we propose an attention mechanism basedsimilarity measure method by introducing the weight vec-torrarrw = [w1 wm wMprime ] which is a trainable coefficient

vector to learn the importance of different meta-paths forcharacterizing IOCs

Obviously it is computationally expensive to measure thesimilarity among IOCs in the constructed heterogeneousgraph as it usually requires to randomly walk a larger numberof nodes in the graph Fortunately in our work it is unneces-sary to walk through the entire graph as we prescribe a limitby introducing predefined meta-paths and we only focus onthe symmetrical meta-paths presented in Table 1 To calcu-late the similarity between IOCs under different meta-pathinstances we need to compute the corresponding commutingmatrices [37] following the meta-paths

Given a meta-path set P = sumMprime

m A1A2 middot middot middot Al+1 themeta-path based commuting matrix can be defined as CP =UA1A2 UA2A3 middot middot middot UAAl+1 where CP(i j) represents the prob-ability of object iisinA1 reaching object j isinAl+1 under the pathP and is a connection operation These symmetric meta-paths not only efficiently reduce the complexity of walkingbut also ensures that the commuting matrix can be easily de-composed which greatly reduces the computational costs Inaddition the symmetric meta-paths in the graph G allow us toleverage the pairwise random-walk [37] to further acceleratethe computation

With Eq (7) and pairwise random-walk we can obtain thesimilarity embedding between any two IOCs hi and h j undera meta-path set P Based on the low-dimensional similarityembedding we derive a weighted adjacent matrix betweenIOCs denoted as Ai isin RNtimesN where N is the number of aspecific type of IOC in G Meanwhile to utilize the attributedinformation of nodes we train a word2vec model [24] toembed the attribute information of nodes into a feature matrixXi isin RNtimesd where N is the number of IOCs in Ai and d is thedimension of node feature With the learned adjacency matrixAi and its feature matrix Xi we can leverage the classicalGCN [18] to characterize the relationship between IOC hiand h j Particularly the layer-wise propagation rule of GCNcan be defined as below

H(l+1) = σ(Dminus12 ADminus

12 H(l)W (l)) (8)

where A = A+ IN is the adjacency matrix of IOCs with self-connections IN is the identity matrix Dii =sum j Ai j and W (l) isa l-th layer trainable weight matrix σ(middot) denotes an activationfunction such as relu H(l) isin RNtimesd is the matrix of activationin the l-th layer We perform graph convolution [18] on Aiand Xi to generate the embedding Z between IOCs belongingto type i

Z = f (XiAi) = σ(Ai middot relu(AiXiW(0)i )W (1)

i ) (9)

where W (0)i isin RdtimesH is an input-to-hidden weight matrix for a

hidden layer with H feature mapsW (1)i isinRHtimesF is a hidden-to-

output weight matrix with F feature maps in the output layerXi isin RNtimesd N is the number of a specific type of IOCs d is thedimension of their corresponding features and σ is anotheractivation function such as sigmoid Ai = Dminus

12 AiDminus

12 can be

calculated offline Here we leverage the cross-entropy loss tooptimize the performance of our proposed threat intelligenceframework written as follows

Loss(Yl f Zl f ) =minussumlisinYl

F

sumf=1

Yl f middot lnZl f (10)

where Yl is the set of node indices that have labels Yl f is thereal label and Zl f is a corresponding label that our modelpredicts Based on Eq (10) we conduct stochastic gradientdescent to continuously optimize the neural network weightsW (0)

i W (1)i and

rarrw to reduce the loss and build a general

threat intelligence computing framework Using this frame-work security organizations are able to mine richer securityknowledge hidden in the interdependent relationships amongIOCs

5 Experimental Evaluation

51 Dataset and SettingsWe develop a threat data collector to automatically collectcyber threat data from a set of sources including 73 inter-national security blogs (eg fireeye cloudflare) hacker fo-rum posts (eg Blackhat Hack5) security bulletins (egMicrosoft Cisco) CVE detail description and ExploitDB Acomplete list of data sources is presented in the Baidu cloud8We set up a daemon to collect the newly generated securityevents every day So far more than 245786 security-relateddata describing threat events have been collected For trainingand evaluating our proposed IOC extraction method 30000samples from 5000 texts are annotated by utilizing the B-I-Osequence tagging method (see Section 22 for the example)and an annotation example is shown in Figure 2

For 30000 labeled samples we randomly select 60 ofsamples as a training set 20 of samples as a verificationset and the rest of the samples as our test set Based on thedata sets we comprehensively evaluate the performance ofHINTI for extracting IOCs and threat intelligence computingWe run all of the experiments on 16 cores Intel(R) Core(TM)i7-6700 CPU 340GHz with 64GB RAM and 4times NVIDIATesla K80 GPU The software programs are executed on theTensorFlow-GPU framework on Ubuntu 1604

52 Evaluation of IOC ExtractionA set of experiments are conducted to evaluate the sensitiv-ity of different parameters in the multi-granular based IOC

8httpspanbaiducoms1J631WMYY_T_awa8aY5xy3A

248 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

extraction model We mainly consider 8 hyper-parametersthat seriously impact the performance of the model as shownin Table 2 More specifically Embedding_dim is one ofthe most important factors that determine the generaliza-tion capability of the model Here we fix other parame-ters while fine-tuning the embedding size in the range of(50100150200250300350400) Experimental resultsshow that the accuracy of extracted IOC achieves the bestwhen Embedding_dim=300 Learning_rate is another ma-jor factor for determining the stride of gradient descent inminimizing the loss function which determines whetherthe model can find a global optimal solution We fix otherparameters to fine-tune the Learning_rate in the rangeof (000100050010050105) and the performancereaches the best when the Learning_rate=0001 Similarlywe fine-tune the other hyper-parameters with 5000 epochsand the hyper-parameters allowing our model to perform op-timally are recorded in Table 2

Table 2 Hyperparameters setting in the multi-granular basedIOC extraction method

Parameter value Parameter Value

Embedding_dim 300 Hidden_dim 128

Sequence_length 500 Epoch_num 5000

Learning_rate 0001 Batch_size 64

Dropout_rate 05 Optimizer Adam

Table 3 Performance of IOC extraction wrt IOC types

IOC Type Precision Recall Micro-F1

IP 9956 9952 9954File 9436 9688 9560Type 9986 9981 9983

Email 9932 9987 9949Device 9326 9278 9302Vender 9307 9445 9424Version 9698 9799 9748Domain 9658 9589 9623Software 8825 8931 8878Function 9503 9559 9531Platform 9431 9257 9343Malware 8976 9123 9049

Vulnerability 9925 9873 9899Other 9829 9842 9835

In this paper we extract 13 major types of IOCs and the

Figure 7 Performance of IOC extraction using embeddingfeatures with different granularity

performance is presented in Table 3 Overall our IOC ex-traction method demonstrates excellent performance in termsof precision recall and Micro-F1 (ie micro-averaged F1-score) for most types of IOCs such as function maliciousIP and device However we observe a performance degra-dation when recognizing software and malware This can beattributed to the fact that most software and malware is namedby random strings such as md5 hash Moreover we find thatthe number of training samples impacts the performance ofthe model Specifically the performance becomes unsatisfac-tory (eg Software Malware) when the number of a certaintype of training samples is insufficient (ie less than 5000)

In order to verify the effectiveness of multi-granular embed-ding features we assess the performance of IOC extractionwith features of different granularity including char-level1-gram 2-gram 3-gram and multi-granular features The ex-perimental results are demonstrated in Figure 7 from whichwe can observe that the proposed multi-granular embeddingfeature outperforms others since it leverages the attentionmechanism to simultaneously learn multi-granular features tocharacterize different patterns of IOCs

Next to verify the effectiveness of the proposed IOC ex-traction method we compare it with the state-of-the-art en-tity recognition approaches including general NER toolsNLTK NER9 and Stanford NER10 professional IOC extrac-tion method Stucco [16] and iACE [22] and popular en-tity recognition approaches CRF [21] BiLSTM and BiL-STM+CRF [15] The experimental results of different meth-ods on real-world data are demonstrated in Table 4

The results indicate that our proposed IOC extraction out-performs the state-of-the-art entity recognition methods andtools in terms of precision recall and Micro-F1 and its im-provement can be attributed to the following factors Firstcompared with Standford NER and NLTK NER the NLP tools

9httpwwwnltkorgbookch07html10httpsstanfordnlpgithubioCoreNLPnerhtml

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 249

Table 4 Performance of threat entity recognition usingdifferent methods

Method Accuracy Precision Micro-F1

NLTK NER 6945 6851 6749

Stanford NER 6835 6674 6858

iACE 9214 9126 9225

Stucco 9116 9221 9147

CRF 9264 9180 9265

BiLSTM 9478 9521 9435

BiLSTM+CRF 9638 9642 9627

Multi-granular 9859 9872 9869

trained with general news corpora our method uses a security-related training corpus collected and labeled by ourselves asa data source for training our model Second different fromthe rule-based extraction approaches (eg iACE and Stucco)our proposed deep learning based method provides an end-to-end system with more advanced features to represent variousIOCs Third comparing to RNN-based methods (eg BiL-STM and BiLSTM+CRF) our proposed method brings inmulti-granular embedding sizes (char-level 1-gram 2-gramand 3-gram) to simultaneously learn the characteristics of vari-ous sizes and types of IOCs which can identify more complexand irregular IOCs Last but not the least our method imple-ments an attention mechanism to learn the weights of featureswith various scales to effectively characterize different typesof IOCs further enhancing the IOC recognition accuracy

6 Application of Threat Intelligence Comput-ing

Our proposed threat intelligence computing framework basedon heterogeneous graph convolutional networks can be usedto mine novel security knowledge behind heterogeneous IOCsIn this section we evaluate its effectiveness and applicabilityusing three real-world applications profiling and ranking forCTIs attack preference modeling and vulnerability similarityanalysis

61 Threat Profiling and Significance Rankingof IOCs

Due to the disparity in the significance of threats it is im-portant to derive the threat profile and rank the significanceof IOCs for demystifying the landscape of threats Howevermost of the existing CTIs are incapable of modeling the asso-ciated relationships between heterogeneous IOCs

Different from isolated CTIs HINTI leverages HIN to

model the interdependent relationships among IOCs withtwo characteristics first the isolated IOCs can be integratedinto a graph-based HIN to clearly display the associated rela-tionships among IOCs which is capable of directly depictingthe basic threat profile For example Figure 3 depicts a threatprofiling sample an attacker utilizes CVE-2017-0143 vulnera-bility to invade Vista SP2 and Win7 SP1 devices belonging tothe Microsoft platform and CVE-2017-0143 is a remote codeexecution vulnerability that uses a ldquoSMBbat malicious fileSecond the significance of IOCs in HINTI can be naturallyranked based on the proposed threat intelligence computingframework

Table 5 shows the top 5 authoritative ranking score [35] ofvulnerability attacker attack type and platform from whichsecurity experts can gain a clear insight into the impact ofeach IOC Degree centrality [33] which describes the numberof links incident upon a node is widely used in evaluatingthe importance of a node in a graph It can used to quantifythe immediate risk of a node that connects with other nodesfor delivering network flows such as virus spreading Heredegree centrality can be utilized in verifying the effectivenessof the proposed threat intelligence computing framework inranking the importance of IOCs It is worth noting that bothour ranking method and degree centrality work regardless ofthe time of attacks We compute the degree centrality rank-ing of IOCs based on the fact that the node with a higherdegree centrality is more important than a node with a lowerone For instance if the degree centrality of a vulnerabilityis higher it indicates that this vulnerability is exploited bymore attackers or it affects more devices The ranking resultof degree centrality shown in Table 5 is consistent with theranking result based on the proposed threat intelligence com-puting framework demonstrating the capability of the CTIcomputing framework in ranking the importance of differenttypes of IOCs

62 Attack Preference Modeling

Attack preference modeling is meaningful for security organi-zations to gain insight into the attack intention of attackersbuild attack portraits and develop personalized defense strate-gies Here we leverage HINTI to integrate different types ofIOCs and their interdependent relationships to comprehen-sively depict the picture of attack events which helps modelthe attack preferences With the proposed threat intelligencecomputing framework we model attack preferences by clus-tering the embedded attackersrsquo vectors

In this task each malicious IP address is treated as an in-truder and its attack preferences are mainly reflected in threefeatures including the platforms it destroys (including Win-dows Linux Unix ASP Android Apache etc) the industriesit invades (eg education finance government Internet ofThings and Industrial control system etc) and the exploittypes it employs (eg DOS Buffer overflow Execute code

250 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

Table 5 The significance ranking of different types of IOCs (CV E1 CVE-2017-0146 CV E2 CVE-2006-5911 CV E3 CVE-2008-6543 CV E4 CVE-2012-1199 CV E4 CVE-2006-4985 AR Authoritative Ranking DC Degree Centrality value)

Vulnerability Attacker Platform Attack Type

No AR DC Monicker AR DC Category AR DC Exploit_type AR DC

CV E1 02713 7643 Meatsploit 02764 549 PHP 04562 17865 Webapps 05494 11648

CV E2 02431 7124 GSR team 01391 327 windows 02242 13793 DOS 01772 8741

CV E3 02132 6833 Ihsan 00698 279 Linux 00736 8792 Overflow 01533 7652

CV E4 01826 6145 Techsa 00695 247 Linux86 00623 8147 CSRF 00966 5433

CV E5 01739 5637 Aurimma 00622 204 ASP 00382 5027 SQL 00251 2171

(a) AV DV T AT (P13) (b) AV DPDTV T AT (P17) (c) AV PV T AT (P14)

Figure 8 The performance of attack preference modeling with different meta-paths in which the preference of attacker i isreduced to a two-dimensional space (xiyi) and each cluster represents a group with a specific attack preference

Sql injection XSS Gain information etc)Specifically we first utilize our proposed threat intelligence

computing framework to embed each attacker into a low-dimensional vector space and then perform DBSCAN algo-rithm on the embedded vector to cluster attackers with thesame preferences into corresponding groups Figure 8 showsthe top 3 clustering results under different types of meta-paths in which the meta-path AV DPDTV T AT (P17) performsthe best performance with compact and well-separated clus-ters indicating that it contains richer semantic relationshipsfor characterizing attack preferences than other meta-paths

To verify the effectiveness of attack preference modelingwe identify 5297 distinct attackers (each unique IP address istreated as an attacker) who have submitted at least 10 cyberattacks For these attackers five cybersecurity researchersconsisting of three doctoral and two master students spentabout fortnight to manually annotate their attack preferencesfrom three perspectives the platforms they destroyed the in-dustries they attacked and the attack types they exploited Toensure the accuracy of data labeling we test the consistencyof the tags for the 5297 attackers and remove the sampleswith ambiguous tags As a result we obtain 3000 sampleswith consistent tags Based on the labeled samples we furtherevaluate the performance of different meta-paths on model-

Table 6 Performance of modeling attack preference withdifferent meta-paths

Metapath Accuracy Precision Micro-F1

P1 7431 7622 7525

P4 7116 7327 7216

P5 6915 7143 7027

P12 7214 7646 7424

P13 7965 8131 8047

P14 7748 7934 7840

P15 8017 7976 7996

P17 8139 8172 8155

ing attack preferences In the attack modeling scenario weonly focus on the meta-paths that both the start node and theend node are attackers The experimental results are demon-strated in Table 6 Obviously different meta-paths displaydifferent abilities in characterizing the attack preferences ofcyber intruders The performance when using P17 is more

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 251

superior than the one with other meta-paths which indicatesthat P17 holds more valuable information that characterizesthe attack preferences of cybercriminals since P17 includesthe semantics information of P1P4P5 and P12 sim P15

In addition we compare the capabilities of our proposedcomputing framework with those of other state-of-the-art em-bedding methods in terms of attack preference modelingOur analysis result shows that the accuracy of attack pref-erence modeling reaches 081 which outperforms the exist-ing popular models Node2vec (with precision of 071) [1]metapaht2vec (with precision of 073) [11] and HAN (withprecision of 076) [42] The performance improvement canbe attributed to the following characteristics First our com-puting framework utilizes weight-learning to learn the signifi-cance of different meta-paths for evaluating the similarity be-tween attackers Second the proposed computing frameworkleverages GCN to learn the structural information betweenattackers to obtain more discriminative structural features thatimproves the performance of attack preference modeling

63 Vulnerability Similarity AnalysisVulnerability classification or clustering is crucial for conduct-ing vulnerability trend analysis the correlation analysis ofincidents and exploits and the evaluation of countermeasuresThe traditional vulnerability analysis relies on the manualinvestigation of the source codes which requires expert ex-pertise and consumes considerable efforts In this sectionwe propose an unsupervised vulnerability similarity analy-sis method based on the proposed threat intelligence com-puting framework which can automatically group similarvulnerabilities into corresponding communities Particularlythe vulnerability-related IOCs are first embedded into a low-dimensional vector space using CTI computing frameworkThen the DBSCAN algorithm is performed on the embed-ded vector space to cluster vulnerabilities into correspondingcommunities The clustering results are presented in Figure 9

Figure 9 (c) shows all vulnerabilities are clustered into 12clusters using meta-path V DPDTV T (P16) which is very closeto the classification standard (ie 13) recommended by CVEdetails an authoritative database that publishes vulnerabilityinformation By manually analyzing the training samples wefind that HTTP Response Splitting vulnerability does not ap-pear in our dataset Therefore our cluster number (ie 12)is consistent with CVE Details11 To further validate the ef-fectiveness of threat intelligence computing framework forvulnerability clustering we randomly select 100 vulnerabili-ties from each cluster for manual inspection to measure theconsistency of the vulnerability types in each cluster and theresults are presented in Table 7 Obviously the clustering per-formance of cluster 8 (ie File Inclusion) and cluster 10 (ieDirectory Traversal) is remarkably worse than other clustersTo explain the reason we examine our training data and the

11httpswwwcvedetailscom

Table 7 Accuracy of vulnerability clustering

Cluster ID Vulnerability type Accuracy

cluster1 Denial of Service 8012

cluster2 XSS 8353

cluster3 Execute Code 8150

cluster4 Overflow 7650

cluster5 Gain Privilege 9156

cluster6 Bypass Something 7174

cluster7 CSRF 9327

cluster8 File Inclusion 6172

cluster9 Gain Informa 7042

cluster10 Directory Traversal 6949

cluster11 Memory Corruption 8156

cluster12 SQL Injection 8067

average 7851

computing framework We found that the proportion of thesetwo types of vulnerabilities is too small (cluster 8 is 34and cluster 10 is 42) making our computing frameworkvery likely to be under-fit with insufficient data Howeverthe proposed computing framework performs well on mosttypes of vulnerabilities in an unsupervised manner especiallygiven sufficient samples (eg cluster 7 is 176 and cluster 5is 157)

In addition by examining the clustering results we havean observation that the vulnerabilities in the same cluster arelikely to have evolutionary relationships For instance CVE-2018-0802 an office zero-day vulnerability is evolved fromthe CVE-2017-11882 They both include EQNEDT32exe fileused to edit the formula in Office software which allowsremote attackers to execute arbitrary codes by constructinga malformed font name The modeling and computation ofinterdependent relationships among IOCs in HINTI facilitatethe discovery of such intricate connections between vulnera-bilities

In summary HINTI is capable of depicting a more compre-hensive threat landscape and the proposed CTI computingframework has the ability to bring novel security insightstoward different real-world security applications Howeverthere are still numerous opportunities for enhancing thesesecurity applications Specifically for attack preference mod-eling although each individual IP address is treated as anattacker we cannot determine whether it belongs to a realattacker or is disguised by a proxy Fortunately even if thereal attack address cannot be captured understanding the at-tack preferences of these IP proxies which are widely usedin cybercrime is also meaningful for gaining insight into thecyber threats For vulnerability similarity analysis data imbal-

252 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

(a) AV DV T AT (P13) (b) V DV T (P10) (c) V DPDTV T (P16)

Figure 9 Illustration of the vulnerability similarity analysis based on different meta-paths in which vulnerability i can be reducedinto a two-dimensional space (xiyi) and each cluster indicates a particular type of vulnerability

ance issue affects the performance of model and inadequatetraining samples often result in model underfitting as shownin the case of cluster 8 and cluster 10

7 Related Work

Cyber Treat Intelligence An increasing number of securityvendors and researchers start exploring CTI for protectingsystem security and defending against new threat vectors [28]Existing CTI extraction tools such as IBM X-Force12 Threatcrowd13 Openctiio14 AlienVault15 CleanMX16 and Phish-Tank17 use regular expression to synthesize IOC from thedescriptive texts However these methods often producehigh false positive rate by misjudging legitimate entities asIOCs [22]

Recently Balzarotti et al [2] develop a system to extractIOCs from web pages and identify malicious URLs fromJavaScript codes Sabottke et al [31] propose to detect po-tential vulnerability exploits by extracting and analyzing thetweets that contain ldquoCVErdquo keyword Liao et al [22] present atool iACE for automatically extracting IOCs which excelsat processing technology articles Nevertheless iACE identi-fies IOCs from a single article which does not consider therich semantic information from multi-source texts Zhao etal [46] define different ontologies to describe the relationshipbetween entities based on expert knowledge Numerous popu-lar CTI platforms including IODEF [9] STIX [3] TAXII [40]OpenIOC [13] and CyBox [19] focus on extracting and shar-ing CTI Yet none of the existing approaches could uncoverthe interdependent relations among CTIs extracted from multi-source texts let alone quantifying CTIsrsquo relevance and miningvaluable threat intelligence hidden behind the isolated CTIs

12httpsexchangexforceibmcloudcom13httpswwwthreatcrowdorg14httpsdemoopenctiio15 httpsotxalienvaultcom16httplistclean-mxcom17httpswwwphishtankcom

Heterogeneous Information Network Real-world systemsoften contain a large number of interacting multi-typed ob-jects which can naturally be expressed as a heterogeneousinformation network (HIN) HIN as a conceptual graph repre-sentation can effectively fuse information and exploit richersemantics in interacting objects and links [37] HIN has beenapplied to network traffic analysis [38] public social mediadata analysis [45] and large-scale document analysis [41]Recent applications of HIN include mobile malware detec-tion [14] and opioid user identification [12] In this paper forthe first time we use HIN for CTI modelingGraph Convolutional Network Graph convolutional net-works (GCN) [17] has become an effective tool for address-ing the task of machine learning on graphs such as semi-supervised node classification [17] event classification [29]clustering [8] link prediction [27] and recommended sys-tem [44] Given a graph GCN can directly conduct the con-volutional operation on the graph to learn the nonlinear em-bedding of nodes In our work to discern and reveal the inter-active relationships between IOCs we utilize GCN to learnmore discriminative representation from attributes and graphstructure simultaneously which is the premise for threat intel-ligence computing

8 Discussion

Data Availability The proposed framework assumes thatsufficient threat description data can be obtained for generat-ing comprehensive and the latest CTIs Fortunately with thegrowing prosperity of social media an increasing number ofsecurity-related data (eg blogs posts news and open secu-rity databases) can be collected effortlessly To automaticallycollect security-related data we develop a crawler system tocollect threat description data from 73 international securitysources (eg blogs hacker forum posts security bulletinsetc) providing sufficient raw materials for generating cyberthreat intelligenceModel Extensibility In this paper 6 types of IOCs and 9

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 253

types of relationships are modeled in HINTI However ourproposed framework is extensible in which more types ofIOCs and relationships can be enrolled to represent richer andmore comprehensive threat information such as maliciousdomains phishing Emails attack tools their interactions etcHigh-level Semantic Relations In view of the computa-tional complexity of the model our threat intelligence comput-ing method focues on utilizing the meta-paths to quantify thesimilarity between IOCs while ignoring the influence of themeta-graph on it which inevitably misses characterizing somehigh-level semantic information Nevertheless the proposedcomputing framework introduces an attention mechanism tolearn the signification of different meta-paths to character-ize IOCs and their interactive relationships which effectivelycompensates for the performance degradation caused by ig-noring the meta-graphsSecurity Knowledge Reasoning Although our proposedframework exhibits promising results in CTI extraction andmodeling computing how to implement advanced securityknowledge reasoning and prediction is still an open problemeg it remains challenging to predict whether a vulnerabilitycould potentially affect a particular type of devices in thefuture We will investigate this problem in the future

9 Conclusion

This work explores a new direction of threat intelligence com-puting which aims to uncover new knowledge in the relation-ships among different threat vectors We propose HINTI acyber threat intelligence framework to model and quantify theinterdependent relationships among different types of IOCsby leveraging heterogeneous graph convolutional networksWe develop a multi-granular attention mechanism to learnthe importance of different features and model the interde-pendent relationships among IOCs using HIN We proposethe concept of threat intelligence computing and present ageneral intelligence computing framework based on graphconvolutional networks Experimental results demonstratethat the proposed multi-granular attention based IOC extrac-tion method outperforms the existing state-of-the-art methodsThe proposed threat intelligence computing framework caneffectively mine security knowledge hidden in the interdepen-dent relationships among IOCs which enables crucial threatintelligence applications such as threat profiling and rankingattack preference modeling and vulnerability similarity analy-sis We would like to emphasize that the knowledge discoveryamong interdependent CTIs is a new field that calls for acollaborative effort from security experts and data scientists

In future we plan to develop a predicative and reasoningmodel based on HINTI and explore preventative countermea-sures to protect cyber infrastructure from future threats Wealso plan to add more types of IOCs and relations to depicta more comprehensive threat landscape Moreover we willleverage both meta-paths and meta-graphs to characterize the

IOCs and their interactions to further improve the embeddingperformance and to strike a balance between the accuracyand computational complexity of the model We will alsoinvestigate the feasibility of security knowledge predictionbased on HINTI to infer the potential future relationshipsbetween the vulnerabilities and devices

Acknowledgement

We would like to thank our shepherd Tobias Fiebig andthe anonymous reviewers for providing valuable feedbackon our work We also thank Hao Peng and Lichao Sunfor their feedback on the early version of this work Thiswork was supported in part by National Science Founda-tion grants CNS1950171 CNS-1949753 It was also sup-ported by the NSFC for Innovative Research Group ScienceFund Project (62141003) National Key RampD Program China(2018YFB0803 503) the 2018 joint Research Foundationof Ministry of Education China Mobile (MCM20180507)and the Opening Project of Shanghai Trusted Industrial Con-trol Platform (TICPSH202003020-ZC) Any opinions find-ings and conclusions or recommendations expressed in thismaterial do not necessarily reflect the views of any fundingagencies

References

[1] Jure Leskovec Aditya Grover node2vec Scalable fea-ture learning for networks In Acm Sigkdd InternationalConference on Knowledge Discovery Data Mining2016

[2] Marco Balduzzi Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications In WWW 2016

[3] Sean Barnum Standardizing cyber threat intelligenceinformation with the structured threat information ex-pression (stix) Mitre Corporation 111ndash22 2012

[4] Eric W Burger Michael D Goodman Panos Kam-panakis and Kevin A Zhu Taxonomy model for cyberthreat intelligence information exchange technologiesIn Proceedings of the 2014 ACM Workshop on Infor-mation Sharing amp Collaborative Security pages 51ndash602014

[5] Onur Catakoglu Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications The web conference pages 333ndash3432016

[6] Danqi Chen and Christopher D Manning A fast and ac-curate dependency parser using neural networks pages740ndash750 2014

254 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

[7] Qian Chen and Robert A Bridges Automated behav-ioral analysis of malware a case study of wannacry ran-somware In 16th IEEE ICMLA pages 454ndash460 2017

[8] Weilin Chiang Xuanqing Liu Si Si Yang Li SamyBengio and Chojui Hsieh Cluster-gcn An efficient al-gorithm for training deep and large graph convolutionalnetworks knowledge discovery and data mining pages257ndash266 2019

[9] Roman Danyliw Jan Meijer and Yuri Demchenko Theincident object description exchange format Interna-tional Journal of High Performance Computing Appli-cations 50701ndash92 2007

[10] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 2017

[11] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 ACM 2017

[12] Yujie Fan Yiming Zhang Yanfang Ye and Xin Li Au-tomatic opioid user detection from twitter Transductiveensemble built on different meta-graph based similari-ties over heterogeneous information network In IJCAIpages 3357ndash3363 2018

[13] Fireeye Openioc httpswwwfireeyecomblogthreat-research201310openioc-basicshtml Accessed January 20 2020

[14] Shifu Hou Yanfang Ye Yangqiu Song and Melih Ab-dulhayoglu Hindroid An intelligent android malwaredetection system based on structured heterogeneous in-formation network In Proceedings of the 23rd ACMSIGKDD International Conference on Knowledge Dis-covery and Data Mining pages 1507ndash1515 2017

[15] Zhiheng Huang Wei Xu and Kai Yu Bidirectional lstm-crf models for sequence tagging arXiv1508019912015

[16] Michael D Iannacone Shawn Bohn Grant NakamuraJohn Gerth Kelly MT Huffer Robert A Bridges Erik MFerragut and John R Goodall Developing an ontologyfor cyber security knowledge graphs CISR 1512 2015

[17] Thomas Kipf and Max Welling Semi-supervised clas-sification with graph convolutional networks arXivLearning 2016

[18] Thomas N Kipf and Max Welling Semi-supervisedclassification with graph convolutional networks arXivpreprint arXiv160902907 2016

[19] Tero Kokkonen Architecture for the cyber security situ-ational awareness system In Internet of Things SmartSpaces and Next Generation Networks and Systemspages 294ndash302 Springer 2016

[20] Mehmet Necip Kurt Yasin Yılmaz and Xiaodong WangDistributed quickest detection of cyber-attacks in smartgrid IEEE Transactions on Information Forensics andSecurity 13(8)2015ndash2030 2018

[21] John Lafferty Andrew Mccallum and Fernando PereiraConditional random fields Probabilistic models for seg-menting and labeling sequence data ICML pages 282ndash289 2001

[22] Xiaojing Liao Yuan Kan Xiao Feng Wang Li Zhouand Raheem Beyah Acing the ioc game Toward auto-matic discovery and analysis of open-source cyber threatintelligence In ACM Sigsac Conference on ComputerCommunications Security 2016

[23] Rob Mcmillan Open threat intelligencehttpwwwgartnercomdoc2487216definition-threat-intelligence AccessedJanuary 20 2020

[24] Tomas Mikolov Kai Chen Greg Corrado and JeffreyDean Efficient estimation of word representations invector space arXiv preprint arXiv13013781 2013

[25] Sudip Mittal Prajit Kumar Das Varish Mulwad Anu-pam Joshi and Tim Finin Cybertwitter Using twitterto generate alerts for cybersecurity threats and vulnera-bilities In Proceedings of the 2016 IEEE Advances inSocial Networks Analysis and Mining pages 860ndash8672016

[26] Eric Nunes Ahmad Diab Andrew Gunn EricssonMarin Vineet Mishra Vivin Paliath John RobertsonJana Shakarian Amanda Thart and Paulo ShakarianDarknet and deepnet mining for proactive cybersecuritythreat intelligence In 2016 IEEE ISI pages 7ndash12 2016

[27] Shirui Pan Ruiqi Hu Guodong Long Jing Jiang LinaYao and Chengqi Zhang Adversarially regularizedgraph autoencoder for graph embedding pages 2609ndash2615 2018

[28] P Pawlinski P Jaroszewski P Kijewski L SiewierskiP Jacewicz P Zielony and R Zuber Actionable infor-mation for security incident response European UnionAgency for Network and Information Security Herak-lion Greece 2014

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 255

[29] Hao Peng Jianxin Li Qiran Gong Yangqiu SongYuanxing Ning Kunfeng Lai and Philip S Yu Fine-grained event categorization with heterogeneous graphconvolutional networks In Proceedings of the 28th In-ternational Joint Conference on Artificial Intelligencepages 3238ndash3245 AAAI Press 2019

[30] Sara Qamar Zahid Anwar Mohammad Ashiqur Rah-man Ehab Al-Shaer and Bei-Tseng Chu Data-drivenanalytics for cyber-threat intelligence and informationsharing Computers amp Security 6735ndash58 2017

[31] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In USENIXSecurity 2015

[32] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In 24thUSENIX Security pages 1041ndash1056 2015

[33] Deepak Sharma and Avadhesha Surolia Degree Cen-trality Springer New York 2013

[34] Saurabh Singh Pradip Kumar Sharma Seo Yeon MoonDaesung Moon and Jong Hyuk Park A comprehensivestudy on apt attacks and countermeasures for futurenetworks and communications challenges and solutionsJournal of Supercomputing pages 1ndash32 2016

[35] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks Principles and methodologiesSynthesis Lectures on Data Mining and Knowledge Dis-covery 3(2)1ndash159 2012

[36] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks a structural analysis approachACM SIGKDD 14(2)20ndash28 2013

[37] Yizhou Sun Jiawei Han Xifeng Yan Philip S Yu andTianyi Wu Pathsim Meta path-based top-k similaritysearch in heterogeneous information networks Proceed-ings of the VLDB Endowment 4(11)992ndash1003 2011

[38] Yizhou Sun Brandon Norick Jiawei Han Xifeng YanPhilip S Yu and Xiao Yu Pathselclus Integrating meta-path selection with user-guided object clustering in het-erogeneous information networks ACM Transactionson TKDD 7(3)11 2013

[39] Wiem Tounsi and Helmi Rais A survey on technicalthreat intelligence in the age of sophisticated cyber at-tacks Computers amp Security 72212ndash233 2018

[40] Thomas D Wagner Esther Palomar Khaled Mahbuband Ali E Abdallah Towards an anonymity supportedplatform for shared cyber threat intelligence risks andsecurity of internet and systems pages 175ndash183 2017

[41] Chenguang Wang Yangqiu Song Haoran Li MingZhang and Jiawei Han Text classification with het-erogeneous information network kernels In 13th AAAI2016

[42] Xiao Wang Houye Ji Chuan Shi Bai Wang Peng CuiP Yu and Yanfang Ye Heterogeneous graph attentionnetwork 2019

[43] Jie Yang Shuailong Liang and Yue Zhang Design chal-lenges and misconceptions in neural sequence labelingarXiv Computation and Language 2018

[44] Rex Ying Ruining He Kaifeng Chen Pong Eksombat-chai William L Hamilton and Jure Leskovec Graphconvolutional neural networks for web-scale recom-mender systems In Proceedings of the 24th ACMSIGKDD International Conference on Knowledge Dis-covery amp Data Mining pages 974ndash983 ACM 2018

[45] Jiawei Zhang Xiangnan Kong and Philip S Yu Trans-ferring heterogeneous links across location-based socialnetworks In The 7th ACM international conference onWeb search and data mining pages 303ndash312 2014

[46] Yishuai Zhao Bo Lang and Ming Liu Ontology-basedunified model for heterogeneous threat intelligence inte-gration and sharing In 2017 11th IEEE InternationalConference on Anti-counterfeiting Security and Identi-fication (ASID) pages 11ndash15 2017

256 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

  • Introduction
  • Background
    • Cyber Threat Intelligence
    • Motivation
    • Preliminaries
      • Architecture Overview of HINTI
      • Methodology
        • Multi-granular Attention Based IOC Extraction
        • Cyber Threat Intelligence Modeling
        • Threat Intelligence Computing
          • Experimental Evaluation
            • Dataset and Settings
            • Evaluation of IOC Extraction
              • Application of Threat Intelligence Computing
                • Threat Profiling and Significance Ranking of IOCs
                • Attack Preference Modeling
                • Vulnerability Similarity Analysis
                  • Related Work
                  • Discussion
                  • Conclusion
Page 9: Cyber Threat Intelligence Modeling Based on Heterogeneous ... · 2.1Cyber Threat Intelligence Cyber Threat Intelligence (CTI) extracted from security-related data is structured information

extraction model We mainly consider 8 hyper-parametersthat seriously impact the performance of the model as shownin Table 2 More specifically Embedding_dim is one ofthe most important factors that determine the generaliza-tion capability of the model Here we fix other parame-ters while fine-tuning the embedding size in the range of(50100150200250300350400) Experimental resultsshow that the accuracy of extracted IOC achieves the bestwhen Embedding_dim=300 Learning_rate is another ma-jor factor for determining the stride of gradient descent inminimizing the loss function which determines whetherthe model can find a global optimal solution We fix otherparameters to fine-tune the Learning_rate in the rangeof (000100050010050105) and the performancereaches the best when the Learning_rate=0001 Similarlywe fine-tune the other hyper-parameters with 5000 epochsand the hyper-parameters allowing our model to perform op-timally are recorded in Table 2

Table 2 Hyperparameters setting in the multi-granular basedIOC extraction method

Parameter value Parameter Value

Embedding_dim 300 Hidden_dim 128

Sequence_length 500 Epoch_num 5000

Learning_rate 0001 Batch_size 64

Dropout_rate 05 Optimizer Adam

Table 3 Performance of IOC extraction wrt IOC types

IOC Type Precision Recall Micro-F1

IP 9956 9952 9954File 9436 9688 9560Type 9986 9981 9983

Email 9932 9987 9949Device 9326 9278 9302Vender 9307 9445 9424Version 9698 9799 9748Domain 9658 9589 9623Software 8825 8931 8878Function 9503 9559 9531Platform 9431 9257 9343Malware 8976 9123 9049

Vulnerability 9925 9873 9899Other 9829 9842 9835

In this paper we extract 13 major types of IOCs and the

Figure 7 Performance of IOC extraction using embeddingfeatures with different granularity

performance is presented in Table 3 Overall our IOC ex-traction method demonstrates excellent performance in termsof precision recall and Micro-F1 (ie micro-averaged F1-score) for most types of IOCs such as function maliciousIP and device However we observe a performance degra-dation when recognizing software and malware This can beattributed to the fact that most software and malware is namedby random strings such as md5 hash Moreover we find thatthe number of training samples impacts the performance ofthe model Specifically the performance becomes unsatisfac-tory (eg Software Malware) when the number of a certaintype of training samples is insufficient (ie less than 5000)

In order to verify the effectiveness of multi-granular embed-ding features we assess the performance of IOC extractionwith features of different granularity including char-level1-gram 2-gram 3-gram and multi-granular features The ex-perimental results are demonstrated in Figure 7 from whichwe can observe that the proposed multi-granular embeddingfeature outperforms others since it leverages the attentionmechanism to simultaneously learn multi-granular features tocharacterize different patterns of IOCs

Next to verify the effectiveness of the proposed IOC ex-traction method we compare it with the state-of-the-art en-tity recognition approaches including general NER toolsNLTK NER9 and Stanford NER10 professional IOC extrac-tion method Stucco [16] and iACE [22] and popular en-tity recognition approaches CRF [21] BiLSTM and BiL-STM+CRF [15] The experimental results of different meth-ods on real-world data are demonstrated in Table 4

The results indicate that our proposed IOC extraction out-performs the state-of-the-art entity recognition methods andtools in terms of precision recall and Micro-F1 and its im-provement can be attributed to the following factors Firstcompared with Standford NER and NLTK NER the NLP tools

9httpwwwnltkorgbookch07html10httpsstanfordnlpgithubioCoreNLPnerhtml

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 249

Table 4 Performance of threat entity recognition usingdifferent methods

Method Accuracy Precision Micro-F1

NLTK NER 6945 6851 6749

Stanford NER 6835 6674 6858

iACE 9214 9126 9225

Stucco 9116 9221 9147

CRF 9264 9180 9265

BiLSTM 9478 9521 9435

BiLSTM+CRF 9638 9642 9627

Multi-granular 9859 9872 9869

trained with general news corpora our method uses a security-related training corpus collected and labeled by ourselves asa data source for training our model Second different fromthe rule-based extraction approaches (eg iACE and Stucco)our proposed deep learning based method provides an end-to-end system with more advanced features to represent variousIOCs Third comparing to RNN-based methods (eg BiL-STM and BiLSTM+CRF) our proposed method brings inmulti-granular embedding sizes (char-level 1-gram 2-gramand 3-gram) to simultaneously learn the characteristics of vari-ous sizes and types of IOCs which can identify more complexand irregular IOCs Last but not the least our method imple-ments an attention mechanism to learn the weights of featureswith various scales to effectively characterize different typesof IOCs further enhancing the IOC recognition accuracy

6 Application of Threat Intelligence Comput-ing

Our proposed threat intelligence computing framework basedon heterogeneous graph convolutional networks can be usedto mine novel security knowledge behind heterogeneous IOCsIn this section we evaluate its effectiveness and applicabilityusing three real-world applications profiling and ranking forCTIs attack preference modeling and vulnerability similarityanalysis

61 Threat Profiling and Significance Rankingof IOCs

Due to the disparity in the significance of threats it is im-portant to derive the threat profile and rank the significanceof IOCs for demystifying the landscape of threats Howevermost of the existing CTIs are incapable of modeling the asso-ciated relationships between heterogeneous IOCs

Different from isolated CTIs HINTI leverages HIN to

model the interdependent relationships among IOCs withtwo characteristics first the isolated IOCs can be integratedinto a graph-based HIN to clearly display the associated rela-tionships among IOCs which is capable of directly depictingthe basic threat profile For example Figure 3 depicts a threatprofiling sample an attacker utilizes CVE-2017-0143 vulnera-bility to invade Vista SP2 and Win7 SP1 devices belonging tothe Microsoft platform and CVE-2017-0143 is a remote codeexecution vulnerability that uses a ldquoSMBbat malicious fileSecond the significance of IOCs in HINTI can be naturallyranked based on the proposed threat intelligence computingframework

Table 5 shows the top 5 authoritative ranking score [35] ofvulnerability attacker attack type and platform from whichsecurity experts can gain a clear insight into the impact ofeach IOC Degree centrality [33] which describes the numberof links incident upon a node is widely used in evaluatingthe importance of a node in a graph It can used to quantifythe immediate risk of a node that connects with other nodesfor delivering network flows such as virus spreading Heredegree centrality can be utilized in verifying the effectivenessof the proposed threat intelligence computing framework inranking the importance of IOCs It is worth noting that bothour ranking method and degree centrality work regardless ofthe time of attacks We compute the degree centrality rank-ing of IOCs based on the fact that the node with a higherdegree centrality is more important than a node with a lowerone For instance if the degree centrality of a vulnerabilityis higher it indicates that this vulnerability is exploited bymore attackers or it affects more devices The ranking resultof degree centrality shown in Table 5 is consistent with theranking result based on the proposed threat intelligence com-puting framework demonstrating the capability of the CTIcomputing framework in ranking the importance of differenttypes of IOCs

62 Attack Preference Modeling

Attack preference modeling is meaningful for security organi-zations to gain insight into the attack intention of attackersbuild attack portraits and develop personalized defense strate-gies Here we leverage HINTI to integrate different types ofIOCs and their interdependent relationships to comprehen-sively depict the picture of attack events which helps modelthe attack preferences With the proposed threat intelligencecomputing framework we model attack preferences by clus-tering the embedded attackersrsquo vectors

In this task each malicious IP address is treated as an in-truder and its attack preferences are mainly reflected in threefeatures including the platforms it destroys (including Win-dows Linux Unix ASP Android Apache etc) the industriesit invades (eg education finance government Internet ofThings and Industrial control system etc) and the exploittypes it employs (eg DOS Buffer overflow Execute code

250 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

Table 5 The significance ranking of different types of IOCs (CV E1 CVE-2017-0146 CV E2 CVE-2006-5911 CV E3 CVE-2008-6543 CV E4 CVE-2012-1199 CV E4 CVE-2006-4985 AR Authoritative Ranking DC Degree Centrality value)

Vulnerability Attacker Platform Attack Type

No AR DC Monicker AR DC Category AR DC Exploit_type AR DC

CV E1 02713 7643 Meatsploit 02764 549 PHP 04562 17865 Webapps 05494 11648

CV E2 02431 7124 GSR team 01391 327 windows 02242 13793 DOS 01772 8741

CV E3 02132 6833 Ihsan 00698 279 Linux 00736 8792 Overflow 01533 7652

CV E4 01826 6145 Techsa 00695 247 Linux86 00623 8147 CSRF 00966 5433

CV E5 01739 5637 Aurimma 00622 204 ASP 00382 5027 SQL 00251 2171

(a) AV DV T AT (P13) (b) AV DPDTV T AT (P17) (c) AV PV T AT (P14)

Figure 8 The performance of attack preference modeling with different meta-paths in which the preference of attacker i isreduced to a two-dimensional space (xiyi) and each cluster represents a group with a specific attack preference

Sql injection XSS Gain information etc)Specifically we first utilize our proposed threat intelligence

computing framework to embed each attacker into a low-dimensional vector space and then perform DBSCAN algo-rithm on the embedded vector to cluster attackers with thesame preferences into corresponding groups Figure 8 showsthe top 3 clustering results under different types of meta-paths in which the meta-path AV DPDTV T AT (P17) performsthe best performance with compact and well-separated clus-ters indicating that it contains richer semantic relationshipsfor characterizing attack preferences than other meta-paths

To verify the effectiveness of attack preference modelingwe identify 5297 distinct attackers (each unique IP address istreated as an attacker) who have submitted at least 10 cyberattacks For these attackers five cybersecurity researchersconsisting of three doctoral and two master students spentabout fortnight to manually annotate their attack preferencesfrom three perspectives the platforms they destroyed the in-dustries they attacked and the attack types they exploited Toensure the accuracy of data labeling we test the consistencyof the tags for the 5297 attackers and remove the sampleswith ambiguous tags As a result we obtain 3000 sampleswith consistent tags Based on the labeled samples we furtherevaluate the performance of different meta-paths on model-

Table 6 Performance of modeling attack preference withdifferent meta-paths

Metapath Accuracy Precision Micro-F1

P1 7431 7622 7525

P4 7116 7327 7216

P5 6915 7143 7027

P12 7214 7646 7424

P13 7965 8131 8047

P14 7748 7934 7840

P15 8017 7976 7996

P17 8139 8172 8155

ing attack preferences In the attack modeling scenario weonly focus on the meta-paths that both the start node and theend node are attackers The experimental results are demon-strated in Table 6 Obviously different meta-paths displaydifferent abilities in characterizing the attack preferences ofcyber intruders The performance when using P17 is more

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 251

superior than the one with other meta-paths which indicatesthat P17 holds more valuable information that characterizesthe attack preferences of cybercriminals since P17 includesthe semantics information of P1P4P5 and P12 sim P15

In addition we compare the capabilities of our proposedcomputing framework with those of other state-of-the-art em-bedding methods in terms of attack preference modelingOur analysis result shows that the accuracy of attack pref-erence modeling reaches 081 which outperforms the exist-ing popular models Node2vec (with precision of 071) [1]metapaht2vec (with precision of 073) [11] and HAN (withprecision of 076) [42] The performance improvement canbe attributed to the following characteristics First our com-puting framework utilizes weight-learning to learn the signifi-cance of different meta-paths for evaluating the similarity be-tween attackers Second the proposed computing frameworkleverages GCN to learn the structural information betweenattackers to obtain more discriminative structural features thatimproves the performance of attack preference modeling

63 Vulnerability Similarity AnalysisVulnerability classification or clustering is crucial for conduct-ing vulnerability trend analysis the correlation analysis ofincidents and exploits and the evaluation of countermeasuresThe traditional vulnerability analysis relies on the manualinvestigation of the source codes which requires expert ex-pertise and consumes considerable efforts In this sectionwe propose an unsupervised vulnerability similarity analy-sis method based on the proposed threat intelligence com-puting framework which can automatically group similarvulnerabilities into corresponding communities Particularlythe vulnerability-related IOCs are first embedded into a low-dimensional vector space using CTI computing frameworkThen the DBSCAN algorithm is performed on the embed-ded vector space to cluster vulnerabilities into correspondingcommunities The clustering results are presented in Figure 9

Figure 9 (c) shows all vulnerabilities are clustered into 12clusters using meta-path V DPDTV T (P16) which is very closeto the classification standard (ie 13) recommended by CVEdetails an authoritative database that publishes vulnerabilityinformation By manually analyzing the training samples wefind that HTTP Response Splitting vulnerability does not ap-pear in our dataset Therefore our cluster number (ie 12)is consistent with CVE Details11 To further validate the ef-fectiveness of threat intelligence computing framework forvulnerability clustering we randomly select 100 vulnerabili-ties from each cluster for manual inspection to measure theconsistency of the vulnerability types in each cluster and theresults are presented in Table 7 Obviously the clustering per-formance of cluster 8 (ie File Inclusion) and cluster 10 (ieDirectory Traversal) is remarkably worse than other clustersTo explain the reason we examine our training data and the

11httpswwwcvedetailscom

Table 7 Accuracy of vulnerability clustering

Cluster ID Vulnerability type Accuracy

cluster1 Denial of Service 8012

cluster2 XSS 8353

cluster3 Execute Code 8150

cluster4 Overflow 7650

cluster5 Gain Privilege 9156

cluster6 Bypass Something 7174

cluster7 CSRF 9327

cluster8 File Inclusion 6172

cluster9 Gain Informa 7042

cluster10 Directory Traversal 6949

cluster11 Memory Corruption 8156

cluster12 SQL Injection 8067

average 7851

computing framework We found that the proportion of thesetwo types of vulnerabilities is too small (cluster 8 is 34and cluster 10 is 42) making our computing frameworkvery likely to be under-fit with insufficient data Howeverthe proposed computing framework performs well on mosttypes of vulnerabilities in an unsupervised manner especiallygiven sufficient samples (eg cluster 7 is 176 and cluster 5is 157)

In addition by examining the clustering results we havean observation that the vulnerabilities in the same cluster arelikely to have evolutionary relationships For instance CVE-2018-0802 an office zero-day vulnerability is evolved fromthe CVE-2017-11882 They both include EQNEDT32exe fileused to edit the formula in Office software which allowsremote attackers to execute arbitrary codes by constructinga malformed font name The modeling and computation ofinterdependent relationships among IOCs in HINTI facilitatethe discovery of such intricate connections between vulnera-bilities

In summary HINTI is capable of depicting a more compre-hensive threat landscape and the proposed CTI computingframework has the ability to bring novel security insightstoward different real-world security applications Howeverthere are still numerous opportunities for enhancing thesesecurity applications Specifically for attack preference mod-eling although each individual IP address is treated as anattacker we cannot determine whether it belongs to a realattacker or is disguised by a proxy Fortunately even if thereal attack address cannot be captured understanding the at-tack preferences of these IP proxies which are widely usedin cybercrime is also meaningful for gaining insight into thecyber threats For vulnerability similarity analysis data imbal-

252 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

(a) AV DV T AT (P13) (b) V DV T (P10) (c) V DPDTV T (P16)

Figure 9 Illustration of the vulnerability similarity analysis based on different meta-paths in which vulnerability i can be reducedinto a two-dimensional space (xiyi) and each cluster indicates a particular type of vulnerability

ance issue affects the performance of model and inadequatetraining samples often result in model underfitting as shownin the case of cluster 8 and cluster 10

7 Related Work

Cyber Treat Intelligence An increasing number of securityvendors and researchers start exploring CTI for protectingsystem security and defending against new threat vectors [28]Existing CTI extraction tools such as IBM X-Force12 Threatcrowd13 Openctiio14 AlienVault15 CleanMX16 and Phish-Tank17 use regular expression to synthesize IOC from thedescriptive texts However these methods often producehigh false positive rate by misjudging legitimate entities asIOCs [22]

Recently Balzarotti et al [2] develop a system to extractIOCs from web pages and identify malicious URLs fromJavaScript codes Sabottke et al [31] propose to detect po-tential vulnerability exploits by extracting and analyzing thetweets that contain ldquoCVErdquo keyword Liao et al [22] present atool iACE for automatically extracting IOCs which excelsat processing technology articles Nevertheless iACE identi-fies IOCs from a single article which does not consider therich semantic information from multi-source texts Zhao etal [46] define different ontologies to describe the relationshipbetween entities based on expert knowledge Numerous popu-lar CTI platforms including IODEF [9] STIX [3] TAXII [40]OpenIOC [13] and CyBox [19] focus on extracting and shar-ing CTI Yet none of the existing approaches could uncoverthe interdependent relations among CTIs extracted from multi-source texts let alone quantifying CTIsrsquo relevance and miningvaluable threat intelligence hidden behind the isolated CTIs

12httpsexchangexforceibmcloudcom13httpswwwthreatcrowdorg14httpsdemoopenctiio15 httpsotxalienvaultcom16httplistclean-mxcom17httpswwwphishtankcom

Heterogeneous Information Network Real-world systemsoften contain a large number of interacting multi-typed ob-jects which can naturally be expressed as a heterogeneousinformation network (HIN) HIN as a conceptual graph repre-sentation can effectively fuse information and exploit richersemantics in interacting objects and links [37] HIN has beenapplied to network traffic analysis [38] public social mediadata analysis [45] and large-scale document analysis [41]Recent applications of HIN include mobile malware detec-tion [14] and opioid user identification [12] In this paper forthe first time we use HIN for CTI modelingGraph Convolutional Network Graph convolutional net-works (GCN) [17] has become an effective tool for address-ing the task of machine learning on graphs such as semi-supervised node classification [17] event classification [29]clustering [8] link prediction [27] and recommended sys-tem [44] Given a graph GCN can directly conduct the con-volutional operation on the graph to learn the nonlinear em-bedding of nodes In our work to discern and reveal the inter-active relationships between IOCs we utilize GCN to learnmore discriminative representation from attributes and graphstructure simultaneously which is the premise for threat intel-ligence computing

8 Discussion

Data Availability The proposed framework assumes thatsufficient threat description data can be obtained for generat-ing comprehensive and the latest CTIs Fortunately with thegrowing prosperity of social media an increasing number ofsecurity-related data (eg blogs posts news and open secu-rity databases) can be collected effortlessly To automaticallycollect security-related data we develop a crawler system tocollect threat description data from 73 international securitysources (eg blogs hacker forum posts security bulletinsetc) providing sufficient raw materials for generating cyberthreat intelligenceModel Extensibility In this paper 6 types of IOCs and 9

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 253

types of relationships are modeled in HINTI However ourproposed framework is extensible in which more types ofIOCs and relationships can be enrolled to represent richer andmore comprehensive threat information such as maliciousdomains phishing Emails attack tools their interactions etcHigh-level Semantic Relations In view of the computa-tional complexity of the model our threat intelligence comput-ing method focues on utilizing the meta-paths to quantify thesimilarity between IOCs while ignoring the influence of themeta-graph on it which inevitably misses characterizing somehigh-level semantic information Nevertheless the proposedcomputing framework introduces an attention mechanism tolearn the signification of different meta-paths to character-ize IOCs and their interactive relationships which effectivelycompensates for the performance degradation caused by ig-noring the meta-graphsSecurity Knowledge Reasoning Although our proposedframework exhibits promising results in CTI extraction andmodeling computing how to implement advanced securityknowledge reasoning and prediction is still an open problemeg it remains challenging to predict whether a vulnerabilitycould potentially affect a particular type of devices in thefuture We will investigate this problem in the future

9 Conclusion

This work explores a new direction of threat intelligence com-puting which aims to uncover new knowledge in the relation-ships among different threat vectors We propose HINTI acyber threat intelligence framework to model and quantify theinterdependent relationships among different types of IOCsby leveraging heterogeneous graph convolutional networksWe develop a multi-granular attention mechanism to learnthe importance of different features and model the interde-pendent relationships among IOCs using HIN We proposethe concept of threat intelligence computing and present ageneral intelligence computing framework based on graphconvolutional networks Experimental results demonstratethat the proposed multi-granular attention based IOC extrac-tion method outperforms the existing state-of-the-art methodsThe proposed threat intelligence computing framework caneffectively mine security knowledge hidden in the interdepen-dent relationships among IOCs which enables crucial threatintelligence applications such as threat profiling and rankingattack preference modeling and vulnerability similarity analy-sis We would like to emphasize that the knowledge discoveryamong interdependent CTIs is a new field that calls for acollaborative effort from security experts and data scientists

In future we plan to develop a predicative and reasoningmodel based on HINTI and explore preventative countermea-sures to protect cyber infrastructure from future threats Wealso plan to add more types of IOCs and relations to depicta more comprehensive threat landscape Moreover we willleverage both meta-paths and meta-graphs to characterize the

IOCs and their interactions to further improve the embeddingperformance and to strike a balance between the accuracyand computational complexity of the model We will alsoinvestigate the feasibility of security knowledge predictionbased on HINTI to infer the potential future relationshipsbetween the vulnerabilities and devices

Acknowledgement

We would like to thank our shepherd Tobias Fiebig andthe anonymous reviewers for providing valuable feedbackon our work We also thank Hao Peng and Lichao Sunfor their feedback on the early version of this work Thiswork was supported in part by National Science Founda-tion grants CNS1950171 CNS-1949753 It was also sup-ported by the NSFC for Innovative Research Group ScienceFund Project (62141003) National Key RampD Program China(2018YFB0803 503) the 2018 joint Research Foundationof Ministry of Education China Mobile (MCM20180507)and the Opening Project of Shanghai Trusted Industrial Con-trol Platform (TICPSH202003020-ZC) Any opinions find-ings and conclusions or recommendations expressed in thismaterial do not necessarily reflect the views of any fundingagencies

References

[1] Jure Leskovec Aditya Grover node2vec Scalable fea-ture learning for networks In Acm Sigkdd InternationalConference on Knowledge Discovery Data Mining2016

[2] Marco Balduzzi Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications In WWW 2016

[3] Sean Barnum Standardizing cyber threat intelligenceinformation with the structured threat information ex-pression (stix) Mitre Corporation 111ndash22 2012

[4] Eric W Burger Michael D Goodman Panos Kam-panakis and Kevin A Zhu Taxonomy model for cyberthreat intelligence information exchange technologiesIn Proceedings of the 2014 ACM Workshop on Infor-mation Sharing amp Collaborative Security pages 51ndash602014

[5] Onur Catakoglu Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications The web conference pages 333ndash3432016

[6] Danqi Chen and Christopher D Manning A fast and ac-curate dependency parser using neural networks pages740ndash750 2014

254 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

[7] Qian Chen and Robert A Bridges Automated behav-ioral analysis of malware a case study of wannacry ran-somware In 16th IEEE ICMLA pages 454ndash460 2017

[8] Weilin Chiang Xuanqing Liu Si Si Yang Li SamyBengio and Chojui Hsieh Cluster-gcn An efficient al-gorithm for training deep and large graph convolutionalnetworks knowledge discovery and data mining pages257ndash266 2019

[9] Roman Danyliw Jan Meijer and Yuri Demchenko Theincident object description exchange format Interna-tional Journal of High Performance Computing Appli-cations 50701ndash92 2007

[10] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 2017

[11] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 ACM 2017

[12] Yujie Fan Yiming Zhang Yanfang Ye and Xin Li Au-tomatic opioid user detection from twitter Transductiveensemble built on different meta-graph based similari-ties over heterogeneous information network In IJCAIpages 3357ndash3363 2018

[13] Fireeye Openioc httpswwwfireeyecomblogthreat-research201310openioc-basicshtml Accessed January 20 2020

[14] Shifu Hou Yanfang Ye Yangqiu Song and Melih Ab-dulhayoglu Hindroid An intelligent android malwaredetection system based on structured heterogeneous in-formation network In Proceedings of the 23rd ACMSIGKDD International Conference on Knowledge Dis-covery and Data Mining pages 1507ndash1515 2017

[15] Zhiheng Huang Wei Xu and Kai Yu Bidirectional lstm-crf models for sequence tagging arXiv1508019912015

[16] Michael D Iannacone Shawn Bohn Grant NakamuraJohn Gerth Kelly MT Huffer Robert A Bridges Erik MFerragut and John R Goodall Developing an ontologyfor cyber security knowledge graphs CISR 1512 2015

[17] Thomas Kipf and Max Welling Semi-supervised clas-sification with graph convolutional networks arXivLearning 2016

[18] Thomas N Kipf and Max Welling Semi-supervisedclassification with graph convolutional networks arXivpreprint arXiv160902907 2016

[19] Tero Kokkonen Architecture for the cyber security situ-ational awareness system In Internet of Things SmartSpaces and Next Generation Networks and Systemspages 294ndash302 Springer 2016

[20] Mehmet Necip Kurt Yasin Yılmaz and Xiaodong WangDistributed quickest detection of cyber-attacks in smartgrid IEEE Transactions on Information Forensics andSecurity 13(8)2015ndash2030 2018

[21] John Lafferty Andrew Mccallum and Fernando PereiraConditional random fields Probabilistic models for seg-menting and labeling sequence data ICML pages 282ndash289 2001

[22] Xiaojing Liao Yuan Kan Xiao Feng Wang Li Zhouand Raheem Beyah Acing the ioc game Toward auto-matic discovery and analysis of open-source cyber threatintelligence In ACM Sigsac Conference on ComputerCommunications Security 2016

[23] Rob Mcmillan Open threat intelligencehttpwwwgartnercomdoc2487216definition-threat-intelligence AccessedJanuary 20 2020

[24] Tomas Mikolov Kai Chen Greg Corrado and JeffreyDean Efficient estimation of word representations invector space arXiv preprint arXiv13013781 2013

[25] Sudip Mittal Prajit Kumar Das Varish Mulwad Anu-pam Joshi and Tim Finin Cybertwitter Using twitterto generate alerts for cybersecurity threats and vulnera-bilities In Proceedings of the 2016 IEEE Advances inSocial Networks Analysis and Mining pages 860ndash8672016

[26] Eric Nunes Ahmad Diab Andrew Gunn EricssonMarin Vineet Mishra Vivin Paliath John RobertsonJana Shakarian Amanda Thart and Paulo ShakarianDarknet and deepnet mining for proactive cybersecuritythreat intelligence In 2016 IEEE ISI pages 7ndash12 2016

[27] Shirui Pan Ruiqi Hu Guodong Long Jing Jiang LinaYao and Chengqi Zhang Adversarially regularizedgraph autoencoder for graph embedding pages 2609ndash2615 2018

[28] P Pawlinski P Jaroszewski P Kijewski L SiewierskiP Jacewicz P Zielony and R Zuber Actionable infor-mation for security incident response European UnionAgency for Network and Information Security Herak-lion Greece 2014

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 255

[29] Hao Peng Jianxin Li Qiran Gong Yangqiu SongYuanxing Ning Kunfeng Lai and Philip S Yu Fine-grained event categorization with heterogeneous graphconvolutional networks In Proceedings of the 28th In-ternational Joint Conference on Artificial Intelligencepages 3238ndash3245 AAAI Press 2019

[30] Sara Qamar Zahid Anwar Mohammad Ashiqur Rah-man Ehab Al-Shaer and Bei-Tseng Chu Data-drivenanalytics for cyber-threat intelligence and informationsharing Computers amp Security 6735ndash58 2017

[31] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In USENIXSecurity 2015

[32] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In 24thUSENIX Security pages 1041ndash1056 2015

[33] Deepak Sharma and Avadhesha Surolia Degree Cen-trality Springer New York 2013

[34] Saurabh Singh Pradip Kumar Sharma Seo Yeon MoonDaesung Moon and Jong Hyuk Park A comprehensivestudy on apt attacks and countermeasures for futurenetworks and communications challenges and solutionsJournal of Supercomputing pages 1ndash32 2016

[35] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks Principles and methodologiesSynthesis Lectures on Data Mining and Knowledge Dis-covery 3(2)1ndash159 2012

[36] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks a structural analysis approachACM SIGKDD 14(2)20ndash28 2013

[37] Yizhou Sun Jiawei Han Xifeng Yan Philip S Yu andTianyi Wu Pathsim Meta path-based top-k similaritysearch in heterogeneous information networks Proceed-ings of the VLDB Endowment 4(11)992ndash1003 2011

[38] Yizhou Sun Brandon Norick Jiawei Han Xifeng YanPhilip S Yu and Xiao Yu Pathselclus Integrating meta-path selection with user-guided object clustering in het-erogeneous information networks ACM Transactionson TKDD 7(3)11 2013

[39] Wiem Tounsi and Helmi Rais A survey on technicalthreat intelligence in the age of sophisticated cyber at-tacks Computers amp Security 72212ndash233 2018

[40] Thomas D Wagner Esther Palomar Khaled Mahbuband Ali E Abdallah Towards an anonymity supportedplatform for shared cyber threat intelligence risks andsecurity of internet and systems pages 175ndash183 2017

[41] Chenguang Wang Yangqiu Song Haoran Li MingZhang and Jiawei Han Text classification with het-erogeneous information network kernels In 13th AAAI2016

[42] Xiao Wang Houye Ji Chuan Shi Bai Wang Peng CuiP Yu and Yanfang Ye Heterogeneous graph attentionnetwork 2019

[43] Jie Yang Shuailong Liang and Yue Zhang Design chal-lenges and misconceptions in neural sequence labelingarXiv Computation and Language 2018

[44] Rex Ying Ruining He Kaifeng Chen Pong Eksombat-chai William L Hamilton and Jure Leskovec Graphconvolutional neural networks for web-scale recom-mender systems In Proceedings of the 24th ACMSIGKDD International Conference on Knowledge Dis-covery amp Data Mining pages 974ndash983 ACM 2018

[45] Jiawei Zhang Xiangnan Kong and Philip S Yu Trans-ferring heterogeneous links across location-based socialnetworks In The 7th ACM international conference onWeb search and data mining pages 303ndash312 2014

[46] Yishuai Zhao Bo Lang and Ming Liu Ontology-basedunified model for heterogeneous threat intelligence inte-gration and sharing In 2017 11th IEEE InternationalConference on Anti-counterfeiting Security and Identi-fication (ASID) pages 11ndash15 2017

256 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

  • Introduction
  • Background
    • Cyber Threat Intelligence
    • Motivation
    • Preliminaries
      • Architecture Overview of HINTI
      • Methodology
        • Multi-granular Attention Based IOC Extraction
        • Cyber Threat Intelligence Modeling
        • Threat Intelligence Computing
          • Experimental Evaluation
            • Dataset and Settings
            • Evaluation of IOC Extraction
              • Application of Threat Intelligence Computing
                • Threat Profiling and Significance Ranking of IOCs
                • Attack Preference Modeling
                • Vulnerability Similarity Analysis
                  • Related Work
                  • Discussion
                  • Conclusion
Page 10: Cyber Threat Intelligence Modeling Based on Heterogeneous ... · 2.1Cyber Threat Intelligence Cyber Threat Intelligence (CTI) extracted from security-related data is structured information

Table 4 Performance of threat entity recognition usingdifferent methods

Method Accuracy Precision Micro-F1

NLTK NER 6945 6851 6749

Stanford NER 6835 6674 6858

iACE 9214 9126 9225

Stucco 9116 9221 9147

CRF 9264 9180 9265

BiLSTM 9478 9521 9435

BiLSTM+CRF 9638 9642 9627

Multi-granular 9859 9872 9869

trained with general news corpora our method uses a security-related training corpus collected and labeled by ourselves asa data source for training our model Second different fromthe rule-based extraction approaches (eg iACE and Stucco)our proposed deep learning based method provides an end-to-end system with more advanced features to represent variousIOCs Third comparing to RNN-based methods (eg BiL-STM and BiLSTM+CRF) our proposed method brings inmulti-granular embedding sizes (char-level 1-gram 2-gramand 3-gram) to simultaneously learn the characteristics of vari-ous sizes and types of IOCs which can identify more complexand irregular IOCs Last but not the least our method imple-ments an attention mechanism to learn the weights of featureswith various scales to effectively characterize different typesof IOCs further enhancing the IOC recognition accuracy

6 Application of Threat Intelligence Comput-ing

Our proposed threat intelligence computing framework basedon heterogeneous graph convolutional networks can be usedto mine novel security knowledge behind heterogeneous IOCsIn this section we evaluate its effectiveness and applicabilityusing three real-world applications profiling and ranking forCTIs attack preference modeling and vulnerability similarityanalysis

61 Threat Profiling and Significance Rankingof IOCs

Due to the disparity in the significance of threats it is im-portant to derive the threat profile and rank the significanceof IOCs for demystifying the landscape of threats Howevermost of the existing CTIs are incapable of modeling the asso-ciated relationships between heterogeneous IOCs

Different from isolated CTIs HINTI leverages HIN to

model the interdependent relationships among IOCs withtwo characteristics first the isolated IOCs can be integratedinto a graph-based HIN to clearly display the associated rela-tionships among IOCs which is capable of directly depictingthe basic threat profile For example Figure 3 depicts a threatprofiling sample an attacker utilizes CVE-2017-0143 vulnera-bility to invade Vista SP2 and Win7 SP1 devices belonging tothe Microsoft platform and CVE-2017-0143 is a remote codeexecution vulnerability that uses a ldquoSMBbat malicious fileSecond the significance of IOCs in HINTI can be naturallyranked based on the proposed threat intelligence computingframework

Table 5 shows the top 5 authoritative ranking score [35] ofvulnerability attacker attack type and platform from whichsecurity experts can gain a clear insight into the impact ofeach IOC Degree centrality [33] which describes the numberof links incident upon a node is widely used in evaluatingthe importance of a node in a graph It can used to quantifythe immediate risk of a node that connects with other nodesfor delivering network flows such as virus spreading Heredegree centrality can be utilized in verifying the effectivenessof the proposed threat intelligence computing framework inranking the importance of IOCs It is worth noting that bothour ranking method and degree centrality work regardless ofthe time of attacks We compute the degree centrality rank-ing of IOCs based on the fact that the node with a higherdegree centrality is more important than a node with a lowerone For instance if the degree centrality of a vulnerabilityis higher it indicates that this vulnerability is exploited bymore attackers or it affects more devices The ranking resultof degree centrality shown in Table 5 is consistent with theranking result based on the proposed threat intelligence com-puting framework demonstrating the capability of the CTIcomputing framework in ranking the importance of differenttypes of IOCs

62 Attack Preference Modeling

Attack preference modeling is meaningful for security organi-zations to gain insight into the attack intention of attackersbuild attack portraits and develop personalized defense strate-gies Here we leverage HINTI to integrate different types ofIOCs and their interdependent relationships to comprehen-sively depict the picture of attack events which helps modelthe attack preferences With the proposed threat intelligencecomputing framework we model attack preferences by clus-tering the embedded attackersrsquo vectors

In this task each malicious IP address is treated as an in-truder and its attack preferences are mainly reflected in threefeatures including the platforms it destroys (including Win-dows Linux Unix ASP Android Apache etc) the industriesit invades (eg education finance government Internet ofThings and Industrial control system etc) and the exploittypes it employs (eg DOS Buffer overflow Execute code

250 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

Table 5 The significance ranking of different types of IOCs (CV E1 CVE-2017-0146 CV E2 CVE-2006-5911 CV E3 CVE-2008-6543 CV E4 CVE-2012-1199 CV E4 CVE-2006-4985 AR Authoritative Ranking DC Degree Centrality value)

Vulnerability Attacker Platform Attack Type

No AR DC Monicker AR DC Category AR DC Exploit_type AR DC

CV E1 02713 7643 Meatsploit 02764 549 PHP 04562 17865 Webapps 05494 11648

CV E2 02431 7124 GSR team 01391 327 windows 02242 13793 DOS 01772 8741

CV E3 02132 6833 Ihsan 00698 279 Linux 00736 8792 Overflow 01533 7652

CV E4 01826 6145 Techsa 00695 247 Linux86 00623 8147 CSRF 00966 5433

CV E5 01739 5637 Aurimma 00622 204 ASP 00382 5027 SQL 00251 2171

(a) AV DV T AT (P13) (b) AV DPDTV T AT (P17) (c) AV PV T AT (P14)

Figure 8 The performance of attack preference modeling with different meta-paths in which the preference of attacker i isreduced to a two-dimensional space (xiyi) and each cluster represents a group with a specific attack preference

Sql injection XSS Gain information etc)Specifically we first utilize our proposed threat intelligence

computing framework to embed each attacker into a low-dimensional vector space and then perform DBSCAN algo-rithm on the embedded vector to cluster attackers with thesame preferences into corresponding groups Figure 8 showsthe top 3 clustering results under different types of meta-paths in which the meta-path AV DPDTV T AT (P17) performsthe best performance with compact and well-separated clus-ters indicating that it contains richer semantic relationshipsfor characterizing attack preferences than other meta-paths

To verify the effectiveness of attack preference modelingwe identify 5297 distinct attackers (each unique IP address istreated as an attacker) who have submitted at least 10 cyberattacks For these attackers five cybersecurity researchersconsisting of three doctoral and two master students spentabout fortnight to manually annotate their attack preferencesfrom three perspectives the platforms they destroyed the in-dustries they attacked and the attack types they exploited Toensure the accuracy of data labeling we test the consistencyof the tags for the 5297 attackers and remove the sampleswith ambiguous tags As a result we obtain 3000 sampleswith consistent tags Based on the labeled samples we furtherevaluate the performance of different meta-paths on model-

Table 6 Performance of modeling attack preference withdifferent meta-paths

Metapath Accuracy Precision Micro-F1

P1 7431 7622 7525

P4 7116 7327 7216

P5 6915 7143 7027

P12 7214 7646 7424

P13 7965 8131 8047

P14 7748 7934 7840

P15 8017 7976 7996

P17 8139 8172 8155

ing attack preferences In the attack modeling scenario weonly focus on the meta-paths that both the start node and theend node are attackers The experimental results are demon-strated in Table 6 Obviously different meta-paths displaydifferent abilities in characterizing the attack preferences ofcyber intruders The performance when using P17 is more

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 251

superior than the one with other meta-paths which indicatesthat P17 holds more valuable information that characterizesthe attack preferences of cybercriminals since P17 includesthe semantics information of P1P4P5 and P12 sim P15

In addition we compare the capabilities of our proposedcomputing framework with those of other state-of-the-art em-bedding methods in terms of attack preference modelingOur analysis result shows that the accuracy of attack pref-erence modeling reaches 081 which outperforms the exist-ing popular models Node2vec (with precision of 071) [1]metapaht2vec (with precision of 073) [11] and HAN (withprecision of 076) [42] The performance improvement canbe attributed to the following characteristics First our com-puting framework utilizes weight-learning to learn the signifi-cance of different meta-paths for evaluating the similarity be-tween attackers Second the proposed computing frameworkleverages GCN to learn the structural information betweenattackers to obtain more discriminative structural features thatimproves the performance of attack preference modeling

63 Vulnerability Similarity AnalysisVulnerability classification or clustering is crucial for conduct-ing vulnerability trend analysis the correlation analysis ofincidents and exploits and the evaluation of countermeasuresThe traditional vulnerability analysis relies on the manualinvestigation of the source codes which requires expert ex-pertise and consumes considerable efforts In this sectionwe propose an unsupervised vulnerability similarity analy-sis method based on the proposed threat intelligence com-puting framework which can automatically group similarvulnerabilities into corresponding communities Particularlythe vulnerability-related IOCs are first embedded into a low-dimensional vector space using CTI computing frameworkThen the DBSCAN algorithm is performed on the embed-ded vector space to cluster vulnerabilities into correspondingcommunities The clustering results are presented in Figure 9

Figure 9 (c) shows all vulnerabilities are clustered into 12clusters using meta-path V DPDTV T (P16) which is very closeto the classification standard (ie 13) recommended by CVEdetails an authoritative database that publishes vulnerabilityinformation By manually analyzing the training samples wefind that HTTP Response Splitting vulnerability does not ap-pear in our dataset Therefore our cluster number (ie 12)is consistent with CVE Details11 To further validate the ef-fectiveness of threat intelligence computing framework forvulnerability clustering we randomly select 100 vulnerabili-ties from each cluster for manual inspection to measure theconsistency of the vulnerability types in each cluster and theresults are presented in Table 7 Obviously the clustering per-formance of cluster 8 (ie File Inclusion) and cluster 10 (ieDirectory Traversal) is remarkably worse than other clustersTo explain the reason we examine our training data and the

11httpswwwcvedetailscom

Table 7 Accuracy of vulnerability clustering

Cluster ID Vulnerability type Accuracy

cluster1 Denial of Service 8012

cluster2 XSS 8353

cluster3 Execute Code 8150

cluster4 Overflow 7650

cluster5 Gain Privilege 9156

cluster6 Bypass Something 7174

cluster7 CSRF 9327

cluster8 File Inclusion 6172

cluster9 Gain Informa 7042

cluster10 Directory Traversal 6949

cluster11 Memory Corruption 8156

cluster12 SQL Injection 8067

average 7851

computing framework We found that the proportion of thesetwo types of vulnerabilities is too small (cluster 8 is 34and cluster 10 is 42) making our computing frameworkvery likely to be under-fit with insufficient data Howeverthe proposed computing framework performs well on mosttypes of vulnerabilities in an unsupervised manner especiallygiven sufficient samples (eg cluster 7 is 176 and cluster 5is 157)

In addition by examining the clustering results we havean observation that the vulnerabilities in the same cluster arelikely to have evolutionary relationships For instance CVE-2018-0802 an office zero-day vulnerability is evolved fromthe CVE-2017-11882 They both include EQNEDT32exe fileused to edit the formula in Office software which allowsremote attackers to execute arbitrary codes by constructinga malformed font name The modeling and computation ofinterdependent relationships among IOCs in HINTI facilitatethe discovery of such intricate connections between vulnera-bilities

In summary HINTI is capable of depicting a more compre-hensive threat landscape and the proposed CTI computingframework has the ability to bring novel security insightstoward different real-world security applications Howeverthere are still numerous opportunities for enhancing thesesecurity applications Specifically for attack preference mod-eling although each individual IP address is treated as anattacker we cannot determine whether it belongs to a realattacker or is disguised by a proxy Fortunately even if thereal attack address cannot be captured understanding the at-tack preferences of these IP proxies which are widely usedin cybercrime is also meaningful for gaining insight into thecyber threats For vulnerability similarity analysis data imbal-

252 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

(a) AV DV T AT (P13) (b) V DV T (P10) (c) V DPDTV T (P16)

Figure 9 Illustration of the vulnerability similarity analysis based on different meta-paths in which vulnerability i can be reducedinto a two-dimensional space (xiyi) and each cluster indicates a particular type of vulnerability

ance issue affects the performance of model and inadequatetraining samples often result in model underfitting as shownin the case of cluster 8 and cluster 10

7 Related Work

Cyber Treat Intelligence An increasing number of securityvendors and researchers start exploring CTI for protectingsystem security and defending against new threat vectors [28]Existing CTI extraction tools such as IBM X-Force12 Threatcrowd13 Openctiio14 AlienVault15 CleanMX16 and Phish-Tank17 use regular expression to synthesize IOC from thedescriptive texts However these methods often producehigh false positive rate by misjudging legitimate entities asIOCs [22]

Recently Balzarotti et al [2] develop a system to extractIOCs from web pages and identify malicious URLs fromJavaScript codes Sabottke et al [31] propose to detect po-tential vulnerability exploits by extracting and analyzing thetweets that contain ldquoCVErdquo keyword Liao et al [22] present atool iACE for automatically extracting IOCs which excelsat processing technology articles Nevertheless iACE identi-fies IOCs from a single article which does not consider therich semantic information from multi-source texts Zhao etal [46] define different ontologies to describe the relationshipbetween entities based on expert knowledge Numerous popu-lar CTI platforms including IODEF [9] STIX [3] TAXII [40]OpenIOC [13] and CyBox [19] focus on extracting and shar-ing CTI Yet none of the existing approaches could uncoverthe interdependent relations among CTIs extracted from multi-source texts let alone quantifying CTIsrsquo relevance and miningvaluable threat intelligence hidden behind the isolated CTIs

12httpsexchangexforceibmcloudcom13httpswwwthreatcrowdorg14httpsdemoopenctiio15 httpsotxalienvaultcom16httplistclean-mxcom17httpswwwphishtankcom

Heterogeneous Information Network Real-world systemsoften contain a large number of interacting multi-typed ob-jects which can naturally be expressed as a heterogeneousinformation network (HIN) HIN as a conceptual graph repre-sentation can effectively fuse information and exploit richersemantics in interacting objects and links [37] HIN has beenapplied to network traffic analysis [38] public social mediadata analysis [45] and large-scale document analysis [41]Recent applications of HIN include mobile malware detec-tion [14] and opioid user identification [12] In this paper forthe first time we use HIN for CTI modelingGraph Convolutional Network Graph convolutional net-works (GCN) [17] has become an effective tool for address-ing the task of machine learning on graphs such as semi-supervised node classification [17] event classification [29]clustering [8] link prediction [27] and recommended sys-tem [44] Given a graph GCN can directly conduct the con-volutional operation on the graph to learn the nonlinear em-bedding of nodes In our work to discern and reveal the inter-active relationships between IOCs we utilize GCN to learnmore discriminative representation from attributes and graphstructure simultaneously which is the premise for threat intel-ligence computing

8 Discussion

Data Availability The proposed framework assumes thatsufficient threat description data can be obtained for generat-ing comprehensive and the latest CTIs Fortunately with thegrowing prosperity of social media an increasing number ofsecurity-related data (eg blogs posts news and open secu-rity databases) can be collected effortlessly To automaticallycollect security-related data we develop a crawler system tocollect threat description data from 73 international securitysources (eg blogs hacker forum posts security bulletinsetc) providing sufficient raw materials for generating cyberthreat intelligenceModel Extensibility In this paper 6 types of IOCs and 9

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 253

types of relationships are modeled in HINTI However ourproposed framework is extensible in which more types ofIOCs and relationships can be enrolled to represent richer andmore comprehensive threat information such as maliciousdomains phishing Emails attack tools their interactions etcHigh-level Semantic Relations In view of the computa-tional complexity of the model our threat intelligence comput-ing method focues on utilizing the meta-paths to quantify thesimilarity between IOCs while ignoring the influence of themeta-graph on it which inevitably misses characterizing somehigh-level semantic information Nevertheless the proposedcomputing framework introduces an attention mechanism tolearn the signification of different meta-paths to character-ize IOCs and their interactive relationships which effectivelycompensates for the performance degradation caused by ig-noring the meta-graphsSecurity Knowledge Reasoning Although our proposedframework exhibits promising results in CTI extraction andmodeling computing how to implement advanced securityknowledge reasoning and prediction is still an open problemeg it remains challenging to predict whether a vulnerabilitycould potentially affect a particular type of devices in thefuture We will investigate this problem in the future

9 Conclusion

This work explores a new direction of threat intelligence com-puting which aims to uncover new knowledge in the relation-ships among different threat vectors We propose HINTI acyber threat intelligence framework to model and quantify theinterdependent relationships among different types of IOCsby leveraging heterogeneous graph convolutional networksWe develop a multi-granular attention mechanism to learnthe importance of different features and model the interde-pendent relationships among IOCs using HIN We proposethe concept of threat intelligence computing and present ageneral intelligence computing framework based on graphconvolutional networks Experimental results demonstratethat the proposed multi-granular attention based IOC extrac-tion method outperforms the existing state-of-the-art methodsThe proposed threat intelligence computing framework caneffectively mine security knowledge hidden in the interdepen-dent relationships among IOCs which enables crucial threatintelligence applications such as threat profiling and rankingattack preference modeling and vulnerability similarity analy-sis We would like to emphasize that the knowledge discoveryamong interdependent CTIs is a new field that calls for acollaborative effort from security experts and data scientists

In future we plan to develop a predicative and reasoningmodel based on HINTI and explore preventative countermea-sures to protect cyber infrastructure from future threats Wealso plan to add more types of IOCs and relations to depicta more comprehensive threat landscape Moreover we willleverage both meta-paths and meta-graphs to characterize the

IOCs and their interactions to further improve the embeddingperformance and to strike a balance between the accuracyand computational complexity of the model We will alsoinvestigate the feasibility of security knowledge predictionbased on HINTI to infer the potential future relationshipsbetween the vulnerabilities and devices

Acknowledgement

We would like to thank our shepherd Tobias Fiebig andthe anonymous reviewers for providing valuable feedbackon our work We also thank Hao Peng and Lichao Sunfor their feedback on the early version of this work Thiswork was supported in part by National Science Founda-tion grants CNS1950171 CNS-1949753 It was also sup-ported by the NSFC for Innovative Research Group ScienceFund Project (62141003) National Key RampD Program China(2018YFB0803 503) the 2018 joint Research Foundationof Ministry of Education China Mobile (MCM20180507)and the Opening Project of Shanghai Trusted Industrial Con-trol Platform (TICPSH202003020-ZC) Any opinions find-ings and conclusions or recommendations expressed in thismaterial do not necessarily reflect the views of any fundingagencies

References

[1] Jure Leskovec Aditya Grover node2vec Scalable fea-ture learning for networks In Acm Sigkdd InternationalConference on Knowledge Discovery Data Mining2016

[2] Marco Balduzzi Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications In WWW 2016

[3] Sean Barnum Standardizing cyber threat intelligenceinformation with the structured threat information ex-pression (stix) Mitre Corporation 111ndash22 2012

[4] Eric W Burger Michael D Goodman Panos Kam-panakis and Kevin A Zhu Taxonomy model for cyberthreat intelligence information exchange technologiesIn Proceedings of the 2014 ACM Workshop on Infor-mation Sharing amp Collaborative Security pages 51ndash602014

[5] Onur Catakoglu Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications The web conference pages 333ndash3432016

[6] Danqi Chen and Christopher D Manning A fast and ac-curate dependency parser using neural networks pages740ndash750 2014

254 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

[7] Qian Chen and Robert A Bridges Automated behav-ioral analysis of malware a case study of wannacry ran-somware In 16th IEEE ICMLA pages 454ndash460 2017

[8] Weilin Chiang Xuanqing Liu Si Si Yang Li SamyBengio and Chojui Hsieh Cluster-gcn An efficient al-gorithm for training deep and large graph convolutionalnetworks knowledge discovery and data mining pages257ndash266 2019

[9] Roman Danyliw Jan Meijer and Yuri Demchenko Theincident object description exchange format Interna-tional Journal of High Performance Computing Appli-cations 50701ndash92 2007

[10] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 2017

[11] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 ACM 2017

[12] Yujie Fan Yiming Zhang Yanfang Ye and Xin Li Au-tomatic opioid user detection from twitter Transductiveensemble built on different meta-graph based similari-ties over heterogeneous information network In IJCAIpages 3357ndash3363 2018

[13] Fireeye Openioc httpswwwfireeyecomblogthreat-research201310openioc-basicshtml Accessed January 20 2020

[14] Shifu Hou Yanfang Ye Yangqiu Song and Melih Ab-dulhayoglu Hindroid An intelligent android malwaredetection system based on structured heterogeneous in-formation network In Proceedings of the 23rd ACMSIGKDD International Conference on Knowledge Dis-covery and Data Mining pages 1507ndash1515 2017

[15] Zhiheng Huang Wei Xu and Kai Yu Bidirectional lstm-crf models for sequence tagging arXiv1508019912015

[16] Michael D Iannacone Shawn Bohn Grant NakamuraJohn Gerth Kelly MT Huffer Robert A Bridges Erik MFerragut and John R Goodall Developing an ontologyfor cyber security knowledge graphs CISR 1512 2015

[17] Thomas Kipf and Max Welling Semi-supervised clas-sification with graph convolutional networks arXivLearning 2016

[18] Thomas N Kipf and Max Welling Semi-supervisedclassification with graph convolutional networks arXivpreprint arXiv160902907 2016

[19] Tero Kokkonen Architecture for the cyber security situ-ational awareness system In Internet of Things SmartSpaces and Next Generation Networks and Systemspages 294ndash302 Springer 2016

[20] Mehmet Necip Kurt Yasin Yılmaz and Xiaodong WangDistributed quickest detection of cyber-attacks in smartgrid IEEE Transactions on Information Forensics andSecurity 13(8)2015ndash2030 2018

[21] John Lafferty Andrew Mccallum and Fernando PereiraConditional random fields Probabilistic models for seg-menting and labeling sequence data ICML pages 282ndash289 2001

[22] Xiaojing Liao Yuan Kan Xiao Feng Wang Li Zhouand Raheem Beyah Acing the ioc game Toward auto-matic discovery and analysis of open-source cyber threatintelligence In ACM Sigsac Conference on ComputerCommunications Security 2016

[23] Rob Mcmillan Open threat intelligencehttpwwwgartnercomdoc2487216definition-threat-intelligence AccessedJanuary 20 2020

[24] Tomas Mikolov Kai Chen Greg Corrado and JeffreyDean Efficient estimation of word representations invector space arXiv preprint arXiv13013781 2013

[25] Sudip Mittal Prajit Kumar Das Varish Mulwad Anu-pam Joshi and Tim Finin Cybertwitter Using twitterto generate alerts for cybersecurity threats and vulnera-bilities In Proceedings of the 2016 IEEE Advances inSocial Networks Analysis and Mining pages 860ndash8672016

[26] Eric Nunes Ahmad Diab Andrew Gunn EricssonMarin Vineet Mishra Vivin Paliath John RobertsonJana Shakarian Amanda Thart and Paulo ShakarianDarknet and deepnet mining for proactive cybersecuritythreat intelligence In 2016 IEEE ISI pages 7ndash12 2016

[27] Shirui Pan Ruiqi Hu Guodong Long Jing Jiang LinaYao and Chengqi Zhang Adversarially regularizedgraph autoencoder for graph embedding pages 2609ndash2615 2018

[28] P Pawlinski P Jaroszewski P Kijewski L SiewierskiP Jacewicz P Zielony and R Zuber Actionable infor-mation for security incident response European UnionAgency for Network and Information Security Herak-lion Greece 2014

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 255

[29] Hao Peng Jianxin Li Qiran Gong Yangqiu SongYuanxing Ning Kunfeng Lai and Philip S Yu Fine-grained event categorization with heterogeneous graphconvolutional networks In Proceedings of the 28th In-ternational Joint Conference on Artificial Intelligencepages 3238ndash3245 AAAI Press 2019

[30] Sara Qamar Zahid Anwar Mohammad Ashiqur Rah-man Ehab Al-Shaer and Bei-Tseng Chu Data-drivenanalytics for cyber-threat intelligence and informationsharing Computers amp Security 6735ndash58 2017

[31] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In USENIXSecurity 2015

[32] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In 24thUSENIX Security pages 1041ndash1056 2015

[33] Deepak Sharma and Avadhesha Surolia Degree Cen-trality Springer New York 2013

[34] Saurabh Singh Pradip Kumar Sharma Seo Yeon MoonDaesung Moon and Jong Hyuk Park A comprehensivestudy on apt attacks and countermeasures for futurenetworks and communications challenges and solutionsJournal of Supercomputing pages 1ndash32 2016

[35] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks Principles and methodologiesSynthesis Lectures on Data Mining and Knowledge Dis-covery 3(2)1ndash159 2012

[36] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks a structural analysis approachACM SIGKDD 14(2)20ndash28 2013

[37] Yizhou Sun Jiawei Han Xifeng Yan Philip S Yu andTianyi Wu Pathsim Meta path-based top-k similaritysearch in heterogeneous information networks Proceed-ings of the VLDB Endowment 4(11)992ndash1003 2011

[38] Yizhou Sun Brandon Norick Jiawei Han Xifeng YanPhilip S Yu and Xiao Yu Pathselclus Integrating meta-path selection with user-guided object clustering in het-erogeneous information networks ACM Transactionson TKDD 7(3)11 2013

[39] Wiem Tounsi and Helmi Rais A survey on technicalthreat intelligence in the age of sophisticated cyber at-tacks Computers amp Security 72212ndash233 2018

[40] Thomas D Wagner Esther Palomar Khaled Mahbuband Ali E Abdallah Towards an anonymity supportedplatform for shared cyber threat intelligence risks andsecurity of internet and systems pages 175ndash183 2017

[41] Chenguang Wang Yangqiu Song Haoran Li MingZhang and Jiawei Han Text classification with het-erogeneous information network kernels In 13th AAAI2016

[42] Xiao Wang Houye Ji Chuan Shi Bai Wang Peng CuiP Yu and Yanfang Ye Heterogeneous graph attentionnetwork 2019

[43] Jie Yang Shuailong Liang and Yue Zhang Design chal-lenges and misconceptions in neural sequence labelingarXiv Computation and Language 2018

[44] Rex Ying Ruining He Kaifeng Chen Pong Eksombat-chai William L Hamilton and Jure Leskovec Graphconvolutional neural networks for web-scale recom-mender systems In Proceedings of the 24th ACMSIGKDD International Conference on Knowledge Dis-covery amp Data Mining pages 974ndash983 ACM 2018

[45] Jiawei Zhang Xiangnan Kong and Philip S Yu Trans-ferring heterogeneous links across location-based socialnetworks In The 7th ACM international conference onWeb search and data mining pages 303ndash312 2014

[46] Yishuai Zhao Bo Lang and Ming Liu Ontology-basedunified model for heterogeneous threat intelligence inte-gration and sharing In 2017 11th IEEE InternationalConference on Anti-counterfeiting Security and Identi-fication (ASID) pages 11ndash15 2017

256 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

  • Introduction
  • Background
    • Cyber Threat Intelligence
    • Motivation
    • Preliminaries
      • Architecture Overview of HINTI
      • Methodology
        • Multi-granular Attention Based IOC Extraction
        • Cyber Threat Intelligence Modeling
        • Threat Intelligence Computing
          • Experimental Evaluation
            • Dataset and Settings
            • Evaluation of IOC Extraction
              • Application of Threat Intelligence Computing
                • Threat Profiling and Significance Ranking of IOCs
                • Attack Preference Modeling
                • Vulnerability Similarity Analysis
                  • Related Work
                  • Discussion
                  • Conclusion
Page 11: Cyber Threat Intelligence Modeling Based on Heterogeneous ... · 2.1Cyber Threat Intelligence Cyber Threat Intelligence (CTI) extracted from security-related data is structured information

Table 5 The significance ranking of different types of IOCs (CV E1 CVE-2017-0146 CV E2 CVE-2006-5911 CV E3 CVE-2008-6543 CV E4 CVE-2012-1199 CV E4 CVE-2006-4985 AR Authoritative Ranking DC Degree Centrality value)

Vulnerability Attacker Platform Attack Type

No AR DC Monicker AR DC Category AR DC Exploit_type AR DC

CV E1 02713 7643 Meatsploit 02764 549 PHP 04562 17865 Webapps 05494 11648

CV E2 02431 7124 GSR team 01391 327 windows 02242 13793 DOS 01772 8741

CV E3 02132 6833 Ihsan 00698 279 Linux 00736 8792 Overflow 01533 7652

CV E4 01826 6145 Techsa 00695 247 Linux86 00623 8147 CSRF 00966 5433

CV E5 01739 5637 Aurimma 00622 204 ASP 00382 5027 SQL 00251 2171

(a) AV DV T AT (P13) (b) AV DPDTV T AT (P17) (c) AV PV T AT (P14)

Figure 8 The performance of attack preference modeling with different meta-paths in which the preference of attacker i isreduced to a two-dimensional space (xiyi) and each cluster represents a group with a specific attack preference

Sql injection XSS Gain information etc)Specifically we first utilize our proposed threat intelligence

computing framework to embed each attacker into a low-dimensional vector space and then perform DBSCAN algo-rithm on the embedded vector to cluster attackers with thesame preferences into corresponding groups Figure 8 showsthe top 3 clustering results under different types of meta-paths in which the meta-path AV DPDTV T AT (P17) performsthe best performance with compact and well-separated clus-ters indicating that it contains richer semantic relationshipsfor characterizing attack preferences than other meta-paths

To verify the effectiveness of attack preference modelingwe identify 5297 distinct attackers (each unique IP address istreated as an attacker) who have submitted at least 10 cyberattacks For these attackers five cybersecurity researchersconsisting of three doctoral and two master students spentabout fortnight to manually annotate their attack preferencesfrom three perspectives the platforms they destroyed the in-dustries they attacked and the attack types they exploited Toensure the accuracy of data labeling we test the consistencyof the tags for the 5297 attackers and remove the sampleswith ambiguous tags As a result we obtain 3000 sampleswith consistent tags Based on the labeled samples we furtherevaluate the performance of different meta-paths on model-

Table 6 Performance of modeling attack preference withdifferent meta-paths

Metapath Accuracy Precision Micro-F1

P1 7431 7622 7525

P4 7116 7327 7216

P5 6915 7143 7027

P12 7214 7646 7424

P13 7965 8131 8047

P14 7748 7934 7840

P15 8017 7976 7996

P17 8139 8172 8155

ing attack preferences In the attack modeling scenario weonly focus on the meta-paths that both the start node and theend node are attackers The experimental results are demon-strated in Table 6 Obviously different meta-paths displaydifferent abilities in characterizing the attack preferences ofcyber intruders The performance when using P17 is more

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 251

superior than the one with other meta-paths which indicatesthat P17 holds more valuable information that characterizesthe attack preferences of cybercriminals since P17 includesthe semantics information of P1P4P5 and P12 sim P15

In addition we compare the capabilities of our proposedcomputing framework with those of other state-of-the-art em-bedding methods in terms of attack preference modelingOur analysis result shows that the accuracy of attack pref-erence modeling reaches 081 which outperforms the exist-ing popular models Node2vec (with precision of 071) [1]metapaht2vec (with precision of 073) [11] and HAN (withprecision of 076) [42] The performance improvement canbe attributed to the following characteristics First our com-puting framework utilizes weight-learning to learn the signifi-cance of different meta-paths for evaluating the similarity be-tween attackers Second the proposed computing frameworkleverages GCN to learn the structural information betweenattackers to obtain more discriminative structural features thatimproves the performance of attack preference modeling

63 Vulnerability Similarity AnalysisVulnerability classification or clustering is crucial for conduct-ing vulnerability trend analysis the correlation analysis ofincidents and exploits and the evaluation of countermeasuresThe traditional vulnerability analysis relies on the manualinvestigation of the source codes which requires expert ex-pertise and consumes considerable efforts In this sectionwe propose an unsupervised vulnerability similarity analy-sis method based on the proposed threat intelligence com-puting framework which can automatically group similarvulnerabilities into corresponding communities Particularlythe vulnerability-related IOCs are first embedded into a low-dimensional vector space using CTI computing frameworkThen the DBSCAN algorithm is performed on the embed-ded vector space to cluster vulnerabilities into correspondingcommunities The clustering results are presented in Figure 9

Figure 9 (c) shows all vulnerabilities are clustered into 12clusters using meta-path V DPDTV T (P16) which is very closeto the classification standard (ie 13) recommended by CVEdetails an authoritative database that publishes vulnerabilityinformation By manually analyzing the training samples wefind that HTTP Response Splitting vulnerability does not ap-pear in our dataset Therefore our cluster number (ie 12)is consistent with CVE Details11 To further validate the ef-fectiveness of threat intelligence computing framework forvulnerability clustering we randomly select 100 vulnerabili-ties from each cluster for manual inspection to measure theconsistency of the vulnerability types in each cluster and theresults are presented in Table 7 Obviously the clustering per-formance of cluster 8 (ie File Inclusion) and cluster 10 (ieDirectory Traversal) is remarkably worse than other clustersTo explain the reason we examine our training data and the

11httpswwwcvedetailscom

Table 7 Accuracy of vulnerability clustering

Cluster ID Vulnerability type Accuracy

cluster1 Denial of Service 8012

cluster2 XSS 8353

cluster3 Execute Code 8150

cluster4 Overflow 7650

cluster5 Gain Privilege 9156

cluster6 Bypass Something 7174

cluster7 CSRF 9327

cluster8 File Inclusion 6172

cluster9 Gain Informa 7042

cluster10 Directory Traversal 6949

cluster11 Memory Corruption 8156

cluster12 SQL Injection 8067

average 7851

computing framework We found that the proportion of thesetwo types of vulnerabilities is too small (cluster 8 is 34and cluster 10 is 42) making our computing frameworkvery likely to be under-fit with insufficient data Howeverthe proposed computing framework performs well on mosttypes of vulnerabilities in an unsupervised manner especiallygiven sufficient samples (eg cluster 7 is 176 and cluster 5is 157)

In addition by examining the clustering results we havean observation that the vulnerabilities in the same cluster arelikely to have evolutionary relationships For instance CVE-2018-0802 an office zero-day vulnerability is evolved fromthe CVE-2017-11882 They both include EQNEDT32exe fileused to edit the formula in Office software which allowsremote attackers to execute arbitrary codes by constructinga malformed font name The modeling and computation ofinterdependent relationships among IOCs in HINTI facilitatethe discovery of such intricate connections between vulnera-bilities

In summary HINTI is capable of depicting a more compre-hensive threat landscape and the proposed CTI computingframework has the ability to bring novel security insightstoward different real-world security applications Howeverthere are still numerous opportunities for enhancing thesesecurity applications Specifically for attack preference mod-eling although each individual IP address is treated as anattacker we cannot determine whether it belongs to a realattacker or is disguised by a proxy Fortunately even if thereal attack address cannot be captured understanding the at-tack preferences of these IP proxies which are widely usedin cybercrime is also meaningful for gaining insight into thecyber threats For vulnerability similarity analysis data imbal-

252 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

(a) AV DV T AT (P13) (b) V DV T (P10) (c) V DPDTV T (P16)

Figure 9 Illustration of the vulnerability similarity analysis based on different meta-paths in which vulnerability i can be reducedinto a two-dimensional space (xiyi) and each cluster indicates a particular type of vulnerability

ance issue affects the performance of model and inadequatetraining samples often result in model underfitting as shownin the case of cluster 8 and cluster 10

7 Related Work

Cyber Treat Intelligence An increasing number of securityvendors and researchers start exploring CTI for protectingsystem security and defending against new threat vectors [28]Existing CTI extraction tools such as IBM X-Force12 Threatcrowd13 Openctiio14 AlienVault15 CleanMX16 and Phish-Tank17 use regular expression to synthesize IOC from thedescriptive texts However these methods often producehigh false positive rate by misjudging legitimate entities asIOCs [22]

Recently Balzarotti et al [2] develop a system to extractIOCs from web pages and identify malicious URLs fromJavaScript codes Sabottke et al [31] propose to detect po-tential vulnerability exploits by extracting and analyzing thetweets that contain ldquoCVErdquo keyword Liao et al [22] present atool iACE for automatically extracting IOCs which excelsat processing technology articles Nevertheless iACE identi-fies IOCs from a single article which does not consider therich semantic information from multi-source texts Zhao etal [46] define different ontologies to describe the relationshipbetween entities based on expert knowledge Numerous popu-lar CTI platforms including IODEF [9] STIX [3] TAXII [40]OpenIOC [13] and CyBox [19] focus on extracting and shar-ing CTI Yet none of the existing approaches could uncoverthe interdependent relations among CTIs extracted from multi-source texts let alone quantifying CTIsrsquo relevance and miningvaluable threat intelligence hidden behind the isolated CTIs

12httpsexchangexforceibmcloudcom13httpswwwthreatcrowdorg14httpsdemoopenctiio15 httpsotxalienvaultcom16httplistclean-mxcom17httpswwwphishtankcom

Heterogeneous Information Network Real-world systemsoften contain a large number of interacting multi-typed ob-jects which can naturally be expressed as a heterogeneousinformation network (HIN) HIN as a conceptual graph repre-sentation can effectively fuse information and exploit richersemantics in interacting objects and links [37] HIN has beenapplied to network traffic analysis [38] public social mediadata analysis [45] and large-scale document analysis [41]Recent applications of HIN include mobile malware detec-tion [14] and opioid user identification [12] In this paper forthe first time we use HIN for CTI modelingGraph Convolutional Network Graph convolutional net-works (GCN) [17] has become an effective tool for address-ing the task of machine learning on graphs such as semi-supervised node classification [17] event classification [29]clustering [8] link prediction [27] and recommended sys-tem [44] Given a graph GCN can directly conduct the con-volutional operation on the graph to learn the nonlinear em-bedding of nodes In our work to discern and reveal the inter-active relationships between IOCs we utilize GCN to learnmore discriminative representation from attributes and graphstructure simultaneously which is the premise for threat intel-ligence computing

8 Discussion

Data Availability The proposed framework assumes thatsufficient threat description data can be obtained for generat-ing comprehensive and the latest CTIs Fortunately with thegrowing prosperity of social media an increasing number ofsecurity-related data (eg blogs posts news and open secu-rity databases) can be collected effortlessly To automaticallycollect security-related data we develop a crawler system tocollect threat description data from 73 international securitysources (eg blogs hacker forum posts security bulletinsetc) providing sufficient raw materials for generating cyberthreat intelligenceModel Extensibility In this paper 6 types of IOCs and 9

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 253

types of relationships are modeled in HINTI However ourproposed framework is extensible in which more types ofIOCs and relationships can be enrolled to represent richer andmore comprehensive threat information such as maliciousdomains phishing Emails attack tools their interactions etcHigh-level Semantic Relations In view of the computa-tional complexity of the model our threat intelligence comput-ing method focues on utilizing the meta-paths to quantify thesimilarity between IOCs while ignoring the influence of themeta-graph on it which inevitably misses characterizing somehigh-level semantic information Nevertheless the proposedcomputing framework introduces an attention mechanism tolearn the signification of different meta-paths to character-ize IOCs and their interactive relationships which effectivelycompensates for the performance degradation caused by ig-noring the meta-graphsSecurity Knowledge Reasoning Although our proposedframework exhibits promising results in CTI extraction andmodeling computing how to implement advanced securityknowledge reasoning and prediction is still an open problemeg it remains challenging to predict whether a vulnerabilitycould potentially affect a particular type of devices in thefuture We will investigate this problem in the future

9 Conclusion

This work explores a new direction of threat intelligence com-puting which aims to uncover new knowledge in the relation-ships among different threat vectors We propose HINTI acyber threat intelligence framework to model and quantify theinterdependent relationships among different types of IOCsby leveraging heterogeneous graph convolutional networksWe develop a multi-granular attention mechanism to learnthe importance of different features and model the interde-pendent relationships among IOCs using HIN We proposethe concept of threat intelligence computing and present ageneral intelligence computing framework based on graphconvolutional networks Experimental results demonstratethat the proposed multi-granular attention based IOC extrac-tion method outperforms the existing state-of-the-art methodsThe proposed threat intelligence computing framework caneffectively mine security knowledge hidden in the interdepen-dent relationships among IOCs which enables crucial threatintelligence applications such as threat profiling and rankingattack preference modeling and vulnerability similarity analy-sis We would like to emphasize that the knowledge discoveryamong interdependent CTIs is a new field that calls for acollaborative effort from security experts and data scientists

In future we plan to develop a predicative and reasoningmodel based on HINTI and explore preventative countermea-sures to protect cyber infrastructure from future threats Wealso plan to add more types of IOCs and relations to depicta more comprehensive threat landscape Moreover we willleverage both meta-paths and meta-graphs to characterize the

IOCs and their interactions to further improve the embeddingperformance and to strike a balance between the accuracyand computational complexity of the model We will alsoinvestigate the feasibility of security knowledge predictionbased on HINTI to infer the potential future relationshipsbetween the vulnerabilities and devices

Acknowledgement

We would like to thank our shepherd Tobias Fiebig andthe anonymous reviewers for providing valuable feedbackon our work We also thank Hao Peng and Lichao Sunfor their feedback on the early version of this work Thiswork was supported in part by National Science Founda-tion grants CNS1950171 CNS-1949753 It was also sup-ported by the NSFC for Innovative Research Group ScienceFund Project (62141003) National Key RampD Program China(2018YFB0803 503) the 2018 joint Research Foundationof Ministry of Education China Mobile (MCM20180507)and the Opening Project of Shanghai Trusted Industrial Con-trol Platform (TICPSH202003020-ZC) Any opinions find-ings and conclusions or recommendations expressed in thismaterial do not necessarily reflect the views of any fundingagencies

References

[1] Jure Leskovec Aditya Grover node2vec Scalable fea-ture learning for networks In Acm Sigkdd InternationalConference on Knowledge Discovery Data Mining2016

[2] Marco Balduzzi Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications In WWW 2016

[3] Sean Barnum Standardizing cyber threat intelligenceinformation with the structured threat information ex-pression (stix) Mitre Corporation 111ndash22 2012

[4] Eric W Burger Michael D Goodman Panos Kam-panakis and Kevin A Zhu Taxonomy model for cyberthreat intelligence information exchange technologiesIn Proceedings of the 2014 ACM Workshop on Infor-mation Sharing amp Collaborative Security pages 51ndash602014

[5] Onur Catakoglu Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications The web conference pages 333ndash3432016

[6] Danqi Chen and Christopher D Manning A fast and ac-curate dependency parser using neural networks pages740ndash750 2014

254 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

[7] Qian Chen and Robert A Bridges Automated behav-ioral analysis of malware a case study of wannacry ran-somware In 16th IEEE ICMLA pages 454ndash460 2017

[8] Weilin Chiang Xuanqing Liu Si Si Yang Li SamyBengio and Chojui Hsieh Cluster-gcn An efficient al-gorithm for training deep and large graph convolutionalnetworks knowledge discovery and data mining pages257ndash266 2019

[9] Roman Danyliw Jan Meijer and Yuri Demchenko Theincident object description exchange format Interna-tional Journal of High Performance Computing Appli-cations 50701ndash92 2007

[10] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 2017

[11] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 ACM 2017

[12] Yujie Fan Yiming Zhang Yanfang Ye and Xin Li Au-tomatic opioid user detection from twitter Transductiveensemble built on different meta-graph based similari-ties over heterogeneous information network In IJCAIpages 3357ndash3363 2018

[13] Fireeye Openioc httpswwwfireeyecomblogthreat-research201310openioc-basicshtml Accessed January 20 2020

[14] Shifu Hou Yanfang Ye Yangqiu Song and Melih Ab-dulhayoglu Hindroid An intelligent android malwaredetection system based on structured heterogeneous in-formation network In Proceedings of the 23rd ACMSIGKDD International Conference on Knowledge Dis-covery and Data Mining pages 1507ndash1515 2017

[15] Zhiheng Huang Wei Xu and Kai Yu Bidirectional lstm-crf models for sequence tagging arXiv1508019912015

[16] Michael D Iannacone Shawn Bohn Grant NakamuraJohn Gerth Kelly MT Huffer Robert A Bridges Erik MFerragut and John R Goodall Developing an ontologyfor cyber security knowledge graphs CISR 1512 2015

[17] Thomas Kipf and Max Welling Semi-supervised clas-sification with graph convolutional networks arXivLearning 2016

[18] Thomas N Kipf and Max Welling Semi-supervisedclassification with graph convolutional networks arXivpreprint arXiv160902907 2016

[19] Tero Kokkonen Architecture for the cyber security situ-ational awareness system In Internet of Things SmartSpaces and Next Generation Networks and Systemspages 294ndash302 Springer 2016

[20] Mehmet Necip Kurt Yasin Yılmaz and Xiaodong WangDistributed quickest detection of cyber-attacks in smartgrid IEEE Transactions on Information Forensics andSecurity 13(8)2015ndash2030 2018

[21] John Lafferty Andrew Mccallum and Fernando PereiraConditional random fields Probabilistic models for seg-menting and labeling sequence data ICML pages 282ndash289 2001

[22] Xiaojing Liao Yuan Kan Xiao Feng Wang Li Zhouand Raheem Beyah Acing the ioc game Toward auto-matic discovery and analysis of open-source cyber threatintelligence In ACM Sigsac Conference on ComputerCommunications Security 2016

[23] Rob Mcmillan Open threat intelligencehttpwwwgartnercomdoc2487216definition-threat-intelligence AccessedJanuary 20 2020

[24] Tomas Mikolov Kai Chen Greg Corrado and JeffreyDean Efficient estimation of word representations invector space arXiv preprint arXiv13013781 2013

[25] Sudip Mittal Prajit Kumar Das Varish Mulwad Anu-pam Joshi and Tim Finin Cybertwitter Using twitterto generate alerts for cybersecurity threats and vulnera-bilities In Proceedings of the 2016 IEEE Advances inSocial Networks Analysis and Mining pages 860ndash8672016

[26] Eric Nunes Ahmad Diab Andrew Gunn EricssonMarin Vineet Mishra Vivin Paliath John RobertsonJana Shakarian Amanda Thart and Paulo ShakarianDarknet and deepnet mining for proactive cybersecuritythreat intelligence In 2016 IEEE ISI pages 7ndash12 2016

[27] Shirui Pan Ruiqi Hu Guodong Long Jing Jiang LinaYao and Chengqi Zhang Adversarially regularizedgraph autoencoder for graph embedding pages 2609ndash2615 2018

[28] P Pawlinski P Jaroszewski P Kijewski L SiewierskiP Jacewicz P Zielony and R Zuber Actionable infor-mation for security incident response European UnionAgency for Network and Information Security Herak-lion Greece 2014

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 255

[29] Hao Peng Jianxin Li Qiran Gong Yangqiu SongYuanxing Ning Kunfeng Lai and Philip S Yu Fine-grained event categorization with heterogeneous graphconvolutional networks In Proceedings of the 28th In-ternational Joint Conference on Artificial Intelligencepages 3238ndash3245 AAAI Press 2019

[30] Sara Qamar Zahid Anwar Mohammad Ashiqur Rah-man Ehab Al-Shaer and Bei-Tseng Chu Data-drivenanalytics for cyber-threat intelligence and informationsharing Computers amp Security 6735ndash58 2017

[31] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In USENIXSecurity 2015

[32] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In 24thUSENIX Security pages 1041ndash1056 2015

[33] Deepak Sharma and Avadhesha Surolia Degree Cen-trality Springer New York 2013

[34] Saurabh Singh Pradip Kumar Sharma Seo Yeon MoonDaesung Moon and Jong Hyuk Park A comprehensivestudy on apt attacks and countermeasures for futurenetworks and communications challenges and solutionsJournal of Supercomputing pages 1ndash32 2016

[35] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks Principles and methodologiesSynthesis Lectures on Data Mining and Knowledge Dis-covery 3(2)1ndash159 2012

[36] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks a structural analysis approachACM SIGKDD 14(2)20ndash28 2013

[37] Yizhou Sun Jiawei Han Xifeng Yan Philip S Yu andTianyi Wu Pathsim Meta path-based top-k similaritysearch in heterogeneous information networks Proceed-ings of the VLDB Endowment 4(11)992ndash1003 2011

[38] Yizhou Sun Brandon Norick Jiawei Han Xifeng YanPhilip S Yu and Xiao Yu Pathselclus Integrating meta-path selection with user-guided object clustering in het-erogeneous information networks ACM Transactionson TKDD 7(3)11 2013

[39] Wiem Tounsi and Helmi Rais A survey on technicalthreat intelligence in the age of sophisticated cyber at-tacks Computers amp Security 72212ndash233 2018

[40] Thomas D Wagner Esther Palomar Khaled Mahbuband Ali E Abdallah Towards an anonymity supportedplatform for shared cyber threat intelligence risks andsecurity of internet and systems pages 175ndash183 2017

[41] Chenguang Wang Yangqiu Song Haoran Li MingZhang and Jiawei Han Text classification with het-erogeneous information network kernels In 13th AAAI2016

[42] Xiao Wang Houye Ji Chuan Shi Bai Wang Peng CuiP Yu and Yanfang Ye Heterogeneous graph attentionnetwork 2019

[43] Jie Yang Shuailong Liang and Yue Zhang Design chal-lenges and misconceptions in neural sequence labelingarXiv Computation and Language 2018

[44] Rex Ying Ruining He Kaifeng Chen Pong Eksombat-chai William L Hamilton and Jure Leskovec Graphconvolutional neural networks for web-scale recom-mender systems In Proceedings of the 24th ACMSIGKDD International Conference on Knowledge Dis-covery amp Data Mining pages 974ndash983 ACM 2018

[45] Jiawei Zhang Xiangnan Kong and Philip S Yu Trans-ferring heterogeneous links across location-based socialnetworks In The 7th ACM international conference onWeb search and data mining pages 303ndash312 2014

[46] Yishuai Zhao Bo Lang and Ming Liu Ontology-basedunified model for heterogeneous threat intelligence inte-gration and sharing In 2017 11th IEEE InternationalConference on Anti-counterfeiting Security and Identi-fication (ASID) pages 11ndash15 2017

256 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

  • Introduction
  • Background
    • Cyber Threat Intelligence
    • Motivation
    • Preliminaries
      • Architecture Overview of HINTI
      • Methodology
        • Multi-granular Attention Based IOC Extraction
        • Cyber Threat Intelligence Modeling
        • Threat Intelligence Computing
          • Experimental Evaluation
            • Dataset and Settings
            • Evaluation of IOC Extraction
              • Application of Threat Intelligence Computing
                • Threat Profiling and Significance Ranking of IOCs
                • Attack Preference Modeling
                • Vulnerability Similarity Analysis
                  • Related Work
                  • Discussion
                  • Conclusion
Page 12: Cyber Threat Intelligence Modeling Based on Heterogeneous ... · 2.1Cyber Threat Intelligence Cyber Threat Intelligence (CTI) extracted from security-related data is structured information

superior than the one with other meta-paths which indicatesthat P17 holds more valuable information that characterizesthe attack preferences of cybercriminals since P17 includesthe semantics information of P1P4P5 and P12 sim P15

In addition we compare the capabilities of our proposedcomputing framework with those of other state-of-the-art em-bedding methods in terms of attack preference modelingOur analysis result shows that the accuracy of attack pref-erence modeling reaches 081 which outperforms the exist-ing popular models Node2vec (with precision of 071) [1]metapaht2vec (with precision of 073) [11] and HAN (withprecision of 076) [42] The performance improvement canbe attributed to the following characteristics First our com-puting framework utilizes weight-learning to learn the signifi-cance of different meta-paths for evaluating the similarity be-tween attackers Second the proposed computing frameworkleverages GCN to learn the structural information betweenattackers to obtain more discriminative structural features thatimproves the performance of attack preference modeling

63 Vulnerability Similarity AnalysisVulnerability classification or clustering is crucial for conduct-ing vulnerability trend analysis the correlation analysis ofincidents and exploits and the evaluation of countermeasuresThe traditional vulnerability analysis relies on the manualinvestigation of the source codes which requires expert ex-pertise and consumes considerable efforts In this sectionwe propose an unsupervised vulnerability similarity analy-sis method based on the proposed threat intelligence com-puting framework which can automatically group similarvulnerabilities into corresponding communities Particularlythe vulnerability-related IOCs are first embedded into a low-dimensional vector space using CTI computing frameworkThen the DBSCAN algorithm is performed on the embed-ded vector space to cluster vulnerabilities into correspondingcommunities The clustering results are presented in Figure 9

Figure 9 (c) shows all vulnerabilities are clustered into 12clusters using meta-path V DPDTV T (P16) which is very closeto the classification standard (ie 13) recommended by CVEdetails an authoritative database that publishes vulnerabilityinformation By manually analyzing the training samples wefind that HTTP Response Splitting vulnerability does not ap-pear in our dataset Therefore our cluster number (ie 12)is consistent with CVE Details11 To further validate the ef-fectiveness of threat intelligence computing framework forvulnerability clustering we randomly select 100 vulnerabili-ties from each cluster for manual inspection to measure theconsistency of the vulnerability types in each cluster and theresults are presented in Table 7 Obviously the clustering per-formance of cluster 8 (ie File Inclusion) and cluster 10 (ieDirectory Traversal) is remarkably worse than other clustersTo explain the reason we examine our training data and the

11httpswwwcvedetailscom

Table 7 Accuracy of vulnerability clustering

Cluster ID Vulnerability type Accuracy

cluster1 Denial of Service 8012

cluster2 XSS 8353

cluster3 Execute Code 8150

cluster4 Overflow 7650

cluster5 Gain Privilege 9156

cluster6 Bypass Something 7174

cluster7 CSRF 9327

cluster8 File Inclusion 6172

cluster9 Gain Informa 7042

cluster10 Directory Traversal 6949

cluster11 Memory Corruption 8156

cluster12 SQL Injection 8067

average 7851

computing framework We found that the proportion of thesetwo types of vulnerabilities is too small (cluster 8 is 34and cluster 10 is 42) making our computing frameworkvery likely to be under-fit with insufficient data Howeverthe proposed computing framework performs well on mosttypes of vulnerabilities in an unsupervised manner especiallygiven sufficient samples (eg cluster 7 is 176 and cluster 5is 157)

In addition by examining the clustering results we havean observation that the vulnerabilities in the same cluster arelikely to have evolutionary relationships For instance CVE-2018-0802 an office zero-day vulnerability is evolved fromthe CVE-2017-11882 They both include EQNEDT32exe fileused to edit the formula in Office software which allowsremote attackers to execute arbitrary codes by constructinga malformed font name The modeling and computation ofinterdependent relationships among IOCs in HINTI facilitatethe discovery of such intricate connections between vulnera-bilities

In summary HINTI is capable of depicting a more compre-hensive threat landscape and the proposed CTI computingframework has the ability to bring novel security insightstoward different real-world security applications Howeverthere are still numerous opportunities for enhancing thesesecurity applications Specifically for attack preference mod-eling although each individual IP address is treated as anattacker we cannot determine whether it belongs to a realattacker or is disguised by a proxy Fortunately even if thereal attack address cannot be captured understanding the at-tack preferences of these IP proxies which are widely usedin cybercrime is also meaningful for gaining insight into thecyber threats For vulnerability similarity analysis data imbal-

252 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

(a) AV DV T AT (P13) (b) V DV T (P10) (c) V DPDTV T (P16)

Figure 9 Illustration of the vulnerability similarity analysis based on different meta-paths in which vulnerability i can be reducedinto a two-dimensional space (xiyi) and each cluster indicates a particular type of vulnerability

ance issue affects the performance of model and inadequatetraining samples often result in model underfitting as shownin the case of cluster 8 and cluster 10

7 Related Work

Cyber Treat Intelligence An increasing number of securityvendors and researchers start exploring CTI for protectingsystem security and defending against new threat vectors [28]Existing CTI extraction tools such as IBM X-Force12 Threatcrowd13 Openctiio14 AlienVault15 CleanMX16 and Phish-Tank17 use regular expression to synthesize IOC from thedescriptive texts However these methods often producehigh false positive rate by misjudging legitimate entities asIOCs [22]

Recently Balzarotti et al [2] develop a system to extractIOCs from web pages and identify malicious URLs fromJavaScript codes Sabottke et al [31] propose to detect po-tential vulnerability exploits by extracting and analyzing thetweets that contain ldquoCVErdquo keyword Liao et al [22] present atool iACE for automatically extracting IOCs which excelsat processing technology articles Nevertheless iACE identi-fies IOCs from a single article which does not consider therich semantic information from multi-source texts Zhao etal [46] define different ontologies to describe the relationshipbetween entities based on expert knowledge Numerous popu-lar CTI platforms including IODEF [9] STIX [3] TAXII [40]OpenIOC [13] and CyBox [19] focus on extracting and shar-ing CTI Yet none of the existing approaches could uncoverthe interdependent relations among CTIs extracted from multi-source texts let alone quantifying CTIsrsquo relevance and miningvaluable threat intelligence hidden behind the isolated CTIs

12httpsexchangexforceibmcloudcom13httpswwwthreatcrowdorg14httpsdemoopenctiio15 httpsotxalienvaultcom16httplistclean-mxcom17httpswwwphishtankcom

Heterogeneous Information Network Real-world systemsoften contain a large number of interacting multi-typed ob-jects which can naturally be expressed as a heterogeneousinformation network (HIN) HIN as a conceptual graph repre-sentation can effectively fuse information and exploit richersemantics in interacting objects and links [37] HIN has beenapplied to network traffic analysis [38] public social mediadata analysis [45] and large-scale document analysis [41]Recent applications of HIN include mobile malware detec-tion [14] and opioid user identification [12] In this paper forthe first time we use HIN for CTI modelingGraph Convolutional Network Graph convolutional net-works (GCN) [17] has become an effective tool for address-ing the task of machine learning on graphs such as semi-supervised node classification [17] event classification [29]clustering [8] link prediction [27] and recommended sys-tem [44] Given a graph GCN can directly conduct the con-volutional operation on the graph to learn the nonlinear em-bedding of nodes In our work to discern and reveal the inter-active relationships between IOCs we utilize GCN to learnmore discriminative representation from attributes and graphstructure simultaneously which is the premise for threat intel-ligence computing

8 Discussion

Data Availability The proposed framework assumes thatsufficient threat description data can be obtained for generat-ing comprehensive and the latest CTIs Fortunately with thegrowing prosperity of social media an increasing number ofsecurity-related data (eg blogs posts news and open secu-rity databases) can be collected effortlessly To automaticallycollect security-related data we develop a crawler system tocollect threat description data from 73 international securitysources (eg blogs hacker forum posts security bulletinsetc) providing sufficient raw materials for generating cyberthreat intelligenceModel Extensibility In this paper 6 types of IOCs and 9

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 253

types of relationships are modeled in HINTI However ourproposed framework is extensible in which more types ofIOCs and relationships can be enrolled to represent richer andmore comprehensive threat information such as maliciousdomains phishing Emails attack tools their interactions etcHigh-level Semantic Relations In view of the computa-tional complexity of the model our threat intelligence comput-ing method focues on utilizing the meta-paths to quantify thesimilarity between IOCs while ignoring the influence of themeta-graph on it which inevitably misses characterizing somehigh-level semantic information Nevertheless the proposedcomputing framework introduces an attention mechanism tolearn the signification of different meta-paths to character-ize IOCs and their interactive relationships which effectivelycompensates for the performance degradation caused by ig-noring the meta-graphsSecurity Knowledge Reasoning Although our proposedframework exhibits promising results in CTI extraction andmodeling computing how to implement advanced securityknowledge reasoning and prediction is still an open problemeg it remains challenging to predict whether a vulnerabilitycould potentially affect a particular type of devices in thefuture We will investigate this problem in the future

9 Conclusion

This work explores a new direction of threat intelligence com-puting which aims to uncover new knowledge in the relation-ships among different threat vectors We propose HINTI acyber threat intelligence framework to model and quantify theinterdependent relationships among different types of IOCsby leveraging heterogeneous graph convolutional networksWe develop a multi-granular attention mechanism to learnthe importance of different features and model the interde-pendent relationships among IOCs using HIN We proposethe concept of threat intelligence computing and present ageneral intelligence computing framework based on graphconvolutional networks Experimental results demonstratethat the proposed multi-granular attention based IOC extrac-tion method outperforms the existing state-of-the-art methodsThe proposed threat intelligence computing framework caneffectively mine security knowledge hidden in the interdepen-dent relationships among IOCs which enables crucial threatintelligence applications such as threat profiling and rankingattack preference modeling and vulnerability similarity analy-sis We would like to emphasize that the knowledge discoveryamong interdependent CTIs is a new field that calls for acollaborative effort from security experts and data scientists

In future we plan to develop a predicative and reasoningmodel based on HINTI and explore preventative countermea-sures to protect cyber infrastructure from future threats Wealso plan to add more types of IOCs and relations to depicta more comprehensive threat landscape Moreover we willleverage both meta-paths and meta-graphs to characterize the

IOCs and their interactions to further improve the embeddingperformance and to strike a balance between the accuracyand computational complexity of the model We will alsoinvestigate the feasibility of security knowledge predictionbased on HINTI to infer the potential future relationshipsbetween the vulnerabilities and devices

Acknowledgement

We would like to thank our shepherd Tobias Fiebig andthe anonymous reviewers for providing valuable feedbackon our work We also thank Hao Peng and Lichao Sunfor their feedback on the early version of this work Thiswork was supported in part by National Science Founda-tion grants CNS1950171 CNS-1949753 It was also sup-ported by the NSFC for Innovative Research Group ScienceFund Project (62141003) National Key RampD Program China(2018YFB0803 503) the 2018 joint Research Foundationof Ministry of Education China Mobile (MCM20180507)and the Opening Project of Shanghai Trusted Industrial Con-trol Platform (TICPSH202003020-ZC) Any opinions find-ings and conclusions or recommendations expressed in thismaterial do not necessarily reflect the views of any fundingagencies

References

[1] Jure Leskovec Aditya Grover node2vec Scalable fea-ture learning for networks In Acm Sigkdd InternationalConference on Knowledge Discovery Data Mining2016

[2] Marco Balduzzi Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications In WWW 2016

[3] Sean Barnum Standardizing cyber threat intelligenceinformation with the structured threat information ex-pression (stix) Mitre Corporation 111ndash22 2012

[4] Eric W Burger Michael D Goodman Panos Kam-panakis and Kevin A Zhu Taxonomy model for cyberthreat intelligence information exchange technologiesIn Proceedings of the 2014 ACM Workshop on Infor-mation Sharing amp Collaborative Security pages 51ndash602014

[5] Onur Catakoglu Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications The web conference pages 333ndash3432016

[6] Danqi Chen and Christopher D Manning A fast and ac-curate dependency parser using neural networks pages740ndash750 2014

254 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

[7] Qian Chen and Robert A Bridges Automated behav-ioral analysis of malware a case study of wannacry ran-somware In 16th IEEE ICMLA pages 454ndash460 2017

[8] Weilin Chiang Xuanqing Liu Si Si Yang Li SamyBengio and Chojui Hsieh Cluster-gcn An efficient al-gorithm for training deep and large graph convolutionalnetworks knowledge discovery and data mining pages257ndash266 2019

[9] Roman Danyliw Jan Meijer and Yuri Demchenko Theincident object description exchange format Interna-tional Journal of High Performance Computing Appli-cations 50701ndash92 2007

[10] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 2017

[11] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 ACM 2017

[12] Yujie Fan Yiming Zhang Yanfang Ye and Xin Li Au-tomatic opioid user detection from twitter Transductiveensemble built on different meta-graph based similari-ties over heterogeneous information network In IJCAIpages 3357ndash3363 2018

[13] Fireeye Openioc httpswwwfireeyecomblogthreat-research201310openioc-basicshtml Accessed January 20 2020

[14] Shifu Hou Yanfang Ye Yangqiu Song and Melih Ab-dulhayoglu Hindroid An intelligent android malwaredetection system based on structured heterogeneous in-formation network In Proceedings of the 23rd ACMSIGKDD International Conference on Knowledge Dis-covery and Data Mining pages 1507ndash1515 2017

[15] Zhiheng Huang Wei Xu and Kai Yu Bidirectional lstm-crf models for sequence tagging arXiv1508019912015

[16] Michael D Iannacone Shawn Bohn Grant NakamuraJohn Gerth Kelly MT Huffer Robert A Bridges Erik MFerragut and John R Goodall Developing an ontologyfor cyber security knowledge graphs CISR 1512 2015

[17] Thomas Kipf and Max Welling Semi-supervised clas-sification with graph convolutional networks arXivLearning 2016

[18] Thomas N Kipf and Max Welling Semi-supervisedclassification with graph convolutional networks arXivpreprint arXiv160902907 2016

[19] Tero Kokkonen Architecture for the cyber security situ-ational awareness system In Internet of Things SmartSpaces and Next Generation Networks and Systemspages 294ndash302 Springer 2016

[20] Mehmet Necip Kurt Yasin Yılmaz and Xiaodong WangDistributed quickest detection of cyber-attacks in smartgrid IEEE Transactions on Information Forensics andSecurity 13(8)2015ndash2030 2018

[21] John Lafferty Andrew Mccallum and Fernando PereiraConditional random fields Probabilistic models for seg-menting and labeling sequence data ICML pages 282ndash289 2001

[22] Xiaojing Liao Yuan Kan Xiao Feng Wang Li Zhouand Raheem Beyah Acing the ioc game Toward auto-matic discovery and analysis of open-source cyber threatintelligence In ACM Sigsac Conference on ComputerCommunications Security 2016

[23] Rob Mcmillan Open threat intelligencehttpwwwgartnercomdoc2487216definition-threat-intelligence AccessedJanuary 20 2020

[24] Tomas Mikolov Kai Chen Greg Corrado and JeffreyDean Efficient estimation of word representations invector space arXiv preprint arXiv13013781 2013

[25] Sudip Mittal Prajit Kumar Das Varish Mulwad Anu-pam Joshi and Tim Finin Cybertwitter Using twitterto generate alerts for cybersecurity threats and vulnera-bilities In Proceedings of the 2016 IEEE Advances inSocial Networks Analysis and Mining pages 860ndash8672016

[26] Eric Nunes Ahmad Diab Andrew Gunn EricssonMarin Vineet Mishra Vivin Paliath John RobertsonJana Shakarian Amanda Thart and Paulo ShakarianDarknet and deepnet mining for proactive cybersecuritythreat intelligence In 2016 IEEE ISI pages 7ndash12 2016

[27] Shirui Pan Ruiqi Hu Guodong Long Jing Jiang LinaYao and Chengqi Zhang Adversarially regularizedgraph autoencoder for graph embedding pages 2609ndash2615 2018

[28] P Pawlinski P Jaroszewski P Kijewski L SiewierskiP Jacewicz P Zielony and R Zuber Actionable infor-mation for security incident response European UnionAgency for Network and Information Security Herak-lion Greece 2014

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 255

[29] Hao Peng Jianxin Li Qiran Gong Yangqiu SongYuanxing Ning Kunfeng Lai and Philip S Yu Fine-grained event categorization with heterogeneous graphconvolutional networks In Proceedings of the 28th In-ternational Joint Conference on Artificial Intelligencepages 3238ndash3245 AAAI Press 2019

[30] Sara Qamar Zahid Anwar Mohammad Ashiqur Rah-man Ehab Al-Shaer and Bei-Tseng Chu Data-drivenanalytics for cyber-threat intelligence and informationsharing Computers amp Security 6735ndash58 2017

[31] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In USENIXSecurity 2015

[32] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In 24thUSENIX Security pages 1041ndash1056 2015

[33] Deepak Sharma and Avadhesha Surolia Degree Cen-trality Springer New York 2013

[34] Saurabh Singh Pradip Kumar Sharma Seo Yeon MoonDaesung Moon and Jong Hyuk Park A comprehensivestudy on apt attacks and countermeasures for futurenetworks and communications challenges and solutionsJournal of Supercomputing pages 1ndash32 2016

[35] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks Principles and methodologiesSynthesis Lectures on Data Mining and Knowledge Dis-covery 3(2)1ndash159 2012

[36] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks a structural analysis approachACM SIGKDD 14(2)20ndash28 2013

[37] Yizhou Sun Jiawei Han Xifeng Yan Philip S Yu andTianyi Wu Pathsim Meta path-based top-k similaritysearch in heterogeneous information networks Proceed-ings of the VLDB Endowment 4(11)992ndash1003 2011

[38] Yizhou Sun Brandon Norick Jiawei Han Xifeng YanPhilip S Yu and Xiao Yu Pathselclus Integrating meta-path selection with user-guided object clustering in het-erogeneous information networks ACM Transactionson TKDD 7(3)11 2013

[39] Wiem Tounsi and Helmi Rais A survey on technicalthreat intelligence in the age of sophisticated cyber at-tacks Computers amp Security 72212ndash233 2018

[40] Thomas D Wagner Esther Palomar Khaled Mahbuband Ali E Abdallah Towards an anonymity supportedplatform for shared cyber threat intelligence risks andsecurity of internet and systems pages 175ndash183 2017

[41] Chenguang Wang Yangqiu Song Haoran Li MingZhang and Jiawei Han Text classification with het-erogeneous information network kernels In 13th AAAI2016

[42] Xiao Wang Houye Ji Chuan Shi Bai Wang Peng CuiP Yu and Yanfang Ye Heterogeneous graph attentionnetwork 2019

[43] Jie Yang Shuailong Liang and Yue Zhang Design chal-lenges and misconceptions in neural sequence labelingarXiv Computation and Language 2018

[44] Rex Ying Ruining He Kaifeng Chen Pong Eksombat-chai William L Hamilton and Jure Leskovec Graphconvolutional neural networks for web-scale recom-mender systems In Proceedings of the 24th ACMSIGKDD International Conference on Knowledge Dis-covery amp Data Mining pages 974ndash983 ACM 2018

[45] Jiawei Zhang Xiangnan Kong and Philip S Yu Trans-ferring heterogeneous links across location-based socialnetworks In The 7th ACM international conference onWeb search and data mining pages 303ndash312 2014

[46] Yishuai Zhao Bo Lang and Ming Liu Ontology-basedunified model for heterogeneous threat intelligence inte-gration and sharing In 2017 11th IEEE InternationalConference on Anti-counterfeiting Security and Identi-fication (ASID) pages 11ndash15 2017

256 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

  • Introduction
  • Background
    • Cyber Threat Intelligence
    • Motivation
    • Preliminaries
      • Architecture Overview of HINTI
      • Methodology
        • Multi-granular Attention Based IOC Extraction
        • Cyber Threat Intelligence Modeling
        • Threat Intelligence Computing
          • Experimental Evaluation
            • Dataset and Settings
            • Evaluation of IOC Extraction
              • Application of Threat Intelligence Computing
                • Threat Profiling and Significance Ranking of IOCs
                • Attack Preference Modeling
                • Vulnerability Similarity Analysis
                  • Related Work
                  • Discussion
                  • Conclusion
Page 13: Cyber Threat Intelligence Modeling Based on Heterogeneous ... · 2.1Cyber Threat Intelligence Cyber Threat Intelligence (CTI) extracted from security-related data is structured information

(a) AV DV T AT (P13) (b) V DV T (P10) (c) V DPDTV T (P16)

Figure 9 Illustration of the vulnerability similarity analysis based on different meta-paths in which vulnerability i can be reducedinto a two-dimensional space (xiyi) and each cluster indicates a particular type of vulnerability

ance issue affects the performance of model and inadequatetraining samples often result in model underfitting as shownin the case of cluster 8 and cluster 10

7 Related Work

Cyber Treat Intelligence An increasing number of securityvendors and researchers start exploring CTI for protectingsystem security and defending against new threat vectors [28]Existing CTI extraction tools such as IBM X-Force12 Threatcrowd13 Openctiio14 AlienVault15 CleanMX16 and Phish-Tank17 use regular expression to synthesize IOC from thedescriptive texts However these methods often producehigh false positive rate by misjudging legitimate entities asIOCs [22]

Recently Balzarotti et al [2] develop a system to extractIOCs from web pages and identify malicious URLs fromJavaScript codes Sabottke et al [31] propose to detect po-tential vulnerability exploits by extracting and analyzing thetweets that contain ldquoCVErdquo keyword Liao et al [22] present atool iACE for automatically extracting IOCs which excelsat processing technology articles Nevertheless iACE identi-fies IOCs from a single article which does not consider therich semantic information from multi-source texts Zhao etal [46] define different ontologies to describe the relationshipbetween entities based on expert knowledge Numerous popu-lar CTI platforms including IODEF [9] STIX [3] TAXII [40]OpenIOC [13] and CyBox [19] focus on extracting and shar-ing CTI Yet none of the existing approaches could uncoverthe interdependent relations among CTIs extracted from multi-source texts let alone quantifying CTIsrsquo relevance and miningvaluable threat intelligence hidden behind the isolated CTIs

12httpsexchangexforceibmcloudcom13httpswwwthreatcrowdorg14httpsdemoopenctiio15 httpsotxalienvaultcom16httplistclean-mxcom17httpswwwphishtankcom

Heterogeneous Information Network Real-world systemsoften contain a large number of interacting multi-typed ob-jects which can naturally be expressed as a heterogeneousinformation network (HIN) HIN as a conceptual graph repre-sentation can effectively fuse information and exploit richersemantics in interacting objects and links [37] HIN has beenapplied to network traffic analysis [38] public social mediadata analysis [45] and large-scale document analysis [41]Recent applications of HIN include mobile malware detec-tion [14] and opioid user identification [12] In this paper forthe first time we use HIN for CTI modelingGraph Convolutional Network Graph convolutional net-works (GCN) [17] has become an effective tool for address-ing the task of machine learning on graphs such as semi-supervised node classification [17] event classification [29]clustering [8] link prediction [27] and recommended sys-tem [44] Given a graph GCN can directly conduct the con-volutional operation on the graph to learn the nonlinear em-bedding of nodes In our work to discern and reveal the inter-active relationships between IOCs we utilize GCN to learnmore discriminative representation from attributes and graphstructure simultaneously which is the premise for threat intel-ligence computing

8 Discussion

Data Availability The proposed framework assumes thatsufficient threat description data can be obtained for generat-ing comprehensive and the latest CTIs Fortunately with thegrowing prosperity of social media an increasing number ofsecurity-related data (eg blogs posts news and open secu-rity databases) can be collected effortlessly To automaticallycollect security-related data we develop a crawler system tocollect threat description data from 73 international securitysources (eg blogs hacker forum posts security bulletinsetc) providing sufficient raw materials for generating cyberthreat intelligenceModel Extensibility In this paper 6 types of IOCs and 9

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 253

types of relationships are modeled in HINTI However ourproposed framework is extensible in which more types ofIOCs and relationships can be enrolled to represent richer andmore comprehensive threat information such as maliciousdomains phishing Emails attack tools their interactions etcHigh-level Semantic Relations In view of the computa-tional complexity of the model our threat intelligence comput-ing method focues on utilizing the meta-paths to quantify thesimilarity between IOCs while ignoring the influence of themeta-graph on it which inevitably misses characterizing somehigh-level semantic information Nevertheless the proposedcomputing framework introduces an attention mechanism tolearn the signification of different meta-paths to character-ize IOCs and their interactive relationships which effectivelycompensates for the performance degradation caused by ig-noring the meta-graphsSecurity Knowledge Reasoning Although our proposedframework exhibits promising results in CTI extraction andmodeling computing how to implement advanced securityknowledge reasoning and prediction is still an open problemeg it remains challenging to predict whether a vulnerabilitycould potentially affect a particular type of devices in thefuture We will investigate this problem in the future

9 Conclusion

This work explores a new direction of threat intelligence com-puting which aims to uncover new knowledge in the relation-ships among different threat vectors We propose HINTI acyber threat intelligence framework to model and quantify theinterdependent relationships among different types of IOCsby leveraging heterogeneous graph convolutional networksWe develop a multi-granular attention mechanism to learnthe importance of different features and model the interde-pendent relationships among IOCs using HIN We proposethe concept of threat intelligence computing and present ageneral intelligence computing framework based on graphconvolutional networks Experimental results demonstratethat the proposed multi-granular attention based IOC extrac-tion method outperforms the existing state-of-the-art methodsThe proposed threat intelligence computing framework caneffectively mine security knowledge hidden in the interdepen-dent relationships among IOCs which enables crucial threatintelligence applications such as threat profiling and rankingattack preference modeling and vulnerability similarity analy-sis We would like to emphasize that the knowledge discoveryamong interdependent CTIs is a new field that calls for acollaborative effort from security experts and data scientists

In future we plan to develop a predicative and reasoningmodel based on HINTI and explore preventative countermea-sures to protect cyber infrastructure from future threats Wealso plan to add more types of IOCs and relations to depicta more comprehensive threat landscape Moreover we willleverage both meta-paths and meta-graphs to characterize the

IOCs and their interactions to further improve the embeddingperformance and to strike a balance between the accuracyand computational complexity of the model We will alsoinvestigate the feasibility of security knowledge predictionbased on HINTI to infer the potential future relationshipsbetween the vulnerabilities and devices

Acknowledgement

We would like to thank our shepherd Tobias Fiebig andthe anonymous reviewers for providing valuable feedbackon our work We also thank Hao Peng and Lichao Sunfor their feedback on the early version of this work Thiswork was supported in part by National Science Founda-tion grants CNS1950171 CNS-1949753 It was also sup-ported by the NSFC for Innovative Research Group ScienceFund Project (62141003) National Key RampD Program China(2018YFB0803 503) the 2018 joint Research Foundationof Ministry of Education China Mobile (MCM20180507)and the Opening Project of Shanghai Trusted Industrial Con-trol Platform (TICPSH202003020-ZC) Any opinions find-ings and conclusions or recommendations expressed in thismaterial do not necessarily reflect the views of any fundingagencies

References

[1] Jure Leskovec Aditya Grover node2vec Scalable fea-ture learning for networks In Acm Sigkdd InternationalConference on Knowledge Discovery Data Mining2016

[2] Marco Balduzzi Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications In WWW 2016

[3] Sean Barnum Standardizing cyber threat intelligenceinformation with the structured threat information ex-pression (stix) Mitre Corporation 111ndash22 2012

[4] Eric W Burger Michael D Goodman Panos Kam-panakis and Kevin A Zhu Taxonomy model for cyberthreat intelligence information exchange technologiesIn Proceedings of the 2014 ACM Workshop on Infor-mation Sharing amp Collaborative Security pages 51ndash602014

[5] Onur Catakoglu Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications The web conference pages 333ndash3432016

[6] Danqi Chen and Christopher D Manning A fast and ac-curate dependency parser using neural networks pages740ndash750 2014

254 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

[7] Qian Chen and Robert A Bridges Automated behav-ioral analysis of malware a case study of wannacry ran-somware In 16th IEEE ICMLA pages 454ndash460 2017

[8] Weilin Chiang Xuanqing Liu Si Si Yang Li SamyBengio and Chojui Hsieh Cluster-gcn An efficient al-gorithm for training deep and large graph convolutionalnetworks knowledge discovery and data mining pages257ndash266 2019

[9] Roman Danyliw Jan Meijer and Yuri Demchenko Theincident object description exchange format Interna-tional Journal of High Performance Computing Appli-cations 50701ndash92 2007

[10] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 2017

[11] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 ACM 2017

[12] Yujie Fan Yiming Zhang Yanfang Ye and Xin Li Au-tomatic opioid user detection from twitter Transductiveensemble built on different meta-graph based similari-ties over heterogeneous information network In IJCAIpages 3357ndash3363 2018

[13] Fireeye Openioc httpswwwfireeyecomblogthreat-research201310openioc-basicshtml Accessed January 20 2020

[14] Shifu Hou Yanfang Ye Yangqiu Song and Melih Ab-dulhayoglu Hindroid An intelligent android malwaredetection system based on structured heterogeneous in-formation network In Proceedings of the 23rd ACMSIGKDD International Conference on Knowledge Dis-covery and Data Mining pages 1507ndash1515 2017

[15] Zhiheng Huang Wei Xu and Kai Yu Bidirectional lstm-crf models for sequence tagging arXiv1508019912015

[16] Michael D Iannacone Shawn Bohn Grant NakamuraJohn Gerth Kelly MT Huffer Robert A Bridges Erik MFerragut and John R Goodall Developing an ontologyfor cyber security knowledge graphs CISR 1512 2015

[17] Thomas Kipf and Max Welling Semi-supervised clas-sification with graph convolutional networks arXivLearning 2016

[18] Thomas N Kipf and Max Welling Semi-supervisedclassification with graph convolutional networks arXivpreprint arXiv160902907 2016

[19] Tero Kokkonen Architecture for the cyber security situ-ational awareness system In Internet of Things SmartSpaces and Next Generation Networks and Systemspages 294ndash302 Springer 2016

[20] Mehmet Necip Kurt Yasin Yılmaz and Xiaodong WangDistributed quickest detection of cyber-attacks in smartgrid IEEE Transactions on Information Forensics andSecurity 13(8)2015ndash2030 2018

[21] John Lafferty Andrew Mccallum and Fernando PereiraConditional random fields Probabilistic models for seg-menting and labeling sequence data ICML pages 282ndash289 2001

[22] Xiaojing Liao Yuan Kan Xiao Feng Wang Li Zhouand Raheem Beyah Acing the ioc game Toward auto-matic discovery and analysis of open-source cyber threatintelligence In ACM Sigsac Conference on ComputerCommunications Security 2016

[23] Rob Mcmillan Open threat intelligencehttpwwwgartnercomdoc2487216definition-threat-intelligence AccessedJanuary 20 2020

[24] Tomas Mikolov Kai Chen Greg Corrado and JeffreyDean Efficient estimation of word representations invector space arXiv preprint arXiv13013781 2013

[25] Sudip Mittal Prajit Kumar Das Varish Mulwad Anu-pam Joshi and Tim Finin Cybertwitter Using twitterto generate alerts for cybersecurity threats and vulnera-bilities In Proceedings of the 2016 IEEE Advances inSocial Networks Analysis and Mining pages 860ndash8672016

[26] Eric Nunes Ahmad Diab Andrew Gunn EricssonMarin Vineet Mishra Vivin Paliath John RobertsonJana Shakarian Amanda Thart and Paulo ShakarianDarknet and deepnet mining for proactive cybersecuritythreat intelligence In 2016 IEEE ISI pages 7ndash12 2016

[27] Shirui Pan Ruiqi Hu Guodong Long Jing Jiang LinaYao and Chengqi Zhang Adversarially regularizedgraph autoencoder for graph embedding pages 2609ndash2615 2018

[28] P Pawlinski P Jaroszewski P Kijewski L SiewierskiP Jacewicz P Zielony and R Zuber Actionable infor-mation for security incident response European UnionAgency for Network and Information Security Herak-lion Greece 2014

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 255

[29] Hao Peng Jianxin Li Qiran Gong Yangqiu SongYuanxing Ning Kunfeng Lai and Philip S Yu Fine-grained event categorization with heterogeneous graphconvolutional networks In Proceedings of the 28th In-ternational Joint Conference on Artificial Intelligencepages 3238ndash3245 AAAI Press 2019

[30] Sara Qamar Zahid Anwar Mohammad Ashiqur Rah-man Ehab Al-Shaer and Bei-Tseng Chu Data-drivenanalytics for cyber-threat intelligence and informationsharing Computers amp Security 6735ndash58 2017

[31] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In USENIXSecurity 2015

[32] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In 24thUSENIX Security pages 1041ndash1056 2015

[33] Deepak Sharma and Avadhesha Surolia Degree Cen-trality Springer New York 2013

[34] Saurabh Singh Pradip Kumar Sharma Seo Yeon MoonDaesung Moon and Jong Hyuk Park A comprehensivestudy on apt attacks and countermeasures for futurenetworks and communications challenges and solutionsJournal of Supercomputing pages 1ndash32 2016

[35] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks Principles and methodologiesSynthesis Lectures on Data Mining and Knowledge Dis-covery 3(2)1ndash159 2012

[36] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks a structural analysis approachACM SIGKDD 14(2)20ndash28 2013

[37] Yizhou Sun Jiawei Han Xifeng Yan Philip S Yu andTianyi Wu Pathsim Meta path-based top-k similaritysearch in heterogeneous information networks Proceed-ings of the VLDB Endowment 4(11)992ndash1003 2011

[38] Yizhou Sun Brandon Norick Jiawei Han Xifeng YanPhilip S Yu and Xiao Yu Pathselclus Integrating meta-path selection with user-guided object clustering in het-erogeneous information networks ACM Transactionson TKDD 7(3)11 2013

[39] Wiem Tounsi and Helmi Rais A survey on technicalthreat intelligence in the age of sophisticated cyber at-tacks Computers amp Security 72212ndash233 2018

[40] Thomas D Wagner Esther Palomar Khaled Mahbuband Ali E Abdallah Towards an anonymity supportedplatform for shared cyber threat intelligence risks andsecurity of internet and systems pages 175ndash183 2017

[41] Chenguang Wang Yangqiu Song Haoran Li MingZhang and Jiawei Han Text classification with het-erogeneous information network kernels In 13th AAAI2016

[42] Xiao Wang Houye Ji Chuan Shi Bai Wang Peng CuiP Yu and Yanfang Ye Heterogeneous graph attentionnetwork 2019

[43] Jie Yang Shuailong Liang and Yue Zhang Design chal-lenges and misconceptions in neural sequence labelingarXiv Computation and Language 2018

[44] Rex Ying Ruining He Kaifeng Chen Pong Eksombat-chai William L Hamilton and Jure Leskovec Graphconvolutional neural networks for web-scale recom-mender systems In Proceedings of the 24th ACMSIGKDD International Conference on Knowledge Dis-covery amp Data Mining pages 974ndash983 ACM 2018

[45] Jiawei Zhang Xiangnan Kong and Philip S Yu Trans-ferring heterogeneous links across location-based socialnetworks In The 7th ACM international conference onWeb search and data mining pages 303ndash312 2014

[46] Yishuai Zhao Bo Lang and Ming Liu Ontology-basedunified model for heterogeneous threat intelligence inte-gration and sharing In 2017 11th IEEE InternationalConference on Anti-counterfeiting Security and Identi-fication (ASID) pages 11ndash15 2017

256 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

  • Introduction
  • Background
    • Cyber Threat Intelligence
    • Motivation
    • Preliminaries
      • Architecture Overview of HINTI
      • Methodology
        • Multi-granular Attention Based IOC Extraction
        • Cyber Threat Intelligence Modeling
        • Threat Intelligence Computing
          • Experimental Evaluation
            • Dataset and Settings
            • Evaluation of IOC Extraction
              • Application of Threat Intelligence Computing
                • Threat Profiling and Significance Ranking of IOCs
                • Attack Preference Modeling
                • Vulnerability Similarity Analysis
                  • Related Work
                  • Discussion
                  • Conclusion
Page 14: Cyber Threat Intelligence Modeling Based on Heterogeneous ... · 2.1Cyber Threat Intelligence Cyber Threat Intelligence (CTI) extracted from security-related data is structured information

types of relationships are modeled in HINTI However ourproposed framework is extensible in which more types ofIOCs and relationships can be enrolled to represent richer andmore comprehensive threat information such as maliciousdomains phishing Emails attack tools their interactions etcHigh-level Semantic Relations In view of the computa-tional complexity of the model our threat intelligence comput-ing method focues on utilizing the meta-paths to quantify thesimilarity between IOCs while ignoring the influence of themeta-graph on it which inevitably misses characterizing somehigh-level semantic information Nevertheless the proposedcomputing framework introduces an attention mechanism tolearn the signification of different meta-paths to character-ize IOCs and their interactive relationships which effectivelycompensates for the performance degradation caused by ig-noring the meta-graphsSecurity Knowledge Reasoning Although our proposedframework exhibits promising results in CTI extraction andmodeling computing how to implement advanced securityknowledge reasoning and prediction is still an open problemeg it remains challenging to predict whether a vulnerabilitycould potentially affect a particular type of devices in thefuture We will investigate this problem in the future

9 Conclusion

This work explores a new direction of threat intelligence com-puting which aims to uncover new knowledge in the relation-ships among different threat vectors We propose HINTI acyber threat intelligence framework to model and quantify theinterdependent relationships among different types of IOCsby leveraging heterogeneous graph convolutional networksWe develop a multi-granular attention mechanism to learnthe importance of different features and model the interde-pendent relationships among IOCs using HIN We proposethe concept of threat intelligence computing and present ageneral intelligence computing framework based on graphconvolutional networks Experimental results demonstratethat the proposed multi-granular attention based IOC extrac-tion method outperforms the existing state-of-the-art methodsThe proposed threat intelligence computing framework caneffectively mine security knowledge hidden in the interdepen-dent relationships among IOCs which enables crucial threatintelligence applications such as threat profiling and rankingattack preference modeling and vulnerability similarity analy-sis We would like to emphasize that the knowledge discoveryamong interdependent CTIs is a new field that calls for acollaborative effort from security experts and data scientists

In future we plan to develop a predicative and reasoningmodel based on HINTI and explore preventative countermea-sures to protect cyber infrastructure from future threats Wealso plan to add more types of IOCs and relations to depicta more comprehensive threat landscape Moreover we willleverage both meta-paths and meta-graphs to characterize the

IOCs and their interactions to further improve the embeddingperformance and to strike a balance between the accuracyand computational complexity of the model We will alsoinvestigate the feasibility of security knowledge predictionbased on HINTI to infer the potential future relationshipsbetween the vulnerabilities and devices

Acknowledgement

We would like to thank our shepherd Tobias Fiebig andthe anonymous reviewers for providing valuable feedbackon our work We also thank Hao Peng and Lichao Sunfor their feedback on the early version of this work Thiswork was supported in part by National Science Founda-tion grants CNS1950171 CNS-1949753 It was also sup-ported by the NSFC for Innovative Research Group ScienceFund Project (62141003) National Key RampD Program China(2018YFB0803 503) the 2018 joint Research Foundationof Ministry of Education China Mobile (MCM20180507)and the Opening Project of Shanghai Trusted Industrial Con-trol Platform (TICPSH202003020-ZC) Any opinions find-ings and conclusions or recommendations expressed in thismaterial do not necessarily reflect the views of any fundingagencies

References

[1] Jure Leskovec Aditya Grover node2vec Scalable fea-ture learning for networks In Acm Sigkdd InternationalConference on Knowledge Discovery Data Mining2016

[2] Marco Balduzzi Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications In WWW 2016

[3] Sean Barnum Standardizing cyber threat intelligenceinformation with the structured threat information ex-pression (stix) Mitre Corporation 111ndash22 2012

[4] Eric W Burger Michael D Goodman Panos Kam-panakis and Kevin A Zhu Taxonomy model for cyberthreat intelligence information exchange technologiesIn Proceedings of the 2014 ACM Workshop on Infor-mation Sharing amp Collaborative Security pages 51ndash602014

[5] Onur Catakoglu Marco Balduzzi and Davide BalzarottiAutomatic extraction of indicators of compromise forweb applications The web conference pages 333ndash3432016

[6] Danqi Chen and Christopher D Manning A fast and ac-curate dependency parser using neural networks pages740ndash750 2014

254 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

[7] Qian Chen and Robert A Bridges Automated behav-ioral analysis of malware a case study of wannacry ran-somware In 16th IEEE ICMLA pages 454ndash460 2017

[8] Weilin Chiang Xuanqing Liu Si Si Yang Li SamyBengio and Chojui Hsieh Cluster-gcn An efficient al-gorithm for training deep and large graph convolutionalnetworks knowledge discovery and data mining pages257ndash266 2019

[9] Roman Danyliw Jan Meijer and Yuri Demchenko Theincident object description exchange format Interna-tional Journal of High Performance Computing Appli-cations 50701ndash92 2007

[10] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 2017

[11] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 ACM 2017

[12] Yujie Fan Yiming Zhang Yanfang Ye and Xin Li Au-tomatic opioid user detection from twitter Transductiveensemble built on different meta-graph based similari-ties over heterogeneous information network In IJCAIpages 3357ndash3363 2018

[13] Fireeye Openioc httpswwwfireeyecomblogthreat-research201310openioc-basicshtml Accessed January 20 2020

[14] Shifu Hou Yanfang Ye Yangqiu Song and Melih Ab-dulhayoglu Hindroid An intelligent android malwaredetection system based on structured heterogeneous in-formation network In Proceedings of the 23rd ACMSIGKDD International Conference on Knowledge Dis-covery and Data Mining pages 1507ndash1515 2017

[15] Zhiheng Huang Wei Xu and Kai Yu Bidirectional lstm-crf models for sequence tagging arXiv1508019912015

[16] Michael D Iannacone Shawn Bohn Grant NakamuraJohn Gerth Kelly MT Huffer Robert A Bridges Erik MFerragut and John R Goodall Developing an ontologyfor cyber security knowledge graphs CISR 1512 2015

[17] Thomas Kipf and Max Welling Semi-supervised clas-sification with graph convolutional networks arXivLearning 2016

[18] Thomas N Kipf and Max Welling Semi-supervisedclassification with graph convolutional networks arXivpreprint arXiv160902907 2016

[19] Tero Kokkonen Architecture for the cyber security situ-ational awareness system In Internet of Things SmartSpaces and Next Generation Networks and Systemspages 294ndash302 Springer 2016

[20] Mehmet Necip Kurt Yasin Yılmaz and Xiaodong WangDistributed quickest detection of cyber-attacks in smartgrid IEEE Transactions on Information Forensics andSecurity 13(8)2015ndash2030 2018

[21] John Lafferty Andrew Mccallum and Fernando PereiraConditional random fields Probabilistic models for seg-menting and labeling sequence data ICML pages 282ndash289 2001

[22] Xiaojing Liao Yuan Kan Xiao Feng Wang Li Zhouand Raheem Beyah Acing the ioc game Toward auto-matic discovery and analysis of open-source cyber threatintelligence In ACM Sigsac Conference on ComputerCommunications Security 2016

[23] Rob Mcmillan Open threat intelligencehttpwwwgartnercomdoc2487216definition-threat-intelligence AccessedJanuary 20 2020

[24] Tomas Mikolov Kai Chen Greg Corrado and JeffreyDean Efficient estimation of word representations invector space arXiv preprint arXiv13013781 2013

[25] Sudip Mittal Prajit Kumar Das Varish Mulwad Anu-pam Joshi and Tim Finin Cybertwitter Using twitterto generate alerts for cybersecurity threats and vulnera-bilities In Proceedings of the 2016 IEEE Advances inSocial Networks Analysis and Mining pages 860ndash8672016

[26] Eric Nunes Ahmad Diab Andrew Gunn EricssonMarin Vineet Mishra Vivin Paliath John RobertsonJana Shakarian Amanda Thart and Paulo ShakarianDarknet and deepnet mining for proactive cybersecuritythreat intelligence In 2016 IEEE ISI pages 7ndash12 2016

[27] Shirui Pan Ruiqi Hu Guodong Long Jing Jiang LinaYao and Chengqi Zhang Adversarially regularizedgraph autoencoder for graph embedding pages 2609ndash2615 2018

[28] P Pawlinski P Jaroszewski P Kijewski L SiewierskiP Jacewicz P Zielony and R Zuber Actionable infor-mation for security incident response European UnionAgency for Network and Information Security Herak-lion Greece 2014

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 255

[29] Hao Peng Jianxin Li Qiran Gong Yangqiu SongYuanxing Ning Kunfeng Lai and Philip S Yu Fine-grained event categorization with heterogeneous graphconvolutional networks In Proceedings of the 28th In-ternational Joint Conference on Artificial Intelligencepages 3238ndash3245 AAAI Press 2019

[30] Sara Qamar Zahid Anwar Mohammad Ashiqur Rah-man Ehab Al-Shaer and Bei-Tseng Chu Data-drivenanalytics for cyber-threat intelligence and informationsharing Computers amp Security 6735ndash58 2017

[31] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In USENIXSecurity 2015

[32] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In 24thUSENIX Security pages 1041ndash1056 2015

[33] Deepak Sharma and Avadhesha Surolia Degree Cen-trality Springer New York 2013

[34] Saurabh Singh Pradip Kumar Sharma Seo Yeon MoonDaesung Moon and Jong Hyuk Park A comprehensivestudy on apt attacks and countermeasures for futurenetworks and communications challenges and solutionsJournal of Supercomputing pages 1ndash32 2016

[35] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks Principles and methodologiesSynthesis Lectures on Data Mining and Knowledge Dis-covery 3(2)1ndash159 2012

[36] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks a structural analysis approachACM SIGKDD 14(2)20ndash28 2013

[37] Yizhou Sun Jiawei Han Xifeng Yan Philip S Yu andTianyi Wu Pathsim Meta path-based top-k similaritysearch in heterogeneous information networks Proceed-ings of the VLDB Endowment 4(11)992ndash1003 2011

[38] Yizhou Sun Brandon Norick Jiawei Han Xifeng YanPhilip S Yu and Xiao Yu Pathselclus Integrating meta-path selection with user-guided object clustering in het-erogeneous information networks ACM Transactionson TKDD 7(3)11 2013

[39] Wiem Tounsi and Helmi Rais A survey on technicalthreat intelligence in the age of sophisticated cyber at-tacks Computers amp Security 72212ndash233 2018

[40] Thomas D Wagner Esther Palomar Khaled Mahbuband Ali E Abdallah Towards an anonymity supportedplatform for shared cyber threat intelligence risks andsecurity of internet and systems pages 175ndash183 2017

[41] Chenguang Wang Yangqiu Song Haoran Li MingZhang and Jiawei Han Text classification with het-erogeneous information network kernels In 13th AAAI2016

[42] Xiao Wang Houye Ji Chuan Shi Bai Wang Peng CuiP Yu and Yanfang Ye Heterogeneous graph attentionnetwork 2019

[43] Jie Yang Shuailong Liang and Yue Zhang Design chal-lenges and misconceptions in neural sequence labelingarXiv Computation and Language 2018

[44] Rex Ying Ruining He Kaifeng Chen Pong Eksombat-chai William L Hamilton and Jure Leskovec Graphconvolutional neural networks for web-scale recom-mender systems In Proceedings of the 24th ACMSIGKDD International Conference on Knowledge Dis-covery amp Data Mining pages 974ndash983 ACM 2018

[45] Jiawei Zhang Xiangnan Kong and Philip S Yu Trans-ferring heterogeneous links across location-based socialnetworks In The 7th ACM international conference onWeb search and data mining pages 303ndash312 2014

[46] Yishuai Zhao Bo Lang and Ming Liu Ontology-basedunified model for heterogeneous threat intelligence inte-gration and sharing In 2017 11th IEEE InternationalConference on Anti-counterfeiting Security and Identi-fication (ASID) pages 11ndash15 2017

256 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

  • Introduction
  • Background
    • Cyber Threat Intelligence
    • Motivation
    • Preliminaries
      • Architecture Overview of HINTI
      • Methodology
        • Multi-granular Attention Based IOC Extraction
        • Cyber Threat Intelligence Modeling
        • Threat Intelligence Computing
          • Experimental Evaluation
            • Dataset and Settings
            • Evaluation of IOC Extraction
              • Application of Threat Intelligence Computing
                • Threat Profiling and Significance Ranking of IOCs
                • Attack Preference Modeling
                • Vulnerability Similarity Analysis
                  • Related Work
                  • Discussion
                  • Conclusion
Page 15: Cyber Threat Intelligence Modeling Based on Heterogeneous ... · 2.1Cyber Threat Intelligence Cyber Threat Intelligence (CTI) extracted from security-related data is structured information

[7] Qian Chen and Robert A Bridges Automated behav-ioral analysis of malware a case study of wannacry ran-somware In 16th IEEE ICMLA pages 454ndash460 2017

[8] Weilin Chiang Xuanqing Liu Si Si Yang Li SamyBengio and Chojui Hsieh Cluster-gcn An efficient al-gorithm for training deep and large graph convolutionalnetworks knowledge discovery and data mining pages257ndash266 2019

[9] Roman Danyliw Jan Meijer and Yuri Demchenko Theincident object description exchange format Interna-tional Journal of High Performance Computing Appli-cations 50701ndash92 2007

[10] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 2017

[11] Yuxiao Dong Nitesh V Chawla and Ananthram Swamimetapath2vec Scalable representation learning for het-erogeneous networks In 23rd ACM SIGKDD pages135ndash144 ACM 2017

[12] Yujie Fan Yiming Zhang Yanfang Ye and Xin Li Au-tomatic opioid user detection from twitter Transductiveensemble built on different meta-graph based similari-ties over heterogeneous information network In IJCAIpages 3357ndash3363 2018

[13] Fireeye Openioc httpswwwfireeyecomblogthreat-research201310openioc-basicshtml Accessed January 20 2020

[14] Shifu Hou Yanfang Ye Yangqiu Song and Melih Ab-dulhayoglu Hindroid An intelligent android malwaredetection system based on structured heterogeneous in-formation network In Proceedings of the 23rd ACMSIGKDD International Conference on Knowledge Dis-covery and Data Mining pages 1507ndash1515 2017

[15] Zhiheng Huang Wei Xu and Kai Yu Bidirectional lstm-crf models for sequence tagging arXiv1508019912015

[16] Michael D Iannacone Shawn Bohn Grant NakamuraJohn Gerth Kelly MT Huffer Robert A Bridges Erik MFerragut and John R Goodall Developing an ontologyfor cyber security knowledge graphs CISR 1512 2015

[17] Thomas Kipf and Max Welling Semi-supervised clas-sification with graph convolutional networks arXivLearning 2016

[18] Thomas N Kipf and Max Welling Semi-supervisedclassification with graph convolutional networks arXivpreprint arXiv160902907 2016

[19] Tero Kokkonen Architecture for the cyber security situ-ational awareness system In Internet of Things SmartSpaces and Next Generation Networks and Systemspages 294ndash302 Springer 2016

[20] Mehmet Necip Kurt Yasin Yılmaz and Xiaodong WangDistributed quickest detection of cyber-attacks in smartgrid IEEE Transactions on Information Forensics andSecurity 13(8)2015ndash2030 2018

[21] John Lafferty Andrew Mccallum and Fernando PereiraConditional random fields Probabilistic models for seg-menting and labeling sequence data ICML pages 282ndash289 2001

[22] Xiaojing Liao Yuan Kan Xiao Feng Wang Li Zhouand Raheem Beyah Acing the ioc game Toward auto-matic discovery and analysis of open-source cyber threatintelligence In ACM Sigsac Conference on ComputerCommunications Security 2016

[23] Rob Mcmillan Open threat intelligencehttpwwwgartnercomdoc2487216definition-threat-intelligence AccessedJanuary 20 2020

[24] Tomas Mikolov Kai Chen Greg Corrado and JeffreyDean Efficient estimation of word representations invector space arXiv preprint arXiv13013781 2013

[25] Sudip Mittal Prajit Kumar Das Varish Mulwad Anu-pam Joshi and Tim Finin Cybertwitter Using twitterto generate alerts for cybersecurity threats and vulnera-bilities In Proceedings of the 2016 IEEE Advances inSocial Networks Analysis and Mining pages 860ndash8672016

[26] Eric Nunes Ahmad Diab Andrew Gunn EricssonMarin Vineet Mishra Vivin Paliath John RobertsonJana Shakarian Amanda Thart and Paulo ShakarianDarknet and deepnet mining for proactive cybersecuritythreat intelligence In 2016 IEEE ISI pages 7ndash12 2016

[27] Shirui Pan Ruiqi Hu Guodong Long Jing Jiang LinaYao and Chengqi Zhang Adversarially regularizedgraph autoencoder for graph embedding pages 2609ndash2615 2018

[28] P Pawlinski P Jaroszewski P Kijewski L SiewierskiP Jacewicz P Zielony and R Zuber Actionable infor-mation for security incident response European UnionAgency for Network and Information Security Herak-lion Greece 2014

USENIX Association 23rd International Symposium on Research in Attacks Intrusions and Defenses 255

[29] Hao Peng Jianxin Li Qiran Gong Yangqiu SongYuanxing Ning Kunfeng Lai and Philip S Yu Fine-grained event categorization with heterogeneous graphconvolutional networks In Proceedings of the 28th In-ternational Joint Conference on Artificial Intelligencepages 3238ndash3245 AAAI Press 2019

[30] Sara Qamar Zahid Anwar Mohammad Ashiqur Rah-man Ehab Al-Shaer and Bei-Tseng Chu Data-drivenanalytics for cyber-threat intelligence and informationsharing Computers amp Security 6735ndash58 2017

[31] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In USENIXSecurity 2015

[32] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In 24thUSENIX Security pages 1041ndash1056 2015

[33] Deepak Sharma and Avadhesha Surolia Degree Cen-trality Springer New York 2013

[34] Saurabh Singh Pradip Kumar Sharma Seo Yeon MoonDaesung Moon and Jong Hyuk Park A comprehensivestudy on apt attacks and countermeasures for futurenetworks and communications challenges and solutionsJournal of Supercomputing pages 1ndash32 2016

[35] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks Principles and methodologiesSynthesis Lectures on Data Mining and Knowledge Dis-covery 3(2)1ndash159 2012

[36] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks a structural analysis approachACM SIGKDD 14(2)20ndash28 2013

[37] Yizhou Sun Jiawei Han Xifeng Yan Philip S Yu andTianyi Wu Pathsim Meta path-based top-k similaritysearch in heterogeneous information networks Proceed-ings of the VLDB Endowment 4(11)992ndash1003 2011

[38] Yizhou Sun Brandon Norick Jiawei Han Xifeng YanPhilip S Yu and Xiao Yu Pathselclus Integrating meta-path selection with user-guided object clustering in het-erogeneous information networks ACM Transactionson TKDD 7(3)11 2013

[39] Wiem Tounsi and Helmi Rais A survey on technicalthreat intelligence in the age of sophisticated cyber at-tacks Computers amp Security 72212ndash233 2018

[40] Thomas D Wagner Esther Palomar Khaled Mahbuband Ali E Abdallah Towards an anonymity supportedplatform for shared cyber threat intelligence risks andsecurity of internet and systems pages 175ndash183 2017

[41] Chenguang Wang Yangqiu Song Haoran Li MingZhang and Jiawei Han Text classification with het-erogeneous information network kernels In 13th AAAI2016

[42] Xiao Wang Houye Ji Chuan Shi Bai Wang Peng CuiP Yu and Yanfang Ye Heterogeneous graph attentionnetwork 2019

[43] Jie Yang Shuailong Liang and Yue Zhang Design chal-lenges and misconceptions in neural sequence labelingarXiv Computation and Language 2018

[44] Rex Ying Ruining He Kaifeng Chen Pong Eksombat-chai William L Hamilton and Jure Leskovec Graphconvolutional neural networks for web-scale recom-mender systems In Proceedings of the 24th ACMSIGKDD International Conference on Knowledge Dis-covery amp Data Mining pages 974ndash983 ACM 2018

[45] Jiawei Zhang Xiangnan Kong and Philip S Yu Trans-ferring heterogeneous links across location-based socialnetworks In The 7th ACM international conference onWeb search and data mining pages 303ndash312 2014

[46] Yishuai Zhao Bo Lang and Ming Liu Ontology-basedunified model for heterogeneous threat intelligence inte-gration and sharing In 2017 11th IEEE InternationalConference on Anti-counterfeiting Security and Identi-fication (ASID) pages 11ndash15 2017

256 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

  • Introduction
  • Background
    • Cyber Threat Intelligence
    • Motivation
    • Preliminaries
      • Architecture Overview of HINTI
      • Methodology
        • Multi-granular Attention Based IOC Extraction
        • Cyber Threat Intelligence Modeling
        • Threat Intelligence Computing
          • Experimental Evaluation
            • Dataset and Settings
            • Evaluation of IOC Extraction
              • Application of Threat Intelligence Computing
                • Threat Profiling and Significance Ranking of IOCs
                • Attack Preference Modeling
                • Vulnerability Similarity Analysis
                  • Related Work
                  • Discussion
                  • Conclusion
Page 16: Cyber Threat Intelligence Modeling Based on Heterogeneous ... · 2.1Cyber Threat Intelligence Cyber Threat Intelligence (CTI) extracted from security-related data is structured information

[29] Hao Peng Jianxin Li Qiran Gong Yangqiu SongYuanxing Ning Kunfeng Lai and Philip S Yu Fine-grained event categorization with heterogeneous graphconvolutional networks In Proceedings of the 28th In-ternational Joint Conference on Artificial Intelligencepages 3238ndash3245 AAAI Press 2019

[30] Sara Qamar Zahid Anwar Mohammad Ashiqur Rah-man Ehab Al-Shaer and Bei-Tseng Chu Data-drivenanalytics for cyber-threat intelligence and informationsharing Computers amp Security 6735ndash58 2017

[31] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In USENIXSecurity 2015

[32] Carl Sabottke Octavian Suciu and Tudor Dumitras Vul-nerability disclosure in the age of social media exploit-ing twitter for predicting real-world exploits In 24thUSENIX Security pages 1041ndash1056 2015

[33] Deepak Sharma and Avadhesha Surolia Degree Cen-trality Springer New York 2013

[34] Saurabh Singh Pradip Kumar Sharma Seo Yeon MoonDaesung Moon and Jong Hyuk Park A comprehensivestudy on apt attacks and countermeasures for futurenetworks and communications challenges and solutionsJournal of Supercomputing pages 1ndash32 2016

[35] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks Principles and methodologiesSynthesis Lectures on Data Mining and Knowledge Dis-covery 3(2)1ndash159 2012

[36] Yizhou Sun and Jiawei Han Mining heterogeneousinformation networks a structural analysis approachACM SIGKDD 14(2)20ndash28 2013

[37] Yizhou Sun Jiawei Han Xifeng Yan Philip S Yu andTianyi Wu Pathsim Meta path-based top-k similaritysearch in heterogeneous information networks Proceed-ings of the VLDB Endowment 4(11)992ndash1003 2011

[38] Yizhou Sun Brandon Norick Jiawei Han Xifeng YanPhilip S Yu and Xiao Yu Pathselclus Integrating meta-path selection with user-guided object clustering in het-erogeneous information networks ACM Transactionson TKDD 7(3)11 2013

[39] Wiem Tounsi and Helmi Rais A survey on technicalthreat intelligence in the age of sophisticated cyber at-tacks Computers amp Security 72212ndash233 2018

[40] Thomas D Wagner Esther Palomar Khaled Mahbuband Ali E Abdallah Towards an anonymity supportedplatform for shared cyber threat intelligence risks andsecurity of internet and systems pages 175ndash183 2017

[41] Chenguang Wang Yangqiu Song Haoran Li MingZhang and Jiawei Han Text classification with het-erogeneous information network kernels In 13th AAAI2016

[42] Xiao Wang Houye Ji Chuan Shi Bai Wang Peng CuiP Yu and Yanfang Ye Heterogeneous graph attentionnetwork 2019

[43] Jie Yang Shuailong Liang and Yue Zhang Design chal-lenges and misconceptions in neural sequence labelingarXiv Computation and Language 2018

[44] Rex Ying Ruining He Kaifeng Chen Pong Eksombat-chai William L Hamilton and Jure Leskovec Graphconvolutional neural networks for web-scale recom-mender systems In Proceedings of the 24th ACMSIGKDD International Conference on Knowledge Dis-covery amp Data Mining pages 974ndash983 ACM 2018

[45] Jiawei Zhang Xiangnan Kong and Philip S Yu Trans-ferring heterogeneous links across location-based socialnetworks In The 7th ACM international conference onWeb search and data mining pages 303ndash312 2014

[46] Yishuai Zhao Bo Lang and Ming Liu Ontology-basedunified model for heterogeneous threat intelligence inte-gration and sharing In 2017 11th IEEE InternationalConference on Anti-counterfeiting Security and Identi-fication (ASID) pages 11ndash15 2017

256 23rd International Symposium on Research in Attacks Intrusions and Defenses USENIX Association

  • Introduction
  • Background
    • Cyber Threat Intelligence
    • Motivation
    • Preliminaries
      • Architecture Overview of HINTI
      • Methodology
        • Multi-granular Attention Based IOC Extraction
        • Cyber Threat Intelligence Modeling
        • Threat Intelligence Computing
          • Experimental Evaluation
            • Dataset and Settings
            • Evaluation of IOC Extraction
              • Application of Threat Intelligence Computing
                • Threat Profiling and Significance Ranking of IOCs
                • Attack Preference Modeling
                • Vulnerability Similarity Analysis
                  • Related Work
                  • Discussion
                  • Conclusion