

At-Risk System Identification via Analysis of Discussions on the Darkweb

Eric Nunes, Paulo Shakarian
Arizona State University
Tempe, AZ 85281, USA
Email: {enunes1, shak}@asu.edu

Gerardo I. Simari
Department of Computer Science and Engineering
Universidad Nacional del Sur (UNS), Argentina
Institute for C.S. and Eng. (CONICET–UNS)
Email: [email protected]

Abstract—Threat assessment of systems is critical to organizations' security policy. Identifying systems likely to be at risk from threat actors can help organizations better defend against likely cyber attacks. Currently, identifying such systems is to a large extent guided by the Common Vulnerability Scoring System (CVSS). Previous research has demonstrated poor correlation between a high CVSS score and at-risk systems. In this paper, we look at hacker discussions on darkweb marketplaces and forums to identify the platforms, vendors, and products likely to be at risk from hackers. We propose a reasoning system that combines DeLP (Defeasible Logic Programming) and machine learning classifiers to identify systems based on hacker discussions observed on the darkweb. The resulting system is therefore a hybrid between classical knowledge representation and reasoning techniques and machine learning classifiers. We evaluate the system on hacker discussions collected from nearly 300 darkweb forums and marketplaces provided by a threat intelligence company. We improved precision by 15%–57% while maintaining recall over baseline approaches.

I. INTRODUCTION

Adequate assessment of threats to systems is a central aspect of a mature security policy—identifying systems that are at risk can help defend against potential cyber attacks. Currently, organizations rely on the rating system (CVSS score) provided by the National Institute of Standards and Technology (NIST), which maintains a comprehensive list of publicly disclosed vulnerabilities in the National Vulnerability Database (NVD [21]), to identify if their systems are at risk. Case studies have shown poor correlation between the CVSS score and the likelihood that a vulnerability on a system will be targeted by hackers [2]. Hence, organizations are constantly looking for ways to proactively identify whether their vulnerable systems are of interest to hackers.

Threat intelligence from the deepweb and darkweb (D2web) has been leveraged to predict whether or not a vulnerability mentioned on the D2web will be exploited [4], [3]. This method only considers hacker discussions that have a CVE number mentioned in them—a limitation of the approach is therefore that discussions of interest to threat actors that carry no vulnerability identifier (CVE) are not taken into account. In this paper, we propose to leverage this threat intelligence gathered from D2web markets and forums to identify the systems that might be of interest to threat actors. We identify systems based on the structured naming scheme Common

TABLE I: System components and examples

Component | Explanation and Examples
Platform | Can be either hardware (h), operating system (o), or application (a), based on what the vulnerability exploits.
Vendor | The owner of the vulnerable product. Examples include Google, Microsoft, The Mozilla Foundation, and the University of Oxford.
Product | The product that is vulnerable. Examples include Internet Explorer, Java Runtime Environment, Adobe Reader, and Windows 2000.
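The three components of Table I correspond to the leading fields of a CPE 2.2 URI (e.g., cpe:/a:microsoft:internet_explorer:8.0). As a minimal sketch (the helper name and the simplified error handling are our own, not part of the paper's system), extracting them can look like this:

```python
def parse_cpe(cpe_uri):
    """Split a CPE 2.2 URI of the form cpe:/<part>:<vendor>:<product>[:version...]
    into the three system components of Table I. The part code is one of
    /h (hardware), /o (operating system), /a (application)."""
    fields = cpe_uri.split(":")
    if fields[0] != "cpe" or len(fields) < 4:
        raise ValueError("not a CPE 2.2 URI: %s" % cpe_uri)
    part_codes = {"/h": "h", "/o": "o", "/a": "a"}
    return {"platform": part_codes.get(fields[1], fields[1]),
            "vendor": fields[2],
            "product": fields[3]}

print(parse_cpe("cpe:/a:microsoft:internet_explorer:8.0"))
```

Versions and further CPE fields are ignored here, since the paper only targets the first three components.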

Platform Enumeration (CPE [7]). We focus our efforts towards identifying the first three system components of the CPE naming scheme; Table I shows these three components, with examples for each.

We design a system that leverages threat intelligence (hacker discussions) and makes a decision regarding at-risk systems, at the same time providing arguments as to why a particular decision was made. It explores multiple competing hypotheses (in this case multiple platforms, vendors, and products) based on the discussions for and against a particular at-risk component. The resulting system is a hybrid that combines DeLP with machine learning classifiers. Previously, a similar reasoning system was employed for attributing cyber-attacks to responsible threat actors [23], evaluated on a capture-the-flag dataset. Specific contributions of this paper include:

• We frame identifying at-risk systems as a multi-label classification problem, and apply several machine learning approaches to compare their performance. We find that the large number of possible label choices for vendors and products with little representation in training accounts for the majority of the misclassified samples.

• To address misclassification, we propose a hybrid reasoning framework that combines machine learning techniques with defeasible argumentation to reduce the set of possible labels for each system component. The reasoning framework can provide arguments supporting the decisions, indicating why a particular system was identified over others; this is an important aspect, supporting a security analyst in better understanding the result.

978-1-5386-4922-0/18/$31.00 ©2018 IEEE


• We report on experiments showing that the reduced set of labels used in conjunction with the classifiers leads to significant improvement in precision (15%–57%) while maintaining comparable recall.
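In the multi-label setting above, precision and recall can be computed per sample and averaged. The paper does not spell out its averaging convention here, so the following sketch uses one common choice (sample-averaged metrics over label sets):

```python
def multilabel_precision_recall(true_sets, pred_sets):
    """Sample-averaged precision and recall for multi-label predictions.
    Each element of true_sets / pred_sets is a set of labels; a sample
    with an empty prediction (or empty ground truth) contributes 0 to
    the corresponding average."""
    precisions, recalls = [], []
    for true, pred in zip(true_sets, pred_sets):
        hits = len(true & pred)
        precisions.append(hits / len(pred) if pred else 0.0)
        recalls.append(hits / len(true) if true else 0.0)
    n = len(precisions)
    return sum(precisions) / n, sum(recalls) / n

p, r = multilabel_precision_recall(
    [{"microsoft", "adobe"}, {"apple"}],
    [{"microsoft"}, {"apple", "google"}],
)
```

Shrinking the prediction sets to warranted labels raises precision (fewer spurious labels per sample) while, ideally, the true labels survive the reduction and recall is unchanged.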

The rest of the paper is organized as follows. In Section II we briefly discuss some terminology used throughout the work. A system overview is presented in Section III, and then we discuss the dataset and provide an analysis in Section IV. This is followed by the argumentation model based on [12] in Section V; then, the experimental setup and results (along with the DeLP programs for each system component) are discussed in Section VI. This is followed by a discussion of the results and related work in Section VII and Section VIII, respectively. Finally, conclusions are presented in Section IX.

II. BACKGROUND

A. Darkweb (D2web) websites

The darkweb refers to the portion of the internet that is not indexed by search engines and hence cannot be accessed by standard browsers. Specialized browsers like "The Onion Router" (Tor)1 are required to access these websites. Widely used for underground communication, Tor is free software dedicated to protecting the privacy of its users by obscuring traffic analysis [10]. The network traffic in Tor is guided through a number of volunteer-operated servers (also called "nodes"). Each node of the network encrypts the information it blindly passes on, registering neither where the traffic came from nor where it is headed [10], disallowing any tracking. We retrieve information from both marketplaces, where users advertise to sell information regarding vulnerabilities or exploits targeting the vulnerabilities, and forums that provide discussions on discovered vulnerabilities, among other topics.

Markets: Users advertise and sell their products and services (referred to as items) on marketplaces. D2web marketplaces provide a new avenue to gather information about the cyber threat landscape, in particular exploits targeting vulnerabilities or hacking services provided by vendors at a particular price. These marketplaces also sell goods and services relating to drugs, pornography, weapons, and software services—these need to be filtered out for our application.

Forums: These are user-oriented platforms where like-minded individuals have discussions on topics of interest, regardless of their geophysical location. Administrators set up D2web forums with communication safety for their members in mind. While the structure and organization of D2web-hosted forums might be very similar to more familiar web forums, the topics and concerns of the users differ distinctly. Forums addressing malicious hackers feature discussions on programming, hacking, and cyber-security, covering newly discovered vulnerabilities as well as zero-days (vulnerabilities not yet publicly disclosed). Threads are dedicated to security concerns like privacy and online safety—such topics plug back into and determine the structure and usage of the platforms.

1See the Tor Project’s official website (https://www.torproject.org/)

B. Vulnerability related terms

A vulnerability is a flaw in a system (software/hardware) that makes the system open to attacks compromising the confidentiality, integrity, or availability of the system to cause harm [24].

CVE: Common Vulnerabilities and Exposures (CVE) is a unique identifier assigned to a system vulnerability reported to NIST [8]. NIST maintains a database of all publicly available vulnerabilities in the National Vulnerability Database (NVD [21]). Predicting the exploitability of a CVE is an important problem, and recent work leveraging darkweb data has shown good performance in achieving that goal [4], [3]. But these techniques rely on direct mentions of CVEs. We note that a very small portion of the hacker discussions in the data from the commercial provider has direct CVE mentions.

CPE: Common Platform Enumeration (CPE) is a list of the software/hardware products that are vulnerable for a given CVE. NIST makes this data available for each vulnerability in its database. Identifying at-risk systems in terms of their components (see Table I) is an important step towards predicting whether those systems will be targeted by threat actors (in cases where the hacker discussion is not associated with a CVE number). For the system components under consideration, there exists a hierarchy running from platform to vendor to product. For instance, if we are considering operating systems, then there is a limited number of vendors that provide them: Microsoft, Apple, Google, etc. If we identify Microsoft as our vendor, then the products are related to the Windows operating system. This hierarchy helps us narrow down the possible choices as we go down it.
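The platform-to-vendor-to-product narrowing described above can be sketched as a nested dictionary. The fragment below is a hypothetical miniature of the hierarchy built from NVD CPE data (names illustrative, not the paper's full dictionary):

```python
# Hypothetical fragment of the platform -> vendor -> product hierarchy.
CPE_HIERARCHY = {
    "o": {"microsoft": ["windows_10", "windows_server", "windows_8.1"],
          "apple": ["mac_os_x"],
          "linux": ["linux_kernel"]},
    "a": {"adobe": ["flash_player", "acrobat_reader"],
          "microsoft": ["internet_explorer"]},
}

def vendors_for(platform):
    """Vendors reachable one level below a platform in the hierarchy."""
    return sorted(CPE_HIERARCHY.get(platform, {}))

def products_for(platform, vendor):
    """Products reachable under a (platform, vendor) pair."""
    return CPE_HIERARCHY.get(platform, {}).get(vendor, [])
```

Once a platform (say "o") is identified, only its vendors remain candidates, and once a vendor is identified, only that vendor's products remain; this is the pruning the argumentation model exploits.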

III. SYSTEM OVERVIEW

Fig. 1 gives an overview of the reasoning system; it consists of the following three main modules:

• Knowledge Base: Our knowledge base consists of hacker discussions from darkweb (D2web) forums and marketplaces. This data is maintained and made available through APIs by a commercial darkweb threat intelligence provider2. The database is collected from 302 websites3. We use the hacker discussions in terms of posted content (from forums) and item descriptions (from markets), the website each is posted on, and the user posting the discussion as inputs to both the argumentation and machine learning models. We also input the CPE hierarchy from the NVD to the argumentation model. We discuss and provide further analysis of the data in Section IV. For the experiment, we sort the dataset by time (depending on when the discussion was posted); the first 80% is reserved for training (knowledge base) and the remaining 20% for testing. We follow a similar time split to compute the CPE hierarchy as well.

• Argumentation Model: This component constructs arguments for a given query (at-risk system component)

2Cyber Reconnaissance, Inc. (CYR3CON), https://www.cyr3con.com.
3At the time of writing this paper.


Fig. 1: Reasoning System. D2web APIs (discussions from forums and markets) and the NVD CPE hierarchy feed the argumentation model (θ: facts, ω: strict rules, δ: defeasible rules); the resulting reduced set of platforms, vendors, and products, together with discussion features and website/user preferences, is passed to the machine learning model, which identifies the at-risk system/vulnerability.

using elements in the knowledge base. We use a formalism called DeLP that combines logic programming with defeasible argumentation. It is made up of three constructs: facts, observations from the knowledge base that cannot be contradicted; strict rules, logical combinations of facts that are always true; and defeasible rules, which can be thought of as strict rules that are only true if no contradictory evidence is present. We discuss the argumentation framework with examples for each of the constructs in Section V. Arguments help reduce the set of possible choices for platforms, vendors, and products; this reduced set of possible system components acts as one of the inputs to the machine learning model. The argumentation model thus constrains the machine learning model to identify the system from the reduced set of possible platforms, vendors, and products.

• Machine Learning Model: The machine learning model takes the knowledge base and query as input, along with the reduced set of possible system components from the argumentation model, and provides a result identifying the system. It is constrained by the argumentation model to select the components from the reduced platform, vendor, and product set, which aids the machine learning model (improving precision) as demonstrated in the results section of the paper. We use text-based features extracted from the discussions (TF-IDF/Doc2Vec) for the machine learning model. Any standard machine learning model can be used in this module. We provide a comparison of different machine learning models to select the best one.
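The interaction between the two models can be sketched in a few lines: the classifier scores every label, and the argumentation model's reduced set masks out labels that no warranted argument supports. The function name and the fallback behavior below are our own simplifications, not the paper's exact procedure:

```python
def constrained_prediction(label_scores, allowed_labels):
    """Pick the highest-scoring label that survives the argumentation
    model's reduction; fall back to the unconstrained argmax if the
    reduced set is empty. label_scores maps label -> classifier score."""
    candidates = {l: s for l, s in label_scores.items() if l in allowed_labels}
    if not candidates:
        candidates = label_scores
    return max(candidates, key=candidates.get)

scores = {"microsoft": 0.40, "apple": 0.35, "oracle": 0.25}
# Arguments warrant only apple and oracle for this post, so microsoft
# (the unconstrained winner) is ruled out.
assert constrained_prediction(scores, {"apple", "oracle"}) == "apple"
```

This is where the precision gain comes from: labels the arguments rule out can no longer be (mis)predicted, while the classifier still ranks the surviving candidates.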

IV. DATASET

A. D2web data

We use D2web data supplied by a threat intelligence company. The data is accessed via APIs and is comprised of forum discussions and marketplace items offered for sale on the D2web. An exploration of D2web discussions in terms of their structure, content, and the behavior of the users who post them is reported in [30]. The data is collected periodically to obtain time-based information indicating changes in the forums and marketplaces. To ensure collection of cyber-security-relevant data, machine learning models are employed to filter out data related to drugs, weapons, and other irrelevant discussions. Table II shows the characteristics of the websites, posts/items, and users. The data comes from websites in different languages, and a single website might have discussions in several languages. Fig. 2 shows the percentage of total websites from the D2web for the top ten languages used to post discussions. The majority of the websites have discussions in English (73%), with the other languages having an even distribution. The commercial data collection platform automatically identifies the language and translates it to English using the Google Translate API [14].

Ground Truth. In order to evaluate the performance of the reasoning framework, we need ground truth associated with the hacker discussions. To obtain ground truth we consider discussions from forums and marketplaces that mention a CVE number. From the CVE number we can look up the vulnerable systems using the NVD; we note that for both training and testing we remove the CVE number while computing features. Table II shows the characteristics of the websites, posts/items, and users that mention a CVE number. The hacker discussions with CVE mentions belong to 135 websites and were posted by 3,361 users. On analyzing the CVE mentions, most of the older


Fig. 2: Percentage of total websites belonging to the top ten languages in the D2web data.

TABLE II: Characteristics of D2web data

Number of D2web websites | 302
Number of unique users | 635,163
Number of unique posts / items | 6,277,638
Number of D2web websites (CVE mentions) | 135
Number of unique users (CVE mentions) | 3,361
Number of unique posts / items (CVE mentions) | 25,145

vulnerabilities target products that are no longer in use. For that reason, in our experiments we consider CVE discussions posted after 2013 (starting 01/01/2014). These discussions make up around 70% of the total CVE discussions.
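Combining the date filter above with the chronological 80/20 split described in Section III can be sketched as follows (the field name "posted" and the record layout are assumptions for illustration):

```python
from datetime import date

def chronological_split(posts, cutoff=date(2014, 1, 1), train_frac=0.8):
    """Drop discussions posted before the cutoff, sort the rest by
    posting date, and reserve the first 80% for training and the
    remaining 20% for testing."""
    recent = sorted((p for p in posts if p["posted"] >= cutoff),
                    key=lambda p: p["posted"])
    k = int(len(recent) * train_frac)
    return recent[:k], recent[k:]
```

Splitting by time rather than at random avoids leaking future discussions into the training set, which matters for a threat-forecasting evaluation.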

CPE Hierarchy. We compute the hierarchy of vulnerable systems from all the vulnerabilities disclosed in the NVD [21], and maintain it as a dictionary on which to build arguments. Fig. 3 shows a subset of the built hierarchy with the three system components (platform, vendor, and product).

Website/User preference. We compute and maintain a list of the system components discussed on each website and by each user. This lets us know whether a particular website is preferred by hackers to discuss specific at-risk systems. The user list gives us the preference of each user regarding what at-risk systems are of interest to him/her.
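Building these preference lists amounts to grouping the training records by website and by user. A minimal sketch, assuming the records are (webID, userID, component) triples (the triple layout is our assumption):

```python
from collections import defaultdict

def build_preferences(records):
    """Build website- and user-level preference sets from
    (webID, userID, component) triples observed in training data."""
    site_prefs, user_prefs = defaultdict(set), defaultdict(set)
    for web_id, user_id, component in records:
        site_prefs[web_id].add(component)
        user_prefs[user_id].add(component)
    return site_prefs, user_prefs
```

At query time, the argumentation model can consult these sets via predicates like previously_seen and user_preference (Table III).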

Overall in our dataset, for platforms, most discussions pose a threat to operating systems (57%), followed by applications (43%); hardware makes up a small fraction of the discussions (3%). There are discussions that pose a risk to multiple platforms, i.e., operating systems and applications, or in a few instances all three. For vendors, the top five at risk based on CVE mentions in the hacker discussions are: Microsoft (24%), Linux (9%), Apple (6%), Oracle (5%), and Adobe (5%). Similar to platforms, discussions can pose a risk to multiple vendors. For products, the distribution is more even, since a single vendor can have multiple products. Even though Microsoft dominates the vendor discussion, it also has the largest number of products that are at risk. The top five at-risk products based on CVE mentions in the hacker discussions are: Windows Server (5%),

Fig. 3: Subset of the CPE hierarchy. The platform Operating System (o) branches to the vendors Apple and Microsoft, with the product Mac OSX under Apple and the products Windows Server and Windows 10 under Microsoft.

Windows 8.1 (4%), Linux kernel (3.8%), Mac OSX (2.3%), and Flash Player (1.9%).

V. ARGUMENTATION MODEL

Our approach relies on a model of the world in which we can analyze competing hypotheses. Such a model allows for contradictory information, so it can handle inconsistency in the data, similar to the one employed for attributing cyber-attacks to responsible threat actors [31], [23].

Before describing the argumentation model in detail, we introduce some necessary notation. Variable and constant symbols represent items such as the platform/vendor/product at risk from the discussion, while post/webID/userID represent the hacker discussion, where it was posted, and who posted it, respectively (we note that for privacy reasons the webID/userID is represented as an integer in the data provided by the APIs; the names are not disclosed). We denote the set of all variable symbols with V and the set of all constants with C. For our model we require six subsets of C:

• Cpost, denoting the hacker discussions,
• Cweb, denoting the websites (both forums and marketplaces) where the hacker discussions were posted,
• Cuser, denoting the users who post hacker discussions, and
• Cplatform, Cvendor, Cproduct, denoting the three components at risk from the discussion (see Table I).

We use symbols in all capital letters to denote variables. In the running example, we use a subset of the D2web dataset collected by the threat intelligence company.

Example 1. The following system and post/web/user information will be used in the running example:

Cpost = {post1, post2, ..., postn}
Cweb = {webID1, webID2, ..., webIDn}
Cuser = {userID1, userID2, ..., userIDn}
Cplatform = {h, o, a}
Cvendor = {microsoft, google, the_mozilla_foundation}
Cproduct = {internet_explorer, windows_10, adobe_reader}


TABLE III: Example predicates and explanation

Predicate | Explanation
posted(post1, webID1) | post1 was posted on the website webID1.
at_risk(D, V) | Post D discussed vendor V being at-risk.
user_preference(userID1, microsoft) | userID1 prefers to post discussions regarding Microsoft systems at-risk.
previously_seen(webID1, adobe_flash) | At-risk discussions regarding Adobe Flash are discussed on webID1.
parent(microsoft, safari) | Vendor Microsoft is a parent of product Safari.

The language also contains a set of predicate symbols that have constants or variables as arguments, and denote events that can be either true or false. We denote the set of predicates with P; examples of predicates are shown in Table III. For instance, user_preference(userID1, microsoft) will either be true or false, and denotes the event where userID1 prefers to post discussions regarding Microsoft systems at-risk.

A ground atom is composed of a predicate symbol and a tuple of constants, one for each argument—hence, ground atoms have no variables. The set of all ground atoms is denoted with G. A ground literal L is either a ground atom or a negated ground atom. An example of a ground atom for our running example is posted(post1, webID1). In the following, we will use G′ to denote a subset of G.

In order to be able to deal with conflicting information and offer explainable results, we choose a structured argumentation framework [26] for our model; our approach works by creating arguments (in the form of a set of rules and facts) that compete with each other to identify the at-risk system given a hacker discussion on the D2web. In this case, arguments are defeated based on the evaluation of contradicting information in other arguments. This procedure is commonly known as a dialectical process, since it follows the same structure as dialogues between humans—as such, arguments that are undefeated (or warranted, in DeLP) prevail. Structuring the analysis in this manner also allows us to leverage the resulting structure, since the set of all prevailing arguments gives a clear map of how the conclusion is supported by the available data.

The clear benefit of the transparency afforded by such a process is that it lets a (human) security analyst not only add new arguments based on new evidence, but also eliminate information identified as incorrect (perhaps because it is out of date, or because it comes from a source newly identified as untrustworthy) and fine-tune the model for better performance. Since the argumentation model can deal with inconsistent information, it draws a natural analogy to the way humans settle disputes when there is disagreement. Having a clear explanation of why one argument is chosen over others is a desirable characteristic for both the analyst and for organizations making decisions and policy changes. We now briefly discuss some preliminaries on DeLP.

Defeasible Logic Programming: DeLP is a formalism that combines logic programming with defeasible argumentation; we refer the interested reader to [12] for a fully detailed presentation of the system.

In summary, the formalism is made up of several constructs, namely facts, strict rules, and defeasible rules. Facts represent statements obtained from evidence, and are therefore always considered to be true; similarly, strict rules are logical combinations of elements (facts or other inferences) that can always be performed. On the contrary, defeasible rules can be thought of as strict rules that may be true in some situations, but could be false if certain contradictory evidence is presented. These three constructs are used to build arguments, and DeLP programs are simply sets of facts, strict rules, and defeasible rules. We adopt the usual notation for DeLP programs, denoting the program (or knowledge base) with Π = (Θ, Ω, ∆), where Θ is the set of facts, Ω is the set of strict rules, and ∆ is the set of defeasible rules. Examples of the three constructs are provided with respect to the dataset in Fig. 4. We now describe the notation used to denote these constructs.

Facts (Θ) are ground literals that represent atomic information or its (strong) negation (¬).

Strict Rules (Ω) represent cause-and-effect information; they are of the form L0 ← L1, ..., Ln, where L0 is a literal and {Li}i>0 is a set of literals.

Defeasible Rules (∆) are weaker versions of strict rules, and are of the form L0 -≺ L1, ..., Ln, where L0 is a literal and {Li}i>0 is a set of literals.

When a hacker discussion happens on the D2web, the model can be used to derive arguments to determine the at-risk system (in terms of platform, vendor, and product). Derivation follows the same mechanism as classical logic programming [16]; the main difference is that DeLP incorporates defeasible argumentation, which decides which arguments are warranted, which arguments are defeated, and which arguments should be considered to be blocked—the latter are arguments that are involved in a conflict for which a winner cannot be determined.

Fig. 4 shows a ground argumentation framework demonstrating constructs derived from our D2web data. For instance, θ1 indicates the fact that a hacker discussion post1 was posted on the D2web website webID1, and θ5 indicates that user userID1 prefers to post discussions regarding Apple products. For the strict rules, ω1 says that for a given post post1 posing a threat to an operating system (o), the vendor sandisk cannot be at risk if the parent of sandisk is not operating system (o)4. Defeasible rules can be read similarly; δ2 indicates that if post1 poses a threat to the vendor apple, the product safari can be at risk if apple is the parent of safari. By replacing the constants with variables in the predicates we can derive a non-ground argumentation framework that can be applied in general.

4This encodes the CPE hierarchical structure.

Page 6: At-Risk System Identification via Analysis of …At-Risk System Identification via Analysis of Discussions on the Darkweb Eric Nunes, Paulo Shakarian Arizona State University Tempe,

Θ: θ1 = posted(post1, webID1)
   θ2 = posted(post1, userID1)
   θ3 = parent(o, microsoft)
   θ4 = parent(apple, safari)
   θ5 = user_preference(userID1, apple)
   θ6 = previously_seen(webID1, o)

Ω: ω1 = ¬at_risk(post1, sandisk) ← at_risk(post1, o), ¬parent(o, sandisk)
   ω2 = ¬at_risk(post1, internet_explorer) ← at_risk(post1, apple), ¬parent(apple, internet_explorer)

∆: δ1 = at_risk(post1, microsoft) -≺ at_risk(post1, o), parent(o, microsoft)
   δ2 = at_risk(post1, safari) -≺ at_risk(post1, apple), parent(apple, safari)
   δ3 = at_risk(post1, apple) -≺ user_preference(userID1, apple)
   δ4 = at_risk(post1, o) -≺ previously_seen(webID1, o)

Fig. 4: A ground argumentation framework.

⟨A1, at_risk(post1, microsoft)⟩   A1 = {δ1, δ4, θ3}
⟨A2, at_risk(post1, safari)⟩      A2 = {δ2, δ3, θ4}
⟨A3, at_risk(post1, apple)⟩       A3 = {δ3, θ5}
⟨A4, at_risk(post1, o)⟩           A4 = {δ4, θ6}

Fig. 5: Example ground arguments from Figure 4.

Definition 1. (Argument) An argument for a literal L is a pair ⟨A, L⟩, where A ⊆ ∆ provides a minimal proof for L meeting the requirements: (1) L is defeasibly derived from A5, (2) Θ ∪ Ω ∪ A is not contradictory, and (3) A is a minimal subset of ∆ satisfying (1) and (2).

Literal L is called the conclusion supported by the argument, and A is the support. An argument 〈B, L〉 is a subargument of 〈A, L′〉 iff B ⊆ A. The following examples discuss arguments for our scenario.

Example 2. Fig. 5 shows example arguments based on the KB from Fig. 4; here, 〈A3, at risk(post1, apple)〉 is a subargument of 〈A2, at risk(post1, safari)〉.

For a given argument there may be counter-arguments that contradict it. A proper defeater of an argument 〈A, L〉 is a counter-argument that, by some criterion, is considered to be better than 〈A, L〉; if the two are incomparable according to this criterion, the counter-argument is said to be a blocking defeater. The default criterion used in DeLP for argument comparison is generalized specificity [32], but any domain-specific criterion (or set of criteria) can be devised and deployed.

5This means that there exists a derivation, possibly including defeasible rules, consisting of a sequence of rules that ends in L.

A sequence of arguments, in which each argument defeats its predecessor, is called an argumentation line. There can be more than one defeater argument, which leads to a tree structure built from the set of all argumentation lines rooted in the initial argument. In this dialectical tree, every node except the root defeats its parent, and the leaves represent unchallenged arguments; this creates a map of all possible argumentation lines that can be used to decide whether or not an argument is defeated. Arguments that either have no attackers, or whose attackers have all been defeated, are said to be warranted.

Given a literal L and an argument 〈A, L〉, in order to decide whether or not L is warranted, every node in the dialectical tree T (〈A, L〉) is recursively marked as “D” (defeated) or “U” (undefeated), obtaining a marked dialectical tree T ∗(〈A, L〉) where:

• All leaves in T ∗(〈A, L〉) are marked “U”; and
• Let 〈B, q〉 be an inner node of T ∗(〈A, L〉). Then, 〈B, q〉 will be marked “U” iff every child of 〈B, q〉 is marked “D”. Node 〈B, q〉 will be marked “D” iff it has at least one child marked “U”.

Given argument 〈A, L〉 over Π, if the root of T ∗(〈A, L〉) is marked “U”, then T ∗(〈A, L〉) warrants L and L is said to be warranted from Π. It is interesting to note that warranted arguments correspond to those in the grounded extension of a Dung abstract argumentation framework [11].
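The recursive D/U marking procedure above can be sketched as follows. This is an illustrative sketch, not the authors' implementation; a dialectical tree is represented as nested (label, children) pairs:

```python
# Sketch of the D/U marking of a dialectical tree. Each node is a pair
# (argument_label, [child_nodes]); children are the node's defeaters.

def mark(node):
    """Return 'U' (undefeated) iff every child is marked 'D'."""
    _, children = node
    if not children:                       # leaves are unchallenged
        return "U"
    child_marks = [mark(c) for c in children]
    # An inner node survives only if all of its defeaters are defeated.
    return "U" if all(m == "D" for m in child_marks) else "D"

def warranted(root):
    """An argument is warranted iff the root of its marked tree is 'U'."""
    return mark(root) == "U"
```

For example, if A1 is attacked by A2, which is in turn attacked by the unchallenged A3, then A3 is “U”, A2 is “D”, and A1 is warranted.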

An implemented DeLP system therefore takes as input a set of facts, strict rules, and defeasible rules, as well as a query literal. Note that while the set of facts and strict rules must be consistent (non-contradictory), the set of defeasible rules can be inconsistent; the presence of such inconsistency is the root of “interesting” cases. We engineer our at-risk system framework as a set of defeasible and strict rules whose structure was created manually, but which depend on values learned from a historical corpus of D2web data. Then, for a given post discussing a vulnerability, we instantiate a set of facts for that situation; this information is then provided as input to the DeLP system, which uses heuristics to generate all arguments for and against every possible component of the system (platforms, vendors, products) for the post discussion. Dialectical trees based on these arguments are analyzed, and a decision is made regarding which components are warranted. This results in a reduced set of potential choices, which we then use as input to a classifier to obtain the at-risk system. The following section discusses these steps in full detail.

VI. EXPERIMENTS

We frame the identification of at-risk systems as a multi-label classification problem for each of the system components (platform, vendor, and product); the basic step involves extracting textual features from the discussions to be used as input to the machine learning models. We now describe the data pre-processing steps and the standard machine learning approaches, along with the metrics used for evaluating the models.


A. Data Representation

As mentioned above, we use text-based features to represent the hacker discussions on the D2web, which are then used as input to the machine learning models. Some of the discussions are in foreign languages (cf. Fig. 2). The commercial data collection platform automatically identifies the language and translates it to English using the Google Translate API [14]. The following pre-processing steps are taken to address different challenges. We employ two feature engineering techniques, namely TF-IDF and Doc2Vec.

Text Cleaning. We remove all non-alphanumeric charactersfrom hacker discussions. This removes any special charactersthat do not contribute towards making the decision.

Misspellings and Word Variations. Misspellings and word variations are frequently observed in the discussions on the D2web, leading to separate features in the feature vector if a standard bag-of-words (BOW) approach is used. In BOW, we create a dictionary of all the word occurrences in the training set; then, for a particular discussion, the feature vector is created by looking up which words occur in the discussion and their counts. Misspellings and word variations will thus be represented as different words; to address this, we use character n-gram features. As an example, consider the word “execute”: if we were using tri-gram character features, the word “execute” would yield the set of features:

“exe”, “xec”, “ecu”, “cut”, “ute”.

The benefit of this technique is that variations or misspellings of the word, such as “execution”, “executable”, or “exxecute”, will all share common features. We found that using character n-grams in the range 3–7 worked best in our experiments.
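The character n-gram extraction described above can be sketched in a few lines (an illustrative sketch; the function name is ours):

```python
def char_ngrams(word, n):
    """Return the set of all contiguous character n-grams of a token."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}
```

With n = 3, “execute” and the misspelling “exxecute” share the trigrams “xec”, “ecu”, “cut”, and “ute”, which is exactly the overlap the technique relies on.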

TF-IDF Features. We vectorize the n-gram features using the term frequency–inverse document frequency (TF-IDF) model, which creates a vocabulary of all the n-grams in the discussions. In TF-IDF, the importance of an n-gram feature increases with the number of times it occurs, normalized by the total number of n-grams in the discussion (term frequency), and is offset by how many discussions contain it (inverse document frequency); this prevents common terms from becoming important features. We consider the 1,000 most frequent features (using more than 1,000 features did not improve performance, but only added to the training and testing time).
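As a rough illustration of TF-IDF weighting over character n-grams, here is a minimal stdlib-only sketch; in practice a library vectorizer would be used, all names here are ours, and normalization details differ from production implementations:

```python
import math
from collections import Counter

def char_ngrams(text, lo=3, hi=7):
    """All contiguous character n-grams with lengths in [lo, hi]."""
    grams = []
    for n in range(lo, hi + 1):
        grams += [text[i:i + n] for i in range(len(text) - n + 1)]
    return grams

def tfidf(corpus, lo=3, hi=7):
    """Map each document to {ngram: tf * idf}. Minimal sketch."""
    docs = [Counter(char_ngrams(d, lo, hi)) for d in corpus]
    n_docs = len(docs)
    df = Counter()                        # document frequency per n-gram
    for counts in docs:
        df.update(counts.keys())
    vectors = []
    for counts in docs:
        total = sum(counts.values())      # tf normalization
        vectors.append({g: (c / total) * math.log(n_docs / df[g])
                        for g, c in counts.items()})
    return vectors
```

An n-gram appearing in every document gets idf log(1) = 0, so ubiquitous terms carry no weight, matching the intuition above.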

Doc2Vec Features. Doc2Vec is a feature engineering technique that generates a document vector (in our case, a document refers to a discussion), which acts as input to the classifier to identify at-risk systems. In Doc2Vec, a vector representation of each word in the document is first computed by taking into account the words around it (to maintain context), and these word vectors are then averaged to get a representation of the document. We implement Doc2Vec using the gensim library in Python6. It has previously been used to classify tweets [33] as well as product descriptions [15].

6https://radimrehurek.com/gensim/models/doc2vec.html
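The averaging step the text describes can be sketched as follows. Note this is a simplification: gensim's Doc2Vec learns document vectors jointly with word vectors rather than averaging pretrained ones, and the 3-dimensional word vectors below are made up for illustration:

```python
# Hypothetical pretrained word vectors (toy 3-d values, for illustration).
WORD_VECS = {
    "remote":  [0.9, 0.1, 0.0],
    "exploit": [0.8, 0.2, 0.1],
    "sale":    [0.0, 0.7, 0.9],
}

def doc_vector(tokens, word_vecs=WORD_VECS):
    """Average the vectors of known tokens; zero vector if none known."""
    known = [word_vecs[t] for t in tokens if t in word_vecs]
    if not known:
        return [0.0, 0.0, 0.0]
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]
```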

B. Supervised Learning Approaches

We conducted our experiments using the following standard machine learning approaches, implemented using a Python machine learning library7.

Support Vector Machine (SVM). Support vector machines (SVM) work by finding a separating boundary that maximizes the geometric margin between classes (in our case, different platforms, vendors, and products). Given the geometric interpretation of the data, this separating boundary is referred to as a hyperplane.

Random Forest (RF). Ensemble methods are popular classification tools, based on the idea of generating multiple predictors that are used in combination to classify new unseen samples. We use a random forest that combines bagging for each tree with random feature selection at each node to split the data, thus generating multiple decision tree classifiers. Each decision tree gives its own opinion on a test sample's classification, and these opinions are merged to make the final decision.

Naive Bayes Classifier (NB). NB is a probabilistic classifier that uses Bayes' theorem under the assumption of independent features. During training, we compute the conditional probability of a sample of a given class having a certain feature. We also compute the prior probability for each class, i.e., the fraction of the training data belonging to that class. Since Naive Bayes assumes that the features are statistically independent, the likelihood of a sample S, represented by a set of d features a, belonging to a class c is given by:

Pr(c|S) = Pr(c) × ∏_{i=1}^{d} Pr(a_i | c).
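A minimal sketch of this Naive Bayes computation, with add-one (Laplace) smoothing, which the text does not specify but which is needed to avoid zero probabilities; all names are ours:

```python
import math
from collections import Counter, defaultdict

def train_nb(samples):
    """samples: list of (feature_list, label). Returns count statistics."""
    priors = Counter(label for _, label in samples)
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, label in samples:
        feat_counts[label].update(feats)
        vocab.update(feats)
    return priors, feat_counts, vocab, len(samples)

def score(model, feats, label):
    """log Pr(c) + sum_i log Pr(a_i | c), with add-one smoothing."""
    priors, feat_counts, vocab, n = model
    s = math.log(priors[label] / n)
    total = sum(feat_counts[label].values())
    for f in feats:
        s += math.log((feat_counts[label][f] + 1) / (total + len(vocab)))
    return s

def predict(model, feats):
    """Pick the class maximizing the (log) likelihood above."""
    priors = model[0]
    return max(priors, key=lambda c: score(model, feats, c))
```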

Decision Tree (DT). This is a hierarchical recursive partitioning algorithm. We build the decision tree by finding the best split feature, i.e., the feature that maximizes the information gain at each split of a node.

Logistic Regression (LOG-REG). Logistic regression classifies samples by modeling the log-odds of class membership as a linear function of the features, which gives the strength of association between the features and the class.

C. Evaluation Metrics

In our experiments, we evaluate performance based on three metrics: precision, recall, and F1 measure. For a given hacker discussion, precision is the fraction of labels (platforms, vendors, or products) that the model associated with the discussion that were actual labels in the ground truth. Recall, on the other hand, is the fraction of ground truth labels identified by the model. The F1 measure is the harmonic mean of precision and recall. In our results, we report the average precision, recall, and F1 over all the test discussions.

7http://scikit-learn.org/stable/
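The per-discussion metrics above can be computed directly from the predicted and ground-truth label sets; a minimal sketch (function names are ours):

```python
def per_sample_prf(predicted, actual):
    """Precision, recall, and F1 for one discussion's label sets."""
    pred, true = set(predicted), set(actual)
    if not pred or not true:
        return 0.0, 0.0, 0.0
    tp = len(pred & true)                 # correctly identified labels
    p = tp / len(pred)
    r = tp / len(true)
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1

def average_prf(pairs):
    """Average the per-sample metrics over all test discussions."""
    scores = [per_sample_prf(p, a) for p, a in pairs]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))
```

For example, predicting {o, a} when the ground truth is {o} yields precision 0.5 and recall 1.0 for that discussion.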


TABLE IV: Average Precision, Recall, and F1 measure for NB, LOG-REG, DT, RF and SVM to identify at-risk systems.

Component | Model   | Precision | Recall | F1 measure
----------|---------|-----------|--------|-----------
Platform  | NB      | 0.68      | 0.65   | 0.66
          | LOG-REG | 0.72      | 0.76   | 0.74
          | DT      | 0.66      | 0.70   | 0.68
          | RF      | 0.70      | 0.75   | 0.72
          | SVM     | 0.72      | 0.78   | 0.76
Vendor    | NB      | 0.37      | 0.34   | 0.36
          | LOG-REG | 0.28      | 0.25   | 0.27
          | DT      | 0.39      | 0.43   | 0.41
          | RF      | 0.40      | 0.43   | 0.41
          | SVM     | 0.40      | 0.48   | 0.44
Product   | NB      | 0.19      | 0.14   | 0.16
          | LOG-REG | 0.20      | 0.13   | 0.16
          | DT      | 0.22      | 0.15   | 0.18
          | RF      | 0.22      | 0.25   | 0.24
          | SVM     | 0.26      | 0.24   | 0.25

D. Baseline Model (BM)

For the baseline model, we leverage only the machine learning technique to identify the at-risk systems. We create training and testing sets by sorting the discussions by the time they were posted on the website (to avoid temporal intermixing), reserving the first 80% of the samples for training and the remaining 20% for testing. We employed both TF-IDF and Doc2Vec as feature engineering techniques. In our experiments, TF-IDF consistently outperformed Doc2Vec; hence we only report the results using TF-IDF features.
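The temporal split described above amounts to sorting by posting time and cutting at 80%; a minimal sketch (the record layout is an assumption of ours):

```python
def temporal_split(discussions, train_frac=0.8):
    """Sort by posting time and reserve the earliest fraction for training.

    discussions: list of (posted_timestamp, text, labels) tuples
    (hypothetical layout). Splitting after sorting avoids temporal
    intermixing between the training and test sets.
    """
    ordered = sorted(discussions, key=lambda d: d[0])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]
```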

Results. Table IV shows the average performance of the machine learning techniques for each component of the at-risk system. For platform identification, SVM performs best with the following averages:

• precision: 0.72,
• recall: 0.78, and
• F1 measure: 0.76.

LOG-REG had similar precision, but lower recall. Similarly, for vendor identification, SVM performs best with averages:

• precision: 0.40,
• recall: 0.48, and
• F1 measure: 0.44,

with RF having similar precision. For product identification, SVM again had the best performance:

• precision: 0.26,
• recall: 0.24 (comparable to RF), and
• F1 measure: 0.25.

Θ : θ1 = posted(D, W)
    θ2 = posted(D, U)

Fig. 6: Facts defined for each test discussion.

For s ∈ Sw:
∆ : δ1 = at risk(D, s) -≺ previously seen(W, s).

For s ∈ Su:
    δ2 = at risk(D, s) -≺ user preference(U, s).

Fig. 7: Defeasible rules for platform identification.

Since SVM performs consistently better for all three classification problems, moving forward we use SVM as the machine learning component in our reasoning framework (cf. Fig. 1).

E. Reasoning Framework (RFrame)

As we go down the CPE hierarchy, the number of possible labels for vendors and products increases greatly while the number of discussions representing each label decreases, making learning difficult and degrading performance. We address this issue by proposing a set of strict and defeasible rules for platform, vendor, and product identification. We note that these rules arise from the discussion being evaluated and do not require parameter learning.

We use the notation described in Table V for defining our constructs (facts, strict rules, and defeasible rules). We note that facts cannot have variables, only constants (however, to compress the program for presentation purposes, we use meta-variables in facts). To begin, we define the facts (see Fig. 6): θ1 states that a hacker discussion D was posted on the D2web website W (which can be either a forum or a marketplace), and θ2 states that the user U posted the discussion. For each level in the CPE hierarchy, we define additional rules, discussed as follows.

Platform Model. The first level of system identification is identifying the platform that the hacker discussion poses a threat to. We compute the platforms previously discussed on the D2web websites under consideration. Similarly, we compute which platforms the user under consideration prefers (based on their previous postings). This captures the platforms preferred in discussions on websites and by users, which can aid the machine learning model by reducing the number of platforms it must choose from. The DeLP components that model platform identification are shown in Fig. 7. For the defeasible rules, δ1 indicates that all the platforms Sw previously seen on the D2web website W where the current discussion D is observed are likely at-risk, and δ2 indicates that all the platforms Su from user U's previous postings are also likely at-risk.
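The effect of δ1 and δ2 is to shrink the label pool the classifier chooses from; a rough sketch of that reduction (data structures and names are ours, and the fallback behavior is an assumption, not taken from the paper):

```python
def candidate_platforms(site_history, user_history, website, user):
    """Instantiate δ1/δ2: platforms previously seen on the website (Sw)
    or in the user's past postings (Su) become the at-risk candidates."""
    s_w = site_history.get(website, set())
    s_u = user_history.get(user, set())
    return s_w | s_u

def restrict_prediction(classifier_scores, candidates):
    """Keep only candidate labels; fall back to all labels if the
    argumentation step leaves nothing (assumed behavior)."""
    pool = {l: s for l, s in classifier_scores.items() if l in candidates}
    if not pool:
        pool = classifier_scores
    return max(pool, key=pool.get)
```

For instance, if webID1 has only discussed platforms o and a, a high classifier score for h is discarded and the best remaining candidate wins.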

Vendor Model. The second level is identifying the at-risk vendor. For this case, we use the platform result from the previous model, taking that as a DeLP fact. The DeLP components that model vendor identification are shown in Fig. 8. Here, the fact θ1 indicates the platform identified for the discussion; note that multiple platforms may be identified based on the


TABLE V: Notation and Explanations

Notation       | Explanation
D              | The hacker discussion (posted on the website) under consideration.
W              | Website (marketplace or forum) where the hacker discussion was posted.
Sw, Vw and Pw  | The sets of platforms, vendors, and products, respectively, put at risk by the hacker discussions previously seen in W.
U              | User posting the hacker discussion.
Su, Vu and Pu  | The sets of platforms, vendors, and products, respectively, put at risk by the hacker discussions previously posted by user U.
Sp, Vp and Pp  | The sets of platforms, vendors, and products, respectively, identified by the machine learning model at each level in the hierarchy for the hacker discussion under consideration.
si, vi and pi  | An element of the set Sp, Vp or Pp, representing a single platform, vendor, or product, respectively.

For s ∈ Sp:
Θ : θ1 = at risk(D, s)

For s ∈ Sp:
Ω : ω1 = ¬at risk(D, vi) ← at risk(D, s), ¬parent(s, vi)

For v ∈ Vw:
∆ : δ1 = at risk(D, v) -≺ previously seen(W, v).

For v ∈ Vu:
    δ2 = at risk(D, v) -≺ user preference(U, v).

For s ∈ Sp:
    δ3 = at risk(D, vi) -≺ at risk(D, s), parent(s, vi)

Fig. 8: Defeasible rules for vendor identification.

discussion. The strict rule ω1 states that for a given post D posing a threat to platform s, the vendor vi cannot be at-risk if the parent of vi is not the identified platform s. This rule is based on the CPE hierarchy obtained from NVD. For the defeasible rules, δ1 indicates that all the vendors Vw previously seen on the D2web website W where the current hacker discussion D is observed are likely at-risk, δ2 indicates that all the vendors Vu from user U's previous postings are also likely at-risk, and δ3 states that for a given post D posing a threat to platform s, all the vendors whose parent is the identified platform are likely at-risk. This rule is also based on the CPE hierarchy from NVD.
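The strict rule ω1 amounts to filtering vendor candidates against the CPE hierarchy; a minimal sketch, with a made-up hierarchy fragment:

```python
# Hypothetical CPE fragment mapping each vendor to its parent platform
# (the real hierarchy comes from NVD; these entries are for illustration).
CPE_PARENT = {
    "microsoft": "o",
    "apple": "o",
    "adobe": "a",
}

def filter_vendors(identified_platforms, vendor_candidates, parent=CPE_PARENT):
    """Apply ω1: a vendor cannot be at risk unless its CPE parent is one
    of the platforms identified at the previous level."""
    return {v for v in vendor_candidates
            if parent.get(v) in identified_platforms}
```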

Product Model. The third level is identifying the at-risk product. For this case, we use the vendor result from the previous model; as before, we use that as a DeLP fact. The DeLP components that model product identification are shown in Fig. 9. Here, the fact θ1 indicates the vendor identified for the discussion; again, multiple vendors may be identified based on the discussion. The strict rule ω1 states that for a given post D posing a threat to vendor v, the product pi cannot be at-risk if the parent of pi is not the identified vendor v (again, based on the CPE hierarchy). For the defeasible rules, δ1 indicates that all the products Pw previously seen on the D2web website W where the current hacker discussion D is

For v ∈ Vp:
Θ : θ1 = at risk(D, v)

For v ∈ Vp:
Ω : ω1 = ¬at risk(D, pi) ← at risk(D, v), ¬parent(v, pi)

For p ∈ Pw:
∆ : δ1 = at risk(D, p) -≺ previously seen(W, p).

For p ∈ Pu:
    δ2 = at risk(D, p) -≺ user preference(U, p).

For v ∈ Vp:
    δ3 = at risk(D, pi) -≺ at risk(D, v), parent(v, pi)

observed are likely at-risk, δ2 indicates that all the products Pu from user U's previous postings are also likely at-risk, and δ3 states that for a given post D posing a threat to vendor v, all the products whose parent (in the CPE hierarchy) is the identified vendor are likely at-risk.

Results. We evaluate the reasoning framework using an experimental setup similar to the one discussed for the baseline model. We report the precision, recall, and F1 measure for each of the system components and compare them with the best performing baseline model (BM). Table VI shows the comparison between the two models.

For platform identification, RFrame outperforms BM in terms of precision: 0.83 vs. 0.72 (a 15.27% improvement), while maintaining the same recall. Similarly, for vendor and product identification there was significant improvement in precision: 0.56 vs. 0.40 (a 40% improvement) and 0.41 vs. 0.26 (a 57.69% improvement), respectively, with comparable recall with respect to the baseline model. The major reason for the jump in precision is the reduction of possible labels based on the introduced arguments, which aids the machine learning model in making the correct decision.


TABLE VI: Average Precision, Recall, and F1 measure comparison between the baseline model (BM) and reasoning framework (RFrame).

Component | Model  | Precision | Recall | F1 measure
----------|--------|-----------|--------|-----------
Platform  | BM     | 0.72      | 0.78   | 0.76
          | RFrame | 0.83      | 0.78   | 0.80
Vendor    | BM     | 0.40      | 0.48   | 0.44
          | RFrame | 0.56      | 0.44   | 0.50
Product   | BM     | 0.26      | 0.24   | 0.25
          | RFrame | 0.41      | 0.21   | 0.30

VII. DISCUSSION

The performance of the reasoning system highlights that our hybrid framework identifies at-risk systems with higher precision than the approach using only machine learning classifiers. In our application, we desire high precision, while maintaining at least comparable recall, in order to provide high-value risk assessment of systems; low precision is often equated with a less reliable framework. The majority of misclassifications result from those systems being underrepresented in the training set; for some system components, there can be as little as a single discussion in the training set. This issue becomes more pronounced as we go down the hierarchy, with its large numbers of vendors and products. In some test instances, for the same platform and vendor, a new product not previously known to be at-risk becomes vulnerable due to a newly disclosed vulnerability. In this case, the reasoning framework is not able to identify the product since it was not previously observed, and this can contribute to a misclassification.

From a security analyst's perspective, the reasoning framework not only provides a list of possible at-risk systems but also provides arguments indicating why a particular system was identified as being at-risk. This lets the analyst evaluate the decisions made by the framework and fine-tune it if necessary. For cases where a new product (not previously discussed in training) is at-risk, even a partial identification of the system (in terms of platform and vendor) is of value to the analyst. Based on the alert provided by the framework, the analyst can manually evaluate the arguments and the discussions to identify possible products, depending on the platform and vendor identified by the framework.

VIII. RELATED WORK

Threat assessment of systems is critical to organizations' security policy. Over the years, CVSS [9] has become a standard metric that organizations use to determine whether their systems are at risk of being targeted by hackers. Unfortunately, case studies have shown poor correlation between the CVSS score and which systems are at-risk [2].

Identifying targeted systems through open source intelligence. Open source intelligence has been used previously to identify and predict vulnerabilities that are likely to be exploited, in order to determine which systems are at risk. [35] looked to predict the likelihood that a software product has a vulnerability not yet discovered using the National Vulnerability Database (NVD), and showed that NVD has poor predictive capability due to the limited amount of information available. On the other hand, [28] looks to predict whether a real-world exploit is available based on vulnerabilities disclosed in Twitter data. The authors report high accuracies of 90% using a resampled, balanced, and temporally mixed dataset, which is not reflective of real-world scenarios [6]. Identifying threats to critical infrastructure by analyzing interactions on hacker forums was studied in [17]; here the authors rely on keyword-based queries to identify such threats from hacker interactions. Tools to automatically identify products offered in cybercriminal markets were proposed in [25]. This technique looks to extract products mentioned in the description of the item being offered, a problem different from the one we address: identifying targeted systems not explicitly stated in the forum discussions.

More recently, researchers have shown increased interest in gathering threat intelligence from the D2web to proactively identify digital threats, and in studying hacker communities to gather insights. Researchers have focused on building infrastructure to gather threat information from markets (regarding goods and services sold) and forums (discussions regarding exploits) [22], [27], studying the different product categories offered in darkweb markets and creating a labeled dataset [18], analyzing hacker forums and carding shops to identify potential threats [5], identifying expert hackers to determine their specialties [1], and identifying key hackers based on their posted content, their network, and how long they have been active in the forum [19]. For vulnerability research, studies look to leverage vulnerability mentions on the D2web to predict the likelihood of exploitation using a combination of machine learning and social network techniques [4], [3]. These techniques rely on the mention of CVE numbers to identify likely targeted systems (a small fraction of vulnerabilities [4]), not taking into account discussions where a CVE number is not mentioned. In contrast, we look to identify at-risk systems without having a CVE number, which is a different problem from those tackled in previous work.

Identifying targeted systems through software analysis. Another way of identifying targeted software with vulnerabilities is to analyze the software itself in order to determine which component is most likely to contain a vulnerability. Mapping past vulnerabilities to vulnerable software components was proposed in [20], where the authors found that components with function calls and import statements are more likely to have a vulnerability. A similar method was employed by [29], [34], where text mining was used to forecast whether a particular software component contains vulnerabilities. Similar text mining techniques for vulnerability discovery are surveyed in [13]. The text mining methods create a count dictionary of terms used in the software, which are


used as features to identify vulnerabilities. These methods suffer from not knowing which vulnerabilities might be of interest to hackers. On the other hand, we work with hacker discussions posing a threat to systems that are clearly of interest to hackers, since they are being discussed on D2web websites; vulnerabilities mentioned on the D2web regarding systems are more likely to be exploited [4].

IX. CONCLUSION

In this paper, we demonstrated how a reasoning framework based on the DeLP structured argumentation system can be leveraged to improve the performance of identifying at-risk systems based on hacker discussions on the D2web. DeLP programs built on discussions found on forums and marketplaces afford a reduction of the set of possible platforms, vendors, and products likely to be put at risk by the hackers participating in the discussion. This reduction of potential labels leads to better precision while maintaining comparable recall with respect to the baseline model that leverages only machine learning techniques. Knowing which systems threat actors discuss as possible targets helps organizations achieve better threat assessment for their systems.

ACKNOWLEDGMENT

The authors of this work were supported by the U.S. Department of the Navy, Office of Naval Research, grant N00014-15-1-2742, as well as the Arizona State University Global Security Initiative (GSI), by CONICET and Universidad Nacional del Sur (UNS), Argentina, and by the EU H2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement 690974 for the project “MIREL”. The authors would like to thank Cyber Reconnaissance, Inc. for providing the data.

REFERENCES

[1] A. Abbasi, W. Li, V. Benjamin, S. Hu, and H. Chen. Descriptive analytics: Examining expert hackers in web forums. In Intelligence and Security Informatics Conference (JISIC), 2014 IEEE Joint, pages 56–63. IEEE, 2014.

[2] L. Allodi and F. Massacci. Comparing vulnerability severity and exploits using case-control studies. ACM Transactions on Information and System Security (TISSEC), 17(1):1, 2014.

[3] M. Almukaynizi, A. Grimm, E. Nunes, J. Shakarian, and P. Shakarian. Predicting cyber threats through the dynamics of user connectivity in darkweb and deepweb forums. In ACM Computational Social Science. ACM, 2017.

[4] M. Almukaynizi, E. Nunes, K. Dharaiya, M. Senguttuvan, J. Shakarian, and P. Shakarian. Proactive identification of exploits in the wild through vulnerability mentions online. In 2017 International Conference on Cyber Conflict (CyCon U.S.), pages 82–88, Nov 2017.

[5] V. Benjamin, W. Li, T. Holt, and H. Chen. Exploring threats and vulnerabilities in hacker web: Forums, IRC and carding shops. In Intelligence and Security Informatics (ISI), 2015 IEEE International Conference on, pages 85–90. IEEE, 2015.

[6] B. L. Bullough, A. K. Yanchenko, C. L. Smith, and J. R. Zipkin. Predicting exploitation of disclosed software vulnerabilities using open-source data. In Proceedings of the ACM International Workshop on Security and Privacy Analytics. ACM, 2017.

[7] CPE. Official common platform enumeration dictionary. https://nvd.nist.gov/cpe.cfm, Last Accessed: Feb 2018.

[8] CVE. Common vulnerabilities and exposures: The standard for information security vulnerability names. http://cve.mitre.org/, Last Accessed: Feb 2018.

[9] CVSS. Common vulnerability scoring system. https://www.first.org/cvss, Last Accessed: Feb 2018.

[10] R. Dingledine, N. Mathewson, and P. Syverson. Tor: The second-generation onion router. In Proceedings of the 13th Conference on USENIX Security Symposium - Volume 13, SSYM'04, pages 21–21, 2004.

[11] P. M. Dung. On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence, 77(2):321–357, 1995.

[12] A. J. García and G. R. Simari. Defeasible logic programming: An argumentative approach. Theory and Practice of Logic Programming, 4(1+2):95–138, 2004.

[13] S. M. Ghaffarian and H. R. Shahriari. Software vulnerability analysis and discovery using machine-learning and data-mining techniques: A survey. ACM Computing Surveys (CSUR), 50(4):56, 2017.

[14] Google. Google cloud translation API documentation. https://cloud.google.com/translate/docs/, Last Accessed: Feb 2018.

[15] H. Lee and Y. Yoon. Engineering doc2vec for automatic classification of product descriptions on O2O applications. Electronic Commerce Research, pages 1–24, 2017.

[16] J. W. Lloyd. Foundations of Logic Programming. Springer Science & Business Media, 2012.

[17] M. Macdonald, R. Frank, J. Mei, and B. Monk. Identifying digital threats in a hacker web forum. In Advances in Social Networks Analysis and Mining (ASONAM), 2015 IEEE/ACM International Conference on, pages 926–933. IEEE, 2015.

[18] E. Marin, A. Diab, and P. Shakarian. Product offerings in malicious hacker markets. In Intelligence and Security Informatics (ISI), 2016 IEEE Conference on, pages 187–189. IEEE, 2016.

[19] E. Marin, J. Shakarian, and P. Shakarian. Mining key-hackers on darkweb forums. In International Conference on Data Intelligence and Security (ICDIS), 2018. IEEE, 2018.

[20] S. Neuhaus, T. Zimmermann, C. Holler, and A. Zeller. Predicting vulnerable software components. In Proceedings of the 14th ACM Conference on Computer and Communications Security, pages 529–540. ACM, 2007.

[21] NIST. National vulnerability database. https://nvd.nist.gov/, Last Accessed: Feb 2018.

[22] E. Nunes, A. Diab, A. Gunn, E. Marin, V. Mishra, V. Paliath, J. Robertson, J. Shakarian, A. Thart, and P. Shakarian. Darknet and deepnet mining for proactive cybersecurity threat intelligence. In Proceedings of ISI 2016, pages 7–12. IEEE, 2016.

[23] E. Nunes, P. Shakarian, G. I. Simari, and A. Ruef. Argumentation models for cyber attribution. In Advances in Social Networks Analysis and Mining (ASONAM), 2016 IEEE/ACM International Conference on, pages 837–844. IEEE, 2016.

[24] C. P. Pfleeger and S. L. Pfleeger. Security in Computing. Prentice Hall Professional Technical Reference, 2002.

[25] R. S. Portnoff, S. Afroz, G. Durrett, J. K. Kummerfeld, T. Berg-Kirkpatrick, D. McCoy, K. Levchenko, and V. Paxson. Tools for automated analysis of cybercriminal markets. In Proceedings of the 26th International Conference on World Wide Web, pages 657–666. International World Wide Web Conferences Steering Committee, 2017.

[26] I. Rahwan, G. R. Simari, and J. van Benthem. Argumentation in Artificial Intelligence, volume 47. Springer, 2009.

[27] J. Robertson, A. Diab, E. Marin, E. Nunes, V. Paliath, J. Shakarian, and P. Shakarian. Darkweb Cyber Threat Intelligence Mining. Cambridge University Press, 2017.

[28] C. Sabottke, O. Suciu, and T. Dumitraș. Vulnerability disclosure in the age of social media: Exploiting Twitter for predicting real-world exploits. In 24th USENIX Security Symposium (USENIX Security 15), pages 1041–1056, 2015.

[29] R. Scandariato, J. Walden, A. Hovsepyan, and W. Joosen. Predicting vulnerable software components via text mining. IEEE Transactions on Software Engineering, 40(10):993–1006, 2014.

[30] J. Shakarian, A. T. Gunn, and P. Shakarian. Exploring malicious hacker forums. In Cyber Deception, pages 259–282. Springer, 2016.

[31] P. Shakarian, G. I. Simari, G. Moores, and S. Parsons. Cyber attribution: An argumentation-based approach. In Cyber Warfare, pages 151–171. Springer, 2015.


[32] F. Stolzenburg, A. J. García, C. I. Chesñevar, and G. R. Simari. Computing generalized specificity. Journal of Applied Non-Classical Logics, 13(1):87–113, 2003.

[33] L. Q. Trieu, H. Q. Tran, and M.-T. Tran. News classification from social media using Twitter-based doc2vec model and automatic query expansion. In Proceedings of the Eighth International Symposium on Information and Communication Technology, pages 460–467. ACM, 2017.

[34] J. Walden, J. Stuckman, and R. Scandariato. Predicting vulnerable components: Software metrics vs text mining. In Software Reliability Engineering (ISSRE), 2014 IEEE 25th International Symposium on, pages 23–33. IEEE, 2014.

[35] S. Zhang, D. Caragea, and X. Ou. An empirical study on using the national vulnerability database to predict software vulnerabilities. In International Conference on Database and Expert Systems Applications, pages 217–231. Springer, 2011.