
This article was downloaded by: [T&F Internal Users], [Mr Susan Cullen]
On: 20 September 2012, At: 09:37
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Applied Artificial Intelligence: An International Journal
Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/uaai20

AN OVERVIEW OF DATA MINING FOR COMBATING CRIME
Ephraim Nissan
Goldsmiths' College, University of London, London, UK

Version of record first published: 14 Sep 2012.

To cite this article: Ephraim Nissan (2012): AN OVERVIEW OF DATA MINING FOR COMBATING CRIME, Applied Artificial Intelligence: An International Journal, 26:8, 760-786

To link to this article: http://dx.doi.org/10.1080/08839514.2012.713309

PLEASE SCROLL DOWN FOR ARTICLE

Full terms and conditions of use: http://www.tandfonline.com/page/terms-and-conditions

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden.

The publisher does not give any warranty express or implied or make any representation that the contents will be complete or accurate or up to date. The accuracy of any instructions, formulae, and drug doses should be independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims, proceedings, demand, or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with or arising out of the use of this material.

AN OVERVIEW OF DATA MINING FOR COMBATING CRIME

Ephraim Nissan
Goldsmiths' College, University of London, London, UK

Artificial Intelligence (AI) and Law has been a burgeoning domain since the 1980s, but it was not until the 2000s that models of reasoning about legal evidence started to feature prominently. Data mining, for its part, has been applied to legal databases as well as to law enforcement. We survey forensic applications of data mining to fraud and, more broadly, to crime intelligence and investigation. Traditionally a field separate from AI and Law, the two are now coming together. Success has been achieved especially in unravelling networks and in detecting fraud.

INTRODUCTION

Data mining has found application to knowledge discovery in legal databases (Stranieri and Zeleznikow 2005). Data mining has also been variously applied to security and criminal detection (Mena 2003). Notwithstanding two seminal publications in 1989 (Thagard 1989; Kuflik, Nissan, and Puni 1989), it was not until the 2000s that models of reasoning about legal evidence (other than within statistics) burgeoned within the discipline of Artificial Intelligence (AI) and Law, starting with a few editorial initiatives (Martino and Nissan 2001; Nissan and Martino 2001, 2003, 2004; MacCrimmon and Tillers 2002; Kaptein, Prakken, and Verheij 2009). The end of the decade was soon followed by two important authored books on the subject (Bex 2011; Nissan 2012).

This article surveys applications of data mining to intelligence and investigative tasks within law enforcement. This is a separate subject, just as the use of computing in the various forensic sciences has followed a trajectory apart from legal computing and AI and Law. A current rapprochement between AI and Law and data mining for crime investigation continues

Address correspondence to Ephraim Nissan, Department of Computing, Goldsmiths' College, University of London, 25-27 St. James, New Cross, London SE14 6NW, UK. E-mail: [email protected]

Applied Artificial Intelligence, 26:760–786, 2012
Copyright © 2012 Taylor & Francis Group, LLC
ISSN: 0883-9514 print/1087-6545 online
DOI: 10.1080/08839514.2012.713309


and holds the promise of a synergism that will hopefully enhance the concrete usefulness of the broad overarching domain.

SOCIAL NETWORKS AND LINK ANALYSIS

Social network analysis (SNA) is the branch of sociology that deals with the quantitative evaluation of an individual's role in a group or community by analyzing the network of connections between that individual and others. See Breiger's overview (2004), Freeman's four-volume set (2007), Aggarwal (2011), and Newman (2010). Exploratory visualization of social networks is the subject of Brandes, Raab, and Wagner (2001), within an application to decision-making research in a real case study: they applied SNA techniques to study the pattern of decision making itself, representing the process of decision making as the network of interactions between the actors involved in it.
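As a minimal illustration of the kind of quantitative evaluation SNA performs, the sketch below computes degree centrality, the fraction of other actors an individual is directly tied to. The actor names and ties are invented for the example.

```python
# Degree centrality over a small, hypothetical undirected network.

def degree_centrality(edges):
    """Compute degree centrality for an undirected network given as edge pairs."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    # each actor's score: direct ties divided by the number of other actors
    return {actor: len(links) / (n - 1) for actor, links in neighbors.items()}

ties = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
print(degree_centrality(ties))
```

Here actor A, tied to all three others, scores 1.0, the highest possible role in the group.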

SNA has been applied, among other things, to organized crime. In a PhD dissertation, ''Analysis of Layered Social Networks,'' at the Air Force Institute of Technology in Ohio, United States Air Force Major Jonathan T. Hamill (2006) discussed its application to counterterrorism, which was the motivation for the project. Hamill's thesis is concerned with the prevention of near-term terrorist attacks.

Wherever individuals are organized, we can map their links by making their social network explicit, for example, in order to better realize which advantage (i.e., which social capital) the individual derives from the network. Link analysis is a technique in which visualization plays a central role, and which has become quite important for crime intelligence and crime investigation (e.g., Leary 2012). Historically, it emerged within ergonomics (Gilbreth and Gilbreth 1917; Fitts, Jones, and Milton 1950; cf. Harper and Harris 1975, p. 158). By contrast, the visualization of social networks was apparently inaugurated by Jacob Levy Moreno (1889–1974) when he showed sociometric charts at the 1933 convention of the Medical Society of the State of New York (Moreno 1953, p. xiii), something that, in retrospect, he considered to have been the breakthrough of the sociometric movement. Moreno (Marineau 1989) is considered to have been the father of sociometry, as well as of psychodrama and of group psychotherapy.

Link analysis is an interactive technique, visualizing (in charts, maps, or diagrams) networks of entity-to-event associations (e.g., tying a victim to a crime), as well as entity-to-entity associations (e.g., blood relative, or spouse, or place of birth, or owner of a firm) and event-to-event associations (e.g., tying emails to each other). Link analysis can benefit from social network analysis, borrowing this or that formal device from the latter. Users watching the results returned by link analysis tools on screen see those results,


not the mathematics of the underlying concepts of social network analysis. Quite possibly, the best-known project in link analysis for law enforcement is COPLINK, to which the next section is devoted.

Harper and Harris (1975, p. 159) explained:

The link analysis procedure centers round the production of an association matrix and a link diagram. The association matrix provides an array of the relationships among any set of individuals; the notation in any cell of the matrix indicates the nature of the link—strong, weak, or none—between two individuals. The link diagram, the end product of the analysis, presents a graphic illustration of the relationships among the set of individuals. If some of the individuals are members of identifiable organizations, these organizations can also be incorporated into the diagram. The analysis is completed by [a] six-step approach.

The association matrix is triangular, and the same names identify the rows and the columns. The six steps as listed by Harper and Harris (1975) were:

1. Assemble the available information;
2. Abstract information relevant to individual relationships and affiliations;
3. Prepare an association matrix;
4. Develop a preliminary link diagram;
5. Incorporate organizations in the diagram;
6. Refine the link diagram.
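Steps 3 and 4 of the procedure can be sketched in code: build a triangular association matrix from abstracted relationship data, then derive the link diagram's edge list from it. This is a toy illustration, not Harper and Harris's actual procedure; the names, the links, and the numeric encoding of ''strong''/''weak''/''none'' are all invented.

```python
# Toy sketch of the association matrix (step 3) and link diagram (step 4).

STRENGTH = {"strong": 2, "weak": 1, "none": 0}   # invented numeric encoding

def association_matrix(people, links):
    """links: {(a, b): 'strong'|'weak'}; returns an upper-triangular matrix."""
    idx = {p: i for i, p in enumerate(people)}
    m = [[0] * len(people) for _ in people]
    for (a, b), s in links.items():
        i, j = sorted((idx[a], idx[b]))          # store in the upper triangle
        m[i][j] = STRENGTH[s]
    return m

def link_diagram_edges(people, matrix):
    """The link diagram keeps every non-'none' association as an edge."""
    return [(people[i], people[j], matrix[i][j])
            for i in range(len(people))
            for j in range(i + 1, len(people))
            if matrix[i][j] > 0]

people = ["Ames", "Baker", "Cole"]
links = {("Ames", "Baker"): "strong", ("Baker", "Cole"): "weak"}
m = association_matrix(people, links)
print(link_diagram_edges(people, m))
```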

Link analysis tools specifically tailored for assisting in criminal investigation exist on the market; Mena (2003, pp. 88–104) provided a good survey of these. Crime Workbench is an intelligence management software product for criminal and fraud investigation;1 there is a scaled-down version, Crime Workbench Web, accessible from anywhere and ''aimed at the intelligence analyst and law enforcement investigator on the move'' (p. 100). Daisy is a link analysis tool, too, supporting a circular layout of nodes; these are connected by lines inside the circle, and are possibly surmounted by histograms outside the circle.2 By contrast, the main layout of displays generated by NETMAP (a link analysis tool3 used by several government agencies in the United States) is in a wagon-wheel format, though other layouts are also supported.

Mena noted that a unique feature of another tool, Crime Link,4 ''is its ability to generate a two-dimensional association matrix that basically shows who knows whom, who has done what, who has been where, etc.'' (Mena 2003, p. 97). This is a triangular table, with one-line textual explanations (such as personal names, with their variants) shown perpendicularly to its diagonal, thus identifying the rows and columns of the matrix. Those personal names


are preceded by a bullet if the row or column includes a bullet in at least one case. This makes it possible to see who knows whom.

The ORION Investigations criminal data organizer can be integrated with the ORIONLink link analysis tool.5 ''A special feature of ORIONLink is its 'what-if' mode, which allows objects and their connections to be hidden or restored on the fly, allowing for the viewing of their impact on the total organization, such as a terrorist cell or criminal gang'' (Mena 2003, p. 103).

COPLINK

COPLINK is a tool for criminal intelligence analysis that finds links in databases among entities such as those seen previously.6 Developed by a team at the University of Arizona in collaboration with the Tucson police, COPLINK performs data integration, pooling together the various information sources available (Hauck et al. 2002; Chen, Zeng, et al. 2003; Chen, Schroeder, et al. 2003). It ''evolved into a real-time system being used in everyday police work'' (Hauck et al. 2002, p. 30). Drawing on experience gained with the COPLINK project, Chen et al. (2004) presented a general framework for crime data mining. Next, Xiang et al. (2005) described a prototype system called the COPLINK Criminal Relationship Visualizer; they contrasted the use of two views, namely a hyperbolic tree view and a hierarchical list view. Also see Schroeder et al. (2007).

At the Tucson Police Department, records at the time consisted of about 1.5 million criminal case reports, containing details of criminal events spanning the period from 1986 to 1999 (Hauck et al. 2002, p. 31). Before COPLINK became available, investigators were able to access the records management system (RMS) to tie together information, but when it came to finding relationships inside the records they had to search the RMS data manually (ibid.).

COPLINK's underlying structure is the concept space, or automatic thesaurus, a statistics-based, algorithmic technique used to identify relationships between objects of interest. A concept space consists of a network of terms and weighted associations that assist in concept-based information retrieval within an underlying information space. (Hauck et al. 2002, p. 31)

Chen et al. (2004) used a concept-space approach in order to extract criminal relations from the incident summaries and create a likely network of suspects. When building a domain-specific concept space, the first step consists of identifying document collections in the specific subject domain; for the Tucson police, the collection was the case reports in the existing database. Each piece of information in the case reports database was


categorized and stored in well-organized structures. At the second step, the terms were filtered and indexed (Hauck et al. 2002, pp. 31–32).

In order to carry out named-entity extraction, Chen et al. (2004) used a modified version of the AI Entity Extractor system. In order to identify the names of persons, locations, and organizations in a document, that tool performs a three-step process. The first step consists of identifying noun phrases according to linguistic rules. At the second step, ''the system calculates a set of feature scores for each phrase based on pattern matching and lexical lookup. Third, it uses a feedforward/back-propagation neural network to predict the most likely entity type for each phrase'' (Chen et al. 2002, p. 53).

The second data mining task for COPLINK reported in Chen et al. (2004) ''involved automatically detecting deceptive criminal identities from the Tucson Police Department's database, which contains information such as name, gender, address, ID number, and physical description. Our detective consultant manually identified 120 deceptive criminal records involving 44 suspects from the database'' (p. 53).

Efficient algorithms for searching graphs, using COPLINK's concept space, were discussed by Xu and Chen (2004), who used as a dataset one year's worth of crime reports from the Phoenix Police Department. Their algorithms compute the shortest paths between two nodes in a graph, based on weighted links.
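Shortest paths over weighted links can be computed with standard Dijkstra search; the sketch below uses Python's heapq over an invented graph. Treating lower weights as stronger associations is an assumption made here for illustration, not necessarily Xu and Chen's actual encoding.

```python
# Dijkstra's algorithm over a small, hypothetical weighted association graph.

import heapq

def shortest_path(graph, start, goal):
    """graph: {node: {neighbor: weight}}; returns (cost, path) or None."""
    queue = [(0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, w in graph.get(node, {}).items():
            if nbr not in seen:
                heapq.heappush(queue, (cost + w, nbr, path + [nbr]))
    return None

# lower weight = stronger association, so the "shortest" path is the chain
# of strongest links between two individuals (names are invented)
g = {"A": {"B": 1, "C": 4}, "B": {"C": 1, "D": 5}, "C": {"D": 1}}
print(shortest_path(g, "A", "D"))
```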

The COPLINK project does not use entity extraction techniques; its developers drew the data from a structured database system. Yet it is often the case that police records systems contain large collections of unstructured text as well as structured case reports.

CATCH

Kangas et al. (2003) discussed Computer Aided Tracking and Characterization of Homicides (CATCH), a project of Battelle Memorial Institute's Pacific Northwest Division in Richland, Washington, and the Attorney General of Washington, Criminal Division. In CATCH, Kohonen neural networks (i.e., self-organizing maps) ''learn to cluster similar cases from approximately 5,000 murders and 3,000 sexual assaults residing in the databases'' (Kangas et al. 2003, p. 365), using data from the HITS (Homicide Investigation Tracking System) database system, which contains data about violent crimes, primarily from the U.S. Pacific Northwest.
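A Kohonen self-organizing map can be sketched in pure Python: a one-dimensional grid of units learns to cluster two-dimensional case-feature vectors, so that similar cases map to the same or nearby units. This is a bare-bones stand-in for the networks CATCH actually uses; the data, grid size, and training schedule are invented.

```python
# Minimal 1-D self-organizing map (Kohonen network) sketch.

import math, random

def train_som(data, n_units=4, epochs=200, seed=0):
    """Train a 1-D Kohonen map on 2-D feature vectors (pure-Python sketch)."""
    rng = random.Random(seed)
    units = [[rng.random(), rng.random()] for _ in range(n_units)]
    for epoch in range(epochs):
        lr = 0.5 * (1 - epoch / epochs)                  # decaying learning rate
        radius = max(1.0, (n_units / 2) * (1 - epoch / epochs))
        for x in data:
            # best-matching unit = unit whose weight vector is nearest to x
            best = min(range(n_units),
                       key=lambda i: sum((units[i][d] - x[d]) ** 2 for d in range(2)))
            for i in range(n_units):
                # Gaussian neighborhood on the 1-D grid, shrinking over time
                h = math.exp(-((i - best) ** 2) / (2 * radius ** 2))
                for d in range(2):
                    units[i][d] += lr * h * (x[d] - units[i][d])
    return units

def bmu(units, x):
    """Index of the best-matching unit for a case vector x."""
    return min(range(len(units)),
               key=lambda i: sum((units[i][d] - x[d]) ** 2 for d in range(2)))

# two loose clusters of invented "case" feature vectors
cases = [[0.1, 0.1], [0.15, 0.05], [0.9, 0.9], [0.85, 0.95]]
units = train_som(cases)
print(bmu(units, cases[0]), bmu(units, cases[2]))
```

After training, the two clusters of cases map to different units, which is the clustering behavior the quote describes.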

CATCH itself, which comes in two versions (one for murders and one for sexual assaults), is a collection of tools that includes query tools and geographical maps. ''The tools in CATCH are of two types. First, there are database mining tools to give the crime analyst a better understanding of the content of the database. Second, there are tools that let the analyst retrieve and compare specific crimes'' (Kangas et al. 2003, p. 367). It is possible to


have a set of crimes placed on a geographical map as pins (e.g., along highways). ''The user can select pins to view additional information about specific crimes'' (pp. 367–369).

ENRON’S EMAIL DATABASE, AND RELATED TECHNIQUES

Following the U.S. Federal Energy Regulatory Commission's investigation of Enron, a large corpus of emails from Enron was put into the public domain, and this was an opportunity for research (e.g., Gray and Debreceny 2006). Various techniques were used. ''Probably the most discussed continuous email monitoring is the Carnivore system, developed by the FBI to scan emails in the United States. The CIA and NSA are assumed to have similar systems to monitor email traffic outside of the U.S. The difference being that the FBI needs a court order before it can monitor a specific person's email traffic in the U.S. By the way, companies do not need a court order to monitor employee emails'' (Gray and Debreceny 2006, p. 7). One strand of research was intended to discover structures within the organization. Wilson and Banzhaf (2009) applied a genetic algorithm to the discovery of social networks within the Enron email database.

One of the tools presented by Goldberg et al. (2008) is SIGHTS, ''designed for the discovery, analysis, and knowledge visualization of social coalition in communication networks by analyzing communication patterns . . . . [The algorithms of SIGHTS] extract groups and track their evolution in Enron-email dataset and in Blog data. The goal of SIGHTS is to assist an analyst in identifying relevant information'' (p. 1, abstract). Goldberg et al. (2008) also described a complementary set of tools, using Recursive Data Mining (RDM). The task of those tools is ''to identify frequent patterns in communication content such as email, blog or chat-room sessions'' (p. 1, abstract).

SIGHTS has three main modules: Data Collection/Storage, Data Learning and Analysis, and Knowledge Extraction/Visualization. The data sources are email data or blogs, and it was envisaged to also include a link to chat rooms at a later stage. From the data sources, data are collected, and a semantic graph and metadata are stored in a database. ''The Data Collection Modules operate on semantic graphs. The graphs are constructed by adding a node for each social network actor and a directed edge from sender's node to a recipient's node. The edges are marked with the time of the communication and, possibly, other labels. Some edge labels are only appropriate for specified types of graphs'' (Goldberg et al. 2008, sec. 2). Blog Collector is a module accessing blogs, one of the data sources. The algorithm modules of SIGHTS interact with the database, retrieving from it the semantic graph and metadata, and storing in it derived data. The


algorithm modules include: Real Time Clustering, Leader Identification, Topic Identification, Cycle Group Analysis, and Stream Group Analysis. Interactive visualization, through which users access the output of the algorithms, accounts for the following: Size vs. Density plot, Graph of Clusters plot, Group Persistence view, Leaders and Group Evolution view, and interactive Graph of Overlapping Clusters.
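The semantic-graph construction quoted earlier (one node per actor, a time-stamped directed edge per communication) is straightforward to sketch; the message tuples below are invented.

```python
# SIGHTS-style semantic graph build: nodes are actors, directed edges are
# communications annotated with timestamps.

def build_semantic_graph(messages):
    """messages: (sender, recipient, timestamp) tuples ->
    {'nodes': set of actors, 'edges': {(sender, recipient): [timestamps]}}"""
    nodes, edges = set(), {}
    for sender, recipient, ts in messages:
        nodes.update((sender, recipient))
        edges.setdefault((sender, recipient), []).append(ts)
    return {"nodes": nodes, "edges": edges}

mail = [("kay", "lou", "2001-03-01"),
        ("kay", "lou", "2001-03-08"),
        ("lou", "kay", "2001-03-09")]
sg = build_semantic_graph(mail)
print(len(sg["nodes"]), len(sg["edges"]["kay", "lou"]))
```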

Goldberg et al. (2008, sec. 2) explained:

Temporal group algorithms identify hidden groups in the stream of communications for the user-specified time scales (Baumes et al. 2006; Camptepe et al. 2005). Noteworthy among them are the cycle model algorithms that identify all groups of users who persistently communicate over a time interval, and the stream model algorithm that finds groups based on frequently communicating triples followed by merging correlated triples. Such groups usually communicate over multiple time intervals during the period of interest, in a streaming fashion. Our algorithms also give the group structure hierarchy and can be modified to track evolution. An example of the evolution of a group found in the ENRON email data set is shown in

a figure comprising three graphs, which show the evolution of a part of the Enron organizational structure in the period 2000–2002 (respectively, the graphs showed relations at Enron during September 2000 – September 2001, March 2000 – March 2001, and September 2001 – September 2002). All three graphs represented frequent structures, and the nodes in the graphs represent actors of the Enron community. Some of those nodes were present in all three time intervals.

In SIGHTS, there is an Opposition Identification Module. Its task is to identify ''the positive and negative sentiments between pairs of bloggers based on the length and average size of the messages in the conversations that took place among them'' (Goldberg et al. 2008, sec. 2). From LiveJournal.com, SIGHTS splits threads of comments into conversations between pairs of bloggers. ''The module employs the Support Vector Machine classifier that was trained using a dataset that was manually created to determine the oppositions between bloggers using the length of the conversation and the average length of the message in the conversation to determine whether bloggers opposed each other in a given conversation'' (ibid.).

Bennett and Campbell (2000) provided a good introduction to support vector machines (SVMs). Steinwart and Christmann (2008) published a book on support vector machines, as did Campbell and Ying (2011), whose abstract states:

Support Vector Machines have become a well-established tool within machine learning. They work well in practice and have now been used across a wide range of applications from recognizing hand-written digits, to face identification, text categorisation, bioinformatics, and


database marketing. In this book we give an introductory overview of this subject. We start with a simple Support Vector Machine for performing binary classification before considering multi-class classification and learning in the presence of noise.

An early definition of the concept was as follows: ''The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are nonlinearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensures [sic] high generalization ability of the learning machine. [...] High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated'' (Cortes and Vapnik 1995, p. 173).
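Cortes and Vapnik's formulation is solved by quadratic programming, but as a rough stand-in, a linear soft-margin SVM can also be trained by stochastic sub-gradient descent on the hinge loss (in the style of Pegasos). The sketch below constructs a linear decision surface for a toy two-group problem; the data and hyperparameters are invented, and no nonlinear feature mapping is used.

```python
# Linear soft-margin SVM via hinge-loss stochastic sub-gradient descent.

import random

def train_linear_svm(data, lam=0.01, epochs=300, seed=1):
    """data: list of (x_vector, label) with label in {-1, +1}."""
    rng = random.Random(seed)
    dim = len(data[0][0]) + 1                    # extra slot for a bias term
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        for x, y in rng.sample(data, len(data)):
            t += 1
            eta = 1.0 / (lam * t)                # decaying step size
            xs = list(x) + [1.0]
            margin = y * sum(wi * xi for wi, xi in zip(w, xs))
            # sub-gradient step on the regularized hinge loss
            w = [wi - eta * lam * wi + (eta * y * xi if margin < 1 else 0.0)
                 for wi, xi in zip(w, xs)]
    return w

def predict(w, x):
    xs = list(x) + [1.0]
    return 1 if sum(wi * xi for wi, xi in zip(w, xs)) >= 0 else -1

# linearly separable toy data: the label is the sign of the first coordinate
pts = [([2.0, 1.0], 1), ([1.5, -0.5], 1), ([-2.0, 0.3], -1), ([-1.2, -1.0], -1)]
w = train_linear_svm(pts)
print([predict(w, x) for x, _ in pts])
```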

Recursive Data Mining (RDM) is ''a text mining approach that discovers patterns at varying degrees of abstraction in a hierarchical fashion. The approach allows for certain degree of approximation in matching patterns, which is necessary to capture non-trivial features in realistic datasets. Due to its nature, we call this approach Recursive Data Mining (RDM)'' (Chaoji, Hoonlor, and Szymanski 2010). Goldberg et al. (2008) also described a complementary set of tools using RDM. As mentioned earlier, they subjected communication content to such tools in order to detect frequent patterns. These are discovered hierarchically at varying degrees of abstraction and regardless of the identity of the language. Goldberg et al. (2008) resorted to RDM so that the technique would distinguish leaders from members inside social networks of communicators, as those are the roles that communicators play in such networks. Szymanski and Zhang (2004) and Coull and Szymanski (2008) resorted to RDM for masquerade detection (within intrusion detection affecting computer resources) and author identification. These are subjects investigated using various techniques by other authors (e.g., de Vel et al. 2001; Elsayed and Oard 2006).

In contrast, as seen, Goldberg et al. (2008) ''used Recursive Data Mining (RDM) for distinguishing the roles of the communicators in a social group. In general, RDM discovers, in a recursive manner, statistically significant patterns in a stream of data. The key properties of the pattern discovery in RDM include: (i) no restriction of the size of gaps between patterns, (ii) recursive mining in which discovered patterns are replaced by the new token and the mining is repeated on the newly created string, (iii) tolerance to imperfect matching'' (p. 1).

In RDM, ''in the first iteration, the algorithm captures statistically significant patterns from the initial sequences. The patterns obtained are assigned new tokens. The initial sequences are re-written by collapsing each sequence pattern to its newly assigned token, while retaining the rest of the


tokens. Next, the algorithm operates on the re-written sequences and continues to iterate through the pattern generation and sequence re-writing steps until either the sequences cannot be re-written further or a predefined number of iterations is reached'' (Chaoji, Hoonlor, and Szymanski 2010, sec. 4).

Goldberg et al. (2008, sec. 4) introduced an algorithm using not just one classifier for the entire process, but a different classifier for each level of RDM abstraction: ''The process starts with a sliding window of predefined length passing over the input sequence one token at a time. At each stop, patterns with all possible combinations of tokens and gaps are recorded. When [the] pass is completed, the recorded patterns are checked for frequency of their occurrence'' (p. 3). In fact, not all patterns are of equal importance. They continue to say that

Some patterns could be either too specific to a certain text or insignificant because they contain very commonly occurring words. In either case, they are ineffective in classifying the mined text while adding to the computational cost of the algorithm. The 'usefulness' of a pattern is computed via a statistical significance test. A pattern is deemed significant if its frequency of occurrence (based on a unigram model) is larger than the expected number of occurrences in a random string. Patterns that are deemed insignificant are eliminated from further consideration. (p. 3)
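One RDM iteration as described (sliding window, significance test against a unigram model, rewriting by collapsing significant patterns to fresh tokens) can be sketched as follows. The fixed window width, the extra minimum-count threshold, and the token stream are simplifications invented for illustration; the published algorithm also handles gaps and imperfect matches.

```python
# Toy sketch of a single RDM pass over a token stream.

from collections import Counter

def rdm_iteration(tokens, window=2, min_count=2):
    """Count fixed-width window patterns, keep those that beat a
    unigram-model expectation, collapse them to new tokens."""
    n = len(tokens)
    windows = [tuple(tokens[i:i + window]) for i in range(n - window + 1)]
    counts = Counter(windows)
    uni = Counter(tokens)
    significant = set()
    for pat, c in counts.items():
        # expected occurrences if tokens were drawn independently (unigram model)
        expected = len(windows)
        for t in pat:
            expected *= uni[t] / n
        if c >= min_count and c > expected:
            significant.add(pat)
    # rewrite: collapse each significant pattern greedily, left to right
    out, i = [], 0
    while i < n:
        pat = tuple(tokens[i:i + window])
        if pat in significant:
            out.append("<" + "+".join(pat) + ">")
            i += window
        else:
            out.append(tokens[i])
            i += 1
    return significant, out

stream = "a b c a b d a b c".split()
sig, rewritten = rdm_iteration(stream)
print(rewritten)
```

In a full RDM run, the rewritten stream would be fed back into the same procedure until no further rewriting is possible.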

STOCK BROKERS AND FRAUD

Neville et al. (2005) carried out automated classification of stock databases by aggregating features across nodes in a graph. Using statistical relational learning algorithms, they developed models that rank brokers according to the estimated risk they pose: based on who their associates were, the probability was calculated that they would commit a serious violation of securities regulations in the near future. Ethically, this is not without problems. The probability estimates assigned to brokers by their models in general agreed with the subjective ratings of NASD examiners. The National Association of Securities Dealers (NASD) in Washington, DC, established in 1939, is the world's largest private-sector securities regulator.

NASD previously identified higher-risk brokers using a set of handcrafted rules, applying them to its database of brokers, called the Central Registration Depository (CRD), in operation since 1981. As stated by Neville et al. (2005, p. 450),

CRD was established to aid in the licensing and registration of its broker-dealers and the brokers who work for them. CRD maintains information on all federally registered broker-dealers and brokers for the SEC [i.e., the U.S. Securities and Exchange Commission], NASD, the states, and other federally authorized private sector regulators, such


as the New York Stock Exchange. Originally implemented in June 1981, CRD has grown to include data on approximately 3.4 million brokers, 360,000 branches, and 25,000 firms. For firms, CRD information includes data such as ownership and business locations. For individual brokers, CRD includes qualification and employment information. Information in CRD is self-reported by the registered firms and brokers, although incorrect or missing reports can trigger regulatory action by NASD.

The HRB model is NASD's identification of high-risk brokers by handcrafted rules. Neville et al. explained (2005, p. 451),

Currently, NASD generates a list of higher-risk brokers (HRB) using a set of handcrafted rules they have formed using their domain knowledge and experience. This approach has two weaknesses we aim to address. First, the handcrafted rules simply categorize the brokers as 'higher-risk' and 'lower-risk' rather than providing a risk-ordered ranking. A ranking would be more useful to examiners as it would allow them to focus their attention on brokers considered to have the highest risk. Second, NASD's handcrafted rules use only information intrinsic to the brokers. In other words, they do not utilize relational context information such as the conduct of past and current coworkers. NASD experts believe that organizational relationships can play an important role in predicting serious violations. For example, brokers that have had serious violations in the past may influence their coworkers to participate in future schemes. Furthermore, some firms tend to be associated with continuous misconduct (i.e., they do not regulate their own employees and may even encourage violations). Lastly, higher-risk brokers sometimes move from one firm to another collectively, operating in clusters, which heightens the chance of regulatory problems. A model that is able to use relational context information has the potential to capture these types of behavior and provide more accurate predictions.
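The relational intuition in the passage above can be caricatured by a scoring rule that adds a coworker term to each broker's intrinsic violation count. This is a deliberately naive illustration, not the statistical relational learning models Neville et al. actually built; the brokers, firms, and coworker weight are all invented.

```python
# Hypothetical relational risk scoring: own violations plus a weighted
# average of coworkers' violations at the same firm.

def risk_scores(brokers, coworker_weight=0.5):
    """brokers: {name: {'firm': str, 'violations': int}};
    returns names ranked by intrinsic + relational risk, highest first."""
    by_firm = {}
    for name, info in brokers.items():
        by_firm.setdefault(info["firm"], []).append(name)
    scores = {}
    for name, info in brokers.items():
        peers = [p for p in by_firm[info["firm"]] if p != name]
        peer_risk = (sum(brokers[p]["violations"] for p in peers) / len(peers)
                     if peers else 0.0)
        scores[name] = info["violations"] + coworker_weight * peer_risk
    return sorted(scores, key=scores.get, reverse=True)

brokers = {"ann": {"firm": "F1", "violations": 0},
           "bob": {"firm": "F1", "violations": 4},
           "cid": {"firm": "F2", "violations": 1},
           "dee": {"firm": "F2", "violations": 0}}
print(risk_scores(brokers))
```

Note that "ann" outranks "cid" despite having no violations of her own, purely because of her coworker's record; that is the relational-context effect the quoted passage describes.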

Neville et al. (2005, p. 457) claimed that the findings of their own project supported the general beliefs at NASD, although predictions were reached differently:

NASD staff began this project contending that information about the professional and organizational networks that connect brokers would provide useful information for determining their risk for serious violations of securities regulations. The results of this research have borne out those beliefs. Our relational models provide predictions that are competitive with, but significantly different from, the predictions provided by NASD's hand-tuned rules, which only examined brokers and their disclosures, ignoring additional relational information such as coworkers at present and past firms. These models show important potential for NASD's screening process. They identified higher-risk brokers not previously identified by the NASD rules, and thus provided additional targets for NASD examinations. Furthermore, being identified as higher-risk by both our models and the HRB model was found to be more predictive of future problems than being identified by either model alone, thus permitting NASD to focus examinations on

An Overview of Data Mining for Combating Crime 769

Dow

nloa

ded

by [

T&

F In

tern

al U

sers

], [

Mr

Susa

n C

ulle

n] a

t 09:

37 2

0 Se

ptem

ber

2012

those most likely to have a serious violation in the near future. Andfinally, the probability estimates assigned to brokers by our models ingeneral agreed with the subjective ratings of NASD examiners, thusthe ranking provided by our models can be used to prioritize exami-ners’ attention.

Neville et al. (2005, p. 457) also indicated some limitations:

That said, the available data provide only relatively weak abilities to exploit the relational aspects of the domain. In CRD, individual brokers are directly related only through firms. Even branch relationships have to be inferred from address information, although this limitation will be obviated beginning this October [2005] when each broker will be systematically linked to a branch. More importantly, we do not know which individual brokers work together directly, nor what other social or organizational relationships they may share. To enhance their knowledge of potential links among individuals, NASD is investigating other recent technologies, most notably the NORA (Non-Obvious Relationship Awareness) system produced by Systems Research and Development, a Nevada-based company recently acquired by IBM. Such relationships could add substantially to the data analyzed in the work reported here, which could only use branch and firm relations present in CRD. The work reported here also exemplifies a framework that may be useful to projects that seek to develop screening tools to aid field examiners working in other domains such as health care, insurance, banking, and environmental health and safety. In such cases, development of a labeled training set may be impractical in the initial stages of a project. While the most accurate class labels would be the judgments of examiners, examiners’ time is typically limited and organizations may be understandably skeptical about devoting [a] large amount of examiners’ time to labelling data sets.

AUCTION FRAUD: NETPROBE

In NetProbe, a tool for detecting fraud at online auction sites (Pandit et al. 2007; Chau, Pandit, and Faloutsos 2006), users and transactions were modeled as a Markov random field (MRF), tuned for the detection of suspicious patterns generated by fraudsters. A belief propagation mechanism was resorted to in order to infer the maximum likelihood state probabilities of nodes in the MRF, given a propagation matrix and possibly a prior state assignment for some of the nodes. Each node can be in one out of three states: fraud, accomplice, or honest; or its state may be undetermined, in the sense that NetProbe does not assign a state to that node. NetProbe uses the propagation matrix in order to detect bipartite cores in the graph.

In order for the users to get answers to their queries in real time when using NetProbe, the Carnegie Mellon University team that developed NetProbe also developed Incremental NetProbe, a version that allows approximation. This avoids wasteful recomputation from scratch of node beliefs. Incremental NetProbe incrementally updates node beliefs as small changes occur in the graph.

In NetProbe, the ‘‘key idea is to infer properties for a user based on properties of other related users. In particular, given a graph representing interactions between auction users, the likelihood of a user being a fraudster is inferred by looking at the behaviour of its immediate neighbors’’ (Pandit et al. 2007, p. 203). This is why trust propagation and authority propagation research are akin to the method of NetProbe. Some ‘‘non-trivial design and implementation decisions’’ were made by the Carnegie Mellon team ‘‘while developing NetProbe. In particular, we discuss the following contributions: (a) a parallelizable crawler that can efficiently crawl data from auction sites, (b) a centralized queuing mechanism that avoids redundant crawling, (c) fast, efficient data structures to speed up our fraud detection algorithm, and (d) a user interface that visually demonstrates the suspicious behavior of potential fraudsters to the end user’’ (p. 202).

MRFs are suitable for inference problems in which there is uncertainty in the observed data. A Markov random field (MRF) is a probabilistic model defined by local conditional probabilities. The concept is useful for devising contextual models with prior information: Markov random field theory is typically resorted to in order to model context-dependent entities (such as, in image processing within computer science, image pixels). Basically, an MRF is an undirected graph, that is to say, the edges between pairs of nodes are not arrows. Each node in an MRF can be in any of a finite number of states. The state of a node statistically depends on each of its neighbors (i.e., those nodes to which the given node is connected by an edge), and on no other node in the graph. A propagation matrix, symbolized as ψ, represents the dependency between a node and its neighbors in the given MRF. Each cell ψ(i, j) of the matrix has a value equal to the probability of a node being in state j given that it has a neighbor in state i. If an assignment of states to the nodes in an MRF is given, then by using the propagation matrix it is possible to compute a likelihood of observing that assignment. The problem of inferring the maximum likelihood assignment of states to nodes, where the correct states for some of the nodes are possibly known beforehand, is solved by those using MRFs by resorting to heuristic techniques (this is so because enumerating all states would be exponential in time, and because no exact method is known that would solve this problem for a general MRF); an especially powerful heuristic method to do that is the iterative message-passing scheme of the belief propagation algorithm.

In order to detect likely fraudsters, a belief propagation mechanism was used in NetProbe, that algorithm being generally used in order to infer the maximum likelihood state probabilities of nodes in the MRF, given a propagation matrix and possibly a prior state assignment for some of the nodes. The standard notions about belief propagation are as follows:

- Let vectors be indicated in a bold font, as opposed to scalars. The kth element in a vector v is indicated as v(k).
- Let S be the set of possible states in which a node can be.
- Let b_n(r) stand for the probability that node n is in state r. That probability is called the belief that n is in state r.
- Nodes pass messages to each other. Iterative message passing is how belief propagation works. ‘‘Let m_ij denote the message that node i passes to node j; m_ij represents i’s opinion about the belief of j. At every iteration, each node i computes its belief based on messages received from its neighbors, and uses the propagation matrix to transform its belief into messages for its neighbors’’ (Pandit et al. 2007, p. 203). A message vector is denoted by m_ij, and m_ij(r) is the rth element of that vector.
- Let N(i) be the set of nodes that are the neighbors of node i.
- Let k be a normalization constant.

Then this pair of value-assignment formulae holds in belief propagation:

m_ij(r) ← Σ_{r′} ψ(r′, r) ∏_{n ∈ N(i)\j} m_ni(r′)

b_i(r) ← k ∏_{j ∈ N(i)} m_ji(r)

The first formula states that the message that node i passes to node j (that is to say, i’s opinion about the belief of j in state r) is assigned the value obtained by summing, over all states r′, the propagation matrix cell giving the likelihood that a node is in state r given that it has a neighbor in state r′, multiplied by the product (taken over all nodes n, other than j, that are i’s neighbors) of node n’s opinion about the belief of i in state r′.

The second formula states that the belief of i in state r (that is to say, the probability that node i is in state r) is assigned as value a normalization constant multiplied by the product of the messages that each node j passes to node i (that is to say, j’s opinion about the belief of i in state r), for all such j that j is a neighbor of node i.

The standard algorithm is as follows: ‘‘Starting with a suitable prior on the beliefs of the nodes, belief propagation proceeds by iteratively passing messages between nodes based on previous beliefs, and updating beliefs based on the passed messages. The iteration is stopped when the beliefs converge (within some threshold), or a maximum limit for the number of iterations is exceeded’’ (Pandit et al. 2007, p. 203). ‘‘In case there is no prior knowledge available, each node is initialized to an unbiased state (i.e., it is equally likely to be in any of the possible states), and the initial messages are computed by multiplying the propagation matrix with these initial, unbiased beliefs’’ (ibid., fn. 4).
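The iterative scheme just described can be sketched in code. The following is a minimal, generic belief-propagation sketch, not NetProbe's implementation; the function name, the toy graph, and the matrix values in the usage example are invented for illustration.

```python
# Minimal synchronous belief propagation on an undirected graph.
# Illustrative sketch only; graph, matrix, and priors are invented.

def belief_propagation(edges, psi, priors, n_states, max_iters=50, tol=1e-6):
    """edges: (i, j) pairs; psi[a][b] = P(state b | a neighbor in state a);
    priors: dict node -> prior distribution (uniform when absent)."""
    nodes = sorted({n for e in edges for n in e})
    nbrs = {n: set() for n in nodes}
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    uniform = [1.0 / n_states] * n_states
    msgs = {(i, j): list(uniform) for i in nodes for j in nbrs[i]}
    for _ in range(max_iters):
        new_msgs = {}
        for (i, j) in msgs:
            m = []
            for r in range(n_states):        # state of the recipient j
                total = 0.0
                for rp in range(n_states):   # state of the sender i
                    prod = priors.get(i, uniform)[rp]
                    for n in nbrs[i] - {j}:
                        prod *= msgs[(n, i)][rp]
                    total += psi[rp][r] * prod
                m.append(total)
            s = sum(m)
            new_msgs[(i, j)] = [v / s for v in m]  # normalize (avoids underflow)
        delta = max(abs(a - b) for key in msgs
                    for a, b in zip(msgs[key], new_msgs[key]))
        msgs = new_msgs
        if delta < tol:                      # messages have converged
            break
    beliefs = {}
    for i in nodes:
        b = [priors.get(i, uniform)[r] for r in range(n_states)]
        for j in nbrs[i]:
            b = [b[r] * msgs[(j, i)][r] for r in range(n_states)]
        s = sum(b)
        beliefs[i] = [v / s for v in b]
    return beliefs
```

On a three-node chain 0–1–2 with a homophilic two-state matrix and node 0 given a strong "good" prior, the computed belief that node 2 is good exceeds 0.5, illustrating how evidence propagates through intermediate nodes.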

An early book about Markov random fields is Kindermann and Snell (1980). MRFs have been discussed in the AI literature about belief propagation (e.g., Yedidia et al. 2003). Moreover, MRFs can be used for a wide variety of machine vision or image processing problems (Mitchell 2010; Li 2009; Jin, Fieguth, and Winger 2005; Kato and Pong 2001; Feng and Chen 2004). Ishikawa (2003) proposed a method to solve exactly a first-order Markov random field optimization problem more generally than was previously possible.

In the MRF of NetProbe, each node stands for a user. Each edge stands for one or more transactions between a pair of users. If there is an edge between two nodes, this indicates that the two users for whom the two nodes stand have transacted at least once. Each node can be in one of three states, namely, fraud, accomplice, or honest, or its state may be undetermined, in the sense that NetProbe does not assign a state to that node. Pandit et al. (2007, p. 204) claimed they

uncovered a different modus operandi for fraudsters in auction networks, which leads to the formation of near bipartite cores. Fraudsters create two types of identities and arbitrarily split them into two categories—fraud and accomplice. The fraud identities are the ones used eventually to carry out the actual fraud, while the accomplices exist only to help the fraudsters carry out their job by boosting their feedback rating. Accomplices themselves behave like perfectly legitimate users and interact with other honest users to achieve high feedback ratings. On the other hand, they also interact with the fraud identities to form near bipartite cores, which helps the fraud identities gain a high feedback rating. Once the fraud is carried out, the fraud identities get voided by the auction site, but the accomplice identities linger around and can be reused to facilitate the next fraud.

NetProbe uses the propagation matrix in order to detect bipartite cores in the graph. As mentioned earlier, in NetProbe a particular propagation matrix was devised, so that the belief propagation mechanism would suit the behavior of fraudsters and their accomplices. The intuition (Pandit et al. 2007, pp. 204–205) was that a fraudster would avoid linking to another fraudster. Rather, a fraudster would link heavily to accomplices. An accomplice, instead, would link to both honest nodes and fraudsters, but with a higher affinity for fraudsters. As to honest nodes (i.e., innocent users), they link to honest nodes as well as to accomplices, because the honest user believes the accomplice to be honest.
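The intuition just described can be encoded as a 3 × 3 propagation matrix. The ε-parameterized values below are an illustrative reconstruction of that intuition, not necessarily the exact entries of Pandit et al. (2007); the constant EPS is an assumption.

```python
# Illustrative propagation matrix for the fraud / accomplice / honest
# intuition. PSI[i][j] = P(a node is in state j | it has a neighbor in
# state i). EPS and the exact entries are assumptions for this sketch.
EPS = 0.05
STATES = ("fraud", "accomplice", "honest")

PSI = [
    # Neighbor is a fraudster: links mostly to accomplices, rarely to
    # other fraudsters or to honest users.
    [EPS, 1 - 2 * EPS, EPS],
    # Neighbor is an accomplice: links to fraudsters and to honest users,
    # with higher affinity for fraudsters than for other accomplices.
    [0.5, 2 * EPS, 0.5 - 2 * EPS],
    # Neighbor is honest: links to accomplices and honest users, and
    # almost never knowingly to fraudsters.
    [EPS, (1 - EPS) / 2, (1 - EPS) / 2],
]

# Each row is a probability distribution over the neighbor's state.
for row in PSI:
    assert abs(sum(row) - 1.0) < 1e-9
```

Each row sums to one, so the matrix can be plugged directly into a belief propagation loop as the function transforming a node's belief into messages for its neighbors.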

What is meant by near bipartite cores is that inside the graph, which represents the online auction site, one expects to find subsets of the nodes (i.e., subsets of the users) such that the given subset is a complete bipartite graph. That is to say, the given subset can be divided into two sub-subsets, and each node in either sub-subset has edges linking it to all nodes in the other sub-subset. If we replace ‘‘all’’ with ‘‘one or more of the,’’ then we would have a bipartite graph that is not a complete bipartite graph. This, too, is a possibility that is relevant for detecting fraudsters and their accomplices at sites such as eBay.

In the application at hand, which concerns fraudsters at an online auction site, a fraudster is linked to all of his or her accomplices, but two or more fraudsters may share accomplices. If a particular fraudster is the only fraudster using his or her accomplices, that is to say, if the fraudster has exclusive use of his or her accomplices, then one of the two sub-subsets in the (complete) bipartite core is a singleton set, i.e., a set containing only one element. It may also be that if the sub-subset comprising the fraudsters comprises more than one fraudster, then the bipartite core is not a complete bipartite graph, because one of the fraudsters in the fraudsters’ sub-subset may be using only some of the accomplices of another fraudster in that sub-subset, and the former fraudster may be using some accomplices that the latter fraudster is not using.
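The notion of a complete bipartite core can be checked directly from an edge list. The helper below is hypothetical, written only to illustrate the definition; it is not part of NetProbe.

```python
# Test whether two disjoint node subsets form a complete bipartite
# subgraph (a "bipartite core" in the sense described above) of an
# undirected edge list. Hypothetical illustration; not part of NetProbe.

def is_complete_bipartite_core(edges, part_a, part_b):
    if set(part_a) & set(part_b):
        return False  # the two sides must be disjoint
    edge_set = {frozenset(e) for e in edges}
    # Every node on one side must link to every node on the other side.
    return all(frozenset((a, b)) in edge_set
               for a in part_a for b in part_b)
```

A single fraudster "F" fully linked to accomplices "A1"–"A3" yields a core whose fraud-side sub-subset is a singleton, as in the text above.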

Pandit et al. (2007, p. 205) remark that in practice, given the very large size of the graph, it would be too strong a requirement if NetProbe had to propagate beliefs over the entire graph each and every time that a node (i.e., a user) or an edge (i.e., a transaction) is added to the network. This is why, as mentioned above, Incremental NetProbe was developed: it avoids wasteful recomputation from scratch, and incrementally updates node beliefs as small changes occur in the graph. The assumption, when using Incremental NetProbe, is that the addition of a new edge has only a local effect.

MALWARE AND POLONIUM

Polonium is a tera-scale graph-mining tool for malware detection (Chau et al. 2010). The reputation-based approach adopted is a Symantec protection model that, for every application that users may encounter, computes a reputation score (treated as belief in belief propagation), and protects them from files whose score is poor. Various attributes contribute to reputation: whether an application comes from known publishers, whether it already has many users, and so forth. ‘‘Good files typically appear on many machines and bad files appear on few machines’’ (sec. 4.2). Another intuition is what was called homophilic machine–file relationships: ‘‘We expect that good files are more likely to appear on machines with good reputations and bad files more likely to appear on machines with low reputations. In other words, the machine-file relationships can be assumed to follow homophily’’ (sec. 4.2).

An undirected, unweighted bipartite graph of files and machines was generated ‘‘from the raw data, with almost 1 billion nodes and 37 billion edges (37,378,365,220). Forty-eight million of the nodes are machine nodes, and 903 million are file nodes. An (undirected) edge connects a file to a machine that has the file. All edges are unweighted; at most, one edge connects a file and a machine. The graph is stored on disk as a binary file using the adjacency list format’’ (Chau et al. 2010, sec. 3.3); ‘‘we want to label a file node as good or bad, along with a measure of the confidence in that disposition’’ (sec. 4.1). In Polonium, there is a trade-off concerning false positives, expressed in how the belief propagation algorithm is made to stop (Chau et al. 2010).
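The file–machine graph just described can be sketched as a deduplicated adjacency-list structure. The function and toy report data below are invented for illustration; the real graph is built from Symantec's raw submissions.

```python
# Build an undirected, unweighted bipartite file-machine graph as two
# adjacency lists, so that at most one edge connects any file-machine
# pair. Toy sketch; identifiers and data are invented.
from collections import defaultdict

def build_bipartite_graph(reports):
    """reports: iterable of (machine_id, file_id) pairs, possibly repeated."""
    machine_files = defaultdict(set)
    file_machines = defaultdict(set)
    for machine_id, file_id in reports:
        machine_files[machine_id].add(file_id)   # sets deduplicate edges
        file_machines[file_id].add(machine_id)
    return machine_files, file_machines
```

A file's prevalence is then len(file_machines[f]), which feeds the intuition that files appearing on many machines are typically good.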

Chau et al. (2010, abstract) claimed: ‘‘We evaluated it with the largest anonymized file submissions dataset ever published, which spans over 60 terabytes of disk space’’ (emphasis in the original), with over 900 million files described in the raw data, from a total of 47,840,574 machines. Polonium resorts to graph mining. Like NetProbe, it also resorts to the belief propagation algorithm. ‘‘We adapted the algorithm for our problem. This adaptation was non-trivial, as various components used in the algorithm had to be fine tuned; more importantly, [...] modification to the algorithm was needed to induce iterative improvement in file classification’’ (sec. 4.3). A reputation-based approach was adopted. In a nutshell, ‘‘the key idea of the Polonium algorithm is that it infers a file’s goodness by looking at its associated machines’ reputations iteratively. It uses all files’ current goodness to adjust the reputation of machines associated with those files; this adjusted machine reputation, in turn, is used for re-inferring the files’ goodness’’ (sec. 4.5).

The reputation-based approach adopted is a Symantec protection model that, for every application that users may encounter, computes a reputation score and protects them from files whose score is poor. Various attributes contribute to reputation: whether an application comes from known publishers, whether it already has many users, and so forth. ‘‘Symantec has computed a reputation score for each machine based on a proprietary formula that takes into account multiple anonymous aspects of the machine’s usage and behavior. The score is a value between 0 and 1’’ (Chau et al. 2010, sec. 4.2).

Computing reputation credibly was made possible by the worldwide Norton Community Watch program, with millions of users contributing data anonymously. This is a huge file-submissions dataset. The raw data undergo processing at Symantec and then are fed into Polonium, which mines the data statistically, and machine learning is applied. ‘‘Each contributing machine is identified by an anonymized machine ID, and each file by a file ID which is generated based on a cryptographically-secure hashing function’’ (Chau et al. 2010, sec. 3.1). From the raw data, the undirected, unweighted bipartite graph of files and machines described above was generated (sec. 3.3).

Polonium computes the reputation for a given application, and is used in concert with other Symantec malware detection technologies. In the belief propagation algorithm as used in Polonium, belief corresponds to reputation. The Polonium team treated each file as a random variable X, whose value is

- either x_g (this being the ‘‘good’’ label)
- or x_b (this being the ‘‘bad’’ label).
- The probability P(x_g) is the file goodness,
- whereas P(x_b) is the file badness,

and the sum of the two probabilities is 1. Therefore, by knowing the value of one, one also knows the other. For each file i, the goal is to find the marginal probability P(X_i = x_g), that is, the goodness of that file. Domain knowledge helps infer label assignments.

Terminology, with the respective definitions, includes: file ground truth for ‘‘file label, good or bad, assigned by human security experts’’ (here, by file an executable file is meant; Chau et al. 2010, Table 1); known-good file for ‘‘file with good ground truth’’; known-bad file for ‘‘file with bad ground truth’’; and unknown file for ‘‘file with unknown ground truth’’ (ibid.). ‘‘Symantec maintains a ground truth database that contains a large number of known-good and known-bad files, some of which exist in our graph. We can leverage the labels of these files to infer those of the unknowns. The ground truth files influence their associated machines, which indirectly transfer that influence to the unknown files’’ (sec. 4.2).

Moreover, the possibility of errors is recognized: in the case of Polonium, True Positive (TP) stands for ‘‘malware instance correctly identified as bad’’ (Chau et al. 2010, Table 1), as opposed to False Positive (FP) for ‘‘a good file incorrectly identified as bad’’ (ibid.). False positives are a price to pay that comes with some successful malware detection tools: Tesauro, Kephart, and Sorkin (1996), who applied neural networks, were able to detect ‘‘boot sector viruses’’ with an over 90% true positive rate in identifying those viruses; but on the other hand, this came at a 15–20% false positive rate. As mentioned, in Polonium there is a trade-off concerning false positives that is expressed in how the belief propagation algorithm is made to stop.

TABLE 1 Edge potentials in Polonium

ψ_ij(x_i, x_j)    x_i = good    x_i = bad
x_j = good        0.5 + ε       0.5 − ε
x_j = bad         0.5 − ε       0.5 + ε

Virus signatures are virus profiles, or virus definitions. Malware detection comes in two major categories: anomaly-based detection, based on some presumed ‘‘normal’’ behavior from which malware deviates; and signature-based detection, in which malware instances are detected because they fit some profiles (Idika and Mathur 2007; Chau et al. 2010). It was Kephart and Arnold (1994) who first used data mining techniques to automatically extract virus signatures. Schultz et al. (2001) were among those who pioneered the application of machine learning algorithms (in their case, Naive Bayes and Multi-Naive Bayes) to classify malware.

In Naive Bayes, for a given sample we search for a class c_i that maximizes the posterior probability P(c_i | x; θ′) by applying the Bayes rule. Then x can be classified by computing

c_1 = argmax_{c_i ∈ C} P(c_i | θ′) P(x | c_i, θ′).
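A minimal sketch of this decision rule for binary features follows; the class names, priors, and conditional probabilities in the example are invented, and parameter estimation and smoothing are omitted.

```python
import math

# Minimal Naive Bayes decision rule for binary features: pick the class c
# maximizing log P(c) + sum_k log P(x_k | c). Probabilities are assumed
# already estimated; all values in the usage below are invented.

def naive_bayes_predict(x, class_priors, cond_probs):
    """x: list of 0/1 features; cond_probs[c][k] = P(feature k = 1 | c)."""
    best_class, best_logp = None, -math.inf
    for c, prior in class_priors.items():
        logp = math.log(prior)
        for k, xk in enumerate(x):
            p = cond_probs[c][k]
            logp += math.log(p if xk else 1.0 - p)
        if logp > best_logp:
            best_class, best_logp = c, logp
    return best_class
```

Working in log space avoids numerical underflow when many features are multiplied together, mirroring the normalization concerns of the belief propagation formulas discussed elsewhere in this article.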

Concerning how belief propagation was applied in Polonium, Chau et al. (2010, sec. 4.3) explain:

At a high level, the algorithm infers the label of a node from some prior knowledge about the node, and from the node’s neighbors. This is done through iterative message passing between all pairs of nodes v_i and v_j. Let m_ij(x_j) denote the message sent from i to j. Intuitively, this message represents i’s opinion about j’s likelihood of being in class x_j. The prior knowledge about a node i, or the prior probabilities of the node being in each possible class, are expressed through the node potential function φ(x_i).

This prior probability is called a prior. Once the procedure execution is completed, the goodness of each file is determined: ‘‘This goodness is an estimated marginal probability, and is also called belief, or formally b_i(x_i) (≈ P(x_i)), which we can threshold into one of the binary classes. For example, using a threshold of 0.5, if the file belief falls below 0.5, the file is considered bad’’ (Chau et al. 2010, sec. 4.3). The messages are obtained as follows:

Each edge e_ij is associated with messages m_ij(x_j) and m_ji(x_i) for each possible class. Provided that all messages are passed in every iteration, the order of passing can be arbitrary. Each message vector m_ij is normalized over j (node j is the message’s recipient), so that it sums to one. Normalization also prevents numerical underflow (or zeroing-out values). Each outgoing message from a node i to a neighbor j is generated based on the incoming messages from the node’s other neighbors (ibid.).

Let N(i) be the set of nodes that are the neighbors of node i. Let k be a normalizing constant. Let the edge potential be notated as ψ_ij(x_i, x_j); ‘‘intuitively, it is a function that transforms a node’s incoming messages collected into the node’s outgoing ones. Formally, ψ_ij(x_i, x_j) equals the probability of a node i being in class x_i given that its neighbor j is in class x_j’’ (Chau et al. 2010, sec. 4.3). The message-update equation is:

m_ij(x_j) ← Σ_{x_i ∈ X} φ(x_i) ψ_ij(x_i, x_j) ∏_{k ∈ N(i)\j} m_ki(x_i)

When the execution of the belief propagation algorithm ends, the node beliefs are determined according to this formula:

b_i(x_i) = k φ(x_i) ∏_{j ∈ N(i)} m_ji(x_i).

In Polonium, the intuition that good files are (slightly) more likely to appear on machines with good reputations and bad files (slightly) more likely to appear on machines with low reputations (that is to say, the homophilic machine–file relationships assumption) was converted into an edge potential defined according to Table 1, where ε was set to 0.001.

Moreover, for machine nodes, the node potential function maps the reputation score computed by Symantec into the machine’s prior. That exponential mapping obeys the following formula (where k is a constant whose value is based on domain knowledge):

machine prior = e^(−k · reputation).

This translates the intuition about what machine reputation contributes to the file reputation into the machine prior. ‘‘Similarly, we use another node potential function to set the file prior’’ by mapping the intuition that files that appear on many machines are typically good (Chau et al. 2010, sec. 4.4). This maps file prevalence into file prior. That is to say, the intuition about file goodness is translated into an unknown-file prior. And finally, the intuition about file ground truth is mapped into the known-file prior: ‘‘For known-good files, we set their priors to 0.99. For known-bad, we use 0.01’’ (ibid.). ‘‘Note that no probability is ever 0, because it can ‘zero-out’ other values multiplied with them. A lower bound of 0.01 has been imposed on all probabilities. Upper bound is, therefore, 0.99, since probabilities of the two classes add up to 1’’ (fn. 4 in sec. 4.4).
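The prior-setting rules just quoted can be sketched as follows. The constant K is an assumption (Symantec's actual value is based on domain knowledge), while the 0.99/0.01 values and the clamping bounds follow the quoted passages.

```python
import math

# Sketch of Polonium's prior assignment: ground-truth files get 0.99/0.01
# priors, machine priors come from an exponential mapping of the [0, 1]
# reputation score, and no probability is allowed to reach 0 or 1.
# K is an invented constant; Symantec's value is based on domain knowledge.
K = 0.5

def clamp(p, lo=0.01, hi=0.99):
    return max(lo, min(hi, p))

def machine_prior(reputation):
    # machine prior = e^(-K * reputation), clamped to [0.01, 0.99];
    # a higher reputation score yields a lower prior value.
    return clamp(math.exp(-K * reputation))

def known_file_prior(ground_truth):
    # "For known-good files, we set their priors to 0.99. For known-bad,
    # we use 0.01."
    return 0.99 if ground_truth == "good" else 0.01
```

The clamping implements the quoted footnote: a probability of exactly 0 would zero out every product it enters during message passing.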


When developing Polonium, the team modified the file-to-machine propagation between nodes of the graph: the edge potential based on the homophilic intuition is used in order ‘‘to propagate machine reputations to a file from its associated machines. Theoretically, we could also use the same edge potential function for propagating file reputation to machines. However, as we tried through numerous experiments—varying the ε parameter, or even ‘breaking’ the homophily assumption—we found that machines’ intermediate beliefs were often forced to change too significantly’’ (Chau et al. 2010, sec. 4.5). What was happening was that such change ‘‘led to an undesirable chain reaction that changes the file beliefs dramatically as well, when these machine beliefs were propagated back to the files. We hypothesized that this happens because a machine’s reputation (used in computing the machine node’s prior) is a reliable indicator of the machine’s beliefs, while the reputations of the files that the machine is associated with are weaker indicators.’’ Based on this hypothesis, the team found this solution: instead of propagating file reputation directly to a machine, they had Polonium pass it to the proprietary formula that Symantec uses to generate machine reputation, which re-computes a new reputation score for the machine. Experiments showed that this modification leads to iterative improvement of file classification accuracy (ibid.).

FUEL FRAUD: THE POZNAN ONTOLOGY MODEL

A team in Poznan, Poland (Jedrzejek, Falkowski, and Smolenski 2009) reported on an application of ontology technology to link analysis for investigating scams that involve chains of transactions made by a multitude of straw companies, and whose goal is fuel fraud. The purpose of the Poznan team’s project was to develop an adequate analytic tool, FuelFlowVis, to help with investigations and prosecutions.

What is involved is a kind of crime known as a fuel laundering scam. ‘‘This crime mechanism is to buy rebated oil (in Poland, heating oil) from a licensed distributor and then mix it (i.e., add components) and sell to the retail market as duty paid diesel’’ (Jedrzejek et al. 2009, p. 83).

In order to avoid the considerably higher excise tax or duty on diesel fuel, fraudsters process heating oil or agricultural oil by fuel laundering, i.e., by removing the dye identifying these oils and adding components, into diesel fuel that is suitable for engines. The lower-quality fuel thus obtained is then illicitly sold to drivers as diesel fuel at pumps at gas stations. The crime mechanism involves a flow of fuel, fuel components, and money, and this is masked by issuing fictitious invoices, either with or without payment. ‘‘The methods to hide the proceeds (i.e., executing the crime scheme) are very similar’’ (Jedrzejek et al. 2009, p. 81), although ‘‘fraudsters may use many types of schemes, techniques and transactions to achieve their goals,’’ so that ‘‘we need a conceptual model of fuel laundering crime of significant generality.’’ Nevertheless, in the three major cases from Poland that the team studied, ‘‘prosecutors had an enormous problem to uncover money flows from the source of money (profit centre) to sinks (where the money leaves companies and goes as cash to organizers of the scheme) and in retrospect, some of them did not even attempt to do this’’ (p. 81).

The unwieldy difficulties with which the prosecutors were faced are explained in the paper: ‘‘This occurs because the use of traditional analysis tools (spreadsheets or non-semantic visualization tools) cannot provide information about chains of transactions—a separate binary relation’s view does not give complete insight into the crime mechanism. The consequence of this fact is incomplete understanding of a crime mechanism’’ (Jedrzejek et al. 2009, p. 81). The team was faced with the general problem that it is very difficult to model economic crime (cf. Chau et al. 2007) for the purpose of developing a knowledge-based system. Fuel laundering was still virgin territory for modeling.

The team studied mainly three large fuel-laundering cases from the 2001–2003 period that went to court in Poland in 2008. For only one of the three cases was a rather complete set of money transfers available, along with the most important invoice information. The lack of data for the other two cases made it impossible to apply to them a particular data mining technique, namely, triggering rule-based red flags. Most defendants were charged with money laundering, signing false documents, or fraudulently reclaiming value added tax (VAT) from the tax office, as well as with conspiracy, tax evasion in the form of simple non-declaration of income, including false data in a tax statement, or directing illegal activity performed by another person (Jedrzejek et al. 2009, p. 81).

The Poznan team developed a formal model of fuel laundering and money laundering, the model being based on a minimal ontology (coded in the OWL ontology language). It is minimal in the sense that the team deliberately included only the necessary concepts that follow in the logical order of uncovering facts about a crime (Jedrzejek et al. 2009, p. 82). That minimal ontology, which is itself structured as eight layers in fact-uncovering order (only five layers were shown in a table; ibid.), constitutes an application layer, which in turn is ‘‘embedded in an upper level ontology of criminal processes and investigation procedures’’ (ibid.).

FISCAL FRAUD AND SNIPER

SNIPER is an auditing methodology applied to an area in fiscal fraud detection, namely the detection of VAT fraud. Such fraud can take various forms, such as underdeclaring sales or overdeclaring purchases. Moreover, fraudulent claims are possible for credits and refunds, because "tax charged by a seller is available to the buyer as a credit against his liability on his own sales and, if in excess of the output tax due, refunded to him" (Basta et al. 2009, p. 27). A team based in Pisa, Italy, that includes Stefano Basta, Fosca Giannotti, Giuseppe Manco, Dino Pedreschi, and Laura Spisanti reported on the SNIPER project (Basta et al. 2009). The Pisa-based team aims at a rule-based computer tool that, by means of data mining, would "identify the taxpayers with the highest probability of being VAT defrauders, in order to support the activity of planning and performing effective fiscal audit" (p. 27). A major constraint is the limited auditing capability of the competent revenue agency: "In Italy for example, audits are performed on only 0.4% of the overall population of taxpayers who file a VAT refund request" (ibid.).
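The practical consequence of such a capability constraint is that the scoring system is ultimately used to select a fixed-size audit list, which can be sketched as follows (the scores and the capacity figure are invented):

```python
# Fraud-likelihood scores per taxpayer (invented). Only B subjects,
# the agency's auditing capability, can actually be audited, so the
# scoring system selects the top-B by score.
scores = {"t1": 0.91, "t2": 0.15, "t3": 0.78, "t4": 0.05, "t5": 0.66}
B = 2  # audit capacity; in Italy, roughly 0.4% of refund claimants

audit_list = sorted(scores, key=scores.get, reverse=True)[:B]
print(audit_list)  # ['t1', 't3']
```

Everything about the scoring system is shaped by this cut-off: only the quality of the top-B ranking matters, not the quality of the scores over the whole population.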

The resulting sample selection bias, by which auditors focus on suspicious subjects, has the consequence that "the proportion of positive subjects (individuals who are actually defrauders) in the training set is vast compared with that in the overall population" (Basta et al. 2009, p. 27). The constraint on auditing capability also "poses severe constraints in the design of the scoring system"; "the scoring system should concentrate on a user-defined fixed number of individuals (representing the auditing capability of the agency), with high fraudulent likelihood and with a minimum false positive rate" (ibid.).

The fraud detection scenario has several objective functions (criteria) to be optimized. The criteria enumerated include (Basta et al. 2009, p. 27): proficiency ("higher fraud amounts make defrauders more significant"), equity ("a weighting mechanism should highlight those cases where the fraud represents a significant proportion of the business volume"), and efficiency ("since the focus is on refunds, scoring and detection should be sensitive to total/partial frauds": "a subject claiming a VAT refund equal to $2000 and entitled to $1800 is less significant than a different subject claiming $200 who is entitled to nothing").
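The arithmetic behind the efficiency example can be made explicit with a small sketch (the scoring function is our illustration; the paper does not give SNIPER's actual formula):

```python
# Efficiency criterion, illustrated: a claim is more significant the
# larger the share of it the subject is NOT entitled to. This scoring
# function is our invention, not SNIPER's actual formula.
def undue_share(claimed, entitled):
    return (claimed - entitled) / claimed  # fraction of claim that is undue

print(undue_share(2000, 1800))  # 0.1 -> mostly legitimate claim
print(undue_share(200, 0))      # 1.0 -> wholly fraudulent claim
```

Under such a measure the $200 claim by a subject entitled to nothing outranks the $2000 claim by a subject entitled to $1800, matching the paper's example.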

A rule-based classification approach was adopted for SNIPER because intelligible explanations made available to the auditors are more important than the scores themselves. By receiving explanations, auditors can get an idea of which behavioral mechanism behind the fraud to investigate. A drawback of rule-based classification techniques is that when, as in the problem at hand, the underlying data distribution is such that positive cases of fraud are only rarely observed, the classifier will be poor at predicting accurately. In SNIPER, the approach is flexible, being "an ensemble method that combines the best of several rule-based baseline classification tools" (Basta et al. 2009, p. 28), each handling specific subproblems. The system should gradually learn a set of rules and should devise a scoring function.
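The ensemble idea, rules retained as explanations plus a combined score, can be sketched as follows (the baseline rules and the way their scores are combined are invented for illustration):

```python
# Sketch of an ensemble of rule-based baseline classifiers: each one
# handles a subproblem and returns (fired_rule, score) or None; the
# ensemble keeps the fired rules as explanations for the auditor and
# combines the scores. Rules and combination are illustrative only.
def clf_refund(subject):
    if subject["refund_claimed"] > 2 * subject["avg_refund"]:
        return ("refund far above historical average", 0.7)

def clf_turnover(subject):
    if subject["vat_declared"] < 0.01 * subject["turnover"]:
        return ("declared VAT implausibly low for turnover", 0.8)

def ensemble(subject, classifiers=(clf_refund, clf_turnover)):
    hits = [r for r in (c(subject) for c in classifiers) if r]
    explanations = [rule for rule, _ in hits]
    score = max((s for _, s in hits), default=0.0)
    return score, explanations

score, why = ensemble({"refund_claimed": 5000, "avg_refund": 1000,
                       "vat_declared": 50, "turnover": 100_000})
print(score, why)
```

The explanations, not the numeric score, are what the auditor acts on: each fired rule points at a behavioral mechanism worth investigating.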


CONCLUSIONS

We surveyed representative systems among current applications of data mining to the fight against crime. Data mining for fraud detection is a major breakthrough; see, for example, Phua et al. (2005), Kou et al. (2004), and Weatherford (2002). The brief discussion in this section is intended to point out the main trends and difficulties, and to open some directions for further research. COPLINK has been an especially visible project, both because its own merits make it an archetype and because of where its publications appeared. Link analysis, however, is widespread, including in law enforcement, and there even exist commercial software products, based on that kind of technique, for the police or detectives. Text mining found a forensic application in the analysis of the Enron database, and this application in turn became relatively visible because of the notoriety of the Enron scandal. We considered an example of email mining only as applied to the Enron database. Malware, the application domain of the Polonium data mining project, is an area practically as wide as computer security. This suggests that we are going to see much more research into the application of data mining in this area, and our coverage was by no means exhaustive: we could not do justice to computer or network security in the short compass of this overview.

A major direction of research that has emerged is the analysis of suspected complicity in networks, in relation to various kinds of fraud. Online auction fraud is a major area of application, because of how sore the problem is. Fraud involving stock brokers is another area in which there has been investment in information technology, evolving from old-fashioned rule-based expert systems to the current emphasis on data mining. The project concerning stock brokers that we considered is one in which ethical problems are more palpable, because of a justified lingering fear that the reputation of some honest broker might be unduly tarnished under the watch of an official regulator. It is significant that the Poznan project is applied to fuel fraud. This is an economically important sector within fraud, one that (as in the case of the Pisa SNIPER project) is quite relevant for enforcing governmental fiscal policies. This is also a kind of application sensitive to differences of jurisdiction and of legislation or regulations. This in turn may lead to a geographical proliferation of projects, but that is not necessarily a bad thing: different implementors are likely to learn a lot from each other through the published literature.

NOTES

1. http://www.memex.com/cwbover.html
2. http://www.daisy.co.uk/daisy.html


3. http://www.altaanalytics.com/
4. http://www.crimelink.com/
5. http://www.oriosci.com/productinfo/Magic.html
6. http://ai.bpa.arizona.edu/coplink
7. "The algorithm stops when the beliefs converge (within some threshold; 10^-5 is commonly used), or a maximum number of iterations has finished. Although convergence is not guaranteed theoretically for general graphs, except for those that are trees, the algorithm often converges in practice, where convergence is quick and the beliefs are reasonably accurate" (Chau et al. 2010, sec. 4.3). In particular, in Polonium there is a departure from how a belief propagation is usually made to terminate, and this involves how true positive rates (TPR) rather than false positive rates (FPR) are treated: "the Polonium algorithm's termination criterion is goal-oriented, meaning the algorithm stops when the TPR does not increase any more (at the preset 1% FPR). This is in contrast to Belief Propagation's convergence-oriented termination criterion. In our premise of detecting malware, the goal-oriented approach is more desirable, because our goal is to classify software into good or bad, at as high of a TPR as possible while maintaining low FPR—the convergence-oriented approach does not promise this; in fact, node beliefs can converge, but to undesirable values that incur poor classification accuracy. We note that in each iteration, we are trading FPR for TPR. That is, boosting TPR comes with a cost of slightly increasing FPR. When the FPR is higher than desirable, the algorithm stops" (sec. 5.2).

8. See http://www.morganclaypool.com/doi/abs/10.2200/S00324ED1V01Y201102AIM010. Until early 2011, there were five issues available, published between 2007 and February 2011.
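The goal-oriented termination criterion described in note 7 can be sketched as follows (the belief-update step itself is abstracted away, and the TPR numbers are invented):

```python
# Sketch of Polonium-style goal-oriented termination: after each
# belief-propagation iteration, measure the true-positive rate at the
# preset 1% FPR and stop as soon as it no longer increases, instead of
# waiting for the beliefs to converge. `bp_iteration` is a stand-in
# for the actual belief-update step.
def run_bp(bp_iteration, eval_tpr_at_fpr, max_iters=50):
    best_tpr, beliefs = -1.0, None
    for _ in range(max_iters):
        beliefs = bp_iteration(beliefs)
        tpr = eval_tpr_at_fpr(beliefs, fpr=0.01)
        if tpr <= best_tpr:  # TPR stopped improving: stop here
            break
        best_tpr = tpr
    return best_tpr

# Toy stand-ins: TPR improves for three iterations, then plateaus.
tpr_curve = iter([0.60, 0.72, 0.80, 0.80, 0.79])
print(run_bp(lambda b: b, lambda b, fpr: next(tpr_curve)))  # 0.8
```

Because each further iteration trades a little FPR for TPR, stopping at the TPR plateau keeps the classification at the preset false-positive budget, which convergence-oriented termination does not guarantee.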

REFERENCES

Aggarwal, C. C. 2011. Social network data analysis. Berlin: Springer.
Basta, S., F. Giannotti, G. Manco, D. Pedreschi, and L. Spisanti. 2009. SNIPER: A data mining methodology for fiscal fraud detection. Mathematics for Finance and Economy, special issue of ERCIM News 78:27–28.
Bennett, K. P., and C. Campbell. 2000. Support vector machines: Hype or hallelujah? SIGKDD Explorations 2 (2): 1–13.
Bex, F. 2011. Arguments, stories and criminal evidence: A formal hybrid theory. Springer Law and Philosophy Library, 92. Dordrecht, The Netherlands: Springer.
Brandes, U., J. Raab, and D. Wagner. 2001. Exploratory network visualization: Simultaneous display of actor status and connections. Journal of Social Structure 2 (4). http://www.cmu.edu/joss/content/articles/volume2/BrandesRaabWagner.html
Breiger, R. L. 2004. The analysis of social networks. In Handbook of data analysis, ed. M. Hardy and A. Bryman, 505–526. London: SAGE.
Campbell, C., and Y. Ying. 2011. Learning with support vector machines. Synthesis Lectures on Artificial Intelligence and Machine Learning 5 (1): 1–95. Published online in February 2011 by Morgan and Claypool in the United States. doi:10.2200/S00324ED1V01Y201102AIM010
Chaoji, V., A. Hoonlor, and B. K. Szymanski. 2010. Recursive data mining for role identification in electronic communications. International Journal of Hybrid Information Systems 7 (3): 89–100. Also at: http://www.cs.rpi.edu/~szymansk/papers/ijhis.09.pdf
Chau, D. H., C. Nachenberg, J. Wilhelm, A. Wright, and C. Faloutsos. 2010. Polonium: Tera-scale graph mining for malware detection. Proceedings of the 2nd Workshop on Large-Scale Data Mining: Theory and Applications (LDMTA 2010), Washington, DC, July 25. www.ml.cmu.edu/current_students/DAP_chau.pdf
Chau, D. H., S. Pandit, and C. Faloutsos. 2006. Detecting fraudulent personalities in networks of online auctioneers. In Proceedings of the European Conference on Machine Learning (ECML) and Principles and Practice of Knowledge Discovery in Databases (PKDD) 2006, 103–114. Berlin, September 18–22.
Chau, M., J. Schroeder, J. Xu, and H. Chen. 2007. Automated criminal link analysis based on domain knowledge. Journal of the American Society for Information Science and Technology 58 (6): 842–855.
Chen, H., W. Chung, J. J. Xu, G. Wang, Y. Qin, and M. Chau. 2004. Crime data mining. IEEE Computer 37 (4): 50–56.
Chen, H., J. Schroeder, R. Hauck, L. Ridgeway, H. Atabakhsh, H. Gupta, C. Boarman, K. Rasmussen, and A. Clements. 2003. COPLINK Connect: Information and knowledge management for law enforcement. Decision Support Systems 34 (3): 271–285.
Chen, H., D. Zeng, H. Atabakhsh, W. Wyzga, and J. Schroeder. 2003. COPLINK: Managing law enforcement data and knowledge. Communications of the ACM 46 (1): 28–34.
Cortes, C., and V. Vapnik. 1995. Support-vector networks. Machine Learning 20:273–297.
Coull, S., and B. K. Szymanski. 2008. Sequence alignment for masquerade detection. Computational Statistics and Data Analysis 52 (8): 4116–4131.
de Vel, O., A. Anderson, M. Corney, and G. Mohay. 2001. Mining e-mail content for author identification forensics. ACM SIGMOD Record 30 (4): 55–64.
Elsayed, T., and D. W. Oard. 2006. Modeling identity in archival collections of email: A preliminary study. Presented at the Third Conference on Email and Anti-Spam, CEAS 2006, Mountain View, CA, July 27–28.
Feng, Y., and W. Chen. 2004. Brain MR image segmentation using fuzzy clustering with spatial constraints based on Markov random field theory. In Medical imaging and augmented reality: Proceedings of the Second International Workshop (MIAR 2004), ed. G.-Z. Yang and T. Jiang, 188–195. Beijing, China, August 19–20. (Lecture Notes in Computer Science 3150.) Berlin: Springer.
Fitts, P. M., R. E. Jones, and J. L. Milton. 1950. Eye movements of aircraft pilots during instrument landing approaches. Aeronautical Engineering Review 9, February 1950.
Freeman, L. C. 2007. Social network analysis. 4 vols. London: Sage.
Gilbreth, F. B., and L. M. Gilbreth. 1917. Applied motion study. New York, NY: Sturgis and Walton.
Goldberg, M., M. Hayvanovych, A. Hoonlor, S. Kelley, M. Magdon-Ismail, K. Mertsalov, B. Szymanski, and W. Wallace. 2008. Discovery, analysis and monitoring of hidden social networks and their evolution. IEEE Homeland Security Technologies Conference, 1–6. Boston, May.
Gray, G. L., and R. Debreceny. 2006. Continuous assurance using text mining. Presented at the 12th World Continuous Auditing and Reporting Symposium. Posted at http://raw.rutgers.edu/docs/wcars/12wcars/Continuous_Assurance_Text_Mining.pdf
Harper, W. R., and D. H. Harris. 1975. The application of link analysis to police intelligence. Human Factors 17 (2): 157–164.
Hauck, R. V., H. Atabakhsh, P. Ongvasith, H. Gupta, and H. Chen. 2002. COPLINK Concept Space: An application for criminal intelligence analysis. IEEE Computer 35 (3): 30–37.
Idika, N., and A. P. Mathur. 2007. A survey of malware detection techniques (Technical report). Department of Computer Science, Purdue University, West Lafayette, Indiana.
Ishikawa, H. 2003. Exact optimization for Markov random fields with convex priors. IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (10): 1333–1336.
Jedrzejek, C., M. Falkowski, and M. Smolenski. 2009. Link analysis of fuel laundering scams and implications of results for scheme understanding and prosecutor strategy. In Proceedings of Legal Knowledge and Information Systems: JURIX 2009, ed. G. Governatori, 79–88. Amsterdam: IOS Press.
Jin, F., P. Fieguth, and L. Winger. 2005. Image denoising using complex wavelets and Markov prior models. In Image analysis and recognition: Proceedings of the 2nd International Conference (ICIAR 2005), ed. M. Kamel and A. Campilho, 73–80. Toronto, Canada, September 28–30. (Lecture Notes in Computer Science 3656.) Berlin: Springer.
Kangas, L. J., K. M. Terrones, R. D. Keppel, and R. D. La Moria. 2003. Computer aided tracking and characterization of homicides and sexual assaults (CATCH). In Mena (2003), 364–375.
Kaptein, H., H. Prakken, and B. Verheij, eds. 2009. Legal evidence and proof: Statistics, stories, logic. Applied Legal Philosophy Series. Farnham, England: Ashgate Publishing.
Kato, Z., and T.-C. Pong. 2001. A Markov random field image segmentation model using combined color and texture features. In Computer analysis of images and patterns: Proceedings of the 9th International Conference (CAIP 2001), ed. W. Skarbek, 547–554. Warsaw, Poland, September 5–7. (Lecture Notes in Computer Science 2124.) Berlin: Springer.
Kephart, J., and W. Arnold. 1994. Automatic extraction of computer virus signatures. Presented at the 4th Virus Bulletin International Conference, 178–184.
Kindermann, R., and J. L. Snell. 1980. Markov random fields and their applications. Contemporary Mathematics 1. Providence, Rhode Island: American Mathematical Society.
Kou, Y., C. T. Lu, S. Sirwongwattana, and Y. P. Huang. 2004. Survey of fraud detection techniques. In Proceedings of the 2004 International Conference on Networking, Sensing, and Control, 749–754. Taipei, Taiwan.
Kuflik, T. S., E. Nissan, and G. Puni. 1989. Finding excuses with ALIBI: Alternative plans that are deontically more defensible. In Proceedings of the International Symposium on Communication, Meaning and Knowledge vs. Information Technology, Lisbon, September 1989. Also in Computers and Artificial Intelligence 10 (4): 297–325, and in Information technology and society: Theory, uses, impacts, ed. J. Lopes Alves, 484–510. Lisbon: APDC and SPF (1992).
Leary, R. 2012. FLINTS, a tool for police investigation and intelligence analysis. In Nissan (2012).
Li, S. Z. 2009. Mathematical MRF models. In Markov random field modeling in image analysis, ed. S. Z. Li, 1–28. (Advances in Computer Vision and Pattern Recognition series; originally published in the series Computer Science Workbench, 3rd ed., 2001.) Tokyo: Springer-Verlag.
MacCrimmon, M., and P. Tillers, eds. 2002. The dynamics of judicial proof: Computation, logic, and common sense. Studies in Fuzziness and Soft Computing, vol. 94. Heidelberg: Physica-Verlag (of Springer).
Marineau, R. F. 1989. Jacob Levy Moreno, 1889–1974: Father of psychodrama, sociometry, and group psychotherapy. London: Routledge.
Martino, A. A., and E. Nissan, eds. 2001. Formal approaches to legal evidence. Special issue, Artificial Intelligence and Law 9 (2–3): 85–224.
Mena, J. 2003. Investigative data mining for security and criminal detection. Boston, MA: Butterworth.
Mitchell, H. B. 2010. Markov random fields. Ch. 17, 205–209, in Image fusion: Theories, techniques and applications, by H. B. Mitchell. Berlin: Springer. doi:10.1007/978-3-642-11216-4_17
Moreno, J. L. 1953. Who shall survive? Foundations of sociometry, group psychotherapy, and sociodrama. Boston, MA: Beacon House. (Originally published in 1934 and later in 1953 and 1978.)
Neville, J., O. Simsek, D. Jensen, J. Komoroske, K. Palmer, and H. Goldberg. 2005. Using relational knowledge discovery to prevent securities fraud. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '05), 449–458. Chicago, Illinois, August 21–24. New York: ACM Press.
Newman, M. E. 2010. Networks: An introduction. Oxford: Oxford University Press.
Nissan, E. 2012. Computer applications for handling legal evidence, police investigation, and case argumentation. Dordrecht, The Netherlands: Springer.
Nissan, E., and A. A. Martino, eds. 2001. Software, formal models, and artificial intelligence for legal evidence. Computing and Informatics (special issue) 20 (6): 509–656.
Nissan, E., and A. A. Martino, eds. 2003. Building blocks for an artificial intelligence framework in the field of legal evidence. Cybernetics and Systems (special issue in two parts) 34 (4/5): 233–411; 34 (6/7): 413–583.
Nissan, E., and A. A. Martino, eds. 2004. The construction of judicial proof: A challenge for artificial intelligence modelling. Applied Artificial Intelligence (special issue) 18 (3/4): 183–393.
Pandit, S., D. H. Chau, S. Wang, and C. Faloutsos. 2007. NetProbe: A fast and scalable system for fraud detection in online auction networks. In WWW 2007: Proceedings of the 16th International Conference on World Wide Web, 201–210. Alberta, Canada. New York, NY: ACM.
Phua, C., V. Lee, K. Smith-Miles, and R. Gayler. 2005. A comprehensive survey of data-mining-based fraud detection research. Clayton School of Information Technology, Monash University, Clayton, Victoria, Australia. http://clifton.phua.googlepages.com/
Schroeder, J., J. Xu, H. Chen, and M. Chau. 2007. Automated criminal link analysis based on domain knowledge. Journal of the American Society for Information Science and Technology 58 (6): 842–855.
Schultz, M., E. Eskin, E. Zadok, and S. Stolfo. 2001. Data mining methods for detection of new malicious executables. Presented at the 2001 IEEE Symposium on Security and Privacy, 38–49.
Steinwart, I., and A. Christmann. 2008. Support vector machines. New York, NY: Springer-Verlag.
Stranieri, A., and J. Zeleznikow. 2005. Knowledge discovery from legal databases. Springer Law and Philosophy Library, 69. Dordrecht, The Netherlands: Springer.
Szymanski, B., and Y. Zhang. 2004. Recursive data mining for masquerade detection and author identification. In Proceedings of the 5th IEEE Systems, Man and Cybernetics Information Assurance (SMC IA) Workshop, 424–431. West Point, NY, June.
Tesauro, G., J. Kephart, and G. Sorkin. 1996. Neural networks for computer virus recognition. IEEE Expert 11 (4): 5–6.
Thagard, P. 1989. Explanatory coherence. Behavioral and Brain Sciences 12 (3): 435–467. (Commentaries and riposte up to 502.)
Weatherford, M. 2002. Mining for fraud. IEEE Intelligent Systems 17:4–6.
Wilson, G., and W. Banzhaf. 2009. Discovery of email communication networks from the Enron corpus with a genetic algorithm using social network analysis. In Proceedings of the 11th Conference on Evolutionary Computation, 3256–3263. Trondheim, Norway, May 18–21.
Xiang, Y., M. Chau, H. Atabakhsh, and H. Chen. 2005. Visualizing criminal relationships: Comparisons of a hyperbolic tree and a hierarchical list. Decision Support Systems 41:69–83.
Xu, J. J., and H. Chen. 2004. Fighting organized crimes: Using shortest-path algorithms to identify associations in criminal networks. Decision Support Systems 38:473–487.
Yedidia, J. S., W. T. Freeman, and Y. Weiss. 2003. Understanding belief propagation and its generalizations. In Exploring artificial intelligence in the new millennium, ed. G. Lakemeyer and B. Nebel, 239–269. San Francisco, CA: Morgan Kaufmann. (Previously Technical Report TR-2001-22, Mitsubishi Electric Research, 2001.)
