
Complex Rare Category Analysis: Mining Needles in the Haystack

Dawei Zhou, Jingrui He

{dzhou23, jingrui.he}@asu.edu

School of Computing, Informatics, Decision Systems Engineering

Arizona State University

Needles in the Haystack

• Chen, C. "CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature." Journal of the Association for Information Science and Technology 57.3 (2006): 359-377.

• Spaaij, R. "The enigma of lone wolf terrorism: An assessment." Studies in Conflict & Terrorism 33.9 (2010): 854-870.

• Ngai, E. W., Hu, Y., Wong, Y. H., Chen, Y., & Sun, X. (2011). The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature. Decision Support Systems, 50(3), 559-569.

1. Insider Threat 2. Money Laundering 3. Lone Wolf Terrorism

4. Gene Disease 5. Identity Theft 6. Emerging Trend

An Example of Needles: Malicious Insiders

Malicious Insider: Definition

Current or former employee or contractor who

➢ intentionally exceeded or misused an authorized level of network, system, or data access in such a way that it

➢ affected the security of the organization's data, systems, or daily business operations [Cappelli et al., RSA2008]

DARPA-BAA-10-84, DARPA-BAA-11-04, DARPA-BAA-11-64, IARPA-BAA-12-01, …

Malicious Insiders: Fort Hood Shooting

[Figure: timeline from Past to Current (Post Inv.). 'Obvious' clues: web visits (http), email (@), radical blog, leading up to the shooting.]

Q: Can we reverse it?

• G1. Find individual traces
• G2. Connect the dots
• G3. Prevent the tragedy

Rare Category Analysis

• Problem Definition: Rare category analysis (RCA) refers to the problem of representing, identifying, characterizing, and tracking rare examples from underrepresented minority classes in an imbalanced data set.
  − Imbalanced data set
  − Minority classes are not separable from the majority class

• An Illustration Example: Queen: 10; Overall: 5000

Research Questions: Challenge of Rarity

Highly skewed distribution

➢ 0.1% of any given population of users is malicious
➢ 99.9% are normal users [DARPA-BAA-11-04]

Non-separable nature

➢ Malicious users often camouflage their synthetic identity
➢ Malicious users try to bypass the fraud detection systems [DARPA-MAA-18-09]

Commonality of insiders

➢ Detecting insiders of each type
➢ Identifying relevant features
➢ Characterizing each type of insider

Q1: How to detect and characterize insiders of each type, at as little cost as possible?

Research Questions: Challenge of Dynamics

Extreme scarcity of insiders

➢ 0.1% of any given population of users is malicious
➢ 20% of malicious users are active on any given day [DARPA-BAA-11-04]

Costly access to oracle

➢ 65K personnel at Fort Hood; 4.7B emails in 2 years
➢ ~60 initial activity reviews per day per operator [DARPA-BAA-11-04]

Fine-grained dynamics

Q2: How to identify and track insiders of each type over time?

Research Questions: Challenge of Heterogeneity

• Various information sources
• Various data types: attributed data, sequential data, network data
• Various types of insiders: blackmailed, psychological vulnerability, greedy

Q3: How to detect, characterize, and track more insiders with high accuracy, using heterogeneous data?

Why do We Care?

• Terrorist Incidents Worldwide
• Financial Fraud [Federal Trade Commission, 2017]
• PR Crisis
• Cybercrime [M. McGuire, RSA2018]: illegal online markets; trade secret and IP theft; data trading. Cost over 1.5 trillion in 2018.

[Figure labels: US, Allies]

Rare Category Analysis (RCA)

Roadmap

• Part I: Rare Category Analysis for Static Attributed Data

• Part II: Rare Category Analysis for Temporal Sequential Data

• Part III: Rare Category Analysis for Network Data

• Part IV: Heterogeneous Rare Category Analysis

• Part V: Challenges & Future Directions

Part I: RCA for Static Attributed Data

Taxonomy

Rare Category Analysis for Attributed Data

• Unsupervised Rare Category Analysis
  • Rare Category Detection
    • Active learning based
    • Nearest neighbor based
  • Feature Selection
    • Feature selection
    • Instance selection
• Semi-supervised Rare Category Analysis
  • Comparison with imbalanced classification
  • Comparison with outlier/anomaly detection
  • Rare category characterization

Active Learning for Rare Category Detection

• Problem
  • Given: (i) an unlabeled noisy set; (ii) a small budget of label queries to a domain expert.
  • Find: useful anomalies (rare category examples).

• Challenges
  • Highly skewed data distribution
  • No training data

• D. Pelleg, A. W. Moore: Active Learning for Anomaly and Rare-Category Detection. NIPS 2004.

[Figure: example anomalies: diffraction spikes, satellite trails]

Hint Selection Methods

• Proposed Algorithm
  • Step 1: Start with entirely unlabeled data.
  • Step 2: Perform semi-supervised learning (which, on the first iteration, degenerates to unsupervised learning).
  • Step 3: Ask an expert to classify the anomaly patterns.
  • Step 4: Go to Step 2.

Nearest-neighbor-based methods

• Problem
  • Given: a set of unlabeled examples $S = \{x_1, \dots, x_n\}$, $x_i \in \mathbb{R}^d$, which come from $m$ distinct classes.
  • Find: at least one example from each class.

• Intuition: select examples according to the change in local density
  • Q: How to measure $df(x)/dx$?

[Figure: density $f(x)$ plotted over $x \in [-6, 6]$, marking regions where $df(x)/dx$ is small and where it is large]

• J. He, and J. Carbonell. Nearest-Neighbor-Based Active Learning for Rare Category Detection. NIPS 2007.
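The local-density intuition can be illustrated with a small sketch. This is not the NNDB algorithm itself: the counting radius `r`, the comparison multiplier `t`, and the function name `density_change_scores` are assumptions made for illustration, and the toy data are hypothetical. Each point gets a local count (the number of points inside its ball), and its score is the largest drop in count towards any point within distance `t * r`.

```python
import numpy as np

def density_change_scores(X, r, t=2):
    """Score points by the largest drop in local count towards a nearby point.

    r is the ball radius used to count neighbors (an assumed stand-in for a
    class-specific radius); t * r is an assumed comparison radius.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))       # pairwise Euclidean distances
    counts = (dist <= r).sum(axis=1)               # points inside each ball (incl. itself)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        nearby = np.flatnonzero(dist[i] <= t * r)  # points to compare against
        scores[i] = (counts[i] - counts[nearby]).max()
    return counts, scores

# Hypothetical 1-D toy data: 20 evenly spaced majority points, 5 tightly packed rare points.
majority = np.arange(20, dtype=float).reshape(-1, 1)
minority = np.array([[10.48], [10.49], [10.50], [10.51], [10.52]])
X = np.vstack([majority, minority])

counts, scores = density_change_scores(X, r=0.4, t=2)
query = int(np.argmax(scores))  # first example to send to the labeling oracle
```

On this toy set the rare cluster is the only place where the local count changes sharply, so the first query lands inside it. In practice the radius would have to be derived from the data and the class prior rather than fixed by hand.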

Intuition: How to Measure 𝑓(𝑥)?

[Figure: more points in the purple ball → more purple → higher density]

Intuition: How to Measure $df(x)/dx$?

[Figure: bigger difference in the color of the purple balls → bigger change in density; regions are annotated with big and small $df(x)/dx$]

NNDB Algorithm for the Binary Case

1. Calculate class-specific radius r

2. , ,

3.

4. Query

5. Rare class?
   • No: increase t by 1 and repeat
   • Yes: go to step 6

6. Output x

NNDB Algorithm at a Glance

Synthetic Data Sets

• Data set 1: majority class: 1000; minority class: 10. Random: 101; NNDB: 3
• Data set 2: majority class: 3000; smallest minority class: 79. Random: 83; NNDM: 5

[Figure: scatter plots of the two synthetic data sets]







Feature Selection for Rare Category Analysis

• Problem
  • Given: a set of unlabeled examples $S = \{x_1, \dots, x_n\}$, $x_i \in \mathbb{R}^d$.
  • Find: a set of $d_r$ features that are relevant to the rare categories.

• J. He, and J. Carbonell. Co-Selection of Features and Instances for Unsupervised Rare Category Analysis. SDM 2010

[Figure: Input Data, Selected Features, Output Results]

• Challenges
  • Data exhibit a highly skewed distribution
  • Label information is costly and difficult to obtain for rare examples
  • Conventional feature selection algorithms cannot select the relevant features for rare category analysis because of its highly skewed nature
• Intuition
  • Rare category selection: select a set of examples which are likely to come from the minority class
  • Feature selection: identify the features relevant to the minority class
  • Jointly dealing with the two tasks benefits both of them

[Figure: Input Data, Selected Features, Selected Rare Examples, Output Results]

PALM Algorithm

• Notation
  • Unlabeled examples: $D = \{x_1, \dots, x_n\}$, $x_i \in \mathbb{R}^d$
  • Class labels: $y_i \in \{1, 2\}$
  • Majority class: $y_i = 1$, prior $1 - p$
  • Minority class: $y_i = 2$, prior $p$
  • Relevant subspace of the minority class:
    • $d_r$ features relevant to the minority class
    • Similar values on the $d_r$ features
    • Diverse values on the remaining features

PALM Algorithm

• Optimization Problem
  • $a$: binary minority class indicator vector; $a_i$ is the $i$th element of $a$
  • $b$: binary relevant feature indicator vector; $b_j$ is the $j$th element of $b$

• Constraints
  • There are $np$ points from the minority class
  • There are $d_r$ relevant features

Synthetic Dataset

• Majority class: 1000
• Minority class: 10
• Relevant features: X and Y
• Using the proposed PALM
  • Feature selection: $b_x = b_y = 1$, $b_z = 0$
  • Rare category selection: precision 90%







Semi-supervised Rare Category Analysis

• Problem: Semi-supervised rare category analysis refers to the problem of detecting and clustering rare examples from underrepresented minority classes, given one-shot or few-shot labeled examples.

• A Pipeline: Input Data → Rare Category Detection [SDM2009, ICDM2008, NIPS2007] → Rare Category Characterization [FCS2012, ICDM2010, KDD2018]

• Rare category detection methods by prior knowledge and data type
  • Full prior, feature data: NNDB, ALICE/MALICE [NIPS2007, ISAIM2008]
  • Full prior, graph data: GRADE [ICDM2008]
  • No/partial prior, feature data: SEDER [SDM2009]
  • No/partial prior, graph data: GRADE-LI [ICDM2008]

Comparison with Imbalanced Classification

• Imbalanced Classification
  • Problem
    • Data exhibit an imbalanced class-membership distribution
    • Labeled examples from all the classes
    • Models for imbalanced classification focus on the overall accuracy of each class
  • Methodology: [Kubat & Matwin, ICML1997]; [Chawla et al., JAIR2002]; [Wu & Chang, ICML2003]; [Huang et al., CVPR2016]; …

• Semi-supervised Rare Category Analysis
  • Problem
    • Data exhibit a highly skewed class-membership distribution
    • One-shot or few-shot labeled examples from the rare categories
    • Models for rare category analysis put heavy emphasis on learning the minority classes with good performance
  • Methodology: [Fine & Mansour, COLT2006]; [Dasgupta & Hsu, ICML2008]; [Vatturi & Wong, KDD2009]; [Zhou et al., KDD2018]; …

Comparison with Outlier/Anomaly Detection

• Outlier vs. Rare Class
  • [Pelleg & Moore, NIPS2004]: “Most of the objects (i.e., majority classes) are well explained by current theories and …. remainder are anomalies, but 99% of these anomalies are uninteresting, and only 1% of them (i.e., rare categories) are useful … the rest type of anomalies, called boring anomalies (i.e., outliers)”
  • Anomalies are the instances that do not fit the distribution of the majority classes.
  • Outliers are typically single points, separable from normal examples and scattered over the space.
  • Rare category analysis assumes the minority classes are compact in the feature space and may overlap with the majority class.

[Figure: input data → anomalies, split into rare categories (useful, compact) and outliers (uninteresting, scattered)]

Rare Category Characterization

• Problem
  • Given: a few labeled examples from both classes
  • Find: a set of unlabeled examples which are likely to come from the minority class

• Binary case
  • One majority class, one minority class
  • Can be extended to multiple rare categories

• J. He, H. Tong, and J. Carbonell. Rare Category Characterization. ICDM 2010.

RACH Algorithm

• Pre-processing: Filtering
  • Assumption: rare examples are enclosed by a minimum-radius hyperball
  • Intuition: safely discard unlabeled examples far away from the minority class

[Figure: labeled minority class examples → one-class SVM → center & radius; unlabeled examples + center & radius → filtered unlabeled examples]

RACH Algorithm

• Formulation (terms annotated on the slide)
  • Radius
  • Center
  • Slack variables for labeled examples from the majority class
  • Slack variables for unlabeled examples
  • Parameters

RACH Algorithm

• Intuition: find the smallest hyperball that covers the denser support region of the minority class
  • MOST labeled examples from the majority class are outside the hyperball
  • ALL labeled examples from the minority class are inside the hyperball
  • MANY unlabeled examples are enclosed by the hyperball

Empirical Studies: 20 Newsgroups

[Figure: F-score versus label percentage on 20 Newsgroups, comparing RACH, KNN, Manifold-Ranking, Under-Sampling, SVM-Perf, and TSVM]

Empirical Studies

[Figure: F-score versus label percentage on four data sets: 20 Newsgroups, Shuttle, Abalone, Glass]

Part II: RCA for Temporal Sequential Data

Application 1: Health Condition Monitoring

• Heart Disease
  • Inseparable from normal patterns
  • Highly skewed
  • Multiple sources

[Figure: body sensor network. Segment-level: irregular pulse. Sequence-level: abnormal bio-signals.]

Application 2: Synthetic ID Detection

• Synthetic ID
  • Combining real and fake identifying information
  • Extremely imbalanced distributions
  • Real-time transactions

[Figure: segment-level and sequence-level views]

Application 3: Insider Trading Detection

[Figure: security boundary. Sequence-level: insider. Segment-level: insider trading.]

Challenges on Temporal Data

• Heterogeneity
  • Different lengths of temporal sequences
  • Different types of temporal patterns

• Hierarchy
  • RCA among multiple time series
  • RCA within a given time series

• Sparsity
  • Sequence-level targets are rare
  • Segment-level targets are rare

Taxonomy

Rare Category Analysis for Temporal Sequential Data

• Segment-level RCA
  • Points
    • Profile similarity based
    • Deviants based
  • Subsequences
    • Regular subsequence
    • Irregular subsequence
• Time-series-level RCA
  • Unsupervised approaches
  • Supervised approaches
• Bi-level RCA

Rare Category of Points

• Problem
  • Given: a time series $\mathbf{x} = (x_1, x_2, \dots, x_T)$.
  • Find: the rare categories of points in $\mathbf{x}$.

• Approaches
  • Profile-similarity-based methods [Williams et al., IPDPS2007]
  • Deviants-based methods [Jagadish et al., VLDB1999; Muthukrishnan et al., SSDBM2004]

Abnormal points = Rare Examples

Profile-Similarity-Based Methods

• Intuition
  • Maintain a profile for the normal patterns (majority classes)
  • Identify abnormal patterns (minority classes) by comparing new time points against the profile

• Tiresias system
  • Targets multi-variable OS performance metric time series
  • Maintains a normal profile and a variance vector
  • Each new data point is compared with both the normal profile and the variance vector to compute its anomaly score

• A. W. Williams, S. M. Pertet, and P. Narasimhan, “Tiresias: Black-box Failure Prediction in Distributed Systems,” in Proc. of the 21st Intl. Parallel and Distributed Processing Symposium (IPDPS), 2007, pp. 1–8.

Deviant-Based Methods

• Intuition
  • Find points in a given time series whose removal from the series results in a histogram representation with a lower error bound than the original.

• Approaches
  • The optimal set of $k$ deviants always consists of the $l$ highest and the remaining $k - l$ lowest values for some $l \le k$.
  • A dynamic-programming-based solution maintains a partial solution only for a few interspersed indexes of the time series rather than for each value.

[Figure: points 11 and 4 are the rare examples]

• S. Muthukrishnan, R. Shah, and J. Vitter, “Mining Deviants in Time Series Data Streams,” in Proc. of the 16th Intl. Conf. on Scientific and Statistical Database Management (SSDBM), Jun 2004, pp. 41–50.




Rare Category of Subsequences

• Problem
  • Given: a time series $\mathbf{x} = (x_1, x_2, \dots, x_T)$.
  • Find: the rare categories of subsequences in $\mathbf{x}$.

• Approaches
  • Regular subsequences
    • Discord discovery [Keogh et al., ICDM2005; Yeh et al., ICDM2016]
    • Signal-transformation-based methods [Bu et al., SDM2007]
    • Matrix-profile-based methods [Zhu et al., ICDM2018; Linardi et al., SIGMOD2018]
  • Irregular subsequences

Abnormal subsequences = Rare Examples

Discord Discovery

• Discord Definition: Given a time series $T$, the subsequence $D$ of length $n$ beginning at position $l$ is said to be the discord of $T$ if $D$ has the largest distance to its nearest non-overlapping match.

• Approaches
  • The brute-force solution considers all possible subsequences $s \in S$ of length $n$ in $T$ and computes the distance of each such $s$ to every other non-overlapping $s' \in S$.
  • Top-K pruning reorders the subsequences with various heuristics to make the computation more efficient, such as symbolic aggregate approximation (SAX) [Keogh et al., ICDM2005] and locality-sensitive hashing [Wei et al., ICDM2006].

• E. Keogh, J. Lin and A. Fu (2005). HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence. In Proc. of the 5th IEEE International Conference on Data Mining (ICDM 2005), pp. 226-233, Houston, Texas, Nov 27-30, 2005.
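The brute-force search from the definition can be sketched as follows. It is a minimal illustration under stated assumptions: plain Euclidean distance on raw (not z-normalized) subsequences, a hypothetical function name `brute_force_discord`, and a made-up toy series. It runs in quadratic time, which is exactly what the pruning heuristics are meant to avoid.

```python
import numpy as np

def brute_force_discord(ts, n):
    """Return (position, distance) of the length-n discord of ts.

    The discord is the subsequence whose nearest non-overlapping match
    is farthest away. Distance here is plain Euclidean (an assumption).
    """
    ts = np.asarray(ts, dtype=float)
    num = len(ts) - n + 1
    subs = np.array([ts[i:i + n] for i in range(num)])
    best_pos, best_dist = -1, -np.inf
    for i in range(num):
        nearest = np.inf
        for j in range(num):
            if abs(i - j) >= n:  # only non-overlapping matches count
                nearest = min(nearest, np.linalg.norm(subs[i] - subs[j]))
        if np.isfinite(nearest) and nearest > best_dist:
            best_pos, best_dist = i, nearest
    return best_pos, best_dist

# Hypothetical toy series: a 0/1 alternation with one spike at index 20.
ts = np.tile([0.0, 1.0], 20)
ts[20] = 5.0
pos, dist = brute_force_discord(ts, n=4)
```

Every clean subsequence has an exact repeat elsewhere (distance 0), so the reported discord is a window that covers the spike.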

Signal-Transformation-Based Methods

• Haar Wavelet Transform
  • Provides better pruning power
  • Can dynamically determine the word size (unlike SAX)

• Approaches [Bu et al., SDM2007]
  • Apply the Haar wavelet transform
  • Normalize the Haar wavelet coefficients
  • Identify cutpoints by treating the coefficient distribution as Gaussian, such that the area between any two adjacent cutpoints is the same
  • Map Haar coefficients to symbols based on the cutpoints
  • Order the subsequences appropriately to use them as candidates, and compare

[Figure: top-3 discords in the power consumption history of a Dutch research facility in 1997]

• Y. Bu, O. T.-W. Leung, A. W.-C. Fu, E. J. Keogh, J. Pei, and S. Meshkin, “WAT: Finding Top-K Discords in Time Series Database,” in Proc. of the 7th SIAM Intl. Conf. on Data Mining (SDM), 2007, pp. 449–454.

Matrix-Profile-Based Methods

• What is the Matrix Profile?
  • The Matrix Profile (MP) is a data structure that annotates a time series.
  • The matrix profile at the $i$th location records the distance from the subsequence starting at $i$ to its nearest neighbor.

• Illustration Example

[Figure: time series $T$ and its matrix profile] For instance, the subsequence starting at 921 happens to have a distance of 177.0 to its nearest neighbor (wherever it is).

• Mueen, A., Keogh, E. Time Series Data Mining Using the Matrix Profile: A Unifying View of Motif Discovery, Anomaly Detection, Segmentation, Classification, Clustering and Similarity Joins. KDD tutorial, 2017.

• Majority classes
  • Relatively low Matrix Profile values
  • Relatively similar subsequences recur throughout the data

• Minority classes
  • Relatively high Matrix Profile values
  • Unique in shape (such areas are “discords” or rare categories)

Rare Categories

• Electrocardiogram
  • MIT-BIH Long-Term ECG Database
  • Two anomalies annotated by MIT cardiologists
  • Relatively high MP values for ectopic beats


Irregular Subsequences

• Problem
  • The time series $T = \{(x_1, t_1), (x_2, t_2), \dots, (x_n, t_n)\}$ is not collected with an equal sampling rate: $t_{i+1} - t_i$ is not the same for all $i \in \{1, 2, \dots, n-1\}$.
  • A pattern is defined as a subsequence of consecutive points $p = ((x_i, t_i), \dots, (x_{i+T}, t_{i+T}))$.

• Solution
  • [Chen and Zhan, JCAM2008] propose an outlier detection method for unequal-interval time series.
  • Subsequences are abnormal if there are very few other patterns with the same slope $\frac{x_{i+T} - x_i}{t_{i+T} - t_i}$ and the same length.
  • To identify anomalies at multiple resolutions, the Haar transform is used.

• X. Chen and Y. Zhan, “Multi-scale Anomaly Detection Algorithm based on Infrequent Pattern of Time Series,” Journal of Computational and Applied Mathematics, vol. 214, no. 1, pp. 227–237, Apr 2008.




Time-Series-Level Rare Category Analysis

• Problem
  • Given: a time series database.
  • Find: all the rare categories of time series.

• Illustration Example

An abnormal signal = A Rare Example

Nearest-Neighbor Based Methods

• Define a similarity function to compare two time series
  • nLCS: length of the longest common subsequence
  • DTW: dynamic time warping
  • …

• Compute the clustering spaces
  • K-medoids
  • K-means / Phased K-means
  • Single-linkage clustering

• Calculate the likelihood score for rare categories
  • KNN-based methods [He and Carbonell, NIPS2007; Chandola et al., 2008]
  • Sampling-based methods [Dasgupta and Hsu, ICML2008]
  • Mean-shift-based methods [Vatturi et al., KDD2009]

Window-Based Detection Methods

• Advantage: Better localization of anomalies compared to techniques that compute time series outlier score directly

• Disadvantage: New parameter -- window length parameter

Supervised Approaches

• Subsequences of positive and negative strings of behavior as features with

• String matching classifier [Cabrera et al., 2001; González and Dasgupta, 2003]

• Neural networks [Dasgupta and Nino, 2000; Endler, 1998; Ghosh et al., 1998; Ghosh et al., 1999a; Ghosh and Schwartzbard, 1999]

• Elman network [Ghosh et al., 1999a]

• Bag of system calls features with decision tree, Naive Bayes, SVMs [Kang et al., 2005]

• Sliding window subsequences with

• SVMs [Tian et al., 2007; Wang et al., 2006]

• Rule based classifiers (Classification using Hierarchical Prediction Rules (CHIP)) [Li et al., 2007]

• HMMs [Gao et al., 2002]




Bi-Level Rare Category Analysis

• Problem
  • Given: (i) a time series database $S = \{\mathbf{x}^{(1)}, \dots, \mathbf{x}^{(M)}\}$; (ii) prior $P$.
  • Find: (i) sequence-level predicted labels $\{Y^{(1)}, \dots, Y^{(M)}\}$; (ii) segment-level predicted labels $\{\mathbf{y}^{(1)}, \dots, \mathbf{y}^{(M)}\}$.

• Dawei Zhou, Jingrui He, Yu Cao, Jae-sun Seo. Bi-level Rare Temporal Pattern Detection. IEEE International Conference on Data Mining (ICDM-2016), December 2016.

BIRAD Algorithm

• Modeling Abnormal Temporal Sequence (hidden Markov assumptions)
  • Initial probability: $\Pr(y_1^{(m)} \mid Y^{(m)} = 1)$
  • Emission probability: $\Pr(x_i^{(m)} \mid y_i^{(m)}, Y^{(m)} = 1)$
  • Transition probability: $\Pr(y_j^{(m)} \mid y_{j-1}^{(m)}, Y^{(m)} = 1)$

BIRAD Algorithm

• Modeling Normal Temporal Sequence
  • Emission probability: $\Pr(x_i^{(m)} \mid Y^{(m)} = 0)$
  • Assumption: all temporal segments are normal within a normal temporal sequence, i.e., $\Pr(y_{j+1}^{(m)} \mid y_j^{(m)}, Y^{(m)} = 0) = 1$

BIRAD Algorithm

• Overall Objective Function
  • Assumption: only an abnormal sequence ($Y^{(m)} = 1$) contains abnormal segments ($y_i^{(m)} = 1$)
  • The objective combines the probability that $\mathbf{x}^{(m)}$ is normal ($Y^{(m)} = 0$) and the probability that $\mathbf{x}^{(m)}$ is abnormal ($Y^{(m)} = 1$)

BIRAD – iteration 0

• Model Initialization: randomly select abnormal patterns. Segment labels: Abnormal, Abnormal, Abnormal, Abnormal

Subsequent iterations

• Labels: Abnormal, Abnormal, Abnormal, Abnormal. Model Updating: $\mu_n = 9.2$, $\sigma_n = 0.8$; $\mu_a = 9.5$, $\sigma_a = 0.5$
• Labels: Abnormal, normal, normal, Abnormal. Model Updating: $\mu_n = 9.25$, $\sigma_n = 0.6$; $\mu_a = 9.66$, $\sigma_a = 0.1$
• Labels: Abnormal, normal, normal, normal. Model Updating: $\mu_n = 9.25$, $\sigma_n = 0.6$; $\mu_a = 11$, $\sigma_a = 0.8$
• Labels: Abnormal, normal, normal, normal. Model Updating: $\mu_n = 9.25$, $\sigma_n = 0.6$; $\mu_a = 12$, $\sigma_a = 0.9$
• Labels: Abnormal, normal, normal, normal. Model Updating: $\mu_n = 9.25$, $\sigma_n = 0.6$; $\mu_a = 12.5$, $\sigma_a = 1.3$. Converged.

Part III: Rare Category Analysis for Network Data

Applications

• Social Network: Emerging Trends
• Online Transaction Network: Money Laundering
• Computer Network: Network Intrusion

Taxonomy

Rare Category Analysis for Network Data

• Static Graph
  • Plain Graph
    • Low-order structure
    • High-order structure
  • Attributed Graph
    • Community based
    • Embedding based
• Dynamic Graph
  • Homogeneous network
  • Heterogeneous network

Rare Category in Weighted Graphs

• Problem
  • Q1: Given a weighted and unlabeled graph, how can we spot strange, abnormal, extreme nodes?
  • Q2: Can we explain why the spotted nodes are anomalous (rare examples)?

• L. Akoglu, M. McGlohon, C. Faloutsos. OddBall: Spotting Anomalies in Weighted Graphs. PAKDD, 2010.

OddBall Algorithm

• Feature extraction
  • $N_i$: number of neighbors (degree) of ego $i$
  • $E_i$: number of edges in egonet $i$
  • $W_i$: total weight of egonet $i$
  • $\lambda_{W,i}$: principal eigenvalue of the weighted adjacency matrix of egonet $i$

OddBall Algorithm

• Proposed method
  • For each node,
    • Extract the “ego-net” (= 1-step neighborhood)
    • Extract features (#edges, total weight, etc.)
      • features that could yield “laws”
      • features fast to compute and interpret
  • Detect regular patterns
    • Examples from majority classes
    • Compute the distribution of regular patterns
  • Detect irregular patterns
    • Examples from minority classes
    • Compute the anomaly score = distance to fitting line

uniform, robot-like behavior
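A minimal sketch of the egonet pipeline for one feature pair follows: extract $N_i$ and $E_i$ per node, fit a line in log-log space, and score each node by how far it falls from the fit. The function names, the toy graph, and the exact scoring expression are assumptions for illustration; the slide only states that the score is a distance to the fitting line.

```python
import numpy as np

def egonet_features(A):
    """Return (N, E): neighbor count and egonet edge count for each node."""
    A = (np.asarray(A) > 0).astype(int)
    n = A.shape[0]
    N = A.sum(axis=1)
    E = np.zeros(n, dtype=int)
    for i in range(n):
        nodes = np.append(np.flatnonzero(A[i]), i)  # ego plus its 1-step neighbors
        E[i] = A[np.ix_(nodes, nodes)].sum() // 2   # undirected edges inside the egonet
    return N, E

def fit_line_scores(N, E):
    """Fit log E = log C + theta * log N and score deviation from the fit."""
    theta, log_c = np.polyfit(np.log(N), np.log(E), 1)
    expected = np.exp(log_c) * N ** theta
    ratio = np.maximum(E, expected) / np.minimum(E, expected)
    # Assumed score: multiplicative deviation weighted by log of absolute deviation.
    return ratio * np.log(np.abs(E - expected) + 1.0)

# Hypothetical toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to 0.
A = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]
N, E = egonet_features(A)
scores = fit_line_scores(N, E)
```

The sketch assumes every node has at least one neighbor, so the logarithms are defined. The same fit-and-score step can be repeated for other feature pairs such as total weight versus edge count.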

Rare Category in Bipartite Graphs

• Problem
  • Q1. Neighborhood formation (NF)
    • Given a query node $q$ in $V_1$, what are the relevance scores of all the nodes in $V_1$ to $q$?
    • EX: Similar authors in publication networks
  • Q2. Anomaly detection (AD)
    • Given a query node $q$ in $V_1$, what are the normality scores for nodes in $V_2$ that link to $q$?
    • EX: Unusual papers in publication networks

• J. Sun, H. Qu, D. Chakrabarti, and C. Faloutsos. Neighborhood formation and anomaly detection in bipartite graphs. ICDM, 2005.

Neighborhood Formation

• Main Idea
  • Conduct Random-Walk-with-Restart from $q$
  • Compute the steady-state probabilities over $V_1$ as neighborhood relevance

• Construct transition matrix $P$
  • Fly-back probability $c$ to $q$

• Solve for the steady state: $v^{(t+1)} = (1 - c) P v^{(t)} + c\, q$
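The steady state can be approximated by simple fixed-point iteration. The sketch below assumes the common form of the update, $v \leftarrow (1 - c) P v + c\, q$ with a column-normalized $P$; the function name `random_walk_with_restart`, the tolerance, and the toy bipartite graph are illustrative assumptions.

```python
import numpy as np

def random_walk_with_restart(A, q_index, c=0.15, tol=1e-10, max_iter=1000):
    """Iterate v <- (1 - c) P v + c q until the L1 change drops below tol."""
    A = np.asarray(A, dtype=float)
    P = A / A.sum(axis=0)            # column-normalized; assumes no isolated nodes
    q = np.zeros(A.shape[0])
    q[q_index] = 1.0                 # restart distribution concentrated on the query
    v = q.copy()
    for _ in range(max_iter):
        v_next = (1 - c) * (P @ v) + c * q
        if np.abs(v_next - v).sum() < tol:
            return v_next
        v = v_next
    return v

# Hypothetical bipartite graph: V1 = {0, 1, 2} (authors), V2 = {3, 4} (papers).
A = np.zeros((5, 5))
for a, p in [(0, 3), (1, 3), (1, 4), (2, 4)]:
    A[a, p] = A[p, a] = 1.0

v = random_walk_with_restart(A, q_index=0)
relevance_v1 = v[:3]                 # relevance of each V1 node to the query node 0
```

Node 1 shares a paper with the query node while node 2 does not, so node 1 ends up with the higher relevance score.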

Anomaly Detection

• Main Idea
  • Pairwise “normality” scores of the neighbors of $t$
  • Function of (e.g., average of) the pairwise scores

• Find the set $S$ of nodes connected to $t$

• Compute the $|S| \times |S|$ normality matrix $R$
  • Asymmetric, diagonal reset to 0

• Apply score function $f(R)$
  • EX: $f(R) = mean(R)$

Reveal Rare Category via NNrMF

• Matrix Tool for Finding Graph Patterns

Graph adjacency matrix $A$: $A = F \times G + R$, where $F$ and $G$ are low-rank matrices and $R$ is the residual matrix

• H. Tong, C. Lin: Non-Negative Residual Matrix Factorization with Application to Graph Anomaly Detection. SDM, pages 143-153, 2011.

Reveal Rare Category via NNrMF

• Matrix Tool for Finding Graph Patterns
  • Low-rank part $F \times G$: community
  • Residual matrix $R$: anomalies

An Illustrative Example

Improve Interpretation by Non-negativity

• A Typical Procedure:

• An Example

Interpretation by Non-negativity

Graph adjacency matrix $A$: $A = F \times G + R$ ($F \times G$: community; $R$: anomalies)

• Non-negative Matrix Factorization: $F \ge 0$; $G \ge 0$ (for community detection)
• Non-negative Residual Matrix Factorization (this paper): $R(i,j) \ge 0$ for $A(i,j) > 0$ (for anomaly detection)

Optimization Formulation

Q: How to find ‘optimal’ $F$ and $G$?

• D1: Quality. C1: non-convexity of the optimization objective
• D2: Scalability. C2: large size of the graph

Terms annotated in the objective: non-negative residual constraint; weighted Frobenius form (with weight); low-rank factorization common in any matrix factorization

Rare Category with High-Order Structures

• Problem
  • Given: graph $G = (V, E)$, user-defined structure $N$.
  • Find: a structure-rich dense subgraph that largely preserves the user-defined structures.

[Figure: a user-defined structure (3-node line) and a dense subgraph of $G$ rich in 3-node lines]

• D. Zhou, S. Zhang, M. Y. Yildirim, S. Alcorn, H. Tong, H. Davulcu, J. He: A Local Algorithm for Structure-Preserving Graph Cut. KDD 2017: 655-664

High-Order Structures in Real Applications

High Order Conductance

• Definition: For any cluster $C$ in graph $G$ and the $k$th-order structure $N$, the $k$th-order conductance $\Phi(C, N)$ is defined as

  $\Phi(C, N) = \dfrac{\mathrm{cut}(C, N)}{\min\{\mu(C, N), \mu(\bar{C}, N)\}}$

  • $\mathrm{cut}(C, N)$: the number of network structures broken due to the partition of $G$ into $C$ and $\bar{C}$
  • $\mu(C, N)$: the number of network structures in $C$
  • $\mu(\bar{C}, N)$: the number of network structures in $\bar{C}$

High Order Conductance

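[Figure: graph $G$ with cut $C$]

$\Phi(C, N) = \dfrac{\mathrm{cut}(C, N)}{\min\{\mu(C, N), \mu(\bar{C}, N)\}}$

➢ 2nd-order conductance: $\Phi(C, N) = \dfrac{1}{\min\{4, 11\}} = \dfrac{1}{4}$

➢ 3rd-order conductance: $\Phi(C, N) = \dfrac{2}{\min\{3, 34\}} = \dfrac{2}{3}$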

HOSPLOC Algorithm

• Construct Adjacency Tensor: Given a graph $G = (V, E)$, the $k$th-order network structure $N$ on $G$ can be represented in a $k$-dimensional adjacency tensor $T$.

Example (graph $G$ and structure $N$ as shown in the figure):
  • For the set of nodes {2, 4, 1}: $T(1, 4, 2) = 0$
  • For the set of nodes {6, 8, 10}: $T(10, 8, 6) = 1$

HOSPLOC Algorithm

• Compute Transition Tensor: Given a graph $G = (V, E)$ and the adjacency tensor $T$ for the $k$th-order network structure $N$, the corresponding transition tensor $P$ can be computed from $T$.

Example (graph $G$ and structure $N$ as shown in the figure):
  • For the set of nodes {2, 4, 1}: $P(1, 4, 2) = 0$
  • For the set of nodes {6, 8, 10}: $P(10, 8, 6) = 1/3$

HOSPLOC Algorithm

• Using the “rank-1” approximation [Li and Ng, 2013], the high-order random walks can be formulated as

  $q^{(t)} = P\, q^{(t-1)} \cdots q^{(t-k+1)}$

• Vector-based graph cut
  • Locally conduct high-order random walks to explore $N$.
  • Compute the permutation $\pi$ of the returned HRW distribution $q$ such that:
  • Iteratively check the potential cuts $C_1, C_2, \dots, C_{n-1}$, where $C_i = \{\pi(1), \dots, \pi(i)\}$.

Hierarchical Rare Category Detection

• Problem
  • Given: adjacency matrix $\mathbf{A}$, missing edge penalty $p$, number of hierarchies $K$, density increase ratio $\eta$.
  • Find: subgraph node indicator vectors $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_K$.

[Figure: edge density $d$ over layers]

• S. Zhang, Dawei Zhou, M. Y. Yildirim, S. Alcorn, J. He, H. Davulcu, H. Tong. HiDDen: Hierarchical Dense Subgraph Detection with Application to Financial Fraud Detection.

HiDDen Algorithm

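• Density Measure
  • Intuition
    • #1: Maximize the number of existing edges
    • #2: Minimize the penalty of the missing edges
  • Mathematical details:
    $\max_{\mathbf{x}} J(\mathbf{x}) = \mathbf{x}^T \mathbf{A} \mathbf{x} - p\, \mathbf{x}^T (\mathbf{1}_{n \times n} - \mathbf{I} - \mathbf{A}) \mathbf{x}$, s.t. $\mathbf{x} \in \{0, 1\}^n$
    (first term: intuition #1; second term: intuition #2)
  • Correctness: equivalent to edge surplus density w.r.t. quasi-clique
  • Relaxation: $\mathbf{x} \in \{0, 1\}^n$ is relaxed to $\mathbf{0} \le \mathbf{x} \le \mathbf{1}$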
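To make the density measure concrete, the sketch below evaluates $J(\mathbf{x})$ for two candidate node sets on a tiny graph. The function name `edge_surplus`, the penalty value, and the graph are hypothetical choices for illustration; only the formula comes from the slide. Since $\mathbf{x}$ is a symmetric quadratic form, each edge and each missing pair is counted twice.

```python
import numpy as np

def edge_surplus(A, x, p):
    """J(x) = x^T A x - p * x^T (1 - I - A) x for an indicator vector x."""
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    n = A.shape[0]
    missing = np.ones((n, n)) - np.eye(n) - A  # 1 where an off-diagonal edge is absent
    return x @ A @ x - p * (x @ missing @ x)

# Hypothetical toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to 0.
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)
p = 1.5

j_triangle = edge_surplus(A, [1, 1, 1, 0], p)  # 3 edges, no missing pairs
j_all = edge_surplus(A, [1, 1, 1, 1], p)       # 4 edges, 2 missing pairs

# The same value written as one quadratic form, x^T [(1 + p) A - p (1 - I)] x.
n = A.shape[0]
M = (1 + p) * A - p * (np.ones((n, n)) - np.eye(n))
x_all = np.ones(n)
j_all_single_form = x_all @ M @ x_all
```

With this penalty the triangle scores higher than the whole graph, because adding the pendant node introduces two missing pairs for one extra edge.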

HiDDen Algorithm

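• Constraints for Hierarchies
  • Constraints:
    • #1 (density variety): densities in two hierarchies exhibit a difference
    • #2 (nested node set): larger subgraphs contain smaller subgraphs
  • Mathematical details:
    • Density variety: $\dfrac{\mathbf{x}_k^T \mathbf{A} \mathbf{x}_k}{\mathbf{x}_k^T (\mathbf{1}_{n \times n} - \mathbf{I}) \mathbf{x}_k} \ge \eta\, \dfrac{\mathbf{x}_{k-1}^T \mathbf{A} \mathbf{x}_{k-1}}{\mathbf{x}_{k-1}^T (\mathbf{1}_{n \times n} - \mathbf{I}) \mathbf{x}_{k-1}}$. Example: $d_3 \ge 1.1 \times d_2$
    • Nested node set: $V_{k+1} \subseteq V_k \subseteq V_{k-1}$, i.e., $\mathbf{x}_{k+1} \le \mathbf{x}_k \le \mathbf{x}_{k-1}$. Example: $V_3 \subseteq V_2 \subseteq V_1 \subseteq V$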

HiDDen Algorithm

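• Objective function:

  $\max_{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_K} \sum_{k=1}^{K} \mathbf{x}_k^T \left[ (1 + p) \mathbf{A} - p (\mathbf{1}_{n \times n} - \mathbf{I}) \right] \mathbf{x}_k$  (edge surplus in the $k$th hierarchy)

  s.t. $\dfrac{\mathbf{x}_j^T \mathbf{A} \mathbf{x}_j}{\mathbf{x}_j^T (\mathbf{1}_{n \times n} - \mathbf{I}) \mathbf{x}_j} \ge \eta\, \dfrac{\mathbf{x}_{j-1}^T \mathbf{A} \mathbf{x}_{j-1}}{\mathbf{x}_{j-1}^T (\mathbf{1}_{n \times n} - \mathbf{I}) \mathbf{x}_{j-1}}$  (density variety)

  $\mathbf{x}_{j+1} \le \mathbf{x}_j \le \mathbf{x}_{j-1}$  (nested node set)

  $\forall j = 1, 2, \dots, K$

• Observation: a non-convex quadratically constrained quadratic programming (QCQP) problem

• Optimization: alternating projected gradient descent method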








Community Outliers for Attributed Graph

• Problem
  • Given: an attributed graph $G = (V, E, X)$, where $V$ represents the set of nodes, $E$ represents the set of edges, and $X$ represents the node features.
  • Find: objects (rare examples) with features deviating from other community members

• J. Gao, F. Liang, W. Fan, C. Wang, Y. Sun, J. Han: On community outliers and their efficient detection in information networks. KDD 2010.

A Unified Probabilistic Model

Formulation

• Maximize the likelihood of the data distribution $P(X)$
  • $P(X) \propto P(X \mid Z) P(Z)$
  • $P(X \mid Z)$ depends on the community label and the model parameters.
  • $P(Z)$ is higher if neighboring nodes from normal communities share the same community label
    • e.g., two linked nodes are likely to be in the same community

Majority Class

Minority Class

Activate function

Algorithm

Rare Category Oriented Network Embedding

• Problem
  • Given: an attributed network $G = (V, E, X)$, one-shot or few-shot labeled rare examples $L = \{x_1, \dots, x_L\}$, and the desired embedding dimension.
  • Find: a rare category oriented network embedding $E \in \mathbb{R}^{n \times d}$, and a list of predicted rare category examples.

• Challenges
  • C1: Rarity
  • C2: Sparsity of labeled examples
  • C3: Non-separability of rare examples from the majority classes

• D. Zhou, J. He, H. Yang, W. Fan. SPARC: Self-Paced Network Representation for Few-Shot Rare Category Characterization. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-2018), August 2018.

Network Layout of Pubmed

SPARC Algorithm

• Rare Category Characterization (RCC)
  • Cost-sensitive learning (addresses C1)
    • Focus on rare category examples
    • Learn from the highly skewed distribution

    $L_s = \sum_{i=1}^{L} c_{y_i, \hat{y}_i} \log \Pr(\hat{y}_i = 1 - y_i \mid x_i, e_i)$

    This term penalizes the error of classifying minority class examples into majority classes.

  • Self-paced learning (addresses C2)
    • Start from a handful of labeled examples
    • Gradually explore more via label propagation

    $L_{RCC} = L_s - \sum_{i=1}^{L+U} v_i^{(1)} \log \Pr(\hat{y}_i = 1 \mid x_i, e_i) - \sum_{i=1}^{L+U} \lambda^{(1)} v_i^{(1)}$

    Here $v_i^{(1)}$ is the self-paced vector and the last sum is the self-paced regularizer.

SPARC Algorithm

• Rare Category Oriented Network Embedding (RCE)
  • Minimize the cross-entropy loss of predicting context pairs $(i, c)$
  • Rare category oriented context sampling (addresses C3)
    • Indicator vector $I$ from RCC.
    • With probability $p$, extract general network context.
    • With probability $1 - p$, extract rare category oriented network context starting from the non-zero elements in $I$.

SPARC Algorithm

• Proposed Framework

Impact of Self-Paced Learning

2-D t-SNE visualization








Rare Category Detection on Time-evolving Graphs

• Problem
  • Given: time-evolving graphs $\tilde{G} = \{G_1, G_2, \dots, G_T\}$
  • Find: at least one example from each minority class

• Challenges
  • Graphs are evolving over time
    • New nodes/edges show up or die out
    • Edge weights change
  • High computation cost
    • Space complexity
    • Time complexity

• D. Zhou, K. Wang, N. Cao, J. He. Rare Category Detection on Time-Evolving Graphs, IEEE International Conference on Data Mining (ICDM-2015), November 2015.

BIRD Algorithm

• Proposed Updating Methods
  • Extract the updating matrix from the last time stamp
  • Update Global Similarity Matrix: iteratively update with each updated edge based on the Sherman-Morrison formula
  • Update K-NN matrix: only update the rows in which the order of elements is changed by the graph evolution
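The Sherman-Morrison step can be illustrated with a generic rank-one inverse update. The sketch below assumes, purely for illustration, that the global similarity matrix has the form (I − cW)⁻¹ for an adjacency matrix W and a constant c; the exact matrix used by BIRD is not given on the slide, and the helper names are hypothetical.

```python
import numpy as np

def sherman_morrison(A_inv, u, v):
    """Return (A + u v^T)^{-1} given A^{-1}, in O(n^2) instead of O(n^3)."""
    Au = A_inv @ u
    vA = v @ A_inv
    denom = 1.0 + v @ Au
    if abs(denom) < 1e-12:
        raise ValueError("rank-one update makes the matrix singular")
    return A_inv - np.outer(Au, vA) / denom

def update_edge(S, i, j, delta, c):
    """Update S = (I - c W)^{-1} after the symmetric edge (i, j) gains weight delta.

    A symmetric edge change is applied as two rank-one updates, one per entry.
    """
    n = S.shape[0]
    e_i, e_j = np.eye(n)[i], np.eye(n)[j]
    S = sherman_morrison(S, -c * delta * e_i, e_j)
    S = sherman_morrison(S, -c * delta * e_j, e_i)
    return S

W = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
c = 0.2
S = np.linalg.inv(np.eye(3) - c * W)
S_new = update_edge(S, 0, 2, delta=1.0, c=c)
```

Each updated edge costs O(n²), so the incremental route pays off when only a small number of edges change between consecutive time stamps.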

BIRD Algorithm

Time Complexity

[He et al., ICDM2006] [Zhou et al., ICDM2016]

Eigenspace Summarization in Computer System

• Problem
  • Given: Time-evolving graphs $\tilde{G} = \{G_1, G_2, \dots, G_T\}$
  • Find: Anomalies, detected online in an unsupervised manner.

• Challenges
  • Large number of nodes
  • Complex dependencies between servers
  • Edge weights are highly dynamic

• Ide, T. and Kashima, H., Eigenspace-Based Anomaly Detection in Computer Systems. KDD, 2004.

“Summary Feature” extraction

• Definition of the “service activity vector” (SAV)

• Mathematically, this equation is reduced to the eigenvalue equation:

Adjacency matrix at t (symmetric, non-negative)

Activity vector at t

The principal eigenvector gives the summary of node “activity”!
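As a small illustration of the statement above, the activity vector of one snapshot can be computed as the unit-norm principal eigenvector of its symmetric, non-negative adjacency matrix. The function name `activity_vector` and the toy matrix are assumptions made for this sketch.

```python
import numpy as np

def activity_vector(D):
    """Unit-norm principal eigenvector of a symmetric, non-negative matrix D."""
    eigvals, eigvecs = np.linalg.eigh(D)   # eigh: symmetric input, ascending eigenvalues
    u = eigvecs[:, np.argmax(eigvals)]
    # Eigenvectors are defined up to sign; pick the non-negative orientation.
    return -u if u.sum() < 0 else u

# Toy dependency matrix between three services (hypothetical weights).
D = np.array([[0., 2., 0.],
              [2., 0., 1.],
              [0., 1., 0.]])
u = activity_vector(D)
print(u)
```

For a connected graph with non-negative weights, the Perron-Frobenius theorem guarantees that this eigenvector can be chosen with all entries non-negative, which is what makes it readable as a per-node activity level.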

Anomaly Detection

• The problem was reduced to anomaly detection from a time sequence of activity vectors

Anomaly Detection

• Typical Activity Pattern
  • Employ an LSI (latent semantic indexing)-like pattern extraction technique.
  • The principal left singular vector is the solution

• Definition of Anomaly Metric
  • 𝑢(𝑡): activity vector at time 𝑡
  • 𝑟(𝑡 − 1): typical activity pattern at 𝑡 − 1

• Anomaly Scores

Track angle for the rare patterns

Summary Vector
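A hedged sketch of the two steps above: the typical pattern is taken as the principal left singular vector of a matrix whose columns are recent activity vectors, and the score is assumed to be one minus the cosine of the angle between that pattern and the current activity vector. The window handling and the function names are illustrative assumptions.

```python
import numpy as np

def typical_pattern(U):
    """Principal left singular vector of U, whose columns are past activity vectors."""
    left, _, _ = np.linalg.svd(U, full_matrices=False)
    r = left[:, 0]
    # Singular vectors are defined up to sign; match the non-negative orientation.
    return -r if r.sum() < 0 else r

def anomaly_score(r_prev, u_now):
    """Assumed metric: 1 - r(t-1)^T u(t), for unit-norm r(t-1) and u(t)."""
    return 1.0 - float(r_prev @ u_now)

a = np.array([0.6, 0.8, 0.0])            # a stable past activity pattern
r = typical_pattern(np.column_stack([a, a, a]))
print(anomaly_score(r, a))                           # current activity matches the pattern
print(anomaly_score(r, np.array([0.0, 0.0, 1.0])))   # current activity is orthogonal to it
```

A score near 0 means the current activity points in the same direction as the typical pattern; the score grows as the angle between them opens up.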

Rare Category in Evolving Heterogeneous Network

• Problem
  • Given: A stream of heterogeneous graphs $G_t = (V, E, T)$ containing different types 𝑇 of nodes 𝑉 and edges 𝐸.
  • Find: Rare categories in real-time while consuming bounded memory.

• Challenges
  • Nodes and edges are typed (e.g., fork, read).
  • Graph evolves from a stream of typed edges.
  • Bounded space and time complexity

• Emaad A. Manzoor, Sadegh M. Milajerdi, Leman Akoglu: Fast Memory-efficient Anomaly Detection in Streaming Heterogeneous Graphs. KDD 2016.

Example of two information flow graphs based on system logs

Graph Representation

• Graph to vectors via shingling
  • Compute the shingle vector for each graph $G_t$.
  • The shingle vector contains the frequency of each 𝑘-shingle in $G_t$.
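A simplified sketch of a shingle vector for a typed graph. It assumes a 𝑘-shingle is the string obtained from a node by a breadth-first traversal of outgoing edges up to 𝑘 hops, concatenating edge and node type labels in edge-arrival order; this definition, the single-character type labels, and the function name are assumptions for illustration rather than the exact construction in the cited paper.

```python
from collections import Counter, defaultdict, deque

def shingle_vector(edges, node_type, k=1):
    """Count the k-shingles of a typed, directed graph.

    edges: list of (src, dst, edge_type) tuples in arrival order.
    node_type: dict mapping node -> type label.
    Returns a Counter mapping shingle string -> frequency.
    """
    out = defaultdict(list)
    for src, dst, etype in edges:
        out[src].append((dst, etype))
    counts = Counter()
    for node in node_type:
        tokens = [node_type[node]]
        queue = deque([(node, 0)])
        while queue:
            cur, depth = queue.popleft()
            if depth == k:          # the depth limit also guarantees termination on cycles
                continue
            for dst, etype in out[cur]:
                tokens.append(etype + node_type[dst])
                queue.append((dst, depth + 1))
        counts["".join(tokens)] += 1
    return counts

# Hypothetical log: process p1 reads file f1 and writes file f2.
edges = [("p1", "f1", "r"), ("p1", "f2", "w")]
node_type = {"p1": "P", "f1": "F", "f2": "F"}
print(shingle_vector(edges, node_type, k=1))
```

Two graphs with similar local typed structure share many shingles, so their shingle vectors are close.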

Graph Representation

• Sketching Graphs
  • Shingle universe is large and unknown
  • Compute 𝐿-dimension projection vector from shingle vector via SimHash.

• Streaming Graph Representation (on each new edge)
  • Construct the set of shingles to update
  • Hash the shingles to update
  • Update the projection vector and sketch
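The sketching and streaming steps above can be illustrated as follows. The sketch assumes one ±1 hash per projection dimension, derived here from SHA-256 only to keep the example deterministic; the hash family, the function names, and the toy shingles are illustrative assumptions.

```python
import hashlib

def sign_hash(shingle, l):
    """Map a shingle to +1 or -1 for projection dimension l."""
    digest = hashlib.sha256(f"{l}:{shingle}".encode()).digest()
    return 1 if digest[0] & 1 else -1

def project(counts, L):
    """L-dimensional projection of a shingle-frequency mapping."""
    return [sum(f * sign_hash(s, l) for s, f in counts.items()) for l in range(L)]

def update(projection, shingle, delta):
    """Streaming update: the frequency of one shingle changed by delta."""
    for l in range(len(projection)):
        projection[l] += delta * sign_hash(shingle, l)

def sketch(projection):
    """Sign of each projection coordinate (the SimHash bits)."""
    return [1 if y >= 0 else -1 for y in projection]

def similarity(sketch_a, sketch_b):
    """Fraction of agreeing bits, an estimate of graph similarity."""
    return sum(a == b for a, b in zip(sketch_a, sketch_b)) / len(sketch_a)

batch = project({"PrFwF": 1, "F": 2}, L=16)
stream = [0] * 16
update(stream, "PrFwF", 1)
update(stream, "F", 2)
print(stream == batch)
```

Because the projection is linear in the shingle counts, a new edge only requires hashing the shingles it changes; the shingle universe never has to be enumerated.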

• Achlioptas, Dimitris. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 2003.

Identify Rare Categories

• Bootstrap K clusters

• Cluster centroid: “Average” graph

• Update clusters: Constant time

• Anomaly score: Nearest centroid
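A minimal sketch of the clustering steps listed above, assuming each graph is represented by a fixed-length vector, every cluster is non-empty, and Euclidean distance is used; the distance choice and all function names are assumptions made for illustration.

```python
import numpy as np

def centroids(vectors, labels, K):
    """Cluster centroid = "average" graph vector of each cluster."""
    return np.stack([vectors[labels == k].mean(axis=0) for k in range(K)])

def anomaly_score(x, cents):
    """Distance from a graph vector to its nearest centroid."""
    return float(np.min(np.linalg.norm(cents - x, axis=1)))

def move_vector(cents, sizes, k, old, new):
    """Constant-time centroid update when a member of cluster k changes from old to new."""
    cents[k] += (new - old) / sizes[k]

vectors = np.array([[0., 0.], [2., 0.], [10., 10.], [12., 10.]])
labels = np.array([0, 0, 1, 1])
cents = centroids(vectors, labels, K=2)
print(anomaly_score(np.array([1., 3.]), cents))
```

A graph far from every centroid receives a high score, which is how rare behavior stands out from the bootstrapped clusters.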

Part IV: Heterogeneous Rare Category Analysis

An Overall Framework

[Framework diagram: Dynamic Heterogeneous Data, with Feedback/Interaction loops. Venue labels: DMKD'16, IJCAI'15, ICDM'15; KDD'18, KDD'17, SDM'17, ICDM'16; DMKD'16, ICDM'15]

• Supported by NSF CAREER Project (Award Number: 1552654)

The Prototype System

Data Exploration Module

Rare Category Analysis Module

Feature Selection Module

Lin, H., Gao, S., Gotz, D., Du, F., He, J., & Cao, N. (2018). RCLens: Interactive rare category exploration and identification. IEEE Transactions on Visualization and Computer Graphics, 24(7), 2223-2237.



• Represent raw data

• Interactive visualization for data querying

• Support data filtering


Feature Selection Module







• Feature selection

• Interactive active learning

• Visualize rare examples in a salient representation

Feature Selection Module








• Visualize the variance of data

• Visualize the correlation of data

• Guide the feature selection and subspace investigation process

A Case Study in Financial Fraud Detection

• Problem
  • Given: Personal identification information (PII) network of the bank customers.
  • Find: Suspicious synthetic identities.

• Identified Abnormal Patterns

PII Network Identified Rare Category

A group of suspicious identities shared the same PIIs

Part V: Challenges & Future Directions

Challenges & Future Directions

• Scalability
  • How to scale up to large-scale data in real applications?

• Robustness
  • How to ensure performance in the presence of adversarial examples?

• Rare Category Representation
  • How to learn hierarchical representations of complex rare examples?

• Rare Category Interpretation
  • How to interpret the prediction results by providing the relevant clues (e.g., relevant patterns, relevant features, relevant time stamps from time series data)?

• Rare Category Generation
  • How to generate task-specific rare category examples (e.g., money laundering activity) given a specific domain (e.g., transaction network)?

References

• Chen, C. "CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature." Journal of the Association for Information Science and Technology 57.3 (2006): 359-377.

• Spaaij, R. "The enigma of lone wolf terrorism: An assessment." Studies in Conflict & Terrorism 33.9 (2010): 854-870.

• Ngai, E. W., Hu, Y., Wong, Y. H., Chen, Y., & Sun, X. (2011). The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature. Decision support systems, 50(3), 559-569.

• D. Pelleg, A. W. Moore: Active Learning for Anomaly and Rare-Category Detection. NIPS 2004.

• J. He, and J. Carbonell. Co-Selection of Features and Instances for Unsupervised Rare Category Analysis. SDM 2010.

• J. He, and J. Carbonell. Nearest-Neighbor-Based Active Learning for Rare Category Detection. NIPS 2007.

• J. He, Y. Liu, and R. Lawrence. Graph-based Rare Category Detection. ICDM 2008.

• J. He, H. Tong, and J. Carbonell. Rare Category Characterization. ICDM 2010.

• A. W. Williams, S. M. Pertet, and P. Narasimhan, “Tiresias: Black-box Failure Prediction in Distributed Systems,” in Proc. of the 21st Intl.

• H. V. Jagadish, N. Koudas, and S. Muthukrishnan, “Mining Deviants in a Time Series Database,” in Proc. of the 25th Intl. Conf. on Very Large Data Bases (VLDB), 1999, pp. 102–113.

• S. Muthukrishnan, R. Shah, and J. Vitter, “Mining Deviants in Time Series Data Streams,” in Proc. of the 16th Intl. Conf. on Scientific and Statistical Database Management (SSDBM), Jun 2004, pp. 41–50.

Part I

Part II

References

• E. Keogh, J. Lin and A. Fu (2005). HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence. In Proc. of the 5th IEEE International Conference on Data Mining (ICDM 2005), pp. 226 - 233., Houston, Texas, Nov 27-30, 2005.

• Chin-Chia Michael Yeh, Yan Zhu, Liudmila Ulanova, Nurjahan Begum, Yifei Ding, Hoang Anh Dau, Diego Furtado Silva, Abdullah Mueen, Eamonn Keogh (2016). Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View that Includes Motifs, Discords and Shapelets. IEEE ICDM 2016.

• Y. Bu, O. T.-W. Leung, A. W.-C. Fu, E. J. Keogh, J. Pei, and S. Meshkin, “WAT: Finding Top-K Discords in Time Series Database,” in Proc. of the 7th SIAM Intl. Conf. on Data Mining (SDM), 2007, pp. 449–454.

• X.-y. Chen and Y.-y. Zhan, “Multi-scale Anomaly Detection Algorithm based on Infrequent Pattern of Time Series,” Journal of Computational and Applied Mathematics, vol. 214, no. 1, pp. 227–237, Apr 2008.

• V. Chandola, V. Mithal, and V. Kumar, “A Comparative Evaluation of Anomaly Detection Techniques for Sequence Data,” in Proc. of the 2008 8th IEEE Intl. Conf. on Data Mining (ICDM), 2008, pp. 743–748.

• S. Budalakoti, A. N. Srivastava, and M. E. Otey, “Anomaly Detection and Diagnosis Algorithms for Discrete Symbol Sequences with Applications to Airline Safety,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications, vol. 39, no. 1, pp. 101–113, Jan 2009.

• T. Lane, C. Brodley et al., “Sequence Matching and Learning in Anomaly Detection for Computer Security,” in AAAI Workshop: AI Approaches to Fraud Detection and Risk Management, 1997, pp. 43–49.

• A. Nairac, N. Townsend, R. Carr, S. King, P. Cowley, and L. Tarassenko, “A System for the Analysis of Jet Engine Vibration Data,” Integrated Computer-Aided Engineering, vol. 6, no. 1, pp. 53–66, Jan 1999.

Part II

References

• F. A. González and D. Dasgupta, “Anomaly Detection Using Real-Valued Negative Selection,” Genetic Programming and Evolvable Machines, vol. 4, no. 4, pp. 383–403, Dec 2003.

• D. Dasgupta and F. Nino, “A Comparison of Negative and Positive Selection Algorithms in Novel Pattern Detection,” in Proc. of the 2000 IEEE Intl. Conf. on Systems, Man, and Cybernetics, vol. 1, 2000, pp. 125–130.

• D. Endler, “Intrusion Detection Applying Machine Learning to Solaris Audit Data,” in Proc. of the 14th Annual Computer Security Applications Conf. (ACSAC), 1998, pp. 268–279.

• A. K. Ghosh, J. Wanken, and F. Charron, “Detecting Anomalous and Unknown Intrusions Against Programs,” in Proc. of the 14th Annual Computer Security Applications Conf. (ACSAC), 1998, pp. 259–267.

• A. Ghosh, A. Schwartzbard, and M. Schatz, “Learning Program Behavior Profiles for Intrusion Detection,” in Proc. of the 1st USENIX Workshop on Intrusion Detection and Network Monitoring, 1999, pp. 51–62.

• A. K. Ghosh and A. Schwartzbard, “A Study in using Neural Networks for Anomaly and Misuse Detection,” in Proc. of the 8th Conf. on USENIX Security Symposium (SSYM), 1999, pp. 12–23.

• B. Gao, H.-Y. Ma, and Y.-H. Yang, “HMMs (Hidden Markov Models) based on Anomaly Intrusion Detection Method,” in Proc. of the 2002 Intl. Conf. on Machine Learning and Cybernetics, vol. 1, 2002, pp. 381–385.

Part II

References

• J. B. D. Cabrera, L. Lewis, and R. K. Mehra, “Detection and Classification of Intrusions and Faults using Sequences of System Calls,” SIGMOD Record, vol. 30, no. 4, pp. 25–34, Dec 2001.

• Dawei Zhou, Jingrui He, Yu Cao, Jae-sun Seo. Bi-level Rare Temporal Pattern Detection, IEEE International Conference on Data Mining (ICDM-2016), December 2016.

• J. Sun, H. Qu, D. Chakrabarti, and C. Faloutsos. Neighborhood formation and anomaly detection in bipartite graphs. ICDM, 2005.

• L. Akoglu, M. McGlohon, C. Faloutsos. OddBall: Spotting Anomalies in Weighted Graphs. PAKDD, 2010.

• Hanghang Tong, Ching-Yung Lin: Non-Negative Residual Matrix Factorization with Application to Graph Anomaly Detection. SDM, pages 143-153, 2011.

• Satoshi Hara, Tetsuro Morimura, Toshihiro Takahashi, Hiroki Yanagisawa, Taiji Suzuki: A Consistent Method for Graph Based Anomaly Localization. AISTATS 2015.

• Jonathan Root, Jing Qian, Venkatesh Saligrama: Learning Efficient Anomaly Detectors from K-NN Graphs. AISTATS 2015.

• Dawei Zhou, Si Zhang, Mehmet Yigit Yildirim, Scott Alcorn, Hanghang Tong, Hasan Davulcu, Jingrui He: A Local Algorithm for Structure-Preserving Graph Cut. KDD 2017: 655-664.

• Si Zhang, Dawei Zhou, Mehmet Yigit Yildirim, Scott Alcorn, Jingrui He, Hasan Davulcu, Hanghang Tong. HiDDen: Hierarchical Dense Subgraph Detection with Application to Financial Fraud Detection.

Part II

Part III

References

• Jing Gao, Feng Liang, Wei Fan, Chi Wang, Yizhou Sun, Jiawei Han: On community outliers and their efficient detection in information networks. KDD 2010.

• Karthik Subbian, Charu C. Aggarwal, Jaideep Srivastava, Vipin Kumar: Rare Class Detection in Networks. SDM 2015: 406-414.

• Dawei Zhou, Jingrui He, Hongxia Yang, Wei Fan. SPARC: Self-Paced Network Representation for Few-Shot Rare Category Characterization, ACM: SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-2018), August 2018.

• Ide, T. and Kashima, H., Eigenspace-Based Anomaly Detection in Computer Systems. KDD, 2004.

• Dawei Zhou, Kangyang Wang, Nan Cao, Jingrui He. Rare Category Detection on Time-Evolving Graphs, IEEE International Conference on Data Mining (ICDM-2015), November 2015.

• Emaad A. Manzoor, Sadegh M. Milajerdi, Leman Akoglu: Fast Memory-efficient Anomaly Detection in Streaming Heterogeneous Graphs. KDD 2016: 1035-1044

Part III
