Complex Rare Category Analysis: Mining Needles in the Haystack
Dawei Zhou, Jingrui He
{dzhou23, jingrui.he}@asu.edu
School of Computing, Informatics, and Decision Systems Engineering
Arizona State University
Needles in the Haystack
• Chen, C. "CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature." Journal of the American Society for Information Science and Technology 57.3 (2006): 359-377.
• Spaaij, R. "The enigma of lone wolf terrorism: An assessment." Studies in Conflict & Terrorism 33.9 (2010): 854-870.
• Ngai, E. W., Hu, Y., Wong, Y. H., Chen, Y., and Sun, X. "The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature." Decision Support Systems 50.3 (2011): 559-569.
1. Insider Threat
2. Money Laundering
3. Lone Wolf Terrorism
4. Gene Disease
5. Identity Theft
6. Emerging Trend
An Example of Needles: Malicious Insiders
Malicious Insider Definition: a current or former employee or contractor who
➢ intentionally exceeded or misused an authorized level of network, system, or data access in such a way that
➢ affected the security of the organizations' data, systems, or daily business operations [Cappelli et al., RSA2008]
DARPA-BAA-10-84, DARPA-BAA-11-04, DARPA-BAA-11-64, IARPA-BAA-12-01, …
Malicious Insiders: Fort Hood Shooting
Figure: a timeline from the past to the shooting, marking the 'obvious' clues found in the post-investigation (web visits, emails, blog posts, radical content).
Q: Can we reverse it?
G1. Find individual traces
G2. Connect the dots
G3. Prevent the tragedy
Rare Category Analysis
• Problem Definition: Rare category analysis (RCA) refers to the problem of representing, identifying, characterizing and tracking rare examples from underrepresented minority classes in an imbalanced data set.
 − Imbalanced data set (e.g., queens: 10; overall: 5000)
 − Minority classes are not separable from the majority class
• An Illustrative Example
Research Questions: Challenge of Rarity
Highly-skewed distribution
➢ 0.1% of any given population of users is malicious
➢ 99.9% of users are normal [DARPA-BAA-11-04]
Non-separable nature
➢ Malicious users often camouflage their synthetic identities
➢ Malicious users try to bypass fraud detection systems [DARPA-MAA-18-09]
Commonality of insiders
➢ Detecting insiders of each type
➢ Identifying relevant features
➢ Characterizing each type of insider
Q1: How to detect and characterize insiders of each type, using as little cost as possible?
Research Questions: Challenge of Dynamics
Extreme scarcity of insiders
➢ 0.1% of any given population of users is malicious
➢ 20% of malicious users are active on any given day [DARPA-BAA-11-04]
Costly access to oracle
➢ 65K personnel at Fort Hood; 4.7B emails in 2 years
➢ ~60 initial activity reviews per day per operator [DARPA-BAA-11-04]
Fine-grained dynamics
Q2: How to identify and track insiders of each type over time?
Research Questions: Challenge of Heterogeneity
Various information sources; various data types (attributed data, sequential data, network data)
Various types of insiders (e.g., blackmailed, psychological vulnerability, greedy)
Q3: How to detect, characterize, and track more insiders with high accuracy, using heterogeneous data?
Why do We Care?
Terrorist incidents worldwide; financial fraud [Federal Trade Commission, 2017]
PR crisis; cybercrime [M. McGuire, RSA2018]: illegal online markets, trade secret and IP theft, data trading; cost over 1.5 trillion US dollars in 2018
Rare Category Analysis (RCA)
Roadmap
• Part I: Rare Category Analysis for Static Attributed Data
• Part II: Rare Category Analysis for Temporal Sequential Data
• Part III: Rare Category Analysis for Network Data
• Part IV: Heterogeneous Rare Category Analysis
• Part V: Challenges & Future Directions
Part I: RCA for Static Attributed Data
Taxonomy
Rare Category Analysis for Attributed Data
• Unsupervised Rare Category Analysis
  − Rare Category Detection: active learning based; nearest neighbor based
  − Feature Selection: feature selection; instance selection
• Semi-supervised Rare Category Analysis: comparison with imbalanced classification; comparison with outlier/anomaly detection; rare category characterization
Active Learning for Rare Category Detection
• Problem
 − Given: (i) an unlabeled noisy data set; (ii) a small budget of label queries to a domain expert.
 − Find: useful anomalies (rare category examples).
• Challenges: highly-skewed data distribution; no training data
• D. Pelleg, A. W. Moore: Active Learning for Anomaly and Rare-Category Detection. NIPS 2004.
Figure: example anomaly types: diffraction spikes and satellite trails.
Hint Selection Methods
• Proposed Algorithm
 1. Start with entirely unlabeled data.
 2. Perform semi-supervised learning (which, on the first iteration, degenerates to unsupervised learning).
 3. Ask an expert to classify the anomaly patterns.
 4. Go to Step 2.
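The loop above can be sketched as follows. This is a minimal illustration under stated assumptions, not Pelleg and Moore's implementation: it assumes one diagonal Gaussian per class discovered so far (a single Gaussian over all data before any label arrives, matching the "degenerates to unsupervised learning" step) and a "lowest likelihood" hint rule. The names `active_discovery`, `fit_gaussian`, and the simulated `oracle` are hypothetical.

```python
import math
import random

def fit_gaussian(points, var_floor=1.0):
    """Diagonal Gaussian; the variance floor keeps single-example classes usable."""
    d = len(points[0])
    mean = [sum(p[k] for p in points) / len(points) for k in range(d)]
    var = [max(var_floor, sum((p[k] - mean[k]) ** 2 for p in points) / len(points))
           for k in range(d)]
    return mean, var

def log_likelihood(x, model):
    mean, var = model
    return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
               for xi, m, v in zip(x, mean, var))

def active_discovery(data, oracle, budget):
    labels = {}  # index -> label returned by the expert
    for _ in range(budget):
        # Step 2: fit the model; with no labels yet this is plain unsupervised fitting.
        if labels:
            by_class = {}
            for i, y in labels.items():
                by_class.setdefault(y, []).append(data[i])
            models = [fit_gaussian(pts) for pts in by_class.values()]
        else:
            models = [fit_gaussian(data)]
        candidates = [i for i in range(len(data)) if i not in labels]
        if not candidates:
            break
        # Hint selection ("low likelihood"): the point worst explained by every class model.
        pick = min(candidates,
                   key=lambda i: max(log_likelihood(data[i], m) for m in models))
        labels[pick] = oracle(pick)  # Step 3: ask the expert; Step 4: loop back
    return labels

random.seed(0)
majority = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
rare = [(6 + random.gauss(0, 0.1), 6 + random.gauss(0, 0.1)) for _ in range(5)]
data = majority + rare
truth = ["majority"] * 200 + ["rare"] * 5
queried = active_discovery(data, oracle=lambda i: truth[i], budget=3)
print(sorted(set(queried.values())))
```

With the rare cluster far in the tail of the initial single Gaussian, the first hint already lands on a rare example; a real system would replace the per-class Gaussians with a semi-supervised mixture model.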
Nearest-neighbor-based methods
• Problem
 − Given: a set of unlabeled examples $S = \{x_1, \dots, x_n\}$, $x_i \in \mathbb{R}^d$, which come from $m$ distinct classes.
 − Find: at least one example from each class.
• Intuition: select examples according to the change in local density.
Figure: a density curve $f(x)$ over $x$, with regions where $|df(x)/dx|$ is small and regions where it is large.
Q: How to measure $df(x)/dx$?
• J. He, and J. Carbonell. Nearest-Neighbor-Based Active Learning for Rare Category Detection. NIPS 2007.
Intuition: How to Measure 𝑓(𝑥)?
More points in the purple ball around $x$ (more purple) → higher density
Intuition: How to Measure 𝑑𝑓(𝑥)/𝑑𝑥?
A bigger difference in the color of neighboring purple balls → a bigger change in density, i.e., a big $|df(x)/dx|$; a small difference → a small $|df(x)/dx|$.
NNDB Algorithm for the Binary Case
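Input: unlabeled set $S$ of $n$ examples; prior $p$ of the minority class.
1. Calculate the class-specific radius $r'$: with $K = n \times p$, the minimum over all examples of the distance to the K-th nearest neighbor.
2. For each $x_i$, let $NN(x_i, r') = \{x \in S : \|x - x_i\| \le r'\}$ and $n_i = |NN(x_i, r')|$; set $t = 1$.
3. For each example not yet queried, compute $s_i = \max_{x_j \in NN(x_i, t r')} (n_i - n_j)$.
4. Query the label of $x = \arg\max_i s_i$.
5. Is $x$ from the rare class? If no, increase $t$ by 1 and go to Step 3; if yes, go to Step 6.
6. Output $x$.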
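The steps above can be sketched in plain Python. This is a minimal, quadratic-time illustration written from the step list, not the authors' code; the function name `nndb`, the simulated oracle, and the toy data (a compact rare cluster inside a sparse grid of majority points) are assumptions.

```python
import math

def nndb(data, prior, oracle, rare_label="rare"):
    """Sketch of NNDB (binary case): returns (rare example index or None, query order)."""
    n = len(data)
    k = max(1, int(n * prior))                        # K = n * p
    dist = [[math.dist(a, b) for b in data] for a in data]
    # Step 1: r' = min over examples of the distance to the K-th nearest neighbour
    # (sorted(row)[0] is the example itself, so index k is its K-th neighbour).
    r = min(sorted(row)[k] for row in dist)
    # Step 2: local density n_i = |NN(x_i, r')|; the ball includes x_i itself.
    counts = [sum(1 for d in row if d <= r) for row in dist]
    queried, t = [], 1
    while len(queried) < n:
        asked = set(queried)
        # Step 3: s_i = max over x_j in NN(x_i, t * r') of (n_i - n_j), skipping queried examples.
        best = max((i for i in range(n) if i not in asked),
                   key=lambda i: max(counts[i] - counts[j]
                                     for j in range(n) if dist[i][j] <= t * r))
        queried.append(best)                          # Step 4: ask the expert
        if oracle(best) == rare_label:                # Step 5: rare class?
            return best, queried                      # Step 6: output x
        t += 1                                        # otherwise enlarge the neighbourhood
    return None, queried

majority = [(float(x), float(y)) for x in range(10) for y in range(10)]
center = (4.5, 4.5)
circle = [(4.5 + 0.05 * math.cos(2 * math.pi * j / 9),
           4.5 + 0.05 * math.sin(2 * math.pi * j / 9)) for j in range(9)]
data = majority + [center] + circle
truth = ["majority"] * 100 + ["rare"] * 10
found, queried = nndb(data, prior=0.09, oracle=lambda i: truth[i])
print(found, len(queried))
```

The sharp jump in local density at the rare cluster gives its center the largest score, so a single label request suffices here.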
NNDB Algorithm at a Glance
Synthetic Data Sets
• Binary case: majority class 1000 examples, minority class 10 examples. Label requests: Random 101; NNDB 3.
• Multi-class case: majority class 3000 examples, smallest minority class 79 examples. Label requests: Random 83; NNDM 5.
Feature Selection for Rare Category Analysis
• Problem
 − Given: a set of unlabeled examples $S = \{x_1, \dots, x_n\}$, $x_i \in \mathbb{R}^d$.
 − Find: a set of $d_r$ features that are relevant to the rare categories.
• J. He, and J. Carbonell. Co-Selection of Features and Instances for Unsupervised Rare Category Analysis. SDM 2010
Figure: input data → selected features → rare category analysis → output results.
Feature Selection for Rare Category Analysis
• Challenges
 − Data exhibit a highly-skewed distribution.
 − Label information is costly and difficult to obtain for rare examples.
 − Conventional feature selection algorithms cannot select the features relevant to rare category analysis because of this highly-skewed nature.
Feature Selection for Rare Category Analysis
• Intuition
 − Rare category selection: select a set of examples that are likely to come from the minority class.
 − Feature selection: identify the features relevant to the minority class.
 − Dealing with the two tasks jointly benefits both of them.
Figure: input data → selected features and selected rare examples → rare category analysis → output results.
PALM Algorithm
• Notation
 − Unlabeled examples: $D = \{x_1, \dots, x_n\}$, $x_i \in \mathbb{R}^d$
 − Class labels: $y_i \in \{1, 2\}$
 − Majority class: $y_i = 1$, prior $1 - p$
 − Minority class: $y_i = 2$, prior $p$
 − Relevant subspace of the minority class: $d_r$ features relevant to the minority class, on which minority examples have similar values, while they have diverse values on the remaining features.
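The relevant-subspace definition can be made concrete with a small scoring rule. This is not the PALM optimization; it is a hypothetical heuristic (`select_relevant_features` is an illustrative name) that, given a candidate minority set (the nonzero entries of the indicator vector $a$), sets $b_j = 1$ for the $d_r$ features on which those candidates agree most, measured as within-candidate variance divided by overall variance. The toy data mimics the synthetic X, Y, Z setting.

```python
from statistics import pvariance

def select_relevant_features(data, minority_idx, d_r):
    """Return a binary feature indicator b with d_r ones (hypothetical heuristic)."""
    d = len(data[0])
    scores = []
    for j in range(d):
        overall = pvariance([x[j] for x in data]) or 1e-12   # guard constant features
        within = pvariance([data[i][j] for i in minority_idx])
        scores.append(within / overall)  # small ratio: minority examples agree on feature j
    b = [0] * d
    for j in sorted(range(d), key=lambda j: scores[j])[:d_r]:
        b[j] = 1
    return b

majority = [(i % 10, i // 10, (7 * i) % 10) for i in range(100)]
minority = [(5 + 0.01 * k, 5 - 0.01 * k, z) for k, z in enumerate([0, 9, 2, 7, 4])]
data = majority + minority
b = select_relevant_features(data, minority_idx=range(100, 105), d_r=2)
print(b)
```

PALM instead chooses $a$ and $b$ jointly; alternating this feature step with a step that re-selects the $np$ tightest examples in the chosen subspace is one plausible way to approximate that coupling.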
PALM Algorithm
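• Optimization Problem over two binary indicator vectors:
 − $a \in \{0,1\}^n$: minority class indicator vector; its ith element $a_i$ marks whether $x_i$ is selected as a minority example
 − $b \in \{0,1\}^d$: relevant feature indicator vector; its jth element $b_j$ marks whether feature $j$ is selected as relevant
• Constraints
 − There are $np$ points from the minority class: $\sum_i a_i = np$
 − There are $d_r$ relevant features: $\sum_j b_j = d_r$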
Synthetic Dataset
• Majority class: 1000 examples; minority class: 10 examples
• Relevant features: X and Y
• Results of the proposed PALM
 − Feature selection: $b_x = b_y = 1$, $b_z = 0$
 − Rare category selection precision: 90%
Semi-supervised Rare Category Analysis
• Problem: Semi-supervised rare category analysis refers to the problem of detecting and clustering rare examples from underrepresented minority classes, given one or a few labeled examples.
• A Pipeline: methods organized by the available prior and the input data type
 − Full prior, feature data: NNDB; ALICE/MALICE [NIPS2007, ISAIM2008]
 − No/partial prior, feature data: SEDER [SDM2009]
 − Full prior, graph data: GRADE [ICDM2008]
 − No/partial prior, graph data: GRADE-LI [ICDM2008]
Rare Category Detection [SDM2009, ICDM2008, NIPS2007]
Rare Category Characterization [FCS2012, ICDM2010, KDD2018]
Comparison with Imbalanced Classification
• Imbalanced Classification
 − Problem: data exhibit an imbalanced class-membership distribution; labeled examples are available from all classes; models aim at good overall accuracy across all classes.
 − Methodology: [Kubat & Matwin, ICML1997]; [Chawla et al., JAIR2002]; [Wu & Chang, ICML2003]; [Huang et al., CVPR2016]; …
• Semi-supervised Rare Category Analysis
 − Problem: data exhibit a highly-skewed class-membership distribution; only one or a few labeled examples are available from the rare categories; models put heavy emphasis on learning the minority classes well.
 − Methodology: [Fine & Mansour, COLT2006]; [Dasgupta & Hsu, ICML2008]; [Vatturi & Wong, KDD2009]; [Zhou et al., KDD2018]; …
Comparison with Outlier/Anomaly Detection
• Outlier vs. Rare Class [Pelleg & Moore, NIPS2004]
“Most of the objects (i.e., majority classes) are well explained by current theories and … remainder are anomalies, but 99% of these anomalies are uninteresting, and only 1% of them (i.e., rare categories) are useful … the rest type of anomalies, called boring anomalies (i.e., outliers)”
• Anomalies are instances that do not fit the distribution of the majority classes.
• Outliers are typically single points that are separable from normal examples and scattered over the space.
• Rare categories are assumed to form compact minority classes in the feature space, which may overlap with the majority class.
Figure: input data → anomalies, split into rare categories (useful, compact) and outliers (uninteresting, scattered).
Rare Category Characterization
• Problem
 − Given: a few labeled examples from both classes
 − Find: a set of unlabeled examples that are likely to come from the minority class
• Binary case: one majority class and one minority class; can be extended to multiple rare categories
• J. He, H. Tong, and J. Carbonell. Rare Category Characterization. ICDM 2010.
RACH Algorithm
• Pre-processing: Filtering
 − Assumption: rare examples are enclosed by a minimum-radius hyperball.
 − Intuition: safely discard unlabeled examples far away from the minority class.
Figure: labeled minority-class examples → one-class SVM → center and radius; the unlabeled examples are then filtered by this hyperball.
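A filtering step of this kind can be sketched without an SVM solver. The sketch swaps the one-class SVM for the Badoiu-Clarkson approximation of the minimum enclosing ball (repeatedly step toward the farthest point with step size $1/(i+1)$), then keeps unlabeled examples inside the ball enlarged by a `margin` factor. The swap, the `margin` parameter, and the function names are assumptions for illustration, not RACH's actual pre-processing.

```python
import math

def approx_min_enclosing_ball(points, iters=1000):
    """Badoiu-Clarkson iteration: step toward the farthest point with shrinking step size."""
    c = list(points[0])
    for i in range(1, iters + 1):
        far = max(points, key=lambda p: math.dist(c, p))
        c = [ck + (fk - ck) / (i + 1) for ck, fk in zip(c, far)]
    r = max(math.dist(c, p) for p in points)   # radius that covers every labeled example
    return c, r

def filter_unlabeled(unlabeled, minority, margin=1.5):
    """Keep unlabeled examples inside the enlarged hyperball around the minority class."""
    c, r = approx_min_enclosing_ball(minority)
    return [x for x in unlabeled if math.dist(c, x) <= margin * r]

minority = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0)]
unlabeled = [(1.0, 0.5), (10.0, 10.0), (1.2, -0.3)]
center, radius = approx_min_enclosing_ball(minority)
kept = filter_unlabeled(unlabeled, minority)
print(center, radius, kept)
```

The far-away point is discarded, which is the "safely discard" intuition; the margin trades recall of rare examples against the size of the filtered set.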
RACH Algorithm
• Formulation: an optimization over the hyperball radius and center, with slack variables for the labeled examples from the majority class, slack variables for the unlabeled examples, and trade-off parameters.
RACH Algorithm
• Intuitions
Intuition: find the smallest hyperball that covers the denser support region of the minority class, with
 − ALL labeled examples from the minority class inside the hyperball
 − MOST labeled examples from the majority class outside the hyperball
 − MANY unlabeled examples enclosed by the hyperball
Empirical Studies: 20 Newsgroups
Figure: F-score vs. label percentage for RACH, KNN, Manifold-Ranking, Under-Sampling, SVM-Perf, and TSVM.
Empirical Studies
Figure: F-score vs. label percentage on 20 Newsgroups, Shuttle, Abalone, and Glass.
Part II: RCA for Temporal Sequential Data
Application 1: Health Condition Monitoring
• Heart disease: inseparable from normal patterns; highly skewed; multiple sources
Figure: body sensor network; segment-level: irregular pulse; sequence-level: abnormal bio-signals.
Application 2: Synthetic ID Detection
• Synthetic ID: combining real and fake identifying information; extremely imbalanced distributions; real-time transactions
Figure: segment-level and sequence-level patterns.
Application 3: Insider Trading Detection
Figure: security boundary; sequence-level: insider; segment-level: insider trading.
Challenges on Temporal Data
• Heterogeneity: temporal sequences of different lengths; different types of temporal patterns
• Hierarchy: RCA among multiple time series; RCA within a given time series
• Sparsity: sequence-level targets are rare; segment-level targets are rare
Taxonomy
Rare Category Analysis for Temporal Sequential Data
• Segment-level RCA
  − Points: profile similarity based; deviants based
  − Subsequences: regular subsequences; irregular subsequences
• Time-series-level RCA: unsupervised approaches; supervised approaches
• Bi-level RCA
Rare Category of Points
• Problem
 − Given: a time series $\mathbf{x} = (x_1, x_2, \dots, x_T)$.
 − Find: the rare categories of points in $\mathbf{x}$.
• Approaches
 − Profile-similarity-based methods [Williams et al., IPDPS2007]
 − Deviants-based methods [Jagadish et al., VLDB1999; Muthukrishnan et al., SSDBM2004]
Abnormal points = rare examples
Profile-Similarity-Based Methods
• Intuition: maintain a profile of the normal patterns (majority classes), and identify abnormal patterns (minority classes) by comparing new time points against the profile.
• Tiresias system
 − Targets multi-variable OS performance-metric time series.
 − Maintains a normal profile and a variance vector.
 − Each new data point is compared with both the normal profile and the variance vector to compute its anomaly score.
• A. W. Williams, S. M. Pertet, and P. Narasimhan, “Tiresias: Black-box Failure Prediction in Distributed Systems,” in Proc. of the 21st Intl. Parallel and Distributed Processing Symposium
(IPDPS), 2007, pp. 1–8.
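A profile-similarity scorer can be sketched as follows. The profile here is a hypothetical choice, not Tiresias's: an exponentially weighted mean vector and variance vector per metric, with the anomaly score of a new point taken as its largest per-metric deviation in standard-deviation units. `NormalProfile`, `alpha`, and `init_var` are illustrative names.

```python
import math

class NormalProfile:
    """Hypothetical profile: exponentially weighted mean and variance vectors per metric."""

    def __init__(self, n_metrics, alpha=0.1, init_var=1.0, var_floor=1e-6):
        self.alpha = alpha
        self.var_floor = var_floor
        self.mean = None
        self.var = [init_var] * n_metrics

    def score_and_update(self, x):
        """Return the anomaly score of x (largest normalized deviation), then absorb x."""
        if self.mean is None:            # the first point only initializes the profile
            self.mean = list(x)
            return 0.0
        score = max(abs(xi - m) / math.sqrt(max(v, self.var_floor))
                    for xi, m, v in zip(x, self.mean, self.var))
        for k, xi in enumerate(x):       # incremental EWMA update of mean and variance
            diff = xi - self.mean[k]
            self.mean[k] += self.alpha * diff
            self.var[k] = (1 - self.alpha) * (self.var[k] + self.alpha * diff * diff)
        return score

profile = NormalProfile(n_metrics=2)
normal = [(50 + (i % 3 - 1), 30 + 0.5 * (i % 2)) for i in range(50)]   # (cpu, mem)
normal_scores = [profile.score_and_update(x) for x in normal]
spike_score = profile.score_and_update((95.0, 30.0))
print(max(normal_scores), spike_score)
```

Updating the profile with every point, anomalies included, is the simplest choice; a production system would likely skip the update for points whose score exceeds an alarm threshold.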
Deviant-Based Methods
• Intuition: find points in a given time series whose removal from the series results in a histogram representation with a lower error bound than the original.
• Approaches
 − The optimal set of $k$ deviants always consists of the $l$ highest and the remaining $k - l$ lowest values, for some $l \le k$.
 − A dynamic-programming-based solution maintains a partial solution only for a few interspersed indexes of the time series, rather than for every value.
Figure: points 11 and 4 are the rare examples.
• S. Muthukrishnan, R. Shah, and J. Vitter, “Mining Deviants in Time Series Data Streams,” in Proc. of the 16th Intl. Conf. on Scientific and Statistical Database Management (SSDBM), Jun 2004, pp. 41–50.
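The "l highest plus k − l lowest" property makes deviant search cheap inside a single histogram bucket: only $k + 1$ candidate sets need to be checked. The sketch below assumes one bucket whose error is the sum of squared deviations from the bucket mean; it does not implement the multi-bucket dynamic program, and the function names are hypothetical.

```python
def sse(values):
    """Squared error of representing a bucket by its mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_deviants_one_bucket(values, k):
    """Try every split into l highest and k - l lowest values; keep the lowest-error one."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    best_err, best_set = None, None
    for l in range(k + 1):
        removed = set(order[len(order) - l:]) | set(order[:k - l])
        err = sse([v for i, v in enumerate(values) if i not in removed])
        if best_err is None or err < best_err:
            best_err, best_set = err, sorted(removed)
    return best_set, best_err

values = [5, 5, 5, 5, 20, 5, 5, 5, 5, 5, 5, -8, 5]
removed, err = best_deviants_one_bucket(values, k=2)
print(removed, err)
```

Here the best split takes one high value and one low value ($l = 1$), leaving a perfectly flat bucket.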
Rare Category of Subsequences
• Problem
 − Given: a time series $\mathbf{x} = (x_1, x_2, \dots, x_T)$.
 − Find: the rare categories of subsequences in $\mathbf{x}$.
• Approaches
 − Regular subsequences
   ➢ Discord discovery [Keogh et al., ICDM2005; Yeh et al., ICDM2016]
   ➢ Signal-transformation-based methods [Bu et al., SDM2007]
   ➢ Matrix-profile-based methods [Zhu et al., ICDM2018; Linardi et al., SIGMOD2018]
 − Irregular subsequences
Abnormal subsequences = rare examples
Discord Discovery
• Discord definition: given a time series $T$, the subsequence $D$ of length $n$ beginning at position $l$ is the discord of $T$ if $D$ has the largest distance to its nearest non-overlapping match.
• Approaches
 − The brute-force solution considers all possible subsequences $s \in S$ of length $n$ in $T$ and computes the distance between each such $s$ and every other non-overlapping $s' \in S$.
 − Top-K pruning reorders the subsequences with heuristics such as symbolic aggregate approximation (SAX) [Keogh et al., ICDM2005] and locality-sensitive hashing [Wei et al., ICDM2006] to make the computation more efficient.
• E. Keogh, J. Lin and A. Fu (2005). HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence. In Proc. of the 5th IEEE International Conference on Data Mining (ICDM 2005), pp. 226-233, Houston, Texas, Nov 27-30, 2005.
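The brute-force baseline described above fits in a few lines. This sketch uses raw Euclidean distance and treats two windows as non-overlapping when their start positions differ by at least $n$; the synthetic sine wave with an injected bump is an assumption for illustration.

```python
import math

def brute_force_discord(series, n):
    """Return (start, distance) of the subsequence farthest from its nearest non-overlapping match."""
    subs = [series[i:i + n] for i in range(len(series) - n + 1)]
    best_pos, best_dist = None, -1.0
    for i, s in enumerate(subs):
        nearest = math.inf
        for j, other in enumerate(subs):
            if abs(i - j) >= n:                      # non-overlapping matches only
                nearest = min(nearest, math.dist(s, other))
        if nearest != math.inf and nearest > best_dist:
            best_pos, best_dist = i, nearest
    return best_pos, best_dist

series = [math.sin(2 * math.pi * t / 20) for t in range(200)]
for t in range(100, 105):
    series[t] = 2.0                                  # inject an unusual bump
pos, dist = brute_force_discord(series, n=10)
print(pos, dist)
```

The double loop is quadratic in the number of subsequences, which is exactly the cost that SAX- or hashing-based reordering prunes.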
Signal-Transformation-Based Methods
• Haar wavelet transform: provides better pruning power; can dynamically determine the word size (unlike SAX)
• Approach [Bu et al., SDM2007]
 1. Apply the Haar wavelet transform.
 2. Normalize the Haar wavelet coefficients.
 3. Identify cutpoints by treating the coefficient distribution as Gaussian, such that the area between any two cutpoints is the same.
 4. Map the Haar coefficients to symbols based on the cutpoints.
 5. Order the subsequences appropriately to use as candidates, and compare.
Figure: top-3 discords in the power consumption history of a Dutch research facility in 1997.
• Y. Bu, O. T.-W. Leung, A. W.-C. Fu, E. J. Keogh, J. Pei, and S. Meshkin, “WAT: Finding Top-K Discords in Time Series Database,” in Proc. of the 7th SIAM Intl. Conf. on Data Mining (SDM), 2007, pp. 449–454.
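Steps 1 to 4 can be sketched as follows: an orthonormal Haar transform (coarse coefficients first), z-normalization of the coefficients, and equal-probability Gaussian cutpoints from `statistics.NormalDist`. The z-score normalization and the four-letter alphabet are assumptions; WAT's exact normalization and its candidate ordering (Step 5) are not reproduced.

```python
import math
from statistics import NormalDist, fmean, pstdev

def haar_transform(signal):
    """Orthonormal Haar DWT of a length-2^k signal; coarse coefficients first."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs, details = list(signal), []
    while len(coeffs) > 1:
        pairs = list(zip(coeffs[0::2], coeffs[1::2]))
        details = [(a - b) / math.sqrt(2) for a, b in pairs] + details
        coeffs = [(a + b) / math.sqrt(2) for a, b in pairs]
    return coeffs + details

def to_symbols(coeffs, alphabet="abcd"):
    """Map z-normalized coefficients to symbols using equal-area Gaussian cutpoints."""
    mu = fmean(coeffs)
    sd = pstdev(coeffs) or 1.0
    a = len(alphabet)
    cuts = [NormalDist().inv_cdf(i / a) for i in range(1, a)]
    return "".join(alphabet[sum((c - mu) / sd > cut for cut in cuts)] for c in coeffs)

coeffs = haar_transform([4, 2, 5, 5])
symbols = to_symbols(coeffs)
print(coeffs, symbols)
```

Because the transform is orthonormal, the sum of squared coefficients equals that of the input, so Euclidean distances between subsequences are preserved in the coefficient space.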
Matrix-Profile-Based Methods
• What is the Matrix Profile?
 − The Matrix Profile (MP) is a data structure that annotates a time series.
 − The matrix profile at the ith location records the distance of subsequence $i$ to its kth nearest neighbor.
• Illustrative examples
Figure: a time series $T$ and its matrix profile; the subsequence starting at 921 happens to have a distance of 177.0 to its nearest neighbor (wherever it is).
• Mueen, A., Keogh, E. Time Series Data Mining Using the Matrix Profile: A Unifying View of Motif Discovery, Anomaly Detection, Segmentation, Classification, Clustering and Similarity Joins. KDD tutorial, 2017.
• Majority classes: relatively low matrix profile values; similar subsequences exist elsewhere in the data
• Minority classes: relatively high matrix profile values; unique in shape (such areas are "discords", i.e., rare categories)
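A naive matrix profile makes the majority/minority contrast concrete. This sketch computes, for every subsequence, the z-normalized Euclidean distance to its nearest neighbor (the k = 1 case) outside an exclusion zone of `m // 2` positions; the exclusion width is an assumption, and real implementations (e.g., STOMP-style algorithms) are far faster than this quadratic loop.

```python
import math

def znorm(seq):
    mu = sum(seq) / len(seq)
    sd = math.sqrt(sum((v - mu) ** 2 for v in seq) / len(seq))
    return [(v - mu) / sd if sd > 1e-12 else 0.0 for v in seq]

def matrix_profile(series, m, exclusion=None):
    """Naive matrix profile: z-normalized distance of each subsequence to its nearest neighbour."""
    exclusion = m // 2 if exclusion is None else exclusion
    subs = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    return [min(math.dist(s, o) for j, o in enumerate(subs) if abs(i - j) > exclusion)
            for i, s in enumerate(subs)]

series = [math.sin(2 * math.pi * t / 20) for t in range(200)]
for t in range(100, 105):
    series[t] = 2.0                                  # inject an unusual bump
profile = matrix_profile(series, m=10)
discord = max(range(len(profile)), key=profile.__getitem__)
print(discord, profile[discord])
```

Repeating (majority) shapes get values near zero, while windows covering the bump have no close match anywhere, so the argmax of the profile lands on them.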
Matrix-Profile-Based Methods
Rare categories in an electrocardiogram (MIT-BIH Long-Term ECG Database)
• Two anomalies annotated by MIT cardiologists
• Relatively high MP values for ectopic beats
Irregular Subsequences
• Problem
 − The time series $T = \{(x_1, t_1), (x_2, t_2), \dots, (x_n, t_n)\}$ is not collected at an equal sampling rate, i.e., $t_{i+1} - t_i$ is not the same for all $i \in \{1, 2, \dots, n-1\}$.
 − A pattern is defined as a subsequence of consecutive points $p = \{(x_i, t_i), \dots, (x_{i+T}, t_{i+T})\}$.
• Solution
 − [Chen and Zhan, JCAM2008] propose an outlier detection method for unequal-interval time series: a subsequence is abnormal if very few other patterns have the same slope $\frac{x_{i+T} - x_i}{t_{i+T} - t_i}$ and the same length.
 − To identify anomalies at multiple resolutions, the Haar transform is used.
• X. Chen and Y. Zhan, “Multi-scale Anomaly Detection Algorithm based on Infrequent Pattern of Time Series,” Journal of Computational and Applied Mathematics, vol. 214, no. 1, pp. 227–237, Apr 2008.
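The infrequent-slope criterion can be sketched at a single resolution. Every pattern here spans the same number of points, so "same length" is built in; slopes are grouped with equal-width bins, which is an assumption (the multi-resolution Haar step is omitted), and `rare_slope_patterns`, `n_bins`, and `max_count` are hypothetical names.

```python
from collections import Counter

def rare_slope_patterns(points, length, n_bins=8, max_count=1):
    """points: (x, t) pairs with increasing, unevenly spaced t; returns start indexes of rare patterns."""
    windows = []
    for i in range(len(points) - length):
        (x0, t0), (x1, t1) = points[i], points[i + length]
        windows.append((i, (x1 - x0) / (t1 - t0)))   # slope over the irregular time gap
    lo = min(s for _, s in windows)
    hi = max(s for _, s in windows)
    width = (hi - lo) / n_bins or 1.0                # guard against identical slopes

    def bin_of(s):
        return min(int((s - lo) / width), n_bins - 1)

    counts = Counter(bin_of(s) for _, s in windows)
    return [i for i, s in windows if counts[bin_of(s)] <= max_count]

gaps = [1, 2, 1, 3, 2, 1, 2, 3, 1, 2, 1, 1, 2, 3, 1]
times = [0]
for g in gaps:
    times.append(times[-1] + g)
values = [0.5 * t for t in times]
values[8] += 10.0                                    # one abnormal reading
points = list(zip(values, times))
rare = rare_slope_patterns(points, length=1)
print(rare)
```

Dividing by the actual time gap is what makes the comparison meaningful for unequal sampling intervals; the two patterns entering and leaving the abnormal reading are flagged.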
Time-Series-Level Rare Category Analysis
• Problem
 − Given: a time series database.
 − Find: all the rare categories of time series.
• Illustration Example
An abnormal signal = A Rare Example
Nearest-Neighbor Based Methods
• Define a similarity function to compare two time series: nLCS (normalized length of the longest common subsequence); DTW (dynamic time warping); …
• Compute the clustering space: k-medoids; k-means / phased k-means; single-linkage clustering
• Calculate the likelihood score for rare categories
 − KNN-based methods [He and Carbonell, NIPS2007; Chandola et al., 2008]
 − Sampling-based methods [Dasgupta and Hsu, ICML2008]
 − Mean-shift-based methods [Vatturi et al., KDD2009]
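A KNN-based rarity score over a time series database can be sketched with DTW as the similarity function: each series is scored by its DTW distance to its k-th nearest other series. The absolute-difference local cost and the toy database are assumptions; the nLCS alternative and the clustering step are not shown.

```python
import math

def dtw(a, b):
    """Dynamic time warping distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def knn_rarity(database, k=2):
    """Rarity score of each series: DTW distance to its k-th nearest other series."""
    scores = []
    for i, s in enumerate(database):
        dists = sorted(dtw(s, o) for j, o in enumerate(database) if j != i)
        scores.append(dists[k - 1])
    return scores

database = [[0, 1, 2, 1, 0], [0, 1, 2, 2, 1, 0], [0, 0, 1, 2, 1, 0],
            [0, 1, 2, 1, 0, 0], [0, 1, 1, 2, 1, 0], [0, 5, -5, 5, 0]]
scores = knn_rarity(database, k=2)
print(scores)
```

DTW lets series of different lengths and slight time shifts match exactly, so the normal sequences score zero and only the oscillating series stands out.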
Window-Based Detection Methods
• Advantage: better localization of anomalies than techniques that compute a time series outlier score directly
• Disadvantage: introduces a new parameter, the window length
Supervised Approaches
• Subsequences of positive and negative strings of behavior as features, with
 − String matching classifiers [Cabrera et al., 2001; González and Dasgupta, 2003]
 − Neural networks [Dasgupta and Nino, 2000; Endler, 1998; Ghosh et al., 1998; Ghosh et al., 1999a; Ghosh and Schwartzbard, 1999]
 − Elman networks [Ghosh et al., 1999a]
• Bag-of-system-calls features with decision trees, Naive Bayes, and SVMs [Kang et al., 2005]
• Sliding-window subsequences with
 − SVMs [Tian et al., 2007; Wang et al., 2006]
 − Rule-based classifiers (Classification using Hierarchical Prediction Rules, CHIP) [Li et al., 2007]
 − HMMs [Gao et al., 2002]
Bi-Level Rare Category Analysis
• Problem
 − Given: (i) a time series database $S = \{\mathbf{x}^{(1)}, \dots, \mathbf{x}^{(M)}\}$; (ii) a prior $P$.
 − Find: (i) sequence-level predicted labels $\{Y^{(1)}, \dots, Y^{(M)}\}$; (ii) segment-level predicted labels $\{\mathbf{y}^{(1)}, \dots, \mathbf{y}^{(M)}\}$.
• Dawei Zhou, Jingrui He, Yu Cao, Jae-sun Seo. Bi-level Rare Temporal Pattern Detection, IEEE International Conference on Data Mining (ICDM-2016), December 2016
BIRAD Algorithm
• Modeling Abnormal Temporal Sequences (Hidden Markov assumptions)
 − Initial probability: $\Pr(y_1^{(m)} \mid Y^{(m)} = 1)$
 − Emission probability: $\Pr(x_i^{(m)} \mid y_i^{(m)}, Y^{(m)} = 1)$
 − Transition probability: $\Pr(y_j^{(m)} \mid y_{j-1}^{(m)}, Y^{(m)} = 1)$
BIRAD Algorithm
• Modeling Normal Temporal Sequences
 − Emission probability: $\Pr(x_i^{(m)} \mid Y^{(m)} = 0)$
 − Assumption: all temporal segments within a normal temporal sequence are normal, i.e., $\Pr(y_{j+1}^{(m)} \mid y_j^{(m)}, Y^{(m)} = 0) = 1$.
BIRAD Algorithm
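• Overall Objective Function: combines, for each sequence $\mathbf{x}^{(m)}$, the probability that $\mathbf{x}^{(m)}$ is normal ($Y^{(m)} = 0$) and the probability that $\mathbf{x}^{(m)}$ is abnormal ($Y^{(m)} = 1$).
 − Assumption: only abnormal sequences ($Y^{(m)} = 1$) contain abnormal segments ($y_i^{(m)} = 1$).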
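The two sequence-level probabilities can be sketched for one sequence with one-dimensional Gaussian emissions. Under $Y = 0$ every segment uses the normal emission; under $Y = 1$ the segment labels follow a two-state HMM whose likelihood is summed with the forward algorithm in log space. The emission values are borrowed from the converged iteration example that follows ($\mu_n = 9.25, \sigma_n = 0.6$; $\mu_a = 12.5, \sigma_a = 1.3$); the initial and transition probabilities, the prior, and all function names are assumptions, and BIRAD's joint learning of these parameters is not shown.

```python
import math

def log_normal(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_lik_normal(x, mu_n, sigma_n):
    # Y = 0: all segments are normal, so each x_i uses the normal emission.
    return sum(log_normal(v, mu_n, sigma_n) for v in x)

def log_lik_abnormal(x, init, trans, emis):
    # Y = 1: segment labels y_i in {0 (normal), 1 (abnormal)} follow a two-state HMM;
    # the forward recursion sums over all label paths in log space.
    alpha = [math.log(init[s]) + log_normal(x[0], *emis[s]) for s in (0, 1)]
    for v in x[1:]:
        alpha = [logsumexp([alpha[r] + math.log(trans[r][s]) for r in (0, 1)])
                 + log_normal(v, *emis[s]) for s in (0, 1)]
    return logsumexp(alpha)

def prob_abnormal(x, prior, init, trans, emis):
    """Pr(Y = 1 | x), with prior Pr(Y = 1) = prior."""
    a = math.log(prior) + log_lik_abnormal(x, init, trans, emis)
    b = math.log(1 - prior) + log_lik_normal(x, *emis[0])
    return math.exp(a - logsumexp([a, b]))

emis = [(9.25, 0.6), (12.5, 1.3)]          # (mean, std) of normal / abnormal segments
init = [0.9, 0.1]
trans = [[0.9, 0.1], [0.2, 0.8]]
p_normal = prob_abnormal([9.1, 9.4, 9.2, 9.3, 9.0, 9.5], 0.1, init, trans, emis)
p_abnormal = prob_abnormal([9.2, 9.3, 12.4, 12.8, 12.1, 9.1], 0.1, init, trans, emis)
print(p_normal, p_abnormal)
```

The same forward pass can be extended with a backward pass to obtain segment-level posteriors $\Pr(y_i^{(m)} = 1 \mid \mathbf{x}^{(m)})$, which is where the segment-level labels come from.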
BIRAD – iteration 0
Labels in the example: abnormal, abnormal, abnormal, abnormal
Model initialization: randomly select abnormal patterns
BIRAD – iteration 1
Labels in the example: abnormal, abnormal, abnormal, abnormal
Model updating: $\mu_n = 9.2, \sigma_n = 0.8$; $\mu_a = 9.5, \sigma_a = 0.5$
BIRAD – iteration 2
Labels in the example: abnormal, normal, normal, abnormal
Model updating: $\mu_n = 9.25, \sigma_n = 0.6$; $\mu_a = 9.66, \sigma_a = 0.1$
BIRAD – iteration 3
Labels in the example: abnormal, normal, normal, normal
Model updating: $\mu_n = 9.25, \sigma_n = 0.6$; $\mu_a = 11, \sigma_a = 0.8$
BIRAD – iteration 4
Labels in the example: abnormal, normal, normal, normal
Model updating: $\mu_n = 9.25, \sigma_n = 0.6$; $\mu_a = 12, \sigma_a = 0.9$
BIRAD – iteration 5
Labels in the example: abnormal, normal, normal, normal
Model updating: $\mu_n = 9.25, \sigma_n = 0.6$; $\mu_a = 12.5, \sigma_a = 1.3$. Converged!
Part III: RCA for Network Data
Applications
Networks: computer networks, online transaction networks, social networks
Rare categories: network intrusion, money laundering, emerging trends
Taxonomy
Rare Category Analysis for Network Data
• Static graph
  − Plain graph: low-order structure; high-order structure
  − Attributed graph: community based; embedding based
• Dynamic graph: homogeneous network; heterogeneous network
Rare Category in Weighted Graphs
• Problem
 − Q1: Given a weighted, unlabeled graph, how can we spot strange, abnormal, extreme nodes?
 − Q2: Can we explain why the spotted nodes are anomalous (rare examples)?
• L. Akoglu, M. McGlohon, C. Faloutsos. OddBall: Spotting Anomalies in Weighted Graphs. PAKDD, 2010.
OddBall Algorithm
• Feature extraction• 𝑁𝑖: number of neighbors (degree) of egonet 𝑖
• 𝐸𝑖: number of edges in egonet 𝑖
• 𝑊𝑖: total weight of egonet 𝑖
• 𝜆𝑊,𝑖: principal eigenvalue of the weighted adjacency matrix of egonet 𝑖
OddBall Algorithm
• Proposed method: for each node,
 − Extract its "ego-net" (= 1-step neighborhood).
 − Extract features (#edges, total weight, etc.) that could yield "laws" and are fast to compute and interpret.
• Detect regular patterns (examples from majority classes): compute the distribution of regular patterns.
• Detect irregular patterns (examples from minority classes): anomaly score = distance to the fitting line.
Figure: an outlying ego-net with uniform, robot-like behavior.
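The procedure above can be sketched for the ($N_i$, $E_i$) feature pair on an unweighted graph: fit $\log E = \theta \log N + \log C$ by least squares and score each node by its distance to that line. As a simplification, the distance here is the vertical residual in log-log space, and `oddball_scores` and the toy graph (a ring lattice, a 6-clique, and a 10-leaf star) are assumptions; OddBall's own scoring formula is not reproduced.

```python
import math

def egonet_features(adj):
    """adj: dict node -> set of neighbours. Returns node -> (N_i, E_i)."""
    feats = {}
    for i, nbrs in adj.items():
        inner = sum(1 for u in nbrs for v in adj[u] if v in nbrs) // 2   # edges among neighbours
        feats[i] = (len(nbrs), len(nbrs) + inner)                        # spokes plus inner edges
    return feats

def oddball_scores(adj):
    feats = {i: f for i, f in egonet_features(adj).items() if f[0] > 0}
    xs = [math.log(n) for n, _ in feats.values()]
    ys = [math.log(e) for _, e in feats.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    theta = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 1.0
    log_c = my - theta * mx                      # fitted law: log E = theta log N + log C
    return {i: abs(math.log(e) - (theta * math.log(n) + log_c)) for i, (n, e) in feats.items()}

def add_edge(adj, u, v):
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

adj = {}
for i in range(20):                              # ring lattice: links to i+1 and i+2
    add_edge(adj, i, (i + 1) % 20)
    add_edge(adj, i, (i + 2) % 20)
for u in range(100, 106):                        # a 6-clique
    for v in range(u + 1, 106):
        add_edge(adj, u, v)
for leaf in range(201, 211):                     # a star: hub 200 with 10 leaves
    add_edge(adj, 200, leaf)
scores = oddball_scores(adj)
top = max(scores, key=scores.get)
print(top, round(scores[top], 3))
```

The star hub, whose neighbours never link to each other, falls farthest below the fitted law, matching the "uniform, robot-like" pattern.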
Rare Category in Bipartite Graphs
• Problem
 − Q1. Neighborhood formation (NF): given a query node $q$ in $V_1$, what are the relevance scores of all the nodes in $V_1$ to $q$? Example: similar authors in publication networks.
 − Q2. Anomaly detection (AD): given a query node $q$ in $V_1$, what are the normality scores for the nodes in $V_2$ that link to $q$? Example: unusual papers in publication networks.
• J. Sun, H. Qu, D. Chakrabarti, and C. Faloutsos. Neighborhood formation and anomaly detection in bipartite graphs. ICDM, 2005.
Neighborhood Formation
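• Main idea: conduct random walk with restart from $q$, and use the steady-state probabilities of the $V_1$ nodes as neighborhood relevance scores.
 − Construct the transition matrix $P$, with fly-back (restart) probability $c$ to $q$.
 − Solve for the steady state: $\mathbf{v}^{(t+1)} = (1 - c) P \mathbf{v}^{(t)} + c\,\mathbf{q}$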
Anomaly Detection
• Main idea: the normality of a node $t$ is a function (e.g., the average) of the pairwise "normality" scores of its neighbors.
 − Find the set $S$ of nodes connected to $t$.
 − Compute the $|S| \times |S|$ normality matrix $R$ (asymmetric, with the diagonal reset to 0).
 − Apply a score function $f(R)$, e.g., $f(R) = \mathrm{mean}(R)$.
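Neighborhood formation and anomaly detection can be sketched together with NumPy: random walk with restart on the bipartite graph gives relevance scores, and the normality of a $V_2$ node $t$ averages the pairwise relevance among the $V_1$ nodes linked to it. Averaging only the off-diagonal entries (so that the zeroed diagonal does not bias sets of different sizes), the restart probability, and the toy author-paper matrix are our assumptions.

```python
import numpy as np

def rwr(M, q, c=0.15, iters=100):
    """Random walk with restart on bipartite M (|V1| x |V2|) from V1 node q."""
    n1, n2 = M.shape
    A = np.zeros((n1 + n2, n1 + n2))
    A[:n1, n1:] = M
    A[n1:, :n1] = M.T
    deg = A.sum(axis=0)
    P = A / np.where(deg == 0, 1.0, deg)        # column-normalised transition matrix
    e = np.zeros(n1 + n2)
    e[q] = 1.0
    v = e.copy()
    for _ in range(iters):
        v = (1 - c) * P @ v + c * e             # v(t+1) = (1 - c) P v(t) + c q
    return v[:n1]                               # relevance of every V1 node to q

def normality(M, t, c=0.15):
    """Normality of V2 node t: mean off-diagonal relevance among the V1 nodes linked to t."""
    S = np.flatnonzero(M[:, t])
    if len(S) < 2:
        return 0.0
    R = np.array([rwr(M, a, c)[S] for a in S])  # row a: relevance of each node in S w.r.t. a
    np.fill_diagonal(R, 0.0)                    # diagonal reset to 0
    return R.sum() / (len(S) * (len(S) - 1))

M = np.zeros((6, 7))
for paper in (0, 1, 2):
    M[[0, 1, 2], paper] = 1.0                   # community A: authors 0-2
for paper in (3, 4, 5):
    M[[3, 4, 5], paper] = 1.0                   # community B: authors 3-5
M[[0, 3], 6] = 1.0                              # paper 6 bridges the two communities
print(normality(M, 0), normality(M, 6))
```

Paper 6 links authors who are barely relevant to each other, so its normality score is lower than that of a within-community paper.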
Reveal Rare Category via NNrMF
• Matrix tool for finding graph patterns: the graph adjacency matrix is factored as $A = F \times G + R$, where the low-rank matrices $F$ and $G$ capture communities and the residual matrix $R$ captures anomalies.
• H. Tong, C. Lin: Non-Negative Residual Matrix Factorization with Application to Graph Anomaly Detection. SDM, pages 143-153, 2011.
An Illustrative Example
Improve Interpretation by Non-negativity
• A typical procedure, with an example: factor the graph adjacency matrix as $A = F \times G + R$ (community part plus anomaly part).
• Non-negative Matrix Factorization: $F \ge 0$, $G \ge 0$ (for community detection)
• Non-negative Residual Matrix Factorization (this paper): $R(i,j) \ge 0$ for $A(i,j) > 0$ (for anomaly detection)
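Optimization Formulation
• Q: How to find the "optimal" $F$ and $G$?
 − D1: Quality; C1: non-convexity of the optimization objective
 − D2: Scalability; C2: large size of the graph
• Objective: a weighted Frobenius norm, common in any matrix factorization, $\min_{F,G} \sum_{i,j} W(i,j)\,(A(i,j) - F(i,:)G(:,j))^2$, subject to the non-negative residual constraint $A(i,j) - F(i,:)G(:,j) \ge 0$ for all $A(i,j) > 0$.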
Rare Category with High-Order Structures
• Problem
 − Given: a graph $G = (V, E)$ and a user-defined structure $N$.
 − Find: a structure-rich dense subgraph that largely preserves the user-defined structures.
Figure: a user-defined structure $N$ (a 3-node line) and a dense subgraph of $G$ rich in 3-node lines.
• D. Zhou, S. Zhang, M. Y. Yildirim, S. Alcorn, H. Tong, H. Davulcu, J. He: A Local Algorithm for Structure-Preserving Graph Cut. KDD 2017: 655-664
High-Order Structures in Real Applications
High Order Conductance
• Definition: for any cluster $C$ in graph $G$ and the kth-order structure $N$, the kth-order conductance $\Phi(C, N)$ is defined as
$$\Phi(C, N) = \frac{\mathrm{cut}(C, N)}{\min\{\mu(C, N), \mu(\bar{C}, N)\}}$$
where $\mathrm{cut}(C, N)$ is the number of network structures broken by partitioning $G$ into $C$ and $\bar{C}$, $\mu(C, N)$ is the number of network structures in $C$, and $\mu(\bar{C}, N)$ is the number of network structures in $\bar{C}$.
High Order Conductance
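Example: a cut $C$ of graph $G$, evaluated with $\Phi(C, N) = \mathrm{cut}(C, N) / \min\{\mu(C, N), \mu(\bar{C}, N)\}$
➢ 2nd-order conductance: $\Phi(C, N) = 1 / \min\{4, 11\} = 1/4$
➢ 3rd-order conductance: $\Phi(C, N) = 2 / \min\{3, 34\} = 2/3$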
HOSPLOC Algorithm
• Construct the adjacency tensor: given a graph $G = (V, E)$, the kth-order network structure $N$ on $G$ can be represented as a k-dimensional adjacency tensor $T$, where $T(i_1, \dots, i_k) = 1$ if nodes $i_1, \dots, i_k$ form an instance of $N$, and 0 otherwise.
Example: for the node set {2, 4, 1}, $T(1, 4, 2) = 0$; for the node set {6, 8, 10}, $T(10, 8, 6) = 1$.
HOSPLOC Algorithm
• Compute the transition tensor: given $G = (V, E)$ and the adjacency tensor $T$ for the kth-order network structure $N$, the transition tensor $P$ normalizes $T$ over its first index, $P(i_1, \dots, i_k) = T(i_1, \dots, i_k) / \sum_{i_1} T(i_1, \dots, i_k)$.
Example: for the node set {2, 4, 1}, $P(1, 4, 2) = 0$; for the node set {6, 8, 10}, $P(10, 8, 6) = 1/3$.
HOSPLOC Algorithm
• Using the "rank-1" approximation [Li and Ng, 2013], the high-order random walks can be formulated as $\mathbf{q}^{(t)} = P\,\mathbf{q}^{(t-1)} \cdots \mathbf{q}^{(t-k+1)}$.
• Vector-based graph cut
 − Locally conduct high-order random walks to explore $N$.
 − Compute a permutation $\pi$ that orders the nodes by the returned HRW distribution $\mathbf{q}$.
 − Iteratively check the potential cuts $C_1, C_2, \dots, C_{n-1}$, where $C_i = \{\pi(1), \dots, \pi(i)\}$.
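For triangles ($k = 3$), the tensor construction and the rank-1 walk can be sketched with NumPy. Assumptions: $P$ normalizes $T$ over its first index (zero where no triangle closes), the walk is seeded with $\mathbf{q}^{(t-2)}$ at the seed node and $\mathbf{q}^{(t-1)}$ one ordinary step away, and mass lost at pairs that close no triangle is renormalized away. This illustrates the formulas, not HOSPLOC's local implementation; the sweep over candidate cuts is left as a sorted order.

```python
import itertools
import numpy as np

def triangle_transition_tensor(A):
    """P[i, j, k] = T[i, j, k] / sum_i T[i, j, k], where T marks triangles (3rd-order structure)."""
    n = A.shape[0]
    T = np.zeros((n, n, n))
    for i, j, k in itertools.permutations(range(n), 3):
        if A[i, j] and A[j, k] and A[i, k]:
            T[i, j, k] = 1.0
    col = T.sum(axis=0, keepdims=True)          # normalise over the first index i_1
    return np.divide(T, col, out=np.zeros_like(T), where=col > 0)

def high_order_walk(A, P, seed, steps=10):
    q_prev2 = np.zeros(A.shape[0])
    q_prev2[seed] = 1.0                         # q(t-2): start at the seed
    q_prev1 = A[seed] / A[seed].sum()           # q(t-1): one ordinary step (assumption)
    for _ in range(steps):
        q = np.einsum("ijk,j,k->i", P, q_prev1, q_prev2)   # q(t) = P q(t-1) q(t-2)
        if q.sum() == 0:
            break
        q_prev2, q_prev1 = q_prev1, q / q.sum()
    return q_prev1

A = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):        # two 4-cliques
    for u in block:
        for v in block:
            if u != v:
                A[u, v] = 1.0
A[3, 4] = A[4, 3] = 1.0                         # a bridge edge that lies in no triangle
P = triangle_transition_tensor(A)
q = high_order_walk(A, P, seed=0)
sweep_order = np.argsort(-q)                    # pi: candidate cuts are sweep_order[:i]
print(np.round(q, 3), sweep_order)
```

Because the bridge edge closes no triangle, the third-order walk never leaves the seed's clique, whereas an ordinary random walk would leak across it.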
Hierarchical Rare Category Detection
• Problem
 − Given: adjacency matrix $\mathbf{A}$, missing-edge penalty $p$, number of hierarchies $K$, and density increase ratio $\eta$.
 − Find: subgraph node indicator vectors $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_K$.
Figure: edge density $d$ over the layers of the hierarchy.
• S. Zhang, Dawei Zhou, M. Y. Yildirim, S. Alcorn, J. He, H. Davulcu, H. Tong. HiDDen: Hierarchical Dense Subgraph Detection with Application to Financial Fraud Detection.
HiDDen Algorithm
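• Density measure
 − Intuition #1: maximize the number of existing edges. Intuition #2: minimize the penalty of the missing edges.
 − Mathematical details:
$$\max_{\mathbf{x}} J(\mathbf{x}) = \underbrace{\mathbf{x}^T A \mathbf{x}}_{\text{Intuition \#1}} - \underbrace{p\,\mathbf{x}^T (\mathbf{1}_{n\times n} - I - A)\,\mathbf{x}}_{\text{Intuition \#2}} \quad \text{s.t. } \mathbf{x} \in \{0,1\}^n$$
 − Correctness: equivalent to the edge surplus density w.r.t. quasi-cliques.
 − Relaxation: replace $\mathbf{x} \in \{0,1\}^n$ with $\mathbf{0} \le \mathbf{x} \le \mathbf{1}$.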
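The relaxed single-level problem can be sketched with projected gradient ascent. Since $J(\mathbf{x}) = \mathbf{x}^T B \mathbf{x}$ with $B = (1+p)A - p(\mathbf{1}_{n\times n} - I)$, the gradient is $2B\mathbf{x}$, and projection onto the box $[0, 1]^n$ is a clip. This handles one hierarchy only (no density-variety or nested constraints); the step size, iteration count, and all-0.5 initialization are assumptions (the all-zero vector is a stationary point, so it is avoided).

```python
import numpy as np

def edge_surplus(A, x, p):
    """J(x) = x^T A x - p x^T (1 - I - A) x."""
    n = A.shape[0]
    missing = np.ones((n, n)) - np.eye(n) - A
    return x @ A @ x - p * (x @ missing @ x)

def relaxed_densest(A, p=1.0, lr=0.05, iters=200):
    """Projected gradient ascent on J over the box 0 <= x <= 1 (single hierarchy)."""
    n = A.shape[0]
    B = (1 + p) * A - p * (np.ones((n, n)) - np.eye(n))   # J(x) = x^T B x
    x = np.full(n, 0.5)
    for _ in range(iters):
        x = np.clip(x + lr * 2 * B @ x, 0.0, 1.0)          # gradient step, then projection
    return x

A = np.zeros((10, 10))
for u in range(5):
    for v in range(5):
        if u != v:
            A[u, v] = 1.0                                  # 5-clique on nodes 0-4
for u in range(4, 9):
    A[u, u + 1] = A[u + 1, u] = 1.0                        # sparse path 4-5-6-7-8-9
x = relaxed_densest(A, p=1.0)
print(np.round(x), edge_surplus(A, np.round(x), 1.0))
```

Here the box solution is already binary and recovers the clique; HiDDen's full method would add the inter-hierarchy constraints and alternate projected steps across $\mathbf{x}_1, \dots, \mathbf{x}_K$.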
HiDDen Algorithm
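• Constraints for hierarchies
 − #1 Density variety: densities in two consecutive hierarchies exhibit a difference:
$$\frac{\mathbf{x}_k^T A \mathbf{x}_k}{\mathbf{x}_k^T (\mathbf{1}_{n\times n} - I)\,\mathbf{x}_k} \ge \eta\,\frac{\mathbf{x}_{k-1}^T A \mathbf{x}_{k-1}}{\mathbf{x}_{k-1}^T (\mathbf{1}_{n\times n} - I)\,\mathbf{x}_{k-1}}$$
   Example: $d_3 \ge 1.1 \times d_2$
 − #2 Nested node set: larger subgraphs contain smaller subgraphs: $V_{k+1} \subseteq V_k \subseteq V_{k-1} \Leftrightarrow \mathbf{x}_{k+1} \le \mathbf{x}_k \le \mathbf{x}_{k-1}$
   Example: $V_3 \subseteq V_2 \subseteq V_1 \subseteq V$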
HiDDen Algorithm
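• Objective function:
$$\max_{\mathbf{x}_1, \dots, \mathbf{x}_K} \sum_{k=1}^{K} \mathbf{x}_k^T \big((1 + p)A - p(\mathbf{1}_{n\times n} - I)\big)\mathbf{x}_k$$
$$\text{s.t. } \frac{\mathbf{x}_j^T A \mathbf{x}_j}{\mathbf{x}_j^T (\mathbf{1}_{n\times n} - I)\,\mathbf{x}_j} \ge \eta\,\frac{\mathbf{x}_{j-1}^T A \mathbf{x}_{j-1}}{\mathbf{x}_{j-1}^T (\mathbf{1}_{n\times n} - I)\,\mathbf{x}_{j-1}}, \quad \mathbf{x}_{j+1} \le \mathbf{x}_j \le \mathbf{x}_{j-1}, \quad \forall j = 1, 2, \dots, K$$
where the objective sums the edge surplus in each hierarchy, the first constraint enforces density variety, and the second enforces nested node sets.
• Observation: a non-convex quadratically constrained quadratic programming (QCQP) problem
• Optimization: an alternating projected gradient method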
Community Outliers for Attributed Graph
• Problem
 − Given: an attributed graph $G = (V, E, X)$, where $V$ is the set of nodes, $E$ is the set of edges, and $X$ contains the node features.
 − Find: objects (rare examples) whose features deviate from those of the other community members.
• J. Gao, F. Liang, W. Fan, C. Wang, Y. Sun, J. Han: On community outliers and their efficient detection in information networks. KDD 2010.
A Unified Probabilistic Model
Formulation
• Maximize the likelihood of the data distribution $P(X)$: $P(X) \propto P(X \mid Z)\,P(Z)$
 − $P(X \mid Z)$ depends on the community labels $Z$ and the model parameters.
 − $P(Z)$ is higher if neighboring nodes from normal communities share the same community label, e.g., two linked nodes are likely to be in the same community.
Figure: majority-class and minority-class nodes, with the activation function.
Algorithm
Rare Category Oriented Network Embedding
• Problem
 − Given: an attributed network $G = (V, E, X)$, one-shot or few-shot labeled rare examples $L = \{x_1, \dots, x_L\}$, and the desired embedding dimension $d$.
 − Find: a rare-category-oriented network embedding $E \in \mathbb{R}^{n \times d}$ and a list of predicted rare category examples.
• Challenges
 − C1: Rarity
 − C2: Sparsity of labeled examples
 − C3: Non-separability of rare examples from the majority classes
• D. Zhou, J. He, H. Yang, W. Fan. SPARC: Self-Paced Network Representation for Few-Shot Rare Category Characterization, ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-2018), August 2018.
Network Layout of Pubmed
SPARC Algorithm
• Rare Category Characterization (RCC)
 − Cost-sensitive learning (addresses C1): focus on rare category examples and learn from the highly-skewed distribution; the cost $c(y_i, \hat{y}_i)$ penalizes the error of classifying minority-class examples into majority classes:
$$L_s = \sum_{i=1}^{L} c(y_i, \hat{y}_i) \log \Pr(\hat{y}_i = 1 - y_i \mid x_i, e_i)$$
 − Self-paced learning (addresses C2): start from a handful of labeled examples and gradually explore more via label propagation, with self-paced vector $v$ and self-paced regularizers:
$$L_{RCC} = L_s - \sum_{i=1}^{L+U} v_{i1} \log \Pr(\hat{y}_i = 1 \mid x_i, e_i) - \sum_{i=1}^{L+U} \lambda_1 v_{i1}$$
SPARC Algorithm
• Rare Category Oriented Network Embedding (RCE)
  • Minimize the cross-entropy loss of predicting context pairs (𝑖, 𝑐)
• Rare category oriented context sampling (addresses C3)
  • Take the indicator vector 𝐼 from RCC.
  • With probability 𝑝, extract general network context.
  • With probability 1 − 𝑝, extract rare category oriented network context starting from the non-zero elements of 𝐼.
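The sampling rule above can be sketched with DeepWalk-style random walks. This is an assumption for illustration: the walk length, window size, and helper names (`random_walk`, `sample_context_pairs`) are hypothetical, and SPARC's exact context extraction may differ. What the sketch does follow from the slide is the choice of start node: uniformly at random with probability 𝑝, otherwise from a node whose entry in 𝐼 is non-zero.

```python
import random

def random_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` nodes over an adjacency dict."""
    walk = [start]
    while len(walk) < length:
        nbrs = adj.get(walk[-1], [])
        if not nbrs:
            break
        walk.append(rng.choice(nbrs))
    return walk

def sample_context_pairs(adj, indicator, p, n_walks, walk_len, window, seed=0):
    """Return (i, c) pairs of nodes that co-occur within `window` steps on a walk.

    With probability p a walk starts at a uniformly random node (general context);
    otherwise it starts at a node whose indicator entry is non-zero
    (rare category oriented context).
    """
    rng = random.Random(seed)
    nodes = list(adj)
    rare = [v for v in nodes if indicator.get(v, 0) != 0]
    pairs = []
    for _ in range(n_walks):
        # rng.random() >= p happens with probability 1 - p.
        start = rng.choice(rare) if rare and rng.random() >= p else rng.choice(nodes)
        walk = random_walk(adj, start, walk_len, rng)
        for a, i in enumerate(walk):
            for b in range(max(0, a - window), min(len(walk), a + window + 1)):
                if b != a:
                    pairs.append((i, walk[b]))
    return pairs
```

The resulting pairs would then feed the cross-entropy objective of RCE; lowering 𝑝 biases the embedding toward neighbourhoods of the suspected rare examples.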
SPARC Algorithm
• Proposed Framework
Impact of Self-Paced Learning
2-D t-SNE visualization
Taxonomy: Rare Category Analysis for Network Data
• Static graph | Dynamic graph
• Plain graph | Attributed graph
• Low-order structure | High-order structure
• Community based | Embedding based
• Homogeneous network | Heterogeneous network
Rare Category Detection on Time-evolving Graphs
• Problem
  • Given: Time-evolving graphs $\tilde{G} = \{G_1, G_2, \ldots, G_T\}$
  • Find: At least one example from each minority class
• Challenges
  • Graphs evolve over time: new nodes and edges appear or disappear, and edge weights change
  • High computation cost in both space and time complexity
• D. Zhou, K. Wang, N. Cao, J. He. Rare Category Detection on Time-Evolving Graphs, IEEE International Conference on Data Mining (ICDM-2015), November 2015.
BIRD Algorithm
• Proposed updating methods
  • Extract the update matrix relative to the last time stamp.
  • Global similarity matrix: update iteratively with each changed edge using the Sherman-Morrison formula.
  • K-NN matrix: update only the rows in which the order of elements is changed by the graph's evolution.
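The Sherman-Morrison step can be sketched as follows. For illustration only, this assumes the global similarity matrix is the inverse of $M = I - \alpha W$ for a symmetric weight matrix $W$ without self-loops; the exact matrix and normalization used in BIRD may differ, and `update_on_edge_change` is a hypothetical name. A weight change $\delta$ on undirected edge $(i, j)$ changes $M$ by two rank-1 terms, so each is absorbed with one $O(n^2)$ Sherman-Morrison step instead of an $O(n^3)$ re-inversion.

```python
import numpy as np

def sherman_morrison(A_inv, u, v):
    """Return (A + u v^T)^{-1} given A^{-1}, in O(n^2) time."""
    Au = A_inv @ u
    vA = v @ A_inv
    return A_inv - np.outer(Au, vA) / (1.0 + v @ Au)

def update_on_edge_change(M_inv, i, j, delta, alpha):
    """Hypothetical sketch: refresh M^{-1} for M = I - alpha * W after the weight of
    undirected edge (i, j), with i != j, changes by delta.

    The change to M is -alpha * delta * (e_i e_j^T + e_j e_i^T): two rank-1 terms.
    """
    n = M_inv.shape[0]
    e_i, e_j = np.zeros(n), np.zeros(n)
    e_i[i], e_j[j] = 1.0, 1.0
    M_inv = sherman_morrison(M_inv, -alpha * delta * e_i, e_j)
    return sherman_morrison(M_inv, -alpha * delta * e_j, e_i)
```

A stream of edge updates is handled by calling `update_on_edge_change` once per changed edge, which is where the savings over recomputing the inverse at every time stamp come from.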
BIRD Algorithm
Time Complexity
[He et al., ICDM 2008] [Zhou et al., ICDM 2015]
Eigenspace Summarization in Computer Systems
• Problem
  • Given: Time-evolving graphs $\tilde{G} = \{G_1, G_2, \ldots, G_T\}$
  • Find: Anomalies, detected online in an unsupervised manner
• Challenges
  • Large number of nodes
  • Complex dependencies between servers
  • Highly dynamic edge weights
• Ide, T. and Kashima, H., Eigenspace-Based Anomaly Detection in Computer Systems. KDD, 2004.
“Summary Feature” extraction
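• Definition of the “service activity vector” (SAV): $u(t) = \arg\max_{\tilde{u}} \tilde{u}^\top D(t)\, \tilde{u}$ subject to $\tilde{u}^\top \tilde{u} = 1$, where $D(t)$ is the adjacency matrix at $t$ (symmetric, non-negative) and $u(t)$ is the activity vector at $t$.
• Mathematically, this optimization reduces to the eigenvalue equation $D(t)\, \tilde{u} = \lambda \tilde{u}$.
• The principal eigenvector gives the summary of node “activity”.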
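A minimal sketch of the summary-feature step, using NumPy's symmetric eigensolver; the function name is hypothetical and this is not Ide and Kashima's own implementation. Because eigenvectors are defined only up to sign, the sketch flips the vector so that its entries sum to a non-negative value.

```python
import numpy as np

def activity_vector(D):
    """u(t) = argmax u^T D u subject to u^T u = 1, for a symmetric non-negative D(t).

    The maximizer is the eigenvector of the largest eigenvalue (D u = lambda u).
    """
    D = np.asarray(D, dtype=float)
    _, eigvecs = np.linalg.eigh(D)  # eigenvalues are returned in ascending order
    u = eigvecs[:, -1]              # principal eigenvector
    # Flip the sign so the entries sum to a non-negative value; for a connected
    # non-negative D this makes every entry non-negative (Perron-Frobenius).
    return -u if u.sum() < 0 else u
```

For a three-server dependency matrix in which servers 0 and 1 interact heavily and server 2 only weakly, the returned vector gives servers 0 and 1 equal, larger activity than server 2.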
Anomaly Detection
• The problem is thus reduced to anomaly detection on a time sequence of activity vectors
Anomaly Detection
• Typical activity pattern
  • Employ a pattern extraction technique similar to LSI (latent semantic indexing).
  • The solution is the principal left singular vector of the matrix whose columns are the recent activity vectors.
• Definition of the anomaly metric
  • 𝑢(𝑡): activity vector at time 𝑡
  • 𝑟(𝑡 − 1): typical activity pattern at 𝑡 − 1
• Anomaly score: $z(t) = 1 - r(t-1)^\top u(t)$, which tracks the angle between the current activity vector and the typical pattern to reveal rare patterns
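The typical pattern and the anomaly score can be sketched in a few lines. This assumes a sliding window of the W most recent activity vectors (W is a parameter of this sketch) and unit-norm activity vectors; the function names are hypothetical.

```python
import numpy as np

def typical_pattern(past_activity):
    """r(t-1): principal left singular vector of the matrix whose columns are the
    W most recent activity vectors u(t-1), ..., u(t-W)."""
    U = np.column_stack(past_activity)
    left, _, _ = np.linalg.svd(U, full_matrices=False)
    r = left[:, 0]
    return -r if r.sum() < 0 else r  # singular vectors are defined up to sign

def anomaly_score(u_t, r_prev):
    """z(t) = 1 - r(t-1)^T u(t) for unit vectors: 0 when u(t) matches the typical
    pattern, growing as the angle between them widens."""
    return 1.0 - float(np.dot(r_prev, u_t))
```

An activity vector aligned with the typical pattern scores about 0, while one orthogonal to it scores 1.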
Rare Categories in Evolving Heterogeneous Networks
• Problem
  • Given: A stream of heterogeneous graphs $G^{(t)} = (V, E, T)$ containing nodes $V$ and edges $E$ of different types $T$.
  • Find: Rare categories in real time while consuming bounded memory.
• Challenges
  • Nodes and edges are typed (e.g., fork, read).
  • The graph evolves from a stream of typed edges.
  • Space and time complexity must be bounded.
• Emaad A. Manzoor, Sadegh M. Milajerdi, Leman Akoglu: Fast Memory-efficient Anomaly Detection in Streaming Heterogeneous Graphs. KDD 2016.
Example of two information flow graphs based on system logs
Graph Representation
• Graph to vectors via shingling
  • Compute the shingle vector for each graph $G^{(t)}$.
  • The vector contains the frequency of each $k$-shingle in $G^{(t)}$.
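A shingle vector can be sketched as a frequency count of typed traversal strings. The construction below is an assumption for illustration: one shingle per node, built by a breadth-first traversal of up to $k$ hops along outgoing edges, concatenating the node type and then each (edge type, destination type) pair. The ordering rule and helper names are hypothetical; StreamSpot's exact shingle definition may differ in detail.

```python
from collections import Counter, deque

def k_shingle(adj, node_type, root, k):
    """Hypothetical k-shingle: concatenate the types met in a breadth-first traversal
    of up to k hops from `root`. adj maps node -> list of (edge_type, dst)."""
    parts = [node_type[root]]
    frontier = deque([(root, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        # Sort outgoing edges by (edge type, destination type) to make the string
        # less sensitive to the order in which edges arrived.
        for edge_type, dst in sorted(adj.get(node, []), key=lambda e: (e[0], node_type[e[1]])):
            parts.append(edge_type + node_type[dst])
            frontier.append((dst, depth + 1))
    return "".join(parts)

def shingle_vector(adj, node_type, k):
    """Frequency of each k-shingle in the graph, one shingle rooted at every node."""
    return Counter(k_shingle(adj, node_type, v, k) for v in node_type)
```

For a process `p1` (type `A`) that reads file `f1` and writes file `f2` (both type `B`), `k = 1` yields the shingle `ArBwB` once and `B` twice.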
Graph Representation
• Sketching graphs
  • The shingle universe is large and unknown.
  • Compute an 𝐿-dimensional projection vector from the shingle vector via SimHash.
• Streaming graph representation (on each new edge)
  • Construct the set of shingles to update.
  • Hash the shingles to update.
  • Update the projection vector and the sketch.
• Achlioptas, Dimitris. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences. 2003.
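The streaming sketch can be illustrated with a SimHash-style projection in which each shingle is hashed to a pseudo-random ±1 vector of length 𝐿, in the spirit of the binary-coin projections cited above. The choice of SHAKE-128 to derive the signs, and the class and function names, are assumptions of this sketch. Since the projection is linear in the shingle counts, a new edge only requires adding each changed shingle's count delta times its sign vector.

```python
import hashlib

def random_signs(shingle, L):
    """Deterministic pseudo-random +/-1 vector of length L derived from the shingle."""
    data = hashlib.shake_128(shingle.encode("utf-8")).digest((L + 7) // 8)
    return [1 if (data[j // 8] >> (j % 8)) & 1 else -1 for j in range(L)]

class StreamingSketch:
    """SimHash-style sketch of a shingle-frequency vector, updated incrementally."""

    def __init__(self, L):
        self.L = L
        self.y = [0.0] * L  # projection vector

    def update(self, shingle_deltas):
        """Apply the change in shingle counts (shingle -> delta) caused by a new edge."""
        for shingle, delta in shingle_deltas.items():
            for j, s in enumerate(random_signs(shingle, self.L)):
                self.y[j] += delta * s

    def sketch(self):
        """Binary sketch: one bit per coordinate, set when the projection is >= 0."""
        return [1 if v >= 0 else 0 for v in self.y]

def sketch_similarity(a, b):
    """Fraction of agreeing bits; estimates 1 - angle/pi between the shingle vectors."""
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

Applying the same count changes in one batch or edge by edge produces the same sketch, which is what makes the per-edge update valid.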
Identify Rare Categories
• Bootstrap 𝐾 clusters
• Cluster centroid: the “average” graph
• Update clusters in constant time
• Anomaly score: distance to the nearest centroid
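The centroid bookkeeping can be sketched as follows, assuming the 𝐾 clusters were bootstrapped offline (labels given, every cluster non-empty) and using 1 − cosine similarity as the distance; both are assumptions of this sketch, and `CentroidModel` is a hypothetical name. Keeping each centroid as a running mean means a change in one member graph shifts its centroid in time proportional to the vector length, independent of the cluster size.

```python
import numpy as np

class CentroidModel:
    """Hypothetical centroid-based scoring over graph projection vectors."""

    def __init__(self, vectors, labels, K):
        vectors = np.asarray(vectors, dtype=float)
        labels = np.asarray(labels)
        self.counts = np.array([np.sum(labels == k) for k in range(K)], dtype=float)
        # Each centroid is the mean projection vector of its members ("average" graph).
        self.centroids = np.array([vectors[labels == k].mean(axis=0) for k in range(K)])

    def update_member(self, k, y_old, y_new):
        """Shift centroid k when one member's vector changes from y_old to y_new."""
        diff = np.asarray(y_new, dtype=float) - np.asarray(y_old, dtype=float)
        self.centroids[k] += diff / self.counts[k]

    def anomaly_score(self, y):
        """1 - cosine similarity to the nearest centroid (larger is more anomalous)."""
        y = np.asarray(y, dtype=float)
        norms = np.linalg.norm(self.centroids, axis=1) * np.linalg.norm(y)
        return float(1.0 - (self.centroids @ y / norms).max())
```

A graph whose vector lies on a centroid scores about 0, while one between two clusters scores noticeably higher.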
Part IV: Heterogeneous Rare Category Analysis
An Overall Framework
[Figure: dynamic and heterogeneous data with feedback/interaction loops; associated publications: DMKD’16, IJCAI’15, ICDM’15, KDD’18, KDD’17, SDM’17, ICDM’16]
• Supported by NSF CAREER Project (Award Number: 1552654)
The Prototype System
Lin, H., Gao, S., Gotz, D., Du, F., He, J., & Cao, N. (2018). RCLens: Interactive rare category exploration and identification. IEEE Transactions on Visualization and Computer Graphics, 24(7), 2223-2237.
Data Exploration Module
• Represent raw data
• Interactive visualization for data querying
• Support data filtering
Rare Category Analysis Module
• Feature selection
• Interactive active learning
• Visualize rare examples in a salient representation
Feature Selection Module
• Visualize the variance of data
• Visualize the correlation of data
• Guide the feature selection and subspace investigation process
A Case Study in Financial Fraud Detection
• Problem
  • Given: The personally identifiable information (PII) network of the bank's customers.
  • Find: Suspicious synthetic identities.
• Identified abnormal patterns
  [Figure panels: PII network; identified rare category]
  • A group of suspicious identities that share the same PII
Part V: Challenges & Future Directions
Challenges & Future Directions
• Scalability: How to scale up to large-scale data in real applications?
• Robustness: How to ensure performance in the presence of adversarial examples?
• Rare category representation: How to learn hierarchical representations of complex rare examples?
• Rare category interpretation: How to interpret the prediction results by providing relevant clues (e.g., relevant patterns, relevant features, relevant time stamps from time series data)?
• Rare category generation: How to generate task-specific rare category examples (e.g., money laundering activity) given a specific domain (e.g., transaction network)?
References
• Chen, C.. "CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature." Journal of the Association for Information Science and Technology 57.3 (2006): 359-377.
• Spaaij, R.. "The enigma of lone wolf terrorism: An assessment." Studies in Conflict & Terrorism 33.9 (2010): 854-870.
• Ngai, E. W., Hu, Y., Wong, Y. H., Chen, Y., & Sun, X. (2011). The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature. Decision support systems, 50(3), 559-569.
• D. Pelleg, A. W. Moore: Active Learning for Anomaly and Rare-Category Detection. NIPS 2004.
• J. He, and J. Carbonell. Co-Selection of Features and Instances for Unsupervised Rare Category Analysis. SDM 2010.
• J. He, and J. Carbonell. Nearest-Neighbor-Based Active Learning for Rare Category Detection. NIPS 2007.
• J. He, Y. Liu, and R. Lawrence. Graph-based Rare Category Detection. ICDM 2008.
• J. He, H. Tong, and J. Carbonell. Rare Category Characterization. ICDM 2010.
• A. W. Williams, S. M. Pertet, and P. Narasimhan, “Tiresias: Black-box Failure Prediction in Distributed Systems,” in Proc. of the 21st Intl.
• H. V. Jagadish, N. Koudas, and S. Muthukrishnan, “Mining Deviants in a Time Series Database,” in Proc. of the 25th Intl. Conf. on Very Large Data Bases (VLDB), 1999, pp. 102–113.
• S. Muthukrishnan, R. Shah, and J. Vitter, “Mining Deviants in Time Series Data Streams,” in Proc. of the 16th Intl. Conf. on Scientific and Statistical Database Management (SSDBM), Jun 2004, pp. 41–50.
(Parts I and II)
• E. Keogh, J. Lin and A. Fu (2005). HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence. In Proc. of the 5th IEEE International Conference on Data Mining (ICDM 2005), pp. 226–233, Houston, Texas, Nov 27-30, 2005.
• Chin-Chia Michael Yeh, Yan Zhu, Liudmila Ulanova, Nurjahan Begum, Yifei Ding, Hoang Anh Dau, Diego Furtado Silva, Abdullah Mueen, Eamonn Keogh (2016). Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View that Includes Motifs, Discords and Shapelets. IEEE ICDM 2016.
• Y. Bu, O. T.-W. Leung, A. W.-C. Fu, E. J. Keogh, J. Pei, and S. Meshkin, “WAT: Finding Top-K Discords in Time Series Database,” in Proc. of the 7th SIAM Intl. Conf. on Data Mining (SDM), 2007, pp. 449–454.
• X.-y. Chen and Y.-y. Zhan, “Multi-scale Anomaly Detection Algorithm based on Infrequent Pattern of Time Series,” Journal of Computational and Applied Mathematics, vol. 214, no. 1, pp. 227–237, Apr 2008.
• V. Chandola, V. Mithal, and V. Kumar, “A Comparative Evaluation of Anomaly Detection Techniques for Sequence Data,” in Proc. of the 2008 8th IEEE Intl. Conf. on Data Mining (ICDM), 2008, pp. 743–748.
• S. Budalakoti, A. N. Srivastava, and M. E. Otey, “Anomaly Detection and Diagnosis Algorithms for Discrete Symbol Sequences with Applications to Airline Safety,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications, vol. 39, no. 1, pp. 101–113, Jan 2009.
• T. Lane, C. Brodley et al., “Sequence Matching and Learning in Anomaly Detection for Computer Security,” in AAAI Workshop: AI Approaches to Fraud Detection and Risk Management, 1997, pp. 43–49.
• A. Nairac, N. Townsend, R. Carr, S. King, P. Cowley, and L. Tarassenko, “A System for the Analysis of Jet Engine Vibration Data,” Integrated Computer-Aided Engineering, vol. 6, no. 1, pp. 53–66, Jan 1999.
(Part II)
• F. A. González and D. Dasgupta, “Anomaly Detection Using Real-Valued Negative Selection,” Genetic Programming and Evolvable Machines, vol. 4, no. 4, pp. 383–403, Dec 2003.
• D. Dasgupta and F. Nino, “A Comparison of Negative and Positive Selection Algorithms in Novel Pattern Detection,” in Proc. of the 2000 IEEE Intl. Conf. on Systems, Man, and Cybernetics, vol. 1, 2000, pp. 125–130.
• D. Endler, “Intrusion Detection Applying Machine Learning to Solaris Audit Data,” in Proc. of the 14th Annual Computer Security Applications Conf. (ACSAC), 1998, pp. 268–279.
• A. K. Ghosh, J. Wanken, and F. Charron, “Detecting Anomalous and Unknown Intrusions Against Programs,” in Proc. of the 14th Annual Computer Security Applications Conf. (ACSAC), 1998, pp. 259–267.
• A. Ghosh, A. Schwartzbard, and M. Schatz, “Learning Program Behavior Profiles for Intrusion Detection,” in Proc. of the 1st USENIX Workshop on Intrusion Detection and Network Monitoring, 1999, pp. 51–62.
• A. K. Ghosh and A. Schwartzbard, “A Study in using Neural Networks for Anomaly and Misuse Detection,” in Proc. of the 8th Conf. on USENIX Security Symposium (SSYM), 1999, pp. 12–23.
• B. Gao, H.-Y. Ma, and Y.-H. Yang, “HMMs (Hidden Markov Models) based on Anomaly Intrusion Detection Method,” in Proc. of the 2002 Intl. Conf. on Machine Learning and Cybernetics, vol. 1, 2002, pp. 381–385.
(Part II)
• J. B. D. Cabrera, L. Lewis, and R. K. Mehra, “Detection and Classification of Intrusions and Faults using Sequences of System Calls,” SIGMOD Record, vol. 30, no. 4, pp. 25–34, Dec 2001.
• Dawei Zhou, Jingrui He, Yu Cao, Jae-sun Seo. Bi-level Rare Temporal Pattern Detection, IEEE International Conference on Data Mining (ICDM-2016), December 2016.
• J. Sun, H. Qu, D. Chakrabarti, and C. Faloutsos. Neighborhood formation and anomaly detection in bipartite graphs. ICDM, 2005.
• L. Akoglu, M. McGlohon, C. Faloutsos. OddBall: Spotting Anomalies in Weighted Graphs. PAKDD, 2010.
• Hanghang Tong, Ching-Yung Lin: Non-Negative Residual Matrix Factorization with Application to Graph Anomaly Detection. SDM, pages 143-153, 2011.
• Satoshi Hara, Tetsuro Morimura, Toshihiro Takahashi, Hiroki Yanagisawa, Taiji Suzuki: A Consistent Method for Graph Based Anomaly Localization. AISTATS 2015.
• Jonathan Root, Jing Qian, Venkatesh Saligrama: Learning Efficient Anomaly Detectors from K-NN Graphs. AISTATS 2015.
• Dawei Zhou, Si Zhang, Mehmet Yigit Yildirim, Scott Alcorn, Hanghang Tong, Hasan Davulcu, Jingrui He: A Local Algorithm for Structure-Preserving Graph Cut. KDD 2017: 655-664.
• Si Zhang, Dawei Zhou, Mehmet Yigit Yildirim, Scott Alcorn, Jingrui He, Hasan Davulcu, Hanghang Tong. HiDDen: Hierarchical Dense Subgraph Detection with Application to Financial Fraud Detection.
(Parts II and III)
• Jing Gao, Feng Liang, Wei Fan, Chi Wang, Yizhou Sun, Jia wei Han: On community outliers and their efficient detection in information networks. KDD 2010.
• Karthik Subbian, Charu C. Aggarwal, Jaideep Srivastava, Vipin Kumar: Rare Class Detection in Networks. SDM 2015: 406-414.
• Dawei Zhou, Jingrui He, Hongxia Yang, Wei Fan. SPARC: Self-Paced Network Representation for Few-Shot Rare Category Characterization, ACM: SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-2018), August 2018.
• Ide, T. and Kashima, H., Eigenspace-Based Anomaly Detection in Computer Systems. KDD, 2004.
• Dawei Zhou, Kangyang Wang, Nan Cao, Jingrui He. Rare Category Detection on Time-Evolving Graphs, IEEE International Conference on Data Mining (ICDM-2015), November 2015.
• Emaad A. Manzoor, Sadegh M. Milajerdi, Leman Akoglu: Fast Memory-efficient Anomaly Detection in Streaming Heterogeneous Graphs. KDD 2016: 1035-1044
(Part III)