Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
Copyright©2016 NTT corp. All Rights Reserved.
Multi-view Anomaly Detection via Robust Probabilistic Latent Variable Models
Tomoharu Iwata NTT Communication Science Labs Makoto Yamada RIKEN, AIP
NIPS2016
2
Multi-view anomaly • Instances that have inconsistent views • Application
– information disparity management • find documents that contain different information
across multilingual Wikipedia documents – malicious insider detection – purchase behavior analysis
• find movies inconsistently purchased by users based on the genre (animation by purchased by grown-ups)
image annotation
text URLs
audio video
Japanese text English text
Multi-view data
3
Single-view/multi-view anomaly
B M C A
F D G
S
I J
E
B M C D
F A
G
S
I J
E
H
H
View1 View2
• single-view anomaly is an instance that does not conform to expected behavior
• S is single-view anomaly, but not multi-view anomaly • M is not single-view anomaly, but multi-view
anomaly
4
Existing method: HOAD (1) • HOrizontal Anomaly Detection (HOAD)
– A Spectral Framework for Detecting Inconsistency across Multi-source Object Relationships, Gao et. al. ICDM 2011
• Step1: soft clustering two views together with the constraint that an instance should be assigned to the same cluster
two-view combined similarity
graph
view1
view2
view1 view2 diagonal
diagonal
spectral clustering
5
Existing method: HOAD (2)
• Step2: quantify the difference between the two clustering solutions
• weak points – anomalous instances also have the constraint to
be assigned to the same cluster – require hyper-parameter tuning (e.g. weight for
the constraint)
6
Proposed model • Normal (non-anomalous) instance
– all views are generated from a single latent vector • Anomaly
– different views are generated from different latent vectors
B M C A
F D G
S
I J
E
B M C D
F A
G
S
I J
E
H
H
B M C D
F A G
S
I J E H M
Observed view 1 Observed view 2
Latent space
𝑾1 𝑾2
7
Proposed model • Each instance has potentially a countably infinite number
of latent vectors • Each view of an instance is generated depending on a
view-specific projection matrix and a latent vector
A
Observed view 1 Observed view 2
Latent space A
A
A A
Observed view D
𝑾1 𝑾2 𝑾𝐷
8
Proposed model • Each instance has potentially a countably infinite number
of latent vectors • Each view of an instance is generated depending on a
view-specific projection matrix and a latent vector
A
Observed view 1 Observed view 2
Latent space A
A
A A
Observed view D
𝑾1 𝑾2 𝑾𝐷
9
Proposed model • Each instance has potentially a countably infinite number
of latent vectors • Each view of an instance is generated depending on a
view-specific projection matrix and a latent vector
A
Observed view 1 Observed view 2
Latent space A
A
A A
Observed view D
𝑾1 𝑾2 𝑾𝐷
A
10
Proposed model • Each instance has potentially a countably infinite number
of latent vectors • Each view of an instance is generated depending on a
view-specific projection matrix and a latent vector
A
Observed view 1 Observed view 2
Latent space A
A
A A
Observed view D
𝑾1 𝑾2 𝑾𝐷
A
11
Proposed model • Each instance has potentially a countably infinite number
of latent vectors • Each view of an instance is generated depending on a
view-specific projection matrix and a latent vector
Observed view 1 Observed view 2
Latent space A
A
A
Observed view D
𝑾1 𝑾2 𝑾𝐷
A A
A A
12
Proposed model • Each instance has potentially a countably infinite number
of latent vectors • Each view of an instance is generated depending on a
view-specific projection matrix and a latent vector
Observed view 1 Observed view 2
Latent space A
A
A
Observed view D
𝑾1 𝑾2 𝑾𝐷
A A
A A
13
Proposed model • Each instance has potentially a countably infinite number
of latent vectors • Each view of an instance is generated depending on a
view-specific projection matrix and a latent vector
Observed view 1 Observed view 2
Latent space A
A
A
Observed view D
𝑾1 𝑾2 𝑾𝐷
A A
A
A A
14
Proposed probabilistic model
• By using Dirichlet processes prior for each instance, the number of latent vectors is automatically determined from the given data – if views are consistent, they are clustered; otherwise, they
need different latent vectors • By using view-dependent projection matrices, we can handle
different properties across different views
𝑝 𝒙𝑛𝑛 𝒛𝑛,𝑾𝑛 ,𝜽𝑛,𝛼 = �𝜃𝑛𝑛𝑁(𝒙𝑛𝑛|𝑾𝑛𝒛𝑛𝑛 ,𝛼−1𝑰)∞
𝑛=1
view d of instance n
j-th latent vectors of instance n
projection matrix for view d
mixture weight for j-th latent vector
precision
𝑣𝑛 =1𝐻� 𝐼(𝐽𝑛 ℎ > 1)𝐻
ℎ=1
#latent vectors of instance n at the h-th iteration after burn-in
indicator function
Anomaly score: probability that the instance uses more than one latent vector
15
Proposed model • clustering views for each instance in the latent space
Observed view 1 Observed view 2
Latent space A1 A2
A4
Observed view 5
𝑾1 𝑾2 𝑾5
A
A
A
A5 A3
16
• Proposed model – 𝑝 𝒙𝑛𝑛 𝒛𝑛,𝑾𝑛 ,𝜽𝑛,𝛼 = ∑ 𝜃𝑛𝑁(𝒙𝑛𝑛|𝑾𝑛𝒛𝑛𝑛 ,𝛼−1𝑰)∞
𝑛=1
• Probabilistic PCA – 𝑝 𝒙𝑛𝑛 𝒁,𝑾 = 𝑁(𝒙𝑛𝑛|𝑾𝒛𝑛𝑛 ,𝛼−1𝑰)
• Probabilistic CCA – 𝑝 𝒙𝑛𝑛 𝒁,𝑾 = 𝑁(𝒙𝑛𝑛|𝑾𝑛𝒛𝑛𝑛 ,𝚺)
• Infinite Gaussian mixture – 𝑝 𝒙𝑛𝑛 𝝁,𝜽 = ∑ 𝜃𝑛𝑁(𝒙𝑛𝑛|𝝁𝑛 ,𝛼−1𝑰)∞
𝑛=1
Relation with other latent variable models
17
Generative process
18
Inference based on stochastic EM • Analytically integrate out the latent vectors Z, mixture
weights Θ, precision 𝛼 • E-step: collapsed Gibbs sampling of latent vector
assignment s for each view of each instance ℓ = (𝑛,𝑑)
• M-step: maximum joint likelihood estimation of
projection matrices W
19
Experiments • Data
– 11 data sets from LIBSVM data – generated multiple views by randomly splitting
the features – anomalies were added by swapping views of two
randomly selected instances • Comparing methods
– PCCA: probabilistic canonical correlation analysis – HOAD: horizontal anomaly detection – CC: consensus clustering based anomaly detection – OCSVM: one-class support vector machine
20
Multi-view anomaly detection with different anomaly rate
The proposed model achieved the best with 8 of the 11 data sets
21
Multi-view anomaly detection with different latent dimensionality
22
MovieLens data analysis instance: movie, view1: user list who rated, view2: genre
‘The Full Monty’ and ‘Liar Liar’ were ‘Comedy’ genre. They are rated by not only users who likes ‘Comedy’, but also who likes ‘Romance’ and ‘Action-Thriller’. ‘The Professional’ was anomaly because it was rated by two different user groups, where a group prefers ‘Romance’ and the other prefers ‘Action’. Since ‘Star Trek’ series are typical Sci-Fi and liked by specific users, its anomaly score was low.
high anomaly score movies low anomaly score movies
23
Conclusion • We proposed a generative model approach for multi-
view anomaly detection, which finds instances that have inconsistent views.
• In the experiments, we confirmed that the proposed model could perform much better than existing methods for detecting multi-view anomalies
• Future work – nonlinear projection using Gaussian processes or
deep neural nets