Modeling, Searching, and Explaining Abnormal Instances in Multi-Relational Networks
Chapter 1. Introduction
Speaker: Cheng-Te Li
2007.7.9

Outline
• Introduction
• Problem Definition
  – Multi-relational Networks
  – The Importance of Abnormal Instances
  – Explanation
• Design Considerations
• Objectives and Challenges
• Approach
• Contributions

Introduction
• "A discovery is said to be an accident meeting a prepared mind." – Albert Szent-Györgyi
• For computer science: model the discovery process via AI
• Motivation: "Natural Selection"
• The discovery process

Problem Definition
• Essentially: how can the discovery process be modeled through AI?
  – Our general framework
• Three key features
  – Multi-relational network (MRN)
  – Abnormal instances
  – Human-understandable explanation

Multi-relational Networks
• Definition
  – Nodes: objects of different types
  – Links: binary relationships between objects
  – Multi-relational: multiple different types of links
  – Attributes
• Encodes semantic relationships between different types of objects
• E.g. a bibliography network
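
The definition above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name `MultiRelationalNetwork`, its methods, and the toy bibliography data (`alice`, `p1`, `kdd`, and the link types `writes`, `cites`, `published_in`) are hypothetical and do not come from the talk.

```python
from collections import defaultdict


class MultiRelationalNetwork:
    """Typed nodes joined by typed, directed binary links (illustrative only)."""

    def __init__(self):
        self.node_types = {}  # node -> node type
        # source node -> link type -> set of target nodes
        self.links = defaultdict(lambda: defaultdict(set))

    def add_node(self, node, node_type):
        self.node_types[node] = node_type

    def add_link(self, source, link_type, target):
        self.links[source][link_type].add(target)

    def neighbors(self, node, link_type):
        # .get avoids creating empty entries for unknown nodes or link types
        return set(self.links.get(node, {}).get(link_type, set()))


# Hypothetical bibliography network
mrn = MultiRelationalNetwork()
for name, kind in [("alice", "author"), ("p1", "paper"),
                   ("p2", "paper"), ("kdd", "venue")]:
    mrn.add_node(name, kind)
mrn.add_link("alice", "writes", "p1")
mrn.add_link("p1", "cites", "p2")
mrn.add_link("p1", "published_in", "kdd")

print(mrn.neighbors("p1", "cites"))
```

Keeping the link type as an explicit key is the point of the sketch: the same pair of nodes can be related in several distinct ways, and each relation stays separately queryable.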

Multi-relational Networks (cont'd)
• More examples
  – Kinship network
  – WWW: incoming, outgoing, and email links
  – WordNet: lexical relationships between concepts
• Multiple relationship types carry different kinds of semantic information that can be compared and contrasted
• PageRank, centrality theory
  – Cannot deal with relation types in a network
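
A tiny sketch of why a type-blind measure loses information. Plain out-degree is used here as a stand-in for a type-blind measure (it is not PageRank itself), and the node names and links are hypothetical.

```python
from collections import Counter

# Hypothetical typed links: (source, link_type, target)
links = [
    ("site_a", "outgoing", "site_b"),
    ("site_a", "outgoing", "site_c"),
    ("site_d", "email", "site_b"),
    ("site_d", "email", "site_c"),
]

# Type-blind view: only the number of links per source node
untyped = Counter(source for source, _, _ in links)

# Typed view: the number of links per (source node, link type)
typed = Counter((source, link_type) for source, link_type, _ in links)

print(untyped["site_a"], untyped["site_d"])
print(typed[("site_a", "outgoing")], typed[("site_d", "outgoing")])
```

Both nodes have the same type-blind degree, yet their typed counts differ completely, which is the information a multi-relational approach is meant to keep.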

Abnormal Instances
• Discovery from a network
  – Identify central nodes, recognize frequent subgraphs, learn interesting properties
• Our goal is to discover the instances that look different
  – The attraction of the "light bulb"
  – Anomaly detection over relational data, a largely unexplored setting
  – Potential applications:
    • Information awareness and homeland security
    • Fraud detection and law enforcement
    • General scientific discovery
    • Data cleaning

Explanation
• The difficulty of verification
  – The goal is to find something previously unknown
  – False positives may exist even with high precision and high recall, as is typical of unsupervised discovery
• Explanation-based discovery
  – Human-understandable explanation
  – Intuitive validation by the user
  – Further investigation

Design Considerations
• Three strategies to identify abnormal instances
  – Rule-based learning
    • Pattern matching, e.g. "abnormal if it doesn't cite any other people's papers"
  – Supervised learning
    • Manual labeling for training and classification
    • Merit: high precision
    • Demerits: domain dependent; expensive to create; sensitive to human bias; can only find expected anomalies, not novel ones
  – Unsupervised learning
    • Comparison-based, following our definition
    • Properties: easily adapted to a new domain without training; more suitable for security-related problems

Design Considerations (cont'd)
• System requirements
  – Utilize the information in the MRN, e.g. the types of links
  – Adapt to different domains, with no training
  – Explainable
  – Scalable
  – Provide high-level bias
  – Support different levels of detail for explanations

Objectives & Challenges
• Objectives
  – Discovery stage: identify abnormal nodes
  – Explanation stage: produce descriptions for the nodes found
  – E.g. an organized crime network
• Challenges
  – Make anomaly detection satisfy the requirements above
    • Existing methods for identifying suspicious instances in an MRN are rule-based or supervised
    • Conventional unsupervised algorithms target propositional or numerical data
    • PageRank, HITS, and random walks do not consider link types
  – Treat understandable explanations as part of discovery
    • Needs a model that is complex enough but not overcomplicated

Approach
• Design a model capturing the semantics of nodes
  – Select a set of relevant path types as semantic features
  – Compute the statistical dependency between nodes and path types as feature values
• Find nodes with abnormal semantics
  – Distance-based outlier detection with semantic profiles
• Explain them
  – Apply a classifier to separate the abnormal nodes from the others
  – Translate the generated rules into natural language
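
The first two steps above can be sketched roughly as follows. This is a minimal illustration, not the method from the talk: normalized path-type counts are swapped in for the statistical dependency measure, the outlier score is simply the distance to the k-th nearest profile, and the toy network, node names, and function names are all hypothetical. The explanation step is not covered.

```python
import math
from collections import Counter, defaultdict

# Hypothetical bibliography network: (source, link_type, target)
LINKS = [
    ("a1", "writes", "p1"), ("p1", "cites", "p2"),
    ("a2", "writes", "p2"), ("p2", "cites", "p3"),
    ("a3", "writes", "p3"), ("p3", "cites", "p1"),
    ("a4", "writes", "p4"),  # p4 cites nothing
]


def out_index(links):
    """Map each source node to its outgoing (link_type, target) pairs."""
    index = defaultdict(list)
    for source, link_type, target in links:
        index[source].append((link_type, target))
    return index


def path_type_counts(index, start, max_len=2):
    """Count paths leaving `start`, grouped by their sequence of link types."""
    counts = Counter()
    frontier = [(start, ())]
    for _ in range(max_len):
        next_frontier = []
        for node, path_type in frontier:
            for link_type, target in index.get(node, []):
                extended = path_type + (link_type,)
                counts[extended] += 1
                next_frontier.append((target, extended))
        frontier = next_frontier
    return counts


def semantic_profile(counts, path_types):
    """Relative frequency per path type (an assumed stand-in feature value)."""
    total = sum(counts.values())
    return [counts[p] / total if total else 0.0 for p in path_types]


def knn_outlier_scores(profiles, k=1):
    """Score each node by the distance to its k-th nearest other profile."""
    scores = {}
    for node, vec in profiles.items():
        dists = sorted(math.dist(vec, other)
                       for name, other in profiles.items() if name != node)
        scores[node] = dists[k - 1]
    return scores


index = out_index(LINKS)
authors = ["a1", "a2", "a3", "a4"]
counts = {a: path_type_counts(index, a) for a in authors}
path_types = sorted({p for c in counts.values() for p in c})
profiles = {a: semantic_profile(counts[a], path_types) for a in authors}
scores = knn_outlier_scores(profiles, k=1)
most_abnormal = max(scores, key=scores.get)
print(most_abnormal, scores)
```

In this toy data the author whose paper cites nothing has a profile unlike every other author's, so it receives the largest score. No labels or hand-written rules are involved, only comparison between profiles.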

Contributions
• An unsupervised way to identify abnormal instances in an MRN
• Outperforms state-of-the-art algorithms by a large margin
• Generates understandable explanations
• Performs complex data analysis accurately and efficiently
• Generality and applicability

Q & A
Thank you for listening!