Email Alias Detection Using Social Network Analysis
Ralf Holzer, Bradley Malin, Latanya Sweeney
LinkKDD 2005
Advisor: Dr. Koh Jia-Ling
Reporter: Che-Wei, Liang
Date: 2008/08/14
Outline
• Introduction
• Alias Detection Method
  – Data Representation
  – Ranking Algorithms
• Experiment
Introduction
• Individuals use aliases for various communication purposes
• Alias detection
  – Useful to both legitimate and illegitimate applications
  – It is important to understand the extent to which the process can be automated
• Aliases listed on the same webpage suggest that some form of relationship exists between them
• Many people use several email addresses
  – This paper attempts to determine which email addresses correspond to the same entity
Data Representation
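• Let $S$ represent the set of sources (web pages)
• The data are modeled as an undirected graph $G = (I, E)$
  – $I$ is the set of unique email addresses
  – $E$ is the set of edges; an edge connects two addresses that appear together on at least one source
  – $C_{ab} = |e_{ab}|$ denotes the number of sources associated with the edge connecting $a$ and $b$, where $e_{ab}$ is the set of those sources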
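A minimal Python sketch of this representation, assuming the input is a mapping from a hypothetical source id (one web page) to the list of email aliases found on it. The names `build_alias_graph` and `collocation_count` are illustrative, not from the paper. Each edge is keyed by an unordered pair and mapped to its source set e_ab, so C_ab is just the size of that set.

```python
from collections import defaultdict
from itertools import combinations


def build_alias_graph(sources):
    """Build the collocation graph G = (I, E) from {source_id: [aliases]}.

    Returns (nodes, edge_sources): nodes is the set I of unique aliases, and
    edge_sources maps frozenset({a, b}) to e_ab, the set of source ids on
    which both a and b appear.
    """
    nodes = set()
    edge_sources = defaultdict(set)
    for source_id, aliases in sources.items():
        unique = sorted(set(aliases))  # repeated mentions on one page count once
        nodes.update(unique)
        for a, b in combinations(unique, 2):
            edge_sources[frozenset((a, b))].add(source_id)
    return nodes, dict(edge_sources)


def collocation_count(edge_sources, a, b):
    """C_ab = |e_ab|: number of sources on which a and b co-occur (0 if no edge)."""
    return len(edge_sources.get(frozenset((a, b)), ()))


# Hypothetical toy data: three web pages and the aliases listed on each.
sources = {
    "page1": ["alice@cs.example.com", "alice@example.com", "bob@cs.example.com"],
    "page2": ["alice@cs.example.com", "alice@example.com"],
    "page3": ["bob@cs.example.com", "carol@example.org"],
}
nodes, edges = build_alias_graph(sources)
print(len(nodes))  # 4
print(collocation_count(edges, "alice@cs.example.com", "alice@example.com"))  # 2
```

Keying edges by `frozenset` keeps the graph undirected: C_ab and C_ba look up the same entry.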
Ranking Algorithms
• Ranking method
  – Top-k list of possible aliases
  – Shortest-path algorithm
    • Uses geodesic distance to rank nodes by how close they are to a given originating node
• Relationship strength is augmented with
  – The number of aliases on a source
  – The number of collocations of two aliases
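In an unweighted graph the geodesic distance is the hop count, so the basic shortest-path ranking can be sketched with breadth-first search. This is an illustrative sketch, not the authors' code: the `{source_id: [aliases]}` input, the helper names, and the alphabetical tie-breaking are assumptions. Aliases unreachable from the origin have no finite distance and are left out of the ranking.

```python
from collections import deque
from itertools import combinations


def build_adjacency(sources):
    """Undirected adjacency sets: two aliases are linked if they share a source."""
    adj = {}
    for aliases in sources.values():
        unique = set(aliases)
        for alias in unique:
            adj.setdefault(alias, set())
        for a, b in combinations(unique, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def geodesic_distances(adj, origin):
    """Breadth-first search: shortest-path hop count from origin to each reachable alias."""
    dist = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for neighbor in adj.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def top_k_aliases(adj, origin, k):
    """Rank candidates from lowest to highest geodesic distance (ASSUMED: ties alphabetical)."""
    dist = geodesic_distances(adj, origin)
    candidates = sorted((d, alias) for alias, d in dist.items() if alias != origin)
    return [alias for _, alias in candidates[:k]]


# Hypothetical toy data: a chain a-b-c-d plus a disconnected pair x-y.
sources = {"p1": ["a", "b"], "p2": ["b", "c"], "p3": ["c", "d"], "p4": ["x", "y"]}
print(top_k_aliases(build_adjacency(sources), "a", 2))  # ['b', 'c']
```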
• Geodesic distance
  – Length of the shortest path from a to b
  – Potential aliases are ranked from lowest to highest geodesic distance
• Multiple Collocation
  – Two aliases that collocate on more than one webpage have a stronger relationship
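One way to let repeated collocation shorten the distance is to give each edge the length 1/C_ab and rank with Dijkstra's algorithm. The 1/C_ab weighting is an assumption chosen for illustration; the slides state only that collocating on more than one webpage signifies a stronger relationship, not the exact formula. Helper names are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import combinations


def collocation_counts(sources):
    """counts[a][b] = C_ab, the number of sources on which a and b both appear."""
    counts = defaultdict(lambda: defaultdict(int))
    for aliases in sources.values():
        for a, b in combinations(sorted(set(aliases)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def weighted_distances(counts, origin):
    """Dijkstra's algorithm with edge length 1 / C_ab (ASSUMED weighting)."""
    dist = {origin: 0.0}
    heap = [(0.0, origin)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry; a shorter path was already found
        for neighbor, c in counts[node].items():
            nd = d + 1.0 / c
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


def rank_by_collocation(sources, origin, k):
    """Top-k aliases by weighted distance, lowest first (ties alphabetical)."""
    dist = weighted_distances(collocation_counts(sources), origin)
    ranked = sorted((d, a) for a, d in dist.items() if a != origin)
    return [a for _, a in ranked[:k]]


# Hypothetical toy data: b and c are both one hop from a,
# but c collocates with a on three pages and b on only one.
sources = {"p1": ["a", "b", "c"], "p2": ["a", "c"], "p3": ["a", "c"]}
print(rank_by_collocation(sources, "a", 2))  # ['c', 'b']
```

Under plain geodesic distance, `b` and `c` would tie at one hop; the collocation weighting breaks the tie in favor of the alias that co-occurs more often.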
• Source Size
  – The strength of the relationship between two aliases is inversely correlated with the number of aliases on a source
• Combined
  – Integrates both of the previous assumptions
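The source-size and combined models can be sketched the same way. The formula here is a hypothetical choice, not taken from the paper: each source s listing |s| aliases contributes 1/(|s| - 1) to the strength of every pair on it, strengths are summed over shared sources, and the edge length is the reciprocal of that sum. A small personal page therefore binds its aliases more tightly than a large directory page, and every extra shared source adds strength, so both assumptions enter a single weight.

```python
import heapq
from collections import defaultdict
from itertools import combinations


def combined_strength(sources):
    """strength[a][b] = sum over shared sources s of 1 / (|s| - 1)  (ASSUMED formula)."""
    strength = defaultdict(lambda: defaultdict(float))
    for aliases in sources.values():
        unique = sorted(set(aliases))
        if len(unique) < 2:
            continue  # a page with a single alias creates no edge
        share = 1.0 / (len(unique) - 1)  # larger source -> weaker tie per pair
        for a, b in combinations(unique, 2):
            strength[a][b] += share
            strength[b][a] += share
    return strength


def rank_combined(sources, origin, k):
    """Dijkstra ranking with edge length 1 / strength_ab; smaller distance ranks first."""
    strength = combined_strength(sources)
    dist = {origin: 0.0}
    heap = [(0.0, origin)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for neighbor, s in strength[node].items():
            nd = d + 1.0 / s
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    ranked = sorted((d, a) for a, d in dist.items() if a != origin)  # ties alphabetical
    return [a for _, a in ranked[:k]]


# Hypothetical toy data: a shares a large directory page with w, x, z
# and a small personal page with y.
sources = {"dir": ["a", "w", "x", "z"], "home": ["a", "y"]}
print(rank_combined(sources, "a", 4))  # ['y', 'w', 'x', 'z']
```

With this weighting, when every source lists exactly two aliases the strength reduces to C_ab (pure multiple collocation), and when two aliases share a single source it reduces to a pure source-size score.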
Experiment
• Data set derived from CMU web pages
  – 1,978 distinct email aliases
• Data Set Statistics
• Geodesic Alias Distances