Connecting Users across Social Media Sites: A Behavioral-Modeling Approach

Reza Zafarani and Huan Liu
Data Mining and Machine Learning Laboratory (DMML), Arizona State University
KDD 2013, Chicago, Illinois


Page 2: How hard can it be to identify an individual across sites?

Privacy experts claim advertisers know a lot about people.

Can they stop showing you the same repetitive ads across sites?

Page 3: More information about individuals

- Many social media sites
- Partial information
- Complementary information
- Better user profiles

Example: Huan Liu's profiles on two sites

Site       Age   Location    Education
Facebook   N/A   Tempe, AZ   USC
Google+    N/A   USA         USC (1985-89)

Can we connect individuals across sites?

- Connectivity is not available
- Consistency in information availability
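The example table shows why linking profiles pays off: each site fills in, or states more precisely, what the other leaves out. The sketch below merges the two profiles field by field. It assumes the profiles are already known to belong to the same person (exactly the question these slides raise) and that "N/A" marks a hidden field; `merge_profiles` is a hypothetical helper, and keeping every distinct value instead of picking one is a design assumption.

```python
from typing import Dict, List

MISSING = "N/A"  # marker for a field a site does not show


def merge_profiles(profiles: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Collect the distinct non-missing values of each field across site profiles."""
    merged: Dict[str, List[str]] = {}
    for profile in profiles:
        for field, value in profile.items():
            values = merged.setdefault(field, [])
            if value != MISSING and value not in values:
                values.append(value)
    return merged


facebook = {"Age": "N/A", "Location": "Tempe, AZ", "Education": "USC"}
google_plus = {"Age": "N/A", "Location": "USA", "Education": "USC (1985-89)"}
print(merge_profiles([facebook, google_plus]))
# {'Age': [], 'Location': ['Tempe, AZ', 'USA'], 'Education': ['USC', 'USC (1985-89)']}
```

Keeping both locations leaves it to a later step to decide that "Tempe, AZ" refines "USA" rather than contradicts it.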

Page 4: Can we verify that the information provided across sites belongs to the same individual?

Page 5: MOdeling Behavior for Identifying Users across Sites (MOBIUS)

Human behavior generates information redundancy.

Information shared across sites provides a behavioral fingerprint.

MOBIUS:
- Behavioral modeling
- Minimum information

Page 6: Identification Function

Minimum information available on ALL sites: usernames

- Candidate username (john.smith)
- Prior usernames ({jsmith, john.s})
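One way to read the identification function is as a binary decision f(candidate, prior usernames) -> {1, 0}, where 1 means "same individual". The sketch below assumes that reading. `identification_function`, `char_overlap`, and the 0.8 threshold are hypothetical stand-ins: in MOBIUS the decision is learned from behavioral features, not set by hand.

```python
from typing import Callable, List, Sequence, Set

# Hypothetical interfaces: a feature extractor maps (candidate, prior usernames) to a
# numeric vector; a classifier maps that vector to 1 (same individual) or 0.
FeatureExtractor = Callable[[str, Set[str]], List[float]]
Classifier = Callable[[Sequence[float]], int]


def identification_function(candidate: str, priors: Set[str],
                            extract: FeatureExtractor, classify: Classifier) -> int:
    """Return 1 if `candidate` is predicted to belong to the owner of `priors`, else 0."""
    return classify(extract(candidate, priors))


def char_overlap(candidate: str, priors: Set[str]) -> List[float]:
    """Toy feature: share of the candidate's distinct characters seen in any prior username."""
    candidate_chars = set(candidate)
    if not candidate_chars:
        return [0.0]
    prior_chars = set("".join(priors))
    return [len(candidate_chars & prior_chars) / len(candidate_chars)]


def threshold_rule(features: Sequence[float]) -> int:
    """Toy classifier with a hand-picked cut-off of 0.8 (a learned model would replace this)."""
    return int(features[0] >= 0.8)


priors = {"jsmith", "john.s"}
print(identification_function("john.smith", priors, char_overlap, threshold_rule))  # 1
print(identification_function("xyz123", priors, char_overlap, threshold_rule))      # 0
```

Separating feature extraction from classification mirrors the slide's point that only usernames are needed as input; any classifier can be swapped in behind `classify`.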

Page 7

Behavior 1, Behavior 2, ..., Behavior n -> each generates Information Redundancy
Information Redundancy -> captured via Feature Set 1, Feature Set 2, ..., Feature Set n
Feature Sets -> Data -> Learning Framework -> Identification Function
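The flow above (behaviors, then feature sets, then data for a learning framework) can be sketched as concatenating one feature set per behavior into a single vector for each (candidate, priors) pair. The two feature sets below are hypothetical placeholders for illustration; they are not MOBIUS's actual 414 features.

```python
from typing import Callable, Iterable, List, Set, Tuple

# A feature set captures one behavior: it maps (candidate, prior usernames) to numbers.
FeatureSet = Callable[[str, Set[str]], List[float]]


def length_features(candidate: str, priors: Set[str]) -> List[float]:
    """Hypothetical feature set: candidate length and mean prior-username length."""
    mean_prior = sum(len(p) for p in priors) / len(priors) if priors else 0.0
    return [float(len(candidate)), mean_prior]


def exact_match_features(candidate: str, priors: Set[str]) -> List[float]:
    """Hypothetical feature set: 1.0 if the candidate reuses a prior username."""
    return [1.0 if candidate in priors else 0.0]


FEATURE_SETS: List[FeatureSet] = [length_features, exact_match_features]


def feature_vector(candidate: str, priors: Set[str],
                   feature_sets: List[FeatureSet] = FEATURE_SETS) -> List[float]:
    """Concatenate the outputs of every feature set into one vector."""
    vector: List[float] = []
    for feature_set in feature_sets:
        vector.extend(feature_set(candidate, priors))
    return vector


def build_dataset(
    examples: Iterable[Tuple[str, Set[str], int]]
) -> Tuple[List[List[float]], List[int]]:
    """Turn (candidate, priors, label) triples into a feature matrix X and labels y."""
    X: List[List[float]] = []
    y: List[int] = []
    for candidate, priors, label in examples:
        X.append(feature_vector(candidate, priors))
        y.append(label)
    return X, y


X, y = build_dataset([
    ("john.smith", {"jsmith", "john.s"}, 1),
    ("mkay42", {"jsmith", "john.s"}, 0),
])
print(X, y)  # [[10.0, 6.0, 0.0], [6.0, 6.0, 0.0]] [1, 0]
```

Adding a behavior then means appending one more function to `FEATURE_SETS`; the learning step downstream does not change.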

Page 8: Behaviors

- Human limitation
  - Time & memory limitation
  - Knowledge limitation
- Exogenous factors
  - Typing patterns
  - Language patterns
- Endogenous factors
  - Personal attributes & traits
  - Habits

Page 9: Time and Memory Limitation

- Using the same usernames: 59% of individuals use the same username
- Username length likelihood

[Figure: Username Length Likelihood plotted against username length (1 to 12)]
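Two signals from this slide are simple to compute: whether the candidate reuses a prior username outright (the 59% behavior), and how likely the candidate's length is given the lengths a person has chosen before. The sketch assumes a per-person length distribution with add-one smoothing; the cap of 12 (taken from the plot's x-axis range), the smoothing, and the function names are illustrative assumptions, not MOBIUS's exact features.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

MAX_LEN = 12  # hypothetical cap, matching the 1-12 range on the slide's x-axis


def length_likelihood(usernames: Iterable[str], max_len: int = MAX_LEN,
                      alpha: float = 1.0) -> Dict[int, float]:
    """Add-alpha smoothed estimate of P(username length = L) for L = 1..max_len.

    Lengths above max_len are pooled into the max_len bucket; empty strings are ignored.
    """
    counts = Counter(min(len(u), max_len) for u in usernames if u)
    total = sum(counts.values()) + alpha * max_len
    return {L: (counts[L] + alpha) / total for L in range(1, max_len + 1)}


def reuse_and_length_features(candidate: str, priors: Set[str]) -> List[float]:
    """[candidate reuses a prior username, likelihood of the candidate's length under the priors]."""
    reused = 1.0 if candidate in priors else 0.0
    probs = length_likelihood(priors)
    bucket = min(max(len(candidate), 1), MAX_LEN)
    return [reused, probs[bucket]]


print(reuse_and_length_features("john.smith", {"jsmith", "john.s"}))
```

Smoothing keeps unseen lengths from getting probability zero, which matters when a person has only one or two prior usernames.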

Page 10: Knowledge Limitation

- Limited vocabulary: identifying individuals by their vocabulary size
- Limited alphabet: alphabet size is correlated with language, e.g., a name written in Devanagari script -> Shamanth Kumar
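The alphabet side of this slide can be turned into features by comparing the characters and scripts of a candidate with those of the prior usernames. Python's standard library does not expose the Unicode Script property, so the sketch assumes the first word of each character's Unicode name (e.g. "DEVANAGARI LETTER KA") as a rough proxy for its script. The function names are hypothetical; vocabulary-size features would additionally need a word list and are not sketched.

```python
import unicodedata
from typing import Iterable, List, Set


def scripts(username: str) -> Set[str]:
    """Scripts of the letters in a username, read from the first word of each
    character's Unicode name (e.g. 'LATIN SMALL LETTER A' -> 'LATIN')."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in username if ch.isalpha()}


def alphabet_features(candidate: str, priors: Iterable[str]) -> List[float]:
    """[candidate alphabet size, prior alphabet size,
    characters new to the candidate, 1.0 if all its scripts were seen before]."""
    prior_list = list(priors)
    prior_alphabet = set("".join(prior_list))
    candidate_alphabet = set(candidate)
    prior_scripts: Set[str] = set()
    for p in prior_list:
        prior_scripts |= scripts(p)
    return [
        float(len(candidate_alphabet)),
        float(len(prior_alphabet)),
        float(len(candidate_alphabet - prior_alphabet)),
        1.0 if scripts(candidate) <= prior_scripts else 0.0,
    ]


print(scripts("Shamanth Kumar"))                  # {'LATIN'}
print(scripts("\u0915\u0941\u092e\u093e\u0930"))  # Devanagari "kumar": {'DEVANAGARI'}
print(alphabet_features("john.smith", ["jsmith", "john.s"]))  # [9.0, 9.0, 0.0, 1.0]
```

A candidate that introduces a new script (for example, a Latin transliteration of a name previously written only in Devanagari) gets 0.0 in the last position.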

Page 11: Typing Patterns

- QWERTY keyboard and its variants: AZERTY, QWERTZ
- DVORAK keyboard
- Keyboard type impacts your usernames, e.g., QWER1234 (QWERTY) vs. AOEUISNTH (DVORAK)
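A simple way to capture keyboard effects is to measure how much of a username is typed on each row of a given layout. The sketch assumes standard US QWERTY and Dvorak letter rows and ignores number and punctuation keys; AZERTY or QWERTZ could be added as further entries in the hypothetical `LAYOUT_ROWS` table.

```python
from typing import Dict, List

# Letter keys of each row (top, home, bottom); number and punctuation keys are omitted.
LAYOUT_ROWS: Dict[str, List[str]] = {
    "qwerty": ["qwertyuiop", "asdfghjkl", "zxcvbnm"],
    "dvorak": ["pyfgcrl", "aoeuidhtns", "qjkxbmwvz"],
}


def row_fractions(username: str, layout: str) -> List[float]:
    """Share of the username's ASCII letters that sit on each row of `layout`."""
    letters = [ch for ch in username.lower() if ch.isascii() and ch.isalpha()]
    rows = LAYOUT_ROWS[layout]
    if not letters:
        return [0.0] * len(rows)
    return [sum(ch in row for ch in letters) / len(letters) for row in rows]


def keyboard_features(username: str) -> List[float]:
    """Row shares under every layout, plus the share of digits in the username."""
    features: List[float] = []
    for layout in LAYOUT_ROWS:
        features.extend(row_fractions(username, layout))
    digits = sum(ch.isdigit() for ch in username)
    features.append(digits / len(username) if username else 0.0)
    return features


print(row_fractions("QWER1234", "qwerty"))   # [1.0, 0.0, 0.0]
print(row_fractions("AOEUISNTH", "dvorak"))  # [0.0, 1.0, 0.0]
```

Both slide examples land entirely on a single row of their own layout, which is the kind of regularity these features are meant to expose.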

Page 12: Habits: old habits die hard

- Modifying previous usernames: adding prefixes/suffixes, abbreviating, swapping or adding/removing characters (e.g., Nametag and Gateman)
- Creating similar usernames
- Username observation likelihood
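Each modification habit listed above has a standard string test: edit distance for added or removed characters, prefix/suffix checks, subsequence checks for abbreviations, and sorted-character comparison for swaps (Nametag and Gateman are anagrams). The sketch below assumes these particular tests and the hypothetical `modification_features` combination; it is not the paper's exact feature list.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def is_subsequence(short: str, long: str) -> bool:
    """True if `short` can be read off `long` left to right, e.g. an abbreviation."""
    remaining = iter(long)
    return all(ch in remaining for ch in short)


def is_anagram(a: str, b: str) -> bool:
    """True if one username is a rearrangement (swap) of the other's characters."""
    return sorted(a.lower()) == sorted(b.lower())


def modification_features(candidate: str, prior: str) -> List[float]:
    """Hypothetical per-pair features for add/remove, affix, abbreviation, and swap edits."""
    longest = max(len(candidate), len(prior), 1)
    has_affix = candidate.startswith(prior) or candidate.endswith(prior)
    abbreviates = is_subsequence(prior, candidate) or is_subsequence(candidate, prior)
    return [
        1.0 - levenshtein(candidate, prior) / longest,  # normalized edit similarity
        1.0 if has_affix else 0.0,
        1.0 if abbreviates else 0.0,
        1.0 if is_anagram(candidate, prior) else 0.0,
    ]


print(levenshtein("john.s", "john.smith"))     # 4
print(is_subsequence("jsmith", "john.smith"))  # True
print(is_anagram("Nametag", "Gateman"))        # True
```

With several prior usernames, one would compute these per pair and aggregate (for example, take the maximum similarity); that aggregation choice is left open here.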

Usernames come from a language model.
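The language-model view can be illustrated with a character bigram model. Train it on a person's prior usernames, then score how likely a candidate username is. The sketch assumes bigrams, add-one smoothing, and one reserved slot for unseen characters; the class name and boundary markers are illustrative choices, not the model MOBIUS actually uses.

```python
import math
from collections import Counter
from typing import Iterable

START, END = "\x02", "\x03"  # boundary markers unlikely to occur in usernames


class CharBigramModel:
    """Add-one smoothed character bigram model trained on one person's usernames."""

    def __init__(self, usernames: Iterable[str]):
        self.pair_counts: Counter = Counter()     # counts of (previous char, next char)
        self.context_counts: Counter = Counter()  # counts of the previous char
        self.vocab = {END}
        for name in usernames:
            self.vocab.update(name)
            chars = [START, *name, END]
            for prev, nxt in zip(chars, chars[1:]):
                self.pair_counts[(prev, nxt)] += 1
                self.context_counts[prev] += 1
        self.size = len(self.vocab) + 1  # +1 reserves mass for unseen characters

    def log_prob(self, username: str) -> float:
        """Natural-log probability of the username, including the end marker."""
        chars = [START, *username, END]
        return sum(
            math.log((self.pair_counts[(prev, nxt)] + 1)
                     / (self.context_counts[prev] + self.size))
            for prev, nxt in zip(chars, chars[1:])
        )

    def avg_log_prob(self, username: str) -> float:
        """Per-transition log probability, so usernames of different lengths compare fairly."""
        return self.log_prob(username) / (len(username) + 1)


model = CharBigramModel(["jsmith", "john.s"])
print(model.avg_log_prob("john.smith") > model.avg_log_prob("zqxw.vkpy"))  # True
```

Higher-order n-grams would capture longer habits (such as reusing "smith"), at the cost of needing more prior usernames to estimate.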

Page 13: Experiment Setup

Data:

200,000 instances (50% class balance)

414 Features
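The slide does not say how the 50% class balance was obtained. One common construction, assumed here, pairs each person's prior usernames once with their own candidate (positive) and once with another person's candidate (negative). `balanced_pairs` and the toy users are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple

Example = Tuple[str, Set[str], int]  # (candidate, prior usernames, label)


def balanced_pairs(users: Dict[str, List[str]], seed: int = 0) -> List[Example]:
    """Build one positive and one negative example per person (hypothetical scheme).

    `users` maps a person id to that person's usernames on different sites. The last
    username is the candidate and the others are priors; the negative example pairs
    the same priors with another person's candidate.
    """
    rng = random.Random(seed)
    ids = [uid for uid, names in users.items() if len(names) >= 2]
    if len(ids) < 2:
        raise ValueError("need at least two people with two or more usernames each")
    examples: List[Example] = []
    for uid in ids:
        *priors, candidate = users[uid]
        examples.append((candidate, set(priors), 1))
        other = rng.choice([o for o in ids if o != uid])
        examples.append((users[other][-1], set(priors), 0))
    return examples


users = {
    "p1": ["jsmith", "john.s", "john.smith"],
    "p2": ["mkay", "m.kay"],
    "p3": ["zed99", "zed_99"],
}
data = balanced_pairs(users)
print(len(data), sum(label for _, _, label in data))  # 6 3
```

Drawing negatives from real usernames of other people, rather than random strings, keeps the negative class realistic.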

Previous Methods:

1) Zafarani and Liu, 2009

2) Perito et al., 2011

Baselines:

3) Exact Username Match

4) Substring Match

5) Patterns in Letters
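The two string baselines take only a few lines each. Exact username match follows from its name. The slide does not give the exact substring rule, so the version below assumes case-insensitive containment in either direction with a hypothetical minimum length of 3. "Patterns in Letters" is not specified on the slide and is not sketched.

```python
from typing import Sequence, Set


def exact_username_match(candidate: str, priors: Set[str]) -> int:
    """Baseline 3: predict 1 only if the candidate equals a prior username."""
    return int(candidate in priors)


def substring_match(candidate: str, priors: Set[str], min_len: int = 3) -> int:
    """Baseline 4 (assumed reading): predict 1 if the candidate contains a prior
    username or is contained in one, ignoring case and very short strings."""
    c = candidate.lower()
    for prior in priors:
        p = prior.lower()
        if min(len(c), len(p)) >= min_len and (p in c or c in p):
            return 1
    return 0


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Share of predictions that equal the true labels."""
    if not labels:
        raise ValueError("no labels")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


priors = {"jsmith", "john.s"}
print(exact_username_match("john.smith", priors))  # 0
print(substring_match("john.smith", priors))       # 1
```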

Page 14: MOBIUS Performance

Method                    Accuracy (%)
Exact Username Match      77
Substring Matching        63.12
Patterns in Letters       49.25
Zafarani and Liu          66
Perito et al.             77.59
Naïve Bayes (MOBIUS)      91.38

Page 15: Choice of Learning Algorithm

Learning Algorithm            Accuracy (%)
Naïve Bayes                   91.38
J48                           90.87
Random Forest                 93.59
L2-reg L2-loss SVM            93.7
L1-reg L2-loss SVM            93.71
L2-reg logistic regression    93.77
L1-reg logistic regression    93.8

Page 16: Diminishing Returns for Adding More Usernames

Page 17: Conclusions + Future Work

Conclusions:
- Human behavior results in information redundancy
- Information shared across sites acts as a behavioral fingerprint
- A methodology for connecting individuals across sites:
  - A behavioral modeling approach
  - Uses minimum information across sites
  - Allows for integration of additional behaviors when required

Future work:
- Incorporating features indigenous to specific sites
- Discovering applications of connecting users across sites