© 2013 Columbia University E6885 Network Science Lecture 7: Analysis of Network Flow E 6885 Topics in Signal Processing -- Network Science Ching-Yung Lin, Dept. of Electrical Engineering, Columbia University October 21st, 2013


© 2013 Columbia University

E6885 Network Science Lecture 7: Analysis of Network Flow

E 6885 Topics in Signal Processing -- Network Science

Ching-Yung Lin, Dept. of Electrical Engineering, Columbia University

October 21st, 2013


Course Structure

Class Date   Lecture   Topics Covered

09/09/13 1 Overview of Network Science

09/16/13 2 Network Representation and Feature Extraction

09/23/13 3 Network Partitioning, Clustering and Visualization

09/30/13 4 Network Analysis Use Case

10/07/13 5 Network Sampling, Estimation, and Modeling

10/14/13 6 Network Topology Inference

10/21/13 7 Network Information Flow

10/28/13 8 Graph Database

11/11/13 9 Final Project Proposal Presentation

11/18/13 10 Dynamic and Probabilistic Networks

11/25/13 11 Information Diffusion in Networks

12/02/13 12 Impact of Network Analysis

12/09/13 13 Large-Scale Network Processing System

12/16/13 14 Final Project Presentation


Gravity Models

Gravity models are a class of models for describing aggregate levels of interaction among the people of different populations. They are traditionally used in:

– Geography

– Economics

– Sociology

– Hydrology

– Analysis of Computer Network Traffic

For instance, with interaction computed as population × population / distance^2:

– New York <-> Los Angeles = 20,124,377 * 15,781,273 / (2462 miles)^2 = 52.4 million

– El Paso (Texas) <-> Tucson (Arizona) = 703,127 * 790,755 / (263 miles)^2 = 8.0 million

– El Paso (Texas) <-> Los Angeles = 21.0 million

Gravity models are used to predict migration and traffic flow.
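The city-pair figures above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the simplest gravity form (product of populations divided by squared distance); the function name `gravity_interaction` is hypothetical and not part of the lecture.

```python
def gravity_interaction(pop_a: int, pop_b: int, distance_miles: float) -> float:
    """Simplest gravity form: product of populations over squared distance."""
    return pop_a * pop_b / distance_miles ** 2


# Population and distance values are the ones quoted on the slide.
ny_la = gravity_interaction(20_124_377, 15_781_273, 2462)
el_paso_tucson = gravity_interaction(703_127, 790_755, 263)

print(f"New York <-> Los Angeles: {ny_la / 1e6:.1f} million")        # 52.4 million
print(f"El Paso <-> Tucson:       {el_paso_tucson / 1e6:.1f} million")  # 8.0 million
```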


Common Gravity Model

The general gravity model specifies that the traffic flows Z_ij be in the form of counts, with independent Poisson distributions and mean functions of the form:

$$E(Z_{ij}) = h_O(i)\, h_D(j)\, h_S(c_{ij})$$

– h_O(i): positive function of the origin i

– h_D(j): positive function of the destination j

– h_S(c_ij): function of the separation attributes c_ij (distance, cost, etc.)

Some commonly used (standard) forms:

$$h_O(i) = (\pi_{O,i})^{\alpha}, \qquad h_D(j) = (\pi_{D,j})^{\beta}, \qquad h_S(c_{ij}) = (c_{ij})^{-\theta}$$


Example: Austrian call data

Phone traffic between 32 telecommunication districts in Austria in 1991.

Call flow volume versus each of origin Gross Regional Product (GRP), destination GRP, and distance.

Linear regression (dotted), and a nonparametric smoother (solid)


Inference for Gravity Models

Focusing on this model (general gravity model):

$$\log \mu_{ij} = \alpha_i + \beta_j + \theta^{T} c_{ij}$$

where μ_ij = E(Z_ij). A generic iteratively re-weighted least-squares method can be used.
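As a rough illustration of iteratively re-weighted least squares (IRLS) for a Poisson log-linear model, the sketch below fits a simplified design matrix (intercept, log origin size, log destination size, log distance) rather than the per-district effects α_i and β_j. The function name `fit_poisson_irls`, the synthetic data, and the coefficient values are all assumptions made for illustration; they are not taken from the lecture or from the Austrian call data.

```python
import numpy as np


def fit_poisson_irls(X, y, n_iter=100, tol=1e-8):
    """Fit log E[y] = X @ coef by IRLS (Newton's method for Poisson regression)."""
    eta = np.log(y + 0.5)              # starting linear predictor
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(eta)               # current fitted means
        z = eta + (y - mu) / mu        # working response
        XtW = X.T * mu                 # Poisson working weights are mu
        new_coef = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ new_coef
        converged = np.max(np.abs(new_coef - coef)) < tol
        coef = new_coef
        if converged:
            break
    return coef


# Synthetic, noise-free "flows" (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
n = 200
log_origin = rng.uniform(8, 12, n)
log_dest = rng.uniform(8, 12, n)
log_dist = rng.uniform(3, 7, n)
X = np.column_stack([np.ones(n), log_origin, log_dest, log_dist])
true_coef = np.array([-5.0, 0.8, 0.6, -1.2])
y = np.exp(X @ true_coef)

coef_hat = fit_poisson_irls(X, y)
print(coef_hat)
```

Because the synthetic flows equal their expected values, the fitted coefficients should match `true_coef` closely; with real counts they would only approximate the underlying parameters.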


Example: Gravity Model

Accuracy of estimates of traffic volume made by the standard (left, in blue) and general (right, in green) gravity models for the Austrian call data.

The standard model tends to over-estimate somewhat more frequently than the general model, particularly for medium- and low-volume flows.

The relative error decreases with volume.


SIR Model

Model human nodes as S-I-R (Susceptible, Infected, and Removed).

Limitations: the model does not treat each node's behavior distinctly within the network structure/topology, and it does not consider edge status.

S → I → R
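For readers who want to see the S → I → R progression numerically, here is a minimal sketch of the classic population-level (compartmental) SIR dynamics, which ignores network structure exactly as noted above. The rate parameters `beta` and `gamma`, the Euler step `dt`, and the function name are assumptions chosen for illustration; the slide does not specify them.

```python
def simulate_sir(beta, gamma, i0, steps, dt=0.1):
    """Euler integration of dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I.

    S, I, R are population fractions that sum to 1.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt   # S -> I
        new_removals = gamma * i * dt        # I -> R
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        history.append((s, i, r))
    return history


# Hypothetical parameter values, for illustration only.
history = simulate_sir(beta=0.5, gamma=0.1, i0=0.01, steps=1000)
print(history[-1])
```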


Modeling and Predicting Personal Information Dissemination Behavior

Xiaodan Song, Ching-Yung Lin, Belle Tseng, and Ming-Ting Sun -- KDD 2005


Utilizing relational and temporal info provides more insight than pure content analysis

What are a person’s

– role in events?

– contacts for discussing what is going on in the company?

– behavior evolution?

– interests, tastes?

In a certain event,

– who played the most influential roles?

– who knew the information?

– how will a person or a group of people respond to future events?

[Figure: email and publication records along a time axis]


Outline

Motivation

The Content-Time-Relation (CTR) model

Experimental results

Conclusions and ongoing work


Motivation

Goal

– Personal information management

– Modeling and Predicting Personal Behaviors

Prior-Art Systems -- LinkedIn, Orkut, Friendster, Yahoo! 360°

– Share what matters to you

•Create your own place online

•Share photos

•Create a blog

•List your favorites

•Send a blast, and more

– Keep your friends and family close

– Control who sees what

•Share as much as you want, with whomever you want

Tools for visually managing personal social networks

However, in current solutions

– Users need to manually input, update, and manage these networks

– Do not model or predict personal behaviors


Enron Email Dataset


– A huge collection of real e-mail messages sent and received by employees of the Enron corporation.

– 493,391 emails from 154 users within 1999 – 2002 (1999: 11,196; 2000: 196,157; 2001: 272,875; 2002: 35,922)

– Unique messages – 166,653, Intra-Enron messages – 25,428


Overview of CTR Model

CTR model incorporates content, time and relations in a generative probabilistic way

[Figure: plate diagram of the CTR model; shaded nodes denote observations]

Input: Emails

From: [email protected]

To: [email protected]

Subject: Re: timing of submitting information to Risk Controls

Good memo - let me know if …

Information Extraction

– People: Jeff … (154)

– Role: Sender/Receiver

– Content: Bag of Words

– Time: 1999 – 2002

CommunityNet

CTR Model

Applications

– Receiver recommendation system

– Prediction

– Filtering

Example: Topic: California Energy, Time: 2000-2001


Related Work (I) – Social Network Analysis

Static Social Network Analysis

– Small world: six degrees of separation [Milgram 1967]

– Introduce “link analysis” into information retrieval (PageRank [BP ’98], HITS [K ’98])

– Mine communities from the web [Flake 2002]

– Mining the network value of customers (Domingos et al. 2001, Kempe et al. 2003)

– Exponential Random Graph Model (ERGM [Wasserman et al.1996])

Dynamic Social Network Analysis

– Link prediction [Liben-Nowell and Kleinberg 2003]

– Tracking network changes [Kubica et al. 2002]

– Dynamic actor-oriented social network [Snijders 2003]


Related Work (II) – Content Analysis

Latent Semantic Analysis (LSA) [Deerwester et al. 1990]

– Capture the semantic concepts of documents by mapping words into a latent semantic space, which captures the possible synonymy and polysemy of words

– Based on truncated SVD of the document-term matrix: optimal least-squares projection to reduce dimensionality

– [Figure: LSA decomposition X ≈ T0 · S0 · D0, with dimensions (N x M) ≈ (N x K)(K x K)(K x M); the rows of X are terms and the columns are documents]

Probabilistic LSA [Hofmann 1999]

– Statistical view of LSA

Latent Dirichlet Allocation (LDA) [Blei et al. 2003]

– A generative model which assigns Dirichlet priors to the class modeling at PLSA

– Assume a document is a mixture of topics

Author-Topic model [Rosen-Zvi et al. 2004]

– Try to recognize which part of the document is contributed by which co-author. A document with multiple authors is a mixture of the distributions associated with authors.

– Each author is associated with a multinomial distribution over topics; each topic is associated with a multinomial distribution over words

Author-Recipient-Topic model [McCallum et al. 2005]

– Given the sender and the set of receivers of an email, find senders who have similar roles in events

None of the previous models uses temporal information and social/relational information


Our Contribution

Assumption

– People tend to send emails to different groups of people regarding different topics during different time periods

Approach

– Identify context

•Who does one user communicate with regarding a given topic?

– Identify temporal evolution

•How do relations change over time?

Content-Time-Relation Model (CTR model)

– Content: who knows what

– Time: networks grow & decay; information diffusion

– Social: who knows who; people influence each other; information flow


Content-Time-Relation Algorithm -- I

Content-Relation (CR)

– Content topic classification

– Integrate social network model

Combine content and social relation information with Dirichlet allocations, a causal Bayesian network and an Exponential Random Graph Social Network Model.

[Figure: plate diagrams of the CR model and of the model of [McCallum et al., 2005]; shaded nodes are observations]

a: sender/author, z: topic, S: social network (Exponential Random Graph Model / p* model), D: document/email, r: receivers, w: content words, N: word set, T: topic

CR model. Given the sender of an email:

1. Get the probability of a topic given the sender

2. Get the probability of the receiver given the sender and the topic

3. Get the probability of a word given the topic

For comparison, [McCallum et al., 2005]. Given the sender and the set of receivers of an email:

1. Pick a receiver

2. Get the probability of a topic given the sender and receiver

3. Get the probability of a word given the topic


Content-Time-Relation Algorithm -- II

Content-Time-Relation (CTR)

– Topic + time -> event

– Capture evolutionary information

– Integrate social network model

Combine content, time and social relation information with Dirichlet allocations, a causal Bayesian network and an Exponential Random Graph Social Network Model.

[Figure: plate diagrams of the CTR model and of the model of [McCallum et al., 2005]; shaded nodes are observations]

a: sender/author, z: topic, S: social network (Exponential Random Graph Model / p* model), D: document/email, r: receivers, w: content words, N: word set, T: topic

CTR model. Given the sender and the time of an email:

1. Get the probability of a topic given the sender

2. Get the probability of the receiver given the sender and the topic

3. Get the probability of a word given the topic

For comparison, [McCallum et al., 2005]. Given the sender and the set of receivers of an email:

1. Pick a receiver

2. Get the probability of a topic given the sender and receiver

3. Get the probability of a word given the topic


CTR algorithm

Training phase

–Input

• Old emails with content, sender and receiver information, and time stamps

–Output

$$P(w \mid z, t_{old}),\quad P(z \mid d, t_{old}),\quad \text{and}\quad P(u, r \mid z, t_{old})$$

Testing phase

–Input

• New emails with content and time stamps

–Output

$$P(u, r \mid d, t_{new}),\quad P(w \mid z, t_{new}),\quad \text{and}\quad P(z \mid d, t_{new})$$
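One plausible way to turn the trained distributions into a receiver ranking for a new email is to marginalize over topics. The sketch below is an assumption about how the pieces could be combined, not the authors' implementation: it fixes one sender, ignores the time index, and uses made-up probability tables. The function name `predict_receivers` is hypothetical.

```python
import numpy as np


def predict_receivers(p_r_given_z, p_z_given_d):
    """Score receivers for one sender: sum_z P(z | d) * P(r | sender, z).

    p_r_given_z: array of shape (n_topics, n_receivers), rows sum to 1.
    p_z_given_d: array of shape (n_topics,), sums to 1.
    """
    return p_z_given_d @ p_r_given_z


# Hypothetical numbers: 2 topics, 3 candidate receivers.
p_r_given_z = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.1, 0.8]])
p_z_given_d = np.array([0.8, 0.2])

scores = predict_receivers(p_r_given_z, p_z_given_d)
ranking = [int(j) for j in np.argsort(-scores)]
print(scores, ranking)
```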


Adaptive CTR

Social networks dynamically change and evolve over time, so updating the model with the newest user behavior information is necessary.

–Aggregative updating: update the model by adding new user behavior information, including the senders and receivers, into the model

–Sliding-window updating: assume the correlation between current data and previous data decays over time, so the more recent data are more important.

• A sliding window of size n is used to choose the data for building the prediction model

• The prediction depends only on the recent data; the influence of old data is ignored

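[Equation: the adaptive estimate $\hat{P}(u, r \mid d, t_i)$ is written as the sum of two terms: a sum over the $K$ topics $z_k$ learned from the old data, involving $P(u, r \mid z_k, t_{old})$, plus a sum over the topics that are in $z_{t_i}$ but not in $z_{t_{old}}$]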


Personal Social Network

PSN: whom a user contacts during a certain time period

[Figure: personal social networks for three periods: (a) Jan-’99 to Dec-’99, (b) Jan-’00 to Jun-’00, (c) Jul-’00 to Dec-’00]

$$P(r \mid u) = \frac{\text{number of times } u \text{ sends emails to } r}{\text{total number of emails sent out by } u}$$
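The ratio P(r | u) defined above (emails from u to r, divided by all emails sent by u) can be computed directly from an email log. A minimal sketch, assuming each log entry is a (sender, list of receivers) pair; the function name and the toy log are hypothetical.

```python
from collections import Counter, defaultdict


def personal_social_network(emails):
    """Return {u: {r: P(r | u)}} from (sender, receivers) pairs.

    P(r | u) = (# emails u sent that include r) / (# emails u sent).
    With multi-recipient emails the values for one sender need not sum to 1.
    """
    sent_total = Counter()
    sent_to = defaultdict(Counter)
    for sender, receivers in emails:
        sent_total[sender] += 1
        for receiver in set(receivers):
            sent_to[sender][receiver] += 1
    return {
        u: {r: count / sent_total[u] for r, count in counts.items()}
        for u, counts in sent_to.items()
    }


# Toy log, for illustration only.
emails = [
    ("jeff", ["mary", "james"]),
    ("jeff", ["mary"]),
    ("mary", ["jeff"]),
]
psn = personal_social_network(emails)
print(psn["jeff"])
```

Restricting `emails` to a date range before calling the function gives the PSN for that time period.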


CommunityNet

[Figure: CommunityNet visualizations for the queries “Christmas” and “Energy”]

Provide a query → apply the CTR model → visualize the personal topic community with CommunityNet


Topic Analysis - Hot and cold topics

Hot Topics (Regular Issues)

Meeting: meeting, plan, conference, balance, presentation, discussion

Deal: deal, desk, book, bill, group, explore

Petroleum: petroleum, research, dear, photo, Enron, station

Texas: Houston, Texas, Enron, north, America, street

Document: letter, draft, attach, comment, review, mark

Cold Topics (Specific or Sensitive Issues)

Trade: trade, London, bank, name, Mexico, conserve

Stock: stock, earn, company, share, price, new

Network: network, world, user, save, secure, system

Project: court, state, India, server, project, govern

Market: call, market, week, trade, description, respond


Topic Trends - yearly repeating events

[Figure: Topic Trends. Popularity (0 to 0.03) by month (Jan to Nov) for Topic 45 and Topic 19 in 2000 and 2001]

Topic 45, which concerns a scheduling issue, peaks from June to September. Topic 19 concerns a meeting issue. The trends repeat from year to year.


CTR Model Finds Topic Categories, Key People and Communities Simultaneously

Topic Analysis for Topic 61

[Figure: Topic Trend of “California Power”. Popularity (0 to 0.018) from Jan-00 to Oct-01]

Key Words: power 0.089361, California 0.088160, electrical 0.087345, price 0.055940, energy 0.048817, generator 0.035345, market 0.033314, until 0.030681

Key People: Jeff_Dasovich 0.249863, James_Steffes 0.139212, Richard_Shapiro 0.096179, Mary_Hain 0.078131, Richard_Sanders 0.052866, Steven_Kean 0.044745

The event “California Energy Crisis” occurred at exactly this time period.

Key people can be identified to be active in this event.


Personal Topic Trends of “California Power”

[Figure: popularity (0 to 0.5) from Jan-00 to Sep-01 for the overall trend, Jeff_Dasovich, and Vince_Kaminski]


Predicting Email Receivers

Personal social network

– People tend to send emails to the same group of people

Latent Dirichlet Allocation with personal social network (LDA-PSN)

– Topic clusters do not change over time

Content-Time-Relation model

Adaptive CTR model

[Figure: comparison using Breese evaluation metrics (axis 0 to 45), Jan-01 to Nov-01, for Adaptive CTR (aggregative), Adaptive CTR (6 months), CTR, LDA-PSN, and PSN]


CTR Model: Predicting Receivers

Prediction Performance

Is a person’s behavior predictable?

Jeff Dasovich (Enron government relations executive): With whom should I discuss a “Government” issue?

Personal behavior and intention are somewhat predictable

[Figure: prediction accuracy (0 to 1) over time, Jan-01 to Nov-01, by PSN, by LDA-PSN, by CTR, and by Adaptive CTR]


Conclusions and ongoing work

Conclusion

– Automatically model and predict human behavior of receiving and disseminating information

– Establish personal CommunityNet profiles based on the Content-Time-Relation algorithm, which incorporates contact, content, and time information simultaneously from personal communication

– Explore many interesting results:

• Finding the most important employees in events

• Predicting senders or receivers of emails

– Perform better than both the social network-based and the content-based predictions

Personal behavior and intention are somewhat predictable

Ongoing work

– Incorporate nonparametric Bayesian methods such as hierarchical LDA with contact and time information

– Extend the CTR model to Content-Time-Context model for personalized Retrieval and Recommendation


Personalized Recommendation Driven by Information Flow

Xiaodan Song, Belle Tseng, Ching-Yung Lin and Ming-Ting Sun -- SIGIR 2006


Recommendation by Collaborative Filtering (CF)

[Figure: users A and B are people with similar tastes; given that one of them adopts an item, infer whether the other adopts it]

Similarity is symmetric!


Adoptions follow a sequence

[Figure: number of accessed users over time, Apr. 2004 to Jul. 2005, with an early adopter and a late adopter marked]


Rogers’ Diffusion of Innovations Theory

Users’ adoption patterns: Some users tend to adopt items earlier than others

Percentage over all adopters:

– Innovators: 2.5

– Early adopters: 13.5

– Early majority: 34

– Late majority: 34

– Laggards: 16


Recommendation Driven by Information Flow – An Intuitive Example

[Figure: people with similar tastes ordered as Innovators, Early adopters, Early majority, Late majority, Laggards. When an earlier adopter adopts an item, a later adopter is most likely to adopt it too; when a later adopter adopts an item, an earlier adopter is less likely to follow.]

Influence is not symmetric!


Utilize Information Flow for Personalized Recommendation -- Problem Formulation

The typical CF question: What items will user U like?

Our Formulation

Given user U adopts item Y, who would be likely to adopt item Y next?

Information “flows” from earlier adopters to later adopters (from innovators toward laggards)


Analogy: Information Adoption As A Diffusion Process

Given user U adopts item Y, who would be likely to adopt item Y next?

Information Adoption ↔ Information Flow (Diffusion)

In physics, a diffusion process is usually related to a random walk [R. Kondor and J.-P. Vert, Diffusion Kernels, 2004]

Information Adoption is modeled as a random walk

Users are ranked by the state probabilities


Scheme Overview (I)

Leverage the asymmetric influence

–Information Flow Network (IF) – model the asymmetric influences between users

–Information Propagation Model – if a user adopts the information, who will likely be the follower?

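[Figure: scheme pipeline. Dataset records with the fields User ID, Item ID, and Timestamp (example values shown: 1, 5, 2005/3/1) → IF → Information Propagation Model → Application: Personalized Recommendation. The IF is drawn as a graph with nodes u, v, r1, r2, r3.]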


Scheme Overview (II)

Adoption patterns are typically category specific

–Topic-Sensitive Information Flow Network (TIF) – model the asymmetric influences between users under the same topic

[Figure: scheme pipeline extended with Topic Detection and TIF. Dataset (User ID, Item ID, Timestamp) → Topic Detection → TIF / IF → Information Propagation Model → Application: Personalized Recommendation]


IF (I)

Objective

–Model the asymmetric influences between users

Early Adoption Matrix (EAM)

–Count how many items one user adopts earlier than the other – pairwise comparison

         User 1   User 2   …   User N
User 1   0        10       …   8
User 2   27       0        …   18
…        …        …        …   …
User N   0        30       …   0
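The pairwise counting behind the Early Adoption Matrix can be sketched as follows. This assumes that entry (i, j) counts the items that user i adopted strictly earlier than user j (among items both adopted), and that the log is a list of (user index, item ID, timestamp) triples; the function name and toy log are hypothetical.

```python
from itertools import combinations

import numpy as np


def early_adoption_matrix(log, n_users):
    """EAM[i, j] = number of items user i adopted strictly earlier than user j."""
    # Keep each user's first adoption time per item.
    first_time = {}
    for user, item, ts in log:
        key = (item, user)
        if key not in first_time or ts < first_time[key]:
            first_time[key] = ts

    # Group adopters by item.
    adopters_by_item = {}
    for (item, user), ts in first_time.items():
        adopters_by_item.setdefault(item, []).append((user, ts))

    # Pairwise comparison of adoption times for every item.
    eam = np.zeros((n_users, n_users), dtype=int)
    for adopters in adopters_by_item.values():
        for (u, tu), (v, tv) in combinations(adopters, 2):
            if tu < tv:
                eam[u, v] += 1
            elif tv < tu:
                eam[v, u] += 1
    return eam


# Toy log (user, item, day number), for illustration only.
log = [(0, "a", 1), (1, "a", 2), (2, "a", 3), (1, "b", 1), (0, "b", 5)]
eam = early_adoption_matrix(log, n_users=3)
print(eam)
```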


IF (II)

IF – A Random Walk Model

–Network structure

• Each user as a node (state)

• The value on edge (i → j) represents how likely user j will follow user i to adopt the information

–Normalize EAM to a Transition probability Matrix – F

[Figure: an edge from node i to node j with weight F_ij]


IF (III)

The random walk over graphs that contain a sink or a cycle does not converge.

[Figure: two example graphs, a sink and a cycle]

Make the random walk have a unique stationary distribution (= F + random jump)

1) Make the matrix stochastic

$$F_{u,v} = \begin{cases} F_{u,v} & \text{if } \sum_{v} F_{u,v} \neq 0 \\ \dfrac{1}{N} & \text{else} \end{cases}$$

2) Make the matrix irreducible

$$F = \alpha F + \frac{(1 - \alpha)}{N}\, e e^{T}$$

N: number of the nodes; e: vector of all ones. In both formulas the left-hand side is the updated matrix.
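The two repairs (stochastic, then irreducible) resemble the usual PageRank construction and can be sketched in NumPy as below. The row normalization of the EAM, the default `alpha=0.85`, and the power-iteration helper are assumptions made for illustration; the slides do not state a value for α or how the stationary distribution is computed.

```python
import numpy as np


def to_transition_matrix(eam, alpha=0.85):
    """Row-normalize an EAM, fix sink rows, and add a uniform random jump."""
    eam = np.asarray(eam, dtype=float)
    n = eam.shape[0]
    row_sums = eam.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)   # avoid division by zero
    # 1) Stochastic: rows with no outgoing weight (sinks) become uniform 1/N.
    F = np.where(row_sums > 0, eam / safe_sums, 1.0 / n)
    # 2) Irreducible: mix with a uniform jump, alpha*F + (1 - alpha)/N * e e^T.
    return alpha * F + (1.0 - alpha) / n * np.ones((n, n))


def stationary_distribution(F, n_iter=1000, tol=1e-12):
    """Power iteration for the row vector pi with pi = pi @ F."""
    pi = np.full(F.shape[0], 1.0 / F.shape[0])
    for _ in range(n_iter):
        nxt = pi @ F
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt
    return pi


# Toy EAM in which user 2 is a sink (never adopts before anyone).
F = to_transition_matrix([[0, 1, 1], [1, 0, 1], [0, 0, 0]])
pi = stationary_distribution(F)
print(F)
print(pi)
```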


TIF: Topic Detection

Latent Dirichlet Allocation [Blei et al. 2003]

[Figure: LDA plate diagram with nodes α, θ, z, w, β and plates W, D, T (shaded nodes are observations), used for topic detection. Items are grouped into TOPIC 1, TOPIC 2, and TOPIC 3, and a TIF is built for each topic.]


Information Propagation Models (I)

1. Summation of various propagation steps

$$F_{if}(m) = F + F^{2} + \cdots + F^{m}$$

A special case: when $m = N - 1$ and $N \to \infty$,

$$F_{if}(N-1) \to F^{N} = U \begin{bmatrix} 1 & & & \\ & 0 & & \\ & & \ddots & \\ & & & 0 \end{bmatrix} U^{T}$$

where U: eigenvectors; N: number of the nodes

[Figure: graph with nodes u, v, r1, r2, r3]
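The summation model adds up the transition matrices for paths of length 1 through m. A minimal sketch, assuming F is a NumPy transition matrix and that followers of a trigger user are ranked by the corresponding row of the sum; the function names and the example matrix are hypothetical.

```python
import numpy as np


def propagate_sum(F, m):
    """Return F + F^2 + ... + F^m."""
    F = np.asarray(F, dtype=float)
    total = np.zeros_like(F)
    power = np.eye(F.shape[0])
    for _ in range(m):
        power = power @ F      # F^k for k = 1..m
        total += power
    return total


def rank_followers(F, trigger, m=3):
    """Rank other users by how strongly information flows to them from `trigger`."""
    scores = propagate_sum(F, m)[trigger]
    return [int(j) for j in np.argsort(-scores) if j != trigger]


# Example transition matrix (rows sum to 1), for illustration only.
F_example = np.array([[0.0, 0.7, 0.3],
                      [0.5, 0.0, 0.5],
                      [0.5, 0.5, 0.0]])
followers = rank_followers(F_example, trigger=0, m=1)
print(followers)
```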


Information Propagation Models (II)

2. Exponential weighted summation

The longer the path, the less reliable it is

$$F_{if}^{(\exp)} \propto \exp(\beta F) = U \begin{bmatrix} \exp(\beta \cdot 1) & & & \\ & \exp(\beta \cdot \lambda_{2}) & & \\ & & \ddots & \\ & & & \exp(\beta \cdot \lambda_{N}) \end{bmatrix} U^{T}$$

where eigenvalues $1 = \lambda_{1} > \lambda_{2} > \cdots > \lambda_{N}$

N: number of the nodes

[Figure: nodes u and v connected through intermediate nodes r1, r2, r3]
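One way to see why this weighting discounts long paths is the series exp(βF) = I + βF + β²F²/2! + ..., where a path of length k carries weight β^k / k!. The sketch below evaluates a truncated version of that series in plain Python. The function names, the truncation length, and the 2-node example are assumptions made for illustration.

```python
import math


def matmul(A, B):
    """Plain-Python matrix product of two list-of-lists matrices."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]


def exp_weighted(F, beta, terms=60):
    """Approximate exp(beta * F) by the first `terms` terms of its power series."""
    n = len(F)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]          # k = 0 term is the identity
    for k in range(1, terms):
        # term_k = term_{k-1} * F * beta / k, i.e. (beta^k / k!) F^k
        term = [[beta / k * x for x in row] for row in matmul(term, F)]
        total = [[total[i][j] + term[i][j] for j in range(n)]
                 for i in range(n)]
    return total


# Hypothetical symmetric flow matrix with eigenvalues 1 and 0.
F = [[0.5, 0.5],
     [0.5, 0.5]]
beta = 4.0
E = exp_weighted(F, beta)
# Because F @ F == F here, exp(beta * F) = I + (e^beta - 1) * F.
closed_form_offdiag = (math.exp(beta) - 1.0) / 2.0
```

For this example the eigenvalue-1 direction is scaled by exp(β · 1) and the eigenvalue-0 direction by exp(β · 0) = 1, which matches the diagonal form on the slide.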

Personalized Recommendation

Construct IF or TIF based on the historical data

Trigger earliest users to start the process

Predict who will also be interested in these items by information propagation models
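The three steps can be sketched end to end as follows. The counting rule in build_if (add one to F[u][v] each time v adopts an item strictly after u) is a hypothetical stand-in, since this section does not spell out how IF is built. The log records, user names, and function names are also made up for illustration, and only one propagation step is used.

```python
def build_if(log, users):
    """Build a row-normalized flow matrix from (user, item, timestamp) records.

    Assumed rule: F[u][v] counts how often v adopted an item after u did.
    Rows with no outgoing flow are left as all zeros.
    """
    index = {u: i for i, u in enumerate(users)}
    n = len(users)
    F = [[0.0] * n for _ in range(n)]
    by_item = {}
    for user, item, ts in log:
        by_item.setdefault(item, []).append((ts, user))
    for adopters in by_item.values():
        adopters.sort()
        for a in range(len(adopters)):
            for b in range(a + 1, len(adopters)):
                (ta, u), (tb, v) = adopters[a], adopters[b]
                if ta < tb and u != v:
                    F[index[u]][index[v]] += 1.0
    for row in F:
        total = sum(row)
        if total:
            row[:] = [x / total for x in row]
    return F


def recommend(F, users, triggered, k):
    """Rank non-triggered users by one-step flow from the triggered users."""
    index = {u: i for i, u in enumerate(users)}
    triggered = set(triggered)
    scores = {v: sum(F[index[u]][index[v]] for u in triggered)
              for v in users if v not in triggered}
    ranked = sorted(scores, key=lambda v: (-scores[v], v))
    return ranked[:k]


# Hypothetical adoption log: (User ID, Item ID, Timestamp).
log = [("alice", "doc1", 1), ("bob", "doc1", 2), ("carol", "doc1", 3),
       ("alice", "doc2", 1), ("bob", "doc2", 5),
       ("bob", "doc3", 1), ("carol", "doc3", 2)]
users = ["alice", "bob", "carol"]
F = build_if(log, users)
top = recommend(F, users, triggered=["alice"], k=2)
```

In this toy log bob follows alice on two items and carol on one, so triggering alice ranks bob ahead of carol.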


Experimental Setup

Sales-force dataset

– Apr. 2004 to Apr. 2005 as training data

– May 2005 to Jul. 2005 as test data

– 1033 users, 586 documents

MovieLens dataset

– 943 users, 1682 movies, 100,000 actions

– Log data for the earliest 80% of disclosed movies as training data, the latest 20% as test data

Evaluation

– Baseline – Collaborative Filtering (CF)

– Metric – Precision & Recall
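For reference, precision and recall for one item can be computed as below. This is a generic sketch: the function name and the example users are illustrative, and it assumes the recommended list has no duplicates.

```python
def precision_recall(recommended, relevant):
    """Precision and recall of a recommended user list against the true adopters."""
    recommended = list(recommended)
    relevant = set(relevant)
    hits = sum(1 for user in recommended if user in relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical case: three users recommended, two users actually adopted.
p, r = precision_recall(["bob", "carol", "dave"], {"bob", "erin"})
```

Here one of three recommendations is correct (precision 1/3) and one of two true adopters is found (recall 1/2).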

Consistency of Early Adoption Patterns

How consistent are users' pairwise adoption behaviors over time?

Calculate Transition Prob. Matrices (TPM) of both training and test data

For each user i, calculate the correlation value of

– The ith row from TPM of the test data and uniform(1/(N-1)) (Baseline 1)

– The ith row from TPM of the test data and uniform(1/M) (M is the number of users used in CF) (Baseline 2)

– The ith rows from these two TPMs (IF)

[Figure: two histograms, ER and MovieLens; x-axis: Correlation Value (0 to 1); y-axis: Number of Users; series: Baseline 1, Baseline 2, IF]
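The per-user comparison of the two TPMs can be sketched as below. The slides do not say which correlation measure is used, so Pearson correlation is an assumption here, as are the function names and the toy matrices. Pearson correlation is undefined when either row is constant, in which case the sketch returns None.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length vectors, or None if undefined."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def row_consistency(tpm_train, tpm_test):
    """Correlation between the ith training row and the ith test row, per user."""
    return [pearson(a, b) for a, b in zip(tpm_train, tpm_test)]


# Hypothetical TPMs for two users over three possible followers.
tpm_train = [[0.0, 0.7, 0.3],
             [0.5, 0.0, 0.5]]
tpm_test = [[0.0, 0.6, 0.4],
            [0.5, 0.0, 0.5]]
values = row_consistency(tpm_train, tpm_test)
```

A histogram of these per-user values is the kind of plot the slide shows for each dataset.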

Experimental Results --Recommendation Quality

Compared to Collaborative Filtering (CF):

– Precision: IF is 91% better, TIF is 108% better

– Recall: IF is 87% better, TIF is 113% better

[Figure: Precision Comparison and Recall Comparison (Number of Triggered Users = 1, Propagation Steps = 1); x-axis: Number of recommended users (1 to 4); y-axes: Precision, Recall; series: CF, IF, TIF (labeled CF, EABIF, TEABIF in the chart legend)]

Experimental Results -- Propagation Performance

[Figure: Precision Improvement Comparison and Recall Improvement Comparison (Number of triggered users = 1, Baseline: CF); x-axis: propagation model (m = 1, m = 2, m = 3, m = 4, m = 5, sum, exp(β = 1), exp(β = 1.5), exp(β = 2), exp(β = 3), exp(β = 4), exp(β = 5), exp(β = 8), exp(β = 16)); y-axis: Ratio (x100%); series: IF, TIF (labeled EABIF and TEABIF in the chart legend)]

TIF with exponential weighted summation (β = 4) achieves the best performance: it improves precision by 136% and recall by 126% compared to CF

Experimental Results --Recommendation Quality

[Figure: four panels, Precision Comparison and Recall Comparison for Number of Triggered Users = 1 and for Number of Triggered Users = 2 (Propagation Steps = 1); x-axis: Number of recommended users (1 to 4); y-axes: Precision, Recall; series: CF, IF, TIF (labeled CF, EABIF, TEABIF in the chart legend)]

Compared to Collaborative Filtering (CF):

– Precision: IF is 91% better, TIF is 108% better

– Recall: IF is 87% better, TIF is 113% better

Conclusions and Next Steps

Conclusions

– Utilize sequential adoption patterns

– Leverage asymmetric influences between users – IF

– Leverage category-specific patterns – TIF

– Identify how information flows through the network – information propagation models

Next Steps

– Leverage the diffusion rate

– Improve the information propagation models

– Evaluate by online user study

Final Project

Team work: 1 – 3 people per team

Project idea proposal: no later than 10/30/2013, please send me and the TA an email with your initial rough idea / plan, to discuss with me / the TA.

Project initial preparation & plan presentation: 11/11/2013, 5 mins per person. Each team will present together.

Potential Topics:

– Classification & Prediction of People Behavior in Networks

– Predicting Network Characteristics Evolution

– Visualizing Networks

– Verifying fitness of various network models to practical data

– Anything related to networks…

Anything on Networks

Formation of Network

– Communications

– Information

– People

– Companies / Organizations

– Nations

Network Data Collection

Network Science Infrastructure

Network Applications

Network Visualization

Network Sampling, Indexing and Compression

Network Flow

Network Evolution and Dynamics

Network Impact

Cognitive Networks

Electrical Engineering

Computer Science

Sociology, Public Health

Economics, Management, Politics

International Relationships, History

Physics

Law

Arts, Math

Bio, Cognition, Behavior Science

Math

Other possible topics not discussed in the textbook

Network compression

Cognitive networks

Mobile applications

Graph databases