22
Trusted Decentralized PageRank Introduction JXP TrustJXP Experimental Results Conclusion and Future Work Efficient and Decentralized PageRank Approximation in P2P Networks with Malicious Agents Josiane Xavier Parreira , Debora Donato , Carlos Castillo , Gerhard Weikum Max-Planck Institute for Informatics Yahoo! Research Barcelona The Future of Web Search June 18, 2007

Efficient and Decentralized PageRank Approximation in P2P ...tfws07/program/talks/donato.pdf · Approximation in P2P Networks with Malicious Agents Josiane Xavier Parreira?, Debora

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Efficient and Decentralized PageRank Approximation in P2P ...tfws07/program/talks/donato.pdf · Approximation in P2P Networks with Malicious Agents Josiane Xavier Parreira?, Debora

TrustedDecentralized

PageRank

Introduction

JXP

TrustJXP

ExperimentalResults

Conclusion andFuture Work

Efficient and Decentralized PageRankApproximation in P2P Networks with

Malicious Agents

Josiane Xavier Parreira ? , Debora Donato � ,Carlos Castillo � , Gerhard Weikum ?

? Max-Planck Institute for Informatics� Yahoo! Research Barcelona

The Future of Web SearchJune 18, 2007

Page 2: Efficient and Decentralized PageRank Approximation in P2P ...tfws07/program/talks/donato.pdf · Approximation in P2P Networks with Malicious Agents Josiane Xavier Parreira?, Debora

TrustedDecentralized

PageRank

Introduction

JXP

TrustJXP

ExperimentalResults

Conclusion andFuture Work

Introduction

Distributed Web Search

Limitation of current centralized approach to Web search:

political issues

privacy

scalability

cope with the dynamicity of the Web

Solution

Distribute Web search facilities in distributed environment

Page 3: Efficient and Decentralized PageRank Approximation in P2P ...tfws07/program/talks/donato.pdf · Approximation in P2P Networks with Malicious Agents Josiane Xavier Parreira?, Debora

TrustedDecentralized

PageRank

Introduction

JXP

TrustJXP

ExperimentalResults

Conclusion andFuture Work

Introduction

Peer-to-peer technology

for storing and sharing information

to guarantee scalability and robustness

P2P Web Search advantages

lighter load

smaller data volume

more computational resources

Limitations

Decentralized nature opens doors to malicious behaviors frompeers.

Page 4: Efficient and Decentralized PageRank Approximation in P2P ...tfws07/program/talks/donato.pdf · Approximation in P2P Networks with Malicious Agents Josiane Xavier Parreira?, Debora

TrustedDecentralized

PageRank

Introduction

JXP

TrustJXP

ExperimentalResults

Conclusion andFuture Work

Focus – Decentralized Ranking

Ranking

Ranking is a fundamental task in Web Search.

Decentralized PageRank – JXP algorithm[VLDB’06]

Decentralized algorithm for computing authority scores ofpages in a P2P Network

Assumes peers are always honest.

Trusted Decentralized PageRank – TrustJXP[AIRWeb’07]

Decentralized reputation system to be integrated intoJXP.

Allows computation of “trusted” authority scores.

Page 5: Efficient and Decentralized PageRank Approximation in P2P ...tfws07/program/talks/donato.pdf · Approximation in P2P Networks with Malicious Agents Josiane Xavier Parreira?, Debora

TrustedDecentralized

PageRank

Introduction

JXP

TrustJXP

ExperimentalResults

Conclusion andFuture Work

JXP Algorithm [VLDB’06]

1

W node:

G → C

J → E

A

B

D

E

WC

W node:

K → E

L → G

F

G

WE

A → F

E → G

G → C

F → A

E → B

W node:

G → C

J → E

F → A

F → E

K → E

A

B

D

E

WC

W node:

K → E

F

A → F

E → GF → E

F → G

Peer X

Peer Y

Peer X

Subgraph relevant to Peer X

F → A

Runs locally at every peer

Combines local PageRank computations + Meetings betweenpeers

JXP scores converge to the true global PageRank scores

Page 6: Efficient and Decentralized PageRank Approximation in P2P ...tfws07/program/talks/donato.pdf · Approximation in P2P Networks with Malicious Agents Josiane Xavier Parreira?, Debora

TrustedDecentralized

PageRank

Introduction

JXP

TrustJXP

ExperimentalResults

Conclusion andFuture Work

TrustJXP Algorithm

Goal

Detect when peers report false scores at the meeting phase.

Idea

Analyze peer’s deviation from common features thatconstitute usual peer profile.

Forms of attack addressed

Peers report higher scores for a subset of their localpages.

Peers permute the scores of its local pages.

Page 7: Efficient and Decentralized PageRank Approximation in P2P ...tfws07/program/talks/donato.pdf · Approximation in P2P Networks with Malicious Agents Josiane Xavier Parreira?, Debora

TrustedDecentralized

PageRank

Introduction

JXP

TrustJXP

ExperimentalResults

Conclusion andFuture Work

Malicious Increase of Scores

Why peers cheat

High authority scores for local pages can bring benefits to apeer.

Our approach

Analyze the distribution of the scores reported by a peer.

Use histograms to store and compare score distributions.

Motivation: Web graph is self-similar → local scoresdistribution should resemble global distribution after afew iterations.

Page 8: Efficient and Decentralized PageRank Approximation in P2P ...tfws07/program/talks/donato.pdf · Approximation in P2P Networks with Malicious Agents Josiane Xavier Parreira?, Debora

TrustedDecentralized

PageRank

Introduction

JXP

TrustJXP

ExperimentalResults

Conclusion andFuture Work

Histograms

Histograms

Each peer stores a histogram H.

Scores from other peers are inserted after each meeting.

A novelty factor accounts for the dynamics of the scores.

H(t+1) = (1− ρ)Ht + ρD

D is the score distribution of the other peer, and ρ is thenovelty factor.

Page 9: Efficient and Decentralized PageRank Approximation in P2P ...tfws07/program/talks/donato.pdf · Approximation in P2P Networks with Malicious Agents Josiane Xavier Parreira?, Debora

TrustedDecentralized

PageRank

Introduction

JXP

TrustJXP

ExperimentalResults

Conclusion andFuture Work

Histograms

Comparing Histograms

Hellinger Distance

HDi ,j =1√2[∑k

(√

Hi (k)−√

Dj(k))2]12

k = total number of bucketsHi (k) and Dj(k) = number of elements at bucket k at thetwo distributions

Page 10: Efficient and Decentralized PageRank Approximation in P2P ...tfws07/program/talks/donato.pdf · Approximation in P2P Networks with Malicious Agents Josiane Xavier Parreira?, Debora

TrustedDecentralized

PageRank

Introduction

JXP

TrustJXP

ExperimentalResults

Conclusion andFuture Work

Malicious Permutation of Scores

Problem

Peers can cheat and yet keep the original scoredistribution.

Histogram comparison not effective in this case.

Our approach

Compare the rankings from both peers for theoverlapping graph.

Observation: Relative order of scores very close to theactual ordering, after few meetings.

Page 11: Efficient and Decentralized PageRank Approximation in P2P ...tfws07/program/talks/donato.pdf · Approximation in P2P Networks with Malicious Agents Josiane Xavier Parreira?, Debora

TrustedDecentralized

PageRank

Introduction

JXP

TrustJXP

ExperimentalResults

Conclusion andFuture Work

Comparing Rankings

Tolerant Kendall’s Tau Distance

K ′i ,j =|(a, b) : a < b ∧ scorei (a)− scorei (b) ≥ ∆

∧ τi (a) < τi (b) ∧ τj(a) > τj(b)|

scorei (a), scorei (b) = scores of pages a and b at peer iτi , τj = rankings of pages at peers i and j∆ = tolerance threshold

Page 12: Efficient and Decentralized PageRank Approximation in P2P ...tfws07/program/talks/donato.pdf · Approximation in P2P Networks with Malicious Agents Josiane Xavier Parreira?, Debora

TrustedDecentralized

PageRank

Introduction

JXP

TrustJXP

ExperimentalResults

Conclusion andFuture Work

TrustJXP Algorithm

Computing Trust Scores

Idea: Combine previous measures to assign trust scoresto peers.

Each peer assigns its own trust score to another peer, ateach meeting step.

How to combine the measures? We take a “safer”approach.

θi ,j = min(1− HDi ,j , 1− K ′i ,j)

Trust score is integrated to the JXP computing, at themerging lists phase.

Page 13: Efficient and Decentralized PageRank Approximation in P2P ...tfws07/program/talks/donato.pdf · Approximation in P2P Networks with Malicious Agents Josiane Xavier Parreira?, Debora

TrustedDecentralized

PageRank

Introduction

JXP

TrustJXP

ExperimentalResults

Conclusion andFuture Work

Integrating Trust Scores and JXP Scores

Integrating Trust Scores and JXP Scores

When merging lists, scores from both lists can becombined by either averaging or taking the max score.

If page is not present on a list → score = 0.

Averaging the scores

JXP: L′(i) = (LA(i) + LB(i))/2TrustJXP: L′(i) = (1− θ/2) ∗ LA(i) + θ/2 ∗ LB(i)

Taking max score

JXP: L′(i) = max(LA(i), LB(i))TrustJXP: L′(i) = max(LA(i), θ ∗ LB(i))

Page 14: Efficient and Decentralized PageRank Approximation in P2P ...tfws07/program/talks/donato.pdf · Approximation in P2P Networks with Malicious Agents Josiane Xavier Parreira?, Debora

TrustedDecentralized

PageRank

Introduction

JXP

TrustJXP

ExperimentalResults

Conclusion andFuture Work

Experimental Results

Web collection

Obtained using a focused crawler.

134,405 pages, 1,915,401 links.

10 categories.

Setup

100 honest peers, 10 peers/category.

Malicious peers

Perform JXP meetings and local PR computation like anormal peer.Lie when asked by another peer about the local scores,according to attacks previously described.

Page 15: Efficient and Decentralized PageRank Approximation in P2P ...tfws07/program/talks/donato.pdf · Approximation in P2P Networks with Malicious Agents Josiane Xavier Parreira?, Debora

TrustedDecentralized

PageRank

Introduction

JXP

TrustJXP

ExperimentalResults

Conclusion andFuture Work

Experimental Results

Evaluation Measures

“Global” JXP ranking vs. Global PageRank ranking.

Spearman’s Footrule Distance at top-k.

Linear error score at top-k.

Cosine at full ranking.

L1 norm of full JXP ranking (L1 norm of Global PRalways 1).

Page 16: Efficient and Decentralized PageRank Approximation in P2P ...tfws07/program/talks/donato.pdf · Approximation in P2P Networks with Malicious Agents Josiane Xavier Parreira?, Debora

TrustedDecentralized

PageRank

Introduction

JXP

TrustJXP

ExperimentalResults

Conclusion andFuture Work

JXP Performance - No Malicious Peers

0 5000 10000 15000 200000

0.1

0.2

0.3

0.4

Number of Meetings in the Network

Spearman’s Footrule Distance

100 Peers − No Malicious

0 1000 2000 3000 4000 50000

0.5

1

1.5

2x 10

−4

Number of Meetings in the Network

Linear Error Score

100 Peers − No Malicious

0 5000 10000 15000 200000.8

0.85

0.9

0.95

1

Number of Meetings in the Network

Cosine

100 Peers − No Malicious

0 5000 10000 15000 200000.4

0.6

0.8

1

Number of Meetings in the Network

L1 Norm

100 Peers − No Malicious

Page 17: Efficient and Decentralized PageRank Approximation in P2P ...tfws07/program/talks/donato.pdf · Approximation in P2P Networks with Malicious Agents Josiane Xavier Parreira?, Debora

TrustedDecentralized

PageRank

Introduction

JXP

TrustJXP

ExperimentalResults

Conclusion andFuture Work

Impact of Malicious Peers(Peers report 2x the true score value for all local pages)

0 1000 2000 3000 40000.2

0.4

0.6

0.8

1

Number of Meetings in the Network

Spearman’s Footrule Distance

110 Peers − 10 Malicious150 Peers − 50 Malicious

0 1000 2000 3000 40000

2

4

6

8x 10−4

Number of Meetings in the Network

Linear Score Error

110 Peers − 10 Malicious150 Peers − 50 Malicious

0 1000 2000 3000 40000.7

0.8

0.9

1

Number of Meetings in the Network

Cosine

110 Peers − 10 Malicious150 Peers − 50 Malicious

0 1000 2000 3000 40000

1

2

3

4

Number of Meetings in the Network

L1 Norm

110 Peers − 10 Malicious150 Peers − 50 Malicious

Page 18: Efficient and Decentralized PageRank Approximation in P2P ...tfws07/program/talks/donato.pdf · Approximation in P2P Networks with Malicious Agents Josiane Xavier Parreira?, Debora

TrustedDecentralized

PageRank

Introduction

JXP

TrustJXP

ExperimentalResults

Conclusion andFuture Work

Averaging the Scores

0 1000 2000 3000 4000 50000.1

0.2

0.3

0.4

Number of Meetings in the Network

Spearman’s Footrule Distance

Peers reporting 2x the scoresPeers reporting 5x the scores

0 1000 2000 3000 4000 50000.5

1

1.5

2x 10−4

Number of Meetings in the Network

Linear Score Error

Peers reporting 2x the scoresPeers reporting 5x the scores

0 1000 2000 3000 4000 50000.8

0.85

0.9

0.95

1

Number of Meetings in the Network

Cosine

Peers reporting 2x the scoresPeers reporting 5x the scores

0 1000 2000 3000 4000 50000.5

1

1.5

0.75

1.25

Number of Meetings in the Network

L1 Norm

Peers reporting 2x the scoresPeers reporting 5x the scores

Page 19: Efficient and Decentralized PageRank Approximation in P2P ...tfws07/program/talks/donato.pdf · Approximation in P2P Networks with Malicious Agents Josiane Xavier Parreira?, Debora

TrustedDecentralized

PageRank

Introduction

JXP

TrustJXP

ExperimentalResults

Conclusion andFuture Work

Trust Model

0 1000 2000 3000 4000 50000

0.5

1Histograms Divergence

Number of Meetings in the Network

(a)

0 1000 2000 3000 4000 50000

0.5

1Rank Divergence

Number of Meetings in the Network

(b)

0 1000 2000 3000 4000 50000

0.5

1Histograms Divergence

Number of Meetings in the Network

(c)

0 1000 2000 3000 4000 50000

0.5

1Rank Divergence

Number of Meetings in the Network

(d)

Figure: Increased-scores attack: (a) and (b). Permuted-scoresattack: (c) and (d). A green circle (◦) represents a meetingbetween two honest peers, and a red cross (×) a meeting betweenan honest and a dishonest peers.

Page 20: Efficient and Decentralized PageRank Approximation in P2P ...tfws07/program/talks/donato.pdf · Approximation in P2P Networks with Malicious Agents Josiane Xavier Parreira?, Debora

TrustedDecentralized

PageRank

Introduction

JXP

TrustJXP

ExperimentalResults

Conclusion andFuture Work

Trust Scores (Random Attacks)

0 1000 2000 3000 4000 50000

0.5

1Histograms Divergence

Number of Meetings in the Network0 1000 2000 3000 4000 5000

0

0.5

1Rank Divergence

Number of Meetings in the Network

0 1000 2000 3000 4000 50000

0.5

1Trust Scores

Number of Meetings in the Network

Max. Detection Falseθ rate positives

0.9 37.4% 4.7%0.8 86.9% 12.1%0.6 98.0% 54.5%

Page 21: Efficient and Decentralized PageRank Approximation in P2P ...tfws07/program/talks/donato.pdf · Approximation in P2P Networks with Malicious Agents Josiane Xavier Parreira?, Debora

TrustedDecentralized

PageRank

Introduction

JXP

TrustJXP

ExperimentalResults

Conclusion andFuture Work

Trust JXP

0 5000 10000 15000 200000

0.1

0.2

0.3

0.4

Number of Meetings in the Network

Spearman’s Footrule Distance

Ideal Trust ModelOur Trust ModelNo Trust Model

0 5000 10000 15000 200000

1

2

3

4x 10−4

Number of Meetings in the Network

Linear Score Error

Ideal Trust ModelOur Trust ModelNo Trust Model

0 5000 10000 15000 200000.8

0.85

0.9

0.95

1

Number of Meetings in the Network

Cosine

Ideal Trust ModelOur Trust ModelNo Trust Model

0 5000 10000 15000 200000.5

1

1.5

2

Number of Meetings in the Network

L1 Norm

Ideal Trust ModelOur Trust ModelNo Trust Model

* 150 Peers - 50 Malicious; Mixed malicious behavior

Page 22: Efficient and Decentralized PageRank Approximation in P2P ...tfws07/program/talks/donato.pdf · Approximation in P2P Networks with Malicious Agents Josiane Xavier Parreira?, Debora

TrustedDecentralized

PageRank

Introduction

JXP

TrustJXP

ExperimentalResults

Conclusion andFuture Work

Conclusion

TrustJXP algorithm for identifying and reducing theimpact of cheating peers.

Uses scores distribution and ranking analysis to detectmalicious behavior.

Experiments demonstrate viability of the method.

Future Work

Detect other types of malicious behaviors.

Network dynamics.