L23: PageRank. Jeff M. Phillips. April 13, 2020

L23: PageRankjeffp/teaching/cs5140/L23-notes.pdf · collection of websites. ... ↳ truthfulness of webpage. Word Count Consider as input all of English Wikipedia stored in DFS. Goal


L23: PageRank

Jeff M. Phillips

April 13, 2020


Final Report

At most 4 pages/student. Don’t cram in too much!

• Succinct title (and names).

• Problem definition and motivation.

• Explain your data.

• Key idea.

• What did you do (which techniques, an implementation, a comparison, an extension)?

• What did you learn? Artifacts (charts, plots, examples, math) and intuition (in words, did it work?).

← Same for posters.


Webpage search: inverted index (word → pages).

Define most relevant webpages.


Crawlers: programs that walk around the web:

① read a page and update its entries for the inverted index;

② follow a random hyperlink.

Use hyperlink info: <a href="www.pie.com">pie</a>

Spammers build a fleet of pages and link to your page with a hyperlink tag.


Directories: alternative to search engines.

Yahoo! and LookSmart

Built an organized, curated collection of websites.


PageRank

... linked to ...; balance both pages.

A page is important if a random MC ("random surfer") were to find it.

Web is a big graph $G = (V, E)$:

$V = \{\text{set of all pages}\}$

$E = \{E_{i,j} = \text{link } p_i \to p_j\}$

Run MC → $q^*$ ⇐ the distribution vector it converges to.

$q^*(j)$ says how important page $j$ is.


Compute $q^*$ of web graph

• Keep track of crawlers: how frequently they return.

• Buy a big computer: compute eig($P$) ← $P$ = prob. trans.($G$)

• Precompute $P^n = P \cdot P \cdots P$ ← too big

• Power method: $q_0$ ← $q^*$ from last night; iterate $q_{i+1} = P q_i$.
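The power method listed above can be sketched in a few lines. This is a minimal illustration, not the course's implementation: it borrows the 4-page matrix from the "Example PageRank" slides as $P$, starts from a uniform $q_0$ (no "last night" vector is available here), and the names `mat_vec`, `power_method`, `tol`, and `max_iter` are assumptions introduced for the example.

```python
# Column-stochastic transition matrix of a small 4-page graph
# (the same M that appears in the "Example PageRank" slides).
M = [
    [0.0, 1 / 2, 0.0, 0.0],
    [1 / 3, 0.0, 1.0, 1 / 2],
    [1 / 3, 0.0, 0.0, 1 / 2],
    [1 / 3, 1 / 2, 0.0, 0.0],
]


def mat_vec(P, q):
    """Return P q for a dense matrix P (list of rows) and a vector q."""
    return [sum(p_ij * q_j for p_ij, q_j in zip(row, q)) for row in P]


def power_method(P, q0, tol=1e-12, max_iter=10_000):
    """Iterate q <- P q until the L1 change between steps drops below tol."""
    q = list(q0)
    for _ in range(max_iter):
        q_next = mat_vec(P, q)
        if sum(abs(a - b) for a, b in zip(q_next, q)) < tol:
            return q_next
        q = q_next
    return q


n = len(M)
q_star = power_method(M, [1.0 / n] * n)
print(q_star)
```

For this particular graph the chain is ergodic, and solving $q = Mq$ by hand gives $q^* = (3/16, 3/8, 3/16, 1/4)$, which the iteration approaches.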


Anatomy of Web

[Figure: anatomy of the web graph, with regions labeled Strongly Connected Component, IN, OUT, Tubes, Tendrils (IN), Tendrils (OUT), and Disconnected.]

Is this $G$ ergodic?


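Can we make $G$ ergodic?

• Teleportation / taxation

→ about once every $1/\beta \approx 7$ steps

→ jump to a random node.

$P$ = prob. trans.($G$), $\beta = 0.15$

$R = (1 - \beta) P + \beta Q$, with $Q$ the $n \times n$ matrix whose entries are all $1/n$ ↳ dense

$R q = (1 - \beta) P q + \beta Q q$

$q_{i+1} = (1 - \beta) P q_i + \frac{\beta}{n} \mathbf{1}$ ← next vector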
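A sketch of the teleportation update, under the assumption that a teleport jumps to a uniformly random node. It never forms the dense matrix $R$: each step multiplies by $P$ and then adds $\beta/n$ to every coordinate. The 3-page "spider trap" graph (page 3 links only to itself) and the function name `pagerank_teleport` are made up for illustration.

```python
def pagerank_teleport(P, beta=0.15, tol=1e-12, max_iter=10_000):
    """Iterate q <- (1 - beta) P q + (beta / n) 1 without building R."""
    n = len(P)
    q = [1.0 / n] * n
    for _ in range(max_iter):
        Pq = [sum(p * x for p, x in zip(row, q)) for row in P]
        q_next = [(1 - beta) * v + beta / n for v in Pq]
        if sum(abs(a - b) for a, b in zip(q_next, q)) < tol:
            return q_next
        q = q_next
    return q


# Hypothetical graph: 1 -> 2, 1 -> 3, 2 -> 3, 3 -> 3 (column-stochastic).
# Without teleportation all mass ends up on page 3.
P_trap = [
    [0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 1.0],
]

q_tele = pagerank_teleport(P_trap, beta=0.15)
print(q_tele)
```

With $\beta = 0.15$ the fixed point can be checked by hand: $q_1 = \beta/3 = 0.05$, $q_2 = 0.85 \cdot q_1/2 + 0.05 = 0.07125$, and $q_3 = 0.87875$, so pages 1 and 2 keep some mass.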


Spam Farms

Google counter


Trust

Only teleport to trusted pages.

$r$ ← $q^*$, PageRank

$t$ ← $q^*$ with trusted teleport

$r(j) - t(j)$: if large → spam

↳ truthfulness of webpage
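The comparison of $r$ and $t$ can be sketched by running the same iteration twice with different teleport vectors. Everything concrete here is an assumption for illustration: the 4-page graph, the choice of page 0 as the only trusted page, the 0-based indexing, and the helper name `pagerank`. Only the score $r(j) - t(j)$ comes from the slide.

```python
def pagerank(P, teleport, beta=0.15, tol=1e-12, max_iter=10_000):
    """Iterate q <- (1 - beta) P q + beta * teleport, starting from teleport."""
    q = list(teleport)
    for _ in range(max_iter):
        q_next = [
            (1 - beta) * sum(p * x for p, x in zip(row, q)) + beta * v
            for row, v in zip(P, teleport)
        ]
        if sum(abs(a - b) for a, b in zip(q_next, q)) < tol:
            return q_next
        q = q_next
    return q


# Hypothetical graph: pages 0 and 1 link to each other (legitimate),
# pages 2 and 3 link to each other (a tiny spam farm).
P = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
n = len(P)

r = pagerank(P, [1.0 / n] * n)         # ordinary PageRank: uniform teleport
t = pagerank(P, [1.0, 0.0, 0.0, 0.0])  # teleport only to trusted page 0
gap = [r_j - t_j for r_j, t_j in zip(r, t)]
print(gap)
```

In this toy graph no trusted page can reach pages 2 and 3, so $t$ is zero there and the gap $r(j) - t(j)$ is large and positive, while it is negative for pages 0 and 1.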


Word Count

Consider as input all of English Wikipedia stored in DFS. Goal is to count how many times each word is used.
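One way to approach this exercise is a MapReduce-style job; the slide does not name a framework, so that framing is an assumption. The sketch below simulates the map, shuffle, and reduce phases in plain Python on two made-up pages. The function names and the whitespace tokenizer are illustrative choices.

```python
from collections import defaultdict


def map_word_count(page_id, text):
    """Mapper: emit (word, 1) for every word occurrence on a page."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_word_count(word, counts):
    """Reducer: sum the partial counts for one word."""
    return word, sum(counts)


# Made-up stand-in for pages stored in the DFS.
pages = {"p1": "the web is a graph", "p2": "the graph is big"}

pairs = [kv for pid, text in pages.items() for kv in map_word_count(pid, text)]
word_counts = dict(reduce_word_count(w, c) for w, c in shuffle(pairs).items())
print(word_counts)
```

In a real job a combiner would typically pre-sum counts on each mapper to reduce the data shuffled across the network.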


Inverted Index

Consider as input all of English Wikipedia stored in DFS. Goal is to build an index, so each word has a list of pages it is in.
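Assuming the same MapReduce-style framing (not stated on the slide), the mapper can emit (word, page) pairs and the reducer can collect the pages for each word. The page ids, texts, and function names below are hypothetical.

```python
from collections import defaultdict


def map_inverted_index(page_id, text):
    """Mapper: emit (word, page_id) once per distinct word on the page."""
    for word in set(text.lower().split()):
        yield word, page_id


def reduce_inverted_index(word, page_ids):
    """Reducer: return the sorted list of distinct pages containing the word."""
    return word, sorted(set(page_ids))


# Made-up stand-in for pages stored in the DFS.
pages = {"p1": "the web is a graph", "p2": "the graph is big"}

# Simulated shuffle: group emitted page ids by word.
groups = defaultdict(list)
for page_id, text in pages.items():
    for word, pid in map_inverted_index(page_id, text):
        groups[word].append(pid)

index = dict(reduce_inverted_index(w, pids) for w, pids in groups.items())
print(index["graph"])
```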


Phrases

Consider as input all of English Wikipedia stored in DFS. Goal is to build an index on 3-grams (sequences of 3 words) that appear on exactly one page, with a link to that page.
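A possible sketch, again assuming a MapReduce-style solution: the mapper emits each distinct 3-gram of a page with the page id, and the reducer keeps a 3-gram only if exactly one page emitted it. The two pages and all function names are made up for the example.

```python
from collections import defaultdict


def three_grams(text):
    """Return the set of distinct 3-word sequences in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}


def map_phrases(page_id, text):
    """Mapper: emit (3-gram, page_id) once per distinct 3-gram on the page."""
    for gram in three_grams(text):
        yield gram, page_id


def reduce_phrases(gram, page_ids):
    """Reducer: keep the 3-gram only if it appears on exactly one page."""
    distinct = set(page_ids)
    if len(distinct) == 1:
        return gram, distinct.pop()
    return None


# Made-up stand-in for pages stored in the DFS.
pages = {"p1": "the web is a graph", "p2": "web is a graph too"}

groups = defaultdict(list)
for page_id, text in pages.items():
    for gram, pid in map_phrases(page_id, text):
        groups[gram].append(pid)

results = (reduce_phrases(g, pids) for g, pids in groups.items())
unique_phrases = dict(r for r in results if r is not None)
print(unique_phrases)
```

Deduplicating 3-grams inside the mapper matters: a phrase repeated twice on one page should still count as appearing on exactly one page.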


Label Propagation (Graph)

Consider a large graph $G = (V, E)$ (e.g., a social network), with a subset of nodes $V' \subset V$ with labels (e.g., {pos, neg}). Each node stores its label (if any) and edges.

Assign a vertex a label if (a) unlabeled, (b) has ≥ 5 labeled neighbors, (c) based on majority vote.
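One round of the rule (a) to (c) might look like the sketch below. Two details are assumptions the slide leaves open: votes use only the labels present at the start of the round (as a single synchronous pass would), and a tied vote leaves the vertex unlabeled. The tiny graph and the name `propagate_once` are invented for the example.

```python
from collections import Counter


def propagate_once(adj, labels, min_labeled=5):
    """Return labels after one synchronous round of majority-vote propagation."""
    new_labels = dict(labels)
    for v, neighbors in adj.items():
        if v in labels:
            continue  # (a) only unlabeled vertices are assigned
        votes = Counter(labels[u] for u in neighbors if u in labels)
        if sum(votes.values()) < min_labeled:
            continue  # (b) needs at least min_labeled labeled neighbors
        ranked = votes.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            continue  # assumption: a tie leaves the vertex unlabeled
        new_labels[v] = ranked[0][0]  # (c) majority vote
    return new_labels


# Hypothetical graph: u has five labeled neighbors, w has only two.
adj = {
    "u": ["a", "b", "c", "d", "e", "w"],
    "w": ["a", "b", "u"],
    "a": ["u", "w"], "b": ["u", "w"], "c": ["u"], "d": ["u"], "e": ["u"],
}
labels = {"a": "pos", "b": "pos", "c": "pos", "d": "neg", "e": "neg"}

new_labels = propagate_once(adj, labels)
print(new_labels)
```

Here `u` receives the label pos (3 votes to 2), while `w` stays unlabeled because it has fewer than 5 labeled neighbors in this round.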


Label Propagation (Embedding)

Consider a data set $X \subset \mathbb{R}^d$, with a subset of points $X' \subset X$ with labels (e.g., {pos, neg}). Implicitly defines a graph with $V = X$ and $E$ using $k = 20$ nearest neighbors.

Assign a vertex a label if (a) unlabeled, (b) has ≥ 5 labeled neighbors, (c) based on majority vote.
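The implicit graph can be made explicit with a brute-force nearest-neighbor search, sketched below. It assumes Euclidean distance and directed edges (each point lists its own $k$ nearest points); the slide specifies neither. The five 2-d points and the name `knn_graph` are made up, and the example uses $k = 2$ only to keep the data small.

```python
import math


def knn_graph(X, k=20):
    """Map each point index to the indices of its k nearest other points."""
    graph = {}
    for i, x in enumerate(X):
        others = [j for j in range(len(X)) if j != i]
        # Brute force: sort all other points by Euclidean distance to x.
        others.sort(key=lambda j: math.dist(x, X[j]))
        graph[i] = others[:k]
    return graph


# Hypothetical data: two well-separated clusters in the plane.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]

knn = knn_graph(X, k=2)
print(knn)
```

The resulting adjacency lists can be fed to the same majority-vote rule as in the graph version. Brute force costs quadratic time in the number of points, so a large data set would need a spatial index or an approximate method instead.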


Example PageRank

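$$M = \begin{bmatrix} 0 & 1/2 & 0 & 0 \\ 1/3 & 0 & 1 & 1/2 \\ 1/3 & 0 & 0 & 1/2 \\ 1/3 & 1/2 & 0 & 0 \end{bmatrix}$$

Stripes:

$$M_1 = \begin{bmatrix} 0 \\ 1/3 \\ 1/3 \\ 1/3 \end{bmatrix} \quad M_2 = \begin{bmatrix} 1/2 \\ 0 \\ 0 \\ 1/2 \end{bmatrix} \quad M_3 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} \quad M_4 = \begin{bmatrix} 0 \\ 1/2 \\ 1/2 \\ 0 \end{bmatrix}$$

These are stored as (1 : (1/3, 2), (1/3, 3), (1/3, 4)), (2 : (1/2, 1), (1/2, 4)), (3 : (1, 2)), and (4 : (1/2, 2), (1/2, 3)), where each pair is (value, row).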
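A sketch of one multiplication $q \leftarrow M q$ using the stripe storage. The stripe entries below are read directly off the columns of $M$ as (value, row) pairs; for instance, column 3 has its single 1 in row 2. Stripe $j$ needs only the single coordinate $q_j$, which is what makes this layout convenient to distribute. Indices are 1-based to match the slide, and the name `stripe_mat_vec` is an assumption.

```python
# Column stripes of M: column index -> list of (value, row), nonzeros only.
stripes = {
    1: [(1 / 3, 2), (1 / 3, 3), (1 / 3, 4)],
    2: [(1 / 2, 1), (1 / 2, 4)],
    3: [(1.0, 2)],
    4: [(1 / 2, 2), (1 / 2, 3)],
}


def stripe_mat_vec(stripes, q):
    """Compute M q, where stripe j contributes value * q[j] to each of its rows."""
    out = {i: 0.0 for i in q}
    for j, entries in stripes.items():
        q_j = q[j]  # the only coordinate of q that stripe j needs
        for value, i in entries:
            out[i] += value * q_j
    return out


q = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
q_next = stripe_mat_vec(stripes, q)
print(q_next)
```

Starting from the uniform vector, one step gives $(1/8, 11/24, 5/24, 5/24)$, which still sums to 1 because every column of $M$ sums to 1.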

Example PageRank

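$$M = \begin{bmatrix} 0 & 1/2 & 0 & 0 \\ 1/3 & 0 & 1 & 1/2 \\ 1/3 & 0 & 0 & 1/2 \\ 1/3 & 1/2 & 0 & 0 \end{bmatrix}$$

Blocks:

$$M_{1,1} = \begin{bmatrix} 0 & 1/2 \\ 1/3 & 0 \end{bmatrix} \quad M_{1,2} = \begin{bmatrix} 0 & 0 \\ 1 & 1/2 \end{bmatrix} \quad M_{2,1} = \begin{bmatrix} 1/3 & 0 \\ 1/3 & 1/2 \end{bmatrix} \quad M_{2,2} = \begin{bmatrix} 0 & 1/2 \\ 0 & 0 \end{bmatrix}$$

These are stored as (1 : (1/2, 2)), (2 : (1/3, 1)) for $M_{1,1}$; as (2 : (1, 3), (1/2, 4)) for $M_{1,2}$; as (3 : (1/3, 1)), (4 : (1/3, 1), (1/2, 2)) for $M_{2,1}$; and as (3 : (1/2, 4)) for $M_{2,2}$. Here each key is a row and each pair is (value, column).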
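The block storage can be exercised the same way. In this sketch each block $(a, b)$ is given only chunk $b$ of $q$ and adds into chunk $a$ of the result, so no worker would need the whole vector. The dictionary layout, the `chunks` mapping, and the name `block_mat_vec` are assumptions; the entries are the (value, column) pairs listed for each row, with 1-based indices.

```python
# Blocks of M: (block_row, block_col) -> {row: [(value, column), ...]}.
blocks = {
    (1, 1): {1: [(1 / 2, 2)], 2: [(1 / 3, 1)]},
    (1, 2): {2: [(1.0, 3), (1 / 2, 4)]},
    (2, 1): {3: [(1 / 3, 1)], 4: [(1 / 3, 1), (1 / 2, 2)]},
    (2, 2): {3: [(1 / 2, 4)]},
}

# Which coordinates of q belong to each chunk.
chunks = {1: (1, 2), 2: (3, 4)}


def block_mat_vec(blocks, chunks, q):
    """Compute M q block by block; block (a, b) reads only chunk b of q."""
    out = {i: 0.0 for i in q}
    for (_, b), rows in blocks.items():
        q_chunk = {j: q[j] for j in chunks[b]}  # the part of q this block needs
        for i, entries in rows.items():
            for value, j in entries:
                out[i] += value * q_chunk[j]
    return out


q = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
q_next_blocks = block_mat_vec(blocks, chunks, q)
print(q_next_blocks)
```

Compared with stripes, each block touches a chunk of $q$ rather than a single coordinate and writes to a chunk of the output rather than to all of it, which trades a little more input per worker for less output traffic.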