Page Rank Modifications & Alternatives Brett Harper

Page 1: Page Rank Modifications & Alternatives

Page Rank Modifications & Alternatives

Brett Harper

Page 2: Page Rank Modifications & Alternatives

Overview

• Computing Customized Page Ranks

• Adaptive Ranking of Web Pages

• Generalizing PageRank Damping Functions for Link-Based Ranking Algorithms

• An Approach to Confidence Based Page Ranking for User-Oriented Web Search

• Web Page Ranking using Link Attributes

Page 3: Page Rank Modifications & Alternatives

Computing Customized Page Ranks

• A page's rank usually depends on how relevant the document is to the query and on the quality of the document.

• PageRank introduces document authority.

• Similar to the citation problem: a page gains authority when other pages link to (cite) it.

• Most proposed web ranking algorithms are based on connectivity rather than content.

• For customized ranks, the concept of page importance depends on the situation.

Page 4: Page Rank Modifications & Alternatives

Computing Customized Page Ranks

• Current solutions build different ranks for topics, users, or queries.

• Automatic building of the ranking function from a set of user examples.

Page 5: Page Rank Modifications & Alternatives

Computing Customized Page Ranks

• Brin & Page's PageRank

• Generalized PageRank, where x is the vector of ranks, W is an n×n matrix, and e is an n-vector.

• Parametric PageRank, where the a's sum to 1.
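
As a rough illustration of the rank equations named on this slide, the sketch below assumes the common matrix form x = d·W·x + (1-d)·e, with W column-normalized so that W[p, q] = 1/outdegree(q) when page q links to page p. The toy graph, the function name `pagerank`, and the value d = 0.85 are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def pagerank(W, e, d=0.85, tol=1e-12, max_iter=1000):
    """Iterate x <- d*W@x + (1-d)*e until the update is smaller than tol."""
    x = np.ones(W.shape[0])
    for _ in range(max_iter):
        x_next = d * W @ x + (1 - d) * e
        if np.abs(x_next - x).sum() < tol:
            return x_next
        x = x_next
    return x

# Hypothetical 3-page graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
links = {0: [1, 2], 1: [2], 2: [0]}
n = len(links)
W = np.zeros((n, n))
for q, targets in links.items():
    for p in targets:
        W[p, q] = 1.0 / len(targets)   # q splits its rank evenly over its out-links

d = 0.85            # assumed damping factor, not from the slides
e = np.ones(n)
x_iter = pagerank(W, e, d)
# Closed form of the same fixed point: x = (1-d) * (I - d*W)^-1 * e
x_direct = np.linalg.solve(np.eye(n) - d * W, (1 - d) * e)
```

Replacing the uniform vector e with a non-uniform one is the simplest way to see how a generalized form can shift rank toward chosen pages.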

Page 6: Page Rank Modifications & Alternatives

Computing Customized Page Ranks

• User requirements are represented as an optimization problem, parameterized by the user requirements and the total number of constraints.

• The issue of how to obtain constraints is not discussed.

• A cost function (quadratic or linear) allows the ranks to be changed in accordance with the requirements.

• Methods for handling infeasible requirements:

– Penalty function

– Number of satisfied constraints, used in addition to the cost function.

Page 7: Page Rank Modifications & Alternatives

Computing Customized Page Ranks

• WT10G data set

– Constraints defined

– Adaptive rank computed

– Compared to PageRank on entire WT10G dataset

Page 8: Page Rank Modifications & Alternatives

Computing Customized Page Ranks

Page 9: Page Rank Modifications & Alternatives

Computing Customized Page Ranks

Page 10: Page Rank Modifications & Alternatives

Adaptive Ranking of Web Pages

• Alter PageRank by modifying the PageRank equation.

• This can be done from the perspective of the user or of website administrators.

• Modify rank by changing (1-d) in the original PageRank.

– Dynamic Control

– Static Control

Page 11: Page Rank Modifications & Alternatives

Adaptive Ranking of Web Pages

• Rules

– B is an r×n matrix, b is a rule vector of size r

– Inputs and outputs should be positive

• The cost function allows the rank of certain pages to be modified while keeping the current rank of other pages.

Page 12: Page Rank Modifications & Alternatives

Adaptive Ranking of Web Pages

• Initial solution was to structure the problem as a quadratic programming problem.

• Second solution uses clusters to reduce the number of dimensions.

• Pages are clustered based on score

• Vector E contains k parameters.

• Vector Ai is the sum of the columns of (I-dW)^-1 that correspond to class i.

Page 13: Page Rank Modifications & Alternatives

Adaptive Ranking of Web Pages

• Vector Ai is the sum of the columns of M = (I-dW)^-1 that correspond to class i.

• H is defined as BA

• is the quadratic term

• is the linear term

Page 14: Page Rank Modifications & Alternatives

Adaptive Ranking of Web Pages

• Contradicting constraints

– Relax constraints to arrive at sub-optimal solution

– Add s to the cost function (used to balance the importance of the constraints against the original cost function)

Page 15: Page Rank Modifications & Alternatives

Adaptive Ranking of Web Pages

• Use a clustering algorithm to split webpages into clusters.

• Compute Ai

• If there is a feasible solution, use the first formula to find the optimal parameters e1,...,ek.

• If no feasible solution exists, use the version for relaxed constraints to find sub-optimal parameters e1,...,ek.

• Compute rank as x = e1*A1 + ... + ek*Ak.
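
The last step above can be sketched as follows, assuming the rank solves x = d·W·x + E with E constant within each cluster, so that x is a weighted sum of the vectors Ai. The 4-page graph, the cluster labels, the function name `cluster_rank`, and d = 0.85 are hypothetical, and the parameters e1,...,ek are supplied by hand rather than found by the quadratic program.

```python
import numpy as np

def cluster_rank(W, labels, e_params, d=0.85):
    """Return x = sum_i e_i * A_i, where A_i sums the columns of
    M = (I - d*W)^-1 that belong to cluster i."""
    n = W.shape[0]
    M = np.linalg.inv(np.eye(n) - d * W)
    k = len(e_params)
    # Column i of A is the vector A_i for cluster i.
    A = np.stack([M[:, labels == i].sum(axis=1) for i in range(k)], axis=1)
    return A @ np.asarray(e_params, dtype=float)

# Hypothetical 4-page graph; W is column-normalized (each column sums to 1).
W = np.array([
    [0.0, 0.0, 1.0, 0.5],
    [0.5, 0.0, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],
])
labels = np.array([0, 0, 1, 1])   # assumed cluster assignment, k = 2
d = 0.85                          # assumed damping factor

# With every e_i equal to (1 - d), the result is ordinary PageRank.
x_plain = cluster_rank(W, labels, [1 - d, 1 - d], d)
# Raising the parameter of cluster 1 boosts its pages and the pages they link to.
x_boost = cluster_rank(W, labels, [1 - d, 2 * (1 - d)], d)
```

Working with k parameters instead of n is what lets the optimization stay small even when the collection is large.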

Page 16: Page Rank Modifications & Alternatives

Adaptive Ranking of Web Pages

• Used the WT10G data set for experiments

• First experiment: Swap importance of two pages located some distance Δ apart.

– Effectively modifies the PageRank

– Constraints on highly ranked pages disturb the rest of the pages more significantly.

– These disruptions appear in blocks due to clustering.

– When swapping two pages, the effect is greater on lower-ranked pages than on higher-ranked pages.

• Quality of results is influenced by # of clusters.

Page 17: Page Rank Modifications & Alternatives

Adaptive Ranking of Web Pages

• Second experiment: Change # of clusters

– Gradually increase # of clusters used from 5 to 100.

– Cost function stops improving at ~60 clusters.

– Clustering can reduce the complexity level of the problem.

– # of clusters quite small compared to the size of the collection.

Page 18: Page Rank Modifications & Alternatives

Adaptive Ranking of Web Pages

• Clustering techniques

– Cluster by score

– Cluster by rank (variable-sized cluster dimensions)

– Cluster by rank with fixed size cluster dimensions

Page 19: Page Rank Modifications & Alternatives

Adaptive Ranking of Web Pages

• PageRanks can be modified, but constraints on some pages cause the ranks of all pages to be affected.

• The effect of these constraints depends on how highly ranked the constrained page is.

Page 20: Page Rank Modifications & Alternatives

Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms

• Damping functions reduce the propagation of page importance along long paths.

• Focus on linear, exponential, and hyperbolic decay.

• Exponential corresponds to original PageRank.

Page 21: Page Rank Modifications & Alternatives

Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms

• For functional rankings, a link matrix is used.

– Normalization

– Dangling nodes

• If P is the resulting matrix after normalization, the rank is defined as

Page 22: Page Rank Modifications & Alternatives

Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms

• An equivalent approach takes into account the branching contribution.

• Rank of a node is the weighted sum of incoming paths, with weights that decay exponentially with path length.

• PageRank is a functional ranking where the damping function is (1-α)α^t.
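
To make the path-weighting view concrete, here is a minimal sketch that sums damping(t) times the t-step walk distribution from a uniform start, and compares exponential damping against PageRank computed as a linear system. It assumes P is row-normalized with no dangling nodes; the 3-page graph, the function name `functional_rank`, the truncation length, and α = 0.85 are illustrative choices.

```python
import numpy as np

def functional_rank(P, damping, max_len=200):
    """Sum damping(t) * (uniform start vector) @ P^t for t = 0..max_len-1."""
    n = P.shape[0]
    walk = np.full(n, 1.0 / n)   # distribution after t steps from a uniform start
    rank = np.zeros(n)
    for t in range(max_len):
        rank += damping(t) * walk
        walk = walk @ P
    return rank

# Hypothetical 3-page graph with no dangling nodes; each row sums to 1.
P = np.array([
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
])
alpha = 0.85   # assumed value, not from the slides
exp_rank = functional_rank(P, lambda t: (1 - alpha) * alpha ** t)

# Reference: PageRank as the solution of r = alpha * r @ P + (1 - alpha) / n.
n = P.shape[0]
pr = np.linalg.solve((np.eye(n) - alpha * P).T, np.full(n, (1 - alpha) / n))
```

Passing a different `damping` callable is all that is needed to try another decay shape on the same graph.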

Page 23: Page Rank Modifications & Alternatives

Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms

Page 24: Page Rank Modifications & Alternatives

Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms

• Linear Damping

Page 25: Page Rank Modifications & Alternatives

Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms

• Hyperbolic Damping

Page 26: Page Rank Modifications & Alternatives

Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms

• Empirical Damping

– Pages that are linked are similar, but the topic changes as the distance increases.

– Use decrease in text similarity as an approximation to an empirical damping function.

– Data: .uk domain, 18 million pages; 200 pages chosen at random; similarity measured using TF.IDF without stemming or stop-word removal.

– Results show that this is better approximated by linear damping with L=8 or 9 than by exponential damping.

Page 27: Page Rank Modifications & Alternatives

Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms

Page 28: Page Rank Modifications & Alternatives

Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms

• Approximating Hyperbolic with Exponential Damping

– Find the α that minimizes the difference of weights for different values of β and the maximum path length l.

Page 29: Page Rank Modifications & Alternatives

Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms

• Approximating Exponential with Linear Damping

– Find the L that minimizes the difference of weights for different values of α and the maximum path length l.
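
The fitting idea can be sketched as a small search. The linear weights below are taken proportional to (L - t) for t < L and normalized to sum to 1, and the distance is the sum of absolute weight differences over path lengths 0..max_len-1. That normalization, the distance measure, the candidate range, and the function names are assumptions made for illustration, not necessarily the paper's exact choices.

```python
import numpy as np

def exponential_weights(alpha, max_len):
    """Exponential damping (1 - alpha) * alpha^t for t = 0..max_len-1."""
    t = np.arange(max_len)
    return (1 - alpha) * alpha ** t

def linear_weights(L, max_len):
    """Weights proportional to (L - t) for t < L, zero afterwards,
    scaled by 2 / (L * (L + 1)) so that they sum to 1."""
    t = np.arange(max_len)
    return np.where(t < L, 2.0 * (L - t) / (L * (L + 1)), 0.0)

def best_linear_L(alpha, max_len, candidates=range(1, 51)):
    """Pick the L whose linear weights are closest to exponential damping,
    measured by the sum of absolute differences (an assumed metric)."""
    target = exponential_weights(alpha, max_len)
    errors = {L: np.abs(linear_weights(L, max_len) - target).sum()
              for L in candidates}
    return min(errors, key=errors.get)

# Hypothetical settings: alpha = 0.8, paths up to length 20.
L_fit = best_linear_L(alpha=0.8, max_len=20)
```

The same loop works in the other direction (fixing L or β and searching over α) by swapping which weight vector is the target.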

Page 30: Page Rank Modifications & Alternatives

Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms

• Parameters for the damping function

– Characteristic path length (average distance between two nodes) grows sub-logarithmically with the size of the graph.

– For a smaller graph, the damping function should decay faster.

– For two graphs with average path lengths L1 and L2, the sums of the weights up to those path lengths have to be similar for both rankings to behave in a similar way.

Page 31: Page Rank Modifications & Alternatives

Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms

• Experimental Comparison of precision (PageRank vs. LinearRank)

– Used the WebTREC Gov2 collection (25m documents, .gov domain, 2004)

– Chose 50 queries at random to run.

– PageRank took 39 iterations to run. LinearRank was run for 5, 10, and 20 iterations.

– After the first 5 results, LinearRank had precision similar to PageRank.

– Useful when rankings can't be computed in advance.

Page 32: Page Rank Modifications & Alternatives

An Approach to Confidence Based Page Ranking for User Oriented Web Search

• Confidence is the probability of accessing a page for a specific query given past behavior.

• Use this probability to enhance page rankings of most relevant pages.

• Should also take link structure into account.

• Merge pages with similar categories since users lose interest after first few results.

Page 33: Page Rank Modifications & Alternatives

An Approach to Confidence Based Page Ranking for User Oriented Web Search

• Extract important features and categories from web pages.

• Prune pages from the graph that are not relevant.

• Calculate confidence for all features and categories of each page.

• Use citations (link structure) and confidence measure to recursively compute the page rank.

Page 34: Page Rank Modifications & Alternatives

An Approach to Confidence Based Page Ranking for User Oriented Web Search

• Extract important features and categories from web pages.

– Search the full-text and extended anchor text for most relevant features/categories.

– in the set of features where N(P,i) is the total # of times page P is accessed for query i and O(i) is the total number of queries made for i.

– Pages with high E(P,a) will likely be accessed for the topic a.

Page 35: Page Rank Modifications & Alternatives

An Approach to Confidence Based Page Ranking for User Oriented Web Search

• Prune pages from the graph that are not relevant.

– Pages without similar features/categories can be connected.

– These pages are used for extracting features/ categories, but are pruned if the confidence does not meet a certain threshold.

– Citations of pruned pages are also removed.

Page 36: Page Rank Modifications & Alternatives

An Approach to Confidence Based Page Ranking for User Oriented Web Search

• Calculate confidence for all features and categories of each page.

– in the customized graph.

– Calculating C(a,P) for the entire history is not realistic, so only take recent history into account.

Page 37: Page Rank Modifications & Alternatives

An Approach to Confidence Based Page Ranking for User Oriented Web Search

• Use citations (link structure) and confidence measure to recursively compute the page rank.

– PR(P,a) = (1-d) + d[PR(T1,a)/O(T1)+...+ PR(Tn,a)/O(Tn)], where Ti is a citing page and O(Ti) is the # of outgoing links.

– RPR(P,a) = PR(P,a) * C(a,P)

– New pages may be cited by many relevant, high-ranked pages. This can be suppressed by including a time period.

– Substitute damping factor d with (1-C(a,P))
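
The first two formulas above can be sketched directly: iterate PR(P,a) on the pruned topic graph, then scale by the confidence to get RPR(P,a). The three-page graph, the confidence values, the function name `page_rank`, and d = 0.85 are hypothetical; the variant that substitutes d with (1-C(a,P)) is not shown.

```python
def page_rank(in_links, out_degree, d=0.85, iterations=100):
    """Iterate PR(P) = (1 - d) + d * sum(PR(T) / O(T)) over citing pages T."""
    pr = {page: 1.0 for page in in_links}
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(pr[t] / out_degree[t] for t in citing)
            for page, citing in in_links.items()
        }
    return pr

# Hypothetical topic graph: in_links[P] lists the pages that cite P.
# Links are A -> B, A -> C, B -> C, C -> A.
in_links = {"A": ["C"], "B": ["A"], "C": ["A", "B"]}
out_degree = {"A": 2, "B": 1, "C": 1}
# Assumed confidence values C(a, P) for one topic a, each in [0, 1].
confidence = {"A": 0.2, "B": 0.9, "C": 0.5}

pr = page_rank(in_links, out_degree)
# RPR(P, a) = PR(P, a) * C(a, P)
rpr = {page: pr[page] * confidence[page] for page in pr}
ranking = sorted(rpr, key=rpr.get, reverse=True)
```

In this toy setup a page with a high link-based score but low confidence drops in the final ordering, which is the effect the confidence factor is meant to have.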

Page 38: Page Rank Modifications & Alternatives

An Approach to Confidence Based Page Ranking for User Oriented Web Search

• The data set was constructed from a list of 7 queries, from which the top 30 results were obtained from Google.

• A graph of these nodes was then created, and further expanded to a depth of 2. This new graph contained 500-800 nodes.

• Higher ranked pages are not always accessed a higher number of times.

• Pages can be accessed for multiple queries.

• Pages with higher confidence tend to be ranked higher.

Page 39: Page Rank Modifications & Alternatives

Web Page Ranking using Link Attributes

• Tries to improve on current ranking techniques by assigning different weights to links. (WLRank)

• Relative position in the page

• Tag where the link is contained

• Length of anchor text

Page 40: Page Rank Modifications & Alternatives

Web Page Ranking using Link Attributes

• L(j,i) is 1 if a link exists or 0 otherwise, and c is a constant that gives a base weight to every link

• T(j,i) depends on the tag

• AL(j,i) is length of anchor text divided by average anchor text length d.

• RP(j,i) is the relative position weighted by constant b.

• If W(j,i) = L(j,i) then it is equal to PageRank.
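
A minimal sketch of how these terms might fit together, assuming the weight of an existing link is the sum c + T(j,i) + AL(j,i) + RP(j,i) and that each page passes rank to its targets in proportion to W(j,i) divided by the total weight of its out-links. The additive combination, the propagation rule, the attribute values, and the function names are assumptions for illustration; the tag and position terms are passed in as precomputed numbers.

```python
import numpy as np

def link_weight(tag_w, anchor_len, rel_pos_w, c=1.0, d=100.0):
    """Weight of an existing link: c + T + AL + RP, with AL = anchor_len / d."""
    return c + tag_w + anchor_len / d + rel_pos_w

def weighted_rank(weights, n, damping=0.85, iterations=100):
    """weights[(j, i)] is W(j, i) for each existing link j -> i.
    Page j passes rank to i in proportion to W(j, i) / sum_k W(j, k)."""
    out_total = np.zeros(n)
    for (j, _), w in weights.items():
        out_total[j] += w
    rank = np.full(n, 1.0 / n)
    for _ in range(iterations):
        nxt = np.full(n, (1 - damping) / n)
        for (j, i), w in weights.items():
            nxt[i] += damping * rank[j] * w / out_total[j]
        rank = nxt
    return rank

# Hypothetical links (j, i) -> (tag weight, anchor length, position weight).
attrs = {
    (0, 1): (1.0, 40, 0.5),   # e.g. a link inside <h1>, near the top of the page
    (0, 2): (0.0, 10, 0.1),
    (1, 2): (0.0, 20, 0.2),
    (2, 0): (0.0, 15, 0.3),
}
weights = {link: link_weight(*a) for link, a in attrs.items()}
wl_rank = weighted_rank(weights, n=3)

# With W(j, i) = 1 on every existing link, the same routine gives plain PageRank.
plain_rank = weighted_rank({link: 1.0 for link in attrs}, n=3)
```

Here page 1 receives a larger share of page 0's rank than under uniform weights, because the link pointing to it carries the heavier tag and anchor terms.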

Page 41: Page Rank Modifications & Alternatives

Web Page Ranking using Link Attributes

• Tested against 460k pages in the .CL domain.

• Several users provided relevance judgements on the first 10 results of several queries.

• Used c=1, b=1, and d=100.

• Only used weights for <b> and <h1> tags.

• Compare precision based on a perfect ranking for the first 10 answers.

• Improvement of 13% on average.

Page 42: Page Rank Modifications & Alternatives

Web Page Ranking using Link Attributes

Page 43: Page Rank Modifications & Alternatives

Conclusions

• PageRank can be modified to fit user requirements and specific categories.

• Different damping functions can be used to decay the influence of PageRank over longer paths.

• Can improve PageRank through clustering.

Page 44: Page Rank Modifications & Alternatives

References

• Tsoi, A. C., Hagenbuchner, M., and Scarselli, F. 2006. Computing customized page ranks. ACM Trans. Internet Technol. 6, 4 (Nov. 2006), 381-414.

• Tsoi, A. C., Morini, G., Scarselli, F., Hagenbuchner, M., and Maggini, M. 2003. Adaptive ranking of web pages. In Proceedings of the 12th international Conference on World Wide Web (Budapest, Hungary, May 20 - 24, 2003). WWW '03. ACM, New York, NY, 356-365.

• Baeza-Yates, R., Boldi, P., and Castillo, C. 2006. Generalizing PageRank: damping functions for link-based ranking algorithms. In Proceedings of the 29th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Seattle, Washington, USA, August 06 - 11, 2006). SIGIR '06. ACM, New York, NY, 308-315.

• Mukhopadhyay, D., Giri, D., and Singh, S. R. 2003. An approach to confidence based page ranking for user oriented Web search. SIGMOD Rec. 32, 2 (Jun. 2003), 28-33.

• Baeza-Yates, R. and Davis, E. 2004. Web page ranking using link attributes. In Proceedings of the 13th international World Wide Web Conference on Alternate Track Papers & Posters (New York, NY, USA, May 19 - 21, 2004). WWW Alt. '04. ACM, New York, NY, 328-329.

Page 45: Page Rank Modifications & Alternatives

Questions