Just In Time Contextual Advertising

1

Just-in-Time Contextual Advertising

Aris Anagnostopoulos, Andrei Z. Broder, Evgeniy Gabrilovich, Vanja Josifovski, Lance Riedel, CIKM’07.

Advisor: Chia-Hui Chang

Presenter: Teng-Kai Fan

Date: 2008-08-20

2

Outline

Introduction
Web Advertising Basic
Methodology
Empirical Evaluation
Conclusion

3

Introduction

Internet advertising spending was estimated at over 17 billion dollars in 2006.

There are two main types of textual Web advertising:

Sponsored search, which serves ads in response to search queries.

Content match, which places ads on third-party pages.

4

Introduction cont.

Web advertising handles two types of Web pages:

Static pages (offline): ads can be matched based on prior analysis of the entire page content.

Dynamic pages (online): ads must be matched to the page while it is being served to the end-user, which limits the time available for content analysis.

5

Introduction cont.

In this paper, the challenge is to find relevant ads while maintaining low latency and communication costs. The approach:

Use text summarization techniques to extract short excerpts that are representative of the entire page content.

Use classification to categorize the page summaries with respect to a large taxonomy of advertising categories.

Perform page-ad matching based on both bag-of-words and classification features.

6

Contextual Advertising Basic

Four interacting entities:

The publisher owns the Web pages on which advertising is displayed.

The advertiser provides the supply of ads.

The ad network mediates between the advertiser and the publisher and selects the ads that are placed on the pages.

End-users visit the publisher's Web pages and interact with the ads.

7

Overview of Ad display

[Figure: overview of ad display. Components: Web Page, Ad Agency System, Web Page + Ads. Actors: Publisher, Advertiser, End-User. Actions: register, match, browse.]

8

Advertising Basic cont.

Four pricing models:

CPM (cost per mille, i.e., per thousand impressions): advertisers pay for exposure of their message to a specific audience.

CPV (cost per visitor): advertisers pay for the delivery of a targeted visitor to the advertiser's website.

CPC (cost per click), also known as pay per click (PPC): advertisers pay each time a user clicks on their listing and is redirected to their website. They pay not for the listing itself, but only when it is clicked.

CPA (cost per action): advertisers pay each time an order is transacted.
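To make the difference between the four pricing models concrete, the following minimal sketch computes what an advertiser would owe under each one. All rates and event counts are hypothetical illustration values, not figures from the paper, and the names `CampaignStats` and `campaign_cost` are introduced here only for the example.

```python
from dataclasses import dataclass


@dataclass
class CampaignStats:
    """Event counts for one ad campaign (hypothetical example values)."""
    impressions: int  # times the ad was shown
    visitors: int     # targeted visitors delivered to the advertiser's site
    clicks: int       # clicks on the ad
    actions: int      # completed orders


def campaign_cost(stats: CampaignStats, model: str, rate: float) -> float:
    """Return the advertiser's cost under one pricing model.

    `rate` is the price per 1000 impressions for CPM and the price per
    single event (visitor, click, or action) for the other models.
    """
    if model == "CPM":
        return stats.impressions / 1000 * rate
    if model == "CPV":
        return stats.visitors * rate
    if model == "CPC":
        return stats.clicks * rate
    if model == "CPA":
        return stats.actions * rate
    raise ValueError(f"unknown pricing model: {model}")


stats = CampaignStats(impressions=100_000, visitors=900, clicks=1_200, actions=30)
for model, rate in [("CPM", 2.0), ("CPV", 0.25), ("CPC", 0.5), ("CPA", 10.0)]:
    print(model, campaign_cost(stats, model, rate))
```

The same campaign is billed on very different events depending on the model, which is why the choice of model shifts risk between advertiser and publisher.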

9

Overview of the Proposed Solution

Use text summarization techniques paired with external knowledge to craft short page summaries in real time.

Balance two conflicting goals: analyzing as much page content as possible for a better ad match vs. analyzing as little as possible to save transmission and analysis time.

External knowledge:

URLs often contain meaningful words.

The referrer URL might contain relevant words that to some extent capture the user intent.

Page classification.
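As an illustration of how a page URL or referrer URL can yield meaningful words, here is a minimal tokenization sketch. The stop list `URL_STOP_TOKENS`, the function name `tokenize_url`, and the example URLs are assumptions made for this example; the slides do not specify how the authors tokenize URLs.

```python
import re
from urllib.parse import urlparse, unquote

# Hypothetical stop list of URL boilerplate tokens; not taken from the paper.
URL_STOP_TOKENS = {"http", "https", "www", "com", "org", "net",
                   "html", "htm", "php", "index"}


def tokenize_url(url: str) -> list[str]:
    """Split a URL into lowercase alphabetic word tokens."""
    parsed = urlparse(url)
    # Use host, path, and query; the scheme carries no topical information.
    text = unquote(" ".join([parsed.netloc, parsed.path, parsed.query]))
    tokens = re.findall(r"[a-z]+", text.lower())
    # Drop single letters (e.g. the query key "q") and boilerplate tokens.
    return [t for t in tokens if len(t) > 1 and t not in URL_STOP_TOKENS]


page_url = "http://www.example.com/travel/cheap-flights/paris.html"
referrer_url = "http://search.example.com/results?q=paris+hotel+deals"
print(tokenize_url(page_url))
print(tokenize_url(referrer_url))
```

In the referrer example, the search terms carried in the query string survive as tokens, which is the sense in which a referrer URL can hint at user intent.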

10

Text Summarization

Text summarization techniques are divided into extractive and non-extractive approaches.

The following components are considered in constructing summaries:

Title (T)
Meta keywords and description (M)
Headings (H): the contents of <h1> and <h2> HTML tags
Tokenized URL of the page (U)
Tokenized referrer URL (R)
First N bytes of the page text (P<N>)
Anchor text of all outgoing links on the page (A)
Full text of the page (F)
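A rough sketch of how some of these fragments could be pulled from a page follows. It covers only T, M, H, and P<N>, uses Python's standard `html.parser`, and is an assumption about one possible implementation, not the authors' extractor. The class name `FragmentExtractor`, the function `summarize`, and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class FragmentExtractor(HTMLParser):
    """Collect title (T), meta (M), headings (H), and visible text."""

    TRACKED = ("title", "h1", "h2", "script", "style")

    def __init__(self):
        super().__init__()
        self.title, self.meta, self.headings, self.text = [], [], [], []
        self._stack = []  # currently open tracked tags

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            if name in ("description", "keywords") and attr.get("content"):
                self.meta.append(attr["content"])
        elif tag in self.TRACKED:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        data = data.strip()
        if not data:
            return
        current = self._stack[-1] if self._stack else None
        if current == "title":
            self.title.append(data)
        elif current in ("h1", "h2"):
            self.headings.append(data)
            self.text.append(data)  # headings are also visible page text
        elif current not in ("script", "style"):
            self.text.append(data)


def summarize(html: str, n_bytes: int = 500) -> dict[str, str]:
    """Return the T, M, H, and P<N> fragments of an HTML page."""
    parser = FragmentExtractor()
    parser.feed(html)
    parser.close()
    page_text = " ".join(parser.text).encode("utf-8")
    # Cut at a byte boundary; drop a partial multi-byte character if any.
    prefix = page_text[:n_bytes].decode("utf-8", errors="ignore")
    return {"T": " ".join(parser.title), "M": " ".join(parser.meta),
            "H": " ".join(parser.headings), f"P{n_bytes}": prefix}


sample_html = """<html><head><title>Cheap Flights to Paris</title>
<meta name="description" content="Compare airfares to Paris">
<style>body {color: red}</style></head>
<body><h1>Paris Travel Deals</h1><p>Book early and save.</p>
<h2>Hotels</h2><script>var x = 1;</script><p>Find a room.</p></body></html>"""
summary = summarize(sample_html, n_bytes=20)
print(summary)
```

Only a short prefix of the body is kept, so the summary stays small no matter how long the page is, which is the point of the P<N> fragment.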

11

Text Classification

Using a summary of the page in place of its entire content can ostensibly eliminate some information.

To alleviate the harmful effects of summarization, they study the effects of using text classification: they classify both page excerpts and ads with respect to a taxonomy and use classification-based features to augment the original bag of words.

12

Choice of Taxonomy

Taxonomy: they employ a large taxonomy of approximately 6,000 nodes, arranged in a hierarchy with median depth 5 and maximum depth 9.

Human editors populated the taxonomy with labeled bid phrases of ads (approx. 150 phrases per node).

13

Classification Method

For each taxonomy node, they concatenated all the phrases associated with this node into a single meta-document.

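Then, they computed a centroid for each node by summing up the TF-IDF values of individual terms and normalizing by the number of phrases in the class:

$$\vec{c}_j = \frac{1}{|C_j|} \sum_{\vec{p} \in C_j} \frac{\vec{p}}{\lVert \vec{p} \rVert}$$

where $\vec{c}_j$ is the centroid for class $C_j$ and $\vec{p}$ iterates over the phrases in the class.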
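The centroid construction can be sketched as follows. This is a minimal illustration under stated assumptions: whitespace tokenization, a smoothed IDF variant, and length-normalizing each phrase vector before averaging are choices made for the example, and the function names and toy phrases are hypothetical.

```python
import math
from collections import Counter, defaultdict


def l2_normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else {}


def build_centroids(labeled_phrases):
    """labeled_phrases: dict mapping class name -> list of bid phrases."""
    all_phrases = [p for ps in labeled_phrases.values() for p in ps]
    n = len(all_phrases)
    # Document frequency of each term, counting each phrase once.
    df = Counter(t for p in all_phrases for t in set(p.lower().split()))
    # Smoothed IDF: an assumption, the exact variant is not given.
    idf = {t: math.log(1 + n / d) for t, d in df.items()}

    centroids = {}
    for cls, phrases in labeled_phrases.items():
        if not phrases:
            continue  # no phrases, no centroid
        total = defaultdict(float)
        for phrase in phrases:
            tf = Counter(phrase.lower().split())
            vec = l2_normalize({t: c * idf[t] for t, c in tf.items()})
            for t, w in vec.items():
                total[t] += w
        # Normalize the summed weights by the number of phrases in the class.
        centroids[cls] = {t: w / len(phrases) for t, w in total.items()}
    return centroids


# Toy taxonomy with hypothetical bid phrases.
centroids = build_centroids({
    "Travel/Air": ["cheap flights", "paris flights"],
    "Travel/Hotels": ["hotel"],
})
print(centroids)
```

Terms shared by many phrases of a class (here "flights") end up with the largest centroid weight for that class.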

14

Classification Method cont.

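The classification is based on the cosine of the angle between the document and the centroid meta-document:

$$C_{\max} = \arg\max_{C_j \in C} \frac{\vec{c}_j}{\lVert \vec{c}_j \rVert} \cdot \frac{\vec{d}}{\lVert \vec{d} \rVert} = \arg\max_{C_j \in C} \frac{\sum_{i \in F} c^i d^i}{\sqrt{\sum_{i \in F} (c^i)^2}\,\sqrt{\sum_{i \in F} (d^i)^2}}$$

where $F$ is the set of features, and $c^i$ and $d^i$ represent the weight of the $i$th feature in the class centroid and the document, respectively.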
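A minimal sketch of cosine-based nearest-centroid classification follows. The centroid weights, class names, and the `top_k` parameter are hypothetical; the sketch only illustrates the idea of picking the classes whose centroids have the highest cosine with the document.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors (dict feature -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(doc, centroids, top_k=1):
    """Return the top_k (class, score) pairs by cosine with the document."""
    scored = sorted(((cosine(c, doc), cls) for cls, c in centroids.items()),
                    reverse=True)
    return [(cls, score) for score, cls in scored[:top_k]]


# Hypothetical centroids and a page summary as a term-weight vector.
centroids = {
    "Travel": {"flights": 0.6, "paris": 0.4, "cheap": 0.4},
    "Hotels": {"hotel": 0.9, "room": 0.3},
}
doc = {"paris": 1.0, "flights": 2.0}
result = classify(doc, centroids, top_k=2)
print(result)
```

Because cosine ignores vector length, a short page summary and a long full page can be compared against the same centroids.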

15

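Using Classification Features & Ad Retrieval Function

Each page and ad was represented as a bag of words (BOW) and as an additional vector of classification features.

The ad retrieval function was formulated as a linear combination of similarity scores based on both BOW and classification features:

$$\mathrm{Score}(p, a) = \alpha \cdot \mathrm{sim}_{\mathrm{BOW}}(p, a) + \beta \cdot \mathrm{sim}_{\mathrm{class}}(p, a)$$

where $p$ is the page, $a$ is the ad, and $\alpha$ and $\beta$ are the combination weights.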
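A linear combination of this kind can be sketched as below. The sketch assumes cosine similarity for both components; the weight names `alpha` and `beta`, the dictionary layout of pages and ads, and the toy data are assumptions for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors (dict feature -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def ad_score(page, ad, alpha=1.0, beta=1.0):
    """Weighted sum of BOW similarity and classification-feature similarity."""
    return (alpha * cosine(page["bow"], ad["bow"])
            + beta * cosine(page["classes"], ad["classes"]))


def rank_ads(page, ads, alpha=1.0, beta=1.0):
    """Return (ad id, score) pairs sorted by descending score."""
    scored = [(ad_id, ad_score(page, ad, alpha, beta)) for ad_id, ad in ads.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical page and ads, each with BOW and class-feature vectors.
page = {"bow": {"paris": 1.0, "flights": 1.0}, "classes": {"Travel/Air": 1.0}}
ads = {
    "ad1": {"bow": {"flights": 1.0}, "classes": {"Travel/Air": 1.0}},
    "ad2": {"bow": {"paris": 1.0}, "classes": {"Travel/Hotels": 1.0}},
}
ranking = rank_ads(page, ads)
print(ranking)
```

In this toy case both ads match one page word equally well, so the classification term is what separates them; setting `beta=0` would leave them tied.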

16

Dataset

From 12,000 human judgments (page-ad pairs):

Dataset 1 consists of 105 Web pages that are accessible through a major search engine.

2,680 ads and 2,946 page-ad scores (some ads were scored for more than one page).

The classification precision was 70% for the pages and 86% for the ads.

Dataset 2 consists of 827 pages from publishers that are not found in the search engine index.

5,056 unique ads.

17

Evaluation Metrics

Precision
MAP (Mean Average Precision)
bpref-10 (Buckley et al., SIGIR'04)

The idea of bpref is to measure the effectiveness of a system on the basis of judged documents only.

Since the scores for MAP and P@(N) are completely determined by the ranks of the relevant documents in the result set, these measures make no distinction in pooled collections between documents that are explicitly judged as nonrelevant and documents that are assumed to be nonrelevant because they are unjudged.
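The following minimal sketch of P@N and average precision shows why these measures cannot tell judged nonrelevant documents from unjudged ones: anything outside the relevant set counts as a miss. Function names and the toy ranking are hypothetical.

```python
def precision_at(ranked, relevant, n):
    """Fraction of the top-n results that are relevant."""
    return sum(1 for doc in ranked[:n] if doc in relevant) / n


def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each relevant result."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranked, relevant) pairs, one per page."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# "x" is judged nonrelevant and "y" is unjudged; both are treated the same.
ranked = ["a", "x", "b", "y"]
relevant = {"a", "b"}
print(precision_at(ranked, relevant, 3), average_precision(ranked, relevant))
```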

18

bpref-10

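The preference measure is a function of the number of times judged nonrelevant documents are retrieved before relevant documents.

Formulation:

Naïve: simple counts of the number of judged nonrelevant documents retrieved before some relevant document are poor, because the score depends on the absolute numbers of relevant and judged nonrelevant documents.

For a topic with $R$ relevant documents, where $r$ is a relevant document and $n$ is a member of the first $R$ judged nonrelevant documents:

bpref:

$$\mathrm{bpref} = \frac{1}{R} \sum_{r} \left(1 - \frac{|n \text{ ranked higher than } r|}{R}\right)$$

bpref-10 (here $n$ is a member of the first $10 + R$ judged nonrelevant documents):

$$\mathrm{bpref\text{-}10} = \frac{1}{R} \sum_{r} \left(1 - \frac{|n \text{ ranked higher than } r|}{10 + R}\right)$$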
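Both bpref and bpref-10 can be sketched with one function, following the definitions in Buckley and Voorhees (SIGIR'04). The function name, the `extra` parameter (0 for bpref, 10 for bpref-10), and the toy ranking are assumptions made for this example.

```python
def bpref(ranked, relevant, nonrelevant, extra=0):
    """bpref over judged documents only; extra=10 gives bpref-10.

    ranked:      retrieved documents in rank order
    relevant:    set of documents judged relevant
    nonrelevant: set of documents judged nonrelevant
    Unjudged documents in `ranked` are ignored entirely.
    """
    num_relevant = len(relevant)
    if num_relevant == 0:
        return 0.0
    # Only the first (extra + R) judged nonrelevant documents are counted.
    limit = extra + num_relevant
    nonrel_seen = 0
    score = 0.0
    for doc in ranked:
        if doc in nonrelevant:
            if nonrel_seen < limit:
                nonrel_seen += 1
        elif doc in relevant:
            score += 1 - nonrel_seen / limit
    # Relevant documents that were not retrieved contribute zero.
    return score / num_relevant


# "u1" is unjudged, so it does not affect the score.
ranked = ["n1", "r1", "u1", "n2", "r2"]
relevant = {"r1", "r2"}
nonrelevant = {"n1", "n2", "n3"}
print(bpref(ranked, relevant, nonrelevant))            # bpref
print(bpref(ranked, relevant, nonrelevant, extra=10))  # bpref-10
```

With few relevant documents, plain bpref drops to zero quickly; the larger denominator in bpref-10 softens that effect.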

19

The effect of Focused Page Analysis

FullText(F), AnchorText(A), First 500 bytes(P500), MetaData(M), Headings(H), Title(T), PageURL(U), ReferrerURL(R)

20

The contribution of individual fragments

FullText(F), AnchorText(A), First 500 bytes(P500), MetaData(M), Headings(H), Title(T), PageURL(U), ReferrerURL(R)

21

Precision-Recall tradeoff

22

Incremental Addition of Information

23

The Effect of Classification

24

Conclusion

They presented a new methodology for contextual Web advertising in real time.

They focused on the contributions of the different fragments of the pages.
