Information Technology in Business and Society. Session 9 – Search and Advertising. Sean J. Taylor. Administrativia: Assignment 2 online, due Saturday 2/25 at 1am; Assignment 2 resources; Assignment 3 preview; guest speaker on Tuesday 2/28: Chrys Wu discussing IT and Journalism.
INFORMATION TECHNOLOGY IN BUSINESS AND SOCIETY
SESSION 9 – SEARCH AND ADVERTISING
SEAN J. TAYLOR
ADMINISTRATIVIA
• Assignment 2 online, due Saturday 2/25 at 1am
• Assignment 2 resources
• Assignment 3 preview
• Guest speaker on Tuesday 2/28: Chrys Wu discussing IT and Journalism
• Substitute on Thursday 3/1: Professor Dylan Walker
LEARNING OBJECTIVES
1. Learn how search engines rank pages
2. Learn how to design effectively for high rankings
3. Learn how online advertising works, especially search ads and keyword auctions
4. The future of search
SEARCH ENGINES AND WEB DIRECTORIES
Resources on the Web that help you find sites with the information and/or services you want.
• Directory search engine - organizes listings of Web sites into hierarchical lists.
• Search engine - uses software agent technologies (or “spiders”, or “bots”) to search the Web for key words and place them into indexes.
WEB DIRECTORIES EXAMPLE
Advantages? Disadvantages?
SEARCH ENGINE EXAMPLES
Advantages? Disadvantages?
SEARCH ENGINES DRIVE ECOMMERCE!
WHERE IS CONSUMERS' ATTENTION?
EYETRACKING STUDY OF GOOGLE RESULTS
HOW SEARCH ENGINES WORK
– Search engines discover new pages by following links
– They keep track of the words that appear in pages; when you enter a query, the search engine returns a ranked list
– Text content is important! But it is not enough! (Why?)
How do search engines rank pages? (Why does this matter?)
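The two steps above (discover pages by following links, then keep track of the words in each page) can be sketched as follows. This is a toy illustration, not how any real engine is built: the in-memory `WEB` dictionary, the page names, and the count-based ranking are all assumptions made for the example.

```python
from collections import Counter, defaultdict, deque

# Hypothetical in-memory "web": page name -> (text, outgoing links).
WEB = {
    "home": ("search engines rank pages", ["books", "ads"]),
    "books": ("book pages link to other books", ["home"]),
    "ads": ("search ads and keyword auctions", ["books"]),
    "orphan": ("no page links here so the spider never finds it", []),
}

def crawl(start):
    """Discover pages breadth-first by following links from `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        page = queue.popleft()
        for target in WEB[page][1]:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

def build_index(pages):
    """Map each word to a Counter of {page: number of occurrences}."""
    index = defaultdict(Counter)
    for page in pages:
        for word in WEB[page][0].lower().split():
            index[word][page] += 1
    return index

def search(index, query):
    """Rank pages by total occurrences of the query words (ties broken by name)."""
    scores = Counter()
    for word in query.lower().split():
        scores.update(index.get(word, {}))
    return sorted(scores, key=lambda page: (-scores[page], page))

pages = crawl("home")
index = build_index(pages)
print(sorted(pages))                   # "orphan" is never discovered
print(search(index, "search pages"))   # ranked by word counts only
```

Ranking by word counts alone is easy to game, which is why text content is not enough.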
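PAGERANK IS REALLY A “RANDOM SURFER” MODEL
Random Surfer Model:
Transfer Matrix: The probability that a surfer follows a link from webpage i to webpage j is [prob. you were not “picked up”] * [prob. of following link i->j]:
$T = (1 - \alpha)\, W$
where $\alpha$ is the probability of being “picked up”.
The matrix $W_{ij} = 1 / (\text{number of links on page } i)$ if page i links to page j, and $0$ otherwise
What about getting stuck in loops? $\alpha$ takes care of that
Let’s count the surfers that pass through each point:
$1 + (1 - \alpha) W + (1 - \alpha)^2 W^2 + \dots = \dfrac{1}{1 - (1 - \alpha) W}$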
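A minimal simulation of the random surfer, assuming a small four-page link graph and a “pick up” probability of 0.15 (both are illustrative choices, not values given on the slide). At every step the surfer is either picked up and dropped on a random page, or follows one of the current page's links at random; the fraction of visits per page approximates its PageRank.

```python
import random

# Assumed example graph: page -> pages it links to.
LINKS = {"A": ["B", "C", "D"], "B": ["A", "C"], "C": ["A"], "D": ["C"]}

def random_surfer(links, pickup_prob=0.15, steps=100_000, seed=0):
    """Return the fraction of steps the surfer spends on each page."""
    rng = random.Random(seed)
    pages = sorted(links)
    visits = dict.fromkeys(pages, 0)
    page = rng.choice(pages)
    for _ in range(steps):
        visits[page] += 1
        if rng.random() < pickup_prob or not links[page]:
            page = rng.choice(pages)        # picked up, dropped on a random page
        else:
            page = rng.choice(links[page])  # follow one outgoing link at random
    return {p: visits[p] / steps for p in pages}

freq = random_surfer(LINKS)
for page, share in sorted(freq.items(), key=lambda kv: -kv[1]):
    print(page, round(share, 3))
```

Because the surfer can always be picked up, it cannot stay trapped in a loop of pages forever.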
MEASURING IMPORTANCE OF LINKING
PageRank Algorithm
Idea: important pages are pointed to by other important pages
Method:
• Each link from one page to another is counted as a “vote” for the destination page
• The number of incoming links is important!
• But it is not enough!
• Each “vote” is different! PageRank gives more weight to votes that come from pages that themselves have a large number of votes (and so on, and so on)
Compare, for example, the circled page in cases A and B
[Figure: two link graphs, case A and case B]
COMPUTING PAGERANK
(ignoring damping factor for illustration)
Four book pages, each with a “People who bought this also bought…” list:
• BOOK A links to book B, book C, book D
• BOOK B links to book A, book C
• BOOK C links to book A
• BOOK D links to book C
PAGERANK
• Every page starts with a PageRank of .250
• In each round, a page splits its PageRank evenly among the pages it links to; its new PageRank is the sum of the shares it receives
• Round 1 shares: A sends .250/3 to each of B, C, D; B sends .250/2 to each of A, C; C sends .250 to A; D sends .250 to C
• Round 2 shares: A sends .375/3 to each of B, C, D; B sends .083/2 to each of A, C; C sends .458 to A; D sends .083 to C
Round     A      B      C      D
start   .250   .250   .250   .250
1       .375   .083   .458   .083
2       .500   .125   .250   .125
…
final   .400   .133   .333   .133
• At the final values the shares no longer change anything: A sends .400/3 to each of B, C, D; B sends .133/2 to each of A, C; C sends .333 to A; D sends .133 to C
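The rounds of the book example can be reproduced with a short loop. This is a sketch of the simplified computation shown on these slides (no damping factor), with the link structure read off the four “also bought” lists; the function names are made up for the example.

```python
# Link structure of the book example: page -> pages it links to.
LINKS = {"A": ["B", "C", "D"], "B": ["A", "C"], "C": ["A"], "D": ["C"]}

def pagerank_step(rank, links):
    """One round: every page splits its rank evenly over its outgoing links."""
    new_rank = dict.fromkeys(links, 0.0)
    for page, targets in links.items():
        share = rank[page] / len(targets)
        for target in targets:
            new_rank[target] += share
    return new_rank

def pagerank(links, tol=1e-9, max_rounds=1000):
    """Repeat rounds until the values stop changing; return every round."""
    rank = dict.fromkeys(links, 1.0 / len(links))
    history = [rank]
    for _ in range(max_rounds):
        new_rank = pagerank_step(rank, links)
        history.append(new_rank)
        if max(abs(new_rank[p] - rank[p]) for p in links) < tol:
            break
        rank = new_rank
    return history

history = pagerank(LINKS)
for step in (0, 1, 2, len(history) - 1):
    print(step, {p: round(r, 3) for p, r in history[step].items()})
```

Rounds 1 and 2 print the .375/.083/.458/.083 and .500/.125/.250/.125 values from the slides, and the last round settles at .400/.133/.333/.133.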
GAMING PAGERANK AND TRUST
TrustRank Algorithm
Initial votes come only from trusted pages
Compare, for example, the circled page in cases A and B
[Figure: in case A the circled page receives links from untrusted sources; in case B it receives links from trusted pages]
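A rough sketch of the TrustRank idea, assuming it is implemented as a PageRank-style iteration in which votes start at, and are periodically returned to, a hand-picked set of trusted pages. The graph, the page names, the damping value of 0.85, and the handling of pages without links are all assumptions for illustration.

```python
def trust_rank(links, trusted, damping=0.85, rounds=100):
    """Propagate trust from the trusted seed pages along links."""
    pages = list(links)
    seed = {p: (1.0 / len(trusted) if p in trusted else 0.0) for p in pages}
    trust = dict(seed)
    for _ in range(rounds):
        new_trust = {p: (1 - damping) * seed[p] for p in pages}
        for page, targets in links.items():
            if not targets:
                # Page without links: hand its trust back to the seeds.
                for p in pages:
                    new_trust[p] += damping * trust[page] * seed[p]
                continue
            share = damping * trust[page] / len(targets)
            for target in targets:
                new_trust[target] += share
        trust = new_trust
    return trust

# Hypothetical graph: "good" has one link from a trusted page,
# "spam" has three links, all from untrusted link-farm pages.
LINKS = {
    "trusted": ["good"],
    "farm1": ["spam"], "farm2": ["spam"], "farm3": ["spam"],
    "good": [], "spam": [],
}

trust = trust_rank(LINKS, trusted={"trusted"})
print({p: round(t, 3) for p, t in trust.items()})
```

In this toy graph the page with more incoming links ends up with no trust at all, because none of its votes can be traced back to a trusted page.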
SIMULATING CHANGES IN PAGERANK
Same four book pages (A links to B, C, D; B links to A, C; C links to A; D links to C)
Baseline: PR of A = .400, PR of B = .133, PR of C = .333, PR of D = .133
Change               PR of A   PR of C
C cuts link to A     0.18      0.50
C links to B         0.38      0.33
C links to D         0.24      0.40
C links to B & D     0.22      0.38
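Link changes like these can be tried out by editing the graph and recomputing. The sketch below uses the simplified no-damping computation and the book graph (A links to B, C, D; B links to A, C; C links to A; D links to C); the helper names are made up. It only covers changes that add links: once C cuts its only link it has no outgoing links left, which this simplified version does not handle.

```python
import copy

BASE = {"A": ["B", "C", "D"], "B": ["A", "C"], "C": ["A"], "D": ["C"]}

def pagerank(links, rounds=500):
    """Simplified PageRank (no damping): repeat the vote-splitting round."""
    rank = dict.fromkeys(links, 1.0 / len(links))
    for _ in range(rounds):
        new_rank = dict.fromkeys(links, 0.0)
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += rank[page] / len(targets)
        rank = new_rank
    return rank

def with_extra_links(links, page, extra):
    """Return a copy of the graph in which `page` also links to `extra`."""
    changed = copy.deepcopy(links)
    changed[page] = changed[page] + list(extra)
    return changed

scenarios = [("baseline", []), ("C links to D", ["D"]), ("C links to B & D", ["B", "D"])]
for label, extra in scenarios:
    pr = pagerank(with_extra_links(BASE, "C", extra))
    print(f"{label:18s} PR of A = {pr['A']:.3f}   PR of C = {pr['C']:.3f}")
```

Adding outgoing links to C dilutes the vote that A receives from C, so A's PageRank drops in both scenarios.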
IMPORTANCE OF ANCHOR TEXT
<a href=http://www.sims…>INFOSYS 141</a>
<a href=http://www.sims…>A terrific course on search engines</a>
The anchor text summarizes what the website is about.
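One way a search engine could collect anchor text is to parse each page's links and file the link text under the target URL. The sketch below uses Python's standard `html.parser`; the two HTML snippets and the `example.edu` URL are invented stand-ins for the elided links above.

```python
from collections import defaultdict
from html.parser import HTMLParser

class AnchorTextCollector(HTMLParser):
    """Collect the text of every <a> element, grouped by its href."""

    def __init__(self):
        super().__init__()
        self.anchors = defaultdict(list)  # target URL -> list of anchor texts
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors[self._href].append("".join(self._text).strip())
            self._href = None

# Hypothetical pages that link to the same (made-up) course URL.
PAGES = [
    '<p>Take <a href="http://example.edu/course">INFOSYS 141</a> next term.</p>',
    '<p><a href="http://example.edu/course">A terrific course on search engines</a></p>',
]

collector = AnchorTextCollector()
for html in PAGES:
    collector.feed(html)
print(dict(collector.anchors))
```

The target page is now described by words chosen by other authors, even if the page itself never uses them.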
OTHER RANKING FACTORS
Location, Location, Location...and Frequency
• Query words in title, or in first few sentences
• The more frequent the query words, the better
Click through measurement
• How often users click on your URL, when they see it
• How long do they stay (using toolbars!)
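A toy scoring function that combines these factors. Real engines do not publish their formulas, so the weights, the 20-word cutoff for “first few sentences”, and the function name are arbitrary assumptions chosen only to make the effect of each factor visible.

```python
def relevance_score(query, title, body, clicks=0, impressions=0):
    """Toy score from query-word location, frequency, and click-through rate."""
    terms = query.lower().split()
    title_words = title.lower().split()
    body_words = body.lower().split()
    early_words = body_words[:20]  # stand-in for "first few sentences"

    title_hits = sum(title_words.count(t) for t in terms)
    early_hits = sum(early_words.count(t) for t in terms)
    frequency = sum(body_words.count(t) for t in terms) / max(len(body_words), 1)
    ctr = clicks / impressions if impressions else 0.0

    # Arbitrary illustrative weights, not taken from any real search engine.
    return 3.0 * title_hits + 1.0 * early_hits + 10.0 * frequency + 5.0 * ctr

body = "how search engines rank pages"
in_title = relevance_score("search engines", "Search engines explained", body, 10, 100)
not_in_title = relevance_score("search engines", "Welcome", body, 10, 100)
more_clicks = relevance_score("search engines", "Welcome", body, 50, 100)
print(in_title, not_in_title, more_clicks)
```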
OUTLINE
1. Learn how search engines rank pages
2. Learn how to design effectively for high rankings
3. Learn how online advertising works, especially search ads and keyword auctions
4. The future of search
ACHIEVING HIGHER RESULTS RANKINGS
• Position your keywords (title, headings, early on page)
• Make text visible (no tiny fonts, no white-on-white)
• Frames can kill
• Have relevant content
• Do not change topics
• Just say no to search engine spamming
• Submit your key pages
• Verify your listing often
MANIPULATING RANKINGS
Motives
• Commercial, political, religious, lobbies
• Promotion funded by advertising budget
Operators
• Contractors (Search Engine Optimizers) for lobbies, companies
• Web masters
• Hosting services
What are the techniques used by rankings manipulators?
MANIPULATION TECHNOLOGIES
Cloaking
• Serve fake content to search engine robot
• DNS cloaking: Switch IP address. Impersonate
Doorway pages
• Pages optimized for a single keyword that re-direct to the real target page
Keyword Spam
• Misleading meta-keywords, excessive repetition of a term, fake “anchor text”
• Hidden text with colors, CSS tricks, etc.
Link spamming
• Mutual admiration societies, hidden links, awards
• Domain flooding: numerous domains that point or re-direct to a target page
Robots
• Fake click stream
• Fake query stream
Cloaking: Is this a search engine spider? Y: serve FakeDoc. N: serve SPAM.
Meta-Keywords = “… London hotels, hotel, holiday inn, hilton, discount, booking, reservation, sex, mp3, britney spears, viagra, …”
Risky to use any of these, as search engines are getting better at detecting and punishing them
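As an illustration of how “excessive repetition of a term” might be detected, here is a naive check that flags text in which a single word makes up a large share of all words. The 30% threshold and the sample texts are assumptions; real spam detection is far more involved.

```python
import re
from collections import Counter

def top_term_share(text):
    """Return the most frequent word and its share of all words in `text`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return None, 0.0
    term, count = Counter(words).most_common(1)[0]
    return term, count / len(words)

def looks_stuffed(text, threshold=0.3):
    """Flag text where one term exceeds `threshold` of all words (assumed cutoff)."""
    _, share = top_term_share(text)
    return share > threshold

normal = "Book a room at our London hotel near the river, with breakfast included."
stuffed = "hotel hotel cheap hotel london hotel hotel discount hotel booking hotel"

print(top_term_share(stuffed), looks_stuffed(stuffed))
print(looks_stuffed(normal))
```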
OUTLINE
1. Learn how search engines rank pages
2. Learn how to design effectively for high rankings
3. Learn how online advertising works, especially search ads and keyword auctions
4. The future of search
PAID RANKING
Promoting without Manipulation: Paid placement
Keyword bidding for targeted ads
• Pay-per-click
• Higher bids result in higher ranks for the ad
• A higher percentage of clicks on the ad increases the rank as well (why?)
Google's AdWords is the biggest player
• Google’s 2007 revenue was more than $16 billion, and 2008 revenue about $22 billion, mostly from such ads
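One common way to explain why clicks matter: with pay-per-click, the search engine earns nothing from an ad nobody clicks, so it makes sense to order ads by bid times click-through rate (expected revenue per impression) rather than by bid alone. The sketch below assumes exactly that simple rule; the advertisers, bids, and click-through rates are made up, and real systems use additional quality signals.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    advertiser: str
    bid: float  # maximum price per click, in dollars
    ctr: float  # estimated click-through rate (clicks / impressions)

    @property
    def expected_revenue(self):
        """Expected revenue per impression under pay-per-click."""
        return self.bid * self.ctr

def rank_ads(ads):
    """Order ads by expected revenue per impression, highest first."""
    return sorted(ads, key=lambda ad: ad.expected_revenue, reverse=True)

# Hypothetical bids on one keyword.
ADS = [
    Ad("high_bidder", bid=2.00, ctr=0.01),
    Ad("popular", bid=1.00, ctr=0.05),
    Ad("average", bid=1.50, ctr=0.02),
]

ranking = rank_ads(ADS)
for position, ad in enumerate(ranking, start=1):
    print(position, ad.advertiser, round(ad.expected_revenue, 3))
```

Here the lowest bidder takes the top slot because its ad is clicked five times as often as the highest bidder's.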
EXAMPLE
[Figure: search results page with the AdWords placements and the most relevant sites marked]
FUND YOUR WEBSITE: ADSENSE
Google also delivers ads to other websites
Sign up for Google AdSense, and Google delivers ads to your website (common source of income for “professional” bloggers)
How ads are delivered:
• If the website is the best match for the targeted keywords
• If users of the website click on the results
Strategies for successful ads:
• Place the ads on top
• Blend with the rest of the website
• Ads at the bottom are ignored consistently
EXAMPLE: WASHINGTON POST WEBSITE
Analysis of Washington Post Website
TARGETING BANNER ADS
Request for Ad from Ad Server
• IP Address: Country, Domain, Company
• Browser, Operating System
• Surfing Behavior from cookies
• Demographic Data?
Targeted Ad is Delivered to User
• Context: Movie reviews
• User Profile: NYU user, New York
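A hedged sketch of how an ad server might choose an ad from such a request: each ad carries targeting rules, and the server picks the eligible ad whose rules match the most request attributes. The attribute names, the ads, and the “most specific match wins” rule are all assumptions for illustration, not a description of any real ad server.

```python
def pick_ad(request, ads):
    """Return the eligible ad with the most matching targeting rules, or None."""
    def match_count(ad):
        rules = ad["targeting"]
        if any(request.get(key) != value for key, value in rules.items()):
            return -1  # at least one rule failed: the ad is not eligible
        return len(rules)

    eligible = [ad for ad in ads if match_count(ad) >= 0]
    return max(eligible, key=match_count, default=None)

# Hypothetical request, built from the kinds of data listed above.
REQUEST = {
    "context": "movie reviews",
    "city": "New York",
    "domain": "nyu.edu",
    "browser": "Firefox",
}

# Hypothetical ad inventory.
ADS = [
    {"name": "generic", "targeting": {}},
    {"name": "nyc_cinema", "targeting": {"context": "movie reviews", "city": "New York"}},
    {"name": "la_cinema", "targeting": {"context": "movie reviews", "city": "Los Angeles"}},
]

print(pick_ad(REQUEST, ADS)["name"])
```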
CLOSED LOOP MARKETING
1. User visits publisher sites
2. Ads delivered by DART for Advertisers
3. User clicks & visits advertiser’s site
4. Boomerang captures user action data
5. Data analysis, databank
6. Boomerang compiles & reports response for future targeting
Source: Doubleclick, Inc.
FUTURE OF SEARCH
1. Information Extraction: Search on Structured Data
2. Social Search
3. Privacy Preserving Search
INFORMATION EXTRACTION
Information extraction applications extract structured relations from unstructured text
Example text:
May 19 1995, Atlanta -- The Centers for Disease Control and Prevention, which is in the front line of the world's response to the deadly Ebola epidemic in Zaire, is finding itself hard pressed to cope with the crisis…
Information Extraction System (e.g., NYU’s Proteus)
Disease Outbreaks in The New York Times
Date        Disease Name      Location
Jan. 1995   Malaria           Ethiopia
July 1995   Mad Cow Disease   U.K.
Feb. 1995   Pneumonia         U.S.
May 1995    Ebola             Zaire
RETURN STRUCTURED ANSWERS, NOT WEBPAGES
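To make the idea concrete, here is a deliberately simple extractor that turns the Ebola sentence into one table row using two regular expressions. Real systems such as the one named above are far more sophisticated; the patterns and function name below are assumptions that only handle text shaped like this example.

```python
import re

TEXT = (
    "May 19 1995, Atlanta -- The Centers for Disease Control and Prevention, "
    "which is in the front line of the world's response to the deadly Ebola "
    "epidemic in Zaire, is finding itself hard pressed to cope with the crisis"
)

# Month name, day, year (e.g. "May 19 1995").
DATE = re.compile(
    r"\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.? \d{1,2},? (\d{4})"
)
# "<Disease> epidemic in <Location>" or "<Disease> outbreak in <Location>".
OUTBREAK = re.compile(r"\b([A-Z][a-z]+) (?:epidemic|outbreak) in ([A-Z][A-Za-z.]+)")

def extract_outbreak(text):
    """Return a {date, disease, location} row, or None if a pattern fails."""
    date = DATE.search(text)
    outbreak = OUTBREAK.search(text)
    if not (date and outbreak):
        return None
    return {
        "date": f"{date.group(1)} {date.group(2)}",
        "disease": outbreak.group(1),
        "location": outbreak.group(2),
    }

row = extract_outbreak(TEXT)
print(row)
```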
FUTURE OF SEARCH
1. Information Extraction: Search on Structured Data
2. Social Search
3. Privacy Preserving Search
Y! ANSWERS
Launched in second half of 2005
Incentive system based on points and voting for best answers
Questions grouped by category
Some statistics:
• over 60 million users
• over 120 million answers, available in 18 countries and in 6 languages
LONG-TERM PROSPECTS
Questions follow a power-law:
• A large number of questions will each be asked by many people (20% of questions → 80% of requests)
• We only need one answer for each question
• Quickly acquire high-quality answers for 80% of queries
• …in time, people will take care of the “long tail” of the remaining questions
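A quick numerical check of the 20%/80% intuition, assuming question popularity follows a Zipf-style power law in which the question at popularity rank r is asked in proportion to 1/r. The exponent and the number of distinct questions are assumptions; different values shift the split.

```python
def share_of_requests(n_questions, top_fraction, exponent=1.0):
    """Share of all requests that go to the most popular `top_fraction` of questions."""
    weights = [1.0 / rank ** exponent for rank in range(1, n_questions + 1)]
    top_n = int(n_questions * top_fraction)
    return sum(weights[:top_n]) / sum(weights)

share = share_of_requests(n_questions=10_000, top_fraction=0.20)
print(f"top 20% of questions cover {share:.0%} of requests")
```

Under these assumptions, answering the most popular fifth of the questions well already serves the large majority of requests.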
FUTURE OF SEARCH
1. Information Extraction: Search on Structured Data
2. Social Search
3. Privacy Preserving Search
PRIVACY PRESERVING SEARCH
NEXT CLASS: SOCIAL NETWORKS
• Work on Assignment 2