Upload
beverley-powell
View
222
Download
0
Tags:
Embed Size (px)
Citation preview
Deep Learning Powered In-Session Contextual Ranking
using Clickthrough DataXiujun Li1, Chenlei Guo2, Wei Chu2, Ye-Yi Wang2, Jude
Shavlik1
1University of Wisconsin – Madison2Microsoft Bing, Redmond, Washington, U.S.A.
12/12/2014
Outline
• Project Goal and Motivation Background & Goal Problem Definition
• Methodology Feature Generation Semantic Features Click-Based Features
• Experimental Results Offline Evaluation
• Future Work
Goal
• Incorporate some expressive signals (e.g. topic, domain) from in-session query history into the ranking, to re-rank the results
Example
Query: the dangerous days of daniel x Altrelda wiki
URL Original Rank ControlRankerScore
http://en.wikipedia.org/wiki/The_Dangerous_Days_of_Daniel_X 1 10.922
http://fanon.wikia.com/wiki/The_Dangerous_Days_of_Daniel_X 2 9.2745
http://www.amazon.com/The-Dangerous-Days-Daniel-X/dp/0316119709 3 7.444
http://danielx.wikia.com/wiki/The_Dangerous_Days_of_Daniel_X_(novel) 4 6.7845
http://danielx.wikia.com/wiki/Daniel_X 5 6.4136
http://www.goodreads.com/series/49946-daniel-x 6 5.5571
http://en.wikipedia.org/wiki/Daniel_X:_Watch_the_Skies 7 5.2691
http://www.jamespatterson.com/books_danielX.php 8 4.6821
Example 1. Current Ranking
But this Wikipedia url is an UnSat page clicked by the user in the same session?
Query 2: the dangerous days of daniel x
This is the 6th query in the same session.
Approach
• Correlate users clicks with their satisfaction• Propose a set of fine-grained semantic features to explore the
relations between in-session history queries, clicked URLs and current query, URLs, e.g. o The similarity between previous queries and current URLo The similarity of previous sat/unsat clicked URLs of last query to current URL
• Employ the semantic models (e.g. XCode, DSSM, CDSSM) to measure the similarity• Incorporate these features into the ranking, to re-rank the results
Problem1. ,…, are the history queries in one session.2. ,…, are the URLs in one impression page for query , it contains Sat and unSat clicks.3. is the current query.4. ,…, are the URLs in the impression page for query based on the current ranking algorithm.
Problem: we will re-rank , …, based on some signals (i.e. topic, domain) derived from the history in-session clickthrough data.
Outline
• Project Goal and Motivation Background & Goal Problem Definition
• Methodology Feature Generation Semantic Features Click-Based Features
• Experimental Results Offline Evaluation
• Future Work
Features Engineering – fine-grained featuresFeatures Level Description Category Feature Name
Semantic Features
URL-URLThe similarity between the urls of previous queries and urls of current query
Sat 9 raw featuresUnSat 9 raw features
Query-URL The similarity between the previous queries and urls of current query
Sat 5 raw featuresUnSat 5 raw features
Weighted URL-URLThe weighted similarity between the urls of previous queries and urls of current query
Sat 9 raw featuresUnSat 9 raw features
Click-Based Features
URL Level The domain and history click statistics on the current url of current query
History Click1.clickcount_url_total2.clickcount_url_sat3.clickcount_url_unsat
Domain Click1.domain_click_total2.domain_click_sat3.domain_click_unsat
Impression Level The Sat/UnSat statistics on the in-session history queries
Clicked url 1.sat_click_urls2.unsat_click_urls
Clicked count 1.sat_click_count2.unsat_click_count
Last click 1.last_click_dwelltime
Semantic FeaturesThree Distance to leverage the similarity for semantic features1. dis( , ): The distance between query and , say .2. dis( , ): The distance between i-th url of query and j-th url of
query .3. dis( , ): The distance between query and j-th url of query .
Semantic FeaturesSemantic distance calculation
Model Distance Range Semantics
XCode Hamming distance [0, 512]0 is the most similar,512 is the least similar
DSSM Cosine distance [-1, 1]-1 is the least similar,1 is the most similar
CDSSM Cosine distance [-1, 1]-1 is the least similar,1 is the most similar
XCode• Map every word, query, url into an intent space.• Measure semantic similarities of words, phrases and entities in an
intent space.
Maximize Mutual Information between Queries and URLsTrained on all of Bing search logs with 100b samples
512 bits X-CodeWord: (0 1 1 1 0 1…)URL: (0 1 1 0 0 0…)
Query: (0 1 1 0 0 1…) Clicked URLs:www.teslamotors.com www.wikipedia.org …
Queries:Tesla carTesla bio … Maximize Mutual
Information between Queries and URLsTrained on all of Bing search logs with 100b samples
Maximize Mutual Information of queries and URLs on Bing logs
* Borrowed from AIM Project
Deep Structured Semantic Model (DSSM)
• Representation: use deep neural networks to extract abstract semantic representations
• Learning: maximize the similarity between the query and the doc
• Scalability: use sub-word unit (e.g., letter-ngram) as raw input so that to handle an unlimited vocabulary (word hashing)
DSSM Double Model Training
DSSM• DSSM provides a way to encode how well contextual information is
matched in the semantic level. • Map queries/titles/docs … to a common low dimensional
semantic space using DNN• Leverage contextual information, not depend only on context
terms
Training Data Source Model Target Model
<Query, Title> Query Model Title Model
<Query, BrokenURL> Query Model BrokenURL Model
DSSM Distance Three Distance Calculation
Calculate DSSM Distance between queries Calculate DSSM Distance between URLs Calculate DSSM Distance between query and URL
Query Model
Query Model Title
ModelTitle
ModelTitle
Model
Query Model
Cosine Distance Cosine Distance Cosine Distance
<URL, Title> <URL, Title>
Convolutional DSSM (Preliminary Trial)
• Difference between CDSSM and DSSM• Sliding window: capture the contextual feature
for a word.• Convolutional layer: project each word within a
context window to a local contextual feature vector.
• Max pooling layer: extract the most salient local features to form a fixed-length global contextual feature vector.
• CDSSM Training• Pre-Training on Cosmos• GPU Training.
Offline Experiment
Self-Join
Reducer
Log Data
<PreQuery, CurQuery> Pair
Distance Calculation{XCode, DSSM, CDSSM}
Feature Data Generation{curQuery, Features}
Training Data(1 week)
Validation Data(1 day)
Test Data(1 week)
Input Features
FastRank Feature Selection
Tree Ensemble
ReRanker Evaluator
Baseline Ranker
Indicator Feature
Diagram 1. Feature Data Generation Diagram 2. ReRanking Evaluation
Outline
• Project Goal and Motivation Background & Goal Problem Definition
• Methodology Feature Generation Semantic Features Click-Based Features
• Experimental Results Offline Evaluation
• Future Work
Offline Performance• Individual Features: XCode, Click-Based, DSSM
• For Semantic Features, DSSM (Query-Title) is the best.
• For Click-Based Features, it is also better, and easy to be flighted.
• More info in the table 1 of the Appendix.
• TimeToSuccess: The dwelling time to meet the first SatClick in the session.
• Session Success Position: defined by the position of the first SatClick in the session.
-0.03
-0.025
-0.02
-0.015
-0.01
-0.005
0
0.005
XCode Click-Based DSSM (Query-Title) DSSM (Query-BrokenURL) CDSSM (Query-Title)
Session Success Position
SatClick Rate UnSatClick Rate
TimeToSuccess (sec)TimeToSuccess
From Quickback
Offline Performance• Features Combination: Semantic Features + Click-Based Features
• Semantic features and Click-Based features can provide different information to boost the performance.
• More info in the table 2 in the Appendix.
-0.06
-0.05
-0.04
-0.03
-0.02
-0.01
0
0.01
DSSM (Query-Title) XCode+Click DSSM (Query-Title)+ClickDSSM (Query-BrokenURL)+Click CDSSM (Query-Title)+Click XCode+Click+DSSM (Query-Title)
TimeToSuccess (sec) Session Success PositionTimeToSuccess
From QuickbackSatClick Rate UnSatClick Rate
Click-Based Features Recap.Level Category Features Semantics Function
URL Level
History Click1.clickcount_url_total2.clickcount_url_sat3.clickcount_url_unsat
how many sat/unsat clicks for the current url in the previous queries in this session.
Improve TimeToSuccess, SessionSuccessPosition
Domain Click1.domain_click_total2.domain_click_sat3.domain_click_unsat
how many sat/unsat clicks for the domain of current url in the previous queries in this session
Increase the SatClick Rate and decrease UnSatClick Rate
Impression LevelClicked url 1.sat_click_urls
2.unsat_click_urlsSome statistics from in-session history queries
Clicked count 1.sat_click_count2.unsat_click_count
Offline Performance• Domain Click v.s. History Click
-0.45
-0.4
-0.35
-0.3
-0.25
-0.2
-0.15
-0.1
-0.05
0
0.05
Domain+History Clicks Domain Clicks History Clicks
1 2 30
0.0005
0.001
0.0015
0.002
0.0025
0.003
0.0035
UnSatClick Rate
• Domain Click: Decrease the UnSat click
• History Click: Improve TimeToSuccess, SessionSuccessPosition
• More info in the table 4 in the Appendix.
TimeToSuccess (sec) Session Success PositionTimeToSuccess
From QuickbackUnSatClick RateSatClick Rate
Example
Query: the dangerous days of daniel x Altrelda wiki
URL Original Rank Cranker Score Rank Delta ControlRankerScore ExperimentalRankerPosition domain_click_total domain_click_sat domain_click_unsat
http://fanon.wikia.com/wiki/The_Dangerous_Days_of_Daniel_X 2 9.0423 +1 9.2745 1 0 0 0
http://en.wikipedia.org/wiki/The_Dangerous_Days_of_Daniel_X 1 7.8237 -1 10.922 2 5 0 5
http://danielx.wikia.com/wiki/The_Dangerous_Days_of_Daniel_X_(novel) 4 7.0838 +1 6.7845 3 0 0 0
http://www.goodreads.com/series/49946-daniel-x 6 6.1145 +2 5.5571 4 0 0 0
http://danielx.wikia.com/wiki/Daniel_X 5 5.1809 0 6.4136 6 5 5 0
http://www.amazon.com/The-Dangerous-Days-Daniel-X/dp/0316119709 3 5.0326 -3 7.444 7 5 0 5
http://www.jamespatterson.com/books_danielX.php 8 2.9647 +1 4.6821 8 5 0 5
http://en.wikipedia.org/wiki/Daniel_X:_Watch_the_Skies 7 6.0986 -1 5.2691 5 0 0 0
Example 1. Five unSat clicks on Wikipedia in session log
This is the 6th query in the same session.
In the same session, 2nd query (“the dangerous days of daniel x”), there are 5 unsat clicks for
Wikipedia page.
Appendix
• Evaluation: Table 1. Individual Features Table 2. Features Combination Table 3. Domain Click v.s. History Click Features
Offline PerformanceFeatures TimeToSuccess
(sec)Session Success
PositionTimeToSuccess From
Quickback (sec)SatClick
RateUnSatClick
RateCoverage
All XCode -0.0159 [-0.0267%]
-0.0011 [-0.0483%]
-0.0053 [-0.0194%]
0.0014 [0.1626%]
0.0004 [0.2923%]
0.3288
All Click-Based
-0.0203 [-0.0343%]
-0.0018 [-0.0758%]
-0.0038 [-0.0141%]
0.0019 [0.2178%]
0.0005 [0.3932%]
0.3288
DSSM (Query-Title Model)
-0.0282 [-0.0476%]
-0.0026 [-0.1079%]
-0.0067 [-0.0251%]
0.0016 [0.1897%]
0.0004 [0.3458%]
0.3193
DSSM (Query-BrokenURL
Model)
-0.0194 [-0.0328%]
-0.0018 [-0.0751%]
-0.0047 [-0.0176%]
0.0014 [0.1734%]
0.0004 [0.3223%]
0.3202
CDSSM (Query-Title)
-0.0255 [-0.0430%]
-0.0024 [-0.1031%]
-0.0051 [-0.0191%]
0.0016 [0.1909%]
0.0004 [0.3690%]
0.3193
Table 1. Individual Features
Offline PerformanceFeatures TimeToSuccess
(sec)Session Success
PositionTimeToSuccess From
Quickback (sec)SatClick
RateUnSatClick
RateCoverage
XCode + Click -0.0299 [-0.0504%]
-0.0025 [-0.1049%]
-0.0070 [-0.0259%]
0.0022 [0.2420%]
0.0005 [0.3825%]
0.3288
DSSM (Query-Title) + Click
-0.0366 [-0.0617%]
-0.0035 [-0.1492%]
-0.0059 [-0.0220%]
0.0022 [0.2625%]
0.0005 [0.4541%]
0.3193
DSSM (Query-BrokenURL) +
Click
-0.0286 [-0.0483%]
-0.0029 [-0.1209%]
-0.0044 [-0.0163%]
0.0021 [0.2478%]
0.0005 [0.4369%]
0.3202
XCode + Click + DSSM
(Query-Title)
-0.0499 [-0.0841%]
-0.0045 [-0.1911%]
-0.0109 [-0.0403%]
0.0025 [0.2821%]
0.0005 [0.4133%]
CDSSM(Query-Title) + Click
-0.0379 [-0.0639%]
-0.0037 [-0.1548%]
-0.0060 [-0.0224%]
0.0022 [0.2656%]
0.0005 [0.4567%]
0.3193
Table 2. The feature combination
Offline Performance
Features TimeToSuccess (sec)
Session Success Position
TimeToSuccess From Quickback
(sec)
SatClick Rate
UnSatClick Rate
Coverage UnSatClickSwap
6 Click-Based
-0.3933 [-0.5938%]
-0.0357 [-1.2221%]
-0.0806 [-0.2815%]
0.0118 [1.3100%]
0.0025 [1.0925%]
0.79% 286 -> 307 (+21)
3 domain Click
-0.1971 [-0.2976%]
-0.0147 [-0.5044%]
-0.0758 [-0.2648%]
0.0077 [0.8529%]
0.0014 [0.6157%]
0.79% 249 - > 225 (-24)
3 history Click
-0.3637 [-0.5490%]
-0.0357 [-1.2246%]
-0.0487 [-0.1701%]
0.0086 [0.9562%]
0.0030 [1.2915%]
0.79% 128 -> 213 (+85)
Table 3. Domain Clicks and History Clicks Features