Upload
gram
View
42
Download
0
Tags:
Embed Size (px)
DESCRIPTION
Maximum Personalization: User-Centered Adaptive Information Retrieval. ChengXiang (“Cheng”) Zhai Department of Computer Science Graduate School of Library & Information Science Department of Statistics Institute for Genomic Biology University of Illinois at Urbana-Champaign. Happy Users. - PowerPoint PPT Presentation
Citation preview
Maximum Personalization:Maximum Personalization:User-Centered User-Centered
Adaptive Information Retrieval Adaptive Information Retrieval
ChengXiang (“Cheng”) ZhaiDepartment of Computer Science
Graduate School of Library & Information Science
Department of Statistics
Institute for Genomic Biology
University of Illinois at Urbana-Champaign
1Yahoo! Research, Jan. 12, 2011
Happy Users
Yahoo! Research, Jan. 12, 2011 2
Query: avatar hotel
Sad Users
Yahoo! Research, Jan. 12, 2011 3
They’ve got to know the users better!
I work on information retrieval; I searched for similar pages last week; I clicked on AIRS-related pages (including keynote); …
How can search engines better help these users?
Yahoo! Research, Jan. 12, 2011 4
Current Search Engines are Document-Centered
Documents
“airs”Search Engine “airs”
...
It’s hard for a search engine to know everyone well!
Yahoo! Research, Jan. 12, 2011 5
To maximize personalization, we must put a user in the center!
Search Engine
“airs”
...Personalized search agent
WEB
Search Engine
EmailViewed Web pages
QueryHistory
Search Engine
DesktopFiles
Personalized search agent
“airs”
A search agent knows about a particular user very well
Yahoo! Research, Jan. 12, 2011 6
User-Centered Adaptive IR (UCAIR)
• A novel retrieval strategy emphasizing
– user modeling (“user-centered”)
– search context modeling (“adaptive”)
– interactive retrieval
• Implemented as a personalized search agent that
– sits on the client-side (owned by the user)
– integrates information around a user (1 user vs. N sources as opposed to 1 source vs. N users)
– collaborates with each other
– goes beyond search toward task support
Much work has been done on personalization
Yahoo! Research, Jan. 12, 2011 7
• Personalized data collection: Haystack [Adar & Karger 99], MyLifeBit [Gemmell et al. 02], Stuff I’ve Seen [Dumais et al. 03] , Total Recall [Cheng et al. 04], Google desktop search, Microsoft desktop search
• Server-side personalization: My Yahoo! [Manber et al. 00], Personalized Google Search
• Capturing user information & search context: SearchPad [Bharat 00], Watson [Budzik & Hammond 00], Intellizap [Finkelstein et
al. 01], Understanding clickthrough data [Joachmis et al. 05]
• Implicit feedback: SVM [Joachims 02] , BM25 [Teevan et al. 05] , Language models [Shen et al. 05]
However, we are far from
unleashing the full power of personalization
UCAIR is unique in emphasizing maximum exploitation of client-side personalization
Yahoo! Research, Jan. 12, 2011 8
• Benefit of client-side personalization
• More information about the user, thus more accurate user modeling
– Can exploit the complete interaction history (e.g., can easily capture all click-through information and navigation activities)
– Can exploit user’s other activities (e.g., searching immediately after reading an email)
• Naturally scalable
• Alleviate the problem of privacy
• Can potentially maximize benefit of personalization
Maximum Personalization = Maximum User Information Maximum Exploitation of User Info.
Client-Side Agent
(Frequent + Optimal) Adaptation
Yahoo! Research, Jan. 12, 2011 9
Examples of Useful User Information• Textual information
– Current query
– Previous queries in the same search session
– Past queries in the entire search history
• Clicking activities
– Skipped documents
– Viewed/clicked documents
– Navigation traces on non-search results
– Dwelling time
– Scrolling
• Search context
– Time, location, task, …
Yahoo! Research, Jan. 12, 2011 10
Examples of Adaptation• Query formulation
– Query completion: provide assistance while a user enters a query
– Query suggestion: suggest useful related queries
– Automatic generation of queries: proactive recommendation
• Dynamic re-ranking of unseen documents
– As a user clicks on the “back” button
– As a user scrolls down on a result list
– As a user clicks on the “next” button to view more results
• Adaptive presentation/summarization of search results
• Adaptive display of a document: display the most relevant part of a document
Yahoo! Research, Jan. 12, 2011 11
Yahoo! Research, Jan. 12, 2011 12
Challenges for UCAIR• General: how to obtain maximum personalization without
requiring extra user effort?
• Specific challenges
– What’s an appropriate retrieval framework for UCAIR?
– How do we optimize retrieval performance in interactive retrieval?
– How can we capture and manage all user information?
– How can we develop robust and accurate retrieval models to maximally exploit user information and search context?
– How do we evaluate UCAIR methods?
– …
Yahoo! Research, Jan. 12, 2011 13
The Rest of the Talk
• Part I: A decision-theoretic framework for UCAIR
• Part II: Algorithms for personalized search
– Optimize initial document ranking
– Dynamic re-ranking of search results
– Personalize search result presentation
• Part III: Summary and open challenges
Yahoo! Research, Jan. 12, 2011 14
Part I
A Decision-Theoretic Framework for UCAIR
Yahoo! Research, Jan. 12, 2011 15
IR as Sequential Decision Making
User System
A1 : Enter a query Which documents to present?How to present them?
Ri: results (i=1, 2, 3, …)Which documents to view?
A2 : View documentWhich part of the document to show? How?
R’: Document contentView more?
A3 : Click on “Back” button
(Information Need) (Model of Information Need)
Yahoo! Research, Jan. 12, 2011 16
Retrieval Decisions
User U: A1 A2 … … At-1 At
System: R1 R2 … … Rt-1
Given U, C, At , and H, choosethe best Rt from all possibleresponses to At
History H={(Ai,Ri)} i=1, …, t-1
DocumentCollection
C
Query=“Jaguar”
All possible rankings of C
The best ranking for the query
Click on “Next” button
All possible rankings of unseen docs
The best ranking of unseen docs
Rt r(At)
Rt =?
Yahoo! Research, Jan. 12, 2011 17
A Risk Minimization Framework
User: U Interaction history: HCurrent user action: At
Document collection: C
Observed
All possible responses: r(At)={r1, …, rn}
User Model
M=(S, U,… ) Seen docs
Information need
L(ri,At,M) Loss Function
Optimal response: r* (minimum loss)
( )arg min ( , , ) ( | , , , )tt r r A t tM
R L r A M P M U H A C dM ObservedInferredBayes risk
Yahoo! Research, Jan. 12, 2011 18
• Approximate the Bayes risk by the loss at the mode of the posterior distribution
• Two-step procedure
– Step 1: Compute an updated user model M* based on the currently available information
– Step 2: Given M*, choose a response to minimize the loss function
A Simplified Two-Step Decision-Making Procedure
( )
( )
( )
arg min ( , , ) ( | , , , )
arg min ( , , *) ( * | , , , )
arg min ( , , *)
* arg max ( | , , , )
t
t
t
t r r A t tM
r r A t t
r r A t
M t
R L r A M P M U H A C dM
L r A M P M U H A C
L r A M
where M P M U H A C
Yahoo! Research, Jan. 12, 2011 19
Approximately Optimal Interactive Retrieval
User
A1
U C
M*1P(M1|U,H,A1,C)
L(r,A1,M*1)
R1A2
L(r,A2,M*2)
R2
M*2P(M2|U,H,A2,C)
A3 …
Collection
IR system
Many possible actions:-type in a query character- scroll down a page- click on any button -…
Many possible responses:-query completion-display relevant passage-recommendation -clarification-…
Yahoo! Research, Jan. 12, 2011 20
Refinement of Risk Minimization• r(At): decision space (At dependent)
– r(At) = all possible rankings of docs in C
– r(At) = all possible rankings of unseen docs
– r(At) = all possible summarization strategies
– r(At) = all possible ways to diversify top-ranked documents
• M: user model – Essential component: U = user information need
– S = seen documents
– n = “Topic is new to the user”; r=“reading level of user”
• L(Rt ,At,M): loss function
– Generally measures the utility of Rt for a user modeled as M
– Often encodes retrieval criteria, but may also capture other preferences
• P(M|U, H, At, C): user model inference
– Often involves estimating the unigram language model U
– May involve inference of other variables also (e.g., readability, tolerance of redundancy)
Yahoo! Research, Jan. 12, 2011 21
Case 1: Context-Insensitive IR– At=“enter a query Q”
– r(At) = all possible rankings of docs in C
– M= U, unigram language model (word distribution)
– p(M|U,H,At,C)=p(U |Q)
1
1
1 2
( , , ) (( ,..., ), )
( | ) ( || )
( | ) ( | ) ....
( || )
i
i
i t N U
N
i U di
t U d
L r A M L d d
p viewed d D
Since p viewed d p viewed d
the optimal ranking R is given by ranking documents by D
Yahoo! Research, Jan. 12, 2011 22
Case 2: Implicit Feedback
– At=“enter a query Q”
– r(At) = all possible rankings of docs in C
– M= U, unigram language model (word distribution)
– H={previous queries} + {viewed snippets}
– p(M|U,H,At,C)=p(U |Q,H)
1
1
1 2
( , , ) (( ,..., ), )
( | ) ( || )
( | ) ( | ) ....
( || )
i
i
i t N U
N
i U di
t U d
L r A M L d d
p viewed d D
Since p viewed d p viewed d
the optimal ranking R is given by ranking documents by D
Yahoo! Research, Jan. 12, 2011 23
Case 3: General Implicit Feedback
– At=“enter a query Q” or “Back” button, “Next” button
– r(At) = all possible rankings of unseen docs in C
– M= (U, S), S= seen documents
– H={previous queries} + {viewed snippets}
– p(M|U,H,At,C)=p(U |Q,H)
1
1
1 2
( , , ) (( ,..., ), )
( | ) ( || )
( | ) ( | ) ....
( || )
i
i
i t N U
N
i U di
t U d
L r A M L d d
p viewed d D
Since p viewed d p viewed d
the optimal ranking R is given by ranking documents by D
Yahoo! Research, Jan. 12, 2011 24
Case 4: User-Specific Result Summary – At=“enter a query Q”
– r(At) = {(D,)}, DC, |D|=k, {“snippet”,”overview”}
– M= (U, n), n{0,1} “topic is new to the user”
– p(M|U,H,At,C)=p(U, n|Q,H), M*=(*, n*)
( , , ) ( , , *, *)
( , *) ( , *)
( * || ) ( , *)i
i t i i
i i
d id D
L r A M L D n
L D L n
D L n
n*=1 n*=0
i=snippet 1 0i=overview 0 1
( , *)iL n
Choose k most relevant docs If a new topic (n*=1), give an overview summary;otherwise, a regular snippet summary
Yahoo! Research, Jan. 12, 2011 25
Part II. Algorithms for personalized search
- Optimize initial document ranking - Dynamic re-ranking of search results - Personalize search result presentation
Scenario 1: After a user types in a query, how to exploit long-term search history to
optimize initial results?
Yahoo! Research, Jan. 12, 2011 26
Yahoo! Research, Jan. 12, 2011 27
Case 2: Implicit Feedback
– At=“enter a query Q”
– r(At) = all possible rankings of docs in C
– M= U, unigram language model (word distribution)
– H={previous queries} + {viewed snippets}
– p(M|U,H,At,C)=p(U |Q,H)
1
1
1 2
( , , ) (( ,..., ), )
( | ) ( || )
( | ) ( | ) ....
( || )
i
i
i t N U
N
i U di
t U d
L r A M L d d
p viewed d D
Since p viewed d p viewed d
the optimal ranking R is given by ranking documents by D
28
Long-term Implicit Feedback from Personal Search Log
Search interests:user interested in X(champaign, luxury car)
consistent & distinct
Most useful forambiguous queries
Search preferences:For Y, user prefers Xquotes → newcars.com
Most useful forrecurring queries
session
query champaign map......query jaguarquery champaign jaguarclick champaign.il.auto.comquery jaguar quotesclick newcars.com......query yahoo mail......query jaguar quotesclick newcars.com
noise
recurringquery
avg 80 queries / mo
Yahoo! Research, Jan. 12, 2011
29
Estimate Query Language Model using the Entire Search History
q1D1C1
S1
θS1
q2D2C2
S2
θS2
... qt-1Dt-1Ct-1
St-1
θSt-1
θH
qtDt
St
θq
θq,H
λ1?λ2?
λq?
How can we optimize λkand λq?
-Need to distinguish informative/noisy past searches-Need to distinguish queries with strong vs. weak support from history
1-λq
λt-1?
Yahoo! Research, Jan. 12, 2011
30
Adaptive Weighting withMixture Model [Tan et al. 06]
θS1θS2
θSt-1...
θH
θq,H
λ1
λ2λt-1
λqθB
1-λq
θq
λB 1-λB
<d1>jaguar car official site racing<d2>jaguar is a big cat...<d3>local jaguar dealerin champaign...
querypast jaguar searchespast champaign searchesbackground
θmix
select {λ} to maximize P(Dt | θmix)
Dt
EM algorithm
Yahoo! Research, Jan. 12, 2011
31
Sample Results: improving initial ranking with long-term implicit feedback
recurring fresh≫
combination ≈ clickthrough > docs > query, contextless
Yahoo! Research, Jan. 12, 2011
Scenario 2: The user is examining search results, how can we further dynamically optimize search
results based on clickthroughs?
Yahoo! Research, Jan. 12, 2011 32
Yahoo! Research, Jan. 12, 2011 33
Case 3: General Implicit Feedback
– At=“enter a query Q” or “Back” button, “Next” button
– r(At) = all possible rankings of unseen docs in C
– M= (U, S), S= seen documents
– H={previous queries} + {viewed snippets}
– p(M|U,H,At,C)=p(U |Q,H)
1
1
1 2
( , , ) (( ,..., ), )
( | ) ( || )
( | ) ( | ) ....
( || )
i
i
i t N U
N
i U di
t U d
L r A M L d d
p viewed d D
Since p viewed d p viewed d
the optimal ranking R is given by ranking documents by D
Yahoo! Research, Jan. 12, 2011 34
Estimate a Context-Sensitive LM
Q2
C2={C2,1 , C2,2 ,C2,3 ,… }…
C1={C1,1 , C1,2 ,C1,3 ,…} User Clickthrough
Qk
Q1 User Query e.g., Apple software
e.g., Apple - Mac OS X The Apple Mac OS X product page. Describes features in the current version of Mac OS X, …
e.g., Jaguar
1 1 1 1,...,( | ,) ( | ,...,, ) ?k kk kp w p Q CQ Q Cw User Model:
Query History Clickthrough
Yahoo! Research, Jan. 12, 2011 35
Method1: Fixed Coeff. Interpolation (FixInt)
Qk
Q1
Qk-1
…
C1
Ck-1
…
Average user query history and clickthrough
CH
QH1
11
1
( | ) ( | )k
Q iki
p w H p w Q
11
11
( | ) ( | )k
C iki
p w H p w C
1
HLinearly interpolate history models
( | ) ( | ) (1 ) ( | )C Qp w H p w H p w H
k
1
Linearly interpolate current queryand history model
( | ) ( | ) (1 ) ( | )k kp w p w Q p w H
Yahoo! Research, Jan. 12, 2011 36
Method 2: Bayesian Interpolation (BayesInt)
Q1
Qk-1
…
C1
Ck-1
…
Average user query andclickthrough history
CH
QH1
11
1
( | ) ( | )i k
Q iki
p w H p w Q
11
11
( | ) ( | )i k
C iki
p w H p w C
Intuition: trust the current query Qk more if it’s longer
Qk
Dirichlet Prior
( , ) ( | ) ( | )
| |( | ) k Q C
k
c w Q p w H p w H
k Qp w
k
Yahoo! Research, Jan. 12, 2011 37
Method 3: Online Bayesian Updating (OnlineUp)
'1k
Qk k
C2'2
v
Q1 1Intuition: incremental updating of the language model
C1
v'
1( , )
|| )' (
|( | ) i
i
ic p ww Ci C vp w
Q2 2 '
1( ,|
))|
( |( | ) i
i
ic w Q p wi Qp w
Yahoo! Research, Jan. 12, 2011 38
Method 4: Batch Bayesian Update (BatchUp)
C2
1k
…Ck-1
'k
1
1
1
1
( , ) ( | )'
| |( | )
ij kj
ijj
c w C p w
k Cp w
Intuition: all clickthrough data are equally useful
Qk k
Q1 1
C1
1( , ) ( | )| |( | ) i i
i
c w Q p wi Qp w
Q2 2
Yahoo! Research, Jan. 12, 2011 39
Overall Effect of Search Context [Shen et al. 05b]
Query
FixInt
(=0.1,=1.0)
BayesInt
(=0.2,=5.0)
OnlineUp
(=5.0,=15.0)
BatchUp
(=2.0,=15.0)
MAP pr@20 MAP pr@20 MAP pr@20 MAP pr@20
Q3 0.0421 0.1483 0.0421 0.1483 0.0421 0.1483 0.0421 0.1483
Q3+HQ+HC 0.0726 0.1967 0.0816 0.2067 0.0706 0.1783 0.0810 0.2067
Improve 72.4% 32.6% 93.8% 39.4% 67.7% 20.2% 92.4% 39.4%
Q4 0.0536 0.1933 0.0536 0.1933 0.0536 0.1933 0.0536 0.1933
Q4+HQ+HC 0.0891 0.2233 0.0955 0.2317 0.0792 0.2067 0.0950 0.2250
Improve 66.2% 15.5% 78.2% 19.9% 47.8% 6.9% 77.2% 16.4%
• Short-term context helps system improve retrieval accuracy
• BayesInt better than FixInt; BatchUp better than OnlineUp
Yahoo! Research, Jan. 12, 2011 40
Using Clickthrough Data Only
Query MAP pr@20
Q3 0.0421 0.1483
Q3+HC 0.0766 0.2033
Improve 81.9% 37.1%
Q4 0.0536 0.1930
Q4+HC 0.0925 0.2283
Improve 72.6% 18.1%BayesInt (=0.0,=5.0)
Clickthrough is the major contributor
13.9% 67.2%Improve
0.1880.0739Q4+HC
0.1650.0442Q4
42.4%99.7%Improve
0.1780.0661Q3+HC
0.1250.0331Q3
pr@20MAPQuery
Performance on unseen docs
-4.1%15.7%Improve
0.18500.0620Q4+HC
0.19300.0536Q4
23.0%23.8%Improve
0.18200.0521Q3+HC
0.14830.0421Q3
pr@20MAPQuery
Snippets for non-relevant docs are still useful!
Yahoo! Research, Jan. 12, 2011 41
UCAIR Outperforms Google [Shen et al. 05]
PR Curve
Scenario 3: The user has not viewed any document on the first result page and is now clicking on “Next” to view more: how can we
optimize the search results on the next page?
Yahoo! Research, Jan. 12, 2011 42
Problem Formulation
Query: Q
Collection C
1st page
Results
L1
L2
…Lf
Search Engine
N
2nd page Lf+1
Lf+2
…
Lf+r
How to rerank these unseen docs?
…
101st page
U
Seen, Negative
Unseen, To be Reranked
43Yahoo! Research, Jan. 12, 2011
Strategy I: Query Modification
Q
Qnew
Q Qnew
N = {L1, …, L10}
D11
D12
D13
D14
D15
…D1010
D’11
D’12
D’13
D’14
D’15
…D’1010
NΝ D
new DQQ
||
1
parameter
44Yahoo! Research, Jan. 12, 2011
Strategy II: Score Combination
),( DQS neg
),( DQS
),(),( DQSDQS neg
D11 0.05D12 0.04D13 0.04D14 0.03 D15 0.03…D1010 0.01
D11 0.03D12 0.05D13 0.02D14 0.01 D15 0.01…D1010 0.01
D’11 0.04D’12 0.03D’13 0.03D’14 0.01 D’15 0.01…D’1010 0.01
QQneg parameter
45Yahoo! Research, Jan. 12, 2011
Multiple Negative Models
• Negative feedback examples may be quite diverse
– They may distract in totally different ways
– A single negative model is not optimal
• Multiple negative models
– Learn multiple models from N
– Score function for negative query
)),((),(1
k
i
inegneg DQSFDQS
F: aggregation function
Q
Q1neg
Q2neg
Q3neg
Q4neg Q5
neg
Q6neg
46Yahoo! Research, Jan. 12, 2011
Effectiveness of Negative Feedback[Wang et al. 08]
MAP GMAP MAP GMAP
ROBUST+LM ROBUST+VSM
OriginalRank 0.0293 0.0137 0.0223 0.0097
SingleQuery 0.0325 0.0141 0.0222 0.0097
SingleNeg1 0.0325 0.0147 0.0225 0.0097
SingleNeg2 0.0330 0.0149 0.0226 0.0097
MultiNeg1 0.0346 0.0150 0.0226 0.0099
MultiNeg2 0.0363 0.0148 0.0233 0.0100
GOV+LM GOV+VSM
OriginalRank 0.0257 0.0054 0.0290 0.0035
SingleQuery 0.0297 0.0056 0.0301 0.0038
SingleNeg1 0.0300 0.0056 0.0331 0.0038
SingleNeg2 0.0289 0.0055 0.0298 0.0036
MultiNeg1 0.0331 0.0058 0.0294 0.0036
MultiNeg2 0.0311 0.0057 0.0290 0.0036
47Yahoo! Research, Jan. 12, 2011
Scenario 4:Can we leverage user interaction history to personalize result presentation?
Yahoo! Research, Jan. 12, 2011 48
Yahoo! Research, Jan. 12, 2011 49
Need for User-Specific Summaries
Such a snippet summary may be fine for a user who knows about the topic
But for a user who hasn’t been tracking the news, a theme-based overview summary may be more useful
Query = “Asian tsunami”
Yahoo! Research, Jan. 12, 2011 50
A Theme Overview Summary (Asia Tsunami)
Immediate Reports
Statistics of Death and loss
Personal Experience of Survivors
Statistics of further impact
Aid from Local Areas Aid from the world
Donations from countries
Specific Events of Aid
…
…
Lessons from Tsunami Research inspired
Time
Doc1Doc3 Doc ..
Theme Evolutionary transitions
Theme evolution thread
Yahoo! Research, Jan. 12, 2011 51
Risk Minimization for User-Specific Summary
– At=“enter a query Q”
– r(At) = {(D,)}, DC, |D|=k, {“snippet”, “overview”}
– M= (U, n), n{0,1} “topic is new to the user”
– p(M|U,H,At,C)=p(U,n|Q,H), M*=(*, n*)( , , ) ( , , *, *)
( , *) ( , *)
( || ) ( , *)i
i t i i
i i
d id D
L r A M L D n
L D L n
D L n
n*=1 n*=0
i=snippet 1 0i=overview 0 1
( , *)iL n
Task 1 = Estimating n*: p(n=1)p(Q|H)Task 2 = Generating an overview summary
Yahoo! Research, Jan. 12, 2011 52
General problem definition: Given a text collection with time stamps Extract a theme evolution graph Model the life cycles of the most salient themes
Temporal Theme Mining for Generating Overview News Summaries
Time
Theme1.1
T1 T2 Tn…
Theme1.2…
Theme2.1
Theme2.2…
Theme3.1
Theme3.2……
T1 T2 … Tn
Theme A
Theme B
Theme life cycles
Theme evolution graph
Yahoo! Research, Jan. 12, 2011 53
A Topic Modeling Approach [Mei & Zhai 06]
tt11
12
13
21
22
31
3k
PartitioningPartitioning
Theme Evolution GraphTheme Evolution Graph
Extracting global Extracting global salient themessalient themes(mixture model)(mixture model)
…… ……
θθ11 θθ22
θθ33
BB
…… ……
(HMM)(HMM)
Decoding Decoding CollectionCollection
s
tt
Theme Life cyclesTheme Life cycles
tt
Theme extractionTheme extraction((mixture mixture modelsmodels))
……
Collection with time stampsCollection with time stamps
Model theme transitionsModel theme transitions((KL divKL div))
Computing Theme Computing Theme StrengthStrength
t1 t2 t3, …, t
Yahoo! Research, Jan. 12, 2011 54
Theme Evolution Graph: TsunamiT
aid 0.020relief 0.016U.S. 0.013military 0.011U.N. 0.011…
Bush 0.016U.S. 0.015$ 0.009relief 0.008million 0.008…
Indonesian 0.01military 0.01islands 0.008foreign 0.008aid 0.007…
system 0.0104Bush 0.008warning 0.007conference 0.005US 0.005…
system 0.008China 0.007warning 0.005Chinese 0.005…
warning 0.012system 0.012Islands 0.009Japan 0.005quake 0.003…
……
……
……
12/28/04 01/05/05 01/15/05 …
…
Yahoo! Research, Jan. 12, 2011 55
Theme Life Cycles: Tsunami
Aid from the world
$ 0.0173million 0.0135relief 0.0134aid 0.0099U.N. 0.0066 …
Personal experiences
I 0.0322wave 0.0061beach 0.0051saw 0.0046sea 0.0046 …
CNN, Absolute StrengthCNN, Absolute Strength
The UCAIR Prototype System
Yahoo! Research, Jan. 12, 2011 58
• A client-side search agent • Talks to any browser (both Firefox and IE)
http://timan.cs.uiuc.edu/proj/ucair
UCAIR Screen Shots: Immediate Implicit Feedback
Yahoo! Research, Jan. 12, 2011 59
Standard mode Adaptive mode
Screen Shots of UCAIR System: query =“airs accommodation”
Yahoo! Research, Jan. 12, 2011 60
Adaptive modeStandard mode
Screen Shots of UCAIR: “airs regisgtration”
Yahoo! Research, Jan. 12, 2011 61
Adaptive mode Standard mode
UCAIR Screenshots: Search History-Based Recommendation
Yahoo! Research, Jan. 12, 2011 62
Part III. Summary and Open Challenges
Yahoo! Research, Jan. 12, 2011 63
Yahoo! Research, Jan. 12, 2011 64
Summary • One doesn’t fit all; each user needs his/her own search
agent (especially important for long-tail search)
• User-centered adaptive IR (UCAIR) emphasizes
– Collecting maximum amount of user information and search context
– Formal models of user information needs and other user status variables
– Information integration
– Optimizing every response in interactive IR, thus potentially maximizing the effectiveness
• Preliminary results show that
– Implicit user modeling can improve search accuracy in many different ways
Yahoo! Research, Jan. 12, 2011 65
Open Challenges• Formal user models
– More in-depth analysis of user behavior (e.g., why did the user drop a query word and add it again later?)
– Exploit more implicit feedback clues (e.g., dwelling time-based language model)
– Collaborative user modeling (e.g., smoothing of user model)
• Context-sensitive retrieval models based on appropriate loss functions – Optimize long-term utility in interactive retrieval (e.g., active feedback,
exploration-exploitation tradeoff, incorporation of Fuhr’s interactive retrieval model)
– Robust and non-intrusive adaptation (e.g., considering confidence of adaptation)
• UCAIR system extension– Right architecture: client+server? P2P?
– Design of novel interface to facilitate acquisition of user info
– Beyond search to support querying+browsing+recommendation
Final Goal: A unified personal intelligent information agent
Yahoo! Research, Jan. 12, 2011 66
EmailWWW
E-COM
Blog Sports
Literature
IM
Desktop
Intranet
…
User Profile
Intelligent Adaptation
Proactive Info Service
Frequently Accessed Info
SecurityHandler
Task Support…
67
Search
Text
Filtering
Categorization
Summarization
Clustering
Natural Language Content Analysis
Extraction
Mining
VisualizationSearchApplications
MiningApplications
InformationAccess
KnowledgeAcquisition
InformationOrganization
Other Research Work & Roadmap
- Personalized- Retrieval models- Topic map - Recommender
Current focus
-Contextual text mining-Opinion integration-Controversy discovery-Abstractive summarization
Current focus
Web, Email, and Biomedical informatics
Entity/Relation Extraction
Yahoo! Research, Jan. 12, 2011
68
Towards Next-Generation Search Engines
Bag of words
Search
Keyword Queries
Access
Mining
Task Support
Entities-Relations
Knowledge Representation
Search History
Complete User Model
Current Search Engine
1.Personalization(User Modeling)
2. Large-Scale Semantic Analysis
3. Full-Fledged Text Info. Management
Yahoo! Research, Jan. 12, 2011
+ Social Networks
+ Task Environment
+Information Networks
Multiresolution Topic Map for Browsing[Want et al. 2009]
Yahoo! Research, Jan. 12, 2011 69
Turn search logs into a topic map
Naturally support collaborative surfing
Browse logs offer more opportunities
to understand user interests and intents
Make browsing a “first-class citizen”!
70
Multi-Faceted Sentiment Summary [Mei et al. 07]
(query=“Da Vinci Code”)Neutral Positive Negative
Facet 1:Movie
... Ron Howards selection of Tom Hanks to play Robert Langdon.
Tom Hanks stars in the movie,who can be mad at that?
But the movie might get delayed, and even killed off if he loses.
Directed by: Ron Howard Writing credits: Akiva Goldsman ...
Tom Hanks, who is my favorite movie star act the leading role.
protesting ... will lose your faith by ... watching the movie.
After watching the movie I went online and some research on ...
Anybody is interested in it?
... so sick of people making such a big deal about a FICTION book and movie.
Facet 2:Book
I remembered when i first read the book, I finished the book in two days.
Awesome book. ... so sick of people making such a big deal about a FICTION book and movie.
I’m reading “Da Vinci Code” now.
…
So still a good book to past time.
This controversy book cause lots conflict in west society.
Latent Aspect Rating Analysis [Wang et al. 2010]
71
Reviews + overall ratings Aspect segments
location:1amazing:1walk:1anywhere:1
0.10.70.10.9
nice:1accommodating:1smile:1friendliness:1attentiveness:1
Term weights Aspect Rating
0.00.90.10.3
room:1nicely:1appointed:1comfortable:1
0.60.80.70.80.9
Aspect Segmentation
Latent Rating Regression
1.3
1.8
3.8
Aspect Weight
0.2
0.2
0.6
+
Aspect ratings?Weights on aspects?
Reviewer Behavior Analysis & Personalized Ranking of Entities
72
People like cheap hotels because of good value
People like expensive hotels because of good service
Query: 0.9 value 0.1 others
Non-Personalized
Personalized
Major References • Xuehua Shen, Bin Tan, and ChengXiang Zhai, Implicit User Modeling for Personalized Search , In Proceedings of
CIKM 2005 pages 824-831.
• Xuehua Shen, Bin Tan, ChengXiang Zhai, Context-Sensitive Information Retrieval with Implicit Feedback, Proceedings of SIGIR 2005, 43-50, 2005.
• Bin Tan, Xuehua Shen, ChengXiang Zhai, Mining long-term search history to improve search accuracy , Proceedings of KDD 2006, pages 718-723.
• Xuanhui Wang, Hui Fang, ChengXiang Zhai. A study of methods for negative relevance feedback , Proceedings of SIGIR 2008, pages 219-226.
• Qiaozhu Mei, ChengXiang Zhai, Discovering Evolutionary Theme Patterns from Text -- An Exploration of Temporal Text Mining, Proceedings of KDD 2005, pages 198-207.
• Maryam Karimzadehgan, ChengXiang Zhai: Exploration-exploitation tradeoff in interactive relevance feedback. In Proceedings of CIKM 2010, pages1397-1400.
• Norbert Fuhr: A probability ranking principle for interactive information retrieval. Information Retrieval 11(3): 251-265 (2008)
• Xuanhui Wang, Bin Tan, Azadeh Shakery, ChengXiang Zhai, Beyond Hyperlinks: Organizing Information Footprints in Search Logs to Support Effective Browsing, Proceedings of CIKM 2009, pages 1237-1246, 2009.
• Qiaozhu Mei, Xu Ling, Matthew Wondra, Hang Su, ChengXiang Zhai, Topic Sentiment Mixture: Modeling Facets and Opinions in Weblogs, Proceedings of WWW 2007, pages 171-180.
• Hongning Wang, Yue Lu, ChengXiang Zhai. Latent Aspect Rating Analysis on Review Text Data: A Rating Regression Approach, Proceedings of KDD 2010, pages 115-124, 2010
Yahoo! Research, Jan. 12, 2011 73
More information can be found at http://timan.cs.uiuc.edu/
Looking forward to opportunities for collaborations…
Thank You!
Yahoo! Research, Jan. 12, 2011 74
Acknowledgments
Yahoo! Research, Jan. 12, 2011 75
Funding Support
Joint work with Xuehua Shen, Bin Tan, Xuanhui Wang, Qiaozhu Mei,
Hongning Wang, and other TIMAN group members
…