Optimizing Recommender Systems as a Submodular Bandits Problem
Yisong Yue, Carnegie Mellon University
Joint work with Carlos Guestrin & Sue Ann Hong
Optimizing Recommender Systems
• Must predict what the user finds interesting
• Receive feedback (training data) “on the fly”
10K articles per day
Must Personalize!
Sports
Like!
Topic # Likes # Displayed Average
Sports 1 1 1
Politics 0 0 N/A
Economy 0 0 N/A
Celebrity 0 0 N/A
Day 1
Topic # Likes # Displayed Average
Sports 1 1 1
Politics 0 1 0
Economy 0 0 N/A
Celebrity 0 0 N/A
Politics
Boo!
Day 2
Topic # Likes # Displayed Average
Sports 1 1 1
Politics 0 1 0
Economy 1 1 1
Celebrity 0 0 N/A
Day 3
Economy
Like!
Topic # Likes # Displayed Average
Sports 1 2 0.5
Politics 0 1 0
Economy 1 1 1
Celebrity 0 0 N/A
Day 4
Boo!
Sports
Topic # Likes # Displayed Average
Sports 1 2 0.5
Politics 0 2 0
Economy 1 1 1
Celebrity 0 0 N/A
Day 5
Boo!
Politics
Goal: Maximize total user utility (total # likes)
Exploit: Economy   Explore: Celebrity   Best: Sports
How to behave optimally at each round?
Often want to recommend multiple articles at a time!
Making Diversified Recommendations
• “Israel implements unilateral Gaza cease-fire :: WRAL.com”
• “Israel unilaterally halts fire, rockets persist”
• “Gaza truce, Israeli pullout begin | Latest News”
• “Hamas announces ceasefire after Israel declares truce - …”
• “Hamas fighters seek to restore order in Gaza Strip - World - Wire …”
• “Israel implements unilateral Gaza cease-fire :: WRAL.com”
• “Obama vows to fight for middle class”
• “Citigroup plans to cut 4500 jobs”
• “Google Android market tops 10 billion downloads”
• “UC astronomers discover two largest black holes ever found”
Outline
• Optimally diversified recommendations
  – Minimize redundancy
  – Maximize information coverage
• Exploration / exploitation tradeoff
  – Don’t know user preferences a priori
  – Only receive feedback for recommendations
• Incorporating prior knowledge
  – Reduce the cost of exploration
• Choose top 3 documents
• Individual Relevance: D3 D4 D1
• Greedy Coverage Solution: D3 D1 D5
This diminishing returns property is called
submodularity
Submodular Coverage Model

F(A | w) = Σ_i w_i F_i(A), where F_i(A) = how well A “covers” topic i
Diminishing returns: submodularity
Set of articles: A
User preferences: w
Goal: maximize F(A | w) — NP-hard in general
Greedy: (1-1/e) guarantee [Nemhauser et al., 1978]
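The greedy procedure behind this guarantee is only a few lines. A minimal sketch, assuming a generic monotone submodular objective F; the toy document-coverage objective in the usage below is hypothetical:

```python
# Greedy maximization of a monotone submodular set function
# (the algorithm with the (1-1/e) guarantee of Nemhauser et al., 1978).

def greedy_select(items, F, k):
    """Repeatedly add the item with the largest marginal gain F(A+a) - F(A)."""
    A = []
    for _ in range(k):
        candidates = [a for a in items if a not in A]
        if not candidates:
            break
        best = max(candidates, key=lambda a: F(A + [a]) - F(A))
        A.append(best)
    return A
```

Because of diminishing returns, each greedy pick captures at least a 1/k fraction of the remaining optimal value, which is where the (1-1/e) bound comes from.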
Submodular Coverage Model
• a1 = “China's Economy Is on the Mend, but Concerns Remain”
• a2 = “US economy poised to pick up, Geithner says”
• a3 = “Who's Going To The Super Bowl?”
• w = [0.6, 0.4]
• A = Ø
Incremental Coverage (F1(A+a)-F1(A), F2(A+a)-F2(A)):
a1: 0.9, 0
a2: 0.8, 0
a3: 0, 0.5

Incremental Benefit (weighted by w = [0.6, 0.4]):
        a1    a2    a3    Best
Iter 1  0.54  0.48  0.2   a1
Submodular Coverage Model
• a1 = “China's Economy Is on the Mend, but Concerns Remain”
• a2 = “US economy poised to pick up, Geithner says”
• a3 = “Who's Going To The Super Bowl?”
• w = [0.6, 0.4]
• A = {a1}
Incremental Benefit:
        a1    a2    a3    Best
Iter 1  0.54  0.48  0.2   a1
Iter 2  --    0.06  0.2   a3

Incremental Coverage (standalone coverage in parentheses):
a1: --, --
a2: 0.1 (0.8), 0 (0)
a3: 0 (0), 0.5 (0.5)
Example: Probabilistic Coverage
• Each article a has an independent probability P(i|a) of covering topic i.
• Define Fi(A) = 1 - Pr(topic i not covered by A)
• Then Fi(A) = 1 - Π_{a∈A} (1 - P(i|a))   (“noisy or”)
[El-Arini et al., KDD 2009]
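The noisy-or coverage can be sketched directly; the articles and probabilities below are invented for illustration:

```python
# Probabilistic ("noisy or") coverage in the style of El-Arini et al., KDD 2009:
# each article covers topic i independently with probability P(i|a), so
# F_i(A) = 1 - prod_{a in A} (1 - P(i|a)).

def topic_coverage(A, probs, topic):
    """F_i(A): probability that at least one article in A covers `topic`."""
    miss = 1.0                                   # Pr(topic not covered by A)
    for a in A:
        miss *= 1.0 - probs[a].get(topic, 0.0)
    return 1.0 - miss
```

Adding a second economy article after a first one yields a much smaller marginal gain than it would on its own — exactly the diminishing-returns (submodularity) property the previous slides illustrate.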
Outline
• Optimally diversified recommendations
  – Minimize redundancy
  – Maximize information coverage
• Exploration / exploitation tradeoff
  – Don’t know user preferences a priori
  – Only receive feedback for recommendations
• Incorporating prior knowledge
  – Reduce the cost of exploration
Submodular information coverage model
• Diminishing returns property, encourages diversity
• Parameterized, can fit to user’s preferences
• Locally linear (will be useful later)
Learning Submodular Coverage Models
• Submodular functions well-studied
  – [Nemhauser et al., 1978]
• Applied to recommender systems
  – Parameterized submodular functions
  – [Leskovec et al., 2007; Swaminathan et al., 2009; El-Arini et al., 2009]
• Learning submodular functions interactively from user feedback
  – [Yue & Joachims, ICML 2008]
  – [Yue & Guestrin, NIPS 2011]
We want to personalize!
Interactive Personalization

[Figure: a sequence of recommendation rounds. Each day the system recommends a set of articles (e.g., on Sports, Politics, World, Economy), the user rates them, and the per-topic “# Shown” and “Average Likes” statistics are updated; the running total of likes grows from 0 to 4.]
Exploration vs Exploitation

[Figure: the same per-topic statistics (Average Likes: -- 0.5 0.75 0.0 0.0; # Shown: 0 2 4 2 1), annotated with candidate Exploit, Explore, and Best recommendations.]

Goal: Maximize total user utility
Linear Submodular Bandits Problem
• For time t = 1…T
  – Algorithm recommends articles At
  – User scans articles in order and rates them
    • E.g., like or dislike each article (reward)
    • Expected reward is F(At|w*) (discussed later)
  – Algorithm incorporates feedback
[Yue & Guestrin, NIPS 2011]
Regret: R(T) = Σ_{t=1..T} [ (1 - 1/e) F(A_t* | w*) - F(A_t | w*) ]
(A_t* = best possible recommendations; T = time horizon; the (1-1/e) factor reflects the greedy benchmark, since exact maximization is NP-hard)
• Opportunity cost of not knowing preferences
• “No-regret” if R(T)/T → 0
  – Efficiency measured by convergence rate
[Yue & Guestrin, NIPS 2011]
Local Linearity

F(A + a | w) = F(A | w) + wᵀ Δ(a | A)

(utility F of the previous articles A, incremental coverage Δ(a|A) of the current article a, user’s preferences w)
User Model
• User scans articles in order
• Generates feedback y for each article
• Obeys: E[y_a] = w*ᵀ Δ(a | A), where A is the set of articles scanned before a
• Feedback is independent of other feedback: “Conditional Submodular Independence”
[Yue & Guestrin, NIPS 2011]
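A minimal simulation of this user model, under the stated assumptions (like probability equal to w*ᵀΔ(a|A), feedback independent across articles); `delta_fn` and all numbers here are illustrative:

```python
import numpy as np

# User model sketch: scanning articles in order, each article a is liked with
# probability w*^T Delta(a|A), independently of other feedback
# (conditional submodular independence).

def simulate_feedback(articles, delta_fn, w_star, rng):
    """Return one like/dislike label per article, scanned in order."""
    A, labels = [], []
    for a in articles:
        p = float(np.clip(w_star @ delta_fn(a, A), 0.0, 1.0))
        labels.append(int(rng.random() < p))    # Bernoulli(p) feedback
        A.append(a)
    return labels
```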
Estimating User Preferences
Y = Δ w
(observed feedback Y; submodular coverage features Δ of the recommendations; user preference vector w)
[Yue & Guestrin, NIPS 2011]
Linear regression to estimate w!
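A sketch of that regression step, assuming ridge regularization (the regularizer `lam` is our choice, not a value from the talk): stack the incremental-coverage vectors as rows of Δ, the observed feedback as y, and solve for w.

```python
import numpy as np

# Estimate user preferences w from feedback via regularized least squares:
# w = argmin ||Delta w - y||^2 + lam ||w||^2.

def estimate_preferences(Delta, y, lam=1.0):
    """Delta: (n, d) incremental-coverage features; y: (n,) observed feedback."""
    d = Delta.shape[1]
    return np.linalg.solve(Delta.T @ Delta + lam * np.eye(d), Delta.T @ y)
```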
Balancing Exploration vs Exploitation
• For each slot, recommend the article a maximizing (estimated gain) + (uncertainty): w̄ᵀ Δ(a|A) + α C(a|A)
• Example: select the article on Economy (largest estimated gain by topic plus uncertainty of estimate)
[Yue & Guestrin, NIPS 2011]
Balancing Exploration vs Exploitation

C(a|A) shrinks roughly as 1 / √(# times the topic was shown)
[Yue & Guestrin, NIPS 2011]
LSBGreedy
• Loop:
  – Compute the least squares estimate w̄_t
  – Start with At empty
  – For i = 1,…,L:
    • Recommend the article a maximizing (estimated gain) + (uncertainty): w̄ᵀ Δ(a|At) + α C(a|At)
  – Receive feedback yt,1,…,yt,L
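One round of the selection step can be sketched as follows; `delta_fn`, `M_inv` (inverse of the regularized feature covariance), and `alpha` are illustrative names for the quantities in the talk, not its exact notation:

```python
import numpy as np

# One round of LSBGreedy-style selection: for each of the L slots, pick the
# article maximizing estimated gain plus an uncertainty bonus,
# w_hat^T Delta(a|A) + alpha * sqrt(Delta^T M^{-1} Delta).

def lsb_greedy_round(articles, delta_fn, w_hat, M_inv, L, alpha=1.0):
    A = []
    for _ in range(L):
        def ucb(a):
            delta = delta_fn(a, A)                  # incremental coverage
            gain = w_hat @ delta                    # estimated gain
            width = np.sqrt(delta @ M_inv @ delta)  # uncertainty C(a|A)
            return gain + alpha * width
        remaining = [a for a in articles if a not in A]
        A.append(max(remaining, key=ucb))
    return A
```

After the round, the feedback y_t,1,…,y_t,L updates the least squares estimate and shrinks the uncertainty for the topics that were shown.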
Regret Guarantee
• Extends linear bandits to the submodular setting
  – [Dani et al., 2008; Li et al., 2010; Abbasi-Yadkori et al., 2011]
• Leverages conditional submodular independence
• No-regret algorithm! (regret sublinear in T)
  – Regret convergence rate: d / (LT)^{1/2}
    (d = # topics, T = time horizon, L = # articles per day)
• Optimally balances the explore/exploit trade-off
[Yue & Guestrin, NIPS 2011]
Other Approaches
• Multiplicative Weighting [El-Arini et al. 2009]
  – Does not employ exploration
  – No guarantees (can show it doesn’t converge)
• Ranked bandits [Radlinski et al. 2008; Streeter & Golovin 2008]
  – Reduction: treats each slot as a separate bandit
  – Uses LinUCB [Dani et al. 2008; Li et al. 2010; Abbasi-Yadkori et al. 2011]
  – Regret guarantee O(dL·T^{1/2}) (factor L^{1/2} worse)
• ε-Greedy
  – Explore with probability ε
  – Regret guarantee O(d(LT)^{2/3}) (factor (LT)^{1/3} worse)
Simulations

[Figure: simulation results comparing LSBGreedy, RankLinUCB, ε-Greedy, and MW.]
User Study
• Tens of thousands of real news articles
• T = 10 days, L = 10 articles per day, d = 18 topics
• Users rate articles; count # likes
• Users are heterogeneous: requires personalization
User Study (~27 users in study)

[Figure: head-to-head results. Submodular Bandits wins (with some ties and losses) against: Static Weights; Multiplicative Updates (no exploration); RankLinUCB (doesn’t directly model diversity).]
Comparing Learned Weights vs MW
• MW overfits to the “world” topic
• Few liked articles: MW did not learn anything
Outline
• Optimally diversified recommendations
  – Minimize redundancy
  – Maximize information coverage
• Exploration / exploitation tradeoff
  – Don’t know user preferences a priori
  – Only receive feedback for recommendations
• Incorporating prior knowledge
  – Reduce the cost of exploration

Submodular information coverage model
• Diminishing returns property, encourages diversity
• Parameterized, can fit to user’s preferences
• Locally linear (will be useful later)

Linear Submodular Bandits Problem
• Characterizes exploration/exploitation
• Provably near-optimal algorithm
• User study
The Price of Exploration
• Region of uncertainty depends linearly on |w*| (user’s preferences)
• Region of uncertainty depends linearly on d (# topics)
• Unavoidable without further assumptions
(T = time horizon, L = # articles per day)
Have: preferences of previous users
Goal: learn faster for new users?[Yue, Hong & Guestrin, ICML 2012]
Observation: systems do not serve users in a vacuum; we have the preferences of previous users.
Assumption: users are similar to “stereotypes”, and stereotypes are described by a low-dimensional subspace.
Use an SVD-style approach to estimate the stereotype subspace, e.g., [Argyriou et al., 2007].
[Yue, Hong & Guestrin, ICML 2012]
Have: preferences of previous users
Goal: learn faster for new users?
• Suppose w* lies mostly in a subspace
  – Dimension k << d (“stereotypical preferences”)
• Two-tiered exploration
  – First in the subspace, then in the full space
Coarse-to-Fine Bandit Learning

[Figure: the original regret guarantee vs. the coarse-to-fine guarantee when w* lies mostly in the subspace: 16x lower regret.]
[Yue, Hong & Guestrin, ICML 2012]
Coarse-to-Fine Hierarchical Exploration
• Loop:
  – Least squares estimate in the subspace
  – Least squares estimate in the full space, regularized toward the subspace estimate
  – Start with At empty
  – For i = 1,…,L:
    • Recommend the article a maximizing estimated gain + uncertainty in subspace + uncertainty in full space
  – Receive feedback yt,1,…,yt,L
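A sketch of the two least-squares tiers under our assumptions: `B` holds a basis for the stereotype subspace as columns, and the tier-2 regression is regularized toward the lifted subspace estimate. The regularizers `lam_sub` and `lam_full` are illustrative choices, not values from the talk.

```python
import numpy as np

# Coarse-to-fine preference estimation: first solve in the k-dim subspace
# (w_sub = B z), then solve in the full space, shrinking toward w_sub:
# argmin ||Delta w - y||^2 + lam_full ||w - w_sub||^2.

def coarse_to_fine_estimate(Delta, y, B, lam_sub=1.0, lam_full=1.0):
    d, k = B.shape
    # Tier 1: ridge regression on the subspace coordinates.
    Z = Delta @ B
    z = np.linalg.solve(Z.T @ Z + lam_sub * np.eye(k), Z.T @ y)
    w_sub = B @ z                       # lift back to d dimensions
    # Tier 2: full-space ridge regression centered at w_sub.
    return np.linalg.solve(Delta.T @ Delta + lam_full * np.eye(d),
                           Delta.T @ y + lam_full * w_sub)
```

When w* really is stereotypical (mostly inside the subspace), tier 1 already lands close to it, so tier 2 only has to explore the small residual outside the subspace.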
Simulation Comparison
• Naïve (LSBGreedy from before)
• Reshaped Prior in Full Space (LSBGreedy with a prior)
  – Estimated using pre-collected user profiles
• Subspace (LSBGreedy on the subspace)
  – Often what people resort to in practice
• Coarse-to-Fine Approach
  – Our approach: combines the full-space and subspace approaches

[Figure: regret curves comparing the naïve baselines, the reshaped prior on the full space, the subspace approach, and the coarse-to-fine approach, including for “atypical users”.]
[Yue, Hong, Guestrin, ICML 2012]
User Study
Similar setup as before:
• T = 10 days, L = 10 articles per day
• d = 100 topics, k = 5 (5-dim subspace, estimated from real users)
• Tens of thousands of real news articles
• Users rate articles; count # likes
[Figure: head-to-head results (~27 users in study). Coarse-to-Fine wins (with some ties and losses) against: Naïve LSBGreedy; LSBGreedy with an optimal prior in the full space.]
Learning Submodular Functions
• Parameterized submodular functions
  – Diminishing returns
  – Flexible
• Linear Submodular Bandit Problem
  – Balance explore/exploit
  – Provably optimal algorithms
  – Faster convergence using prior knowledge
• Practical bandit learning approaches
Research supported by ONR (PECASE) N000141010672 and ONR YIP N00014-08-1-0752