
Recommender Systems
Session C
Robin Burke
DePaul University
Chicago, IL




Roadmap

Session A: Basic Techniques I
– Introduction
– Knowledge Sources
– Recommendation Types
– Collaborative Recommendation

Session B: Basic Techniques II
– Content-based Recommendation
– Knowledge-based Recommendation

Session C: Domains and Implementation I
– Recommendation domains
– Example Implementation
– Lab I

Session D: Evaluation I
– Evaluation

Session E: Applications
– User Interaction
– Web Personalization

Session F: Implementation II
– Lab II

Session G: Hybrid Recommendation

Session H: Robustness

Session I: Advanced Topics
– Dynamics
– Beyond accuracy

New schedule

Tuesday
– 15:00-18:00 Session C and part of Session E
– 18:00-20:00 Independent lab (programming)

Wednesday
– 8:00-11:00 Session D (Evaluation)
– 11:15-13:00 Rest of Session E
– 14:30-16:00 Session H (Seminar room IST)
– 17:00-19:00 Session G
– Programming assignment

Thursday
– 8:00-9:45 Session I
– 10:00-11:00 Exam

Activity

With your partner
Come up with a domain for recommendation
– Cannot be
  music
  movies
  books
  restaurants
– Cannot already be the topic of your research
10 minutes

Domains?

Characteristics

Heterogeneity
– the diversity of the item space
Risk
– the cost associated with system error
Churn
– the frequency of changes in the item space
Interaction style
– how users interact with the recommender
Preference stability
– the lifetime of user preferences
Scrutability
– the requirement for transparency
Portfolio
– whether recommendation needs to take history into account
Novelty
– the need for novel / atypical items

Heterogeneity

How broad is the range of recommended items?
Example
– Netflix
  movies / TV shows
  diversity of subject matter, etc.
  still essentially serving the same goal: entertainment
  relatively homogeneous
– Amazon.com
  everything from books to electronics to gardening tools
  many different goals to be satisfied
  relatively heterogeneous

Considerations

Homogeneous items can have standardized descriptions
– movies have actors, directors, plot summary, etc.
– possible to develop a solid body of content data
Heterogeneous items will be harder to represent

Impact

Content knowledge is a problem in heterogeneous domains
– hard to develop a good schema that represents everything
– hard to cover all items with useful domain knowledge
Social knowledge is one's best bet

Risk

Some products are inherently low risk
– 99 cent music track
Some are not low risk
– a house
By risk we mean
– the cost of a false positive accepted by the user
Sometimes false negatives are also costly
– scientific research
– legal precedents

Considerations

In a low risk domain
– it doesn't matter so much how we choose
– user will be less likely to have strong constraints
In a high risk domain
– important to gather more information about exactly what the requirements are

Impact

Pure social recommendation will not work so well for high risk domains
– inability to take constraints into account
– possibility of bias
Knowledge-based recommendation has great potential in high risk domains
– knowledge engineering costs worthwhile
– user's constraints can be employed

Churn

High churn means that items come and go quickly
– news
Low churn items will be around for a while
– books
In the middle
– restaurants
– package vacations

Considerations

New item problem
– constant in high churn domains
– consequence: difficult to build up a history of opinions
Freshness may matter
– a good match from yesterday might be worse than a weaker match from today

Impact

Difficult to employ social knowledge alone
– items won't have big enough profiles
Need flexible representation for content data
– since catalog characteristics aren't known in advance

Interaction style

Some recommendation scenarios are passive
– the recommender produces content as part of a web site
Others are active
– the user makes a direct request
Sometimes a quick hit is important
– mobile application
Sometimes more extensive exploration is called for
– rental apartment

Considerations

Passive style means user requirements are harder to tap into
– not necessarily impossible
Brief interaction
– means that only small amounts of information can be communicated
Long interaction
– like web site browsing
– may make up for deficiencies in passive data gathering

Impact

Passive interactions
– favor learning-based methods
– don't need user requirements
Active interactions
– favor knowledge-based techniques
– other techniques don't adapt quickly
Extended, passive interaction
– allows large amounts of opinion data to be gathered

Preference stability

Are users' preferences stable over time?
Some taste domains may be consistent
– movies
– music (purchasing)
But others are not
– restaurants
– music (playlists)
Not the same as churn
– that has to do with items coming and going

Considerations

Preference instability makes opinion data less useful
Approaches
– temporal decay (see the sketch below)
– contextual selection
Preference stability
– large profiles can be built
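Temporal decay, for instance, can be as simple as down-weighting each rating by its age with an exponential half-life. A minimal sketch; the 90-day half-life is an arbitrary assumption, not something from the slides:

```java
// Sketch of temporal decay: a rating loses half its influence every halfLifeDays.
public class TemporalDecay {

    static double weight(double ageInDays, double halfLifeDays) {
        return Math.exp(-ageInDays / halfLifeDays * Math.log(2.0));
    }

    public static void main(String[] args) {
        System.out.println(weight(0, 90));    // 1.0   : today's rating counts fully
        System.out.println(weight(90, 90));   // 0.5   : half weight after one half-life
        System.out.println(weight(360, 90));  // ~0.06 : old ratings barely count
    }
}
```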

Impact

Preference instability
– opinion data will be sparse
– knowledge-based recommendation may be better
Preference stability
– best case for learning-based techniques

Scrutability

"The property of being testable; open to inspection."
– wiktionary
Usually refers to the explanatory capabilities of a recommender system
Some domains need explanations of recommendations
– usually high risk
– also domains where users are non-experts
  complex products like digital cameras

Considerations

Learning-based recommendations are hard to explain
– the underlying models are statistical
– some research in this area, but no conclusive "best way" to explain

Impact

Knowledge-based techniques are usually more scrutable

Portfolio

The "portfolio effect" occurs when an item is purchased or viewed
– and then is no longer interesting
Not always the case
– I can recommend your favorite song again in a couple of days
Sometimes recommendations have to take the entire history into account
– investments, for example

Considerations

A domain with the portfolio effect requires knowledge of the user's history
– the standard formulation of collaborative recommendation only recommends items that are unrated
A music recommender might need to know
– when a track was played
– what a reasonable time-span between repeats is
– how to avoid over-rotation
News recommendation
– tricky because new stories on the same topic might be interesting
– as long as there is new material

Impact

A problem for content-based recommendation
– another copy of an item will match best
– must have another way to identify overlap
– or threshold "not too similar"
Domain-specific requirements for rotation and portfolio composition
– domain knowledge requirement

Novelty

"Milk and bananas"
– the two most-purchased items in US grocery stores
Could recommend them to everybody
– correct very frequently
But...
– not interesting
– people know they want these things
– profit margin low
– recommender very predictable

Consideration

Think about items
– where the target user's predicted rating is significantly higher than the average (see the sketch below)
– where there is high variance (difference of opinion)
These recommendations might be more valuable
– more "personal"
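One rough way to operationalize "significantly higher than the average" is to look at the lift of the user's predicted rating over the item's mean rating. A sketch only; the 0.5-star threshold on a 1-5 scale is an arbitrary assumption:

```java
// Sketch: prefer items whose predicted rating is well above the item's average rating.
public class NoveltyFilter {

    static boolean worthRecommending(double predictedForUser, double itemAverage) {
        double lift = predictedForUser - itemAverage;   // how much more this user should like it
        return lift > 0.5;                              // arbitrary threshold on a 1-5 scale
    }

    public static void main(String[] args) {
        System.out.println(worthRecommending(4.6, 3.2)); // true: well above average
        System.out.println(worthRecommending(4.6, 4.5)); // false: everyone likes it ("milk and bananas")
    }
}
```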

Impact

Collaborative methods are vulnerable to the "tyranny of the crowd"
– the "Coldplay" effect
May be necessary to
– smooth popularity spikes
– use thresholds

Categorize Domains

15 min
10 min discussion

Break

10 minutes

Interaction

Input
– implicit
– explicit
Duration
– single response
– multi-step
Modeling
– short-term
– long-term

Recommendation Knowledge Sources Taxonomy

[Diagram: taxonomy of recommendation knowledge sources. Recommendation knowledge branches into Collaborative (opinion profiles, demographic profiles), User (opinions, demographics, requirements: query, constraints, preferences, context), and Content (item features, domain knowledge: feature ontology, means-ends knowledge, domain constraints, contextual knowledge).]

Also, Output

How to present results to users?

Input

Explicit
– ask the user what you want to know
  Queries
  Ratings
  Preferences
Implicit
– gather information from behavior
  Ratings
  Preferences
  Queries

Explicit Queries

The query elicitation problem
– How to get the user's preferred features / constraints?
Issues
– User expertise / terminology

Example

Ambiguity
– "madonna and child"
Imprecision
– "a fast processor"
Terminological mismatch
– "an iTunes player"
Lack of awareness
– (I hate lugging a heavy laptop)

Feature Lists

Assume user familiarity

Recommendation Dialog

Fewer questions
– future questions can depend on current answers
Mixed-initiative
– recommender can propose solutions
Critiquing
– examining solutions can help users define requirements
– (more about critiquing later)

Implicit Evidence

Watch the user's behavior
– infer preferences
Benefit
– no extra user effort
– no terminological gap
Typical sources
– web server logs
  more about this later
– purchase / shopping cart history
– CRM interactions

Problems

Noise
– gift shopping
– distractions on the web
Interpretation
– visit = interest?
– long stay = interest?
– purchase
  but what about purchase and then return?

Tradeoffs

Explicit
– Plus: direct from user, expressive, less data needed
– Minus: requires user effort, requires user interface design, may require user expertise
Implicit
– Plus: easy to gather, no user effort
– Minus: possibly noisy, challenges in interpretation

Modeling

Short-term
– usually we mean "single-session"
Long-term
– multi-session

Long-term Modeling

Preferences with a long duration
– tend to be general
  50s jazz vs. Sonny Rollins's albums on Prestige
– tend to be personally meaningful
  preference for non-smoking hotel rooms
– may have a non-conscious component
  prefer the "look" of certain kinds of houses

Short-term Modeling

What does the user want right now?
– usually need some kind of query
Preferences with short duration
– may be very task-specific
  preference for a train that connects with my arriving flight

Application design

Have to consider the role of recommendation in the overall application
– how would the user want to interact?
– how can the recommendation be delivered?

Simple Coding Exercise

Recommender systems evaluation framework
– a bit different from what you would use for a production system
– goal is to evaluate different alternatives

Three exercises

Implement a simple baseline
– average prediction
Implement a new similarity metric
– Jaccard coefficient (see the sketch below)
Evaluate results on a data set
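For the similarity-metric exercise, a minimal sketch of the Jaccard coefficient between two users, computed over the sets of item IDs each has rated. The real JaccardPredictor will work against the workspace Profile API, so everything here is a stand-in:

```java
import java.util.HashSet;
import java.util.Set;

public class JaccardExample {

    // Jaccard coefficient: |A ∩ B| / |A ∪ B| over the item IDs two users have rated.
    static double jaccard(Set<Integer> ratedByA, Set<Integer> ratedByB) {
        if (ratedByA.isEmpty() && ratedByB.isEmpty()) {
            return 0.0;                       // convention: no overlap information
        }
        Set<Integer> intersection = new HashSet<>(ratedByA);
        intersection.retainAll(ratedByB);     // items rated by both users
        Set<Integer> union = new HashSet<>(ratedByA);
        union.addAll(ratedByB);               // items rated by either user
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        Set<Integer> a = Set.of(1, 2, 3, 4);
        Set<Integer> b = Set.of(3, 4, 5);
        System.out.println(jaccard(a, b));    // 2 shared / 5 total = 0.4
    }
}
```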

Download

http://www.ist.tugraz.at/rec09.html

Eclipse workspace file
– student-ws.zip

Structure

[Class diagram: DatasetReader reads the data set into a Set&lt;Profile&gt;; each Profile holds a Map&lt;Movie, Rating&gt;. Predictor — initialize( ), predict( user, item ) — is subclassed by ThreePredictor, PearsonPredictor, AvePredictor, and JaccardPredictor. Evaluator — evaluate( ) — is subclassed by MaeEvaluator and RmseEvaluator and wraps a Predictor.]

Basic flow

Create a dataset reader for the dataset
Read the profiles
Create a predictor using the profiles
Create an evaluator for the predictor
Call the evaluate method
Output the evaluation statistic
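A self-contained sketch of this flow. The class and method names mirror the slides (DatasetReader, Predictor, initialize, predict, evaluate, MAE), but every signature and the toy data are assumptions rather than the workspace API:

```java
import java.util.Map;

public class FlowSketch {

    /** Stand-in for the workspace DatasetReader: user id -> (item id -> rating). */
    interface DatasetReader {
        Map<Integer, Map<Integer, Double>> readProfiles();
    }

    interface Predictor {
        void initialize(Map<Integer, Map<Integer, Double>> profiles);
        double predict(int user, int item);
    }

    /** Mean absolute error of a predictor over every known rating (no train/test split here). */
    static class MaeEvaluator {
        private final Predictor predictor;
        private final Map<Integer, Map<Integer, Double>> profiles;

        MaeEvaluator(Predictor predictor, Map<Integer, Map<Integer, Double>> profiles) {
            this.predictor = predictor;
            this.profiles = profiles;
        }

        double evaluate() {
            double errorSum = 0.0;
            int count = 0;
            for (Map.Entry<Integer, Map<Integer, Double>> user : profiles.entrySet()) {
                for (Map.Entry<Integer, Double> rating : user.getValue().entrySet()) {
                    errorSum += Math.abs(predictor.predict(user.getKey(), rating.getKey())
                            - rating.getValue());
                    count++;
                }
            }
            return count == 0 ? 0.0 : errorSum / count;
        }
    }

    /** Trivial baseline in the spirit of ThreePredictor: always predicts 3.0. */
    static class ConstantPredictor implements Predictor {
        public void initialize(Map<Integer, Map<Integer, Double>> profiles) { }
        public double predict(int user, int item) { return 3.0; }
    }

    public static void main(String[] args) {
        // 1. Create a dataset reader (here: a tiny hard-coded dataset).
        DatasetReader reader = () -> Map.of(
                1, Map.of(10, 4.0, 11, 5.0),
                2, Map.of(10, 2.0, 12, 3.0));
        // 2. Read the profiles.
        Map<Integer, Map<Integer, Double>> profiles = reader.readProfiles();
        // 3. Create a predictor using the profiles.
        Predictor predictor = new ConstantPredictor();
        predictor.initialize(profiles);
        // 4.-6. Create an evaluator for the predictor, call evaluate, output the statistic.
        System.out.println("MAE = " + new MaeEvaluator(predictor, profiles).evaluate()); // MAE = 1.0
    }
}
```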

PearsonPredictor

Similarity caching
– We need to calculate each user's similarity to the others anyway
  for each prediction
– Might as well do it only once
  standard time vs. space tradeoff
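A sketch of that caching idea; the Pearson formula over co-rated items is standard, but the data layout and names are simplifications rather than the workspace classes:

```java
import java.util.HashMap;
import java.util.Map;

// Cache user-user Pearson correlations so each pair is computed only once.
// Ratings are modeled as user id -> (item id -> rating); all names are assumptions.
public class SimilarityCache {

    private final Map<Integer, Map<Integer, Double>> ratings;
    private final Map<Long, Double> cache = new HashMap<>();   // key encodes the user pair

    public SimilarityCache(Map<Integer, Map<Integer, Double>> ratings) {
        this.ratings = ratings;
    }

    /** Cached Pearson correlation of two users, computed on first use. */
    public double similarity(int u, int v) {
        long key = ((long) Math.min(u, v) << 32) | Math.max(u, v);  // symmetric pair key
        return cache.computeIfAbsent(key, k -> pearson(ratings.get(u), ratings.get(v)));
    }

    /** Pearson correlation over the items both users have rated. */
    private static double pearson(Map<Integer, Double> a, Map<Integer, Double> b) {
        double sumA = 0, sumB = 0, sumA2 = 0, sumB2 = 0, sumAB = 0;
        int n = 0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double rb = b.get(e.getKey());
            if (rb == null) continue;          // only co-rated items count
            double ra = e.getValue();
            sumA += ra; sumB += rb;
            sumA2 += ra * ra; sumB2 += rb * rb;
            sumAB += ra * rb;
            n++;
        }
        if (n < 2) return 0.0;
        double num = sumAB - sumA * sumB / n;
        double den = Math.sqrt((sumA2 - sumA * sumA / n) * (sumB2 - sumB * sumB / n));
        return den == 0 ? 0.0 : num / den;
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Double>> r = Map.of(
                1, Map.of(10, 4.0, 11, 2.0, 12, 5.0),
                2, Map.of(10, 5.0, 11, 1.0, 12, 4.0));
        System.out.println(new SimilarityCache(r).similarity(1, 2));  // ≈ 0.84
    }
}
```

With the cache in place, predict(user, item) can look up similarity(user, neighbor) for every neighbor without recomputing any correlation.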

Data sets

Four data sets
– Tiny (3 users)
  synthetic
  for unit testing
– Test (5 users)
  also synthetic
  for quick tests
– U-filtered
  subset of the MovieLens dataset
  standard for recommendation research
– u
  full MovieLens 100K dataset

Demo

Task

Implement a better baseline
ThreePredictor is weak
– Better to use the item average

AvePredictor

Non-personalized prediction
– What Amazon.com shows
Idea
– Cache the average score for an item
– When predict(user, item) is called
  ignore the target user
Better idea
– Normalize by the user average
– Calculate the item's deviation above each rater's average
– Average these deviations
– Add the result to the target user's average
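A minimal sketch of the "better idea", using plain maps instead of the workspace Profile and Movie classes; all names here are stand-ins:

```java
import java.util.Map;

// Mean-centered item average: each rating of the item is first centered on its
// rater's own average, those deviations are averaged, and the result is added
// to the target user's average. Data layout: user id -> (item id -> rating).
public class AveSketch {

    /** Average of one user's ratings. */
    static double userAverage(Map<Integer, Double> userRatings) {
        return userRatings.values().stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    static double predict(Map<Integer, Map<Integer, Double>> ratings, int targetUser, int item) {
        double deviationSum = 0.0;
        int count = 0;
        for (Map.Entry<Integer, Map<Integer, Double>> e : ratings.entrySet()) {
            if (e.getKey() == targetUser) continue;          // ignore the target user
            Double r = e.getValue().get(item);
            if (r == null) continue;                         // this user has not rated the item
            deviationSum += r - userAverage(e.getValue());   // deviation above the rater's average
            count++;
        }
        double itemDeviation = count == 0 ? 0.0 : deviationSum / count;
        return userAverage(ratings.get(targetUser)) + itemDeviation;
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Double>> r = Map.of(
                1, Map.of(10, 4.0, 11, 2.0),
                2, Map.of(10, 5.0, 11, 3.0, 12, 4.0),
                3, Map.of(12, 2.0));
        System.out.println(predict(r, 3, 10));   // user 3's average (2.0) + item 10's lift (1.0) = 3.0
    }
}
```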

Existing unit test

Compare with the Pearson predictor

Process

Class time scheduled for 18:00-20:00
Use this time to complete the assignment
Due before class tomorrow
Work in pairs if you prefer
Submit by email to [email protected]
– subject line: GRAZ H1
– body: names of students
– attach: AvePredictor.java