Evaluating User Feedback Systems
Worcester Polytechnic Institute
Kevin Menard
April 13, 2005

Thesis Presentation

DESCRIPTION

The presentation I gave to the WPI Computer Science department in defense of my Master's thesis.

Page 1: Thesis Presentation

Evaluating User Feedback Systems

Worcester Polytechnic Institute

Kevin Menard

April 13, 2005

Page 2: Thesis Presentation

Problem

- Lots of information split over many documents
  - Search engines are now a necessity
- Search engines are “dumb”
  - Document relevance is a mathematical formula, not a user rating
  - Easy to fool
  - Hard to find good info if in a “non-conforming” format
- Users know relevance values but can’t be bothered

Page 3: Thesis Presentation

Solution

- Use implicit user behavior in place of explicit feedback ratings
- WPI Curious Browser
  - Discovered a set of implicit indicators that highly correlated with feedback values
- Microsoft Curious Browser
  - Built upon WPI work and collected user feedback
  - Used to train classifier with explicit & implicit data to provide predictions of web page relevance

Page 4: Thesis Presentation

Our Work

- Investigate value of “voluntary” data
  - Previous work only used “mandatory” data
- Mandorvol Browser
  - Extension of MS Curious Browser
  - Collects data using both voluntary & mandatory feedback mechanisms
  - Collects data in controlled & uncontrolled scenarios

Page 5: Thesis Presentation

Mandorvol Browser

- Uncontrolled scenario
  - User simply searches for anything on Google
- Controlled scenario
  - User is given Excel tasks to complete
    - Most people have experience with it, but it’s complex enough that tasks can be chosen that will require help
  - Search is limited to Excel help assets
    - Search is performed via custom Java web application that provides a Google-like interface to Excel help assets

Page 6: Thesis Presentation

Informal Hypotheses

- H1: Quality of voluntary data will be higher
  - Users will only offer feedback if they want
  - Good for classifiers
- H2: Quantity of mandatory data will be greater
  - Users must provide feedback for each page
  - Also good for classifiers
- H3: Quantity of controlled data will be lower
  - Users completing tasks don’t want to be bothered

Page 7: Thesis Presentation

Timeline

- 2004:
  - Development: Aug. – Nov.
  - Pilot Studies: Nov.
  - Dev, Testing, Deployment: Dec. – Feb.
- 2005:
  - Major Study: March – April
  - Rudimentary Analysis: April – May
  - Detailed Analysis: Sep. – Dec.
- 2006:
  - Conclusions & Thesis Write-up: Jan. – April

Page 8: Thesis Presentation

Pilot Studies

- Goals
  - Test voluntary feedback mechanism
  - Test tasks for controlled situations
- Key observations
  - Feedback band location matters
    - Horizontal vs. vertical
  - “Banner ad” effect
    - Vertical band with bright colors
  - Double evaluation
  - Task-oriented users don’t provide feedback once they solve their problem

Page 9: Thesis Presentation

Study

- Ran for two months in two phases
- 161 total users across four experiment types
  - Mandatory Controlled (28)
  - Mandatory Uncontrolled (45)
  - Voluntary Controlled (48)
  - Voluntary Uncontrolled (40)

Share of the 161 users in each experiment type:

             Controlled   Uncontrolled
Mandatory    17.39%       27.95%
Voluntary    29.81%       24.84%

Page 10: Thesis Presentation

Feedback

- Feedback Ratio
  - Amount of feedback / # search results
- Feedback Opportunities
  - Amount of feedback / # of opportunities to give feedback

             Controlled     Uncontrolled
Mandatory    0.946043165    0.977690289
Voluntary    0.745762712    0.918149466

             Controlled     Uncontrolled
Mandatory    0.626190476    0.573518091
Voluntary    0.408668731    0.606345476
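The two measures defined on this slide are plain ratios. A minimal sketch, assuming the counts are already available; the function names and the example counts are hypothetical and are not taken from the study data:

```python
def _ratio(numerator: int, denominator: int) -> float:
    """Divide two counts, rejecting an empty denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


def feedback_ratio(feedback_count: int, search_result_count: int) -> float:
    """Amount of feedback divided by the number of search results."""
    return _ratio(feedback_count, search_result_count)


def feedback_per_opportunity(feedback_count: int, opportunity_count: int) -> float:
    """Amount of feedback divided by the number of opportunities to give it."""
    return _ratio(feedback_count, opportunity_count)


# Illustrative counts only (not from the study).
print(feedback_ratio(3, 4))
print(feedback_per_opportunity(3, 6))
```

Keeping the two denominators as separate functions makes it harder to mix up which count a reported value was divided by.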

Page 11: Thesis Presentation

Feedback Distribution

- Normalized
  - No feedback not considered

                          Satisfied   Partially Satisfied   Dissatisfied
Mandatory Controlled      29.66%      23.57%                46.77%
Mandatory Uncontrolled    46.85%      22.28%                30.87%
Voluntary Controlled      50.76%      16.67%                32.58%
Voluntary Uncontrolled    49.42%      21.71%                28.88%

Page 12: Thesis Presentation

Feedback Distribution (Cont.)

- Not normalized
  - No feedback values included

                          Satisfied   Partially Satisfied   Dissatisfied   No Feedback
Mandatory Controlled      28.06%      22.30%                44.24%         5.40%
Mandatory Uncontrolled    45.80%      21.78%                30.18%         2.23%
Voluntary Controlled      37.85%      12.43%                24.29%         25.42%
Voluntary Uncontrolled    45.37%      19.93%                26.51%         8.19%
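The normalized percentages on the previous slide appear to follow from these values by dropping the No Feedback share and rescaling the remaining three so that they sum to 100%. A minimal sketch of that arithmetic, using the Mandatory Controlled row; the function name is hypothetical:

```python
from typing import List, Sequence


def normalize_without_no_feedback(shares: Sequence[float]) -> List[float]:
    """Rescale percentage shares so they sum to 100, rounded to two decimals.

    `shares` holds the Satisfied, Partially Satisfied, and Dissatisfied
    percentages with the No Feedback share already left out.
    """
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must sum to a positive value")
    return [round(100.0 * share / total, 2) for share in shares]


# Mandatory Controlled row: Satisfied, Partially Satisfied, Dissatisfied.
print(normalize_without_no_feedback([28.06, 22.30, 44.24]))
# [29.66, 23.57, 46.77]
```

The other rows reproduce the normalized values to within rounding of the published two-decimal figures.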

Page 13: Thesis Presentation

High-level Analysis

- A distinctive voluntary feedback mechanism yields high quantity feedback
  - Data could be skewed by nature of study
- Users more apt to give feedback when searching leisurely in a known domain
  - E.g., I search for “drums” and I know what to expect in the search results list -- I can better evaluate them
- Users more apt to give Satisfied feedback when searching leisurely

Page 14: Thesis Presentation

In-depth Analysis

- What?
  - Build decision trees to investigate data qualities
- How?
  - Weka – Open-source machine learning tool
- Why?
  - Similar to previous work – provides validation
  - Relates back to original problem of improving search results

Page 15: Thesis Presentation

Decision Trees

Example rules:

(PagePosition ≤ 1) and (LinkTextLength ≤ 5) -> Satisfied
(PagePosition ≤ 1) and (LinkTextLength > 5) -> …
(1 < PagePosition ≤ 5) and … -> …
(PagePosition > 5) and … -> …

[Figure: the same rules drawn as a tree. The root node PagePosition splits at ≤ 1 and > 1; second-level nodes LinkTextLength and PagePosition each split at ≤ 5 and > 5; one leaf is labeled Satisfied and the remaining leaves are elided (…).]
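Each root-to-leaf path in such a tree reads as one if/then rule. A minimal sketch of the single rule the slide spells out in full; the function name is hypothetical, and the branches the slide elides with "…" are deliberately left unresolved rather than guessed:

```python
from typing import Optional


def classify(page_position: int, link_text_length: int) -> Optional[str]:
    """Apply the one fully specified rule from the slide.

    Returns None for every branch the slide leaves elided.
    """
    if page_position <= 1:
        if link_text_length <= 5:
            return "Satisfied"
        return None  # (PagePosition <= 1) and (LinkTextLength > 5): elided
    return None  # PagePosition > 1: elided


print(classify(page_position=1, link_text_length=3))
```

Rules in this form are what make the tree output easy to read: every prediction can be traced to a short chain of attribute tests.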

Page 16: Thesis Presentation

Data Preparation

- Data pulled from DB and turned into Weka file
- 14 data attributes
  - Behavior type, behavior URL length, dwell time, page count in session, page order in search result list, page order in all search result lists, search result URL length, link text length, page description length, script length, file size, image count, exit type, feedback value
- Allowed J48 to handle continuous data
- Allowed J48 to handle missing values
  - Script length, file size, image count, & exit type only
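Weka reads its data from ARFF text files, in which a missing value is written as `?`. The sketch below shows how database rows might be serialized to that format. It is an assumption about the general approach, not the tool used in the study; the attribute identifiers (`dwell_time`, `image_count`, `feedback`) and the sample row are hypothetical stand-ins for three of the 14 attributes:

```python
def to_arff(relation, attributes, rows):
    """Serialize rows to ARFF text.

    `attributes` is a list of (name, kind) pairs, where kind is either an
    ARFF type string such as "numeric" or a list of nominal values.
    A cell value of None is written as "?", the ARFF missing-value marker.
    """
    def quote(text):
        # ARFF requires quoting for values that contain spaces.
        return f"'{text}'" if " " in text else text

    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            values = ",".join(quote(value) for value in kind)
            lines.append(f"@attribute {name} {{{values}}}")
        else:
            lines.append(f"@attribute {name} {kind}")
    lines += ["", "@data"]
    for row in rows:
        cells = []
        for value in row:
            if value is None:
                cells.append("?")
            elif isinstance(value, str):
                cells.append(quote(value))
            else:
                cells.append(str(value))
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"


attributes = [
    ("dwell_time", "numeric"),
    ("image_count", "numeric"),
    ("feedback", ["Satisfied", "Partially Satisfied", "Dissatisfied"]),
]
# Hypothetical row with image_count missing.
print(to_arff("feedback", attributes, [[12.5, None, "Satisfied"]]))
```

Leaving numeric attributes as `numeric` and missing cells as `?` is what lets J48 choose its own split points and handle the gaps, as the slide describes.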

Page 17: Thesis Presentation

Classifier Type

- Why J48?
  - Easy to read rules are important
  - Interested in causal relationships
  - Performs well
- Graph of various classifiers:

[Figure: Classification Accuracy (y-axis, 40 to 80) by Data Set (x-axis, 0 to 10) for each classifier configuration. Legend: rules.ZeroR ''; rules.OneR '-B 6'; trees.J48 '-C 0.25 -M 2'; trees.J48 '-C 0.25 -B -M 2'; trees.J48 '-R -N 3 -Q 1 -M 2'; trees.J48 '-R -N 3 -Q 1 -B -M 2'; trees.J48 '-S -C 0.25 -M 2'; trees.J48 '-S -C 0.25 -B -M 2'; trees.J48 '-S -R -N 3 -Q 1 -B -M 2'; trees.J48 '-U -M 2']
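The legend strings are Weka command-line options for each classifier. For J48, as described in the Weka documentation, `-C` sets the pruning confidence threshold, `-M` the minimum number of instances per leaf, `-U` requests an unpruned tree, `-R` reduced-error pruning with `-N` folds and seed `-Q`, `-B` binary splits only, and `-S` disables subtree raising. A minimal sketch for turning such a string into a readable mapping; the parser itself is a hypothetical helper, not part of Weka, and the flag meanings are worth checking against the Weka version in use:

```python
import shlex

# J48 flags that are followed by a value; all other flags are switches.
FLAGS_WITH_VALUE = {"-C", "-M", "-N", "-Q"}


def parse_j48_options(option_string):
    """Split a J48 option string into {flag: value or True}."""
    tokens = shlex.split(option_string)
    options = {}
    index = 0
    while index < len(tokens):
        flag = tokens[index]
        if flag in FLAGS_WITH_VALUE:
            if index + 1 >= len(tokens):
                raise ValueError(f"flag {flag} is missing its value")
            options[flag] = tokens[index + 1]
            index += 2
        else:
            options[flag] = True
            index += 1
    return options


print(parse_j48_options("-C 0.25 -M 2"))
print(parse_j48_options("-S -R -N 3 -Q 1 -B -M 2"))
```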

Page 18: Thesis Presentation

Optimizing Trees

- Tree Size vs. Accuracy
  - Occam’s Razor
    - Fewer rules create more general trees
  - Classification accuracy
    - Too few rules may not accurately model the domain
  - Pragmatism
    - Larger trees take longer to build and use

Page 19: Thesis Presentation

Tree Pruning Effects – Classification Accuracy

[Figure: Classification Accuracy (%) (y-axis, 70 to 77) by Data Set (x-axis, 0 to 10). Legend: trees.J48 '-C 0.05 -M 2'; trees.J48 '-C 0.1 -M 2'; trees.J48 '-C 0.15 -M 2'; trees.J48 '-C 0.2 -M 2'; trees.J48 '-C 0.25 -M 2'; trees.J48 '-C 0.3 -M 2']

Page 20: Thesis Presentation

Tree Pruning Effects – Number of Rules

[Figure: Number of Rules (y-axis, 0 to 600) by Data Set (x-axis, 0 to 10). Legend: trees.J48 '-C 0.05 -M 2'; trees.J48 '-C 0.1 -M 2'; trees.J48 '-C 0.15 -M 2'; trees.J48 '-C 0.2 -M 2'; trees.J48 '-C 0.25 -M 2'; trees.J48 '-C 0.3 -M 2']

Page 21: Thesis Presentation

Results

                          Instances (users)   # of Rules   Tree Size   Accuracy
Mandatory Controlled      362 (20)            28           55          67.33%
Mandatory Uncontrolled    2050 (37)           168          329         67.32%
Voluntary Controlled      398 (29)            32           61          74.18%
Voluntary Uncontrolled    1348 (31)           114          221         70.10%

Page 22: Thesis Presentation

Daily Classifier Results

[Figure: two panels, labeled Mandatory Uncontrolled and Voluntary Uncontrolled. Each plots % Correct (y-axis, 40 to 100) against Days (x-axis, 0 to 35). Legend entries: ± 1 Std. Dev.; Avg. % Correct; S.B. 0.05 CI*; one panel also shows S.W. 0.05 CI**.]

Page 23: Thesis Presentation

Conclusions

- Mandatory feedback mechanism collects more data (supports H2)
  - May not be important – voluntary feedback mechanism collects “enough”
- Voluntary classifiers > Mandatory classifiers
  - Voluntary data is higher quality (supports H1)
- Controlled classifiers > Uncontrolled classifiers
  - Controlled search results are better defined
- Search domain affects feedback values & data quantity
  - Task-oriented vs. leisurely browsing
  - Controlled collects less data than uncontrolled (supports H3, although only with voluntary feedback mechanism)

Page 24: Thesis Presentation

Future Work

- Investigate better voluntary feedback mechanisms
- More diversified population
- Try non-Web browser context

Page 25: Thesis Presentation

Daily Experiment Growth

[Figure: Total Number of Participants (y-axis, 0 to 140) by Day (x-axis, 1 to 33).]

Page 26: Thesis Presentation

Acknowledgements

Many thanks are extended to:

- Prof. Brown
- Prof. Claypool
- Prof. Pollice
- The NUI group at Microsoft
- The CCC staff
- Melanie Bolduc
- Friends & family