Mining User Behavior. Eugene Agichtein, Mathematics & Computer Science, Emory University

sdm2008 user behavior - mathcs.emory.edueugene/talks/sdm2008_user_behavior.pdf


  • 1

    Mining User Behavior

    Eugene Agichtein, Mathematics & Computer Science, Emory University

  • 2

    The Big Picture: Intelligent Information Access

  • 3

    Text Mining for Patient Medical Care, with E. V. Garcia (Emory SoM) and A. Ram (Georgia Tech)

    Rule Discovery from Medical Literature (MERLIN project):
      - Identify articles containing useful clinical knowledge
      - Extract new expert system rules, test/modify based on patient DB

    Personalized diagnosis and care (PRETEX project):
      - Extract clinical variables from text in patient records
      - Personalize expert system rules for a given patient or population
      - Automatically identify harmful drug interactions and side effects

  • 4

    Mining Textual Data in Patient Electronic Medical Records

  • 5

    More info: Archana Bhattarai et al., poster at reception this evening

  • 6

    From Medical Literature to Structured Clinical Knowledge

    Example rule: IF LV_stress_perfusion_is_abnormal THEN STRONG POSITIVE EVIDENCE THAT Diseased_coronary_is(LAD)
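The example rule on this slide has an IF / THEN / THAT shape. The sketch below shows one way such a rule could be held as structured data and checked against a patient's findings. The `Rule` class, the `evaluate` helper, and the boolean `findings` dictionary are hypothetical names chosen for illustration; the slides do not describe how MERLIN actually represents or evaluates rules.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical container for one IF / THEN / THAT expert-system rule."""
    condition: str   # name of the finding tested in the IF part
    evidence: str    # strength of evidence stated in the THEN part
    conclusion: str  # hypothesis named in the THAT part

    def fires(self, findings: Dict[str, bool]) -> bool:
        # A finding that is missing from the record is treated as "not observed".
        return findings.get(self.condition, False)


# The rule text is copied from the slide; the encoding is an assumption.
RULES: List[Rule] = [
    Rule(
        condition="LV_stress_perfusion_is_abnormal",
        evidence="STRONG POSITIVE EVIDENCE",
        conclusion="Diseased_coronary_is(LAD)",
    )
]


def evaluate(rules: List[Rule], findings: Dict[str, bool]) -> List[Tuple[str, str]]:
    """Return (evidence, conclusion) for every rule whose condition holds."""
    return [(r.evidence, r.conclusion) for r in rules if r.fires(findings)]


if __name__ == "__main__":
    patient = {"LV_stress_perfusion_is_abnormal": True}  # made-up record
    print(evaluate(RULES, patient))
```

Keeping the rule as data rather than as code is what would make it possible to test or modify extracted rules against a patient database, as slide 3 describes.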

  • 7

    Baoli Li et al., poster at reception this evening

  • 8

    This study claims WHAT?!?

    • If it's printed, must be true
      - Published studies are never disproven
      - Experimental study data is never massaged
    • Big Pharma funding → overstated claims
      R. Smith, 2005: "Medical journals are an extension of the marketing arm of pharmaceutical companies", PLoS Medicine
    • How to evaluate quality/soundness of (medical) scientific literature?

  • 9

    www.falsemed.org

  • 10

    Challenges

    • Authority and trust of contributions, ratings, etc.
    • Indicate authority while protecting privacy of contributors
    • Many dimensions of quality (biomedical literature)
      - Equipment sensitivity
      - Recency (studies grow obsolete)
      - Size of the clinical trial
      - Correlational vs. controlled
      - Cohort randomization
      - …
    • Work in progress

  • 11

    The Big Picture: Intelligent Information Access

  • 12

    Social media: Planetary-scale human behavior experiment

    • Real information needs and subjective relevance judgments
    • Traces of many interactions recorded
    • Allows shared, reproducible experiments
    • Some semantic organization (tags, categories)

  • 13

    Some Examples

  • 14

    Traditional vs. social media

  • 15

  • 16

  • 17

  • 18

  • 19

  • 20

  • 21

  • 22

  • 23

  • 24

    Community

  • 25

  • 26

  • 27

  • 28

  • 29

  • 30

  • 31

  • 32

    How to find relevant and high-quality content in social media?

  • 33

    Learning-based Approach

    [Diagram labels: Content features; Community interaction features; Relevance; Quality; Unified ranking function]

  • 34

    Ranking Algorithm: GBrank [Zheng 2007]

    • Start with an initial guess $h_0$; for $k = 1, 2, \ldots$
    • Using $h_{k-1}$ as the current approximation of $h$, we separate $S$ into two disjoint sets:

      $$S^{+} = \{\langle x_i, y_i \rangle \in S \mid h_{k-1}(x_i) \ge h_{k-1}(y_i) + \tau\}$$

      $$S^{-} = \{\langle x_i, y_i \rangle \in S \mid h_{k-1}(x_i) < h_{k-1}(y_i) + \tau\}$$

    • Fit a regression function $g_k(x)$ using Gradient Boosting Tree [Friedman 2001] and the following training data:

      $$\{(x_i,\ h_{k-1}(y_i) + \tau),\ (y_i,\ h_{k-1}(x_i) - \tau) \mid \langle x_i, y_i \rangle \in S^{-}\}$$

    • Form the new ranking function as

      $$h_k(x) = \frac{k\,h_{k-1}(x) + \eta\,g_k(x)}{k + 1}$$
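The GBrank steps on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the talk: each pair (x, y) in S is read as "x should rank above y", a single least-squares regression stump (`fit_stump`, a hypothetical helper) stands in for the Gradient Boosting Tree regressor, and the one-feature documents are made-up toy data.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector]  # (x, y): x should be ranked above y
Scorer = Callable[[Vector], float]


def fit_stump(samples: List[Tuple[Vector, float]]) -> Scorer:
    """Least-squares regression stump; a stand-in for a gradient boosting tree."""
    targets = [t for _, t in samples]
    mean = sum(targets) / len(targets)
    best_err = sum((t - mean) ** 2 for t in targets)
    best = None  # (feature index, threshold, left value, right value)
    for f in range(len(samples[0][0])):
        for thr in sorted({x[f] for x, _ in samples}):
            left = [t for x, t in samples if x[f] <= thr]
            right = [t for x, t in samples if x[f] > thr]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((t - lv) ** 2 for t in left) + sum((t - rv) ** 2 for t in right)
            if err < best_err:
                best_err, best = err, (f, thr, lv, rv)
    if best is None:
        return lambda x: mean  # no useful split: predict the mean target
    f, thr, lv, rv = best
    return lambda x: lv if x[f] <= thr else rv


def gbrank(pairs: List[Pair], rounds: int = 10, tau: float = 1.0, eta: float = 1.0) -> Scorer:
    """Iterate the GBrank update h_k = (k * h_{k-1} + eta * g_k) / (k + 1)."""
    h: Scorer = lambda x: 0.0  # initial guess h_0
    for k in range(1, rounds + 1):
        # S^-: pairs whose preference is not yet satisfied with margin tau.
        violated = [(x, y) for x, y in pairs if h(x) < h(y) + tau]
        if not violated:
            break
        # Regression targets swap the current scores and add or subtract the margin.
        samples: List[Tuple[Vector, float]] = []
        for x, y in violated:
            samples.append((x, h(y) + tau))
            samples.append((y, h(x) - tau))
        g = fit_stump(samples)
        # Bind prev, g, and k as defaults so each round keeps its own values.
        h = lambda x, prev=h, g=g, k=k: (k * prev(x) + eta * g(x)) / (k + 1)
    return h


# Toy data: one feature per document, and a larger value is preferred.
docs = {"a": (1.0,), "b": (2.0,), "c": (3.0,)}
pairs = [(docs["c"], docs["a"]), (docs["b"], docs["a"]), (docs["c"], docs["b"])]
h = gbrank(pairs, rounds=5)
ranked = sorted(docs, key=lambda name: h(docs[name]), reverse=True)
print(ranked)
```

Only the violated pairs in S^- contribute training data in each round, so the regressor is always fitted to the preferences the current ranking function still gets wrong.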

  • 35

    Experimental Results

    [Figure: results chart. Series: GBrank; Baseline; Removing textual features; Removing community interaction features]

  • 36

    You've Got Answers! Predicting asker satisfaction

    • Predict user satisfaction with the answers
    • Derive additional features for the task, in particular prior asker history
    • Can predict with about 75% accuracy
      (forthcoming, Liu, Bian, Agichtein, SIGIR 2008)
    • Satisfaction is subjective and personal; even simple personalization models are very helpful
      (forthcoming, Liu and Agichtein, ACL 2008)

  • 37

    Intelligent Information Access

  • 38

    User Behavior: The 3rd Dimension of the Web

    • Amount exceeds web content and structure
      - Published: 4 Gb/day; social media: 10 Gb/day
      - Page views: 100 Gb/day
      [Andrew Tomkins, Yahoo! Search, 2007]

  • 39

    Clickthrough for Queries with Known Position of Top Relevant Result

    [Figure: relative clickthrough for queries with known relevant results in position 1 and 3, respectively. X-axis: result position (1, 2, 3, 5, 10); y-axis: relative click frequency; series: All queries, PTR=1, PTR=3]

    Higher clickthrough at top non-relevant than at top relevant document

    E. Agichtein, E. Brill, and S. Dumais, SIGIR 2006

  • 40

    Full Search Engine, User Behavior: NDCG, MAP

    Method      MAP     Gain
    RN          0.270
    RN+ALL      0.321   0.052 (19.13%)
    BM25        0.236
    BM25+ALL    0.292   0.056 (23.71%)

    [Figure: NDCG at K for K = 1 to 10 (y-axis range 0.56 to 0.74). Series: RN, Rerank-All, RN+All]
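For readers unfamiliar with the two metrics named on this slide, here is a small sketch of NDCG at K and of average precision, whose mean over queries is MAP. The slide does not say which gain and discount variant was used, so the common `2**gain - 1` gain with a `log2(rank + 1)` discount is an assumption, as is normalizing average precision by the relevant results present in the ranked list.

```python
import math
from typing import Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k graded relevance labels."""
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains: Sequence[float], k: int) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering of the same labels."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


def average_precision(relevant: Sequence[bool]) -> float:
    """Mean of precision values taken at each relevant result in the ranking."""
    hits, total = 0, 0.0
    for rank, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(queries: Sequence[Sequence[bool]]) -> float:
    """MAP: average precision averaged over queries."""
    return sum(average_precision(q) for q in queries) / len(queries)


if __name__ == "__main__":
    print(ndcg_at_k([1, 3, 0, 2], k=3))            # made-up relevance labels
    print(average_precision([True, False, True]))
```

NDCG rewards placing highly graded results near the top, while MAP works on binary relevance, which is why the slide reports both.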

  • 41

    User Behavior Complements Content and Web Topology

    [Figure: precision at K for K = 1, 3, 5, 10 (y-axis range 0.45 to 0.7). Series: RN, RN+All, BM25, BM25+All]

    Method                      P@1     Gain
    RN (Content + Links)        0.632
    RN + All (User Behavior)    0.693   0.061 (10%)
    BM25                        0.525
    BM25+All                    0.687   0.162 (31%)
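The Gain column in the P@1 table reads as an absolute improvement followed by the improvement relative to the baseline. The short check below reproduces both figures from the P@1 values; `relative_gain` is a helper name introduced here for illustration.

```python
from typing import Tuple


def relative_gain(baseline: float, improved: float) -> Tuple[float, float]:
    """Return (absolute gain, gain as a percentage of the baseline)."""
    absolute = improved - baseline
    return absolute, 100.0 * absolute / baseline


if __name__ == "__main__":
    # P@1 values copied from the table on this slide.
    for name, base, new in [("RN", 0.632, 0.693), ("BM25", 0.525, 0.687)]:
        absolute, percent = relative_gain(base, new)
        print(f"{name}: +{absolute:.3f} ({percent:.0f}%)")
```

The weaker BM25 baseline gains more from the user behavior features (31%) than the stronger RN baseline does (10%), yet both end up at a similar P@1.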

  • 42

    Fine grained behavior analysis

  • 43

    [Figure: eye-tracking plot with numbered points]

    Data captured with Tobii eye tracker, courtesy Andy Edmonds, http://www.alwaysbetesting.com/

  • 44

    Preliminary results on using mouse trajectories to infer user intent

    Q. Guo and E. Agichtein, to appear in SIGIR 2008

  • 45

    Summary

    http://www.ir.mathcs.emory.edu/