Recommending Experts and Scientific Articles
Toine Bogers
Research talk @ RSLIS, Copenhagen
June 12, 2008
Outline
• About me
• Expert search & recommendation
• Recommending scientific articles
About me
• Education
  – 1997-2001 Master's degree in Information Management & Technology
  – 2002-2004 Master's degree in Computational Linguistics & Artificial Intelligence
• Employment
  – 2005-now PhD student in the APropos project about pro-active document recommendation
• Teaching
  – 2006-now various guest lectures about search engines and IR
  – 2007 Information Search, Retrieval, and Recommendation
  – 2008 Information Search, Retrieval, and Recommendation
About me
• Tilburg University
Outline
• About me
• Expert search & recommendation
  – Definition & history
  – Tasks & approaches
  – A look at evaluation & test collections
  – Expertise seeking
  – A university-wide expert search engine
• Recommending scientific articles
What is it?
• Basically: searching for experts instead of documents
History of expert search
• In the 1980s and 1990s
  – Implemented as large-scale databases containing employee skills
  – Problems
    • Puts the workload on employees
    • 'Unnatural' approach
    • Easily out-of-date
• TREC 2005 Enterprise Track introduced the Expert Search Task
  – Large-scale evaluation effort of expert finding
    • 2005 & 2006: W3C collection
    • 2007 & 2008: CSIRO collection
  – Huge boost in research into automatic approaches
  – Usually co-occurrence of people and topics is seen as evidence of expertise
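The co-occurrence idea in the last bullet can be sketched in a few lines of Python. This is a toy illustration only, not the model of any TREC participant: the corpus, the names, and the scoring rule (count the documents in which a person appears together with all query terms) are assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy corpus: each document has some text and the people associated with it.
DOCS = [
    {"text": "query expansion for expert search", "people": ["alice", "bob"]},
    {"text": "expert search in enterprise intranets", "people": ["alice"]},
    {"text": "parsing dutch sentences", "people": ["carol"]},
]

def rank_experts(query, docs):
    """Rank people by the number of documents in which they co-occur with all query terms."""
    terms = set(query.lower().split())
    scores = Counter()
    for doc in docs:
        # A document counts as evidence only if it contains every query term.
        if terms <= set(doc["text"].lower().split()):
            scores.update(doc["people"])
    return scores.most_common()

ranking = rank_experts("expert search", DOCS)
```

Real systems typically weight this evidence (for example by document relevance or by how strongly a person is tied to a document) rather than counting documents.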
Evidence of expertise
• Content-based evidence
  – Documents
  – E-mails
  – Home pages
• Evidence from social networks
  – Organizational structure
  – E-mail networks
  – Bibliographic information
• Activity-based evidence
  – Project time
  – Search history
  – Publication history
Tasks and approaches
• Different tasks
  – Expert finding
    • Find the experts on a specific topic
  – Expert profiling
    • Find out what one expert knows about different topics
  – Recommending similar experts
    • Find experts who share the same profiles
Evaluation
• Majority of work is evaluated using TREC collections
  – W3C collection
    • 5.7 GB and 331,037 documents (Web pages, mailing lists, project pages)
    • Topics are group names
    • Relevance judgments
      – 2005: group members are experts
      – 2006: TREC participants judge expertise themselves
  – CSIRO collection
    • 4.2 GB and 370,715 documents (similar diversity as W3C)
    • Work tasks created by actual CSIRO science communicators
      – Goal is to create an overview page on a certain topic
    • Relevance judgments done by science communicators in 2007 and 2008
UvT Expert Collection
• Problems with TREC collections
  – Expertise is never self-assessed
  – Only one specific type of organization
  – Only in English
• We therefore created the UvT Expert Collection
  – Crawl of a medium-sized Dutch university
  – Based on Webwijs ("Webwise"), our online expert profiling database
    • 1168 experts
    • 1400 self-assessed expertise topics
    • Bilingual (Dutch and English)
  – Documents include publications, course pages, research descriptions, and home pages
  – Information about organizational structure and topic hierarchy
  – See SIGIR '07 paper for more information
Expertise seeking
• All expert finding work so far has been from an IR perspective
  – What is missing is an IS perspective: expertise seeking
• What we did to remedy this
  – Focused on the task of recommending similar experts
    • Scenario sketch: "The media wishes to communicate with the top expert, but he is unavailable for a while. Who would you recommend to take their place?"
  – Got 6 of our university's communication advisors to participate in our study
  – Two-fold purpose of our questionnaire
    • Investigate expertise seeking behavior
    • Get realistic relevance judgments for the 'similar experts' task
  – Each advisor had to judge 10 recommended experts for each of 10 familiar 'focus' experts
  – See SIGIR '08 workshop paper for more information
Expertise seeking
• Investigate expertise seeking behavior
  – Inspired by 2007 IP&M paper by Woudstra and Van den Hooff
    • Identified 11 important factors for source selection (topic of knowledge, familiarity, reliability, availability, perspective, up-to-dateness, approachability, cognitive effort, contacts, physical proximity, saves time)
  – Asked participants to describe
    • Typical requests for expertise
    • Reasons for picking and not picking specific experts
    • How important each factor was for their decisions
• Some findings
  – Topic of knowledge was most important in recommending someone
  – Familiarity with the expert was also important
  – New factors we identified
    • Organizational structure (professors and project leaders are preferred)
    • Media experience ("one of them is not suitable for talking to the media")
Expertise seeking
• Get realistic relevance judgments
  – Used 44 unique focus experts divided over the 6 PR advisors (10 each)
  – First, participants were asked for their own suggestions
  – Generated 10 recommended experts for each using system pooling
  – Participants then rated these suggested experts on a 10-point scale
• Integrated the factors into expert finding models
  – Evaluated using MRR and NDCG@10
• Some findings
  – Best baseline approach combined terms from documents with the self-assessed expertise areas
  – Integrated the following factors into retrieval models: organizational structure, media experience, reliability, up-to-dateness, quality of contacts
  – Significant improvements using reliability, up-to-dateness, and organizational structure
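For readers unfamiliar with the two metrics named above, here is a minimal sketch of MRR and NDCG@10 over graded judgments. It makes several assumptions that are not stated on the slide: gains are used linearly (some NDCG variants use 2^gain - 1), the ideal ranking is built from the judged list itself, and an expert counts as "relevant" for MRR when the rating reaches a hypothetical `threshold`.

```python
import math

def dcg_at_k(gains, k=10):
    """Discounted cumulative gain with linear gains and a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))

def ndcg_at_k(gains, k=10):
    """DCG normalized by the DCG of the same judgments sorted from best to worst."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0

def reciprocal_rank(gains, threshold):
    """1 / rank of the first result whose rating is at least `threshold` (0.0 if none)."""
    for rank, g in enumerate(gains, start=1):
        if g >= threshold:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs, threshold):
    """Average reciprocal rank over several ranked lists of ratings."""
    return sum(reciprocal_rank(g, threshold) for g in runs) / len(runs)
```

Each list passed to these functions holds the ratings of the recommended experts in the order the system ranked them, for example `ndcg_at_k([7, 9, 3])` for a list whose second suggestion was rated highest.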
A university-wide expert search engine
• Work in progress by Master's student Ruud Liebregts
  – Designing and evaluating a university-wide expert search engine
• Design
  – Data sources include publications, theses, course descriptions, research descriptions, self-assessed expertise areas
  – Allows for filtering on language and faculty
  – Shows collaboration networks for papers and thesis supervision
• Evaluation
  – System-based evaluation
    • 240 test topics
      – 120 Dutch and 120 English
      – 120 based on thesis supervisors and 120 based on paper authors
    • Gold standard judgments from user-based evaluation (see next slide)
A university-wide expert search engine
• Evaluation (cont'd)
  – User-based evaluation
    • 30 employees asked to
      – Describe one of their expertise areas
      – List and rank 5 possible experts
      – Formulate a query based on the topic
      – Judge the top 10 search engine results
    • ±30 UvT students will be randomly assigned
      – 3 out of 5 possible expert finding work tasks
      – 3 out of 5 possible supervisor finding work tasks
      – Comparing the new search engine vs. everything else the university has to offer
      – Using the baseline 3 times and the new search engine 3 times
    • Possibly
      – ±30 people external to Tilburg University
      – Two classes of ±50 Dutch high school seniors
Outline
• About me
• Expert search & recommendation
• Recommending scientific articles
  – What is it?
  – Approaches
  – Social bookmarking
  – Recommending using CiteULike
  – At the library school
  – Future work
What is it?
• Formal definition
  – A recommender system tries to identify sets of items that are likely to be of interest to a certain user given some information from that user's profile.
• More casual definition
  – "Customers who bought this product often bought a bag of potato chips with it."
  [Slide graphic labels: "BEER", "FEEL VIOLATED IN THEIR PRIVACY"]
Approaches to recommendation
• Some popular approaches
  – Most popular item (non-personalized recommendation)
  – Demographic recommendation (uses user features)
  – Content-based recommendation (using IR models)
    • Use item features to find items similar to past items
    • Good content match between items, but no quality control
  – Collaborative filtering (mining usage patterns)
    • Uses user-item preferences (e.g. explicit ratings data, purchase data)
    • Good for areas where content analysis is hard (e.g. movies, music)
    • Two types
      – User-based filtering (find similar users)
      – Item-based filtering (find similar items)
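User-based filtering, as described in the list above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the talk: the ratings are invented, similarity is plain cosine over sparse rating dictionaries, and unseen items are scored by the similarity-weighted ratings of the `k` most similar users.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    shared = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend_user_based(target, ratings, k=2, n=3):
    """Return up to n unseen items, scored by the k nearest users' weighted ratings."""
    neighbours = sorted(
        ((cosine(ratings[target], profile), user)
         for user, profile in ratings.items() if user != target),
        reverse=True,
    )[:k]
    scores = {}
    for sim, user in neighbours:
        for item, rating in ratings[user].items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]

# Hypothetical ratings on a 1-10 scale.
RATINGS = {
    "U1": {"I1": 5, "I2": 8, "I3": 4},
    "U2": {"I1": 5, "I2": 7, "I4": 9},
    "U3": {"I5": 10, "I6": 2},
    "Ux": {"I1": 4, "I2": 9},
}

suggestions = recommend_user_based("Ux", RATINGS)
```

Item-based filtering applies the same similarity computation to the columns of the matrix (items described by the users who rated them) instead of the rows.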
Approaches to recommendation
[Figure: user-item rating matrix with users U1 to U5 as rows and items I1 to I10 as columns; filled cells hold ratings between 1 and 10, and an additional row for user Ux is marked "?"]
Social bookmarking
• Way of storing, organizing, and managing bookmarks of Web pages, scientific articles, books, etc.
  – Users can add bookmarks
  – Can be made public or kept private
  – Often allow users to tag/describe their items
  – Lots of social bookmarking services available
Research so far
• There are golden opportunities here!
  – Tons of free, useful data
    • Large amounts of content described using tags and other metadata
    • Users reveal information about themselves by adding and tagging items
    • Treasure trove of user-item preferences
  – Can be used to predict new items
• However, research still in its infancy
  – Mostly exploratory and theoretical
  – Some scattered attempts at improving IR using tags
  – Recommendation for social bookmarking
    • Mostly tag recommendation (easy to evaluate)
    • And of course there's StumbleUpon
Main focus
• My main focus
  – Recommending interesting bookmarks based on user profiles from social bookmarking websites
  – Experiment with different
    • Algorithms
    • Contextual representations
    • Aspects (temporal, growth curves, spam, duplicates)
    • Combinations of approaches (data fusion)
  – Evaluation
    • System-based evaluation
    • User-based evaluation
  – Preferably for two different areas
    • Scientific articles (CiteULike, Bibsonomy)
    • Web pages (Delicious)
CiteULike
• Social bookmarking for scientific papers
  – According to their website, it is "a free service to help you to store, organise, and share the scholarly papers you are reading"
  – Some features
    • Article metadata
    • Tagging
    • Groups
    • Comments
    • Reading priorities
    • Batch importing
CiteULike
• Creating a collection
  – Daily database dumps available
    • Contain user-item-tag triples with timestamps
    • But none of the additional information available on the website
  – Used the November 2, 2007 dump as a starting point
  – Crawled the rest of the website
    • Article and user metadata
    • Group information
    • Reading priorities
• Some statistics
  – 803,521 items (metadata available for 67%)
  – 25,375 users (29% spam profiles)
  – 232,937 tags
Experimental setup & evaluation
• System-based evaluation
  – We know what papers a user liked from his profile
    • How well can we predict what we already know?
    • User profiles we have are user-item pairs
  – Formal setup
    • Take out 10 items from each user profile
    • Train on remaining profile, predict missing items
    • Users with ≥20 items and articles added at least twice
    • 10-fold cross-validation to prevent overfitting
  – Evaluation
    • If we recommend the missing items, that's good!
    • MAP, MRR, Precision@10, user coverage
  – We can use this same setup for all experiments
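The hold-out setup above can be sketched as follows: withhold some items from a profile, produce a ranked list from the rest, and score how many withheld items are recovered. This is a minimal sketch under assumptions; the function names are illustrative, and the random split stands in for whatever fold assignment was actually used.

```python
import random

def split_profile(items, n_held_out, rng):
    """Withhold n_held_out items from a profile; return (training items, held-out set)."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled[n_held_out:], set(shuffled[:n_held_out])

def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top k recommendations that are held-out items."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a held-out item appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: one (ranked list, held-out set) pair per user."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

def user_coverage(runs):
    """Fraction of users for whom at least one recommendation could be made."""
    return sum(1 for ranked, _ in runs if ranked) / len(runs)

# Example: a 25-item profile with 10 items withheld, using a fixed seed.
train, held_out = split_profile(range(25), 10, random.Random(0))
```

A recommender is then trained on `train` for every user, and its ranked output is passed to the metric functions together with `held_out`.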
At the library school
• First experiments using collaborative filtering
  – Best model has a MAP of 0.2478 and similar P@10
  – User-based filtering performed best
    • Optimal number of neighbors was 5
  – User coverage is high at 99.6%
    • For how many users can we predict something?
    • Some users too new or eclectic
  – Difficult task because of high sparsity (99.98%)
    • MAP of 1.0 not necessarily achievable (or realistic)
  – Performance okay, but room for improvement
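To make the sparsity figure concrete: sparsity is the fraction of cells in the user-item matrix that are empty. The sketch below uses invented counts chosen only to land on a similar value; they are not the statistics of the filtered CiteULike data set.

```python
def sparsity(n_users, n_items, n_pairs):
    """Fraction of the user-item matrix that holds no user-item pair."""
    return 1.0 - n_pairs / (n_users * n_items)

# Hypothetical counts: 1,000 users, 50,000 items, 10,000 bookmarked pairs.
example = sparsity(1_000, 50_000, 10_000)
```

With these counts only 2 in every 10,000 cells are filled, which is why two users rarely share enough items to be judged similar.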
At the library school
• What context do we have in CiteULike?
[Figure: nested model of context from Ingwersen and Järvelin (2006), with dimensions (1) intra-object structures (signs), (2) inter-object structures, (3) session context, (4) individual and (5) collective social, systemic, conceptual, and emotional contexts, (6) techno-economic and societal contexts, (7) historic contexts]
At the library school
• What context do we have in CiteULike?
(1) Intra-object structures: Properties of the documents themselves, such as article metadata and the abstract (available 33% of the time).
(2) Inter-object structures: Relations between documents, such as those available through authorship information, assigned tags, and inclusion by users.
(4-5) Social, systemic, conceptual, and emotional contexts: The folksonomy can represent social, conceptual, and emotional context. The information about the groups and the usage patterns are all social context for the recommender.
(7) Historical contexts: Activity logs allow for, for instance, temporal analysis.
At the library school
• Birth of a recommender
  – When is a social bookmarking website big enough for recommendations?
[Figure: MAP (y-axis, 0 to 0.4) plotted against date (x-axis, 2005 to 2008) for model 1 (item-based w/ cosine), model 2 (item-based w/ cond. prob.), and model 3 (user-based w/ cosine)]
At the library school
• Node-duplication by CiteULike upon entry
• Many duplicates
  – Early estimates of around 10% (on manually annotated test set)
    • Mismatches on title, year, authors, etc.
  – With 20% of those articles having over 20 duplicates
  – Most duplicates occur only once
    • But: some duplicates are very popular (40 occurrences vs. 31)
• What effect does this have on performance?
  – Can we rejoin disconnected but similar users?
Collective dynamics of "small-world" networks
Collective dynamics of 'small world' networks
Collective dynamics of 'small-world' networks
Collective dynamics of 'small-world' networks
Collective dynamics of 'small-world' networks
Collective dynamics of 'small-world' networks
Collective dynamics of 'small-world' networks
Collective dynamics of 'small-world' networks.
Collective dynamics of 'small-world' networks.
Collective dynamics of 'small-world' networks.
Collective dynamics of 'small-world' networks.
Collective dynamics of ``small-world'' networks
Collective dynamics of ``small-world'' networks
Collective dynamics of `small-world' networks
Collective dynamics of `small-world' networks
Collective dynamics of `small-world' networks
Collective dynamics of `small-world' networks
Collective dynamics of `small-world' networks.
Collective dynamics of small-world networks
Collective dynamics of small-world networks
Collective dynamics of small-world networks
Collective dynamics of small-world networks
Collective dynamics of small-world networks.
Collective dynamics of small-world networks.
Collective dynamics ofsmall-worldnetworks.
Collective dynamics ofsmall-worldnetworks.
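Many of the title mismatches listed above differ only in quoting, hyphenation, or trailing punctuation. One possible way to group such variants, offered as a sketch and not as the deduplication method used in this work, is to normalize titles before comparing them. The normalization rule here (lowercase, treat every non-alphanumeric character as a space) is an assumption.

```python
import re
from collections import defaultdict

def normalize_title(title):
    """Lowercase, turn every run of non-alphanumeric characters into one space, and trim."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", title.lower())
    return " ".join(cleaned.split())

def group_duplicates(titles):
    """Map each normalized title to the list of original variants that share it."""
    groups = defaultdict(list)
    for title in titles:
        groups[normalize_title(title)].append(title)
    return dict(groups)

variants = [
    'Collective dynamics of "small-world" networks',
    "Collective dynamics of 'small world' networks",
    "Collective dynamics of ``small-world'' networks.",
    "Collective dynamics of small-world networks",
]
groups = group_duplicates(variants)
```

This rule does not catch variants with missing spaces, such as "ofsmall-worldnetworks", and it ignores the year and author mismatches mentioned on the slide; those would need fuzzy matching on several fields.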
Future work
• Experiment with different
  – Algorithms
  – Contextual representations
  – Aspects (temporal, growth curves, spam, duplicates)
  – Combinations of approaches (data fusion)
  – Data sets
• User-based evaluation
  – Pick one or two tasks in paper recommendation defined by McNee (2006)
    • Maintain awareness
    • Explore research interest
    • Find more like this
  – Evaluate algorithms using users
    • Based on their actual profile
    • By simulating one of these recommender tasks
Questions? Comments? Suggestions?