How to eliminate ideas as soon as possible

Hypothesis Testing: How to Eliminate Ideas as Soon as Possible

Roman Zykov, Retail Rocket

Boston, RecSys 2016

Context

• Intro
• Offline vs Online testing
• Make offline testing shorter
• Artificial diversity metric
• Online tests

Retail Rocket

• Personalised real-time recommendations
• E-commerce only
• Multiple channels (site, email, …)
• Founded in 2012
• Offices: Amsterdam, Barcelona, Milan, Moscow
• 1000+ retail partners
• 100+ million daily events

Why is testing important?

• Highly competitive market
• It’s not hard to create your own recommendations
• Constant changes in the product and algorithms
• Fast and reliable decisions

Offline vs Online testing

Offline testing forecasts online testing results
• Relatively fast, testing of minor changes requires hours
• Few resources: data, computational resources, code, 1 dev
• Hard to forecast online metrics in some cases
• Influence of an algorithm on users' behaviour is ignored
• Bad values of offline metrics prevent online implementation

Online test - final decision point
• Requires much time. At least two cycles of decision making
• Requires many resources: design, onsite production, etc.

Testing facts

• Nine out of ten ideas do not improve anything
• Most ideas have minor impact:
  o add new data: extracted from text, images, etc.
  o adjust parameters of algorithm

Offline testing

Offline predicts Online

Major changes or new algorithm
• Always check by online experiment
• Find an appropriate offline metric afterwards
• Try different definitions of users’ sessions
• Try different events sequences

Minor changes
• Use offline tests if you have a proven offline metric

Make offline testing shorter

What we did
• Functional programming on Scala/Spark. Four languages (Python, Java, Pig, Hive) had been previously used.
• Research in Scala/Spark Notebooks with added R integration for graphics
• Offline evaluation framework for all of our tasks with metrics calculations. The most complicated project at Retail Rocket

What we got
• It takes hours to prove or disprove any simple idea, whereas previously it could have taken days
• Research is limited by the power of our cluster and the number of data scientists

Scala/Spark notebook with R

Offline framework

• Scala on Spark
• Deals with existing web logs
• Implicit feedback
• Major metrics:
  o Recall, Diversity, Recall with NN, Empty Recs
• Minor metrics:
  o Serendipity, Novelty, Coverage
• Different types of events sequences
• Different definitions of users’ sessions
• Personalised / Non-personalised recommendations
• Adjustable TOP of viewable recommendations
• Test panel of sites from different domains

Offline events sequences

[Session diagram: view1 view2 view3 cart1 cart2 view4 view5 view6 purchase1]

View2View: view1->view2, view2->view3, view3->view4, view4->view5, view5->view6
View2Cart: view1->cart1, view2->cart1, view3->cart1, view4->cart1, view5->cart2, view6->cart2
View2Purchase: view1->purchase1, view2->purchase1, view3->purchase1, view4->purchase1, view5->purchase1, view6->purchase1
Cart2Purchase: cart1->purchase1, cart2->purchase1
Cart2Cart: cart1->cart2

* Events: product view, add to cart, purchase, main page view, search, catalog page, …
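
The pairs listed above are consistent with one simple rule: each source event is paired with the next event of the target type that follows it in the session. The Python sketch below illustrates that rule only. The function name `next_event_pairs` and the session order (view1 to view4, cart1, view5, view6, cart2, purchase1) are assumptions, chosen because they reproduce the listed pairs; the slide's diagram layout is not recoverable from the text, and this is not Retail Rocket's Scala/Spark code.

```python
from typing import List, Tuple

Event = Tuple[str, str]  # (event type, label), e.g. ("view", "view1")

# Assumed session order (hypothetical; chosen to match the pairs on the slide).
SESSION: List[Event] = [
    ("view", "view1"), ("view", "view2"), ("view", "view3"), ("view", "view4"),
    ("cart", "cart1"),
    ("view", "view5"), ("view", "view6"),
    ("cart", "cart2"),
    ("purchase", "purchase1"),
]


def next_event_pairs(session: List[Event], source: str, target: str) -> List[Tuple[str, str]]:
    """Pair every `source` event with the first later `target` event, if any."""
    pairs = []
    for i, (event_type, label) in enumerate(session):
        if event_type != source:
            continue
        # Scan forward for the first event of the target type.
        for later_type, later_label in session[i + 1:]:
            if later_type == target:
                pairs.append((label, later_label))
                break
    return pairs


view2cart = next_event_pairs(SESSION, "view", "cart")
cart2cart = next_event_pairs(SESSION, "cart", "cart")
```

Calling the same function with ("view", "view"), ("view", "purchase") and ("cart", "purchase") gives the remaining three sequence types, so one helper covers all five columns of the slide.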

Offline metric examples

[Session diagram: view1 view2 view3 cart1 cart2 view4 view5 view6 purchase1]

What Customers Buy After Viewing This Item
• View2Cart
• View2Purchase
• …

Customers Who Bought This Item Also Bought
• Cart2Cart
• Cart2Purchase
• View2Cart
• …

Case: Artificial diversification

Artificial diversification
[Figure: recommendations, Original vs After]

Problem: It’s not possible to use Recall for evaluation

Recall with Nearest Neighbours (NN)

[Figure: top 4 recs with content-based similarity (nearest neighbours) scores, compared against the real item]

• Direct hit: 1.0
• Indirect hit: 0.5
• No hit: 0.0

Metric = Average over all sessions
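
The scoring on this slide can be sketched in Python as follows. This is one reading of the figure, not the authors' implementation: a session scores 1.0 when the real item is among the top recommendations, 0.5 when the real item is a content-based nearest neighbour of a recommended item, and 0.0 otherwise. The function name `recall_with_nn`, the `neighbours` lookup and the fixed indirect score are assumptions; the slide may instead weight an indirect hit by the similarity value.

```python
from typing import Dict, Iterable, List, Set, Tuple


def recall_with_nn(
    sessions: Iterable[Tuple[List[str], str]],
    neighbours: Dict[str, Set[str]],
    top_n: int = 4,
    indirect_score: float = 0.5,  # assumed fixed value, taken from the slide's example
) -> float:
    """Average per-session score: direct hit 1.0, indirect hit 0.5, no hit 0.0.

    sessions   -- (recommended items, real item) for each session
    neighbours -- hypothetical lookup: item -> its content-based nearest neighbours
    """
    scores = []
    for recs, real_item in sessions:
        shown = recs[:top_n]  # only the viewable top-N counts
        if real_item in shown:
            scores.append(1.0)
        elif any(real_item in neighbours.get(rec, set()) for rec in shown):
            scores.append(indirect_score)
        else:
            scores.append(0.0)
    return sum(scores) / len(scores) if scores else 0.0


example_neighbours = {"a": {"x"}, "b": {"y"}}
example_sessions = [
    (["a", "b", "c", "d"], "a"),  # direct hit
    (["a", "b", "c", "d"], "y"),  # indirect hit: "y" is a neighbour of "b"
    (["a", "b", "c", "d"], "z"),  # no hit
]
example_score = recall_with_nn(example_sessions, example_neighbours)
```

Unlike plain Recall, this score still gives partial credit when a recommendation was swapped for a similar item.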

Online A/B testing

AA/BB tests

[Diagram: the control group and the test group are each split in two, giving A group, A group, B group, B group]

AA/BB tests

[Diagram: A, A, B, B results in two cases, Ideal and Dirty]
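
One common way to set up an AA/BB test is to assign every user deterministically to one of four buckets, where both A buckets receive the control experience and both B buckets receive the test experience. If the two A buckets (or the two B buckets) differ noticeably from each other, the split is likely "dirty". The Python sketch below shows only the assignment step; the function name `assign_bucket`, the bucket labels, the salt and the use of SHA-256 are assumptions, not details from the talk.

```python
import hashlib

BUCKETS = ("A1", "A2", "B1", "B2")  # hypothetical labels: A1/A2 control, B1/B2 test


def assign_bucket(user_id: str, salt: str = "experiment-1") -> str:
    """Map a user id to one of four buckets, stable across calls and machines."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return BUCKETS[int(digest, 16) % len(BUCKETS)]


def is_control(bucket: str) -> bool:
    """Both A buckets see the control variant."""
    return bucket.startswith("A")


bucket = assign_bucket("user-42")
```

Changing the salt per experiment re-shuffles users, so the same visitors do not always land in the same group.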

Bayesian approach

• Conversion rates
  o Beta distribution with normal priors
• Average Order Values
  o Normal distribution (after log) with normal priors
• Priors from historical data before experiment

Anything may be done with posteriors.

E.g.: There is a 95% chance that A has a 1% lift over B
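
A statement such as "there is an X% chance that A has a 1% lift over B" can be estimated by sampling from the two posteriors. The Python sketch below does this for conversion rates. It is a simplification under stated assumptions: it uses a conjugate Beta prior (the slide mentions normal priors and does not say how they are combined), it reads "1% lift" as a relative lift, and the name `prob_lift` and all input numbers are hypothetical.

```python
import random
from typing import Tuple


def prob_lift(
    conv_a: int, n_a: int,
    conv_b: int, n_b: int,
    min_lift: float = 0.01,                   # relative lift of A over B (assumed reading)
    prior: Tuple[float, float] = (1.0, 1.0),  # Beta(alpha, beta) prior; assumption
    draws: int = 50_000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of P(rate_A > rate_B * (1 + min_lift))."""
    rng = random.Random(seed)
    alpha0, beta0 = prior
    wins = 0
    for _ in range(draws):
        # Posterior of a conversion rate: Beta(alpha0 + successes, beta0 + failures).
        rate_a = rng.betavariate(alpha0 + conv_a, beta0 + n_a - conv_a)
        rate_b = rng.betavariate(alpha0 + conv_b, beta0 + n_b - conv_b)
        if rate_a > rate_b * (1.0 + min_lift):
            wins += 1
    return wins / draws


# Hypothetical counts: 120 of 1000 visitors converted in A, 100 of 1000 in B.
chance = prob_lift(conv_a=120, n_a=1000, conv_b=100, n_b=1000)
```

Priors taken from historical data, as the slide suggests, would replace the default `(1.0, 1.0)` with values fitted before the experiment starts.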

Conclusion

• Offline testing can predict online results
• One programming language for R&D reduces the test time
• The Scala language is a good alternative for ML tasks
• Different event sequences for offline metrics
• Recall with Nearest Neighbours (NN) metric

Thank you!

Roman Zykov, Retail Rocket
rzykov@retailrocket.net

https://github.com/RetailRocket/SparkMultiTool
