11
Pawan Rewari CS224W 1 Co-selling recommendations from an eCommerce dataset. Pawan Rewari – SUID 01421653 – [email protected] CS224W – Project Milestone Update 1.0 Introduction/Motivation/Problem Definition A very simple customer use-case of eCommerce is as follows. Lets say Jane wants to buy a specific product (a Kielty 2 person backpacking tent) or one of a category of items (a tent). Typically she will go to the web or an eCommerce site and do a search. Then she may research the product and after several visits either the same day or over several days or weeks eventually converge to a purchasing decision. Each such interaction is an opportunity for the seller to showcase the specific product she is looking for and show other products that she is likely to be interested in. From a retailer perspective they want to keep the products showcased to the customer relevant to their purchase interest, and they want to provide an interesting variety of products to showcase. The overall problem we set out to research is how can one provide a new and novel approach to selling more products at an eCommerce site. The real world application is a data science team, as consultants, who are approached to provide strategies to sell more products and given a slice of historical data – and asked to work with what they are given. While this project is focused on an eCommerce dataset from Amazon (co-selling products) we would like to extrapolate the findings to inform any eCommerce retailer on the value of such network and data analysis. Specifically, if we are given a partial dataset to work with, as is the case with our dataset and no customer information beyond this aggregated data. We have been provided with a dataset that includes: - 548552 products (including 5868 discontinued products) - for each product relevant information includes: o Id: nodes or vertices numbered from 0 to 548551 o ASIN: Amazon Standard Identification Number for a product o Group: Book, Video, DVD, Music (the primary groups) o SalesRank: computed by Amazon as a function of historical sales, based on number of products sold; a lower rank signifies a higher selling product o Similar: products that have co-sold (Similar == Co-Selling) o Categories: the categories this product belongs to, can be 10 deep The initial exploratory analysis of the dataset revealed that there were at most 5 co-selling recommendations for any product (the “Similar” products) and this was only for 60% of the products in the dataset, for the rest none or fewer recommendations (see Figure 1 next page). Now as stated in the introductory use case, if you look at a typical customer’s behavior, they are likely to visit an eCommerce website several times before they consummate a purchase. Each such visit is an opportunity to sell the product and recommend similar ones and you want to have a variety of products to recommend, one to five is not enough we need more, many more, Therein lies the core problem statement that we address in this paper. How do you to provide an added set of recommendations that are useful, relevant and on average take the retailer to a better place than they started out with.

Co-selling recommendations from an eCommerce dataset.snap.stanford.edu/class/cs224w-2015/projects_2015/Co-selling... · Co-selling recommendations from an eCommerce dataset. ... of

  • Upload
    buitu

  • View
    217

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Co-selling recommendations from an eCommerce dataset.snap.stanford.edu/class/cs224w-2015/projects_2015/Co-selling... · Co-selling recommendations from an eCommerce dataset. ... of

PawanRewariCS224W

1

Co-sellingrecommendationsfromaneCommercedataset.PawanRewari–SUID01421653–[email protected]

CS224W–ProjectMilestoneUpdate1.0Introduction/Motivation/ProblemDefinitionAverysimplecustomeruse-caseofeCommerceisasfollows.LetssayJanewantstobuyaspecificproduct(aKielty2personbackpackingtent)oroneofacategoryofitems(atent).TypicallyshewillgototheweboraneCommercesiteanddoasearch.Thenshemayresearchtheproductandafterseveralvisitseitherthesamedayoroverseveraldaysorweekseventuallyconvergetoapurchasingdecision.Eachsuchinteractionisanopportunityforthesellertoshowcasethespecificproductsheislookingforandshowotherproductsthatsheislikelytobeinterestedin.Fromaretailerperspectivetheywanttokeeptheproductsshowcasedtothecustomerrelevanttotheirpurchaseinterest,andtheywanttoprovideaninterestingvarietyofproductstoshowcase.TheoverallproblemwesetouttoresearchishowcanoneprovideanewandnovelapproachtosellingmoreproductsataneCommercesite.Therealworldapplicationisadatascienceteam,asconsultants,whoareapproachedtoprovidestrategiestosellmoreproductsandgivenasliceofhistoricaldata–andaskedtoworkwithwhattheyaregiven.WhilethisprojectisfocusedonaneCommercedatasetfromAmazon(co-sellingproducts)wewouldliketoextrapolatethefindingstoinformanyeCommerceretaileronthevalueofsuchnetworkanddataanalysis.Specifically,ifwearegivenapartialdatasettoworkwith,asisthecasewithourdatasetandnocustomerinformationbeyondthisaggregateddata.Wehavebeenprovidedwithadatasetthatincludes:

- 548552products(including5868discontinuedproducts)- foreachproductrelevantinformationincludes:

o Id:nodesorverticesnumberedfrom0to548551o ASIN:AmazonStandardIdentificationNumberforaproducto Group:Book,Video,DVD,Music(theprimarygroups)o SalesRank:computedbyAmazonasafunctionofhistoricalsales,basedonnumber

ofproductssold;alowerranksignifiesahighersellingproducto Similar:productsthathaveco-sold(Similar==Co-Selling)o Categories:thecategoriesthisproductbelongsto,canbe10deep

Theinitialexploratoryanalysisofthedatasetrevealedthattherewereatmost5co-sellingrecommendationsforanyproduct(the“Similar”products)andthiswasonlyfor60%oftheproductsinthedataset,fortherestnoneorfewerrecommendations(seeFigure1nextpage).Nowasstatedintheintroductoryusecase,ifyoulookatatypicalcustomer’sbehavior,theyarelikelytovisitaneCommercewebsiteseveraltimesbeforetheyconsummateapurchase.Eachsuchvisitisanopportunitytoselltheproductandrecommendsimilaronesandyouwanttohaveavarietyofproductstorecommend,onetofiveisnotenoughweneedmore,manymore,Thereinliesthecoreproblemstatementthatweaddressinthispaper.Howdoyoutoprovideanaddedsetofrecommendationsthatareuseful,relevantandonaveragetaketheretailertoabetterplacethantheystartedoutwith.

Page 2: Co-selling recommendations from an eCommerce dataset.snap.stanford.edu/class/cs224w-2015/projects_2015/Co-selling... · Co-selling recommendations from an eCommerce dataset. ... of

PawanRewariCS224W

2

Soifwelookatthebasicdistributionofrecommended/similarproductsprovidedbytheinitialdatasetwehave:Discontinued 5868 1%0Similar 163591 30%1Similar 11071 2%2Similar 10808 2%3Similar 10275 2%4Similar 9482 2%5Similar 337457 62%Total 548552 100%Sotheproblemstatementbecameveryclearwiththreeobjectives:

1. providearecommendationforalargernumberof“Similar”productssoyoucanprovidefreshdataonmultiplevisitsbytheshopper(metric–morerecommendedproducts)

2. increasethe“relevance”oftheaddedsimilarproductswithstronger“categoricalsimilarity”–explainedindetailinSection3,brieflyitisthedepthofsimilarcategoriesfor2products(metric–raisetheavgcategoricalsimilarityofrecommendedproducts)

3. improvetheaverageSalesRankoftherecommendedset(metric–lowertheavgSalesRankofrecommendedproducts)

This translates to 3 measures that we call the 3 CoSellingMetrics. We create a baseline with the initial dataset and try toimproveuponeachofthesemeasures.Intherealworldwewouldrunactualexperimentstoseeiftheserecommendationsactuallyimprovesalesandtheniterativelyimprovetherecommendations,inourcasesincewecan’tdorealtrialswearemeasuringperformancerelativetothebaselineestablishedbythe3CoSellingMetrics.Goingbacktothesimpleuse-case,wewouldliketorecommend5productsforCoSaleateveryvisitandthenrecommendotherproductsonnewvisits,butwewanttodothisfromalimitedsetofSimilarproductsthatwedrawfrom,sowhiletheuserseesdifferentproductsrecommended,theyalsoseethesameonesrecyclebackonsubsequentvisits–onestheymaybeinterestedin.Sowesetareasonabletargetfortherecommendedproductstobe~15,thisisrelativelyarbitrarybutmakessensefortheusecasewehavehere.

Page 3: Co-selling recommendations from an eCommerce dataset.snap.stanford.edu/class/cs224w-2015/projects_2015/Co-selling... · Co-selling recommendations from an eCommerce dataset. ... of

PawanRewariCS224W

3

2.0RelatedWorkAsstatedintheReferenceswedidaliteraturereviewofseveralpapersonCollaborativeFiltering,ClusteringMethodsandItem-to-ItemCollaborativeFiltering.Hereisasummary.AtraditionalcollaborativefilteringalgorithmrepresentsacustomerasanN-dimensionalvectorofitems,whereNisthenumberofdistinctcatalogitems.Thecomponentsofthevectorarepositiveforpurchasedorpositivelyrateditemsandnegativefornegativelyrateditems.Tocompensateforbest-sellingitems,thealgorithmtypicallymultipliesthevectorcomponentsbytheinversefrequency(theinverseofthenumberofcustomerswhohavepurchasedorratedtheitem),makinglesswell-knownitemsmorerelevant.Formostcustomers,thisvectorisextremelysparse.Thealgorithmgeneratesrecommendationsbasedonafewcustomerswhoaremostsimilartotheuser.Itcanmeasurethesimilarityoftwocustomers,AandB,invariousways;acommonmethodistomeasurethecosineoftheanglebetweenthetwovectors.Thealgorithmcanselectrecommendationsfromthesimilarcustomers’itemsusingvariousmethodsaswell,acommontechniqueistorankeachitemaccordingtohowmanysimilarcustomerspurchasedit.Inclustermodelstofindcustomerswhoaresimilartotheuser,dividethecustomerbaseintomanysegmentsandtreatthetaskasaclassificationproblem.Thealgorithm’sgoalistoassigntheusertothesegmentcontainingthemostsimilarcustomers.Itthenusesthepurchasesandratingsofthecustomersinthesegmenttogeneraterecommendations.Becauseoptimalclusteringoverlargedatasetsisimpractical,mostapplicationsusevariousformsofgreedyclustergeneration.Thesealgorithmstypicallystartwithaninitialsetofsegments,whichoftencontainonerandomlyselectedcustomereach.Theythenrepeatedlymatchcustomerstotheexistingsegments,usuallywithsomeprovisionforcreatingnewormergingexistingsegments.Forverylargedatasets—especiallythosewithhighdimensionality—samplingordimensionalityreductionisalsonecessary.Clustermodelshavebetteronlinescalabilityandperformancethancollaborativefilteringbecausetheycomparetheusertoacontrollednumberofsegmentsratherthantheentirecustomerbase.However,recommendationqualityislow.Ratherthanmatchingtheusertosimilarcustomers,Amazon’sitem-to-itemcollaborativefilteringmatcheseachoftheuser’spurchasedandrateditemstosimilaritems,thencombinesthosesimilaritemsintoarecommendationlist.Todeterminethemost-similarmatchforagivenitem,thealgorithmbuildsasimilar-itemstablebyfindingitemsthatcustomerstendtopurchasetogether.Theybuildaproduct-to-productmatrixbyiteratingthroughallitempairsandcomputingasimilaritymetricforeachpairbasedonthecosineofthetwovectors.Givenasimilar-itemstable,thealgorithmfindsitemssimilartoeachoftheuser’spurchasesandratings,aggregatesthoseitems,andthenrecommendsthemostpopularorcorrelateditems.Thiscomputationdependsonlyonthenumberofitemstheuserpurchasedorrated.AsstatedinSection1,withourdatasetwehaveinformationonproducts,co-sellingrelations&categories,butnotoncustomers.Thismayverywellbethecaseinarealworldsituationwhereyouareconsultingforacustomerwhoprovidesyouwithalimiteddataset.Sincewearenotprovidedwithcustomerpurchasedata,onlyaggregatedata,wecouldlearnfrombutnotdirectlyapplytheaboveideas.WecameupwithanapproachthatallowedustomakegoodrecommendationsbyimprovingonthethreeCoSellingMetricsstatedearlier.

Page 4: Co-selling recommendations from an eCommerce dataset.snap.stanford.edu/class/cs224w-2015/projects_2015/Co-selling... · Co-selling recommendations from an eCommerce dataset. ... of

PawanRewariCS224W

4

3.1ApproachWestartwithaninitialsparsesetofrecommendations(representedinG1),atreeofcategoriesandsub-categories(G2)andusethesetocomeupwithadensersetofrecommendations(G3)asshown.The3Graphs(Figure1)relevanttoourapproach&analysisare:

• G1:G(V,E),eachVertexisaproductandeachEdgerepresentsaco-sellingrelationship.Thisisanundirectedgraph(ifIsellwithyou,yousellwithme).EachVertexhasanASIN,Group,SalesRank.EdgestoallSimilar/co-sellingproducts.

• G2:(V,E)isthecategorytree.TheVerticesareallthecategories&sub-categories.Therootnodeisproduct,nextlevelisBooks,Music,Video,DVD,theirsub-categoriesatthenextlevelandsoon.TheEdgesindicatesacategorytosub-categoryrelationship.Eachproductthenbelongstomultiplecategorypathsasillustratedonthenextpage.

• G3:G(V,E)isbuiltontopofG1andusestheinformationfromG2,itcontainsasignificantnumberofaddededgesrepresentingnewlyrecommendedco-sellingrelationshipsbyourrecommendationenginedescribedin3.2.

MetricsforinitialGraphG1:G1:Nodes542684,Edges987903G1:AverageDegree:1.82G1:MaxWcc:Nodes334852,Edges925823G1:Diameter:43G1:AverageClusteringCoefficient:0.274496Figure1:G1,G2&G3

P

P

P P

P

P P

P

P P

P

P

C

C C

C C C C

Page 5: Co-selling recommendations from an eCommerce dataset.snap.stanford.edu/class/cs224w-2015/projects_2015/Co-selling... · Co-selling recommendations from an eCommerce dataset. ... of

PawanRewariCS224W

5

Baselineandwhatwesetouttomeasure.ThethreeCoSellingMetricsarebasedon:1:AvgRecommendedProducts–thenumberofSimilarproducts(avgDegree) Thisissimplythe#edges(degree)ofeachnode,averagedacrossallnodes2:AvgCategoricalSimilarity–theCategoricalSimilarityaverageoftherecommendedproducts. Thisisexplainedbelow3:AvgSalesRank–averageSalesRankofthesimilarproductsthatarerecommended Eachproduct(node)hasaSalesRank,soweaveragetheSalesRankofallitneighborsTheobjectiveistoincrease1(Co-Selling)&2(relevance)anddecrease3(lowSalesRankisbetter).Figure2:PartialtreeofG2:_________CategorytoSub-CategoryEdges_________Product1belongsto3categorypaths_________Product2belongsto3categorypathsProduct1&Product2MaxPathintersectionatLevel4CategoricalSimilarity=4[Note–eachcategorypathdoesnotneedtoreachaleafofthetree]

C

C C

C C C C

C C

C C C C

C

C C

C C

C C C

C

C C

C

Level0

Level1

Level2

Level3

Level4

Level5

Level6

Page 6: Co-selling recommendations from an eCommerce dataset.snap.stanford.edu/class/cs224w-2015/projects_2015/Co-selling... · Co-selling recommendations from an eCommerce dataset. ... of

PawanRewariCS224W

6

AverageCategoricalSimilarity:Astherealexamplebelowshowseachproduct(ASIN:0486220125)belongstoanumberofcategorypaths,eachproductitissimilarto(ASIN:0486401960&ASIN:0486229076&ASIN:0714840343)alsobelongstoanumberofcategorypaths.WelookforthehighestCategoryLevelthatiscommonbetweenthe“Subject”andtheproductsthatis“SimilarTo”…wethenaveragethisacrossall“Similarproducts”.

ASIN:0486220125 title:HowtheOtherHalfLives:StudiesAmongtheTenementsofNewYork

Similarto:=Co-PurchasingRelationship

ASIN:0486401960

ASIN:0486229076 ASIN:0714840343

Categorical

CategoryPathstheproduct(ASIN)islistedin

Similarity

CategoryLevels 1 2 3 4 5 6 7

Books[283155] Subjects[1000]Arts&Photography[1] Photography[2020] PhotoEssays[2082]

Subject Books[283155] Subjects[1000] History[9] Americas[4808] UnitedStates[4853] General[4870]

ASIN:0486220125 Books[283155] Subjects[1000] History[9] Jewish[4992] General[4993]

Books[283155] Subjects[1000] Nonfiction[53] SocialSciences[11232] Sociology[11288] Urban[11296]

[172282] Categories[493964]Camera&Photo[502394]

PhotographyBooks[733540] PhotoEssays[733676]

SimilartoSubject 6 Books[283155] Subjects[1000] History[9] Americas[4808] UnitedStates[4853] General[4870]

ASIN:0486401960 3 Books[283155] Subjects[1000] Nonfiction[53] CurrentEvents[10546] Poverty[10576] General[10578]

6 Books[283155] Subjects[1000] Nonfiction[53] SocialSciences[11232] Sociology[11288] Urban[11296]

4 Books[283155] Subjects[1000]

Arts&Photography[1] Photography[2020]| History[2032]

SimilartoSubject 4 Books[283155] Subjects[1000]Arts&Photography[1] Photography[2020] Travel[2087]

UnitedStates[2099] General[2100]

ASIN:0486229076 4 Books[283155] Subjects[1000]Arts&Photography[1] Photography[2020] Travel[2087]

UnitedStates[2099]

MidAtlantic[2101]

5 Books[283155] Subjects[1000] History[9] Americas[4808] UnitedStates[4853] State&Local[4872]

4 [172282] Categories[493964]Camera&Photo[502394]

PhotographyBooks[733540] Travel[733684]

UnitedStates[733708] General[733710]

4 [172282] Categories[493964]Camera&Photo[502394]

PhotographyBooks[733540] Travel[733684]

UnitedStates[733708]

MidAtlantic[733712]

SimilartoSubject 4 Books[283155] Subjects[1000]

Arts&Photography[1] Photography[2020]

Photographers,A-Z[2033]| General[2050]

ASIN:0714840343 5 Books[283155] Subjects[1000]Arts&Photography[1] Photography[2020] PhotoEssays[2082]

4 [172282] Categories[493964]Camera&Photo[502394]

PhotographyBooks[733540]

Photographers,A-Z[733566] General[733568]

5 [172282] Categories[493964]Camera&Photo[502394]

PhotographyBooks[733540] PhotoEssays[733676]

Toillustratethisindetail.TheSubject(ASIN:…0125)islistedin5categoryPaths.Ithasaco-sellingrelationship(Similarto)withASIN:…1960whichislistedin3CategoryPaths,withASIN:…9076whichislistedin6CategoryPaths,withASIN:…0343whichislistedin4CategoryPaths.IfIexaminethefirstCategoryPathofeachofthese,theyintersectwiththeCategoryPathsoftheSubjectatLevels6,4and4respectively.Wecontinuethisprocessandcomputeallthepathintersections(inYellow)andtaketheMaxCategoricalSimilarityineachcase(inGreen).AverageCategoricalSimilarity[fornodewithASIN:…0125]=Averageof(max)SimilaritywithASINs:196090760343=(6+5+5)/3=5.33

Page 7: Co-selling recommendations from an eCommerce dataset.snap.stanford.edu/class/cs224w-2015/projects_2015/Co-selling... · Co-selling recommendations from an eCommerce dataset. ... of

PawanRewariCS224W

7

BASELINEMetricsforG1:

OriginalDataset

WholeGraph AvgRecommendedProducts 1.82 AvgCategoricalSimilarity 3.25 AvgSalesRank 81729

Subset-AllProductswith1ormoreco-sellingproducts AvgRecommendedProducts 2.69 AvgCategoricalSimilarity 4.81 AvgSalesRank 67662Books

AvgRecommendedProducts 2.75 AvgCategoricalSimilarity 4.85 AvgSalesRank 84674Music

AvgRecommendedProducts 2.43 AvgCategoricalSimilarity 4.92 AvgSalesRank 26739Video

AvgRecommendedProducts 1.95 AvgCategoricalSimilarity 3.56 AvgSalesRank 7196DVD

AvgRecommendedProducts 3.43 AvgCategoricalSimilarity 5.03 AvgSalesRank 8430

Page 8: Co-selling recommendations from an eCommerce dataset.snap.stanford.edu/class/cs224w-2015/projects_2015/Co-selling... · Co-selling recommendations from an eCommerce dataset. ... of

PawanRewariCS224W

8

3.2Model/Algorithm/Method- formaldescriptionofanyalgorithmsused

Algorithmfollowed:

CreateG1(V,E):ReadtheinitialdatasetandeachVertexisaproductandeachEdgerepresentsaco-sellingrelationshipforeachVertex–ASIN,Group,SalesRank;EdgestoSimilarco-sellingproductsCreateG2(V,E):acategorytreewithsub-categoriesateachsub-level.Mapeachproducttothecategorypathsitbelongsto(typically3-7paths).Thepathsareprovidedbythedataset.Compute:AvgRecommendedProductsisthe#edges(degree)ofeachnode,averagedacrossallnodesAvgSalesRankistheaverageoftheSalesRankofallofitsneighbors.CategoricalSimilarityforany2productsisthehighestCategoryLeveltheyhaveincommonTheAvgCategoricalSimilarityisaverageCategoricalSimilarityacrossalltheproductsthatareinaco-sellingrelationshipwithaspecificproduct.CreateG3(V,E):StartwithG1.Foreachproduct–developalistofadditional“similar”productsbylookingatoneswiththehighestCategoricalSimilaritywiththeproduct,andamongsttheoneswiththesameCategoricalSimilarityhaveapreferenceforproductswithalowerSalesRank,stopafter~15orso(a“reasonable”targetweestablishedinSection1).AddtheseedgestoG3.Toidentifytheadditionalproductstocreateedgesto:(a) Forproducts(nodes)whichhaveoneormore“similar”productswefollowthepathofthe

neighborsandtheirneighbors…andevaluatetheonestoaddbasedonhigherCategoricalSimilarityandlowerSalesRank.

(b) Forallproductsinaddition,andforoneswithZERO“similar”productsthisistheonlywaytoaddmorerecommendations,lookforproductsthatareinthesamecategoriesandaddbasedonhigherCategoricalSimilarityandlowerSalesRank.

Re-ComputeAvgRecommendedProducts,AvgCategoricalSimilarity&AvgSalesRankandcomparewithinitialresults.Ideallywenowshouldhaveamuchdensergraphofco-sellingrelationshipswithmorerelevanceandlowerSalesRank.

Page 9: Co-selling recommendations from an eCommerce dataset.snap.stanford.edu/class/cs224w-2015/projects_2015/Co-selling... · Co-selling recommendations from an eCommerce dataset. ... of

PawanRewariCS224W

9

4.0ResultsandFindingsSoifwelookattheexamplewestartedwith,thenewsetof“similar”productsare:ASIN:0486220125 title:HowtheOtherHalfLives:StudiesAmongtheTenementsofNewYork

Newsetof"similar"products.ASIN:0486401960 title:TheBattlewiththeSlumASIN:0486229076 title:OldNewYorkinEarlyPhotographsASIN:0714840343 title:JacobRiis(55)ASIN:0451628810 title:TheFederalistPapersASIN:0395914787 title:AmericanVintage:TheRiseofAmericanWineASIN:1841762121 title:MatchlockMusketeer1588-1688(Warrior43)ASIN:0609608363 title:WondrousContrivances:TechnologyattheThresholdASIN:0807120820 title:Freedom'sLawmakers:ADirectoryofBlackOfficeholdersDuringReconstructionASIN:0060004436 title:PowerPlays:WinorLose--HowHistory'sGreatPoliticalLeadersPlaytheGameASIN:0807841625 title:TheEnduringSouth:SubculturalPersistenceinMassSocietyASIN:0805072845 title:ProjectOrion:TheTrueStoryoftheAtomicSpaceshipASIN:0811717003 title:StreetWithoutJoyASIN:0813911311 title:TheReligiousLifeofThomasJeffersonASIN:0801486955 title:GoingNative:IndiansintheAmericanCulturalImaginationASIN:0029024803 title:AnEconomicInterpretationoftheConstitutionofTheUnitedStatesFirst,justtakingacasuallookatthislist,itlookslikeareasonablesetofrecommendedproducts.Manyareinthethemeoftheoriginalproduct,andseveraltakeyoutonew,butrelatedplaces.SoeverytimeauservisitstheeCommercesitetopurchasetheywillbeshownasubsetof“peoplewholookedatthisalsolookedatthis”,“peoplewhopurchasedthis,alsopurchasedthis”.AswelookatthetransformationfromG1toG3,forthisONEproductwehavethenewmetrics:

• startingwith3(pg5)wenowhave15“similar”products(better)• theaveragecategoricalsimilaritygoesfrom5.33to5.72(better)• theaverageSalesRankfrom209144to299227(worse)

Fortheoverallgraphofsimilar/co-sellingproducts,weclearlyhaveatransformationtoamuchdensergraphwithasignificantlylargernumberofrecommendationsperproduct.MetricsforinitialGraphG1: MetricsfornewGraphG3:G1:Nodes542684,Edges987903 G3:Nodes542684,Edges7352419G1:AverageDegree:1.82 G3:AverageDegree:13.55G1:MaxWcc:Nodes334852,Edges925823 G3:MaxWcc:Nodes524997,Edges6426637G1:Diameter:43 G3:Diameter:11G1:AverageClusteringCoefficient:0.274 G1:AverageClusteringCoefficient:0.569Nowifwelookintothedetails,itclearlyshowsthatwehavesignificantlyraisedthenumberofrecommendedproductswhilesimultaneouslyimprovingaveragecategoricalsimilarity(relevance)andaverageSalesRank(rankbasedonhistoricalsales)oftherecommendedproducts.This

Page 10: Co-selling recommendations from an eCommerce dataset.snap.stanford.edu/class/cs224w-2015/projects_2015/Co-selling... · Co-selling recommendations from an eCommerce dataset. ... of

PawanRewariCS224W

10

involvedsignificantprogrammingwithmuchnewlearning,theresultsaresolidandwehaveconfidenceinthefindings.NEWMETRICS:

OriginalDataset

NewRecommendations Improvement

WholeGraph

AvgRecommendedProducts 1.82 13.55 7.4 X AvgCategoricalSimilarity 3.25 5.60 72.0% AvgSalesRank 81729 24343 70.2%

Graphsubset-AllProductswith1ormoreco-sellingproducts

AvgRecommendedProducts 2.69 13.97 5.2 X AvgCategoricalSimilarity 4.81 5.77 19.9% AvgSalesRank 67662 21654 68.0% Books

AvgRecommendedProducts 2.75 14.00 5.1 X AvgCategoricalSimilarity 4.85 5.78 19.0% AvgSalesRank 84674 27450 67.6% Music

AvgRecommendedProducts 2.43 13.96 5.7 X AvgCategoricalSimilarity 4.92 5.67 15.2% AvgSalesRank 26739 8217 69.3% Video

AvgRecommendedProducts 1.95 12.73 6.5 X AvgCategoricalSimilarity 3.56 5.83 63.7% AvgSalesRank 7196 2821 60.8% DVD

AvgRecommendedProducts 3.43 14.99 4.4 X AvgCategoricalSimilarity 5.03 6.08 20.9% AvgSalesRank 8430 2646 68.6% SincetheexamplewehaveusedisonBooksletmeexplorethebookscategoryinmoredetailtoexplainthistable.Whenwestarted,onaveragetherewere2.75recommendationsforeachbook,notnearenoughtoprovideadequatechoices,afterthetransformation,wenowhave14recommendationsperbook,thatismuchmoreusefulforaneCommerceengine.Thisalsoensuresalargersetofchoicesprovidedtoconsumers.AsameasureofrelevancewehaveusedCategoricalSimilarityandonthismeasurethereisa19%improvementinCategoricalSimilarity.TheSalesRankforaproductiscomputedbyAmazonasafunctionofhistoricalsales.AlowerSalesRanksignifiesahighersellingproduct.ThereisaniceimprovementintheaverageSalesRankby67%increasingtheprobabilityofsellingotherproducts.Nowletmechallengesomeoftheassumptionsmadeinthisproject:

Page 11: Co-selling recommendations from an eCommerce dataset.snap.stanford.edu/class/cs224w-2015/projects_2015/Co-selling... · Co-selling recommendations from an eCommerce dataset. ... of

PawanRewariCS224W

11

- increaseaverage#ofrecommendedproductsto~15,isthisareasonablenumber,doweneedmore,doweneedless;inthiscasewiththesamealgorithmthiscanbechanged

- the“weight”ofeachLevelinthecategorytreeisthesame,wecouldweighdeepersub-categorieshigherandseeifthatyieldsbetterresults,thisisworthexperimentingwith

- loweringtheaveragePageRankoftheco-sellingproductscouldleadtoarichgetricherphenomenon,isthiswhatwewantordowewanttosprinkletherecommendationswithsomehighsellersandsomelowsellers,thisiscertainlyastrategythatcanbeinvestigated

- isCategoricalSimilarityagoodmeasure,theresultsseempromising,butneedstobetested- tothelastpointandeverythingstatedsofartheultimatemeasureiswillthisleadtomore

ecommercesalesandtheonlywaywedeterminethatisbyrealworldtestsofthemodel- finallytheassumptionmadeinthisdiscussionaboutmaximizingsalesistomaximizethe

numberofitemssold,wouldwedoanythingdifferenttomaximizesalesrevenues,wesuspecttheapproachissimilar,butthiswouldbeanotherworthypathtoinvestigate

5.0ConclusionWestartedwithaDatasetthatprovidedasparsesetofrecommendationsforco-sellingproducts.Thisisarealworldscenariowhereoneneedstoproducemoreandrelevantrecommendationsgivenalimitedsetofdata.Welookedatclustermodelsandcollaborativefiltering,bothofwhichwouldrequiremoreinformationoncustomersthatwasnotavailable.Sowiththelimitedinformationavailable,wecameupwithametricaroundCategoricalSimilarityandprovidedadditionalrecommendationsthatimproveduponboththismeasureandtheaverageSalesRank.Nowinarealworldscenariothenextstepistorunatestforalimitedperiodoftime,sowecanprogressivelyrefineitbasedonrealresults.Thatsaidourtheoreticalresultsshowthatweareheadedintherightdirection.6.0References

1. J.Breese,D.Heckerman,andC.Kadie,“EmpiricalAnalysisofPredictiveAlgorithmsforCollaborativeFiltering,”Proc.14thConf.UncertaintyinArtificialIntelligence,MorganKaufmann,1998,pp.43-52.

2. B.M.Sarwaretal.,“Item-BasedCollaborativeFilteringRecommendationAlgorithms,”10thInt’lWorldWideWebConference,ACMPress,2001,pp.285-295.

3. M.BalabanovicandY.Shoham,“Content-BasedCollaborativeRecommendation,”Comm.ACM,Mar.1997,pp.66-72.

4. L.UngarandD.Foster,“ClusteringMethodsforCollaborativeFiltering,”Proc.WorkshoponRecommendationSystems,AAAIPress,1998.

5. LindenG.,SmithS.&YorkJ,–Amazon.comRecommendations:ItemtoItemCollaborativeFiltering,2003,