Shop Vertical Classification
Arthur Prévot
Meetup Machine Learning – Toronto – March 1st 2016


Background

• Large ecommerce platform
• 240K+ current customers
• Many more shops created (churned or didn’t make it to customer status)

Problem
● No information about their industry in most cases

1st solution
● Ask them

2nd solution
● We have HTML product descriptions for each shop
● We have labelled data (Mechanical Turk)
→ Classifier

Context

• Started during a Shopify Hack Day
• Pursued as a side project at work
• Used sk-learn
• Moved to Spark MLlib for full scale testing and production
• Now in production

Product Description

Getting Label Data

• Asked Amazon Mechanical Turkers to assess 80K stores
• Turkers had to choose among 15 verticals
• Involved hundreds of turkers

Input

80K shops

Shop  Aggregated product data
1     “Nice octopolo shirt !…”
2     “Nice hat and nice shirt …”
3     “Set of <b> tires </b> …”
4     “Beef and more beef…”
5     “Tire set for bikes”
...   ...

Cleaning

• HTML code removed
• Stop words removed
• Words stemmed

80K shops

Shop  Text
1     “nice octopolo shirt…”
2     “nice hat and nice shirt…”
3     “set tire…”
4     “beef beef…”
5     “tire set bike”
...   ...
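The three cleaning steps can be sketched in a few lines of Python. This is only an illustration, not the code used in the talk: the stop-word list and the `naive_stem` helper are assumptions made up for this example (a real pipeline would use a proper stemmer and a full stop-word list).

```python
import re
from html import unescape

# Toy stop-word list: an assumption for this sketch, not the list used in the talk.
STOP_WORDS = {"a", "an", "and", "for", "more", "of", "the"}

def strip_html(text):
    """Replace tags such as <b>...</b> with spaces and decode entities like &amp;."""
    return unescape(re.sub(r"<[^>]+>", " ", text))

def naive_stem(word):
    """Crude stand-in for a stemmer (assumption): strips one trailing plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def clean(text):
    """HTML removed, lower-cased, stop words removed, words stemmed."""
    words = re.findall(r"[a-z]+", strip_html(text).lower())
    return [naive_stem(w) for w in words if w not in STOP_WORDS]

print(clean("Set of <b> tires </b> …"))  # ['set', 'tire']
print(clean("Tire set for bikes"))       # ['tire', 'set', 'bike']
```

With these toy helpers, shops 3 and 5 from the example come out as "set tire" and "tire set bike", as on the slide.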

Term Frequency

80K shops, 10K words (8 in the example), joined with the Mechanical Turk labels

Shop  nice  octopolo  shirt  hat  set  tires  beef  bike  ...  label
1     1     1         1      0    0    0      0     0     ...  Apparel
2     2     0         1      1    0    0      0     0     ...  Apparel
3     0     0         0      0    1    1      0     0     ...  Auto
4     0     0         0      0    0    0      2     0     ...  Food
5     0     0         0      0    1    1      0     1     ...  Auto
...   ...   ...       ...    ...  ...  ...    ...   ...   ...  ...
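A minimal sketch of how such a term-frequency table can be built from the cleaned texts. The variable names (`shops`, `vocabulary`, `term_frequency`) are made up for this example, and the production version ran on Spark rather than plain Python.

```python
from collections import Counter

# Cleaned shop texts from the example (shop id -> tokens).
shops = {
    1: ["nice", "octopolo", "shirt"],
    2: ["nice", "hat", "nice", "shirt"],
    3: ["set", "tires"],
    4: ["beef", "beef"],
    5: ["tires", "set", "bike"],
}
vocabulary = ["nice", "octopolo", "shirt", "hat", "set", "tires", "beef", "bike"]

def term_frequency(tokens, vocabulary):
    """Count how often each vocabulary word occurs in one shop's text."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

tf = {shop: term_frequency(tokens, vocabulary) for shop, tokens in shops.items()}
for shop, row in tf.items():
    print(shop, row)
```

Each row has one entry per vocabulary word; the label column is then attached by joining on the shop id.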

Model

• Few quick tests using sklearn and settled on Naïve Bayes


Naïve Bayes Model

15 labels (3 shown), one row of parameters per label. Each row holds one conditional probability per word w in (nice, octopolo, shirt, hat, set, tires, beef, bike) plus the prior of the label.

Shops  label    per-word parameters          prior
1, 2   Apparel  P(w | apparel) for each w    P(apparel)
3, 5   Auto     P(w | auto) for each w       P(auto)
4      Food     P(w | food) for each w       P(food)


What and why

• These are the model parameters
• Needed as input to the prediction formula

$$\text{predicted class} = \arg\max_{cl} P(cl \mid doc)$$


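What and why

$$
\begin{aligned}
P(cl \mid doc) &= \frac{P(cl)\, P(doc \mid cl)}{P(doc)} \\
&\propto P(cl)\, P(doc \mid cl) \\
&= P(cl)\, P(wd_1 \mid cl)\, P(wd_2 \mid cl) \cdots P(wd_n \mid cl)
\end{aligned}
$$

• First line: Bayes' theorem
• Second line: the denominator P(doc) is the same for every class, so it does not matter when comparing classes
• Third line: conditional independence assumption between words, which is actually violated in practice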
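To make the decision rule concrete, here is a small sketch that scores each class with prior times per-word likelihoods and keeps the best one. The two classes and all probability values are invented for illustration; they are not parameters from the talk.

```python
from math import prod

# Illustrative parameters only (assumed values: two classes, three words).
priors = {"apparel": 0.5, "auto": 0.5}
likelihoods = {
    "apparel": {"shirt": 0.6, "tires": 0.1, "set": 0.3},
    "auto": {"shirt": 0.1, "tires": 0.6, "set": 0.3},
}

def unnormalized_posterior(tokens, cl):
    """P(cl) * P(wd_1 | cl) * ... * P(wd_n | cl), without dividing by P(doc)."""
    return priors[cl] * prod(likelihoods[cl][wd] for wd in tokens)

def predict(tokens):
    """argmax over classes of the unnormalized posterior."""
    return max(priors, key=lambda cl: unnormalized_posterior(tokens, cl))

print(predict(["set", "tires"]))  # auto
print(predict(["shirt"]))         # apparel
```

Dividing every score by P(doc) would rescale all classes by the same factor, so the winning class would not change.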


Numerical Limitation

• Multiplying many values close to 0 -> float underflow

From before:

$$P(cl \mid doc) \propto P(cl)\, P(wd_1 \mid cl)\, P(wd_2 \mid cl) \cdots P(wd_n \mid cl)$$

• Way around: take the log -> leads to summation instead of multiplication
• No impact on comparisons across classes

so:

$$\log P(cl \mid doc) = \log P(cl) + \log P(wd_1 \mid cl) + \log P(wd_2 \mid cl) + \dots + \log P(wd_n \mid cl) + \text{const}$$

where the constant is $-\log P(doc)$, identical for every class.

The stored parameters become logs:

Shops  label    per-word parameters               prior
1, 2   Apparel  Log(P(w | apparel)) for each w    Log(P(apparel))
3, 5   Auto     Log(P(w | auto)) for each w       Log(P(auto))
4      Food     Log(P(w | food)) for each w       Log(P(food))
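A quick way to see the underflow, and why the log form avoids it. The per-word probability and the document length below are arbitrary values chosen for the demonstration.

```python
import math

p = 1e-5        # assumed per-word probability
n_words = 100   # assumed number of words in the document

# Direct product: 1e-500 is far below the smallest positive double (~5e-324).
product = 1.0
for _ in range(n_words):
    product *= p

# Log form: a plain sum of moderately sized negative numbers.
log_sum = sum(math.log(p) for _ in range(n_words))

print(product)  # 0.0
print(log_sum)  # about -1151.29
```

Once the product has collapsed to 0.0 every class gets the same score, whereas the log scores stay finite and comparable.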


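Getting cell probabilities

$$P(wd_i \mid cl) = \frac{cnt_{wd_i, cl}}{\sum_{wd} cnt_{wd, cl}}$$

Dealing with P(wd | cl) = 0, which makes P(cl | doc) = 0 regardless of other words:

$$P(wd_i \mid cl) \approx \frac{cnt_{wd_i, cl} + 1}{\sum_{wd} (cnt_{wd, cl} + 1)} = \frac{cnt_{wd_i, cl} + 1}{\sum_{wd} cnt_{wd, cl} + n_{words}}$$

Priors:

$$P(cl) = \frac{cnt_{cl}}{N}$$

Here $cnt_{wd, cl}$ is the number of occurrences of word wd in shops labelled cl, $n_{words}$ the vocabulary size (8 in the example), $cnt_{cl}$ the number of shops labelled cl, and N the total number of labelled shops.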


80K shops, 15 labels (3 shown). Each cell is (count of the word in the label + 1) / (total word count of the label + 8 words in the vocabulary); word totals are 7 for Apparel, 5 for Auto and 2 for Food.

Shops  label    nice         octopolo     shirt        hat          set          tires        beef         bike         prior
1, 2   Apparel  (3+1)/(7+8)  (1+1)/(7+8)  (2+1)/(7+8)  (1+1)/(7+8)  (0+1)/(7+8)  (0+1)/(7+8)  (0+1)/(7+8)  (0+1)/(7+8)  2/5
3, 5   Auto     (0+1)/(5+8)  (0+1)/(5+8)  (0+1)/(5+8)  (0+1)/(5+8)  (2+1)/(5+8)  (2+1)/(5+8)  (0+1)/(5+8)  (1+1)/(5+8)  2/5
4      Food     (0+1)/(2+8)  (0+1)/(2+8)  (0+1)/(2+8)  (0+1)/(2+8)  (0+1)/(2+8)  (0+1)/(2+8)  (2+1)/(2+8)  (0+1)/(2+8)  1/5
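The worked example can be reproduced with a short script. This is a sketch under the add-one smoothing described above, using exact fractions so the cells can be compared with the table; the names `training`, `likelihood` and `predict` are invented for this example and do not come from the production code.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from math import log

vocabulary = ["nice", "octopolo", "shirt", "hat", "set", "tires", "beef", "bike"]

# (label, word counts) for the five example shops.
training = [
    ("Apparel", {"nice": 1, "octopolo": 1, "shirt": 1}),
    ("Apparel", {"nice": 2, "shirt": 1, "hat": 1}),
    ("Auto", {"set": 1, "tires": 1}),
    ("Food", {"beef": 2}),
    ("Auto", {"set": 1, "tires": 1, "bike": 1}),
]

# Sum word counts and count shops per label.
word_counts = defaultdict(Counter)
shops_per_label = Counter()
for label, counts in training:
    word_counts[label].update(counts)
    shops_per_label[label] += 1

priors = {c: Fraction(n, len(training)) for c, n in shops_per_label.items()}

# Add-one smoothing: (count + 1) / (label total + vocabulary size).
likelihood = {
    c: {
        w: Fraction(word_counts[c][w] + 1,
                    sum(word_counts[c].values()) + len(vocabulary))
        for w in vocabulary
    }
    for c in word_counts
}

def predict(tokens):
    """Pick the label with the highest log score; unknown words are skipped."""
    def score(c):
        return log(float(priors[c])) + sum(
            log(float(likelihood[c][w])) for w in tokens if w in likelihood[c]
        )
    return max(priors, key=score)

print(likelihood["Apparel"]["nice"])  # 4/15
print(likelihood["Auto"]["bike"])     # 2/13
print(predict(["tires", "bike"]))     # Auto
```

Skipping out-of-vocabulary words in `predict` is a simplification chosen for this sketch.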

Code

class LabeledDataFilter(): ...
class Featurizer(): ...
class Trainer(): ...
class Evaluator(): ...
class Predictor(): ...
class verticalPredictor():
    use Featurizer()
    use Predictor()
    ...

[Diagram: Training job (every 7 days) and Prediction job (every day); labels shown: product_data, model, accuracy, shop+industry]

Change in Training Set

• Start of home card
• Allowed asking for Industry in a voluntary way
• Quickly grew to 50K shops
• Advantage: growing over time
• Issue: training set is not fully random

Shop Dimension

In the Data Warehouse, updated daily

Shop Name
Shop URL
Shop Address
Shop City
…
Shop Predicted Industry
…

Results

Shop     top category  turker 1  turker 2  turker 3  algo top1  algo top2  algo top3
Chive    Apparel       Apparel   Apparel   Art       Apparel    Sport      Art
Lackers  Sports        Sports    Apparel   Sports    Sports     Apparel    Food
Tesla    Auto          Auto      Auto      Sports    Fashion    Auto       Electro
...      ...           ...       ...

Turkers: 60-80%   Algo: ~65%

Results

Shop     top category  turker 1  turker 2  turker 3  algo top1  algo top2  algo top3
Chive    Apparel       Apparel   Apparel   Art       Apparel    Sport      Art
Lackers  Sports        Sports    Apparel   Sports    Sports     Apparel    Food
Tesla    Auto          Auto      Auto      Sports    unknown    Auto       Electro
...      ...           ...       ...

90%   ~75%
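The slides do not state how the percentages were computed. One common way to score ranked predictions like the algo top1/top2/top3 columns is top-k accuracy, sketched here on the three example rows; the function name and the choice of metric are assumptions for illustration.

```python
def top_k_accuracy(true_labels, ranked_predictions, k):
    """Share of shops whose true label is among the first k predictions."""
    hits = sum(
        1
        for truth, ranked in zip(true_labels, ranked_predictions)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)

# "top category" column and algo top1..top3 columns from the example rows.
true_labels = ["Apparel", "Sports", "Auto"]
ranked_predictions = [
    ["Apparel", "Sport", "Art"],
    ["Sports", "Apparel", "Food"],
    ["unknown", "Auto", "Electro"],
]

print(top_k_accuracy(true_labels, ranked_predictions, 1))
print(top_k_accuracy(true_labels, ranked_predictions, 2))
```

On these three rows the top-1 score is 2 out of 3 and the top-2 score is 3 out of 3, since Tesla's true label only appears in second position.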

Business Use

Management or product teams:
• What are the biggest industries per shop count, per sales made?
• How does that evolve over time?

Theme team:
• We want to develop new themes for a given vertical, can we see the top stores in this vertical to understand trends?

Event team:
• We want to be part of an event in the music business, can we get interesting shops in this field?

Could be improved

● More metrics: add multiclass precision/recall
  ○ Now available in MLlib
● Better performance: rerun for combinations of parameters
  ○ Also added recently to MLlib but missing some components
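As an illustration of the first improvement, per-class precision and recall can be computed from true and predicted labels as below. This is a plain-Python sketch, not the MLlib implementation, and the example labels are invented.

```python
from collections import Counter

def precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for a multiclass problem."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # missed the true label
    labels = sorted(set(y_true) | set(y_pred))
    return {
        c: (
            tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0,
            tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0,
        )
        for c in labels
    }

# Invented labels for five shops.
y_true = ["Apparel", "Apparel", "Auto", "Food", "Auto"]
y_pred = ["Apparel", "Auto", "Auto", "Food", "Auto"]

for label, (precision, recall) in precision_recall(y_true, y_pred).items():
    print(label, round(precision, 2), round(recall, 2))
```

Unlike a single accuracy number, this shows which verticals are confused with each other: here one Apparel shop is predicted as Auto, which lowers Apparel recall and Auto precision.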

DEMO

THE END