Support Vector Machines in Marketing
Georgi Nalbantov, MICC, Maastricht University


Page 1

Support Vector Machines in Marketing

Georgi Nalbantov

MICC, Maastricht University

Page 2

Contents

Purpose

Linear Support Vector Machines

Nonlinear Support Vector Machines

(Theoretical justifications of SVM)

Marketing Examples

Conclusion and Q & A

(some extensions)

Page 3

Purpose

Task to be solved (The Classification Task):

Classify cases (customers) into “type 1” or “type 2” on the basis of some known attributes (characteristics)

Chosen tool to solve this task:

Support Vector Machines

Page 4

The Classification Task

Given data on explanatory and explained variables, where the explained variable can take one of two values, {−1, +1}, find a function that gives the “best” separation between the “−1” cases and the “+1” cases:

Given: $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_m, y_m) \in \mathbb{R}^n \times \{-1, +1\}$

Find: $f : \mathbb{R}^n \to \{-1, +1\}$

“Best” function = the one whose expected error on unseen data $(\mathbf{x}_{m+1}, y_{m+1}), \ldots, (\mathbf{x}_{m+k}, y_{m+k})$ is minimal

Existing techniques to solve the classification task:

Linear and Quadratic Discriminant Analysis

Logit choice models (Logistic Regression)

Decision trees, Neural Networks, Least Squares SVM
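As an illustration of this classification setup (an editorial addition, not the authors' code), the sketch below fits a linear SVM to a small synthetic dataset and estimates the error on held-out data; the dataset and all parameter values are hypothetical.

```python
# Minimal sketch of the classification task, assuming scikit-learn is available.
# The data are synthetic; labels are coded as +1 ("type 1") and -1 ("type 2").
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# m cases with n = 2 known attributes, plus held-out cases standing in for "unseen data"
X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
y = np.where(y == 1, 1, -1)                      # recode labels to {-1, +1}

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = SVC(kernel="linear", C=1.0)                # a candidate f : R^n -> {-1, +1}
clf.fit(X_train, y_train)

# "best" function = small expected error on unseen data, estimated on the hold-out set
print("hold-out accuracy:", clf.score(X_test, y_test))
```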

Page 5

Support Vector Machines: Definition

Support Vector Machines are a non-parametric tool for classification/regression

Support Vector Machines are used for prediction rather than description purposes

Support Vector Machines have been developed by Vapnik and co-workers

Page 6

[Figure: scatter plot of buyers (∆) and non-buyers (●); horizontal axis: months since last purchase, vertical axis: number of art books purchased.]

Linear Support Vector Machines

A direct marketing company wants to sell a new book:

“The Art History of Florence”

Nissan Levin and Jacob Zahavi in Lattin, Carroll and Green (2003).

Problem: how to identify buyers and non-buyers using the two variables: months since last purchase and number of art books purchased.


Page 7

[Figure: the same scatter plot of buyers (∆) and non-buyers (●), with several candidate separating lines; axes: months since last purchase and number of art books purchased.]

Main idea of SVM: separate the groups by a line.

However: There are infinitely many lines that have zero training error…

… which line shall we choose?

Linear SVM: Separable Case


Page 8

SVMs use the idea of a margin around the separating line.

The thinner the margin, the more complex the model.

The best line is the one with the largest margin.

[Figure: scatter plot of buyers (∆) and non-buyers (●) with the separating line and its margin; axes: months since last purchase and number of art books purchased.]

Linear SVM: Separable Case


Page 9

The line having the largest margin is:

$$w_1 x_1 + w_2 x_2 + b = 0$$

where $x_1$ = months since last purchase and $x_2$ = number of art books purchased.

Note:

$w_1 x_{i1} + w_2 x_{i2} + b \ge +1$ for $i \in$ ∆ (buyers)
$w_1 x_{j1} + w_2 x_{j2} + b \le -1$ for $j \in$ ● (non-buyers)

[Figure: the scatter plot with the margin, the lines $w_1 x_1 + w_2 x_2 + b = +1$, $w_1 x_1 + w_2 x_2 + b = 0$, $w_1 x_1 + w_2 x_2 + b = -1$, and the normal vector $\mathbf{w}$; axes: $x_1$ (months since last purchase), $x_2$ (number of art books purchased).]

Linear SVM: Separable Case

Page 10

The width of the margin is given by:

$$\text{margin} = \frac{2}{\sqrt{w_1^2 + w_2^2}} = \frac{2}{\|\mathbf{w}\|}$$

Note: maximizing the margin $\frac{2}{\|\mathbf{w}\|}$ is equivalent to minimizing $\frac{\|\mathbf{w}\|}{2}$, and hence to minimizing $\frac{\|\mathbf{w}\|^{2}}{2}$.

Linear SVM: Separable Case

[Figure: the scatter plot with the maximum-margin line $w_1 x_1 + w_2 x_2 + b = 0$ and the margin boundaries $w_1 x_1 + w_2 x_2 + b = \pm 1$; axes: $x_1$ (months since last purchase), $x_2$ (number of art books purchased).]
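As an editorial aside (not from the slides), the margin width $2/\|\mathbf{w}\|$ can be read off a fitted linear SVM; the sketch below assumes scikit-learn and uses made-up buyer/non-buyer points in the two variables of the example.

```python
# Minimal sketch: recover w, b and the margin width 2/||w|| from a fitted linear SVM.
# The toy points (months since last purchase, number of art books) are hypothetical.
import numpy as np
from sklearn.svm import SVC

X = np.array([[2., 4.], [3., 5.], [4., 4.],      # buyers (+1)
              [8., 1.], [9., 2.], [10., 1.]])    # non-buyers (-1)
y = np.array([1, 1, 1, -1, -1, -1])

# a very large C approximates the separable (hard-margin) case
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]                      # (w1, w2) of the line w1*x1 + w2*x2 + b = 0
b = clf.intercept_[0]
margin = 2.0 / np.linalg.norm(w)      # width of the margin, 2/||w||
print("w =", w, " b =", b, " margin width =", margin)
```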

Page 11

The optimization problem for SVM is:

$$\min_{\mathbf{w},\,b}\; L(\mathbf{w}) = \frac{\|\mathbf{w}\|^{2}}{2}$$

subject to:

$w_1 x_{i1} + w_2 x_{i2} + b \ge +1$ for $i \in$ ∆ (buyers)
$w_1 x_{j1} + w_2 x_{j2} + b \le -1$ for $j \in$ ● (non-buyers)

(Maximizing the margin $\frac{2}{\|\mathbf{w}\|}$ is equivalent to minimizing $\frac{\|\mathbf{w}\|^{2}}{2}$.)

Linear SVM: Separable Case

Page 12

“Support vectors” are those points that lie on the boundaries of the margin

The decision surface (line) is determined only by the support vectors. All other points are irrelevant

[Figure: the support vectors highlighted on the boundaries of the margin.]

Linear SVM: Separable Case
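Continuing the same hypothetical toy data (an addition, not part of the slides), a fitted scikit-learn SVC exposes the support vectors directly:

```python
# Minimal sketch: the support vectors of a fitted linear SVM (same hypothetical toy data).
import numpy as np
from sklearn.svm import SVC

X = np.array([[2., 4.], [3., 5.], [4., 4.],      # buyers (+1)
              [8., 1.], [9., 2.], [10., 1.]])    # non-buyers (-1)
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Only these points determine the separating line; all other points are irrelevant.
print("support vectors:\n", clf.support_vectors_)    # points on the margin boundaries
print("their indices in X:", clf.support_)
```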

Page 13

Non-separable case: there is no line that separates the two groups without errors.

Training set: 1000 targeted customers.

[Figure: scatter plot of buyers (∆) and non-buyers (●); the two groups overlap.]

Here, SVM minimizes $L(\mathbf{w}, C)$:

$$L(\mathbf{w}, C) = \frac{\|\mathbf{w}\|^{2}}{2} + C \sum_i \xi_i$$

subject to:

$w_1 x_{i1} + w_2 x_{i2} + b \ge +1 - \xi_i$ for $i \in$ ∆
$w_1 x_{j1} + w_2 x_{j2} + b \le -1 + \xi_j$ for $j \in$ ●
$\xi_i, \xi_j \ge 0$

The first term maximizes the margin, the second minimizes the training errors: $L(\mathbf{w}, C)$ = Complexity + Errors.

Linear SVM: Nonseparable Case

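To make the slack variables concrete, the sketch below (an illustrative addition with hypothetical, deliberately overlapping toy data) computes $\xi_i = \max(0, 1 - y_i(\mathbf{w}\cdot\mathbf{x}_i + b))$ for each training point of a fitted linear SVM and evaluates $L(\mathbf{w}, C)$:

```python
# Minimal sketch: slack variables and the objective L(w, C) = ||w||^2 / 2 + C * sum(xi_i).
# The toy data are hypothetical and not linearly separable (each class has one "outlier").
import numpy as np
from sklearn.svm import SVC

X = np.array([[2., 4.], [3., 5.], [4., 4.], [9., 1.],     # buyers (+1)
              [8., 1.], [9., 2.], [10., 1.], [3., 4.]])   # non-buyers (-1)
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])

C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
# slack of each point: zero if it lies on the correct side of its margin boundary
xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
objective = 0.5 * np.dot(w, w) + C * xi.sum()             # Complexity + Errors
print("slacks:", np.round(xi, 3))
print("L(w, C) =", round(float(objective), 3))
```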

Page 14

[Figure: two scatter-plot panels comparing C = 5 and C = 1; axes $x_1$ and $x_2$.]

Bigger C (thinner margin): fewer training errors (better fit on the data), increased complexity.

Smaller C (wider margin): more training errors (worse fit on the data), decreased complexity.

Linear SVM: The Role of C


Varying C varies both the complexity and the empirical error, by affecting the optimal $\mathbf{w}$ and the optimal number of training errors.
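As a further aside (not from the slides), the sketch below refits the same hypothetical, non-separable toy data with the two values shown in the figure, C = 5 and C = 1, and compares margin width and margin violations:

```python
# Minimal sketch: the role of C, comparing C = 5 and C = 1 on hypothetical toy data.
import numpy as np
from sklearn.svm import SVC

X = np.array([[2., 4.], [3., 5.], [4., 4.], [9., 1.],     # buyers (+1)
              [8., 1.], [9., 2.], [10., 1.], [3., 4.]])   # non-buyers (-1)
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])

for C in (5.0, 1.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w, b = clf.coef_[0], clf.intercept_[0]
    margin = 2.0 / np.linalg.norm(w)               # larger C tends to give a thinner margin
    violations = int(np.sum(y * (X @ w + b) < 1))  # points inside or beyond the margin
    print(f"C = {C}: margin width = {margin:.2f}, margin violations = {violations}")
```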

Page 15

Mapping into a higher-dimensional space

Map each observation $(x_1, x_2)$ to $(x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$:

$$\begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ \vdots & \vdots \\ x_{l1} & x_{l2} \end{pmatrix} \;\longrightarrow\; \begin{pmatrix} x_{11}^2 & \sqrt{2}\,x_{11}x_{12} & x_{12}^2 \\ x_{21}^2 & \sqrt{2}\,x_{21}x_{22} & x_{22}^2 \\ \vdots & \vdots & \vdots \\ x_{l1}^2 & \sqrt{2}\,x_{l1}x_{l2} & x_{l2}^2 \end{pmatrix}$$

Optimization task: minimize $L(\mathbf{w}, C)$

$$L(\mathbf{w}, C) = \frac{\|\mathbf{w}\|^{2}}{2} + C \sum_i \xi_i$$

subject to:

$w_1 x_{i1}^2 + w_2 \sqrt{2}\,x_{i1}x_{i2} + w_3 x_{i2}^2 + b \ge +1 - \xi_i$ for $i \in$ ∆
$w_1 x_{j1}^2 + w_2 \sqrt{2}\,x_{j1}x_{j2} + w_3 x_{j2}^2 + b \le -1 + \xi_j$ for $j \in$ ●

Nonlinear SVM: Nonseparable Case


Page 16

Nonlinear SVM: Nonseparable Case

Map the data into a higher-dimensional space: $\mathbb{R}^2 \to \mathbb{R}^3$

$$(x_1, x_2) \;\longmapsto\; \left(x_1^2,\; \sqrt{2}\,x_1 x_2,\; x_2^2\right)$$

[Figure: the four points (1, 1), (−1, 1), (1, −1), (−1, −1), belonging to classes ∆ and ●, shown in the original $(x_1, x_2)$ space together with their images under the mapping to $(x_1^2, \sqrt{2}x_1x_2, x_2^2)$.]
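The sketch below (an addition) applies this degree-2 mapping to the four points of the figure; the class assignment, an XOR-style labelling with ∆ = {(1,1), (−1,−1)} and ● = {(1,−1), (−1,1)}, is assumed for illustration.

```python
# Minimal sketch: explicit mapping (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2) of the four points.
# The XOR-style class assignment below is assumed for illustration only.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1., 1.], [-1., -1.],     # assumed class +1 (the "delta" group)
              [1., -1.], [-1., 1.]])    # assumed class -1 (the "circle" group)
y = np.array([1, 1, -1, -1])

def phi(X):
    """Map R^2 -> R^3: (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

Z = phi(X)                              # transformed data: now linearly separable
clf = SVC(kernel="linear", C=1e6).fit(Z, y)
print("transformed points:\n", Z)
print("training accuracy in the transformed space:", clf.score(Z, y))
```

With this labelling no line separates the two classes in the original space, while in the transformed space the second coordinate alone separates them.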

Page 17

Nonlinear SVM: Nonseparable Case

Find the optimal hyperplane in the transformed space

[Figure: the mapped points in the transformed $(x_1^2, \sqrt{2}x_1x_2, x_2^2)$ space, where an optimal separating hyperplane can be found.]

Page 18

Nonlinear SVM: Nonseparable Case

Observe the decision surface in the original space (optional)

[Figure: the corresponding nonlinear decision surface viewed in the original $(x_1, x_2)$ space.]

Page 19

Nonlinear SVM: Nonseparable Case

Dual formulation of the (primal) SVM minimization problem

Primal:

$$\min_{\mathbf{w},\,b,\,\xi}\;\; \frac{\|\mathbf{w}\|^{2}}{2} + C \sum_i \xi_i$$

subject to: $y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i$, $\;\xi_i \ge 0$, $\;y_i \in \{-1, +1\}$

Dual:

$$\max_{\alpha}\;\; \sum_i \alpha_i \;-\; \frac{1}{2}\sum_i\sum_j \alpha_i\,\alpha_j\, y_i\, y_j\; \mathbf{x}_i\cdot\mathbf{x}_j$$

subject to: $0 \le \alpha_i \le C$, $\;\sum_i \alpha_i\, y_i = 0$, $\;y_i \in \{-1, +1\}$
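Because the dual depends on the data only through dot products, an SVM can be trained from the Gram matrix alone. The sketch below (an addition, using synthetic data) passes the matrix of dot products to scikit-learn via kernel="precomputed" and checks that it reproduces the linear-kernel fit; the dual coefficients correspond to $\alpha_i y_i$.

```python
# Minimal sketch: the dual uses only dot products, so SVC can be given the Gram matrix
# K[i, j] = x_i . x_j directly. The data here are synthetic.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

lin = SVC(kernel="linear", C=1.0).fit(X, y)

K = X @ X.T                                    # Gram matrix of pairwise dot products
pre = SVC(kernel="precomputed", C=1.0).fit(K, y)

print("predictions agree:", np.array_equal(lin.predict(X), pre.predict(K)))
print("dual coefficients (alpha_i * y_i):", np.round(pre.dual_coef_[0], 3))
```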

Page 20

Nonlinear SVM: Nonseparable Case

Dual formulation of the (primal) SVM minimization problem

Dual:

$$\max_{\alpha}\;\; \sum_i \alpha_i \;-\; \frac{1}{2}\sum_i\sum_j \alpha_i\,\alpha_j\, y_i\, y_j\; \mathbf{x}_i\cdot\mathbf{x}_j$$

subject to: $0 \le \alpha_i \le C$, $\;\sum_i \alpha_i\, y_i = 0$, $\;y_i \in \{-1, +1\}$

For the mapping $\phi(x_1, x_2) = (x_1^2, \sqrt{2}\,x_1x_2, x_2^2)$, the dot product in the transformed space can be computed in the original space:

$$\phi(\mathbf{x}_i)\cdot\phi(\mathbf{x}_j) = x_{i1}^2 x_{j1}^2 + 2\,x_{i1}x_{i2}x_{j1}x_{j2} + x_{i2}^2 x_{j2}^2 = (\mathbf{x}_i\cdot\mathbf{x}_j)^2$$

$$K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i)\cdot\phi(\mathbf{x}_j) \qquad \text{(kernel function)}$$
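The kernel identity above is easy to check numerically; the short sketch below (an addition) compares the dot product in the mapped space with the squared dot product in the original space for two random points.

```python
# Minimal sketch: check that phi(xi) . phi(xj) = (xi . xj)^2
# for the mapping phi(x1, x2) = (x1^2, sqrt(2)*x1*x2, x2^2).
import numpy as np

def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

rng = np.random.default_rng(1)
xi, xj = rng.normal(size=2), rng.normal(size=2)

lhs = phi(xi) @ phi(xj)        # dot product in the transformed space
rhs = (xi @ xj) ** 2           # kernel evaluated in the original space
print(lhs, rhs, np.isclose(lhs, rhs))
```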

Page 21

Nonlinear SVM: Nonseparable Case

Dual formulation of the (primal) SVM minimization problem

In the transformed space, the dot product $\mathbf{x}_i\cdot\mathbf{x}_j$ in the dual is replaced by $\phi(\mathbf{x}_i)\cdot\phi(\mathbf{x}_j) = K(\mathbf{x}_i, \mathbf{x}_j)$:

$$\max_{\alpha}\; \sum_i \alpha_i - \frac{1}{2}\sum_i\sum_j \alpha_i\,\alpha_j\, y_i\, y_j\; \mathbf{x}_i\cdot\mathbf{x}_j \quad\longrightarrow\quad \max_{\alpha}\; \sum_i \alpha_i - \frac{1}{2}\sum_i\sum_j \alpha_i\,\alpha_j\, y_i\, y_j\; \phi(\mathbf{x}_i)\cdot\phi(\mathbf{x}_j)$$

subject to: $0 \le \alpha_i \le C$, $\;\sum_i \alpha_i\, y_i = 0$, $\;y_i \in \{-1, +1\}$

where $\phi(\mathbf{x}_i)\cdot\phi(\mathbf{x}_j) = (\mathbf{x}_i\cdot\mathbf{x}_j)^2 = K(\mathbf{x}_i, \mathbf{x}_j)$ (kernel function).

Page 22

Strengths of SVM:

Training is relatively easy
No local minima
It scales relatively well to high-dimensional data
The trade-off between classifier complexity and error can be controlled explicitly via C
Robustness of the results
The “curse of dimensionality” is avoided

Weaknesses of SVM:

What is the best trade-off parameter C?
Need a good transformation of the original space

Strengths and Weaknesses of SVM

Page 23

The Ketchup Marketing Problem

Two types of ketchup: Heinz and Hunts

Seven Attributes:
Feature Heinz
Feature Hunts
Display Heinz
Display Hunts
Feature & Display Heinz
Feature & Display Hunts
Log price difference between Heinz and Hunts

Training Data: 2498 cases (89.11% Heinz is chosen)

Test Data: 300 cases (88.33% Heinz is chosen)

Page 24

[Figure: cross-validation mean squared errors for the SVM with RBF kernel, plotted over a grid of C and σ values (scale from min to max).]

Do a (5-fold) cross-validation procedure to find the best combination of the manually adjustable parameters (here: C and σ).

The Ketchup Marketing Problem

Choose a kernel mapping:

Linear kernel: $K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i\cdot\mathbf{x}_j$
Polynomial kernel: $K(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i\cdot\mathbf{x}_j + 1)^d$
RBF kernel: $K(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left(-\|\mathbf{x}_i - \mathbf{x}_j\|^2 / 2\sigma^2\right)$
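As an illustration of this tuning step (not the authors' code or data), the sketch below runs a 5-fold grid search over C and the RBF width on a synthetic stand-in for the ketchup data, scoring by accuracy rather than the mean squared error shown in the figure. Note that scikit-learn parameterizes the RBF kernel as $\exp(-\gamma\|\mathbf{x}_i - \mathbf{x}_j\|^2)$, so $\gamma$ plays the role of $1/(2\sigma^2)$.

```python
# Minimal sketch: 5-fold cross-validation over C and the RBF kernel width, assuming
# scikit-learn and a synthetic stand-in for the 7-attribute ketchup data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic placeholder: 7 attributes, binary label (Heinz vs. Hunts).
X, y = make_classification(n_samples=500, n_features=7, random_state=0)

# scikit-learn's RBF kernel is exp(-gamma * ||xi - xj||^2), i.e. gamma = 1 / (2 * sigma^2)
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print("best (C, gamma):", search.best_params_)
print("best cross-validation accuracy:", round(search.best_score_, 3))
```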

Page 25

Model: Linear Discriminant Analysis

The Ketchup Marketing Problem – Training Set

                          Predicted: Hunts   Predicted: Heinz     Total
Original count   Hunts           68                204              272
                 Heinz           58               2168             2226
Original %       Hunts        25.00%             75.00%          100.00%
                 Heinz         2.61%             97.39%          100.00%

Overall hit rate: 89.51%

Page 26

Model: Logit Choice Model

The Ketchup Marketing Problem – Training Set

                          Predicted: Hunts   Predicted: Heinz     Total
Original count   Hunts          214                 58              272
                 Heinz          497               1729             2226
Original %       Hunts        78.68%             21.32%          100.00%
                 Heinz        22.33%             77.67%          100.00%

Overall hit rate: 77.79%

Page 27

Model: Support Vector Machines

The Ketchup Marketing Problem – Training Set

                          Predicted: Hunts   Predicted: Heinz     Total
Original count   Hunts          255                 17              272
                 Heinz            6               2220             2226
Original %       Hunts        93.75%              6.25%          100.00%
                 Heinz         0.27%             99.73%          100.00%

Overall hit rate: 99.08%

Page 28

Model: Majority Voting

The Ketchup Marketing Problem – Training Set

                          Predicted: Hunts   Predicted: Heinz     Total
Original count   Hunts            0                272              272
                 Heinz            0               2226             2226
Original %       Hunts         0.00%            100.00%          100.00%
                 Heinz         0.00%            100.00%          100.00%

Overall hit rate: 89.11%

Page 29

Model: Linear Discriminant Analysis

The Ketchup Marketing Problem – Test Set

                          Predicted: Hunts   Predicted: Heinz     Total
Original count   Hunts            3                 32               35
                 Heinz            3                262              265
Original %       Hunts         8.57%             91.43%          100.00%
                 Heinz         1.13%             98.87%          100.00%

Overall hit rate: 88.33%

Page 30

Model: Logit Choice Model

The Ketchup Marketing Problem – Test Set

                          Predicted: Hunts   Predicted: Heinz     Total
Original count   Hunts           29                  6               35
                 Heinz           63                202              265
Original %       Hunts        82.86%             17.14%          100.00%
                 Heinz        23.77%             76.23%          100.00%

Overall hit rate: 77.00%

Page 31

Model: Support Vector Machines

The Ketchup Marketing Problem – Test Set

                          Predicted: Hunts   Predicted: Heinz     Total
Original count   Hunts           25                 10               35
                 Heinz            3                262              265
Original %       Hunts        71.43%             28.57%          100.00%
                 Heinz         1.13%             98.87%          100.00%

Overall hit rate: 95.67%
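For completeness (an editorial addition, not the original analysis), tables of the form above can be produced with scikit-learn's confusion_matrix from vectors of actual and predicted brand choices; the two short example vectors below are hypothetical.

```python
# Minimal sketch: building a hit-rate table of the form above from actual vs. predicted choices.
# The example vectors are hypothetical, not the ketchup data.
import numpy as np
from sklearn.metrics import confusion_matrix

actual    = np.array(["Hunts", "Heinz", "Heinz", "Hunts", "Heinz", "Heinz"])
predicted = np.array(["Hunts", "Heinz", "Heinz", "Heinz", "Heinz", "Hunts"])

cm = confusion_matrix(actual, predicted, labels=["Hunts", "Heinz"])
row_pct = cm / cm.sum(axis=1, keepdims=True) * 100     # per-class (row) percentages
hit_rate = np.trace(cm) / cm.sum() * 100               # overall hit rate

print("counts (rows = actual, columns = predicted):\n", cm)
print("row percentages:\n", np.round(row_pct, 2))
print(f"overall hit rate: {hit_rate:.2f}%")
```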

Page 32

Conclusion

Support Vector Machines (SVM) can be applied to binary and multi-class classification problems

SVM behave robustly in multivariate problems

Further research in various Marketing areas is needed to justify or refute the applicability of SVM

Support Vector Regressions (SVR) can also be applied

http://www.kernel-machines.org

Email: [email protected]