
Support Vector Machines in Marketing

Georgi Nalbantov

MICC, Maastricht University


Contents

Purpose

Linear Support Vector Machines

Nonlinear Support Vector Machines

(Theoretical justifications of SVM)

Marketing Examples

Conclusion and Q & A

(some extensions)


Purpose

Task to be solved (The Classification Task):

Classify cases (customers) into “type 1” or “type 2” on the basis of some known attributes (characteristics)

Chosen tool to solve this task:

Support Vector Machines


The Classification Task

Given data on explanatory and explained variables, where the explained variable can take two values {−1, +1}, find a function that gives the “best” separation between the “−1” cases and the “+1” cases:

Given: (x1, y1), … , (xm, ym) ∈ ℝⁿ × {−1, +1}

Find: f : ℝⁿ → {−1, +1}

“Best” function = the function whose expected error on unseen data (xm+1, ym+1), … , (xm+k, ym+k) is minimal

Existing techniques to solve the classification task:

Linear and Quadratic Discriminant Analysis

Logit choice models (Logistic Regression)

Decision trees, Neural Networks, Least Squares SVM
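As a hedged illustration (Python with scikit-learn is assumed here; it is not part of the original slides), the task can be set up by fitting a classifier on labelled cases and estimating the expected error on held-out, unseen cases:

# Hypothetical sketch of the classification task (assumes scikit-learn; not part of the slides).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy data: m cases with n = 2 attributes, labels y in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                        # attributes (x_1, ..., x_m)
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)           # labels (y_1, ..., y_m)

# Hold out "unseen" cases (x_{m+1}, y_{m+1}), ... to estimate the expected error.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

f = SVC(kernel="linear").fit(X_tr, y_tr)             # a candidate function f: R^n -> {-1, +1}
print("estimated error on unseen data:", 1 - f.score(X_te, y_te))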


Support Vector Machines: Definition

Support Vector Machines are a non-parametric tool for classification/regression

Support Vector Machines are used for prediction rather than description purposes

Support Vector Machines have been developed by Vapnik and co-workers


Linear Support Vector Machines

A direct marketing company wants to sell a new book:

“The Art History of Florence”

Nissan Levin and Jacob Zahavi in Lattin, Carroll and Green (2003).

Problem: how to identify buyers and non-buyers using two variables: months since last purchase and number of art books purchased.

[Figure: scatter plot of buyers (∆) and non-buyers (●); x-axis: months since last purchase, y-axis: number of art books purchased]



Linear SVM: Separable Case

Main idea of SVM: separate the groups by a line.

However: there are infinitely many lines that have zero training error…

… which line shall we choose?


Linear SVM: Separable Case

SVMs use the idea of a margin around the separating line.

The thinner the margin, the more complex the model.

The best line is the one with the largest margin.

[Figure: the buyers/non-buyers scatter with a separating line and the margin band around it]


Linear SVM: Separable Case

The line having the largest margin is:

w1x1 + w2x2 + b = 0

where
x1 = months since last purchase
x2 = number of art books purchased

Note:
w1xi1 + w2xi2 + b ≥ +1 for i ∈ ∆ (buyers)
w1xj1 + w2xj2 + b ≤ –1 for j ∈ ● (non-buyers)

[Figure: the separating line w1x1 + w2x2 + b = 0 with the margin boundaries w1x1 + w2x2 + b = +1 and w1x1 + w2x2 + b = –1, and the normal vector w]


Linear SVM: Separable Case

The width of the margin is given by:

margin = 2 / ||w||,  with ||w|| = √(w1² + w2²)

Note:
maximize the margin 2/||w||  ⇔  minimize ||w|| / 2  ⇔  minimize ||w||² / 2

[Figure: the same plot with the margin width 2/||w|| marked between the two margin boundaries]


Linear SVM: Separable Case

The optimization problem for SVM is:

minimize  L(w) = ||w||² / 2    (i.e., maximize the margin 2/||w||)

subject to:
w1xi1 + w2xi2 + b ≥ +1 for i ∈ ∆ (buyers)
w1xj1 + w2xj2 + b ≤ –1 for j ∈ ● (non-buyers)


Linear SVM: Separable Case

“Support vectors” are the points that lie on the boundaries of the margin.

The decision surface (line) is determined only by the support vectors; all other points are irrelevant.

[Figure: the same plot with the support vectors highlighted on the two margin boundaries]
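As a hedged sketch (scikit-learn assumed, toy data hypothetical), the separable-case quantities above can be read off a fitted linear SVM: the weights w, the intercept b, the support vectors, and the margin width 2/||w||:

# Hedged sketch (scikit-learn assumed): read off w, b, the support vectors and the margin width.
import numpy as np
from sklearn.svm import SVC

# Hypothetical separable toy data: buyers (+1) vs. non-buyers (-1).
X = np.array([[1.0, 5.0], [2.0, 4.0], [1.5, 6.0],    # buyers
              [8.0, 1.0], [9.0, 0.5], [7.5, 1.5]])   # non-buyers
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)          # very large C approximates the hard margin

w, b = clf.coef_[0], clf.intercept_[0]               # the line w1*x1 + w2*x2 + b = 0
print("w =", w, " b =", b)
print("margin width = 2/||w|| =", 2 / np.linalg.norm(w))
print("support vectors (points on the margin boundaries):")
print(clf.support_vectors_)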


Linear SVM: Nonseparable Case

Non-separable case: there is no line that separates the two groups without error.

Training set: 1000 targeted customers.

Here, SVM minimizes L(w, C):

L(w, C) = ||w||² / 2 + C Σi ξi    (maximize the margin and minimize the training errors)

subject to:
w1xi1 + w2xi2 + b ≥ +1 – ξi for i ∈ ∆ (buyers)
w1xj1 + w2xj2 + b ≤ –1 + ξj for j ∈ ● (non-buyers)
ξi, ξj ≥ 0

L(w, C) = Complexity + Errors

[Figure: overlapping buyers (∆) and non-buyers (●) around the margin band w1x1 + w2x2 + b = ±1]


Linear SVM: The Role of C

Bigger C: thinner margin, fewer training errors (better fit on the data), increased complexity.

Smaller C: wider margin, more training errors (worse fit on the data), decreased complexity.

Vary both complexity and empirical error via C … by affecting the optimal w and the optimal number of training errors.

[Figure: two fits of the same data, one with C = 5 (thin margin) and one with C = 1 (wide margin)]
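A small sketch (scikit-learn assumed, data hypothetical) of the role of C: refitting the same overlapping data with a small and a large C shows the trade-off between margin width and training errors:

# Hedged sketch (scikit-learn assumed) of the role of C on overlapping (non-separable) data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(50, 2)),    # non-buyers
               rng.normal([2.0, 2.0], 1.0, size=(50, 2))])   # buyers
y = np.hstack([-np.ones(50), np.ones(50)])

for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2 / np.linalg.norm(clf.coef_[0])                # typically wider for smaller C
    errors = int(np.sum(clf.predict(X) != y))                # typically fewer for bigger C
    print(f"C = {C:>6}: margin width = {margin:.2f}, training errors = {errors}")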


Nonlinear SVM: Nonseparable Case

Mapping into a higher-dimensional space

Optimization task: minimize

L(w, C) = ||w||² / 2 + C Σi ξi

subject to:
w1·xi1² + w2·√2·xi1xi2 + w3·xi2² + b ≥ +1 – ξi for i ∈ ∆ (buyers)
w1·xj1² + w2·√2·xj1xj2 + w3·xj2² + b ≤ –1 + ξj for j ∈ ● (non-buyers)

That is, each original point (x1, x2) is replaced by its image (x1², √2·x1x2, x2²), and a linear SVM is fitted in the transformed space.

[Figure: the buyers/non-buyers scatter, which is not separable by a line in the original two variables]


Nonlinear SVM: Nonseparable Case

Map the data into a higher-dimensional space: ℝ² → ℝ³

(x1, x2)  →  (x1², √2·x1x2, x2²)

[Figure: the four points (1,1), (–1,1), (1,–1), (–1,–1) of the two classes (∆, ●) in the original (x1, x2) space, and their images plotted on the axes x1², √2·x1x2, x2²]
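The four points form the classic XOR configuration, which no line separates in the original space. A minimal check (numpy assumed; the labelling of the points is hypothetical) that the quadratic map makes them linearly separable in the transformed space:

# Minimal check (numpy assumed; labels hypothetical) that the quadratic map separates the XOR pattern.
import numpy as np

X = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
y = np.array([1, 1, -1, -1])          # diagonally opposite pairs share a label (XOR pattern)

def phi(x):
    # Quadratic feature map R^2 -> R^3: (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

Z = np.array([phi(x) for x in X])
print(Z)
# All images share x1^2 = x2^2 = 1 and differ only in the middle coordinate:
# +sqrt(2) for one class, -sqrt(2) for the other, so the plane "middle coordinate = 0"
# separates them perfectly in the transformed space, although no line does so in R^2.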


Nonlinear SVM: Nonseparable Case

Find the optimal hyperplane in the transformed space.

[Figure: in the coordinates (x1², √2·x1x2, x2²) the two classes become linearly separable, and the maximal-margin hyperplane is found there]


Nonlinear SVM: Nonseparable Case

Observe the decision surface in the original space (optional).

[Figure: mapped back to the original (x1, x2) space, the linear decision surface from the transformed space appears as a nonlinear boundary between ∆ and ●]


Nonlinear SVM: Nonseparable Case

Dual formulation of the (primal) SVM minimization problem

Primal:
minimize  ||w||² / 2 + C Σi ξi
subject to  yi (w · xi + b) ≥ 1 – ξi,  ξi ≥ 0

Dual:
maximize  Σi αi – ½ Σi Σj αi αj yi yj (xi · xj)
subject to  0 ≤ αi ≤ C,  Σi αi yi = 0
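As a hedged illustration (scikit-learn assumed), the dual solution can be inspected directly: for a linear kernel, SVC exposes αi·yi for the support vectors, and w = Σi αi yi xi recovers the primal weights:

# Hedged illustration (scikit-learn assumed): inspect the dual solution of a linear SVM.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(40, 2)),
               rng.normal([3.0, 3.0], 1.0, size=(40, 2))])
y = np.hstack([-np.ones(40), np.ones(40)])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

alpha_y = clf.dual_coef_[0]                        # alpha_i * y_i, stored for support vectors only
w_from_dual = alpha_y @ clf.support_vectors_       # w = sum_i alpha_i y_i x_i
print(np.allclose(w_from_dual, clf.coef_[0]))      # True: the primal w is recovered from the dual
print("sum_i alpha_i y_i =", alpha_y.sum())        # ~0: the dual equality constraint holds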


Nonlinear SVM: Nonseparable Case

Dual formulation of the (primal) SVM minimization problem

Dual:
maximize  Σi αi – ½ Σi Σj αi αj yi yj (xi · xj)
subject to  0 ≤ αi ≤ C,  Σi αi yi = 0

For the quadratic map φ(x) = (x1², √2·x1x2, x2²):

φ(xi) · φ(xj) = xi1²·xj1² + 2·xi1xi2·xj1xj2 + xi2²·xj2² = (xi · xj)²

K(xi, xj) = φ(xi) · φ(xj)    (kernel function)
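A quick numeric check (numpy assumed) that the kernel replaces the explicit mapping: for the quadratic map above, φ(xi)·φ(xj) equals (xi·xj)², so the dual never needs the transformed coordinates:

# Quick numeric check (numpy assumed) that the kernel replaces the explicit mapping.
import numpy as np

def phi(x):
    # Quadratic feature map (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def K(xi, xj):
    # Homogeneous polynomial kernel of degree 2, evaluated in the original space.
    return float(np.dot(xi, xj) ** 2)

xi = np.array([3.0, -1.0])
xj = np.array([0.5, 2.0])
print(np.dot(phi(xi), phi(xj)))   # map first, then take the dot product
print(K(xi, xj))                  # same value without ever computing phi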


Nonlinear SVM: Nonseparable Case

Dual formulation of the (primal) SVM minimization problem

In the dual, the dot product xi · xj is replaced by the kernel K(xi, xj) = φ(xi) · φ(xj):

maximize  Σi αi – ½ Σi Σj αi αj yi yj (xi · xj)
becomes
maximize  Σi αi – ½ Σi Σj αi αj yi yj (φ(xi) · φ(xj))

subject to  0 ≤ αi ≤ C,  Σi αi yi = 0

The optimization is thus carried out through kernel evaluations in the original space, without computing the transformed coordinates φ(x) explicitly.


Strengths and Weaknesses of SVM

Strengths of SVM:

Training is relatively easy: no local minima
It scales relatively well to high-dimensional data
The trade-off between classifier complexity and error can be controlled explicitly via C
Robustness of the results
The “curse of dimensionality” is avoided

Weaknesses of SVM:

What is the best trade-off parameter C?
A good transformation of the original space is needed


The Ketchup Marketing Problem

Two types of ketchup: Heinz and Hunts

Seven attributes:
Feature Heinz
Feature Hunts
Display Heinz
Display Hunts
Feature & Display Heinz
Feature & Display Hunts
Log price difference between Heinz and Hunts

Training Data: 2498 cases (89.11% Heinz is chosen)

Test Data: 300 cases (88.33% Heinz is chosen)


The Ketchup Marketing Problem

Choose a kernel mapping:

Linear kernel:      K(xi, xj) = xi · xj
Polynomial kernel:  K(xi, xj) = (xi · xj + 1)^d
RBF kernel:         K(xi, xj) = exp( –||xi – xj||² / (2σ²) )

Do a (5-fold) cross-validation procedure to find the best combination of the manually adjustable parameters (here: C and σ).

[Figure: surface of cross-validation mean squared errors over the (C, σ) grid for the SVM with RBF kernel]
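A hedged sketch (scikit-learn assumed; the data here are placeholders, not the ketchup data) of the 5-fold cross-validation search over C and σ described above:

# Hedged sketch (scikit-learn assumed; data placeholders are hypothetical) of the 5-fold
# cross-validation over C and sigma for the RBF kernel. scikit-learn parameterises the
# RBF kernel by gamma = 1 / (2 * sigma^2).
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X_train = rng.normal(size=(200, 7))                          # stand-in for the 7 attributes
y_train = np.where(X_train[:, 0] + 0.5 * X_train[:, 6] > 0, 1, -1)

sigmas = np.logspace(-2, 2, 5)
param_grid = {"C": np.logspace(-2, 3, 6),
              "gamma": 1.0 / (2.0 * sigmas ** 2)}            # translate sigma into gamma

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)   # 5-fold cross-validation
search.fit(X_train, y_train)
print("best (C, gamma):", search.best_params_)
print("best cross-validation accuracy:", search.best_score_)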


The Ketchup Marketing Problem – Training Set

Model: Linear Discriminant Analysis                  Hit rate: 89.51%

                      Predicted Hunts   Predicted Heinz     Total
Actual Hunts (count)           68              204            272
Actual Heinz (count)           58             2168           2226
Actual Hunts (%)            25.00%           75.00%        100.00%
Actual Heinz (%)             2.61%           97.39%        100.00%


The Ketchup Marketing Problem – Training Set

Model: Logit Choice Model                            Hit rate: 77.79%

                      Predicted Hunts   Predicted Heinz     Total
Actual Hunts (count)          214               58            272
Actual Heinz (count)          497             1729           2226
Actual Hunts (%)            78.68%           21.32%        100.00%
Actual Heinz (%)            22.33%           77.67%        100.00%


The Ketchup Marketing Problem – Training Set

Model: Support Vector Machines                       Hit rate: 99.08%

                      Predicted Hunts   Predicted Heinz     Total
Actual Hunts (count)          255               17            272
Actual Heinz (count)            6             2220           2226
Actual Hunts (%)            93.75%            6.25%        100.00%
Actual Heinz (%)             0.27%           99.73%        100.00%


The Ketchup Marketing Problem – Training Set

Model: Majority Voting                               Hit rate: 89.11%

                      Predicted Hunts   Predicted Heinz     Total
Actual Hunts (count)            0              272            272
Actual Heinz (count)            0             2226           2226
Actual Hunts (%)                0%             100%        100.00%
Actual Heinz (%)                0%             100%        100.00%


The Ketchup Marketing Problem – Test Set

Model: Linear Discriminant Analysis                  Hit rate: 88.33%

                      Predicted Hunts   Predicted Heinz     Total
Actual Hunts (count)            3               32             35
Actual Heinz (count)            3              262            265
Actual Hunts (%)             8.57%           91.43%        100.00%
Actual Heinz (%)             1.13%           98.87%        100.00%


The Ketchup Marketing Problem – Test Set

Model: Logit Choice Model                            Hit rate: 77.00%

                      Predicted Hunts   Predicted Heinz     Total
Actual Hunts (count)           29                6             35
Actual Heinz (count)           63              202            265
Actual Hunts (%)            82.86%           17.14%        100.00%
Actual Heinz (%)            23.77%           76.23%        100.00%


The Ketchup Marketing Problem – Test Set

Model: Support Vector Machines                       Hit rate: 95.67%

                      Predicted Hunts   Predicted Heinz     Total
Actual Hunts (count)           25               10             35
Actual Heinz (count)            3              262            265
Actual Hunts (%)            71.43%           28.57%        100.00%
Actual Heinz (%)             1.13%           98.87%        100.00%
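A small sketch (scikit-learn assumed; the predictions are hard-coded rather than produced by a model) of how the hit rates in these tables are computed from a confusion matrix:

# Hedged sketch (scikit-learn assumed; predictions hard-coded to reproduce the SVM test-set table).
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([-1] * 35 + [1] * 265)     # 35 Hunts and 265 Heinz cases in the test set
y_pred = y_true.copy()
y_pred[:10] = 1                              # 10 Hunts cases predicted as Heinz
y_pred[35:38] = -1                           # 3 Heinz cases predicted as Hunts

cm = confusion_matrix(y_true, y_pred, labels=[-1, 1])   # rows: actual, columns: predicted
hit_rate = np.trace(cm) / cm.sum()                      # share of correctly classified cases
print(cm)                                               # [[ 25  10] [  3 262]]
print(f"hit rate: {hit_rate:.2%}")                      # 95.67%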


Conclusion

Support Vector Machines (SVM) can be applied to binary and multi-class classification problems

SVM behave robustly in multivariate problems

Further research in various marketing areas is needed to justify or refute the applicability of SVM

Support Vector Regression (SVR) can also be applied

http://www.kernel-machines.org

Email: nalbantov@few.eur.nl
