Page 1:


Part 4: ADVANCED SVM-based LEARNING METHODS

Electrical and Computer Engineering

Vladimir Cherkassky University of Minnesota

cherk001@umn.edu

Presented at Tech Tune Ups, ECE Dept, June 1, 2011

Page 2:

OUTLINE

• Motivation for non-standard approaches: high-dimensional data

• Alternative Learning Settings:

- Transduction and SSL

- Inference Through Contradictions

- Learning using privileged information (or SVM+)

- Multi-task Learning

• Summary

Page 3:

Insights provided by SVM (VC-theory)

• Why can linear classifiers generalize?

(1) Margin is large (relative to R)

(2) Percentage of SVs is small

(3) Ratio d/n is small

• SVM offers an effective way to control complexity (via margin + kernel selection), i.e., implementing (1) or (2), or both

• What happens when d >> n?

- Standard inductive methods usually fail

Page 4:

How to improve generalization for HDLSS (high-dimensional, low sample size) data?

Conventional approach: incorporate a priori knowledge into the learning method
• Preprocessing and feature selection
• Model parameterization (~ good kernels in SVM)
Assumption: a priori knowledge about a good model

Non-standard learning formulations: incorporate a priori knowledge into a new, non-standard learning formulation (learning setting)
Assumption: a priori knowledge is about properties of the application data and/or the goal of learning

• Which type of assumptions makes more sense?

Page 5:

OUTLINE

• Motivation for non-standard approaches

• Alternative Learning Settings:

- Transduction and SSL

- Inference Through Contradictions

- Learning with Structured Data

- Multi-task Learning

• Summary

Page 6:

Examples of non-standard settings

• Application domain: hand-written digit recognition
• Standard inductive setting
• Transduction: labeled training + unlabeled data
• Learning through contradictions:
labeled training data ~ examples of digits 5 and 8
unlabeled examples (Universum) ~ all other (eight) digits
• Learning using hidden information:
training data ~ t groups (i.e., from t different persons)
test data ~ group label not known
• Multi-task learning:
training data ~ t groups (from different persons)
test data ~ t groups (group label is known)

Page 7:

Modifications of Inductive Setting

• Standard inductive learning assumes:
finite training set $(\mathbf{x}_i, y_i)$
predictive model derived using only training data
prediction for all possible test inputs

• Possible modifications:
1. Predict only for given test points → transduction
2. A priori knowledge in the form of additional 'typical' samples → learning through contradiction
3. Additional (group) info about training data → Learning Using Privileged Information (LUPI), aka SVM+
4. Additional (group) info about training + test data → multi-task learning

Page 8:

Transduction (Vapnik, 1982, 1995)

• How to incorporate unlabeled test data into the learning process? Assume binary classification.

• Estimating function values at given points
Given: labeled training data $(\mathbf{x}_i, y_i)$, $i = 1,\dots,n$, and unlabeled test points $\mathbf{x}_j^*$, $j = 1,\dots,m$
Estimate: class labels $\mathbf{y}^* = (y_1^*, \dots, y_m^*)$ at these test points, where $y_j^* = f(\mathbf{x}_j^*)$

Goal of learning: minimization of risk on the test set:

$$R(\mathbf{y}^*) = \frac{1}{m} \sum_{j=1}^{m} \int L(y_j^*, y)\, dP(y \mid \mathbf{x}_j^*)$$
Page 9:

Induction vs Transduction

Page 10:

Transduction based on margin size

Single unlabeled test point X

Page 11:

Many test points X aka working samples

Page 12:

Transduction based on margin size

• Binary classification, linear parameterization, joint set of (training + working) samples

• Two objectives of transductive learning:
(TL1) separate labeled training data using a large-margin hyperplane (as in standard inductive SVM)
(TL2) separate (explain) the working data set using a large-margin hyperplane

Page 13:

Transduction based on margin size

• Standard SVM hinge loss for labeled samples
• Loss function for unlabeled samples (see the sketch below)
• Mathematical optimization formulation
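For illustration, a minimal numpy sketch of the two losses commonly used in margin-based transduction (hinge loss for labeled samples, symmetric hinge on |f(x)| for unlabeled samples); the function names are illustrative, not from the slides:

```python
import numpy as np

def hinge_loss(f_x, y):
    """Standard SVM hinge loss for a labeled sample (x, y), with y in {-1, +1}."""
    return np.maximum(0.0, 1.0 - y * f_x)

def unlabeled_loss(f_x):
    """Symmetric hinge for an unlabeled (working) sample:
    penalizes points that fall inside the margin, regardless of side."""
    return np.maximum(0.0, 1.0 - np.abs(f_x))

# Example: decision-function values for a few samples
print(hinge_loss(np.array([1.5, 0.2, -0.7]), np.array([1, 1, -1])))
print(unlabeled_loss(np.array([1.5, 0.2, -0.7])))
```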

Page 14:

Optimization formulation for SVM transduction

• Given: joint set of (training + working) samples
• Denote slack variables $\xi_i$ for training samples and $\xi_j^*$ for working samples
• Minimize

$$R(\mathbf{w}, b) = \frac{1}{2}(\mathbf{w} \cdot \mathbf{w}) + C \sum_{i=1}^{n} \xi_i + C^* \sum_{j=1}^{m} \xi_j^*$$

subject to

$$y_i[(\mathbf{w} \cdot \mathbf{x}_i) + b] \ge 1 - \xi_i, \qquad y_j^*[(\mathbf{w} \cdot \mathbf{x}_j^*) + b] \ge 1 - \xi_j^*,$$
$$\xi_i \ge 0,\ \xi_j^* \ge 0, \qquad i = 1,\dots,n,\ j = 1,\dots,m,$$

where $y_j^* = \operatorname{sign}[(\mathbf{w} \cdot \mathbf{x}_j^*) + b]$, $j = 1,\dots,m$

Solution (~ decision boundary): $D(\mathbf{x}) = (\mathbf{w}^* \cdot \mathbf{x}) + b^*$

• Unbalanced situation (small training set / large test set): all unlabeled samples get assigned to one class
• Additional (balancing) constraint:

$$\frac{1}{n} \sum_{i=1}^{n} y_i = \frac{1}{m} \sum_{j=1}^{m} [(\mathbf{w} \cdot \mathbf{x}_j^*) + b]$$

Page 15:

Optimization formulation (cont'd)

• Hyperparameters $C$ and $C^*$ control the trade-off between explanation and margin size
• Soft-margin inductive SVM is a special case of soft-margin transduction with zero slacks $\xi_j^* = 0$
• Dual + kernel version of SVM transduction exists
• Transductive SVM optimization is not convex (~ non-convexity of the loss for unlabeled data): different optimization heuristics ~ different solutions
• Exact solution (via exhaustive search) is possible for a small number of test samples (m), but this solution is NOT very useful (~ inductive SVM); a brute-force sketch is given below
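A minimal sketch of that exhaustive-search idea, assuming scikit-learn is available: for a tiny working set, every labeling of the unlabeled points is tried, a standard SVM is refit on the joint set, and the labeling giving the smallest ||w|| (largest margin) is kept. The selection criterion is simplified (slack terms are ignored), and all names are illustrative:

```python
import itertools
import numpy as np
from sklearn.svm import SVC

def exhaustive_transduction(X_train, y_train, X_work, C=1.0):
    """Brute-force transductive SVM for a *small* working set (2^m labelings)."""
    best_labels, best_score = None, np.inf
    for labels in itertools.product([-1, 1], repeat=len(X_work)):
        X_joint = np.vstack([X_train, X_work])
        y_joint = np.concatenate([y_train, labels])
        if len(np.unique(y_joint)) < 2:
            continue  # SVC needs both classes present
        svm = SVC(kernel="linear", C=C).fit(X_joint, y_joint)
        # smaller ||w|| ~ larger margin; slack penalties omitted for brevity
        score = np.linalg.norm(svm.coef_)
        if score < best_score:
            best_score, best_labels = score, labels
    return np.array(best_labels)
```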

Page 16:

Many applications for transduction

• Text categorization: classify word documents into a number of predetermined categories
• Email classification: spam vs non-spam
• Web page classification
• Image database classification
• All these applications share:
- high-dimensional data
- small labeled training set (human-labeled)
- large unlabeled test set

Page 17:

Example application

• Prediction of molecular bioactivity for drug discovery
• Training data ~ 1,909 samples; test data ~ 634 samples
• Input space ~ 139,351-dimensional
• Prediction accuracy: SVM induction ~ 74.5%; transduction ~ 82.3%

Ref: J. Weston et al., KDD Cup 2001 data analysis: prediction of molecular bioactivity for drug design – binding to thrombin, Bioinformatics, 2003

Page 18:

Semi-Supervised Learning (SSL)

• Labeled data + unlabeled data → model
• Similar to transduction (but not the same):
- Goal 1 ~ prediction for unlabeled samples
- Goal 2 ~ estimate an inductive model
• Many algorithms
• Applications similar to transduction
• Typically:
- transduction works better for HDLSS data
- SSL works better for low-dimensional data

Page 19:

Example: Self-Learning Algorithm

Given initial labeled set L and unlabeled set U

Repeat:

(1) estimate a classifier using labeled set L

(2) classify randomly chosen unlabeled sample using decision rule estimated in Step (1)

(3) move this new labeled sample to set L

Iterate steps (1) – (3) until all unlabeled samples are classified.
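A minimal Python sketch of this self-learning loop, assuming scikit-learn and a linear SVM as the base classifier; names are illustrative:

```python
import numpy as np
from sklearn.svm import SVC

def self_learning(X_lab, y_lab, X_unlab, C=1.0, seed=0):
    """Iteratively label one randomly chosen unlabeled sample and retrain."""
    rng = np.random.default_rng(seed)
    X_L, y_L = X_lab.copy(), y_lab.copy()
    X_U = X_unlab.copy()
    while len(X_U) > 0:
        clf = SVC(kernel="linear", C=C).fit(X_L, y_L)   # step (1): estimate classifier on L
        k = rng.integers(len(X_U))                      # step (2): pick a random unlabeled sample
        y_new = clf.predict(X_U[k:k + 1])
        X_L = np.vstack([X_L, X_U[k:k + 1]])            # step (3): move it to the labeled set L
        y_L = np.concatenate([y_L, y_new])
        X_U = np.delete(X_U, k, axis=0)
    return SVC(kernel="linear", C=C).fit(X_L, y_L)
```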

Page 20:

Example of Self-Learning Algorithm

Noisy Hyperbolas data set: unlabeled samples shown in green

[Figure: initial condition (iteration 1) — unlabeled samples, Class +1, Class -1]

Page 21:

Example of Self-Learning Algorithm

[Figure: labeling at iteration 50 and iteration 100 (final) — Class +1, Class -1]

Page 22:

Inference through contradiction (Vapnik, 2006)

• Motivation: what is a priori knowledge?
- info about the space of admissible models
- info about admissible data samples
• Labeled training samples + unlabeled samples from the Universum
• Universum samples encode info about the region of input space where the application data lives:
- usually from a different distribution than the training/test data
• Examples of the Universum data
• Large improvement for small training samples

Page 23:

Inference through contradictions aka Universum learning

Page 24:

Main Idea

Fig. courtesy of J. Weston (NEC Labs)

• Handwritten digit recognition: digit 5 vs 8

Page 25:

Learning with the Universum

• Inductive setting for binary classification
Given: labeled training data $(\mathbf{x}_i, y_i)$, $i = 1,\dots,n$, and unlabeled Universum samples $\mathbf{x}_j^*$, $j = 1,\dots,m$
Goal of learning: minimization of prediction risk (as in the standard inductive setting)

• Balance between two goals:
- explain labeled training data using a large-margin hyperplane
- achieve maximum falsifiability ~ maximum number of contradictions on the Universum

• Mathematical optimization formulation (extension of SVM)

Page 26:

ε-insensitive loss for Universum samples

[Figure: ε-insensitive loss plotted as a function of the decision function value]
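For reference, a minimal numpy sketch of the ε-insensitive loss as typically defined for Universum samples (zero inside the ε-tube around the decision boundary, linear outside); the function name and ε value are illustrative:

```python
import numpy as np

def universum_loss(f_x, eps=0.1):
    """epsilon-insensitive loss for a Universum sample:
    zero if |f(x)| <= eps (the sample contradicts neither class),
    grows linearly once the sample falls clearly on one side."""
    return np.maximum(0.0, np.abs(f_x) - eps)

print(universum_loss(np.array([0.05, -0.3, 1.2]), eps=0.1))
```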

Page 27:

Random averaging Universum

[Figure: a Universum sample formed as the average of a Class +1 sample and a Class -1 sample, shown relative to the separating hyperplane]
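A minimal sketch of generating random-averaging Universum samples, assuming the two classes are given as numpy arrays; names are illustrative:

```python
import numpy as np

def random_averaging_universum(X_pos, X_neg, n_universum, seed=0):
    """Each Universum sample is the average of one randomly chosen
    positive-class example and one randomly chosen negative-class example."""
    rng = np.random.default_rng(seed)
    i = rng.integers(len(X_pos), size=n_universum)
    j = rng.integers(len(X_neg), size=n_universum)
    return 0.5 * (X_pos[i] + X_neg[j])
```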

Page 28:

Random Averaging for digits 5 and 8

• Two randomly selected examples

• Universum sample:

Page 29:

Application Study (Vapnik, 2006)

• Binary classification of handwritten digits 5 and 8
• For this binary classification problem, the following Universum sets were used:
U1: randomly selected other digits (0, 1, 2, 3, 4, 6, 7, 9)
U2: randomly mixing pixels from images of 5 and 8
U3: average of randomly selected examples of 5 and 8
• Training set sizes tried: 250, 500, …, 3,000 samples
• Universum set size: 5,000 samples
• Prediction error: improved over standard SVM, e.g., for 500 training samples: 1.4% vs 2% (SVM)

Page 30:

Cultural interpretation of the Universum: jokes, absurd examples

• neither Hillary nor Obama
• dadaism

Page 31:

Application Study: predicting gender of human faces

• Binary classification setting
• Difficult problem:
dimensionality ~ large (10K – 20K)
labeled sample size ~ small (~ 10 – 20)
• Humans perform very well on this task
• Issues:
- possible improvement (vs standard SVM)
- how to choose a 'good' Universum?
- model parameter tuning

Page 32:

Male Faces: examples

Page 33:

Female Faces: examples

Page 34:

Universum Faces: neither male nor female

Page 35:

Empirical Study (cont'd)

• Universum generation:
U1 Average: average of male and female samples randomly selected from the training set (U. of Essex database)
U2 Empirical Distribution: estimate the pixel-wise distribution of the training data and generate a new picture from this distribution
U3 Animal faces
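A minimal sketch of the U2 (empirical distribution) idea, assuming face images are rows of a numpy array: each pixel of a generated image is drawn from that pixel's values across the training images. Names are illustrative:

```python
import numpy as np

def empirical_distribution_universum(X_train, n_universum, seed=0):
    """Generate Universum images pixel-wise: each pixel of a new image is
    drawn (with replacement) from that pixel's values across training images."""
    rng = np.random.default_rng(seed)
    n, d = X_train.shape
    rows = rng.integers(n, size=(n_universum, d))   # independent image choice per pixel
    cols = np.arange(d)
    return X_train[rows, cols]
```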

Page 36:

Universum generation: examples

• U1 Averaging:
• U2 Empirical Distribution:

Page 37:

Results of gender classification

• Classification accuracy improves vs standard SVM: by ~ 2% with the U1 Universum, and by ~ 1% with the U2 Universum
• Universum by averaging gives better results for this problem when the number of Universum samples is N = 500 or 1,000

Page 38:

Results of gender classification

• Universum ~ Animal Faces:
• Degrades classification accuracy by 2–5% (vs standard SVM)
• Animal faces are not relevant to this problem

Page 39:

Learning with Structured Data (Vapnik, 2006)

• Application: handwritten digit recognition
Labeled training data provided by t persons (t > 1)
Goal 1: find a classifier that will generalize well for future samples generated by these persons ~ Learning with Structured Data, or Learning Using Hidden Information
Goal 2: find t classifiers that generalize well (one for each person) ~ Multi-Task Learning (MTL)

• Application: medical diagnosis
Labeled training data provided by t groups of patients (t > 1), say men and women (t = 2)
Goal 1: estimate a classifier to predict/diagnose a disease using training data from all t groups of patients ~ LWSD
Goal 2: find t classifiers specialized for each group of patients ~ MTL

Page 40:

Different Ways of Using Group Information

[Diagram: four settings —
sSVM: one standard SVM on pooled data → single f(x);
SVM+: one SVM+ model using group info → single f(x);
mSVM: a separate SVM per group → f1(x), f2(x);
MTL: one joint svm+MTL model → f1(x), f2(x)]

Page 41:

SVM+ technology (Vapnik, 2006)

• Map the input vectors simultaneously into:
- decision space (standard SVM classifier)
- correcting space (where correcting functions model the slack variables for different groups)
• Decision space/function ~ the same for all groups
• Correcting functions ~ different for each group (but the correcting space may be the same)
• SVM+ optimization formulation incorporates:
- the capacity $(\mathbf{w}, \mathbf{w})$ of the decision function
- the capacity $(\mathbf{w}_r, \mathbf{w}_r)$ of the correcting function for group r
- the relative importance (weight) of these two capacities

Page 42:

SVM+ approach (Vapnik, 2006)

[Diagram: input samples (Class +1 / Class -1, Group 1 / Group 2) are mapped into a decision space, where a single decision function is estimated, and into correcting spaces, where per-group correcting functions model the slack variables $\xi^r$ for group r]

Page 43:

SVM+ Formulation

Decision space: $\mathbf{x}_i \to \mathbf{z}_i \in Z$, with $y = \operatorname{sign}[f(\mathbf{x})] = \operatorname{sign}[(\mathbf{w}, \mathbf{z}(\mathbf{x})) + b]$
Correcting space: $\mathbf{x}_i \to \mathbf{z}_i^r \in Z_r$, with slacks modeled as $\xi_i^r = (\mathbf{w}_r, \mathbf{z}_i^r) + d_r$

$$\min_{\mathbf{w},\, b,\, \mathbf{w}_1,\dots,\mathbf{w}_t,\, d_1,\dots,d_t} \; \frac{1}{2}(\mathbf{w}, \mathbf{w}) + \frac{\gamma}{2} \sum_{r=1}^{t} (\mathbf{w}_r, \mathbf{w}_r) + C \sum_{r=1}^{t} \sum_{i \in T_r} \xi_i^r$$

subject to:

$$y_i[(\mathbf{w}, \mathbf{z}_i) + b] \ge 1 - \xi_i^r, \qquad i \in T_r,\ r = 1,\dots,t$$
$$\xi_i^r = (\mathbf{w}_r, \mathbf{z}_i^r) + d_r \ge 0, \qquad i \in T_r,\ r = 1,\dots,t$$
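To make the formulation concrete, here is a minimal sketch of a linear SVM+ solved as a convex QP, assuming cvxpy is available and, for simplicity, that the correcting space equals the input space; the names and solver choice are illustrative, not the author's implementation:

```python
import numpy as np
import cvxpy as cp

def linear_svm_plus(X, y, groups, C=1.0, gamma=1.0):
    """Linear SVM+: one decision function (w, b) shared by all groups,
    plus a per-group correcting function (w_r, d_r) that models the slacks.
    X: (n, d) array, y: (n,) labels in {-1, +1}, groups: (n,) integer group ids."""
    n, d = X.shape
    t = int(groups.max()) + 1
    w, b = cp.Variable(d), cp.Variable()
    w_r, d_r = cp.Variable((t, d)), cp.Variable(t)
    # slack of sample i is given by its group's correcting function
    xi = cp.hstack([w_r[groups[i]] @ X[i] + d_r[groups[i]] for i in range(n)])
    constraints = [cp.multiply(y, X @ w + b) >= 1 - xi, xi >= 0]
    objective = (0.5 * cp.sum_squares(w)
                 + 0.5 * gamma * cp.sum_squares(w_r)
                 + C * cp.sum(xi))
    cp.Problem(cp.Minimize(objective), constraints).solve()
    return w.value, b.value, w_r.value, d_r.value
```

For the kernelized version the problem is typically solved in the dual, with separate kernels for the decision and correcting spaces, as in the formulation above.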

Page 44:

SVM+ for Multi-task Learning (Liang, 2008)

• New learning formulation: SVM+MTL
• Define the decision function for each group as

$$f_r(\mathbf{x}) = (\mathbf{w}, \mathbf{z}(\mathbf{x})) + b + (\mathbf{w}_r, \mathbf{z}_r(\mathbf{x})) + d_r, \qquad r = 1,\dots,t$$

• The common part $(\mathbf{w}, \mathbf{z}(\mathbf{x})) + b$ models the relatedness among groups
• The correcting functions fine-tune the model for each group (task)
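Assuming the parameters (w, b, w_r, d_r) of a linear SVM+MTL model have already been estimated (e.g., by adapting the QP sketch above so that the correcting term enters the margin constraint), per-group prediction then follows the formula above; a tiny illustrative sketch:

```python
import numpy as np

def predict_svm_plus_mtl(x, group, w, b, w_r, d_r):
    """Per-group decision: common part (w . x) + b plus the group's correcting term."""
    return np.sign(w @ x + b + w_r[group] @ x + d_r[group])
```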

Page 45:

svm+MTL Formulation

Decision space: $\mathbf{x}_i \to \mathbf{z}_i \in Z$; correcting space: $\mathbf{x}_i \to \mathbf{z}_i^r \in Z_r$

$$\min_{\mathbf{w},\, b,\, \mathbf{w}_1,\dots,\mathbf{w}_t,\, d_1,\dots,d_t} \; \frac{1}{2}(\mathbf{w}, \mathbf{w}) + \frac{\gamma}{2} \sum_{r=1}^{t} (\mathbf{w}_r, \mathbf{w}_r) + C \sum_{r=1}^{t} \sum_{i \in T_r} \xi_i^r$$

subject to:

$$y_i[(\mathbf{w}, \mathbf{z}_i) + (\mathbf{w}_r, \mathbf{z}_i^r) + b + d_r] \ge 1 - \xi_i^r, \qquad i \in T_r,\ r = 1,\dots,t$$
$$\xi_i^r \ge 0, \qquad i \in T_r,\ r = 1,\dots,t$$

with per-group decision functions

$$f_r(\mathbf{x}) = \operatorname{sign}[(\mathbf{w}, \mathbf{z}(\mathbf{x})) + b + (\mathbf{w}_r, \mathbf{z}_r(\mathbf{x})) + d_r], \qquad r = 1,\dots,t$$

Page 46:

Empirical Validation

• Different ways of using group info → different learning settings:
- which one yields better generalization?
- how is performance affected by sample size?

• Empirical comparisons:
- synthetic data set

Page 47:

Different Ways of Using Group Information

[Diagram repeated from Page 40: sSVM → single f(x); SVM+ → single f(x); mSVM → per-group f1(x), f2(x); MTL (svm+MTL) → per-group f1(x), f2(x)]

Page 48:

Comparison for Synthetic Data Set

• Generate $\mathbf{x} \in R^{20}$, where each $x_i \sim \mathrm{uniform}(-1, 1)$, $i = 1,\dots,20$
• The coefficient vectors $\boldsymbol{\beta}_1, \boldsymbol{\beta}_2, \boldsymbol{\beta}_3$ of the three tasks are sparse 20-dimensional vectors that share common leading components and differ in a few task-specific components
• For each task and each data vector, $y = \operatorname{sign}(\boldsymbol{\beta}_r \cdot \mathbf{x} + 0.5)$
• Details of methods used:
- linear SVM classifier (single parameter C)
- SVM+ and SVM+MTL classifiers (3 parameters: linear kernel for decision space, RBF kernel for correcting space, and parameter γ)
- independent validation set for model selection
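A minimal sketch of generating one task's data along these lines; the specific β vector below is a hypothetical placeholder (the exact task vectors are not legible in the slide), and the +0.5 offset follows the formula above:

```python
import numpy as np

def generate_task_data(beta, n, seed=0):
    """Synthetic task: x ~ uniform(-1, 1)^20, label y = sign(beta . x + 0.5)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n, len(beta)))
    y = np.sign(X @ beta + 0.5)
    return X, y

# Hypothetical task vector: shared leading components plus task-specific zeros/ones.
beta_1 = np.array([1, 1, 1, 1, 1, 1, 1] + [0] * 13, dtype=float)
X, y = generate_task_data(beta_1, n=100)
```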

Page 49:

Experimental Results

• Comparison results (averaged over 10 trials); n ~ number of training samples per task; average test error (%):

Methods:   sSVM   SVM+   mSVM   SVM+MTL
n = 15     19.9   19.1   29.3   20.8
n = 100    11.9   11.7    8.8    8.5

Note: relative performance depends on sample size
Note: SVM+ is always better than sSVM; SVM+MTL is always better than mSVM

Page 50:

OUTLINE

• Motivation for non-standard approaches
• Alternative Learning Settings
• Summary: advantages/limitations of non-standard settings

Page 51:

Advantages and limitations of non-standard settings

• Advantages
- make common sense
- follow a methodological framework (VC-theory)
- yield better generalization (but not always)

• Limitations
- need to formalize application requirements → need to understand the application domain
- generally more complex learning formulations
- more difficult model selection
- few known empirical comparisons (to date)

• SVM+ is a promising new technology for hard problems

Page 52:

References and Resources

• Vapnik, V., Estimation of Dependences Based on Empirical Data. Empirical Inference Science: Afterword of 2006, Springer, 2006
• Cherkassky, V. and F. Mulier, Learning from Data, second edition, Wiley, 2007
• Chapelle, O., Schölkopf, B., and A. Zien, Eds., Semi-Supervised Learning, MIT Press, 2006
• Cherkassky, V. and Y. Ma, Introduction to Predictive Learning, Springer, 2011 (to appear)
• Hastie, T., R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction, New York: Springer, 2001
• Schölkopf, B. and A. Smola, Learning with Kernels, MIT Press, 2002

Public-domain SVM software
• Main web page: http://www.kernel-machines.org
• LIBSVM software library: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
• SVM-Light software library: http://svmlight.joachims.org/
• Non-standard SVM-based methodologies (Universum, SVM+, MTL): http://www.ece.umn.edu/users/cherkass/predictive_learning/