Feature-Cost Sensitive Learning with Submodular Trees of Classifiers

Matt J. Kusner, Wenlin Chen, Quan Zhou, Zhixiang (Eddie) Xu, Kilian Q. Weinberger, Yixin Chen
Department of Computer Science & Engineering, Washington University in St. Louis, USA
Department of Automation, Tsinghua University, Beijing, China

[Figure: a cost-sensitive tree of classifiers with nodes $v^0, \dots, v^6$, each holding parameters $(\beta^k, \theta^k)$; an input $x$ descends left or right at node $v^k$ depending on whether $\beta^{k\top} x > \theta^k$.]
Budgeted Learning

dataset $D = \{(x^{(i)}, y^{(i)})\}_{i=1}^{n} \subset \mathbb{R}^d \times \mathcal{Y}$, feature costs $c = [c_1, \dots, c_d]^\top$, cost budget $B$

[Chen et al., 2012; Xu et al., 2013]

Cost-Sensitive Tree of Classifiers [Xu et al., 2014]
Approximately Submodular Tree of Classifiers
[Figure: CSTC optimization vs. ASTC optimization — CSTC jointly optimizes all tree parameters $(\beta^k, \theta^k)$ over nodes $v^0, \dots, v^6$; ASTC instead fits each node independently, e.g. the closed-form solution $\beta^2 = (X^\top X)^{-1} X^\top y$ for node $v^2$.]
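The per-node ASTC fit above is ordinary least squares. A minimal sketch, assuming NumPy and a hypothetical `fit_node` helper (not the authors' code); `lstsq` is used in place of the explicit inverse for numerical stability:

```python
import numpy as np

def fit_node(X, y):
    """Closed-form least-squares node weights: beta = (X^T X)^{-1} X^T y."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # stable alternative to inv()
    return beta

# Toy check: on noiseless data the closed form recovers the true weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_beta = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
y = X @ true_beta
beta = fit_node(X, y)
print(np.allclose(beta, true_beta))
```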
Results
[Figure 3 panels — Yahoo!: Precision@5 vs. feature cost (higher is better); Forest, CIFAR, MiniBooNE: test error vs. feature cost. Curves: ASTC, ASTC (soft), CSTC, cost-weighted l1-classifier.]
TRAINING SPEED-UP

YAHOO!      budget      10     52     86     169    468    800    1495
            ASTC        119x   52x    41x    21x    15x    9.2x   6.6x
            ASTC, soft  121x   48x    46x    18x    15x    8.2x   6.4x

FOREST      budget      3      5      8      13     23     50
            ASTC        8.4x   7.0x   6.3x   4.9x   3.1x   1.4x
            ASTC, soft  8.0x   6.4x   5.7x   4.5x   2.8x   1.5x

CIFAR       budget      9      24     76     180    239
            ASTC        5.6x   2.3x   0.68x  0.25x  0.14x
            ASTC, soft  5.3x   2.3x   0.62x  0.27x  0.13x

MINIBOONE   budget      4      5      12     14     18     33     47
            ASTC        7.4x   7.9x   5.5x   5.2x   4.1x   3.1x   2.0x
            ASTC, soft  7.2x   6.2x   5.9x   4.2x   4.3x   2.5x   1.7x
Figure 3. ASTC, CSTC, and a cost-sensitive baseline on a real-world feature-cost sensitive dataset (Yahoo!) and three non-cost-sensitive datasets (Forest, CIFAR, MiniBooNE). For Yahoo!, circles mark the CSTC points used for the training-time comparison; on the other datasets, all points are compared.
Table 1. Training speed-up of ASTC over CSTC for different cost budgets on all datasets.
Expected Tree Cost

$$E[c(T)] = \sum_{v^l \in L} p^l \left[ \sum_{a} c_a \left\| \sum_{v^j \in \pi^l} |\beta_a^j| \right\|_0 \right]$$

Expected Tree Loss

$$E[\ell(T)] = \frac{1}{n} \sum_{v^k \in V} \sum_{i=1}^{n} p_i^k \left( y^{(i)} - \beta^{k\top} x^{(i)} \right)^2$$

Objective

$$\min_{\beta^1, \dots, \beta^{|V|}} \ E[\ell(T)] + \lambda\, E[c(T)]$$
+ state-of-the-art performance
- expensive global optimization
- requires continuous relaxation
- difficult to implement
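The two tree-level quantities above can be evaluated directly for a fixed tree. A toy sketch, assuming NumPy; the tiny root-plus-two-leaves tree, the routing probabilities, and all variable names are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 8, 4
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
costs = np.array([1.0, 2.0, 0.5, 4.0])   # per-feature acquisition cost c_a

# node -> weight vector beta^k (zero entries mean "feature unused")
betas = {
    "v0": np.array([1.0, 0.0, 0.5, 0.0]),
    "v1": np.array([0.0, 2.0, 0.0, 0.0]),
    "v2": np.array([1.0, 0.0, 0.0, 0.0]),
}
p = {"v0": np.ones(n), "v1": np.full(n, 0.5), "v2": np.full(n, 0.5)}  # p_i^k
paths = {"v1": ["v0", "v1"], "v2": ["v0", "v2"]}   # pi^l: root-to-leaf path
p_leaf = {"v1": 0.5, "v2": 0.5}                    # p^l: leaf probability

# E[l(T)]: traversal-weighted squared loss, summed over all nodes.
exp_loss = sum((pk * (y - X @ betas[k]) ** 2).sum() for k, pk in p.items()) / n

# E[c(T)]: along each path, a feature's cost is paid once if ANY node on the
# path uses it -- the l0 norm of the summed |beta^j_a| values.
exp_cost = 0.0
for leaf, pl in p_leaf.items():
    path_abs = sum(np.abs(betas[k]) for k in paths[leaf])  # sum_j |beta^j_a|
    exp_cost += pl * costs[path_abs > 0].sum()

print(exp_loss, exp_cost)
```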
best loss for features $A$:

$$\ell^k(A) \,\triangleq\, \min_{\beta^k} \frac{1}{n} \sum_{i=1}^{n} p_i^k \left( y^{(i)} - \beta^{k\top} x_A^{(i)} \right)^2$$

$(1 - e^{-\gamma})$-approximation
Greedy Tree Building
[Figure: greedy tree building — start from the root $v^0$ with parameters $(\beta^0, \theta^0)$, then add nodes $v^1, v^2, v^3, \dots$ one at a time, fitting each new node's $(\beta^k, \theta^k)$ as it is added.]
Greedy Feature Selection (node $k$)

$$\operatorname*{argmax}_{a \in \{1,\dots,d\}} \left[ \frac{\ell^k(A) - \ell^k(A \cup a)}{c_a} \right] \qquad \text{(loss reduction per cost)}$$

greedy is near-optimal! [Das & Kempe, 2011]

Fast Selection via QR-Decomposition [Grubb & Bagnell, 2012]

where $X_A = QR$:

$$\frac{\ell^k(A) - \ell^k(A \cup a)}{c_a} = \frac{(q^\top y)^2}{c_a}, \qquad q = \frac{X_a - QQ^\top X_a}{\|X_a - QQ^\top X_a\|_2}$$
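The greedy rule can be sketched end to end: repeatedly add the feature with the largest loss reduction per unit cost, computing each reduction as $(q^\top y)^2$ with $q$ the candidate column orthogonalized against the already-selected columns (an incremental QR basis). A minimal NumPy sketch; `greedy_select` and the toy data are hypothetical, not the authors' implementation:

```python
import numpy as np

def greedy_select(X, y, costs, budget):
    n, d = X.shape
    Q = np.zeros((n, 0))          # orthonormal basis for the selected columns
    selected, spent = [], 0.0
    while True:
        best, best_gain = None, -np.inf
        for a in range(d):
            if a in selected or spent + costs[a] > budget:
                continue
            r = X[:, a] - Q @ (Q.T @ X[:, a])   # residual w.r.t. span(X_A)
            norm = np.linalg.norm(r)
            if norm < 1e-12:
                continue                        # column already in the span
            q = r / norm
            gain = (q @ y) ** 2 / costs[a]      # loss reduction per cost
            if gain > best_gain:
                best, best_gain, best_q = a, gain, q
        if best is None:
            return selected                     # budget (or features) exhausted
        selected.append(best)
        spent += costs[best]
        Q = np.column_stack([Q, best_q])        # extend the QR basis

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 6))
y = 3.0 * X[:, 2] + rng.normal(scale=0.1, size=50)
chosen = greedy_select(X, y, costs=np.ones(6), budget=3.0)
print(chosen)  # the informative feature 2 is picked first
```

Reusing the orthogonal basis $Q$ keeps each candidate's marginal gain at one projection instead of a full least-squares refit.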
New Objective (set function optimization)

instead of a single classifier, select a feature set $A$ per node:

$$\max_{A \subseteq \{1,\dots,d\}} \ \ell^k(\emptyset) - \ell^k(A) \quad \text{s.t.} \quad c(A) \le B$$
References

Chen, M., Weinberger, K. Q., Chapelle, O., Kedem, D., Xu, Z. Classifier cascade for minimizing feature evaluation cost. AISTATS, 2012.
Xu, Z., Kusner, M. J., Weinberger, K. Q., Chen, M. Cost-sensitive tree of classifiers. ICML, 2013.
Xu, Z., Kusner, M. J., Weinberger, K. Q., Chen, M., Chapelle, O. Classifier cascades and trees for minimizing feature evaluation cost. JMLR, 2014.
goal: a ready-to-use, practical tool

$$\min_{\beta} \ \ell\!\left(\beta, \{(x^{(i)}, y^{(i)})\}_{i=1}^{n}\right) \quad \text{s.t.} \quad c(\beta) \le B$$

solved with gradient-based optimization
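One gradient-based instance of this budgeted objective is the cost-weighted l1-classifier baseline from Figure 3: penalize each weight by its feature's cost, $\frac{1}{2}\|y - X\beta\|^2 + \lambda \sum_a c_a |\beta_a|$. A hedged sketch via proximal gradient descent (ISTA); the function name, `lam`, and step count are illustrative choices, not tuned values from the paper:

```python
import numpy as np

def cost_weighted_l1(X, y, costs, lam=1.0, steps=2000):
    """ISTA on 0.5*||y - X beta||^2 + lam * sum_a costs[a] * |beta[a]|."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2      # 1/L for the smooth term
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        z = beta - step * (X.T @ (X @ beta - y))          # gradient step
        thresh = step * lam * costs                       # costlier -> shrink harder
        beta = np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)
    return beta

# Toy check: only feature 0 matters; expensive features should stay at zero.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = 2.0 * X[:, 0]
costs = np.array([1.0, 1.0, 1.0, 10.0, 10.0])
beta = cost_weighted_l1(X, y, costs)
print(beta)
```

The per-feature soft threshold `step * lam * costs[a]` is what makes the penalty cost-sensitive: expensive features must earn a proportionally larger loss reduction to enter the model.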