
Active Cost-sensitive Learning

(Intelligent Test Strategies)

Charles X. Ling, PhD

Department of Computer Science

University of Western Ontario, Ontario, Canada

[email protected]

http://www.csd.uwo.ca/faculty/cling

Joint work with Victor Sheng, Qiang Yang, …

Outline

• Introduction
• Cost-sensitive decision trees
• Test strategies
    • Sequential Test
    • Single Batch Test
    • Sequential Batch Test
• Conclusions and future work

Everything has a cost/benefit!

• Materials, products, services
• Disease, working/living condition, waiting, …
• Happiness, love, life, …
    • David G. Blanchflower and Andrew J. Oswald, "Money, Sex and Happiness: An Empirical Study", The Scandinavian Journal of Economics, 106(3), 2004, pp. 393-415
    • A lasting, happy marriage is worth about $100,000 in happiness
• Utility-based learning: optimization; unifies many issues and is the ultimate goal

In medical diagnosis…

• Tests have costs: temperature ($1), X-ray ($30), biopsy ($900)
• Diseases have costs: flu ($100), diabetes ($100k), cancer ($10^8)
• Misdiagnosis has (different) costs
    • Cost of false alarm ($500) << cost of missing a cancer ($500,000)
• Doctors: balance the cost of tests and misdiagnosis
• Our goal: to minimize the total cost
• Many other similar applications…
• Model this process
    • Cost-sensitive learning
    • Intelligent test strategies

Patient   Test 1   Test 2   …     Test n   Cancer?
(Cost)    $1       $30      ...   $900     FP/FN = 100/300k
001       39       Low      …     High     1
002       35       Med      …     ?        0
003       42       ?        …     ?        0
…         …        …        …     …        …
New1      ?        Med      …     ?        ?

Review of Previous Work

• Cost-sensitive learning: a survey (Turney 2000)
    • Active research, also for the imbalanced data problem
    • CS meta learning (wrapper): thresholding, sampling, weighting, …
    • CS learning algorithms: CSNB, our CS trees
    • …but all consider misclassification costs only
• Some work considers test costs only
• A few previous works consider both test costs and misclassification costs (Turney 1995, Zubek and Dietterich 2002, Lizotte et al 2003); all computationally expensive

Review of Previous Work

• Active learning: actively seeking extra information
    • Pool-based: given a pool of unlabeled examples, which ones to label
    • Membership query: Is this instance positive?
    • Feature value acquisition
        • During training. But “missing is useful!”
        • During testing: our work
• Human learning is active in many ways

Review of Previous Work

• Diagnosis: wide applications in medicine, mechanical systems, software, …
• Most previous AI-based diagnosis systems…
    • Are (partially) manually built
    • Do not incorporate costs/benefits
    • Cannot actively suggest the processes
• Our work: cost-sensitive and active; useful for diagnosis and policy setting


Cost-sensitive Decision Tree

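Patient   Test 1   Test 2   …     Test n   Cancer?
(Cost)    $1       $30      ...   $900     FP/FN = 100/300k
001       39       Low      …     High     1
002       35       Med      …     ?        0
003       42       ?        …     ?        0
…         …        …        …     …        …

[Figure: example cost-sensitive decision tree built from this table, with internal test nodes T1, T2, T3 and T6, branches such as Low/Med and <36 / >=36, and leaves labeled 0 or 1.]

Advantages: tree structure, comprehensibility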

Objective: minimizing the total cost of tests and misclassification.

Attribute Splitting Criteria

• Previous methods: C4.5 reduces the entropy (randomness); performs badly on cost-sensitive tasks
    [Figure: a node with entropy E split by test T into branches 1, 2, 3 with entropies E1, E2, E3]
    Choose T such that E - (E1+E2+E3) is max
• New (ICML’04): we reduce the total expected cost
    [Figure: a node with cost C split by test T into branches 1, 2, 3 with costs C1, C2, C3]
    Choose T such that C - (C1+C2+C3+C_Test) is max
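A minimal sketch of the cost-based criterion C - (C1+C2+C3+C_Test), under assumptions the slide does not state: correct predictions cost nothing, a node's cost is that of its cheaper labeling (all positive or all negative), and C_Test is the test price paid once for every example at the node. Function names and numbers are illustrative only.

```python
def node_cost(pos, neg, fp_cost, fn_cost):
    """Misclassification cost of a node labeled with its cheaper class."""
    cost_if_positive = neg * fp_cost   # every negative becomes a false positive
    cost_if_negative = pos * fn_cost   # every positive becomes a false negative
    return min(cost_if_positive, cost_if_negative)


def cost_reduction(parent, children, test_cost, fp_cost, fn_cost):
    """C - (C1 + ... + Ck + C_Test) for one candidate test T."""
    c = node_cost(parent[0], parent[1], fp_cost, fn_cost)
    c_children = sum(node_cost(p, n, fp_cost, fn_cost) for p, n in children)
    c_test = test_cost * sum(parent)   # assumed: each example pays for the test
    return c - (c_children + c_test)


# Hypothetical node: 50 positives and 50 negatives, FP = $100, FN = $300.
reduction = cost_reduction((50, 50), [(45, 5), (5, 45)],
                           test_cost=10, fp_cost=100, fn_cost=300)
print(reduction)
```

Among candidate tests, the one with the largest reduction would be chosen; an expensive test must buy a correspondingly large drop in misclassification cost to win.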

Case Study: Heart Disease

• Predict coronary artery disease
    • Class 0: less than 50% artery narrowing
    • Class 1: more than 50% artery narrowing
• ~300 patients, collected from hospitals
• 13 non-invasive tests on patients

13 Tests (Heart Disease)

Tests      Costs     Meaning
age        $1        age of the patient
sex        $1        sex
cp         $1        chest pain type
trestbps   $1        resting blood pressure
chol       $7.27     cholesterol in mg/dl
fbs        $5.20     fasting blood sugar
restecg    $15.50    resting electrocardiography results
thalach    $102.90   maximum heart rate
thal       $102.90   maximum heart rate reached
exang      $87.30    exercise induced angina
oldpeak    $87.30    ST depression induced by exercise
slope      $87.30    slope of the peak exercise ST segment
ca         $100.90   number of major vessels colored by fluoroscopy

Cost-sensitive tree for Heart Disease

[Figure: cost-sensitive decision tree for the Heart Disease data. Internal nodes, with test costs: thal ($102.9), fbs ($5.2), restecg ($15.5), sex ($1), chol ($7.27), cp ($1), slope ($87.3), age ($1); leaves are labeled 0 or 1.]

• Naturally prefer tests with small cost

• Balance cost and discriminating power

• A local heart-failure specialist considers this tree reasonable.

Considering Group Discount

Tests      Costs     Meaning
age        $1        age of the patient
sex        $1        sex
cp         $1        chest pain type
trestbps   $1        resting blood pressure
chol       $7.27     cholesterol in mg/dl
fbs        $5.20     fasting blood sugar
restecg    $15.50    resting electrocardiography results
thalach    $102.90   maximum heart rate
thal       $102.90   finishing heart rate
exang      $87.30    exercise induced angina
oldpeak    $87.30    ST depression induced by exercise
slope      $87.30    slope of the peak exercise ST segment
ca         $100.90   number of major vessels colored by fluoroscopy

Group discounts:
chol, fbs: $2.10
thalach, thal: $101.90
exang, oldpeak, slope: $86.30
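One way the discount could be applied, as a hedged sketch: once any test in a group has been done, the remaining tests of that group are charged their price minus the group discount. The group memberships below (chol with fbs, thalach with thal) are read off the slide layout and are an assumption, as is the function name `effective_costs`.

```python
def effective_costs(base_costs, groups, done):
    """Costs of the remaining tests, given the set of tests already done."""
    costs = dict(base_costs)
    for members, discount in groups:
        if members & done:                       # some test in the group was done
            for test in members - done:          # discount the rest of the group
                costs[test] = round(base_costs[test] - discount, 2)
    return costs


base = {"chol": 7.27, "fbs": 5.20, "thalach": 102.90, "thal": 102.90}
groups = [({"chol", "fbs"}, 2.10), ({"thalach", "thal"}, 101.90)]

after_thal = effective_costs(base, groups, done={"thal"})
print(after_thal["thalach"])
```

Under these assumptions thalach becomes a cheap test as soon as thal has been ordered, which is why a tree learned with discounts can pick it where it otherwise would not.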

Different trees without/with group discount

[Figure: the Heart Disease tree before and after applying group discounts. Before, restecg ($15.5) appears twice in the tree; after, one of those nodes is replaced by thalach, whose cost drops from its individual cost of $102.9 to $1.]

Algorithm of Cost-sensitive Decision Tree

CSDT(Examples, Attributes, TestCosts)
    If all examples are positive, return root with label = +
    If all examples are negative, return root with label = -
    If maximum cost reduction < 0, return root with label according to min(P·TP + N·FP, N·TN + P·FN)
    Let A be an attribute with maximum cost reduction
    root ← A
    Update TestCosts if discount applies
    For each possible value vi of the attribute A
        Add a new branch A = vi below root
        Segment the training examples Examples_vi into the new branch
        Call CSDT(Examples_vi, Attributes - A, TestCosts) to build subtree
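The recursion above can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: it assumes TP and TN cost zero, charges the test price once per example at the node, ignores missing values and group discounts, and stops when no split gives a strictly positive reduction. All names are hypothetical.

```python
def leaf(examples, fp_cost, fn_cost):
    """Return (label, misclassification cost) for the cheaper labeling."""
    pos = sum(1 for _, y in examples if y == 1)
    neg = len(examples) - pos
    cost_if_pos, cost_if_neg = neg * fp_cost, pos * fn_cost
    return (1, cost_if_pos) if cost_if_pos <= cost_if_neg else (0, cost_if_neg)


def partition(examples, attr):
    """Group (features, label) pairs by their value of `attr`."""
    parts = {}
    for x, y in examples:
        parts.setdefault(x[attr], []).append((x, y))
    return parts


def csdt(examples, attributes, test_costs, fp_cost, fn_cost):
    label, cost = leaf(examples, fp_cost, fn_cost)
    best_attr, best_gain = None, 0.0
    for a in attributes:
        split_cost = sum(leaf(p, fp_cost, fn_cost)[1]
                         for p in partition(examples, a).values())
        gain = cost - (split_cost + test_costs[a] * len(examples))
        if gain > best_gain:
            best_attr, best_gain = a, gain
    if best_attr is None:            # pure node or no worthwhile test: make a leaf
        return {"label": label}
    remaining = [a for a in attributes if a != best_attr]
    return {"test": best_attr, "label": label,
            "branches": {v: csdt(p, remaining, test_costs, fp_cost, fn_cost)
                         for v, p in partition(examples, best_attr).items()}}


# Toy data: attribute "a" separates the classes, attribute "b" is noise.
data = [({"a": "x", "b": "u"}, 1), ({"a": "x", "b": "v"}, 1),
        ({"a": "y", "b": "u"}, 0), ({"a": "y", "b": "v"}, 0)]
tree = csdt(data, ["a", "b"], {"a": 1, "b": 1}, fp_cost=100, fn_cost=300)
print(tree)
```

Each internal node also keeps the label it would predict on its own, which is useful later when a test case stops at an internal node.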


[Figure: the training table and example tree from the "Cost-sensitive Decision Tree" slide, together with a new test case New1 whose test values and class are all unknown (?).]

Three categories of intelligent test strategies

1. Sequential Test: one test, wait, … then predict
2. Single Batch Test: one batch of tests, then predict
3. Sequential Batch Test: batch 1, batch 2, … then predict

Minimize total cost of tests and misclassification, not trivial

Our methods: utilizing the minimum-cost tree structure


Sequential Test

Use tree structure to guide test sequence

“Optimal” because tree is (locally) optimal
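A minimal sketch of how a tree can guide the test sequence: walk down from the root and perform a test only when its value is needed and still unknown. The tree layout (`test`, `branches`, `label` keys), the `oracle` callback standing in for actually running a test, and the toy values are all assumptions made for illustration.

```python
def sequential_test(tree, known, oracle, test_costs):
    """Return (predicted label, total test cost) for one test case."""
    known = dict(known)                  # do not modify the caller's record
    spent = 0.0
    node = tree
    while "test" in node:
        name = node["test"]
        if name not in known:
            known[name] = oracle(name)   # perform the test, wait for the result
            spent += test_costs[name]
        node = node["branches"][known[name]]
    return node["label"], spent


toy_tree = {"test": "cp", "branches": {
    "typical": {"label": 0},
    "atypical": {"test": "chol", "branches": {
        "low": {"label": 0}, "high": {"label": 1}}}}}
true_values = {"cp": "atypical", "chol": "high"}
costs = {"cp": 1.0, "chol": 7.27}

label, spent = sequential_test(toy_tree, {}, true_values.__getitem__, costs)
print(label, spent)
```

Only the tests on the path actually taken are paid for, but each one requires waiting for its result before the next can be chosen.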

[Figure: the Heart Disease cost-sensitive tree (nodes thal, fbs, restecg, sex, chol, cp, slope, thalach, age), used to guide the test sequence.]

Experimental Comparison

Using 10 datasets from UCI

Dataset       No. of Attributes   No. of Examples   Class dist. (N/P)
Ecoli         6                   332               230/102
Breast        9                   683               444/239
Heart         8                   261               98/163
Thyroid       24                  2000              1762/238
Australia     15                  653               296/357
Tic-tac-toe   9                   958               332/626
Mushroom      21                  8124              4208/3916
Kr-vs-kp      36                  3196              1527/1669
Voting        16                  232               108/124
Cars          6                   446               328/118

Comparing Sequential Test

• Eager learning: Sequential Test (OST) (ICML’04)
• Lazy learning: Lazy Sequential Test (LazyOST) (TKDE’05)
• Cost-sensitive Naïve Bayes (CSNB) (ICDM’04)

[Figure: total cost (y-axis, 40 to 100) versus ratio of unknown attributes (x-axis, 0.2 to 1) for CSNB, OST and LazyOST.]


Single Batch Test

• Only one batch: not an easy task
    • If too few, important tests not requested; prediction is not accurate; total cost high
    • If too many, some tests are wasted; total cost high
• The test example may not be classified by a leaf

Single Batch Test

• Expected cost reduction: if a test is done, what are the possible outcomes and cost reduction

$E(i) = misc(i) - \left[ c(i) + \sum p(R(i)) \times misc(R(i)) \right]$

• R(.): all reachable unknown nodes and leaves

[Figure: a node i with branches 1, 2, 3 leading to nodes j1, j2, j3.]
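A worked instance of the expected cost reduction, with made-up numbers that are not from the slides. Suppose stopping at node i costs misc(i) = 100, the test at i costs c(i) = 7.27, and its three outcomes lead to reachable nodes j1, j2, j3 with probabilities 0.5, 0.3, 0.2 and misclassification costs 0, 20, 50:

```latex
\begin{align*}
E(i) &= misc(i) - \Big[ c(i) + \sum p(R(i)) \times misc(R(i)) \Big] \\
     &= 100 - \big[ 7.27 + (0.5 \times 0 + 0.3 \times 20 + 0.2 \times 50) \big] \\
     &= 100 - 23.27 = 76.73
\end{align*}
```

Since E(i) > 0 in this hypothetical case, doing the test is expected to pay for itself.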

Single Batch Test

• A*-like search algorithm
    • Form a candidate list (L) and a batch list (B)
    • Choose a test with maximum positive expected cost reduction from L, add it to B
    • Update L: add all reachable unknowns to L
    • Repeat until expected cost reduction is 0
• Efficient with tree structure

Single Batch Test

L = empty   /* list of reachable and unknown attributes */
B = empty   /* the batch of tests */
u = the first unknown attribute when classifying a test case
Add u into L
Loop
    For each i ∈ L, calculate E(i): E(i) = misc(i) - [c(i) + Σ p(R(i)) × misc(R(i))]
    E(t) = max E(i)   /* t has the maximum cost reduction */
    If E(t) > 0 then add t into B, delete t from L, add r(t) into L
    else exit Loop   /* No positive cost reduction */
Until L is empty
Output B as the batch of tests
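The loop above might be sketched as follows. The node layout is an assumption: each internal node stores its test name, the misclassification cost `misc` of predicting at that node, and for every outcome a (probability, child) pair; leaves store only `misc`. The sketch also assumes each test appears at most once along the reachable paths. Names and numbers are illustrative.

```python
def descend(node, known):
    """Follow already-known test values down to a leaf or an unknown test."""
    while "test" in node and node["test"] in known:
        node = node["children"][known[node["test"]]][1]
    return node


def reachable(node, known):
    """R(i): (probability, node) pairs reachable through one outcome of i."""
    return [(p, descend(child, known)) for p, child in node["children"].values()]


def expected_reduction(node, known, test_costs):
    """E(i) = misc(i) - [c(i) + sum of p(R(i)) * misc(R(i))]."""
    after = sum(p * n["misc"] for p, n in reachable(node, known))
    return node["misc"] - (test_costs[node["test"]] + after)


def single_batch(root, known, test_costs):
    first = descend(root, known)
    candidates = [first] if "test" in first else []   # L
    batch = []                                        # B
    while candidates:
        scores = [expected_reduction(n, known, test_costs) for n in candidates]
        k = max(range(len(scores)), key=scores.__getitem__)
        if scores[k] <= 0:
            break                                     # no positive cost reduction
        node = candidates.pop(k)
        if node["test"] not in batch:
            batch.append(node["test"])
        candidates.extend(n for _, n in reachable(node, known) if "test" in n)
    return batch


chol = {"test": "chol", "misc": 100, "children": {
    "lo": (0.5, {"misc": 0}), "hi": (0.5, {"misc": 20})}}
root = {"test": "cp", "misc": 200, "children": {
    "a": (0.5, {"misc": 0}), "b": (0.5, chol)}}

batch = single_batch(root, {}, {"cp": 1.0, "chol": 7.27})
print(batch)
```

Because the whole batch is ordered before any result is seen, some requested tests may turn out to lie off the path the test case actually follows.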

Single Batch Test

[Figure: step-by-step walkthrough of the Single Batch Test on the Heart Disease tree (nodes thal, fbs, restecg, sex, chol, cp, slope, thalach, age).]

1. cp is unknown. cp has positive expected cost reduction. cp is added to the batch. cp’s reachable unknown nodes are added into the candidate list.
2. From the candidate list, choose one with maximum positive expected cost reduction. Add it to the batch, and update the candidate list. Repeat. After 7 steps, expected cost reduction is 0.
3. Do all tests in the batch.
4. Make a prediction (predict by internal node). Some tests are wasted.

Comparing Single Batch Tests

• Naïve Single Batch (NSB) (ICML’04)
• Cost-sensitive Naïve Bayes Single Batch (CSNB-SB) (ICDM’04)
• Greedy Single Batch (GSB) (TKDE’05)
• Single Batch Test (OSB) (TKDE’05)

[Figure: total cost (y-axis, 350 to 750) versus ratio of unknown attributes (x-axis, 0.2 to 1) for CSNB-SB, NSB, GSB and OSB.]


Sequential Batch

• Batch 1, batch 2, … , prediction
• Must include the cost of waiting in tests
    • Wait cost of a batch: maximum wait cost in the batch
    • Less than the sum
• Combines Sequential Test and Single Batch Test
    • If all waiting costs = 0, it becomes Sequential Test
    • If all waiting costs are very large, it becomes Single Batch
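A small sketch of the batch wait cost rule: tests in one batch are assumed to run in parallel, so the batch waits only for its slowest test. The dollar figures for waiting are hypothetical.

```python
def batch_wait_cost(batch, wait_costs):
    """Wait cost of a batch: the maximum wait cost among its tests."""
    return max((wait_costs[t] for t in batch), default=0.0)


def batch_total_cost(batch, test_costs, wait_costs):
    """Test prices add up; waiting is paid once per batch."""
    return sum(test_costs[t] for t in batch) + batch_wait_cost(batch, wait_costs)


test_costs = {"chol": 7.27, "fbs": 5.20, "restecg": 15.50}
wait_costs = {"chol": 40.0, "fbs": 40.0, "restecg": 5.0}   # hypothetical values

batch = ["chol", "fbs", "restecg"]
print(batch_wait_cost(batch, wait_costs), sum(wait_costs[t] for t in batch))
```

Doing the same three tests one after another would pay the wait cost three times, which is why batching becomes attractive as waiting gets more expensive.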

Sequential Batch

The wait cost is derived from wait time

Test wait time in hours

Test       Wait time (hours)
age        0.001
sex        0.001
cp         0.001
trestbps   0.01
chol       4
fbs        4
restecg    0.5
thalach    1
exang      1
oldpeak    1
slope      1
ca         1
thal       1

Sequential Batch

• Extending the Single Batch to include the batch cost
• An additional constraint: cumulative ROI

$ROI = \dfrac{CostReduction}{testCost + BatchCost}$

[Figure annotation: No more batches!]
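A hedged sketch of the ROI constraint, assuming that "cumulative" means summing over the tests added to the batch so far: ROI is the total cost reduction divided by the total test cost plus the batch (wait) cost, and a test is added only while that ratio keeps rising. The greedy ordering by cost reduction, the names, and the numbers are illustrative assumptions, not the authors' procedure.

```python
def cumulative_roi(reductions, costs, batch_cost):
    """Total cost reduction divided by total test cost plus batch cost."""
    return sum(reductions) / (sum(costs) + batch_cost)


def grow_batch(candidates, batch_cost):
    """candidates: (test name, cost reduction, test cost) triples."""
    batch, reductions, costs, best = [], [], [], 0.0
    for name, reduction, cost in sorted(candidates, key=lambda c: -c[1]):
        roi = cumulative_roi(reductions + [reduction], costs + [cost], batch_cost)
        if reduction <= 0 or roi <= best:
            break                        # ROI no longer increases: close the batch
        batch.append(name)
        reductions.append(reduction)
        costs.append(cost)
        best = roi
    return batch


cands = [("cp", 149.0, 1.0), ("chol", 82.73, 7.27), ("thal", 5.0, 102.9)]
print(grow_batch(cands, batch_cost=100.0))
print(grow_batch(cands, batch_cost=10.0))
```

With these numbers a high batch cost favors a larger batch, while a low batch cost lets the procedure stop early and decide again after seeing a result.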

Sequential Batch

Loop
    L = empty   /* list of reachable and unknown attributes */
    B = empty   /* the batch of tests */
    u = the first unknown attribute when classifying a test case
    Add u into L
    Loop
        For each i ∈ L, calculate E(i): E(i) = misc(i) - [c(i) + Σ p(R(i)) × misc(R(i))]
        E(t) = max E(i)   /* t has the maximum cost reduction */
        If E(t) > 0 & ROI increases then add t into B, delete t from L, add r(t) into L
        else exit Loop   /* No positive cost reduction */
    Until L is empty
    If (B is not empty) then
        Output B as the current batch of tests; obtain their values at a cost
        Classify the test example further, until encountering another unknown test
    Else exit the first Loop

Comparing Sequential Batch Test

[Figure: total cost (y-axis, 120 to 470) versus unknown attribute ratio (x-axis, 0.2 to 1) for SingB, SeqT and SBT.]


Future Work

• Deal with different test examples differently
• Consider more costs: acquiring new examples
    • If $10 for each new example, how many do I need?
    • For $10, tell me if this patient has cancer
• If a test is not accurate (e.g. 90%), how to build trees and how to do tests (will I do it again)?
• From cost-sensitive trees, derive medical policy for expensive/risky or cheap/effective tests

Conclusions

• Cost-sensitive decision tree: effective for learning with minimal total cost
    • Can be used to model learning from data with costs
• Design and compare various test strategies
    • Sequential Test: one test, wait, …: low cost but long wait
    • Single Batch Test: one batch of tests: quick but higher cost
    • Sequential Batch Test: batch, wait, batch, …: best tradeoff
• Our methods perform better than previous ones
• Can be readily applied to real-world diagnoses

References

• C.X. Ling, Q. Yang, J. Wang, and S. Zhang. Decision Trees with Minimal Costs. ICML'2004.
• X. Chai, L. Deng, Q. Yang, and C.X. Ling. Test-Cost Sensitive Naive Bayes Classification. ICDM'2004.
• C.X. Ling, S. Sheng, and Q. Yang. Intelligent Test Strategies for Cost-sensitive Decision Trees. IEEE TKDE, to appear, 2005.
• S. Zhang, Z. Qin, C.X. Ling, and S. Sheng. "Missing is Useful": Missing Values in Cost-sensitive Decision Trees. IEEE TKDE, to appear, 2005.
• P.D. Turney. Types of cost in inductive concept learning. Workshop on Cost-Sensitive Learning at ICML'2000.
• V.B. Zubek and T. Dietterich. Pruning improves heuristic search for cost-sensitive learning. ICML'2002.
• P.D. Turney. Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm. JAIR, 2:369-409, 1995.
• D. Lizotte, O. Madani, and R. Greiner. Budgeted Learning of Naïve-Bayes Classifiers. In Uncertainty in AI, 2003.