Making Time-series Classification More Accurate Using Learned Constraints
Chotirat “Ann” Ratanamahatana and Eamonn Keogh
2004 SIAM International Conference on Data Mining, April 22, 2004



Roadmap

• Time series and their similarity measures

• Euclidean distance and its limitation

• Dynamic time warping (DTW)

• Global constraints

• R-K band

• Experimental Evaluation

• Conclusions and future work

Important Note! You are free to use any slides in this talk for teaching purposes, provided that the authorship of the slides is clearly attributed to Ratanamahatana and Keogh.

You may not use any text or images contained here in a paper (including tech reports or unpublished works) or tutorial without the express permission of Dr. Keogh.

Chotirat Ann Ratanamahatana and Eamonn Keogh. Making Time-series Classification More Accurate Using Learned Constraints. In Proceedings of the SIAM International Conference on Data Mining (SDM '04), Lake Buena Vista, Florida, April 22-24, 2004, pp. 11-22.

Classification in Time Series

Classification, in general, maps data into predefined groups (supervised learning).

Pattern recognition is a type of supervised classification in which an input pattern is assigned to one of the classes based on its similarity to those predefined classes.

[Figure: labeled instances of Class A and Class B, plus one unlabeled instance]

Which class does the unlabeled instance belong to?

Age   Income   Student   Credit Rating   Class: buys computer
28    High     No        Fair            No
25    High     No        Excellent      No
35    High     No        Fair            Yes
45    Medium   No        Excellent      No
18    Low      Yes       Fair            Yes
49    High     No        Fair            ??

Will this person buy a computer?

Euclidean Distance Metric

Given two time series
Q = q1, …, qn and
C = c1, …, cn,
their Euclidean distance is defined as

D(Q, C) = sqrt( Σ_{i=1..n} (qi − ci)² )

[Figure: two time series Q and C plotted over 150 points, compared point by point]
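The definition above translates directly into code. A minimal Python sketch (the function name is ours):

```python
import math

def euclidean_dist(q, c):
    """Euclidean distance between two equal-length time series:
    sqrt of the sum of squared pointwise differences."""
    if len(q) != len(c):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((qi - ci) ** 2 for qi, ci in zip(q, c)))

print(euclidean_dist([0.0, 1.0, 0.0], [0.0, 0.0, 0.0]))  # → 1.0
```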

Limitations of the Euclidean Metric

The Euclidean metric is very sensitive to distortions in the data.

Training data consists of 10 instances from each of the 3 classes. We perform 1-nearest-neighbor classification with leave-one-out evaluation, averaged over 100 runs.

Euclidean distance error rate: 29.77%
DTW error rate: 3.33%

Dynamic Time Warping (DTW)

Euclidean distance: one-to-one alignments.
Time warping distance: non-linear alignments are allowed.

How Is DTW Calculated? (I)

DTW takes the cheapest warping path w between Q and C:

DTW(Q, C) = min_w sqrt( Σ_{k=1..K} w_k )

[Figure: the warping path w through the Q-by-C distance matrix, and the resulting alignment of Q and C]

How Is DTW Calculated? (II)

Each warping path w can be found using dynamic programming to evaluate the following recurrence:

γ(i, j) = d(qi, cj) + min{ γ(i−1, j−1), γ(i−1, j), γ(i, j−1) }

where γ(i, j) is the cumulative distance: the distance d(i, j) plus the minimum cumulative distance among the adjacent cells (i−1, j−1), (i−1, j), and (i, j−1).
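The recurrence above can be evaluated with a standard dynamic-programming table. A minimal, unconstrained Python sketch, O(nm) in time and space:

```python
def dtw(q, c):
    """Unconstrained DTW via the cumulative-cost recurrence:
    gamma[i][j] = d + min(diagonal, above, left)."""
    n, m = len(q), len(c)
    INF = float("inf")
    gamma = [[INF] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (q[i - 1] - c[j - 1]) ** 2  # squared pointwise distance
            gamma[i][j] = d + min(gamma[i - 1][j - 1],  # match
                                  gamma[i - 1][j],      # insertion
                                  gamma[i][j - 1])      # deletion
    return gamma[n][m] ** 0.5

print(dtw([0, 1, 2], [0, 1, 1, 2]))  # → 0.0 (non-linear alignment absorbs the repeat)
```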

Global Constraints (I)

[Figure: warping matrices for Q and C under two constraint shapes]

Sakoe-Chiba Band          Itakura Parallelogram

Global constraints prevent any unreasonable warping.

Global Constraints (II)

[Figure: Ri illustrated on the Sakoe-Chiba Band and the Itakura Parallelogram]

A global constraint for a sequence of size m is defined by R, where Ri = d, 0 ≤ d ≤ m, for 1 ≤ i ≤ m.

Ri defines the freedom of warping above and to the right of the diagonal at any given point i in the sequence.
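A global constraint is simple to add to the dynamic program: only cells within the band are filled in. A sketch for the constant-width Sakoe-Chiba case, assuming equal-length series:

```python
def dtw_band(q, c, r):
    """DTW restricted to a Sakoe-Chiba band of half-width r
    around the diagonal (equal-length series assumed)."""
    n = len(q)
    INF = float("inf")
    gamma = [[INF] * (n + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        # only visit cells with |i - j| <= r
        for j in range(max(1, i - r), min(n, i + r) + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            gamma[i][j] = d + min(gamma[i - 1][j - 1],
                                  gamma[i - 1][j],
                                  gamma[i][j - 1])
    return gamma[n][n] ** 0.5
```

With r = 0 only the diagonal is visited, so the measure degenerates to the Euclidean distance; with r = n it is unconstrained DTW.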

Is a Wider Band Always Better?

Euclidean distance = 2.4836
DTW dist = 1.6389 (R = 1)
DTW dist = 1.0204 (R = 10)
DTW dist = 1.0204 (R = 25)

The alignments for R = 10 and R = 25 are identical: widening the band beyond R = 10 changes nothing here.

Wider Isn’t Always Better

[Figure: CPU time (msec) vs. warping window size for the auslan, gun, digit, trace, and wordspotting datasets]

A larger warping window is not always a good thing.

[Figure: accuracy (%) vs. warping window size for the auslan, gun, digit, trace, and wordspotting datasets, with the Euclidean baseline marked]

Recall this example: most accuracies peak at a small window size.

Ratanamahatana-Keogh Band (R-K Band)

Solution: we create a band of arbitrary shape and size that is appropriate for the data we want to classify.

How Many Bands Do We Need?

• Of course, we could use one and the same band to classify all the classes, as almost all researchers do.

• But the width of the band depends on the characteristics of the data within each class. A single band for classification is unlikely to generalize.

• Our proposed solution: we create an arbitrary band (R-K band) for each class and use it accordingly for classification.
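Per-class bands only change the constraint from a single constant to a per-position width R[i]. A simplified sketch that treats the band as symmetric about the diagonal (the R-K band itself need not be; this is our simplification for illustration):

```python
def dtw_rk(q, c, R):
    """DTW under a per-position band: at row i, warping may stray
    at most R[i-1] cells from the diagonal. A symmetric, simplified
    reading of the R-K band idea."""
    n = len(q)
    INF = float("inf")
    gamma = [[INF] * (n + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        r = R[i - 1]  # band width at position i
        for j in range(max(1, i - r), min(n, i + r) + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            gamma[i][j] = d + min(gamma[i - 1][j - 1],
                                  gamma[i - 1][j],
                                  gamma[i][j - 1])
    return gamma[n][n] ** 0.5
```

For classification, each class gets its own R: an unknown instance is compared to the training instances of class k using class k's band, and the nearest neighbor overall decides the label.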

How Do We Create an R-K Band?

First attempt: we could look at the data and manually create the shape of the bands (then adjust the width of each band until we get a good result).

[Figure: two manually created bands and the resulting alignments on example series]

100% accuracy!

Learning an R-K Band Automatically

[Figure: automatically learned bands and the resulting alignments]

Our heuristic search algorithm automatically learns the bands from the data. (Sometimes we even get an unintuitive shape that gives a good result.)

100% accuracy as well!

R-K Band Learning With Heuristic Search

[Flowchart: calculate h(1); modify the band and calculate h(2); branch on whether h(2) > h(1)]
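The flowchart's idea can be sketched as a greedy forward search: widen the band one position at a time and keep a change only when the heuristic improves. This is an illustrative sketch under our own simplifications, not the paper's exact algorithm; `evaluate` stands in for the heuristic h (e.g., leave-one-out accuracy with the candidate band):

```python
def learn_band(n, evaluate):
    """Greedy hill-climbing over per-position band widths.
    `evaluate(R)` is assumed to return a heuristic score h(R);
    higher is better."""
    R = [0] * n               # start from the Euclidean band
    best = evaluate(R)        # h(1): score of the current band
    improved = True
    while improved:
        improved = False
        for i in range(n):    # try widening each position by one
            R[i] += 1
            score = evaluate(R)   # h(2): score after the change
            if score > best:      # keep the change only if it helps
                best = score
                improved = True
            else:
                R[i] -= 1         # revert
    return R
```

A real implementation would also try narrowing, search over segments rather than single cells, and guard against overfitting; this sketch only shows the accept-if-better loop.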

R-K Band Learning in Action!


Experiment: Datasets

1. Gun Problem
2. Trace (transient classification benchmark)
3. Handwritten Word Spotting data

[Figure: example instances from each of the three datasets]

Experimental Design

We measure the accuracy and CPU time on each dataset using the following methods:

1. Euclidean distance
2. Uniform warping window (sizes 1 to 100)
3. Learning a different R-K band for each class, and performing classification based on them

Leave-one-out 1-nearest-neighbor classification is used to measure the accuracy.

The lower bounding method is also used to prune off unnecessary calculations of DTW.
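The evaluation protocol above can be sketched as follows, with `dist` standing in for any of the three distance measures:

```python
def loo_accuracy(series, labels, dist):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier:
    each instance is classified by its nearest neighbor among all
    the remaining instances."""
    correct = 0
    for i, q in enumerate(series):
        # nearest neighbor among everything except instance i
        j = min((k for k in range(len(series)) if k != i),
                key=lambda k: dist(q, series[k]))
        if labels[j] == labels[i]:
            correct += 1
    return correct / len(series)
```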

Experimental Results (I)

[Figure: example Gun Draw and Point instances with their warping matrices and learned bands]

                   Euclidean   Best Unif. Warping   10% Unif. Warping   DTW with R-K Band
Error Rate (%)     5.5         1.0 (width = 4)      4.5 (width = 15)    0.5 (max width = 4)
CPU Time (msec)    N/A         2,440                5,430               1,440
CPU Time (no LB)   60          11,820               17,290              9,440

Experimental Results (II)

[Figure: example instances from each class with their warping matrices and learned bands]

                   Euclidean   Best Unif. Warping   10% Unif. Warping   DTW with R-K Band
Error Rate (%)     11          0 (width = 8)        0 (width = 27)      0 (max width = 7)
CPU Time (msec)    N/A         16,020               34,980              7,420
CPU Time (no LB)   210         144,470              185,460             88,630

Conclusions

• Different shapes and widths of the band contribute to the classification accuracy.

• Each class can be better recognized using its own individual R-K band.

• The heuristic search algorithm is a good approach to R-K band learning.

• Combining the R-K band with the lower-bounding technique yields higher accuracy and makes the classification task much faster.
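The lower-bounding pruning referred to above is commonly implemented with an envelope bound in the style of LB_Keogh (our assumption here, for illustration): build an upper/lower envelope of the candidate over the warping window, sum how far the query falls outside it, and skip the full DTW computation whenever this cheap bound already exceeds the best distance found so far.

```python
def lb_keogh(q, c, r):
    """Envelope-style lower bound on band-constrained DTW with
    half-width r. It never exceeds the true DTW distance, so it
    can safely prune candidates in 1-NN search."""
    n = len(q)
    total = 0.0
    for i in range(n):
        window = c[max(0, i - r):min(n, i + r + 1)]
        lo, hi = min(window), max(window)   # envelope at position i
        if q[i] > hi:                       # query above the envelope
            total += (q[i] - hi) ** 2
        elif q[i] < lo:                     # query below the envelope
            total += (q[i] - lo) ** 2
    return total ** 0.5
```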

Future Work

• Investigate other choices that may make envelope learning more accurate:
  – heuristic functions
  – search algorithm (refining the search)
• Is there a way to always guarantee an optimal solution?
• Examine the best way to deal with multi-variate time series.
• Consider a more generalized form of our framework, i.e. a single R-K band learned for a particular domain.
• Explore the utility of the R-K band on real-world problems: music, bioinformatics, biomedical data, etc.

All datasets are publicly available at the UCR Time Series Data Mining Archive: http://www.cs.ucr.edu/~eamonn/TSDMA

Contact: [email protected], [email protected]

Homepage: http://www.cs.ucr.edu/~ratana