
Exact Indexing of Dynamic Time Warping Dr Eamonn Keogh University of California – Riverside Computer Science & Engineering Department


Page 1

Exact Indexing of Dynamic Time Warping

Dr Eamonn Keogh

University of California – Riverside
Computer Science & Engineering Department

Page 2

What are Time Series?

• A time series is a collection of observations made sequentially in time.

• Useful information can be obtained by measuring time series data over time.

• Time series occur in virtually every medical, scientific and business domain.

• Measuring the similarity between two time series is at the heart of many time series data mining applications.

[Figure: patterns that are commonly classified]

Page 3

What are the challenges of working with Time Series data?

• Subjective notion of similarity
  How do we define similarity?

• Large amounts of data
• Different types of data format
  How do we search quickly?

Any solutions available?

• We need a method that allows an elastic shifting of the time axis, to accommodate sequences which are similar but out of phase.

• Euclidean Distance
- The most popular approach for defining similarity and indexing time series data.
- A very brittle distance measure: it cannot match two time series accurately when they are out of phase.

• Dynamic Time Warping
- Based on dynamic programming, which has proved to be a very reliable method.
- Does not obey the triangular inequality. This has resisted attempts at exact indexing.
- “… performance on large database may be a limitation.”

Page 4

What are the challenges of working with Time Series data? cont.

[Figure: Time Series A and Time Series B]

Classification experiment on the Cylinder-Bell-Funnel dataset
• Training data consists of 10 exemplars from each class.
• (One) Nearest Neighbor Algorithm.
• “Leave-one-out” evaluation, averaged over 100 runs.

Comparison of two approaches

                              mean error rate    running time
Euclidean Distance Metric     0.2734             X
Dynamic Time Warping (DTW)    0.0269             230X

• The result demonstrates the reliability of DTW, which is far more accurate but takes about 230 times as long, and motivates the need for a technique to index DTW.
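The evaluation protocol on this slide (one-nearest-neighbor classification scored by leave-one-out) can be sketched as follows. This is a minimal illustration only: the function names `euclidean` and `loo_error_rate` and the toy data are hypothetical, and the Cylinder-Bell-Funnel data itself is not reproduced.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def loo_error_rate(series, labels, distance):
    """Leave-one-out error of a 1-nearest-neighbor classifier.

    Each series is held out in turn and labeled with the class of its
    nearest neighbor among the remaining series.
    """
    errors = 0
    for i, query in enumerate(series):
        others = (j for j in range(len(series)) if j != i)
        nearest = min(others, key=lambda j: distance(query, series[j]))
        if labels[nearest] != labels[i]:
            errors += 1
    return errors / len(series)


# Hypothetical toy data: two well-separated classes.
toy_series = [[0, 0, 1], [0, 0, 2], [5, 5, 5], [5, 6, 5]]
toy_labels = ["a", "a", "b", "b"]
print(loo_error_rate(toy_series, toy_labels, euclidean))  # 0.0
```

Because the distance is passed in as a parameter, the same routine can score any other distance function, such as a DTW implementation, on the same data.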

Page 5

What is Dynamic Time Warping?

[Figure: two series in different time phase; shifting of the time axis]

• DTW is used in many areas, such as chemical engineering, pattern matching, bioinformatics, ...

What is Time Warping?

Given: two sequences $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_m$

Wanted: an alignment of the two sequences on a common time axis

Aligning time series with Dynamic Programming Matrix

• For two time series Q and C, of length n and m respectively, an n-by-m matrix is constructed to store the distances between the items of Q and C.

[Figure: the resulting alignment and the optimal warping path]

Page 6

What is Dynamic Time Warping? cont.

• There are three basic constraints for time warping:
- Boundary conditions: the path must not skip a part at the beginning or end of the utterance.
- Continuity: no jumps.
- Monotonicity: we can't go back in time.

• In the matrix, there are many warping paths that satisfy the three basic constraints.

Goal: How can we find a path that gives the minimal overall distance?

Formula of dynamic programming, where $\gamma(i,j)$ is the cumulative distance of the best path ending at cell $(i,j)$:

$\gamma(i,j) = d(q_i, c_j) + \min\{\gamma(i-1,j-1),\ \gamma(i-1,j),\ \gamma(i,j-1)\}$

Demonstration of computing the Minimal Editing Distance: http://isl.ira.uka.de/speechCourse/slides/dtw/editdist/applet/applet.html
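The dynamic programming recurrence and the three constraints on this slide can be sketched in a few lines. This is an illustrative version, not the author's code: the name `dtw_distance` is hypothetical, the point cost d is assumed to be the squared difference, and the final square root is an assumed convention.

```python
import math


def dtw_distance(q, c):
    """Unconstrained DTW distance between sequences q and c."""
    n, m = len(q), len(c)
    # gamma[i][j] is the cumulative cost of the best path ending at (i, j),
    # using 1-based indices; row 0 and column 0 are padding.
    gamma = [[math.inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0  # boundary condition: the path starts at the first pair
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            # Continuity and monotonicity: only the three neighboring
            # predecessors (diagonal, up, left) are allowed.
            gamma[i][j] = d + min(gamma[i - 1][j - 1],
                                  gamma[i - 1][j],
                                  gamma[i][j - 1])
    # Boundary condition: the path ends at the last pair.
    return math.sqrt(gamma[n][m])


q = [0, 0, 1, 2, 1, 0, 0]
c = [0, 1, 2, 1, 0, 0, 0]  # the same bump, one step earlier
euclid = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))
print(dtw_distance(q, c))  # 0.0
print(euclid)              # 2.0
```

The two toy sequences differ only in phase, so DTW aligns them at zero cost while the Euclidean distance reports a difference.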

Page 7

What are Global Path Constraints?
- The path should stay close to the diagonal.
- In theory, they limit how far the warping path may stray from the diagonal.
- In practice, they constrain the range of indices in the warping path.

Why use global constraints?
- To speed up the DTW distance calculation (for a narrow, fixed-width window the search effort drops from $O(n^2)$ towards $O(n)$).
- To prevent a relatively small section of one sequence from mapping onto a relatively large section of the other.
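A global constraint can be added to the dynamic programming loop by visiting only cells near the diagonal. The sketch below assumes a fixed-width warping window (commonly called a Sakoe-Chiba band) with half-width `r` and equal-length sequences; the name `dtw_band` and the parameter `r` are illustrative choices, not taken from the slides.

```python
import math


def dtw_band(q, c, r):
    """DTW restricted to cells with |i - j| <= r, for equal-length inputs."""
    n = len(q)
    if len(c) != n:
        raise ValueError("this sketch assumes equal-length sequences")
    # The full matrix is allocated for clarity; only about n * (2r + 1)
    # cells are ever filled in.
    gamma = [[math.inf] * (n + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        # Only columns inside the warping window are visited.
        for j in range(max(1, i - r), min(n, i + r) + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            gamma[i][j] = d + min(gamma[i - 1][j - 1],
                                  gamma[i - 1][j],
                                  gamma[i][j - 1])
    return math.sqrt(gamma[n][n])


q = [0, 0, 1, 2, 1, 0, 0]
c = [0, 1, 2, 1, 0, 0, 0]
print(dtw_band(q, c, 0))  # 2.0, a zero-width window reduces to Euclidean distance
print(dtw_band(q, c, 1))  # 0.0, one step of warping absorbs the phase shift
```

With `r = 0` the only admissible path is the diagonal, and with `r = n - 1` the window covers the whole matrix, so the parameter interpolates between Euclidean distance and unconstrained DTW.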

How to speed up the calculation of DTW?
Basic idea: Approximate the time series with some compressed or downsampled representation, and do DTW on the new representation.

Solution: Lower Bounding Measure with Global Path Constraint

[Figure: warping matrix of Q and C showing the warping path $w_1, \ldots, w_k$, a cell $(i, j)$ and the warping window]

Page 8

How to speed up the calculation of DTW? cont.

What is a Lower Bounding Measure?
- A cheap-to-compute function, built on a dimensionality reduction technique, whose value never exceeds the true distance.

Why use a Lower Bounding Measure?
- Similarity search under either the Euclidean metric or DTW is heavily I/O bound or very demanding in terms of CPU time.
- A fast lower bounding function can address this problem by pruning sequences that could not possibly be a best match.

How to define a good Lower Bounding Measure?
A good lower bounding function should meet two criteria:
- It must be fast to compute.
- It must produce a relatively tight lower bound, that is, one that closely approximates the true DTW distance.

Two existing types of lower bounding measure

[LB_Kim]: the squared difference between the two series' first (A), last (D), minimum (B) and maximum (C) points.

[LB_Yi]: the sum of the squared lengths of the gray lines, which represent the minimum contribution of the corresponding points to the overall DTW distance.
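The two earlier lower bounds can be sketched roughly as below. Several details are assumptions rather than statements from the slides: `lb_kim` takes the largest of the four feature differences, `lb_yi` measures how far points of one sequence fall outside the value range of the other, and both return a square-rooted value so that they are comparable with a DTW distance defined with a final square root.

```python
import math


def lb_kim(q, c):
    """Largest absolute difference among the first, last, min and max features.

    This equals the square root of the largest squared feature difference.
    """
    features_q = (q[0], q[-1], min(q), max(q))
    features_c = (c[0], c[-1], min(c), max(c))
    return max(abs(a - b) for a, b in zip(features_q, features_c))


def lb_yi(q, c):
    """Root of the summed squared excursions of q outside [min(c), max(c)]."""
    lo, hi = min(c), max(c)
    total = 0.0
    for x in q:
        if x > hi:
            total += (x - hi) ** 2  # point above every value in c
        elif x < lo:
            total += (lo - x) ** 2  # point below every value in c
    return math.sqrt(total)


q = [0, 5, 0]
c = [1, 2, 1]
print(lb_kim(q, c))  # 3
print(lb_yi(q, c))
```

Both functions need only a single pass over the data, which is what makes them usable as cheap filters before the full DTW computation.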

Page 9

How to speed up the calculation of DTW? cont.

Proposed lower bounding measure: LB_Keogh with global constraint

Notation
A: bounding envelope - Sakoe-Chiba Band (global constraint)
B: bounding envelope - Itakura Parallelogram (global constraint)
Q: original sequence   U: Upper   L: Lower

1. Limit the range of the warping path by using a global constraint.

2. Approximate the tightest lower bound by using LB_Keogh.

[LB_Keogh]: the sum of the squared distances from every part of the candidate sequence C that does not fall within the bounding envelope to the nearest orthogonal edge of the bounding envelope is returned as the lower bound.

The Itakura Parallelogram and LB_Keogh together produce the tightest bounds.

LB_Keogh lower bound <= DTW distance
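The envelope construction and the LB_Keogh bound described on this slide can be sketched as follows. This is a minimal version under stated assumptions: the global constraint is a fixed-width window of half-width `r`, both sequences have the same length, and the bound is square-rooted to match a DTW distance defined with a final square root. The function names are illustrative.

```python
import math


def envelope(q, r):
    """Upper (U) and lower (L) envelopes of q for a window of half-width r."""
    n = len(q)
    upper = [max(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    lower = [min(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    return upper, lower


def lb_keogh(q, c, r):
    """Lower bound on the window-constrained DTW distance between q and c."""
    upper, lower = envelope(q, r)
    total = 0.0
    for x, u, l in zip(c, upper, lower):
        if x > u:
            total += (x - u) ** 2  # candidate point above the envelope
        elif x < l:
            total += (x - l) ** 2  # candidate point below the envelope
        # points inside the envelope contribute nothing
    return math.sqrt(total)


q = [0, 0, 1, 2, 1, 0, 0]
c = [0, 1, 2, 1, 0, 0, 0]
print(envelope(q, 1))     # ([0, 1, 2, 2, 2, 1, 0], [0, 0, 0, 1, 0, 0, 0])
print(lb_keogh(q, c, 1))  # 0.0
print(lb_keogh(q, c, 0))  # 2.0
```

With `r = 0` the envelope collapses onto Q and the bound equals the Euclidean distance; widening the window loosens the envelope, so the bound can only shrink.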

Page 10

How to index Dynamic Time Warping?

Piecewise Aggregate Approximation (PAA)
Basic idea: Represent the time series as a sequence of box basis functions, each box having the same length.

• To reduce a time series from n dimensions to N dimensions, the data is divided into N equal-sized “frames” and each frame is represented by its mean value.

[Figure: a sequence of length 256 is reduced to 16 dimensions]

Why use PAA?

• Time series data may include hundreds to thousands of items, which rapidly degrades indexing performance.

• A 16-dimensional time series can be handled reasonably well by a multi-dimensional index structure.

• A way is needed to further reduce the dimensionality of the LB_Keogh lower bound.

• PAA is the most efficient technique among the alternatives considered (Wavelets, Fourier Transforms, Adaptive Piecewise Constant Approximation).
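The frame-averaging idea behind PAA fits in a few lines. The sketch below assumes that the series length n is an exact multiple of the number of frames N and that each frame is summarized by its mean; the function name `paa` is illustrative.

```python
def paa(series, n_frames):
    """Reduce a series of length n to n_frames values by averaging each frame."""
    n = len(series)
    if n % n_frames != 0:
        raise ValueError("this sketch assumes n is a multiple of the frame count")
    width = n // n_frames  # number of points per frame
    return [sum(series[i * width:(i + 1) * width]) / width
            for i in range(n_frames)]


print(paa([1, 3, 5, 7, 2, 2, 0, 4], 4))  # [2.0, 6.0, 2.0, 2.0]
print(len(paa(list(range(256)), 16)))    # 16
```

The second call mirrors the slide's example of reducing a sequence of length 256 to 16 dimensions.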

Page 11

How to index Dynamic Time Warping? cont.

Modified PAA to index time warped queries [LB_PAA]

• There are two time series (Q and C) of length n; both are reduced to N dimensions. C is a candidate sequence. Q is a query sequence.

• Approximate the minimum bounding rectangle (R) in each dimension of the candidate sequence C.

• Approximate the maximum ($\hat{U}$) and minimum ($\hat{L}$) point in each dimension of the query sequence Q by using LB_PAA.
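One way to combine the query envelope with PAA is sketched below. It assumes the usual formulation in which each frame of the upper envelope is summarized by its maximum, each frame of the lower envelope by its minimum, the candidate by its frame means, and each frame's contribution is weighted by the frame width n/N. The helper names and this exact form are assumptions for illustration, not a transcription of the slides.

```python
import math


def envelope(q, r):
    """Upper and lower envelopes of q for a window of half-width r."""
    n = len(q)
    upper = [max(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    lower = [min(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    return upper, lower


def lb_paa(q, c, r, n_frames):
    """Reduced-dimension lower bound; assumes len(q) == len(c) == k * n_frames."""
    width = len(q) // n_frames
    upper, lower = envelope(q, r)
    total = 0.0
    for i in range(n_frames):
        frame = slice(i * width, (i + 1) * width)
        u_hat = max(upper[frame])     # highest envelope value in the frame
        l_hat = min(lower[frame])     # lowest envelope value in the frame
        c_bar = sum(c[frame]) / width  # PAA value of the candidate
        if c_bar > u_hat:
            total += width * (c_bar - u_hat) ** 2
        elif c_bar < l_hat:
            total += width * (c_bar - l_hat) ** 2
    return math.sqrt(total)


q = [0, 0, 1, 2, 1, 0, 0, 0]
print(lb_paa(q, [3] * 8, 1, 4))
print(lb_paa(q, q, 1, 4))  # 0.0
```

For the first call the result is the square root of 30, which stays below the full-resolution envelope bound (square root of 38) for the same pair, as a reduced-dimension bound should.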

Page 12

How to index Dynamic Time Warping? cont.

Modified PAA to index time warped queries

• Define a MINDIST(Q,R) function that returns a lower bounding measure of the distance between a query Q and R, where R is a Minimum Bounding Rectangle (MBR) of C.

[Figure: the original dimension and the reduced dimension, with MBR bounds $l_4$, $l_5$, $h_2$ and query envelope values $\hat{L}_2$, $\hat{U}_4$, $\hat{U}_5$]
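A MINDIST-style function can be sketched along the same lines as the reduced-dimension bound. The version below is an assumption-laden illustration: the MBR is given by per-dimension lower and upper corners (`mbr_low`, `mbr_high`), the query by its reduced envelope (`u_hat`, `l_hat`), and each dimension is weighted by n/N. All names are hypothetical.

```python
import math


def mindist(u_hat, l_hat, mbr_low, mbr_high, n):
    """Lower bound on the distance from a query envelope to anything in an MBR.

    u_hat, l_hat: reduced upper and lower envelope of the query (length N)
    mbr_low, mbr_high: lower and upper corner of the rectangle (length N)
    n: length of the original, unreduced sequences
    """
    width = n / len(u_hat)  # original points represented by each dimension
    total = 0.0
    for u, l, lo, hi in zip(u_hat, l_hat, mbr_low, mbr_high):
        if lo > u:
            total += width * (lo - u) ** 2  # rectangle lies above the envelope
        elif hi < l:
            total += width * (l - hi) ** 2  # rectangle lies below the envelope
        # overlapping ranges contribute nothing
    return math.sqrt(total)


print(mindist([1, 2], [0, 0], [3, 1], [4, 3], n=4))
print(mindist([1, 2], [0, 0], [0, 1], [1, 3], n=4))  # 0.0
```

In the first call only the first dimension lies outside the envelope, so the result is the square root of 2 * (3 - 1)^2 = 8.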

Page 13

How to search time series with DTW?

K-Nearest Neighbor Search Algorithm

What is K-NN Search - KNNSearch(Q,K)?

• Takes a query sequence Q and the desired number K of nearest-neighbor time series from a set C.

• A priority queue stores the index entries in increasing order of distance.

RangeSearch Algorithm

What is the RangeSearch Algorithm - RangeSearch(Q,E,T)?

• Answers range queries.

• A classic R-tree-style recursive search algorithm.
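The role of the priority queue can be illustrated with a simplified K-nearest-neighbor search over a flat list of candidates. This is not the R-tree based algorithm from the slides: it is a sketch that orders candidates by an envelope-based lower bound, computes the true window-constrained DTW distance only when needed, and stops once no remaining lower bound can beat the current K-th best. All names are illustrative.

```python
import heapq
import math


def dtw_band(q, c, r):
    """DTW restricted to |i - j| <= r, for equal-length sequences."""
    n = len(q)
    gamma = [[math.inf] * (n + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(n, i + r) + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            gamma[i][j] = d + min(gamma[i - 1][j - 1], gamma[i - 1][j], gamma[i][j - 1])
    return math.sqrt(gamma[n][n])


def lb_keogh(q, c, r):
    """Envelope-based lower bound on dtw_band(q, c, r)."""
    n = len(q)
    total = 0.0
    for i, x in enumerate(c):
        window = q[max(0, i - r):min(n, i + r + 1)]
        u, l = max(window), min(window)
        if x > u:
            total += (x - u) ** 2
        elif x < l:
            total += (x - l) ** 2
    return math.sqrt(total)


def knn_search(query, candidates, k, r):
    """Return the k nearest candidates as sorted (distance, index) pairs."""
    # Min-heap of (lower bound, index): most promising candidates come first.
    queue = [(lb_keogh(query, c, r), idx) for idx, c in enumerate(candidates)]
    heapq.heapify(queue)
    best = []  # max-heap (negated distances) holding the k best so far
    while queue:
        lb, idx = heapq.heappop(queue)
        if len(best) == k and lb >= -best[0][0]:
            break  # every remaining candidate is at least this far away
        dist = dtw_band(query, candidates[idx], r)
        if len(best) < k:
            heapq.heappush(best, (-dist, idx))
        elif dist < -best[0][0]:
            heapq.heapreplace(best, (-dist, idx))
    return sorted((-neg, idx) for neg, idx in best)


query = [0, 0, 1, 2, 1, 0, 0]
database = [[3] * 7, [0] * 7, [0, 1, 2, 1, 0, 0, 0]]
result = knn_search(query, database, k=2, r=1)
print(result)
```

On this toy database the first candidate is never passed to `dtw_band`: its lower bound already exceeds the second-best true distance, which is the pruning effect that lower bounding is meant to deliver.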

Page 14

Experimental Evaluation

Evaluation among three lower bounding measures (LB_Kim, LB_Yi, LB_Keogh)

• Comparing tightness of lower bound against query length

• Comparing pruning power against database size

Evaluation between linear scan and LB_Keogh

• Comparing normalized CPU cost against data size

Page 15

Conclusion

• This paper overturns the traditional belief that "dynamic time warping ... cannot be speeded up by indexing".

• However, it is based on two assumptions:
- Both time series have the same length (being out of phase is allowed).
- Sequences are indexed with the warping path constrained (Boundary conditions, Continuity, Monotonicity, Global constraint).

• The proposed approach is state of the art in terms of efficiency and flexibility. It may benefit the matching of 2 and 3 dimensional shapes.

Page 16

Acknowledgements

Dr Eamonn Keogh (Computer Science & Engineering Department, University of California – Riverside, Riverside, CA 92521)
- Exact Indexing of Dynamic Time Warping
- A Tutorial on Indexing and Mining Time Series Data
  http://www.cs.ucr.edu/~eamonn/tutorial_on_time_series.ppt

Carnegie Mellon University
- Automatic Speech Recognition
  http://werner.ira.uka.de/speechCourse