Visually Mining and Monitoring Massive Time Series Amy Karlson V. Shiv Naga Prasad 15 February 2004 CMSC 838S Images courtesy of Jessica Lin and Eamonn Keogh Lin, J., Keogh, E., Lonardi, S., Lankford, J.P. and Nystrom, D.M. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004.


Visually Mining and Monitoring Massive Time Series

Amy Karlson
V. Shiv Naga Prasad
15 February 2004

CMSC 838S

Images courtesy of Jessica Lin and Eamonn Keogh

Lin, J., Keogh, E., Lonardi, S., Lankford, J.P. and Nystrom, D.M. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004.


What are Time Series?

Simply: observations of a variable made over time

Typical across a wide variety of domains:
  Medicine
  Physiology
  Finance
  Microbiology
  Meteorology
  Surveillance


Motivation: Critical Decision Making Domains

Spacecraft Launch
Medicine

Research Directions
  Mining Archives
    Extract rules, patterns, regularities
  Visualizing Streams

Novel visualization and interaction for:
  Query by content
  Motif discovery
  Anomaly detection


Some Visual Time Series Systems

Time Searcher: Direct Manipulation Pattern Query (Hochheiser and Shneiderman)
Theme Rivers: Theme strength over time (Havre, Hetzler, Whitney & Nowell, InfoVis 2000)
Spirals: Periodic data with known period (Weber et al.)

[Figure label: dot.com stocks, 1999-2002]


VizTree

Construct a subsequence tree to span the space of subsequences of a given time series.
Use this to collect statistics about the series.
Size of the structure is independent of the length of the series.
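
The size claim can be checked with a short count. This is a sketch that assumes a full tree whose branching factor is the alphabet size and whose depth is the number of segments per window; the symbols a (alphabet size), w (segments per window), and N (node count) are introduced here for illustration and do not appear on the slides.

```latex
N(a, w) \;=\; \sum_{d=0}^{w} a^{d} \;=\; \frac{a^{w+1} - 1}{a - 1}, \qquad a > 1
% Example: a = 3, w = 2 gives N = 1 + 3 + 9 = 13 nodes.
% Example: a = 4, w = 8 gives N = (4^9 - 1) / 3 = 87381 nodes.
```

The length of the time series does not appear in the expression; a longer series only changes the counts stored on the branches.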


VizTree Approach - Overview

Place windows along the time series to obtain subsequences.
Quantize along the time and value dimensions to obtain sequences of discrete symbols.
Construct a subsequence tree to represent all possible such sequences.
Collect frequencies of traversal of the branches of the subsequence tree.
Use these for motif and anomaly detection, and for comparing time series.


Subsequences

Place windows along the time series to obtain subsequences.
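
A minimal sketch of the windowing step, assuming a fixed window length and a configurable step between window starts. The function name `sliding_windows` and its parameters are hypothetical, not taken from the VizTree tool.

```python
from typing import List, Sequence


def sliding_windows(series: Sequence[float], window: int, step: int = 1) -> List[List[float]]:
    """Return every length-`window` subsequence, moving `step` points at a time."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # The last valid start index is len(series) - window.
    return [list(series[i:i + window])
            for i in range(0, len(series) - window + 1, step)]


series = [1.0, 2.0, 3.0, 4.0, 5.0]
subsequences = sliding_windows(series, window=3)
```

With `step=1` adjacent windows overlap heavily; a larger step trades coverage for fewer subsequences.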


Discretization

Subsequences are patterns.
Take windows along time series
  – length of window ~ length of subsequence.
Discretize the range of data - one symbol for each quantum.
Divide window into segments ~ represent one segment with one symbol.
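
The time-axis half of this step can be sketched as averaging each segment of a window, so that one number (and later one symbol) stands for the whole segment. The name `segment_means` is hypothetical, and the sketch assumes the window length is an exact multiple of the number of segments.

```python
from typing import List, Sequence


def segment_means(window: Sequence[float], n_segments: int) -> List[float]:
    """Split the window into n_segments equal parts and average each part."""
    if n_segments <= 0 or len(window) % n_segments != 0:
        raise ValueError("this sketch assumes len(window) is a multiple of n_segments")
    size = len(window) // n_segments
    return [sum(window[i:i + size]) / size for i in range(0, len(window), size)]


means = segment_means([1.0, 3.0, 2.0, 4.0, 10.0, 12.0], n_segments=3)
```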


Symbolic Aggregate approXimation (SAX)

[Figure: one subsequence divided into segments, with quantization levels and a representative symbol for each segment]

Discrete version = acdcbdba
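
A minimal sketch of turning one window into a SAX word. It assumes the usual SAX recipe of z-normalizing the window, averaging each segment, and cutting the value axis at breakpoints that split a standard normal distribution into equally likely regions. The function name `sax_word` and its parameters are hypothetical; the slides do not give the data behind the example word, so no attempt is made to reproduce it.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev
from typing import Sequence


def sax_word(window: Sequence[float], n_segments: int, alphabet: str = "abcd") -> str:
    """Discretize one window into a word with one symbol per segment."""
    if n_segments <= 0 or len(window) % n_segments != 0:
        raise ValueError("this sketch assumes len(window) is a multiple of n_segments")
    # z-normalize so that the breakpoints below apply to any value range.
    mu, sigma = mean(window), pstdev(window)
    normalized = [0.0 if sigma == 0 else (x - mu) / sigma for x in window]
    # One mean per segment (quantization along time).
    size = len(window) // n_segments
    means = [sum(normalized[i:i + size]) / size for i in range(0, len(window), size)]
    # Equiprobable breakpoints under a standard normal (quantization along value).
    a = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(k / a) for k in range(1, a)]
    return "".join(alphabet[bisect_right(breakpoints, m)] for m in means)


word = sax_word([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], n_segments=4)
```

A four-letter alphabet with eight segments per window would yield eight-letter words over a to d, the same shape as the word shown on the slide.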


Subsequence Tree - example

[Figure: subsequence tree for symbols = {a, b, c} and # segments per window = 2; each of the root branches a, b, c splits again into a, b, c]

Tree spans the space of subsequences.
Branch factor ~ # symbols (size of alphabet)
Depth ~ # segments per window
Branch thickness ~ freq. of occurrence of subsequence.
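
The tree in the example can be sketched as a flat map from each prefix to the number of times that branch is traversed. This is an illustrative data structure, not the tool's implementation; the names `build_subsequence_tree` and `count_traversals` are hypothetical, and every word is assumed to have exactly `depth` symbols drawn from the alphabet.

```python
from itertools import product
from typing import Dict, Iterable


def build_subsequence_tree(alphabet: str, depth: int) -> Dict[str, int]:
    """Create the full tree: every prefix of length 0..depth, each with count 0."""
    tree = {"": 0}  # the empty prefix is the root
    for length in range(1, depth + 1):
        for letters in product(alphabet, repeat=length):
            tree["".join(letters)] = 0
    return tree


def count_traversals(tree: Dict[str, int], words: Iterable[str]) -> Dict[str, int]:
    """Add one to every branch on the path from the root to each word's leaf."""
    for word in words:
        for end in range(len(word) + 1):
            tree[word[:end]] += 1
    return tree


tree = build_subsequence_tree("abc", depth=2)
count_traversals(tree, ["ab", "ab", "ac", "cb"])
```

Here the tree has 13 nodes (root, 3 branches, 9 leaves) however many words are counted, and a count such as `tree["ab"]` plays the role of branch thickness.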


VizTree Tool

Demo


Query by Content: Subsequence Matching

Finding known patterns

Chunking
  Breaking a time series into individual series
  Methods
    Time (e.g. power usage)
    Shape (e.g. heart beats)

Search Approaches
  Exact - Slow
  Approximate - Fast

Exploration
Hypothesis Testing

[Slide annotations: two "VizTree" labels]
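
The two search approaches can be contrasted in a small sketch. It assumes that approximate search means looking up a discretized word in an index, and that exact search means scanning the raw series with a Euclidean distance threshold; the slides do not spell out either method, and all function names here are hypothetical.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Sequence


def build_word_index(words: Sequence[str]) -> Dict[str, List[int]]:
    """Map each discretized word to the window positions where it occurs."""
    index = defaultdict(list)
    for position, word in enumerate(words):
        index[word].append(position)
    return dict(index)


def approximate_matches(index: Dict[str, List[int]], query_word: str) -> List[int]:
    """Fast: a single dictionary lookup on the discretized query."""
    return index.get(query_word, [])


def exact_matches(series: Sequence[float], query: Sequence[float], threshold: float) -> List[int]:
    """Slow: compare the query against every raw subsequence of the same length."""
    m = len(query)
    return [i for i in range(len(series) - m + 1)
            if dist(series[i:i + m], query) <= threshold]


index = build_word_index(["ab", "cc", "ab"])
approx = approximate_matches(index, "ab")
exact = exact_matches([0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 2.0], [0.0, 1.0, 2.0], threshold=0.0)
```

The lookup cost does not depend on the series length once the index exists, while the scan visits every window.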


Motif Discovery

Finding unknown patterns
Not exact matches
VizTree allows exploration at varying levels of precision
  E.g., cc** vs. ccac

[Figure: Winding Dataset (the angular speed of reel 2), with three subsequences labeled A, B, and C]
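
Exploring at different levels of precision can be sketched as pattern matching over discretized words. This assumes each `*` stands for exactly one symbol, so that cc** selects every four-symbol word under the branch cc; the function names and the sample words are made up for illustration.

```python
from typing import Iterable


def matches(pattern: str, word: str, wildcard: str = "*") -> bool:
    """True if the word agrees with the pattern at every non-wildcard position."""
    return len(pattern) == len(word) and all(
        p == wildcard or p == c for p, c in zip(pattern, word)
    )


def motif_count(words: Iterable[str], pattern: str) -> int:
    """Count how many words fall under the given (possibly partial) pattern."""
    return sum(1 for word in words if matches(pattern, word))


words = ["ccac", "ccab", "ccac", "abcd", "ccdd"]
coarse = motif_count(words, "cc**")  # everything under the branch "cc"
fine = motif_count(words, "ccac")    # one specific leaf
```

A coarse pattern groups near matches together, and a fully specified word narrows the selection to subsequences that discretize identically.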


Anomaly Detection

Finding abnormal patterns.
Use data already seen to identify anomalies
Identified by thin branches
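
In the tool, thin branches are spotted by eye. An automated stand-in, offered only as a sketch, is to flag windows whose discretized word occurs at most a few times; the name `anomaly_positions` and the `max_count` threshold are assumptions, not part of VizTree.

```python
from collections import Counter
from typing import List, Sequence


def anomaly_positions(words: Sequence[str], max_count: int = 1) -> List[int]:
    """Return the positions of windows whose word is rare in the data seen so far."""
    counts = Counter(words)
    return [i for i, word in enumerate(words) if counts[word] <= max_count]


words = ["ab", "ab", "ab", "ca", "ab", "ab"]
rare = anomaly_positions(words)
```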


Comparing Series: Diff Tree

Same parameters, same tree structure
Compare the test branch frequencies with respect to reference branch frequencies
  Blue = underrepresented
  Green = overrepresented
  Red = equivalent
  Thickness = magnitude
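
A minimal sketch of the comparison, using the color legend from this slide. It assumes that branch frequencies are compared as relative frequencies (so series of different lengths stay comparable) and that only leaf words are compared; both choices, along with the name `diff_tree` and the `tolerance` parameter, are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def diff_tree(reference_words: List[str], test_words: List[str],
              tolerance: float = 1e-9) -> Dict[str, Tuple[str, float]]:
    """Map each word to (color, magnitude) following the slide's legend."""
    if not reference_words or not test_words:
        raise ValueError("both word lists must be non-empty")
    ref, test = Counter(reference_words), Counter(test_words)
    result = {}
    for word in sorted(set(ref) | set(test)):
        # Difference in relative frequency: test series minus reference series.
        delta = test[word] / len(test_words) - ref[word] / len(reference_words)
        if delta > tolerance:
            color = "green"  # overrepresented in the test series
        elif delta < -tolerance:
            color = "blue"   # underrepresented in the test series
        else:
            color = "red"    # equivalent
        result[word] = (color, abs(delta))  # magnitude would drive branch thickness
    return result


diff = diff_tree(["ab", "ab", "ac", "bb"], ["ab", "ac", "ac", "bb"])
```

Because both series are discretized with the same alphabet and segment count, the two word sets index the same tree and can be compared branch by branch.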


Thoughts on VizTree (Vis.)

Most of “discovery” is implicit
  Manual search
  Parameter setting might be an issue
  Automation might help

Tree Visualization
  Use of real estate?
  Effective?
  Intuitive?
  Alternatives?


Thoughts on VizTree (HCI)

Primarily a tool for researchers now
  (Also, we might have an outdated version)

Even so, some HCI suggestions:
  Indication of how tree detail relates to tree overview
  Zoom into a specific area of the time series (rather than zoom+scroll)
  Selection in subsequence detail relates to subsequence overview
  Unfortunately, least interesting patterns are most easily accessed (branches at root)
    “snap to branch” or “snap to intersection”?
  Ability to turn off highlighting (undo)


Summary: Unique Contributions

Fundamental support for aperiodic series
Scalable
  Resource requirements do not grow linearly with the length of the series
Rich visual feature set
  Global summaries
  Diff-trees between multiple series
  Local patterns and anomalies