Visually Mining and Monitoring Massive Time Series Amy Karlson V. Shiv Naga Prasad 15 February 2004...

Preview:

Citation preview

Visually Mining and Monitoring Massive

Time Series

Amy KarlsonV. Shiv Naga Prasad15 February 2004

CMSC 838S

Images courtesy of Jessica Lin and Eamonn Keogh

Lin, J, Keogh, E., Lonardi, S., Lankford, J.P. and Nystrom, D.M.In Proceedings of the 10th ACM SIGKDD International Converence on

Knowledge Discovery and Data Mining, 2004.

What are Time Series? Simply:

Observations of a variable made over time

Typical across a wide variety of domains Medicine Physiology Finance Microbiology Meteorology Surveillance

3

Motivation:Critical Decision

Making Domains

Spacecraft Launch Medicine

Research Directions Mining Archives

Extract rules, patterns, regularities Visualizing Streams

Novel visualization and interaction for: Query by content Motif discovery Anomaly detection

Some Visual Time Series Systems Time Searcher

Direct Manipulation Pattern Query

Theme Rivers Theme strength

over time

Spirals Periodic Data withknown period

dot.com stocks

1999-2002

Havre, Hetzler, Whitney & Nowell InfoVis 2000

Hochheiser and Shniederman

Weber et. al

VizTree Construct a subsequence tree to span the

space of subsequences of a given time series.

Use this to collect statistics about the series.

Size of the structure is independent of the length of the series.

VizTree Approach - Overview Place windows along the time series to

obtain subsequences. Quantize along time and value dimension

to obtain sequences of discrete symbols. Construct a subsequence tree to represent

all possible such sequences. Collect frequencies of traversal of the

branches of the subsequence tree. Use these for motif and anomaly

detection, and for comparing time series.

Subsequences

Place windows along the

time series to obtain subsequences.

Discretization

Subsequences are patterns. Take windows along time series

– length of window ~ length of subsequence.

Discretize the range of data - one symbol for each quantum.

Divide window into segments ~ represent one segment with one symbol.

Symbolic Aggregate approXimation

(SAX)

One subsequence

Quantization levels

Segments

Representative

symbols

Discrete version = acdcbdba

Subsequence Tree - example

a b

a

c

bb

a

c

c b

a

c

symbols={a,b,c}

#segments per window=2

Tree spans the space of subsequences.

#Branch factor ~ # symbols (size of alphabet)

Depth ~ # segments per window

Branch thickness ~ freq. of occurrence of subsequence.

VisTree Tool

Demo

Query by Content:Subsequence

Matching Finding known patterns Chunking

Breaking a time series into individual series Methods

Time (e.g. power usage) Shape (e.g. heart beats)

Search Approaches Exact - Slow Approximate - Fast

Exploration Hypothesis Testing

VizTree

---------

---------------------

VizTree

Motif Discovery Finding unknown patterns Not exact matches VisTree allows exploration at varying

levels of precision E.g., cc** vs. ccac

Winding Dataset (The angular speed of reel 2)

0 500 1000 1500 2000 2500

0 20 40 60 80 100 120 140 0 20 40 60 80 100 120 140 0 20 40 60 80 100 120 140

A B C

A B C

Anomaly Detection Finding abnormal patterns. Use data already seen to identify

anomalies Identified by thin

branches

Comparing Series:Diff Tree

Same parameters same tree structure Compare the test branch frequencies with

respect to reference branch frequencies Blue = underrepresented Green = overrepresented Red = equivalent Thickness = magnitude

Thoughts on VizTree (Vis.) Most of “discovery” is implicit

Manual search Parameter setting might be an issue Automation might help

Tree Visualization Use of real estate? Effective? Intuitive? Alternatives?

Thoughts on VizTree (HCI) Primarily a tool to for researchers now

(Also, we might have an outdated version)

Even so, some HCI suggestions: Indication of how tree detail relates to tree

overview Zoom into a specific area of the time series

(rather than zoom+scroll) Selection in subsequence detail relates to

subsequence overview Unfortunately, least interesting patterns are

most easily accessed (branches at root) “snap to branch” or “snap to intersection” ?

Ability to turn off highlighting (undo)

Summary:Unique

Contributions

Fundamental support for aperiodic series Scalable

Resource requirements do not grow linearly with length series

Rich visual feature set Global summaries Diff-trees between multiple series Local patterns and anomalies

Recommended