Visually Mining and Monitoring Massive Time Series
Amy Karlson, V. Shiv Naga Prasad, 15 February 2004
CMSC 838S
Images courtesy of Jessica Lin and Eamonn Keogh
Lin, J., Keogh, E., Lonardi, S., Lankford, J.P. and Nystrom, D.M. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004.
What Are Time Series?
Simply: observations of a variable made over time.
Typical across a wide variety of domains: medicine, physiology, finance, microbiology, meteorology, surveillance.
Motivation: Critical Decision-Making Domains
Spacecraft launch, medicine
Research Directions
Mining archives: extract rules, patterns, and regularities.
Visualizing streams: novel visualization and interaction for query by content, motif discovery, and anomaly detection.
Some Visual Time Series Systems
TimeSearcher: direct-manipulation pattern query (Hochheiser and Shneiderman)
ThemeRiver: theme strength over time (Havre, Hetzler, Whitney & Nowell, InfoVis 2000)
Spirals: periodic data with a known period (Weber et al.)
[Figure: dot.com stocks, 1999-2002]
VizTree
Construct a subsequence tree that spans the space of subsequences of a given time series.
Use it to collect statistics about the series.
The size of the structure is independent of the length of the series.
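A worked bound makes the size claim concrete. Assuming an alphabet of \(\alpha\) symbols and \(w\) segments per window (notation introduced here for illustration, not taken from the slides), each node has at most \(\alpha\) children, so the tree can never hold more branches than:

```latex
\text{branches at depth } d \;\le\; \alpha^{d},
\qquad
\text{total branches} \;\le\; \sum_{d=1}^{w} \alpha^{d} \;=\; \frac{\alpha\,(\alpha^{w}-1)}{\alpha-1}
\quad (\alpha \ge 2).
```

With α = 3 symbols and w = 2 segments per window, that is at most 3 + 9 = 12 branches regardless of how many points the series has; a longer series only raises the frequency counts stored on those branches.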
VizTree Approach: Overview
1. Place windows along the time series to obtain subsequences.
2. Quantize along the time and value dimensions to obtain sequences of discrete symbols.
3. Construct a subsequence tree to represent all possible such sequences.
4. Collect frequencies of traversal of the branches of the subsequence tree.
5. Use these for motif and anomaly detection, and for comparing time series.
Subsequences
Place windows along the time series to obtain subsequences.
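A minimal sketch of the windowing step. It assumes the window slides one point at a time, which the slides do not specify; the function name `sliding_windows` is hypothetical.

```python
def sliding_windows(series, window):
    """Return every length-`window` subsequence of `series`.

    Assumption: the window advances one point per step, so a series of
    n points yields n - window + 1 overlapping subsequences.
    """
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [series[i:i + window] for i in range(len(series) - window + 1)]
```

For example, `sliding_windows([1, 2, 3, 4, 5], 3)` yields `[1, 2, 3]`, `[2, 3, 4]` and `[3, 4, 5]`.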
Discretization
Subsequences are patterns.
Take windows along the time series; the window length equals the subsequence length.
Discretize the range of the data: one symbol for each quantum.
Divide each window into segments and represent each segment with one symbol.
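The two steps above can be sketched literally: average each segment of a window, then map each average to the symbol of the equal-width quantum of the data range it falls in. This is an illustrative simplification, not SAX itself (SAX normalizes each window and uses Gaussian breakpoints instead of equal-width quanta). The helper names, the equal-width quanta, and the requirement that the window length divide evenly into segments are all assumptions.

```python
def segment_means(window, n_segments):
    """Split `window` into `n_segments` equal-length pieces and average each piece."""
    if len(window) % n_segments:
        raise ValueError("this sketch assumes len(window) is divisible by n_segments")
    size = len(window) // n_segments
    return [sum(window[i:i + size]) / size for i in range(0, len(window), size)]


def uniform_symbols(values, lo, hi, alphabet="abc"):
    """Map each value to one symbol by cutting [lo, hi] into equal-width quanta."""
    width = (hi - lo) / len(alphabet)
    out = []
    for v in values:
        # Clamp so that v == hi lands in the top quantum and v < lo in the bottom one.
        idx = min(int((v - lo) / width), len(alphabet) - 1)
        out.append(alphabet[max(idx, 0)])
    return "".join(out)
```

For a window `[0, 1, 4, 5, 8, 9]` split into 3 segments over the range [0, 9], the segment means are 0.5, 4.5 and 8.5, which become the word `abc`.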
Symbolic Aggregate approXimation (SAX)
[Figure: one subsequence, showing its quantization levels, segments, and representative symbols]
Discrete version = acdcbdba
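A compact sketch of how a single window is turned into a SAX word, following the published SAX recipe: z-normalize the window, average equal-length segments (Piecewise Aggregate Approximation), then assign symbols using breakpoints that cut the standard normal distribution into equally probable regions. The function name, the default alphabet, and the requirement that the window length be divisible by the number of segments are assumptions of this sketch, not VizTree's code.

```python
from statistics import NormalDist, mean, pstdev


def sax_word(window, n_segments, alphabet="abcd"):
    """Convert one subsequence into a SAX word of length `n_segments`."""
    if len(window) % n_segments:
        raise ValueError("this sketch assumes len(window) is divisible by n_segments")
    # 1. z-normalize so that the same breakpoints apply to every window.
    mu, sigma = mean(window), pstdev(window)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in window]
    # 2. PAA: the mean of each equal-length segment.
    size = len(z) // n_segments
    paa = [mean(z[i * size:(i + 1) * size]) for i in range(n_segments)]
    # 3. Breakpoints splitting N(0, 1) into len(alphabet) equiprobable regions.
    a = len(alphabet)
    cuts = [NormalDist().inv_cdf(k / a) for k in range(1, a)]
    # 4. A segment's symbol index is the number of breakpoints at or below its mean.
    return "".join(alphabet[sum(c <= v for c in cuts)] for v in paa)
```

A steadily rising window such as `[1, 2, 3, 4, 5, 6, 7, 8]` with 4 segments maps to `abcd`; the same window with 2 segments and alphabet `abc` maps to `ac`.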
Subsequence Tree: Example
[Figure: subsequence tree over the symbols {a, b, c} with two segments per window]
Symbols = {a, b, c}
Segments per window = 2
The tree spans the space of subsequences.
Branching factor = number of symbols (size of the alphabet)
Depth = number of segments per window
Branch thickness ~ frequency of occurrence of the subsequence
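One way to collect the branch frequencies described above is to count every prefix of every window's word: a prefix of length d names one branch at depth d, and its count is how many windows traverse it (the quantity drawn as thickness). Storing prefix counts in a flat `Counter` rather than explicit node objects is a design choice of this sketch; the names `build_tree` and `children` are hypothetical.

```python
from collections import Counter


def build_tree(words):
    """Count how many words pass through each branch (prefix) of the tree."""
    counts = Counter()
    for w in words:
        for depth in range(1, len(w) + 1):
            counts[w[:depth]] += 1
    return counts


def children(counts, prefix, alphabet="abc"):
    """Frequencies of the non-empty branches leaving the node reached by `prefix`."""
    # Reading a missing key from a Counter returns 0 without inserting it.
    return {prefix + s: counts[prefix + s] for s in alphabet if counts[prefix + s]}
```

With the words `ab, ac, ab, ca, bb, ab` (alphabet {a, b, c}, 2 segments), the root has branches `a` (4), `b` (1) and `c` (1), and below `a` the branches are `ab` (3) and `ac` (1).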
VizTree Tool
Demo
Query by Content: Subsequence Matching
Finding known patterns.
Chunking: breaking a time series into individual series.
Methods: by time (e.g., power usage) or by shape (e.g., heartbeats).
Search approaches: exact (slow) or approximate (fast).
Exploration; hypothesis testing.
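The exact/approximate trade-off can be illustrated with two contrasting searches. The exact one compares the query against every window by Euclidean distance, which costs time proportional to series length times query length. The approximate one is a stand-in chosen for this sketch: it looks up windows whose discrete word equals the query's word, which is fast but can miss true matches or return loose ones. All function names are hypothetical, and the words are assumed to be precomputed one per window offset.

```python
import math
from collections import defaultdict


def exact_best_match(series, query):
    """Brute force: Euclidean distance from `query` to every window of `series`."""
    m = len(query)
    best = (math.inf, -1)
    for i in range(len(series) - m + 1):
        best = min(best, (math.dist(series[i:i + m], query), i))
    return best  # (distance, offset)


def index_words(words):
    """Map each discrete word to the window offsets that produced it."""
    idx = defaultdict(list)
    for offset, w in enumerate(words):
        idx[w].append(offset)
    return idx


def approximate_matches(idx, query_word):
    """Candidate offsets whose word equals the query's word (a single dict lookup)."""
    return idx.get(query_word, [])
```

For the series `[0, 0, 1, 2, 1, 0, 0]` and the query `[1, 2, 1]`, the exact search returns distance 0.0 at offset 2.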
Motif Discovery
Finding unknown patterns (not exact matches).
VizTree allows exploration at varying levels of precision, e.g., cc** vs. ccac.
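The cc** vs. ccac example amounts to counting windows under a coarser or finer node of the tree. A minimal sketch, assuming `*` stands for any single symbol and that one word has already been computed per window; the helper names are hypothetical.

```python
def matches(word, pattern):
    """True if `word` fits `pattern`, where '*' stands for any single symbol."""
    return len(word) == len(pattern) and all(p in ("*", s) for s, p in zip(word, pattern))


def motif_count(words, pattern):
    """Number of windows whose word fits `pattern`."""
    return sum(matches(w, pattern) for w in words)
```

Given the words `ccac, ccab, ccac, abca`, the loose pattern `cc**` matches three windows while the precise pattern `ccac` matches two.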
[Figure: Winding dataset (the angular speed of reel 2), with motifs A, B, and C]
Anomaly Detection
Finding abnormal patterns.
Use data already seen to identify anomalies.
Anomalies are identified by thin branches.
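A sketch of "thin branches" as an anomaly test: build word frequencies from data already seen, then flag new windows whose word was rare or absent there. The `max_fraction` threshold and the function name are hypothetical; the slides do not give a threshold.

```python
from collections import Counter


def flag_anomalies(seen_words, new_words, max_fraction=0.02):
    """Offsets of new windows whose word is a thin (or missing) branch in the seen data.

    Assumption: a branch is 'thin' when its share of all seen windows is at most
    `max_fraction`; an unseen word has share 0 and is always flagged.
    """
    counts = Counter(seen_words)
    total = len(seen_words)
    return [i for i, w in enumerate(new_words) if counts[w] / total <= max_fraction]
```

If `ab` accounts for 99 of 100 seen windows and `cc` for one, then among new words `ab, cc, ba` the last two are flagged with a 5% threshold.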
Comparing Series: Diff Tree
Same parameters yield the same tree structure.
Compare the test branch frequencies with the reference branch frequencies.
Blue = underrepresented; green = overrepresented; red = equivalent; thickness = magnitude of the difference.
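Because both trees share the same parameters, every branch can be compared directly. The sketch below normalizes each branch count by the number of windows, takes the test-minus-reference difference, and applies the color rule listed above. The normalization, the `tol` band for "equivalent", and the function names are assumptions for illustration; the slides do not state VizTree's exact difference measure.

```python
from collections import Counter


def branch_freqs(words):
    """Relative traversal frequency of every branch (prefix), divided by the number of windows."""
    counts = Counter(w[:d] for w in words for d in range(1, len(w) + 1))
    n = len(words)
    return {branch: c / n for branch, c in counts.items()}


def diff_tree(reference, test, tol=0.05):
    """Map each branch to (color, thickness) by comparing test against reference."""
    ref, tst = branch_freqs(reference), branch_freqs(test)
    out = {}
    for branch in sorted(ref.keys() | tst.keys()):
        delta = tst.get(branch, 0.0) - ref.get(branch, 0.0)
        if abs(delta) <= tol:
            color = "red"    # equivalent
        elif delta < 0:
            color = "blue"   # underrepresented in the test series
        else:
            color = "green"  # overrepresented in the test series
        out[branch] = (color, abs(delta))  # thickness ~ magnitude of the difference
    return out
```

If the reference windows are half `ab` and half `ba` while the test windows are all `ab`, the `a` and `ab` branches come out green and the `b` and `ba` branches blue, each with magnitude 0.5.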
Thoughts on VizTree (Vis.)
Most of the “discovery” is implicit.
Search is manual; parameter setting might be an issue; automation might help.
Tree visualization: Good use of screen real estate? Effective? Intuitive? Alternatives?
Thoughts on VizTree (HCI)
Primarily a tool for researchers now (also, we might have an outdated version).
Even so, some HCI suggestions:
- Indicate how the tree detail relates to the tree overview.
- Zoom into a specific area of the time series (rather than zoom+scroll).
- Relate a selection in the subsequence detail to the subsequence overview.
- Unfortunately, the least interesting patterns are the most easily accessed (branches at the root); “snap to branch” or “snap to intersection”?
- Ability to turn off highlighting (undo).
Summary: Unique Contributions
Fundamental support for aperiodic series.
Scalable: resource requirements do not grow linearly with the length of the series.
Rich visual feature set: global summaries, diff trees between multiple series, local patterns and anomalies.