FlowString: Partial Streamline Matching using Shape Invariant
Similarity Measure for Exploratory Flow Visualization
Jun Tao, Chaoli Wang, Ching-Kuang Shene
Michigan Technological University
Presented at IEEE Pacific Visualization Symposium
March 5, 2014, Yokohama, Japan
FlowString interface
[Figure: FlowString interface. Panels: query result, streamline set, query string, alphabet and vocabulary, parameters. Streamline query: textual and visual.]
Streamline similarity measures
• Proximity-based measures
  – Leverage spatial proximity between integral curves
• Feature-based measures
  – Extract geometrical, topological, or domain-specific features for similarity analysis
• Distribution-based measures
  – Capture feature distributions for more robust similarity comparison
• Transformation-based measures
  – Map data properties or features into a transformed space for similarity measuring
Our solution
• Shape-based measure
  – Extract features that are invariant under translation, rotation, and scaling
  – Support flexible partial streamline matching
• Approach
  – Advocate a vocabulary approach
  – Construct a character-level alphabet and a word-level vocabulary
  – Design an intuitive and convenient user interface and interaction
Terms (1/2)
• Character (low-level shape descriptor)
  – Unique local shape primitive extracted from streamlines
• Alphabet
  – A set of characters describing various local shapes
• Word (high-level shape descriptor)
  – A sequence of characters encoding a streamline shape pattern
• Vocabulary
  – A set of words describing various regional shapes
Terms (2/2)
• String
  – Mapping of a global streamline to a sequence of characters
• Substring
  – Encoding of a portion of the corresponding streamline
Notations
• Character
  – a (same order)
  – a’ (reversed order)
  – A (both orders)
• Multiple characters with common features (|)
  – (a1 | a2 | … am)
• Word concatenation (| and &)
  – [abc]|[bbc] (segments that match either abc or bbc)
  – [abc]&[bbc] (segments that match both abc and bbc with some distance apart)
• Other symbols
  – a+ (single character repetition)
  – ? and * (wildcard symbols)
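One way to read the textual notation is as a small regular-expression dialect over streamline strings. The sketch below is only an illustration, not FlowString's query parser: the helpers `query_to_regex` and `find_segments` are hypothetical, the code assumes that `?` stands for any single character and `*` for any run of characters (the slides only call them wildcards), and it leaves out the reversed-order (a’), both-orders (A), and `&` forms.

```python
import re

def query_to_regex(query):
    """Translate a simplified FlowString-style query into a Python regex.

    Assumptions (not stated in the slides): characters are lowercase
    letters, '?' matches any one character, '*' matches any run.
    """
    parts = []
    for ch in query:
        if ch == "?":
            parts.append(".")       # assumed: exactly one arbitrary character
        elif ch == "*":
            parts.append(".*")      # assumed: zero or more arbitrary characters
        elif ch in "+|()" or ch.islower():
            parts.append(ch)        # same meaning in regex syntax
        else:
            raise ValueError(f"unsupported symbol: {ch!r}")
    return "".join(parts)

def find_segments(query, streamline_string):
    """Return (start, end) index pairs of non-overlapping matches."""
    pattern = re.compile(query_to_regex(query))
    return [m.span() for m in pattern.finditer(streamline_string)]
```

For example, `find_segments("(e|f)(e|f)", "aefbffe")` reports the two segments where two consecutive characters are each either `e` or `f`.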
Outline of FlowString approach
• Alphabet generation
  – Streamline resampling
  – Dissimilarity measure
  – Affinity propagation clustering
• String operation
  – Streamline suffix tree
  – Vocabulary construction
  – Exact vs. approximate search
Streamline resampling (1/2)
• Goal: local features with the same shape but different scales should receive a similar number of sample points
• Criteria:
  – A streamline segment between two sample points should be simple enough (no feature is ignored)
  – The density of sample points should be related to the local feature size
• Solution: maintain a constant accumulated curvature between two neighboring sample points along the streamline
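The resampling idea can be sketched on a polyline by treating the turning angle at each vertex as the discrete curvature accumulated there, and placing a sample whenever the running total reaches a fixed step. This is a minimal sketch under that assumption; the function name `resample_by_curvature` and the parameter `angle_step` are illustrative and do not come from the slides.

```python
import math

def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])

def _turning_angle(prev_pt, pt, next_pt):
    """Angle in radians between the incoming and outgoing segments at pt."""
    u, v = _sub(pt, prev_pt), _sub(next_pt, pt)
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    dot = u[0] * v[0] + u[1] * v[1] + u[2] * v[2]
    # atan2 of |u x v| and u . v is stable for small angles.
    return math.atan2(math.sqrt(sum(c * c for c in cross)), dot)

def resample_by_curvature(points, angle_step):
    """Return indices of sample points along a 3D polyline.

    A new sample is emitted whenever the turning angle accumulated since
    the previous sample reaches angle_step (radians). Both endpoints are
    always kept.
    """
    n = len(points)
    if n < 2:
        return list(range(n))
    samples = [0]
    accumulated = 0.0
    for i in range(1, n - 1):
        accumulated += _turning_angle(points[i - 1], points[i], points[i + 1])
        if accumulated >= angle_step:
            samples.append(i)
            accumulated = 0.0
    samples.append(n - 1)
    return samples
```

Because turning angles do not change under uniform scaling, a small loop and a large loop of the same shape receive the same number of samples, which is the stated goal.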
Character concatenation
• (a): characters assigned to all sample points, which produces a deterministic shape
• (b) and (c): characters assigned to every r-1 sample points, which produces different shapes
Dissimilarity measure
• Dissimilarity between the local shapes of two sample points (Pa and Pb)
  – Use Procrustes distance, which minimizes a measure of shape difference
  – Ignore geometric positions and orientations
  – Require a registration (Procrustes superimposition) before distance calculation
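To make the Procrustes distance concrete, here is a minimal sketch for the planar case, where points can be written as complex numbers and the optimal rotation has a closed form. Streamlines are 3D, where the registration step typically needs a singular value decomposition, so this 2D version is a simplification for illustration. The function names are hypothetical, and both shapes are assumed to have the same number of corresponding points.

```python
import math

def _preshape(points):
    """Center a 2D point list and scale it to unit size (complex form)."""
    z = [complex(x, y) for x, y in points]
    centroid = sum(z) / len(z)
    z = [v - centroid for v in z]              # removes translation
    size = math.sqrt(sum(abs(v) ** 2 for v in z))
    if size == 0.0:
        raise ValueError("all points coincide; shape is undefined")
    return [v / size for v in z]               # removes scale

def procrustes_distance_2d(shape_a, shape_b):
    """Full Procrustes distance between two planar point configurations.

    For centered, unit-size configurations z and w the distance is
    sqrt(1 - |<z, w>|^2); taking the modulus of the complex inner
    product is what removes the rotation.
    """
    if len(shape_a) != len(shape_b):
        raise ValueError("shapes need the same number of points")
    za, zb = _preshape(shape_a), _preshape(shape_b)
    inner = sum(a * b.conjugate() for a, b in zip(za, zb))
    return math.sqrt(max(0.0, 1.0 - abs(inner) ** 2))
```

The result is 0 (up to rounding) for two copies of a shape that differ only by translation, rotation, and uniform scaling, and grows toward 1 as the shapes differ.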
Affinity propagation clustering
• Apply affinity propagation for clustering
  – Simultaneously consider all data points as potential exemplars
  – Automatically determine the best number of clusters
• Perform two-level clustering to generate characters
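For readers unfamiliar with affinity propagation, the following is a compact single-level sketch of the standard message-passing updates (responsibilities and availabilities with damping). It is not the two-level scheme mentioned above and makes no claim about the authors' settings: the function name, the `preference` value placed on the diagonal, the damping factor, and the fixed iteration count are all illustrative assumptions. The input is a square similarity matrix with at least two points, for instance negated pairwise dissimilarities.

```python
def affinity_propagation(similarity, preference, damping=0.5, iterations=200):
    """Cluster points given a square similarity matrix (list of lists).

    Returns, for each point, the index of its exemplar. If no exemplar
    emerges, every point is returned as its own exemplar.
    """
    n = len(similarity)
    S = [row[:] for row in similarity]
    for i in range(n):
        S[i][i] = preference        # higher preference -> more exemplars
    R = [[0.0] * n for _ in range(n)]   # responsibilities
    A = [[0.0] * n for _ in range(n)]   # availabilities
    for _ in range(iterations):
        # Responsibility: how well k suits i compared with i's best rival.
        for i in range(n):
            totals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=totals.__getitem__)
            runner_up = max(t for k, t in enumerate(totals) if k != best)
            for k in range(n):
                rival = runner_up if k == best else totals[best]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # Availability: support for k as an exemplar from the other points.
        for k in range(n):
            support = [max(0.0, R[i][k]) for i in range(n)]
            support[k] = R[k][k]    # self-responsibility is not clipped
            total = sum(support)
            for i in range(n):
                new = total - support[i]
                if i != k:
                    new = min(0.0, new)
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return list(range(n))
    labels = [max(exemplars, key=lambda k: S[i][k]) for i in range(n)]
    for k in exemplars:
        labels[k] = k
    return labels
```

Note that the number of clusters is never passed in; it follows from the similarities and the preference value, which is the property the slide highlights.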
Streamline suffix tree
• Convert each streamline to a string using the alphabet
• Construct a suffix tree to enable efficient operations on these strings
  – Linear time and space cost to construct the tree
  – Transform the problem of searching for a string into searching for a node in the tree
  – O(m+z) search time, where m is the length of the string and z is its number of occurrences
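The kind of query a streamline suffix tree answers can be illustrated with a much simpler stand-in: a sorted list of all suffixes searched by bisection. This sketch does not achieve the linear construction cost or the O(m+z) search time quoted above (building the list is quadratic in the worst case); it only shows the operation of locating every occurrence of a query string across several streamline strings. The function names are hypothetical.

```python
from bisect import bisect_left

def build_suffix_index(strings):
    """Collect (suffix, string_id, offset) for every suffix, sorted."""
    entries = []
    for sid, s in enumerate(strings):
        for offset in range(len(s)):
            entries.append((s[offset:], sid, offset))
    entries.sort()
    return entries

def find_occurrences(index, pattern):
    """Return sorted (string_id, offset) pairs where pattern occurs.

    All suffixes that start with pattern are contiguous in the sorted
    index, so two bisections bound the block. Assumes no character in
    the strings is as large as U+FFFF.
    """
    lo = bisect_left(index, (pattern,))
    hi = bisect_left(index, (pattern + "\uffff",))
    return sorted((sid, offset) for _, sid, offset in index[lo:hi])
```

With streamline strings `["abcab", "bcab"]`, the query `"ab"` is found at offsets 0 and 3 of the first string and offset 2 of the second.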
Vocabulary construction
• Automatically identify meaningful words to construct the vocabulary
  – Select the most common patterns from the streamlines (i.e., detect the most frequently occurring substrings)
  – Achieve this through a simple depth-first search traversal of the streamline suffix tree
  – O(n) time, where n is the total length of the original strings (i.e., the number of nodes is linear in n)
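The depth-first search for frequent substrings can be sketched on an uncompressed suffix trie in which every node stores how many times its substring occurs. A trie of this kind can have quadratically many nodes, so it does not reproduce the O(n) bound of a real suffix tree; the thresholds `min_count` and `min_len` and the function names are assumptions made for illustration, and the slides do not say how words are ranked.

```python
def build_suffix_trie(strings):
    """Insert every suffix of every string; count occurrences per node."""
    root = {"count": 0, "children": {}}
    for s in strings:
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node["children"].setdefault(
                    ch, {"count": 0, "children": {}})
                node["count"] += 1   # one more occurrence of this substring
    return root

def frequent_words(root, min_count, min_len):
    """Depth-first traversal collecting substrings seen >= min_count times.

    Counts never increase along a path from the root, so a branch can be
    abandoned as soon as it falls below min_count.
    """
    words = []
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        for ch, child in node["children"].items():
            if child["count"] < min_count:
                continue
            word = prefix + ch
            if len(word) >= min_len:
                words.append((word, child["count"]))
            stack.append((child, word))
    # Most frequent first, then longer words, then alphabetical order.
    return sorted(words, key=lambda w: (-w[1], -len(w[0]), w[0]))
```

For the strings `["abcabc", "xabc"]` with `min_count=3` and `min_len=2`, the candidate words are `abc`, `ab`, and `bc`, each occurring three times.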
Exact vs. approximate search (1/2)
• The need for approximate search
  – Shapes represented by different characters differ in how similar they are to one another
  – A shape repeated a different number of times often still looks similar
• k-approximate search using dynamic programming, where k is the edit-distance threshold
• Extend to handle single character repetition (+) and multiple characters with common features (|)
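A k-approximate search by dynamic programming can be sketched with the classic column-by-column edit-distance recurrence in which a match is allowed to start at any position of the text. This is a generic sketch, not the authors' exact formulation: the optional `sub_cost` function, which lets similar characters substitute for each other cheaply, is an assumption, and the `+` and `|` extensions are not handled. Insertions and deletions cost 1 here.

```python
def approximate_matches(pattern, text, k, sub_cost=None):
    """Return (end_index, distance) for every text position where some
    substring ending there is within edit distance k of pattern."""
    if sub_cost is None:
        sub_cost = lambda a, b: 0 if a == b else 1
    m = len(pattern)
    # prev[i]: cost of matching the first i pattern characters against
    # some substring ending just before the current text position.
    prev = list(range(m + 1))
    ends = []
    for j, tc in enumerate(text):
        cur = [0]                    # a match may start at any position
        for i in range(1, m + 1):
            cur.append(min(
                prev[i - 1] + sub_cost(pattern[i - 1], tc),  # align the two
                prev[i] + 1,         # extra text character in the match
                cur[i - 1] + 1,      # pattern character left unmatched
            ))
        if cur[m] <= k:
            ends.append((j, cur[m]))
        prev = cur
    return ends
```

With `k = 0` this reduces to exact matching. Passing a `sub_cost` that returns a small value for two characters with similar shapes lets a query such as EE also reach segments encoded as EF.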
Exact vs. approximate search (2/2)
[Figure: query results for exact matching of EE, exact matching of FF, exact matching of (E|F)(E|F), and approximate matching (k = 15) of (E|F)(E|F). E: spiral with large torsion; F: spiral with small torsion.]
Summary
• FlowString
  – Robust partial streamline matching using shape-invariant features
  – Character/alphabet and word/vocabulary metaphors
  – Intuitive user interface and interaction support
• Future work
  – Conduct domain expert evaluation
  – Extend FlowString to handle multiple data sets
  – Release FlowString to benefit the community
• Acknowledgements
  – U.S. National Science Foundation