FlowString: Partial Streamline Matching using Shape Invariant Similarity Measure for Exploratory Flow Visualization

Jun Tao, Chaoli Wang, Ching-Kuang Shene

Michigan Technological University

Presented at IEEE Pacific Visualization Symposium

March 5, 2014, Yokohama, Japan

FlowString interface

[Figure: first look of the FlowString interface (crayfish data set). Labeled components: Query result, Streamline set, Query string, Alphabet and vocabulary, Parameters, Textual, Visual, Streamline query]

Streamline similarity measures

• Proximity-based measures
– Leverage spatial proximity between integral curves

• Feature-based measures
– Extract geometrical, topological or domain-specific features for similarity analysis

• Distribution-based measures
– Capture feature distributions for more robust similarity comparison

• Transformation-based measures
– Map data properties or features into a transformed space for similarity measuring

Our solution

• Shape-based measure
– Extract features that are invariant under translation, rotation and scaling
– Support flexible partial streamline matching

• Approach
– Advocate a vocabulary approach
– Construct a character-level alphabet and a word-level vocabulary
– Design an intuitive and convenient user interface and interaction

Terms (1/2)

• Character (low-level shape descriptor)
– Unique local shape primitive extracted from streamlines

• Alphabet
– A set of characters describing various local shapes

• Word (high-level shape descriptor)
– A sequence of characters encoding a streamline shape pattern

• Vocabulary
– A set of words describing various regional shapes

Terms (2/2)

• String
– Mapping of a global streamline to a sequence of characters

• Substring
– Encoding of a portion of the corresponding streamline

Notations

• Character
– a (same order)
– a’ (reversed order)
– A (both orders)

• Multiple characters with common features (|)
– (a1 | a2 | … am)

• Word concatenation (| and &)
– [abc]|[bbc] (segments that match either abc or bbc)
– [abc]&[bbc] (segments that match both abc and bbc, some distance apart)

• Other symbols
– a+ (single character repetition)
– ? and * (wildcard symbols)

Outline of FlowString approach

• Alphabet generation
– Streamline resampling
– Dissimilarity measure
– Affinity propagation clustering

• String operation
– Streamline suffix tree
– Vocabulary construction
– Exact vs. approximate search

Streamline resampling (1/2)

• Goal: local features with the same shape but different scales should receive a similar number of sample points

• Criteria:
– A streamline segment between two sample points should be simple enough (no feature is ignored)
– The density of sample points should be related to the local feature size

• Solution: maintain a constant accumulative curvature between two neighboring sample points along the streamline
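The resampling rule can be illustrated with a minimal sketch. It assumes the streamline is already traced as a dense 3D polyline, approximates accumulated curvature by the sum of turning angles at the polyline vertices, and only selects existing vertices instead of interpolating new ones. The function names and the `budget` parameter are hypothetical and do not come from the slides.

```python
import math

def turning_angle(p, q, r):
    """Angle in radians between segments p->q and q->r of a 3D polyline."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - q[i] for i in range(3)]
    nu = math.sqrt(sum(c * c for c in u))
    nv = math.sqrt(sum(c * c for c in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0  # degenerate segment: treat as no turning
    cos_t = sum(a * b for a, b in zip(u, v)) / (nu * nv)
    return math.acos(max(-1.0, min(1.0, cos_t)))

def resample_by_curvature(points, budget):
    """Return indices of the vertices kept as sample points.

    A sample is emitted whenever the turning angle accumulated since the
    previous sample reaches `budget` (hypothetical parameter, in radians).
    The two end points are always kept.
    """
    if len(points) < 3:
        return list(range(len(points)))
    kept = [0]
    acc = 0.0
    for i in range(1, len(points) - 1):
        acc += turning_angle(points[i - 1], points[i], points[i + 1])
        if acc >= budget:
            kept.append(i)
            acc = 0.0
    kept.append(len(points) - 1)
    return kept

def circle(radius, n=64):
    """One turn of a circle sampled at n vertices (test shape only)."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n), 0.0) for k in range(n)]

# Same shape at two scales: the selected sample indices coincide.
small = resample_by_curvature(circle(1.0), 0.7)
large = resample_by_curvature(circle(10.0), 0.7)
straight = resample_by_curvature([(float(i), 0.0, 0.0) for i in range(10)], 0.7)
```

Because turning angles do not depend on scale, the two circles receive the same samples, while the straight line keeps only its end points.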

Streamline resampling (2/2)

Neighborhood size r = 7

Character concatenation

• (a): characters assigned to all sample points, which produces a deterministic shape

• (b) and (c): characters assigned to every r-1 sample points, which produces different shapes

Dissimilarity measure

• Dissimilarity between the local shapes of two sample points (Pa and Pb)
– Use the Procrustes distance, which minimizes a measure of shape difference
– Ignore geometric positions and orientations
– Require a registration (Procrustes superimposition) before distance calculation
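For reference, one common formulation of the Procrustes distance is given below. The notation is an assumption, not taken from the slides: $X_a, X_b \in \mathbb{R}^{r \times 3}$ hold the $r$ sample points in the neighborhoods of Pa and Pb, $\bar{x}$ is the centroid of the rows of $X$, and the alignment is taken over all orthogonal matrices (restricting it to proper rotations needs an extra sign correction). The slides do not say which variant FlowString uses.

```latex
% Registration: remove translation and scale
\hat{X} = \frac{X - \mathbf{1}\bar{x}^{\top}}{\lVert X - \mathbf{1}\bar{x}^{\top} \rVert_F}

% Distance: best orthogonal alignment of the two normalized shapes
d(P_a, P_b) = \min_{R^{\top}R = I} \lVert \hat{X}_a - \hat{X}_b R \rVert_F
            = \sqrt{2 - 2\sum_{i} \sigma_i},
\qquad \sigma_i \text{ the singular values of } \hat{X}_b^{\top}\hat{X}_a
```

The closed form follows from $\lVert \hat{X}_a - \hat{X}_b R \rVert_F^2 = 2 - 2\,\mathrm{tr}(\hat{X}_a^{\top}\hat{X}_b R)$, whose trace term is maximized by the sum of the singular values.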

Affinity propagation clustering

• Apply affinity propagation for clustering
– Simultaneously consider all data points as potential exemplars
– Automatically determine the best number of clusters

• Perform two-level clustering to generate characters

Character generation (1/3)

[Figure panels: original shape primitives; first-level clustering result; second-level clustering result]

Streamline suffix tree

• Convert each streamline to a string using the alphabet

• Construct a suffix tree to enable efficient operations on these strings
– Linear time and space cost to construct the tree
– Transform the problem of searching for a string into searching for a node in the tree
– O(m+z) search time, where m is the length of the string and z is its number of occurrences
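The idea of turning a string search into a node lookup can be sketched with a naive suffix trie. This is a simplified stand-in, not the linear-time suffix tree named above: construction and storage here are quadratic in the string length, and all names are hypothetical. The lookup still walks m characters and then reports the z stored occurrences.

```python
class Node:
    def __init__(self):
        self.children = {}     # character -> child Node
        self.occurrences = []  # (string index, start offset) of each match

def build_suffix_trie(strings):
    """Insert every suffix of every string; quadratic, for illustration only."""
    root = Node()
    for sid, s in enumerate(strings):
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node.children.setdefault(ch, Node())
                node.occurrences.append((sid, start))
    return root

def find(root, pattern):
    """Walk the pattern from the root and return where it occurs."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []  # pattern does not occur in any string
    return list(node.occurrences)

# Two streamlines already encoded as strings over a toy alphabet {a, b, c}.
root = build_suffix_trie(["abcabc", "bbcab"])
hits_bc = find(root, "bc")   # [(0, 1), (0, 4), (1, 1)]
hits_cc = find(root, "cc")   # []
```

Each hit names a streamline and an offset, which is the information needed to highlight the matching partial streamline.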

Vocabulary construction

• Automatically identify meaningful words to construct the vocabulary
– Select the most common patterns from the streamlines (i.e., detect the most frequently occurring substrings)
– Achieve this through a simple depth-first traversal of the streamline suffix tree
– O(n) time, where n is the total length of the original strings (i.e., the number of nodes is linear in n)
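A minimal sketch of frequent-substring detection by depth-first traversal follows. It assumes a naive suffix trie with per-node occurrence counts in place of a true suffix tree, so it does not achieve the O(n) bound, and the selection thresholds `min_count` and `min_len` are hypothetical; the slides do not state the exact selection criteria.

```python
def frequent_substrings(strings, min_count, min_len):
    """Return (word, count) pairs for substrings occurring at least min_count times."""
    # Trie of all suffixes; a node's count is the number of occurrences
    # (overlaps included) of the substring spelled by its path.
    root = {"count": 0, "children": {}}
    for s in strings:
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node["children"].setdefault(
                    ch, {"count": 0, "children": {}})
                node["count"] += 1

    words = []
    stack = [(root, "")]  # explicit stack: iterative depth-first traversal
    while stack:
        node, prefix = stack.pop()
        for ch, child in node["children"].items():
            if child["count"] < min_count:
                continue  # extensions of a rare substring are also rare
            word = prefix + ch
            if len(word) >= min_len:
                words.append((word, child["count"]))
            stack.append((child, word))
    # Most frequent first, ties broken alphabetically.
    return sorted(words, key=lambda wc: (-wc[1], wc[0]))

vocabulary = frequent_substrings(["abcabc", "bbcab"], min_count=3, min_len=2)
```

Pruning on the count is safe because a longer substring can never occur more often than its prefix.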

Exact vs. approximate search (1/2)

• The need for approximate search
– Some pairs of characters represent more similar shapes than others
– Segments that repeat a certain shape a different number of times often look similar

• K-approximate search using dynamic programming, where k is a threshold on the edit distance

• Extend to handle single character repetition (+) and multiple characters with common features (|)
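The dynamic programming behind k-approximate search can be sketched as follows. This is the textbook approximate substring matching recurrence with unit edit costs, which is an assumption: the slides do not state the cost model, and the + and | extensions are not covered. The function name is hypothetical.

```python
def approximate_matches(pattern, text, k):
    """Return (end position, distance) for every text position where some
    substring ending there is within edit distance k of the pattern."""
    m = len(pattern)
    # prev[i] = distance between pattern[:i] and the best substring ending
    # at the previous text position; before any text it is i deletions.
    prev = list(range(m + 1))
    ends = []
    for j, tc in enumerate(text, start=1):
        cur = [0] * (m + 1)  # cur[0] = 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == tc else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra character in text
                         cur[i - 1] + 1)      # pattern character missing
        if cur[m] <= k:
            ends.append((j, cur[m]))
        prev = cur
    return ends

exact = approximate_matches("EE", "AEEB", 0)    # [(3, 0)]
none = approximate_matches("EE", "AEFEB", 0)    # []
loose = approximate_matches("EE", "AEFEB", 1)
```

With k = 0 the search reduces to exact matching; raising k admits segments such as EF or FE that differ from EE by one edit.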

Exact vs. approximate search (2/2)

[Figure panels: exact matching of EE; exact matching of FF; exact matching of (E|F)(E|F); approximate matching (k = 15) of (E|F)(E|F)]

E: spiral with large torsion
F: spiral with small torsion

Parameter settings

Timing performance

Solar plume

Tornado

Two swirls

Summary

• FlowString
– Robust partial streamline matching using shape invariant features
– Characters / alphabets and words / vocabulary metaphors
– Intuitive user interface and interaction support

• Future work
– Conduct domain expert evaluation
– Extend FlowString to handle multiple data sets
– Release FlowString to benefit the community

• Acknowledgements
– U.S. National Science Foundation

Thank you!
