Overlapping Matrix Pattern Visualization: a Hypergraph Approach
Ruoming Jin
Kent State University
Joint with Yang Xiang, David Fuhry, and Feodor F. Dragan (KSU)
The Problem
• Given a set of discovered submatrices, how can we reorder the rows and columns of the data matrix to best display these submatrices and their relationships?
Motivation: Overlapping Bicluster Visualization
• Gene expression profiles (rows: genes, columns: conditions, matrix entry: expression level)
• Biclustering: homogeneous submatrices (genes × conditions)
• Biclustering visualization problem [GMM06, KG07]
Motivation: Transactional Data Visualization
• Shopping-basket data (rows: transactions, columns: items; binary matrix)
• Transactional data summarization using a set of dense submatrices [CK07, WK06, XJFD08]
[Figure: binary transaction matrix with rows t1–t8 and columns i1–i9, together with the three dense submatrices that summarize it, listed below.]
{t1,t2,t7,t8}X{i1,i2,i8,i9}
{t2,t3,t6,t7}X{i2,i3,i7,i8}
{t4,t5}X{i4,i5,i6}
Summarization Cost=8+8+5=21
Roadmap
• Problem Definition
– Visualization cost
• Hardness of the visualization problem
– Hypergraph ordering problem
– Minimum linear arrangement (MLA)
• Algorithm
– Leveraging MLA and local convergence
• Experimental Results
Submatrix Visualization Cost
[Figure: two displays of the same matrix (rows t1–t8, columns i1–i9), each highlighting one submatrix.]
• Given a display of the matrix (a fixed row-order and column-order), how can we measure the goodness of “visualization” of a submatrix?
Highlighted submatrix in both displays: {t1,t2,t7,t8}X{i1,i2,i8,i9}
Why is the second display intuitively better than the first?
Submatrix Visualization Cost (continued)
[Figure: the same two displays of the matrix as on the previous slide.]
• Area: 8x8, 6x6, 4x4, 4x4
• Perimeter: 8+8, 6+6, 4+4, 4+4
• Given a row order and a column order, the visualization cost of a submatrix is the sum of
– the difference between its first and last row w.r.t. the row order
– the difference between its first and last column w.r.t. the column order
Matrix Visualization Cost
• Given a row order, a column order, and a set of submatrices, the matrix visualization cost is the sum of the visualization costs of these submatrices.
• Matrix Optimal Visualization Problem:
– Find the row order and column order that minimize the matrix visualization cost.
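The two cost definitions can be made concrete with a short sketch. It takes "difference between first and last" literally as last position minus first position. The function names and the reordered row and column orders are illustrative assumptions, not taken from the slides; the three submatrices are the ones from the transactional example.

```python
def span(members, order):
    """Last position minus first position of `members` within `order`."""
    pos = {label: k for k, label in enumerate(order)}
    hits = [pos[m] for m in members]
    return max(hits) - min(hits)


def submatrix_cost(rows, cols, row_order, col_order):
    """Visualization cost of one submatrix: row span plus column span."""
    return span(rows, row_order) + span(cols, col_order)


def matrix_cost(submatrices, row_order, col_order):
    """Matrix visualization cost: sum over all submatrices."""
    return sum(submatrix_cost(r, c, row_order, col_order) for r, c in submatrices)


submatrices = [
    ({"t1", "t2", "t7", "t8"}, {"i1", "i2", "i8", "i9"}),
    ({"t2", "t3", "t6", "t7"}, {"i2", "i3", "i7", "i8"}),
    ({"t4", "t5"}, {"i4", "i5", "i6"}),
]

# Natural order of the labels.
natural_rows = [f"t{k}" for k in range(1, 9)]
natural_cols = [f"i{k}" for k in range(1, 10)]

# A hand-picked reordering (hypothetical, chosen only for illustration).
better_rows = ["t1", "t8", "t2", "t7", "t3", "t6", "t4", "t5"]
better_cols = ["i1", "i9", "i2", "i8", "i3", "i7", "i4", "i5", "i6"]

print(matrix_cost(submatrices, natural_rows, natural_cols))
print(matrix_cost(submatrices, better_rows, better_cols))
```

Under this reading, the hand-picked reordering pulls each submatrix's rows and columns closer together, so its total cost is lower than that of the natural order.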
Hypergraph Ordering
• Hypergraph HG=(V,X)
– V is the set of vertices
– X={x1,x2,…} is the set of hyperedges, where each hyperedge is a set of vertices
• Hyperedge cost and Hypergraph cost
• Hypergraph Ordering Problem
[Figure: vertices 0–6 laid out in order, with the hyperedges drawn over them.]
Hyperedge {0,2,3,4} cost = 4
Hyperedge {1,3,5} cost = 4
Hypergraph cost=16
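A small sketch of the hyperedge cost and of the ordering problem itself, assuming the cost of a hyperedge is the distance between its first and last vertex in the order (consistent with the two costs quoted above). Only the two hyperedges named on the slide are used, so the totals here cover just those two and not every hyperedge in the figure. The exhaustive search is an illustrative assumption that is feasible only for a handful of vertices.

```python
from itertools import permutations


def hyperedge_cost(edge, order):
    """Distance between the first and last vertex of `edge` in `order`."""
    pos = {v: k for k, v in enumerate(order)}
    return max(pos[v] for v in edge) - min(pos[v] for v in edge)


def hypergraph_cost(hyperedges, order):
    """Sum of hyperedge costs under a vertex order."""
    return sum(hyperedge_cost(e, order) for e in hyperedges)


def brute_force_order(vertices, hyperedges):
    """Try every vertex order; only practical for very small inputs."""
    return min(permutations(vertices), key=lambda o: hypergraph_cost(hyperedges, o))


vertices = list(range(7))
hyperedges = [{0, 2, 3, 4}, {1, 3, 5}]  # the two hyperedges named on the slide

print(hypergraph_cost(hyperedges, vertices))   # cost under the order 0..6
best = brute_force_order(vertices, hyperedges)
print(best, hypergraph_cost(hyperedges, best))
```

The factorial growth of the search space is the practical reason the deck turns to heuristics next.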
The Link between Matrix Visualization and Hypergraph Ordering
• Relationship between matrix visualization cost and hypergraph cost
• Finding minimum visualization (or hypergraph) cost is NP-hard
[Figure: the matrix (rows t1–t8, columns i1–i9) and the two hypergraphs derived from it, HG1 and HG2, whose vertices are the items i1–i9 and the transactions t1–t8.]
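One way to read the link, sketched under assumptions: each submatrix contributes its row set as a hyperedge of a hypergraph over the rows and its column set as a hyperedge of a hypergraph over the columns. With the span-based costs defined earlier, the matrix visualization cost is then the sum of the two hypergraph costs, so rows and columns can be ordered independently. The function names are hypothetical.

```python
def to_hypergraphs(submatrices):
    """Split a set of submatrices into row hyperedges and column hyperedges."""
    row_edges = [frozenset(rows) for rows, _ in submatrices]
    col_edges = [frozenset(cols) for _, cols in submatrices]
    return row_edges, col_edges


def hypergraph_cost(hyperedges, order):
    """Sum over hyperedges of (last position - first position) in `order`."""
    pos = {v: k for k, v in enumerate(order)}
    return sum(max(pos[v] for v in e) - min(pos[v] for v in e) for e in hyperedges)


submatrices = [
    ({"t1", "t2", "t7", "t8"}, {"i1", "i2", "i8", "i9"}),
    ({"t2", "t3", "t6", "t7"}, {"i2", "i3", "i7", "i8"}),
    ({"t4", "t5"}, {"i4", "i5", "i6"}),
]
row_order = [f"t{k}" for k in range(1, 9)]
col_order = [f"i{k}" for k in range(1, 10)]

row_edges, col_edges = to_hypergraphs(submatrices)
row_part = hypergraph_cost(row_edges, row_order)
col_part = hypergraph_cost(col_edges, col_order)
print(row_part, col_part, row_part + col_part)
```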
The Hypergraph Ordering Problem is a Generalization of MLA
• Graph cost w.r.t. a vertex order
• MLA (Minimum Linear Arrangement): find a vertex ordering that minimizes the graph cost
[Figure: the same graph on vertices 0–6 drawn under two different vertex orders.]
Graph cost=2+2+2*1+1+4+3+2=16
Graph cost=2+4+2*3+4+2+1+1=18
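A minimal sketch of the graph cost, assuming it is the sum of edge lengths under the vertex order. The edge list and the second order are hypothetical and are not the graph from the figure. The last check illustrates the generalization claim: an ordinary edge is a two-vertex hyperedge, and its hyperedge cost coincides with its edge length.

```python
def graph_cost(edges, order):
    """Sum of |pos(u) - pos(v)| over all edges under a vertex order."""
    pos = {v: k for k, v in enumerate(order)}
    return sum(abs(pos[u] - pos[v]) for u, v in edges)


def hypergraph_cost(hyperedges, order):
    """Sum over hyperedges of (last position - first position) in `order`."""
    pos = {v: k for k, v in enumerate(order)}
    return sum(max(pos[v] for v in e) - min(pos[v] for v in e) for e in hyperedges)


edges = [(0, 2), (2, 3), (3, 4), (1, 3), (3, 5)]  # hypothetical example graph
order_a = [0, 1, 2, 3, 4, 5, 6]
order_b = [0, 2, 4, 3, 1, 5, 6]

print(graph_cost(edges, order_a), graph_cost(edges, order_b))

# A graph is a hypergraph whose hyperedges all have two vertices.
as_hyperedges = [set(e) for e in edges]
print(hypergraph_cost(as_hyperedges, order_b) == graph_cost(edges, order_b))
```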
Basic Idea for Hypergraph Ordering
• Much existing work addresses the MLA problem (heuristics and bounded approximations)
• Instead of working from scratch on the hypergraph ordering problem, can we somehow leverage the MLA algorithms?
– The answer is YES!
Basic Procedure
Given the hypergraph HG=(V,X), start with a random vertex order π:
• Step 1: Transform the hypergraph HG into a graph G=(V,E) based on the vertex order π
– cost(HG, π) = cost(G, π)
• Step 2: Run an MLA algorithm on graph G to produce a new vertex order π’
– cost(G, π) ≥ cost(G, π’)
• Step 3: If the new order improves the hypergraph cost, i.e., cost(HG, π) > cost(HG, π’), then use π’ as the new order (π = π’) and repeat Steps 1 and 2
– cost(G, π’) ≥ cost(HG, π’)
cost(HG, π) = cost(G, π) ≥ cost(G, π’) ≥ cost(HG, π’)
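The three steps can be sketched as a loop. This is a sketch under stated assumptions, not the authors' implementation: the hypergraph-to-graph transformation is the hyperedge-to-path conversion, and `mla_heuristic` is a simple pairwise-swap local search standing in for whatever MLA algorithm is actually plugged in. All function names are hypothetical.

```python
import random


def positions(order):
    return {v: k for k, v in enumerate(order)}


def hypergraph_cost(hyperedges, order):
    pos = positions(order)
    return sum(max(pos[v] for v in e) - min(pos[v] for v in e) for e in hyperedges)


def graph_cost(edges, order):
    pos = positions(order)
    return sum(abs(pos[u] - pos[v]) for u, v in edges)


def to_path_graph(hyperedges, order):
    """Step 1: replace each hyperedge by a path through its vertices in `order`."""
    pos = positions(order)
    edges = []
    for e in hyperedges:
        chain = sorted(e, key=pos.__getitem__)
        edges.extend(zip(chain, chain[1:]))  # duplicate edges are kept on purpose
    return edges


def mla_heuristic(edges, order):
    """Step 2 stand-in (assumption): swap pairs while the graph cost drops."""
    order = list(order)
    best = graph_cost(edges, order)
    improved = True
    while improved:
        improved = False
        for a in range(len(order)):
            for b in range(a + 1, len(order)):
                order[a], order[b] = order[b], order[a]
                cost = graph_cost(edges, order)
                if cost < best:
                    best, improved = cost, True
                else:
                    order[a], order[b] = order[b], order[a]  # undo the swap
    return order


def order_hypergraph(vertices, hyperedges, seed=0):
    """Repeat Steps 1-3 until the hypergraph cost stops improving."""
    order = list(vertices)
    random.Random(seed).shuffle(order)  # random starting order
    while True:
        graph = to_path_graph(hyperedges, order)
        candidate = mla_heuristic(graph, order)
        # Step 3: accept only a strict improvement, otherwise stop.
        if hypergraph_cost(hyperedges, candidate) < hypergraph_cost(hyperedges, order):
            order = candidate
        else:
            return order


vertices = list(range(7))
hyperedges = [{0, 2, 3, 4}, {1, 3, 5}]
final = order_hypergraph(vertices, hyperedges)
print(final, hypergraph_cost(hyperedges, final))
```

The loop terminates because the hypergraph cost is a non-negative integer that strictly decreases on every accepted step. The result is a local optimum with respect to the chosen MLA routine and is not guaranteed to be globally optimal.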
(Step1) Transformation: Hyperedge->Path
[Figure: vertices 0–6 in order; each hyperedge is replaced by a path through its vertices in that order.]
Hyperedge cost=path cost!
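Why the two costs agree can be written out as a telescoping sum. The notation is an assumption introduced here: a hyperedge x = {v_1, …, v_k} is indexed so that π(v_1) < … < π(v_k), and its path consists of the edges (v_j, v_{j+1}).

```latex
\[
\mathrm{cost}(\text{path of } x,\ \pi)
  = \sum_{j=1}^{k-1} \bigl(\pi(v_{j+1}) - \pi(v_j)\bigr)
  = \pi(v_k) - \pi(v_1)
  = \mathrm{cost}(x,\ \pi).
\]
% Under any other order \pi' the same path still connects all vertices of x,
% so by the triangle inequality its length can only bound the span from above:
\[
\sum_{j=1}^{k-1} \bigl|\pi'(v_{j+1}) - \pi'(v_j)\bigr|
  \;\ge\; \max_{j} \pi'(v_j) - \min_{j} \pi'(v_j)
  = \mathrm{cost}(x,\ \pi').
\]
```

Summing over all hyperedges gives equality of the graph and hypergraph costs under the order used to build the paths, and an upper bound under any other order.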
Step 1->Step 2
[Figure: the path graph G drawn under the original order π and under the new order π’ returned by MLA.]
Step 1 (Hypergraph->Graph): cost(G, π)=2+2+2*1+1+4+3+2=16=cost(HG, π)
Step 2 (MLA): cost(G, π’)=1+2+2*1+2+1+2+3=13<cost(G, π)
Step 1->Step 2->Step 3
[Figure: the hypergraph and its path graph under the original order π and under the new order π’.]
Step 1 (Hypergraph->Graph): cost(G, π)=cost(HG, π)=16
Step 2 (MLA): cost(G, π’)=13<cost(G, π)
With the new ordering, hyperedge cost ≤ path cost!
Step 1->Step 2->Step 3 (continued)
Step 3: cost(HG, π’)=10<cost(G, π’)=13
cost(HG, π)=cost(G, π)>cost(G, π’)>cost(HG, π’)
Run Iteratively and Local Convergence
Other conversions of hyperedge
• Converting hyperedge to cycle
• Converting hyperedge to multicycles
Visualization effects
Visualization effects (continued)
Visualization effects (continued)
Cost and running time
Conclusion
• We found an interesting link between the matrix visualization problem and a well-known graph-theoretic problem: the minimum linear arrangement (MLA) problem.
• Theoretically, we introduced a generalization of the MLA problem to hypergraphs and developed a novel local convergence algorithm.
• Our method can be incorporated into an interactive visualization environment to allow users to focus on different parts of the data and patterns.
Thanks!!