FODAVA-Lead Education, Community Building, and Research: Dimension Reduction and Data Reduction: Foundations for Interactive Visualization
Haesun Park, School of Computational Science and Engineering, Georgia Institute of Technology
FODAVA Review Meeting, Dec. 9, 2010
Challenges in Analyzing High-Dimensional Massive Data on Visual Analytics Systems
• Screen space and visual perception: the low dimensionality of the display and the limited number of available pixels are fundamental constraints
• High dimensional data: Effective dimension reduction
• Large data sets: Informative representation of data
• Speed: necessary for real-time, interactive use
• Scalable algorithms
• Adaptive algorithms
Development of Fundamental Theory and Algorithms in Data Representations and Transformations to Enable Visual Understanding
• Dimension Reduction
– Dimension reduction with prior info/interpretability constraints
– Manifold learning
• Informative Presentation of Large-Scale Data
– Sparse recovery by L1 penalty
– Clustering, semi-supervised clustering
– Multi-resolution data approximation
• Fast Algorithms
– Large-scale optimization/matrix decompositions
– Adaptive updating algorithms for dynamic and time-varying data, and interactive vis.
• Data Fusion
– Fusion of different types of data from various sources
– Fusion of different uncertainty levels
• Integration with DAVA systems
– Testbed, Jigsaw, iVisClassifier, iVisClustering, …
FODAVA-Lead Research Topics
FODAVA-Lead Research Presentation
• H. Park – Overview of the FODAVA-Lead research and the FODAVA Test Bed; two-stage method for 2D/3D representation of clustered data; Interactive Visual Classifier; Interactive Visual Clustering; information space alignment for information fusion (multi-language document analysis)
• A. Gray – Nonlinear dimension reduction (manifold learning), fast computation of neighborhood graphs, fast optimization for SVMs
• V. Koltchinskii – Low-rank matrix estimation and kernel learning on graphs; sparse recovery; multiple kernel learning and fusion of data with heterogeneous types (multi-language document analysis)
• J. Stasko – Improved analytical capabilities in Jigsaw; interplay between math/comp and interactive visualization
• R. Monteiro – Sparse principal component analysis and feature selection based on L1-regularized optimization (POSTER)
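As a generic illustration of the L1-regularized optimization behind sparse recovery and feature selection, the sketch below solves the lasso problem $\min_x \tfrac12\|Ax-b\|_2^2 + \lambda\|x\|_1$ with ISTA (iterative soft-thresholding), a standard proximal-gradient method. It is not the sparse PCA algorithm from the poster; the function names, the synthetic data, and λ = 0.5 are hypothetical choices.

```python
import numpy as np

# Hypothetical helper names; generic ISTA for the lasso, not the poster's method.
def soft_threshold(v, t):
    """Proximal operator of t*||v||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, b, lam, n_iter=1000):
    """Minimize 0.5*||A x - b||^2 + lam*||x||_1 by proximal gradient steps."""
    # Step size 1/L, where L = ||A||_2^2 is the Lipschitz constant of the gradient.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)              # gradient of the least-squares term
        x = soft_threshold(x - grad / L, lam / L)
    return x

# Synthetic sparse-recovery problem: 3 nonzero coefficients out of 20.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20)
x_true[[2, 7, 15]] = [3.0, -2.0, 1.5]
b = A @ x_true + 0.01 * rng.standard_normal(50)

x_hat = lasso_ista(A, b, lam=0.5)
selected = np.flatnonzero(np.abs(x_hat) > 1e-6)   # indices of "selected features"
```

The nonzero entries of `x_hat` act as the selected features; a larger λ drives more coefficients exactly to zero.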
FODAVA Research Test Bed for High-Dimensional Massive Data
• Open-source software
• Integrates foundational results from FODAVA teams as well as other widely utilized methods (e.g., PCA)
• Easily accessible to a wide community of researchers
• Makes methods/algorithms readily available to the VA research community and relevant to applications
• Identifies effective methods for specific problems (evaluation)
• A base for specialized VA systems (e.g., iVisClassifier, iVisClustering)
[Figure: diagram linking FODAVA fundamental research, applications, and the test bed: vector representation of raw data (text, image, audio, …) → informative representation and transformation → visual representation → interactive analysis. Components shown: dimension reduction (2D/3D), temporal trend, uncertainty, anomaly/outlier, causal relationship, zoom in/out by dynamic updating, clustering, summarization, regression, multi-resolution data reduction, multiple kernel learning, …; also shown: label, similarity, density, missing value, …]
Modules in FODAVA Test Bed
iVisClassifier [VAST10]
(J. Choo, H. Lee, J. Kim, HP)
Interactive visual classification system using supervised dimension reduction
– Biometric recognition
– Text classification
– Search space reduction
iVisClustering (H. Lee, J. Kihm, J. Choo, J. Stasko, HP)
Interactive visual clustering system using topic modeling (latent Dirichlet allocation, LDA) for text clustering
Two-stage Linear Discriminant Analysis for 2D/3D Representation of Clustered Data and Computational Zooming in/out [VAST09, J. Choo, S. Bohn, HP]
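LDA criteria, with between-cluster scatter matrix $S_b$, within-cluster scatter matrix $S_w$, and projection matrix $G$:
$$\max_G \operatorname{trace}(G^T S_b G) \quad \text{and} \quad \min_G \operatorname{trace}(G^T S_w G),$$
combined into
$$\max_G \operatorname{trace}\big((G^T S_w G)^{-1}(G^T S_b G)\big).$$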
• Regularization in LDA
[Figure panels: small regularization vs. large regularization]
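A minimal sketch of the LDA criterion with regularization, assuming the standard definitions of $S_b$ and $S_w$ and a ridge-style regularization $S_w + \alpha I$ (the slides do not state the regularization form, so this is an assumption). The maximizer of $\operatorname{trace}((G^T S_w G)^{-1}(G^T S_b G))$ is spanned by the leading generalized eigenvectors of $S_b g = \sigma S_w g$; keeping two of them gives coordinates for a 2D scatter plot. Function names and the toy data are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh

# Hypothetical helpers illustrating (regularized) LDA; not the FODAVA code.
def scatter_matrices(X, labels):
    """Between-cluster (S_b) and within-cluster (S_w) scatter of the rows of X."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * diff @ diff.T      # weighted spread of cluster centroids
        Xc0 = Xc - mc
        Sw += Xc0.T @ Xc0                  # spread of points around their centroid
    return Sb, Sw

def regularized_lda(X, labels, dim=2, alpha=1e-3):
    """Return G (d x dim) from the generalized eigenproblem S_b g = sigma (S_w + alpha I) g."""
    Sb, Sw = scatter_matrices(X, labels)
    Sw_reg = Sw + alpha * np.eye(X.shape[1])   # regularization keeps S_w invertible
    evals, evecs = eigh(Sb, Sw_reg)            # eigenvalues in ascending order
    return evecs[:, ::-1][:, :dim]             # leading discriminant directions

# Toy data: three Gaussian clusters in 10 dimensions.
rng = np.random.default_rng(1)
centers = rng.standard_normal((3, 10)) * 5
labels = np.repeat(np.arange(3), 40)
X = centers[labels] + rng.standard_normal((120, 10))

G = regularized_lda(X, labels, dim=2, alpha=1e-3)
Y = X @ G   # 2D coordinates for a scatter plot
```

As α grows, $S_w + \alpha I$ approaches a multiple of $I$, so the projection tends toward the leading eigenvectors of $S_b$ alone.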
2D Visualization of Clustered Image and Audio Data
[Figure: PCA vs. rank-2 LDA scatter plots for spoken letters (audio) and handwritten digits (image)]
iVisClassifier: Computational Zoom-in
[Figure panels: LDA scatter plot, cluster-level PC, bases view, and heat map]
Applying LDA recursively to the selected subset of data
Fusion Based on Information Space Alignment (J. Choo, S. Bohn, G. Nakamura, A. White, HP)
• Want: unified vector representations of heterogeneous data sets
• Utilize: reference correspondence information between data pairs, cluster correspondence, etc.
• Multi-lingual iVisClassifier
Two conflicting criteria: maximize alignment and minimize deformation
[Figure panels: data set A (English), data set B (Spanish), fused data sets]
Existing methods: Constrained Laplacian Eigenmap, Parafac2, Procrustes analysis, …
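Procrustes analysis, one of the existing methods listed above, can be sketched as follows, assuming both data sets already have vector representations of the same dimension and that rows with the same index in the reference arrays correspond. The helper names are hypothetical; the core step is the standard SVD solution of the orthogonal Procrustes problem, with an added isotropic scale.

```python
import numpy as np

# Hypothetical helpers: align set B to set A using reference pairs only.
def procrustes_align(A_ref, B_ref):
    """Find orthogonal Q, scale s, and centroids so s*(B_ref - mu_b) @ Q ~ A_ref - mu_a."""
    mu_a, mu_b = A_ref.mean(axis=0), B_ref.mean(axis=0)
    A0, B0 = A_ref - mu_a, B_ref - mu_b
    # The SVD of B0^T A0 gives the orthogonal Q minimizing ||B0 Q - A0||_F.
    U, S, Vt = np.linalg.svd(B0.T @ A0)
    Q = U @ Vt
    s = S.sum() / (B0 ** 2).sum()   # least-squares optimal isotropic scale
    return Q, s, mu_a, mu_b

def apply_alignment(B, Q, s, mu_a, mu_b):
    """Map every point of B (not just the references) into A's space."""
    return s * (B - mu_b) @ Q + mu_a

# Toy check: B is a rotated, scaled, shifted copy of A; 10 reference pairs.
rng = np.random.default_rng(5)
A = rng.standard_normal((30, 2))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B = 2.0 * A @ R + np.array([5.0, -3.0])

Q, s, mu_a, mu_b = procrustes_align(A[:10], B[:10])
aligned = apply_alignment(B, Q, s, mu_a, mu_b)
```

Procrustes allows only a rigid transform plus scale, so it never deforms either set; the price is that alignment is poor when the two spaces are not related by such a transform.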
Graph Embedding Approach
1. Represent each data matrix as a graph
2. Add zero-length edges between reference point pairs
3. Apply a graph embedding algorithm
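The three steps can be sketched under simplifying assumptions: each data set's graph is complete with Euclidean edge lengths, missing edges between the two sets are marked with infinity, fused distances are shortest-path lengths (Floyd-Warshall), and classical metric MDS stands in for the graph embedding algorithm (the example that follows uses nonmetric MDS instead). All function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers; classical MDS is swapped in for the nonmetric MDS example.
def fused_graph(DA, DB, ref_pairs):
    """Steps 1-2: block edge-length matrix of both graphs; np.inf marks a missing edge."""
    nA, nB = len(DA), len(DB)
    W = np.full((nA + nB, nA + nB), np.inf)
    W[:nA, :nA] = DA
    W[nA:, nA:] = DB
    for i, j in ref_pairs:                 # zero-length edges between reference pairs
        W[i, nA + j] = W[nA + j, i] = 0.0
    return W

def all_pairs_shortest_paths(W):
    """Floyd-Warshall on a dense matrix of edge lengths."""
    D = W.copy()
    np.fill_diagonal(D, 0.0)
    for k in range(len(D)):
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

def classical_mds(D, dim=2):
    """Step 3: embed points so Euclidean distances approximate D (Torgerson MDS)."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J            # double-centered squared distances
    evals, evecs = np.linalg.eigh(B)
    idx = np.argsort(evals)[::-1][:dim]
    return evecs[:, idx] * np.sqrt(np.maximum(evals[idx], 0.0))

def pairwise(X):
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# The two sets may differ in dimensionality: only within-set distances are used.
rng = np.random.default_rng(2)
XA = rng.standard_normal((15, 5))
XB = rng.standard_normal((12, 8))
ref_pairs = [(0, 0), (1, 1), (2, 2), (3, 3)]

D = all_pairs_shortest_paths(fused_graph(pairwise(XA), pairwise(XB), ref_pairs))
Y = classical_mds(D, dim=2)   # rows 0..14: set A, rows 15..26: set B
```

Because a zero-length edge makes the two points of a reference pair interchangeable in every shortest path, each pair receives identical coordinates; this is the "maximize alignment" extreme, while the shortcuts through reference pairs can deform within-set distances.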
[Figure: data sets → similarity graph → matrix representation of graphs → fused data]
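e.g., nonmetric multidimensional scaling (preserving the rank order of distances):
$$\min \sum_{i,j} \big(d^f_A(i,j) - \dot{d}_A(i,j)\big)^2 + \sum_{i,j} \big(d^f_B(i,j) - \dot{d}_B(i,j)\big)^2 + \mu \sum_{r} \big(d^f_{AB}(r,r) - \dot{d}_{AB}(r,r)\big)^2$$
subject to $\dot{d}_{AB}(r,r) < \dot{d}_A(i,j)$ and $\dot{d}_{AB}(r,r) < \dot{d}_B(i,j)$ for $1 \le r \le R$ and $i \ne j$, where $\dot{d}$ denotes rank orders.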
(POSTER)
Evaluation: Cross-Domain Retrieval
[Figure: alignment and deformation plotted against K in K-NN in the fused space, for English-Spanish documents and document (English)-phoneme data; methods compared: Parafac2, nonmetric MDS, metric MDS, Laplacian eigenmap, Procrustes]
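The slides do not define how alignment is scored, so the sketch below is only one plausible, hypothetical retrieval measure: for each reference pair (i, j), check whether the counterpart `YB[j]` is among the K nearest set-B points to `YA[i]` in the fused space, and report the hit rate as a function of K. The function name and the toy embedding are assumptions.

```python
import numpy as np

# Hypothetical retrieval score; not necessarily the measure plotted in the slides.
def cross_domain_hit_rate(YA, YB, pairs, K):
    """Fraction of pairs (i, j) whose counterpart YB[j] is among the K nearest B-points to YA[i]."""
    hits = 0
    for i, j in pairs:
        dist = np.linalg.norm(YB - YA[i], axis=1)   # distances from one A-point to all B-points
        knn = np.argsort(dist)[:K]
        hits += j in knn
    return hits / len(pairs)

# Toy fused embedding in which set B is a slightly perturbed copy of set A.
rng = np.random.default_rng(3)
YA = rng.standard_normal((20, 2))
YB = YA + 0.05 * rng.standard_normal((20, 2))
pairs = [(i, i) for i in range(20)]

scores = [cross_domain_hit_rate(YA, YB, pairs, K) for K in (1, 5, 10)]
```

The score can only grow with K, since each K-NN list extends the previous one.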
Summary / Future Research
• Informative 2D/3D Representation of Data
– Clustered data: two-stage dimension reduction methods effective for a wide range of problems
– Interpretable dimension reduction for nonnegative data: NMF
– Customized fast algorithms for 2D/3D reduction needed
– Dynamic updating methods for efficient and interactive visualization
• Visual Analytic Methods for Foundational Problems
– Classification
– Information fusion by space alignment
– Clustering
• Information Fusion via Space Alignment
• FODAVA Research Test Bed and VA System Development
• Sparse Methods with L1 Regularization
– Sparse solution for regression
– Sparse PCA (with Renato Monteiro)
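For the NMF item, a minimal sketch using the Lee-Seung multiplicative updates for the Frobenius-norm objective $\min_{W,H \ge 0} \|A - WH\|_F^2$. This textbook algorithm is chosen for brevity and is not necessarily the NMF algorithm used by the FODAVA team; the function name, iteration count, and `eps` guard are assumptions.

```python
import numpy as np

# Hypothetical helper: basic multiplicative-update NMF, chosen for brevity.
def nmf(A, k, n_iter=500, eps=1e-10, seed=0):
    """Approximate nonnegative A (m x n) by W @ H with W (m x k) >= 0 and H (k x n) >= 0."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        # Multiplicative updates keep entries nonnegative; eps avoids division by zero.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy nonnegative matrix of rank 4.
rng = np.random.default_rng(4)
A = rng.random((30, 4)) @ rng.random((4, 20))
W, H = nmf(A, 4)
rel_err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
```

Because W and H stay nonnegative, each column of A is approximated as an additive combination of the k columns of W, which is the source of NMF's interpretability for nonnegative data.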