31
UC Berkeley, 09/19/00 An Introduction to Multivariate Data Visualization and XmdvTool Matthew O. Ward Computer Science Department Worcester Polytechnic Institute This work was supported under NSF Grant IIS-9732897

UC Berkeley, 09/19/00 An Introduction to Multivariate Data Visualization and XmdvTool Matthew O. Ward Computer Science Department Worcester Polytechnic

  • View
    224

  • Download
    0

Embed Size (px)

Citation preview

UC Berkeley, 09/19/00

An Introduction to Multivariate Data Visualization and XmdvTool

Matthew O. WardComputer Science DepartmentWorcester Polytechnic Institute

This work was supported under NSF Grant IIS-9732897

UC Berkeley, 09/19/00

What is Multivariate Data?

Each data point has N variables or observations

Each observation can be: nominal or ordinal discrete or continuous scalar, vector, or tensor

May or may not have spatial, temporal, or other connectivity attribute

UC Berkeley, 09/19/00

Characteristics of a Variable

Order: grades have an order, brand names do not. Distance metric: for income, distance equals

difference. For rankings, difference is not a distance metric.

Absolute zero: temperature has an absolute zero, bank account balances do not.

A variable can be classified by these three attributes, called Scale.

Effective visualizations attempt to match the scale of the data dimension with the graphical attribute conveying it.

UC Berkeley, 09/19/00

Sources of Multivariate Data

Sensors (e.g., images, gauges)SimulationsCensus or other surveysCommerce (e.g., stock market)Communication systemsSpreadsheets and databases

UC Berkeley, 09/19/00

Issues in Visualizing Multivariate Data

How many variables? How many records? Types of variables? User task (exploration, confirmation,

presentation) Data feature of interest (clusters, anomalies,

trends, patterns, ….) Background of user (domain expert, visualization

specialist, decision-maker, ….)

UC Berkeley, 09/19/00

Methods for Visualizing Multivariate Data

Dimensional SubsettingDimensional ReorganizationDimensional EmbeddingDimensional Reduction

UC Berkeley, 09/19/00

Dimensional Subsetting

Scatterplot matrix displays all pairwise plots

Selection allows linkage between views

Clusters, trends, and correlations readily discerned between pairs of dimensions

UC Berkeley, 09/19/00

Dimensional Reorganization

Parallel Coordinates creates parallel, rather than orthogonal, dimensions.

Data point corresponds to polyline across axes

Clusters, trends, and anomalies discernable as groupings or outliers, based on intercepts and slopes

UC Berkeley, 09/19/00

Dimensional Reorganization (2)

Glyphs map data dimensions to graphical attributes

Size, color, shape, and orientation are commonly used

Similarities/differences in features give insights into relations

UC Berkeley, 09/19/00

Dimensional Embedding

Dimensional stacking divides data space into bins

Each N-D bin has a unique 2-D screen bin

Screen space recursively divided based on bin count for each dimension

Clusters and trends manifested as repeated patterns

UC Berkeley, 09/19/00

Dimensional Reduction

Map N-D locations to M-D display space while best preserving N-D relations

Approaches include MDS, PCA, and Kohonen Self Organizing Maps

Relationships conveyed by position, links, color, shape, size, etc.

UC Berkeley, 09/19/00

The Role of Selection

User needs to interact with display, examine interesting patterns or anomalies, validate hypotheses

Selection allows isolation of subset of data for highlighting, deleting, focussed analysis

Direct (clicking on displayed items ) vs. indirect (range sliders)

Screen space (2-D) vs. data space (N-D)

UC Berkeley, 09/19/00

Demonstration of XmdvTool

UC Berkeley, 09/19/00

Problems with Large Data Sets

Most techniques are effective with small to moderate sized data sets

Large sets (> 50K records) are increasingly common

When traditional visualizations used, occlusion and clutter make interpretation difficult

UC Berkeley, 09/19/00

One Potential Solution

Multiresolution displays with aggregation Explicit clustering

Break dimensions into bins Aggregate in a particular order (datacubes)

Implicit clustering Hierarchical clustering (proximity-based merging) Hierarchical partitioning (proximity-based splits)

Problem: many ways to cluster, each revealing different aspects of data

UC Berkeley, 09/19/00

Display Options

For each cluster, show Center Extents for each dimension Population Other descriptors (e.g., quartiles)

Color clusters such that siblings have similar color to parents

UC Berkeley, 09/19/00

Hierarchical Parallel Coordinates

Bands show cluster extents in each dimension

Opacity conveys cluster population

Color similarity indicates proximity in hierarchy

UC Berkeley, 09/19/00

Hierarchical Scatterplots

Clusters displayed as rectangles, showing extents in 2 dimensions

Color/opacity consistently used for relational and population info

UC Berkeley, 09/19/00

Hierarchical Glyphs

Star glyph with bands

Analogous to parallel coordinates, with radial rather than parallel dimensions

Glyph position critical for conveying relational info

UC Berkeley, 09/19/00

Hierarchical Dimensional Stacking

Clusters occupy multiple bins

Overlaps can be reduced by increasing number of bins

Cell colors can be blended, or display last cluster mapped to space

UC Berkeley, 09/19/00

Hierarchical Star Fields

Dimensional reduction techniques commonly displayed with starfields

Each cluster becomes circle/sphere in field

Alternatively, can show glyph at cluster location

UC Berkeley, 09/19/00

Navigating Hierarchies

Drill-down, roll-up operations for more or less detail

Need selection operation to identify subtrees for exploration, pruning

Need indications of where you are in hierarchy, and where you’ve been during exploration process

UC Berkeley, 09/19/00

Structure-Based Brushing

Enhancement to screen-based and data-based methods

Specify focus, extents, and level of detailIntuitive - wedge of tree and depth of

interestImplemented by labeling/numbering

terminals and propagating ranges to parents

UC Berkeley, 09/19/00

Structure-Based Brush

White contour links terminal nodes

Red wedge is extents selection

Color curve is depth specification

Color bar maps location in tree to unique color

Direct and indirect manipulation of brush

UC Berkeley, 09/19/00

Demonstration of Hierarchical Features in XmdvTool

UC Berkeley, 09/19/00

Auxiliary Tools

Extent scaling to reduce occlusion of bandsDimensional zooming - fill display with

selected subspace (N-D distortion)Dynamic masking to fade out selected or

unselected dataSaving selected subsetsEnabling/disabling dimensionsUnivariate displays (Tukey box plots, tree

maps)

UC Berkeley, 09/19/00

Summary

Many ways to map multivariate data to images, each with strengths and weaknesses

Linking between and within displays with brushing enhances static displays and combines their strengths.

Hierarchies and aggregations allow visualization of large data sets

Intuitive navigation, filtering, and focus critical to exploration process

Each basic multivariate visualization method is readily extensible to displaying cluster information

UC Berkeley, 09/19/00

Problems and Future Work

Many data characteristics not currently supported (e.g., text fields, records with missing entries, data quality)

Navigation tool for hierarchies assumes linear order of nodes Looking at tools for dynamic reorganization Exploring 2-D or higher navigation interfaces

Very large hierarchies are difficult to focus on a narrow subset Developing multiresolution interface Investigating distortion techniques for navigation

UC Berkeley, 09/19/00

Other Future Work

Projection pursuit or view recommender

Linking structure and data brushingUser studiesCustomization based on domainQuery optimization via caching and

prefetching

UC Berkeley, 09/19/00

For More Information

XmdvTool available to the public domainBoth Unix and Windows supportNext release will have Oracle interface,

with query optimization to support exploratory operations

http://davis.wpi.edu/~xmdvPapers in Vis ‘94, ‘95, ‘99, Infovis ‘99,

IEEE TVCG Vol. 6, No. 2, 2000

UC Berkeley, 09/19/00

Thanks to…..

Elke RundensteinerYing-Huey FuaDaniel StroeYang JingSuggestions from Xmdv usersNSF