ENV 2006 4.1

Envisioning Information

Lecture 4 – Multivariate Data Exploration

Glyphs and other methods

Hierarchical approaches

Ken Brodlie

ENV 2006 4.2

Glyph Techniques

ENV 2006 4.3

Glyph Techniques

• Map data values to geometric and colour attributes of a glyph – or marker symbol

• Very many types of glyph have been suggested:

– Star glyphs
– Faces
– Arrows
– Sticks
– Shape coding

ENV 2006 4.4

Glyph Layouts

• How do we place the glyphs on a chart?

• Sometimes there will be a natural location – for example?

• If not… two of the variates can be allocated to spatial position, and the remainder to the attributes of the glyph
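A minimal sketch of this allocation, assuming each observation is a dictionary of numeric variates. The function name `layout_glyphs`, the variate names and the data are hypothetical. Two variates give the chart position; the rest are scaled to [0, 1] so that they could drive glyph attributes such as size or colour.

```python
def layout_glyphs(records, position_vars, attribute_vars):
    """Split each record into a chart position and glyph attributes.

    records: list of dicts mapping variate name -> numeric value.
    position_vars: the two variates allocated to the x and y axes.
    attribute_vars: the remaining variates, scaled to [0, 1].
    """
    x_var, y_var = position_vars
    # Minimum and maximum of each attribute variate, used for scaling.
    ranges = {
        v: (min(r[v] for r in records), max(r[v] for r in records))
        for v in attribute_vars
    }
    glyphs = []
    for r in records:
        attrs = {}
        for v in attribute_vars:
            lo, hi = ranges[v]
            # A constant variate carries no information; place it mid-range.
            attrs[v] = 0.5 if hi == lo else (r[v] - lo) / (hi - lo)
        glyphs.append({"x": r[x_var], "y": r[y_var], "attributes": attrs})
    return glyphs


# Hypothetical data: four variates per observation.
data = [
    {"a": 1.0, "b": 2.0, "c": 10.0, "d": 5.0},
    {"a": 3.0, "b": 1.0, "c": 20.0, "d": 5.0},
    {"a": 2.0, "b": 4.0, "c": 30.0, "d": 5.0},
]
glyphs = layout_glyphs(data, ("a", "b"), ["c", "d"])
```

Which two variates go to position is a design choice: position is usually the easiest attribute to read, so the most important variates are natural candidates.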

ENV 2006 4.5

Glyph Techniques – Star Plots

• Each observation represented as a ‘star’

• Each spike represents a variable

• Length of spike indicates the value
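The geometry of a star glyph can be sketched as follows. This is an illustration only: the function name `star_glyph`, the even angular spacing and the scaling by a per-variable maximum are assumptions, not taken from the slides.

```python
import math


def star_glyph(values, max_values, centre=(0.0, 0.0), radius=1.0):
    """Return the spike end points of a star glyph for one observation.

    values: one value per variable.
    max_values: the value that maps to a full-length spike for each
    variable (assumed positive).
    """
    n = len(values)
    cx, cy = centre
    points = []
    for i, (v, vmax) in enumerate(zip(values, max_values)):
        angle = 2.0 * math.pi * i / n   # one spike per variable, evenly spaced
        length = radius * v / vmax      # length of spike indicates the value
        points.append((cx + length * math.cos(angle),
                       cy + length * math.sin(angle)))
    return points


# Hypothetical observation with four variables.
spikes = star_glyph([5.0, 2.0, 10.0, 4.0], [10.0, 4.0, 10.0, 8.0])
```

Drawing a line from the centre to each returned point gives the spikes; joining the points in order gives the closed star outline.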

ENV 2006 4.6

Glyph Techniques – Star Plots

Crime in Detroit

ENV 2006 4.7

Star Glyphs – Iris Data Set

ENV 2006 4.8

Chernoff Faces

• Chernoff suggested use of faces to encode a variety of variables

– can map to size, shape, colour of facial features
– human brain rapidly recognises faces

ENV 2006 4.9

Chernoff Faces

• Here are some of the facial features you can use

http://www.bradandkathy.com/software/faces.html

ENV 2006 4.10

Chernoff Faces

• Demonstration applet at:

– http://www.hesketh.com/schampeo/projects/Faces/

ENV 2006 4.11

Chernoff’s Face

• ... and here is Chernoff’s face

http://www.fas.harvard.edu/~stats/People/Faculty/Herman_Chernoff/Herman_Chernoff_Index.html

ENV 2006 4.12

Stick Figures

• Glyph is a matchstick figure, with variables mapped to angle and length of limbs

• As with Chernoff faces, two variables are mapped to display axes

• Stick figures useful for very large data sets

• Texture patterns emerge

• Idea due to RM Pickett & G Grinstein

- different angles that may be varied are shown
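A rough sketch of how variables might drive limb angles. The arrangement of limbs is assumed for illustration and is not the exact Pickett and Grinstein icon; the function name `stick_figure` and the mapping of [0, 1] to 0..180 degrees are likewise hypothetical. The `origin` argument stands for the two variables mapped to the display axes.

```python
import math


def stick_figure(values, origin=(0.0, 0.0), limb_length=1.0):
    """Return line segments for a simple stick-figure glyph.

    values: variables already scaled to [0, 1]; each sets one limb angle.
    Limbs alternate between the bottom and top of a vertical body
    (an arrangement assumed here for illustration only).
    """
    ox, oy = origin
    bottom, top = (ox, oy), (ox, oy + limb_length)
    segments = [(bottom, top)]          # the body
    for i, v in enumerate(values):
        start = bottom if i % 2 == 0 else top
        angle = math.pi * v             # 0..1 maps to 0..180 degrees
        end = (start[0] + limb_length * math.cos(angle),
               start[1] + limb_length * math.sin(angle))
        segments.append((start, end))
    return segments


# Hypothetical observation with four limb variables.
segments = stick_figure([0.0, 0.5, 1.0, 0.25])
```

With thousands of such figures packed closely together, similar observations produce similar limb orientations, which is where the texture patterns come from.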

ENV 2006 4.13

Stick Figures

5D image data from Great Lakes region

ENV 2006 4.14

Shape Coding

• Suitable where a variable has a Boolean value, i.e. on/off

• A data item is represented as an array of elements, each element corresponding to a variable

[Figure: array of boxes numbered 1 to 6; shade in box if value of corresponding variable is ‘on’]

• Arrays laid out in a line, or plane, as with other icon-based methods
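A text-only sketch of the idea, with hypothetical data: each item becomes a row of boxes, with `#` for a shaded box (variable ‘on’) and `.` for an empty one. The function name `shape_code` and the choice of characters are assumptions.

```python
def shape_code(flags, on="#", off="."):
    """Render one data item as a row of boxes, shaded where the variable is on."""
    return "".join(on if flag else off for flag in flags)


# Hypothetical items, each with six Boolean variables.
items = [
    [True, False, False, True, True, False],
    [False, False, True, True, False, False],
]
rows = [shape_code(item) for item in items]
print("\n".join(rows))
```

Stacking the rows one per item gives the line layout; wrapping them into a grid gives the plane layout.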

ENV 2006 4.15

Shape Coding

Time series of NASA earth observation data

ENV 2006 4.16

Daisy Charts

* variables and their values placed around circle

* lines connect the values for one observation

[Figure: daisy chart with the values Dry, Wet, Showery, Saturday, Sunday, Leeds, Sahara and Amazon placed around a circle. This item is { wet, Saturday, Amazon }]

http://www.daisy.co.uk
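The construction can be sketched as below. The values come from the slide; the variable names (`weather`, `day`, `place`), the function names and the equal angular spacing are assumptions made for illustration.

```python
import math


def daisy_positions(variables):
    """Place every value of every variable at equal angles around a unit circle.

    variables: dict mapping variable name -> list of possible values.
    Returns a dict mapping (variable, value) -> (x, y).
    """
    slots = [(name, value)
             for name, values in variables.items()
             for value in values]
    positions = {}
    for i, slot in enumerate(slots):
        angle = 2.0 * math.pi * i / len(slots)
        positions[slot] = (math.cos(angle), math.sin(angle))
    return positions


def daisy_lines(item, positions):
    """Return the line segments joining the values taken by one observation."""
    points = [positions[(name, value)] for name, value in item.items()]
    return list(zip(points, points[1:]))


# Values are from the slide; the variable names are assumed.
variables = {
    "weather": ["Dry", "Wet", "Showery"],
    "day": ["Saturday", "Sunday"],
    "place": ["Leeds", "Sahara", "Amazon"],
}
positions = daisy_positions(variables)
item = {"weather": "Wet", "day": "Saturday", "place": "Amazon"}
lines = daisy_lines(item, positions)
```

Drawing the segments for many observations on the same circle makes frequently co-occurring values show up as dense bundles of lines.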

ENV 2006 4.17

Daisy Charts – Underground Problems

ENV 2006 4.18

Daisy Charts – News Analysis

• Four variates: day, source, search terms, keywords

ENV 2006 4.19

Reducing Complexity in Multivariate Data Exploration

ENV 2006 4.20

Clustering as a Solution

• Success has been achieved through clustering of observations

• Hierarchical parallel co-ordinates

– Cluster by similarity
– Display using translucency and proximity-based colour

http://davis.wpi.edu/~xmdv/docs/vis99_HPC.pdf
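As an illustration of clustering observations by similarity, here is a small agglomerative clustering sketch. It is a generic centroid-linkage scheme chosen for brevity, not the method of the paper linked above. The function name `agglomerate` and the data are hypothetical.

```python
import math


def agglomerate(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    Distance between clusters is the distance between their means
    (centroid linkage, chosen here for simplicity).
    """
    clusters = [[p] for p in points]

    def mean(cluster):
        return [sum(col) / len(cluster) for col in zip(*cluster)]

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(mean(clusters[i]), mean(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical 3-variate observations forming two obvious groups.
observations = [(0.0, 0.1, 0.0), (0.1, 0.0, 0.1),
                (5.0, 5.1, 5.0), (5.1, 5.0, 4.9)]
clusters = agglomerate(observations, 2)
# One mean polyline per cluster, to be drawn on the parallel axes.
summaries = [[sum(col) / len(c) for col in zip(*c)] for c in clusters]
```

In a hierarchical parallel co-ordinates display, each cluster could then be drawn as its mean polyline with a translucent band showing its extent, instead of one line per observation.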

ENV 2006 4.21

Comparison

One of 3 clusters

ENV 2006 4.22

Hierarchical Parallel Co-ordinates

ENV 2006 4.23

Reduction of Dimensionality of Variable Space

• Reduce number of variables, preserve information

• Principal Component Analysis

– Transform to new co-ordinate system
– Hard to interpret

• Hierarchical reduction of variable space

– Cluster variables where distance between observations is typically small

– Choose representative for each cluster

• Subgroup has then been identified – showing what?

http://davis.wpi.edu/%7Exmdv/docs/vhdr_vissym.pdf

42 dimensions, 200 observations
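The idea of clustering variables and choosing a representative can be sketched as below. This is a single-level, greedy simplification and not the algorithm of the paper linked above. The distance measure (mean absolute difference over observations), the threshold, the function names and the data are all assumptions.

```python
def variable_distance(col_a, col_b):
    """Mean absolute difference between two variables across all observations."""
    return sum(abs(a - b) for a, b in zip(col_a, col_b)) / len(col_a)


def reduce_variables(columns, threshold):
    """Group similar variables and pick one representative per group.

    columns: dict mapping variable name -> list of values scaled to [0, 1].
    A variable joins the first group whose representative lies within
    threshold of it; otherwise it starts a new group and represents it.
    """
    groups = {}                         # representative -> member names
    for name, col in columns.items():
        for rep in groups:
            if variable_distance(col, columns[rep]) <= threshold:
                groups[rep].append(name)
                break
        else:
            groups[name] = [name]
    return groups


# Hypothetical data: four variables measured on three observations.
columns = {
    "v1": [0.10, 0.50, 0.90],
    "v2": [0.12, 0.48, 0.91],
    "v3": [0.90, 0.10, 0.50],
    "v4": [0.88, 0.12, 0.52],
}
groups = reduce_variables(columns, threshold=0.05)
```

Only the representatives (here `v1` and `v3`) would be displayed. Repeating the grouping with progressively larger thresholds would give a hierarchy of variables rather than a single level.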
