67
Data Visualization or Graphical Data Presentation Jerzy Stefanowski Instytut Informatyki Data mining for SE -- 2013

or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

  • Upload
    leanh

  • View
    218

  • Download
    3

Embed Size (px)

Citation preview

Page 1: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Data Visualization

or Graphical Data Presentation

Jerzy StefanowskiInstytut Informatyki

Data mining for SE -- 2013

Page 2: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Ack.

Inspirations are coming from:•G.Piatetsky Schapiro lectures on KDD•J.Han on Data Mining•Ken Brodlie “Envisioning Information”•Chris North “Information Visualisation”

Page 3: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s
Page 4: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

What is visualization and data mining?

• Visualize: “To form a mental vision, image, or picture of (something not visible or present to the sight, or of an abstraction); to make visible to the mind or imagination.”

• Visualization is the use of computer graphics to create visual images which aid in the understanding of complex, often massive representations of data.

• Visual Data Mining is the process of discovering implicit but useful knowledge from large data sets using visualization techniques.

Page 5: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Tables vs graphs

A table is best when: • You need to look up

specific values• Users need precise

values• You need to precisely

compare related values • You have multiple data

sets with different units ofmeasure

A graph is best when:• The message is

contained in the shape of the values

• You want to revealrelationships among multiple values (similarities anddifferences)

• Show general trends• You have large data sets

• Graphs and tables serve different purposes. Choose the appropriate data display to fit your purpose.

Page 6: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Exploratory Data Analysis

• Pioneer -> John Tukey• New approach to data

analysis, heavily based on visualization, as an alternative to classical data analysis

• See its bio

• Two stage process:– Exploratory: Search for

evidence using all tools available

– Confirmatory: evaluate strength of evidence using classical data analysis

Page 7: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Box Plots

• In some situations we have, not a single data value at a point, but a number of data values, or even a probability distribution

• When might this occur?• Tukey proposed the idea of a

boxplot to visualize the distribution of values

• For explanation and some history, see:

http://mathworld.wolfram.com/Box-and-WhiskerPlot.html

http://en.wikipedia.org/wiki/Box_plot

M – medianQ1, Q3 – quarrtilesWhiskers –1.5 * interquartile rangeDots - outliers

http://www.upscale.utoronto.ca/GeneralInterest/Harrison/Visualisation/Visualisation.html

Darwin’s plant study

Page 8: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Distribution visualisation – US Crime Story

Page 9: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Data Visualization – Common Display Types

Common Display Types– Bar Charts

– Line Charts

– Pie Charts

– Bubble Charts

– Stacked Charts

– Scatterplots

Page 10: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

When to use which type?

Line Graph – x-axis requires quantitative variable– Variables have contiguous values– Familiar/conventional ordering among

ordinals

Bar Graph– Comparison of relative point values

Scatter Plot– Convey overall impression of relationship

between two variables

Pie Chart– Emphasizing differences in proportion

among a few numbers

R2 = 0.87

0%

20%

40%

60%

80%

100%

0.0 0.2 0.4

05

101520

1 2 3 4 5 6 7 8

0

5

10

15

1 2 3 4 5 6 7 8

Page 11: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Line Graph – Trend visualization

• Fundamental technique of data presentation

• Used to compare two variables

– X-axis is often the control variable

– Y-axis is the response variable

• Good at:– Showing specific values– Trends– Trends in groups (using

multiple line graphs)

Students participating in sporting activities

MobilePhone use

Note: graph labelling is fundamental

Page 12: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Time line graph – show dynamics of measurements

Page 13: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Stratified graphs

• Trends of values with respect to time and different qualitative categories

Page 14: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Demo – Baby Names Voyager

http://www.babynamewizard.com/voyager

Page 15: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Scatter Plot – Wykresy rozrzutu XY

• Used to present measurements of two variables

• Effective if a relationship exists between the two variables

Car ownership by household income

Example taken fromNIST Handbook –Evidence of strongpositive correlation

Page 16: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s
Page 17: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Simple Representations – Bar Graph

• Bar graph– Presents categorical variables– Height of bar indicates value– Double bar graph allows

comparison– Note spacing between bars– Can be horizontal (when would

you use this?)

Internet use at a school

Number of police officers

Note more space for labels

Page 18: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Dot Graph

• Very simple but effective…• Horizontal to give more space

for labelling

Page 19: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

19

Bad Visualization: Spreadsheet

Year Sales

1999 2,1102000 2,1052001 2,1202002 2,1212003 2,124

Sales

20952100210521102115212021252130

1999 2000 2001 2002 2003

Sales

What is wrong with this graph?

Page 20: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

20

Bad Visualization: Spreadsheet with misleading Y –axis

Year Sales

1999 2,1102000 2,1052001 2,1202002 2,1212003 2,124

Sales

20952100210521102115212021252130

1999 2000 2001 2002 2003

Sales

Y-Axis scale gives WRONGimpression of big change

Page 21: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

21

Better Visualization

Year Sales

1999 2,1102000 2,1052001 2,1202002 2,1212003 2,124

Sales

0500

10001500

20002500

3000

1999 2000 2001 2002 2003

Sales

Axis from 0 to 2000 scale gives correct impression of small change + small formatting tricks

Page 22: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Integrating various graphs

Page 23: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Pie Chart

• Pie chart summarises a set of categorical/nominal data

• But use with care…

• … too many segments are harder to compare than in a bar chart

Should we have a long lecture?

Favourite movie genres

Page 24: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Visualizing in 4+ Dimensions

• Extensions of Scatterplots• Parallel Coordinates• Radar Figures• Other tools • …

Page 25: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Multiple Views

Give each variable its own display

A B C D E1 4 1 8 3 52 6 3 4 2 13 5 7 2 4 34 2 6 3 1 5

A B C D E

1

2

3

4

Problem: does not show correlations

Page 26: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Tableau bar comparisons

Page 27: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Buisness Analytics Tools – Manager Dashboards

Page 28: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Scatterplot Matrix

Represent each possiblepair of variables in theirown 2-D scatterplot(car data)

Q: Useful for what?A: linear correlations

(e.g. horsepower & weight)

Q: Misses what?A: multivariate effects

Page 29: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Parallel Coordinates

• Encode variables along a horizontal row• Vertical line specifies values

Dataset in a Cartesian coordinates

Same dataset in parallel coordinates

Invented by Alfred Inselbergwhile at IBM, 1985

Page 30: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Parallel Coordinates: 4 D

Sepal Length

5.1

Sepal Width

Petal length

Petal Width

3.5

sepal length

sepal width

petal length

petal width

5.1 3.5 1.4 0.2

1.4 0.2

Page 31: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Parallel Coordinates Plots for Iris Data

Page 32: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Radar Figures

• Agregate multidimensionalobservations

• Each observation gets a separate colour or graphsymbols

• Variables corresponds to angles

Page 33: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Wykres radarowy –oceny wskaźników w ramach dziedzinyI poziom oceny

Wybrana dziedzina

Page 34: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

F. Nightingale (1856) – abstract representation

Page 35: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Buisness Analytics Tools – Typical Reports

Raport more traditional Other forms

Page 36: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Buisness Analytics Tools – Manager Dashboards

Page 37: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Bars in business dashboards – Tableau Software

Page 38: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Data analytics – kokpity menadżerskie

• SAS Enterprise BI

Page 39: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Multidimensional Stacking

Page 40: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Multidimensional presentation of nominal attributes

• VL1 diagrams (Michalski 70) for machine learning

STAGGER and concept drif

Page 41: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Hierarchiczne wizualizacje - Treemaps

• Treemaps display hierarchical data using rectangles. Each branch of the tree is assigned a rectangle. Then each sub-branch gets assigned to a rectangle and this continues recursively until a leaf node is found.

• Depending on choice the rectangle representing the leaf node is colored, sized or both according to chosen attributes.

Page 42: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s
Page 43: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s
Page 44: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Gapminder – Motion Charts

http://www.gapminder.org/ Using Bubble presentations

Page 45: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Spotfire

Page 46: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Chernoff Faces

Encode different variables’ values in characteristicsof human face

http://www.cs.uchicago.edu/~wiseman/chernoff/http://hesketh.com/schampeo/projects/Faces/chernoff.htmlCute applets:

Page 47: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s
Page 48: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

48

Hierarchical Techniques

Cone Trees [RMC91]• animated 3D

visualizations of hierarchical data

• file system structure visualized as a cone tree

Page 49: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Abstract Hierarchical Information – Preview

TreemapTraditional

ConeTree SunTree Botanical

Hyperbolic Tree

Page 50: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Visualization of Search Results & Inter-Document Similarities

Page 51: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Abstract Text – MetaSearch Previews

Grokker Kartoo

AltaVistaLycos

MSN

MetaCrystal searchCrystal

Page 52: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Other buisness tools

Page 53: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Visualization of different conditions

Page 54: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Overview and Detail

Page 55: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Brushing and Linking

Page 56: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Census Data

Page 57: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

57

Visualization of Association Rules in SGI/MineSet 3.0

Page 58: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

IBM Miner – visualization of mining results

Page 59: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

SGI – other tools

Page 60: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

60

Graph-based Techniques

Narcissus• Visualization of a large

number of web pages• visualization of complex

highly interconnected data

Page 61: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Visualization of knowledge discovery process

• A graphical tool for arranging components / steps of KDD• Just a graph flow of actions• Graphical objects – plug and place• Parametrization• Often → you may produce a kind of scipt representing a

graphical flow of KD process

Page 62: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Statsoft – Data mining graphical panel

Page 63: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

RapidMiner (YALE)

Page 64: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Tukey’s recommendations

Page 65: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Tufte’s Principles of Graphical Excellence

• Give the viewer – the greatest number of ideas – in the shortest time – with the least ink in the smallest space.

• Tell the truth about the data!

(E.R. (E.R. TufteTufte, , ““The Visual Display of Quantitative InformationThe Visual Display of Quantitative Information””, 2nd edition), 2nd edition)

Page 66: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Look for other referencesAnd play with different software tools

Excel is not the only and best software

Page 67: or Graphical Data Presentation - Poznań University of ... · or Graphical Data Presentation ... Visualization of Association Rules in SGI/MineSet 3.0. ... RapidMiner (YALE) Tukey’s

Thank you for you coming to my lecture and asking questions!