Graphical inference

Page 1: Graphical inference

“The human understanding, on account of its own nature, readily supposes a greater order and uniformity in things than it finds. And ... it devises parallels and correspondences and relations which are not there.”—Francis Bacon, 1620

Wednesday, 10 November 2010

Page 2: Graphical inference

Is what we see really there?

Page 4: Graphical inference

Which one of these plots is not like the others? Which of these plots just doesn’t belong?

Page 5: Graphical inference

Seven of those plots showed random (null) data. One showed the real data.

If you correctly picked the real plot from among the null plots, then we have evidence that it really is different.

In fact, we have rigorous statistical evidence that there is a difference, just using Sesame Street skills!

Page 6: Graphical inference

1. The statistical justice system

2. Line up protocol

3. Rorschach protocol

4. Future work

Page 8: Graphical inference

http://www.flickr.com/photos/joegratz/117048243

The statistical justice system

Hypothesis testing?

Page 12: Graphical inference

H0 (null hypothesis): Defence
Ha (alternative hypothesis): Prosecution
Null distribution: Innocents
Reject the null: Guilty
Fail to reject the null: Not guilty
p-value: Probability that a truly innocent dataset would look at least as guilty as the suspect
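The p-value analogy can be made concrete with a small permutation test. This is only a sketch in base R: the data, the "guilt" statistic (absolute correlation) and the number of permutations are all assumptions chosen for illustration, not part of the talk.

```r
# Hypothetical suspect: two variables with a real relationship.
set.seed(2010)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)

# "Guilt" statistic (an assumption of this sketch): absolute correlation.
guilt <- function(x, y) abs(cor(x, y))

suspect <- guilt(x, y)

# Innocents: permuting y breaks any link with x, so these datasets
# are generated under the null hypothesis of independence.
innocents <- replicate(999, guilt(x, sample(y)))

# Share of datasets (counting the suspect itself) that look at least
# as guilty as the suspect.
p_value <- (sum(innocents >= suspect) + 1) / (length(innocents) + 1)
p_value
```

A small value means that few innocent datasets look as guilty as the suspect, which is the courtroom reading of "reject the null".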

Page 13: Graphical inference

Line up

Page 14: Graphical inference

[Figure: five tag clouds, each showing the words believe, case, closely, descendants, few, long, modified, variations, very and view twice, once per edition.]

Five tag clouds of selected words from the 1st (red) and 6th (blue) editions of Darwin’s “Origin of Species”. Four of the tag clouds were generated under the null hypothesis of no difference between editions, and one is the true data. Can you spot it?

Page 16: Graphical inference

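Protocol

1. Generate n - 1 decoys (null datasets).

2. Plot the decoys and the real data together, with the real data at a random position.

3. Show the plots to an impartial observer. Can they spot the real data?

4. If so, you have evidence of a true difference (p-value = 1/n, the chance of picking the real data by luck alone).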
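The protocol can be sketched in base R without any support package. Everything here is an assumption made for illustration: the data are simulated, the decoys are produced by permuting y (one possible null-generating mechanism among many), and make_lineup is a hypothetical helper, not a function from nullabor.

```r
set.seed(1)

# Hypothetical real data (an assumption of this sketch).
real <- data.frame(x = rnorm(30))
real$y <- real$x + rnorm(30)

# Build a lineup: n - 1 decoys plus the real data at a random position.
make_lineup <- function(real, n = 20) {
  pos <- sample(n, 1)
  panels <- lapply(seq_len(n), function(i) {
    d <- real
    if (i != pos) d$y <- sample(d$y)  # decoy: permute y to enforce the null
    d$.sample <- i
    d
  })
  list(data = do.call(rbind, panels), pos = pos)
}

n <- 20
lu <- make_lineup(real, n = n)

# If the real data look no different from the decoys, the observer is
# effectively guessing, so they pick the real panel with probability 1/n.
positions <- sample(n, 10000, replace = TRUE)
guesses <- sample(n, 10000, replace = TRUE)
hit_rate <- mean(guesses == positions)
hit_rate
```

The simulated hit rate should sit close to 1/n, which is why a correct identification by an impartial observer carries a p-value of 1/n. Plotting lu$data with one panel per .sample would give the lineup itself.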

Page 17: Graphical inference

E. L. Scott, C. D. Shane, and M. D. Swanson. Comparison of the synthetic and actual distribution of galaxies on a photographic plate. Astrophysical Journal, 119:91–112, Jan. 1954.

Page 18: Graphical inference

A. M. Noll. Human or machine: A subjective comparison of Piet Mondrian’s “composition with lines” (1917) and a computer-generated picture. The Psychological Record, 16:1–10, 1966.

Page 19: Graphical inference

vs. classical tests

Of course, if we know what we’re looking for, we can always develop an algorithm or numerical test.

The advantage of visual inference is that it works for very general tasks, including when you don’t know exactly what you’re looking for.

Page 20: Graphical inference

Power of the test

[Figure: power curves for the theoretical test and the visual test, with lower and upper confidence limits (lower_CL, upper_CL). Panels: sigma = 12 and sigma = 5, by sample size = 100 and sample size = 300. Horizontal axis ticks run from -15 to 15; vertical axis: power, 0.0 to 1.0.]

Source: M. Majumder (Stat. Dept., ISU), Visual Inference for Regression Parameters, Oct. 1, slide 17 of 24.

Recent work shows that the power of the visual test is only a little worse than that of the classical test.

Page 21: Graphical inference

Plot: Task

Choropleth map: Is there a spatial trend?
Treemap: Is the distribution in higher level categories the same?
Scatterplot: Are the two variables independent?
Time series: Is there a trend in mean or variability?

Page 25: Graphical inference

Once we’ve seen the plot, we’re no longer impartial

Page 26: Graphical inference

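Code

# Support package written in R
# http://github.com/ggobi/nullabor
# Provides reference implementation of ideas

library(nullabor)
library(ggplot2)

# Plot the real data, then use %+% to swap in a lineup of n = 10 datasets
# (the real data hidden among null datasets), one panel per .sample
qplot(angle * 180 / pi, r, data = threept) %+%
  lineup(null_model(r ~ poly(angle, 2)), n = 10) +
  facet_wrap(~ .sample, ncol = 5)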

Page 27: Graphical inference

Rorschach

Page 28: Graphical inference

Rorschach

We’re surprisingly bad at appreciating the amount of variation in random data.

Showing only null plots is a good way to calibrate our intuition.

We also plan on using these plots as an empirical tool to understand what features people pick up on. Anecdotally, undergrads focus too much on outliers.
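A Rorschach display needs nothing but null data. The sketch below, in base R, assumes a Uniform(0, 1) null, nine panels and 500 observations per panel; all three are illustrative choices rather than details taken from the talk.

```r
set.seed(29)

# Nine null datasets: 500 draws each from Uniform(0, 1).
# The distribution and the sizes are assumptions of this sketch.
null_sets <- replicate(9, runif(500), simplify = FALSE)

# Bin each dataset into ten equal-width bins, as a histogram would.
breaks <- seq(0, 1, by = 0.1)
counts <- sapply(null_sets, function(v) {
  table(cut(v, breaks, include.lowest = TRUE))
})

# Every bin has the same expected count (50), yet the observed counts
# differ from bin to bin and from panel to panel.
range(counts)

# To view the nine histograms side by side:
# op <- par(mfrow = c(3, 3))
# for (v in null_sets) hist(v, breaks = breaks)
# par(op)
```

Looking at the spread of counts, or at the nine histograms themselves, gives a feel for how uneven purely random data can look before any real data enter the picture.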

Page 29: Graphical inference

[Figure: nine panels numbered 1 to 9, with axis labels result (ticks 0.0 to 1.0) and count (ticks 0 to 100).]

Page 30: Graphical inference

Future work

Page 31: Graphical inference

Future work

How can visual inference be integrated into visualisation software at a fundamental level?

How does training impact results? How do novices vs. experts differ?

What patterns do people pick up on? What are the alternatives that people respond to?

Page 32: Graphical inference

Questions?

Page 34: Graphical inference

This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
