“The human understanding, on account of its own nature, readily supposes a greater order and uniformity in things than it finds. And ... it devises parallels and correspondences and relations which are not there.”—Francis Bacon, 1620
Wednesday, 10 November 2010
Is what we see really there?
October 2010
Hadley Wickham, Dianne Cook, Heike Hofmann, Andreas Buja
Graphical inference for infovis
Which one of these plots is not like the others? Which of these plots just doesn’t belong?
7 of those plots were plots of random (null) data. 1 plot was the real data.
If you correctly picked the true plot from the null plots, then we have evidence that it really is different.
In fact, we have rigorous statistical evidence that there is a difference, just using Sesame Street skills!
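As a rough check on why a correct pick counts as evidence: if the real plot looked just like the nulls, an observer could only guess, and would land on it about 1 time in 8. The base R sketch below simulates that guessing. The seed, the number of simulated observers and the variable names are illustrative assumptions, not part of the talk.

```r
# Simulate observers who guess at random among 8 plots (7 null + 1 real).
set.seed(1)                      # arbitrary seed, for reproducibility only
n_plots <- 8
n_observers <- 100000            # hypothetical number of simulated guesses

# Where the real plot sits, and which plot each observer picks
real_position <- sample(n_plots, n_observers, replace = TRUE)
guess <- sample(n_plots, n_observers, replace = TRUE)

# Proportion of pure guesses that land on the real plot; close to 1/8
chance_rate <- mean(guess == real_position)
chance_rate
```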
1. The statistical justice system
2. Line up protocol
3. Rorschach protocol
4. Future work
http://www.flickr.com/photos/joegratz/117048243
Hypothesis testing?
The statistical justice system
H0: null hypothesis = Defence
Ha: alternative hypothesis = Prosecution
Null distribution = Innocents
Reject the null = Guilty
Fail to reject the null = Not guilty
p-value = Probability that a truly innocent dataset would look as guilty as the suspect
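The p-value definition above can be made concrete with a permutation test. This is a minimal sketch in base R, assuming made-up two-group data and using the absolute difference in group means as the measure of "guilt". The data, the statistic and the names `guilt`, `suspect` and `innocents` are all hypothetical and not taken from the talk.

```r
# Hypothetical data: two groups with a modest shift between them.
set.seed(2)
group <- rep(c("a", "b"), each = 20)
value <- c(rnorm(20, mean = 0), rnorm(20, mean = 1))

# "Guilt" statistic: absolute difference in group means.
guilt <- function(value, group) {
  abs(mean(value[group == "a"]) - mean(value[group == "b"]))
}

# The suspect: the statistic for the data as observed.
suspect <- guilt(value, group)

# Innocents: shuffle the group labels so any difference is due to chance.
innocents <- replicate(5000, guilt(value, sample(group)))

# p-value: share of innocents that look at least as guilty as the suspect.
# Adding 1 to numerator and denominator counts the suspect itself.
p_value <- (sum(innocents >= suspect) + 1) / (length(innocents) + 1)
p_value
```

The shuffled datasets play the role of the null distribution: each one is a dataset known to be innocent by construction.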
Line up
[Figure: five tag clouds, each showing the words believe, case, closely, descendants, few, long, modified, variations, very and view, once per edition.]
Five tag clouds of selected words from the 1st (red) and 6th (blue) editions of Darwin’s “Origin of Species”. Four of the tag clouds were generated under the null hypothesis of no difference between editions, and one is the true data. Can you spot it?
Protocol
Generate n-1 decoys (null datasets).
Plot the decoys + the real data (randomly positioned).
Show to an impartial observer. Can they spot the real data?
If so, you have evidence of a true difference (p-value = 1/n).
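The protocol steps above can be sketched in base R without any packages. This is an illustration only: `make_lineup` is a hypothetical helper, the data are simulated, and the decoys assume a null hypothesis of "no relationship between x and y", generated by permuting y. The `.sample` column name simply mirrors the faceting variable used in the nullabor example elsewhere in the talk.

```r
# Build a lineup of n panels: n - 1 decoys plus the real data.
# `make_lineup` is a hypothetical helper, not part of any package.
make_lineup <- function(data, n = 20) {
  position <- sample(n, 1)               # random slot for the real data
  panels <- lapply(seq_len(n), function(i) {
    panel <- data
    if (i != position) {
      panel$y <- sample(panel$y)         # decoy: permuting y breaks any x-y link
    }
    panel$.sample <- i                   # panel index, used for faceting
    panel
  })
  list(data = do.call(rbind, panels), position = position)
}

set.seed(3)
real <- data.frame(x = 1:30)
real$y <- 0.5 * real$x + rnorm(30, sd = 3)   # made-up data with a real trend

lu <- make_lineup(real, n = 20)

# Plot lu$data faceted by .sample, show it to an observer who has not seen
# the real data, then compare their pick with lu$position.
p_value_if_spotted <- 1 / 20
```

Keeping `position` separate from the plotted data matters: whoever judges the lineup should not know it in advance.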
E. L. Scott, C. D. Shane, and M. D. Swanson. Comparison of the synthetic and actual distribution of galaxies on a photographic plate. Astrophysical Journal, 119:91–112, Jan. 1954.
A. M. Noll. Human or machine: A subjective comparison of Piet Mondrian’s “composition with lines” (1917) and a computer-generated picture. The Psychological Record, 16:1–10, 1966.
vs. classical tests
Of course, if we know what we’re looking for, we can always develop an algorithm or numerical test.
The advantage of visual inference is that it works for very general tasks, including when you don’t know exactly what you’re looking for.
Power of the test
[Figure: power curves for the theoretical test and the visual test, with lower and upper confidence limits. Panels: sigma = 12 and sigma = 5, by sample size = 100 and sample size = 300. Vertical axis: power, 0.0 to 1.0; horizontal axis: -15 to 15. Source: M. Majumder (Stat. Dept., ISU), Visual Inference for Regression Parameters, Oct. 1.]
Recent work shows that the power of the visual test is only a little worse than that of the classical test.
Plot: Task
Choropleth map: Is there a spatial trend?
Treemap: Is the distribution in higher level categories the same?
Scatterplot: Are the two variables independent?
Time series: Is there a trend in mean or variability?
Once we’ve seen the plot, we’re no longer impartial
Code
# Support package written in R
# http://github.com/ggobi/nullabor
# Provides reference implementation of ideas

library(nullabor)
library(ggplot2)

qplot(angle * 180 / pi, r, data = threept) %+%
  lineup(null_model(r ~ poly(angle, 2)), n = 10) +
  facet_wrap(~ .sample, ncol = 5)
Rorschach
We’re surprisingly bad at appreciating the amount of variation in random data.
Showing only null plots is a good way to calibrate our intuition.
We also plan on using these plots as an empirical tool to understand what features people pick up on. Anecdotally, undergrads focus too much on outliers.
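To get a feel for how much null data can vary, one option is to generate several datasets from the same flat distribution and compare their histogram counts. The base R sketch below does this for nine samples. The uniform distribution, the sample size of 500 and the 10 bins are assumptions chosen for illustration and are not taken from the talk.

```r
# Nine null datasets: 500 draws each from a uniform distribution on [0, 1].
# The sample size, bin count and choice of distribution are assumptions.
set.seed(4)
nulls <- lapply(1:9, function(i) runif(500))

# Bin each sample into 10 equal-width bins, as a histogram would.
breaks <- seq(0, 1, by = 0.1)
counts <- sapply(nulls, function(x) {
  table(cut(x, breaks, include.lowest = TRUE))
})

# Every column comes from the same flat distribution, yet the bin counts
# differ from panel to panel. Rows are bins, columns are panels.
counts
apply(counts, 2, range)

# To view them as a 3 x 3 grid of histograms:
# op <- par(mfrow = c(3, 3)); for (x in nulls) hist(x, breaks = breaks); par(op)
```

Any bump or dip seen in one of these panels is noise by construction, which is the calibration the Rorschach protocol is after.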
[Figure: nine null plots in a 3 x 3 grid (panels 1 to 9), each showing count (0 to 100) against result (0.0 to 1.0).]
Future work
How can visual inference be integrated into visualisation software at a fundamental level?
How does training impact results? How do novices vs. experts differ?
What patterns do people pick up on? What are the alternatives that people respond to?
Questions?
This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.