Post-academic course Big Data
Joris Klerkx, Research Manager, [email protected]
Visualisation of Big Data
IVPV - Instituut voor Permanente Vorming
28-05-2015
1
Augment group - HCI research lab
Dept. of Computer Science, KU Leuven
https://augmenthuman.wordpress.com
2
Erik Duval, 11/9/1965 – 12/3/2016
3
Our mission
“To augment the human intellect” (Engelbart, 1962)
4
By ‘augmenting human intellect’ we mean increasing the capability of a man to approach a complex problem situation, to gain comprehension to suit his particular needs, and to derive solutions to problems.
Design, build and evaluate relevant tools and technologies that help users to become better in their daily life & work (Duval, 2015)
Our mission
5
What are relevant user actions?
How can we capture signals? How can we store them?
How can we create a meaningful feedback loop?
Our Research
Physiological, behavioural signals
Sensors, (self-)trackers
Information visualization
Scalable infrastructure
6
Application Domains
Technology-Enhanced Learning
Media Consumption
Science 2.0
(e)Health
7
Slides will be posted to Slideshare & Zephyr
8
http://www.hearts.com/ecolife/cut-paper-consumption-protect-forests/
9
Big Data
10
Big data
11
Big data
insights
12
Better Human Understanding
13
A mental model represents what a person thinks is true… but isn’t necessarily true
14
UNDERSTANDING OF THEIR MENTAL MODELS
15
Wouter Walgrave - http://www.slideshare.net/wouterwalgraeve/mental-models-as-information-radiators
16
17
18
?
19
"The idea that business is strictly a numbers affair has always struck me as preposterous. For one thing, I’ve never been particularly good at numbers, but I think I’ve done a
reasonable job with feelings. And I’m convinced that it is feelings — and feelings alone — that account for the success of the Virgin brand in all of its myriad forms.” -- Richard
Branson
20
Gut feeling
21
What your gut feeling says
What the facts say
22
What your gut feeling says
What the facts say
Confirmation bias
Undervalued, Overvalued, Foolish
23
Big data
insights
data-driven insights
24
25
Big data
insights
data-driven insights
Meaningful
26
Defining visualization
27
Definition
28
Information Visualization is the use of interactive visual representations to amplify cognition [Card et al.]
algorithm <> human
29
Information Visualisation is the use of interactive visual representations to amplify cognition [Card et al.]
Definition
30
http://www.demorgen.be/dm/nl/5403/Internet/article/detail/1890428/2014/05/18/Twitteractiviteit-verraadt-je-politieke-profiel.dhtml
31
Facilitate human interaction for exploration with and understanding of big data
32
Data visualization
Slide source: John Stasko
Scientific visualization
Information visualization
33
Scientific visualisation
Specifically concerned with data that has a well-defined representation in 2D or 3D space (e.g., from simulation mesh or scanner).
Slide source: Robert Putman
34
Information Visualisation
Concerned with data that does not have a well-defined representation in 2D or 3D space (i.e., “abstract data”)
35
Dispersion (Backstrom & Kleinberg)
36
The role of visualisation
37
Big data
insights
data-driven insights
Meaningful
38
By Longlivetheux - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=37705247
39
https://medium.com/@angelamorelli/3-powerful-lessons-i-have-learnt-as-an-information-designer-cb028940254#.mkgb0h2cc
40
The Role of visualisation
Brehmer, M.; Munzner, T., "A Multi-Level Typology of Abstract Visualization Tasks," IEEE Transactions on Visualization and Computer Graphics, vol. 19, no. 12, pp. 2376-2385, Dec. 2013
41
Explore
Data insights: a visualization (Gregor Aisch)
42
Visualizing Big Data
44
Multiple data sources with varied data types
“Diverse” data
I talk GeoJSON
I talk custom XML
I talk Apache logs
45
millions of records
“Tall” data
46
Example: 51 million ratings
48
http://dataclysm.org
Example: 51 million ratings
49
http://dataclysm.org
Example: 51 million ratings
50
http://dataclysm.org 51
Cluttered displays
Heer, J. & Kandel, S. (2012), Interactive Analysis of Big Data, XRDS, 19 (1)
52
Cluttered displays
Binned density scatterplot
Hexagonal instead of rectangular
Heer, J. & Kandel, S. (2012), Interactive Analysis of Big Data, XRDS, 19 (1)
53
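A minimal Python sketch of the binning idea behind a density scatterplot: instead of drawing millions of overlapping points, count the points per cell and draw each cell once, coloured by its count. For brevity it uses rectangular cells; the hexagonal cells recommended on the slide follow the same counting idea with a different tiling. The function name `bin_points` and the sample points are illustrative assumptions, not taken from Heer & Kandel.

```python
from collections import Counter

def bin_points(points, cell_size):
    """Count points per rectangular cell; returns {(col, row): count}."""
    counts = Counter()
    for x, y in points:
        # Floor division maps a coordinate to the index of its cell.
        counts[(int(x // cell_size), int(y // cell_size))] += 1
    return counts

# Hypothetical sample data: three points fall in cell (0, 0), one in cell (2, 1).
points = [(0.1, 0.2), (0.4, 0.9), (0.8, 0.3), (2.5, 1.1)]
density = bin_points(points, cell_size=1.0)
```

The number of marks to draw now depends on the number of cells, not on the number of records.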
Multi-variate data with 100s to 1000s of variables
“Wide” data
54
http://www.perceptualedge.com/blog/?p=2046
In this day of so-called Big Data, organizations are scrambling to implement new software and hardware to increase the amount of data that they collect and store. In so doing they are unwittingly making it harder to find the needles of useful information in the rapidly growing mounds of hay. If you don’t know how to differentiate signals from noise, adding more noise only makes matters worse.
55
Avoid the All-You-Can-Eat buffet! (Ben Fry)
56
Visualizations might help reveal multidimensional patterns
Use the power of the machine to find a proxy in the data that predicts the selected variables
Depending on their specific questions, domain experts might select a subset of variables they are interested in
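One simple way to let the machine look for a proxy is to rank the candidate variables by how strongly they correlate with the variable the expert selected. The sketch below uses Pearson correlation; the variable names and values are made up for illustration, and a real analysis would probably need more robust measures than a linear correlation.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_proxies(target, candidates):
    """Sort candidate variable names by |correlation| with the target, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical data: "logins" tracks the target closely, "shoe_size" does not.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "shoe_size": [42.0, 38.0, 44.0, 39.0, 41.0],
    "logins": [2.0, 4.1, 5.9, 8.2, 10.0],
}
ranking = rank_proxies(target, candidates)
```

The top-ranked variables are candidates to visualize first; the expert still decides whether the relationship is meaningful.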
57
Example: 4 million messages/day on OKCupid
http://dataclysm.org 58
Each dot at 90% transparency
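Why transparency helps with overplotting can be worked out with a small calculation. Assuming standard "over" alpha compositing, n overlapping dots of opacity a combine to an opacity of 1 - (1 - a)^n, so dense regions darken gradually instead of saturating immediately. The function name below is illustrative.

```python
def stacked_opacity(alpha, n):
    """Opacity after drawing n overlapping marks of opacity alpha ('over' compositing)."""
    return 1.0 - (1.0 - alpha) ** n

alpha = 0.10  # 90% transparency means 10% opacity
for n in (1, 5, 10, 22, 44):
    print(f"{n:>3} overlapping dots -> {stacked_opacity(alpha, n):.0%} opaque")
```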
http://dataclysm.org 59
http://dataclysm.org 60
http://dataclysm.org 61
http://dataclysm.org 62
Multiple views on the data allow exploration of patterns
63
The strength of visualization
64
Anscombe's quartet http://en.wikipedia.org/wiki/Anscombe's_quartet
Enables discovery of visual patterns in data sets
Graphics reveal data (Tufte, 2001)
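The point of Anscombe's quartet can be checked numerically: the four data sets share nearly identical summary statistics, yet look completely different when plotted. A short sketch that computes the mean of x, the mean of y and the correlation for each set (helper names are illustrative):

```python
from math import sqrt
from statistics import mean

X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def correlation(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

# (mean of x, mean of y, correlation), rounded to two decimals.
summary = {
    name: (mean(xs), round(mean(ys), 2), round(correlation(xs, ys), 2))
    for name, (xs, ys) in QUARTET.items()
}
```

All four entries of `summary` come out the same after rounding, which is exactly why the numbers alone hide what a scatterplot reveals.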
65
World Population Growth
A tremendous change occurred with the industrial revolution: whereas it had taken all of human history until around 1800 for world population to reach one billion, the second billion was achieved in only 130 years (1930), the third billion in less than 30 years (1959), the fourth billion in 15 years (1974), and the fifth billion in only 13 years (1987). During the 20th century alone, the population in the world has grown from 1.65 billion to 6 billion.
Seeing is understanding
66
Facilitates understanding
http://www.bbc.co.uk/news/world-15391515
67
Facilitates human interaction for exploration and understanding
http://www.bbc.co.uk/news/world-15391515
68
http://www.informationisbeautiful.net/visualizations/how-many-gigatons-of-co2/
Tells stories
69
T. Nagel, M. Maitan, E. Duval, A. Vande Moere, J. Klerkx, K. Kloeckl, and C. Ratti. Touching transport - a case study on visualizing metropolitan public transit on interactive tabletops. In AVI2014: 12th ACM International Working Conference on Advanced Visual Interfaces, pages 281–288, 2014.
http://www.youtube.com/watch?v=wQpTM7ASc-w
Facilitates human interaction for exploration and understanding
70
Will there be enough food?
http://www.footnetwork.org/en/index.php/gfn/page/earth_overshoot_day/
Communicates insights easily
71
Triggers impact
Interactivity allows comparison
73
http://blog.stephenwolfram.com/2012/03/the-personal-analytics-of-my-life/
Shows trends & anomalies in the data, therefore triggers questions
74
Helps to find stories, see trends
Belgium
Brazil
USA
India
75
Sentiment analysis in enterprise social network (Slack)
Shows patterns
76
http://deredactie.be/cm/vrtnieuws/grafiek/interactief/1.2248561
77
Reader Client
Tracking Service
WebSockets
Database
engagement data, mouse data
10,065 sessions were tracked
9,674 sessions were used in the analysis
391 sessions were removed from the analysis (noise)
78
Visualizing Reader Activity
Each square is a ‘slide’
Each row represents a navigation pattern through the slides
Column 1 shows the absolute number of readers
Column 2 shows the percentage of readers
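The two columns described above could be computed from the tracked sessions roughly as follows. This is a sketch under assumptions: each session is represented as a list of visited slide numbers, which is a hypothetical format and not necessarily what the tracking service stored.

```python
from collections import Counter

def navigation_patterns(sessions):
    """Return (pattern, reader_count, percentage) rows, most common pattern first."""
    counts = Counter(tuple(session) for session in sessions)
    total = len(sessions)
    return [(pattern, n, 100.0 * n / total) for pattern, n in counts.most_common()]

# Hypothetical sessions: two readers follow the same path through slides 1, 2, 3.
sessions = [[1, 2, 3], [1], [1, 2, 3], [1, 2, 3, 1]]
rows = navigation_patterns(sessions)
```

Each row of `rows` corresponds to one row of the visualization: the slide sequence, the absolute number of readers and their share of all sessions.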
79
262 readers (2.7%) go through all slides completely, after which they quickly return to the first slide to look at it once more.
Reader time per slide
Readers spend about 75 seconds (avg) on the first slide to study which information is available.
80
Shows patterns
Sentiment analysis in enterprise social network (Slack)
Triggers questions & creates awareness
Disclaimer: Should we trust NLP algorithms?
81
Empowers users to make informed decisions
Positive Badges
Negative Badges
82
Show errors in the data
http://woutervds.github.io/InfoVisPostgraduwhat/
83
Show errors in the data
84
Khaled Bachour, Frederic Kaplan, Pierre Dillenbourg, "An Interactive Table for Supporting Participation Balance in Face-to-Face Collaborative Learning," IEEE Transactions on Learning Technologies, vol. 3, no. 3, pp. 203-213, July-September, 2010
Creates awareness
85
http://infosthetics.com/
http://visualizing.org
http://www.visualcomplexity.com/vc/
http://visual.ly/
http://flowingdata.com
http://www.infovis-wiki.net
86
Visualizing (big) data
Guidelines & Facts
88
How many circles?
89
Humans have advanced perceptual abilities
Our brains make us extremely good at recognizing visual patterns
90
91
Humans have little short-term memory
Our brain remembers relatively little of what we perceive.
Most of us can only hold three to seven chunks of data at the same time.
92
Recognition: identify previously learned information
93
Humans have advanced perceptual abilities
Humans have little short term memory
Our brains make us extremely good at recognizing visual patterns
Our brains remember relatively little of what we perceive
Externalize data by using interactive, visual encodings
Promote recognition rather than recall
94
https://www.youtube.com/watch?v=og7bzN0DhpI (9:51 - 11:22)
95
96
“The centrality of human activity in the process is key”
97
Explore
Data insights: a visualization (Gregor Aisch)
98
“It’s not a magical algorithm that finds the insight for you”
“You have to look at the overview, you have to decide what you zoom in to, what you filter out. And then you click to get the details” (Ben Shneiderman, 2011)
99
http://www.bbc.com/future/bespoke/20140724-flight-risk/
Overview first, zoom & filter, details-on-demand
100
Overview first, zoom & filter, details-on-demand
http://www.student.kuleuven.be/~r0580868/
101
https://postgraduwhatblog.wordpress.com/2016/02/13/infovis-van-de-week-1-wouter/
Overview first, zoom & filter, details-on-demand
102
Visual Information Seeking Mantra
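The three steps of the mantra map naturally onto three operations on a data set. A minimal sketch, assuming a list of records with hypothetical field names (`airline`, `year`, `delay_min`); in a real tool each function would be triggered by a user interaction rather than called directly.

```python
from collections import Counter

# Hypothetical records; the field names are illustrative only.
flights = [
    {"id": 1, "airline": "A", "year": 2012, "delay_min": 5},
    {"id": 2, "airline": "B", "year": 2013, "delay_min": 42},
    {"id": 3, "airline": "A", "year": 2013, "delay_min": 17},
]

def overview(records, key):
    """Overview first: aggregate counts per category."""
    return Counter(r[key] for r in records)

def zoom_and_filter(records, **criteria):
    """Zoom & filter: keep only the records that match all criteria."""
    return [r for r in records if all(r[k] == v for k, v in criteria.items())]

def details_on_demand(records, record_id):
    """Details on demand: the full record for one selected item, or None."""
    return next((r for r in records if r["id"] == record_id), None)
```

For example, `overview(flights, "airline")` gives the big picture, `zoom_and_filter(flights, year=2013)` narrows it down, and `details_on_demand(flights, 3)` returns one complete record.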
103
Real data is ugly and needs to be cleaned
http://hcil2.cs.umd.edu/trs/2011-34/2011-34.pdf
http://www.netmagazine.com/features/seven-dirty-secrets-data-visualisation
https://code.google.com/p/google-refine/
http://vis.stanford.edu/wrangler/
Pre-process your data
104
http://nieuws.vtm.be/verkiezingen/gemeente?province=P1&city=G73
Always check & pre-process your data
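What such pre-processing looks like depends entirely on the data. As a hedged illustration, the sketch below handles three typical problems in hypothetical input: stray whitespace, decimal commas and unparsable values. It assumes commas are always decimal separators, which would be wrong for data that uses them as thousands separators.

```python
def clean_rows(rows):
    """Trim whitespace, normalise decimal commas, and drop rows with unparsable values."""
    cleaned = []
    for name, raw_value in rows:
        text = raw_value.strip().replace(",", ".")
        try:
            value = float(text)
        except ValueError:
            continue  # skip values such as "" or "n/a" instead of guessing
        cleaned.append((name.strip(), value))
    return cleaned

# Hypothetical raw input with typical problems.
raw = [(" Leuven ", "12,5"), ("Gent", " 7.0"), ("Brugge", "n/a"), ("Hasselt", "")]
clean = clean_rows(raw)
```

Dropped rows should be counted and inspected as well, since silently removing data can itself distort the picture.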
105
Elections 14/10/12
Forget about 3D graphs (on a 2D screen)
Occlusion
Complex to interact with
Doesn’t add anything to the data
106
Source: Stephen Few
What if we need to add a 3rd variable?
107
Use small coordinated graphs to add variables
108
Forget about 3D graphs
Source: Stephen Few
Which student has more blogposts?
• Size & angle are difficult to compare
• Without labels & legends, impossible to show exact quantitative differences
• Limited short-term (visual) memory
109
Source: Stephen Few
Save the pies for dessert (S. Few)
Try using either of the pies to put the slices in order by size
110
deredactie.be
demorgen.be
vtm.be
Elections 14/10/12
111
Obviously there are exceptions to the rule
112
http://themetapicture.com/the-sunny-side-of-the-pyramid/
Use Common Sense
[Figure: three bar charts of the same Student 1 data (blogposts, tweets, comments on blogs, reports submitted; axis 0 to 30), two vertical and one horizontal, with the categories in different orders]
113
[Figure: two stacked bar charts for Students 1 to 4 (blogposts, tweets, comments on blogs, reports submitted), one with absolute counts (axis 0 to 60) and one with percentages (axis 0% to 100%)]
Use Common Sense
What are you comparing?
What story do you get from it?
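The difference between comparing absolute counts and comparing percentages can be made concrete with a small calculation. The counts below are invented for illustration; the real values behind the charts are not given in the slides.

```python
def to_shares(counts):
    """Convert absolute counts per category into percentages of the total."""
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}

# Hypothetical activity counts for two students.
students = {
    "Student 1": {"blogposts": 10, "tweets": 30, "comments on blogs": 5, "reports submitted": 5},
    "Student 2": {"blogposts": 5, "tweets": 15, "comments on blogs": 2, "reports submitted": 3},
}
totals = {name: sum(counts.values()) for name, counts in students.items()}
shares = {name: to_shares(counts) for name, counts in students.items()}
```

In this made-up example the absolute view says Student 1 is twice as active, while the percentage view says both students spend the same share of their activity on tweets. Which chart is right depends on the question being asked.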
114
Which graph makes it easier to focus on the pattern of change through time, instead of the individual values?
Choose the graph that answers your questions about your data
115
Source: Stephen Few
vtm.be
deredactie.be
nieuwsblad.be
Elections 14/10/12
Communicate the correct story
116
Don’t use visualisations to mislead
117
Don’t use visualisations to mislead
118
Source: Stephen Few 119
Source: Stephen Few 120
121
http://fellinlovewithdata.com/research/deceptive-visualizations 122
http://fellinlovewithdata.com/research/deceptive-visualizations 123
How much better are the drinking water conditions in Willowtown as compared to Silvatown?
124
http://fellinlovewithdata.com/research/deceptive-visualizations
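One way to quantify how much a chart exaggerates is Tufte's "lie factor": the size of the effect shown in the graphic divided by the size of the effect in the data. The sketch below applies it to a bar chart with a truncated axis; the numbers are hypothetical and are not taken from the Willowtown/Silvatown example.

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    return (second - first) / first

def lie_factor(data_first, data_second, drawn_first, drawn_second):
    """Effect shown in the graphic divided by the effect present in the data."""
    return effect_size(drawn_first, drawn_second) / effect_size(data_first, data_second)

# Hypothetical example: values 80 and 90 drawn as bars on an axis that starts at 75,
# so the visible bar heights are 5 and 15 units instead of 80 and 90.
axis_start = 75
factor = lie_factor(80, 90, 80 - axis_start, 90 - axis_start)
```

A factor close to 1 means the graphic is faithful to the data; here the 12.5% difference in the data is drawn as a 200% difference in bar height.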
Storytelling with visualisation
125
Visualization tasks
Brehmer, M.; Munzner, T., "A Multi-Level Typology of Abstract Visualization Tasks," IEEE Transactions on Visualization and Computer Graphics, vol. 19, no. 12, pp. 2376-2385, Dec. 2013
126
http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html
127
Human Perception
128
Our brains make us extremely good at recognizing visual patterns
Source: Katrien Verbert 129
Source: Katrien Verbert 130
A limited set of visual properties that are detected by the low-level visual system:
- very rapidly (< 200 to 250 ms)
- accurately
- with little effort
- before attention is focused on them
Healey, C., & Enns, J. (2012). Attention and Visual Memory in Visualization and Computer Graphics. IEEE Transactions on Visualization and Computer Graphics, 18(7), 1170-1188.
Pre-attentive characteristics
Note that eye movements take at least 200 ms to initiate.
131
Pre-attentive characteristics
Find the red dot
<> Hue
Find the dot
<> shape
Find the red dot
conjunction not pre-attentive
http://www.csc.ncsu.edu/faculty/healey/PP/
Helps to spot differences in multi-element displays
132
Pre-attentive characteristics
Line orientation Length, width Closure Size
Curvature Density, contrast Intersection 3D depth
Not all of them allow showing exact quantitative differences
Helps to spot differences in multi-element displays
133
http://www.csc.ncsu.edu/faculty/healey/PP/
http://www.slideshare.net/chelsc/gestalt-laws-and-design-presentation
http://artspilesenglish.blogspot.be/2011/11/gestalt-theory-exercise-for-3rdlevel.html
134
Gestalt Laws (“Pattern” laws)
Basic rules or design principles that describe perceptual phenomena.
They explain the way users or humans see patterns in visualisations.
Figure & Ground
135
136
Closure
Smallness
137
Source: Katrien Verbert
Common Fate
Objects with a common movement, that move in the same direction, at the same pace, at the same time are organised as a group (Ehrenstein, 2004).
138
Law of Isomorphism
Isomorphism is similarity that can be behavioural or perceptual, and can be a response based on the viewer’s previous experiences (Luchins & Luchins, 1999; Chang, 2002). This law is the basis for symbolism (Schamber, 1986).
139
London Tube Map
Which Gestalt laws do you see?
140
Visualization design process
141
B. McDonnel and N. Elmqvist. Towards utilizing GPUs in information visualization: A model and implementation of image-space operations. IEEE Transactions on Visualization and Computer Graphics, 15(6):1105–1112, 2009.
http://www.infovis-wiki.net/index.php/Visualization_Pipeline
142
143
Data
- structure: time, hierarchy, network, 1D, 2D, nD, …
- questions: where, when, how often, …
- audience: domain & visualisation expertise, …
144
S. Stevens. On the theory of scales of measurement. Science, 103(2684), 1946.
Structure
Time? hierarchical? 1D? 2D? nD? network? …
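Stevens' scales of measurement (nominal, ordinal, interval, ratio) determine which comparisons are meaningful for a variable, and therefore which visual encodings make sense. A small sketch of that idea; the enum and the wording of the operations are illustrative, not Stevens' own notation.

```python
from enum import Enum

class Scale(Enum):
    NOMINAL = 1   # categories: only equality is meaningful
    ORDINAL = 2   # ranks: order is meaningful
    INTERVAL = 3  # e.g. calendar dates: differences are meaningful
    RATIO = 4     # e.g. counts: ratios are meaningful (true zero)

# Each scale supports the comparisons of the scales before it plus one more.
OPERATIONS = ["equal / not equal", "greater / less than", "difference", "ratio"]

def meaningful_operations(scale):
    """List the comparisons that make sense for data measured on the given scale."""
    return OPERATIONS[: scale.value]
```

For instance, `meaningful_operations(Scale.ORDINAL)` excludes "difference", which is why encoding a ranking as bar length can suggest more than the data supports.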
145
Questions (to get things going)
What is the average number of students who bought the course book?
What? When? How much? How often?
When did students start looking at the course material?
How many hours did Peter work on this assignment?
(Why did Peter have to redo his assignment?)
How often did Peter retake the course before he passed?
(Why?)
146
147
Visual mapping
Encode data characteristics into visual form
Each mark (point, line, area,…) represents a data element
Think about relationships between elements (position)
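Visual mapping can be sketched as a set of scale functions that turn data values into visual properties of a mark. The example below maps two attributes of hypothetical data elements to horizontal position and size; the field names, pixel ranges and helper `linear_scale` are assumptions for illustration.

```python
def linear_scale(domain_min, domain_max, range_min, range_max):
    """Return a function that maps data values linearly onto a visual range."""
    span = domain_max - domain_min
    def scale(value):
        return range_min + (value - domain_min) / span * (range_max - range_min)
    return scale

# Hypothetical data elements.
data = [{"name": "a", "hours": 2, "posts": 10}, {"name": "b", "hours": 8, "posts": 40}]

x = linear_scale(0, 10, 0, 500)    # hours -> horizontal position in pixels
size = linear_scale(0, 40, 0, 20)  # posts -> mark size in pixels

# One mark per data element, with its data characteristics encoded visually.
marks = [{"label": d["name"], "x": x(d["hours"]), "size": size(d["posts"])} for d in data]
```

Choosing which attribute goes to position and which to size is the actual design decision; the code only makes the chosen encoding explicit.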
“Simplicity is the ultimate sophistication.” (Leonardo da Vinci)
Size
http://www.informationisbeautiful.net/2009/visualising-the-guardian-datablog/
148
X4
How much bigger is the lower bar?
Slide adapted from Michael Porath & Katrien Verbert
Length
149
X5
How much bigger is the right circle?
Slide adapted from Michael Porath & Katrien Verbert
Area
150
X9
How much bigger is the right circle?
151
Apparent magnitude curves
http://makingmaps.net/2007/08/28/perceptual-scaling-of-map-symbols
Slide adapted from Michael Porath
152
Which one looks more accurate?
Slide adapted from Michael Porath
153
Compensating magnitude to match perception
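As a worked example of such compensation: with strict area scaling, a circle representing a value 9 times larger gets a radius 3 times larger (the square root). Perceptual scaling uses a slightly larger exponent so that big circles are drawn bigger to offset underestimation. The exponent 0.5716 below is the value commonly attributed to Flannery in the cartography literature; treat it as an assumption rather than something stated on the slide.

```python
def radius_absolute(value, ref_value, ref_radius):
    """Radius such that the circle AREA is proportional to the value."""
    return ref_radius * (value / ref_value) ** 0.5

def radius_perceptual(value, ref_value, ref_radius, exponent=0.5716):
    """Radius enlarged to offset underestimation of larger circles (assumed exponent)."""
    return ref_radius * (value / ref_value) ** exponent

# A value 9 times the reference value, with a reference radius of 10 pixels.
absolute = radius_absolute(9, 1, 10)
compensated = radius_perceptual(9, 1, 10)
```

The compensated radius is larger than the strictly proportional one, and the gap grows with the ratio between the values.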
Color
Color Principles - Hue, Saturation, and Value
https://www.youtube.com/watch?v=l8_fZPHasdo
154
Use at most about 5 colors (e.g. for categories), because short-term memory is limited
http://en.wikipedia.org/wiki/HSL_and_HSV
• hue: categorical
• saturation: ordinal and quantitative
• luminance/brightness: ordinal and quantitative
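These three rules can be turned into two small palette generators using Python's standard `colorsys` module: vary only the hue for categories, and vary only the lightness for ordered or quantitative data. The default lightness, saturation and hue values are arbitrary choices for this sketch; tools such as ColorBrewer offer palettes that are tuned far more carefully.

```python
import colorsys

def to_hex(rgb):
    """Convert an (r, g, b) tuple of floats in [0, 1] to a #rrggbb string."""
    return "#" + "".join(f"{round(c * 255):02x}" for c in rgb)

def categorical_palette(n, lightness=0.5, saturation=0.65):
    """n colours that differ only in hue, for categories."""
    return [to_hex(colorsys.hls_to_rgb(i / n, lightness, saturation)) for i in range(n)]

def sequential_palette(n, hue=0.6, saturation=0.65):
    """n colours (n >= 2) of one hue, from light to dark, for ordered or quantitative data."""
    return [
        to_hex(colorsys.hls_to_rgb(hue, 0.9 - 0.6 * i / (n - 1), saturation))
        for i in range(n)
    ]

categories = categorical_palette(5)
ordered = sequential_palette(4)
```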
How to choose colors
Source: Katrien Verbert
155
http://colorbrewer2.org
156
157
https://eagereyes.org/basics/rainbow-color-map
158
http://gizmodo.com/why-a-white-cup-makes-your-coffee-taste-more-intense-1663691154
intensity, sweetness, aroma, bitterness, and quality
159
How to choose colors
Position
160
Position & color
http://time.com/12933/what-you-think-you-know-about-the-web-is-wrong/
161
J. Mackinlay. Automating the design of graphical presentations of relational information. ACM Transactions On Graphics, 5(2):110–141, 1986.
162
163
J. Mackinlay. Automating the design of graphical presentations of relational information. ACM Transactions On Graphics, 5(2):110–141, 1986.
164
Offer precise controls for sharing on the Internet... Users should navigate through 50 settings with more than 170 options
Example Facebook privacy statement
Questions?
How did its complexity change over time?
How does its length compare to privacy statements of other tools?
165
How did its complexity change over time?
http://www.nytimes.com/interactive/2010/05/12/business/facebook-privacy.html
166
How does its length compare to privacy statements of other tools?
http://www.nytimes.com/interactive/2010/05/12/business/facebook-privacy.html
167
Example: Encoding weather forecast on a smartphone
168
?
Joris Klerkx, Research Manager, [email protected]
@jkofmsk
https://augmenthuman.wordpress.com
169
Always on the lookout for new opportunities…