Data Visualization Nikhil Srivastava, 2015
Nikhil Srivastava
iHub Summer Data Jam
About this Lecture
• Shortened version of longer course
• Course website
– Slides, demos, extra material
– Code samples and libraries
– Final projects
Effective Data Visualization
Nikhil Srivastava
0713 987 262
I build products & businesses in the fields of finance & technology.
I organize & visualize information for teaching & understanding.
nikhilsrivastava.com
Outline
• What is Data Visualization? (introduction)
• Thinking and Seeing (foundation & theory)
• From Data to Graphics (building blocks)
• Principles and Guidelines (design & critique)
• Building Visualizations (construction)
• Advanced Topics (advanced)
What is Data Visualization? (introduction)
Data Visualization
Information Visualization
Scientific Visualization
Infographics
Statistical Graphics
Informative Art
Art
Science
Statistics
Journalism
Design
Visual Analytics
City/Town        County         Population
Ahero            Kisumu             76,828
Athi River       Machakos          139,380
Awasi            Kisumu             93,369
Kangundo-Tala    Machakos          218,557
Karuri           Kiambu            129,934
Kiambu           Kiambu             88,869
Kikuyu           Kiambu            233,231
Kisumu           Kisumu            409,928
Kitale           Trans-Nzoia       106,187
Kitui            Kitui             155,896
Limuru           Kiambu            104,282
Machakos         Machakos          150,041
Molo             Nakuru            107,806
Mwingi           Kitui              83,803
Naivasha         Nakuru            181,966
Nakuru           Nakuru            307,990
Nandi Hills      Trans-Nzoia        73,626
Ruiru            Kiambu            238,858
Thika            Kiambu            139,853
• Which is the most populous city in the list?
• Which county in the list has the most cities?
• Which county in the list has the largest average city?
• What is the population of Limuru?
Data Visualization is:
• Useful
– Answers user questions
– Reduces user workload
(by design, not by default)
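The three aggregate questions about the city table can also be answered in code, which gives a way to check what a chart should show. The following is a minimal sketch using only the Python standard library; the function names are illustrative and not part of the lecture material.

```python
from collections import defaultdict

# (city, county, population) rows copied from the table in this lecture.
CITIES = [
    ("Ahero", "Kisumu", 76_828), ("Athi River", "Machakos", 139_380),
    ("Awasi", "Kisumu", 93_369), ("Kangundo-Tala", "Machakos", 218_557),
    ("Karuri", "Kiambu", 129_934), ("Kiambu", "Kiambu", 88_869),
    ("Kikuyu", "Kiambu", 233_231), ("Kisumu", "Kisumu", 409_928),
    ("Kitale", "Trans-Nzoia", 106_187), ("Kitui", "Kitui", 155_896),
    ("Limuru", "Kiambu", 104_282), ("Machakos", "Machakos", 150_041),
    ("Molo", "Nakuru", 107_806), ("Mwingi", "Kitui", 83_803),
    ("Naivasha", "Nakuru", 181_966), ("Nakuru", "Nakuru", 307_990),
    ("Nandi Hills", "Trans-Nzoia", 73_626), ("Ruiru", "Kiambu", 238_858),
    ("Thika", "Kiambu", 139_853),
]


def most_populous_city(rows):
    """Return the name of the city with the largest population."""
    return max(rows, key=lambda row: row[2])[0]


def populations_by_county(rows):
    """Group city populations under their county."""
    groups = defaultdict(list)
    for _city, county, population in rows:
        groups[county].append(population)
    return groups


def county_with_most_cities(rows):
    """Return the county that appears most often in the list."""
    groups = populations_by_county(rows)
    return max(groups, key=lambda county: len(groups[county]))


def county_with_largest_average(rows):
    """Return the county whose listed cities have the largest mean population."""
    groups = populations_by_county(rows)
    return max(groups, key=lambda county: sum(groups[county]) / len(groups[county]))


if __name__ == "__main__":
    print(most_populous_city(CITIES))
    print(county_with_most_cities(CITIES))
    print(county_with_largest_average(CITIES))
```

Each answer needs a separate pass over the table, which is the workload a well-designed chart is meant to reduce.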
Anscombe’s quartet (1973)
Data Visualization is:
• Important
– Understand structure and patterns
– Resolve ambiguity
– Locate outliers
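Anscombe's quartet makes this point numerically: four small datasets share nearly the same summary statistics yet look very different when plotted. The sketch below computes those statistics with the standard library; the values are the published quartet data, and the helper names are illustrative.

```python
from statistics import mean, variance

# Anscombe's quartet (1973): sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sum((x - mx) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - my) ** 2 for y in ys) ** 0.5
    return covariance / (spread_x * spread_y)


def summarize(xs, ys):
    """Summary statistics that are nearly identical across the quartet."""
    return {
        "mean_x": mean(xs),
        "var_x": variance(xs),  # sample variance
        "mean_y": mean(ys),
        "r": pearson(xs, ys),
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        stats = summarize(xs, ys)
        print(name, {key: round(value, 2) for key, value in stats.items()})
```

The printed rows are almost indistinguishable; only a plot reveals the curve, the outlier, and the single high-leverage point.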
Data Visualization is:
• Important
– Design decisions affect interpretation
Data Visualization is:
• Powerful
– Communicate, teach, inspire
Data Visualization is:
• Relevant
– In one second …
– Open data, open technologies
– Growing use in business, education, media, advertising …
Definitions
• “the process that transforms (abstract) data into interactive graphical representations” [1]
• “finding the artificial memory that best supports our natural means of perception” [2]
• “visual representations of data to amplify cognition” [3]
• “giving information a visual representation” [4]
Course Scope
           Focus                     Extra
purpose    communicate               explore, analyze
data       numerical, categorical    text, maps, graphs, networks
feature    representation            animation, interactivity
Thinking and Seeing (foundation & theory)
Bandwidth of Our Senses
Why Vision?
The Hardware
The Software
Visual Perception
“Bottom-up”
• Sensory input
• Low-level features: orientation, shape, color, movement
• Rapid, parallel, automatic
“Top-down”
• High-level concepts: objects, symbols
• Involves working memory
• Slow, sequential, conscious
Task: Counting
How many 3’s?
1281768756138976546984506985604982826762
9809858458224509856458945098450980943585
9091030209905959595772564675050678904567
8845789809821677654876364908560912949686
Slow, sequential, conscious
Rapid, parallel, automatic
[Figure: the same digit block shown twice]
Task: (Distractor) Search
Which side has the red circle?
Task: Search
Which side has the red circle?
Slow, sequential, conscious
Rapid, parallel, automatic
Task: Unique Search
Slow, sequential, conscious
Rapid, parallel, automatic
(7)
(5)
(3)
Lessons for Visualization
• Use “pre-attentive” attributes when possible
– Color, shape, orientation (depth, motion)
– Faster, higher bandwidth
• Caveats
– Working memory: magical number 7 (+/- 2)
– Be careful mixing attributes
Example: Too Many Attributes
Eye != Camera
limited aperture
limited color
Saccades: limited time and location
Eye != Camera: Relative
A
B
Eye != Camera: Knowledge
Lessons for Visualization
• Human vision has limits and constraints: aperture, color, time, location
• “What we see” depends on “what we know”
• Attention and experience matter
From Data to Graphics
What kind of data do we have?
How can we represent the data visually?
How can we organize this into a visualization?
Visual Encoding
[Figure: excerpt of the City/Town, County, Population table]
Data as Input
[Figure: excerpt of the City/Town, County, Population table]
Clean, Restructure
Explore, Analyze
DATA
Visualization Goals
Model and Attribute
item      attribute A    attribute B    …    attribute M
item 1    value1_A       value1_B       …
item 2    value2_A       value2_B       …
…         …              …
item N                                       valueN_M
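The item/attribute model maps directly onto a list of records. This is a minimal sketch assuming each item is a city from the lecture's table; the `City` type and the `column` helper are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class City(NamedTuple):
    """One item (row); each field is one attribute (column)."""
    name: str
    county: str
    population: int


# Three items taken from the lecture's table.
ITEMS = [
    City("Kisumu", "Kisumu", 409_928),
    City("Limuru", "Kiambu", 104_282),
    City("Nakuru", "Nakuru", 307_990),
]


def column(items, attribute):
    """Return the values of a single attribute across all items."""
    return [getattr(item, attribute) for item in items]


if __name__ == "__main__":
    print(City._fields)
    print(column(ITEMS, "population"))
```

Reading one attribute across all items is the usual starting point for mapping that attribute to a visual channel.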
Data Types
Type                    Examples                                               Operations
CATEGORICAL             Male / Female; Asia / Africa / Europe; True / False    =
ORDINAL                 Small / Med / Large; Low / High; Yes / Maybe / No      = < >
NUMERICAL (Interval)    Latitude/Longitude; Compass direction; Time (event)    = < > + -
NUMERICAL (Ratio)       Length; Count; Time (duration)                         = < > + - * /
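The operations that are meaningful at each level of measurement can be written down as a lookup. The sketch below is one possible encoding of the slide's operator rows; the `Level` enum and `supports` function are illustrative names, not an established API.

```python
from enum import Enum


class Level(Enum):
    CATEGORICAL = "categorical"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"


# Each level keeps the operations of the previous one and adds more.
OPERATIONS = {
    Level.CATEGORICAL: {"="},
    Level.ORDINAL: {"=", "<", ">"},
    Level.INTERVAL: {"=", "<", ">", "+", "-"},
    Level.RATIO: {"=", "<", ">", "+", "-", "*", "/"},
}


def supports(level, operation):
    """Return True if the operation is meaningful for data at this level."""
    return operation in OPERATIONS[level]


if __name__ == "__main__":
    # Comparing grades is meaningful; multiplying temperatures in Celsius is not.
    print(supports(Level.ORDINAL, "<"))
    print(supports(Level.INTERVAL, "*"))
```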
Data Types: Example
ID   Gender   Test Score   Grade   Size     Temperature
1    Male     77           C       Small    36.5
2    Female   85           B       Large    37.2
3    Female   95           A       Medium   36.7
4    Male     90           A       Large    37.4
• Which are categorical? (=)
• Which are ordinal? (= < >)
• Which are interval? (= < > + -)
• Which are ratio? (= < > + - * /)
Data Type Transformation
[Figure: the CATEGORICAL / ORDINAL / NUMERICAL (Interval, Ratio) columns from the Data Types slide, with Time listed twice]
Binning/Categorizing
Differencing/Normalization
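Both transformations can be illustrated in a few lines. This sketch assumes that binning turns a numerical value into an ordinal label and that differencing turns event times (interval) into durations (ratio); the thresholds and function names are hypothetical.

```python
from datetime import datetime


def bin_population(population, small_max=100_000, medium_max=200_000):
    """Binning: map a numerical population to an ordinal size label.

    The two thresholds are arbitrary and chosen only for illustration.
    """
    if population < small_max:
        return "Small"
    if population < medium_max:
        return "Medium"
    return "Large"


def durations(events):
    """Differencing: turn consecutive event times into durations in seconds."""
    return [(later - earlier).total_seconds()
            for earlier, later in zip(events, events[1:])]


if __name__ == "__main__":
    print(bin_population(104_282))
    print(durations([
        datetime(2015, 1, 1, 9, 0),
        datetime(2015, 1, 1, 9, 30),
        datetime(2015, 1, 1, 11, 0),
    ]))
```

Binning discards detail on purpose, so the choice of thresholds is itself a design decision that affects interpretation.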
Advanced Data Types
• Networks/Graphs
– Hierarchies/Trees
• Text
• Maps: points, regions, routes
Visual Encodings
Marks: point, line, area, volume
Channels: position, size, shape, color, angle/tilt
Channel Effectiveness
“Spatial position is such a good visual coding of data that the first decision of visualization design is which variables get spatial encoding at the expense of others”
Color as a Channel
              Categorical       Quantitative
Hue           Good (6-8 max)    Poor
Value         Poor              Good
Saturation    Poor              Okay
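The table suggests varying hue for categories and value for quantities. The following sketch builds both kinds of palette with Python's standard `colorsys` module; the function names, the cap of 8 hues, and the fixed saturation and value numbers are illustrative assumptions, and it uses HSV value as a stand-in for the slide's "Value".

```python
import colorsys


def categorical_palette(n):
    """Evenly spaced hues at fixed saturation and value, for up to 8 categories."""
    if not 1 <= n <= 8:
        raise ValueError("hue distinguishes only about 6-8 categories well")
    return [colorsys.hsv_to_rgb(i / n, 0.65, 0.9) for i in range(n)]


def sequential_ramp(steps, hue=0.6):
    """A single hue whose value (brightness) increases, for quantitative data."""
    if steps < 2:
        raise ValueError("a ramp needs at least two steps")
    return [colorsys.hsv_to_rgb(hue, 0.6, 0.3 + 0.7 * i / (steps - 1))
            for i in range(steps)]


def to_hex(rgb):
    """Convert an (r, g, b) tuple of floats in [0, 1] to a #rrggbb string."""
    return "#{:02x}{:02x}{:02x}".format(*(round(channel * 255) for channel in rgb))


if __name__ == "__main__":
    print([to_hex(color) for color in categorical_palette(6)])
    print([to_hex(color) for color in sequential_ramp(5)])
```

For production work a tested palette source such as ColorBrewer is usually a safer choice than hand-built ramps.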
type                         mark     channel            data represented
Scatter Plot                 point    position           2 quantitative
Scatter + Hue                point    position, color    2 quantitative, 1 categorical
Scatter + Size (“Bubble”)    point    position, size     3 quantitative
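Encoding a quantity as position or size comes down to a scale function from data values to screen units. The sketch below shows a linear position scale and a bubble-radius scale; scaling the radius by the square root, so that bubble area tracks the value, is a common convention assumed here rather than something stated in the lecture, and the function names are hypothetical.

```python
import math


def linear_scale(value, domain, pixel_range):
    """Map a value from the data domain to a position in the pixel range."""
    d0, d1 = domain
    r0, r1 = pixel_range
    return r0 + (value - d0) / (d1 - d0) * (r1 - r0)


def bubble_radius(value, max_value, max_radius):
    """Radius for a bubble whose area is proportional to the value."""
    return max_radius * math.sqrt(value / max_value)


if __name__ == "__main__":
    # A value halfway through the domain lands halfway across the axis.
    print(linear_scale(5, (0, 10), (0, 100)))
    # A quarter of the maximum value gets half the maximum radius.
    print(bubble_radius(100, 400, 20))
```

Scaling the radius linearly instead would make large values look disproportionately large, since perceived size follows area.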
Scatter Plot – Applications
CORRELATION GROUPING OUTLIERS
Scatter Plot – Dangers
OCCLUSION (DENSITY)
OCCLUSION (OVERLAP)
3-D
type          mark    channel                   data represented
Line Chart    line    position (orientation)    2 quantitative
Area Chart    area    size (length)             2 quantitative
Line Chart – Applications
PATTERN OVER TIME COMPARISON
Line Chart – Dangers
Y SCALING
X SCALING
OVERLOAD
type         mark    channel          data represented
Bar Chart    line    size (length)    1 categorical, 1 quantitative
Histogram    line    size (length)    1 ordinal/quantitative, 1 quantitative (count)
Bar Chart – Applications
COMPARE CATEGORIES DISTRIBUTION
Bar Chart – Dangers
TOO MANY CATEGORIES
POORLY SORTED
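One way to address both dangers before plotting is to sort the categories by value and fold the smallest ones into a single group. This is a minimal sketch; the county totals are sums of the city populations in the lecture's table, while the `top_categories` helper and the "Other" label are illustrative choices.

```python
# Total listed city population per county, summed from the lecture's table.
COUNTY_TOTALS = {
    "Kisumu": 580_125,
    "Machakos": 507_978,
    "Kiambu": 935_027,
    "Trans-Nzoia": 179_813,
    "Kitui": 239_699,
    "Nakuru": 597_762,
}


def top_categories(totals, keep):
    """Sort categories by value, keep the largest, and merge the rest."""
    ranked = sorted(totals.items(), key=lambda pair: pair[1], reverse=True)
    head, tail = ranked[:keep], ranked[keep:]
    if tail:
        head.append(("Other", sum(value for _name, value in tail)))
    return head


if __name__ == "__main__":
    for name, value in top_categories(COUNTY_TOTALS, keep=3):
        print(f"{name:<8} {value:>9,}")
```

Merging hides detail, so it suits overview charts better than charts meant for looking up individual categories.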
type         mark    channel         data represented
Pie Chart    area    size (angle)    1 quantitative
Pie Chart – Dangers
AREA SCALE SIMILAR AREAS OVERLOAD
Multi-Series Bar Charts
GROUPED BAR CHART
STACKED BAR CHART
Multi-Series Line Charts
MULTIPLE LINE
STACKED AREA CHART
Normalization
NORMALIZED BAR NORMALIZED AREA
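Normalized bar and area charts plot each series as a share of its group total rather than as a raw value. The sketch below shows that preprocessing step under the assumption that each row is one bar (or one time step) and each column one series; the function name is hypothetical.

```python
def normalize_rows(rows):
    """Convert each row of raw values to percentages of the row total."""
    result = []
    for row in rows:
        total = sum(row)
        # An all-zero row has no meaningful shares, so report zeros.
        result.append([100.0 * value / total if total else 0.0 for value in row])
    return result


if __name__ == "__main__":
    raw = [[1, 1, 2], [5, 15, 0]]
    print(normalize_rows(raw))
```

After normalization every bar has the same height, which makes proportions comparable but hides differences in the totals.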
Small Multiples Chart
Advanced Charts
Treemap (Hierarchical Data)
Channels: ?
Strengths: nested relationships
Concerns: order vs aspect ratio
Multi-Level Pie (Hierarchical Data)
Channels: ?
Strengths: nested relationships
Concerns: readability
Heat Map (Table/Field Data)
Channels: ?
Strengths: pattern/outlier detection
Concerns: ordering/clustering
Choropleth Map (Region Data)
Channels: ?
Strengths: geography
Concerns: region size, color spectrum
Cartogram (Region Data)
Channels: ?
Strengths: geographic pattern
Concerns: base map knowledge
Principles and Guidelines (design & critique)
From Science to Art
• Design principles*
• Style guidelines*
*dependent on visualization context and objective (and author)
Design Principles
• Integrity
– Tell the truth with data
• Effectiveness
– Achieve visualization objectives
• Aesthetics
– Be compelling, vivid, beautiful
Integrity
Lie Ratio = (size of effect in graphic) / (size of effect in data)
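The ratio is easy to compute once "size of effect" is pinned down. This sketch assumes the effect is measured as relative change between two values, which follows Tufte's usage (his term is "Lie Factor"); the function names and the sample numbers are illustrative.

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    return abs(second - first) / abs(first)


def lie_ratio(graphic_first, graphic_second, data_first, data_second):
    """Size of effect shown in the graphic divided by size of effect in the data."""
    return (effect_size(graphic_first, graphic_second)
            / effect_size(data_first, data_second))


if __name__ == "__main__":
    # Data rises from 18 to 27.5, but the drawn length grows from 0.6 to 5.3.
    print(round(lie_ratio(0.6, 5.3, 18.0, 27.5), 2))
    # A faithful graphic scales exactly like the data.
    print(lie_ratio(1.0, 2.0, 10.0, 20.0))
```

A ratio near 1 means the graphic represents the data faithfully; values far above 1 mean the graphic exaggerates the effect.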
“show data variation, not design variation”
Effectiveness*
Data/Ink Ratio = (ink representing data) / (total ink)
• avoid “chart junk”
*according to Tufte
Avoid Chart Junk
Effectiveness (Few)
Practical Guidelines
• Avoid 3-D charts
• Focus on substance over graphics
• Avoid separate legends and keys
• Use faint grids/guidelines
• Avoid unnecessary textures and colors
Color Guidelines
• To label
• To emphasize
• To liven or decorate
Bad Color
Good Color
More Color Guidelines
• Use color only when necessary
• Use saturated colors for data labels, thin lines, small areas
• Use less saturated colors for large areas, backgrounds
• Use tools like ColorBrewer
Building Visualizations (construction)
What Software to Use?
[Figure: excerpt of the City/Town, County, Population table]
Clean, Restructure
Explore, Analyze
DATA
Visualization Goals
Visualization Software
• Web friendly
– Highcharts
– InfoVis
– Processing
– D3
• Statistics
– Python (Matplotlib)
– R (ggplot2)
• Maps
– Google Charts
– Leaflet
– CartoDB
• Dashboards
• Graphs
– GraphViz
– Gephi
Highcharts - Reference
• Examples
– Hello, Chart
– Basic Charts
• Documentation, API
• Highcharts Cloud
Advanced Topics (advanced)
Advanced Visualizations
The Ebb and Flow of Movies (NY Times, 2008)
Word Cloud - “Data Visualization” Wikipedia Page (Wordle)
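The data behind a word cloud is a table of word frequencies, with font size then scaled by count. This is a minimal sketch of the counting step only; it is not how Wordle itself works, and the stopword list, function name, and sample sentence are illustrative assumptions.

```python
import re
from collections import Counter

# A tiny stopword list for illustration; real tools use much longer ones.
STOPWORDS = {"the", "of", "a", "and", "to", "in", "is"}


def word_frequencies(text, top=5):
    """Count non-stopword words and return the most common ones."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return counts.most_common(top)


if __name__ == "__main__":
    sample = "Data visualization is the visual representation of data"
    print(word_frequencies(sample, top=3))
```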
ZIPScribble (Robert Kosara, 2006)
Twitter Networks (PJ Lamberson, 2012)
Blogs
• Infosthetics.com
• Visualizing.org
• FlowingData.com