Data Visualisation
Harvinder Atwal
Agenda
Warm-Up (5 mins)
Data Visualisation: Why it matters (5 mins)
The Rules (10 mins)
Seven Common Quantitative Relationships (5 mins)
The best means to encode quantitative data in charts (5 mins)
Step by Step Guide (10 mins)
Test (5 mins)
Data Visualisation: Why it matters
Even the best information is useless if its story is poorly told
The effective display of quantitative information involves two fundamental challenges:
Selecting the right medium of display (for example, a table or a graph, and the appropriate kind of either)
and
Designing the individual visual components of the selected medium to display the information and its message as clearly as possible
Most presentations of quantitative business data are poorly designed – painfully so, often to the point of misinformation.
Anyone can draw charts in Excel and use PowerPoint, but hardly anyone is trained to do so effectively.
Bad Data Visualisation can have tragic consequences
In January 1986 NASA had to decide whether to launch the Challenger shuttle in a “100-year cold”.
Morton Thiokol engineers produced a chart and recommended that shuttles not be flown below 53F because of potential damage to the O-Rings in the booster rockets.
Morton Thiokol managers accepted the recommendation and passed it on to NASA.
NASA asked for the recommendation to be reconsidered.
Morton Thiokol managers agreed to the flight.
The engineers at Morton Thiokol came up with this chart
Looking at the O-Ring damage over the previous 24 shuttle missions, the data was presented in chronological order showing the location and extent of the damage sustained to the left and right boosters and the temperature at launch time.
The Morton Thiokol engineers failed to convince their management and NASA with fatal consequences
Would this chart have been more convincing?
If instead we remove all the extraneous data and do a simple plot of temperature vs damage then the pattern becomes much clearer.
Never damage above 76F
ALWAYS damage below 66F
WTF!? How many hours of valuable management time have been wasted trying to understand a badly drawn chart?
How many £billions have been wasted on incorrect decisions because someone has misinterpreted a chart message?
To communicate effectively visually you need to understand visual perception and cognition.
Present your message in a way that takes advantage of the strengths of visual perception while avoiding its weaknesses - matching the human thought process.
You can develop a simple set of skills (graphicacy) based on this knowledge.
This is based mostly on clear-cut principles about what works and what doesn’t.
The Rules
Research Finding: Communication is most effective when you say neither more nor less than what is relevant to your message.
Principle #1: Display neither more nor less than what is relevant to your message.
Tufte’s data-ink ratio is the single most important concept in data visualisation
Data-ink ratio = data-ink / total ink used to print the graphic
= proportion of a graphic’s ink devoted to the non-redundant display of data-information
= 1.0 − proportion of a graphic that can be erased without loss of data-information.
(The Visual Display of Quantitative Information, Edward R. Tufte, Graphics Press, Cheshire CT, 1983, p.93)
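The ratio can be worked through with a few lines of Python. This is only a sketch: the function names and the ink amounts are made up for illustration, and in practice "ink" has to be estimated by eye rather than measured.

```python
def data_ink_ratio(data_ink: float, total_ink: float) -> float:
    """Share of a graphic's ink that displays non-redundant data."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data_ink must lie between 0 and total_ink")
    return data_ink / total_ink


def erasable_share(data_ink: float, total_ink: float) -> float:
    """Proportion of the graphic that could be erased without losing data."""
    return 1.0 - data_ink_ratio(data_ink, total_ink)


# Hypothetical chart: 25 units of ink draw the bars themselves, the other
# 75 draw borders, shading, gridlines and similar decoration.
print(data_ink_ratio(25, 100))   # 0.25
print(erasable_share(25, 100))   # 0.75
```

A ratio close to 1.0 means almost everything on the page is carrying data.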
Eliminate all redundant visual information!
You wouldn’t write a document like this, using multiple fonts, gratuitous formatting, redundant excessive highlighting, variable colours, difficult-to-read italics, pointless underlining and desperate shadows in multiple sizes.
Yet every day you see the graphical equivalent as people try to make their charts “interesting” instead of useful!
How many items of redundant visual information can you see in this chart?
[Chart: “Sales and Appointments by Region”, a cluttered bar chart of Appointments and Sales for Wales and West, London and South East, Scotland and North, and Midlands; vertical axis “Volume” from 0 to 200; data labels 112, 100, 183, 150, 97, 75, 91 and 85]
3-D Effect
Border on Legend
Grey Background
Highlighting for no reason
Floor
Legend Key
Data Labels
Underlining
Excessive tick marks
Vertical Lines
Border on Bars
Border
Less is more; the same chart de-junked…
[Chart: “Sales and Appointments by Region” redrawn without the clutter; Appointments and Sales by Region for Wales and West, London and South East, Scotland and North, and Midlands; Volumes from 0 to 200]
Research Finding: People perceive visual differences in an information display as differences in meaning.
Principle #2: Do not include visual differences in a graph that do not correspond to actual differences in the data.
What is the meaning of the different colours that appear on the bars? The answer is “nothing.”
Don’t confuse people and waste their time by including visual differences that are meaningless.
Research Finding: The visual properties that work best for representing quantitative values are the length or 2-D location of objects.
Principle #3: Use the lengths or 2-D locations of objects to encode quantitative values in graphs unless they have already been used for other variables.
#1 How much taller is bar B than A?
#2 How much higher is point A than B?
#3 How much bigger is the area of B than A?
#4 How much darker is circle B than A?
Answers
[Figure: the four comparisons (#1 to #4) shown again with the true ratios marked: 5x, 10x, 4x and 5x]
How much taller is bar B than A?
Bar B is actually only 10% bigger than A, not 100%
[Chart: bars A and B shown with the vertical scale visible, running from 470 to 560 rather than starting at zero]
Research Finding: People perceive differences in the lengths or 2-D locations of objects fairly accurately and interpret them as differences in the actual values that they represent.
Principle #4: Differences in the visual properties that represent values (that is, differences in their lengths or 2-D locations) should accurately correspond to the actual differences in the values they represent.
Research Finding: People perceive things that appear connected as wholes and things that appear disconnected as discrete.
Principle #5: Do not visually connect values that are discrete, thereby suggesting a relationship that does not exist in the data.
The regions are discrete, so values that measure something going on in these regions should be displayed as discrete.
Connecting discrete items with a line is misleading. Doing so forms a pattern of upwards and downwards slopes that are utterly meaningless.
Research Finding: People pay most attention to and consider most important those parts of a visual display that are most salient.
Principle #6: Make the information that is most important to your message more visually salient in a graph than information that is less important.
Some information is more important to your message than others
You can communicate this fact in a graph by making those items that are most important more visually dominant (salient).
It is your job to direct people’s eyes to the most important parts of the display, so they adequately focus on them.
Research Finding: Short-term memory is limited to about four chunks of information at a time.
Principle #7: Augment people’s short-term memory by combining multiple facts into a single visual pattern that can be stored as a chunk of memory and by presenting all the information they need to compare within eye span.
By presenting quantitative information visually as patterns, more information can be simultaneously stored in short-term memory.
Each of the two lines in this line graph combines 12 different sales figures, one per month, into a single pattern of upward and downward sloping line segments.
When encoded in a visual pattern such as this, these 12 numbers can be stored together as a single chunk of information in short-term memory
Seven Common Quantitative Relationships
Seven common quantitative relationships in graphs and how to display them
Meaningful quantitative information always involves relationships. With rare exceptions in business graphs, these relationships always boil down to one or more of the seven relationships described on the following slides.
Time Series
Expresses the rise and fall of values through time.
– Use lines to emphasize overall pattern.
– Use bars to emphasize individual values.
– Use points connected by lines to slightly emphasize individual values while still highlighting the overall pattern.
– Always place time on the horizontal axis.
Ranking
Expresses values in order by size.
– Use bars only (horizontal or vertical).
– To highlight high values, sort in descending order.
– To highlight low values, sort in ascending order.
Part-to-Whole
Expresses the portion of each part relative to the whole.
– Use bars only (horizontal or vertical).
– Use stacked bars only when you must display measures of the whole as well as the parts.
Deviation
Expresses how and the degree to which one or more things differ from another.
– Use lines to emphasize the overall pattern only when displaying deviation and time-series relationships together.
– Use points connected by lines to slightly emphasize individual data points while also highlighting the overall pattern when displaying deviation and time-series relationships together.
– Use bars to emphasize individual values, but limit to vertical bars when a time series relationship is included.
– Always include a reference line to compare the measures of deviation against.
Distribution
Expresses a range of values as well as the shape of the distribution across that range.
Single distribution:
– Use vertical bars to emphasize individual values.
– Use lines to emphasize the overall shape.
Multiple distributions:
– Use vertical or horizontal bars (a.k.a. range bars or boxes) to encode the full range from the low value to the high value, or some meaningful portion of the range (for example, 90% of the values).
– Use points or lines together to encode measures of centre (for example, the median).
Correlation
Expresses how two paired sets of values vary in relation to one another.
– Use points and a trend line in the form of a scatter plot.
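The slide does not say how the trend line is drawn. One common choice is an ordinary least-squares straight line, and the sketch below assumes that. The function name and the sample pairs are illustrative only.

```python
def trend_line(xs, ys):
    """Least-squares straight line through paired values.

    Returns (slope, intercept) so that y is approximated by slope * x + intercept.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least two pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; no line can be fitted")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical paired values, e.g. appointments (x) against sales (y).
slope, intercept = trend_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)   # 2.0 1.0
```

The points themselves go on the scatter plot; the fitted line is drawn over them to show the direction and steepness of the relationship.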
Nominal Comparison
Simply expresses the comparative sizes of multiple related but discrete values in no particular order.
– Use bars only (horizontal or vertical).
The best means to encode quantitative data in charts
Four types of objects work best for encoding quantitative values in graphs: points, lines, bars, and boxes.
Points
Lines
Bars
Boxes
Points and Lines
Points are the smallest of the objects that are used to encode values in graphs. They can take the shape of dots, squares, triangles, Xs, dashes, and other simple objects. They have two primary strengths:
(1) they can be used to encode quantitative values along two quantitative scales simultaneously, as in a scatter plot, and
(2) they can be used in place of bars when the quantitative scale does not begin at zero.
Unlike lines, points emphasize individual values, rather than the shape of those values as they move up and down.
Lines connect the individual values in a series, emphasizing the shape of the data as it moves from value to value. As such, they are superb for showing the shape of data as it moves and changes through time. Trends, patterns, and exceptions stand out clearly.
You should only use lines to encode data along an interval scale.
Do not use lines for Nominal or Ordinal scales!
In nominal and ordinal scales, the individual items are not related closely enough to be linked with lines, so you should use bars or points instead. Lines suggest change from one item to the next, but change isn’t happening if the items aren’t closely related as sequential subdivisions of a continuous range of values. For instance, it is appropriate to use lines to display change from one day to the next or from one price range to the next, but not from one community bank to the next.
[Charts, both marked “Wrong”: a line graph of Sales on a nominal scale of regions (Wales and West, London and South East, Scotland and North), and a line graph of Sales across Extra-Value, Standard, Branded and Finest]
Use lines only for Interval scales
With interval scales, you are not forced in all cases to use lines; you can use bars and points as well. If you want to emphasize the overall shape of the data or changes from one item to the next, lines work best.
If, however, you want to emphasize individual items, such as individual months, or to support discrete comparisons of multiple values at the same location along the interval scale, such as revenues and expenses for individual months, then bars or points work best.
[Charts: two graphs of Sales by quarter (Q1 to Q4), an interval scale, on a scale from 0 to 120; the line graph is marked “Right”]
Bars encode data in a way that emphasizes individual values powerfully
This ability is due in part to the fact that bars encode quantitative values in two ways:
(1) the 2-D position of the bar’s endpoint in relation to the quantitative scale, and
(2) the length of the bar.
You probably recognize that these two characteristics correspond precisely to the two visual attributes that can be used to encode data in graphs. When you want to draw focus to individual values or to support the comparison of individual values to one another (see figure 19), bars are an ideal choice. They don’t, however, do as well as lines in revealing the overall shape of the data. Bars may be oriented vertically or horizontally.
[Chart: bar graph comparing Budget and Actual for Rewards and Exchange on a scale from 0 to 100]
Whenever you use bars, your quantitative scale must include zero
The lengths of the bars encode their values, but won’t do so accurately if those values don’t begin at zero. Notice what happens when you narrow the quantitative scale and use bars below. Actual sales appear to be half of planned sales, but in fact they are 90% of the plan.
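The distortion can be checked with simple arithmetic. The figures below are hypothetical (they are not read from the chart) and the function name is made up for this sketch; they are chosen so that actual sales are 90% of plan.

```python
def apparent_ratio(value_a: float, value_b: float, axis_start: float) -> float:
    """Ratio of the drawn bar lengths when the scale starts at axis_start."""
    return (value_a - axis_start) / (value_b - axis_start)


planned, actual = 550.0, 495.0   # hypothetical: actual is 90% of plan

print(actual / planned)                         # 0.9  (the real ratio)
print(apparent_ratio(actual, planned, 0.0))     # 0.9  (scale starts at zero)
print(apparent_ratio(actual, planned, 440.0))   # 0.5  (scale starts at 440)
```

With the scale starting at 440, the bar for actual sales is drawn half as long as the bar for planned sales, even though the values differ by only 10%.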
[Chart: bars A and B drawn on a scale that runs from 470 to 560]
When you would normally use bars, but wish to narrow the quantitative scale to show differences between the values in greater detail, you should switch from bars to points, because points encode values merely as 2-D location in relation to the quantitative scale, which eliminates the need to begin the scale at zero.
[Chart: Budget and Actual for Rewards and Exchange plotted as points on a scale from 70 to 100]
Boxes
Boxes are a lot like bars, except that both ends encode quantitative values. When bars are used in this way, they are sometimes called range bars. They are used to encode a range of values, usually from the highest to the lowest, rather than a single value.
In the 1970s John Tukey invented a method of using rectangles (bars with or without fill colors) in combination with individual data points (often a short line) and thin bars to encode several facts about a distribution of values, including the median (middle value), middle 50%, etc.
He called his invention a box plot (a.k.a. box-and-whisker plot).
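The values a box plot usually encodes can be computed with Python's standard library. This is a minimal sketch: the function name is made up, and the choice of the "inclusive" quartile method is an assumption, since several quartile conventions exist and the slide does not name one.

```python
import statistics


def five_number_summary(values):
    """Return (low, lower quartile, median, upper quartile, high)."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    # Assumption: 'inclusive' quartiles; other conventions give slightly
    # different box edges for small samples.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


# Hypothetical set of nine values.
low, q1, median, q3, high = five_number_summary([7, 1, 9, 3, 5, 2, 8, 4, 6])
print(low, q1, median, q3, high)
```

The box spans the lower to upper quartile (the middle 50%), the short line marks the median, and the thin bars run out to the low and high values.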
Step by Step Guide
Step 1: Determine your message
Determine your message.
Don’t just turn your data into a chart!
Think about what your data means, what you want to communicate and, most importantly, your audience’s needs.
Will the data be used to look up and compare individual values, or will the data need to be precise? If so, you should display it in a table.
Is the message contained in the shape of the data—in trends, patterns, exceptions, or comparisons that involve more than a few values? If so, you should display it in a graph.
Or, do both.
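The two questions above can be written down as a small decision helper. This is only a sketch of the rule as stated on this slide; the function and argument names are made up.

```python
def choose_display(lookup_or_precise: bool, message_in_shape: bool) -> str:
    """Pick a display medium from the two questions in Step 1.

    lookup_or_precise: readers will look up or compare individual values,
                       or need precise figures.
    message_in_shape:  the message lies in trends, patterns, exceptions or
                       comparisons that involve more than a few values.
    """
    if lookup_or_precise and message_in_shape:
        return "table and graph"
    if lookup_or_precise:
        return "table"
    if message_in_shape:
        return "graph"
    return "undecided: clarify the message first"


print(choose_display(lookup_or_precise=True, message_in_shape=False))   # table
print(choose_display(lookup_or_precise=False, message_in_shape=True))   # graph
print(choose_display(lookup_or_precise=True, message_in_shape=True))    # table and graph
```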
Step 2: Determine the best means to encode the values
What am I trying to represent?
Nominal comparison. Bars (horizontal or vertical). Points (if the quantitative scale does not include zero).
Correlation. Points and a trend line in the form of a scatter plot.
Time Series.
– Lines to emphasize the overall shape of the data.
– Bars to emphasize and support comparisons between individual values.
– Points connected by lines to slightly emphasize individual values while still highlighting the overall shape of the data.
Ranking. Bars (horizontal or vertical). Points (if the quantitative scale does not include zero).
Part-to-Whole. Bars (horizontal or vertical). Use stacked bars only when you must display measures of the whole as well as the parts. Note: Pie charts are commonly used to display part-to-whole relationships, but they don’t work nearly as well as bar graphs because it is much harder to compare the sizes of slices than the length of bars.
Deviation.
– Lines to emphasize the overall shape of the data (only when displaying deviation and time-series relationships together).
– Points connected by lines to slightly emphasize individual data points while also highlighting the overall shape (only when displaying deviation and time-series relationships together).
Frequency Distribution.
– Bars (vertical only) to emphasize individual values. This kind of graph is called a histogram.
– Lines to emphasize the overall shape of the data. This kind of graph is called a frequency polygon.
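The list above works as a lookup table, so it can be kept as one. The sketch below simply restates the slide in a Python dictionary; the names `ENCODINGS` and `recommended_encodings` are made up for illustration.

```python
# Relationship -> recommended encodings, as listed in Step 2.
ENCODINGS = {
    "nominal comparison": ["bars", "points (if the scale does not include zero)"],
    "correlation": ["points with a trend line (scatter plot)"],
    "time series": ["lines", "bars", "points connected by lines"],
    "ranking": ["bars", "points (if the scale does not include zero)"],
    "part-to-whole": ["bars", "stacked bars (only to show the whole as well)"],
    "deviation": [
        "lines (only together with a time series)",
        "points connected by lines (only together with a time series)",
    ],
    "frequency distribution": [
        "vertical bars (histogram)",
        "lines (frequency polygon)",
    ],
}


def recommended_encodings(relationship: str) -> list:
    """Look up the encodings for a relationship, ignoring case and spacing."""
    key = relationship.strip().lower()
    if key not in ENCODINGS:
        raise KeyError(f"unknown relationship: {relationship!r}")
    return ENCODINGS[key]


print(recommended_encodings("Time Series"))
# ['lines', 'bars', 'points connected by lines']
```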
Step 3: Determine where to display each variable – One Variable
Place the categorical variable on the x-axis if your graph will include ONE categorical variable and any one of the following is true:
• The categorical scale is an interval scale
• You are using lines to encode the data
• You are using bars to encode the data and the labels are not long or many
If you are using bars, place the categorical variable on the Y-axis when either of these two conditions exists:
• The text labels associated with the bars are long
• There are many bars.
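These placement rules can be expressed as a short function. This is a sketch under two assumptions: only "lines" and "bars" are handled, because the slide gives no rule for other encodings, and an interval scale always goes on the X-axis because it is listed first. All names are made up.

```python
def categorical_axis(encoding: str, interval_scale: bool = False,
                     long_labels: bool = False, many_bars: bool = False) -> str:
    """Return 'x' or 'y': the axis for the single categorical variable."""
    if encoding not in ("lines", "bars"):
        raise ValueError("this sketch only covers 'lines' and 'bars'")
    # Interval scales and line graphs always put the categories on the X-axis.
    if interval_scale or encoding == "lines":
        return "x"
    # Bars with long labels, or many bars, read better horizontally.
    if long_labels or many_bars:
        return "y"
    return "x"


print(categorical_axis("lines"))                      # x
print(categorical_axis("bars"))                       # x
print(categorical_axis("bars", long_labels=True))     # y
print(categorical_axis("bars", many_bars=True))       # y
print(categorical_axis("bars", interval_scale=True, long_labels=True))  # x
```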
[Charts: two bar graphs of the same 14 product categories (Beef, Fresh pork, Lamb, Bacon, Sausage, Beef fillet jnt, Beef sirloin joint, Pork roulades, Fresh pork mince, Fresh poultry gravy, Beef stock, 4 beef burgers, 8 beef steak burgers, Angus burgers) on a scale from 0 to 120, compared under the caption “Is better than”: one with vertical bars and rotated, hard-to-read labels, one with horizontal bars and the labels written out in full]
Step 3: Determine where to display each variable – Two or three variables
If the graph involves two or three variables, you must decide which to display along the axes and which to encode using distinct versions of another visual attribute, such as colour.
With a line graph, place the variable that is most important to your message along the X axis.
With a bar graph, encode the variable whose items you want to make it easiest to compare using a method other than association with an axis. Notice how much easier it is to compare appointments and sales than the regions, because they are positioned next to one another.
[Chart: bar graph of Appointments and Sales, side by side, for Wales and West, London and South East, Scotland and North, and Midlands on a scale from 0 to 200]
Step 3: Determine where to display each variable - the problem of the fourth variable
This solution involves a series of small graphs, arranged in the same way as a graph with three variables, all arranged together in a way that can be seen simultaneously. Each graph is alike, including consistent scales, differing only in that each features a different item of a categorical variable. Each graph varies according to a fourth variable, which is sales channel (e.g. product).
Using small multiples to support an additional variable is a powerful technique. Graphs can be arranged horizontally, vertically, or even in a matrix of columns and rows. If you need to display one more variable than you can fit into a single graph, select this approach.
[Small multiples: three horizontal bar graphs, one each for Face-Value, Rewards and Big Exchange, sharing a 0 to 200 scale and the four regions (Wales and West, London and South East, Scotland and North, Midlands); other labels: Sales, Appointments, 2010, 2011]
Step 4: Determine the best design for the remaining objects - Scale
It’s now time to make a series of design decisions that remain, including the scales and text. These decisions are concerned with the placement and visual appearance of items.
If the graph will be used for analysis purposes that require seeing the differences between values in as much detail as possible, narrowing the scale can be useful. Generally, you should adjust the scale so that it extends a little below the lowest data value and a little above the highest.
If you are using bars to encode the data, but your message could be better communicated by narrowing the scale, remember to switch from bars to points!
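A rough sketch of both rules follows. The 5% margin is an assumption standing in for “a little below” and “a little above”, and the function names are made up.

```python
def narrowed_scale(values, padding: float = 0.05):
    """Scale limits that sit a little outside the lowest and highest value.

    padding is the margin on each side as a fraction of the data range
    (5% by default, an arbitrary choice for this sketch).
    """
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        span = abs(high) or 1.0   # all values equal: fall back to a nominal span
    margin = span * padding
    return low - margin, high + margin


def encoding_for_scale(scale_min: float, scale_max: float) -> str:
    """Bars need a scale that includes zero; otherwise switch to points."""
    return "bars" if scale_min <= 0 <= scale_max else "points"


# Hypothetical values for two items.
limits = narrowed_scale([480, 560], padding=0.25)
print(limits)                        # (460.0, 580.0)
print(encoding_for_scale(*limits))   # points
print(encoding_for_scale(0, 600))    # bars
```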
[Charts: bars A and B on a scale from 470 to 560, and Sales by quarter (Q1 to Q4) on a scale from 800 to 1800]
Step 4: Determine the best design for the remaining objects - Legend
If a legend is required and you are using lines, label the lines directly.
If you are using bars, place the legend above the plot area with the labels arranged side by side in the same order as the bars.
[Charts: a line graph of Sales by quarter (Q1 to Q4, scale 800 to 1800) with the lines labelled London and South East and Wales and West; a bar graph of Rewards and Exchange (scale 0 to 100) with a Budget and Actual legend]
Step 4: Determine the best design for the remaining objects – Tick Marks and Scales
Tick marks are only necessary on quantitative scales; they serve no real purpose on categorical scales. Between 5 and 10 tick marks usually does the job: too many clutter the graph and too few fail to give the level of detail needed to interpret the values.
If the graph can be read with the scale in only one place (left, right, top or bottom), place it nearest the data you want to emphasise or make easiest to read.
If the graph is so large that it cannot be read with only one scale, place the scale in both positions (top and bottom, or left and right).
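One way to land on 5 to 10 tick marks is to try “round” step sizes (1, 2 or 5 times a power of ten) until the count fits. That convention is an assumption of this sketch, not something the slide prescribes, and the function name is made up.

```python
import math


def tick_values(low, high, min_ticks=5, max_ticks=10):
    """Tick positions between low and high using a 1, 2 or 5 x 10^k step."""
    span = high - low
    if span <= 0:
        raise ValueError("high must be greater than low")
    # Start from the power of ten just below the smallest usable step.
    exponent = math.floor(math.log10(span / max_ticks))
    for power in (exponent, exponent + 1):
        for base in (1, 2, 5):
            step = base * 10 ** power
            first = math.ceil(low / step)
            last = math.floor(high / step)
            if min_ticks <= last - first + 1 <= max_ticks:
                return [k * step for k in range(first, last + 1)]
    raise ValueError("no round step gives the requested number of ticks")


print(tick_values(800, 1800))   # [800, 1000, 1200, 1400, 1600, 1800]
print(tick_values(0, 200))      # [0, 50, 100, 150, 200]
```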
Step 4: Determine the best design for the remaining objects – Gridlines
Unless gridlines are necessary to understand your message or to divide a scatter plot into sections, leave them off; when you do use them, subdue them visually. Bear in mind that graphs display patterns and relationships. If you want to communicate data with a high degree of quantitative accuracy, use a table.
[Charts: Sales by quarter (Q1 to Q4, scale 800 to 1800) for London and South East and Wales and West, shown in two versions]
Step 4: Determine the best design for the remaining objects – Descriptive Text
Although the primary message of a graph is carried in the picture it provides, text is always required to some degree to clarify the meaning of that picture. Some text is often needed, including:
– A descriptive title
– Axis titles (unless the nature of the scale and its unit of measure are already clear)
Numbers in the form of text along quantitative scales are always necessary and legends often are. It is often useful to include one or more notes to describe what is going on in the graph, what ought to be examined in particular, or how to read the graph, whenever these bits of important information are not otherwise obvious.
Widget Sales by Region and Calendar Quarter (2007)
Widget sales in London and South East have been ahead of Wales and West with the exception of Q3
Step 5: Determine if particular data should be featured, and if so, how
The final major stage in the process involves highlighting particular data if some data is more important than the rest. Whatever the reason, you have a number of possible ways to make selected data stand out.
One of the best and simplest ways is to encode those items using bright or dark colours, which will stand out clearly if you’ve used soft colours for everything else. Other methods include:
– When bars are used, place borders only around those bars that should be highlighted.
– When lines are used, make the lines that must stand out thicker.
– When points are used, make the featured points larger or include fill colour in them alone.
[Charts: Sales by quarter (Q1 to Q4) on a scale from 0 to 120, and Sales by quarter on a scale from 800 to 1800 with series A and B]
Remember to follow this process for graph selection and design in order to communicate your information in the most effective manner
Determine your message and identify your data
Determine if a table, graph, or combination of both is needed to communicate your message
Determine the best means to encode the values
Determine where to display each variable
The best means to encode quantitative data in charts
Determine the best design for the remaining objects
Determine if particular data should be featured, and if so, how
Summary
Whenever you create a graph, you have a choice to make — to communicate or not. That’s what it all comes down to. If you have something important to say, then say it clearly and accurately. These guidelines are designed to help you do just that.
Test
Which graph makes it easier to determine whether Mid-Cap US stocks or Small-Cap US stocks have a greater share?
Which of these line graphs is easier to read?
Which of these tables is easier to read?
Which graph makes it easier to focus on the pattern of change through time, instead of the individual values?
Only one of these graphs accurately encodes the values. The other skews the values in a misleading manner. Which graph presents the data accurately?
Which map makes it easier to find all of the counties with positive growth rates?
Which graph makes it easier to determine R&D’s travel expense?
In which graph are the labels easier to read?
Which graph is easier to look at?
Which table allows you to see the areas of poor performance more quickly?
What percentage of the population is colour-blind?