Lecture 13 Page 1 CS 239, Spring 2007 Data Presentation CS 239 Experimental Methodologies for System Software Peter Reiher May 22, 2007


Data Presentation

CS 239

Experimental Methodologies for System Software

Peter Reiher

May 22, 2007


Outline

• General issues of data presentation

• Common methods of deceiving with data

• Graphical data presentation


Data Presentation

• Experiments are usually not done just for your own edification

• Generally, you need to describe the results to others

• How do you effectively present your data?

The Key Point

• You aren’t presenting numbers or graphs

• You are presenting insight

– Backed up by solid evidence

• Think about what you want your audience to learn first

• Worry about your numbers and graphs later

– And choose numbers and graphs that explain and defend your insights

A Corollary

• Present as little data as you need to present

– But no less

• Don’t print every possible graph

• Don’t show redundant graphs that simply repeat the same point

• Make sure each piece of data you present:

– Says something important

– Says something the others don’t

Another Consideration

• Know your audience

• Is it likely to believe your results?

• Will it be skeptical?

• Does it have the expertise to understand complex issues?

• Tailor your presentation to the audience


Methods of Presenting Data

• In written form

• In live presentations

Presenting Data in Written Form

• Simple in-line description

– “System x was 12% faster”

– Only good for short, simple statements

– Especially in summaries/conclusions

• Tables

– Useful if seeing individual numbers is helpful

– Also good if reader will make many comparisons across table entries

– But big tables are hard to read

– So minimize number of entries

– Avoid big tables with lots of rows and columns

• Graphics

– Often best way to explain large, complex data sets

Presenting Data in Talks

• If it’s important, make sure it’s written down

– On your slides, handouts, or somewhere else they can go to later

• Generally want it short and snappy

– Tables usually not appropriate

– Brief mentions and graphics better

What About the Full Data?

• You should always save all the data

• When possible, make it available to everyone

– Not possible if sensitive

– Or legally prohibited

– Or potential liability

– Or a company asset

• Otherwise, put it in a suitable repository

• Or put it on the web yourself

• Make sure you include information on data’s format and basic meaning


Using Data Deceptively

• Well-known techniques for using real data in deceptive ways

• Often called “games”

– Benchmarking games

– Ratio games

Benchmarking Games

• Techniques for producing data likely to be deceptive

• Methods of designing experiments that lead to results that are “advantageous”

– Assuming experimenter cares how experiment turns out

– As he often does

• Not always intentional

Differing Configurations

• Use different configurations for the same workload on two systems

• E.g., a Linux vs. Windows comparison where one system has less memory

• Be suspicious when all conditions aren’t fully mentioned

– Particularly if data comes from different sources


Differing Compilation Options

• One version of the code is compiled with optimization

• The other isn’t

• True performance difference, but . . .

• Again, if data gathered from different sources, might be hard to tell


Biased Testing Specifications

• Experiment designed to favor situations good for one alternative

• E.g., a system that makes heavy use of disk is tested on a machine with a fast, expensive drive

• Or specifying a 10 Mbps incoming link for testing a firewall that runs poorly above 11 Mbps

Playing With the Workload

• Design workload so it fits known cycles of one system

• E.g., design workload to hide page faults

• Or choose arbitrary workload that isn’t representative

– But does well on preferred system


Very Small Benchmarks

• Set benchmark sizes small to hide scaling effects

• E.g., hide poor disk performance by working out of cache

• Or conceal poor scaling of key data structure by limiting number of entries in workload


Tweaking Benchmarks

• Can alter aspects of benchmarks specifically to play to system strengths

• E.g., removing all images from web benchmark if system handles them poorly

• Increasing proportions of graphics operations in a benchmark to favor machine with special graphics hardware

Ratio Games

• Ratios offer special possibility to conceal information

• Sometimes used specifically to hide what you don’t want to show

• How do you do that?

• How do you tell when others do that?


Choosing a Base System

• A simple game method using ratios

• Run workloads on two systems

• Normalize performance to system you want to look good

• Take average of ratios

• Presto: you control what’s best

Example of Choosing a Base System

• A processor comparison (lower is better)

• Raw Data:

             System
Benchmark      A        B
Block        41.16    51.5
Sieve        63.17    48.08
Sum         104.33    99.58
Average      52.17    49.79

Making System A Look Better

• Calculate ratio with A as base

             System
Benchmark      A       B
Block         1.00    1.25
Sieve         1.00    0.76
Sum           2.00    2.01
Average       1.00    1.01

Making System B Look Better

• Use B as a base

             System
Benchmark      A       B
Block         0.80    1.00
Sieve         1.31    1.00
Sum           2.11    2.00
Average       1.06    1.00
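The base-system game in the tables above can be reproduced in a few lines. This is only a sketch: the times are the raw data from the slides, while the function and variable names are made up for illustration.

```python
# Raw benchmark times from the slides (lower is better).
times = {
    "Block": {"A": 41.16, "B": 51.5},
    "Sieve": {"A": 63.17, "B": 48.08},
}

def mean_ratio(times, system, base):
    """Arithmetic mean of the per-benchmark ratios system/base."""
    ratios = [row[system] / row[base] for row in times.values()]
    return sum(ratios) / len(ratios)

b_vs_a = mean_ratio(times, "B", "A")  # normalize to A
a_vs_b = mean_ratio(times, "A", "B")  # normalize to B

print(f"A as base: A = 1.00, B = {b_vs_a:.2f}")
print(f"B as base: A = {a_vs_b:.2f}, B = 1.00")
```

Both averages come out above 1.00, so whichever system is chosen as the base appears to be the faster one, even though the raw data never changed.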

Lying With Ratio Metrics

• Pick a metric that is itself a ratio

– E.g., power = throughput / response time

– Or cost/performance

• Handy because division is “hidden”

Relative Performance Enhancement

• Compare systems with incomparable bases

• Turn into ratios

• Example: compare Ficus 1 vs. 2 replicas with UFS vs. NFS (1 run on chosen day):

Comparison         "cp" Time          Ratio
Ficus 1 vs. 2      197.4    246.6     1.25
UFS vs. NFS        178.7    238.3     1.33

• “Proves” adding Ficus replica costs less than going from UFS to NFS

Ratio Games with Percentages

• Percentages are inherently ratios

– But disguised

– So they’re great for ratio games

• Example: Passing tests

Test    A Runs  A Passes  A %    B Runs  B Passes  B %
1          300        60   20        32         8   25
2           50         2    4       500        40    8
Total      350        62   18       532        48    9

• A is worse, but looks better in total line!
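The reversal in the total line can be checked directly from the run and pass counts. The sketch below uses the figures from the table; the helper names are hypothetical.

```python
# (runs, passes) per test, taken from the table above.
results = {
    "A": {1: (300, 60), 2: (50, 2)},
    "B": {1: (32, 8), 2: (500, 40)},
}

def pass_rate(runs, passes):
    """Pass percentage for one test."""
    return 100.0 * passes / runs

def total_rate(per_test):
    """Pass percentage over the pooled runs of all tests."""
    runs = sum(r for r, _ in per_test.values())
    passes = sum(p for _, p in per_test.values())
    return pass_rate(runs, passes)

for system, per_test in results.items():
    rates = {test: pass_rate(*counts) for test, counts in per_test.items()}
    print(system, rates, f"total = {total_rate(per_test):.0f}%")
```

B has the higher pass rate on each individual test, but A ran most of its tests on the easier one, so the pooled percentage favors A. The totals weight each test by its number of runs, and those weights differ between the two systems.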

More on Percentages

• Psychological impact

– 1000% sounds bigger than 10-fold (or 11-fold)

– Great when both original and final performance are lousy (e.g., salary went from $40 to $80 per week)

• Base should be initial, not final value

– E.g., price can’t drop 400%
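A small sketch of the percentage arithmetic behind these points, with the initial value as the base. The function name and the sample prices are illustrative assumptions, not part of the lecture.

```python
def percent_change(initial, final):
    """Percentage change measured against the initial value."""
    if initial == 0:
        raise ValueError("percentage change is undefined for an initial value of 0")
    return 100.0 * (final - initial) / initial

raise_pct = percent_change(40, 80)      # weekly salary from $40 to $80
tenfold_pct = percent_change(1, 10)     # a 10-fold increase
elevenfold_pct = percent_change(1, 11)  # an 11-fold increase
largest_drop = percent_change(50, 0)    # hypothetical price falling to zero

print(raise_pct, tenfold_pct, elevenfold_pct, largest_drop)
```

With the initial value as the base, an 11-fold increase is a 1000% increase and a 10-fold increase is a 900% increase. A non-negative quantity such as a price can fall by at most 100%, which is why a 400% drop signals that the final value was used as the base.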

Strategies for Winning a Ratio Game

• Can you win?

• How to win

Can You Win the Ratio Game?

• If one system is better by all measures, a ratio game won’t work

– But recall percent-passes example

– And selecting the base lets you change the magnitude of the difference

• If each system wins on some measures, ratio games might be possible (but no promises)

– May have to try all bases

How to Win Your Ratio Game

• For LB (lower-is-better) metrics, use your system as the base

• For HB (higher-is-better) metrics, use the other as a base

• If possible, adjust lengths of benchmarks

– Elongate when your system performs best

– Shorten when your system is worst

– This gives greater weight to your strengths
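The effect of adjusting benchmark lengths can be sketched with the Block and Sieve times from the earlier processor comparison. The weights are hypothetical, and the sketch assumes that running a benchmark twice as long simply doubles its time.

```python
# Times from the earlier processor comparison (lower is better).
times = {
    "Block": {"A": 41.16, "B": 51.5},
    "Sieve": {"A": 63.17, "B": 48.08},
}

def total_time(times, system, weights):
    """Total time when each benchmark is scaled by its length weight."""
    return sum(weights[name] * row[system] for name, row in times.items())

equal = {"Block": 1, "Sieve": 1}
stretched = {"Block": 2, "Sieve": 1}  # hypothetical: Block runs twice as long

for label, weights in (("equal lengths", equal), ("Block doubled", stretched)):
    a = total_time(times, "A", weights)
    b = total_time(times, "B", weights)
    print(f"{label}: A = {a:.2f}, B = {b:.2f}")
```

With equal lengths B has the lower total, but doubling the benchmark on which A is stronger is enough to flip the result in A's favor.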


Correct Analysis of Ratios

• Discussed in detail in previous lectures

• Generally, harmonic or geometric means are appropriate

– Or use only the raw data

• If someone’s doing something else with ratios, be suspicious
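As a rough illustration of why the geometric mean is a safer summary of ratios, the sketch below applies it to the Block and Sieve times from the earlier processor comparison. The function name is an assumption made for this example.

```python
import math

# Times from the earlier processor comparison (lower is better).
times = {
    "Block": {"A": 41.16, "B": 51.5},
    "Sieve": {"A": 63.17, "B": 48.08},
}

def geometric_mean_ratio(times, system, base):
    """Geometric mean of the per-benchmark ratios system/base."""
    ratios = [row[system] / row[base] for row in times.values()]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

b_over_a = geometric_mean_ratio(times, "B", "A")
a_over_b = geometric_mean_ratio(times, "A", "B")

print(f"B relative to A: {b_over_a:.3f}")
print(f"A relative to B: {a_over_b:.3f}")
print(f"product: {b_over_a * a_over_b:.3f}")
```

The two geometric means are reciprocals of each other, so the choice of base no longer changes which system looks better. The arithmetic mean of ratios does not have this property.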

Graphical Data Presentation

• Reference works

• Types of variables

• Guidelines for good graphics charts

• Common mistakes in graphics

• Pictorial games

• Special-purpose charts

Useful Reference Works

• Edward R. Tufte, The Visual Display of Quantitative Information, Graphics Press, Cheshire, Connecticut, 1983.

• Edward R. Tufte, Envisioning Information, Graphics Press, Cheshire, Connecticut, 1990.

• Darrell Huff, How to Lie With Statistics, W.W. Norton & Co., New York, 1954.

Types of Variables

• Qualitative

– Ordered (e.g., modem, Ethernet, satellite)

– Unordered (e.g., CS, math, literature)

• Quantitative

– Discrete (e.g., number of terminals)

– Continuous (e.g., time)

Charting Based on Variable Types

• Qualitative variables usually work best with bar charts or Kiviat graphs

– If ordered, use bar charts to show order

• Quantitative variables work well in X-Y graphs

– Use points if discrete, lines if continuous

– Bar charts sometimes work well for discrete

Guidelines for Good Graphics Presentation

• Principles of graphical excellence

• Principles of good graphics

• Specific hints for specific situations

• Aesthetics

• Friendliness

• Much of this based on Tufte’s principles of graphics design

Principles of Graphical Excellence

• Graphical excellence is the well-designed presentation of interesting data:

– Substance

– Statistics

– Design


Graphical Excellence (2)

• Complex ideas get communicated with:

– Clarity

– Precision

– Efficiency


Graphical Excellence (3)

• Viewer gets:

– Greatest number of ideas

– In the shortest time

– With the least ink

– In the smallest space

• Requires telling truth about data


Principles of Good Graphics

• Above all else show the data

• Maximize the data-ink ratio

• Erase non-data ink

• Erase redundant data ink

• Revise and edit


An Important Caveat

“The principles should not be applied rigidly or in a peevish spirit; they are not logical or mathematically certain; and it is better to violate any principle than to place graceless or inelegant marks on paper.”

Edward Tufte, “The Visual Display of Quantitative Information”

Above All Else Show the Data

[Figure: time to fetch (seconds, 0 to 5) vs. file size (bytes, 0 to 15000), showing the linear model y = 1E-05x + 1.3641, R² = 0.0033]

Where’s the data?

Above All Else Show the Data

[Figure: time to fetch (seconds, 0 to 5) vs. file size (bytes, 0 to 15000), with the linear model y = 1E-05x + 1.3641, R² = 0.0033]

Limitations of linear model much clearer now

Maximize the Data-Ink Ratio

[Figure: East, West, and North values by quarter, y-axis 0 to 90]

Maximize the Data-Ink Ratio

[Figure: East, West, and North values by quarter (1st Qtr to 4th Qtr), y-axis 0 to 100]

Important features become much clearer

Erase Non-Data Ink

[Figure: East, West, and North values by quarter (1st Qtr to 4th Qtr), y-axis 0 to 95 in steps of 5]

Note disconcerting vibration effects

Erase Non-Data Ink

[Figure: East, West, and North values by quarter (1st Qtr to 4th Qtr), y-axis 0 to 80]

Erase Redundant Data Ink

[Figure: East, West, and North values by quarter (1st Qtr to 4th Qtr), y-axis 0 to 80, with every data point also labeled with its value: 20.4, 27.4, 90, 20.4, 38.6, 34.6, 31.6, 46.9, 45, 43.9, 30.6, 45.9]

Erase Redundant Data Ink

[Figure: East, West, and North values by quarter (1st Qtr to 4th Qtr), y-axis 0 to 80]

Revise and Edit

Default Microsoft PowerPoint Chart

[Figure: East, West, and North values by quarter (1st Qtr to 4th Qtr), y-axis 0 to 100]

Revise and Edit

Remove Decorative Effects

[Figure: East, West, and North values by quarter (1st Qtr to 4th Qtr), y-axis 0 to 100]

Revise and Edit

Remove Clutter

[Figure: East, West, and North values by quarter (1st Qtr to 4th Qtr), y-axis 0 to 100]

Revise and Edit

Make Legend Simple to Interpret

[Figure: East, West, and North values by quarter (1st Qtr to 4th Qtr), y-axis 0 to 100]

Revise and Edit

Eliminate Superfluous Ink

[Figure: East, West, and North values by quarter (1st Qtr to 4th Qtr), y-axis 0 to 100]

Revise and Edit

Eliminate Red/Green Distinctions

[Figure: East, West, and North values by quarter (1st Qtr to 4th Qtr), y-axis 0 to 100]

Revise and Edit

Choose Better Fonts

[Figure: East, West, and North values by quarter (1st Qtr to 4th Qtr), y-axis 0 to 100]


Specific Things to Do

• Give information the reader needs

• Limit complexity and confusion

• Have a point

• Show statistics graphically

• Don’t always use graphics

• Discuss it in the text

Give Information the Reader Needs

• Show informative axes

– Use axes to indicate range

• Label things fully and intelligently

• Highlight important points on the graph

Giving Information the Reader Needs

[Figure: unlabeled chart with y-axis 0 to 80, x-axis 1 to 4, and series marked only E, W, and N]

What does any of this mean?

Giving Information the Reader Needs

[Figure: sales in millions (y-axis 0 to 80) by quarter (1st Qtr to 4th Qtr) for East, North, and West, with the annotation “Microsoft Contract Signed”]

Limit Complexity and Confusion

• Not too many curves

• Single scale for all curves

• No “extra” curves

• No pointless decoration (“ducks”)

Limiting Complexity and Confusion

[Figure: chart by quarter (1st Qtr to 4th Qtr) with two y-axis scales (0 to 60 and 0 to 120) and ten series: West, North, Northeast, Southwest, Mexico, Europe, Japan, East, South, International]

Limiting Complexity and Confusion

[Figure: “International Sales” in millions of dollars (y-axis 0 to 120) by quarter (1st Qtr to 4th Qtr) for Japan, Mexico, and Europe]


Have a Point

• Graphs should add information not otherwise available to reader

• Don’t plot data just because you collected it

• Know what you’re trying to show, and make sure the graph shows it

Having a Point

• Sales were up 15% this quarter:

[Figure: chart of two values, 1st Qtr and 2nd Qtr, y-axis 0 to 140]

Having a Point

[Figure: “User Time of Copy Benchmarks” for cp and rcp at 1, 2, 3, and 4 replicas, y-axis 0 to 0.04]

Having a Point

[Figure: throughput and latency for Modem, Ethernet, ATM, and Satellite on a single linear y-axis, 0 to 4500000]

Having a Point

[Figure: latency (ms) and throughput (Kbits/sec) on logarithmic axes (1 to 1000000 and 0.01 to 1000), with labeled points for Ethernet, Modem, ATM, and Satellite]

Show Statistics Graphically

• Put bars in a reasonable order

– Geographical

– Best to worst

– Even alphabetic

• Make bar widths reflect interval widths

– Hard to do with most graphing software

• Show confidence intervals on the graph

– Examples will be shown later
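Before confidence intervals can be drawn on a graph, they have to be computed. The following is a minimal sketch under stated assumptions: the run times are invented purely for illustration, the function name is hypothetical, and the interval uses a normal approximation (z = 1.96), whereas a t quantile would be more appropriate for samples this small.

```python
import math
import statistics

def mean_with_ci(samples, z=1.96):
    """Return (mean, half_width) of an approximate 95% confidence interval.

    Assumes a normal approximation; z = 1.96 is a simplification.
    """
    mean = statistics.mean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half_width

# Hypothetical run times in seconds, made up for this example only.
runs = {
    "cp": [197.4, 201.2, 195.8, 199.9, 203.1],
    "rcp": [246.6, 239.5, 251.0, 244.2, 248.8],
}

for name, samples in runs.items():
    mean, half = mean_with_ci(samples)
    print(f"{name}: {mean:.1f} s, interval {mean - half:.1f} to {mean + half:.1f} s")
```

The half-width is the length an error bar would extend above and below each plotted mean. If the intervals of two bars overlap heavily, the graph should not suggest that one system is clearly better.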


Don’t Always Use Graphics

• Tables are best for small sets of numbers

– Tufte says 20 or fewer

• Also best for certain arrangements of data

– E.g., 10 graphs of 3 points each

• Sometimes a simple sentence will do

• Always ask whether the chart is the best way to present the information

– And whether it brings out your message

Discuss It in the Text

• Figures should be self-explanatory

– Many people scan papers, just look at graphs

– Good graphs build interest, “hook” readers

• But text should highlight and aid figures

– Tell readers when to look at figures

– Point out what figure is telling them

– Expand on what figure has to say


Aesthetics

• Not everyone is an artist

– But figures should be visually pleasing

• Elegance is found in

– Simplicity of design

– Complexity of data

Principles of Aesthetics

• Use appropriate format and design

• Use words, numbers, drawings together

• Reflect balance, proportion, relevant scale

• Keep detail and complexity accessible

• Have a story about the data (narrative quality)

• Do a professional job of drawing

• Avoid decoration and chartjunk

Use Appropriate Format and Design

• Don’t automatically draw a graph

– We’ve covered this before

• Choose graphical format carefully

• Sometimes a “text graphic” works best

– Use text placement to communicate numbers

– Very close to being a table

Using Text as a Graphic

[Text graphic (New York Times, Jan. 2, 1979): “About a year ago, eight forecasters were asked for their predictions on some key economic indicators. Here’s how the forecasts stack up against the probable 1978 results (shown in the black panel).” Probable results: GNP +3.8, IPG +5.8, CPI +7.7, Profits +13.3, Unempl 6.0. The forecasts from CEA, DR, NABE, WEF, CBO, CB, IBM, and CE are placed as text by value around each result.]

Choosing a Graphical Format

• Many options, more being invented all the time

– Examples will be given later

– See Jain for some commonly useful ones

– Tufte shows ways to get creative

• Choose a format that reflects your data

– Or that helps you analyze it yourself


Use Words, Numbers, Drawings Together

• Put graphics near or in text that discusses them

– Even if you have to murder your word processor

• Integrate text into graphics

• Tufte: “Data graphics are paragraphs about data and should be treated as such”


Reflect Balance, Proportion, Relevant Scale

• Much of this boils down to “artistic sense”

• Make sure things are big enough to read

– Tiny type is OK only for young people!

• Keep lines thin

– But use heavier lines to indicate important information

• Keep horizontal larger than vertical

– About 50% larger works well

Poor Balance and Proportion

• Sales in the North and West districts were steady through all quarters

• East sales varied widely, significantly outperforming the other districts in the third quarter

[Figure: sales by quarter (1st Qtr to 4th Qtr), y-axis 0 to 100 in steps of 10]

Better Proportion

• Sales in the North and West districts were steady through all quarters

• East sales varied widely, significantly outperforming the other districts in the third quarter

[Figure: sales by quarter (Q1 to Q4), y-axis marked only at 0, 50, and 100]


Keep Detail and Complexity Accessible

• Make your graphics friendly:

– Avoid abbreviations and encodings

– Run words left-to-right

– Explain data with little messages

– Label graphic, don’t use elaborate shadings and a complex legend

– Avoid red/green distinctions

– Use clean, serif fonts in mixed case

An Unfriendly Graph

[Figure: “Time” (y-axis 0 to 450 in steps of 50) against x-axis labels “1 REPL, 2, 3, 4, 5, 6, 7, 8”, with a legend of eight series: CP, FIND, FINDGREP, GREP, LS, MAB, RCP, RM]

A Friendly Version

[Figure: time in seconds (0 to 400) vs. number of replicas (1 to 8) for Copy, Compile, and Remove, labeled directly on the graph, with the note “Note almost no growth in compile/remove times”]

Even Friendlier

[Figure: time in seconds (0 to 400) by benchmark and number of replicas, showing Copy, Compile, and Remove at 1 replica and at 8 replicas, with the notes “Note slower growth in compile and remove times” and “8 Replicas (note departure from linearity)”]

Have a Story About the Data (Narrative Quality)

• May be difficult in technical papers

• But think about why you are drawing

• Example:

– Performance is controlled by network speed

– But it tops out at the high end

– And that’s because we hit a CPU bottleneck

Showing a Story About the Data

[Figure: transactions per second (0 to 60) vs. network bandwidth (Mbps, 0 to 12), with the annotation “CPU bottleneck reached”]

Do a Professional Job of Drawing

• This is easy with modern tools

– But take the time to do it right

• Align things carefully

• Check the final version in the format you will use

– I.e., print the PDF one last time before submission

– Or look at your slides on the projection screen

• From the machine you’ll use

• Preferably with the projector you’ll use

Avoid Decoration and Chartjunk

• PowerPoint, etc. make chartjunk easy

• Avoid clip art, automatic backgrounds, etc.

• Remember: the data is the story

– Statistics aren’t boring

– Uninterested readers aren’t drawn by cartoons

– Interested readers are distracted

• Does removing it change the message?

– If not, leave it out

Examples of Chartjunk

[Figure: chart by quarter (1st Qtr to 4th Qtr), y-axis 0 to 90 in steps of 10, annotated with its faults: Gridlines!; Vibration; Pointless Fake 3-D Effects; Filled “Floor”; Clip Art; In or out?; Filled “Walls”; Borders and Fills Galore; Unintentional Heavy Lines; Filled Labels]