Chapter 2: Summarizing and Graphing Data
Professor Mike Gilmore
Middlesex Community College
Spring 2012
Why Graphs?
• Describe data
• Explore data
• Compare data
• The goal is to convey a message about the data, rather than to decorate…
A Bunch of Data
ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCACGGCCACCGCTGCCCTGCC CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCACGGCCACCGCTGCCCT CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGACGGCCACCGCTGCCCTGC AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCACGGCCACCGCTGCCTG CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGACGGCCACCGCTGCCCTGCC TTTAATTACAGACCTGAAACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCAA CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCACGGCCACCGCTGCCCT CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGACGGCCACCGCTGCCCTGC AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCACGGCCACCGCTGCCCTG CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGACGGCCACCGCTGCCCTGCCA TTTAATTACAGACCTGAAACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCTATT CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCACGGCCACCGCTGCCCT CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGACGGCCACCGCTGCCCTGC AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCACGGCCACCGCTGCCCT CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGACGGCCACCGCTGCCCTGCA TTTAATTACAGACCTGAAACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCTGA CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCACGGCCACCGCTGCCCTG CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGACGGCCACCGCTGCCCTGCC AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCACGGCCACCGCTGCCCTG CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGACGGCCACCGCTGCCCTGCCA TTTAATTACAGACCTGAAACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCGTG CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCACGGCCACCGCTGCCCTG 
CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGACGGCCACCGCTGCCCTGCC AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCACGGCCACCGCTGCCCTG CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGACGGCCACCGCTGCCCTGCCA TTTAATTACAGACCTGAAACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCGTA CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCACGGCCACCGCTGCCCTG CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGACGGCCACCGCTGCCCTGCC AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCACGGCCACCGCTGCCCTG CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGACGGCCACCGCTGCCCTGCCT TTTAATTACAGACCTGAAACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCTAA CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCACGGCCACCGCTGCCCTG CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGACGGCCACCGCTGCCCTGCC AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCACGGCCACCGCTGCCCTC CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGACCTGAACACG
Frequency Table
Nucleotide Frequency
A 787
C 712
T 79
G 723
Nucleotide Frequency
ATG 51
CCG 32
AATTG 1
GAGA 3
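Tables like the two above come from counting. A minimal Python sketch, assuming the sequence is held in a plain string: `excerpt` is only the first 24 bases of the sequence on the previous slide, and `count_motif` is a hypothetical helper name, so the counts will not match the slide's totals.

```python
from collections import Counter

# First 24 bases of the slide's sequence (illustration only, not the full data).
excerpt = "ACAAGATGCCATTGTCCCCCGGCC"

# Frequency of each single nucleotide.
nucleotide_freq = Counter(excerpt)


def count_motif(sequence, motif):
    """Count occurrences of motif in sequence, allowing overlaps."""
    return sum(
        1
        for start in range(len(sequence) - len(motif) + 1)
        if sequence[start:start + len(motif)] == motif
    )


for base in "ACTG":
    print(base, nucleotide_freq[base])
for motif in ("ATG", "CCG"):
    print(motif, count_motif(excerpt, motif))
```

Every base lands in exactly one category, so the single-nucleotide frequencies add up to the length of the sequence.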
Center
155 142 149 130 151 163 151 142 156 133 138 161 128 144 172 137 151 166 147 163 145 116 136 158 114 165 169 145 150 150 150 158 151 145 152 140 170 129 188 156
?
Bin Frequency
109.5 0
119.5 2
129.5 2
139.5 5
149.5 9
159.5 13
169.5 6
179.5 2
189.5 1
More 0
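One way to put a number on "center" is the mean or the median. The slide itself only poses the question, so this is a hedged sketch that uses Python's standard `statistics` module on the 40 values above.

```python
from statistics import mean, median

# The 40 data values from the slide.
values = [
    155, 142, 149, 130, 151, 163, 151, 142, 156, 133,
    138, 161, 128, 144, 172, 137, 151, 166, 147, 163,
    145, 116, 136, 158, 114, 165, 169, 145, 150, 150,
    150, 158, 151, 145, 152, 140, 170, 129, 188, 156,
]

center_mean = mean(values)      # arithmetic average
center_median = median(values)  # middle value of the sorted data

print("mean:", center_mean)
print("median:", center_median)
```

Both measures fall in the 149.5 to 159.5 bin, which is also the bin with the highest frequency.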
Variation
155 142 149 130 151 163 151 142 156 133 138 161 128 144 172 137 151 166 147 163 145 116 136 158 114 165 169 145 150 150 150 158 151 145 152 140 170 129 188 156
?
Bin Frequency
109.5 0
119.5 2
129.5 2
139.5 5
149.5 9
159.5 13
169.5 6
179.5 2
189.5 1
More 0
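Variation can be summarized numerically as well. The choice of the range and the sample standard deviation is an assumption here, since the slide only poses the question.

```python
from statistics import stdev

# The 40 data values from the slide.
values = [
    155, 142, 149, 130, 151, 163, 151, 142, 156, 133,
    138, 161, 128, 144, 172, 137, 151, 166, 147, 163,
    145, 116, 136, 158, 114, 165, 169, 145, 150, 150,
    150, 158, 151, 145, 152, 140, 170, 129, 188, 156,
]

data_range = max(values) - min(values)  # spread between the extremes
sample_sd = stdev(values)               # sample standard deviation (n - 1 divisor)

print("range:", data_range)
print("sample standard deviation:", round(sample_sd, 2))
```

The range here is 188 - 114 = 74, which matches the span of non-empty bins in the table.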
Distribution
155 142 149 130 151 163 151 142 156 133 138 161 128 144 172 137 151 166 147 163 145 116 136 158 114 165 169 145 150 150 150 158 151 145 152 140 170 129 188 156
?
Bin Frequency
109.5 0
119.5 2
129.5 2
139.5 5
149.5 9
159.5 13
169.5 6
179.5 2
189.5 1
More 0
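The Bin/Frequency table can be reproduced by counting how many values fall at or below each bin edge and above the previous one. This sketch assumes the "Bin" column lists upper edges, as spreadsheet histogram tools usually do; `bin_frequencies` is a hypothetical helper name.

```python
# The 40 data values from the slide.
values = [
    155, 142, 149, 130, 151, 163, 151, 142, 156, 133,
    138, 161, 128, 144, 172, 137, 151, 166, 147, 163,
    145, 116, 136, 158, 114, 165, 169, 145, 150, 150,
    150, 158, 151, 145, 152, 140, 170, 129, 188, 156,
]

# Upper bin edges, copied from the "Bin" column.
upper_edges = [109.5, 119.5, 129.5, 139.5, 149.5, 159.5, 169.5, 179.5, 189.5]


def bin_frequencies(data, edges):
    """Count values in (previous edge, edge] per edge, plus a final 'More' count."""
    counts = []
    previous = float("-inf")
    for edge in edges:
        counts.append(sum(previous < v <= edge for v in data))
        previous = edge
    more = sum(v > edges[-1] for v in data)
    return counts, more


counts, more = bin_frequencies(values, upper_edges)
for edge, count in zip(upper_edges, counts):
    print(edge, count)
print("More", more)
```

The counts rise to a single peak in the 149.5 to 159.5 bin and fall off on both sides, which is the shape the histogram makes visible.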
Outliers
155 142 149 130 151 163 151 142 156 133 138 161 228 144 172 137 151 166 147 163 145 316 136 158 114 165 169 145 150 150 150 158 151 145 152 140 170 129 488 156
?
[Histogram of the data above; vertical axis: Frequency, 0 to 14]
Bin Frequency
109.5 0
119.5 1
129.5 1
139.5 5
149.5 9
159.5 13
169.5 6
179.5 2
189.5 0
199.5 0
209.5 0
219.5 0
229.5 1
239.5 0
249.5 0
259.5 0
269.5 0
279.5 0
289.5 0
299.5 0
More 2
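This slide's data differ from the earlier set in three values: 128, 116 and 188 become 228, 316 and 488. A hedged sketch of how those three outliers affect two common measures of center; comparing the mean and the median is an illustration chosen here, not something the slide states.

```python
from statistics import mean, median

# The original 40 values from the earlier slides.
original = [
    155, 142, 149, 130, 151, 163, 151, 142, 156, 133,
    138, 161, 128, 144, 172, 137, 151, 166, 147, 163,
    145, 116, 136, 158, 114, 165, 169, 145, 150, 150,
    150, 158, 151, 145, 152, 140, 170, 129, 188, 156,
]

# The three values changed on the Outliers slide (each occurs once in the data).
replacements = {128: 228, 116: 316, 188: 488}
with_outliers = [replacements.get(v, v) for v in original]

for label, data in (("original", original), ("with outliers", with_outliers)):
    print(label, "mean:", mean(data), "median:", median(data))
```

The mean moves from 149.15 to 164.15, while the median only moves from 150 to 151: a few extreme values pull the mean far more than the median.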
Change Over Time
http://www.ted.com/talks/lang/en/hans_rosling_at_state.html
http://www.maps4kids.com/vizdata_pop.html
http://www.gapminder.org/
Frequency Table
• A frequency table shows how a data set is partitioned among several categories (or classes) by listing every category along with the number of data values that fall in it.
Simple Frequency Table
Grade  Cumulative Frequency
A  4 = 4
B  4 + 7 = 11
C  11 + 9 = 20
D  20 + 3 = 23
F  23 + 2 = 25

Grade  Relative Frequency
A  4 / 25 = 0.16
B  7 / 25 = 0.28
C  9 / 25 = 0.36
D  3 / 25 = 0.12
F  2 / 25 = 0.08
Statistical Reasoning, Bennett et al., 3rd edition
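The cumulative and relative frequencies for the grade data can be computed from the simple frequencies (A 4, B 7, C 9, D 3, F 2). A minimal sketch; the variable names are illustrative.

```python
from itertools import accumulate

grades = ["A", "B", "C", "D", "F"]
frequencies = [4, 7, 9, 3, 2]  # simple frequencies from the table

total = sum(frequencies)

# Cumulative frequency: running total down the table.
cumulative = list(accumulate(frequencies))

# Relative frequency: each count divided by the total number of grades.
relative = [count / total for count in frequencies]

for grade, freq, cum, rel in zip(grades, frequencies, cumulative, relative):
    print(grade, freq, cum, f"{rel:.2f}")
```

The last cumulative frequency equals the total (25), and the relative frequencies sum to 1.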
Frequency Table Terms for Quantitative Categories
• Lower class limits
• Upper class limits
• Class boundaries
• Class midpoints
• Class width
– No gaps between classes
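A small sketch of how these terms relate. It assumes integer data and three example classes (110-119, 120-129, 130-139), chosen here to line up with the 109.5, 119.5, ... bin edges used earlier.

```python
# Assumed example classes for integer data.
lower_limits = [110, 120, 130]
upper_limits = [119, 129, 139]

# Class width: difference between consecutive lower class limits.
class_width = lower_limits[1] - lower_limits[0]

# Class boundaries split the gap between one class's upper limit and the
# next class's lower limit, so adjacent classes meet with no gaps.
gap = lower_limits[1] - upper_limits[0]
boundaries = [lower_limits[0] - gap / 2] + [upper + gap / 2 for upper in upper_limits]

# Class midpoint: halfway between the lower and upper class limits.
midpoints = [(low + high) / 2 for low, high in zip(lower_limits, upper_limits)]

print("width:", class_width)
print("boundaries:", boundaries)
print("midpoints:", midpoints)
```

With these classes the boundaries are 109.5, 119.5, 129.5 and 139.5, and consecutive boundaries are exactly one class width apart.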
Constructing a Frequency Table
1. Determine number of classes
2. Calculate class width
3. Choose first lower class limit
4. List all lower class limits
5. List all upper class limits
6. Tally each data point next to appropriate class limits
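The six steps can be sketched in Python using the 40 values from the earlier slides. The choices of 8 classes and a first lower class limit of 110 are assumptions made for this example, and the upper limits assume integer data.

```python
import math

values = [
    155, 142, 149, 130, 151, 163, 151, 142, 156, 133,
    138, 161, 128, 144, 172, 137, 151, 166, 147, 163,
    145, 116, 136, 158, 114, 165, 169, 145, 150, 150,
    150, 158, 151, 145, 152, 140, 170, 129, 188, 156,
]

# 1. Determine number of classes (assumed for this example).
num_classes = 8

# 2. Calculate class width: (max - min) / number of classes, rounded up.
class_width = math.ceil((max(values) - min(values)) / num_classes)

# 3. Choose first lower class limit (assumed: a convenient value at or below the minimum).
first_lower = 110

# 4. List all lower class limits.
lower_limits = [first_lower + i * class_width for i in range(num_classes)]

# 5. List all upper class limits (one less than the next lower limit, integer data).
upper_limits = [low + class_width - 1 for low in lower_limits]

# 6. Tally each data point into its class.
tally = [
    sum(low <= v <= high for v in values)
    for low, high in zip(lower_limits, upper_limits)
]

for low, high, count in zip(lower_limits, upper_limits, tally):
    print(f"{low}-{high}", count)
```

With these choices the class width is 10 and the tallies match the non-empty bins of the earlier Bin/Frequency table.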
Histogram
• A histogram is a graph of bars of equal width drawn adjacent to each other (without gaps). The horizontal scale represents classes of quantitative data values. The vertical scale represents frequencies.
• What characteristic of a data set can be better understood by constructing a histogram?
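A rough text rendering of a histogram, as a sketch only. The frequencies come from the earlier Bin/Frequency table, the class labels are inferred from its bin edges (an assumption), and `text_histogram` is a hypothetical helper. A real histogram draws adjacent vertical bars; each row of symbols here stands in for one bar.

```python
# Class labels inferred from the bin edges (assumption); frequencies from the table.
classes = ["110-119", "120-129", "130-139", "140-149",
           "150-159", "160-169", "170-179", "180-189"]
frequencies = [2, 2, 5, 9, 13, 6, 2, 1]


def text_histogram(labels, counts, symbol="*"):
    """Return one line per class: label, a bar of symbols, and the count."""
    return [
        f"{label} | {symbol * count} {count}"
        for label, count in zip(labels, counts)
    ]


for line in text_histogram(classes, frequencies):
    print(line)
```

Even in this crude form, the center, the spread and the overall shape of the distribution are visible at a glance.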
Pareto Chart
• When we want to attract attention to more important data.
• Used for qualitative data, nominal not ordinal – WHY?
• Bars arranged in descending order by frequencies.
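The ordering step can be sketched with the nucleotide frequencies from the earlier table, which are nominal categories. Using that table as the example is a choice made here, not part of this slide.

```python
# Nucleotide frequencies from the earlier frequency table (nominal categories).
nucleotide_freq = {"A": 787, "C": 712, "T": 79, "G": 723}

# Pareto order: categories sorted by frequency, largest first.
pareto_order = sorted(
    nucleotide_freq.items(),
    key=lambda item: item[1],
    reverse=True,
)

for category, count in pareto_order:
    print(category, count)
```

Reordering is harmless here because nominal categories have no natural order of their own.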
Pie Chart
• Also for qualitative data
http://assistantvillageidiot.blogspot.com/2007/11/why-would-they-lie-huh.html
“Bad” Graphs
• Graphics can offer clear and meaningful summaries of statistical data. However, even well-made graphics can be misleading if we are not careful in interpreting them, and poorly made graphics are almost always misleading. Moreover, some people use graphics in deliberately misleading ways.
Perceptual Distortion – 3D – WORSE
What is the right thing to do here?
Statistical Reasoning, Bennett et al., 3rd edition
Partial Data
http://www.yale.edu/ynhti/curriculum/units/2008/6/08.06.06.x.html