Stat 2411 Statistical Methods
Chapter 2: Summarizing data
Data are collected to answer specific questions. Analyzing the data takes both careful thinking and statistical methods.
Example: 8 lb test Fishing Line
Question: Which type(s) of line are strongest?
2.1 Listing numerical data
• Trilene XL: 11.5 11.3 11.7 11.6 11.7 11.4 11.5 11.5 11.6 11.4
• Trilene XT: 11.6 11.8 11.7 11.7 11.5 11.6 11.6 11.8 11.4 11.7
• Stren: 11.1 11.1 11.2 11.0 11.1 11.3 11.2 10.9 11.0 11.1
Plotting of the data: Dot diagram
When analyzing data, always plot the data!
A dot diagram:
       XL       XT       Stren
11.8            * *
11.7   * *      * * *
11.6   * *      * * *
11.5   * * *    *
11.4   * *      *
11.3   *                 *
11.2                     * *
11.1                     * * * *
11.0                     * *
10.9                     *
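A dot diagram is just a tally of how often each value occurs in each group. The sketch below is one possible way to build a text version; the function name `dot_diagram` and the dictionary `strengths` are illustrative names, and the values are the fishing line strengths listed in section 2.1, each read to one decimal place.

```python
from collections import Counter

# Breaking strengths from the listing in section 2.1, one list per line type.
strengths = {
    "XL": [11.5, 11.3, 11.7, 11.6, 11.7, 11.4, 11.5, 11.5, 11.6, 11.4],
    "XT": [11.6, 11.8, 11.7, 11.7, 11.5, 11.6, 11.6, 11.8, 11.4, 11.7],
    "Stren": [11.1, 11.1, 11.2, 11.0, 11.1, 11.3, 11.2, 10.9, 11.0, 11.1],
}

def dot_diagram(groups):
    """Return the lines of a text dot diagram, largest value first."""
    # Tally how many times each value occurs within each group.
    counts = {name: Counter(values) for name, values in groups.items()}
    # Every distinct value becomes one row of the diagram.
    levels = sorted({v for values in groups.values() for v in values}, reverse=True)
    # Column width: wide enough for the longest name and the longest run of dots.
    width = max(
        max(len(name) for name in groups),
        max(max(c.values()) for c in counts.values()),
    )
    lines = [" " * 6 + " ".join(name.ljust(width) for name in groups)]
    for level in levels:
        cells = ["*" * counts[name][level] for name in groups]
        lines.append(f"{level:5.1f} " + " ".join(cell.ljust(width) for cell in cells))
    return [line.rstrip() for line in lines]

for line in dot_diagram(strengths):
    print(line)
```

Reading down the columns, the Stren dots sit entirely below the XT dots, which is the kind of pattern the plot is meant to reveal.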
Plotting of the data: Bar chart
• A bar chart – Trilene XL
[Bar chart of the Trilene XL strengths; horizontal axis: 11.3, 11.4, 11.5, 11.6, 11.7]
2.2 Stem and Leaf Diagram
1) Separate each observation into 2 parts
• Stem: everything but the rightmost digit
• Leaf: the final digit
2) Write the stems in a vertical column, then draw a vertical line next to them
3) Write each leaf in a row to the right of its stem
Stem Leaf plot
Systolic bp data
108 134 100 108 112 112 112 122 116 116 120 108 108 96 114 108 128 114 112 124 90 102 106 124 130 116
Plot after the first six observations:
 9 |
10 | 8 0 8
11 | 2 2
12 |
13 | 4
Completed Stem Leaf plot
 9 | 0 6
10 | 0 2 6 8 8 8 8 8
11 | 2 2 2 2 4 4 6 6 6
12 | 0 2 4 4 8
13 | 0 4
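The three construction steps (split each observation, list the stems, write the leaves) can be sketched in a few lines of Python. This is only an illustration for whole-number data: the function name `stem_and_leaf` is made up here, and sorting the leaves within each row is a choice made to match the completed plot.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Build a stem-and-leaf display for non-negative whole numbers."""
    leaves = defaultdict(list)
    for v in sorted(values):
        # Step 1: the stem is everything but the last digit, the leaf is the last digit.
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    lines = []
    # Step 2: stems in a vertical column (empty stems included), then a vertical line.
    for stem in range(min(leaves), max(leaves) + 1):
        # Step 3: each leaf goes in the row to the right of its stem.
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>3} | {row}".rstrip())
    return lines

# Systolic bp data from the slide.
systolic = [108, 134, 100, 108, 112, 112, 112, 122, 116, 116, 120, 108, 108,
            96, 114, 108, 128, 114, 112, 124, 90, 102, 106, 124, 130, 116]

for line in stem_and_leaf(systolic):
    print(line)
```

For data with one decimal place, such as the cardiac output exercise, the same idea would apply after multiplying each value by 10, so that the tenths digit becomes the leaf.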
Stem and Leaf Diagram Exercise
Cardiac output in middle aged runners. (Journal of Sports Medicine)
20.9 17.9 19.9 16.0 12.8 23.2 21.2 21.0 20.9 15.0 22.2 22.2 18.3 19.8 21.0 15.8 23.6 20.6
Tip: the stems are the ones, the leaves are the tenths.
12 . 8
13 .
14 .
15 . 0 8
16 . 0
17 . 9
18 . 3
19 . 8 9
20 . 6 9 9
21 . 0 0 2 6 9
22 . 2 2
23 . 6
2.3 Frequency Distributions
With larger data sets it helps to count the number of values in different summary classes, usually 5-15 classes.
E.g. Suspended solids in agricultural watersheds. (Water Resources Bulletin)
Suspended Solids (ppm)   Frequency
30-39                    8
40-49                    7
50-59                    5
60-69                    11
70-79                    6
80-89                    1
90-99                    2
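Counting values into classes is easy to automate. The raw suspended solids measurements are not listed here, so the sketch below uses the systolic bp data from section 2.2 instead; the function name `frequency_distribution` and the choice of five classes of width 10 starting at 90 are assumptions made only for this illustration.

```python
def frequency_distribution(values, lower, width, n_classes):
    """Count how many values fall in each of n_classes equal-width classes.

    Class i covers lower + i*width up to (but not including) lower + (i+1)*width.
    A value outside every class raises ValueError instead of being dropped silently.
    """
    counts = [0] * n_classes
    for v in values:
        index = int((v - lower) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} falls outside the classes")
        counts[index] += 1
    return counts

# Systolic bp data from section 2.2.
systolic = [108, 134, 100, 108, 112, 112, 112, 122, 116, 116, 120, 108, 108,
            96, 114, 108, 128, 114, 112, 124, 90, 102, 106, 124, 130, 116]

counts = frequency_distribution(systolic, lower=90, width=10, n_classes=5)
for i, count in enumerate(counts):
    low = 90 + 10 * i
    print(f"{low}-{low + 9}: {count}")
```

With these classes the counts equal the number of leaves on each stem of the stem-and-leaf plot, since each stem there spans ten units as well.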
Frequency Distributions
Look at book for:
– Class limits
– Upper class limits
– Lower class limits
– Class marks
– Class intervals
2.4 Graphical Representations
• A histogram represents a frequency distribution with bars.
[Histogram of the suspended solids data. Classes (ppm): 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90-99; bar heights: 8, 7, 5, 11, 6, 1, 2]
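A rough text version of the histogram can be produced directly from the frequency table. The helper name `text_histogram` and the `#` bar symbol are arbitrary choices for this sketch; the bars run sideways rather than upward, but the bar lengths carry the same information.

```python
# Frequencies from the suspended solids table (classes in ppm).
table = [
    ("30-39", 8), ("40-49", 7), ("50-59", 5), ("60-69", 11),
    ("70-79", 6), ("80-89", 1), ("90-99", 2),
]

def text_histogram(rows, symbol="#"):
    """Return one line per class: the label, a bar of `count` symbols, and the count."""
    return [f"{label:>5} | {symbol * count} ({count})" for label, count in rows]

for line in text_histogram(table):
    print(line)
```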
Pie Chart (360 x %)
Tree    #    %       Degrees
Oak     50   62.5%   225
Maple   20   25%     90
Ash     10   12.5%   45
Total   80           360
[Pie chart with slices labeled oak, maple, ash]
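The slice angles follow from the rule 360 x %, where % is each category's share of the total. A minimal sketch of that arithmetic, using the tree counts from the table (the function name `pie_angles` is illustrative):

```python
def pie_angles(counts):
    """Map each category to (percent of total, slice angle in degrees)."""
    total = sum(counts.values())
    return {
        name: (100 * n / total, 360 * n / total)
        for name, n in counts.items()
    }

# Tree counts from the table.
trees = {"Oak": 50, "Maple": 20, "Ash": 10}

for name, (percent, degrees) in pie_angles(trees).items():
    print(f"{name}: {percent}% -> {degrees} degrees")
```

The angles should always add up to 360, which makes a quick check on the table.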
2.5 Two Variable Data: Scattergram
Cma     Chromosome Abnormal %
0.11    2
0.19    5
0.51    13
0.53    15
1.08    25
1.62    28
1.73    36
2.36    45
2.72    56
3.12    59
3.88    63
4.18    60
[Scattergram of % Abnormal (vertical axis, 0 to 80) against Cma (horizontal axis, 0.0 to 5.0)]
Two groups can be compared with back to back stem and leaf diagrams.
E.g. Stopping distances of bikes
Treaded tire |    | Smooth tire
             | 34 | 1 8 9
             | 35 | 5
           5 | 36 |
         6 4 | 37 | 5
             | 38 |
           1 | 39 | 1
         2 0 | 40 |
Or dot diagrams
Treaded  |    |    |*   |**  |    |*   |**
         340  350  360  370  380  390  400
Smooth   |*** |*   |    |*   |    |*   |
When there are associations between sets of data values, plot the data accordingly.
E.g., Snowfall for Duluth and White Bear Lake 1972-2000
A not very good way to plot the data
   WB Lake          Duluth
              130   *
              120   *
              110   **
         **   100   ***
          *    90   *****
               80   ******
     ******    70   **
        ***    60   **
 **********    50   ****
        ***    40   ***
        ***    30   *
        ***    20
Snowfall plot
[Plot of snow total (vertical axis, 0 to 140) against year (horizontal axis, ticks from 1972 to 1997), with separate series for Duluth and White Bear]
A study of trace metals in South Indian River
[Diagram of the six sampling locations, numbered 1 to 6]
T = top water zinc concentration (mg/L)
B = bottom water zinc (mg/L)
Location   1       2       3       4       5       6
Top        0.415   0.238   0.390   0.410   0.605   0.609
Bottom     0.430   0.266   0.567   0.531   0.707   0.716
• One of the first things to do when analyzing data is to PLOT the data
• This is not a useful way to plot the data. There is no clear distinction between bottom water and top water zinc, even though Bottom > Top at all 6 locations.
[Plot of zinc (vertical axis, 0 to 0.8) with the Top and Bottom values shown as two separate groups]
A better way
[Plot of zinc (vertical axis, 0.2 to 0.7) with the Top and Bottom values shown as two groups]
Connect points in the same pair.
A better way
[Scatter plot of the six (Top, Bottom) pairs, both axes running from 0 to 0.8, with the reference line Bottom=Top]
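The point of pairing can also be checked with a little arithmetic: take the difference Bottom - Top at each location. The sketch below uses the six zinc readings from the table; the variable names are illustrative, and rounding to three decimals is only there to keep floating-point noise out of the printout.

```python
# Zinc concentrations (mg/L) at locations 1 to 6, from the table.
top = [0.415, 0.238, 0.390, 0.410, 0.605, 0.609]
bottom = [0.430, 0.266, 0.567, 0.531, 0.707, 0.716]

# Paired view: one difference per location.
differences = [round(b - t, 3) for t, b in zip(top, bottom)]
print("Bottom - Top by location:", differences)
print("Bottom above Top everywhere:", all(d > 0 for d in differences))

# Unpaired view: the two groups overlap, which is why the first plot looks unclear.
print("Top range:   ", min(top), "to", max(top))
print("Bottom range:", min(bottom), "to", max(bottom))
print("Ranges overlap:", min(bottom) < max(top))
```

Every difference is positive, so all six points lie on the Bottom side of the line Bottom=Top, even though the Top and Bottom ranges overlap when the pairing is ignored.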