10/18/2016
ORGANIZING AND GRAPHING
QUANTITATIVE DATA
October 19, 2016
Organizing & Graphing Quantitative Data
• Ordered array
• Frequency Distributions
– Constructing Frequency Distribution Tables
– Relative and Percentage Distributions
• Graphing Grouped Data
– Stem and leaf plots
– Histograms
– Polygons
Organizing & Grouping Data
• Before computers, data were grouped to make it easier to calculate descriptive measures such as percentages and averages.
• Today, the main purpose of grouping data is summarization.
• Summarization makes the information in the data easier to understand.
Ordered array
• A first step in organizing data
• An ordered array is a listing of the values of a collection (either a population or a sample) in order of magnitude, from the smallest value to the largest value.
• If the number of measurements to be ordered is of any appreciable size, the use of a computer is highly desirable.
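As a minimal sketch of this step in Python (using the 30 test scores from Example 2-8 as sample input), the built-in sorted() function produces an ordered array directly:

```python
# Scores of 30 students from Example 2-8, in the order they were recorded.
scores = [75, 52, 80, 96, 65, 79, 71, 87, 93, 95,
          69, 72, 81, 61, 76, 86, 79, 68, 50, 92,
          83, 84, 77, 64, 71, 87, 72, 92, 57, 98]

# An ordered array lists the values from the smallest to the largest.
ordered_array = sorted(scores)
print(ordered_array)
```

sorted() returns a new list and leaves the original recording order in scores untouched.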
STEM-AND-LEAF DISPLAYS
Definition
In a stem-and-leaf display of quantitative data, each value is divided into two portions – a stem and a leaf. The leaves for each stem are shown separately in a display.
Example 2-8
The following are the scores of 30 college students on a statistics test:
75 52 80 96 65 79 71 87 93 95
69 72 81 61 76 86 79 68 50 92
83 84 77 64 71 87 72 92 57 98
Construct a stem-and-leaf display.
Solution 2-8
To construct a stem-and-leaf display for these scores, we split each score into two parts. The first part contains the first digit, which is called the stem. The second part contains the second digit, which is called the leaf.
We observe from the data that the stems for all scores are 5, 6, 7, 8, and 9, because all the scores lie in the range 50 to 98.
Figure 2.13 Stem-and-leaf display.
Stems
5 | 2    (leaf for 52)
6 |
7 | 5    (leaf for 75)
8 |
9 |
After we have listed the stems, we read the leaves for all scores and record them next to the corresponding stems on the right side of the vertical line.
Figure 2.14 Stem-and-leaf display of test scores.
5 | 2 0 7
6 | 5 9 1 8 4
7 | 5 9 1 2 6 9 7 1 2
8 | 0 7 1 6 3 4 7
9 | 6 3 5 2 2 8
Figure 2.15 Ranked stem-and-leaf display of test scores.
5 | 0 2 7
6 | 1 4 5 8 9
7 | 1 1 2 2 5 6 7 9 9
8 | 0 1 3 4 6 7 7
9 | 2 2 3 5 6 8
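The construction in Solution 2-8 can be sketched in Python. This is a minimal illustration, not a library routine: the helper names stem_and_leaf and show are hypothetical, and the sketch assumes two-digit whole numbers, so the stem is the tens digit and the leaf is the units digit. Reading the scores row by row, the unranked output matches Figure 2.14, and ranked=True sorts each stem's leaves as in Figure 2.15.

```python
from collections import defaultdict

def stem_and_leaf(values, ranked=False):
    """Split each two-digit value into a stem (tens digit) and a leaf (units digit).

    Returns a dict mapping every stem from the smallest to the largest to its
    leaves. Leaves keep the data order unless ranked=True, which sorts them.
    """
    leaves_by_stem = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)
        leaves_by_stem[stem].append(leaf)
    # Include every stem in the range, even one with no leaves.
    stems = range(min(leaves_by_stem), max(leaves_by_stem) + 1)
    return {
        stem: (sorted(leaves_by_stem[stem]) if ranked else list(leaves_by_stem[stem]))
        for stem in stems
    }

def show(display):
    """Print one line per stem: the stem, a vertical bar, then the leaves."""
    for stem, leaves in display.items():
        print(stem, "|", " ".join(str(leaf) for leaf in leaves))

# Scores from Example 2-8, read row by row.
scores = [75, 52, 80, 96, 65, 79, 71, 87, 93, 95,
          69, 72, 81, 61, 76, 86, 79, 68, 50, 92,
          83, 84, 77, 64, 71, 87, 72, 92, 57, 98]

show(stem_and_leaf(scores))               # unranked display
show(stem_and_leaf(scores, ranked=True))  # ranked display
```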
Example 2-9
The following data are monthly rents paid by a sample of 30 households selected from a small city.
880 1081 721 1075 1023 775 1235 750 965 960
1210 985 1231 932 850 825 1000 915 1191 1035
1151 630 1175 952 1100 1140 750 1140 1370 1280
Construct a stem-and-leaf display for these data.
Solution 2-9
Figure 2.16 Stem-and-leaf display of rents.
6 | 30
7 | 75 50 21 50
8 | 80 25 50
9 | 32 52 15 60 85 65
10 | 23 81 35 75 00
11 | 91 51 40 75 40 00
12 | 10 31 35 80
13 | 70
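For four-digit values such as these rents, the leaf can be taken as the last two digits and the stem as the remaining digits. A minimal Python sketch under that assumption follows; it prints a ranked display (the leaves of each stem sorted), so the leaf order differs from the unranked Figure 2.16, while the leaves on each stem are the same.

```python
from collections import defaultdict

# Monthly rents from Example 2-9.
rents = [880, 1081, 721, 1075, 1023, 775, 1235, 750, 965, 960,
         1210, 985, 1231, 932, 850, 825, 1000, 915, 1191, 1035,
         1151, 630, 1175, 952, 1100, 1140, 750, 1140, 1370, 1280]

# Stem = all digits except the last two; leaf = the last two digits.
display = defaultdict(list)
for rent in rents:
    stem, leaf = divmod(rent, 100)
    display[stem].append(leaf)

for stem in range(min(display), max(display) + 1):
    # Sort the leaves and pad them to two digits, so 1000 appears as "10 | 00".
    leaves = sorted(display[stem])
    print(f"{stem:>2} | " + " ".join(f"{leaf:02d}" for leaf in leaves))
```

Padding matters here: without it, the leaf of 1000 would print as 0 and could be misread as a one-digit leaf.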
Information from a stem-and-leaf display
• Provides information about the range of the data set.
• Shows where the measurements are most heavily concentrated.
• Reveals the presence or absence of symmetry.
Example 2-10
The following stem-and-leaf display is prepared for the number of hours that 25 students spent working on computers during the last month.
0 | 6
1 | 1 7 9
2 | 2 6
3 | 2 4 7 8
4 | 1 5 6 9 9
5 | 3 6 8
6 | 2 4 4 5 7
7 |
8 | 5 6
Prepare a new stem-and-leaf display by grouping the stems.
Solution 2-10
Figure 2.17 Grouped stem-and-leaf display.
0–2 | 6 * 1 7 9 * 2 6
3–5 | 2 4 7 8 * 1 5 6 9 9 * 3 6 8
6–8 | 2 4 4 5 7 * * 5 6
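Grouping stems can be sketched in Python as follows. The helper name group_stems is hypothetical; it assumes the display is stored as a dictionary from stem to leaves and, as in Figure 2.17, uses * to separate the leaves of successive stems within a group. An empty stem still contributes its *, which is why the 6–8 line shows * * around the missing stem 7.

```python
def group_stems(display, group_size=3):
    """Merge consecutive stems into one line; '*' marks where a new stem's leaves begin."""
    stems = list(range(min(display), max(display) + 1))
    rows = []
    for start in range(0, len(stems), group_size):
        block = stems[start:start + group_size]
        tokens = []
        for position, stem in enumerate(block):
            if position > 0:
                tokens.append("*")  # separator between successive stems
            tokens.extend(str(leaf) for leaf in display.get(stem, []))
        rows.append(f"{block[0]}-{block[-1]} | " + " ".join(tokens))
    return rows

# Display from Example 2-10 (hours spent on computers); stem 7 has no leaves.
hours = {0: [6], 1: [1, 7, 9], 2: [2, 6], 3: [2, 4, 7, 8], 4: [1, 5, 6, 9, 9],
         5: [3, 6, 8], 6: [2, 4, 4, 5, 7], 7: [], 8: [5, 6]}

for row in group_stems(hours):
    print(row)
```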
Stem-and-leaf displays
• Most effective with relatively small data sets.
• As a rule they are not suitable for use in annual reports or other communications aimed at the general public.
• They are primarily of value in helping researchers and decision makers understand the nature of their data.
Frequency Distributions
• A frequency distribution for quantitative data lists all the classes and the number of values that belong to each class.
• Data presented in the form of a frequency distribution are called grouped data.
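Table 2.7 Weekly Earnings of 100 Employees of a Company
Weekly Earnings (dollars)    Number of Employees (f)
401 to 600                    9
601 to 800                   22
801 to 1000                  39
1001 to 1200                 15
1201 to 1400                  9
1401 to 1600                  6

In Table 2.7:
• Variable: weekly earnings (dollars)
• Third class: 801 to 1000; frequency of the third class: 39
• Lower limit of the sixth class: 1401; upper limit of the sixth class: 1600
• Frequency column: f
• Class width: 200 (the difference between two consecutive lower limits, for example 601 − 401)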
Essential Question: How do we construct a frequency distribution table?
Process of Constructing a Frequency Table
STEP 1. Determine the tentative number of classes (k).
k = 1 + 3.322 log N, where N is the number of observations and log is the base-10 logarithm.
Always round up to the next whole number.
Note: The number of classes should be between 5 and 15. The actual number of classes may be affected by convenience or other subjective factors.
STEP 2: Determine the range (R).
R = Highest Value – Lowest Value
STEP 3. Find the class width (c) by dividing the range by the number of classes. Always round up.
$$\text{class width} = \frac{\text{range}}{\text{number of classes}}, \qquad c = \frac{R}{k}$$
STEP 4. Write the classes or categories starting with the lowest score. Stop when the class already includes the highest score.
Add the class width to the starting point to get the second lower class limit. Add the class width to the second lower class limit to get the third, and so on. List the lower class limits in a vertical column and enter the upper class limits, which can be easily identified at this stage.
STEP 5. Determine the frequency for each class by referring to the tally columns and present the results in a table.
When constructing frequency tables, follow these guidelines:
• The classes must be mutually exclusive; that is, each score must belong to exactly one class.
• Include all classes, even those with a frequency of zero.
• All classes should have the same width, although open-ended intervals such as “65 years or older” are sometimes unavoidable.
• The number of classes should be between 5 and 15.
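The five steps can be sketched in Python as follows. This is a minimal illustration, not a standard routine: the function name frequency_table is hypothetical, it assumes whole-number data (so each upper class limit is the lower limit plus c − 1), and it rounds k and c up as Steps 1 and 3 describe (the formula for k is commonly known as Sturges' rule). Because Step 4 stops as soon as a class contains the highest score, the number of classes produced can differ from the tentative k.

```python
import math

def frequency_table(data, k=None):
    """Build a frequency table for whole-number data following Steps 1-5.

    Returns a list of ((lower_limit, upper_limit), frequency) pairs.
    """
    if k is None:
        # Step 1: tentative number of classes, k = 1 + 3.322 log N, rounded up.
        k = math.ceil(1 + 3.322 * math.log10(len(data)))
    # Step 2: range.
    lowest, highest = min(data), max(data)
    r = highest - lowest
    # Step 3: class width, rounded up (at least 1 so the loop always advances).
    c = max(1, math.ceil(r / k))
    # Step 4: start at the lowest score; stop once a class includes the highest score.
    table = []
    lower = lowest
    while lower <= highest:
        upper = lower + c - 1
        # Step 5: count the values in this class; zero-frequency classes are kept.
        f = sum(1 for x in data if lower <= x <= upper)
        table.append(((lower, upper), f))
        lower += c
    return table

# Ages of the 50 men in the exercise that follows.
ages = [19, 18, 30, 40, 41, 33, 73, 25,
        23, 25, 21, 33, 65, 17, 20, 76,
        47, 69, 20, 31, 18, 24, 35, 24,
        17, 36, 65, 70, 22, 25, 65, 16,
        24, 29, 42, 37, 26, 46, 27, 63,
        21, 27, 23, 25, 71, 37, 75, 25,
        27, 23]

for (lower, upper), f in frequency_table(ages):
    print(f"{lower} - {upper}: {f}")
```

On these ages the sketch uses k = 7, R = 60, and c = 9, giving the classes 16 – 24 through 70 – 78 with frequencies 17, 14, 7, 2, 0, 5, 5.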
Let’s Try!!!
• Time magazine collected information on all 464 people who died from gunfire in the Philippines during one week. Here are the ages of 50 men randomly selected from that population.
• Construct a frequency distribution table.
19 18 30 40 41 33 73 25
23 25 21 33 65 17 20 76
47 69 20 31 18 24 35 24
17 36 65 70 22 25 65 16
24 29 42 37 26 46 27 63
21 27 23 25 71 37 75 25
27 23
Determine the tentative number of classes (k).
k = 1 + 3.322 log N
  = 1 + 3.322 log 50
  = 1 + 3.322 (1.69897)
  = 6.64
Round up to the next integer whenever the decimal part is greater than 0:
k = 7
Determine the range.
R = Highest Value – Lowest Value
R = 76 – 16 = 60
Find the class width (c). Round the quotient up if its decimal part is greater than 0.
$$c = \frac{R}{k} = \frac{60}{7} = 8.57 \approx 9$$
Write the classes, starting with the lowest score.
Classes    Tally Marks              Freq.
70 – 78    /////                      5
61 – 69    /////                      5
52 – 60                               0
43 – 51    //                         2
34 – 42    ///// //                   7
25 – 33    ///// ///// ////          14
16 – 24    ///// ///// ///// //      17
Using the table:
• What is the lower class limit of the highest class?
• What is the upper class limit of the lowest class?
• Find the class mark of the class 43 – 51.
• What is the frequency of the class 16 – 24?
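Concept of true class boundaries
Each true class boundary lies halfway between the upper limit of one class and the lower limit of the next class (for example, 24.5 lies halfway between 24 and 25). The column x gives the class mark, the midpoint of each class.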
Classes    True Class Boundaries    Tally Marks              Freq.    x
70 – 78    69.5 – 78.5              /////                      5     74
61 – 69    60.5 – 69.5              /////                      5     65
52 – 60    51.5 – 60.5                                         0     56
43 – 51    42.5 – 51.5              //                         2     47
34 – 42    33.5 – 42.5              ///// //                   7     38
25 – 33    24.5 – 33.5              ///// ///// ////          14     29
16 – 24    15.5 – 24.5              ///// ///// ///// //      17     20
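A minimal Python sketch of the two added columns, assuming whole-number class limits (so the gap between the upper limit of one class and the lower limit of the next is 1): each true boundary lies half a unit beyond its class limit, and the class mark x is the midpoint of the class. The function name boundaries_and_mark is hypothetical.

```python
def boundaries_and_mark(lower, upper, unit=1):
    """Return ((lower_boundary, upper_boundary), class_mark) for one class.

    unit is the gap between consecutive classes (1 for whole-number limits).
    """
    half = unit / 2
    return (lower - half, upper + half), (lower + upper) / 2

# Class limits from the age table above, lowest class first.
classes = [(16, 24), (25, 33), (34, 42), (43, 51), (52, 60), (61, 69), (70, 78)]

for lower, upper in classes:
    (lb, ub), mark = boundaries_and_mark(lower, upper)
    print(f"{lower} - {upper}: boundaries {lb} - {ub}, class mark {mark}")
```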
Example
Table 2.9 gives the total home runs hit by all players of each of the 30 Major League Baseball teams during the 2002 season.
Construct a frequency distribution table.
Table 2.9 Home Runs Hit by Major League Baseball Teams During the 2002 Season
Team                 Home Runs    Team                 Home Runs
Anaheim              152          Milwaukee            139
Arizona              165          Minnesota            167
Atlanta              164          Montreal             162
Baltimore            165          New York Mets        160
Boston               177          New York Yankees     223
Chicago Cubs         200          Oakland              205
Chicago White Sox    217          Philadelphia         165
Cincinnati           169          Pittsburgh           142
Cleveland            192          St. Louis            175
Colorado             152          San Diego            136
Detroit              124          San Francisco        198
Florida              146          Seattle              152
Houston              167          Tampa Bay            133
Kansas City          140          Texas                230
Los Angeles          155          Toronto              187
Solution 2-3
The largest value is 230 and the smallest is 124. With five classes,
$$\text{Approximate width of each class} = \frac{230 - 124}{5} = 21.2$$
Now we round this approximate width to a convenient number – say, 22.
The lower limit of the first class can be taken as 124 or any number less than 124. Suppose we take 124 as the lower limit of the first class. Then our classes will be
124 – 145, 146 – 167, 168 – 189, 190 – 211, and 212 – 233
Table 2.10 Frequency Distribution for the Data of Table 2.9
Total Home Runs    Tally              f
124 – 145          |||| |             6
146 – 167          |||| |||| |||     13
168 – 189          ||||               4
190 – 211          ||||               4
212 – 233          |||                3
                                 ∑f = 30
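The tally in Table 2.10 can be checked with a short Python sketch. It follows Solution 2-3 (five classes, width 22, first lower limit 124); the variable names are illustrative, and the values are the 30 team totals from Table 2.9 with team names dropped, since only the counts matter here.

```python
# Home runs from Table 2.9 (team order does not affect the counts).
home_runs = [152, 165, 164, 165, 177, 200, 217, 169, 192, 152, 124, 146, 167, 140, 155,
             139, 167, 162, 160, 223, 205, 165, 142, 175, 136, 198, 152, 133, 230, 187]

num_classes = 5
approx_width = (max(home_runs) - min(home_runs)) / num_classes  # 21.2
width = 22          # rounded to a convenient number, as in Solution 2-3
first_lower = 124   # lower limit of the first class

frequencies = []
for i in range(num_classes):
    lower = first_lower + i * width
    upper = lower + width - 1
    # Whole-number data: a value belongs to the class if it lies between the limits.
    f = sum(1 for hr in home_runs if lower <= hr <= upper)
    frequencies.append(((lower, upper), f))
    print(f"{lower} - {upper}: {f}")

print("Sum of f =", sum(f for _, f in frequencies))
```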
Relative Frequency and Percentage Distributions
A relative frequency distribution lists the categories and the proportion with which each occurs.
Calculating Relative Frequency of a Category
$$\text{Relative frequency of a category} = \frac{\text{Frequency of that category}}{\text{Sum of all frequencies}}$$
$$\text{Relative frequency of a class} = \frac{\text{Frequency of that class}}{\text{Sum of all frequencies}} = \frac{f}{\sum f}$$
$$\text{Percentage} = (\text{Relative frequency}) \times 100$$
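The two formulas can be sketched in Python, here applied to the class frequencies of Table 2.10 (variable names are illustrative). The last line shows why a table of rounded relative frequencies may not add up to exactly 1.

```python
# Frequencies of the five classes in Table 2.10.
frequencies = [6, 13, 4, 4, 3]
total = sum(frequencies)  # sum of all frequencies

relative = [f / total for f in frequencies]   # f / (sum of f)
percentages = [100 * rf for rf in relative]   # relative frequency x 100

for f, rf, pct in zip(frequencies, relative, percentages):
    print(f"f = {f:2d}  relative frequency = {rf:.3f}  percentage = {pct:.1f}")

# Each value rounded to three decimals, then summed.
print(round(sum(round(rf, 3) for rf in relative), 3))
```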
Example 2-4
Calculate the relative frequencies and percentages for Table 2.10.
Solution 2-4
Total Home Runs    Class Boundaries            Relative Frequency    Percentage
124 – 145          123.5 to less than 145.5    .200                  20.0
146 – 167          145.5 to less than 167.5    .433                  43.3
168 – 189          167.5 to less than 189.5    .133                  13.3
190 – 211          189.5 to less than 211.5    .133                  13.3
212 – 233          211.5 to less than 233.5    .100                  10.0
                                               Sum = .999            Sum = 99.9%
(The sums are .999 and 99.9% rather than 1 and 100% because each value was rounded.)
Table 2.11 Relative Frequency and Percentage Distributions for Table 2.10

Graphing Grouped Data
Histogram
• A way of presenting a grouped frequency distribution graphically.
• A histogram is a bar graph in which the classes are marked on the horizontal axis and the frequencies, relative frequencies, or percentages are marked on the vertical axis.
• The frequencies, relative frequencies, or percentages are represented by the heights of the bars.
• In a histogram, the bars are drawn adjacent to each other.
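A minimal Python sketch of why histogram bars touch, using the classes and frequencies of Table 2.10 (variable names are illustrative): if each bar runs from the lower to the upper class boundary, the right edge of one bar equals the left edge of the next. For a quick look without a plotting library, the loop prints the bars sideways, one # per unit of frequency; in an actual histogram the classes go on the horizontal axis and the bar heights show the frequencies.

```python
# Classes (limits) and frequencies from Table 2.10.
classes = [(124, 145), (146, 167), (168, 189), (190, 211), (212, 233)]
frequencies = [6, 13, 4, 4, 3]

# Each bar spans its class boundaries, half a unit beyond the whole-number limits,
# so neighbouring bars share an edge.
bars = [(lower - 0.5, upper + 0.5, f) for (lower, upper), f in zip(classes, frequencies)]

for left, right, height in bars:
    # Sideways text rendering: bar length = frequency.
    print(f"{left:>6} to {right:<6} {'#' * height}")
```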
Figure 2.3 Frequency histogram for Table 2.10. (Horizontal axis: total home runs, with classes 124–145, 146–167, 168–189, 190–211, and 212–233; vertical axis: frequency, from 0 to 15.)
Figure 2.3 Frequency histogram for Table 2.10, drawn with the lower class limits 124, 146, 168, 190, and 212 marked on the horizontal axis (total home runs); vertical axis: frequency, from 0 to 15.
Figure 2.4 Relative frequency histogram for Table 2.10. (Horizontal axis: total home runs, with classes 124–145 through 212–233; vertical axis: relative frequency, from 0 to .50.)
Information from a histogram…
• Provides information regarding the range of the data set
• Shows the location of the highest concentration of measurements
• Reveals the presence or absence of symmetry.
Graphing Grouped Data (continued)
Polygon: A graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines is called a polygon. A polygon is a special kind of line graph.
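Following this definition, a minimal Python sketch of the polygon for Table 2.10: the midpoint of the top of each bar is the point (midpoint of the class boundaries, frequency), and the polygon joins consecutive points with straight segments. The class boundaries are those listed in Table 2.11; variable names are illustrative.

```python
# Class boundaries and frequencies for Table 2.10 (boundaries as in Table 2.11).
boundaries = [(123.5, 145.5), (145.5, 167.5), (167.5, 189.5), (189.5, 211.5), (211.5, 233.5)]
frequencies = [6, 13, 4, 4, 3]

# Midpoint of the top of each bar: (class midpoint, frequency).
vertices = [((left + right) / 2, f) for (left, right), f in zip(boundaries, frequencies)]

# Straight line segments joining successive midpoints.
segments = list(zip(vertices, vertices[1:]))

print(vertices)
for start, end in segments:
    print(start, "->", end)
```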
Figure 2.5 Frequency polygon for Table 2.10. (Horizontal axis: classes 124–145 through 212–233; vertical axis: frequency, from 0 to 15.)
Figure 2.6 Frequency distribution curve. (Horizontal axis: x; vertical axis: frequency.)
Some reflections on the similarities between histograms and stem-and-leaf displays
Advantages of the stem-and-leaf display over the histogram
• It preserves the information contained in the individual measurements.
• It can be constructed during the tallying process, so the intermediate step of preparing an ordered array is eliminated.
Example 2-5
The following data give the average travel time from home to work (in minutes) for 50 states. The data are based on a sample survey of 700,000 households conducted by the Census Bureau (USA TODAY, August 6, 2001).
22.4 19.7 21.6 15.4 21.1 18.2 27.0 21.9 22.1 25.4
23.7 21.7 23.2 19.6 24.9 19.8 17.6 16.0 21.4 25.5
26.7 17.7 16.1 23.8 20.1 23.4 22.5 22.3 21.9 17.1
23.5 23.7 24.4 21.9 22.5 21.2 28.7 15.6 24.3 29.2
19.9 22.7 26.7 26.1 31.2 23.6 24.2 22.7 22.6 20.8
Construct a frequency distribution table. Calculate the relative frequencies and percentages for all classes.
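Solution 2-5
The largest value is 31.2 and the smallest is 15.4. With six classes,
$$\text{Approximate width of each class} = \frac{31.2 - 15.4}{6} = 2.63$$
Rounding this up to 3 and taking 15 as the lower boundary of the first class gives the classes in Table 2.12.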
Class Boundaries          f     Relative Frequency    Percentage
15 to less than 18        7     .14                   14
18 to less than 21        7     .14                   14
21 to less than 24       23     .46                   46
24 to less than 27        9     .18                   18
27 to less than 30        3     .06                    6
30 to less than 33        1     .02                    2
                    Σf = 50     Sum = 1.00            Sum = 100%
Table 2.12 Frequency, Relative Frequency, and Percentage Distributions of Average Travel Time to Work
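Because travel times are measured to one decimal place, the classes here are written as "lower to less than upper" boundaries rather than whole-number limits. A minimal Python sketch of Table 2.12 under the choices made in Solution 2-5 (six classes, width 3, first lower boundary 15); variable names are illustrative.

```python
# Average travel times (minutes) from Example 2-5.
travel_times = [
    22.4, 19.7, 21.6, 15.4, 21.1, 18.2, 27.0, 21.9, 22.1, 25.4,
    23.7, 21.7, 23.2, 19.6, 24.9, 19.8, 17.6, 16.0, 21.4, 25.5,
    26.7, 17.7, 16.1, 23.8, 20.1, 23.4, 22.5, 22.3, 21.9, 17.1,
    23.5, 23.7, 24.4, 21.9, 22.5, 21.2, 28.7, 15.6, 24.3, 29.2,
    19.9, 22.7, 26.7, 26.1, 31.2, 23.6, 24.2, 22.7, 22.6, 20.8,
]

num_classes = 6
approx_width = (max(travel_times) - min(travel_times)) / num_classes  # about 2.63
width = 3          # approximate width rounded up
first_lower = 15   # lower boundary of the first class
n = len(travel_times)

rows = []
for i in range(num_classes):
    lower = first_lower + i * width
    upper = lower + width
    # "lower to less than upper": the lower boundary is included, the upper one is not,
    # so a value such as 27.0 falls in the class 27 to less than 30.
    f = sum(1 for t in travel_times if lower <= t < upper)
    rows.append((lower, upper, f))
    print(f"{lower} to less than {upper}: f = {f}, "
          f"relative = {f / n:.2f}, percentage = {100 * f / n:.0f}%")
```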
Work to do!
• Please read Chapter 2 of the course book and solve the exercises at the end of the chapter.