vanminh
A raw score is the score obtained by a
particular student in a particular test
before it is processed or arranged.
Example: Raw Score
78, 74, 65, 74, 74, 67, 63,
67, 80, 58, 74, 50, 65, 74,
86, 78, 63, 65, 80, 89
The raw scores for 20 students
in a test
The Raw Score is a Variable
Data in raw form are usually not easy to use for decision making. Some type of organization is needed:
Table
Graph
Techniques reviewed here: Ordered Array
Stem-and-Leaf Display
Frequency Distributions and Histograms
Bar charts and pie charts
Contingency tables
Organizing and Presenting Data Graphically
Interval data can be organized as:
Ordered Array
Stem-and-Leaf Display
Frequency Distributions and Cumulative Distributions (shown as a Histogram, Polygon, or Ogive)
Tables and Charts for Interval Data
6
A sorted list of data:
Shows range (minimum to maximum)
Provides some signals about variability
within the range
May help identify outliers (unusual observations)
If the data set is large, the ordered array is
less useful
The Ordered Array
7
Data in raw form (as collected):
24, 26, 24, 21, 27, 27, 30, 41, 32, 38
Data in ordered array from smallest to
largest:
21, 24, 24, 26, 27, 27, 30, 32, 38, 41
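
As a minimal sketch of the ordered array, the raw data above can be sorted and its range read off the two ends. The variable names are chosen here for illustration only.

```python
# Raw data as collected (from the slide above).
raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

# The ordered array is simply the sorted list.
ordered = sorted(raw)

# The range runs from the first (minimum) to the last (maximum) element.
data_range = (ordered[0], ordered[-1])

print(ordered)
print("range:", data_range)
```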
(continued)
The Ordered Array
8
A simple way to see distribution details in a
data set
METHOD: Separate the sorted data series
into leading digits (the stem) and
the trailing digits (the leaves)
Stem-and-Leaf Diagram
9
Stem-and-Leaf
The major advantage to organizing the data into stem-and-leaf display is that we get a quick visual picture of the shape of the distribution.
Stem-and-leaf display is a statistical technique to present a set of data. Each numerical value is divided into two parts. The leading digit(s) becomes the stem and the trailing digit the leaf. The stems are located along the vertical axis, and the leaf values are stacked against each other along the horizontal axis.
An advantage of the stem-and-leaf display over a frequency distribution is that the identity of each observation is not lost.
Here, use the 10’s digit for the stem unit:
Data in ordered array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41

Value   Stem | Leaf
21         2 | 1
38         3 | 8
Example
11
Example
Suppose the seven observations in the 90 up to 100 class are: 96, 94, 93, 94, 95, 96, and 97.
The stem value is the leading digit or digits, in this case 9. The leaves are the trailing digits. The stem is placed to the left of a vertical line and the leaf values to the right. The values in the 90 up to 100 class would appear as:
9 | 6 4 3 4 5 6 7
Then, we sort the values within each stem from smallest to largest. Thus, the second row of the stem-and-leaf display would appear as follows:
9 | 3 4 4 5 6 6 7
Completed stem-and-leaf diagram:
Data in ordered array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41

Stem | Leaves
   2 | 1 4 4 6 7 7
   3 | 0 2 8
   4 | 1
Example
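
The construction above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original slides; the function name `stem_and_leaf` is hypothetical, and it assumes non-negative integers with the 10's digit(s) as the stem.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted leaves (value % 10)."""
    display = defaultdict(list)
    for v in sorted(values):          # sorting first keeps leaves in order
        stem, leaf = divmod(v, 10)
        display[stem].append(leaf)
    return dict(display)

data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
display = stem_and_leaf(data)
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Because every leaf is kept, the original observations can be read back from the display, which is the advantage over a frequency distribution noted earlier.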
Using the 100’s digit as the stem:
Round each value to the nearest 10, then use the resulting 10’s digit as the leaf.

Value   Stem | Leaf
613        6 | 1
776        7 | 8
. . .
1224      12 | 2
Using other stem units
Using the 100’s digit as the stem:
Data:
613, 632, 658, 717,
722, 750, 776, 827,
841, 859, 863, 891,
894, 906, 928, 933,
955, 982, 1034,
1047, 1056, 1140,
1169, 1224

The completed stem-and-leaf display:
Stem | Leaves
   6 | 1 3 6
   7 | 2 2 5 8
   8 | 3 4 6 6 9 9
   9 | 1 3 3 6 8
  10 | 3 5 6
  11 | 4 7
  12 | 2
Using other stem units
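
A sketch of the same idea with the 100's digit as the stem follows. The function name is hypothetical, and the rounding convention (halves rounded up, done in integer arithmetic) is an assumption chosen because it reproduces the display shown for this data set.

```python
def stem_and_leaf_hundreds(values):
    """Stem = hundreds digit(s); leaf = tens digit after rounding to the nearest 10."""
    display = {}
    for v in sorted(values):
        tens = (v + 5) // 10              # e.g. 776 -> 78, 955 -> 96 (assumed: halves round up)
        stem, leaf = divmod(tens, 10)     # 78 -> stem 7, leaf 8
        display.setdefault(stem, []).append(leaf)
    return display

data = [613, 632, 658, 717, 722, 750, 776, 827, 841, 859, 863, 891,
        894, 906, 928, 933, 955, 982, 1034, 1047, 1056, 1140, 1169, 1224]
display = stem_and_leaf_hundreds(data)
for stem, leaves in display.items():
    print(f"{stem:>2} |", " ".join(str(leaf) for leaf in leaves))
```

Note that rounding means the exact observations can no longer be recovered from the leaves, unlike the unrounded display.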
15
Stem-and-leaf: Another Example
Listed in Table 4–1 is the number of 30-second radio advertising spots purchased by each of the 45 members of the Greater Buffalo Automobile Dealers Association last year. Organize the data into a stem-and-leaf display. Around what values do the number of advertising spots tend to cluster? What is the fewest number of spots purchased by a dealer? The largest number purchased?
16
What is a Frequency Distribution?
A frequency distribution is a list or a table …
containing class groupings (categories or ranges within which the data fall) ...
and the corresponding frequencies with which data fall within each grouping or category
Tabulating Numerical Data: Frequency Distributions
18
A frequency distribution is a way to
summarize data
The distribution condenses the raw data
into a more useful form...
and allows for a quick visual interpretation
of the data
Why Use Frequency
Distributions?
Frequency distribution table for ungrouped data

Score (X)   Frequency (f)
50          1
58          1
63          2
65          3
67          2
74          5
78          2
80          2
86          1
89          1
Total       20
Frequency Distribution (ungrouped
data)
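
The ungrouped frequency table can be reproduced from the 20 raw scores given at the start of these notes. This is a small sketch using Python's standard `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

# The 20 raw test scores from the "Raw Score" example.
scores = [78, 74, 65, 74, 74, 67, 63, 67, 80, 58,
          74, 50, 65, 74, 86, 78, 63, 65, 80, 89]

# Count how often each distinct score occurs, ordered by score.
freq = dict(sorted(Counter(scores).items()))

for score, f in freq.items():
    print(score, f)
print("Total", sum(freq.values()))
```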
20
A Frequency
Distribution is a
grouping of data into
mutually exclusive
categories showing
the number of
observations in
each class.
Frequency Distribution (grouped
data)
21
EXAMPLE – Creating a Frequency
Distribution Table
Ms. Kathryn Ball of AutoUSA
wants to develop tables, charts,
and graphs to show the typical
selling price on various dealer
lots. The table on the right
reports only the price of the 80
vehicles sold last month at
Whitner Autoplex.
22
Constructing a Frequency Table -
Example
Step 1: Decide on the number of classes.
A useful recipe to determine the number of classes (k) is the “2 to the k rule”: select the smallest k such that $2^k > n$.
There were 80 vehicles sold, so n = 80. If we try k = 6, which means we would use 6 classes, then $2^6 = 64$, somewhat less than 80. Hence, 6 is not enough classes. If we let k = 7, then $2^7 = 128$, which is greater than 80. So the recommended number of classes is 7.
Step 2: Determine the class interval or width.
The formula is: $i \geq (H - L)/k$, where i is the class interval, H is the highest observed value, L is the lowest observed value, and k is the number of classes.
($35,925 - $15,546)/7 = $2,911
Round up to some convenient number, such as a multiple of 10 or 100. Use a class width of $3,000
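
Steps 1 and 2 can be checked with a short sketch. The function names are hypothetical, and rounding the width up to the next multiple of 1,000 is an assumption made here to match the $3,000 chosen in the example.

```python
import math

def classes_by_2k_rule(n):
    """Smallest k with 2**k > n."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def class_width(high, low, k):
    """Minimum class interval (H - L) / k."""
    return (high - low) / k

k = classes_by_2k_rule(80)                          # 80 vehicles sold
width = class_width(35925, 15546, k)                # highest and lowest selling price
convenient_width = math.ceil(width / 1000) * 1000   # assumed rounding unit: 1,000

print(k, round(width, 2), convenient_width)
```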
Collect data
Bills: 42.19, 38.45, 29.23, 89.35, 118.04, 110.46, 0.00, 72.88, 83.05, . . .
(There are 200 data points)

Prepare a frequency distribution
How many classes to use?

Number of observations   Number of classes
Less than 50             5-7
50 - 200                 7-9
200 - 500                9-10
500 - 1,000              10-11
1,000 - 5,000            11-13
5,000 - 50,000           13-17
More than 50,000         17-20

Guideline: Number of classes = 1 + 3.3 log(n), where n is the number of observations.

Class width = [Range] / [# of classes] = [Largest observation - Smallest observation] / [# of classes]
[119.63 - 0] / [8] = 14.95 ≈ 15
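
As a rough check of the guideline, the sketch below evaluates 1 + 3.3 log(n) for the 200 bills and the resulting class width. It assumes the logarithm is base 10 (the usual form of this rule); the function name is illustrative.

```python
import math

def guideline_classes(n):
    """Number of classes suggested by 1 + 3.3 * log10(n)."""
    return 1 + 3.3 * math.log10(n)

n = 200                              # number of bills
k_guide = guideline_classes(n)       # a non-integer guide value, between 8 and 9
width = (119.63 - 0) / 8             # range divided by the 8 classes used in the slides

print(round(k_guide, 2), round(width, 2))
```

The guide value is only a starting point; the slides settle on 8 classes and round the width up to 15.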
24
Step 4: Tally the
vehicle selling prices
into the classes.
Step 5: Count the
number of items in
each class.
Constructing a Frequency Table -
Example
26
Frequency Distribution –
Characteristics
Class midpoint: A point that divides a class into
two equal parts. This is the average of the
upper and lower class limits.
Class frequency: The number of observations
in each class.
Class interval: The class interval is obtained by
subtracting the lower limit of a class from the
lower limit of the next class.
27
Relative Frequencies
Class frequencies can be converted to relative class frequencies to show the fraction of the total number of observations in each class.
A relative frequency captures the relationship between a class total and the total number of observations.
28
28
Relative Frequency Distribution
To convert a frequency distribution to a relative frequency
distribution, each of the class frequencies is divided by the
total number of observations.
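
The conversion can be sketched as follows, using the 20 raw test scores from the start of these notes; the variable names are illustrative.

```python
from collections import Counter

scores = [78, 74, 65, 74, 74, 67, 63, 67, 80, 58,
          74, 50, 65, 74, 86, 78, 63, 65, 80, 89]
n = len(scores)

# One row per distinct score: (score, frequency, relative frequency, percentage).
table = [(x, f, f / n, 100 * f / n) for x, f in sorted(Counter(scores).items())]

for x, f, rel, pct in table:
    print(f"{x:>3} {f:>2} {rel:.2f} {pct:.0f}")
```

The relative frequencies sum to 1 and the percentages to 100, which is a quick consistency check on any such table.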
Score (X)   Frequency (f)   Relative Frequency   Percentage (%)
50          1               0.05                 5
58          1               0.05                 5
63          2               0.10                 10
65          3               0.15                 15
67          2               0.10                 10
74          5               0.25                 25
78          2               0.10                 10
80          2               0.10                 10
86          1               0.05                 5
89          1               0.05                 5
Σf = 20
Example
Relative frequency = $\frac{f}{\sum f}$ = (Number of students at a particular score) / (Total number of students)
Example: At score 65
Relative frequency = $\frac{3}{20} = 0.15$
Percentage = Relative frequency x 100
Example: At score 65
Percentage = 0.15 x 100 = 15%
Relative Frequencies – Definition
31
The three commonly used graphic forms are:
Histograms
Frequency polygons
Cumulative frequency distributions (ogives)
Graphic Presentation of a Frequency
Distribution
32
A histogram for a frequency distribution based on quantitative data
is very similar to the bar chart showing the distribution of
qualitative data. The classes are marked on the horizontal axis
and the class frequencies on the vertical axis. The class
frequencies are represented by the heights of the bars.
Histogram
33
Frequency Polygon
A frequency polygon also shows the shape of a distribution and is similar to a histogram.
It consists of line segments connecting the points formed by the intersections of the class midpoints and the class frequencies.
[Figure: frequency polygon, Daily High Temperature; horizontal axis: Class Midpoints (5 to 55, then "More"); vertical axis: Frequency (0 to 7)]

Class                 Midpoint   Frequency
10 but less than 20   15         3
20 but less than 30   25         6
30 but less than 40   35         5
40 but less than 50   45         4
50 but less than 60   55         2

(In a percentage polygon the vertical axis would be defined to show the percentage of observations per class)
Example: Frequency Polygon
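
The points that a frequency polygon connects are (class midpoint, class frequency) pairs. The sketch below computes them for the 20 daily high temperatures listed in the cumulative frequency example in these notes; the function name and its parameters are hypothetical.

```python
def polygon_points(values, start, width, n_classes):
    """Return (midpoint, frequency) for classes [start, start + width), ..."""
    points = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        count = sum(lower <= v < upper for v in values)   # "lower but less than upper"
        points.append(((lower + upper) / 2, count))
    return points

temps = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
         32, 35, 37, 38, 41, 43, 44, 46, 53, 58]
points = polygon_points(temps, start=10, width=10, n_classes=5)
print(points)
```

Plotting these points and joining them with line segments gives the polygon; adding zero-frequency midpoints at each end closes it against the horizontal axis.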
[Figure: frequency polygon of the test scores; horizontal axis: Score (50 to 90); vertical axis: Frequency (0 to 6)]
Example: Frequency Polygon
[Figure: relative frequency polygon of the test scores; horizontal axis: Score (50 to 90); vertical axis: Relative Frequency (0.00 to 0.30)]
Example: Relative Frequency
Polygon
[Figure: percent graph of the test scores; horizontal axis: Score (50 to 90); vertical axis: Percent (%) (0 to 30)]
Example: Percent Graph
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class                 Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20   3           15           3                      15
20 but less than 30   6           30           9                      45
30 but less than 40   5           25           14                     70
40 but less than 50   4           20           18                     90
50 but less than 60   2           10           20                     100
Total                 20          100
Cumulative Frequency Distribution
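
The cumulative columns are running totals of the class frequencies. A minimal sketch with the standard `itertools.accumulate`, taking the class frequencies from the table above:

```python
from itertools import accumulate

freqs = [3, 6, 5, 4, 2]          # class frequencies, 10-20 up to 50-60
total = sum(freqs)

pct = [100 * f / total for f in freqs]            # percentage per class
cum_freq = list(accumulate(freqs))                # running total of frequencies
cum_pct = [100 * c / total for c in cum_freq]     # running total as a percentage

print(cum_freq)
print(cum_pct)
```

The last cumulative frequency always equals the number of observations, and the last cumulative percentage is 100.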
[Figure: ogive, Daily High Temperature; horizontal axis: Class Boundaries, not midpoints (10 to 60); vertical axis: Cumulative Percentage (0 to 100)]

Class                 Upper class boundary   Cumulative Percentage (below the boundary)
Less than 10          10                     0
10 but less than 20   20                     15
20 but less than 30   30                     45
30 but less than 40   40                     70
40 but less than 50   50                     90
50 but less than 60   60                     100
The Ogive (Cumulative % Polygon)
Score (X)   Frequency (f)   Cumulative Frequency (cf)   Cumulative Relative Frequency (crf)   Cumulative Percent (cp)
50          1               1                           0.05                                  5
58          1               2                           0.10                                  10
63          2               4                           0.20                                  20
65          3               7                           0.35                                  35
67          2               9                           0.45                                  45
74          5               14                          0.70                                  70
78          2               16                          0.80                                  80
80          2               18                          0.90                                  90
86          1               19                          0.95                                  95
89          1               20                          1.00                                  100
Σf = 20
Cumulative Relative Frequency
Distribution
[Figure: cumulative frequency curve of the test scores; horizontal axis: Score (50 to 90); vertical axis: Cumulative Frequency (0 to 25)]
18 students obtain a score of 85 or less
Cumulative Frequency Curve
46
Grouped Data – Cumulative Frequency
Distribution and Cumulative Percent
Class Interval (CI)   Class Limit (CL)   Class Midpoint (m)   f   cf   crf    cp
(score X)             (score X)
50 – 54               49.5 – 54.5        52                   1   1    0.05   5
55 – 59               54.5 – 59.5        57                   1   2    0.10   10
60 – 64               59.5 – 64.5        62                   2   4    0.20   20
65 – 69               64.5 – 69.5        67                   5   9    0.45   45
70 – 74               69.5 – 74.5        72                   5   14   0.70   70
75 – 79               74.5 – 79.5        77                   2   16   0.80   80
80 – 84               79.5 – 84.5        82                   2   18   0.90   90
85 – 89               84.5 – 89.5        87                   2   20   1.00   100

f = Frequency
cf = Cumulative Frequency (less than Upper Class Limit (UCL))
crf = Cumulative Relative Frequency (less than UCL)
cp = Cumulative Percent (less than UCL)
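
The grouped table can be rebuilt from the 20 raw test scores. In this sketch the classes 50–54, ..., 85–89 are taken from the table; widening each class by 0.5 on both sides to get the 49.5–54.5 style limits assumes whole-number scores. Names are illustrative.

```python
from itertools import accumulate

scores = [78, 74, 65, 74, 74, 67, 63, 67, 80, 58,
          74, 50, 65, 74, 86, 78, 63, 65, 80, 89]
n = len(scores)

# Class intervals 50-54, 55-59, ..., 85-89 (both ends inclusive).
intervals = [(lo, lo + 4) for lo in range(50, 90, 5)]
freqs = [sum(lo <= s <= hi for s in scores) for lo, hi in intervals]
cum = list(accumulate(freqs))

rows = []
for (lo, hi), f, cf in zip(intervals, freqs, cum):
    rows.append({
        "limits": (lo - 0.5, hi + 0.5),   # assumes integer scores
        "midpoint": (lo + hi) / 2,
        "f": f,
        "cf": cf,
        "crf": cf / n,
        "cp": 100 * cf / n,
    })

for row in rows:
    print(row)
```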
[Figure: cumulative frequency curve for the grouped scores; horizontal axis: Score, marked at 49.5, 54.5, ..., 89.5; left vertical axis: Cumulative Frequency (0 to 20); right vertical axis: Cumulative Percent (0 to 100%)]
Cumulative Frequency Curve and
Cumulative Percent for Grouped Data
An ogive is a smooth cumulative frequency curve.
The curve starts at the left and rises
smoothly to the right.
A curve that only rises from left to right, never falling, is called monotonic.
Ogive
Categorical data:
Tabulating Data: Summary Table
Graphing Data: Bar Charts, Pie Charts, Pareto Diagram
Tables and Charts for Categorical Data
51
Investment Type   Amount (in thousands $)   Percentage (%)
Stocks            46.5                      42.27
Bonds             32.0                      29.09
CD                15.5                      14.09
Savings           16.0                      14.55
Total             110.0                     100.0
(Variables are
Categorical)
Summarize data by category
The Summary Table
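
The percentage column of a summary table is each category's amount divided by the total. A short sketch using the investment amounts above (variable names are illustrative):

```python
# Amounts in thousands of dollars, from the summary table.
amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}

total = sum(amounts.values())
percent = {name: round(100 * amt / total, 2) for name, amt in amounts.items()}

for name, pct in percent.items():
    print(f"{name:<8} {amounts[name]:>6.1f} {pct:>6.2f}")
print(f"{'Total':<8} {total:>6.1f}")
```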
52
Bar charts and Pie charts are often
used for qualitative
(category/nominal) data
Height of bar or size of pie slice
shows the frequency or percentage
for each category
Bar and Pie Charts
53
Bar Chart: Example
[Figure: horizontal bar chart "Investor's Portfolio"; categories: Stocks, Bonds, CD, Savings; axis: Amount in $1000's (0 to 50)]
Current Investment Portfolio
[Figure: pie chart "Current Investment Portfolio"; slices: Stocks 42%, Bonds 29%, Savings 15%, CD 14%]
Percentages are rounded to the nearest percent.
Pie Charts: Example
57
Pareto Diagram
Used to portray categorical data
A bar chart, where categories are shown in
descending order of frequency
A cumulative polygon is often shown in the
same graph
Used to separate the “vital few” from the “trivial
many”
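
The ordering and the cumulative line of a Pareto diagram can be sketched as below, using the investment amounts from the summary table. This only computes the numbers behind the chart; the variable names are illustrative.

```python
amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}
total = sum(amounts.values())

# Bars: categories in descending order of amount.
ordered = sorted(amounts.items(), key=lambda item: item[1], reverse=True)

# Line: cumulative percentage after each bar.
pareto = []
running = 0.0
for name, amt in ordered:
    running += amt
    pareto.append((name, round(100 * amt / total, 2), round(100 * running / total, 2)))

for row in pareto:
    print(row)
```

Reading down the cumulative column shows how few categories account for most of the total, which is the "vital few" idea.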
[Figure: Pareto diagram "Current Investment Portfolio"; categories in descending order: Stocks, Bonds, Savings, CD; bar graph, left axis: % invested in each category (0% to 45%); line graph, right axis: cumulative % invested (0% to 100%)]
Pareto Diagram: Example
60
Contingency Tables
A scatter diagram requires that both of the
variables be at least interval scale.
What if we wish to study the relationship
between two variables when one or both are
nominal or ordinal scale? In this case we tally
the results in a contingency table.
61
Contingency Tables – Example
A manufacturer of preassembled windows produced 50 windows yesterday. This morning the quality assurance inspector reviewed each window for all quality aspects. Each was classified as acceptable or unacceptable and by the shift on which it was produced. Thus we reported two variables on a single item. The two variables are shift and quality. The results are reported in the following table.
62
Contingency Table for Investment Choices ($1000’s)

Investment Category   Investor A   Investor B   Investor C   Total
Stocks                46.5         55           27.5         129
Bonds                 32.0         44           19.0         95
CD                    15.5         20           13.5         49
Savings               16.0         28           7.0          51
Total                 110.0        147          67.0         324

(Individual values could also be expressed as percentages of the overall total,
percentages of the row totals, or percentages of the column totals)
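
The three kinds of percentages mentioned above can be sketched from the same table. The data structure and names are chosen here for illustration.

```python
# Rows: investment category; columns: Investor A, B, C (in $1000's).
table = {
    "Stocks":  [46.5, 55, 27.5],
    "Bonds":   [32.0, 44, 19.0],
    "CD":      [15.5, 20, 13.5],
    "Savings": [16.0, 28, 7.0],
}

row_totals = {cat: sum(vals) for cat, vals in table.items()}
col_totals = [sum(col) for col in zip(*table.values())]
grand_total = sum(row_totals.values())

pct_of_total = {cat: [100 * v / grand_total for v in vals] for cat, vals in table.items()}
pct_of_row = {cat: [100 * v / row_totals[cat] for v in vals] for cat, vals in table.items()}
pct_of_col = {cat: [100 * v / t for v, t in zip(vals, col_totals)] for cat, vals in table.items()}

print(row_totals, col_totals, grand_total)
print(round(pct_of_col["Stocks"][0], 2))   # share of Investor A's money held in stocks
```

Which base to use depends on the question: column percentages compare investors' portfolios, row percentages show who holds most of each category.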
Contingency Table
63
Example
To conduct an efficient advertisement
campaign, the relationship between
occupation and newspaper readership is
studied. The following table was created.

Newspaper   Blue collar   White collar   Professional
G&M         27            29             33
Post        18            43             51
Star        38            15             24
Sun         37            21             18
Contingency Table: Example
64
Solution
If there is no relationship between
occupation and newspaper read, the bar
charts describing the frequency of
readership of newspapers should look
similar across occupations.
Contingency Table: Example
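[Figure: three bar charts of newspaper readership, one each for blue-collar workers (Blue), white-collar workers (White), and professionals (Prof); horizontal axis: newspaper (1 to 4); vertical axis: frequency]
Blue-collar workers prefer the “Star” and the “Sun”.
White-collar workers and professionals mostly read the “Post” and the “Globe and Mail”.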
Contingency Table: Example
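
One way to compare the readership profiles without drawing the charts is to convert each occupation's column to percentages. This is a sketch under that approach, using the table from the example; the names are illustrative.

```python
occupations = ["Blue collar", "White collar", "Professional"]
readers = {
    "G&M":  [27, 29, 33],
    "Post": [18, 43, 51],
    "Star": [38, 15, 24],
    "Sun":  [37, 21, 18],
}

# Total readers per occupation (column totals).
col_totals = [sum(col) for col in zip(*readers.values())]

# For each occupation, the percentage reading each newspaper.
profile = {
    occ: {paper: 100 * counts[j] / col_totals[j] for paper, counts in readers.items()}
    for j, occ in enumerate(occupations)
}

# Most-read newspaper within each occupation.
top_paper = {occ: max(shares, key=shares.get) for occ, shares in profile.items()}
print(top_paper)
```

If occupation and newspaper were unrelated, the three profiles would look roughly alike; here they differ, which matches the interpretation given with the charts.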
66
We create a contingency table.
This table lists the frequency for each
combination of values of the two
variables.
We can create a bar chart that
represents the frequency of occurrence
of each combination of values.
Graphing the Relationship Between
Two Nominal Variables
67
Data can be classified according to the time it is collected.
Cross-sectional data are all collected at the same time.
Time-series data are collected at successive points in time.
Time-series data is often depicted on a line chart (a plot of the variable over time).
Describing Time-Series Data
68
Example
The total amounts of income tax paid by
individuals in 1987 through 1999 are listed
below.
Draw a graph of this data and describe the
information produced.
Line Chart
69
Line Chart
[Figure: line chart of total income tax paid by individuals; horizontal axis: year (87 to 99); vertical axis: amount (0 to 1,200,000)]
For the first five years, total tax was relatively flat. From 1993 there was a rapid increase in tax revenues.
Line charts can be used to describe nominal data time series.
Line Chart
70
Present data in a way that provides substance, statistics and design
Communicate complex ideas with clarity, precision and efficiency
Give the largest number of ideas in the most efficient manner
Excellence almost always involves several dimensions
Tell the truth about the data
Principles of Graphical Excellence
71
Providing information concerning the monthly bills of new subscribers in the first month after signing on with a telephone company. (Refer to file)
Collect data
Prepare a frequency distribution
Draw a histogram
APPLICATION EXAMPLE
72
42.19 103.15 39.21 89.5 75.71 2.42 8.37 77.21 1.62 109.08 28.77 104.4 35.32 115.78 13.9 6.95
38.45 94.52 48.54 13.36 88.62 1.08 7.18 72.47 91.1 2.45 9.12 2.88 117.69 0.98 9.22 6.48
29.23 26.84 93.31 44.16 99.5 76.69 11.07 0 10.88 21.97 118.75 65.9 106.84 19.45 109.94 11.64
89.35 93.93 104.88 92.97 85 13.62 1.47 5.64 30.62 17.12 0 20.55 8.4 0 10.7 83.26
118.04 90.26 30.61 99.56 0 88.51 26.4 6.48 100.05 19.7 13.95 3.43 90.04 27.21 0 15.42
110.46 72.78 22.57 92.62 8.41 55.99 13.26 6.95 26.97 6.93 14.34 10.44 3.85 89.27 11.27 24.49
0 101.36 63.7 78.89 70.48 12.24 21.13 19.6 15.43 10.05 79.52 21.36 91.56 14.49 72.02 89.13
72.88 104.8 104.84 87.71 92.88 119.63 95.03 8.11 29.25 99.03 2.72 24.42 10.13 92.17 7.74 111.14
83.05 74.01 6.45 93.57 3.2 23.31 29.04 9.01 1.88 29.24 9.63 95.52 5.72 21 5.04 92.64
95.73 56.01 16.47 0 115.5 11.05 5.42 84.77 16.44 15.21 21.34 6.72 33.69 106.59 33.4 53.9
114.67 19.34 15.3 112.94
27.57 13.54 75.49 20.12
64.78 18.89 68.69 53.21
45.81 1.57 35 15.3
56.04 0 9.12 49.24
20.39 5.2 18.49 9.44
31.77 2.8 84.12 2.67
94.67 5.1 13.68 4.69
44.32 3.03 20.84 41.38
3.69 9.16 100.04 45.77
Data of Bill
Prepare a frequency distribution
There are 200 data points (largest observation 119.63, smallest observation 0). With 8 classes:
Class width = [Range] / [# of classes] = [119.63 - 0] / [8] = 14.95 ≈ 15
Preparing Frequency Distribution
Draw a Histogram

Bill (class upper limit)   Frequency
15                         71
30                         37
45                         13
60                         9
75                         10
90                         18
105                        28
120                        14

[Figure: histogram of the bills; horizontal axis: Bills (15 to 120); vertical axis: Frequency (0 to 80)]
Draw Histogram
[Figure: the histogram of the bills, annotated with the three groups below; horizontal axis: Bills; vertical axis: Frequency (0 to 80)]
What information can we extract from this histogram?
About half of all the bills are small: 71 + 37 = 108
A few bills are in the middle range: 13 + 9 + 10 = 32
A relatively large number of bills are large: 18 + 28 + 14 = 60
Extracting Information
76
A skewed histogram is one with a long tail
extending to either the right or the left side:
Positively skewed: the long tail extends to the right
Negatively skewed: the long tail extends to the left
Shapes of Histograms
78
A modal class is the one with the largest number of observations.
A unimodal histogram
The modal class
Modal classes
79
• A special type of symmetric unimodal histogram is bell shaped
• Many statistical techniques require that the population be bell
shaped.
• Drawing the histogram helps verify the shape of the population in
question
Bell shaped histograms
81
Example 2: Comparing students’ performance
Students’ performance in two statistics classes was compared.
The two classes differed in their teaching emphasis.
Class A – mathematical analysis and development of theory.
Class B – applications and computer based analysis.
The final mark for each student in each course was recorded.
Draw histograms and interpret the results.
Interpreting Histograms
82
Marks (Manual) Marks (Computer)
77 59 75 60 65 81 72 59
74 83 71 50 71 53 85 66
75 77 75 52 66 70 72 71
75 74 74 47 79 76 77 68
67 78 53 46 65 73 64 72
72 67 49 50 82 73 77 75
81 82 56 51 80 85 89 74
76 55 61 44 86 83 87 77
79 73 61 52 67 80 78 69
73 92 54 53 64 67 79 60
75 71 44 56 62 78 59 92
52 53 54 53 74 68 63 69
72 75 78 76 67 67 84 69
72 70 73 82 72 62 74 73
83 59 81 82 68 83 74 65
Data
[Figure: two histograms of final marks, Marks (Manual) and Marks (Computer); horizontal axis: marks (50 to 100); vertical axis: Frequency (0 to 40)]
The mathematical emphasis
creates two groups, and a
larger spread.
Interpreting Histograms
84