Module 3a
Descriptive Statistics: Numerical Methods
Measures of Location
Percentiles and Quartiles
Measures of Variability

Learning Goals
Understand the purpose of measures of location.
Be able to compute the mean, median, mode, quartiles, and various percentiles.
Understand the purpose of measures of variability.
Be able to compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
Measures of Location
The table on the right contains an excerpt from a data set of salaries for 474 employees at a Midwestern bank.
We want to use measures of location to describe this data set.
Sheet1
Observation  Salary
1  6300
2  6360
3  6480
4  6480
5  6480
6  6540
7  6600
8  6660
9  6720
10  6780
11  6780
12  6780
13  6840
14  6840
15  6900
16  6960
17  6960
18  7080
19  7260
20  7260
21  7380
22  7500
23  7680
24  7680
25  7860
26  7860
27  7860
28  7860
29  7860
30  7860
31  7920
32  7980
33  7980
34  8040
35  8040
36  8160
37  8160
38  8160
39  8220
40  8280
41  8280
42  8340
43  8340
44  8340
45  8340
46  8340
47  8400
48  8400
49  8460
50  8460
51  8520
52  8520
53  8520
54  8520
55  8580
56  8580
57  8580
58  8640
59  8640
60  8640
61  8700
62  8700
63  8700
64  8760
65  8760
66  8760
67  8760
68  8760
69  8820
70  8820
71  8820
72  8820
73  8880
74  8880
75  8880
76  8940
77  8940
78  8940
79  8940
80  8940
81  8940
82  9000
83  9000
84  9000
85  9000
86  9000
87  9000
88  9000
89  9060
90  9060
91  9120
92  9120
93  9180
94  9180
95  9180
96  9180
97  9240
98  9240
99  9240
100  9240
101  9300
102  9300
103  9360
104  9360
105  9360
106  9360
107  9420
108  9420
109  9480
110  9480
111  9540
112  9540
113  9600
114  9600
115  9600
116  9600
117  9600
118  9600
119  9600
120  9600
121  9660
122  9660
123  9660
124  9660
125  9720
126  9720
127  9720
128  9720
129  9780
130  9780
131  9780
132  9780
133  9780
134  9780
135  9780
136  9780
137  9840
138  9840
139  9900
140  9900
141  9900
142  9900
143  9960
144  10020
145  10020
146  10020
147  10020
148  10080
149  10080
150  10080
151  10080
152  10080
153  10140
154  10140
155  10200
156  10200
157  10200
158  10200
159  10260
160  10260
161  10320
162  10320
163  10380
164  10380
165  10380
166  10380
167  10440
168  10440
169  10500
170  10500
171  10500
172  10500
173  10500
174  10500
175  10500
176  10560
177  10560
178  10560
179  10560
180  10620
181  10620
182  10620
183  10620
184  10620
185  10680
186  10680
187  10680
188  10680
189  10680
190  10680
191  10680
192  10740
193  10740
194  10800
195  10800
196  10800
197  10860
198  10860
199  10920
200  10920
201  10920
202  10920
203  10920
204  10980
205  10980
206  10980
207  10980
208  10980
209  11040
210  11040
211  11100
212  11100
213  11100
214  11100
215  11100
216  11100
217  11100
218  11160
219  11160
220  11160
221  11160
222  11220
223  11220
224  11220
225  11280
226  11340
227  11340
228  11340
229  11400
230  11400
231  11400
232  11400
233  11400
234  11400
235  11460
236  11520
237  11520
238  11580
239  11640
240  11640
241  11640
242  11640
243  11640
244  11664
245  11700
246  11700
247  11736
248  11760
249  11760
250  11760
251  11760
252  11760
253  11820
254  11880
255  11940
256  11940
257  11940
258  11940
259  12000
260  12000
261  12000
262  12000
263  12000
264  12060
265  12060
266  12108
267  12120
268  12120
269  12120
270  12120
271  12180
272  12240
273  12240
274  12240
275  12300
276  12300
277  12300
278  12300
279  12300
280  12300
281  12300
282  12300
283  12300
284  12300
285  12300
286  12300
287  12300
288  12360
289  12360
290  12360
291  12420
292  12480
293  12480
294  12480
295  12540
296  12540
297  12540
298  12600
299  12600
300  12600
301  12660
302  12660
303  12660
304  12660
305  12780
306  12780
307  12780
308  12780
309  12840
310  12960
311  13020
312  13020
313  13020
314  13140
315  13200
316  13260
317  13320
318  13320
319  13320
320  13380
321  13416
322  13500
323  13560
324  13560
325  13560
326  13560
327  13560
328  13560
329  13764
330  13800
331  13800
332  13800
333  13800
334  13800
335  13848
336  13920
337  13920
338  13980
339  14040
340  14040
341  14100
342  14100
343  14100
344  14100
345  14220
346  14220
347  14280
348  14280
349  14280
350  14400
351  14400
352  14400
353  14400
354  14460
355  14640
356  14820
357  15000
358  15060
359  15120
360  15120
361  15120
362  15360
363  15420
364  15480
365  15540
366  15540
367  15660
368  15720
369  15840
370  15960
371  16020
372  16080
373  16080
374  16080
375  16080
376  16140
377  16140
378  16320
379  16320
380  16440
381  16620
382  16800
383  16920
384  16920
385  17200
386  17364
387  17400
388  17460
389  17580
390  17950
391  18000
392  18060
393  18100
394  18250
395  18400
396  18400
397  18750
398  18900
399  19020
400  19200
401  19500
402  19600
403  20000
404  20220
405  20400
406  20500
407  20580
408  20850
409  21060
410  21250
411  21600
412  21750
413  21950
414  21960
415  22000
416  22000
417  22000
418  22200
419  22300
420  22600
421  22620
422  22700
423  22800
424  23250
425  23500
426  23750
427  23760
428  24000
429  24000
430  24150
431  24250
432  24500
433  24750
434  24750
435  25000
436  26000
437  26000
438  26000
439  26400
440  26500
441  26700
442  26750
443  26750
444  27000
445  27250
446  27250
447  27500
448  27500
449  27700
450  28000
451  28000
452  28350
453  29000
454  29400
455  29500
456  30000
457  30000
458  31250
459  31300
460  31400
461  32000
462  32500
463  33000
464  33500
465  34500
466  36250
467  36500
468  36800
469  38800
470  40000
471  41400
472  41500
473  44250
474  54000
Sheet2
Observation  Salary
1  6300
2  6360
3  6480
4  6480
5  6480
6  6540
7  6600
8  6660
9  6720
10  6780
...
464  33500
465  34500
466  36250
467  36500
468  36800
469  38800
470  40000
471  41400
472  41500
473  44250
Measures of Location
The following are measures of location:
Mean
Median
Mode
Percentiles
Quartiles
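Mean
The mean of a data set is the average of all the data values.
If the data are from a sample, the mean is denoted by $\bar{x}$.
If the data are from a population, the mean is denoted by $\mu$ (mu).

Mean
Sample mean: $\bar{x} = \frac{\sum x_i}{n}$
Population mean: $\mu = \frac{\sum x_i}{N}$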
Median
The median is the measure of location most often reported for annual income and property value data.
A few extremely large incomes or property values can inflate the mean.

Median
The median of a data set is the value in the middle when the data items are arranged in ascending order.
For an odd number of observations, the median is the middle value.
For an even number of observations, the median is the average of the two middle values.
Median
Example: Salary
Median = 50th percentile
i = (p/100)n = (50/100)(474) = 237
Because n is even (so i is an integer), we average the 237th and 238th data values:
Median = (11,520 + 11,580)/2 = 11,550
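The odd and even cases of the median rule can be sketched in a few lines of Python. This is only an illustration: the function name `median_value` and the two short lists are assumptions made for the example, built from salaries that sit near the middle of the table.

```python
def median_value(values):
    """Return the middle value of the data once sorted in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: the single middle value.
        return ordered[mid]
    # Even number of observations: average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Illustrative lists (not the full data set).
odd_sample = [11400, 11460, 11520, 11580, 11640]
even_sample = [11460, 11520, 11580, 11640]

print(median_value(odd_sample))
print(median_value(even_sample))
```

With the even-sized list, the two middle values are 11,520 and 11,580, the same pair that is averaged in the salary example.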
Mean and Median Compared
Both the mean and the median are meant to measure the central location of the data. For this data set, notice that the mean is about $2,218 more than the median (13,767.80 - 11,550 = 2,217.80).
Why is there such a large discrepancy?
Looking at the frequency distribution of current salaries helps to explain why this discrepancy exists.
When there are data values in a distribution that are much smaller or larger than the others such that the distribution is skewed, the mean may not be a good measure of central tendency.

The histogram on the left shows the distribution of current salary. Notice two vertical lines that run from top to bottom with numbers attached. The line on the left is the median (11,550) and the line on the right is the mean (13,768).

When the distribution, as in this example, has a long tail that extends to larger values (skewed right), then the mean will be larger than the median. If the distribution has a long tail that extends to smaller values (skewed left), then the mean will be smaller than the median. When the data are symmetric (not skewed), the mean and median will be equal.
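The effect of a long right tail on the mean can be seen with a very small example. The sketch below uses Python's standard `statistics` module; the two five-value lists are hypothetical and chosen only so that they differ in their largest value.

```python
from statistics import mean, median

# Hypothetical values: the second list replaces the largest value
# with one extreme salary, which creates a long right tail.
symmetric = [9000, 10000, 11000, 12000, 13000]
skewed_right = [9000, 10000, 11000, 12000, 54000]

for label, values in (("symmetric", symmetric), ("skewed right", skewed_right)):
    print(label, "mean:", mean(values), "median:", median(values))
```

The median is 11,000 in both lists, while the mean moves from 11,000 to 19,200 when the single extreme value is introduced.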
Mode
The mode of a data set is the value that occurs with greatest frequency.
The greatest frequency can occur at two or more different values.
If the data have exactly two modes, the data are bimodal.
If the data have more than two modes, the data are multimodal.

Mode
Example: Salary
In the salary example, the modal salary was $12,300. This was the current salary of 14 of the 474 employees included in this sample.
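Finding the mode amounts to counting how often each value occurs and keeping the value or values with the highest count. A minimal sketch, assuming a helper named `modes` (a name invented for this example) and two short hypothetical lists:

```python
from collections import Counter


def modes(values):
    """Return (sorted list of modal values, their frequency).

    Assumes `values` is non-empty.
    """
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top), top


# Hypothetical lists: one mode, then two modes (bimodal).
one_mode = [12300, 12300, 12300, 9600, 9600, 11400]
two_modes = [9600, 9600, 12300, 12300, 11400]

print(modes(one_mode))
print(modes(two_modes))
```

Returning every value that reaches the top frequency keeps the bimodal and multimodal cases visible instead of silently reporting only one mode.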
Percentiles
Recall how the median divides the sample into two equal parts: half the observations are less than the median and half are greater than the median.
There are other ways to split the sample on a percentage basis, such as finding the value where 10 percent of the observations are less than that value and 90 percent are greater.
Admission test scores for colleges and universities are frequently reported in terms of percentiles.
Percentiles
The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.
To compute it:
Arrange the data in ascending order.
Compute index i, the position of the pth percentile: i = (p/100)n
If i is not an integer, round up. The pth percentile is the value in the ith position.
If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
Note: There is no universally accepted method to calculate percentiles. The method used in the book is not the same as the one used in SPSS. Further information is available at http://cnx.rice.edu/content/m10805/latest
Percentiles
Example: Salary (Book Method)
10th Percentile
i = (p/100)n = (10/100)(474) = 47.4, which rounds up to 48
The 10th percentile is the 48th data value:
10th Percentile = 8,400
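The book method can be written out as a short Python sketch. The function names `percentile_positions` and `book_percentile` are assumptions made for this illustration, and the code restricts p to whole numbers strictly between 0 and 100 so that the integer test on i can be done without floating-point rounding.

```python
def percentile_positions(p, n):
    """1-based position(s) of the pth percentile under the book method.

    Assumes p is a whole number with 0 < p < 100 and n >= 1.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    whole, remainder = divmod(p * n, 100)  # i = whole + remainder / 100
    if remainder:
        # i is not an integer: round up and use that single position.
        return (whole + 1,)
    # i is an integer: use positions i and i + 1.
    return (whole, whole + 1)


def book_percentile(values, p):
    """Average the value(s) found at the book-method position(s)."""
    ordered = sorted(values)
    picked = [ordered[k - 1] for k in percentile_positions(p, len(ordered))]
    return sum(picked) / len(picked)


print(percentile_positions(10, 474))  # position used for the 10th percentile
print(percentile_positions(50, 474))  # positions averaged for the median
print(book_percentile([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], 25))
```

For n = 474 the positions come out as 48 for the 10th percentile and 237 and 238 for the 50th, which agrees with the salary examples; the ten-value list in the last line is hypothetical.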
Quartiles
Quartiles are specific percentiles:
First Quartile = 25th Percentile
Second Quartile = 50th Percentile = Median
Third Quartile = 75th Percentile
Quartiles
Example: Salaries (Book Method)
Third Quartile
Third quartile = 75th percentile
i = (p/100)n = (75/100)(474) = 355.5, which rounds up to 356
Third quartile = 14,820
Using SPSS
Notice how the value for the 75th percentile calculated using SPSS is different.
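One common alternative to the book method places the pth percentile at position (n + 1)p/100 and interpolates linearly between the two neighbouring data values. The sketch below assumes that SPSS uses a rule of this kind; the slides do not say which rule it uses, and the helper names are invented for the illustration. The neighbouring values are read from the salary table (observations 356 and 357 for the third quartile, 118 and 119 for the first).

```python
def split_position(p, n):
    """Split (n + 1) * p / 100 into a whole position and a fraction.

    Assumes p is a whole number of percent.
    """
    whole, remainder = divmod((n + 1) * p, 100)
    return whole, remainder / 100


def interpolate(lower, upper, fraction):
    """Move `fraction` of the way from `lower` to `upper`."""
    return lower + fraction * (upper - lower)


# Third quartile: position 356.25, between observations 356 and 357.
k3, frac3 = split_position(75, 474)
q3 = interpolate(14820, 15000, frac3)

# First quartile: position 118.75, between observations 118 and 119.
k1, frac1 = split_position(25, 474)
q1 = interpolate(9600, 9600, frac1)

print(k3, frac3, q3)
print(k1, frac1, q1)
```

Under this assumed rule the third quartile is 14,865 rather than 14,820, while the first quartile stays at 9,600 because observations 118 and 119 are equal.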
Measures of Variability
Measures of location do not give us an idea of how observations differ from each other.
Measures of variability quantify the spread or dispersion of observations.
Choosing suppliers is an example of why this is important in business. When choosing between suppliers, we might consider not only the average delivery time for each, but also the variability in delivery time for each.

Measures of Variability
Range
Interquartile Range
Variance
Standard Deviation
Coefficient of Variation
Measures of Variability: the Range
The range of a data set is the difference between the largest and smallest data values.
It is the simplest measure of variability.
It is very sensitive to the smallest and largest data values.
The value of the range does not tell us anything about the variability of the values between the largest and smallest values.

Measures of Variability: the Interquartile Range
The interquartile range of a data set is the difference between the third quartile and the first quartile.
It is the range for the middle 50% of the data.
It overcomes the sensitivity to extreme data values.
Measures of Variability: the Interquartile Range
Example: Salaries (Book Method)
Interquartile Range
3rd Quartile (Q3) = 14,820
1st Quartile (Q1) = 9,600
Interquartile Range = Q3 - Q1 = 14,820 - 9,600 = 5,220

Using SPSS
Interquartile Range = Q3 - Q1 = 14,865 - 9,600 = 5,265
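A short sketch shows why the range is sensitive to extreme values while the interquartile range is not. The function names are assumptions for this example. The three salaries passed to `value_range` are the smallest, second-largest, and largest values in the salary table, which is enough because only the extremes affect the range.

```python
def value_range(values):
    """Difference between the largest and smallest data values."""
    return max(values) - min(values)


def interquartile_range(q1, q3):
    """Difference between the third and first quartiles."""
    return q3 - q1


# Smallest, second-largest, and largest salaries from the table.
extremes = [6300, 44250, 54000]

print(value_range(extremes))             # range with the top salary included
print(value_range(extremes[:-1]))        # range after dropping the top salary
print(interquartile_range(9600, 14820))  # book-method quartiles
```

Dropping the single largest salary shrinks the range from 47,700 to 37,950, whereas the interquartile range depends only on the quartiles and is unaffected by that one observation.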
Measures of Variability: the Variance
The variance is a measure of variability that utilizes all the data.
It is based on the difference between the value of each observation ($x_i$) and the mean ($\bar{x}$ for a sample, $\mu$ for a population).
Measures of Variability: the Variance
The variance is the average of the squared differences between each data value and the mean.
If the data set is a sample, the variance is denoted by $s^2$:
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
If the data set is a population, the variance is denoted by $\sigma^2$:
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
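The sample and population versions of the variance differ only in the divisor (n - 1 for a sample, N for a population). The sketch below writes both out by hand and compares them with `variance` and `pvariance` from Python's standard `statistics` module. The eight-value list is hypothetical and was chosen so that its mean is exactly 5.

```python
from statistics import pvariance, variance


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    big_n = len(values)
    mu = sum(values) / big_n
    return sum((x - mu) ** 2 for x in values) / big_n


# Hypothetical data: mean 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(sample_variance(data), variance(data))
print(population_variance(data), pvariance(data))
```

Here the squared deviations sum to 32, so the population variance is 32/8 = 4 and the sample variance is 32/7, slightly larger because of the smaller divisor.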
Measures of Variability: the Standard Deviation
The standard deviation of a data set is the positive square root of the variance.
It is measured in the same units as the data, so it is easier than the variance to compare with the mean.
If the data set is a sample, the standard deviation is denoted $s$: $s = \sqrt{s^2}$
If the data set is a population, the standard deviation is denoted $\sigma$ (sigma): $\sigma = \sqrt{\sigma^2}$
Measures of Variability: the Coefficient of Variation
The coefficient of variation indicates how large the standard deviation is in relation to the mean. This enables the comparison of the variability of different variables.
If the data set is a sample, the coefficient of variation is computed as follows:
$\left( \frac{s}{\bar{x}} \times 100 \right)\%$
If the data set is a population, the coefficient of variation is computed as follows:
$\left( \frac{\sigma}{\mu} \times 100 \right)\%$
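The standard deviation and the coefficient of variation follow directly from the variance and the mean. The sketch below uses `stdev`, `pstdev`, and `mean` from Python's standard `statistics` module; the helper name `coefficient_of_variation` and the eight-value list are assumptions made for this example.

```python
from statistics import mean, pstdev, stdev


def coefficient_of_variation(std_dev, centre):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * std_dev / centre


# Hypothetical data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Treating the data as a population (divisor N).
cv_population = coefficient_of_variation(pstdev(data), mean(data))

# Treating the data as a sample (divisor n - 1).
cv_sample = coefficient_of_variation(stdev(data), mean(data))

print(pstdev(data), cv_population)
print(stdev(data), cv_sample)
```

Treated as a population, these data have a standard deviation of 2 and a mean of 5, so the coefficient of variation is 40%. Because the result is a percentage, it can be compared across variables measured in different units.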
Example: Salary
Variance
Standard Deviation
Coefficient of Variation