Statistics

Alan D. Smith

Descriptive Statistics - Measures of Central Tendency

• Chapters 3 & 4

TODAY’S GOALS

TO CALCULATE THE ARITHMETIC MEAN, THE WEIGHTED MEAN, THE MEDIAN, THE MODE, AND THE GEOMETRIC MEAN.

TO EXPLAIN THE CHARACTERISTICS, USE, ADVANTAGES, AND DISADVANTAGES OF EACH MEASURE OF CENTRAL TENDENCY.

TO IDENTIFY THE POSITION OF THE ARITHMETIC MEAN, MEDIAN, AND MODE FOR BOTH A SYMMETRICAL AND A SKEWED DISTRIBUTION.

POPULATION MEAN

Definition: For ungrouped data, the population mean is the sum of all the population values divided by the total number of population values. To compute the population mean, use the following formula.

$$\mu = \frac{\sum X}{N}$$

where $\mu$ (mu) is the population mean, $\sum$ (sigma) denotes summation, $X$ is an individual value, and $N$ is the population size.

EXAMPLE

Parameter: A measurable characteristic of a population. For example, the population mean.

A racing team has a fleet of four cars. The following are the miles covered by each car over their lives: 23,000, 17,000, 9,000, and 13,000. Find the average miles covered by each car.

Since this fleet is the population, the mean is (23,000 + 17,000 + 9,000 + 13,000)/4 = 15,500.
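The fleet example can be checked directly from the definition of the population mean. This is a minimal Python sketch; the variable names are illustrative and not part of the original slides.

```python
# Miles covered by each of the four cars (the fleet is the whole population).
miles = [23_000, 17_000, 9_000, 13_000]

# Population mean: sum of all values divided by the population size N.
N = len(miles)
mu = sum(miles) / N

print(mu)  # 15500.0
```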

THE SAMPLE MEAN

• Definition: For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values. To compute the sample mean, use the following formula.

$$\bar{X} = \frac{\sum X}{n}$$

where $\bar{X}$ (X-bar) is the sample mean, $\sum$ (sigma) denotes summation, $X$ is an individual value, and $n$ is the sample size.

WHAT’S THE DIFFERENCE?

Population Mean ($\mu$) versus Sample Mean ($\bar{X}$)?

EXAMPLE

Statistic: A measurable characteristic of a sample. For example, the sample mean.

A sample of five executives received the following bonuses last year (in $1,000): 14, 15, 17, 16, and 15. Find the average bonus for these five executives.

Since these values represent a sample of size 5, the sample mean is (14,000 + 15,000 + 17,000 + 16,000 + 15,000)/5 = $15,400.

PROPERTIES OF THE ARITHMETIC MEAN

• Every set of interval-level and ratio-level data has a mean.

• All the values are included in computing the mean.

• A set of data has a unique mean.

• The mean is affected by unusually large or small data values.

• The arithmetic mean is the only measure of central tendency where the sum of the deviations of each value from the mean is zero.

Sum of Deviations

Consider the set of values: 3, 8, and 4. The mean is 5. So (3 - 5) + (8 - 5) + (4 - 5) = -2 + 3 - 1 = 0.

Symbolically we write:

$$\sum (X - \bar{X}) = 0$$

The mean is also known as the “expected value” or “average.”
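The zero-sum property of deviations from the mean can be confirmed numerically for the values 3, 8, and 4. This is only a sketch with illustrative variable names; with other data, floating-point rounding can leave a tiny nonzero remainder instead of an exact zero.

```python
values = [3, 8, 4]

# Arithmetic mean of the three values.
mean = sum(values) / len(values)

# Deviation of each value from the mean, and their total.
deviations = [x - mean for x in values]
total = sum(deviations)

print(mean, deviations, total)
```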

THE MEDIAN

Definition: The midpoint of the values after they have been ordered from the smallest to the largest, or the largest to the smallest. There are as many values above the median as below it in the data array.

Note: For an odd number of values, the median is the middle value in the ordered array.

Note: For an even number of values, the median is the arithmetic mean of the two middle values.

EXAMPLE

Compute the median:

The road life for a sample of five tires in miles is: 42,000 51,000 40,000 39,000 48,000

Arranging the data in ascending order gives: 39,000 40,000 42,000 48,000 51,000. Thus the median is 42,000 miles.

EXAMPLE

Compute the median:

The following values are years of service for a sample of six store managers: 16 12 8 15 7 23.

Arranging in order gives 7 8 12 15 16 23. Thus the median is (12 + 15)/2 = 13.5 years.
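Both median examples (five tires, six store managers) can be reproduced with Python's standard `statistics.median`, which sorts the data and averages the two middle values when the count is even. The variable names below are illustrative.

```python
from statistics import median

tire_miles = [42_000, 51_000, 40_000, 39_000, 48_000]  # odd count: 5 values
years_of_service = [16, 12, 8, 15, 7, 23]              # even count: 6 values

tire_median = median(tire_miles)           # middle value of the sorted data
service_median = median(years_of_service)  # mean of the two middle values

print(tire_median, service_median)
```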

PROPERTIES OF THE MEDIAN

• There is a unique median for each data set.

• It is not affected by extremely large or small values and is therefore a valuable measure of central tendency when such values occur.

• It can be computed for ratio-level, interval-level, and ordinal-level data.

• It can be computed for an open-ended frequency distribution if the median does not lie in an open-ended class.

THE MODE

• Definition: The mode is the value of the observation that appears most frequently.

• The exam scores for ten students are: 81 93 75 68 87 81 75 81 87. What is the modal exam score?

• Since the score of 81 occurs most often, the modal score is 81.

• The next slide shows the histogram with six classes for the water consumption from our previous class. Observe that the modal class is the blue box with a midpoint of 15.
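The modal score can be found by counting how often each value occurs. The sketch below uses `collections.Counter` on the scores exactly as they are listed on the slide; the variable names are illustrative.

```python
from collections import Counter

# Exam scores as listed on the slide.
scores = [81, 93, 75, 68, 87, 81, 75, 81, 87]

# Count occurrences of each score and take the most frequent one.
counts = Counter(scores)
modal_score, frequency = counts.most_common(1)[0]

print(modal_score, frequency)
```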

WATER CONSUMPTION IN 1,000 GALLONS

[Figure: histogram of water consumption with six classes.]

THE WEIGHTED MEAN

Definition: The weighted mean of a set of numbers $X_1, X_2, \ldots, X_n$, with corresponding weights $w_1, w_2, \ldots, w_n$, is computed from the following formula.

$$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n} \quad \text{or} \quad \bar{X}_w = \frac{\sum (wX)}{\sum w}$$

Why would anyone want to weight observations?

EXAMPLE

During a one hour period on a busy Friday night, fifty soft drinks were sold at the Kruzin Cafe. Compute the weighted mean of the price of the soft drinks. (Price ($), Number sold): (0.50, 5), (0.75, 15), (0.90, 15), and (1.10, 15).

The weighted mean is

[0.50 × 5 + 0.75 × 15 + 0.90 × 15 + 1.10 × 15]/[5 + 15 + 15 + 15] = $43.75/50 = $0.875
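The soft drink calculation follows the weighted mean formula term by term: each price is weighted by the number sold. A minimal Python sketch, with illustrative variable names, is:

```python
# (price in dollars, number sold) pairs from the Kruzin Cafe example.
prices = [0.50, 0.75, 0.90, 1.10]
numbers_sold = [5, 15, 15, 15]

# Numerator: sum of weight * value; denominator: sum of weights.
total_revenue = sum(w * x for x, w in zip(prices, numbers_sold))
total_sold = sum(numbers_sold)

weighted_mean = total_revenue / total_sold
print(round(total_revenue, 2), total_sold, round(weighted_mean, 3))
```

The result is approximately $0.875, noticeably higher than the unweighted average of the four prices ($0.8125), because only five drinks were sold at the lowest price.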

THE GEOMETRIC MEAN

Definition: The geometric mean (GM) of a set of n numbers is defined as the nth root of the product of the n numbers. The formula for the geometric mean is given by:

$$GM = \sqrt[n]{X_1 X_2 X_3 \cdots X_n}$$

One main use of the geometric mean is to average percents.

How do you compute the nth root of a number?

EXAMPLE

The profits earned by ABC Construction on three projects were 6, 3, and 2 percent respectively. Compute the geometric mean profit and the arithmetic mean and compare.

The geometric mean is $GM = \sqrt[3]{(6)(3)(2)} = 3.3019$.

The arithmetic mean profit = (6 + 3 + 2)/3 = 3.6667.

The geometric mean of 3.3019 gives a more conservative profit figure than the arithmetic mean of 3.6667. This is because the GM is not heavily weighted by the profit of 6 percent.
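An nth root can be computed by raising the product to the power 1/n. The sketch below applies this to the ABC Construction profits and compares the result with the arithmetic mean; `math.prod` requires Python 3.8 or later, and the variable names are illustrative.

```python
import math

profits = [6, 3, 2]  # percent profit on the three projects
n = len(profits)

# Geometric mean: nth root of the product of the n values.
gm = math.prod(profits) ** (1 / n)

# Arithmetic mean for comparison.
am = sum(profits) / n

print(f"GM = {gm:.4f}, arithmetic mean = {am:.4f}")
```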

THE GEOMETRIC MEAN (continued)

The other main use of the geometric mean is to determine the average percent increase in sales, production or other business or economic series from one time period to another.

The formula for the geometric mean as applied to this type of problem is:

$$GM = \sqrt[n-1]{\frac{\text{Value at end of period}}{\text{Value at beginning of period}}} - 1$$

Where did this come from?

EXAMPLE

The total enrollment at a large university increased from 18,246 in 1985 to 22,840 in 1995. Compute the geometric mean rate of increase over the period.

Here n = 10, so n - 1 = 9 = (number of periods).

The geometric mean rate of increase is given by

$$GM = \sqrt[9]{\frac{22{,}840}{18{,}246}} - 1 = 0.0253$$

That is, the geometric mean rate of increase is 2.53%.
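One way to see where the rate-of-increase formula comes from: if a value grows at a constant rate r per period, then start × (1 + r)^periods = end, and solving for r gives the root-minus-one form. The sketch below uses the slide's enrollment figures and its count of 9 periods; the variable names are illustrative.

```python
start_value = 18_246   # enrollment at the beginning of the period
end_value = 22_840     # enrollment at the end of the period
periods = 9            # n - 1, as stated on the slide

# Solve start * (1 + r) ** periods == end for r.
rate = (end_value / start_value) ** (1 / periods) - 1

# Compounding the start value at this rate recovers the end value.
recovered_end = start_value * (1 + rate) ** periods

print(f"{rate:.4f}")  # 0.0253
```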

EXAMPLE

• Do you prefer I use the arithmetic mean or the geometric mean to compute class score averages?

THE MEAN OF GROUPED DATA

The mean of a sample of data organized in a frequency distribution is computed by the following formula:

$$\bar{X} = \frac{\sum Xf}{\sum f} = \frac{\sum Xf}{n}$$

where $\bar{X}$ (X-bar) is the sample mean, $X$ is the class midpoint, $f$ is the class frequency, and $n = \sum f$ is the sample size (the sum of the frequencies).

EXAMPLE

A sample of twenty appliance stores in a large metropolitan area revealed the following number of VCRs sold last week. Compute the mean number sold. The formula and computation are shown below.

$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{\sum fX}{n} = \frac{325}{20} = 16.25 \text{ VCRs}$$

EXAMPLE (continued)

The table also gives the necessary computations.

SYMMETRIC DISTRIBUTION

Zero Skewness

Mode = Median = Mean

RIGHT SKEWED DISTRIBUTION

Positively skewed

Mean and median are to the RIGHT of the mode.

[Figure: right-skewed curve with the mode, median, and mean marked.]

LEFT SKEWED DISTRIBUTION

Negatively skewed

Mean and median are to the LEFT of the mode.

[Figure: left-skewed curve with the mode, median, and mean marked.]

USEFUL RELATIONSHIPS

If two averages of a moderately skewed frequency distribution are known, the third can be approximated. The formulas are:

Mode = Mean - 3(Mean - Median)

Mean = [3(Median) - Mode]/2

Median = [2(Mean) + Mode]/3

READING ASSIGNMENT

• Read Chapters 4 and 5 of text.

James S. Hawkes

TODAY’S GOALS

TO COMPARE (COMPUTE) VARIOUS MEASURES OF DISPERSION FOR GROUPED AND UNGROUPED DATA.

TO EXPLAIN THE CHARACTERISTICS, USES, ADVANTAGES, AND DISADVANTAGES OF EACH MEASURE OF DISPERSION.

TO EXPLAIN CHEBYSHEV’S THEOREM AND THE EMPIRICAL (NORMAL) RULE.

TO COMPUTE THE COEFFICIENTS OF VARIATION AND SKEWNESS.

DEMONSTRATE COMPUTING STATISTICS WITH EXCEL.

MEASURES OF DISPERSION - UNGROUPED DATA

Range: For ungrouped data, the range is the difference between the highest and lowest values in a set of data. To compute the range, use the following formula.

RANGE = HIGHEST VALUE - LOWEST VALUE

EXAMPLE: A sample of five recent accounting graduates revealed the following starting salaries (in $1,000): 17 26 18 20 19. The range is thus $26,000 - $17,000 = $9,000.

MEAN DEVIATION

Mean Deviation: The arithmetic mean of the absolute values of the deviations from the arithmetic mean. It is computed by the formula below:

$$MD = \frac{\sum |X - \bar{X}|}{n}$$

where $X$ is an individual value, $\bar{X}$ is the arithmetic mean, and $n$ is the sample size.

EXAMPLE

The weights of a sample of crates ready for shipment to France are (in kg) 103, 97, 101, 106, and 103.

|103 - 102| + |97 - 102| + |101 - 102| + |106 - 102| + |103 - 102| = ?

1. $\bar{X}$ = 510/5 = 102 kg.

2. MD = 12/5 = 2.4 kg.

3. On average, the weights of the crates differ from the mean weight of 102 kg by 2.4 kg.
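The three steps of the crate example translate directly into code: compute the mean, sum the absolute deviations, and divide by the sample size. This is a minimal sketch with illustrative variable names.

```python
weights = [103, 97, 101, 106, 103]  # crate weights in kg
n = len(weights)

# Step 1: arithmetic mean of the sample.
mean = sum(weights) / n

# Step 2: sum of absolute deviations from the mean, then the mean deviation.
abs_deviation_sum = sum(abs(x - mean) for x in weights)
md = abs_deviation_sum / n

print(mean, abs_deviation_sum, md)
```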

POPULATION VARIANCE

Population Variance: The population variance for ungrouped data is the arithmetic mean of the squared deviations from the population mean. It is computed from the formula below:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

where $\sigma^2$ (sigma squared) is the population variance, $X$ is an individual value, $\mu$ is the population mean, and $N$ is the population size.

EXAMPLE

The ages of all the patients in the isolation ward of Yellowstone Hospital are 38, 26, 13, 41, and 22 years. What is the population variance? The computations are given below.

$\mu = (\sum X)/N = 140/5 = 28$.

$\sigma^2 = \sum (X - \mu)^2 / N = 534/5 = 106.8$.

ALTERNATIVE FORMULA FOR THE POPULATION VARIANCE

$$\sigma^2 = \frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2$$

Verify, using above formula, that the population variance is 106.8 for the previous example.

Why would you use this formula?
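The verification can be sketched in Python by computing the variance of the Yellowstone Hospital ages both ways. The alternative form needs only the running totals of X and X squared, which is convenient for hand calculation, although in floating-point arithmetic it can lose precision when the mean is large relative to the spread. Variable names are illustrative.

```python
ages = [38, 26, 13, 41, 22]  # all patients in the ward (a population)
N = len(ages)

# Definitional formula: mean of squared deviations from mu.
mu = sum(ages) / N
var_definition = sum((x - mu) ** 2 for x in ages) / N

# Alternative formula: mean of squares minus square of the mean.
sum_x = sum(ages)
sum_x_squared = sum(x ** 2 for x in ages)
var_alternative = sum_x_squared / N - (sum_x / N) ** 2

print(round(var_definition, 4), round(var_alternative, 4))
```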

THE POPULATION STANDARD DEVIATION

Population Standard Deviation: The population standard deviation ($\sigma$) is the square root of the population variance.

For the previous example, the population standard deviation is $\sigma$ = 10.3344 (square root of 106.8).

Note: If you are given the population standard deviation, just square that number to get the population variance.

SAMPLE VARIANCE

Sample Variance: The formula for the sample variance for ungrouped data is:

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

OR

$$s^2 = \frac{\sum X^2 - \dfrac{(\sum X)^2}{n}}{n - 1}$$

This sample variance is used to estimate the population variance.

EXAMPLE

A sample of five hourly wages for blue-collar jobs is: 17 26 18 20 19. Find the variance.

$\bar{X}$ = 100/5 = 20

$s^2$ = 50/(5 - 1) = 12.5.

SAMPLE STANDARD DEVIATION

Sample Standard Deviation: The sample standard deviation (s) is the square root of the sample variance.

For the previous example, the sample standard deviation is s = 3.5355 (square root of 12.5).

Note: If you are given the sample standard deviation, just square that number to get the sample variance.
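The hourly wage example can be reproduced with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n - 1 divisor (the population versions are `pvariance` and `pstdev`). The sketch also computes the variance from the definition for comparison; variable names are illustrative.

```python
from statistics import mean, stdev, variance

wages = [17, 26, 18, 20, 19]  # sample of five hourly wages
n = len(wages)

x_bar = mean(wages)

# Sample variance from the definition, dividing by n - 1.
s2_manual = sum((x - x_bar) ** 2 for x in wages) / (n - 1)

# Library equivalents.
s2 = variance(wages)
s = stdev(wages)

print(x_bar, s2_manual, s2, round(s, 4))
```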

INTERPRETATION AND USES OF THE STANDARD DEVIATION

Chebyshev’s theorem: For any set of observations (sample or population), the proportion of the values that lie within k standard deviations of the mean is at least $1 - 1/k^2$, where k is any constant greater than 1.

Empirical Rule: For any symmetrical, bell-shaped distribution, approximately 68% of the observations will lie within $1\sigma$ of the mean ($\mu \pm 1\sigma$); approximately 95% within $2\sigma$ of the mean ($\mu \pm 2\sigma$); and approximately 99.7% within $3\sigma$ of the mean ($\mu \pm 3\sigma$).
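Chebyshev's bound is easy to tabulate for a few values of k. The helper below is a hypothetical illustration (its name is not from the slides); it returns the guaranteed minimum proportion, which holds for any distribution and is therefore weaker than the bell-shaped percentages of the Empirical Rule.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


bounds = {k: chebyshev_lower_bound(k) for k in (2, 3)}

for k, bound in bounds.items():
    print(f"k = {k}: at least {bound:.1%} of the values")
```

For k = 2 the bound is 75%, compared with roughly 95% under the Empirical Rule; for k = 3 it is about 88.9%, compared with roughly 99.7%.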

Bell-shaped curve showing the relationship between $\sigma$ and $\mu$.

Between:

1. $\mu \pm 1\sigma$: 68.26%

2. $\mu \pm 2\sigma$: 95.44%

3. $\mu \pm 3\sigma$: 99.74%

INTERQUARTILE RANGE

Interquartile range: Distance between the third quartile $Q_3$ and the first quartile $Q_1$.

First Quartile: It is the value corresponding to the point below which 25% of the observations lie in an ordered data set.

Third Quartile: It is the value corresponding to the point below which 75% of the observations lie in an ordered data set.

$$\text{Interquartile range} = \text{Third quartile} - \text{First quartile} = Q_3 - Q_1$$

PERCENTILE RANGE

Percentiles: Each data set has 99 percentiles, thus dividing the set into 100 equal parts. Note: in order to determine percentiles, you must first order the set.

Percentile Range: The 10-to-90 percentile range is the distance between the 10th and 90th percentiles.

[Figure: number line from Min to Max with P10 and P90 marked; 10% of the observations lie below P10, 80% lie between P10 and P90 (the 10-to-90 percentile range), and 10% lie above P90.]

RELATIVE DISPERSION

Coefficient of Variation: The ratio of the standard deviation to the arithmetic mean, expressed as a percentage.

$$CV = \frac{s}{\bar{X}} (100\%)$$

For example, suppose the CVs for the yields of two different stocks are 10 and 25. The stock with the larger CV has more variation relative to its mean yield. That is, the yield for this stock is not as stable as the other.
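As an illustration not worked on the slides, the coefficient of variation can be computed for the hourly wage sample used earlier (17, 26, 18, 20, 19), whose mean is 20 and sample standard deviation is 3.5355. The variable names are illustrative.

```python
from statistics import mean, stdev

wages = [17, 26, 18, 20, 19]  # hourly wage sample from the earlier example

# CV: sample standard deviation relative to the mean, as a percentage.
cv = stdev(wages) / mean(wages) * 100

print(f"CV = {cv:.2f}%")
```

Because the CV is unit-free, it can be compared across data sets measured in different units or with very different means.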

SKEWNESS

Skewness: Measurement of the lack of symmetry of the distribution.

The coefficient of skewness is computed from the following formula:

Sk = 3(Mean - Median)/(Standard deviation)

Note: There are other coefficients of skewness.
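As an illustration not worked on the slides, the coefficient above can be applied to the hourly wage sample used earlier (17, 26, 18, 20, 19). The single high wage of 26 pulls the mean above the median, so the coefficient should come out positive. Variable names are illustrative, and the sample standard deviation is assumed for the denominator.

```python
from statistics import mean, median, stdev

wages = [17, 26, 18, 20, 19]  # hourly wage sample from the earlier example

x_bar = mean(wages)
med = median(wages)
s = stdev(wages)  # sample standard deviation (assumed choice of denominator)

# Coefficient of skewness: 3 * (mean - median) / standard deviation.
sk = 3 * (x_bar - med) / s

print(x_bar, med, round(sk, 4))
```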

SYMMETRIC DISTRIBUTION

Zero Skewness

Mode = Median = Mean

RIGHT SKEWED DISTRIBUTION

Positively skewed

Mean and median are to the RIGHT of the mode.

[Figure: right-skewed curve with the mode, median, and mean marked.]

LEFT SKEWED DISTRIBUTION

Negatively skewed

Mean and median are to the LEFT of the mode.

[Figure: left-skewed curve with the mode, median, and mean marked.]

EXCEL FUNCTIONS

=AVERAGE(A1:A10) Arithmetic Mean

=MEDIAN(A1:A10) Median Value

=MODE(A1:A10) Modal Value

=GEOMEAN(A1:A10) Geometric Mean

=QUARTILE(A1:A10,Q) Quartile Q Value

=MAX(A1:A10)-MIN(A1:A10) Range

=PERCENTILE(A1:A10,P) Percentile P Value

MORE EXCEL FUNCTIONS

=AVEDEV(A1:A10) Mean Absolute Deviation (MAD)

=VAR(A1:A10) Sample Variance

=STDEV(A1:A10) Sample Standard Deviation

=VARP(A1:A10) Population Variance

=STDEVP(A1:A10) Population Standard Deviation

=STDEV(A1:A10)/AVERAGE(A1:A10) Coefficient of Variation

=SKEW(A1:A10) Coefficient of Skewness (not same as book)
