Numerical Measures

Numerical Measures. Measures of Central Tendency (Location) Measures of Non Central Location Measure of Variability (Dispersion, Spread) Measures of Shape




Numerical Measures

• Measures of Central Tendency (Location)

• Measures of Non Central Location

• Measure of Variability (Dispersion, Spread)

• Measures of Shape


Measures of Central Tendency (Location)

• Mean

• Median

• Mode

[Figure: Central Location]


Measures of Non-central Location

• Quartiles, Mid-Hinges

• Percentiles

[Figure: Non-Central Location]


Measures of Variability (Dispersion, Spread)

• Variance, standard deviation

• Range

• Inter-Quartile Range

[Figure: Variability]


Measures of Shape

• Skewness

• Kurtosis

[Figures: example distributions illustrating skewness and kurtosis]


Summation Notation


Summation Notation

Let x1, x2, x3, … xn denote a set of n numbers.

Then the symbol

$$\sum_{i=1}^{n} x_i$$

denotes the sum of these n numbers:

x1 + x2 + x3 + … + xn


Example

Let x1, x2, x3, x4, x5 denote the set of 5 numbers in the following table.

i 1 2 3 4 5

xi 10 15 21 7 13


Then the symbol

$$\sum_{i=1}^{5} x_i$$

denotes the sum of these 5 numbers:

$$\sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5 = 10 + 15 + 21 + 7 + 13 = 66$$


Meaning of parts of summation notation

$$\sum_{i=m}^{n} (\text{expression in } i)$$

• i: the quantity changing in each term of the sum

• m: the starting value for i

• n: the final value for i

• the expression in i: each term of the sum


Example

Again let x1, x2, x3, x4, x5 denote the set of 5 numbers in the following table.

i 1 2 3 4 5

xi 10 15 21 7 13


Then the symbol

$$\sum_{i=2}^{4} x_i^3$$

denotes the sum of these 3 numbers:

$$x_2^3 + x_3^3 + x_4^3 = 15^3 + 21^3 + 7^3 = 3375 + 9261 + 343 = 12979$$
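Each part of the notation corresponds directly to a loop over i from m to n. The following is a minimal Python sketch that assumes 1-based indices, as in these notes. The helper `summation` and its `term` argument are hypothetical names, used only for illustration.

```python
from typing import Callable, Sequence


def summation(x: Sequence[float], m: int, n: int,
              term: Callable[[float], float] = lambda v: v) -> float:
    """Return the sum over i = m..n of term(x_i), with x_1 stored at x[0]."""
    if m < 1 or n > len(x):
        raise ValueError("indices must lie between 1 and len(x)")
    # range(m, n + 1) visits i = m, m+1, ..., n; an empty range (m > n) sums to 0
    return sum(term(x[i - 1]) for i in range(m, n + 1))


x = [10, 15, 21, 7, 13]                       # x_1, ..., x_5 from the table
print(summation(x, 1, 5))                     # 66
print(summation(x, 2, 4, lambda v: v ** 3))   # 12979
```

The default `term` returns x_i unchanged, so the first call computes the plain sum. Passing `lambda v: v ** 3` instead gives the sum of cubes.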


Measures of Central Location (Mean)


Mean

Let x1, x2, x3, … xn denote a set of n numbers.

Then the mean of the n numbers is defined as:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$


Example

Again let x1, x2, x3, x4, x5 denote the set of 5 numbers in the following table.

i 1 2 3 4 5

xi 10 15 21 7 13


Then the mean of the 5 numbers is:

$$\bar{x} = \frac{\sum_{i=1}^{5} x_i}{5} = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{5} = \frac{10 + 15 + 21 + 7 + 13}{5} = \frac{66}{5} = 13.2$$
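As a quick check of the arithmetic, the definition translates into one line of Python. This is only a sketch. The function name `mean` is our own choice, and the standard library's `statistics.mean` is shown alongside as a cross-check.

```python
import statistics


def mean(xs):
    """Arithmetic mean: (x_1 + x_2 + ... + x_n) / n."""
    if not xs:
        raise ValueError("the mean of an empty set is undefined")
    return sum(xs) / len(xs)


data = [10, 15, 21, 7, 13]
print(mean(data))               # 13.2
print(statistics.mean(data))    # 13.2 (standard-library cross-check)
```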


Interpretation of the Mean

Let x1, x2, x3, … xn denote a set of n numbers.

Then the mean, x̄, is the centre of gravity of the n numbers.

That is, if we drew a horizontal line and placed a weight of one at each value xi, then the balancing point of that system of masses would be at the point x̄.
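The balancing property can be checked numerically. Put a weight of one at each xi. The net turning effect about a pivot c is then proportional to the sum of the signed distances xi - c. This sum is zero exactly when c = x̄, because Σ(xi - x̄) = Σxi - n·x̄ = 0. The small sketch below uses the five numbers from the earlier example; the helper name `net_moment` is hypothetical.

```python
def net_moment(xs, pivot):
    """Sum of signed distances (x_i - pivot) for a unit weight at each x_i."""
    return sum(x - pivot for x in xs)


xs = [10, 15, 21, 7, 13]
x_bar = sum(xs) / len(xs)          # 13.2

print(net_moment(xs, x_bar))       # zero, up to floating-point rounding
print(net_moment(xs, 10))          # 16: positive, so the line tips toward the larger values
```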


[Figure: unit weights at x1, x2, x3, x4, …, xn on a horizontal line, with the balancing point at x̄]


In the example:

[Figure: unit weights at 7, 10, 13, 15 and 21, balancing at x̄ = 13.2]


The mean, x̄, is also approximately the centre of gravity of a histogram.

[Figure: histogram with classes 60 - 70 through 140 - 150, balancing at x̄]


Measures of Central Location (Median)


The Median

Let x1, x2, x3, … xn denote a set of n numbers.

Then the median of the n numbers is defined as the number that splits the numbers into two equal parts.

To evaluate the median we arrange the numbers in increasing order.


If the number of observations is odd there will be one observation in the middle.

This number is the median.

If the number of observations is even there will be two middle observations.

The median is the average of these two observations


Example

Again let x1, x2, x3, x4, x5 denote the set of 5 numbers in the following table.

i 1 2 3 4 5

xi 10 15 21 7 13


The numbers arranged in order are:

7 10 13 15 21

The unique “middle” observation, 13, is the median.


Example 2

Let x1, x2, x3, x4, x5, x6 denote the 6 numbers:

23 41 12 19 64 8

Arranged in increasing order these observations would be:

8 12 19 23 41 64

Two “middle” observations: 19 and 23


Median = average of the two “middle” observations

$$= \frac{19 + 23}{2} = \frac{42}{2} = 21$$
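The two cases (odd n and even n) can be sketched in Python as follows. The `median` function is an illustrative helper. The standard library's `statistics.median` follows the same rule and is shown as a cross-check.

```python
import statistics


def median(xs):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    ordered = sorted(xs)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty set is undefined")
    mid = n // 2
    if n % 2 == 1:                                  # odd n: one middle observation
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2    # even n: average the two middle ones


print(median([10, 15, 21, 7, 13]))                  # 13
print(median([23, 41, 12, 19, 64, 8]))              # 21.0
print(statistics.median([23, 41, 12, 19, 64, 8]))   # 21.0
```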


Example

The data on N = 23 students

Variables

• Verbal IQ

• Math IQ

• Initial Reading Achievement Score

• Final Reading Achievement Score


 Data Set #3

The following table gives data on Verbal IQ, Math IQ, Initial Reading Achievement Score, and Final Reading Achievement Score for 23 students who have recently completed a reading improvement program (RA = Reading Achievement).

Student  Verbal IQ  Math IQ  Initial RA  Final RA
      1         86       94         1.1       1.7
      2        104      103         1.5       1.7
      3         86       92         1.5       1.9
      4        105      100         2.0       2.0
      5        118      115         1.9       3.5
      6         96      102         1.4       2.4
      7         90       87         1.5       1.8
      8         95      100         1.4       2.0
      9        105       96         1.7       1.7
     10         84       80         1.6       1.7
     11         94       87         1.6       1.7
     12        119      116         1.7       3.1
     13         82       91         1.2       1.8
     14         80       93         1.0       1.7
     15        109      124         1.8       2.5
     16        111      119         1.4       3.0
     17         89       94         1.6       1.8
     18         99      117         1.6       2.6
     19         94       93         1.4       1.4
     20         99      110         1.4       2.0
     21         95       97         1.5       1.3
     22        102      104         1.7       3.1
     23        102       93         1.6       1.9
  Total       2244     2307        35.1      48.3
  Means      97.57   100.30       1.526     2.100


Computing the Median

[Figure: Stem-leaf diagrams]

Median = middle observation = 12th observation (n = 23)


Summary

        Verbal IQ  Math IQ  Initial RA  Final RA
Means       97.57   100.30       1.526     2.100
Median         96       97         1.5       1.9

(RA = Reading Achievement)
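The summary table can be reproduced from the raw data. The sketch below uses the standard-library `statistics` module. It copies the four columns of Data Set #3 into Python lists by hand from the table, so the lists are worth double-checking against the source.

```python
import statistics

# Data Set #3, transcribed by hand from the table (students 1 to 23, in order)
verbal_iq = [86, 104, 86, 105, 118, 96, 90, 95, 105, 84, 94, 119,
             82, 80, 109, 111, 89, 99, 94, 99, 95, 102, 102]
math_iq = [94, 103, 92, 100, 115, 102, 87, 100, 96, 80, 87, 116,
           91, 93, 124, 119, 94, 117, 93, 110, 97, 104, 93]
initial_ra = [1.1, 1.5, 1.5, 2.0, 1.9, 1.4, 1.5, 1.4, 1.7, 1.6, 1.6, 1.7,
              1.2, 1.0, 1.8, 1.4, 1.6, 1.6, 1.4, 1.4, 1.5, 1.7, 1.6]
final_ra = [1.7, 1.7, 1.9, 2.0, 3.5, 2.4, 1.8, 2.0, 1.7, 1.7, 1.7, 3.1,
            1.8, 1.7, 2.5, 3.0, 1.8, 2.6, 1.4, 2.0, 1.3, 3.1, 1.9]

for name, values in (("Verbal IQ", verbal_iq), ("Math IQ", math_iq),
                     ("Initial RA", initial_ra), ("Final RA", final_ra)):
    # mean printed to three decimals; the median is an actual observation since n = 23 is odd
    print(f"{name:<11} n={len(values)}  mean={statistics.mean(values):.3f}  "
          f"median={statistics.median(values)}")
```

Printed to three decimals, the means are 97.565, 100.304, 1.526 and 2.100, and the medians are 96, 97, 1.5 and 1.9. These agree with the table.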


Some Comments

• The mean is the centre of gravity (the balancing point) of a set of observations.

• The median splits the observations into two parts of approximately 50% each.


• The median splits the area under a histogram in two parts of 50%

• The mean is the balancing point of a histogram

[Figure: histogram with the median splitting the area into two parts of 50% each, and the mean x̄ at the balancing point]


• For symmetric distributions the mean and the median will be approximately the same value

[Figure: symmetric distribution with x̄ and the median at the same point, 50% of the area on each side]


• For positively skewed distributions the mean usually exceeds the median

• For negatively skewed distributions the median usually exceeds the mean

[Figure: skewed distribution showing x̄ and the median, with 50% of the area on each side of the median]


• An outlier is a “wild” observation in the data

• Outliers occur because of

  – errors (typographical and computational)

  – extreme cases in the population


• The mean is altered to a significant degree by the presence of outliers

• Outliers have little effect on the value of the median

• This is a reason for using the median in place of the mean as a measure of central location

• On the other hand, the mean is the best measure of central location when the data are normally distributed (bell-shaped)
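The following small illustration uses the five numbers from the earlier examples, plus a hypothetical typographical error in which 21 is entered as 210. The altered data set is invented purely for illustration.

```python
import statistics

clean = [10, 15, 21, 7, 13]
typo = [10, 15, 210, 7, 13]    # hypothetical keying error: 21 entered as 210

for label, data in (("clean", clean), ("with outlier", typo)):
    print(f"{label:>12}: mean = {statistics.mean(data)}, median = {statistics.median(data)}")
```

A single bad value moves the mean from 13.2 to 51, but the median stays at 13.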


Review


Summarizing Data

Graphical Methods


Stem-Leaf Diagram (Verbal IQ)

 8 | 0 2 4 6 6 9
 9 | 0 4 4 5 5 6 9 9
10 | 2 2 4 5 5 9
11 | 1 8 9
12 |

[Figure: Histogram, classes 70 to 80 through 120 to 130]

Grouped Freq Table

Class       Verbal IQ  Math IQ
70 to 80            1        1
80 to 90            6        2
90 to 100           7       11
100 to 110          6        4
110 to 120          3        4
120 to 130          0        1


Numerical Measures

• Measures of Central Tendency (Location)

• Measures of Non Central Location

• Measure of Variability (Dispersion, Spread)

• Measures of Shape

The objective is to reduce the data to a small number of values that describe the data and certain aspects of the data.


Measures of Non-Central Location

• Percentiles

• Quartiles (Hinges, Mid-hinges)


Definition: The P×100 percentile is a point, xP, underneath a distribution that has a fixed proportion P of the population (or sample) below that value.

[Figure: distribution with P×100% of the area below xP]


Definition (Quartiles): The first quartile, Q1, is the 25th percentile, x0.25.

[Figure: distribution with 25% of the area below x0.25]


The second quartile, Q2, is the 50th percentile, x0.50.

[Figure: distribution with 50% of the area below x0.50]


• The second quartile, Q2, is also the median (the 50th percentile).


The third quartile, Q3, is the 75th percentile, x0.75.

[Figure: distribution with 75% of the area below x0.75]


The quartiles Q1, Q2 and Q3 divide the population into 4 equal parts of 25% each.

[Figure: distribution divided at Q1, Q2 and Q3 into four parts of 25% each]


Computing Percentiles and Quartiles

• There are several methods used to compute percentiles and quartiles, and different computer packages use different methods.

• For small samples these methods sometimes agree (but not always).

• For large samples the methods give approximately the same answer.


Computing Percentiles and Quartiles – Method 1

• The first step is to arrange the observations in increasing order.

• We then compute the position, k, of the P×100 Percentile.

k = P × (n+1)

Where n = the number of observations


Example

The data on n = 23 students

Variables

• Verbal IQ

• Math IQ

• Initial Reading Achievement Score

• Final Reading Achievement Score

We want to compute the 75th percentile and the 90th percentile of Verbal IQ.


The position, k, of the 75th Percentile.

k = P × (n+1) = .75 × (23+1) = 18

The position, k, of the 90th Percentile.

k = P × (n+1) = .90 × (23+1) = 21.6

When the position k is an integer the percentile is the kth observation (in order of magnitude) in the data set.

For example the 75th percentile is the 18th (in size) observation


When the position k is not an integer, write it as an integer (m) plus a fraction (f), i.e. k = m + f. The percentile is then

xP = (1 - f) × (mth observation in size) + f × ((m+1)st observation in size)

In the example the position of the 90th percentile is k = 21.6, so m = 21 and f = 0.6. Then

x0.90 = 0.4(21st observation in size) + 0.6(22nd observation in size)


When the position k is not an integer, write it as an integer (m) plus a fraction (f), i.e. k = m + f. The percentile is then

xP = (1 - f) × (mth observation in size) + f × ((m+1)st observation in size)

Writing x_(m) for the mth observation in size, this can be rearranged as

$$x_P = (1-f)\,x_{(m)} + f\,x_{(m+1)} = x_{(m)} + f\left(x_{(m+1)} - x_{(m)}\right)$$


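When the position k is not an integer, write it as an integer (m) plus a fraction (f), i.e. k = m + f. Then, writing x_(m) for the mth observation in size,

$$x_P = (1-f)\,x_{(m)} + f\,x_{(m+1)} \quad\Longrightarrow\quad f = \frac{x_P - x_{(m)}}{x_{(m+1)} - x_{(m)}}$$

[Figure: xP located between the mth and (m+1)st observations]

Thus xP lies 100f% of the way through the interval between the mth observation and the (m+1)st observation.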


Example

The Verbal IQ data on n = 23 students, arranged in increasing order, are:

80 82 84 86 86 89 90 94

94 95 95 96 99 99 102 102

104 105 105 109 111 118 119


x0.75 = 75th percentile = 18th observation in size = 105   (position k = 18)

x0.90 = 90th percentile   (position k = 21.6)

= 0.4(21st observation in size) + 0.6(22nd observation in size)

= 0.4(111) + 0.6(118) = 115.2
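Method 1 can be written as a short Python function. This is an illustrative implementation of the position rule k = P × (n+1) described above, not code taken from any particular package. The name `percentile_method1` is hypothetical. The notes do not say what to do when the position falls outside 1..n, so rejecting that case is our own assumption.

```python
import math


def percentile_method1(data, p):
    """P x 100 percentile by Method 1: position k = p * (n + 1), interpolating when k = m + f."""
    xs = sorted(data)
    n = len(xs)
    k = p * (n + 1)
    if k < 1 or k > n:
        # Assumption: positions outside 1..n are not covered by the notes, so refuse them
        raise ValueError(f"position k = {k} lies outside 1..{n}")
    m = math.floor(k)       # integer part of the position
    f = k - m               # fractional part of the position
    if f == 0:
        return xs[m - 1]    # k is an integer: the kth observation in size
    return (1 - f) * xs[m - 1] + f * xs[m]   # (1 - f)(mth obs) + f((m+1)st obs)


verbal_iq = [80, 82, 84, 86, 86, 89, 90, 94, 94, 95, 95, 96,
             99, 99, 102, 102, 104, 105, 105, 109, 111, 118, 119]
print(percentile_method1(verbal_iq, 0.75))   # 105 (k = 18)
print(percentile_method1(verbal_iq, 0.90))   # approximately 115.2 (k = 21.6)
```

Floating-point rounding can add a few stray digits to the second result. The same function gives Q1 = 8.5 and Q3 = 18 for the five numbers 7, 10, 13, 15, 21.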


An Alternative Method for Computing Quartiles – Method 2

• Sometimes this method will result in the same values for the quartiles as Method 1.

• Sometimes this method will result in different values for the quartiles.

• For large samples the two methods will result in approximately the same answer.


Let x1, x2, x3, … xn denote a set of n numbers.

The first step in Method 2 is to arrange the numbers in increasing order.

From the arranged numbers we compute the median.

This is also called the Hinge


Example

Consider the 5 numbers:

10 15 21 7 13

Arranged in increasing order:

7 10 13 15 21

The median (or hinge), 13, splits the observations in half.


The lower mid-hinge (the first quartile) is the “median” of the lower half of the observations (excluding the median).

The upper mid-hinge (the third quartile) is the “median” of the upper half of the observations (excluding the median).


Consider the five numbers in increasing order:

7 10 13 15 21

Median (Hinge) = 13

Lower half: 7 10          Upper half: 15 21

Lower Mid-Hinge (First Quartile) = (7 + 10)/2 = 8.5

Upper Mid-Hinge (Third Quartile) = (15 + 21)/2 = 18


Computing the median and the quartiles using the first method:

Position of the median: k = 0.5(5+1) = 3

Position of the first quartile: k = 0.25(5+1) = 1.5

Position of the third quartile: k = 0.75(5+1) = 4.5

7 10 13 15 21

Q1 = 8.5, Q2 = 13, Q3 = 18


• Both methods result in the same values

• This is not always true.


Example

The Verbal IQ data on n = 23 students, arranged in increasing order, are:

80 82 84 86 86 89 90 94 94 95 95 96 99 99 102 102 104 105 105 109 111 118 119

Median (Hinge) = 96

Lower Mid-Hinge (First Quartile) = 89

Upper Mid-Hinge (Third Quartile) = 105
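Method 2 can be coded in a similar way. The notes only show examples with odd n. For even n, this sketch splits the sorted data into its first and second halves, which is one common convention and is an assumption here. The function names are hypothetical.

```python
def _median_sorted(xs):
    """Median of a list that is already in increasing order."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 == 1 else (xs[mid - 1] + xs[mid]) / 2


def hinges(data):
    """Method 2: return (lower mid-hinge, hinge, upper mid-hinge), i.e. (Q1, Q2, Q3)."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = xs[:half]               # observations below the median
    upper = xs[half + n % 2:]       # observations above it (the median is skipped when n is odd)
    return _median_sorted(lower), _median_sorted(xs), _median_sorted(upper)


print(hinges([10, 15, 21, 7, 13]))   # (8.5, 13, 18.0)

verbal_iq = [80, 82, 84, 86, 86, 89, 90, 94, 94, 95, 95, 96,
             99, 99, 102, 102, 104, 105, 105, 109, 111, 118, 119]
print(hinges(verbal_iq))             # (89, 96, 105)
```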


Computing the median and the quartiles using the first method:

Position of the median: k = 0.5(23+1) = 12

Position of the first quartile: k = 0.25(23+1) = 6

Position of the third quartile: k = 0.75(23+1) = 18

80 82 84 86 86 89 90 94 94 95 95 96 99 99 102 102 104 105 105 109 111 118 119

Q1 = 89, Q2 = 96, Q3 = 105


• Many programs compute percentiles, quartiles etc.

• Each may use different methods.

• It is important to know which method is being used.

• The different methods result in answers that are close when the sample size is large.
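Python's standard library shows this in practice. `statistics.quantiles` offers two methods. The default, `method="exclusive"`, places the P×100 percentile at position P(n+1), which is the same rule as Method 1. The alternative, `method="inclusive"`, uses a different convention: position 1 + P(n-1), which treats the min and max as the 0th and 100th percentiles. Below is a short comparison on the two data sets used above.

```python
import statistics

five = [10, 15, 21, 7, 13]
verbal_iq = [80, 82, 84, 86, 86, 89, 90, 94, 94, 95, 95, 96,
             99, 99, 102, 102, 104, 105, 105, 109, 111, 118, 119]

for name, data in (("five numbers", five), ("Verbal IQ", verbal_iq)):
    exclusive = statistics.quantiles(data, n=4, method="exclusive")  # positions P(n + 1)
    inclusive = statistics.quantiles(data, n=4, method="inclusive")  # positions 1 + P(n - 1)
    print(f"{name}: exclusive {exclusive}, inclusive {inclusive}")
```

For the five numbers, the quartiles are [8.5, 13.0, 18.0] with "exclusive" but [10.0, 13.0, 15.0] with "inclusive". For the 23 Verbal IQ values they are [89.0, 96.0, 105.0] versus [89.5, 96.0, 104.5], which are much closer together. (`statistics.quantiles` requires Python 3.8 or later.)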


Announcement

Assignment 2 has been posted.

This assignment has to be handed in and is due Friday, January 22.

This assignment requires the use of a Statistical Package (SPSS or Minitab) available in most computer labs.

Instructions on the use of these packages will be given in the lab today


Box-Plots (Box-Whisker Plots)

• A graphical method of displaying data

• An alternative to the histogram and stem-leaf diagram


To Draw a Box Plot

• Compute the Hinge (Median, Q2) and the Mid-hinges (first & third quartiles – Q1 and Q3 )

• We also compute the largest and smallest of the observations – the max and the min

• Together these give the five-number summary:

min, Q1, Q2, Q3, max


Example

The Verbal IQ data on n = 23 students, arranged in increasing order, are:

80 82 84 86 86 89 90 94 94 95 95 96 99 99 102 102 104 105 105 109 111 118 119

min = 80, Q1 = 89, Q2 = 96, Q3 = 105, max = 119


The box plot is then drawn by:

• Drawing a “box” above an axis from Q1 to Q3.

• Drawing a vertical line in the box at the median, Q2.

• Drawing whiskers from the lower and upper ends of the box, going down to the min and up to the max.


[Figure: box plot with a box from Q1 to Q3, a line at Q2, a lower whisker down to min and an upper whisker up to max]


Example

For the Verbal IQ data on n = 23 students:

min = 80

Q1 = 89

Q2 = 96

Q3 = 105

max = 119

This is sometimes called the five-number summary
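Computing the five-number summary is straightforward. The sketch below, whose function name is hypothetical, takes the quartiles from `statistics.quantiles` with its default "exclusive" method. That means it uses the Method 1 positions P(n+1) rather than the hinges. For the Verbal IQ data both methods give the same quartiles, as shown earlier, but for other data sets they can differ.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, Q2, Q3, max), with quartiles at positions P(n + 1) (Method 1)."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return min(data), q1, q2, q3, max(data)


verbal_iq = [80, 82, 84, 86, 86, 89, 90, 94, 94, 95, 95, 96,
             99, 99, 102, 102, 104, 105, 105, 109, 111, 118, 119]
print(five_number_summary(verbal_iq))   # (80, 89.0, 96.0, 105.0, 119)
```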


[Figure: Box Plot of Verbal IQ, axis from 70 to 130]


A box plot can also be drawn vertically.

[Figure: vertical box plot, axis from 70 to 130]


[Figure: Box-Whisker plots (Verbal IQ, Math IQ)]


[Figure: Box-Whisker plots (Initial RA, Final RA)]


Summary information contained in the box plot:

[Figure: box plot split at min, Q1, Q2, Q3 and max into four parts of 25% each; the box holds the middle 50% of the population]


Advanced Box Plots


• An outlier is a “wild” observation in the data

• Outliers occur because of

  – errors (typographical and computational)

  – extreme cases in the population

• We will now consider the drawing of box-plots where outliers are identified


To Draw a Box Plot we need to:

• Compute the Hinge (Median, Q2) and the Mid-hinges (first & third quartiles – Q1 and Q3 )

• The difference Q3 – Q1 is called the inter-quartile range (denoted by IQR)

• To identify outliers we will compute the inner and outer fences


The fences are like the fences at a prison. We expect the entire population to be within both sets of fences.

If a member of the population is between the inner and outer fences it is a mild outlier.

If a member of the population is outside of the outer fences it is an extreme outlier.


Inner fences


Lower inner fence

f1 = Q1 - (1.5)IQR

Upper inner fence

f2 = Q3 + (1.5)IQR


Outer fences


Lower outer fence

F1 = Q1 - (3)IQR

Upper outer fence

F2 = Q3 + (3)IQR
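The four fence formulas translate directly into code. In this sketch the function name `fences` is hypothetical; the demonstration plugs in the quartiles quoted in the infant-mortality example later in these notes (Q1 = 12, Q3 = 66.5).

```python
def fences(q1, q3):
    """Return the inner (f1, f2) and outer (F1, F2) fences from Q1 and Q3."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # f1, f2
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)       # F1, F2
    return inner, outer

inner, outer = fences(12, 66.5)
print(inner, outer)  # (-69.75, 148.25) (-151.5, 230.0)
```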


• Observations that are between the lower and upper inner fences are considered to be non-outliers.

• Observations that are outside the inner fences but not outside the outer fences are considered to be mild outliers.

• Observations that are outside outer fences are considered to be extreme outliers.
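The three rules above amount to a simple classifier. This is a sketch with a hypothetical function name `classify`; treating a value that lies exactly on a fence as inside that fence is an assumption, since the slides only say "between" and "outside". The fences are those of the infant-mortality example later in these notes, and 250 is a made-up value used only to show the extreme case.

```python
def classify(x, inner, outer):
    """Label x as 'non-outlier', 'mild' or 'extreme' given (f1, f2) and (F1, F2)."""
    f1, f2 = inner
    F1, F2 = outer
    if f1 <= x <= f2:
        return "non-outlier"   # within the inner fences
    if F1 <= x <= F2:
        return "mild"          # outside the inner fences, not outside the outer ones
    return "extreme"           # outside the outer fences

inner, outer = (-69.75, 148.25), (-151.5, 230.0)
for x in (27, 168, 250):       # 250 is hypothetical, not in the data
    print(x, classify(x, inner, outer))
```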


• mild outliers are plotted individually in a box-plot using the symbol

• extreme outliers are plotted individually in a box-plot using the symbol

• non-outliers are represented with the box and whiskers, with
  – Max = largest observation within the inner fences
  – Min = smallest observation within the inner fences


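[Figure: box-whisker plot representing the data that are not outliers, with the inner and outer fences marked; mild outliers are plotted between the inner and outer fences and an extreme outlier beyond the outer fence.]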


Example

Data collected on n = 109 countries in 1995.

Data collected on k = 25 variables.


The variables

1. Population Size (in 1000s)

2. Density = number of people per square kilometer

3. Urban = percentage of population living in cities

4. Religion

5. lifeexpf = Average female life expectancy

6. lifeexpm = Average male life expectancy


7. literacy = % of population who can read

8. pop_inc = % increase in popn size (1995)

9. babymort = Infant mortality (deaths per 1000)

10. gdp_cap = Gross domestic product/capita

11. Region = Region or economic group

12. calories = Daily calorie intake.

13. aids = Number of AIDS cases

14. birth_rt = Birth rate per 1000 people


15. death_rt = death rate per 1000 people

16. aids_rt = Number of AIDS cases per 100,000 people

17. log_gdp = log10(gdp_cap)

18. log_aidsr = log10(aids_rt)

19. b_to_d = birth-to-death ratio

20. fertility = average number of children in family

21. log_pop = log10(population)


22. cropgrow = ??

23. lit_male = % of males who can read

24. lit_fema = % of females who can read

25. Climate = predominant climate


The data file as it appears in SPSS


Consider the data on infant mortality

Stem-and-leaf diagram (stem = 10s, leaf = unit digit)

 0 | 4455555666666666777778888899
 1 | 0122223467799
 2 | 0001123555577788
 3 | 45567999
 4 | 135679
 5 | 011222347
 6 | 03678
 7 | 4556679
 8 | 5
 9 | 4
10 | 1569
11 | 0022378
12 | 46
13 | 7
14 |
15 |
16 | 8


Summary Statistics

median = Q2 = 27

Quartiles

Lower quartile = Q1 = the median of the lower half

Upper quartile = Q3 = the median of the upper half

Q1 = (12 + 12)/2 = 12, Q3 = (66 + 67)/2 = 66.5

Interquartile range (IQR)

IQR = Q3 - Q1 = 66.5 - 12 = 54.5


The Inner Fences

lower = Q1 - 1.5(IQR) = 12 - 1.5(54.5) = -69.75

upper = Q3 + 1.5(IQR) = 66.5 + 1.5(54.5) = 148.25

Only one observation (168, Afghanistan) is outside of the inner fences (a mild outlier).

The Outer Fences

lower = Q1 - 3(IQR) = 12 - 3(54.5) = -151.5

upper = Q3 + 3(IQR) = 66.5 + 3(54.5) = 230.0

No observations are outside of the outer fences.
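The whole outlier check for infant mortality can be reproduced from the stem-and-leaf diagram. This sketch rebuilds the 109 observations as 10 × stem + leaf, confirms n and the median, applies the fences, and finds the whisker ends (the smallest and largest observations within the inner fences). The quartiles are taken as quoted on the slide (Q1 = 12, Q3 = 66.5) rather than recomputed from the diagram; the variable names are illustrative only.

```python
from statistics import median

# Stem-and-leaf diagram of infant mortality (stem = 10s, leaf = unit digit);
# stems 14 and 15 have no leaves.
STEM_LEAF = {
    0: "4455555666666666777778888899", 1: "0122223467799",
    2: "0001123555577788", 3: "45567999", 4: "135679", 5: "011222347",
    6: "03678", 7: "4556679", 8: "5", 9: "4", 10: "1569", 11: "0022378",
    12: "46", 13: "7", 14: "", 15: "", 16: "8",
}

# Rebuild the observations: value = 10 * stem + leaf
values = sorted(10 * stem + int(leaf)
                for stem, leaves in STEM_LEAF.items() for leaf in leaves)

q1, q3 = 12, 66.5                      # quartiles as quoted on the slide
iqr = q3 - q1
f1, f2 = q1 - 1.5 * iqr, q3 + 1.5 * iqr
F1, F2 = q1 - 3 * iqr, q3 + 3 * iqr

mild = [x for x in values if F1 <= x < f1 or f2 < x <= F2]
extreme = [x for x in values if x < F1 or x > F2]
inside = [x for x in values if f1 <= x <= f2]

print(len(values), median(values))     # 109 27
print(mild, extreme)                   # [168] []
print(min(inside), max(inside))        # whisker ends: 4 137
```

So in the box plot the upper whisker stops at 137, and 168 is drawn individually as the single mild outlier.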


Box-Whisker Plot of Infant Mortality

[Figure: box-whisker plot of infant mortality; horizontal axis from 0 to 200.]

Infant Mortality


Example 2

In this example we are looking at the weight gains (grams) for rats under six diets differing in level of protein (High or Low) and source of protein (Beef, Cereal, or Pork).

– Ten test animals for each diet


Table: Gains in weight (grams) for rats under six diets differing in level of protein (High or Low) and source of protein (Beef, Cereal, or Pork)

Level        High protein            Low protein
Source       Beef    Cereal  Pork    Beef    Cereal  Pork
Diet         1       2       3       4       5       6
             73      98      94      90      107     49
             102     74      79      76      95      82
             118     56      96      90      97      73
             104     111     98      64      80      86
             81      95      102     86      98      81
             107     88      102     51      74      97
             100     82      108     72      74      106
             87      77      91      90      67      70
             117     86      120     95      89      61
             111     92      105     78      58      82
Median       103.0   87.0    100.0   82.0    84.5    81.5
Mean         100.0   85.9    99.5    79.2    83.9    78.7
IQR          24.0    18.0    11.0    18.0    23.0    16.0
PSD          17.78   13.33   8.15    13.33   17.04   11.85
Variance     229.11  225.66  119.17  192.84  246.77  273.79
Std. Dev.    15.14   15.02   10.92   13.89   15.71   16.55

(PSD = pseudo standard deviation = IQR/1.35)
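Every summary row of the table can be recomputed with Python's `statistics` module, as a cross-check. This is a sketch under two assumptions: the quartiles use the median-of-halves rule from the infant-mortality example (each diet has an even n = 10, so the halves split cleanly), and PSD is taken to be IQR/1.35, the usual pseudo standard deviation, which reproduces the tabulated PSD values. With that assumption Diet 6 comes out at 16/1.35 ≈ 11.85. `variance` and `stdev` use the n - 1 divisor, matching the table.

```python
from statistics import mean, median, stdev, variance

# Weight gains (grams) from the table, one list per diet (10 rats each)
diets = {
    1: [73, 102, 118, 104, 81, 107, 100, 87, 117, 111],   # High protein, Beef
    2: [98, 74, 56, 111, 95, 88, 82, 77, 86, 92],         # High protein, Cereal
    3: [94, 79, 96, 98, 102, 102, 108, 91, 120, 105],     # High protein, Pork
    4: [90, 76, 90, 64, 86, 51, 72, 90, 95, 78],          # Low protein, Beef
    5: [107, 95, 97, 80, 98, 74, 74, 67, 89, 58],         # Low protein, Cereal
    6: [49, 82, 73, 86, 81, 97, 106, 70, 61, 82],         # Low protein, Pork
}

def iqr(xs):
    """Q3 - Q1, each quartile being the median of one half (assumes even n)."""
    xs = sorted(xs)
    half = len(xs) // 2
    return median(xs[half:]) - median(xs[:half])

summary = {}
for d, xs in diets.items():
    q = iqr(xs)
    # (median, mean, IQR, PSD = IQR/1.35, sample variance, sample std. dev.)
    summary[d] = (median(xs), round(mean(xs), 1), q, round(q / 1.35, 2),
                  round(variance(xs), 2), round(stdev(xs), 2))
    print(d, *summary[d])
```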


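Box Plots: Weight Gains for Six Diets

[Figure: one box plot per diet, Diets 1 to 6 (1 to 3 high protein, 4 to 6 low protein; Beef, Cereal, Pork within each level). Vertical axis: weight gain (grams), 40 to 130. Each box spans the 25% to 75% quartiles with the median marked; the whiskers run to the non-outlier min and max.]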


Conclusions

• Weight gain is higher for the high protein meat diets

• Increasing the level of protein increases weight gain, but only if the source of protein is meat
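The second conclusion can be made concrete by comparing mean gains at the two protein levels within each source. This is a descriptive sketch only (a difference of sample means, not a significance test), using the data from the rat-diet table; the dictionary layout is an illustrative choice.

```python
from statistics import mean

# Weight gains (grams) from the table, keyed by (protein level, protein source)
gains = {
    ("High", "Beef"):   [73, 102, 118, 104, 81, 107, 100, 87, 117, 111],
    ("High", "Cereal"): [98, 74, 56, 111, 95, 88, 82, 77, 86, 92],
    ("High", "Pork"):   [94, 79, 96, 98, 102, 102, 108, 91, 120, 105],
    ("Low", "Beef"):    [90, 76, 90, 64, 86, 51, 72, 90, 95, 78],
    ("Low", "Cereal"):  [107, 95, 97, 80, 98, 74, 74, 67, 89, 58],
    ("Low", "Pork"):    [49, 82, 73, 86, 81, 97, 106, 70, 61, 82],
}

# Change in mean gain from raising the protein level, separately for each source
effect = {
    source: mean(gains[("High", source)]) - mean(gains[("Low", source)])
    for source in ("Beef", "Cereal", "Pork")
}
for source, diff in effect.items():
    print(f"{source}: {diff:+.1f} g")
# Beef: +20.8 g
# Cereal: +2.0 g
# Pork: +20.8 g
```

Raising the protein level adds about 21 g for both meat sources but only about 2 g for cereal, which is the pattern the box plots show.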


Next topic: Numerical Measures of Variability