
Descriptive Statistics: Overview


Page 1: Descriptive Statistics: Overview

Descriptive Statistics: Overview

Measures of Center: Mode, Median, Mean

Measures of Symmetry: Skewness

Measures of Spread: Range, Inter-quartile Range, Variance, Standard deviation

Measures of Position: Percentile, Deviation Score, Z-score

Page 2: Descriptive Statistics: Overview

Central tendency

• Seeks to provide a single value that best represents a distribution

Page 3: Descriptive Statistics: Overview

Central tendency

[Figure: histogram. x-axis: Nightly Hours of Sleep (3.5 to 11.5); y-axis: No. of People (0 to 18)]

Page 4: Descriptive Statistics: Overview

Central tendency

[Figure: histogram. x-axis: # of wheels (0 to 6); y-axis: # of vehicles (0 to 16)]

Page 5: Descriptive Statistics: Overview

Central tendency

[Figure: histogram. x-axis: Income in 1,000s (0 to 240); y-axis: No. of People (0 to 40)]

Page 6: Descriptive Statistics: Overview

Central tendency

• Seeks to provide a single value that best represents a distribution

• Typical measures are
  – mode
  – median
  – mean

Page 7: Descriptive Statistics: Overview

Mode

• the most frequently occurring score value

• corresponds to the highest point on the frequency distribution

[Figure: frequency distribution. x-axis: Score (33 to 45); y-axis: Frequency (0 to 5)]

For a given sample N=16:

33 35 36 37 38 38 38 39 39 39 39 40 40 41 41 45

The mode = 39

Page 8: Descriptive Statistics: Overview

Mode

• The mode is not sensitive to extreme scores.

[Figure: frequency distribution. x-axis: Score (ticks 33 to 49); y-axis: Frequency (0 to 5)]

For a given sample N=16:

33 35 36 37 38 38 38 39 39 39 39 40 40 41 41 50

The mode = 39

Page 9: Descriptive Statistics: Overview

Mode

• a distribution may have more than one mode

[Figure: frequency distribution. x-axis: Score (33 to 40); y-axis: Frequency (0 to 5)]

For a given sample N=16:

34 34 35 35 35 35 36 37 38 38 39 39 39 39 40 40

The modes = 35 and 39

Page 10: Descriptive Statistics: Overview

Mode

• there may be no unique mode, as in the case of a rectangular distribution

[Figure: frequency distribution. x-axis: Score (33 to 40); y-axis: Frequency (0 to 5)]

For a given sample N=16:

33 33 34 34 35 35 36 36 37 37 38 38 39 39 40 40

No unique mode
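The four mode examples can be checked with Python's standard library. This is a minimal sketch, assuming Python 3.8 or later (for `statistics.multimode`); the variable names are illustrative and do not come from the slides.

```python
from statistics import multimode

# Samples copied from the mode slides (N = 16 each).
single_mode = [33, 35, 36, 37, 38, 38, 38, 39, 39, 39, 39, 40, 40, 41, 41, 45]
with_extreme = [33, 35, 36, 37, 38, 38, 38, 39, 39, 39, 39, 40, 40, 41, 41, 50]
two_modes = [34, 34, 35, 35, 35, 35, 36, 37, 38, 38, 39, 39, 39, 39, 40, 40]
rectangular = [33, 33, 34, 34, 35, 35, 36, 36, 37, 37, 38, 38, 39, 39, 40, 40]

# multimode returns every value tied for the highest frequency,
# in the order the values first appear in the data.
modes_single = multimode(single_mode)
modes_extreme = multimode(with_extreme)
modes_two = multimode(two_modes)
modes_rect = multimode(rectangular)

print("one mode:", modes_single)
print("extreme score changed:", modes_extreme)
print("two modes:", modes_two)
print("rectangular:", modes_rect)
```

In the rectangular case every score value is returned, which is one way to see that no single value stands out as the mode.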

Page 11: Descriptive Statistics: Overview

Median

• the score value that cuts the distribution in half (the “middle” score)

• 50th percentile

[Figure: frequency distribution. x-axis: Score (33 to 40); y-axis: Frequency (0 to 5)]

For N = 15, the median is the eighth score = 37

Page 12: Descriptive Statistics: Overview

Median

[Figure: frequency distribution. x-axis: Score (33 to 40); y-axis: Frequency (0 to 5)]

For N = 16, the median is the average of the eighth and ninth scores = 37.5
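The odd-N and even-N rules for the median can be sketched in a few lines. The samples below are hypothetical (the data behind the slides' figures are not given), and `middle_score` is an illustrative helper name.

```python
from statistics import median

def middle_score(scores):
    """Return the median by ranking the scores and taking the middle position(s)."""
    ranked = sorted(scores)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        # Odd N: the single middle score.
        return ranked[mid]
    # Even N: the average of the two middle scores.
    return (ranked[mid - 1] + ranked[mid]) / 2

# Hypothetical samples, not the data from the slides.
odd_sample = [33, 34, 35, 36, 37, 38, 39]        # N = 7
even_sample = [33, 34, 35, 36, 37, 38, 39, 40]   # N = 8

odd_median = middle_score(odd_sample)
even_median = middle_score(even_sample)
print(odd_median, even_median)
```

The helper should agree with `statistics.median` on both samples.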

Page 13: Descriptive Statistics: Overview

Mean

• this is what people usually have in mind when they say “average”

• the sum of the scores divided by the number of scores

Changing the value of a single score may not affect the mode or median, but it will affect the mean.

For a population:

$\mu = \frac{\sum X}{N}$

For a sample:

$\bar{X} = \frac{\sum X}{n}$
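To illustrate the claim that changing a single score may leave the mode and median alone but will move the mean, the sketch below reuses the two N = 16 samples from the mode slides, which differ only in the last score (45 versus 50). Variable names are illustrative.

```python
from statistics import mean, median, multimode

original = [33, 35, 36, 37, 38, 38, 38, 39, 39, 39, 39, 40, 40, 41, 41, 45]
changed = [33, 35, 36, 37, 38, 38, 38, 39, 39, 39, 39, 40, 40, 41, 41, 50]

# Mean = sum of the scores divided by the number of scores.
mean_original = sum(original) / len(original)
mean_changed = sum(changed) / len(changed)

for label, sample in (("original", original), ("changed", changed)):
    print(label, "mode:", multimode(sample),
          "median:", median(sample), "mean:", mean(sample))
```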

Page 14: Descriptive Statistics: Overview

Mean

In many cases the mean is the preferred measure of central tendency, both as a description of the data and as an estimate of the parameter.

[Figure: histogram. x-axis: Nightly Hours of Sleep (3.5 to 11.5); y-axis: No. of People (0 to 18); $\bar{X} = 7.07$]

In order for the mean to be meaningful, the variable of interest must be measured on an interval scale.

[Figure: frequency distribution of a nominal variable. x-axis categories: Buddhist, Protestant, Catholic, Jewish, Muslim; y-axis: Frequency (0 to 5); $\bar{X} = 2.4$]

Page 15: Descriptive Statistics: Overview

Mean

The mean is sensitive to extreme scores and is appropriate for more symmetrical distributions.

[Figure: frequency distribution. x-axis: Score (33 to 40); y-axis: Frequency (0 to 5); $\bar{X} = 36.8$]

[Figure: frequency distribution. x-axis: Score (33 to 40); y-axis: Frequency (0 to 5); $\bar{X} = 36.5$]

[Figure: histogram. x-axis: Income in 1,000s (0 to 240); y-axis: No. of People (0 to 40); $\bar{X} = 93.2$]

Page 16: Descriptive Statistics: Overview

Symmetry

• a symmetrical distribution exhibits no skewness

• in a symmetrical distribution with a single peak, the Mean = Median = Mode

[Figure: histogram. x-axis: Nightly Hours of Sleep (3.5 to 11.5); y-axis: No. of People (0 to 18)]

Page 17: Descriptive Statistics: Overview

Skewed distributions

• Skewness refers to the asymmetry of the distribution

• A positively skewed distribution is asymmetrical, and its longer tail points in the positive direction.

[Figure: histogram with the mode, median, and mean marked. x-axis: Income in 1,000s (0 to 240); y-axis: No. of People (0 to 40)]

Mode = $70,000
Median = $88,700
Mean = $93,600

• mode < median < mean
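The ordering of the three measures under skew can be seen on a small made-up sample. The data below are hypothetical (they are not the income data in the figure); mirroring the sample flips the tail and reverses the ordering.

```python
from statistics import mean, median, mode

# Hypothetical sample with a long right tail (positive skew).
right_tailed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]

# Mirror image of the same sample: long left tail (negative skew).
left_tailed = [15 - x for x in right_tailed]

pos = (mode(right_tailed), median(right_tailed), mean(right_tailed))
neg = (mode(left_tailed), median(left_tailed), mean(left_tailed))

print("positive skew  mode, median, mean:", pos)
print("negative skew  mode, median, mean:", neg)
```

This ordering is typical of skewed distributions like the ones shown, though it is a rule of thumb rather than a guarantee for every data set.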

Page 18: Descriptive Statistics: Overview

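Skewed distributions

• A negatively skewed distribution is asymmetrical, and its longer tail points in the negative direction.

[Figure: histogram with the mode, median, and mean marked. x-axis: Test score (0 to 100); y-axis: No. of People (0 to 7)]

• mode > median > mean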

Page 19: Descriptive Statistics: Overview

Measures of central tendency

Measure   Advantages (+)                    Disadvantages (-)
Mode      quick & easy to compute;          poor sampling stability
          useful for nominal data
Median    not affected by extreme scores    somewhat poor sampling stability
Mean      sampling stability;               inappropriate for discrete data;
          related to variance               affected by skewed distributions

Page 20: Descriptive Statistics: Overview

Distributions

• Center: mode, median, mean

• Shape: symmetrical, skewed

• Spread

[Figure: histogram. x-axis: Scores (0 to 100); y-axis: # of People (0 to 16)]

Page 21: Descriptive Statistics: Overview

Measures of Spread

• the dispersion of scores from the center

• a distribution of scores is highly variable if the scores differ wildly from one another

• Three statistics to measure variability
  – range
  – interquartile range
  – variance

Page 22: Descriptive Statistics: Overview

Range

• largest score minus the smallest score

• these two distributions have the same range (80), but their spreads look different

• says nothing about how scores vary around the center

• greatly affected by extreme scores (defined by them)

[Figure: two distributions with the same range. x-axis: Scores (0 to 100); y-axis: # of People (0 to 16)]
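A small sketch of why the range says little about spread around the center: the two samples below are hypothetical (not the distributions in the figure), but they share the same extremes and therefore the same range.

```python
def score_range(scores):
    """Range = largest score minus smallest score."""
    return max(scores) - min(scores)

# Hypothetical samples with identical extremes (10 and 90).
spread_out = [10, 20, 30, 40, 50, 60, 70, 80, 90]
bunched = [10, 48, 49, 50, 50, 50, 51, 52, 90]

range_spread_out = score_range(spread_out)
range_bunched = score_range(bunched)

print(range_spread_out, range_bunched)
```

Both calls return 80 even though most of the second sample sits within a couple of points of 50.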

Page 23: Descriptive Statistics: Overview

Interquartile range

• the distance between the 25th percentile and the 75th percentile

• Q3-Q1 = 70 - 30 = 40

• Q3-Q1 = 52.5 - 47.5 = 5

• effectively ignores the top and bottom quarters, so extreme scores are not influential

• dismisses 50% of the distribution

[Figure: two distributions with the same range. x-axis: Scores (0 to 100); y-axis: # of People (0 to 16)]
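The interquartile range can be sketched with `statistics.quantiles` (Python 3.8+). The nine evenly spaced scores below are chosen for illustration and are not the distributions in the figure. Quartile conventions differ between textbooks and software, so the sketch shows two of them; which one the slides use is not stated.

```python
from statistics import quantiles

# Illustrative sample: nine evenly spaced scores.
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]

# "inclusive" treats the sample minimum and maximum as the 0th and 100th percentiles.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

# "exclusive" (the library default) assumes more extreme values may exist
# beyond the sample, and gives slightly different cut points.
q1_ex, _, q3_ex = quantiles(scores, n=4, method="exclusive")
iqr_ex = q3_ex - q1_ex

print("inclusive:", q1, q3, iqr)
print("exclusive:", q1_ex, q3_ex, iqr_ex)
```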

Page 24: Descriptive Statistics: Overview

Deviation measures

• Might be better to see how much scores differ from the center of the distribution -- using distance

• Scores further from the mean have larger deviation scores (in absolute value)

Score Deviation

Amy 10 -40

Theo 20 -30

Max 30 -20

Henry 40 -10

Leticia 50 0

Charlotte 60 10

Pedro 70 20

Tricia 80 30

Lulu 90 40

AVERAGE 50

Page 25: Descriptive Statistics: Overview

Deviation measures

• To see how ‘deviant’ the distribution is relative to another, we could sum these scores

• But this would leave us with a big fat zero

Score Deviation

Amy 10 -40

Theo 20 -30

Max 30 -20

Henry 40 -10

Leticia 50 0

Charlotte 60 10

Pedro 70 20

Tricia 80 30

Lulu 90 40

SUM 0
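The "big fat zero" can be confirmed directly from the table. A minimal sketch using the nine scores above; the dictionary layout is just one convenient way to hold them.

```python
scores = {
    "Amy": 10, "Theo": 20, "Max": 30, "Henry": 40, "Leticia": 50,
    "Charlotte": 60, "Pedro": 70, "Tricia": 80, "Lulu": 90,
}

# Mean of the nine scores.
mean_score = sum(scores.values()) / len(scores)

# Deviation score = score minus the mean.
deviations = {name: score - mean_score for name, score in scores.items()}

# Negative and positive deviations cancel exactly.
deviation_sum = sum(deviations.values())

print(mean_score, deviations["Amy"], deviations["Lulu"], deviation_sum)
```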

Page 26: Descriptive Statistics: Overview

Deviation measures

So we use squared deviations from the mean

Name        Score   Deviation   Sq. Deviation
Amy            10         -40            1600
Theo           20         -30             900
Max            30         -20             400
Henry          40         -10             100
Leticia        50           0               0
Charlotte      60          10             100
Pedro          70          20             400
Tricia         80          30             900
Lulu           90          40            1600
SUM                         0            6000

This is the sum of squares (SS)

$SS = \sum (X - \bar{X})^2$

Page 27: Descriptive Statistics: Overview

Variance

We take the “average” squared deviation from the mean and call it VARIANCE

For a population:

$\sigma^2 = \frac{SS}{N}$

For a sample:

$s^2 = \frac{SS}{n - 1}$

(dividing by n - 1 rather than n corrects for the fact that the sample variance tends to underestimate the population variance)

Page 28: Descriptive Statistics: Overview

Variance

1. Find the mean.

2. Subtract the mean from every score.

3. Square the deviations.

4. Sum the squared deviations.

5. Divide the SS by N (population) or n-1 (sample).

Name        Score   Dev’n   Sq. Dev.
Amy            10     -40       1600
Theo           20     -30        900
Max            30     -20        400
Henry          40     -10        100
Leticia        50       0          0
Charlotte      60      10        100
Pedro          70      20        400
Tricia         80      30        900
Lulu           90      40       1600
SUM                     0       6000

Sample variance: 6000/8 = 750
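The five steps can be followed literally in code. This is a minimal sketch on the nine scores from the table, cross-checked against `statistics.variance` (n - 1 denominator) and `statistics.pvariance` (N denominator).

```python
from statistics import pvariance, variance

scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n = len(scores)

mean_score = sum(scores) / n                     # 1. find the mean
deviations = [x - mean_score for x in scores]    # 2. subtract the mean from every score
squared = [d ** 2 for d in deviations]           # 3. square the deviations
ss = sum(squared)                                # 4. sum the squared deviations (SS)
sample_variance = ss / (n - 1)                   # 5. divide by n - 1 (sample)
population_variance = ss / n                     # 5. or by N (population)

print(ss, sample_variance, population_variance)
```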

Page 29: Descriptive Statistics: Overview

Standard deviation

The standard deviation is the square root of the variance

$s = \sqrt{s^2} = \sqrt{\frac{SS}{n - 1}}$

The standard deviation measures spread in the original units of measurement, while the variance does so in units squared.

Variance is good for inferential stats. Standard deviation is nice for descriptive stats.
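As a quick illustration, the sample standard deviation of the nine scores used in the variance example (10 through 90, SS = 6000) is the square root of 6000/8. The sketch compares the hand calculation with `statistics.stdev`, which also uses the n - 1 denominator.

```python
import math
from statistics import stdev

scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n = len(scores)
mean_score = sum(scores) / n

# Sum of squared deviations from the mean.
ss = sum((x - mean_score) ** 2 for x in scores)

s_squared = ss / (n - 1)   # sample variance, in squared units
s = math.sqrt(s_squared)   # sample standard deviation, in the original units

print(s_squared, round(s, 2))
```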

Page 30: Descriptive Statistics: Overview

Example

[Figure: two distributions with the same mean but different spreads. x-axis: Scores (0 to 100); y-axis: # of People (0 to 14)]

N = 28, $\bar{X} = 50$, $s^2 = 555.55$, $s = 23.57$

N = 28, $\bar{X} = 50$, $s^2 = 140.74$, $s = 11.86$

Page 31: Descriptive Statistics: Overview

Descriptive Statistics: Quick Review

Measures of Center: Mode, Median, Mean

Measures of Symmetry: Skewness

Measures of Spread: Range, Inter-quartile Range, Variance, Standard deviation

Page 32: Descriptive Statistics: Overview

Descriptive Statistics: Quick Review

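                     For a population:             For a sample:

Mean                 $\mu = \frac{\sum X}{N}$      $\bar{X} = \frac{\sum X}{n}$

Variance             $\sigma^2 = \frac{SS}{N}$     $s^2 = \frac{SS}{n - 1}$

Standard Deviation   $\sigma = \sqrt{\sigma^2}$    $s = \sqrt{s^2}$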

Page 33: Descriptive Statistics: Overview

Exercise

• Treat this little distribution as a sample and calculate:
  – Mode, median, mean
  – Range, variance, standard deviation

[Figure: small frequency distribution. x-axis values: 1 to 5]

Page 34: Descriptive Statistics: Overview

Descriptive Statistics: Overview

Measures of Center: Mode, Median, Mean

Measures of Symmetry: Skewness

Measures of Spread: Range, Inter-quartile Range, Variance, Standard deviation

Measures of Position: Percentile, Deviation Score, Z-score

Page 35: Descriptive Statistics: Overview

Measures of Position

How to describe a data point in relation to its distribution

Page 36: Descriptive Statistics: Overview

Measures of Position

• Quantile

• Deviation Score

• Z-score

Page 37: Descriptive Statistics: Overview

Quantiles

Quartile

Divides ranked scores into four equal parts

(minimum)  25% | 25%  (median)  25% | 25%  (maximum)

Page 38: Descriptive Statistics: Overview

Quantiles

Decile

Divides ranked scores into ten equal parts

10% | 10% | 10% | 10% | 10% | 10% | 10% | 10% | 10% | 10%

Page 39: Descriptive Statistics: Overview

Quantiles

Percentile rank

Divides ranked scores into 100 equal parts

$\text{Percentile rank of score } x = \frac{\text{number of scores less than } x}{\text{total number of scores}} \cdot 100$
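The percentile rank formula translates almost word for word into code. A minimal sketch, using the "strictly less than" definition given on the slide (other definitions also count ties); the function name and the nine-score sample are illustrative.

```python
def percentile_rank(scores, x):
    """Percentage of scores that are strictly less than x."""
    below = sum(1 for score in scores if score < x)
    return below / len(scores) * 100

# Illustrative sample of nine scores.
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]

rank_of_70 = percentile_rank(scores, 70)   # six of nine scores are below 70
rank_of_10 = percentile_rank(scores, 10)   # no score is below the minimum

print(round(rank_of_70, 1), rank_of_10)
```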

Page 40: Descriptive Statistics: Overview

Deviation Scores

Score Deviation

Amy 10 -40

Theo 20 -30

Max 30 -20

Henry 40 -10

Leticia 50 0

Charlotte 60 10

Pedro 70 20

Tricia 80 30

Lulu 90 40

Average 50

For a population:

$\text{deviation} = X - \mu$

For a sample:

$\text{deviation} = X - \bar{X}$

Page 41: Descriptive Statistics: Overview

• What if we want to compare scores from distributions that have different means and standard deviations?

• Example
  – Nine students' scores on two different tests
  – Tests scored on different scales

Page 42: Descriptive Statistics: Overview

Nine Students on Two Tests

Test 1 Test 2

Amy 10 1

Theo 20 2

Max 30 3

Henry 40 4

Leticia 50 5

Charlotte 60 6

Pedro 70 7

Tricia 80 8

Lulu 90 9

Average 50 5

Page 43: Descriptive Statistics: Overview

Nine Students on Two Tests

Name        Test 1   Test 2   Deviation Score 1   Deviation Score 2
Amy             10        1                 -40                  -4
Theo            20        2                 -30                  -3
Max             30        3                 -20                  -2
Henry           40        4                 -10                  -1
Leticia         50        5                   0                   0
Charlotte       60        6                  10                   1
Pedro           70        7                  20                   2
Tricia          80        8                  30                   3
Lulu            90        9                  40                   4
Average         50        5

Page 44: Descriptive Statistics: Overview

Z-Scores

• Z-scores modify a distribution so that it is centered on 0 with a standard deviation of 1

• Subtract the mean from a score, then divide by the standard deviation

For a population:

$z = \frac{X - \mu}{\sigma}$

For a sample:

$z = \frac{X - \bar{X}}{s}$

Page 45: Descriptive Statistics: Overview

Z-Scores

Name        Test 1   Test 2   Z-Score 1   Z-Score 2
Amy             10        1        -1.5        -1.5
Theo            20        2        -1.2        -1.2
Max             30        3        -.77        -.77
Henry           40        4        -.39        -.39
Leticia         50        5           0           0
Charlotte       60        6         .39         .39
Pedro           70        7         .77         .77
Tricia          80        8         1.2         1.2
Lulu            90        9         1.5         1.5
Average         50        5           0           0
St Dev        25.8     2.58           1           1
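The point of the table, that two tests on different scales give the same z-scores, can be reproduced with a short sketch. It assumes the population formula (N in the denominator), since that is what reproduces the St Dev values of 25.8 and 2.58 shown; `z_scores` is an illustrative helper name.

```python
from statistics import mean, pstdev

def z_scores(scores):
    """Subtract the mean from each score, then divide by the standard deviation."""
    m = mean(scores)
    sd = pstdev(scores)  # population SD (divide by N); an assumption, see above
    return [(x - m) / sd for x in scores]

test1 = [10, 20, 30, 40, 50, 60, 70, 80, 90]
test2 = [1, 2, 3, 4, 5, 6, 7, 8, 9]

z1 = z_scores(test1)
z2 = z_scores(test2)

print(round(pstdev(test1), 1), round(pstdev(test2), 2))
print([round(z, 2) for z in z1])
print([round(z, 2) for z in z2])
```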

Page 46: Descriptive Statistics: Overview

Z-Scores

A distribution of Z-scores…

• Always has a mean of zero

• Always has a standard deviation of 1

• Converting to standard or z scores does not change the shape of the distribution: z scores cannot normalize a non-normal distribution

A Z-score is interpreted as “number of standard deviations above/below the mean”
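These three properties can be checked numerically on a deliberately skewed sample. The data are hypothetical, the population standard deviation is assumed, and skewness is measured here as the mean of the cubed z-scores (one common definition, chosen for illustration).

```python
from statistics import mean, pstdev

def standardize(scores):
    """Convert scores to z-scores using the population standard deviation."""
    m = mean(scores)
    sd = pstdev(scores)
    return [(x - m) / sd for x in scores]

def skewness(scores):
    """Population skewness: the mean of the cubed z-scores."""
    return mean([z ** 3 for z in standardize(scores)])

# Hypothetical sample with a long right tail.
raw = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
z = standardize(raw)

z_mean = mean(z)
z_sd = pstdev(z)
skew_raw = skewness(raw)
skew_z = skewness(z)

print(round(z_mean, 6), round(z_sd, 6), round(skew_raw, 3), round(skew_z, 3))
```

The skewness is the same before and after standardizing, which is the sense in which z-scores leave the shape of the distribution untouched.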

Page 47: Descriptive Statistics: Overview

Exercise

On their third test, the class average was 45 and the standard deviation was 6. Fill in the rest.

Name    Test 3   Z-Score
Amy         52
Theo        39
Max                 -1.5
Henry                1.3
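One way to check answers to this exercise is to code the z-score formula and its inverse. This is a sketch, with the class average of 45 and standard deviation of 6 taken from the exercise; the function names are illustrative.

```python
CLASS_MEAN = 45
CLASS_SD = 6

def to_z(score, mean=CLASS_MEAN, sd=CLASS_SD):
    """Score to z-score: subtract the mean, then divide by the standard deviation."""
    return (score - mean) / sd

def from_z(z, mean=CLASS_MEAN, sd=CLASS_SD):
    """Z-score back to a raw score: multiply by the standard deviation, then add the mean."""
    return mean + z * sd

amy_z = to_z(52)
theo_z = to_z(39)
max_score = from_z(-1.5)
henry_score = from_z(1.3)

print(round(amy_z, 2), theo_z, max_score, round(henry_score, 1))
```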

Page 48: Descriptive Statistics: Overview

Descriptive Statistics: Quick Review

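                     For a population:              For a sample:

Mean                 $\mu = \frac{\sum X}{N}$       $\bar{X} = \frac{\sum X}{n}$

Variance             $\sigma^2 = \frac{SS}{N}$      $s^2 = \frac{SS}{n - 1}$

Standard Deviation   $\sigma = \sqrt{\sigma^2}$     $s = \sqrt{s^2}$

Z-score              $z = \frac{X - \mu}{\sigma}$   $z = \frac{X - \bar{X}}{s}$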

Page 49: Descriptive Statistics: Overview

Messing with Units

If you add or subtract a constant from each value in a distribution, then
• the mean is increased/decreased by that amount
• the standard deviation is unchanged
• the z-scores are unchanged

If you multiply or divide each value in a distribution by a constant, then
• the mean is multiplied/divided by that amount
• the standard deviation is multiplied/divided by that amount
• the z-scores are unchanged
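Both rules can be verified on a small sample. The sketch below uses eight scores with mean 6, adds 1 to each, and multiplies each by 10; the population standard deviation is assumed, and the helper name is illustrative.

```python
from statistics import mean, pstdev

def z_scores(scores):
    """Z-scores using the population standard deviation."""
    m = mean(scores)
    sd = pstdev(scores)
    return [(x - m) / sd for x in scores]

scores = [5, 3, 5, 7, 7, 8, 4, 9]
shifted = [x + 1 for x in scores]    # add a constant
scaled = [x * 10 for x in scores]    # multiply by a constant

for label, data in (("original", scores), ("plus 1", shifted), ("times 10", scaled)):
    print(label, mean(data), round(pstdev(data), 2),
          [round(z, 2) for z in z_scores(data)])
```

Only the mean moves under the shift, the mean and standard deviation both scale under the multiplication, and the z-scores come out the same in all three cases.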

Page 50: Descriptive Statistics: Overview

Example

Name        Score   Dev’n   Sq dev   Z-score
Theo            5      -1        1       -.5
Max             3      -3        9      -1.5
Henry           5      -1        1       -.5
Leticia         7       1        1        .5
Charlotte       7       1        1        .5
Pedro           8       2        4       1.0
Tricia          4      -2        4      -1.0
Lulu            9       3        9       1.5

MEAN 6   STDEV 1.94

Page 51: Descriptive Statistics: Overview

Adding 1

Name        Score   Dev’n   Sq dev   Z-score
Theo            6      -1        1       -.5
Max             4      -3        9      -1.5
Henry           6      -1        1       -.5
Leticia         8       1        1        .5
Charlotte       8       1        1        .5
Pedro           9       2        4       1.0
Tricia          5      -2        4      -1.0
Lulu           10       3        9       1.5

MEAN 7   STDEV 1.94

Page 52: Descriptive Statistics: Overview

Example

Name        Score   Dev’n   Sq dev   Z-score
Theo            5      -1        1       -.5
Max             3      -3        9      -1.5
Henry           5      -1        1       -.5
Leticia         7       1        1        .5
Charlotte       7       1        1        .5
Pedro           8       2        4       1.0
Tricia          4      -2        4      -1.0
Lulu            9       3        9       1.5

MEAN 6   STDEV 1.94

Page 53: Descriptive Statistics: Overview

Multiplying by 10

Name        Score   Dev’n   Sq dev   Z-score
Theo           50     -10      100       -.5
Max            30     -30      900      -1.5
Henry          50     -10      100       -.5
Leticia        70      10      100        .5
Charlotte      70      10      100        .5
Pedro          80      20      400       1.0
Tricia         40     -20      400      -1.0
Lulu           90      30      900       1.5

MEAN 60   STDEV 19.4

Page 54: Descriptive Statistics: Overview

Other Standardized Distributions

The Z distribution is not the only standardized distribution. You can easily create others (it’s just messing with units, really).

Page 55: Descriptive Statistics: Overview

Other Standardized Distributions

Example:

Let’s change these test scores into ETS type scores (mean 500, stdev 100)

Name        Score
Theo            5
Max             3
Henry           5
Leticia         7
Charlotte       7
Pedro           8
Tricia          4
Lulu            9
Average         6
St Dev       1.94

Page 56: Descriptive Statistics: Overview

Other Standardized Distributions

Here’s How:

1. Convert to Z scores
2. Multiply by 100 to increase the st dev
3. Add 500 to increase the mean

Name        Score   Z-Score   ETS type score
Theo            5       -.5              450
Max             3      -1.5              350
Henry           5       -.5              450
Leticia         7        .5              550
Charlotte       7        .5              550
Pedro           8       1.0              600
Tricia          4      -1.0              400
Lulu            9       1.5              650
Average         6         0              500
St Dev       1.94         1              100
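The three-step recipe (convert to z, multiply by the new standard deviation, add the new mean) can be sketched as a single function. The function name and the `z_decimals` option are illustrative; the option exists only because the ETS type scores on the slide (350, 450, ...) appear to be built from z-scores rounded to one decimal, which is an inference from the numbers rather than something the slide states. The population standard deviation is assumed.

```python
from statistics import mean, pstdev

def rescale(scores, new_mean, new_sd, z_decimals=None):
    """Convert scores to z-scores, then to a scale with the given mean and SD."""
    m = mean(scores)
    sd = pstdev(scores)  # population SD (divide by N); an assumption
    rescaled = []
    for x in scores:
        z = (x - m) / sd                       # step 1: convert to z
        if z_decimals is not None:
            z = round(z, z_decimals)           # optional: round z as on the slide
        rescaled.append(new_mean + new_sd * z)  # steps 2 and 3: stretch, then shift
    return rescaled

scores = [5, 3, 5, 7, 7, 8, 4, 9]

exact = rescale(scores, 500, 100)
rounded = rescale(scores, 500, 100, z_decimals=1)

print([round(v, 1) for v in exact])
print(rounded)
```

Without rounding, the rescaled scores have a mean of exactly 500 and a standard deviation of 100; with rounded z-scores they land on the round numbers shown on the slide.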

Page 57: Descriptive Statistics: Overview

Exercise

Name        Score   Percentile   Deviation Score   Z-Score   IQ type score (Mean 100, Stdev 10)
Theo           20
Max            18
Henry          13
Leticia        17
Charlotte      19
Pedro          16
Tricia         11
Lulu            9