
Numeric Summaries and Descriptive Statistics




populations vs. samples

• we want to describe both samples and populations

• the latter is a matter of inference…


“outliers”

• minority cases, so different from the majority that they merit separate consideration
  – are they errors?
  – are they indicative of a different pattern?

• think about possible outliers with care, but beware of mechanical treatments…

• significance of outliers depends on your research interests


summaries of distributions

• graphic vs. numeric
  – graphic summaries may be better for visualization
  – numeric summaries are better for statistical/inferential purposes

• resistance to outliers is usually an advantage in either case


general characteristics

• kurtosis

[figure: density curves (D vs. X) contrasting ‘leptokurtic’ and ‘platykurtic’ distributions; kurtosis as “peakedness”]


• skew (skewness)

[figure: two density curves (D vs. X), one with right (positive) skew and one with left (negative) skew]


central tendency

• measures of central tendency
  – provide a sense of the typical value expressed by multiple cases, taken together…

• mean

• median

• mode


mean

• center of gravity

• evenly partitions the sum of all measurements among all cases; the average of all measures

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$


mean – pro and con

• crucial for inferential statistics

• the mean is not very resistant to outliers

• a “trimmed mean” may be better for descriptive purposes


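mean

rim diameter (cm)

unit 1   unit 2
  12.6     16.2
  11.6     16.4
  16.3     13.8
  13.1     13.2
  12.1     11.3
  26.9     14.0
   9.7      9.0
  11.5     12.5
  14.8     15.6
  13.5     11.2
  12.4     12.2
  13.6     15.5
           11.7

n            12       13
total     168.1    172.6
total/n    14.0     13.3

back-to-back stem-and-leaf display (stem = whole cm, leaf = tenths; unit 1 leaves left of the stem, unit 2 leaves right)

unit 1 |  stem | unit 2
     9 |    26 |
       | 17–25 |
     3 |    16 | 24
       |    15 | 56
     8 |    14 | 0
   651 |    13 | 28
   641 |    12 | 25
    65 |    11 | 237
       |    10 |
     7 |     9 | 0

means: unit 1 = 14.0 cm, unit 2 = 13.3 cm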

R: mean(x)
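As a cross-check on the table, here is a minimal Python sketch of what R's `mean(x)` computes (Python rather than R is an assumption made for illustration). `UNIT_1` and `UNIT_2` are hypothetical names for the rim diameters transcribed from the table, and `mean` is a hand-written helper, not a library function.

```python
# Rim diameters (cm) transcribed from the unit 1 / unit 2 table (illustrative names).
UNIT_1 = [12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6]
UNIT_2 = [16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7]


def mean(xs):
    """Evenly partition the total of all measurements among the n cases."""
    return sum(xs) / len(xs)


for name, xs in (("unit 1", UNIT_1), ("unit 2", UNIT_2)):
    print(f"{name}: n={len(xs)}, total={sum(xs):.1f}, mean={mean(xs):.1f}")
```

It should print totals of 168.1 and 172.6 cm and means of 14.0 and 13.3 cm, matching the table.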


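trimmed mean

rim diameter (cm), sorted

unit 1   unit 2
   9.7      9.0
  11.5     11.2
  11.6     11.3
  12.1     11.7
  12.4     12.2
  12.6     12.5
  13.1     13.2
  13.5     13.8
  13.6     14.0
  14.8     15.5
  16.3     15.6
  26.9     16.2
           16.4

one case trimmed from each end: 9.7 and 26.9 (unit 1), 9.0 and 16.4 (unit 2)

n            10       11
total     131.5    147.2
total/n    13.2     13.4

back-to-back stem-and-leaf display (stem = whole cm, leaf = tenths; unit 1 leaves left of the stem, unit 2 leaves right)

unit 1 |  stem | unit 2
     9 |    26 |
       | 17–25 |
     3 |    16 | 24
       |    15 | 56
     8 |    14 | 0
   651 |    13 | 28
   641 |    12 | 25
    65 |    11 | 237
       |    10 |
     7 |     9 | 0

trimmed means: unit 1 = 13.2 cm, unit 2 = 13.4 cm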

R: mean(x, trim=.1)
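A sketch of the trimmed mean, assuming R's rule of dropping floor(n × trim) cases from each end of the sorted data. For trim = 0.1 and n = 12 or 13 that is one case per end, which agrees with the n of 10 and 11 in the table. `trimmed_mean` is a hypothetical helper written for this example.

```python
import math

# Rim diameters (cm) for the two units (illustrative names).
UNIT_1 = [12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6]
UNIT_2 = [16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7]


def trimmed_mean(xs, trim=0.1):
    """Mean after dropping floor(n * trim) cases from each end of the sorted data.

    Assumes the same floor rule as R's mean(x, trim=...); trim must be below 0.5.
    """
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    ordered = sorted(xs)
    k = math.floor(len(ordered) * trim)  # number of cases dropped at each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


for name, xs in (("unit 1", UNIT_1), ("unit 2", UNIT_2)):
    print(f"{name}: trimmed mean = {trimmed_mean(xs):.2f} cm")
```

For these data it gives 13.15 and about 13.38 cm, which round to the 13.2 and 13.4 in the table.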


median

• 50th percentile…

• less useful for inferential purposes

• more resistant to effects of outliers…


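median

rim diameter (cm), sorted

unit 1          unit 2
   9.7             9.0
  11.5            11.2
  11.6            11.3
  12.1            11.7
  12.4            12.2
  12.6            12.5
 (12.85) <--      13.2 <--
  13.1            13.8
  13.5            14.0
  13.6            15.5
  14.8            15.6
  16.3            16.2
  26.9            16.4

back-to-back stem-and-leaf display (stem = whole cm, leaf = tenths; unit 1 leaves left of the stem, unit 2 leaves right)

unit 1 |  stem | unit 2
     9 |    26 |
       | 17–25 |
     3 |    16 | 24
       |    15 | 56
     8 |    14 | 0
   651 |    13 | 28
   641 |    12 | 25
    65 |    11 | 237
       |    10 |
     7 |     9 | 0

medians: unit 1 = 12.85 cm (midway between 12.6 and 13.1), unit 2 = 13.20 cm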
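The median rule (middle value for odd n, mean of the two middle values for even n) can be sketched in Python as below. The hand-written `median` helper is hypothetical and is checked against the standard library's `statistics.median`; the last lines drop the 26.9 cm rim to show how much less the median moves than the mean.

```python
import statistics

# Rim diameters (cm) for the two units (illustrative names).
UNIT_1 = [12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6]
UNIT_2 = [16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7]


def median(xs):
    """50th percentile: the middle sorted value, or the mean of the two middle values if n is even."""
    ordered = sorted(xs)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


for name, xs in (("unit 1", UNIT_1), ("unit 2", UNIT_2)):
    # statistics.median should agree with the hand-written helper.
    print(f"{name}: median = {median(xs):.2f} cm (statistics.median: {statistics.median(xs):.2f})")

# Resistance check: drop the 26.9 cm rim from unit 1 and compare mean vs. median.
reduced = [x for x in UNIT_1 if x != 26.9]
print(f"unit 1 without 26.9: mean = {statistics.mean(reduced):.2f}, median = {median(reduced):.2f}")
```

Dropping that one case moves the unit 1 mean from about 14.01 to 12.84 cm, but the median only from 12.85 to 12.60 cm.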


mode

• the most numerous category

• for ratio data, often implies that the data have been grouped in some way

• the modal class can be created, to a greater or lesser extent, by the grouping procedure

• for theoretical distributions: simply the location of the peak of the frequency distribution
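To illustrate how the mode of measured data depends on grouping, this sketch bins the rim diameters into classes of a chosen width and reports the most numerous class(es). The class widths (1 cm and 2 cm) and the helper `modal_classes` are assumptions for illustration; ties are returned as a list rather than broken arbitrarily.

```python
import math
from collections import Counter

# Rim diameters (cm) for the two units (illustrative names).
UNIT_1 = [12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6]
UNIT_2 = [16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7]


def modal_classes(xs, width):
    """Group xs into classes [k*width, (k+1)*width) and return the lower bound(s)
    of the most numerous class; tied classes are all returned, in ascending order."""
    counts = Counter(math.floor(x / width) * width for x in xs)
    top = max(counts.values())
    return sorted(lower for lower, count in counts.items() if count == top)


for width in (1, 2):  # class widths in cm, chosen only for illustration
    print(f"{width}-cm classes: unit 1 -> {modal_classes(UNIT_1, width)}, "
          f"unit 2 -> {modal_classes(UNIT_2, width)}")
```

For unit 2, 1-cm classes put the mode at 11–12 cm while 2-cm classes put it at 12–14 cm; for unit 1, 1-cm classes give a tie between 12–13 and 13–14 cm.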


[figure: bar chart of settlement categories (isolated scatters, hamlets, villages, regional centers)]

modal class = ‘hamlets’


dispersion

• measures of dispersion
  – summarize the degree of clustering of cases, esp. with respect to central tendency…

• range

• variance

• standard deviation


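range

rim diameter (cm), sorted

unit 1   unit 2
   9.7      9.0
  11.5     11.2
  11.6     11.3
  12.1     11.7
  12.4     12.2
  12.6     12.5
  13.1     13.2
  13.5     13.8
  13.6     14.0
  14.8     15.5
  16.3     15.6
  26.9     16.2
           16.4

back-to-back stem-and-leaf display (stem = whole cm, leaf = tenths; unit 1 leaves left of the stem, unit 2 leaves right)

unit 1 |  stem | unit 2
     9 |    26 |
       | 17–25 |
     3 |    16 | 24
       |    15 | 56
     8 |    14 | 0
   651 |    13 | 28
   641 |    12 | 25
    65 |    11 | 237
       |    10 |
     7 |     9 | 0

ranges: unit 1 = 9.7 to 26.9 cm; unit 2 = 9.0 to 16.4 cm

• would be better to use the midspread…

R: range(x)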
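A sketch comparing the range with the midspread (interquartile range). The quartiles use `statistics.quantiles(..., method="inclusive")`, whose linear interpolation matches R's default `quantile()` rule (type 7); other quartile conventions give slightly different values. `value_range` and `midspread` are hypothetical helper names.

```python
import statistics

# Rim diameters (cm) for the two units (illustrative names).
UNIT_1 = [12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6]
UNIT_2 = [16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7]


def value_range(xs):
    """Return (min, max, max - min); R's range(x) returns just the first two."""
    return min(xs), max(xs), max(xs) - min(xs)


def midspread(xs):
    """Interquartile range Q3 - Q1, quartiles by linear interpolation."""
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return q3 - q1


for name, xs in (("unit 1", UNIT_1), ("unit 2", UNIT_2)):
    lo, hi, spread = value_range(xs)
    print(f"{name}: range {lo}-{hi} ({spread:.1f} cm), midspread {midspread(xs):.3f} cm")
```

With these data the unit 1 range (17.2 cm) is more than twice the unit 2 range (7.4 cm), driven by the single 26.9 cm rim, while the unit 1 midspread (1.925 cm) is about half the unit 2 midspread (3.8 cm).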


variance

• analogous to average deviation of cases from mean

• in fact, based on sum of squared deviations from the mean—“sum-of-squares”

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

R: var(x)
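A direct transcription of the definitional formula into Python, checked against `statistics.variance`, which (like R's `var(x)`) divides by n - 1. `sample_variance` is a hypothetical helper; only unit 1 is used to keep the sketch short.

```python
import statistics

# Unit 1 rim diameters (cm); illustrative name.
UNIT_1 = [12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6]


def sample_variance(xs):
    """Sum of squared deviations from the mean ("sum-of-squares") divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two cases")
    xbar = sum(xs) / n
    sum_of_squares = sum((x - xbar) ** 2 for x in xs)
    return sum_of_squares / (n - 1)


print(f"hand-written: {sample_variance(UNIT_1):.2f} cm^2")
print(f"statistics.variance: {statistics.variance(UNIT_1):.2f} cm^2")
```

For unit 1 both give about 19.29 cm².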


variance

• computational form:

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}$$
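The computational form was convenient for hand calculation because it needs only running sums of x and x². The sketch below compares it with the definitional form (both helper names are hypothetical). One well-known floating-point caveat: subtracting two large, nearly equal sums can lose precision, so for computer work the definitional (two-pass) form is usually the safer choice. The shifted data are an artificial case constructed only to expose this.

```python
# Unit 1 rim diameters (cm); illustrative name.
UNIT_1 = [12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6]


def sample_variance_computational(xs):
    """(sum of x^2 - (sum of x)^2 / n) / (n - 1), accumulated in a single pass."""
    n = len(xs)
    sum_x = sum_x2 = 0.0
    for x in xs:
        sum_x += x
        sum_x2 += x * x
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


def sample_variance_definitional(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


print(sample_variance_computational(UNIT_1), sample_variance_definitional(UNIT_1))

# Adding a large constant leaves the true variance unchanged, but the
# one-pass formula now subtracts two huge, nearly equal sums.
shifted = [x + 1e9 for x in UNIT_1]
print(sample_variance_computational(shifted), sample_variance_definitional(shifted))
```

On the original data the two forms agree; on the shifted data only the definitional form should stay close to 19.29.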


• note: units of variance are squared…

• this makes variance hard to interpret

• ex.: projectile point sample: mean = 22.6 mm, variance = 38 mm²

• what does this mean???


standard deviation

• square root of variance:

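$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}}$$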
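A sketch of the square-root relationship, using the projectile point variance (38 mm²) and the rim diameters. `statistics.stdev`, like R's `sd(x)`, uses the n - 1 denominator; the list names are illustrative.

```python
import math
import statistics

# Rim diameters (cm) for the two units (illustrative names).
UNIT_1 = [12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6]
UNIT_2 = [16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7]

# Projectile point example: variance = 38 mm^2, so s = sqrt(38) mm.
print(f"projectile points: s = {math.sqrt(38):.2f} mm")

for name, xs in (("unit 1", UNIT_1), ("unit 2", UNIT_2)):
    # statistics.stdev uses n - 1, so it equals sqrt(statistics.variance).
    print(f"{name}: s = {statistics.stdev(xs):.2f} cm, "
          f"sqrt(variance) = {math.sqrt(statistics.variance(xs)):.2f} cm")
```

The square root of 38 is about 6.16, which rounds to 6.2 mm; the rim samples give about 4.39 cm (unit 1) and 2.24 cm (unit 2).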


standard deviation

• expressed in the same units as the base measurements

• ex.: projectile point sample: mean = 22.6 mm, standard deviation = 6.2 mm

• mean +/- sd (16.4–28.8 mm)
  – should give at least some intuitive sense of where most of the cases lie, barring major effects of outliers
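A sketch of the mean +/- sd interval, applied first to the projectile point figures and then to the rim diameters, counting how many cases fall inside. `within_one_sd` is a hypothetical helper and the list names are illustrative.

```python
import statistics

# Rim diameters (cm) for the two units (illustrative names).
UNIT_1 = [12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6]
UNIT_2 = [16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7]

# Projectile point example: mean 22.6 mm, sd 6.2 mm.
print(f"projectile points: {22.6 - 6.2:.1f}-{22.6 + 6.2:.1f} mm")


def within_one_sd(xs):
    """Return (low, high, count): the interval mean - s .. mean + s and how many cases it holds."""
    m = statistics.mean(xs)
    s = statistics.stdev(xs)
    low, high = m - s, m + s
    count = sum(low <= x <= high for x in xs)
    return low, high, count


for name, xs in (("unit 1", UNIT_1), ("unit 2", UNIT_2)):
    low, high, count = within_one_sd(xs)
    print(f"{name}: {low:.1f}-{high:.1f} cm holds {count} of {len(xs)} cases")
```

For unit 1 the interval is about 9.6–18.4 cm and holds 11 of 12 cases, but it is stretched by the 26.9 cm rim; for unit 2 it is about 11.0–15.5 cm and holds 9 of 13.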