
Math 103 02 Central Tendency and Spread_1


Page 1: Math 103 02 Central Tendency and Spread_1

7/27/2019 Math 103 02 Central Tendency and Spread_1

http://slidepdf.com/reader/full/math-103-02-central-tendency-and-spread1 1/9


Math 103 Statistics and Probability

Central Tendency and Spread

CJD

Characteristics of Data

Center: A representative or average value that indicates where the middle of the data set is located

Variation: A measure of the amount that the values vary among themselves

Distribution: The nature or shape of the distribution of data (such as bell-shaped, uniform, or skewed)

Outliers: Sample values that lie very far away from the vast majority of other sample values

Measures of Center

a value at the center or middle of a data set

Notation:

Σ denotes the addition of a set of values

x is the variable usually used to represent the individual data values

n represents the number of data values in a sample

N represents the number of data values in a population

Mean

Mean (Arithmetic Mean), or AVERAGE

the number obtained by adding the values and dividing the total by the number of values

$$\bar{x} = \frac{\sum x}{n} \qquad \mu = \frac{\sum x}{N}$$

$\bar{x}$ is pronounced 'x-bar' and denotes the mean of a set of sample values

µ is pronounced 'myu' and denotes the mean of all values in a population

Calculators can calculate the mean of data


Mean

Example: Find the mean of the following weights (in kg) of sample carry-on luggage presented at an airport check-in counter in the last hour.

6.72 3.46 3.60 6.44 26.70

Solution :

Sum of all weights = 6.72 + 3.46 + 3.60 + 6.44 + 26.70 = 46.92

Number of weights = 5

Mean = 46.92 / 5 = 9.384 kg.

Notice the impact of the outlier 26.70 on the mean.
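The arithmetic above can be checked with a short Python sketch. The variable names are illustrative, and the second calculation (dropping 26.70) is an added comparison that is not on the slide; it is only there to show how far the outlier pulls the mean.

```python
from statistics import mean

# Carry-on luggage weights in kg, copied from the example above.
weights = [6.72, 3.46, 3.60, 6.44, 26.70]

# Sample mean: sum of the values divided by the number of values.
sample_mean = sum(weights) / len(weights)

# Same calculation with the outlier (26.70) left out, for comparison only.
mean_without_outlier = mean([w for w in weights if w != 26.70])

print(round(sample_mean, 3))           # 9.384
print(round(mean_without_outlier, 3))  # 5.055
```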

Median

the middle value when the original data values are arranged in order of increasing magnitude

often denoted by $\tilde{x}$ (pronounced 'x-tilde') or by $\tilde{\mu}$ (pronounced 'myu-tilde')

is not affected by an extreme value

Median

Odd number of values:

unsorted: 6.72 3.46 3.60 6.44 26.70

sorted: 3.46 3.60 6.44 6.72 26.70

exact middle: MEDIAN is 6.44

Even number of values:

unsorted: 6.72 3.46 3.60 6.44

sorted: 3.46 3.60 6.44 6.72

no exact middle, shared by two numbers: (3.60 + 6.44) / 2

MEDIAN is 5.02
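The odd and even cases can be sketched in Python as follows. The helper name `middle_value` is hypothetical; the standard library's `statistics.median` applies the same rule and is used only as a cross-check.

```python
from statistics import median

odd = [6.72, 3.46, 3.60, 6.44, 26.70]
even = [6.72, 3.46, 3.60, 6.44]

def middle_value(values):
    """Median: sort first, then take the middle (or average the middle two)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # exact middle
    return (ordered[mid - 1] + ordered[mid]) / 2  # shared by two numbers

median_odd = middle_value(odd)
median_even = middle_value(even)

print(median_odd)             # 6.44
print(round(median_even, 2))  # 5.02
```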

Mode

- the score that occurs most frequently
- Unimodal, Bimodal, Multimodal or No Mode
- denoted by M
- the only measure of central tendency that can be used with nominal data

a. 5 5 5 3 1 5 1 4 3 5          Mode is 5

b. 1 2 2 2 3 4 5 6 6 6 7 9      Bimodal - 2 and 6

c. 1 2 3 6 7 8 9 10             No Mode
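A minimal Python sketch of the three cases above. The helper name `modes` is hypothetical, and treating a data set as having no mode when no value repeats is an assumption taken from example c; other conventions exist for ties.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest count, or [] if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # assumption: no repeated value means "No Mode"
    return [value for value, count in counts.items() if count == highest]

mode_a = modes([5, 5, 5, 3, 1, 5, 1, 4, 3, 5])
mode_b = modes([1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9])
mode_c = modes([1, 2, 3, 6, 7, 8, 9, 10])

print(mode_a)  # [5]
print(mode_b)  # [2, 6]
print(mode_c)  # []
```

Because the helper only counts occurrences, it also works on nominal data such as a list of car makers.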


Qualitative Data

Maker        Model     Units Sold
Toyota       Vios      1,098
Toyota       Prius     104
Toyota       Altis     725
Toyota       Innova    417
Honda        Civic     732
Honda        City      960
Honda        CRV       459
Sarao        Jeepney   1,243
Mitsubishi   Lancer    581
Mitsubishi   Pajero    376

Mode: Sarao Jeepney

Comparison

Mean

• most useful
• easiest to compute for large n
• uses all data
• does not vary much from sample to sample
• distribution of means is well known
• affected by outliers

Median

• second most useful
• not affected by outliers, so gives a truer average
• easy to compute if data is sorted or n is small
• varies greatly from sample to sample

Mode

• can be used also for nominal data
• hardly requires any calculation if data is sorted
• may not exist or may not be unique
• not useful for small n

Weighted Mean

Each individual value x may have a weight w associated with it.

$$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$$

Example: A talent show is judged 40% execution, 30% difficulty, 20% originality and 10% audience impact. If a contestant scored 8, 9, 6 and 7, the weighted mean is

(0.40)(8) + (0.30)(9) + (0.20)(6) + (0.10)(7) = 3.2 + 2.7 + 1.2 + 0.7 = 7.8
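The talent-show calculation can be sketched in Python as below. Expressing the weights as whole percentages is an illustrative choice, not something the slide prescribes; dividing by their sum (100) gives the same weighted mean.

```python
# Judging weights in percent and the contestant's scores, in the same order:
# execution, difficulty, originality, audience impact.
weights = [40, 30, 20, 10]
scores = [8, 9, 6, 7]

# Weighted mean: sum of (w * x) divided by the sum of the weights.
weighted_mean = sum(w * x for w, x in zip(weights, scores)) / sum(weights)

print(weighted_mean)  # 7.8
```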

Raw Data

Raw Data: Test Scores in a Statistics Test

60 74 74 58 72
58 82 52 26 72
66 66 60 92 78
46 38 50 66 50
62 64 68 62 84
54 66 66 44 60
84 70 76 72 66
70 64 52 40 78
76 42 50 64 48
64 40 82 54 74


Sorted Data

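Sorted test scores (ascending, reading down each column):

26 50 62 66 74
38 52 62 66 76
40 52 64 68 76
40 54 64 70 78
42 54 64 70 78
44 58 64 72 82
46 58 66 72 82
48 60 66 72 84
50 60 66 74 84
50 60 66 74 92

Applying the formulas (and using a calculator) we get:

$\mu = 62.7$

$\tilde{\mu} = 64$

$M = 66$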


Measures of Spread or Variation

Range

Mean Deviation

Variance

Standard Deviation

Range and Midrange

Range = Highest Value - Lowest Value (a measure of spread)

Mid-Range = (Highest + Lowest) / 2 (a measure of center)

In the test scores example (lowest score 26, highest score 92):

Range = 92 - 26 = 66

Mid-Range = (92 + 26) / 2 = 59
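A small Python sketch of both formulas, applied to the 50 test scores used throughout these slides. The list name `sorted_scores` is illustrative.

```python
# The 50 statistics test scores from the slides, in ascending order.
sorted_scores = [
    26, 38, 40, 40, 42, 44, 46, 48, 50, 50,
    50, 52, 52, 54, 54, 58, 58, 60, 60, 60,
    62, 62, 64, 64, 64, 64, 66, 66, 66, 66,
    66, 66, 68, 70, 70, 72, 72, 72, 74, 74,
    74, 76, 76, 78, 78, 82, 82, 84, 84, 92,
]

highest = max(sorted_scores)
lowest = min(sorted_scores)

data_range = highest - lowest      # a measure of spread
mid_range = (highest + lowest) / 2  # a measure of center

print(data_range)  # 66
print(mid_range)   # 59.0
```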

Mean Deviation

Mean Dev of a Sample:

$$\frac{\sum |x - \bar{x}|}{n}$$

Mean Dev of a Population:

$$\frac{\sum |x - \mu|}{N}$$

Example: Weights of Carry-on Luggage

6.72 3.46 3.60 6.44 26.70

Mean = 9.384    Range = 26.70 - 3.46 = 23.24

Mean Deviation =

$$\frac{|6.72 - 9.384| + |3.46 - 9.384| + |3.60 - 9.384| + |6.44 - 9.384| + |26.70 - 9.384|}{5} = \frac{34.632}{5} = 6.926$$
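The mean deviation of the luggage weights can be reproduced with a few lines of Python. This is a sketch using illustrative variable names and the sample form of the formula (absolute deviations from the mean, divided by n).

```python
weights = [6.72, 3.46, 3.60, 6.44, 26.70]  # kg, from the example

n = len(weights)
x_bar = sum(weights) / n

# Average absolute distance of each value from the mean.
total_abs_deviation = sum(abs(x - x_bar) for x in weights)
mean_deviation = total_abs_deviation / n

print(round(total_abs_deviation, 3))  # 34.632
print(round(mean_deviation, 3))       # 6.926
```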


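Variance

Population Variance:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Sample Variance:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Using n-1 will reduce the bias. Using n will underestimate variance.

Computing Formula for Variance

$$s^2 = \frac{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}{n(n-1)}$$

$$\sigma^2 = \frac{N \sum_{i=1}^{N} x_i^2 - \left( \sum_{i=1}^{N} x_i \right)^2}{N^2}$$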

Variance Example

Example: Weights of sample Carry-on Luggage

6.72 3.46 3.60 6.44 26.70

  x_i      x_i^2
  6.72     45.158
  3.46     11.972
  3.60     12.960
  6.44     41.474
 26.70    712.890
 46.92    824.454   (totals)

$$s^2 = \frac{5(824.454) - (46.92)^2}{5(4)} = 96.039$$
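As a check on the worked example, the sketch below computes the sample variance of the luggage weights two ways: from the definition (squared deviations over n - 1) and from the computing formula. Variable names are illustrative; `statistics.variance` from the Python standard library is used as an independent cross-check.

```python
from statistics import variance

weights = [6.72, 3.46, 3.60, 6.44, 26.70]  # kg, from the example
n = len(weights)
x_bar = sum(weights) / n

# Definition: sum of squared deviations from the mean, divided by n - 1.
by_definition = sum((x - x_bar) ** 2 for x in weights) / (n - 1)

# Computing formula: uses only the sum of x and the sum of x squared.
sum_x = sum(weights)
sum_x2 = sum(x * x for x in weights)
by_shortcut = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

print(round(by_definition, 3))      # 96.039
print(round(by_shortcut, 3))        # 96.039
print(round(variance(weights), 3))  # 96.039
```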

Standard Deviation

Population SD: $\sigma = \sqrt{\sigma^2}$

Sample SD: $s = \sqrt{s^2}$

In example,

$s = \sqrt{96.039} = 9.800$
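In Python, the standard library's `statistics.stdev` (divides by n - 1) and `statistics.pstdev` (divides by N) correspond to the sample and population formulas. The sketch below applies both to the luggage weights; computing the population figure for this data is an added illustration, since the slide treats the weights as a sample.

```python
from statistics import pstdev, stdev

weights = [6.72, 3.46, 3.60, 6.44, 26.70]  # kg, from the example

s = stdev(weights)       # sample SD: square root of the n - 1 variance
sigma = pstdev(weights)  # population SD: square root of the N variance

print(round(s, 3))  # 9.8
print(sigma < s)    # True: dividing by N gives the smaller value
```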

Symbols for Standard Deviation

Source                          Sample    Population
Textbook                        s         σ
Some graphics calculators       Sx        σx
Some non-graphics calculators   xσn-1     xσn
Excel variance                  var       varp


Comparison

Range

• easiest to use and interpret
• not useful for large n
• says nothing about distribution of data between max and min

Mean Deviation

• considers all data relative to the mean
• simple but awkward to compute

Variance / SD

• most useful
• considers all data relative to the mean
• computation more involved
• interpretation not straightforward

Coefficient of Variation

To compare spreads of samples/populations with different means

Population: $CV = \frac{\sigma}{\mu} \times 100\%$

Sample: $CV = \frac{s}{\bar{x}} \times 100\%$

Example: The prelim exam grades of a Statistics class and a Calculus class are summarized below:

Subject      Mean   Standard Deviation   Coefficient of Variation
Statistics   68.2   22.0                 32.26%
Calculus     56.5   17.8                 31.50%

Therefore, the statistics grades are relatively only slightly more variable than the calculus grades.
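The two coefficients in the table can be reproduced with a short Python sketch. The function name `coefficient_of_variation` is hypothetical; it simply divides the standard deviation by the mean and scales to percent.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100

cv_statistics = coefficient_of_variation(22.0, 68.2)
cv_calculus = coefficient_of_variation(17.8, 56.5)

print(f"{cv_statistics:.2f}%")  # 32.26%
print(f"{cv_calculus:.2f}%")    # 31.50%
```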

z scores

z Score (or standard score)

- A measure of position relative to other data
- the number of standard deviations that a given value x is above or below the mean

Sample: $z = \frac{x - \bar{x}}{s}$

Population: $z = \frac{x - \mu}{\sigma}$

Round to 2 decimal places

Example

A student scored 67 in a calculus test and 74 in a statistics test. If the calculus test has a mean of 53 with SD of 8, and the statistics test has a mean of 65 with SD of 6, did the student fare better relative to his classmates in calculus or in statistics?

Calculus: z = (67-53)/8 = 1.75

Statistics: z = (74-65)/6 = 1.50

Conclusion: The student fared better in Calculus
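The comparison above can be sketched in Python. The function name `z_score` is illustrative; it applies the same formula to both tests and rounds to 2 decimal places.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return round((x - mean) / sd, 2)

z_calculus = z_score(67, mean=53, sd=8)
z_statistics = z_score(74, mean=65, sd=6)

print(z_calculus)    # 1.75
print(z_statistics)  # 1.5

# The larger z score marks the test where the student stood higher
# relative to classmates.
better = "Calculus" if z_calculus > z_statistics else "Statistics"
print(better)        # Calculus
```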


Interpreting z scores

[Figure: Z axis marked -3, -2, -1, 0, 1, 2, 3, with one region labeled "Ordinary Values" and two regions labeled "Unusual Values"]


Measures of Location or Position

Percentiles – 100 parts (in 1%)

Deciles  – 10 parts (in 10%)

Quartiles  – 4 parts (in 25%)

Fractiles or Quantiles

Percentiles

P1, P2, P3, …, P98, P99

i% of the data falls below (<=) Pi


Deciles

D1, D2, D3, D4, D5, D6, D7, D8, D9

divides ranked data into ten equal parts

10% 10% 10% 10% 10% 10% 10% 10% 10% 10%

D1 D2 D3 D4 D5 D6 D7 D8 D9

i*10% of the data falls below (<=) Di

D9 is the 90th Percentile or P90


Quartiles

Q1, Q2, Q3

divides ranked scores into four equal parts

25% 25% 25% 25%

(minimum) Q1 Q2 (median) Q3 (maximum)

i*25% of the data falls below (<=) Qi

Q2 is the 50th percentile P50 or 5th decile D5

Example

Using the 50 sorted test scores:

P5  = 40   (5/100)*50 = 2.5, rounds up to 3rd
P4  = 39   (4/100)*50 = 2: get mid of 2nd and 3rd (38 and 40)
P94 = 83   (94/100)*50 = 47: get mid of 47th and 48th
P99 = 92   (99/100)*50 = 49.5, rounds up to 50th
D4  = 61   (4/10)*50 = 20: get mid of 20th and 21st
D9  = 80   (9/10)*50 = 45: get mid of 45th and 46th
Q2  = 64   (2/4)*50 = 25: get mid of 25th and 26th
Q3  = 72   (3/4)*50 = 37.5, rounds up to 38th
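The locator rule used in this example (whole-number position: average that value and the next one; otherwise round up) can be sketched in Python as follows. The function name `quantile_value` and its parameters are hypothetical, and other textbooks and software use different interpolation rules, so library functions may return slightly different values.

```python
import math
from fractions import Fraction

sorted_scores = [
    26, 38, 40, 40, 42, 44, 46, 48, 50, 50,
    50, 52, 52, 54, 54, 58, 58, 60, 60, 60,
    62, 62, 64, 64, 64, 64, 66, 66, 66, 66,
    66, 66, 68, 70, 70, 72, 72, 72, 74, 74,
    74, 76, 76, 78, 78, 82, 82, 84, 84, 92,
]

def quantile_value(values, k, parts):
    """k-th cut point when sorted data is split into `parts` parts (100, 10 or 4).

    Assumes 1 <= k < parts and that `values` is already sorted.
    """
    locator = Fraction(k * len(values), parts)  # exact, no float rounding
    if locator.denominator == 1:
        pos = int(locator)  # 1-based position; average it with the next value
        return (values[pos - 1] + values[pos]) / 2
    return values[math.ceil(locator) - 1]  # round up to the next position

print(quantile_value(sorted_scores, 5, 100))   # 40
print(quantile_value(sorted_scores, 4, 100))   # 39.0
print(quantile_value(sorted_scores, 94, 100))  # 83.0
print(quantile_value(sorted_scores, 99, 100))  # 92
print(quantile_value(sorted_scores, 4, 10))    # 61.0
print(quantile_value(sorted_scores, 9, 10))    # 80.0
print(quantile_value(sorted_scores, 2, 4))     # 64.0
print(quantile_value(sorted_scores, 3, 4))     # 72
```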

Quantile of a Score

$$\text{Percentile of score } x = \frac{\text{number of scores} \le x}{\text{total number of scores}} \cdot 100$$

$$\text{Decile of score } x = \frac{\text{number of scores} \le x}{\text{total number of scores}} \cdot 10$$

$$\text{Quartile of score } x = \frac{\text{number of scores} \le x}{\text{total number of scores}} \cdot 4$$

If result is not an integer, round up to the next higher integer

Example: 32 of 50 test scores are <= 66.

32/50*100 = 64, 32/50*10 = 6.4, 32/50*4 = 2.56

So test score 66 is in P64, D7 and Q3

To improve estimate, include only half of other scores equal to x in the numerator.
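A sketch of the basic rule in Python, applied to the score 66. The function name `quantile_of_score` is hypothetical, and the refinement that counts only half of the tied scores is deliberately left out.

```python
import math
from fractions import Fraction

sorted_scores = [
    26, 38, 40, 40, 42, 44, 46, 48, 50, 50,
    50, 52, 52, 54, 54, 58, 58, 60, 60, 60,
    62, 62, 64, 64, 64, 64, 66, 66, 66, 66,
    66, 66, 68, 70, 70, 72, 72, 72, 74, 74,
    74, 76, 76, 78, 78, 82, 82, 84, 84, 92,
]

def quantile_of_score(values, x, parts):
    """Which of `parts` quantiles (100, 10 or 4) the score x falls in."""
    at_or_below = sum(1 for v in values if v <= x)
    # Exact fraction, rounded up when the result is not an integer.
    return math.ceil(Fraction(at_or_below * parts, len(values)))

print(quantile_of_score(sorted_scores, 66, 100))  # 64
print(quantile_of_score(sorted_scores, 66, 10))   # 7
print(quantile_of_score(sorted_scores, 66, 4))    # 3
```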

Interquartile and Percentile Range

Interquartile Range (or IQR) = Q3 - Q1

Percentile Range = P90 - P10

In the test scores example:

Q3 = 72, Q1 = 52, IQR = 20

P10 = 43, P90 = 80, P10 to P90 Range = 80 - 43 = 37

Range = 92 - 26 = 66


Boxplots

Simple graph to indicate Median, IQR and Outliers

Also known as the box and whisker plot

Test scores example: minimum = 26, Q1 = 52, Q2 = 64, Q3 = 72, maximum = 92, IQR = 20

Basic Boxplot: a box spanning Q1 to Q3 with a line at Q2, and whiskers reaching the minimum (26) and the maximum (92)

Variation: Whiskers extend only to 1.5*IQR below Q1 and 1.5*IQR above Q3. Data outside that span are marked with circles as outliers.
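The 1.5*IQR rule from the variation can be sketched in Python for the test scores. Q1 and Q3 are taken directly from the slide rather than recomputed, and the names `lower_fence` and `upper_fence` are illustrative labels for the two whisker limits.

```python
sorted_scores = [
    26, 38, 40, 40, 42, 44, 46, 48, 50, 50,
    50, 52, 52, 54, 54, 58, 58, 60, 60, 60,
    62, 62, 64, 64, 64, 64, 66, 66, 66, 66,
    66, 66, 68, 70, 70, 72, 72, 72, 74, 74,
    74, 76, 76, 78, 78, 82, 82, 84, 84, 92,
]

q1, q3 = 52, 72  # quartiles as given on the slide
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr  # whiskers may not reach below this
upper_fence = q3 + 1.5 * iqr  # or above this

# Values beyond the fences would be drawn as circles (outliers).
outliers = [x for x in sorted_scores if x < lower_fence or x > upper_fence]

print(iqr)                       # 20
print(lower_fence, upper_fence)  # 22.0 102.0
print(outliers)                  # []
```

For this data set every score lies inside the fences, so the variation draws the same whiskers as the basic boxplot.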


End