Introduction to Summary Statistics

Statistics

• The collection, evaluation, and interpretation of data

• Statistical analysis of measurements can help verify the quality of a design or process

Summary Statistics

Central Tendency

• “Center” of a distribution

– Mean, median, mode

Variation

• Spread of values around the center

– Range, standard deviation, interquartile range

Distribution

• Summary of the frequency of values

– Frequency tables, histograms, normal distribution

• The mean is the sum of the values of a set of data divided by the number of values in that data set.

Mean Central Tendency


• Data Set

3 7 12 17 21 21 23 27 32 36 44

• Sum of the values = 243

• Number of values = 11

$\text{Mean} = \frac{243}{11} = 22.09$
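The mean calculation above can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative and are not part of the original slides.

```python
from statistics import mean

# Data set from the slide
data = [3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44]

total = sum(data)            # sum of the values
count = len(data)            # number of values
manual_mean = total / count  # mean = sum / number of values

print(total, count)           # 243 11
print(round(manual_mean, 2))  # 22.09
print(round(mean(data), 2))   # 22.09, same result from the standard library
```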

Mean Central Tendency

Mode Central Tendency

• Measure of central tendency

• The most frequently occurring value in a set of data is the mode

• Symbol is M

Data Set: 27 17 12 7 21 44 23 3 36 32 21

Data Set (sorted): 3 7 12 17 21 21 23 27 32 36 44

Mode = M = 21

Mode Central Tendency

• The most frequently occurring value in a set of data is the mode.

• Bimodal Data Set: Two numbers of equal frequency stand out

• Multimodal Data Set: More than two numbers of equal frequency stand out

Mode Central Tendency

Determine the mode of

48, 63, 62, 49, 58, 2, 63, 5, 60, 59, 55

Mode = 63

Determine the mode of

48, 63, 62, 59, 58, 2, 63, 5, 60, 59, 55

Mode = 63 & 59 (Bimodal)

Determine the mode of

48, 63, 62, 59, 48, 2, 63, 5, 60, 59, 55

Mode = 63, 59, & 48 (Multimodal)
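The three mode exercises above can be reproduced by counting how often each value occurs. The sketch below is one possible approach: `find_modes` is a hypothetical helper name, the "unimodal" label is our own wording for the single-mode case, and treating a data set with no repeated value as having no mode is an assumption the slides do not address.

```python
from collections import Counter

def find_modes(values):
    """Return (modes, label) for a non-empty list of values."""
    counts = Counter(values)          # value -> number of occurrences
    highest = max(counts.values())    # frequency of the most common value
    if highest == 1:
        # Assumption: if nothing repeats, no value "stands out" as a mode.
        return [], "no mode"
    modes = sorted(v for v, c in counts.items() if c == highest)
    labels = {1: "unimodal", 2: "bimodal"}
    return modes, labels.get(len(modes), "multimodal")

print(find_modes([48, 63, 62, 49, 58, 2, 63, 5, 60, 59, 55]))
print(find_modes([48, 63, 62, 59, 58, 2, 63, 5, 60, 59, 55]))
print(find_modes([48, 63, 62, 59, 48, 2, 63, 5, 60, 59, 55]))
```

The standard library's `statistics.multimode` returns the same set of values, in order of first appearance rather than sorted.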

Mode Central Tendency

• Measure of central tendency

• The median is the value that occurs in the middle of a set of data that has been arranged in numerical order

• Symbol is $\tilde{x}$, pronounced “x-tilde”

Median Central Tendency

• The median is the value that occurs in the middle of a set of data that has been arranged in numerical order.

Data Set: 27 17 12 7 21 44 23 3 36 32 21

Median Central Tendency

• A data set that contains an odd number of values always has a median that is one of its values: the middle value of the ordered set.

Data Set: 3 7 12 17 21 21 23 27 32 36 44

Median = 21

Median Central Tendency

• For a data set that contains an even number of values, the two middle values are averaged, and the result is the median.

Data Set: 3 7 12 17 21 21 23 27 31 32 36 44

Median = $\frac{21 + 23}{2} = 22$
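Both median cases (odd and even number of values) can be sketched in Python as follows. The function name `middle_value` is illustrative only; the standard library's `statistics.median` gives the same results.

```python
from statistics import median

def middle_value(values):
    """Median of a non-empty list: sort, then take the middle."""
    ordered = sorted(values)   # arrange in numerical order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]    # odd count: the single middle value
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_set = [27, 17, 12, 7, 21, 44, 23, 3, 36, 32, 21]
even_set = [3, 7, 12, 17, 21, 21, 23, 27, 31, 32, 36, 44]

print(middle_value(odd_set))   # 21
print(middle_value(even_set))  # 22.0
```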

Median Central Tendency

• Measure of data variation.

• The range is the difference between the largest and smallest values that occur in a set of data.

• Symbol is R

Data Set: 3 7 12 17 21 21 23 27 32 36 44

Range = R = 44 – 3 = 41
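As a quick check, the range is simply the largest value minus the smallest. A minimal sketch (the name `data_range` is our own, chosen to avoid shadowing Python's built-in `range`):

```python
# Data set from the slide
data = [3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44]

# Range = largest value - smallest value
data_range = max(data) - min(data)

print(data_range)  # 41
```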

Range Variation

• Measure of data variation.

• The standard deviation is a measure of the spread of data values.

– A larger standard deviation indicates a wider spread in data values

Standard Deviation Variation

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

σ = standard deviation

$x_i$ = individual data value ($x_1, x_2, x_3, \ldots$)

μ = mean

N = size of population

Standard Deviation Variation

Procedure:

1. Calculate the mean, μ.

2. Subtract the mean from each value and then square each difference.

3. Sum all squared differences.

4. Divide the summation by the size of the population (number of data values), N.

5. Calculate the square root of the result.

Standard Deviation

Calculate the standard deviation for the data array

2, 5, 48, 49, 55, 58, 59, 60, 62, 63, 63

1. Calculate the mean.

$\mu = \frac{524}{11} = 47.64$

2. Subtract the mean from each data value and square each difference.

$(2 - 47.64)^2 = 2083.01$
$(5 - 47.64)^2 = 1818.17$
$(48 - 47.64)^2 = 0.13$
$(49 - 47.64)^2 = 1.85$
$(55 - 47.64)^2 = 54.17$
$(58 - 47.64)^2 = 107.33$
$(59 - 47.64)^2 = 129.05$
$(60 - 47.64)^2 = 152.77$
$(62 - 47.64)^2 = 206.21$
$(63 - 47.64)^2 = 235.93$
$(63 - 47.64)^2 = 235.93$

Standard Deviation Variation

3. Sum all squared differences.

2083.01 + 1818.17 + 0.13 + 1.85 + 54.17 + 107.33 + 129.05 + 152.77 + 206.21 + 235.93 + 235.93 = 5,024.55

4. Divide the summation by the number of data values.

$\frac{5{,}024.55}{11} = 456.78$

5. Calculate the square root of the result.

$\sigma = \sqrt{456.78} = 21.37$
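The five-step procedure can be written out directly in Python. This is a sketch under the assumption that the data array is the entire population; `population_std_dev` is an illustrative name, and the standard library's `statistics.pstdev` is used only as a cross-check.

```python
import math
from statistics import pstdev

def population_std_dev(values):
    """Population standard deviation, following the five steps."""
    n = len(values)
    mu = sum(values) / n                             # 1. mean
    squared_diffs = [(x - mu) ** 2 for x in values]  # 2. squared differences
    total = sum(squared_diffs)                       # 3. sum of squared differences
    variance = total / n                             # 4. divide by population size N
    return math.sqrt(variance)                       # 5. square root

data = [2, 5, 48, 49, 55, 58, 59, 60, 62, 63, 63]

print(round(population_std_dev(data), 2))  # 21.37
print(round(pstdev(data), 2))              # 21.37
```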

A Note about Standard Deviation

• Two distinct calculations

– Population Standard Deviation

• The measure of the spread of data within a population.

• Used when you have a data value for every member of the entire population of interest.

– Sample Standard Deviation

• An estimate of the spread of data within a larger population.

• Used when you do not have a data value for every member of the entire population of interest.

• Uses a subset (sample) of the data to generalize the results to the larger population.

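Population Standard Deviation

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

σ = population standard deviation

$x_i$ = individual data value ($x_1, x_2, x_3, \ldots$)

μ = population mean

N = size of population

Sample Standard Deviation

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

s = sample standard deviation

$\bar{x}$ = sample mean

n = size of sample

A Note about Standard Deviation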

Sample Standard Deviation Variation

Sample Mean Central Tendency

Essentially the same calculation as population mean

Sample Standard Deviation

Estimate the standard deviation for a population for which the following data is a sample.

2, 5, 48, 49, 55, 58, 59, 60, 62, 63, 63

1. Calculate the sample mean.

$\bar{x} = \frac{524}{11} = 47.64$

2. Subtract the sample mean from each data value and square the difference.

$(2 - 47.64)^2 = 2083.01$
$(5 - 47.64)^2 = 1818.17$
$(48 - 47.64)^2 = 0.13$
$(49 - 47.64)^2 = 1.85$
$(55 - 47.64)^2 = 54.17$
$(58 - 47.64)^2 = 107.33$
$(59 - 47.64)^2 = 129.05$
$(60 - 47.64)^2 = 152.77$
$(62 - 47.64)^2 = 206.21$
$(63 - 47.64)^2 = 235.93$
$(63 - 47.64)^2 = 235.93$

Sample Standard Deviation Variation

3. Sum all squared differences.

2083.01 + 1818.17 + 0.13 + 1.85 + 54.17 + 107.33 + 129.05 + 152.77 + 206.21 + 235.93 + 235.93 = 5,024.55

4. Divide the summation by the number of sample data values minus one.

$\frac{5{,}024.55}{11 - 1} = 502.46$

5. Calculate the square root of the result.

$s = \sqrt{502.46} = 22.42$
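The sample version differs from the population version only in step 4, where the divisor is n - 1 instead of N. A minimal sketch, with `sample_std_dev` as an illustrative name and `statistics.stdev` from the standard library as a cross-check:

```python
import math
from statistics import stdev

def sample_std_dev(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    x_bar = sum(values) / n                         # 1. sample mean
    total = sum((x - x_bar) ** 2 for x in values)   # 2-3. sum of squared differences
    return math.sqrt(total / (n - 1))               # 4-5. divide by n - 1, square root

sample = [2, 5, 48, 49, 55, 58, 59, 60, 62, 63, 63]

print(round(sample_std_dev(sample), 2))  # 22.42
print(round(stdev(sample), 2))           # 22.42
```

For the same eleven values, the sample estimate (22.42) is slightly larger than the population result (21.37) because of the smaller divisor.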

A Note about Standard Deviation

As n → N, s → σ


Given the ACT score of every student in your class, use the population standard deviation formula to find the standard deviation of ACT scores in the class.


Given the ACT scores of every student in your class, use the sample standard deviation formula to estimate the standard deviation of the ACT scores of all students at your school.

• A histogram is a common data distribution chart that is used to show the frequency with which specific values, or values within ranges, occur in a set of data.

• An engineer might use a histogram to show the variation of a dimension that exists among a group of parts that are intended to be identical.

Histogram Distribution

• Large sets of data are often divided into a limited number of groups. These groups are called class intervals.

Class Intervals: -16 to -6, -5 to 5, 6 to 16

Histogram Distribution

• The number of data elements in each class interval is shown by the frequency, which occurs along the Y-axis of the graph

[Figure: histogram axes. Y-axis: Frequency (1 to 7). X-axis class intervals: -16 to -6, -5 to 5, 6 to 16]

Histogram Distribution

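Example

Data: 1, 7, 15, 4, 8, 8, 5, 12, 10

Class Interval    Values         Frequency
1 to 5            1, 4, 5        3
6 to 10           7, 8, 8, 10    4
11 to 15          12, 15         2

[Figure: histogram of the example data. Y-axis: Frequency (1 to 4). X-axis class intervals: 1 to 5, 6 to 10, 11 to 15]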

Histogram Distribution

• The height of each bar in the chart indicates the number of data elements, or frequency of occurrence, within each range
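Counting how many values fall into each class interval is what determines the bar heights. The sketch below uses the example data and intervals from the slides; `count_by_class_interval` is a hypothetical helper, and treating both interval endpoints as inclusive is an assumption that suits these integer-valued intervals.

```python
def count_by_class_interval(values, intervals):
    """Count values per (low, high) interval, endpoints inclusive."""
    counts = {interval: 0 for interval in intervals}
    for value in values:
        for low, high in intervals:
            if low <= value <= high:
                counts[(low, high)] += 1
                break  # each value belongs to at most one interval
    return counts

data = [1, 7, 15, 4, 8, 8, 5, 12, 10]
intervals = [(1, 5), (6, 10), (11, 15)]

counts = count_by_class_interval(data, intervals)

# Text histogram: one '#' per data element in the interval
for (low, high), frequency in counts.items():
    print(f"{low} to {high}: {'#' * frequency} ({frequency})")
```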

Histogram Distribution

[Figure: the same histogram, with the class intervals 1 to 5, 6 to 10, and 11 to 15 labeled along the X-axis]

MINIMUM = 0.745 in.

MAXIMUM = 0.760 in.

Histogram Distribution

Data: 0, 3, -1, 3, 2, -1, -1, 1, 2, -3, 0, 1, 0, 1, -2, 1, 2, -4, -1, 1, 0, -2, 0, 0

[Figure: dot plot of the data on an axis from -6 to 6]

Dot Plot Distribution
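A dot plot stacks one dot per occurrence of each value, so it can be sketched by tallying frequencies. The example below uses the 24 values listed for the dot plot; the text rendering is only an approximation of the original figure.

```python
from collections import Counter

# The 24 data values listed for the dot plot
data = [0, 3, -1, 3, 2, -1, -1, 1, 2, -3, 0, 1,
        0, 1, -2, 1, 2, -4, -1, 1, 0, -2, 0, 0]

frequency = Counter(data)  # value -> number of occurrences

# One row per position on the -6 to 6 axis, one '*' per data element
for value in range(-6, 7):
    print(f"{value:>3} | {'*' * frequency[value]}")
```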

[Figure: the same dot plot with a Frequency axis (1 to 5) added; X-axis from -6 to 6]

Dot Plot Distribution

“Is the data distribution normal?”

• Translation: Is the histogram/dot plot bell-shaped?

– Does the greatest frequency of the data values occur at about the mean value?

– Does the curve decrease on both sides away from the mean?

– Is the curve symmetric about the mean?

Normal Distribution Distribution

[Figure: dot plot with a bell-shaped curve overlaid. Y-axis: Frequency. X-axis: Data Elements (-6 to 6)]

Normal Distribution Distribution

[Figure: dot plot with the mean value marked. Y-axis: Frequency. X-axis: Data Elements (-6 to 6)]

Normal Distribution Distribution

Does the greatest frequency of the data values occur at about the mean value?

[Figure: dot plot with the mean value marked. Y-axis: Frequency. X-axis: Data Elements (-6 to 6)]

Normal Distribution Distribution

Does the curve decrease on both sides away from the mean?

[Figure: dot plot with the mean value marked. Y-axis: Frequency. X-axis: Data Elements (-6 to 6)]

Normal Distribution Distribution

Is the curve symmetric about the mean?

What if things are not equal?

Histogram Interpretation: Skewed (Non-Normal) Right

If the data are normally distributed:

• About 68% of the observations fall within 1 standard deviation of the mean.

• About 95% of the observations fall within 2 standard deviations of the mean.

• About 99.7% of the observations fall within 3 standard deviations of the mean.

Normal Distribution Distribution

Normal Distribution Example

Data from a sample of a larger population

Normal Distribution Distribution

Mean = 0.08, s = 1.77

$0.08 + 1.77 = 1.85$

$0.08 + (-1.77) = -1.69$

About 68% of the observations are expected to fall between -1.69 and 1.85.

Normal Distribution Distribution

2s = 2 × 1.77 = 3.54

$0.08 + 3.54 = 3.62$

$0.08 + (-3.54) = -3.46$

About 95% of the observations are expected to fall between -3.46 and 3.62.
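The 68% and 95% intervals can be compared against actual data. The sketch below assumes that this example refers to the 24 values listed with the dot plot earlier (their mean and sample standard deviation round to 0.08 and 1.77, matching the figures above); `share_within` is a hypothetical helper name.

```python
from statistics import mean, stdev

# Assumption: the example uses the 24 values listed with the dot plot
data = [0, 3, -1, 3, 2, -1, -1, 1, 2, -3, 0, 1,
        0, 1, -2, 1, 2, -4, -1, 1, 0, -2, 0, 0]

x_bar = mean(data)  # sample mean
s = stdev(data)     # sample standard deviation (n - 1 divisor)

def share_within(values, center, spread, k):
    """Fraction of values within k standard deviations of the center."""
    low, high = center - k * spread, center + k * spread
    inside = [v for v in values if low <= v <= high]
    return len(inside) / len(values)

print(round(x_bar, 2), round(s, 2))
for k, expected in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    actual = share_within(data, x_bar, s, k)
    print(f"within {k}s: {actual:.1%} observed, about {expected:.1%} expected if normal")
```

With so few data points the observed shares only roughly track the 68/95/99.7 percentages, which is why those figures are best read as approximate expectations for normally distributed data.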
