Module 3: Describing Data

Jon Starkweather, PhD
Consultant, Research and Statistical Support

Introduction to Statistics for the Social Sciences


The RSS short courses

The Research and Statistical Support (RSS) office at the University of North Texas hosts a number of “Short Courses”. A list of them is available at:

http://www.unt.edu/rss/Instructional.htm


Outline

1 Introduction
    Context of example data
    Descriptive Statistics
    Notation
    Summation

2 Classes of Descriptive Statistics
    Central Tendency
    Dispersion
    Shape
    Relationship

3 Properties of Statistics
    Sufficiency
    Unbiasedness
    Efficiency
    Resistance

4 Summary of Module 3


Fictional Extraction Example

Suppose a man from England, whom we'll call Bob Prentice, owns an oil company which operates 1000 deep water drilling rigs...

Bob wants to maximize his profits (barrels of oil extracted) while minimizing his expenditures (operating costs).

So Bob gathers data on how many barrels of oil each of his 1000 rigs extracted in 1 month, because he might want to shut down rigs which do not extract much oil but still cause him to pay substantial operating costs.

The resulting data file is available at the following link:
http://www.unt.edu/rss/class/Jon/ISSS/Module003/M3_BigOilData.txt

How will Bob describe his data?


Describing Data

Bob is interested in describing the extraction and cost of his rigs. But he does not want to look at the data for all 1000 rigs (the population); instead, he decides to draw the data for 10 rigs at random (a sample), which is available on the next slide (Table 1).

After displaying his data with a variety of tables and figures (graphs), as covered in Module 2, Bob will likely use Descriptive Statistics to describe his data.

Recall the goals of science and how we achieve them from Module 1:

    Observation yields Description
    Experimentation yields Explanation
    Modeling yields Prediction

Descriptive Statistics only allow us to describe the data.
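
Drawing 10 rigs at random from a population of 1000 can be sketched in a few lines of Python. This is only an illustration: the rig identifiers (0 to 999) and the seed value are assumptions, and the slides do not say how Bob actually drew his sample.

```python
import random

# Hypothetical population: one identifier per rig, numbered 0 to 999.
population_ids = list(range(1000))

# A fixed seed makes the draw repeatable; the seed value itself is arbitrary.
rng = random.Random(3)

# Simple random sample without replacement: each rig can be drawn at most once.
sample_ids = sorted(rng.sample(population_ids, k=10))

print(f"N = {len(population_ids)}, n = {len(sample_ids)}")
print(sample_ids)
```

Sampling without replacement matches the idea of picking 10 distinct rigs; a different seed gives a different, equally valid sample.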


Data: Extraction & Cost Sample (n = 10)

Table 1: Raw Data for One Month

rig    barrels    costs
065    166        570
142    185        560
198    159        520
277    207        580
408    194        530
533    191        560
621    176        510
788    216        550
796    199        560
915    228        560

“rig” = identification number; “barrels” = barrels extracted (in thousands); “costs” = operating costs (in thousands of USD)


A note about Notation

Generally, variables are given capital letters, commonly X and Y.

Subscripts are used to identify each score on a variable.

    Given a set of scores [35, 22, 15, 96, 84, 77] on variable X,
    the scores can be ‘called’ $X_1 = 35$; $X_2 = 22$; ... $X_6 = 77$.

Multiple subscripts can be used to identify a score located at a particular row (i) and column (j).

    So, we can use the convention $X_{ij}$ to identify a particular score, such as $X_{23} = 51$,
    which indicates that the score 51 is located in the 2nd row and in the 3rd column of a table of data.
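
The subscript convention maps directly onto list indexing, with one catch: the notation counts from 1, while Python counts from 0. The sketch below uses the scores from the slide; the helper names `score` and `cell` are made up for illustration, and every value in the 3 x 3 table except $X_{23} = 51$ is hypothetical.

```python
# Scores on variable X, taken from the slide.
X = [35, 22, 15, 96, 84, 77]


def score(values, i):
    """Return X_i using the 1-based subscript from the notation."""
    return values[i - 1]  # Python lists are 0-based


print(score(X, 1))  # 35
print(score(X, 6))  # 77

# Hypothetical 3 x 3 table; only the entry X_23 = 51 comes from the slide.
table = [
    [12, 40, 63],
    [27, 18, 51],
    [90, 74, 33],
]


def cell(rows, i, j):
    """Return X_ij: row i, column j, both 1-based."""
    return rows[i - 1][j - 1]


print(cell(table, 2, 3))  # 51
```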


Use and rules of Sigma as Summation

Please note that:

$\sum X$ is read as the sum of the values of X,
    such as $X_1 + X_2 + X_3 + \dots + X_n$,
    where n is the number of scores of variable X.

$\sum X^2$ is read as the sum of the squared values of X,
    such as $X_1^2 + X_2^2 + X_3^2 + \dots + X_n^2$,
    which is very different from the following.

$(\sum X)^2$ is read as the square of the sum of the values of X,
    such as $(X_1 + X_2 + X_3 + \dots + X_n)^2$.
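
A short Python check makes the difference between the sum of squares and the square of the sum concrete. It reuses the scores [35, 22, 15, 96, 84, 77] from the notation slide; the variable names are chosen here for illustration.

```python
X = [35, 22, 15, 96, 84, 77]

sum_x = sum(X)                          # sum of the values of X
sum_x_squared = sum(x ** 2 for x in X)  # square each score, then add
square_of_sum = sum(X) ** 2             # add the scores, then square

print(sum_x)          # 329
print(sum_x_squared)  # 24135
print(square_of_sum)  # 108241
```

For these six scores the square of the sum is more than four times the sum of the squares, so the order of squaring and summing matters.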


The 4 Classes of Descriptive Statistics

There are many ways to describe data because there are many types of descriptive statistics and many individual descriptive statistics.

There are four classes of descriptive statistics.

    Central Tendency
    Dispersion
    Shape
    Relationship

Each class tells Bob something about how his rigs are producing.

Each class of Descriptive Statistics tells us something about how scores are distributed.
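
To make the four classes concrete, the sketch below computes one commonly used statistic from each class on the Table 1 sample. The choice of statistic per class (mean, sample standard deviation, moment-based skewness, Pearson correlation) is an assumption made here for illustration, not a statement of which statistics the module goes on to recommend.

```python
import math
import statistics

# Table 1 sample (n = 10): barrels in thousands, costs in thousands of USD.
barrels = [166, 185, 159, 207, 194, 191, 176, 216, 199, 228]
costs = [570, 560, 520, 580, 530, 560, 510, 550, 560, 560]


def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (one of several definitions)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


summary = {
    "central tendency (mean)": statistics.mean(barrels),
    "dispersion (sample standard deviation)": statistics.stdev(barrels),
    "shape (skewness)": skewness(barrels),
    "relationship (Pearson r, barrels vs. costs)": pearson_r(barrels, costs),
}

for label, value in summary.items():
    print(f"{label}: {value:.3f}")
```

The first three statistics describe a single variable (barrels); the relationship statistic needs two variables, which is why it pairs barrels with costs.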


Central Tendency

There are 3 primary measures of central tendency; each has pros and cons, but all attempt to describe the center point of a distribution of scores.
• Mode
• Median
• Mean

Measures of central tendency offer us a “point” estimate, or single number, which we can use as a summary of a distribution of scores.
• One number which represents or characterizes the entire distribution (as best as one number can).
• Keep in mind, the center point of scores in a distribution may not be in the middle of the scale of those scores (more on this later).

The Mode: symbol = Mo

The mode is the most frequently occurring score in a distribution.
• Commonly used with categorical variables.
• Pro: Easy to compute: simply observe and report the most frequent score(s).
• Pro: Not affected by outliers.
• Con: Usually only reflects one actual score.¹
• Con: Different samples almost always produce different modes for the same variable.

¹ Multi-modal distributions have multiple scores which occur most frequently; meaning multiple scores occur the same (most frequent) number of times.

Oil Sample (n = 10) Example showing the Mode
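
The figure for this slide is not available here, so the following is a minimal Python sketch of finding the mode, assuming the barrel and cost values listed in the mean example later in this module. The names `barrels`, `costs`, `cost_modes`, and `barrel_modes` are illustrative only. `statistics.multimode` is used because it returns every score tied for the highest frequency rather than a single value.

```python
from collections import Counter
from statistics import multimode

# Oil sample (n = 10); values taken from the mean example in this module.
barrels = [159, 166, 176, 185, 191, 194, 199, 207, 216, 228]
costs = [510, 520, 530, 550, 560, 560, 560, 560, 570, 580]

# multimode returns a list of all scores tied for the highest frequency.
cost_modes = multimode(costs)
barrel_modes = multimode(barrels)

# Frequency table for costs, to see how often each score occurs.
cost_counts = Counter(costs)

print("Cost mode(s):", cost_modes)
print("Times the cost mode occurs:", cost_counts[cost_modes[0]])
print("Number of tied barrel modes:", len(barrel_modes))
```

With these values, 560 is the only cost mode. Every barrel score occurs exactly once, so all ten scores tie and no single barrel score stands out as a mode.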

The Median: symbol = Mdn

The median is the “middle” score of a distribution.
• More precisely, it is the point that lies in the middle of a distribution.
• Sometimes referred to as the 50th percentile because there are as many scores above it as there are below it.
• Pro: Not affected by outliers (extreme scores).
• Con: Ignores all but the middle of a distribution.

To calculate the median, first the scores must be arranged in sequential order (e.g., smallest to largest). Then:
• For an odd number of scores, the middle score is the median.
• For an even number of scores, the average of the two middle scores is the median.

Oil Sample (n = 10) Example showing the Median
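
The figure for this slide is not available here, so the following is a minimal Python sketch of the median rule (sort, then take the middle score or the average of the two middle scores). It assumes the barrel and cost values listed in the mean example later in this module; the function name `median_by_hand` is illustrative, and the result is compared against the standard library's `statistics.median`.

```python
from statistics import median

def median_by_hand(scores):
    """Median via the odd/even rule: middle score, or mean of the two middle scores."""
    if not scores:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(scores)      # arrange smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]    # odd n: the single middle score
    # even n: average of the two middle scores
    return (ordered[middle - 1] + ordered[middle]) / 2

# Oil sample (n = 10); values taken from the mean example in this module.
barrels = [159, 166, 176, 185, 191, 194, 199, 207, 216, 228]
costs = [510, 520, 530, 550, 560, 560, 560, 560, 570, 580]

barrels_mdn = median_by_hand(barrels)  # average of 191 and 194
costs_mdn = median_by_hand(costs)      # average of 560 and 560

print("Barrels Mdn:", barrels_mdn)
print("Costs Mdn:", costs_mdn)
print("Matches statistics.median:", barrels_mdn == median(barrels))
```

Because n = 10 is even, both medians come from averaging the fifth and sixth ordered scores.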

The Mean: sample symbol = $\bar{X}$, population symbol = $\mu$

The mean is the arithmetic average of the scores of a distribution.
• Mean is the most popular measure of central tendency.
• Pro: Generally the best measure of central tendency because it utilizes all the scores.
• Con: Very sensitive to outliers (extreme scores).

To calculate the sample mean, simply sum all of the scores and divide by the number of scores:

$$\bar{X} = \frac{\sum X}{n}$$

Oil Sample (n = 10) Example showing the Mean

General formula for Mean:

$\bar{X} = \frac{\sum X}{n}$ or²: $\bar{Y} = \frac{\sum Y}{n}$

Barrels: $\bar{X} = \frac{159+166+176+185+191+194+199+207+216+228}{10}$

Barrels: $\bar{X} = \frac{1921}{10} = 192.1$

Costs: $\bar{Y} = \frac{510+520+530+550+560+560+560+560+570+580}{10}$

Costs: $\bar{Y} = \frac{5500}{10} = 550$

² Recall, the capital letter we chose to represent a variable is arbitrary; X and Y are commonly used examples.
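
The same arithmetic can be checked with a minimal Python sketch. The function name `sample_mean` and the variable names are illustrative; the data are the ten barrel and cost scores from the example above.

```python
def sample_mean(scores):
    """Sum all of the scores and divide by the number of scores."""
    return sum(scores) / len(scores)

# Oil sample (n = 10) from the example above.
barrels = [159, 166, 176, 185, 191, 194, 199, 207, 216, 228]
costs = [510, 520, 530, 550, 560, 560, 560, 560, 570, 580]

barrels_mean = sample_mean(barrels)  # 1921 / 10
costs_mean = sample_mean(costs)      # 5500 / 10

print("Sum of barrels:", sum(barrels), "mean:", barrels_mean)
print("Sum of costs:", sum(costs), "mean:", costs_mean)
```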

Trimmed Mean & M-estimators

Because the mean is very sensitive to outliers, alternatives have been proposed which attempt to correct for this problem.

Trimmed mean simply refers to a mean calculated after “trimming” a certain percentage of extreme scores.
• The median is an extreme example of a trimmed mean; the median trims all but the middle score or middle two scores.
• Common examples are 10% and 20% trimmed means, where the 10% or 20% of the most extreme scores (high & low) are trimmed.

M-estimators are weighted means, meaning scores near the middle are given more weight and scores at the extremes are given less weight.
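
To make these alternatives concrete, here is a minimal Python sketch under several stated assumptions. The data are the oil sample barrel scores with the largest score (228) replaced by a hypothetical outlier (2280), which is not part of the original data. The trimming percentage is assumed to be removed from each end of the ordered scores, which is one common convention. The M-estimator shown is a Huber-type weighted mean, one of many possible M-estimators, using the conventional tuning constant 1.345 and a scale based on the median absolute deviation; these choices and all function names are illustrative.

```python
from statistics import mean, median

def trimmed_mean(scores, percent):
    """Mean after dropping `percent` % of the scores from EACH end (assumed convention)."""
    ordered = sorted(scores)
    k = len(ordered) * percent // 100          # scores trimmed per tail
    if 2 * k >= len(ordered):
        raise ValueError("trimming would remove every score")
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

def huber_mean(scores, k=1.345, tol=1e-9, max_iter=100):
    """Huber-type M-estimate of location via iteratively reweighted means."""
    center = median(scores)
    mad = median([abs(x - center) for x in scores])
    scale = mad / 0.6745                       # conventional MAD rescaling
    if scale == 0:
        return center                          # no spread around the median
    for _ in range(max_iter):
        weights = []
        for x in scores:
            r = abs(x - center) / scale        # standardized distance from center
            weights.append(1.0 if r <= k else k / r)  # far scores get less weight
        new_center = sum(w * x for w, x in zip(weights, scores)) / sum(weights)
        if abs(new_center - center) < tol:
            return new_center
        center = new_center
    return center

# Barrel scores with a HYPOTHETICAL outlier (2280 instead of 228).
barrels_outlier = [159, 166, 176, 185, 191, 194, 199, 207, 216, 2280]

plain_mean = mean(barrels_outlier)
trimmed_10 = trimmed_mean(barrels_outlier, 10)  # drops 159 and 2280
trimmed_20 = trimmed_mean(barrels_outlier, 20)  # drops 159, 166, 216, 2280
m_estimate = huber_mean(barrels_outlier)
mdn = median(barrels_outlier)

print("Mean:", plain_mean)
print("10% trimmed mean:", trimmed_10)
print("20% trimmed mean:", trimmed_20)
print("Huber-type M-estimate:", m_estimate)
print("Median:", mdn)
```

The single extreme score pulls the ordinary mean far above the bulk of the scores, while the trimmed means, the weighted mean, and the median all stay close to where most of the scores lie.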

Dispersion

Measures of dispersion offer us an idea of how spread out the scores are, or how wide the distribution of scores is.

There are 5 primary measures of dispersion, 3 of which will be used repeatedly during the rest of this course.

Range
Sums of Squares
Variance
Standard Deviation
Coefficient of Variation

No measure of dispersion should be zero for a variable. If a measure of dispersion is zero, then you do not have a variable; you have a constant. If our scores are (5, 5, 5, 5, 5), then dispersion is zero and this is a constant.
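A quick check of the constant-versus-variable point, sketched with Python's standard `statistics` module. The list `variable` and the helper `score_range` are hypothetical names introduced only for this example.

```python
import statistics

constant = [5, 5, 5, 5, 5]   # the scores from the slide
variable = [3, 4, 5, 6, 7]   # hypothetical scores that actually vary


def score_range(scores):
    """Maximum score minus minimum score."""
    return max(scores) - min(scores)


for name, scores in [("constant", constant), ("variable", variable)]:
    print(name,
          "range:", score_range(scores),
          "sample variance:", statistics.variance(scores))
```

Every measure comes out as zero for the constant and greater than zero for the scores that vary.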


The Range

The range is simply the maximum score minus the minimum score.

Examples from our oil data:
Barrels: 228 − 159 = 69
Costs: 580 − 510 = 70

Disadvantages:
It is calculated from only 2 scores.
Those two values are the most extreme in the distribution (obviously sensitive to outliers).
The range can change dramatically from sample to sample (of the same variable).
The range is not terribly informative; it says nothing about how the scores between the two extremes are spread.
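The sensitivity to outliers can be seen in a small Python sketch. Only the minimum and maximum values below come from the slide; the middle values, the extra outlier, and the name `score_range` are invented for illustration.

```python
def score_range(scores):
    """Range = maximum score minus minimum score."""
    return max(scores) - min(scores)


# Only the extremes (159, 228 and 510, 580) are from the slide;
# the values in between are hypothetical.
barrels = [159, 175, 190, 204, 228]
costs = [510, 525, 540, 560, 580]

print(score_range(barrels))   # 228 - 159 = 69
print(score_range(costs))     # 580 - 510 = 70

# A single extreme (hypothetical) rig changes the range dramatically.
barrels_with_outlier = barrels + [400]
print(score_range(barrels_with_outlier))   # 400 - 159 = 241
```

Because only two scores enter the calculation, the middle values could be anything between the extremes without changing the result.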


Sums of Squares: symbol = SoS

The Sums of Squares is the sum of the squared deviations from the mean for a distribution of scores.

Though it is not very informative on its own and is rarely reported as a measure of dispersion, it is used very frequently in the calculation of other statistics.

The general formula for calculating a variable’s SoS is:

$$SoS = \sum \left( X - \bar{X} \right)^2$$
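The SoS formula can be sketched in a few lines of Python. The scores here are hypothetical (not the barrels data), and `sum_of_squares` is a name chosen only for this example.

```python
def sum_of_squares(scores):
    """SoS: sum of squared deviations of each score from the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


scores = [2, 4, 4, 6, 9]                 # hypothetical scores, mean = 5
mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]  # -3, -1, -1, 1, 4

print(sum(deviations))         # raw deviations always cancel out to 0
print(sum_of_squares(scores))  # 9 + 1 + 1 + 1 + 16 = 28
```

The raw deviations sum to zero for any set of scores, which is why they are squared before being added.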


Variance: sample symbol = $S^2$, population symbol = $\sigma^2$

The variance is the average of each score’s squared difference from the mean.

The general formula for calculating a variable’s sample variance is:

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \quad \text{or} \quad S^2 = \frac{SoS}{n - 1}$$

The formula for calculating a variable’s population variance is:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

Note: with a sample, we divide by n − 1; if we divided by n, our variance statistic would be less representative of the variance parameter (i.e., the sample value would be systematically smaller than the population value).

Also note: when referring to the total number of scores in a population we use N; in a sample we use n.
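A minimal Python sketch of the two variance formulas, using hypothetical scores and illustrative function names. The last line cross-checks the results against `variance` and `pvariance` from Python's standard `statistics` module.

```python
import statistics


def sum_of_squares(scores):
    """SoS: sum of squared deviations from the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def sample_variance(scores):
    """S^2 = SoS / (n - 1)."""
    return sum_of_squares(scores) / (len(scores) - 1)


def population_variance(scores):
    """sigma^2 = SoS / N, treating the scores as the whole population."""
    return sum_of_squares(scores) / len(scores)


scores = [2, 4, 4, 6, 9]   # hypothetical scores, SoS = 28

print(sample_variance(scores))       # 28 / 4 = 7.0
print(population_variance(scores))   # 28 / 5 = 5.6

# Cross-check against the standard library.
print(statistics.variance(scores), statistics.pvariance(scores))
```

Dividing the same SoS by n rather than n − 1 always gives the smaller value, which is the direction of the systematic underestimate described in the note.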


Standard Deviation: sample symbol = S, population symbol = σ

The Standard Deviation is the square root of the variance and allows us to compare the dispersion of one distribution to another.

It is the most commonly reported measure of dispersion [3]. It is very easy to calculate: just take the square root of the variance.

Sample formula: $S = \sqrt{S^2}$

Population formula: $\sigma = \sqrt{\sigma^2}$

Note the use of the word “Standard”, which you will see often; it refers to standardization, which tends to allow us to compare statistics from different variables or distributions (i.e., apples & oranges).

[3] The American Psychological Association (APA) Publication Manual requires that the mean and standard deviation be reported whenever one is referring to a group of scores.
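The square-root step can be sketched in Python as follows. The scores and the name `sample_sd` are hypothetical; `stdev` and `pstdev` are the sample and population standard deviation functions in Python's standard `statistics` module.

```python
import math
import statistics


def sample_sd(scores):
    """S = square root of the sample variance (SoS / (n - 1))."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
    return math.sqrt(variance)


scores = [2, 4, 4, 6, 9]   # hypothetical scores; sample variance = 7

print(sample_sd(scores))           # square root of 7, about 2.65
print(statistics.stdev(scores))    # library sample SD, for comparison
print(statistics.pstdev(scores))   # population SD (divides by N instead)
```

Unlike the variance, the standard deviation is expressed in the same units as the original scores.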


Formula Smormula: Computational formulas vs. Definitional formulas

Computational: designed to make computing by hand easier. A matter of opinion these days...

Definitional: designed to make understanding the concept easier; the formula follows the definition of the concept.

Here are both for the standard deviation of a sample.

Definitional:

$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

Computational:

$$S = \sqrt{\frac{\sum X^2 - \left[ \left( \sum X \right)^2 / n \right]}{n - 1}}$$

Either can be used; both types provide the same answer.
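To see that the two formulas agree, here is a small Python sketch of both. The function names and the scores are hypothetical, chosen only for this comparison.

```python
import math


def sd_definitional(scores):
    """S from squared deviations about the mean."""
    n = len(scores)
    mean = sum(scores) / n
    sos = sum((x - mean) ** 2 for x in scores)
    return math.sqrt(sos / (n - 1))


def sd_computational(scores):
    """S from the raw sums: sum of X^2 and (sum of X)^2 / n."""
    n = len(scores)
    sum_x = sum(scores)                    # sum of X   = 25
    sum_x2 = sum(x * x for x in scores)    # sum of X^2 = 153
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))


scores = [2, 4, 4, 6, 9]   # hypothetical scores

print(sd_definitional(scores))
print(sd_computational(scores))   # same value: 153 - 625/5 = 28, 28/4 = 7
```

On paper the two are algebraically identical. In floating-point arithmetic the computational form can lose precision when the scores are very large relative to their spread, so the definitional form is usually the safer choice in software.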


Calculating SoS using example data (X = barrels)

 i     X      X̄      X − X̄    (X − X̄)²
 1    159   192.1    -33.1     1095.61
 2    166   192.1    -26.1      681.21
 3    176   192.1    -16.1      259.21
 4    185   192.1     -7.1       50.41
 5    191   192.1     -1.1        1.21
 6    194   192.1      1.9        3.61
 7    199   192.1      6.9       47.61
 8    207   192.1     14.9      222.01
 9    216   192.1     23.9      571.21
10    228   192.1     35.9     1288.81

$$\sum X = 1921, \qquad SoS = \sum (X - \bar{X})^2 = 4220.90$$

$$\text{Sample mean} = \bar{X} = \sum X / n = 1921 / 10 = 192.1$$
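The table can be regenerated row by row. The sketch below (variable names are illustrative) prints each score, the mean, the deviation, and the squared deviation, then the totals.

```python
# The ten 'Barrels' scores from the example data.
barrels = [159, 166, 176, 185, 191, 194, 199, 207, 216, 228]

n = len(barrels)
mean = sum(barrels) / n  # 1921 / 10

sos = 0.0  # running Sum of Squares
for i, x in enumerate(barrels, start=1):
    dev = x - mean          # deviation from the mean
    sos += dev ** 2         # accumulate the squared deviation
    print(f"{i:>2} {x:>4} {mean:>6.1f} {dev:>6.1f} {dev ** 2:>8.2f}")

print(f"sum X = {sum(barrels)}, mean = {mean:.1f}, SoS = {sos:.2f}")
# last line printed: sum X = 1921, mean = 192.1, SoS = 4220.90
```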

Calculating variance & standard deviation using example data (X = barrels)

Taking the information from the last slide...

Sample Variance for ‘Barrels’ is:

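$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{SoS}{n - 1} = \frac{4220.90}{10 - 1} = \frac{4220.90}{9} = 468.99$$

Sample Standard Deviation for ‘Barrels’ is:

$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{SoS}{n - 1}} = \sqrt{S^2} = \sqrt{468.99} = 21.66$$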
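These hand calculations can be cross-checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the same n − 1 denominator as the sample formulas above. This is a minimal sketch, not the software used in the course.

```python
import statistics

# The ten 'Barrels' scores from the example data.
barrels = [159, 166, 176, 185, 191, 194, 199, 207, 216, 228]

sample_variance = statistics.variance(barrels)  # divides SoS by n - 1
sample_sd = statistics.stdev(barrels)           # square root of the above

print(round(sample_variance, 2))  # 468.99
print(round(sample_sd, 2))        # 21.66
```

Note that `statistics.pvariance` and `statistics.pstdev` divide by n instead and would give slightly smaller values for the same data.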

Coefficient of Variation

The Coefficient of Variation (CV) is calculated by dividing the standard deviation by the mean, then multiplying the result by 100 to express it as a percentage.

Because it expresses spread relative to the mean, the CV allows us to compare the standard deviation of one distribution to another.

$$CV = \frac{S}{\bar{X}} \times 100 = \frac{21.66}{192.1} \times 100 = 0.1128 \times 100 = 11.28$$

The CV for ‘Barrels’ tells us that the standard deviation is 11.28% of the mean.
In contrast, the CV of ‘Costs’ was 4.11% of the mean; the mean was 550.

You should be able to work backwards from the information in the lines directly above to get the standard deviation, variance, & sums of squares for ‘Costs’.
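One way to work backwards is sketched below. It assumes that 'Costs' was measured on the same ten rigs as 'Barrels' (n = 10); that sample size is an assumption, since it is not stated on this slide. Because the CV of 4.11 is itself rounded, the recovered values are approximate.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: standard deviation relative to the mean."""
    return sd / mean * 100

# Forward direction, using the 'Barrels' values from the slide.
print(round(coefficient_of_variation(21.66, 192.1), 2))  # 11.28

# Backward direction for 'Costs'.
cv_costs = 4.11      # percent, from the slide
mean_costs = 550     # from the slide
n_costs = 10         # ASSUMPTION: same ten rigs as 'Barrels'

sd_costs = cv_costs / 100 * mean_costs      # S = (CV / 100) * mean, about 22.6
var_costs = sd_costs ** 2                   # S squared, about 511
sos_costs = var_costs * (n_costs - 1)       # SoS = S squared * (n - 1), about 4599

print(f"S = {sd_costs:.3f}, S^2 = {var_costs:.2f}, SoS = {sos_costs:.2f}")
```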

Shape

Measures of shape offer us an idea of what the distribution of scores looks like when plotted.

    Unimodal: one peak
    Bimodal: two peaks
    Multimodal: multiple peaks
    Rectangular distributions: multiple peaks of the same magnitude

There are two measures of shape we commonly use:

    Skewness (or simply Skew)
    Kurtosis

Skewness

The Skewness refers to the amount of non-symmetry a distribution of scores contains.

    Negative skew is when the tail points to the smaller values and most scores are located at the larger values.
    Positive skew is when the tail points to the larger values and most scores are located at the smaller values.
    Zero skew indicates symmetry.

The farther from zero the skewness, the less symmetric the distribution of scores.
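The sign convention can be illustrated with a simple moment-based skewness (the mean cubed deviation divided by the cubed standard deviation). This is a sketch on made-up toy data; it is not necessarily the formula used by the course software, which may apply a small-sample correction.

```python
def skewness(xs):
    """Moment-based skewness: m3 / m2 ** 1.5, using n in the denominators."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical toy samples, chosen only to show the sign of the statistic.
tail_right = [1, 2, 2, 3, 3, 3, 4, 10]   # tail points to the larger values
tail_left = [-x for x in tail_right]     # mirror image: tail to smaller values
symmetric = [1, 2, 3, 4, 5]

print(skewness(tail_right) > 0)  # True: positive skew
print(skewness(tail_left) < 0)   # True: negative skew
print(skewness(symmetric))       # 0.0: symmetric
```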

Recognizing Skewness

[Figure: three example distributions, labeled Zero Skewness, Positive Skewness, and Negative Skewness.]

Kurtosis

The Kurtosis measures the amount of tail magnitude, commonly referred to as the “peak-ness” or “flatness” of a distribution of scores.

Kurtosis is based on the size of a distribution’s tails.

    A distribution with a large, positive kurtosis has heavy tails, and the distribution often looks peaked.
        This is known as Leptokurtic.
    A distribution with a large, negative kurtosis has light tails, and the distribution often looks flat.
        This is known as Platykurtic (like a plateau).
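A small sketch of the sign convention, using a moment-based excess kurtosis (the mean fourth-power deviation divided by the squared variance, minus 3 so that a normal distribution scores zero). The two toy samples are made up for illustration, and the course software may use a small-sample-corrected formula instead.

```python
def excess_kurtosis(xs):
    """Moment-based excess kurtosis: m4 / m2 ** 2 - 3, using n in the denominators."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

# Hypothetical toy samples, chosen only to show the sign of the statistic.
peaked = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]  # scores piled at the center, two extremes
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]      # scores spread evenly

print(excess_kurtosis(peaked) > 0)  # True: leptokurtic
print(excess_kurtosis(flat) < 0)    # True: platykurtic
```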

Recognizing Kurtosis

[Figure: three example distributions, labeled Zero Kurtosis, Positive Kurtosis, and Negative Kurtosis.]

Describing a Distribution’s Shape

How can we describe the shape of the distribution of our sample’s Barrels variable?

    Unimodal.
    Slightly Negatively Skewed?
    Leptokurtic (positive kurtosis)?

Deceptive Sample Leads to Poor Judgment

When we have such a small sample size (n = 10), we must be careful when eyeballing the distribution.

    Actually, the Skewness for Barrels is .068
    And, the Kurtosis for Barrels is -.595

Generally, in the social sciences, we expect variables to have skewness and kurtosis between -1 and +1.
When a variable displays a skewness or kurtosis greater than +1 or less than -1, then we say the variable is not symmetrical and/or does not have well-proportioned tails.
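The slides do not say which formulas produced the reported skewness and kurtosis. The sketch below assumes the small-sample-adjusted versions commonly reported by statistical packages (the same formulas as the spreadsheet functions SKEW and KURT); under that assumption the ten 'Barrels' scores reproduce the reported .068 and -.595. Function names are illustrative.

```python
import math

# The ten 'Barrels' scores from the example data.
barrels = [159, 166, 176, 185, 191, 194, 199, 207, 216, 228]

def sample_sd(xs):
    """Sample standard deviation with the n - 1 denominator."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

def sample_skewness(xs):
    """Adjusted skewness: n / ((n-1)(n-2)) * sum of cubed z-scores."""
    n = len(xs)
    mean = sum(xs) / n
    s = sample_sd(xs)
    z3 = sum(((x - mean) / s) ** 3 for x in xs)
    return n / ((n - 1) * (n - 2)) * z3

def sample_excess_kurtosis(xs):
    """Adjusted excess kurtosis (zero for a normal distribution)."""
    n = len(xs)
    mean = sum(xs) / n
    s = sample_sd(xs)
    z4 = sum(((x - mean) / s) ** 4 for x in xs)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * z4 - correction

print(round(sample_skewness(barrels), 3))         # 0.068
print(round(sample_excess_kurtosis(barrels), 3))  # -0.595
```

Both values fall well inside the -1 to +1 range, so the eyeballed impression of negative skew and positive kurtosis is not supported by the statistics.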

Relationship

Measures of relationship offer us an idea of how two sets of scores, or two variables, are related.

    How much variance do the two variables share?
    How much does one variable overlap another?

Measures of Association: When at least one of the two variables is ordinal or nominal in scale.
Correlational Measures: When both variables are interval or ratio scaled (continuous or nearly so).

For the time being, we will do a quick overview of the Measures of Association and focus more attention on the Correlational Measures.


Measures of Association

There are several Measures of Association.

Point-Biserial Correlation (r_pb) when one variable is dichotomous.
Phi Coefficient (φ) when both variables are dichotomous.
Spearman's rho (ρ or r_s) and Kendall's tau (τ) when one or both variables are ranked (ordinal).


Correlational Measures

There are three key Correlational Measures we will cover here.

Covariance (COV)
    COV_XY, where X and Y are the two variables we are using.
Correlation; the Pearson Product-Moment Correlation Coefficient (r)
Adjusted Correlation Coefficient (r_adj)


Covariance

The Covariance is a non-standardized measure of relationship, meaning it cannot be used to compare the relationship of two variables to the relationship of two other variables.

Covariance is not terribly meaningful by itself, but it is used in calculating other statistics (e.g., correlation).
It is not terribly meaningful because its scale or metric is determined by the two specific variables on which it is calculated.
For this reason, it is not comparable across different pairs of variables.
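To see why the covariance is tied to the metric of its variables, here is a minimal sketch in Python (a language this module does not itself use). It takes the barrels and costs from the oil example in this module, then recomputes the covariance after multiplying every cost by 100. The factor of 100 is a hypothetical change of unit chosen only for illustration, and the helper name `covariance` is an assumption.

```python
def covariance(xs, ys):
    """Sample covariance: sum of cross-products of deviations over n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Barrels (X) and costs (Y) from the oil example in this module.
barrels = [159, 166, 176, 185, 191, 194, 199, 207, 216, 228]
costs = [520, 570, 510, 560, 560, 530, 560, 580, 550, 560]

# Hypothetical unit change: express every cost in a unit 100 times smaller.
costs_rescaled = [c * 100 for c in costs]

cov_original = covariance(barrels, costs)
cov_rescaled = covariance(barrels, costs_rescaled)

print(round(cov_original, 2))  # 207.78
print(round(cov_rescaled, 2))  # 20777.78
```

The relationship between the two variables is unchanged, yet the covariance is 100 times larger, which is why it cannot be compared across pairs of variables.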


What is Covariance then?

Calculating the covariance gives us a numeric measure of the degree or strength of relationship between two variables.

Covariance can range between −∞ and +∞.
The larger the number in absolute value (negative or positive), the greater or stronger the relationship.
When covariance is zero, there is no linear relationship between the variables; this virtually never happens.

The sign associated with a covariance tells us the direction of the relationship between the two variables.

If the sign is negative, then high scores on one variable are associated with low scores on the other variable.
If the sign is positive, then high scores on one variable are associated with high scores on the other variable.
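The role of the sign can be illustrated with a small sketch. The three lists below are hypothetical toy data, not taken from the oil example, and the helper name `covariance` is an assumption.

```python
def covariance(xs, ys):
    """Sample covariance: sum of cross-products of deviations over n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical toy data, chosen so the direction is obvious by eye.
x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 6, 8, 10]   # high x goes with high y
y_falling = [10, 8, 6, 4, 2]  # high x goes with low y

cov_rising = covariance(x, y_rising)
cov_falling = covariance(x, y_falling)

print(cov_rising)   # 5.0
print(cov_falling)  # -5.0
```

Reversing the order of the second variable flips the sign but leaves the magnitude untouched.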


Calculating Covariance

The definitional formula for calculating the covariance of two sample variables (X, Y) is:

$$COV_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n - 1}$$

This formula is very similar to the variance formula; for instance, if we swap out all the Y's in the above formula for more X's, we get the variance of X:

$$S_X^2 = \frac{\sum (X - \bar{X})(X - \bar{X})}{n - 1} = \frac{\sum (X - \bar{X})^2}{n - 1}$$
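The link between the two formulas can be checked numerically. The sketch below implements the definitional formula under the assumed name `cov_definitional`, then passes the barrels from the oil example in this module as both arguments and compares the result with the sample variance from Python's standard `statistics` module.

```python
import statistics

def cov_definitional(xs, ys):
    """Definitional formula: sum((X - mean_X)(Y - mean_Y)) / (n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Barrels (X) from the oil example in this module.
barrels = [159, 166, 176, 185, 191, 194, 199, 207, 216, 228]

# Swapping Y for X turns the covariance into the sample variance of X.
var_via_cov = cov_definitional(barrels, barrels)
var_builtin = statistics.variance(barrels)

print(round(var_via_cov, 2))  # 468.99
print(round(var_builtin, 2))  # 468.99
```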


Computational formula for Covariance

The computational formula is generally considered more manageable when calculating by hand.
But, as mentioned previously with standard deviation, both the computational and definitional formulas provide the same answer.

$$COV_{XY} = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{n - 1}$$
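The agreement between the two formulas follows from expanding the numerator of the definitional formula. A brief sketch of that step, using $\bar{X} = \sum X / n$ and $\bar{Y} = \sum Y / n$:

$$
\begin{aligned}
\sum (X - \bar{X})(Y - \bar{Y})
  &= \sum XY - \bar{Y} \sum X - \bar{X} \sum Y + n \bar{X} \bar{Y} \\
  &= \sum XY - \frac{\sum X \sum Y}{n} - \frac{\sum X \sum Y}{n} + \frac{\sum X \sum Y}{n} \\
  &= \sum XY - \frac{\sum X \sum Y}{n}
\end{aligned}
$$

Dividing both sides by $n - 1$ gives the computational formula.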


Oil Example sample data: Covariance Calculation

 i    Barrels (X)    Costs (Y)         XY
 1        159           520         82680
 2        166           570         94620
 3        176           510         89760
 4        185           560        103600
 5        191           560        106960
 6        194           530        102820
 7        199           560        111440
 8        207           580        120060
 9        216           550        118800
10        228           560        127680

∑X = 1921
∑Y = 5500
∑XY = 1058420
n = 10
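The XY column and the sums can be reproduced with a few lines of Python; this is only a sketch for checking the table, and the variable names are assumptions.

```python
# Barrels (X) and costs (Y) from the table above.
barrels = [159, 166, 176, 185, 191, 194, 199, 207, 216, 228]
costs = [520, 570, 510, 560, 560, 530, 560, 580, 550, 560]

# XY column: the product of each pair of scores.
products = [x * y for x, y in zip(barrels, costs)]

n = len(barrels)
sum_x = sum(barrels)
sum_y = sum(costs)
sum_xy = sum(products)

print(products[0], products[-1])  # 82680 127680
print(sum_x, sum_y, sum_xy, n)    # 1921 5500 1058420 10
```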


Example Calculation continued

Taking the sums and n from the previous slide, we can use the computational formula to complete the calculation of covariance.

$$COV_{XY} = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{n - 1} = \frac{1058420 - \frac{(1921)(5500)}{10}}{10 - 1}$$

$$COV_{XY} = \frac{1058420 - 1056550}{9} = \frac{1870}{9} = 207.78$$

So, the covariance of X and Y is 207.78, which does not seem terribly meaningful.

Positive number, indicating that high scores on X are associated with high scores on Y (and vice versa).
Large number (i.e., far from zero), so the two variables are likely to be fairly well related.
Beyond that, not much can be said.
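The same arithmetic can be sketched in Python, plugging the sums into the computational formula and then cross-checking against the definitional formula applied to the raw scores. The variable names are assumptions made for this illustration.

```python
# Sums and n taken from the oil example table.
sum_x, sum_y, sum_xy, n = 1921, 5500, 1058420, 10

# Computational formula, step by step.
numerator = sum_xy - (sum_x * sum_y) / n
cov_computational = numerator / (n - 1)

# Cross-check with the definitional formula on the raw scores.
barrels = [159, 166, 176, 185, 191, 194, 199, 207, 216, 228]
costs = [520, 570, 510, 560, 560, 530, 560, 580, 550, 560]
mean_x = sum(barrels) / n
mean_y = sum(costs) / n
cov_definitional = sum(
    (x - mean_x) * (y - mean_y) for x, y in zip(barrels, costs)
) / (n - 1)

print(numerator)                    # 1870.0
print(round(cov_computational, 2))  # 207.78
print(round(cov_definitional, 2))   # 207.78
```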


Correlation

The Correlation (r) is a standardized measure of relationship, meaning it can be compared across multiple pairs of variables, regardless of scale.

Correlation is the most frequently used statistic for assessing the relationship between two variables [4].
Correlation allows us to describe the direction and magnitude of a relationship between two variables.
Correlation is very similar to covariance; indeed, we use the covariance to calculate correlation.
Correlation is used in many inferential statistics, and often a matrix of correlations is the input data used to calculate them.

[4] There will be a great deal more discussion of correlation later in the course.
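As a sketch of how the covariance feeds into the correlation, the code below assumes the standard definition r = COV_XY / (s_X s_Y), which has not been stated at this point in the module, and applies it to the oil example data. It also repeats the calculation with costs multiplied by 100, a hypothetical unit change, to show what "regardless of scale" means in practice. The helper names are assumptions.

```python
import statistics

def covariance(xs, ys):
    """Sample covariance: sum of cross-products of deviations over n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Assumed standard definition: covariance over the product of the SDs."""
    return covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))

# Barrels (X) and costs (Y) from the oil example in this module.
barrels = [159, 166, 176, 185, 191, 194, 199, 207, 216, 228]
costs = [520, 570, 510, 560, 560, 530, 560, 580, 550, 560]

# Hypothetical unit change: express every cost in a unit 100 times smaller.
costs_rescaled = [c * 100 for c in costs]

r = correlation(barrels, costs)
r_rescaled = correlation(barrels, costs_rescaled)

print(round(r, 2))           # 0.42
print(round(r_rescaled, 2))  # 0.42
```

Dividing by the two standard deviations removes the units, so the rescaling that inflates the covariance leaves r unchanged.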


Interpretation of Correlation

Once calculated:

- Correlation (r) can range between −1 and +1.
- The larger the absolute value of r, the stronger the relationship between the variables.
- If r is negative, then high scores on one variable are associated with low scores on the other variable.
- If r is positive, then high scores on one variable are associated with high scores on the other variable.
- If r = 0, then there is no linear relationship between the variables (virtually never occurs).

The size of r indicates the strength of the relationship and the sign (positive or negative) indicates the direction of the relationship.
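The three boundary cases can be sketched with small made-up data sets. Everything below is illustrative: `pearson_r` is a hypothetical helper, not a function from the course materials. The last case also suggests why r = 0 is best read as "no linear relationship": a perfectly U-shaped pattern still gives r = 0.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation computed from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up scores for illustration only.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
rising = [2 * v + 1 for v in x]   # high x goes with high y
falling = [-3 * v for v in x]     # high x goes with low y
u_shaped = [v ** 2 for v in x]    # related to x, but not linearly

print(pearson_r(x, rising))    # perfect positive relationship
print(pearson_r(x, falling))   # perfect negative relationship
print(pearson_r(x, u_shaped))  # no linear relationship
```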

Calculating Correlation

Calculating correlation is quite easy, once you have the covariance.

$$r_{XY} = \frac{COV_{XY}}{S_X S_Y}$$

So, given the descriptive statistics from previous slides:

- Barrels (X): $S_X = 21.66$
- Costs (Y): $S_Y = 22.61$
- and: $COV_{XY} = 207.78$

$$r_{XY} = \frac{207.78}{(21.66)(22.61)} = \frac{207.78}{489.73} = 0.424$$

The correlation between Barrels and Costs is .424.
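The same arithmetic can be checked in a couple of lines. This is only a sketch: the function name `correlation_from_summary` is made up for illustration, and the inputs are the summary statistics quoted on the slide.

```python
def correlation_from_summary(cov_xy, sd_x, sd_y):
    """Correlation as the covariance divided by the product of the standard deviations."""
    return cov_xy / (sd_x * sd_y)


# Summary statistics from the slide: COV_XY, S_X (Barrels), S_Y (Costs).
r = correlation_from_summary(207.78, 21.66, 22.61)
print(round(r, 3))  # 0.424
```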

What does that mean?

The correlation between Barrels and Costs is 0.424... so what?

- We can say that Barrels and Costs are positively related.
  - Meaning, high scores on one tend to be associated with high scores on the other.
  - And, low scores on one tend to be associated with low scores on the other.
- But what of the magnitude? This is a bit more tricky.
- Generally, familiarity with recent, similar research will guide your interpretation of magnitude.
  - Same field, topic, variables, etc.
- In the social sciences, it is common to find correlations around .400 to .600 referred to as ‘moderate’, ‘good’, or even ‘strong’ (in the case of .600).

Taking Correlation a step further.

One very good way of helping yourself to interpret correlation is to square it.

By squaring the correlation ($r^2$), we can interpret it as the proportion of variance shared between the two variables.

$$r^2 = .424^2 = .1798$$

- So, we can say barrels extracted and rig operating costs share 17.98% of their variance.
- Now we have a better understanding of the relationship between the two variables.
- Keep in mind:
  - Squaring a correlation coefficient makes it smaller in magnitude, because r is always between −1 and +1 (the only exceptions are r = 0 and r = ±1, whose magnitude is unchanged).
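A short sketch of the squaring step, using the r = .424 from the oil example. The helper name `shared_variance` is hypothetical. It also shows that the sign of r drops out, so a correlation of −.424 would imply the same shared variance.

```python
def shared_variance(r):
    """Proportion of variance shared by two variables: the squared correlation."""
    return r ** 2


r = 0.424
r_squared = shared_variance(r)

print(f"{r_squared:.4f}")         # 0.1798
print(f"{100 * r_squared:.2f}%")  # 17.98%

# The direction of the relationship does not affect the shared variance.
print(shared_variance(-0.424) == shared_variance(0.424))  # True
```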

Adjusted Correlation

When sample sizes are small, as they are here (n = 10), the sample correlation will tend to overestimate the population correlation.

- Meaning, r and $r^2$ tend to be larger than they truly are in the population.
- The relationship appears stronger than it actually is in the population.
- So, we generally correct for this problem by adjusting r.

$$r_{adj} = \sqrt{1 - \frac{(1 - r^2)(n - 1)}{n - 2}}$$
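As a rough sketch of how the adjustment depends on sample size, the formula can be wrapped in a small function and evaluated for the same r at several values of n. The function name `adjusted_r` is made up for illustration. Two choices below are assumptions rather than anything stated on the slide: a negative value under the radical is truncated to 0, and only the magnitude of the adjusted correlation is returned.

```python
from math import sqrt


def adjusted_r(r, n):
    """Adjusted correlation: sqrt(1 - (1 - r^2)(n - 1) / (n - 2))."""
    if n <= 2:
        raise ValueError("n must be greater than 2")
    adjusted_r_squared = 1 - (1 - r ** 2) * (n - 1) / (n - 2)
    # Assumption: when the adjustment pushes r-squared below 0, report 0.
    return sqrt(max(adjusted_r_squared, 0.0))


# Same observed r, increasingly large samples.
for n in (10, 30, 100, 1000):
    print(n, round(adjusted_r(0.424, n), 3))
```

The adjusted value moves toward the unadjusted r as n increases, which is consistent with the point that the correction matters most in small samples.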

Adjusting our example correlation

Adjusting our Oil example correlation:

$$
\begin{aligned}
r_{adj} &= \sqrt{1 - \frac{(1 - r^2)(n - 1)}{n - 2}} = \sqrt{1 - \frac{(1 - .424^2)(10 - 1)}{10 - 2}} \\
&= \sqrt{1 - \frac{(1 - .1798)(9)}{8}} = \sqrt{1 - \frac{(.8202)(9)}{8}} \\
&= \sqrt{1 - \frac{7.3818}{8}} = \sqrt{1 - .9227} \\
&= \sqrt{.0773} = .2780
\end{aligned}
$$

- Our correlation shrank from r = .424 to $r_{adj} = .278$.
- Shared variance shrank from $r^2 = .1798$ to $r^2_{adj} = .0773$.
- We now have a more accurate sample estimate of the relationship between Barrels and Costs.
- They share 7.73% of their variance (i.e., clearly a weak relationship).
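The hand calculation can be retraced step by step. This is a sketch with illustrative variable names, using r = .424 and n = 10 from the example. It carries full precision throughout, so the last digit of an intermediate value may differ slightly from the slide, which rounds at each step.

```python
from math import sqrt

r = 0.424  # sample correlation between Barrels and Costs
n = 10     # number of rigs in the example

r_squared = r ** 2                    # about .1798
numerator = (1 - r_squared) * (n - 1)  # about 7.38
ratio = numerator / (n - 2)            # about .9227
adj_r_squared = 1 - ratio              # about .077
r_adj = sqrt(adj_r_squared)            # about .278

print(round(r_squared, 4), round(ratio, 4))
print(round(adj_r_squared, 4), round(r_adj, 3))
```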

Additional Considerations with Measures of Relationship

Measures of relationship tell us something about whether or not two (or more) variables share variance.

They do NOT tell us what causes the relationship!
Nor do they tell us if one variable causes another!

You will often hear this: “Correlation does not equal causation!”

X may cause Y
Y may cause X
Z may be causing the relationship between X and Y

Even so, we do tend to use correlation (and other measures of relationship) in the process of investigating causal relationships.
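The third possibility (Z causing the relationship between X and Y) can be illustrated with a small simulation. This is a hypothetical sketch: the variables, the noise levels, and the helper `pearson_r` are invented for illustration and are unrelated to the Oil example data. X and Y never influence each other here; both are simply Z plus independent noise.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(3)  # fixed seed so the run is repeatable
z = [rng.gauss(0, 1) for _ in range(2000)]   # the lurking third variable
x = [zi + rng.gauss(0, 0.5) for zi in z]     # X depends only on Z
y = [zi + rng.gauss(0, 0.5) for zi in z]     # Y depends only on Z

r_xy = pearson_r(x, y)
print(round(r_xy, 2))  # strong positive correlation, yet no X -> Y or Y -> X link
```

Under these assumed noise levels the population correlation between X and Y is 0.8, even though neither variable causes the other.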


Properties of Statistics

There are 4 properties we use to evaluate statistics.

Sufficiency
Unbiasedness
Efficiency
Resistance

Some of them you are already familiar with...


Sufficiency

The Sufficiency of a statistic refers to whether or not it makes use of all the information contained in a sample to estimate its corresponding parameter.

As an example, consider measures of central tendency:
The mean is very sufficient because it uses all the scores when being calculated.
The median and mode are not very sufficient because they only use one or two scores.
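A quick way to see this is to compare two samples that differ in most of their scores. The two samples below are made up for illustration (they are not the Oil example data): the median and mode cannot tell them apart, while the mean can.

```python
from statistics import mean, median, mode

# Two hypothetical samples that share only their middle scores.
sample_a = [2, 3, 3, 4, 5]
sample_b = [0, 3, 3, 9, 11]

for label, sample in (("A", sample_a), ("B", sample_b)):
    print(label, "mode:", mode(sample), "median:", median(sample), "mean:", mean(sample))

# The mode and median are 3 in both samples, because they look only at the
# most frequent score or the middle score. The mean uses every score, so it
# differs between the samples (3.4 versus 5.2).
```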


Unbiasedness

Unbiasedness refers to how well a sample statistic represents its associated population parameter.

As we saw with correlation, some statistics ($r$) are more biased than others ($r_{adj}$).

Recall how we calculated:

Sample variance: $S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
Population variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$

The difference in denominators is due to something we will discuss more later, degrees of freedom (df).
For now, consider this: we use N in the population formula because we have all of the scores.
When dealing with samples, we do not have all the scores (of the defined population), so we make an adjustment, dividing by n − 1.
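The effect of the n − 1 adjustment can be verified exactly on a tiny population. The sketch below assumes a made-up population of three scores and samples of size 2 drawn with replacement; it enumerates every possible sample and averages the two candidate variance estimates. The names `sum_of_squares`, `avg_div_n`, and `avg_div_n_minus_1` are illustrative only.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6]  # hypothetical population, so N = 3
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance


def sum_of_squares(sample):
    """Sum of squared deviations from the sample's own mean."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample)


n = 2
# Every ordered sample of size n drawn with replacement (9 samples in total).
samples = list(product(population, repeat=n))

avg_div_n = sum(sum_of_squares(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_of_squares(s) / (n - 1) for s in samples) / len(samples)

print("population variance:       ", sigma2)             # 8/3
print("average when dividing by n:", avg_div_n)          # 4/3
print("average when dividing by n-1:", avg_div_n_minus_1)  # 8/3
```

Across all possible samples, dividing by n underestimates the population variance on average, while dividing by n − 1 recovers it exactly in this setup.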


Efficiency

Efficiency refers to how much a statistic can change from sample to sample. An efficient statistic changes very little from one sample to the next.

Consider the sample mean: $\bar{X} = \frac{\sum X}{n}$

If we took an infinite number of repeated samples from a symmetrical population distribution with µ in the center:

The mean of each sample would be fairly close to µ and the mean of all those sample means would be µ.

The key to that statement being true is “symmetrical population distribution with µ in the center”.

Extremely high and low scores (those farthest from µ) are rare when compared to the number of scores near µ.
Therefore, we can expect most of those repeated samples to have a mean close to µ, because most of the scores in general (in the population) are close to µ.
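We cannot draw an infinite number of samples, but a simulation with many repeated samples shows the same pattern. The sketch below assumes a normal population (one example of a symmetrical distribution with µ in the center) with made-up values µ = 100 and σ = 15. The comparison with the sample median is an addition for illustration and is not part of the slide.

```python
import random
from statistics import mean, median, pstdev

rng = random.Random(42)  # fixed seed so the run is repeatable
mu, sigma = 100, 15      # assumed population parameters
n, reps = 25, 2000       # sample size and number of repeated samples

sample_means, sample_medians = [], []
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    sample_means.append(mean(sample))
    sample_medians.append(median(sample))

mean_of_means = mean(sample_means)
sd_of_means = pstdev(sample_means)      # how much the mean changes across samples
sd_of_medians = pstdev(sample_medians)  # how much the median changes across samples

print("mean of sample means:", round(mean_of_means, 2))
print("SD of sample means:  ", round(sd_of_means, 2))
print("SD of sample medians:", round(sd_of_medians, 2))
```

In this setup the sample means cluster tightly around µ, and they vary less from sample to sample than the sample medians do, which is what it means for the mean to be the more efficient of the two.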


Resistance

Resistance refers to how resistant a statistic is to outliers (extreme scores).

If extreme scores do not influence the statistic, then the statistic is resistant.
Again, consider measures of central tendency: Mo, Mdn, $\bar{X}$
Both Mo and Mdn only consider the very center of a distribution, so they are very resistant.
$\bar{X}$ is very sensitive to outliers; they pull the mean toward them, thus making the mean not very resistant.
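A small made-up example shows the pull of a single outlier. The scores below are hypothetical; the second list simply replaces the highest score with an extreme value.

```python
from statistics import mean, median, mode

scores = [2, 3, 3, 4, 5]
with_outlier = [2, 3, 3, 4, 50]  # same scores, but the 5 is replaced by 50

print("Mo: ", mode(scores), "->", mode(with_outlier))      # 3 -> 3
print("Mdn:", median(scores), "->", median(with_outlier))  # 3 -> 3
print("Mean:", mean(scores), "->", mean(with_outlier))     # 3.4 -> 12.4
```

The mode and median do not move at all, while the mean is dragged from 3.4 to 12.4 by one extreme score.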


Summary of Module 3 (continued on next slide)

Introduced:
  The context of an example study
  Descriptive Statistics

Classes of Descriptive Statistics
  Central Tendency
    Mode
    Median
    Mean
  Dispersion
    Range
    Sums of Squares
    Variance
    Standard Deviation
    Coefficient of Variation
  Shape
    Skewness
    Kurtosis


Summary continued

Classes of Descriptive Statistics (continued)
  Relationship: Covariance, Correlation, Adjusted Correlation

Properties of statistics: Sufficiency, Unbiasedness, Efficiency, Resistance
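
The relationship statistics can be recapped with a similar minimal Python sketch. The paired data and the function name `relationship` are hypothetical. The adjusted correlation here is an assumption: it uses one common textbook form, sqrt(1 - (1 - r^2)(n - 1)/(n - 2)), which may not match the exact formula presented earlier in this module.

```python
import math


def relationship(x, y):
    """Return (covariance, correlation, adjusted correlation) for paired data.

    Assumptions (not taken from the slides): covariance divides by n - 1,
    and the adjusted correlation is sqrt(1 - (1 - r^2)(n - 1)/(n - 2)).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must be the same length, with n >= 3")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]

    # Sum of cross-products and the two sums of squares.
    sp = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a ** 2 for a in dx)
    ss_y = sum(b ** 2 for b in dy)

    covariance = sp / (n - 1)
    r = sp / math.sqrt(ss_x * ss_y)      # Pearson correlation

    # The adjustment can go negative for small r; floor it at zero.
    adjusted_sq = 1 - (1 - r ** 2) * (n - 1) / (n - 2)
    r_adj = math.sqrt(max(0.0, adjusted_sq))

    return covariance, r, r_adj


# Hypothetical paired scores, used only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
cov, r, r_adj = relationship(x, y)
print(f"covariance={cov:.4f}  r={r:.4f}  adjusted r={r_adj:.4f}")
```

The adjusted value is never larger in magnitude than the unadjusted r, and the gap shrinks as n grows.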


This concludes Module 3

Next time: Module 4.
Next time we’ll begin covering “The Normal Curve”.
Until next time; have a nice day.

These slides initially created on: September 16, 2010
These slides last updated on: September 23, 2010

The bottom date shown is the date this Adobe.pdf file was created; LaTeX [5] has a command for automatically inserting the date of a document’s creation.

[5] This document was created in LaTeX using the Beamer package.