Chapter 2 Characterizing Your Data Set Allan Edwards: “Before you analyze your data, graph your data

Chapter 2
Characterizing Your Data Set

Allan Edwards: “Before you analyze your data, graph your data.”

Francis Galton, Father of Intelligence Testing: Whenever you can, count!

Frequency Table
Variable is Continuous

Grouped Frequency Table & Distribution

Continuous variable, data from the same 100 subjects

Constant interval (“Class Interval”)
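A grouped frequency table can be sketched in a few lines of Python. This is only an illustration, not the procedure behind the slide: the scores, the interval width of 5, and the starting point of 60 are made up, and the helper name `grouped_frequency_table` is hypothetical. It assumes integer-valued scores.

```python
from collections import Counter

def grouped_frequency_table(scores, width, start):
    """Count scores in constant-width class intervals.

    Interval k covers start + k*width up to (but not including)
    start + (k+1)*width. Assumes integer scores.
    """
    counts = Counter((score - start) // width for score in scores)
    table = []
    for k in range(min(counts), max(counts) + 1):
        lower = start + k * width
        upper = lower + width
        table.append((lower, upper, counts.get(k, 0)))  # empty intervals get 0
    return table

# Hypothetical scores, not the 100 subjects from the slide.
scores = [61, 64, 67, 70, 71, 73, 74, 76, 79, 88]
table = grouped_frequency_table(scores, width=5, start=60)

for lower, upper, freq in table:
    print(f"{lower}-{upper - 1}: {freq}")
```

Every interval has the same width, and an interval with no scores still appears with a frequency of 0, so the shape of the distribution stays readable.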

Grouped Frequency Histogram
For Continuous Variable

Bars “touch”: the end of one interval is the beginning of the next
Value is the middle value of the interval
Spatz says the bars don’t touch. Whaaaaaa?????

Bar Chart for Categorical Variable

Bars are separated: the categories are distinct, so “a lot of Biology” is not “almost English”

Standard Normal Distribution

The more Extreme your score, the more unusual and improbable you are
Remember this relationship -- it’s the basis of 90% of statistics
Typical of many characteristics -- e.g., height, intelligence, speed

Rectangular Distribution
Never Seen One

Extreme Scores are NOT less usual/frequent/probable

Non-Normal Distribution

Example: Income -- Where is the mean? How would you characterize these data?
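As a rough illustration of why the mean can mislead for data like these, the sketch below uses made-up incomes (in thousands) with one very high earner. The numbers are hypothetical. The point is only that the mean is pulled toward the long tail while the median stays with the bulk of the scores.

```python
import statistics

# Hypothetical incomes in thousands; one extreme score creates the long tail.
incomes = [20, 25, 30, 35, 40, 45, 500]

mean_income = statistics.mean(incomes)      # pulled upward by the 500
median_income = statistics.median(incomes)  # middle value, unaffected by the 500

print(f"mean = {mean_income:.1f}, median = {median_income}")
```

Six of the seven people earn less than the mean, which is why the median is usually the better summary for a skewed distribution.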

Negative Skew

Bimodal Distribution

Is the Mean appropriate/representative?
E.g., Mean age of onset for Anorexia is 17 yrs

One Peak is at 14 yrs -- Onset of Puberty
One Peak is at 18 yrs -- Going away to college
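A small sketch of the same problem, using invented ages that are not real onset data: with two peaks, the mean can land on a value that almost nobody has. `statistics.multimode` returns every value tied for most frequent.

```python
import statistics

# Hypothetical ages with two peaks (14 and 18); not real clinical data.
ages = [13, 14, 14, 14, 15, 17, 18, 18, 18, 19]

mean_age = statistics.mean(ages)    # falls between the two peaks
modes = statistics.multimode(ages)  # all most-frequent values

print(f"mean = {mean_age}, modes = {modes}")
```

Here the mean sits at an age that does not occur in the data at all, while the two modes describe where the scores actually cluster.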

Bimodal Distribution, cont.

Characterizing Your Data
Measures of Central Tendency

Characterizing your Data:
Shorthand notation for all of your values

Central Tendency:
• A representative value
• Where Your Scores tend to “Hang Out”
• Where you go to find your data

1. Mean -- What is the definition & why do you use it?
2. Median -- Middle Value
   What if you have an even # of values?
3. Mode -- Most frequent value
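The three measures can be computed with Python's standard `statistics` module. The data below are hypothetical and deliberately have an even number of values, in which case the median is conventionally taken as the average of the two middle values.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # hypothetical data, even number of values

mean = statistics.mean(scores)      # sum of scores divided by N
median = statistics.median(scores)  # even N: average of the two middle values (3 and 5)
mode = statistics.mode(scores)      # most frequent value

print(mean, median, mode)
```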

Which Central Tendency is Best?

• Mean
  Ratio Data (People allow Interval Data)
  Symmetrical Distributions

• Median
  Skewed Distributions
  Ordinal (Ranked) Data -- A mean cannot be computed

• Mode
  Nominal (Qualitative) Data
  Bimodal Data

If you Had to Guess the Value of Each (Quantitative) Data Point

• Mode: Highest # of correct guesses

• Median: Errors would be symmetrical
Overestimations would balance out Underestimations

• Mean: Errors of Estimation will be smallest, overall

Two Unique Properties of the Mean:

1. Squared deviations sum to less about the mean than about any other value

2. Deviation scores sum to zero
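The guessing game above can be checked on a small data set. This is a sketch with hypothetical scores and hypothetical helper names. "Smallest errors overall" is read here as the smallest sum of squared errors, which is the sense in which the mean is optimal.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # hypothetical data

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

def exact_hits(guess):
    """How many scores the guess matches exactly."""
    return sum(1 for x in scores if x == guess)

def over_and_under(guess):
    """Counts of overestimates and underestimates when always guessing `guess`."""
    over = sum(1 for x in scores if guess > x)
    under = sum(1 for x in scores if guess < x)
    return over, under

def sum_squared_error(guess):
    """Total squared error when always guessing `guess`."""
    return sum((x - guess) ** 2 for x in scores)

# Property 2 of the mean: deviation scores sum to zero.
sum_of_deviations = sum(x - mean for x in scores)

print(exact_hits(mode), over_and_under(median), sum_squared_error(mean))
```

Guessing the mode gives the most exact hits, guessing the median gives as many overestimates as underestimates, and guessing the mean gives the smallest total squared error.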

How Strong Is Your Tendency?
Measures of Heterogeneity

(Chapter 3)

Two Data Sets with nearly identical:
• Ns
• Means
• Medians
• Modes

Are these two data sets similar?

Are They The Same?

Some Data Sets are More Heterogeneous

Jockeys: Very Low average height, Very Homogeneous

Presbyterians: Medium average height, Very Heterogeneous

NBA Players: Very High average height, Very Homogeneous

How do you characterize a data set’s Heterogeneity?
The Greater the Heterogeneity, the Weaker the Central Tendency

Quantifying Heterogeneity

Range: Highest Score minus Lowest Score
Very sensitive to a single Extreme Score

Interquartile Range: 75th percentile minus 25th percentile
Captures 50% of the scores
How wide do you have to go to capture 50% of values?

The wider you have to go the more Heterogeneity
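Both measures can be sketched with the standard library. The scores are hypothetical. Note that there are several conventions for computing percentiles. This example uses the default ("exclusive") method of `statistics.quantiles`, which may differ slightly from the textbook's.

```python
import statistics

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # hypothetical data

score_range = max(scores) - min(scores)

# n=4 returns the three quartile cut points (25th, 50th, 75th percentiles).
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Replace the top score with one extreme value and recompute.
with_outlier = scores[:-1] + [100]
outlier_range = max(with_outlier) - min(with_outlier)
o1, o2, o3 = statistics.quantiles(with_outlier, n=4)
outlier_iqr = o3 - o1

print(score_range, iqr, outlier_range, outlier_iqr)
```

A single extreme score inflates the range dramatically but leaves the interquartile range unchanged, because the middle 50% of the scores has not moved.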

Heterogeneity, cont.

The more Heterogeneity, the more the scores will deviate from The mean

Di = Xi - Xbar

          Xi    Di        Xi    Di
           4    -2
           5    -1
           6     0         6     0
           7     1
           8     2
Sum =            0               0
Mean =     6     0         6     0

Heterogeneity, cont.

Two Unique properties of the Mean:

1. All deviation scores sum to zero

2. Raw scores deviate less from the mean (in total squared deviation) than from any other value

This makes the mean the Best Representative of the data set if the distribution is symmetrical

Heterogeneity, cont.

Problem:
• All deviation scores sum to zero no matter how Heterogeneous the raw scores
• You cannot average deviation scores to quantify heterogeneity

Solution:
Make all deviation scores Positive

Heterogeneity, cont.

Two ways to make all deviation scores Positive:

• Take the Absolute Value of the Deviation Scores:
Average of absolute values = Average Deviation

Mean +/- AD captures roughly half of the raw scores (about 57.5% when the scores are normally distributed)

• Take the Square of the Deviation Scores:
Average of squared deviation scores = Variance

$\sigma^2$ for Population
$S^2$ for Sample
$\hat{S}^2$ (“S2-hat”) for estimating Population from Sample
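Both routes can be traced by hand on a tiny data set. This sketch uses the hypothetical scores 4 through 8 and divides by N in both cases, which describes the scores at hand rather than estimating a population.

```python
scores = [4, 5, 6, 7, 8]  # hypothetical data
n = len(scores)
mean = sum(scores) / n

# Deviation scores: these always sum to zero, so they cannot be averaged directly.
deviations = [x - mean for x in scores]

# Route 1: absolute values, then average.
average_deviation = sum(abs(d) for d in deviations) / n

# Route 2: squares, then average.
variance = sum(d ** 2 for d in deviations) / n

print(sum(deviations), average_deviation, variance)
```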

Variance

Population; Estimate of Population from Sample

To describe the sample, use N: $S^2$ = Sample Variance

Problem: Magnitude of Variance is large relative to individual Deviation scores -- Quantifies but not very descriptive
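The formulas on this slide did not survive extraction. The standard textbook definitions are given below as an assumption about what the slide showed, using the symbols named earlier ($\sigma^2$, $S^2$, $\hat{S}^2$), with $\mu$ for the population mean and $\bar{X}$ for the sample mean:

```latex
\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}
\qquad
S^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}
\qquad
\hat{S}^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}
```

The only difference between describing a sample and estimating the population from it is the denominator: N in the first case, N - 1 in the second.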

Standard Deviation

Population Sample

Population Estimate

Mean +/- SD captures 68% of Data Points
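The standard deviation is the square root of the variance, which puts it back in the units of the raw scores. A minimal sketch with hypothetical scores, using the standard library's `pstdev` (divides by N) and `stdev` (divides by N - 1):

```python
import math
import statistics

scores = [4, 5, 6, 7, 8]  # hypothetical data

# Divides by N: treats the scores as the whole population (or describes the sample).
population_sd = statistics.pstdev(scores)

# Divides by N - 1: estimates the population SD from a sample.
estimate_sd = statistics.stdev(scores)

print(round(population_sd, 3), round(estimate_sd, 3))
```

The 68% figure applies to roughly normal distributions with many scores; a five-score example like this one is too small to show it.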

Standard Deviation, cont.

The Concept

Standard Deviation
Standard Deviation from the Mean
“Average” Deviation from the Mean
Expected Deviation from the Mean

Expect 68% of your data to be within 1 SD of the mean
Expect 95% of your data to be within 2 SD of the mean

If your score is beyond 2 SDs of the mean:
You are very infrequent
You are very unusual
You are very improbable

Associate: Infrequent with Improbable
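The 68% and 95% figures come from the normal curve and can be checked with `statistics.NormalDist` from the standard library. This sketch assumes the scores are normally distributed; for other shapes the percentages will differ.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

# Proportion of scores within 1 and 2 SDs of the mean.
within_1_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_2_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

# Proportion of scores more than 2 SDs from the mean, in either direction.
beyond_2_sd = 1 - within_2_sd

print(f"{within_1_sd:.3f} {within_2_sd:.3f} {beyond_2_sd:.3f}")
```

Fewer than 5 scores in 100 fall beyond 2 SDs, which is the sense in which such a score is infrequent and therefore improbable.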

Interpreting a Value

Transforming a score to make it more interpretable:

• Comparing two scores:
Two tests of Equal Difficulty but of Different Length

Pretend both tests were 100 items long
How many would you have gotten right?

Percent Correct is a Transformed Score

• Comparing one score to everybody else:
Pretend there were 100 people, where would you rank?

Percentile is a Transformed Score
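Both transformations are one-liners. The sketch below uses hypothetical test results and hypothetical function names. For the percentile it counts the percentage of scores strictly below yours, which is one common convention among several.

```python
def percent_correct(num_right, num_items):
    """Rescale a raw score as if the test were 100 items long."""
    return 100 * num_right / num_items

def percentile_rank(score, all_scores):
    """Percentage of scores strictly below `score` (one common convention)."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

# Two tests of different length: 18 of 20 versus 40 of 50.
short_test = percent_correct(18, 20)
long_test = percent_correct(40, 50)

# One score compared with everybody else (hypothetical class of 10).
class_scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
rank = percentile_rank(85, class_scores)

print(short_test, long_test, rank)
```

The raw score of 40 looks larger than 18, but after the transformation the shorter test turns out to be the better performance.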

Z-scores & Z-transformations

Take each score (Xi) and convert it to Zi
Mean of z-scores = 0
Standard Deviation = 1
Units of z-scores are in Standard Deviations
Z-score compares Your Deviation (numerator) to the “Average Deviation” (denominator)
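Read as a formula, this is Zi = (Xi - mean) / SD. A minimal sketch with hypothetical scores, using the divide-by-N standard deviation, confirms that the transformed scores have a mean of 0 and a standard deviation of 1:

```python
import statistics

scores = [4, 5, 6, 7, 8]  # hypothetical data

mean = statistics.mean(scores)
sd = statistics.pstdev(scores)  # divide-by-N SD of these scores

# Your deviation (numerator) relative to the standard deviation (denominator).
z_scores = [(x - mean) / sd for x in scores]

z_mean = statistics.mean(z_scores)
z_sd = statistics.pstdev(z_scores)

print([round(z, 2) for z in z_scores])
```

A z-score of about 1.41 for the raw score 8 says it sits 1.41 standard deviations above the mean, regardless of the original units.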

Where you are relative to Population

Think Percentile
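If the population is assumed to be normally distributed, a z-score converts directly to a percentile through the normal curve's cumulative distribution. The function name `z_to_percentile` is hypothetical, and the normality assumption matters: for skewed or bimodal data this conversion will be off.

```python
from statistics import NormalDist

def z_to_percentile(z):
    """Percentage of a normal population scoring below a given z-score."""
    return 100 * NormalDist().cdf(z)

at_mean = z_to_percentile(0)        # exactly at the mean
one_sd_above = z_to_percentile(1)   # 1 SD above the mean
two_sd_below = z_to_percentile(-2)  # 2 SDs below the mean

print(f"{at_mean:.1f} {one_sd_above:.1f} {two_sd_below:.1f}")
```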

Interpreting Your Z-Score

Interpreting Your Z-Score, cont.

Interpreting Your Z-Score, cont.

Interpreting Your Z-Score, cont.