29
Statistics, an Overview 2 What We Will Cover in This Section What statistics are. Descriptive Statistics – Frequency distributions – Graphs – Mean Standard deviation Inferential Statistics – Z-scores Hypothesis Testing Basic Terms and Concepts

What We Will Cover in This Section

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

1

Statistics, an Overview 2

What We Will Cover in This Section

• What statistics are.• Descriptive Statistics

– Frequency distributions

– Graphs– Mean– Standard deviation

• Inferential Statistics– Z-scores

• Hypothesis Testing

Basic Terms and Concepts

2

Statistics, an Overview 4

Basic Terminology

STATISTICS

Numerical techniques for describing groups of people or events.

Statistics, an Overview 5

Fundamental Uses

DESCRIPTIVE STATISTICS

Techniques used to organize, summarize, and describe sets of numbers.

INFERENTIAL STATISTICS

Techniques that allow us to make estimates about POPULATIONS based on SAMPLE data.

Statistics, an Overview 6

Population vs. Sample

• Population– ALL members of a

group that are alike on some characteristic.

– Infinite in size.– Parameters are

indicated by Greek letters: µ σ

• Sample– A subset of a

population.– Finite in size.– Numerical estimators

are called statistics.– Statistics are

indicated by Roman letters: M, S.

3

Statistics, an Overview 7

Why Used?

• We cannot collect data from populations.

• We collect data from samples.• On the basis of the numerical

characteristics of samples we try to make conclusions about populations.

Using Numbers

Statistics, an Overview 9

Levels of Measurement

NOMINAL SCALE

Numbers are used as labels.

ORDINAL SCALE

Numbers are used to indicate rank order.

4

Statistics, an Overview 10

Levels of Measurement

INTERVAL SCALE

Numbers are used to indicate an actual amount and there is an equal unit of measurement between adjacent numbers.

RATIO SCALE

Numbers indicate an actual amount and there is a true zero.

Statistics, an Overview 11

Frequency Tablesand Graphs

Statistics, an Overview 12

Common Statistics• Frequency

– The number of people who got a certain score.– Symbolized with f.

• Number– The total count of observations in a sample.– Symbolized with N.

• Percent– The number in a group (f) divided by the total number

(N).• Percentile

– The percent of people who got a score and lower.– Symbolized with Pn.

5

Statistics, an Overview 13

Simple frequency distribution (N=20)

1510211

55110

3015312

5020413

7525514

9520415

950016

1005117

Cum %(Percentile)

%Frequency (f)Score

Statistics, an Overview 14

Histogram

0

1

2

3

4

5

6

10 11 12 13 14 15 16 17

Score

Freq

uenc

y

Statistics, an Overview 15

Sociability

1.4501.400

1.3501.300

1.2501.200

1.1501.100

1.0501.000

.950.900.850.800.750.700.650

Normal Distribution

Leptokurtic

Frequency

400

300

200

100

0

6

Statistics, an Overview 16

Raw F V-4

20.018.016.014.012.010.08.06.04.02.00.0

AVA Standardization Sample

Conformity (V-4) Raw Scores1200

1000

800

600

400

200

0

Std. Dev = 4.22 Mean = 4.7N = 3629.00

Statistics, an Overview 17

MeasuresoftheMiddle

Statistics, an Overview 18

Question

What number would you use to describe the typical height of people in this class?

7

Statistics, an Overview 19

Mean

• Sum the scores and divide by the number of scores.

• Symbols– Sample: M or X

Statistics, an Overview 20

Median

• The score below which 50% of the scores fall.

• Represents P50.• Divides the distribution in half.• Symbol: Mdn

Statistics, an Overview 21

Example

8 9 10 11 12 13 14 15 16

8 9 10 11 12 13 16 16 46

8

Statistics, an Overview 22

Mode

• The score that occurs most frequently in a distribution.

• Used for nominal scales or higher.• Symbol: Mo

Statistics, an Overview 23Sociability

1.4501.400

1.3501.300

1.2501.200

1.1501.100

1.0501.000

.950.900.850.800.750.700.650

Normal Distribution

Leptokurtic

Frequency

400

300

200

100

0

MoMdn

M

Statistics, an Overview 24

Conformity Raw Score

20.519.5

18.517.5

16.515.5

14.513.5

12.511.5

10.59.5

8.57.5

6.55.5

4.53.5

2.51.5

.5

Positively Skewed Distibution

Freq

uenc

y

700

600

500

400

300

200

100

0

Mo

Mdn

M

9

Statistics, an Overview 25

Measures of Variability

Statistics, an Overview 26

Two Normal Distributions with the Same Mean

A

B

Statistics, an Overview 27

Overview

The Mean describes the ‘typical score’; measures of variability give

an index of how much the rest of the scores in the sample are spread out

around the mean.

10

Statistics, an Overview 28

Range

• The distance between the lowest and highest score.

• FormulaRange = Highest Score – Lowest Score

• Example

1 3 4 6 8 12 15 16 18 19

1 3 4 6 8 12 15 16 18 79

Statistics, an Overview 29

Deviation Score

Mean

Sum 4510

9

76

5

8

X - MxScore

17.50

2.92

-2.5

-1.5

-.5

2.5

1.5

.5

6.25

2.25

6.252.25

.25

.25

0

0

(X - Mx)2

7.5

Variance

Statistics, an Overview 30

Variance (S2 or σ2)

Mean squareddeviation score

around the mean.

11

Statistics, an Overview 31

Standard Deviation (S or σ)

Square root of the variance.

2.92 = 1.71

Statistics, an Overview 32

Large and Small Standard Deviations

A

B

Statistics, an Overview 33

Key Learning Points

• Most behavioral characteristics are normally distributed.

• The Mean represents the ‘typical’ score for a sample.

• The Variance and Standard Deviationmeasure the variability of scores in a sample.

12

Statistics, an Overview 34

InferentialStatistics

Statistics, an Overview 35

Issue

• In research we collect data from SAMPLES and try to generalize those results to POPULATIONS.

• To what degree does a sample mean representthe population to which we want to make inferences?

• Does this sample represent population A or does it represent population B?

Statistics, an Overview 36

1.8082.357Standard Deviation3.2685.554Variance5.8756.875Mean

4755Sum88Number57411436868958756

M&MRaisin

13

Statistics, an Overview 37

Underlying Concept

Question: Do these two samples represent the same population or do they represent two different populations?

M&M Mean (5.875)

Raisin Mean (6.875)

Statistics, an Overview 38

What is Rare?

Means that are so far apart that we conclude that the distance between them could not have happened by chance.

Statistics, an Overview 39

Units of Measurement

µ or M

+1σ-1σ +2σ-2σ 0σ

14

Statistics, an Overview 40

Units of Measurement

M

+1σ-1σ +2σ-2σ 0σ+1z-1z +2z-2z 0z

Statistics, an Overview 41

Slicing the Normal Curve

M

50%34.13%

13.59%2.28%

-1z

-2z

Statistics, an Overview 42

Key Unit #1

5%

Z=1.64

15

Statistics, an Overview 43

Key Unit #2

1%

Z = -2.33

Statistics, an Overview 44

What Is Rare?

• Some event that has a low probability of happening.

• In research we choose this ‘rare’ value.• Typically it is set at 5% (.05) or less.• Any event that occurs 5% of the time or

less is considered to be rare.• Indicated by: p < .05

Statistics, an Overview 45

Applications

16

Statistics, an Overview 46

How Inferential Statistics are Used

1. When we want to know if the scores for two groups are different.

• t-test• Analysis of Variance (ANOVA)

2. When we want to see if there is a relationship between scores.

• Correlation coefficient

Statistics, an Overview 47

The t-test

Question: Do these two samples represent the same population or do they represent two different populations?

M1 M2

Statistics, an Overview 48

t-test Logic

1M 2M

Distance Between Means

Treatment + Random Error

17

Statistics, an Overview 49

Remember the Memory Study?

M&M Mean (5.875)

Raisin Mean (6.875)

Statistics, an Overview 50

t-Test

t (14) =.95, p < .36

An indicator as to how rare this value is. It indicates the number of times out of 100 you would get this difference between means based on the sample size.

p<.36

The calculated value for t. It ranges from 0 to large. It is possible to have negative values for t.

.95

Degrees of freedom (df). Two less than the number of people in the study.

(14)

The name of the statistict

Statistics, an Overview 51

Another Situation

The management of Sal T. Dogg’srestaurant wanted to see if the saltiness of appetizers would influence the number of drinks people purchased. Three sections of the club are targeted to receive appetizers that have either low, medium, or high saltiness. The dependent variable is the number of drinks ordered.

18

Statistics, an Overview 52

Appetizer saltiness and number or drinks ordered.

M = 1.80M = 3.90M = 2.00244142222131142362251241143332

Group 3 High SaltGroup 2 Medium SaltGroup 1 Low Salt

Statistics, an Overview 53

Issue

How to determine if one mean is significantly different from the other means while minimizing the probability of committing a Type I error.

Statistics, an Overview 54

Analysis of Variance (ANOVA)

19

Statistics, an Overview 55

Graph of Saltiness Ratings

GROUP

High SaltinessMedium SaltinessLow saltiness

Mea

n of

SAL

TINE

S

4.5

4.0

3.5

3.0

2.5

2.0

1.5

1.0

.5

0.0

Statistics, an Overview 56

ANOVA Summary Table

2951.37Total

.912724.50Within Groups

14.7713.435226.87Between Groups

F(crit=3.35)MSdfSSSource

Statistics, an Overview 57

How to Express F

F (2,27) = 14.77, p<.05

Calculated FDegrees of freedom(between, within) Rare?

20

Statistics, an Overview 58

Correlation:Measuring Relationships

Statistics, an Overview 59

0102030405060708090

100110120

8 10 12 14 16 18 20 22 24 26 28 30

Selection Test (Predictor)

Perf

orm

ance

(Cr

iter

ion)

Statistics, an Overview 60

Positive Correlation

0

10

20

30

40

50

0 2 4 6 8 10

Attractiveness

Liki

ng

21

Statistics, an Overview 61

Negative Correlation

1.5

2

2.5

3

3.5

4

0 30 60 90 120 150 180

TV Time

GPA

Statistics, an Overview 62

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

4 5 6 7 8 9 10 11

Shoe Size

GPA

Zero Correlation Example

Statistics, an Overview 63

Visual Interpretation

Shared Variance

Variable AVariable B

22

Statistics, an Overview 64

Relationship Strength

Strong Moderate Weak

Statistics, an Overview 65

What Correlations Look Like

1.369(**).986(**)-.013.398(**)Recklessness(2).369(**)1.415(**)-.480(**).947(**)Competence(2).986(**).415(**)1-.054.433(**)Recklessness(1)

-.013-.480(**)-.0541-.485(**)Self Confidence.398(**).947(**).433(**)-.485(**)1Competence(1)

Recklessness(2)

Competence(2)

Recklessness(1)

Self Confidence

Competence(1)

** Correlation is significant at the 0.01 level (2-tailed).* Correlation is significant at the 0.05 level (2-tailed).

Statistics, an Overview 66

Power and Effect Size

23

Statistics, an Overview 67

Effect Size

1. How strong was the treatment?2. How strong is the relationship?

Statistics, an Overview 68

Power

The ability of the statistical procedure to detect the effect

being measured.

Statistics, an Overview 69

Hypotheses

24

Statistics, an Overview 70

What is hypothesis testing?

A set of logical and statistical guidelines used to make

inferential decisions from sample statistics to population

characteristics.

Statistics, an Overview 71

Types of Hypotheses

• Research hypothesis.• Logical hypotheses.

– Null hypothesis (Ho).– Alternative hypothesis (Ha).

• Statistical hypothesis.

Statistics, an Overview 72

Research Hypothesis

Statement in words as to what the investigator expects to find.

Example.

Students who drink caffeine will be able to memorize information faster than students who do not drink caffeine.

25

Statistics, an Overview 73

Logical Hypotheses

Null Hypothesis (Ho).Statement that the treatment does not have the expected effect.

Alternative Hypothesis (Ha).Statement that the treatment had the expected effect.

Statistics, an Overview 74

Characteristics of the Logical Hypotheses

1. They are mutually exclusive.

2. They are mutually exhaustive.

Statistics, an Overview 75

How they fit together

Research hypothesis.

Students who drink caffeine will be able to memorize information faster than students who do not drink caffeine.

26

Statistics, an Overview 76

How They Fit Together #2

• HoStudents who drink caffeine will not be able to memorize information faster than people who do not drink caffeine.

• Non-caffeine and caffeine drinkers are the same.• Non-caffeine drinkers are faster.

• HaStudents who drink caffeine will memorize information faster than those who do not drink caffeine.

Statistics, an Overview 77

Caffeine Example, AGAIN!

Ha: Mcaffeine < Mno caffeine

Ho: Mcaffeine = Mno caffeine

or

Mcaffeine > Mno affeine

Statistics, an Overview 78

Decision Making Criteria

1. We make statistical inferences based on the probability that the results may or may nothave happened by chance.

2. Since we are dealing with sampling error there is always a possibility that data we collect could have happened by chance.

3. Our model for making this decision is founded on the normal distribution.

27

Statistics, an Overview 79

How the Decision Works

95%

CriticalRegion

CriticalRegion

Statistics, an Overview 80

Decision Steps

1. We start by assuming that the Null Hypothesis (Ho) is true.

2. When a statistical result is rare we conclude that it probably did not happen by chance.

3. If we conclude that a result did not happen by chance (e.g. it is rare), we reject Ho.

4. The only option is to conclude that the true state of affairs is represented by Ha.

ERRORS

28

Statistics, an Overview 82

Outcomes of the Statistical Decision

Retain Ho

Reject Ho

Predicted Effect(Ho False)

No PredictedEffect

(Ho True)

Actual Situation

Exp

erim

ente

r’sD

ecis

ion

Correct Decision

Correct Decision

Type IError

Type IIError

Statistics, an Overview 83

Alpha Level

The probability that a statistical test will lead to a Type I error.

CriticalRegion

CriticalRegion

Statistics, an Overview 84

Key Learning Points #1

1. Science is conservative.2. We assume that the research hypothesis is

invalid until the evidence is so strong that we must conclude that it is true.

3. We statistically ‘test’ the assumption that the research hypothesis is not true.

4. If the data are so strong that we believe that they could not have happened by chance, then we reject Ho.

29

Statistics, an Overview 85

Key Learning Points #2

5. Since our decisions are based on probability theory not absolute surety, we can make mistakes.

6. The probability of concluding that the research hypothesis is correct when it isn’t (rejecting Ho when it is true) is represented by alpha (").

7. The probability of failing to find a result when there is one is represented by beta ($).

Statistics, an Overview 86

THE END