60
Fitting probability models to frequency data

Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Embed Size (px)

Citation preview

Page 1: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Fitting probability models to frequency

data

Page 2: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Review - proportions

• Data: discrete nominal variable with two states (“success” and “failure”)

• You can do two things:– Estimate a parameter with confidence interval

– Test a hypothesis

Page 3: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Estimating a proportion

ˆ p =x

n

Page 4: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Confidence interval for a proportion*

′ p =X + 2

n + 4

′ p − Z′ p 1− ′ p ( )

n + 4

⎝ ⎜ ⎜

⎠ ⎟ ⎟≤ p ≤ ′ p + Z

′ p 1− ′ p ( )

n + 4

⎝ ⎜ ⎜

⎠ ⎟ ⎟

where Z = 1.96 for a 95% confidence interval

* The Agresti-Couli method

Page 5: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Hypothesis testing

• Want to know something about a population

• Take a sample from that population• Measure the sample• What would you expect the sample to look like under the null hypothesis?

• Compare the actual sample to this expectation

Page 6: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

weird

not so weird

Page 7: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Sample

Test statistic

Null hypothesis

Null distributioncompare

How unusual is this test statistic?

P < 0.05 P > 0.05

Reject Ho Fail to reject Ho

Page 8: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Binomial test

The binomial test uses data to test whether a population

proportion p matches a null expectation for the proportion.

Page 9: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Test statistic

• For the binomial test, the test statistic is the number of successes

Page 10: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Binomial test

Page 11: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

The binomial distribution

P(x) =n

x

⎝ ⎜

⎠ ⎟px 1− p( )

n−x

Page 12: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Number of correct choices

Frequency

Binomial distribution, n = 20, p = 0.5

P(x) =n

x

⎝ ⎜

⎠ ⎟px 1− p( )

n−x

x

Page 13: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Number of correct choices

Frequency

Binomial distribution, n = 20, p = 0.5

x

Test statistic

Page 14: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

P-value

• P-value - the probability of obtaining the data* if the null hypothesis were true

*as great or greater difference from the null hypothesis

Page 15: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

P-value

• Add up the probabilities from the null distribution

• Start at the test statistic, and go towards the tail

• Multiply by 2 = two tailed test

Page 16: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Number of correct choices

Frequency

Binomial distribution, n = 20, p = 0.5

x

P = 2*(Pr[16]+Pr[17]+Pr[18]

+Pr[19]+Pr[20])

Page 17: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Sample

Test statistic

Null hypothesis

Null distributioncompare

How unusual is this test statistic?

P < 0.05 P > 0.05

Reject Ho Fail to reject Ho

Page 18: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

N =20, p0 =0.5

P = 2 *(Pr[16] + Pr[17] + Pr[19] + Pr[19] + Pr[20])

= 2

20

16

⎝ ⎜

⎠ ⎟ 0.5( )

160.5( )

4+

20

17

⎝ ⎜

⎠ ⎟ 0.5( )

170.5( )

3+

20

18

⎝ ⎜

⎠ ⎟ 0.5( )

180.5( )

2

+20

19

⎝ ⎜

⎠ ⎟ 0.5( )

190.5( )

1+

20

20

⎝ ⎜

⎠ ⎟ 0.5( )

200.5( )

0

⎜ ⎜ ⎜ ⎜ ⎜

⎟ ⎟ ⎟ ⎟ ⎟

= 0.0026

This is a pain….

Page 19: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Calculating P-values

• By hand• Use computer software like jmp, excel

• Use tables

Page 20: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Sample

Test statistic

Null hypothesis

Null distributioncompare

How unusual is this test statistic?

P < 0.05 P > 0.05

Reject Ho Fail to reject Ho

Page 21: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Discrete distribution

• A probability distribution describing a discrete numerical random variable

Page 22: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Discrete distribution

• A probability distribution describing a discrete numerical random variable

• Examples: – Number of heads from 10 flips of a coin

– Number of flowers in a square meter – Number of disease outbreaks in a year

Page 23: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

2 Goodness-of-fit test

Compares counts to a discrete probability

distribution

Page 24: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

H0: The data come from a particular discrete probability

distribution.

HA: The data do not come from that distribution.

Hypotheses for 2 test

Page 25: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Test statistic for 2 test

2 =Observedi − Expectedi( )

2

Expectediall classes

Page 26: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

The day of birth for 350 births in the USin 1999.

Day Sun. Mon. Tues. Wed. Thu. Fri. Sat.Num.of

Births33 41 63 63 47 56 47

Page 27: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do
Page 28: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

H0: The probability of a birth occurring on any given day of

the week is 1/7.

HA: The probability of a birth occurring on any given day of

the week is NOT 1/7.

Hypotheses for day of birth example

Page 29: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Day Sun.

Mon. Tues.

Wed Thu.

Fri.

Sat.

Total

Obs. 33 41 63 63 47 56 47 350

Exp. 50 50 50 50 50 50 50 350

Page 30: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

The calculation for Sunday

Observed − Expected( )2

Expected=

33− 50( )2

50=

289

50

Page 31: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

2 =Observedi − Expectedi( )

2

Expectediall classes

=289

50+

81

50+

169

50+

169

50+

9

50+

36

50+

9

50

= 15.24

Page 32: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

The sampling distribution of 2 by

simulation

Frequency

2

Page 33: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Sampling distribution of 2 by the 2 distribution

Page 34: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Degrees of freedom

• The number of degrees of freedom specifies which of a family of distributions to use as the sampling distribution

Page 35: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Degrees of freedom for 2 test

df = Number of categories - 1 - (Number of parameters estimated from the data)

Page 36: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Degrees of freedom for day of birth

df = 7 - 1 - 0 = 6

Page 37: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Finding the P-value

Page 38: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Critical value

The value of the test statistic where P = .

Page 39: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

12.59

Page 40: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do
Page 41: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do
Page 42: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

P<0.05, so we can reject the null hypothesis

Babies in the US are not born randomly with respect to the day of the week.

Page 43: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do
Page 44: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Assumptions of 2 test

• No more than 20% of categories have Expected<5

• No category with Expected 1

Page 45: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

2 test as approximation of binomial test

• If the number of data points is large, then a 2 goodness-of-fit test can be used in place of a binomial test.

• See text for an example.

Page 46: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

The Poisson distribution

• Another discrete probability distribution

• Describes the number of successes in blocks of time or space, when successes happen independently of each other and occur with equal probability at every point in time or space

Page 47: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do
Page 48: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Poisson distribution

Pr X[ ] =e−μ μ X

X!

Page 49: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Example: Number of goals per side

in World Cup Soccer

Q: Is the outcome of a soccer game (at this level) random?

In other words, is the number of goals per team distributed as expected by pure chance?

QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.

Page 50: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

World Cup 2002 scores

Page 51: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Number of goals for a team

(World Cup 2002)Number of goals Frequency

0 371 472 273 134 25 16 07 08 1

Page 52: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

What’s the mean, ?

x =37 0( ) + 47 1( ) + 27 2( ) +13 3( ) + 2 4( ) +1 5( ) +1 8( )

128

=161

128

=1.26

Page 53: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Poisson with = 1.26X Pr[X]

0 0.284

1 0.357

2 0.225

3 0.095

4 0.030

5 0.008

6 0.002

7 0

8 0

Page 54: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Finding the ExpectedX Pr[X] Expect

ed

0 0.284 36.3

1 0.357 45.7

2 0.225 28.8

3 0.095 12.1

4 0.030 3.8

5 0.008 1.0

6 0.002 0.2

7 0 0.04

8 0 0.007

} Too small!

Page 55: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Calculating 2

X Expected

Observed

0 36.3 37 0.013

1 45.7 47 0.037

2 28.8 27 0.113

3 12.1 13 0.067

4 5.0 4 0.200

Observedi − Expectedi( )2

Expectedi

2 =Observedi − Expectedi( )

2

Expectediall classes

∑ = 0.429

Page 56: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Degrees of freedom for poisson

df = Number of categories - 1 - (Number of parameters estimated from the data)

Page 57: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Degrees of freedom for poisson

df = Number of categories - 1 - (Number of parameters estimated from the data)

Estimated one parameter,

Page 58: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Degrees of freedom for poisson

df = Number of categories - 1 - (Number of parameters estimated from the data)

= 5 - 1 - 1 = 3

Page 59: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Critical valueTable 8.2. An excerpt from the 2 critical value tabl ,e fromAppendi x 3.1

dfX 0.999 0.995 0.99 0.975 0.95 0.05 0.025 0.01 0.005 0.001

1 1.6E-6

3.9E-5 0.00016 0.00098 0.00393 3.84 5.02 6.63 7.88 10.83

2 0 0.01 0.02 0.05 0.1 5.99 7.38 9.21 10.6 13.823 0.02 0.07 0.11 0.22 0.35 7.81 9.35 11.34 12.84 16.274 0.09 0.21 0.3 0.48 0.71 9.49 11.14 13.28 14.86 18.475 0.21 0.41 0.55 0.83 1.15 11.07 12.83 15.09 16.75 20.526 0.38 0.68 0.87 1.24 1.64 12.59 14.45 16.81 18.55 22.467 0.6 0.99 1.24 1.69 2.17 14.07 16.01 18.48 20.28 24.32

Page 60: Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do

Comparing 2 to the critical value

2 = 0.429

χ 32 = 7.81

0.49 < 7.81

So we cannot reject the null hypothesis.

There is no evidence that the score of a World Cup Soccer game is not Poisson distributed.