Hadley Wickham Stat310 Fin Saturday, 24 April 2010

Page 1: 25 fin

Hadley Wickham

Stat310

Fin

Saturday, 24 April 2010

Page 2: 25 fin

To those of you who bought your textbooks from my amazon link.

To the textbook publishers who generously sent me free copies of books.

To Kensey for suggesting Chick-fil-A.

Thank you!

Page 3: 25 fin

1. Eat!

2. Final & help sessions

3. Finish off hypothesis testing

4. Other statistics opportunities

5. Feedback (TA & me)

Page 4: 25 fin

Final

Page 5: 25 fin

Final

Take home. Two hours long. Three (double-sided) pages of notes.

Available Wednesday April 28, 9am. Due Wednesday May 5, 5pm, under my door.

Ten small questions of approximately equal weight. Similar to questions from the homework/book.

Page 6: 25 fin

Common themes

Probability of an event.

Independence & conditioning.

Distributions: pdf/pmf, cdf, mgf, named.

Transformations.

Sampling distribution of mean and variance.

Estimation and testing.

Philosophy of grading.

Page 7: 25 fin

Help sessions

Mon, Tue, Wed, Thurs, Fri, Sat, Sun?

Morning or afternoon?

One-on-one help, plus brief revision of topics of particular interest. Suggest and vote at http://goo.gl/mod/joIx

Page 8: 25 fin

Honour code

Remember to pledge your exam, and note the time at which you started and ended.

You may refer only to your note sheets, not to the textbook, old homework, etc.

Page 9: 25 fin

Hypothesis testing

Page 10: 25 fin

Course grades

Assume I took a random sample of 20 students from each year, and that course grades are normally distributed with variance 80.

What is the distribution of the difference of the two group means?
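One way to work this out, as a sketch: write X̄ for the 2009 sample mean and Ȳ for the 2010 sample mean (labels chosen here to match the μx, μy on later slides), and assume the two samples are independent, which the slide does not state explicitly.

```latex
\begin{align*}
\bar{X} &\sim N\!\left(\mu_x,\ \tfrac{80}{20}\right), \qquad
\bar{Y} \sim N\!\left(\mu_y,\ \tfrac{80}{20}\right) \\
\bar{Y} - \bar{X} &\sim N\!\left(\mu_y - \mu_x,\ \tfrac{80}{20} + \tfrac{80}{20}\right)
  = N\!\left(\mu_y - \mu_x,\ 8\right)
\end{align*}
```

Under these assumptions the difference has standard deviation √8 ≈ 2.83, and mean 0 if the two years have the same population mean.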

Page 11: 25 fin

Your turn

The average grade from 2009 was 85 and the average grade from 2010 was 90.

What is the p-value? (The probability that you’d see a difference this large or larger if there really was no difference in the population means)
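A minimal sketch of the arithmetic, assuming 20 students per year, variance 80, independent samples, and a two-sided reading of "this large or larger". The variable names are illustrative, not from the course.

```python
import math

# Assumptions taken from the course-grades slide: n = 20 per year, variance 80.
n = 20
variance = 80.0
mean_2009 = 85.0
mean_2010 = 90.0

# Standard error of the difference of two independent sample means.
se_diff = math.sqrt(variance / n + variance / n)

# z-score of the observed difference under the null of equal means.
z = (mean_2010 - mean_2009) / se_diff

# Two-sided p-value: P(|Z| >= z) = erfc(z / sqrt(2)) for a standard normal Z.
p_value = math.erfc(z / math.sqrt(2))

print(f"se = {se_diff:.3f}, z = {z:.3f}, p-value = {p_value:.3f}")
```

With these numbers the z-score is about 1.77 and the two-sided p-value is about 0.077.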

Page 12: 25 fin

1. Write down H0 and Ha (positions of defence and prosecution)

2. Figure out a good test statistic (what numeric summary?)

3. Work out the null distribution (distribution of innocents)

4. Calculate the p-value by comparing the actual value to the null distribution (what proportion of true innocents look more guilty than the suspect)

5. Reject H0 if the p-value is smaller than the cutoff
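The five steps can be sketched by simulation for the course-grades example. This is an illustration only: the common mean of 80 under the null, the seed, and the number of draws are arbitrary choices made here, and the function name is hypothetical.

```python
import random
import statistics

random.seed(310)  # arbitrary seed so the run is repeatable

N = 20              # students sampled per year (from the slides)
SD = 80 ** 0.5      # grades assumed normal with variance 80
OBSERVED = 90 - 85  # step 2: test statistic = difference in sample means

def null_difference():
    """Step 3: one draw from the null distribution (both years share a mean)."""
    x = [random.gauss(80, SD) for _ in range(N)]
    y = [random.gauss(80, SD) for _ in range(N)]
    return statistics.fmean(y) - statistics.fmean(x)

# Step 4: proportion of "innocent" differences at least as extreme as observed.
draws = [null_difference() for _ in range(10_000)]
p_value = sum(abs(d) >= abs(OBSERVED) for d in draws) / len(draws)

# Step 5: compare with the chosen cutoff.
ALPHA = 0.05
reject = p_value < ALPHA
print(f"simulated p-value = {p_value:.3f}, reject H0: {reject}")
```

The simulated p-value should land near the exact two-sided value of roughly 0.08, so H0 is not rejected at a 0.05 cutoff.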

Page 13: 25 fin

               Say is guilty      Say is innocent
Is guilty      Correct            False acquittal
Is innocent    False conviction   Correct

Page 14: 25 fin

Your turn

Which type of error is more expensive/more costly/worse in the criminal justice system?

Page 15: 25 fin

            Reject H0       Accept H0
H0 false    Correct         Type II error
H0 true     Type I error    Correct

Page 16: 25 fin

Rates

For a given test,

P(false conviction) = α = significance level

P(false acquittal) = β, and 1 − β = power

What do you think happens to β if you try to make α smaller?

Page 17: 25 fin

α↑ β↓

α↓ β↑
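To see the trade-off in numbers, here is a rough sketch for a two-sided z-test on the course-grades example. The true difference of 5 points is a hypothetical value chosen for illustration, and the function name is not from the course.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
SE_DIFF = (80 / 20 + 80 / 20) ** 0.5  # standard error from the course-grades example
TRUE_DIFF = 5.0                        # hypothetical true difference in means

def false_acquittal_rate(alpha):
    """P(fail to reject H0) for a two-sided z-test when the true difference is TRUE_DIFF."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = TRUE_DIFF / SE_DIFF
    # Power: probability the z-score falls beyond either critical value.
    power = (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)
    return 1 - power

for alpha in (0.10, 0.05, 0.01):
    rate = false_acquittal_rate(alpha)
    print(f"false conviction rate = {alpha:.2f}  false acquittal rate = {rate:.3f}")
```

As the false conviction rate is pushed down from 0.10 to 0.01, the false acquittal rate climbs, which is the pattern the arrows describe.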

Page 18: 25 fin

Cut off

Choose cut-off based on rate of false convictions.

If you want a 5% rate of false convictions, reject H0 if the p-value is less than 0.05. (This is the industry-standard rate.)

Can work out power.
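The p-value cutoff can also be expressed as a cutoff on the difference in means itself. A small sketch, assuming the course-grades setup (20 students per year, variance 80) and a two-sided test:

```python
from statistics import NormalDist

ALPHA = 0.05                           # chosen rate of false convictions
SE_DIFF = (80 / 20 + 80 / 20) ** 0.5   # standard error of the difference in means

# Critical z-value for a two-sided test, then the matching cutoff in grade points.
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)
cutoff = z_crit * SE_DIFF

observed = 90 - 85                     # difference in average grades from the example
reject = abs(observed) > cutoff
print(f"reject H0 when |difference| > {cutoff:.2f}; observed {observed}, reject: {reject}")
```

Under these assumptions the cutoff is about 5.54 grade points, so an observed difference of 5 falls just short of rejection.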

Page 19: 25 fin

[Figure: points marked x and y plotted against an index from 20 to 100; vertical axis from 76 to 90. μx=80, μy=85]

Page 20: 25 fin

[Figure: Difference (vertical axis, −2 to 10) plotted against an index from 20 to 100. μx=80, μy=85]

Page 21: 25 fin

[Figure: |Difference| (vertical axis, 0 to 10) plotted against an index from 20 to 100. μx=80, μy=85]

Page 22: 25 fin

[Figure: z-score (vertical axis, 0.0 to 3.5) plotted against an index from 20 to 100. μx=80, μy=85]

Page 23: 25 fin

[Figure: p-value (vertical axis, 0.0 to 0.8) plotted against an index from 20 to 100. μx=80, μy=85]

Correctly reject null 39% of the time

Page 24: 25 fin

[Figure: points marked x and y plotted against an index from 20 to 100; vertical axis from 76 to 84. μx=μy=80]

Page 25: 25 fin

[Figure: difference (vertical axis, −5 to 5) plotted against an index from 20 to 100. μx=μy=80]

Page 26: 25 fin

[Figure: z-score (vertical axis, 0.0 to 3.0) plotted against an index from 20 to 100. μx=μy=80]

Page 27: 25 fin

[Figure: |difference| (vertical axis, 0 to 8) plotted against an index from 20 to 100. μx=μy=80]

Page 28: 25 fin

[Figure: p-value (vertical axis, 0.0 to 0.8) plotted against an index from 20 to 100. μx=μy=80]

Incorrectly reject null 6% of the time
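The code behind these simulation slides is not shown, so the following is only an assumed reconstruction: repeatedly draw two samples of 20 grades with variance 80, run a two-sided z-test at the 0.05 cutoff, and count rejections. The seed, replication count, and function name are choices made here.

```python
import random
from statistics import NormalDist, fmean

random.seed(2010)  # arbitrary seed so the run is repeatable

N = 20
SD = 80 ** 0.5
SE_DIFF = (80 / N + 80 / N) ** 0.5
ALPHA = 0.05
STD_NORMAL = NormalDist()

def rejection_rate(mu_x, mu_y, reps=4000):
    """Fraction of simulated studies in which H0 (equal means) is rejected."""
    rejections = 0
    for _ in range(reps):
        x_bar = fmean([random.gauss(mu_x, SD) for _ in range(N)])
        y_bar = fmean([random.gauss(mu_y, SD) for _ in range(N)])
        z = abs(y_bar - x_bar) / SE_DIFF
        p_value = 2 * (1 - STD_NORMAL.cdf(z))
        if p_value < ALPHA:
            rejections += 1
    return rejections / reps

power = rejection_rate(80, 85)      # null is false: rejections are correct
type_one = rejection_rate(80, 80)   # null is true: rejections are errors
print(f"mu_x=80, mu_y=85: reject {power:.1%} of the time")
print(f"mu_x=mu_y=80:     reject {type_one:.1%} of the time")
```

With many replications the two rates should settle near 42% and 5%; a small run of about 100 replications could plausibly give figures like the 39% and 6% quoted on the slides.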

Page 29: 25 fin

Your turn

The average grade from 2009 was 85 and the average grade from 2010 was 90. Would you reject the null hypothesis that the average grade was the same?

Page 30: 25 fin

Connection to confidence intervals

If you construct a 90% confidence interval, and it doesn’t include the parameter value under the null, then the p-value must be < 1 - 0.9 = 0.1.

If the p-value is 0.08, then a 92% or greater confidence interval would include the null parameter, and a smaller confidence interval would not.
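A short sketch of this connection using the course-grades numbers (20 students per year, variance 80, averages of 85 and 90). The two-sided z-interval and the helper names are assumptions made for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
SE_DIFF = (80 / 20 + 80 / 20) ** 0.5  # standard error of the difference in means
OBSERVED = 90 - 85                     # observed difference in average grades
NULL_VALUE = 0.0                       # difference under the null hypothesis

def confidence_interval(level):
    """Two-sided z-interval for the difference in means at the given level."""
    z = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    return OBSERVED - z * SE_DIFF, OBSERVED + z * SE_DIFF

def contains_null(level):
    low, high = confidence_interval(level)
    return low <= NULL_VALUE <= high

p_value = 2 * (1 - STD_NORMAL.cdf(OBSERVED / SE_DIFF))

for level in (0.90, 0.95):
    low, high = confidence_interval(level)
    print(f"{level:.0%} interval: ({low:.2f}, {high:.2f}), contains 0: {contains_null(level)}")
print(f"p-value = {p_value:.3f}; intervals above the {1 - p_value:.1%} level contain 0")
```

Here the 90% interval excludes 0 and the 95% interval includes it, matching a p-value between 0.05 and 0.10.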

Page 31: 25 fin

Statistics

Page 32: 25 fin

Majoring

3 required stat classes (Stat310, Stat405, Stat410) + 6 stat electives + calc, linear algebra, computing + design project

Makes for a great double major. Particularly useful if you’re thinking about grad school. (Appealing to employers too)

http://statistics.rice.edu/ShowInterior.aspx?id=58

Page 33: 25 fin

Minoring

From next year

Three required:

Track A: stat310, stat405, stat400/410

Track B: stat100, stat280, stat385

Three elective:

300 level+, one outside stat if it has a strong statistical component

Page 34: 25 fin

Stat410

Introduction to linear models

Powerful and general statistical tool.

Theory and data.

Offered in Fall.

Page 35: 25 fin

Stat405

Project-based introduction to data analysis. Lots of computing and hardly any maths.

http://had.co.nz/stat405

Offered in Fall, and next year in Spring.

Page 36: 25 fin

Electives

SOCI 436 (Houston area survey), 313 (demography)

ECON 340/440 (game theory), 400 (econometrics), 475 (optimisation), 477 (math of economics), 479 (modelling)

STAT 385, 431 (more theory), 420 (process control), 421 (time series), 422 (Bayesian data analysis), 423 (bioinformatics), 453 (biostatistics), 485 (environmental)

Page 37: 25 fin

Feedback

One form for me.

One form for Xin Zhao, who most of you never met but who was the TA in charge of your grading.

No form for Garrett.