Hadley Wickham
Stat310Fin
Saturday, 24 April 2010
To those of you who bought your textbooks from my Amazon link.
To the textbook publishers who generously sent me free copies of books.
To Kensey for suggesting Chick-fil-A.
Thank you!
1. Eat!
2. Final & help sessions
3. Finish off hypothesis testing
4. Other statistics opportunities
5. Feedback (TA & me)
Final
Take home. Two hours long. Three (double-sided) pages of notes.
Available Wednesday April 28, 9am. Due Wednesday May 5, 5pm, under my door.
Ten small questions of approximately equal weight. Similar to questions from the homework/book.
Common themes
Probability of an event.
Independence & conditioning.
Distributions: pdf/pmf, cdf, mgf, named.
Transformations.
Sampling distribution of mean and variance.
Estimation and testing.
Philosophy of grading
Help sessions
Mon, Tue, Wed, Thurs, Fri, Sat, Sun?
Morning or afternoon?
One-on-one help, plus brief revision of topics of particular interest. Suggest and vote at http://goo.gl/mod/joIx
Honour code
Remember to pledge your exam, and note the time at which you started and ended.
You may refer only to your note sheets, not to the textbook, old homeworks, etc.
Hypothesis testing
Course grades
Assume I took a random sample of 20 students from each of two years, and that course grades are normally distributed with variance 80.
What is the distribution of the difference of the two group means?
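One way to work this out, sketched under the assumption that the two samples are independent. The symbols \mu_1, \mu_2 (population means for the two years) and \bar{X}_1, \bar{X}_2 (sample means) are notation introduced here, not taken from the slides.

```latex
\bar{X}_1 \sim N\!\left(\mu_1,\ \tfrac{80}{20}\right), \qquad
\bar{X}_2 \sim N\!\left(\mu_2,\ \tfrac{80}{20}\right)

\bar{X}_2 - \bar{X}_1 \sim N\!\left(\mu_2 - \mu_1,\ \tfrac{80}{20} + \tfrac{80}{20}\right)
  = N\!\left(\mu_2 - \mu_1,\ 8\right)
```

The variances add because the samples are independent, so the difference has standard deviation \sqrt{8} \approx 2.83.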
Your turn
The average grade from 2009 was 85 and the average grade from 2010 was 90.
What is the p-value? (The probability that you’d see a difference this large or larger if there really was no difference in the population means.)
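A minimal sketch of the calculation, assuming a two-sided z-test with known variance 80 and 20 students per year. The two-sided choice and the variable names are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

n = 20                                  # students sampled per year
variance = 80.0                         # assumed known population variance
mean_2009, mean_2010 = 85.0, 90.0       # observed group means

# Standard error of the difference of two independent sample means.
se = sqrt(variance / n + variance / n)

# Test statistic: observed difference measured in standard errors.
z = (mean_2010 - mean_2009) / se

# Two-sided p-value: probability of a difference at least this extreme
# in either direction when the population means are equal.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

A one-sided test (only asking whether 2010 was higher) would halve this p-value.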
1. Write down H0 and Ha (positions of defence and prosecution)
2. Figure out a good test statistic (what numeric summary?)
3. Work out the null distribution (distribution of innocents)
4. Calculate the p-value by comparing the actual value to the null distribution (what proportion of true innocents look more guilty than the suspect?)
5. Reject H0 if the p-value is smaller than the cut-off
              Say is guilty       Say is innocent
Is guilty     Correct             False acquittal
Is innocent   False conviction    Correct
Your turn
Which type of error is more expensive/more costly/worse in the criminal justice system?
            Reject H0       Accept H0
H0 false    Correct         Type II error
H0 true     Type I error    Correct
Rates
For a given test,
P(false conviction) = α = significance level
P(false acquittal) = β
1 - β = power
What do you think happens to β if you try to make α smaller?
α↑ β↓
α↓ β↑
Cut off
Choose cut-off based on rate of false convictions.
If you want a 5% rate of false convictions, reject H0 if the p-value is less than 0.05. (This is the industry standard rate.)
Given the cut-off, you can work out the power.
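As a rough sketch of how the power follows from the cut-off, the function below handles a two-sided z-test on the difference of two group means (20 per group, variance 80). The true difference of 5 is an illustrative assumption, and `power_two_sided` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sided(true_diff, n=20, variance=80.0, alpha=0.05):
    """Probability of rejecting H0 (no difference) when the true
    difference in population means is `true_diff`."""
    std_normal = NormalDist()
    se = sqrt(2 * variance / n)                   # sd of the difference of means
    z_crit = std_normal.inv_cdf(1 - alpha / 2)    # reject when |z| > z_crit
    shift = true_diff / se                        # mean of z under the alternative
    # Chance that z lands inside the non-rejection region (a false acquittal).
    false_acquittal_rate = (std_normal.cdf(z_crit - shift)
                            - std_normal.cdf(-z_crit - shift))
    return 1 - false_acquittal_rate

# A stricter cut-off (smaller alpha) lowers the power for the same true difference.
for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(power_two_sided(5.0, alpha=alpha), 3))
```

With a true difference of 0 the function returns alpha itself, which is the false conviction rate.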
[Figure: points labelled x and y. Axis ticks 76 to 90 and 20 to 100. μx=80, μy=85]
[Figure: Difference. Axis ticks −2 to 10 and 20 to 100. μx=80, μy=85]
[Figure: |Difference|. Axis ticks 0 to 10 and 20 to 100. μx=80, μy=85]
[Figure: z-score. Axis ticks 0.0 to 3.5 and 20 to 100. μx=80, μy=85]
[Figure: p-value. Axis ticks 0.0 to 0.8 and 20 to 100. μx=80, μy=85]
Correctly reject null 39% of the time
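A rejection rate like this can be estimated by simulation. The sketch below is not the code behind the slides. It assumes a two-sided z-test at the 5% level, 20 students per group and variance 80, and `rejection_rate` is a hypothetical name.

```python
import random
from math import sqrt
from statistics import NormalDist

def rejection_rate(mu_x, mu_y, n=20, variance=80.0, alpha=0.05,
                   reps=5_000, seed=1):
    """Estimate how often a two-sided z-test rejects H0: mu_x == mu_y."""
    rng = random.Random(seed)
    sd = sqrt(variance)
    se = sqrt(2 * variance / n)        # sd of the difference of two means
    std_normal = NormalDist()
    rejections = 0
    for _ in range(reps):
        # Draw one sample per group and compute the two group means.
        xbar = sum(rng.gauss(mu_x, sd) for _ in range(n)) / n
        ybar = sum(rng.gauss(mu_y, sd) for _ in range(n)) / n
        z = abs(ybar - xbar) / se
        p_value = 2 * (1 - std_normal.cdf(z))
        if p_value < alpha:
            rejections += 1
    return rejections / reps

print("means differ (80 vs 85):", rejection_rate(80, 85))
print("means equal  (80 vs 80):", rejection_rate(80, 80))
```

With means 80 and 85 the estimate approximates the power. With equal means it approximates the false rejection rate, which should sit near the chosen 5%. Small runs, such as 100 replications, can easily land a few percentage points away from the long-run values.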
[Figure: points labelled x and y. Axis ticks 76 to 84 and 20 to 100. μx=μy=80]
[Figure: difference. Axis ticks −5 to 5 and 20 to 100. μx=μy=80]
[Figure: z-score. Axis ticks 0.0 to 3.0 and 20 to 100. μx=μy=80]
[Figure: |difference|. Axis ticks 0 to 8 and 20 to 100. μx=μy=80]
[Figure: p-value. Axis ticks 0.0 to 0.8 and 20 to 100. μx=μy=80]
Incorrectly reject null 6% of the time
Your turn
The average grade from 2009 was 85 and the average grade from 2010 was 90. Would you reject the null hypothesis that the average grade was the same?
Connection to confidence intervals
If you construct a 90% confidence interval, and it doesn’t include the value of the parameter under the null, then the p-value must be < 1 - 0.9 = 0.1.
If the p-value is 0.08, then a 92% or greater confidence interval would include the null parameter, and an interval with a lower confidence level would not.
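A small sketch of this connection, using the course-grades numbers (observed difference 5, standard error √8 from 20 students per year with variance 80). The helper `diff_interval` and the choice of a two-sided z-interval are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def diff_interval(diff, se, level):
    """Two-sided z confidence interval for a difference of means."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se

diff, se = 5.0, sqrt(8.0)
p_value = 2 * (1 - NormalDist().cdf(diff / se))

# Intervals with confidence level below 1 - p exclude the null value 0;
# intervals with a higher level include it.
for level in (0.90, 0.95, 1 - p_value):
    lower, upper = diff_interval(diff, se, level)
    contains_null = lower <= 0 <= upper
    print(f"level {level:.3f}: lower {lower:.3f}, upper {upper:.3f}, "
          f"contains 0: {contains_null}")
```

At level 1 - p the lower endpoint sits at 0 up to rounding error, which is the boundary case described above.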
Statistics
Majoring
3 required stat classes (Stat310, Stat405, Stat410) + 6 stat electives + calc, linear algebra, computing + design project
Makes for a great double major. Particularly useful if you’re thinking about grad school. (Appealing to employers too)
http://statistics.rice.edu/ShowInterior.aspx?id=58
Minoring
From next year
Three required:
Track A: stat310, stat405, stat400/410
Track B: stat100, stat280, stat385
Three elective: 300 level+; one may be outside stat if it has a strong statistical component
Stat410
Introduction to linear models
Powerful and general statistical tool.
Theory and data.
Offered in Fall.
Stat405
Project-based introduction to data analysis. Lots of computing and hardly any maths.
http://had.co.nz/stat405
Offered in Fall, and next year in Spring.
Electives
SOCI 436 (Houston area survey), 313 (demography)
ECON 340/440 (game theory), 400 (econometrics), 475 (optimisation), 477 (math of economics), 479 (modelling)
STAT 385, 431 (more theory), 420 (process control), 421 (time series), 422 (Bayesian data analysis), 423 (bioinformatics), 453 (biostatistics), 485 (environmental)
Feedback
One form for me.
One form for Xin Zhao, whom most of you never met but who was the TA in charge of your grading.
No form for Garrett.