Test Your Hypothesis and Make the Decision
Intro to Hypothesis Testing
David Huang
Intern Data Scientist
Yoctol Info.
A random variable 𝑋 is a real-valued function from the sample space Ω to the real line ℝ.

Example: Random Variable 𝑋 = Yoctol’s stock price
• Sample space Ω: all internal and external factors of Yoctol Info.
• ℙ: probability measure on the sample space
• Real line ℝ: the possible prices, e.g. $100, $150
• ℙ_X: induced probability measure of 𝑋

The induced probability measure of 𝑋 is defined by ℙ_X(X ∈ A) = ℙ(X⁻¹(A)).
It is hard to define ℙ_X directly, so we use two alternative functions to describe the probability distribution of a random variable.

Probability Density Function
The pdf of 𝑋 is a function 𝑓 such that ℙ_X(a < X ≤ b) = ∫_a^b f(x) dx.

Cumulative Distribution Function
The cdf of 𝑋 is a function 𝐹 such that F(x) = ℙ_X(−∞ < X ≤ x).
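The pdf and the cdf describe the same distribution, so integrating the pdf over (a, b] must equal the difference of cdf values. A minimal numerical check, assuming (purely for illustration, not from the slides) an exponential distribution with rate λ = 2:

```python
import math

# Hypothetical example: exponential distribution with rate lam,
# pdf f(x) = lam * exp(-lam * x), cdf F(x) = 1 - exp(-lam * x).
lam = 2.0

def pdf(x):
    return lam * math.exp(-lam * x)

def cdf(x):
    return 1.0 - math.exp(-lam * x)

def integrate(f, a, b, n=100_000):
    # Simple trapezoidal-rule approximation of the integral of f over [a, b].
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

a, b = 0.3, 1.2
# P_X(a < X <= b) computed two ways: integrating the pdf, differencing the cdf.
prob_from_pdf = integrate(pdf, a, b)
prob_from_cdf = cdf(b) - cdf(a)
print(prob_from_pdf, prob_from_cdf)  # the two values agree closely
```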
Characterize a distribution by its mean and variance.

Mean / Expectation (centrality measure)
The expectation of 𝑋 is defined by 𝔼[X] = ∫ X dℙ_X = ∫_{−∞}^{∞} x f(x) dx.

Variance (variation measure)
The variance of 𝑋 is defined by Var(X) = 𝔼[(X − 𝔼[X])²].
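Both definitions can be checked numerically. A small sketch, again assuming (as an illustration, not from the slides) an exponential distribution with rate λ, for which 𝔼[X] = 1/λ and Var(X) = 1/λ² in closed form:

```python
import math

# Illustrative assumption: exponential distribution with rate lam.
lam = 2.0

def pdf(x):
    return lam * math.exp(-lam * x)

def integrate(f, a, b, n=200_000):
    # Trapezoidal-rule approximation of the integral of f over [a, b].
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# E[X] = integral of x f(x) dx; Var(X) = E[(X - E[X])^2].
upper = 50.0  # the exponential tail beyond this point is negligible
mean = integrate(lambda x: x * pdf(x), 0.0, upper)
var = integrate(lambda x: (x - mean) ** 2 * pdf(x), 0.0, upper)
print(mean, var)  # close to 1/lam = 0.5 and 1/lam**2 = 0.25
```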
Parametric inferential model – estimate and draw inferences about the parameter 𝜃 ∈ Θ of the assumed distribution.

Population
X₁ = x₁, X₂ = x₂, X₃ = x₃, …, Xₙ = xₙ

Random Sample:
X₁, …, Xₙ are independent and identically distributed from a pdf f(x | 𝜃), where 𝜃 ∈ Θ is the parameter.
3 types of traditional statistical inference problems (1)
Survey sampling – independent and identically distributed observations

Population
X₁ = 1, X₂ = 0, X₃ = 1, …, Xₙ = 1
Xᵢ = 1 if agree, Xᵢ = 0 if disagree

Random Sample:
X₁, …, Xₙ are independent and identically distributed.
(Here, we assume Xᵢ ~ Bernoulli(p).)
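The survey-sampling setup above can be sketched in a few lines: draw an i.i.d. Bernoulli(p) sample and estimate p by the sample proportion. The true p and sample size below are arbitrary illustrative values, not from the slides.

```python
import random

# Simulate an i.i.d. Bernoulli(p) survey sample and estimate p.
random.seed(42)
p_true = 0.6
n = 10_000

# X_i = 1 ("agree") with probability p_true, else 0 ("disagree").
sample = [1 if random.random() < p_true else 0 for _ in range(n)]
p_hat = sum(sample) / n
print(p_hat)  # close to p_true for large n
```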
3 types of traditional statistical inference problems (2)
Multivariate analysis – relationships among variables

Population
A random sample of p variables over n individuals, e.g.
X₁₁ = 5, X₁₂ = 1, X₁₃ = 2, …, X₁ₙ = 5
X₂₁ = 3, X₂₂ = 3, X₂₃ = 1, …, X₂ₙ = 5
…
Xₚ₁ = 2, Xₚ₂ = 4, Xₚ₃ = 4, …, Xₚₙ = 5
(individuals belong to Group 1, Group 2, …, Group K)

Random Sample:
Xᵢ₁, …, Xᵢₙ are independent and identically distributed for each variable i = 1, 2, …, p.

Typical goals:
• Reduce the dimension of all variables
• Extract common factors of all variables
• Understand the effect of the group
• Cluster individuals into several groups
3 types of traditional statistical inference problems (3)
Sequential decision and time series – order is important

Sample space, observed over time:
Time 1: X₁₁ = 5, X₂₁ = 3, X₃₁ = 5
Time 2: X₁₂ = 4, X₂₂ = 2, X₃₂ = 3
…
Time t: X₁ₜ = 2, X₂ₜ = 6, X₃ₜ = 2

• Serial correlation within each series: cor(X₁ₜ, X₁,ₜ₋ᵢ), cor(X₂ₜ, X₂,ₜ₋ᵢ), cor(X₃ₜ, X₃,ₜ₋ᵢ)
• Correlation across variables at the same time point
Hypothesis testing is a way to test our “knowledge” against empirical data sets.

Parameter Space Θ
Θ = {(p_old, p_new): 0 < p_old < 1, 0 < p_new < 1}

Null Hypothesis
The opposite of the alternative hypothesis.
H₀: p_old ≥ p_new
Θ₀ = {(p_old, p_new): 0 < p_old, p_new < 1, p_old ≥ p_new}

Alternative Hypothesis
The hypothesis we want to test.
H₁: p_old < p_new
Θ₁ = {(p_old, p_new): 0 < p_old, p_new < 1, p_old < p_new}
Two important things for a statistical hypothesis test –
“Test Statistic” and “Rejection Region”

Among all possible samples:
• Reject H₀: p_old ≥ p_new
Our sample gives us strong evidence that H₀ is false.
→ Our test statistic lies in the RR (rejection region).
• Don’t reject H₀: p_old ≥ p_new
Our sample does not give strong enough evidence that H₀ is false.
→ Our test statistic does not lie in the RR (rejection region).

Questions –
1. How do we determine the test statistic?
2. How do we determine the rejection region?
We commonly use the likelihood ratio as the test statistic.

Likelihood Ratio Statistic
The likelihood under H₀ divided by the likelihood under H₁.

[Figure: probability densities of the test statistic when H₀ is true vs. when H₀ is false, with the observed data and the rejection boundary marked.]
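A minimal sketch of a likelihood ratio statistic for Bernoulli data. For simplicity it compares two simple hypotheses p₀ (under H₀) and p₁ (under H₁); the composite hypotheses on the slides would use maximized likelihoods instead, and the values of p₀, p₁, and the data below are illustrative assumptions.

```python
import math

def bernoulli_log_likelihood(data, p):
    # Log-likelihood of an i.i.d. Bernoulli(p) sample.
    return sum(math.log(p) if x == 1 else math.log(1.0 - p) for x in data)

def log_likelihood_ratio(data, p0, p1):
    # log[ L(p0 | data) / L(p1 | data) ]; very negative values are evidence
    # against H0, so we reject H0 when this falls below a cutoff.
    return bernoulli_log_likelihood(data, p0) - bernoulli_log_likelihood(data, p1)

data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]  # 7 clicks out of 10 (made up)
llr = log_likelihood_ratio(data, p0=0.4, p1=0.7)
print(llr)  # negative here: the data favor p1 = 0.7 over p0 = 0.4
```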
Two types of testing errors – Type I Error and Type II Error

             H₀ is true      H₀ is false
Accept H₀    Correct         Type II Error
Reject H₀    Type I Error    Correct

α = ℙ(Reject H₀ | Θ₀)
β = ℙ(Accept H₀ | Θ₁)
Power = 1 − β

[Figure: densities of the test statistic when H₀ is true and when H₀ is false, with the rejection boundary marked.]
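α and power can be estimated by simulation. A Monte Carlo sketch for a simple one-sided test on a Bernoulli proportion: reject H₀: p ≤ 0.5 when the sample proportion exceeds a fixed cutoff. The cutoff, sample size, and the alternative p = 0.6 are illustrative assumptions, not from the slides.

```python
import random

random.seed(0)
n, cutoff, trials = 100, 0.58, 5_000

def reject(p):
    # Draw one sample of size n from Bernoulli(p) and apply the test.
    p_hat = sum(1 for _ in range(n) if random.random() < p) / n
    return p_hat > cutoff

# alpha = P(reject H0) at the H0 boundary p = 0.5;
# power = P(reject H0) under the alternative p = 0.6.
alpha = sum(reject(0.5) for _ in range(trials)) / trials
power = sum(reject(0.6) for _ in range(trials)) / trials
print(alpha, power)  # alpha is small; power is larger (Power = 1 - beta)
```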
Neyman-Pearson Paradigm: Control the significance level,
i.e. the false-positive (Type I error) probability.

Control the risk α = ℙ(Reject H₀ | Θ₀).

[Figure: densities of the test statistic when H₀ is true and when H₀ is false, with the rejection boundary chosen so that the tail area under the H₀ density equals α.]
Use the observed significance level, the “p-value”, to test your hypothesis.

[Figure: density of the test statistic when H₀ is true, with the observed data and the rejection boundary marked; the tail area beyond the observed data is the p-value, which is compared against α.]
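Computing a one-sided p-value for a Z statistic only needs the standard normal cdf, which can be built from `math.erf`. The observed z value below is an illustrative assumption.

```python
import math

def phi(z):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z_observed = 2.1  # hypothetical observed test statistic
# One-sided p-value: probability, under H0, of a statistic at least this extreme.
p_value = 1.0 - phi(z_observed)
alpha = 0.05
print(p_value, p_value < alpha)  # p-value ~ 0.018 < alpha, so reject H0
```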
Example: Statistical setting for A/B testing

Population, randomized controlled experiment:
Old version: X₁₁ = 1, X₁₂ = 0, …, X₁ₙ = 1
New version: X₂₁ = 1, X₂₂ = 0, …, X₂ₙ = 1
Xᵢⱼ = 1 if click, Xᵢⱼ = 0 if not click

Random Sample:
X₁₁, …, X₁ₙ are i.i.d. from Ber(p_old); X₂₁, …, X₂ₙ are i.i.d. from Ber(p_new).
Example: Why do we need “statistical” hypothesis testing?

[Figure: Relative CTR for the Recommendation System, new version vs. old version, plotted daily from 2016/4/4 to 2016/5/16 (y-axis 0–3); annotations mark the periods used to test formats and to test algorithms.]
The Central Limit Theorem allows us to use the “Z-test” for the testing problem.

Central Limit Theorem
Assume X₁, …, Xₙ is a random sample from a population distribution whose mean and variance exist, and denote X̄ₙ = (X₁ + X₂ + ⋯ + Xₙ)/n. Then, when the sample size n is large, (X̄ₙ − 𝔼[Xᵢ]) / √(Var(Xᵢ)/n) has approximately a N(0, 1) distribution.

• X₁₁, …, X₁ₙ are i.i.d. from Ber(p_old) → p̂_old = (X₁₁ + X₁₂ + ⋯ + X₁ₙ)/n ~ AN(p_old, p_old(1 − p_old)/n)
• X₂₁, …, X₂ₙ are i.i.d. from Ber(p_new) → p̂_new = (X₂₁ + X₂₂ + ⋯ + X₂ₙ)/n ~ AN(p_new, p_new(1 − p_new)/n)
• Under H₀ (at the boundary p_old = p_new), p̂_old − p̂_new ~ AN(0, (2/n) p̂(1 − p̂)), where p̂ = (X₁₁ + ⋯ + X₁ₙ + X₂₁ + ⋯ + X₂ₙ)/(2n)
• We can use the normal distribution to solve our testing problem.
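The normal approximation for p̂_old − p̂_new can be checked by simulation: with p_old = p_new = p, the difference should have mean near 0 and standard deviation near √(2p(1 − p)/n). The values of p, n, and the number of replications below are illustrative assumptions.

```python
import random, math

random.seed(1)
p, n, reps = 0.3, 400, 2_000

def p_hat(p, n):
    # Sample proportion from one Bernoulli(p) sample of size n.
    return sum(1 for _ in range(n) if random.random() < p) / n

# Replicate the difference of two independent sample proportions.
diffs = [p_hat(p, n) - p_hat(p, n) for _ in range(reps)]
mean_diff = sum(diffs) / reps
sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / reps)
theory_sd = math.sqrt(2 * p * (1 - p) / n)
print(mean_diff, sd_diff, theory_sd)  # mean near 0; the two sds agree
```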
My proposed test schema –
Two-sample Z test to understand the effect of our new version

Null Hypothesis
H₀: p_old − p_new ≥ 0
Alternative Hypothesis
H₁: p_old − p_new < 0

Test Statistic
When the null hypothesis is true (at the boundary p_old = p_new), the test statistic
Z = (p̂_old − p̂_new − 0) / √((2/n) p̂(1 − p̂))
has approximately a standard normal distribution. So, we conduct a one-sided Z-test and reject H₀ when Z falls below the lower α-quantile of N(0, 1).

Insight – During the test period (4/12 – 4/24), CTRs of the new version were (statistically) significantly larger than those of the old version on 10 days. So we have the confidence to use the new version.
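The two-sample Z test above can be worked end to end. The click counts below are made-up illustrative numbers, not the real CTR data from the experiment.

```python
import math

def two_sample_z(clicks_old, clicks_new, n):
    # Z statistic for H0: p_old - p_new >= 0 vs. H1: p_old - p_new < 0,
    # using the pooled estimate p_hat under the boundary p_old = p_new.
    p_old_hat = clicks_old / n
    p_new_hat = clicks_new / n
    p_pool = (clicks_old + clicks_new) / (2 * n)
    se = math.sqrt((2 / n) * p_pool * (1 - p_pool))
    return (p_old_hat - p_new_hat) / se

def phi(z):
    # Standard normal cdf.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n = 2_000
z = two_sample_z(clicks_old=220, clicks_new=280, n=n)  # 11% vs 14% CTR
p_value = phi(z)  # one-sided: small (negative) z rejects H0
alpha = 0.05
print(z, p_value, p_value < alpha)
```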
Test the Granger Causality –
Root-cause Analysis of Time Series Data

The cause series leads the result series:
cor(x_result,t, x_cause,t−lag) ≠ 0 significantly, while
cor(x_result,t−lag, x_cause,t) ≠ 0 not significantly.
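The lead-lag check can be sketched with raw lagged correlations on synthetic data in which the result series follows the cause series with a delay of 2 steps. All values here are simulated for illustration; a full Granger causality test would fit lagged regressions and an F-test rather than comparing raw correlations.

```python
import random, math

def corr(xs, ys):
    # Pearson correlation of two equal-length series.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def lagged_corr(result, cause, lag):
    # cor(result_t, cause_{t-lag}): pair each result value with the cause
    # value from `lag` steps earlier.
    return corr(result[lag:], cause[:len(cause) - lag])

random.seed(7)
cause = [random.gauss(0, 1) for _ in range(1000)]
# result_t = cause_{t-2} plus small noise; the first two values are pure noise.
result = [random.gauss(0, 1) for _ in range(2)] + [
    c + 0.1 * random.gauss(0, 1) for c in cause[:-2]
]

forward = lagged_corr(result, cause, 2)   # cause leads result: large
backward = lagged_corr(cause, result, 2)  # result "leading" cause: near zero
print(forward, backward)
```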
Example: Hypothesis Test for Autocorrelation
• Autocorrelation Function – ACF(lag) = cor(oₜ, oₜ₋ₗₐ𝑔)
• The results show a strong “long-memory” property → nonstationary process
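The sample ACF is simple to compute directly. A minimal sketch on a synthetic trending series (an illustrative assumption, not the series from the slides): a strongly trending series keeps high ACF values even at large lags, the slow decay described above as "long memory".

```python
import math

def acf(series, lag):
    # Sample autocorrelation: lagged autocovariance over the overall variance.
    n = len(series)
    mean = sum(series) / n
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    var = sum((x - mean) ** 2 for x in series)
    return cov / var

series = [0.05 * t + math.sin(0.1 * t) for t in range(300)]  # trending series
acfs = [acf(series, lag) for lag in (1, 5, 20)]
print(acfs)  # all remain high: slow ACF decay, suggesting nonstationarity
```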