Statistical Models Explored and Explained


Sara Vafi, Stats Expert, Optimizely
Shana Rusonis, Product Marketing, Optimizely

Today’s Speakers

Sara Vafi Shana Rusonis

Housekeeping
• We’re recording!
• Slides and recording will be emailed to you tomorrow
• Time for questions at the end

Agenda
• Bayesian & Frequentist Statistics
• Error Control: Average vs. All Error Control
• Bayes Rule
• Benefits & Risks
• Optimizely Stats Engine
• Q&A

Why Do We Experiment?

● Experimentation is essential for learning
● Try new ideas without fear of failure
● Give your business a signal to act on in a sea of noisy data

What’s Most Important to You?

● Running experiments quickly
● But also reporting on results accurately
● Knowing that not all statistical solutions are created equal

Types of Statistical Methods

Bayesian

OR

Frequentist

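Bayesian Statistics
● Bayesian statistics take a more bottom-up approach to data analysis
● The parameters are unknown and are treated as random quantities
● The observed data are treated as fixed
● There is a prior probability that captures beliefs held before the data are seen
● “Opinion-based”: the prior reflects the analyst’s beliefs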

“A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule.”

Source

Frequentist Statistics
● Frequentist arguments are more counterfactual in nature
● Parameters are fixed and remain constant throughout the repeatable sampling process
● The logic resembles what lawyers use in court: assume there is no effect (“innocent until proven guilty”) unless the evidence is strong enough to reject that assumption
● ‘Is this variation different from the control?’ is a basic building block of this approach

Example: Dan & Pete Rolling a 6-Sided Die
Scenario:
● Pete will roll a die, and the outcome can be 1, 2, 3, 4, 5, or 6
● If Pete rolls a 4, he will give Dan $1 million

If Dan were a Bayesian statistician, how would he react? If Dan were a Frequentist statistician, how would he react?

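Example: Probability of the Sun Exploding

Source
● Frequentist: relies on probability, i.e. how unlikely the observed result would be if nothing had happened
● Bayesian: relies on prior knowledge as well as the observed result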
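To make the contrast concrete, the sketch below assumes the linked example is the well-known setup in which a detector reports whether the sun has exploded but lies whenever two dice both come up six (probability 1/36). The prior of one in a billion is a hypothetical value chosen only for illustration. The frequentist number is the chance of a "yes" if nothing happened; the Bayesian number combines that with the prior through Bayes' rule.

```python
# Hypothetical numbers for illustration only.
P_LIE = 1 / 36            # detector lies only if two dice both show six
PRIOR_EXPLODED = 1e-9     # assumed prior belief that the sun has exploded


def frequentist_p_value(p_lie: float) -> float:
    """P(detector says 'yes' | sun has NOT exploded): it says yes only by lying."""
    return p_lie


def bayesian_posterior(prior_exploded: float, p_lie: float) -> float:
    """P(sun exploded | detector says 'yes'), computed with Bayes' rule."""
    p_yes_if_exploded = 1.0 - p_lie   # detector tells the truth
    p_yes_if_not = p_lie              # detector lies
    numerator = p_yes_if_exploded * prior_exploded
    evidence = numerator + p_yes_if_not * (1.0 - prior_exploded)
    return numerator / evidence


if __name__ == "__main__":
    p = frequentist_p_value(P_LIE)
    print(f"p-value = {p:.3f}; below 0.05? {p < 0.05}")
    print(f"posterior = {bayesian_posterior(PRIOR_EXPLODED, P_LIE):.2e}")
```

The p-value is about 0.028, below the usual 0.05 cutoff, while the posterior stays around 3.5e-08 because the assumed prior is so small.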

Error Control

Error Control Explained
● Error control limits the likelihood that the observed result of an experiment happened by chance, rather than because of a change that you introduced
● When we set the statistical significance on an experiment to 90%, there is at most a 10% chance of a statistical error: a 1 in 10 chance of declaring a winner or loser when the result actually happened by chance
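A quick way to see what "90% significance" promises is to run A/A tests, where both arms are identical, and count how often a test is still declared significant. The sketch below assumes a standard two-proportion z-test, 1,000 visitors per arm and a 10% conversion rate; these settings are illustrative, not taken from any particular product.

```python
import math
import random


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_sided_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-proportion z-test using the pooled conversion rate."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions): no evidence either way
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2.0 * (1.0 - normal_cdf(abs(z)))


def aa_false_positive_rate(n_experiments: int = 2000, visitors: int = 1000,
                           rate: float = 0.10, significance: float = 0.90,
                           seed: int = 7) -> float:
    """Fraction of A/A tests (identical arms) declared significant."""
    rng = random.Random(seed)
    alpha = 1.0 - significance
    false_positives = 0
    for _ in range(n_experiments):
        # Both arms share the same true conversion rate: any "win" is pure chance.
        conv_a = sum(rng.random() < rate for _ in range(visitors))
        conv_b = sum(rng.random() < rate for _ in range(visitors))
        if two_sided_p_value(conv_a, visitors, conv_b, visitors) < alpha:
            false_positives += 1
    return false_positives / n_experiments


if __name__ == "__main__":
    print(f"A/A false positive rate: {aa_false_positive_rate():.3f}")
```

The printed rate should land close to 0.10, i.e. roughly 1 in 10 identical comparisons is called a winner or loser by chance alone.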

Average Error Control

● Corresponds to Bayesian A/B Testing

● Less useful for iterating on test results

● Harder to learn from individual experiments with confidence

All Error Control

● Corresponds to Frequentist A/B Testing

● Any experiment, whatever its true improvement, has at most a 10% chance of a mistake (at 90% significance)

● The rate of errors is at most 1 in 10
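The per-experiment guarantee can be checked analytically for a two-sided z-test: for any fixed true lift, compute the chance of a wrong call (any call when the lift is zero, a wrong-direction call otherwise). This is a minimal sketch assuming normally distributed estimates with a known standard error; the lifts and standard error in the usage loop are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def mistake_probability(true_lift: float, std_error: float,
                        significance: float = 0.90) -> float:
    """Chance that a two-sided z-test makes a mistake for one FIXED true lift.

    A mistake is any winner/loser call when true_lift == 0, or a call in the
    wrong direction when true_lift != 0.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - (1 - significance) / 2)  # about 1.645 at 90%
    shift = true_lift / std_error  # the z-statistic is N(shift, 1)
    p_call_winner = 1 - STD_NORMAL.cdf(z_crit - shift)
    p_call_loser = STD_NORMAL.cdf(-z_crit - shift)
    if true_lift == 0:
        return p_call_winner + p_call_loser
    return p_call_loser if true_lift > 0 else p_call_winner


if __name__ == "__main__":
    for lift in (-0.05, -0.01, 0.0, 0.001, 0.01, 0.05):
        print(f"true lift {lift:+.3f}: mistake probability "
              f"{mistake_probability(lift, std_error=0.01):.4f}")
```

Every fixed lift, including zero, stays at or below 10%, which is the "for every experiment, not just on average" property described above.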

Average Error Control vs. All Error Control

● Average error control leads to lower accuracy for experiments with small improvements

● All error control is accurate for all users, regardless of the mix of experiments they run

● There are certain cases where average error control is an appropriate alternative

Error Rates for Experiments

Bayes Rule
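For reference, Bayes' rule in its standard form, with H a hypothesis (for example, "the variation beats the control") and D the observed data; P(H) is the prior and P(H | D) the posterior:

```latex
P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)},
\qquad
P(D) = \sum_{h} P(D \mid h)\,P(h)
```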

Average Error Control & Bayesian A/B Testing

● Requires two sources of randomness:
• Randomness, or “noise,” in the data
• The makeup of the “typical” experiment group

● A distribution over experiment improvements (the prior)
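The two sources of randomness can be simulated directly: draw each experiment's true improvement from a prior, add noise, then call a winner or loser when the posterior probability of that direction reaches 90%. The sketch assumes a normal prior N(0, 1), normal noise with sd 1, and defines "small" as a true improvement below 0.25 in absolute value; all of these are hypothetical choices. It compares the error rate among all called experiments (the average) with the error rate among small-improvement experiments.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def average_vs_small_error(n_experiments: int = 200_000, prior_sd: float = 1.0,
                           noise_sd: float = 1.0, threshold: float = 0.90,
                           small: float = 0.25, seed: int = 11):
    """Error rate among called experiments: overall vs. small true improvements.

    Randomness source 1: each true improvement is drawn from N(0, prior_sd^2),
    the assumed makeup of the "typical" experiment group.
    Randomness source 2: the observed improvement adds noise N(0, noise_sd^2).
    """
    rng = random.Random(seed)
    # Normal-normal conjugate update (Bayes' rule in closed form).
    shrink = prior_sd**2 / (prior_sd**2 + noise_sd**2)
    post_sd = (shrink * noise_sd**2) ** 0.5
    calls = errors = small_calls = small_errors = 0
    for _ in range(n_experiments):
        true_lift = rng.gauss(0.0, prior_sd)
        observed = true_lift + rng.gauss(0.0, noise_sd)
        p_positive = STD_NORMAL.cdf(shrink * observed / post_sd)  # P(lift > 0 | data)
        if p_positive >= threshold:
            wrong = true_lift <= 0      # called a winner, but it is not one
        elif 1 - p_positive >= threshold:
            wrong = true_lift >= 0      # called a loser, but it is not one
        else:
            continue                    # posterior not confident enough: no call
        calls += 1
        errors += wrong
        if abs(true_lift) < small:
            small_calls += 1
            small_errors += wrong
    return errors / calls, small_errors / small_calls


if __name__ == "__main__":
    overall, small_only = average_vs_small_error()
    print(f"error rate among all calls:        {overall:.3f}")
    print(f"error rate among small-lift calls: {small_only:.3f}")
```

Under these assumptions the average error among calls stays below 10%, while the error rate for small-improvement experiments is expected to be several times higher: the low error on large improvements hides the high error on small ones.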

Different Beliefs About the Composition of ‘Typical’ Experiments

Bayes Rule

Bayes Rule & Bayesian A/B Testing
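Applied to an A/B test, Bayes' rule turns an observed lift and its standard error into a posterior probability that the variation beats the control. The sketch below assumes a normal prior and a normal likelihood, which gives a closed-form answer; the prior widths and the +2% lift with a 1% standard error are hypothetical values used to show how beliefs about "typical" experiments change the result.

```python
from statistics import NormalDist


def prob_variation_wins(observed_lift: float, std_error: float,
                        prior_mean: float = 0.0, prior_sd: float = 1.0) -> float:
    """Posterior P(true lift > 0) for a normal prior and a normal likelihood.

    Bayes' rule with conjugate normals: precisions (1 / variance) add, and the
    posterior mean is the precision-weighted average of prior mean and data.
    """
    prior_prec = 1.0 / prior_sd**2
    data_prec = 1.0 / std_error**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * observed_lift)
    return 1.0 - NormalDist(post_mean, post_var**0.5).cdf(0.0)


if __name__ == "__main__":
    # Same hypothetical data (+2% lift, standard error 1%), two different priors.
    wide = prob_variation_wins(0.02, 0.01, prior_sd=1.0)      # weak prior belief
    narrow = prob_variation_wins(0.02, 0.01, prior_sd=0.005)  # "typical lifts are tiny"
    print(f"wide prior:   P(win) = {wide:.3f}")
    print(f"narrow prior: P(win) = {narrow:.3f}")
```

With the same data, the wide prior gives about 0.98 and the narrow prior about 0.81, so the answer depends on what the analyst believes a typical experiment looks like.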

Bayes Rule & Average Error Value

Recap: Average Error Control

Bayesian A/B Testing

Prior Distributions

Bayes Rule

All Error Control is Frequentist A/B Testing

● All error control corresponds to Frequentist A/B testing

● We aim to control the false positive rate

● That is, the chance that an experiment with no real improvement is called either a winner or a loser

Benefits & Risks


Benefits of Bayesian A/B Testing

● Average error control can be very attractive

● Helps solve the “peeking” problem

● Average error control is fast
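"Peeking" means checking a fixed-horizon significance test repeatedly and stopping as soon as it looks significant. The minimal simulation below, under assumed settings (no true effect, 100 evenly spaced looks, 90% significance, standard normal batches of data), shows why this is a problem for a classical test; it illustrates the problem itself, not any particular Bayesian fix.

```python
import random
from statistics import NormalDist


def peeking_false_positive_rate(n_trials: int = 2000, n_looks: int = 100,
                                significance: float = 0.90, seed: int = 3):
    """Compare stopping at the first 'significant' look with a single final look.

    There is no true effect: each look adds one N(0, 1) batch to a running sum,
    and the z-statistic after `look` batches is sum / sqrt(look).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - (1 - significance) / 2)
    stopped_early = final_only = 0
    for _ in range(n_trials):
        total = 0.0
        crossed = False
        for look in range(1, n_looks + 1):
            total += rng.gauss(0.0, 1.0)
            if abs(total / look**0.5) > z_crit:
                crossed = True  # a "peeker" would stop here and call a winner
        stopped_early += crossed
        final_only += abs(total / n_looks**0.5) > z_crit
    return stopped_early / n_trials, final_only / n_trials


if __name__ == "__main__":
    peeking, fixed = peeking_false_positive_rate()
    print(f"false positives when peeking at every look: {peeking:.3f}")
    print(f"false positives with one final look:        {fixed:.3f}")
```

Looking once at the end keeps false positives near 10%, while stopping at the first significant look pushes them far higher.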

Risks of Bayesian A/B Testing

● It’s more appealing but it’s risky in practice

● Smaller improvement experiments with fast results = high risk

● The actual error rate is higher than the method suggests

Benefits of Frequentist A/B Testing
● This type of test makes fewer mistakes on experiments with non-zero improvements
● The rate of errors is at most 1 in 10 (at 90% significance)
● Option to speed up experimentation by using a prior

Learning from A/B Tests


Risks Involved in Typical, Realistic Experiments

Realistic Bayesian A/B Tests vs. Stats Engine

● The hardest experiments to call correctly are those with small improvements

● A/B testing in the wild is not easy
● We need more and more data in order to...

So what does this mean?

Stats Engine

Stats Engine™

Results are valid whenever you check

Avoid costly statistics errors

Measure real-time results with confidence
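One published technique for making results valid whenever you check is the mixture sequential probability ratio test (mSPRT). The sketch below illustrates that general technique under assumed settings (normal data with known sd 1, a N(0, 1) mixing distribution, 90% significance, up to 500 observations); it is not presented as Stats Engine's actual implementation. The test is checked after every single observation.

```python
import math
import random


def msprt_ratio(n: int, mean: float, sigma: float = 1.0, tau: float = 1.0) -> float:
    """Mixture likelihood ratio for H0: effect = 0 after n observations.

    Assumes normal data with known sd `sigma` and mixes the alternative over
    effects drawn from N(0, tau^2).
    """
    v = sigma**2 + n * tau**2
    return math.sqrt(sigma**2 / v) * math.exp(
        (n * mean) ** 2 * tau**2 / (2 * sigma**2 * v))


def always_valid_rejection_rate(effect: float = 0.0, n_trials: int = 1000,
                                max_n: int = 500, alpha: float = 0.10,
                                seed: int = 5) -> float:
    """Check after EVERY observation; reject as soon as the ratio reaches 1/alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        total = 0.0
        for n in range(1, max_n + 1):
            total += rng.gauss(effect, 1.0)
            if msprt_ratio(n, total / n) >= 1 / alpha:
                rejections += 1
                break
    return rejections / n_trials


if __name__ == "__main__":
    print(f"no true effect:   {always_valid_rejection_rate(effect=0.0):.3f}")
    print(f"true effect 0.5:  {always_valid_rejection_rate(effect=0.5):.3f}")
```

Because the ratio is a nonnegative martingale under no true effect, Ville's inequality bounds the chance that it ever reaches 1/alpha by alpha, however often you check; the mixing width tau is an assumed tuning choice that trades speed on large effects against speed on small ones.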


Key Takeaways

● Bayesian vs. Frequentist methods
● All error control vs. average error control
● Blended approach leads to greater confidence

QUESTIONS?

THANK YOU!

Appendix

Attic and button example

Attic and button example (cont.): in relation to all error control

Attic and button example (cont.): in relation to average error control

How to define a Bayesian A/B test

Trade-offs with Bayesian A/B testing

High improvement > low improvement

Bayesian A/B testing is average error control
