A Brief History of Statistics
Medieval Times: Dice and Gambling
Modern Times: Dice and Games/Gambling
Dice Probabilities
1/6 = 16.7%
Sum of two dice (top row: first die; left column: second die)
+ | 1  2  3  4  5  6
1 | 2  3  4  5  6  7
2 | 3  4  5  6  7  8
3 | 4  5  6  7  8  9
4 | 5  6  7  8  9  10
5 | 6  7  8  9  10 11
6 | 7  8  9  10 11 12
1/36 = 2.78%
6/36 = 16.67%
Dice Outcomes Are Independent
Sum
Dice Probabilities
+ | 1  2  3  4  5  6
1 | 2  3  4  5  6  7
2 | 3  4  5  6  7  8
3 | 4  5  6  7  8  9
4 | 5  6  7  8  9  10
5 | 6  7  8  9  10 11
6 | 7  8  9  10 11 12
Probability Distribution
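The probability distribution of the sum of two dice can be checked by listing all 36 equally likely outcomes. The sketch below is only an illustration of that counting argument; the names `outcomes`, `counts`, and `distribution` are not from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) pairs.
outcomes = list(product(range(1, 7), repeat=2))

# Count how many pairs produce each sum from 2 to 12.
counts = Counter(a + b for a, b in outcomes)

# Exact probability of each sum as a fraction of the 36 outcomes.
distribution = {s: Fraction(n, len(outcomes)) for s, n in sorted(counts.items())}

for s, p in distribution.items():
    print(f"sum {s:2d}: {p} = {float(p):.2%}")
```

A sum of 7 is the most likely result (6 of the 36 pairs), while 2 and 12 each come from a single pair.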
Blaise Pascal
1600s: Probability & Gambling
At least one "6" in four rolls of one die
At least one double-six in 24 throws of two dice
Do these have equal probabilities?
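The question can be answered with the complement rule: the chance of at least one success is one minus the chance of no success in every independent trial. A minimal sketch, assuming fair dice and independent rolls (the variable names are illustrative):

```python
# Probability of at least one "6" in four rolls of a single die:
# 1 minus the probability that all four rolls miss.
p_one_six = 1 - (5 / 6) ** 4

# Probability of at least one double-six in 24 throws of two dice:
# 1 minus the probability that all 24 throws miss.
p_double_six = 1 - (35 / 36) ** 24

print(f"at least one six in 4 rolls:         {p_one_six:.4f}")
print(f"at least one double-six in 24 throws: {p_double_six:.4f}")
```

The two bets are not equal: the first is slightly better than even (about 0.518), the second slightly worse (about 0.491).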
Blaise Pascal: 1623 - 1662
Chevalier de Méré: 1607 - 1684
Binomial / Bernoulli Distribution
Jacob Bernoulli: 1654 - 1705
Binomial Distribution
• The principal reason for using a normal curve test on a dichotomy has been the past difficulty of calculating the exact binomial distribution.
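With a computer the exact binomial calculation is no longer difficult. The sketch below compares an exact upper-tail probability with its normal-curve approximation; the example values (20 trials, p = 0.5, 15 or more successes) and the function names are assumptions chosen for illustration, not figures from the slides.

```python
import math

def binomial_pmf(k, n, p):
    """Exact probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_upper_tail(k, n, p):
    """Exact probability of k or more successes."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))

def normal_upper_tail(k, n, p):
    """Normal approximation to P(X >= k), with a continuity correction."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    z = (k - 0.5 - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical example: 15 or more successes in 20 fair trials.
exact = binomial_upper_tail(15, 20, 0.5)
approx = normal_upper_tail(15, 20, 0.5)
print(f"exact binomial: {exact:.4f}, normal approximation: {approx:.4f}")
```

The two values are close but not identical, which is why the exact calculation is preferable when it is practical.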
1761: Bayes Formula
Posterior (X | New Data) = P(Data | X) × Prior (X) / Σ P(Data | X) × Prior (X)
[Figure: probability distributions of height for Female and Male; x-axis: Height of the Person (60, 67.5, 75)]
Example: Prior (X) = Gender; New Data = Child Height, 66.5
Thomas Bayes: 1701 - 1761
Bayesian Formulas – Excel
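The same Bayesian update can be sketched outside a spreadsheet. In the example below the prior is gender and the new data is an observed height of 66.5, as on the slide. The equal priors and the normal height distributions (means 64 and 70, standard deviation 3, in inches) are hypothetical values chosen for illustration, since the slide's actual parameters are not recoverable.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Hypothetical prior and height models (mean, sd); not taken from the slides.
prior = {"female": 0.5, "male": 0.5}
height_model = {"female": (64.0, 3.0), "male": (70.0, 3.0)}

def posterior(height):
    """Bayes' formula: likelihood times prior, divided by the sum over both groups."""
    unnormalized = {
        g: normal_pdf(height, *height_model[g]) * prior[g] for g in prior
    }
    total = sum(unnormalized.values())
    return {g: v / total for g, v in unnormalized.items()}

post = posterior(66.5)
for gender, p in post.items():
    print(f"P({gender} | height = 66.5) = {p:.3f}")
```

Under these assumed parameters a height of 66.5 is closer to the female mean, so the posterior shifts toward "female" even though the prior was even.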
Google Ngram Viewer
• Ngram: word or string in a corpus
• Corpus: a large or complete collection of writings
• Team of researchers from Harvard, Google, Encyclopaedia Britannica, and the American Heritage Dictionary
• Analyzed 5 million books from 1500 to 2008
• 500 billion words
• ~4% of all books ever published
Ngram Viewer: “Bayes”, “Bayesian” (1760 to 2000)
Ngram Viewer: “statistics” (1800 to 2000)
Observation on Height
• Adolphe Quételet (1796-1874)
• Mid-1800s: studied social data and crime
• “Quetelet Index”: Weight / Height²
• Now known as the “Body Mass Index”
"The average person"
Normal (plot, 1800 to 2000)
1st Regression Line - 1877
The first “Regression Line”
Francis Galton: 1822 - 1911
Ngram Viewer: “statistics”, “correlation”, “regression” (1800 to 2000)
Ngram Viewer: “Standard Deviation” (1800 to 2000)
Tukey
1915 – 2000
He introduced the box plot in his 1977 book, "Exploratory Data Analysis".
Ngram Viewer: “sliderule” (1800 to 2000)
Ngram Viewer: “calculator”
Ngram Viewer: “computer”, “internet”
Machine Learning
Ngram Viewer: “chi square”
chi-square test vs. z-test on a proportion
A two-tailed Z-test for two proportions (using a pooled estimate of p) and a chi-square test for a 2-by-2 table will give exactly the same P-value.
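The equivalence can be checked numerically, because the chi-square statistic for a 2-by-2 table equals the square of the pooled z statistic. The sketch below assumes the chi-square test is run without a continuity (Yates) correction; the counts (30 of 100 versus 45 of 100) and the function names are hypothetical.

```python
import math

def two_proportion_z_pvalue(x1, n1, x2, n2):
    """Two-tailed P-value of the z-test using a pooled estimate of p."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    # Two-tailed normal tail area: 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2))

def chi_square_2x2_pvalue(x1, n1, x2, n2):
    """P-value of the chi-square test (1 degree of freedom, no continuity correction)."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    rows = [n1, n2]
    cols = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Upper tail of chi-square with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts: 30 successes of 100 versus 45 successes of 100.
p_z = two_proportion_z_pvalue(30, 100, 45, 100)
p_chi = chi_square_2x2_pvalue(30, 100, 45, 100)
print(f"z-test P-value:     {p_z:.6f}")
print(f"chi-square P-value: {p_chi:.6f}")
```

If a continuity correction is applied to the chi-square test, the two P-values will no longer match.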