8/14/2019 Using Real Data For Decision Analysis
Which class would you take?
Next semester, two professors will be teaching
Decision Theory.
Dr. Uragonnafail vs. Dr. Pieceofcake
Based on RateMyProfessor.com, the grades posted over the past five years are:
mean = 1750.82 vs. mean = 1770.16 (out of 2000 points)
Which professor would you sign up for?
Finding the Distribution
mean = 1750.82 vs. mean = 1770.16 (out of 2000 points)
[Figure: Dr. Uragonnafail GPA distribution]
Standardizing the axis
[Figure: Dr. Pieceofcake GPA distribution; x-axis 1.5 to 1.955 (values in thousands), y-axis 0 to 6 (values x 10^-2)]
[Figure: Dr. Uragonnafail GPA distribution; x-axis 1.5 to 1.955 (values in thousands), y-axis 0 to 7 (values x 10^-3)]
Which professor would you sign up for now?
mean = 1750.82
var = 63.8635
mean = 1770.16
var = 7.139664
(out of 2000 points)
A matter of perspective
Which professor would you want if you were a 4.0 student?
What if you were a slacker?
Remember: it is not just the central tendency that is important; the dispersion is also critical.
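The question above can be made concrete with a small sketch. It uses the means and variances quoted on the earlier slide, and it assumes (this is not stated in the deck) that the first mean/variance pair belongs to Dr. Uragonnafail, that the second belongs to Dr. Pieceofcake, and that both grade distributions are roughly normal. The two target scores are hypothetical.

```python
from statistics import NormalDist

# Means and variances quoted on the slides (out of 2000 points).
# Assumption: the first pair is Dr. Uragonnafail, the second Dr. Pieceofcake,
# and both grade distributions are approximately normal.
uragonnafail = NormalDist(mu=1750.82, sigma=63.8635 ** 0.5)
pieceofcake = NormalDist(mu=1770.16, sigma=7.139664 ** 0.5)


def chance_of_at_least(dist, target):
    """Probability of scoring `target` points or more under `dist`."""
    return 1.0 - dist.cdf(target)


# Hypothetical targets: a modest goal and an ambitious one.
for target in (1760, 1790):
    print(target,
          chance_of_at_least(uragonnafail, target),
          chance_of_at_least(pieceofcake, target))
```

Under these assumptions the ranking of the two classes flips between the two targets: the lower-variance class is far more likely to reach the modest target, while the higher-variance class has the larger (though still very small) chance of reaching the ambitious one.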
There are a variety of probability models
that can be used to help make decisions
Binomial distribution
Poisson distribution
Normal distribution
Exponential distribution
Beta distribution, etc.
It is always important to use the correct distribution to explain your data
BUT
More importantly, it is essential to always consider the context in which you are making the decision
Caveats in using real data to make decisions
Fallacy of Averages
Assumptions of normality
Errors in estimations
Impact of outliers
Residual Values
Fallacy of averages
[Figure: frequency of developed drugs vs. duration of drug development (0 to 20), with the mean / average marked]
Fallacy of Averages
[Figure: frequency of developed drugs vs. duration of drug development (0 to 20), with the mean duration of the typical process and the mean duration of complex processes marked separately]
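A simple case of the fallacy of averages is the overall height of a specific population. Men and women each have their own roughly normal height distribution with a different mean, so the combined population is a mixture of two groups, and its single overall average describes neither group well.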
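A minimal sketch of the same effect, using made-up durations for a "typical" group and a "complex" group. The numbers are hypothetical and are not read off the chart; the point is only that the overall mean can land where almost no real case sits.

```python
from statistics import mean

# Hypothetical development durations; these numbers are illustrative only
# and are not taken from the slide's chart.
typical = [3, 4, 4, 5, 4, 3, 5, 4]
complex_cases = [11, 12, 13, 12]

everything = typical + complex_cases
overall_mean = mean(everything)

# How many projects actually finish within one time unit of the overall mean?
near_overall = sum(abs(d - overall_mean) <= 1 for d in everything)

print("typical mean:", mean(typical))
print("complex mean:", mean(complex_cases))
print("overall mean:", overall_mean)
print("projects near the overall mean:", near_overall)
```

In this toy data the overall mean falls between the two groups, and no individual project is close to it, so planning around the single average would fit neither kind of process.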
Assumptions of Normality
Not every bell-shaped distribution is normal
Requirements for a normal distribution:
symmetrical
68-95-99.7 rule
Not all distributions are normal, and few are perfectly normally distributed.
Much of decision analysis relies on assumptions of the normal curve to make calculations easier. It is important to understand the limitations of the normal curve when basing your decisions on it.
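One rough way to check the 68-95-99.7 rule against a dataset is to count how much of the data falls within 1, 2 and 3 standard deviations of the mean. The sketch below does this for a simulated normal sample and a simulated skewed (exponential) sample; both samples are synthetic stand-ins, not data from the slides, and a match on these fractions is only a quick screen, not proof of normality.

```python
import random
from statistics import NormalDist, fmean, pstdev


def coverage(data):
    """Fraction of values within 1, 2 and 3 standard deviations of the mean."""
    mu, sd = fmean(data), pstdev(data)
    return [sum(abs(v - mu) <= k * sd for v in data) / len(data)
            for k in (1, 2, 3)]


# Synthetic samples: one drawn from a normal curve, one clearly skewed.
rng = random.Random(0)
normal_sample = NormalDist(0, 1).samples(10_000, seed=0)
skewed_sample = [rng.expovariate(1.0) for _ in range(10_000)]

print("normal:", coverage(normal_sample))
print("skewed:", coverage(skewed_sample))
```

The normal sample should land close to 0.68, 0.95 and 0.997, while the skewed sample puts noticeably more of its mass within one standard deviation, which is a hint that normal-curve calculations would mislead for that data.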
Errors in Estimations
Some linear regressions force the line through the 0-0 point (meaning: x-intercept = 0, y-intercept = 0).
When fitting regressions to the data, have an understanding of the range over which your estimate is reasonable.
Would it make sense to force the regression line through the 0-0 point in this case?
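To see what forcing the line through the origin does, the sketch below fits the same points twice: once with an ordinary least-squares intercept and once with the intercept fixed at zero. The data are hypothetical and deliberately have a true intercept far from zero.

```python
def fit_with_intercept(xs, ys):
    """Ordinary least squares: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def fit_through_origin(xs, ys):
    """Least squares with the intercept forced to 0: returns the slope."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Hypothetical data that follow y = 10 + 2x exactly.
xs = [1, 2, 3, 4, 5]
ys = [12, 14, 16, 18, 20]

slope, intercept = fit_with_intercept(xs, ys)
origin_slope = fit_through_origin(xs, ys)
print("free intercept:", slope, intercept)
print("forced through 0-0:", origin_slope)
```

With the intercept forced to zero, the slope has to steepen to reach the points, so predictions are distorted everywhere, and most of all outside the observed range of x.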
Impact of Outliers
Outliers can bias the regression estimate, pulling the line to accommodate the extreme data point(s).
What happens if you add Maudie Hopkins (19) and William Cantrell (86) [last Civil War widow], or Anna Nicole Smith (26) and J. Howard Marshall II (89)? Or, more recently, Demi Moore (42) and Ashton Kutcher (27)?
Impact of Outliers
Outliers can also give a false sense of correlation between two variables. In a correlation test, the strength of the relationship between the husband's age and the wife's age would be incorrectly accentuated. This would give a misleading impression of how compact the dataset is.
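A small sketch of how a single extreme point can manufacture a correlation. The ages below are hypothetical, chosen so that the main cluster shows almost no relationship; they are not the couples named on the previous slide.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical ages: a tight cluster with essentially no relationship...
husband = [30, 32, 34, 36, 38]
wife = [35, 31, 36, 30, 34]
r_cluster = pearson_r(husband, wife)

# ...plus one extreme couple far from the cluster.
r_with_outlier = pearson_r(husband + [85], wife + [80])

print("cluster only:", round(r_cluster, 3))
print("with outlier:", round(r_with_outlier, 3))
```

The cluster alone shows a weak correlation, but adding the one distant point pushes the coefficient close to 1, which says more about that point than about the rest of the data.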
Residual value
Two datasets can give identical regression estimates. The graph on the right has larger residuals, so its fit carries greater error. In other words, x is not as strong a predictor of y.
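The sketch below builds two hypothetical datasets that produce the same fitted line but very different residuals. The residual pattern is chosen to sum to zero and to be uncorrelated with x, which is what keeps the fitted line unchanged.

```python
def fit_line(xs, ys):
    """Ordinary least squares: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def residual_sum_of_squares(xs, ys):
    """Sum of squared distances between the data and its fitted line."""
    slope, intercept = fit_line(xs, ys)
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: the same line y = 2x with small or large scatter.
xs = [1, 2, 3, 4]
pattern = [1, -1, -1, 1]  # sums to 0 and is orthogonal to xs
tight = [2 * x + 0.1 * e for x, e in zip(xs, pattern)]
noisy = [2 * x + 2.0 * e for x, e in zip(xs, pattern)]

print("tight fit:", fit_line(xs, tight), residual_sum_of_squares(xs, tight))
print("noisy fit:", fit_line(xs, noisy), residual_sum_of_squares(xs, noisy))
```

Both fits report essentially the same slope and intercept, so the regression equation alone cannot tell them apart; the residuals are what reveal how much to trust predictions from each.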
Things to remember about real data
It is messy!
It is not always memoryless
i.e., the recent past really is a better indicator of future performance
It can have many outliers that are:
Important indications of new trends, or
Oddities that should be eliminated
There is a major difference between statistical significance and practical significance
Don't just look at the statistical results, look at the data itself!