ken-plummer
Difference between Normal and Skewed Distributions
This presentation will help you determine if the data set from the problem you are asked to solve has a normal or skewed distribution
Normal
Skewed
What is a distribution?
We will illustrate what a distribution is with a data set that describes the hours students study
Here is the data set:
Student    Hours of Study
Bart       1
Basheba    2
Bella      2
Bob        3
Boston     3
Bunter     3
Buxby      4
Bybee      4
Bwinda     5
From this data set we will create a distribution:
The X axis will be the number of hours of study
[Figure: X axis labeled "Hours of Study", marked 1 through 5]
The Y axis indicates the number of times the same number occurs
[Figure: the data plotted with "Hours of Study" (1 to 5) on the X axis and "Number of Occurrences" (1 to 3) on the Y axis: 1 hour occurs once, 2 hours twice, 3 hours three times, 4 hours twice, and 5 hours once]
This is a distribution
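The counting behind this distribution can be sketched in a few lines of Python. This is only an illustration, not part of the original presentation; the variable names are made up for the example, and the values are the nine study hours from the data set.

```python
from collections import Counter

# Hours of study for the nine students in the data set.
hours_of_study = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Count how many times each number of hours occurs (the Y axis).
occurrences = Counter(hours_of_study)

# Print one row per X-axis value, with one "x" per occurrence.
for hours in sorted(occurrences):
    print(hours, "x" * occurrences[hours])
```

Each printed row is one position on the X axis, and the length of the row is the height on the Y axis.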
One way to represent a distribution like this is like this:
[Figure: an alternative drawing of the same distribution]
Normal distributions have the majority of the data in the middle, with amounts that decrease equally on both sides toward the tails
The mean or average works really well with normal distributions
Another way to say it is that the mean describes the center point of a normal distribution well
A Normal Distribution
The Mean
Here is how you calculate the mean:
Let’s put the data into the distribution
[Figure: the data values 1, 2, 2, 3, 3, 3, 4, 4, 5 placed into the distribution]
Mean = the sum of all the values, divided by the number of total values
Mean = (1 + 2 + 2 + 3 + 3 + 3 + 4 + 4 + 5) / 9 = 27 / 9 = 3
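As a quick check of the calculation, here is a minimal Python sketch. It is an illustration only; the variable names are assumptions for this example, and the values are the nine study hours from the data set.

```python
# Hours of study for the nine students in the data set.
hours_of_study = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Add up all the values.
total = sum(hours_of_study)

# Count the number of total values.
count = len(hours_of_study)

# The mean is the total divided by the number of values.
mean = total / count
print(total, count, mean)  # 27 9 3.0
```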
The mean is a good estimate of the center of a distribution when the distribution is normal
But, the mean is not a good estimate of the center when the distribution is not normal
This is because of what we call OUTLIERS
What is an outlier?
An outlier is a data point that falls outside the overall pattern of the distribution
As an example, here is the overall pattern:
[Figure: the distribution of the values 1, 2, 2, 3, 3, 3, 4, 4, 5]
But what if we changed the 5 to a 50?
[Figure: the values 1, 2, 2, 3, 3, 3, 4, 4, 50]
This is an Outlier
To illustrate what happens to the mean when an outlier is present, let’s go back to this distribution:
[Figure: the distribution of the values 1, 2, 2, 3, 3, 3, 4, 4, 5]
Let’s say one student, instead of studying 5 hours, studies 23 hours a day!
Watch what happens to the mean:
Before
Mean = (1 + 2 + 2 + 3 + 3 + 3 + 4 + 4 + 5) / 9 = 27 / 9 = 3
After
Mean = (1 + 2 + 2 + 3 + 3 + 3 + 4 + 4 + 23) / 9 = 45 / 9 = 5
Once again, BEFORE
Mean = 3
AFTER
Mean = 5
Just by changing one value from “5” to “23”, the mean increased by 2 (from “3” to “5”)
Thus, the mean is very sensitive to outliers
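The before-and-after comparison can be reproduced with a short Python sketch. This is an illustration only, using the standard library's `statistics.fmean`; the names `before` and `after` are made up for the example.

```python
from statistics import fmean

# The original data set: hours of study for nine students.
before = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# The same data set with the 5 replaced by the outlier 23.
after = before[:-1] + [23]

mean_before = fmean(before)
mean_after = fmean(after)

print(mean_before)  # 3.0
print(mean_after)   # 5.0
```

Only one of the nine values changed, yet the mean moved from 3 to 5.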
Therefore, the mean is not a good estimate of the center of a distribution when the distribution is NOT NORMAL
So, how do you know if your data is normally distributed?
Go to the Learning Module entitled: Assessing Skew. You will find it next to the link for this presentation.
After you have viewed that learning module, use SPSS to assess the skew of your data.
Is your data normally distributed or skewed?
If the skew of your data has a critical ratio greater than 2.0 or less than -2.0, then select
Skewed
Otherwise select
Normal
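The selection rule above can be written as a small Python sketch. This is an illustration only: the function name `classify_distribution` is hypothetical, and the critical ratio itself is assumed to come from the Assessing Skew module and SPSS rather than being computed here.

```python
def classify_distribution(critical_ratio: float) -> str:
    """Apply the rule: beyond +/-2.0 means skewed, otherwise normal."""
    if critical_ratio > 2.0 or critical_ratio < -2.0:
        return "Skewed"
    return "Normal"


# Hypothetical critical ratios, chosen only to exercise both branches.
print(classify_distribution(0.4))   # Normal
print(classify_distribution(-3.1))  # Skewed
```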
When your distribution is normal then it is appropriate to use the mean as a way to know something about the center point of the distribution.
Normal distributions use the mean in their calculations
Skewed distributions use the median
What is the median?
The median is simply the middle score of a data set where
• 50% of the scores fall below it and
• 50% of the scores are above it
To illustrate, let’s go back to this distribution:
[Figure: the distribution of the values 1, 2, 2, 3, 3, 3, 4, 4, 5]
With the median we simply determine the midpoint:
[Figure: the values 1, 2, 2, 3, 3, 3, 4, 4, 5 in order, with the middle value 3 marked "Median" and 4 units on each side of it]
This is the Median
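Finding the midpoint can be sketched in Python as follows. This is an illustration only; the function name `median_of` is hypothetical, and the handling of an even number of values (averaging the two middle scores) is an assumption, since the presentation only shows a data set with an odd number of values.

```python
def median_of(values):
    """Return the middle score of the values once they are put in order."""
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd count: exactly one middle score.
        return ordered[middle]
    # Even count (assumption, not shown in the slides): average the two middle scores.
    return (ordered[middle - 1] + ordered[middle]) / 2


hours_of_study = [1, 2, 2, 3, 3, 3, 4, 4, 5]
print(median_of(hours_of_study))  # 3
```

With nine values, the fifth value in order is the median, leaving four values on each side.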
Notice that the Median is unaffected by outliers
To illustrate this, we’ll change the value “5” to a “10”:
[Figure: the values 1, 2, 2, 3, 3, 3, 4, 4, 5 become 1, 2, 2, 3, 3, 3, 4, 4, 10]
Watch what happens to the median:
[Figure: the values 1, 2, 2, 3, 3, 3, 4, 4, 10 in order, with the middle value marked "Median" and 4 units on each side of it]
Hmm, it’s still 3
But what if we change the value 10 to 1,000?
Watch again what happens to the median:
[Figure: the values 1, 2, 2, 3, 3, 3, 4, 4, 1,000 in order, with the middle value marked "Median" and 4 units on each side of it]
What do you know – It’s still 3
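The same experiment can be run in a short Python sketch that swaps the largest value for 5, 10, and 1,000. This is an illustration only, using the standard library's `statistics` module; the variable names are made up for the example.

```python
from statistics import fmean, median

# The eight values that never change.
base = [1, 2, 2, 3, 3, 3, 4, 4]

# The three versions of the largest value used in the slides.
largest_values = [5, 10, 1000]

medians = [median(base + [largest]) for largest in largest_values]
means = [fmean(base + [largest]) for largest in largest_values]

for largest, med, avg in zip(largest_values, medians, means):
    print(f"largest={largest}: median={med}, mean={avg:.2f}")
```

The median stays at 3 in all three cases, while the mean grows every time the largest value grows.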
Here is the key takeaway:
The mean is affected by outliers
The median is not affected by outliers
Therefore the mean is used with more or less NORMAL DISTRIBUTIONS
And the median is used with SKEWED OR NON-NORMAL DISTRIBUTIONS
So, based on your analysis, which distribution best reflects your data set?
Normal
Skewed