Difference between Normal and Skewed Distributions
This presentation will help you determine if the data set from the problem you are asked to solve has a normal or skewed distribution
Normal
Skewed
Knowing if your data’s distribution is skewed or normal is the second way of knowing if you will use what is called a parametric or a nonparametric test
The first way (as you may recall from the last decision point) is to determine if the data is scaled, ordinal, or nominal
But first,
What is a distribution?
We will illustrate what a distribution is with a data set that describes the hours students study
Here is the data set:
Student    Hours of Study
Bart       1
Basheba    2
Bella      2
Bob        3
Boston     3
Bunter     3
Buxby      4
Bybee      4
Bwinda     5
From this data set we will create a distribution:
The X axis will be the number of hours of study
The Y axis indicates the number of times the same value occurs
[Figure: the distribution of the data set. X axis: Hours of Study (1, 2, 3, 4, 5). Y axis: Number of Occurrences (1, 2, 3).]
This is a distribution
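As a minimal sketch of the counting step described above, the following Python snippet tallies how often each value occurs. The variable names are illustrative and not part of the presentation; the values come from the data set shown earlier.

```python
from collections import Counter

# Hours of study from the data set above (one value per student).
hours_of_study = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Count how many times each value occurs (the Y axis of the distribution).
occurrences = Counter(hours_of_study)

# Print one row per X-axis value, with one mark per occurrence.
for hours in sorted(occurrences):
    print(hours, "#" * occurrences[hours])
```

Each printed row is one bar of the distribution turned on its side: the longest row belongs to 3 hours, the value that occurs most often.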
One way to represent a distribution like this:
Is like this:
Normal distributions have the majority of the data in the middle,
with amounts that decrease equally toward both tails
The mean or average works really well with normal distributions
Another way to say it is that the mean describes the center point of a normal distribution well
A Normal Distribution
The Mean
Here is how you calculate the mean:
Let’s put the data into the distribution
The data: 1, 2, 2, 3, 3, 3, 4, 4, 5
Add up all of the values and divide by the number of total values:
Mean = (1 + 2 + 2 + 3 + 3 + 3 + 4 + 4 + 5) / 9 = 27 / 9 = 3
Mean = 3
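The same calculation can be sketched in a few lines of Python. This is only an illustration of the arithmetic above; the variable names are assumptions made for the example.

```python
# Hours of study for the nine students in the data set.
hours_of_study = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Step 1: add up all of the values.
total = sum(hours_of_study)

# Step 2: divide by the number of total values.
count = len(hours_of_study)
mean = total / count

print(f"Mean = {total} / {count} = {mean}")
```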
The mean is a good estimate of the center of a distribution when the distribution is normal
But, the mean is not a good estimate of the center when the distribution is not normal
This is because of what we call OUTLIERS
What is an outlier?
An outlier is a data point that falls outside the overall pattern of the distribution
As an example, here is the overall pattern:
1, 2, 2, 3, 3, 3, 4, 4, 5
But what if we changed the 5 to a 50?
1, 2, 2, 3, 3, 3, 4, 4, 50
To illustrate what happens to the mean when an outlier is present, let’s go back to this distribution:
1, 2, 2, 3, 3, 3, 4, 4, 5
Let’s say one student, instead of studying five hours, studies 23 hours a day!
Watch what happens to the mean:
Before
Mean = (1 + 2 + 2 + 3 + 3 + 3 + 4 + 4 + 5) / 9 = 27 / 9 = 3
After
Mean = (1 + 2 + 2 + 3 + 3 + 3 + 4 + 4 + 23) / 9 = 45 / 9 = 5
Just by changing one value from “5” to “23”, the mean increased by two hours (from “3” to “5”)
Thus, the mean is very sensitive to outliers
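A short Python sketch of this before-and-after comparison, using the standard library's `statistics.mean`. The list names are illustrative only.

```python
from statistics import mean

# The data set before and after one student's value changes from 5 to 23.
before = [1, 2, 2, 3, 3, 3, 4, 4, 5]
after = [1, 2, 2, 3, 3, 3, 4, 4, 23]

mean_before = mean(before)
mean_after = mean(after)

# A single changed value is enough to move the mean.
print("Mean before:", mean_before)
print("Mean after:", mean_after)
print("Change in mean:", mean_after - mean_before)
```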
Therefore, the mean is not a good estimate of the center of a distribution when the distribution is NOT NORMAL
Here is a guiding principle
1 If your data set is normally distributed like this:
[Figure: a normal distribution. X axis: Hours of Study (1 to 5). Y axis: Number of Occurrences (1 to 3).]
then you will use a parametric test
2 If your data set is skewed either to the right or to the left:
[Figures: distributions skewed to the right and to the left, on the same axes.]
then you will use a nonparametric test
In summary,
A parametric test is used when the problem’s data set is normally distributed
A non-parametric test is used when the problem’s data set is very skewed to the right or the left, or very non-normal
So, how do you know if your data is normally distributed?
Go to the Learning Module entitled: Assessing Skew. You will find it next to the link for this presentation.
After you have viewed that learning module, use SPSS to assess the skew of your data.
Is your data normally distributed or skewed?
If your data is skewed, with a critical ratio greater than 2.0 or less than -2.0, then select
Skewed
Otherwise select
Normal
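The presentation does not define the critical ratio, so the sketch below rests on an assumption: that it is the sample skewness divided by the standard error of skewness, which is a common convention. The function names are hypothetical, and the formulas are the usual small-sample ones; check them against your SPSS output rather than treating this as the presentation's own method.

```python
import math

def sample_skewness(values):
    """Adjusted sample skewness (assumed formula; needs at least 3 values)."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation (divides by n - 1).
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    cubed = sum((x - mean) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed / s ** 3

def skewness_standard_error(n):
    """Standard error of skewness for a sample of size n (assumed formula)."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def classify(values):
    """Apply the rule above: beyond +/-2.0 is Skewed, otherwise Normal."""
    # ASSUMPTION: critical ratio = skewness / standard error of skewness.
    ratio = sample_skewness(values) / skewness_standard_error(len(values))
    return "Skewed" if ratio > 2.0 or ratio < -2.0 else "Normal"

print(classify([1, 2, 2, 3, 3, 3, 4, 4, 5]))   # symmetric data set
print(classify([1, 2, 2, 3, 3, 3, 4, 4, 23]))  # data set with an outlier
```

Under these assumptions the symmetric data set has a skewness of zero, while the version with the 23-hour outlier gives a ratio well above 2.0.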
It is important to note that if you choose Skewed, your data will be analyzed using what are called non-parametric tests
Skewed
Non-parametric tests differ from parametric tests in one simple way:
Parametric tests use the mean in their calculations
Non-parametric tests use the median
What is the median?
The median is simply the middle score of a data set where
• 50% of the scores fall below it and
• 50% of the scores are above it
To illustrate, let’s go back to this distribution:
1, 2, 2, 3, 3, 3, 4, 4, 5
With the median we simply determine the midpoint:
1 2 2 3 | 3 | 3 4 4 5
4 units below the midpoint, 4 units above it
Median = 3
Notice that the Median is unaffected by outliers
To illustrate this, we’ll change the value “5” to a “10”:
1, 2, 2, 3, 3, 3, 4, 4, 10
Watch what happens to the median:
1 2 2 3 | 3 | 3 4 4 10
4 units below the midpoint, 4 units above it
Hmm, it’s still 3
But what if we change the value 10 to 1,000?
Watch again what happens to the median:
1 2 2 3 | 3 | 3 4 4 1,000
4 units below the midpoint, 4 units above it
What do you know –
It’s still 3
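The three cases above can be checked with a short Python sketch that uses the standard library's `statistics.median`. The list names are illustrative only.

```python
from statistics import median

# The same data set with the largest value set to 5, 10, and 1,000.
original = [1, 2, 2, 3, 3, 3, 4, 4, 5]
with_10 = [1, 2, 2, 3, 3, 3, 4, 4, 10]
with_1000 = [1, 2, 2, 3, 3, 3, 4, 4, 1000]

# The median is the middle (fifth) of the nine sorted values in each case.
medians = [median(values) for values in (original, with_10, with_1000)]

print("Medians:", medians)
```

Because the median depends only on which value sits in the middle position, changing the largest value does not move it.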
Here is the key take away:
The mean is affected by outliers
The median is not affected by outliers
Therefore the mean is used with more or less NORMAL DISTRIBUTIONS
And the median is used with SKEWED OR NON-NORMAL DISTRIBUTIONS
So, why doesn’t everyone use non-parametric methods since they are unaffected by outliers?
Because parametric methods provide more meaningful information about the population than do non-parametric methods
So, if your data is skewed, it’s better to get what information you can from a non-parametric test, even though a parametric test would have provided more information (if your data had been normally distributed)
So, based on your analysis, which distribution best reflects your data set:
Normal
Skewed