Tutorial: Normal vs. Skewed Distributions

Difference between Normal and Skewed Distributions

This presentation will help you determine whether the data set from the problem you are asked to solve has a normal or a skewed distribution

[Figure: two panels labeled Normal and Skewed]

What is a distribution?

We will illustrate what a distribution is with a data set that describes the number of hours students study

Here is the data set:

Student    Hours of Study
Bart       1
Basheba    2
Bella      2
Bob        3
Boston     3
Bunter     3
Buxby      4
Bybee      4
Bwinda     5

From this data set we will create a distribution:

The x-axis will be the number of hours of study

The y-axis indicates the number of times the same number occurs

[Figure: histogram of the data set. X-axis: Hours of Study (1 to 5). Y-axis: Number of Occurrences (1 to 3).]

This is a distribution
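The counting behind this distribution can be sketched in a few lines of Python. This is only an illustration, not part of the original presentation; the names `hours` and `counts` are assumptions chosen for the example.

```python
from collections import Counter

# Hours of study for the nine students in the data set
hours = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Count how many times each number of hours occurs (the y-axis)
counts = Counter(hours)

# Print one row per x-axis value, with one mark per occurrence
for value in sorted(counts):
    print(f"{value} hours: {'#' * counts[value]} ({counts[value]})")
```

Each printed row corresponds to one bar of the distribution: the longer the row, the more students studied that number of hours.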

One way to represent a distribution like this is like this:

[Figure: the histogram and an alternative representation of the same distribution]

Normal distributions have the majority of the data in the middle, with decreasing amounts, equal on both sides, toward the tails

The mean, or average, works well with normal distributions

Another way to say it is that the mean describes the center point of a normal distribution well

[Figure: a normal distribution with the mean marked]

Here is how you calculate the mean:

Let’s put the data into the distribution

[Figure: the data values placed into the histogram: one 1, two 2s, three 3s, two 4s, one 5]

Mean = the sum of all the values, divided by the number of total values

Mean = (1 + 2 + 2 + 3 + 3 + 3 + 4 + 4 + 5) / 9 = 27 / 9 = 3
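The same calculation can be written as a short Python sketch. The variable names are assumptions made for this illustration; the values are the hours of study from the data set.

```python
# Hours of study for the nine students in the data set
hours = [1, 2, 2, 3, 3, 3, 4, 4, 5]

total = sum(hours)   # add up all the values
n = len(hours)       # count how many values there are
mean = total / n     # divide the sum by the number of values

print(f"sum = {total}, n = {n}, mean = {mean}")
```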

The mean is a good estimate of the center of a distribution when the distribution is normal

But, the mean is not a good estimate of the center when the distribution is not normal

This is because of what we call OUTLIERS

What is an outlier?

An outlier is a data point that falls outside the overall pattern of the distribution

As an example, here is the overall pattern:

1  2  2  3  3  3  4  4  5

But what if we changed the 5 to a 50?

1  2  2  3  3  3  4  4  50

This is an Outlier

To illustrate what happens to the mean when an outlier is present, let’s go back to this distribution:

1  2  2  3  3  3  4  4  5

Let’s say one student, instead of studying 5 hours, studies 23 hours a day!

Watch what happens to the mean:

Before

1  2  2  3  3  3  4  4  5

Mean = (1 + 2 + 2 + 3 + 3 + 3 + 4 + 4 + 5) / 9 = 27 / 9 = 3

After

1  2  2  3  3  3  4  4  23

Mean = (1 + 2 + 2 + 3 + 3 + 3 + 4 + 4 + 23) / 9 = 45 / 9 = 5

Just by changing one value from “5” to “23”, the mean increased by 2 (from “3” to “5”)

Thus, the mean is very sensitive to outliers
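A small Python sketch makes the before-and-after comparison easy to repeat with other values. The helper name `mean` and the list names are assumptions for this illustration.

```python
def mean(values):
    """Return the sum of the values divided by how many there are."""
    return sum(values) / len(values)

before = [1, 2, 2, 3, 3, 3, 4, 4, 5]    # original data set
after = [1, 2, 2, 3, 3, 3, 4, 4, 23]    # same data, with 5 replaced by 23

print("mean before:", mean(before))
print("mean after: ", mean(after))
print("shift caused by one outlier:", mean(after) - mean(before))
```

Try replacing 23 with a larger number: the further out the single value sits, the further the mean is pulled toward it.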

Therefore, the mean is not a good estimate of the center of a distribution when the distribution is NOT NORMAL

So, how do you know if your data is normally distributed?

Go to the Learning Module entitled: Assessing Skew. You will find it next to the link for this presentation.

After you have viewed that learning module, use SPSS to assess the skew of your data.

Is your data normally distributed or skewed?

If your data is skewed, with a critical ratio greater than 2.0 or less than -2.0, then select

Skewed

Otherwise select

Normal
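For readers without SPSS at hand, the decision rule above can be approximated in Python. This is a rough sketch under stated assumptions, not the procedure from the Assessing Skew module, which is not reproduced here. It assumes the critical ratio is the sample skewness divided by its standard error, and it uses the adjusted sample skewness and standard-error formulas commonly given for statistical packages. Only the 2.0 cutoff comes from the text above; the function names are hypothetical.

```python
import math

def skew_critical_ratio(values):
    """Assumed definition: sample skewness divided by its standard error.

    Requires at least 3 values and some variation in the data.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    g1 = m3 / m2 ** 1.5
    # Adjusted (sample) skewness
    skewness = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    # Standard error of the sample skewness
    std_error = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skewness / std_error

def classify(values, cutoff=2.0):
    """Apply the rule from the text: beyond +/- cutoff means Skewed."""
    ratio = skew_critical_ratio(values)
    return "Skewed" if ratio > cutoff or ratio < -cutoff else "Normal"

print(classify([1, 2, 2, 3, 3, 3, 4, 4, 5]))    # symmetric data set
print(classify([1, 2, 2, 3, 3, 3, 4, 4, 23]))   # data set with an outlier
```

Treat the result as a cross-check only; use the values that SPSS reports when you make your selection.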

When your distribution is normal, it is appropriate to use the mean to describe the center point of the distribution.

With normal distributions, use the mean in your calculations

With skewed distributions, use the median

What is the median?

The median is simply the middle score of a data set where
• 50% of the scores fall below it and
• 50% of the scores are above it

To illustrate, let’s go back to this distribution:

1  2  2  3  3  3  4  4  5

With the median, we simply determine the midpoint:

4 units        Median        4 units
1  2  2  3        3          3  4  4  5

This is the Median
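Finding the midpoint can be sketched in Python by sorting the values and picking the one in the middle. The function name `median` is an assumption for this illustration. Averaging the two middle values when the count is even is a common convention that the presentation does not cover, since its data set has nine values.

```python
def median(values):
    """Return the middle value of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one value sits in the middle
        return ordered[mid]
    # Even count (assumed convention): average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

hours = [1, 2, 2, 3, 3, 3, 4, 4, 5]
print("median:", median(hours))
print("values below the midpoint:", sorted(hours)[:4])
print("values above the midpoint:", sorted(hours)[5:])
```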

Notice that the Median is unaffected by outliers

To illustrate this, we’ll change the value “5” to a “10”:

1  2  2  3  3  3  4  4  10

Watch what happens to the median:

4 units        Median        4 units
1  2  2  3        3          3  4  4  10

Hmm, it’s still 3

But what if we change the value 10 to 1,000?

Watch again what happens to the median:

4 units        Median        4 units
1  2  2  3        3          3  4  4  1,000

What do you know – It’s still 3
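The three versions of the data set can be compared side by side with Python's standard `statistics` module. This is an illustrative sketch; the list names are assumptions, and the values are the ones used in the slides.

```python
from statistics import mean, median

original = [1, 2, 2, 3, 3, 3, 4, 4, 5]
with_10 = [1, 2, 2, 3, 3, 3, 4, 4, 10]
with_1000 = [1, 2, 2, 3, 3, 3, 4, 4, 1000]

# The median stays put while the mean follows the largest value
for label, data in [("5", original), ("10", with_10), ("1,000", with_1000)]:
    print(f"largest value {label}: mean = {mean(data):.2f}, median = {median(data)}")
```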

Here is the key takeaway:

The mean is affected by outliers

The median is not affected by outliers

Therefore the mean is used with more or less NORMAL DISTRIBUTIONS

And the median is used with SKEWED OR NON-NORMAL DISTRIBUTIONS

So, based on your analysis, which distribution best reflects your data set?

Normal

Skewed
