CHAPTER 6faculty.tamucc.edu/jguardiola/Math 1442/JMP Labs... · Class Exercise: Assessing normality, Using results from the previous exercise we will assess normality of the sampling

Chapter 6 Page 1

CHAPTER 6

The Normal Probability Distribution

The normal probability distribution is the most widely used distribution in statistics as many statistical

procedures are built around it. The central limit theorem is probably the main reason for

the importance of the normal distribution. It is essential for statistics students to learn how to use the

normal probability distribution for solving applied problems. In this Chapter we are going to study the

normal probability distribution using the appropriate functions in JMP. Also, we are going to perform

simulations using a random function to generate a normally distributed random variable with a specified

mean and standard deviation. We are going to perform a statistical experiment to demonstrate

numerically the central limit theorem, and finally we are going to assess the normality of a given

dataset.

Class Exercises: Compute probabilities for the normal distribution

Class example 1:

According to the National Health Survey, heights of adult males are normally distributed with a mean of

69” and a standard deviation of 2.8”. Compute the percentage of the population of adult males that falls

between 64” and 76”.

First, let’s open a new data table,

Figure 6.1

then, right click on the heading of “Column 1” and select “Column Info”,

Figure 6.2

click on the text box for “Column Name” and change the name to “x”, as follows:

Figure 6.3

left click twice in the empty space to the right of the first column heading to add a new column

Figure 6.4

you can save the file as “Normal Dist” (or anything you like), then right click over “Column 2”, select

“Column Info” and change the name to “P(x)”, then click over “Column Properties” and select “Formula”,

as shown below,

Figure 6.5

a new window will open, then choose “Probability” from “Functions (grouped)” and select Normal

Distribution as shown below

Figure 6.6

then click twice over the variable “x” and click inside the parentheses; after the variable x, type “,69,2.8” (the mean and the standard deviation), as shown below:

Figure 6.7

click over “Apply”, you are going to see the following screen:

Figure 6.8

then click “OK” in this window and again in the next window. Next, we want to compute the cumulative probabilities for x = 64 and x = 76, so let’s input these numbers in the first column as shown below:

Figure 6.9

the cumulative probabilities for these numbers are shown above. Thus, the probability that a man selected at random has a height between 64” and 76” is (rounding to three decimal places):

P(64<x<76) = 0.994 - 0.037 = 0.957
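You can cross-check the cumulative probabilities in the figure above outside JMP using the standard normal CDF, Φ(z) = ½[1 + erf(z/√2)] with z = (x - μ)/σ. The Python sketch below is only an illustration and is not part of the JMP procedure. The helper name normal_cdf is hypothetical. The sketch uses the same mean (69) and standard deviation (2.8) that were typed into the JMP formula.

```python
import math


def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal random variable with the given mean and standard deviation."""
    z = (x - mean) / sd                      # standardize: z = (x - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mean, sd = 69.0, 2.8                         # the values typed into the JMP formula

p_below_64 = normal_cdf(64, mean, sd)        # cumulative probability at x = 64
p_below_76 = normal_cdf(76, mean, sd)        # cumulative probability at x = 76
p_between = p_below_76 - p_below_64          # P(64 < x < 76)

print(f"P(x<64) = {p_below_64:.3f}")         # 0.037
print(f"P(x<76) = {p_below_76:.3f}")         # 0.994
print(f"P(64<x<76) = {p_between:.3f}")       # 0.957
```

The subtraction in the last step is the same one done by hand above: the probability of an interval is the difference of two cumulative probabilities.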

Class Exercise 2: We can also perform probability computations using a simulation. For example, let’s generate 10,000 random numbers from a normal distribution with a mean of 69 and a standard deviation of 2.8. To do this, let’s open a new data table as follows,

Figure 6.10

then, right click at the heading of “Column 1” and click over “Column Info”

Figure 6.11

click on the text box for “Column Name” and change the name to “x”, as follows:

Figure 6.12

then click over “Column Properties” and select “Formula”,

Figure 6.13

next, click over “Edit Formula” and select “Random”, then select “Random Normal”

Figure 6.14

click inside the parentheses and input the numbers 69 and 2.8, separated by a comma, as follows:

Figure 6.15

click “OK” in this window and again in the next window, then right click over the first column (below the red arrow) and select “Add Rows…”, as below

Figure 6.16

type 10000 in the dialog box and click “OK”

Figure 6.17

at this point, you are going to see a sequence of randomly generated numbers from a normal

distribution,

Figure 6.18

you can draw a histogram using the “Analyze” menu and choosing the “Distribution” option (see Chapter 3 for more details; this procedure is not shown here). You can check the shape of the distribution and look at the summary statistics, which will be approximately equal to the requested mean and standard deviation (this activity is highly recommended; please ask your lab instructor if you do not know how to do it).

Next, you need to sort the numbers from lowest to highest, by selecting “Tables” and “Sort”, then

choose the variable “x” and click over “By”, you will see the next window

Figure 6.19

click over OK, and you are going to see the sequence of random numbers ordered from lowest to

highest as follows:

Figure 6.20

Computing the simulated probabilities is just a matter of counting the number of observations that match the requirements of this problem. First, count the number of observations that are less than 64”; you can do this by scanning the ordered dataset and looking at the row number on the left side of the screen,

Figure 6.21

In the figure above we can see that there are 349 observations less than 64”, so this probability is computed as follows:

P(x<64) = 349/10,000 = 0.0349

This is close to the probability of 0.0370 computed with the normal distribution formula (see Figure 6.9). Please remember that this is a numeric simulation and the results shown here are approximations to the true probabilities, but this result is close enough.

Next, we need to find the probability that a man selected at random has a height less than 76”. To do this, we count the number of observations that are less than 76” as shown below:

Figure 6.22

we found 9,940 observations that are less than 76”, so the probability of that event is computed as follows:

P(x<76) = 9,940/10,000 = 0.994

Then the probability that a man selected at random has a height between 64” and 76” is:

P(64<x<76) = 0.994 - 0.035 = 0.959, which is very close to the probability computed using the formula (0.957). As you can see, the simulation provided acceptable results!
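The same counting idea can be sketched in Python. This is an illustration under stated assumptions, not a reproduction of JMP. Python's Random.gauss stands in for JMP's Random Normal, and the seed is arbitrary. The counts will not match the 349 and 9,940 found above, although the probabilities should land close to them. Sorting the values and then locating each cutoff with bisect_left mirrors reading the row number in the sorted data table.

```python
import bisect
import random

rng = random.Random(1442)        # arbitrary seed so the run can be repeated
mean, sd, n = 69.0, 2.8, 10_000

# Column "x" filled with Random Normal(69, 2.8), then Tables > Sort (lowest to highest)
x = sorted(rng.gauss(mean, sd) for _ in range(n))

# In a sorted list, bisect_left returns how many values are strictly below the cutoff,
# which plays the role of reading the row number in the sorted data table
count_below_64 = bisect.bisect_left(x, 64)
count_below_76 = bisect.bisect_left(x, 76)

p_below_64 = count_below_64 / n
p_below_76 = count_below_76 / n
p_between = p_below_76 - p_below_64

print(f"{count_below_64} values below 64, {count_below_76} values below 76")
print(f"P(x<64) ~ {p_below_64:.4f}, P(x<76) ~ {p_below_76:.4f}, P(64<x<76) ~ {p_between:.4f}")
```

Changing the seed changes the counts slightly from run to run, just as regenerating the JMP column does. With 10,000 values the simulated interval probability typically stays within about ±0.005 of 0.957.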

Class Exercise: The Central Limit Theorem

Please go to the website:

http://onlinestatbook.com/rvls.html

or search in your browser “Rice virtual labs”

1) Select “Simulations and Demonstrations”, and select “Sampling Distribution Simulation”

2) Select a normal distribution and choose a small sample size, then you can take 50,000 samples

(or more) and look at the graph for the sampling distribution of the mean

3) Select a skewed distribution and choose a small sample size (n = 2 or 5), repeat the same

procedure and see what happens.

4) Select a skewed distribution and choose the largest sample size available (n = 25) and generate the sampling distribution of the means again

5) What are your conclusions? Did you notice any difference among the previous simulations? How

can you relate your findings to the theory studied in class? Please remember the requirements

for the application of the central limit theorem
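If the Rice Virtual Lab applet is unavailable, a short Python sketch can approximate steps 3 and 4. The sketch makes the following assumptions:

- The population is exponential with mean 1. This strongly right-skewed distribution is chosen here for illustration and is not the applet's own skewed population.
- It draws 50,000 samples of size n = 2 and 50,000 of size n = 25.
- The helper names sample_means and skewness are hypothetical.

Skewness serves as a rough numeric summary of shape: values near 0 indicate a symmetric, bell-like sampling distribution.

```python
import random
import statistics


def sample_means(n, reps, rng):
    """Means of `reps` samples of size n drawn from an exponential population with mean 1."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values, mu=m)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


rng = random.Random(2024)        # arbitrary seed
reps = 50_000
results = {}
for n in (2, 25):
    means = sample_means(n, reps, rng)
    results[n] = (statistics.fmean(means), statistics.stdev(means), skewness(means))
    print(f"n={n:2d}: mean={results[n][0]:.3f}  sd={results[n][1]:.3f}  skewness={results[n][2]:.2f}")
```

For an exponential population with mean 1, the sample mean has mean 1, standard deviation 1/√n and skewness 2/√n. The n = 25 sampling distribution should therefore be much narrower and much closer to symmetric than the n = 2 one. This is the pattern the applet is meant to show.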

Now, let’s do a simulation using JMP. We are going to generate an integer uniform distribution using the numbers 1 to 10, and we are going to obtain samples from this distribution.

First, let’s open a new data table:

Figure 6.23

then, right click over the heading of “Column 1” and select “Column Info”,

Figure 6.24

choose “Formula” from “Column Properties” and select “Edit Formula”

Figure 6.25

select “Random” and “Random Integer” as follows,

Figure 6.26

type 1 inside the red box and hit Enter, then type “,” and 10; you should see the following window

Figure 6.27

then hit Enter, click “OK” in this window and click “OK” again in the next window. You are not going to see any changes in the data window because we still have to add some rows. To do this, right click over the cell below the red triangle and select “Add Rows…” as follows

Figure 6.28

type 200 inside the box

Figure 6.29

you can see randomly generated numbers from 1 to 10,

Figure 6.30

then left click twice over the space to the right of “Column 1” and keep doing that until you generate 4

new columns as follows

Figure 6.31

next, right click over the heading of “Column 1” and select “Copy Column Properties”

Figure 6.32

then right click over the heading of each new column and select “Paste Column Properties”; repeat this procedure for each column

Figure 6.33

you are going to see 5 columns with integer random numbers ranging from 1 to 10

Figure 6.34

now, let’s compute the mean for each row and put these results in Column 6. Let’s generate a new column by double clicking on the space to the right of the heading of Column 5. Then, as we have done before, right click over the heading of the new column, select “Column Info”, select “Formula” from “Column Properties” and click over “Edit Formula” (as in Figures 6.1 to 6.4); then choose “Statistical” from “Functions” and select “Mean” from the menu as follows,

Figure 6.35

then, click inside the parentheses and click twice over “Column 1” under “Table Columns”, type a comma and click over “Column 2”, and so on, until you have added all the columns up to “Column 5”. Your formula should look like this:

Figure 6.36

click “OK” in this window and again in the next window. Now you can see the mean computed for every row. The interesting thing about the new column is that it contains the sampling distribution of the means of samples of size 5 taken from a uniform probability distribution of integers ranging from 1 to 10.

It will be interesting to take a look at the properties of the sampling distribution of the means that we got in Column 6. With that purpose in mind, let’s choose the “Analyze” menu and select “Distribution”, then click over “Column 6”, click over “Y, Columns”, and click over “OK”. You are going to obtain a histogram for the sampling distribution of the means. You can see a bell-shaped distribution with a mean of 5.618 and a standard deviation of 1.278534 (results may vary). You can get a horizontal layout by choosing this option from the “Display Options” located under the second red triangle. Notice that the mean of your sampling distribution approximates the mean of the uniform distribution of the integers from 1 to 10 (which is 5.5).

Figure 6.37

Also, you should observe that the sampling distribution of the means approximates a normal distribution, even though the original population is uniform with integers ranging from 1 to 10 and we used a small sample size (n = 5). The next step is to check your sampling distribution of the means for normality.
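This exercise can be mirrored in a short Python sketch as a cross-check of the JMP output. The sketch uses 200 rows, five columns of random integers from 1 to 10, and a row-mean column. Python's random.randint(1, 10), which includes both endpoints, is assumed to play the role of JMP's Random Integer(1, 10). The seed is arbitrary.

The sketch also computes the values the CLT predicts. For the integers 1 to 10 with equal probability, μ = (1 + 10)/2 = 5.5 and σ² = (10² - 1)/12 = 8.25. The standard deviation of the mean of 5 values is therefore √(8.25/5) ≈ 1.285, in line with the 1.278534 reported above.

```python
import math
import random
import statistics

rng = random.Random(6)   # arbitrary seed; results vary from run to run, as they do in JMP
rows, cols = 200, 5

# Columns 1-5: a random integer from 1 to 10 in every cell
table = [[rng.randint(1, 10) for _ in range(cols)] for _ in range(rows)]

# Column 6: Mean(Column 1, ..., Column 5) for each row
row_means = [statistics.fmean(row) for row in table]

# Values predicted by the CLT for a discrete uniform population on 1..10
k = 10
mu = (1 + k) / 2                       # population mean, 5.5
sigma = math.sqrt((k ** 2 - 1) / 12)   # population standard deviation, about 2.872
se = sigma / math.sqrt(cols)           # standard deviation of the mean of 5 values, about 1.285

print(f"simulated: mean = {statistics.fmean(row_means):.3f}, sd = {statistics.stdev(row_means):.3f}")
print(f"predicted: mean = {mu:.3f}, sd = {se:.3f}")
```

The simulated mean and standard deviation change with the seed, but they should stay close to the predicted 5.5 and 1.285.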

Class Exercise: Assessing normality

Using the results from the previous exercise, we will assess the normality of the sampling distribution of the means located in “Column 6”. Let’s proceed as follows:

click over the lower right triangle on the window shown in Figure 6.37 and select “Continuous Fit”, then

select “Normal”

Figure 6.38

This option overlays a normal curve on the histogram as shown below, but this is probably not enough to assess normality,

Figure 6.39

then click the lower right triangle again and choose the option “Normal Quantile Plot”

Figure 6.40

At this point, you can see a Q-Q plot (normal quantile plot) for the data in “Column 6” as shown below

Figure 6.41

we can see that the Q-Q plot follows a straight-line pattern (more or less) and the dots are located within the red dotted curves. There is no obvious pattern in the Q-Q plot; therefore we can accept normality of the sampling distribution of the means, as predicted by the central limit theorem (even though in this case the sample size was small).
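A normal quantile plot pairs each ordered observation with the value a standard normal distribution would be expected to take at the same rank. The Python sketch below computes those pairs and summarizes how straight the pattern is with their correlation coefficient; values close to 1 indicate a nearly straight line. This is only an approximation of what JMP does, for three reasons:

- The plotting position (i - 0.5)/n is an assumption and may differ from the one JMP uses.
- The correlation summary is a substitute numeric check, not a statistic JMP reports here.
- The sketch does not compute JMP's red confidence curves.

The helper names normal_quantile_pairs and qq_correlation are hypothetical.

```python
import math
import random
import statistics


def normal_quantile_pairs(data):
    """(expected standard normal quantile, ordered observation) pairs for a Q-Q plot."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = statistics.NormalDist()     # mean 0, standard deviation 1
    # Assumed plotting position (i - 0.5) / n for the i-th smallest value
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


def qq_correlation(data):
    """Pearson correlation of the Q-Q pairs; values near 1 mean a nearly straight line."""
    pairs = normal_quantile_pairs(data)
    q = [p for p, _ in pairs]
    y = [v for _, v in pairs]
    mq, my = statistics.fmean(q), statistics.fmean(y)
    sqy = sum((a - mq) * (b - my) for a, b in zip(q, y))
    sqq = sum((a - mq) ** 2 for a in q)
    syy = sum((b - my) ** 2 for b in y)
    return sqy / math.sqrt(sqq * syy)


rng = random.Random(7)  # arbitrary seed
# 200 row means of 5 random integers from 1 to 10, as in the previous exercise
row_means = [statistics.fmean(rng.randint(1, 10) for _ in range(5)) for _ in range(200)]
# For contrast: 200 values from a strongly right-skewed (exponential) population
skewed = [rng.expovariate(1.0) for _ in range(200)]

print(f"row means:   r = {qq_correlation(row_means):.4f}")
print(f"skewed data: r = {qq_correlation(skewed):.4f}")
```

A correlation close to 1 corresponds to the straight-line pattern described above. The exponential sample, which is strongly skewed, typically gives a noticeably smaller value. The correlation is only a rough summary and does not replace looking at the plot itself.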

Class Exercises:

1- Probability functions: Consider that women’s heights are normally distributed with a mean of

63.6” and a standard deviation of 2.5”; then answer the following questions using the function

“Normal Distribution” as in class example 1 (shown at the beginning of this Chapter).

a. Find the probability that a woman selected at random is between the heights of 60” and

66”.

b. Find the probability that a woman selected at random is taller than 69”

2- Simulations: Solve the previous problems using a simulation (Generate a sequence of 10,000

normally distributed random numbers). Compare the simulated results with the computed

probabilities from problem 1.

3- Central Limit Theorem: Generate 4 columns with 250 numbers in each column, using a random

normal distribution with a mean of 63.6” and a standard deviation of 2.5”

a. Compute the mean of each row and place it in the fifth column

b. Analyze the sampling distribution of the means on the fifth column, obtain summary

statistics, describe the shape of the distribution and make comments

c. Compare the population mean with the mean of the sample means in the fifth column. Are they similar?

d. Compare the standard deviation of the fifth column with the standard deviation of the population. How are they related? (Hint: take a look at the CLT)

e. Discuss your findings with your classmates

Team Assignment: Assessing Normality

Use the random sample that you obtained from the file “Small Town.xls” and do the following:

1- Assess normality using a Q-Q plot (Normal Probability Plot) for all numeric variables

2- Write a report showing your findings:

a. Show a histogram for each continuous variable

b. Show a Q-Q plot (normal probability plot) for each numeric continuous variable

c. Based on the previous graphs, discuss whether normality is acceptable for each variable, and briefly write the reasons that support your conclusion

d. Explore transformations for the variables for which normality was not acceptable, that is, apply a mathematical function such as the logarithm or the square root to every value, and discuss whether the results are different (better) than before

e. Summarize your findings in a table, showing which variables can be considered normally distributed and which cannot, and specify whether a transformation was applied to achieve normality

3- Choose a variable that is normally distributed and compute its mean and standard deviation. Simulate a normal random variable with these parameters and find the probability that one observation is between 1.5 standard deviations below the mean and 1.2 standard deviations above the mean. Compare the result obtained by simulation with the probability for a standard normal distribution, P(-1.5 < z < 1.2).