Hypothesis Testing

The logic of statistical hypothesis testing follows the logic of judicial decision making.

A jury is asked to decide whether a defendant is guilty or not guilty. It is a dichotomous decision, guilty or not guilty. There is no in-between or partial decision. The jury does not begin its decision-making process in a neutral position.

The default position is “not guilty.”

The prosecution must mount enough evidence to convince the jury to move from its default position of not guilty to a verdict of guilty.

The jury will make a decision which may or may not coincide with reality.

When the jury decides “not guilty” and the defendant is, in reality, not guilty, they have made a correct decision called a “true negative” decision.

It is true because the not guilty (negative) decision aligns with the not guilty (negative) reality.

not guilty

and I really wasn’t guilty!

true negative

When the jury decides “guilty” and the defendant is, in reality, guilty, they have made a correct decision called a “true positive” decision.

It is true because the guilty (positive) decision aligns with the guilty (positive) reality.

guilty

and I really WAS guilty!

true positive

When the jury decides “not guilty” and the defendant is, in reality, guilty, they have made an incorrect decision called a “false negative error” which is also called a Type II or beta error. It is false because the “not guilty” (negative) decision does not align with the guilty (positive) reality.

not guilty

false negative

When a jury decides “guilty” and the defendant is, in reality, not guilty, they have made an incorrect decision called a “false positive error” which is also called a Type I or alpha error. It is false because the “guilty” (positive) decision is not aligned with the not guilty (negative) reality.

guilty

but I really WASN’T guilty!

false positive

Although we prefer correct decisions, if we cannot be correct, we prefer the false negative error over the false positive error.

In other words, you’d rather render a “NOT GUILTY” verdict when there is GUILT than a “GUILTY” verdict where there is NO GUILT.

not guilty

guilty

In judicial decisions we would rather let a guilty defendant go free . . .

than convict and imprison an innocent defendant.

Our default position of “not guilty” supports this preference and protects against the least favorable condition.

Review the following slide and answer the questions that follow:

What type of decision is made when a guilty (+) verdict is rendered and the person is guilty (+)?

What type of decision is made when a not guilty (-) verdict is rendered and the person is not guilty (-)?

What type of decision is made when a guilty (+) verdict is rendered and the person is not guilty (-)?

What type of decision is made when a not guilty (-) verdict is rendered and the person is guilty (+)?

Each conviction protects against Type I error at a different stringency according to the gravity of the punishment to be imposed.

The haunting reality is that we really never know the reality of the guilt or innocence of defendants.

We make our best decisions knowing that there is a probability that we have made an error.

Judicial Decisions

Statistical hypothesis testing and decision-making are directly analogous to judicial decision making.

Statistical Decisions

Let’s consider an example:

A statistician is asked to decide whether a difference exists between two groups of people in terms of some attribute (e.g., excitability). It is a dichotomous decision (meaning only two options), different or not different. There is no in-between or partial decision.

The statistician does not begin her decision-making in a neutral position. The default position is “not different.” This is also called the “null hypothesis.”

The research findings must present sufficient evidence to convince the statistician to move from her default position of no difference to a conclusion that the groups are different in terms of the attribute.

The statistician will make a decision which may or may not coincide with reality.

The apparent differences may be due to chance, or they may be real (something that is really happening).

When the statistician decides “not different” (fails to reject the null hypothesis, maintains the default position) and the groups are, in reality, not different, she has made a correct decision called a “true negative decision.”

true negative

It is true because the “no difference” (negative) decision aligns with “no difference” reality.

not guilty

and I really WASN’T guilty!

true negative

When the statistician decides that there is a difference (rejects the null hypothesis, moves off of the default position) and the groups are, in reality, different, she has made a correct decision called a true positive decision.

true positive

It is true because the “different” (positive) decision aligns with the “different” (positive) reality.

guilty

true positive

When the statistician decides “not different” (fails to reject the null hypothesis, maintains the default position) and the groups are, in reality, different, she has made a false negative error (also called a Type II or beta error).

false negative

It is false because the decision of no difference (negative) does not align with difference (positive) reality.

false negative

not guilty

Ha ha! and I really

WAS guilty!

Although we prefer correct decisions, if we cannot be correct, we then prefer false negative error over the alternative error.

When a statistician decides that there is a difference (positive) between the groups and rejects the null hypothesis of no difference and, in reality, there is no difference, she has made a false positive error (also called a Type I error or alpha error).

false positive

It is false because the “difference” (positive) decision does not align with the “no difference” (negative) reality.

guilty

false positive

Our hypothesis testing conventions protect against false positive (Type I) error by holding a default position of the null hypothesis.
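The four possible outcomes described so far can be summarized in a few lines of Python. This is only an illustrative sketch: the function name `classify_decision` and its two arguments are hypothetical, and in real research the second argument (the reality) is never actually known.

```python
def classify_decision(rejected_null: bool, difference_is_real: bool) -> str:
    """Name the outcome of one decision, given the (normally unknowable) reality."""
    if rejected_null and difference_is_real:
        return "true positive"
    if not rejected_null and not difference_is_real:
        return "true negative"
    if rejected_null and not difference_is_real:
        return "false positive (Type I, alpha error)"
    return "false negative (Type II, beta error)"


# Walk through all four combinations of decision and reality.
for rejected in (False, True):
    for real in (False, True):
        print(f"rejected null: {rejected}, real difference: {real} -> "
              f"{classify_decision(rejected, real)}")
```

Rejecting the null corresponds to the “guilty” verdict, and keeping it corresponds to “not guilty.”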

α: Beware of Type I Error

We set a standard of evidence that is required before rejecting the default null hypothesis.

The standard of evidence is based on the probability density of the sampling distribution.

Using probability density we can estimate the probability of Type I error.

If the mean of the sample is here, then we have a .0001 or .01% chance that we made a Type I error.

Or in other words, we have a .01% chance of rejecting the null hypothesis (claiming that the group scores come from two different populations, i.e., “guilty”) and being wrong because both groups were really part of the same population (“not guilty”).
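As a rough sketch of how such a probability can be read off a sampling distribution, the snippet below assumes the sampling distribution is normal. The centre of 10 and the standard error of 1.3 are hypothetical values chosen for illustration, not numbers taken from the slides, and `two_tailed_probability` is an illustrative helper name.

```python
from statistics import NormalDist

# Hypothetical sampling distribution under the null hypothesis.
# The mean of 10 and standard error of 1.3 are illustrative assumptions.
null_distribution = NormalDist(mu=10.0, sigma=1.3)


def two_tailed_probability(sample_mean: float, dist: NormalDist) -> float:
    """Probability of a sample mean at least this far from the centre if the null is true."""
    distance = abs(sample_mean - dist.mean)
    lower_tail = dist.cdf(dist.mean - distance)
    return 2 * lower_tail  # the normal curve is symmetric, so both tails are equal


for observed in (10.5, 12.5, 15.0):
    p = two_tailed_probability(observed, null_distribution)
    print(f"sample mean {observed:>5}: tail probability = {p:.4f}")
```

The further the sample mean lies from the centre, the smaller the tail probability, and the smaller the estimated chance that rejecting the null is a Type I error.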

When the probability of Type I error is at a low enough level, as in our previous example, we reject the default null hypothesis.

The conventional level of tolerable Type I error is .05.

.05 or 5% chance that we selected a sample from this population and claimed it was a sample from another population = false positive

This means that out of 100 similar decisions made when there is really no difference, we would expect to be wrong (make a Type I error) no more than about 5 times.

[Figure: sampling distribution of sample scores, ranging from 05 to 16 and peaking near 10]
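The “out of 100 decisions” reading of the .05 level can be made concrete with the binomial formula. The sketch below assumes 100 independent decisions in which the null hypothesis is true every time; the function name `prob_type_i_count` is hypothetical.

```python
from math import comb


def prob_type_i_count(k: int, n_decisions: int = 100, alpha: float = 0.05) -> float:
    """Binomial probability of exactly k Type I errors in n independent decisions,
    assuming the null hypothesis is true in every one of them."""
    return comb(n_decisions, k) * alpha ** k * (1 - alpha) ** (n_decisions - k)


expected = 100 * 0.05
at_most_five = sum(prob_type_i_count(k) for k in range(6))

print(f"expected Type I errors in 100 decisions: {expected:g}")
print(f"chance of 5 or fewer Type I errors: {at_most_five:.3f}")
```

Five errors is the long-run average rather than a guarantee: any particular run of 100 decisions may contain a few more or a few fewer.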

One advantage that statisticians have over juries is that we can estimate the probability of Type I error while they cannot.

I can estimate the probability of being right or wrong

Not sure of the probability of being right or wrong

(Or, at least it is easier for us to do so than for them. There is some recent research in rape cases that has estimated how frequently juries make Type I errors in such cases.)

Even so, we do not get to make the similar decision 100 times.

We tend to make the decision once. The haunting reality is that we never know whether this one decision is one of those Type I errors.

In other words, suppose we take a sample of 30 persons and get a score of 7, and then another sample and get a score of 12, and another with a score of 11, and so on and so on until the distribution below emerges.

[Figure: sampling distribution of sample scores, ranging from 05 to 16 and peaking near 10]

But since, in real life, we usually take only one sample of 30 for our research purposes, we don’t know if the sample was selected from the far left of the distribution, the far right, or the middle. So, we examine the probability that the sample did or did not come from the far left or the far right.

Hmm . . . What are the chances the sample came from the far right or left of the distribution?
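The idea of drawing sample after sample until a sampling distribution emerges, and then asking where its far left and far right begin, can be sketched as a small simulation. The population used here (normal, mean 10, standard deviation 8) is an assumption made purely for illustration, and `sampling_distribution` is a hypothetical helper.

```python
import random


def sampling_distribution(n_samples: int, sample_size: int = 30, seed: int = 1) -> list:
    """Means of repeated random samples drawn from one assumed population."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        # Assumed population of scores: normal, mean 10, standard deviation 8.
        sample = [rng.gauss(10.0, 8.0) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    return means


means = sorted(sampling_distribution(10_000))

# Empirical cut-offs: the lowest 2.5% and the highest 2.5% of sample means.
low_cut = means[int(0.025 * len(means))]
high_cut = means[int(0.975 * len(means))]
in_tails = sum(1 for m in means if m < low_cut or m > high_cut) / len(means)

print(f"far-left cut-off : {low_cut:.2f}")
print(f"far-right cut-off: {high_cut:.2f}")
print(f"share of sample means in the two tails: {in_tails:.3f}")
```

A single real sample can then be compared with these cut-offs: a mean beyond either one belongs to the rare 5% of outcomes.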

So let’s say we want to know if the students who go to a college party are more excited to be there than little girls at a birthday party.

Here are the sampling distributions of the excitability of young girls at a birthday party.

Let’s say we don’t have the same kind of distribution for college student excitability at a party.

We want to know if there is a statistical difference between the girls at the birthday party and the excitability of college students at a Friday night party.

We randomly select a group of college students at a party and measure their levels of excitability.

Our random selection is “13”. Since this number does not lie in the extreme ends, we would not reject the null hypothesis; that is, we would render a judgment of “not guilty”. College students and little girls show no difference.

However, what if we randomly selected a college student sample with an average excitability value of “05”? Wow! This is a rare occurrence.

Because the chance of that happening is so rare, we would reject the null hypothesis. We would say “guilty!” But if in reality there is no difference, then we have made a Type I error.

Researchers are willing to take that chance.
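The two verdicts in this example can be reproduced with a small decision rule. The sketch assumes a normal sampling distribution for the girls centred at 10 (roughly the centre of the slide’s figure) with a hypothetical spread of 2; the names `girls`, `ALPHA`, and `decide` are illustrative only.

```python
from statistics import NormalDist

# Assumed sampling distribution for the birthday-party girls.
# The centre of 10 and the spread of 2 are hypothetical values.
girls = NormalDist(mu=10.0, sigma=2.0)
ALPHA = 0.05  # conventional tolerable level of Type I error


def decide(sample_mean: float) -> str:
    """Return the verdict for one observed college-student sample mean."""
    distance = abs(sample_mean - girls.mean)
    p_value = 2 * girls.cdf(girls.mean - distance)  # two-tailed probability
    if p_value < ALPHA:
        verdict = "reject the null (guilty)"
    else:
        verdict = "keep the null (not guilty)"
    return f"mean {sample_mean:g}: p = {p_value:.3f} -> {verdict}"


print(decide(13))
print(decide(5))
```

Under these assumptions a mean of 13 is not unusual enough to leave the default position, while a mean of 5 is.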

In conclusion, hypothesis testing is a way of determining the probability of our default position (not guilty or no difference) being correct or incorrect.

We determine the likelihood of being right or wrong based on the results. Then we decide if we are willing to maintain our default position (no difference) or go out on a limb and change our default position (yes, there is a difference).

What follows are exercises to help you check your understanding.

Go as far as you feel you need to until you have a good feel for what you know.

First Set of Questions

1. Which expression below from the world of judicial decision-making best describes the “null hypothesis”?
A. “Guilty as charged”
B. “Innocent until proven guilty”
C. “Pleading no contest”

2. What is another way to say “null hypothesis”?
A. Not clear
B. Not different
C. Not important

With hypothesis testing we are attempting to set up a default position of not guilty. We stay in that position unless we have enough evidence to overturn it. Let’s say our null hypothesis is the following:

There is no difference in IQ between children who are exposed to classical music between the ages of 0 and 3 and those who are not.

This is our default position. We are not neutral; we are claiming at the outset that there is no difference.

But then along comes some evidence that overturns that position. So we reject the null hypothesis and claim there is a probable difference.

Notice how we don’t say “there is a difference”. We say there is a probable or statistical difference. This just means that with statistics we are never 100% certain. We just say that the probability that we are wrong is a certain percent. Usually that percent needs to be pretty low.

If we have estimated that there is a 60% chance that we are wrong, that is a risk not worth taking. If you were told that you had a 60% chance of losing a lot of money and a 40% chance of making a lot of money, would you take that chance?

Probably not. But if you were told that you had only a 5% chance of losing a lot of money and a 95% chance of earning a lot, that might be a chance you would be willing to take. The same holds true with hypothesis testing.

Based on that instruction, consider your answer to these questions again and explain the correct answer in your own words.

Second Set of Questions – see if you can answer these questions, if not go to the instruction that follows and you’ll be given an opportunity to respond to the questions armed with the instruction.

3. When the jury decides “not guilty” and the defendant really is “not guilty”, in statistics that is the same as saying:

A. ACCEPT the null hypothesis and it turns out - - - you were right to do so.

B. REJECT the null hypothesis and it turns out - - - you were right to do so.

4. When the jury decides “guilty” and the defendant actually was “not guilty”, in statistics that is the same as saying:

A. ACCEPT the null hypothesis and it turns out - - - you were wrong to do so.

B. REJECT the null hypothesis and it turns out - - - you were wrong to do so.

5. When the jury decides “not guilty” and the defendant actually was “guilty”, in statistics that is the same as saying:

6. When the jury decides “guilty” and the defendant really is “guilty”, in statistics that is the same as saying:

Accepting the null-hypothesis is essentially like saying “not guilty” or that we accept the default position of innocence or no difference.

Rejecting the null-hypothesis is essentially like saying “guilty” or that we reject the default position of innocence or there is enough evidence to suggest there is a difference.

Here is a visual:

Null-hypothesis ACCEPTED!

I was found NOT GUILTY!

Na, na, . . . nanana! There is NOT enough statistical evidence to convict or reject the null-hypothesis!

Not Guilty = Accept the Null

Here is a visual:

Null-hypothesis REJECTED!

I was found GUILTY!

Wa, Wa! There IS enough statistical evidence to convict or reject the null-hypothesis!

Guilty = Reject the Null

Third Set of Questions - see if you can answer these questions, if not go to the instruction that follows and you’ll be given an opportunity to respond to the questions armed with the instruction.

7. When the jury decides “guilty” (reject the null) and the defendant actually was “not guilty” (shouldn’t have rejected the null), what type of error has been committed?

A. Type I error
B. Type II error

8. When the jury decides “not guilty” (accept the null) and the defendant actually was “guilty” (should have rejected the null), what type of error has been committed?

A. Type I error
B. Type II error

9. Which type of error is preferable?

A. Type I error
B. Type II error

10. Question: What is the haunting reality? Answer: We actually never know for sure if we have committed a Type I or II error. All we are doing is determining the probability that we . . .

A. have committed an error.
B. are correct in our hypothesis.

Let’s consider each type of error

1. State your null-hypothesis: There is no significant difference between females and males in terms of their preference of certain sports-car colors.
2. Collect your evidence.
3. Determine if the evidence merits accepting or rejecting the null-hypothesis.
4. You accept the null.
5. In reality (and you could never know this for sure) you were wrong. In actuality there is a difference between men’s and women’s sports-car color preferences and you should have rejected the null.
6. This is a Type II error.

1. State your null-hypothesis: There is no significant difference between females and males in terms of their preference of certain sports-car colors.
2. Collect your evidence.
3. Determine if the evidence merits accepting or rejecting the null-hypothesis.
4. You reject the null.
5. In reality (and you could never know this for sure) you were wrong. In actuality there is NO difference between men’s and women’s sports-car color preferences and you should have accepted the null.
6. This is a Type I error.

You’ll never know if you committed a type I or II error.

You can only estimate the probability that you did!

That’s because with statistics we deal in probability, not certainty.
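Although a real study never reveals which error (if any) occurred, a simulation can fix the “reality” in advance and count how often the decision disagrees with it. The sketch below is built on stated assumptions only: the null value of 10, the standard deviation of 2, the sample size of 30, and the “real” mean of 10.8 are all hypothetical, as is the function name `error_rate`.

```python
import random
from statistics import NormalDist


def error_rate(true_mean: float, n_trials: int = 5_000, seed: int = 2) -> float:
    """Share of simulated studies whose decision disagrees with the simulated reality."""
    rng = random.Random(seed)
    null_mean, sd, n, alpha = 10.0, 2.0, 30, 0.05  # all hypothetical values
    null_dist = NormalDist(null_mean, sd / n ** 0.5)
    # Two-tailed cut-offs that leave alpha / 2 in each tail of the null distribution.
    low, high = null_dist.inv_cdf(alpha / 2), null_dist.inv_cdf(1 - alpha / 2)
    difference_is_real = true_mean != null_mean

    errors = 0
    for _ in range(n_trials):
        sample_mean = sum(rng.gauss(true_mean, sd) for _ in range(n)) / n
        rejected = not low <= sample_mean <= high
        if rejected != difference_is_real:
            errors += 1  # Type I if the null was true, Type II if it was false
    return errors / n_trials


print(f"Type I rate  (no real difference): {error_rate(10.0):.3f}")
print(f"Type II rate (real difference)   : {error_rate(10.8):.3f}")
```

The Type I rate stays near the chosen .05 level, while the Type II rate depends on how large the real difference is and on the sample size.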

Based on the instruction you just received, respond to these questions again. Explain your reasoning for selecting the options you did.

Answers: 7-A, 8-B, 9-B, 10-A

Fourth Set of Questions - see if you can answer these questions, if not go to the instruction that follows and you’ll be given an opportunity to respond to the questions armed with the instruction.

11. Question: How do we decide how much evidence is required before we will reject the null hypothesis?

Answer: We estimate the probability of being ______ a certain percent of the time (e.g., .05 or 5% of the time).

a. right
b. wrong

12. Question: What does a .05 rejection level mean?

Answer: If we were to take the same small sample 100 times from a population, we would be willing to _____________________ .05 or 5% of the time.

a. . . . take the chance of being wrong . . .
b. . . . reject the null hypothesis . . .

In statistics we generally ask ourselves, “What is the probability that we have made a Type I error?”

Type I errors are considered a bigger issue because if we are wrong, then we might waste a lot of money or impact people negatively (e.g., spend millions of dollars on a new drug that doesn’t work).

Type II errors are considered less of an issue because if we are wrong, then we may simply stop or continue researching.

We have to determine a cut-off point as to when we will reject the null-hypothesis. No matter what cut-off point we choose, the decision will always be somewhat arbitrary.

Would we be satisfied with a 75% chance of committing a type I error? Probably not. That means out of 100 experiments we would live with being wrong about our conclusions 75 times.

Would we be satisfied with a .01% chance of committing a type I error? Probably not. That means out of 10,000 experiments we would live with being wrong about our conclusions only once. If that were the case, then almost no null-hypothesis could ever be rejected.

In the discipline of statistics, a .05 or 5% chance of committing a Type I error has been deemed an acceptable, if arbitrary, cut-off point. This means that out of 100 experiments we will live with being wrong five times.
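To see why a very strict cut-off makes the null hypothesis hard to reject, the sketch below converts each candidate cut-off into the distance from the null value (in standard errors) that a result must exceed. It assumes a normal sampling distribution and a two-tailed test; `critical_z` is a hypothetical helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def critical_z(alpha: float) -> float:
    """Two-tailed cut-off, in standard errors from the null value, for a given alpha."""
    return standard_normal.inv_cdf(1 - alpha / 2)


# The three cut-offs discussed above: 75%, 5%, and .01%.
for alpha in (0.75, 0.05, 0.0001):
    expected_errors = alpha * 10_000
    print(f"alpha = {alpha:<6}: reject beyond ±{critical_z(alpha):.2f} standard errors; "
          f"about {expected_errors:g} Type I errors per 10,000 true-null experiments")
```

A lenient cut-off rejects on almost any result, and an extremely strict one demands a result so far out that it is rarely reached. The .05 level sits between the two.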

Based on the instruction you just received, respond to these questions again. Explain your reasoning for selecting the options you did.

Answers: 11-B, 12-A
