Lecture 19. Chi-square and Correlation

Chi-square Test and Correlation

Statistics 101 - Villejo

Objectives

• Know the assumptions of the chi-square test for independence.

• Perform a chi-square test for independence.

• Learn the concept of correlation.

• Compute and interpret the coefficient of correlation.

• Perform the test of significance on the coefficient of correlation.

Chi-square Distribution

a. Like the t-distribution, it has a single parameter called the degrees of freedom.

b. The distribution is skewed to the right. As the degrees of freedom increase, the distribution becomes more symmetric.

c. Its mean is equal to its degrees of freedom. Its variance is twice its degrees of freedom.

d. Notation: If X is a random variable that follows a chi-square distribution with v degrees of freedom, we write $X \sim \chi^2(v)$.

e. If $X \sim \chi^2(v)$, then $\chi^2_\alpha(v)$ satisfies the condition that $P(X > \chi^2_\alpha(v)) = \alpha$ or, equivalently, $P(X \le \chi^2_\alpha(v)) = 1 - \alpha$.
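Properties b and c can be checked numerically. The sketch below relies on a standard fact not stated on the slide: a chi-square variable with v degrees of freedom can be generated as the sum of v squared standard normal variables. The function name, seed, and number of draws are illustrative choices.

```python
import random
import statistics

def simulate_chi_square(v, n_draws=20000, seed=0):
    """Draw n_draws values, each the sum of v squared standard normals."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(v))
        for _ in range(n_draws)
    ]

# Assumed example: v = 10 degrees of freedom
draws = simulate_chi_square(v=10)
sample_mean = statistics.mean(draws)      # should be close to v = 10
sample_var = statistics.variance(draws)   # should be close to 2v = 20

print(f"sample mean: {sample_mean:.2f} (theory: 10)")
print(f"sample variance: {sample_var:.2f} (theory: 20)")
```

The sample values will not match the theoretical ones exactly, but they should be close for a large number of draws.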

Example

Suppose $X \sim \chi^2(v)$. Determine the following:

a. P(X > 18.307)

b. P(X < 20.483)

c. $\chi^2_{0.01}(10)$
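The example does not state v explicitly. The sketch below assumes v = 10, since 18.307 and 20.483 are entries in the 10 degrees of freedom row of the usual chi-square table and part c uses 10. It uses a closed-form tail probability that is valid only for even degrees of freedom, and the function names are illustrative.

```python
import math

def chi2_upper_tail_even(x, v):
    """P(X > x) for a chi-square variable with an EVEN number of df v."""
    if v <= 0 or v % 2 != 0:
        raise ValueError("this closed form needs even, positive df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum of (x/2)^k / k! for k = 0 .. v/2 - 1
    for k in range(1, v // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def chi2_critical_even(alpha, v, lo=0.0, hi=200.0):
    """Find x with P(X > x) = alpha by bisection (tail decreases in x)."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_upper_tail_even(mid, v) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

v = 10  # assumed degrees of freedom
prob_a = chi2_upper_tail_even(18.307, v)        # about 0.05
prob_b = 1.0 - chi2_upper_tail_even(20.483, v)  # about 0.975
crit_c = chi2_critical_even(0.01, v)            # about 23.209

print(f"a. P(X > 18.307) = {prob_a:.4f}")
print(f"b. P(X < 20.483) = {prob_b:.4f}")
print(f"c. critical value = {crit_c:.3f}")
```

In class these values are normally read from the chi-square table; the code only confirms the table entries.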

• The chi-square test for independence is used to determine whether two categorical variables are related.

• Example:

Remarks

• An r x c contingency table has r rows and c columns.

• The marginal frequencies are the row and column totals.

Procedure in testing for independence

1. State the null and alternative hypotheses:

Ho: the two variables are independent

Ha: the two variables are not independent

2. Choose the level of significance α

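3. The test statistic is:

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where the $O_i$'s are the observed frequencies and the $E_i$'s are the expected frequencies given by:

$$E_i = \frac{(\text{row total}) \times (\text{column total})}{\text{grand total}}$$

Decision Rule: Reject Ho if $\chi^2 > \chi^2_\alpha(v)$, where $v = (r - 1)(c - 1)$.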

4. Compute the value of the test statistic.

5. Make the decision (whether the two variables are independent or not). Express this decision in terms of the problem.
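The procedure can be sketched in code. The example below assumes the usual statistic (sum of squared differences between observed and expected frequencies, each divided by the expected frequency) and expected frequencies equal to row total times column total over grand total. The counts are hypothetical, made up only for illustration, and the critical value is taken from a chi-square table.

```python
def chi_square_independence(observed):
    """Return (statistic, df, expected) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected frequency: (row total * column total) / grand total
    expected = [
        [r * c / grand_total for c in col_totals]
        for r in row_totals
    ]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df, expected

# Hypothetical 2 x 3 table (not data from the lecture)
observed = [[20, 30, 50],
            [30, 30, 40]]
statistic, df, expected = chi_square_independence(observed)

critical_value = 5.991  # table value for alpha = 0.05 and v = 2
reject_null = statistic > critical_value

print(f"chi-square = {statistic:.3f}, df = {df}")
print("Reject Ho" if reject_null else "Do not reject Ho")
```

For these made-up counts the statistic is below the critical value, so independence would not be rejected at the 0.05 level.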

The chi-square test for independence is valid only if:

1. At least 80% of the cells have expected frequencies ≥ 5.

2. No cell has expected frequency ≤ 1.
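These two conditions are easy to check once the expected frequencies are available. A minimal sketch follows; the function name and the two tables of expected frequencies are hypothetical.

```python
def check_expected_frequencies(expected):
    """Return (is_valid, share_at_least_5) for a table of expected counts."""
    cells = [e for row in expected for e in row]
    # Condition 1: at least 80% of the cells have expected frequency >= 5
    share_at_least_5 = sum(1 for e in cells if e >= 5) / len(cells)
    # Condition 2: no cell has expected frequency <= 1
    none_too_small = all(e > 1 for e in cells)
    return (share_at_least_5 >= 0.80 and none_too_small), share_at_least_5

# Hypothetical expected frequencies
ok_table = [[25.0, 30.0, 45.0],
            [25.0, 30.0, 45.0]]
bad_table = [[0.8, 9.2],
             [7.2, 82.8]]

ok_valid, ok_share = check_expected_frequencies(ok_table)
bad_valid, bad_share = check_expected_frequencies(bad_table)

print(f"first table valid: {ok_valid} (share >= 5: {ok_share:.2f})")
print(f"second table valid: {bad_valid} (share >= 5: {bad_share:.2f})")
```

When the check fails, a common remedy is to combine adjacent categories so that the expected frequencies become larger.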

• For a 2 x 2 contingency table, we use Yates' correction for continuity:

$$\chi^2 = \sum_i \frac{(|O_i - E_i| - 0.5)^2}{E_i}$$
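A sketch of the corrected statistic, assuming the usual form of Yates' correction, in which 0.5 is subtracted from each absolute difference between observed and expected frequency before squaring. The counts are hypothetical and the critical value comes from a chi-square table.

```python
def yates_chi_square(observed):
    """Chi-square statistic with Yates' correction for a 2 x 2 table."""
    if len(observed) != 2 or any(len(row) != 2 for row in observed):
        raise ValueError("expected a 2 x 2 table")
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            # Shrink each absolute difference by 0.5 before squaring
            statistic += (abs(observed[i][j] - expected) - 0.5) ** 2 / expected
    return statistic

# Hypothetical 2 x 2 counts (not data from the lecture)
observed_2x2 = [[30, 20],
                [20, 30]]
corrected = yates_chi_square(observed_2x2)

critical_value_1df = 3.841  # table value for alpha = 0.05 and v = 1
print(f"corrected chi-square = {corrected:.2f}")
print("Reject Ho" if corrected > critical_value_1df else "Do not reject Ho")
```

For these counts the uncorrected statistic would be 4.00, which exceeds 3.841, while the corrected value does not. The correction makes the test more conservative.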

Example. People are chosen at random from the state of Illinois until 1000 people have been classified as to whether they are of Protestant, Catholic, or Jewish faith, and whether or not they worship regularly. Test at the 0.05 level of significance whether religion is independent of pattern of worship.

Definition

• The coefficient of correlation is a measure of the strength of the linear relationship existing between two variables, X and Y, that is independent of their respective scales of measurement.

• This is denoted by the Greek letter ρ (rho), where -1 ≤ ρ ≤ 1.

• ρ = 1 → perfect positive linear relationship

• ρ = -1 → perfect negative linear relationship

• ρ = 0 → no linear relationship between X and Y

Remarks

• The coefficient of correlation can establish the strength of the linear relationship between two quantitative variables, but it cannot establish causality!

• We cannot say X causes Y or Y causes X.

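Pearson product-moment coefficient of correlation:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$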
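A minimal sketch of the computation, assuming the standard computational form of the sample coefficient r, built from n and the sums of x, y, xy, x squared, and y squared. The function name and the data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson product-moment correlation coefficient."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, n >= 2")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Hypothetical paired observations
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(f"r = {r:.4f}")

# A perfectly linear relationship gives r = 1 (up to rounding)
r_perfect = pearson_r(xs, [2 * x + 1 for x in xs])
print(f"r for y = 2x + 1: {r_perfect:.4f}")
```

For the first data set r is about 0.77, a fairly strong positive linear relationship. As noted in the remarks, this says nothing about causality.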

Testing the Correlation Coefficient
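A commonly used test of Ho: ρ = 0 against Ha: ρ ≠ 0, assumed in this sketch, uses the statistic t = r√(n − 2) / √(1 − r²), which follows a t-distribution with n − 2 degrees of freedom when Ho is true. The values of r and n are hypothetical, and the critical value is read from a t-table.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing Ho: rho = 0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need n > 2 observations")
    if abs(r) >= 1:
        raise ValueError("statistic is undefined when |r| = 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical sample result: r = 0.7746 from n = 5 pairs
r_sample, n = 0.7746, 5
t_stat = correlation_t_statistic(r_sample, n)

critical_t = 3.182  # two-tailed t-table value for alpha = 0.05, df = 3
reject_null = abs(t_stat) > critical_t

print(f"t = {t_stat:.3f} with {n - 2} degrees of freedom")
print("Reject Ho" if reject_null else "Do not reject Ho")
```

With so few observations, even a fairly large r is not significant here, which shows why the size of r should be judged together with the sample size.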
