Statistics 101 UP Diliman Lecture Slide
Chi-square Test and
Correlation
Statistics 101 - Villejo
Objectives
• Know the assumptions of the chi-square test for
independence.
• Perform a chi-square test for independence.
• Learn the concept of correlation.
• Compute and interpret the coefficient of
correlation.
• Perform the test of significance on the coefficient of
correlation.
Chi-square Distribution
a. Like the t-distribution, it has a single parameter called the degrees of freedom.
b. The distribution is skewed to the right. As the degrees of freedom increase, the distribution becomes more symmetric.
c. Its mean is equal to its degrees of freedom. Its variance is twice its degrees of freedom.
d. Notation: If X is a random variable that follows a chi-square distribution with v degrees of freedom, we write $X \sim \chi^2(v)$.
e. If $X \sim \chi^2(v)$, then $\chi^2_\alpha(v)$ satisfies the condition that
$P(X > \chi^2_\alpha(v)) = \alpha$, or equivalently $P(X \le \chi^2_\alpha(v)) = 1 - \alpha$.
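Properties (b) and (c) can be checked by simulation. The sketch below is illustrative only: it relies on the standard fact (not stated on the slides) that the sum of v squared independent standard normal draws follows a chi-square distribution with v degrees of freedom. The function name, sample size, and seed are arbitrary choices.

```python
import random
import statistics

def simulate_chi_square(df, n_samples=20000, seed=101):
    """Draw chi-square(df) samples as sums of df squared standard normals."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        for _ in range(n_samples)
    ]

for df in (2, 4, 10):
    draws = simulate_chi_square(df)
    # The sample mean should land near df and the sample variance near 2 * df.
    print(df, round(statistics.mean(draws), 2), round(statistics.variance(draws), 2))
```

The printed means and variances are only approximations, so they will differ slightly from v and 2v.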
Example
Suppose $X \sim \chi^2(10)$. Determine the following:
a. $P(X > 18.307)$
b. $P(X < 20.483)$
c. $\chi^2_{0.01}(10)$
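The three parts of this example can be checked numerically instead of with a table. The sketch below assumes v = 10, the degrees of freedom used in part (c). It uses a closed form for the upper-tail probability that holds only for even degrees of freedom, and finds the critical value by bisection. Both function names are hypothetical helpers, not part of any library.

```python
import math

def chi2_upper_tail_even(x, df):
    """P(X > x) for a chi-square variable with an EVEN number of df.

    For even df the tail has the closed form
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles positive even df only")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0   # current term (x/2)^k / k!, starting at k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def chi2_critical_even(alpha, df, lo=0.0, hi=200.0):
    """Find c such that P(X > c) = alpha, by bisection on the upper tail."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_upper_tail_even(mid, df) > alpha:
            lo = mid   # tail still too large, so move right
        else:
            hi = mid
    return (lo + hi) / 2.0

print("a.", chi2_upper_tail_even(18.307, 10))
print("b.", 1.0 - chi2_upper_tail_even(20.483, 10))
print("c.", chi2_critical_even(0.01, 10))
```

The results should agree with a chi-square table to about three decimal places: roughly 0.05 for (a), 0.975 for (b), and 23.209 for (c).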
• The chi-square test for independence is used to
determine whether two categorical variables are
related.
• Example:
Remarks
• An r x c contingency table has r rows and c columns.
• The marginal frequencies are the row and column totals.
Procedure in testing for independence
1. State the null and alternative hypotheses:
Ho: the two variables are independent
Ha: the two variables are not independent
2. Choose the level of significance α
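3. The test statistic is:
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
where the $O_i$'s are the observed frequencies and
the $E_i$'s are the expected frequencies given by:
$$E_i = \frac{(\text{row total}) \times (\text{column total})}{\text{grand total}}$$
Decision Rule: Reject Ho if $\chi^2 > \chi^2_\alpha(v)$, where
$v = (r - 1)(c - 1)$.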
4. Compute the value of the test statistic.
5. Make the decision (whether the two variables are
independent or not). Express this decision in terms
of the problem.
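The procedure above can be sketched in a few lines. The code below computes each expected frequency as (row total x column total) / grand total, sums (O - E)^2 / E over all cells, and uses v = (r - 1)(c - 1). The 2 x 3 table of counts is hypothetical and is not the data from any example in these slides; the critical value 5.991 is the standard table entry for α = 0.05 and 2 degrees of freedom.

```python
def chi_square_independence(observed):
    """Return (statistic, degrees of freedom, expected table) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected frequency of a cell = row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df, expected

# Hypothetical counts, for illustration only.
table = [
    [10, 20, 30],
    [20, 20, 20],
]
statistic, df, expected = chi_square_independence(table)

CRITICAL_005_DF2 = 5.991  # chi-square table value for alpha = 0.05, v = 2
decision = "reject Ho" if statistic > CRITICAL_005_DF2 else "do not reject Ho"
print(statistic, df, decision)
```

For these made-up counts the statistic falls below the critical value, so Ho (independence) would not be rejected at the 0.05 level.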
The chi-square test for independence is valid only if:
1. At least 80% of the cells have expected
frequencies ≥ 5.
2. No cell has expected frequency ≤ 1.
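These two conditions can be checked mechanically once the expected frequencies are known. The helper below is a minimal sketch of the rule as stated here (at least 80% of cells with expected frequency ≥ 5, and no cell with expected frequency ≤ 1). The function name and the sample tables are hypothetical.

```python
def expected_counts_ok(expected):
    """Check the two validity conditions on a table of expected frequencies."""
    cells = [e for row in expected for e in row]
    share_at_least_5 = sum(1 for e in cells if e >= 5) / len(cells)
    no_tiny_cell = all(e > 1 for e in cells)  # fails if any cell is <= 1
    return share_at_least_5 >= 0.8 and no_tiny_cell

# Hypothetical expected-frequency tables, for illustration only.
print(expected_counts_ok([[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]))  # all cells >= 5
print(expected_counts_ok([[4.0, 10.0, 10.0], [4.0, 10.0, 10.0]]))    # only 4 of 6 cells >= 5
print(expected_counts_ok([[0.8, 10.0], [10.0, 10.0]]))               # one cell <= 1
```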
• For a 2 x 2 contingency table, we use Yates’
correction for continuity:
$$\chi^2 = \sum_i \frac{(|O_i - E_i| - 0.5)^2}{E_i}$$
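As a sketch of how the correction changes the computation: in its usual form, Yates' correction subtracts 0.5 from each absolute difference |O - E| before squaring. The function name and the 2 x 2 counts below are hypothetical; 3.841 is the standard table value for α = 0.05 and 1 degree of freedom.

```python
def yates_chi_square(observed):
    """Chi-square statistic with Yates' continuity correction for a 2 x 2 table."""
    if len(observed) != 2 or any(len(row) != 2 for row in observed):
        raise ValueError("Yates' correction is applied here to 2 x 2 tables only")
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            # Shrink each absolute deviation by 0.5 before squaring.
            statistic += (abs(observed[i][j] - expected) - 0.5) ** 2 / expected
    return statistic

# Hypothetical counts, for illustration only.
statistic = yates_chi_square([[12, 8], [5, 15]])
CRITICAL_005_DF1 = 3.841  # chi-square table value for alpha = 0.05, v = 1
print(statistic, statistic > CRITICAL_005_DF1)
```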
Example. People are chosen at random from the state of Illinois until 1000 people have been classified as to whether they are of Protestant, Catholic, or Jewish faith, and whether or not they worship regularly. Test at the 0.05 level of significance whether religion is independent of pattern of worship.
Definition
• The coefficient of correlation is a measure of the
strength of the linear relationship existing between
two variables, X and Y, that is independent of their
respective scales of measurement.
• This is denoted by the Greek letter ρ (rho), where -1 ≤ ρ ≤ 1.
• ρ = 1 → perfect positive linear relationship
• ρ = -1 → perfect negative linear relationship
• ρ = 0 → no linear relationship between
X and Y
Remarks
• The coefficient of correlation can establish the
strength of the linear relationship between two
quantitative variables, but it cannot establish
causality!
• We cannot say X causes Y or Y causes X.
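Pearson product-moment coefficient of correlation:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$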
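A minimal sketch of the computation, using the common computational form r = [nΣxy - ΣxΣy] / sqrt([nΣx² - (Σx)²][nΣy² - (Σy)²]). The function name and the paired data are hypothetical and serve only to show the arithmetic.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson product-moment correlation from paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

# Hypothetical paired data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(pearson_r(x, y))
```

Because r does not depend on the scales of measurement, rescaling either variable (for example, multiplying every x by 10) leaves the result unchanged.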
Testing the Correlation Coefficient
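The details of this test are not reproduced here, so the following is only a sketch of the commonly used form, assuming the hypotheses Ho: ρ = 0 versus Ha: ρ ≠ 0. Under Ho, the statistic t = r·sqrt(n - 2) / sqrt(1 - r²) is compared with a t-distribution with n - 2 degrees of freedom. The function name and the values r = 0.80 and n = 12 are hypothetical; 2.228 is the standard table value t_0.025(10).

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing Ho: rho = 0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than 2 pairs of observations")
    if not -1 < r < 1:
        raise ValueError("statistic is undefined when |r| >= 1")
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t, n - 2

# Hypothetical sample correlation and sample size, for illustration only.
t, df = correlation_t_statistic(0.80, 12)

CRITICAL_T_0025_DF10 = 2.228  # two-sided test at alpha = 0.05, 10 df
decision = "reject Ho" if abs(t) > CRITICAL_T_0025_DF10 else "do not reject Ho"
print(t, df, decision)
```

Rejecting Ho here says only that the linear relationship is statistically significant. As noted in the remarks on causality, it does not show that X causes Y.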