INDEPENDENT VARIABLES AND CHI SQUARE


Independent versus Dependent Variables

Two variables X and Y are said to be independent if the occurrence of one does not affect the probability of the occurrence of the other.

Formally, X and Y are independent if

$$P(X \mid Y) = P(X) \quad \text{or} \quad P(Y \mid X) = P(Y)$$

What does it mean?

Independent versus Dependent Variables

Consider the following contingency table, with the categories of X in the rows and the categories of Y in the columns:

| X \ Y | $y_1$ | … | $y_k$ | … | $y_q$ | Total |
|---|---|---|---|---|---|---|
| $x_1$ | $n_{11}$ | … | $n_{1k}$ | … | $n_{1q}$ | $n_{10}$ |
| … | … | | … | | … | … |
| $x_h$ | $n_{h1}$ | … | $n_{hk}$ | … | $n_{hq}$ | $n_{h0}$ |
| … | … | | … | | … | … |
| $x_p$ | $n_{p1}$ | … | $n_{pk}$ | … | $n_{pq}$ | $n_{p0}$ |
| Total | $n_{01}$ | … | $n_{0k}$ | … | $n_{0q}$ | $n$ |

We say that X is independent of Y if

1. $$\frac{n_{hk}}{n_{h0}} = \frac{n_{0k}}{n} \;\Rightarrow\; P(Y \mid X) = P(Y) \quad \text{for every } h \text{ and } k$$

2. $$\frac{n_{hk}}{n_{0k}} = \frac{n_{h0}}{n} \;\Rightarrow\; P(X \mid Y) = P(X) \quad \text{for every } h \text{ and } k$$

Independent versus Dependent Variables: example 1

The following contingency table describes an observed population (in millions) by gender (X) and health insurance coverage (Y). Are the two variables independent? That is, does health insurance coverage depend on gender?

| | Covered by health insurance | Not covered by health insurance | Total |
|---|---|---|---|
| Male | 107.5 | 19.49 | 127 |
| Female | 112.6 | 20.41 | 133 |
| Total | 220.1 | 39.9 | 260 |

Independent versus Dependent Variables

We have to verify condition 2:

$$\frac{n_{hk}}{n_{0k}} = \frac{n_{h0}}{n} \;\Rightarrow\; P(X \mid Y) = P(X) \quad \text{for every } h \text{ and } k$$

Dividing each cell by its column total gives the distribution of X within each category of Y:

| | Covered by health insurance | Not covered by health insurance | Total |
|---|---|---|---|
| Male | 0.49 | 0.49 | 0.49 |
| Female | 0.51 | 0.51 | 0.51 |
| Total | 1 | 1 | 1 |

$$\frac{n_{11}}{n_{01}} = \frac{107.5}{220.1} = 0.49 \qquad \frac{n_{21}}{n_{01}} = \frac{112.6}{220.1} = 0.51$$

$$\frac{n_{10}}{n} = \frac{127}{260} = 0.49 \qquad \frac{n_{20}}{n} = \frac{133}{260} = 0.51$$

YES

Independent versus Dependent Variables

We have to verify condition 1:

$$\frac{n_{hk}}{n_{h0}} = \frac{n_{0k}}{n} \;\Rightarrow\; P(Y \mid X) = P(Y) \quad \text{for every } h \text{ and } k$$

Dividing each cell by its row total gives the distribution of Y within each category of X:

| | Covered by health insurance | Not covered by health insurance | Total |
|---|---|---|---|
| Male | 0.85 | 0.15 | 1 |
| Female | 0.85 | 0.15 | 1 |
| Total | 0.85 | 0.15 | 1 |

$$\frac{n_{11}}{n_{10}} = \frac{107.5}{127} = 0.85 \qquad \frac{n_{12}}{n_{10}} = \frac{19.49}{127} = 0.15$$

$$\frac{n_{01}}{n} = \frac{220.1}{260} = 0.85 \qquad \frac{n_{02}}{n} = \frac{39.9}{260} = 0.15$$

YES
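The check carried out by hand for the health insurance table can be sketched in a few lines of Python. This is only an illustration: the function names and the tolerance `tol` are assumptions introduced here, not part of the slides. The tolerance is needed because the published counts are rounded.

```python
# Observed counts (in millions) from example 1: rows = gender, columns = coverage.
table = [
    [107.5, 19.49],   # Male:   covered, not covered
    [112.6, 20.41],   # Female: covered, not covered
]

def row_profiles(t):
    """Divide each cell by its row total: estimates P(Y | X)."""
    return [[cell / sum(row) for cell in row] for row in t]

def column_marginal(t):
    """Column totals divided by the grand total: estimates P(Y)."""
    n = sum(sum(row) for row in t)
    return [sum(col) / n for col in zip(*t)]

def looks_independent(t, tol=0.005):
    """True if every row profile matches P(Y) within tol (tol is an arbitrary choice)."""
    marginal = column_marginal(t)
    return all(
        abs(p - m) <= tol
        for profile in row_profiles(t)
        for p, m in zip(profile, marginal)
    )

print([[round(p, 2) for p in row] for row in row_profiles(table)])
print([round(m, 2) for m in column_marginal(table)])
print(looks_independent(table))
```

Both row profiles round to 0.85 and 0.15, the same as the marginal distribution of Y, so the function reports independence for this table.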

Independent versus Dependent Variables: example 2

Consider the example of the 420 employees. Is the variable Smoke (X) independent of the variable College Graduate (Y)?

| | College Graduate | Not a College Graduate | Total |
|---|---|---|---|
| Smoker | 35 | 80 | 115 |
| Nonsmoker | 130 | 175 | 305 |
| Total | 165 | 255 | 420 |

Independent versus Dependent Variables: example 2

We have to verify

$$\frac{n_{hk}}{n_{h0}} = \frac{n_{0k}}{n} \;\Rightarrow\; P(Y \mid X) = P(Y) \quad \text{for every } h \text{ and } k$$

Dividing each cell by its row total gives:

| | College Graduate | Not a College Graduate | Total |
|---|---|---|---|
| Smoker | 0.30 | 0.70 | 1 |
| Nonsmoker | 0.43 | 0.57 | 1 |
| Total | 0.39 | 0.61 | 1 |

$$0.30 \neq 0.43 \neq 0.39$$

No independence!

Independent versus Dependent Variables

Two variables are maximally dependent if the contingency table is

| X \ Y | $y_1$ | … | $y_k$ | … | $y_q$ | Total |
|---|---|---|---|---|---|---|
| $x_1$ | $n_{11}$ | … | 0 | … | 0 | $n_{11}$ |
| … | | | | | | |
| $x_h$ | 0 | … | 0 | … | $n_{hq}$ | $n_{hq}$ |
| … | | | | | | |
| $x_p$ | 0 | … | $n_{pk}$ | … | 0 | $n_{pk}$ |
| Total | $n_{11}$ | … | $n_{pk}$ | … | $n_{hq}$ | $n$ |

There is a one-to-one relation between the categories of the two variables.
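As a rough illustration of the one-to-one structure, the sketch below tests whether every row and every column of a table contains exactly one nonzero cell. The function name and the `perfect` table are hypothetical and made up for this example. The `smokers` table is the one from example 2.

```python
def is_one_to_one(table):
    """True if every row and every column has exactly one nonzero cell."""
    rows_ok = all(sum(1 for c in row if c != 0) == 1 for row in table)
    cols_ok = all(sum(1 for c in col if c != 0) == 1 for col in zip(*table))
    return rows_ok and cols_ok

# Hypothetical table: each category of X occurs with a single category of Y.
perfect = [
    [40, 0, 0],
    [0, 0, 25],
    [0, 35, 0],
]

# Table from example 2 (smokers and college graduates).
smokers = [
    [35, 80],
    [130, 175],
]

print(is_one_to_one(perfect))
print(is_one_to_one(smokers))
```

The first call reports a one-to-one relation and the second does not, since every cell of the smokers table is nonzero.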

Chi square

How can we measure the "degree" of dependence between two variables?

Recall that two variables are independent if

$$\frac{n_{hk}}{n_{h0}} = \frac{n_{0k}}{n} \;\Rightarrow\; P(Y \mid X) = P(Y) \quad \text{for every } h \text{ and } k$$

$$\frac{n_{hk}}{n_{0k}} = \frac{n_{h0}}{n} \;\Rightarrow\; P(X \mid Y) = P(X) \quad \text{for every } h \text{ and } k$$

From these relations we get:

$$n^{*}_{hk} = \frac{n_{h0}\, n_{0k}}{n}$$

$n^{*}_{hk}$ is called the theoretical or expected frequency (E), since it expresses the frequency of category h of X and category k of Y under the condition of independence.
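A minimal sketch of the expected-frequency formula, (row total × column total) / grand total, applied to the smokers table from example 2. The function name is an assumption made for illustration.

```python
def expected_frequencies(observed):
    """Return the table of n*_hk = (row total h) * (column total k) / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Smokers (rows) by college graduation (columns), from example 2.
observed = [
    [35, 80],
    [130, 175],
]

expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 2) for e in row])
```

The expected table keeps the same row and column totals as the observed one. Only the way the counts are spread over the cells changes.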

Chi square

The observed frequencies $n_{hk}$ are indicated with (O).

If the observed frequencies (O) are equal to the expected frequencies (E), the variables are independent.

We can build an indicator of independence/dependence between the two variables, called Chi square. The formula is

$$\chi^2 = \sum_{h}\sum_{k} \frac{(O_{hk} - E_{hk})^2}{E_{hk}}$$

It is evident that if Chi square is equal to 0 (O = E), the two variables are independent.

Chi square: example 1

Violence and lack of discipline have become major problems in schools in the United States. A random sample of 300 adults was selected, and they were asked if they favor giving more freedom to schoolteachers to punish students for violence and lack of discipline. The two-way classification of the responses of these adults is represented in the following table. Are the two variables gender and opinion independent?

| | In Favor (F) | Against (A) | No Opinion (N) |
|---|---|---|---|
| Men (M) | 93 | 70 | 12 |
| Women (W) | 87 | 32 | 6 |

Chi square: example 1

| | In Favor (F) | Against (A) | No Opinion (N) | Row Totals |
|---|---|---|---|---|
| Men (M) | 93 | 70 | 12 | 175 |
| Women (W) | 87 | 32 | 6 | 125 |
| Column Totals | 180 | 102 | 18 | 300 |

In order to compute the chi square we have to compute the expected frequencies as follows:

$$E = n^{*}_{hk} = \frac{n_{h0}\, n_{0k}}{n} = \frac{(\text{Row total})(\text{Column total})}{\text{Grand total}}$$

Chi square: example 1

Each cell shows the observed frequency (O) followed by the expected frequency (E) in parentheses.

| | In Favor (F) | Against (A) | No Opinion (N) | Row Totals |
|---|---|---|---|---|
| Men (M) | 93 (105.00) | 70 (59.50) | 12 (10.50) | 175 |
| Women (W) | 87 (75.00) | 32 (42.50) | 6 (7.50) | 125 |
| Column Totals | 180 | 102 | 18 | 300 |

For example

$$n^{*}_{11} = \frac{n_{10}\, n_{01}}{n} = \frac{175 \times 180}{300} = 105$$

Chi square: example 1

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(93 - 105.00)^2}{105.00} + \frac{(70 - 59.50)^2}{59.50} + \frac{(12 - 10.50)^2}{10.50} + \frac{(87 - 75.00)^2}{75.00} + \frac{(32 - 42.50)^2}{42.50} + \frac{(6 - 7.50)^2}{7.50}$$

$$= 1.371 + 1.853 + 0.214 + 1.920 + 2.594 + 0.300 = 8.252$$

The value of the chi square is different from 0 and hence we should conclude that the two variables are dependent.
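The arithmetic of this example can be reproduced with a short Python sketch. The function name is an assumption. The data are the observed counts from the gender and opinion table.

```python
# Observed counts: rows = Men, Women; columns = In Favor, Against, No Opinion.
observed = [
    [93, 70, 12],
    [87, 32, 6],
]

def chi_square(obs):
    """Sum of (O - E)^2 / E over all cells, with E = row total * column total / n."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    n = sum(row_totals)
    total = 0.0
    for h, row in enumerate(obs):
        for k, o in enumerate(row):
            e = row_totals[h] * col_totals[k] / n
            total += (o - e) ** 2 / e
    return total

chi2 = chi_square(observed)
print(round(chi2, 3))
```

Without intermediate rounding the result is about 8.253. The 8.252 quoted in the slides comes from adding terms that were each rounded to three decimals first.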

Chi square: critical value

However, it can happen that even if the chi square is different from 0, its value is small enough to suggest that the variables of interest are independent.

But which value of the chi square can be considered a critical value, so that values under it indicate independence and values over it indicate dependence between the two variables?

There is no fixed critical value. It is determined case by case, depending on the data under examination, using the methods and principles of statistical inference.

Chi square: critical value

We do not deal with the computation of the critical value. However, the critical value can be computed by any statistical software, including Excel.

Rule

1. If the critical value > chi square, the two variables can be considered independent.
2. If the critical value < chi square, the two variables can be considered dependent, in the sense that they influence each other.

In the previous example the critical value is 9.21. It is greater than the value of the chi square (8.252), so we can say that the two variables are independent; that is, the opinion of the selected people is not influenced by gender.
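The slides do not say how 9.21 was obtained. One reading that reproduces it, offered here as an assumption, is a 1% significance level with (rows − 1)(columns − 1) = (2 − 1)(3 − 1) = 2 degrees of freedom. For exactly 2 degrees of freedom the chi-square upper tail is exp(−x/2), so the critical value has a closed form. The sketch below uses that special case together with the decision rule above. All function names are illustrative.

```python
import math

def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom of a contingency table: (rows - 1) * (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)

def critical_value_df2(alpha):
    """Critical value for 2 degrees of freedom ONLY: solves exp(-x / 2) = alpha."""
    return -2.0 * math.log(alpha)

def decide(chi_square, critical_value):
    """Apply the rule: below the critical value -> independent, otherwise dependent."""
    return "independent" if chi_square < critical_value else "dependent"

df = degrees_of_freedom(2, 3)          # gender (2 rows) by opinion (3 columns)
crit = critical_value_df2(0.01)        # alpha = 0.01 is an assumption, not stated in the slides
print(df, round(crit, 2), decide(8.252, crit))
```

For other degrees of freedom there is no such shortcut, and a statistical package or a printed chi-square table would be used instead.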

Chi square: example 2

A researcher wanted to study the relationship between gender and owning cell phones. She took a sample of 2000 adults and obtained the information given in the following table.

| | Own Cell Phones | Do Not Own Cell Phones |
|---|---|---|
| Men | 640 | 450 |
| Women | 440 | 470 |

Looking at the table, can we conclude that gender and owning cell phones are related for all adults?

Chi square: example 2

We have to compute the expected frequencies. Each cell shows the observed frequency followed by the expected frequency in parentheses.

| | Own Cell Phones (Y) | Do Not Own Cell Phones (N) | Row Totals |
|---|---|---|---|
| Men (M) | 640 (588.60) | 450 (501.40) | 1090 |
| Women (W) | 440 (491.40) | 470 (418.60) | 910 |
| Column Totals | 1080 | 920 | 2000 |

Chi square: example 2

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(640 - 588.60)^2}{588.60} + \frac{(450 - 501.40)^2}{501.40} + \frac{(440 - 491.40)^2}{491.40} + \frac{(470 - 418.60)^2}{418.60}$$

$$= 4.489 + 5.269 + 5.376 + 6.311 = 21.445$$

Critical value = 3.841

The critical value is less than the chi square and hence we can conclude that the two variables are dependent; that is, owning a cell phone depends on gender.
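As a cross-check of this example, the sketch below uses two facts that are not in the slides and are introduced here only for illustration. First, for a 2×2 table [[a, b], [c, d]] the chi square equals n(ad − bc)² divided by the product of the four marginal totals. Second, with 1 degree of freedom the critical value is the square of a standard normal quantile. The 5% significance level is an assumption. It is the level that reproduces 3.841.

```python
from statistics import NormalDist

def chi_square_2x2(a, b, c, d):
    """Shortcut for a 2x2 table [[a, b], [c, d]]: n (ad - bc)^2 / (r1 * r2 * c1 * c2)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def critical_value_df1(alpha):
    """Critical value for 1 degree of freedom ONLY: square of the two-sided normal quantile."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z

chi2 = chi_square_2x2(640, 450, 440, 470)   # men / women by own / do not own
crit = critical_value_df1(0.05)             # alpha = 0.05 is an assumption
print(round(chi2, 2), round(crit, 3), chi2 > crit)
```

The shortcut agrees with the cell-by-cell sum (about 21.45), and the statistic is well above the critical value, which matches the conclusion of dependence.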

LINEAR REGRESSION

So far we have investigated the relation of independence/dependence between two variables (qualitative or quantitative). However, this kind of relation is reciprocal: we do not know whether one variable influences the other or vice versa, and we do not know how strong the relation is.

If we would like to know whether one variable influences the other and how strong this relation is, we have to refer to linear regression. By using regression analysis we can evaluate the magnitude of the change in one variable due to a certain change in another variable, and we can predict the value of one variable for a given value of the other variable.

(Linear) regression is a statistical analysis that evaluates whether a linear relationship exists between two quantitative variables, X and Y.

SIMPLE LINEAR REGRESSION

Definition

A regression model is a mathematical equation that describes the relationship between two or more variables. A simple regression model includes only two variables: one independent and one dependent. The dependent variable is the one being explained, and the independent variable is the one used to explain the variation in the dependent variable.

SIMPLE LINEAR REGRESSION

Why is it called "regression model" or "regression analysis"?

The method was first used to examine the relationship between the heights of fathers and sons. The two were related, of course. But it was found that a tall father tended to have sons shorter than himself, and a short father tended to have sons taller than himself. The height of sons regressed to the mean. The term "regression" is now used for many sorts of curve fitting.

A (simple) regression model that gives a straight-line relationship between two variables is called a linear regression model.

LINEAR REGRESSION: example 1

We want to investigate the relation between the incomes (in hundreds of dollars) (X) and the food expenditures (Y) of seven households. That is, we want to investigate whether income influences a household's decisions about food expenditure, and how strong this influence is.

| Income (X) | Food Expenditure (Y) |
|---|---|
| 35 | 9 |
| 49 | 15 |
| 21 | 7 |
| 39 | 11 |
| 15 | 5 |
| 28 | 8 |
| 25 | 9 |

LINEAR REGRESSION: example 1

We can represent the data with a scatter plot. A scatter plot is a plot of the values of Y versus the corresponding values of X:

[Figure: scatter plot of food expenditure (vertical axis) versus income (horizontal axis), with the points for the first and seventh households labeled.]

LINEAR REGRESSION: example 1

The scatter plot seems to reveal a linear relationship between the two variables: a linear regression model might be indicated. In the figure, the points (observations) are replaced by a linear model (a) and a nonlinear model (b).

[Figure: two panels plotting food expenditure against income: (a) Linear, (b) Nonlinear.]

LINEAR REGRESSION: the equation

How can we write the linear model mathematically?

$$y = a + b\,x$$

where y is the dependent variable, a is the constant term or y-intercept, b is the slope, and x is the independent variable.

LINEAR REGRESSION: intercept

How can we represent a graphically?

The intercept is the Y value of the line when X equals zero. The intercept determines the position of the line on the Y axis.

[Figure: lines in the X-Y plane crossing the Y axis at the intercepts a1, a2, a3 and a4.]

LINEAR REGRESSION: slope

How can we represent b graphically?

The slope quantifies the steepness of the line. It equals the change in Y for each unit change in X. If the slope is positive, Y increases as X increases. If the slope is negative, Y decreases as X increases.

[Figure: three panels in the X-Y plane: a line with b > 0, a line with b < 0, and two lines with slopes b1 and b2, where b2 > b1.]
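The roles of the intercept and the slope in y = a + b x can be illustrated with a tiny Python sketch. The coefficient values are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
def line(a, b):
    """Return the function x -> a + b * x for intercept a and slope b."""
    return lambda x: a + b * x

rising = line(a=2.0, b=0.5)     # hypothetical coefficients, positive slope
falling = line(a=2.0, b=-0.5)   # same intercept, negative slope

# The intercept is the value of y when x equals zero.
print(rising(0), falling(0))

# The slope is the change in y for a one-unit change in x.
print(rising(11) - rising(10), falling(11) - falling(10))
```

Both lines start from the same point on the Y axis. They differ only in the direction in which Y moves as X grows.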

LINEAR REGRESSION

Coming back to the example, among all the possible lines that can interpolate the points in the scatter plot, which is the "best"?

[Figure: scatter plot of food expenditure versus income.]

Choosing the best line (or the line that best describes the relation between X and Y) means finding the "best" a and the "best" b.
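This section does not yet say what "best" means. One common criterion, assumed here purely for illustration, is least squares: choose a and b so that the sum of squared vertical distances between the points and the line is as small as possible. A minimal sketch with the seven households from the example (the function name is an assumption):

```python
# Data from the example: income (X) and food expenditure (Y) of seven households.
income = [35, 49, 21, 39, 15, 28, 25]
food = [9, 15, 7, 11, 5, 8, 9]

def least_squares(x, y):
    """Return (a, b) minimizing the sum of squared errors of y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx                 # slope
    a = mean_y - b * mean_x       # intercept: the line passes through the point of means
    return a, b

a, b = least_squares(income, food)
print(round(a, 2), round(b, 2))
```

Under this criterion the fitted line has an intercept of roughly 1.14 and a slope of roughly 0.26, so each additional unit of income goes with about 0.26 additional units of food expenditure.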