Chi square test final

DESCRIPTION

apply it with spss

1. CHI SQUARE TEST. DR HAR ASHISH JINDAL, JR

2. Contents: Definitions; Milestone in statistics; Chi square test; Chi square test for goodness of fit; Chi square test for homogeneity of proportions; Chi square test of independence; Limitations of chi square; Fisher exact test; Continuity correction; Overuse of chi square.

3. Definitions: Statistics is defined as the science that deals with the collection, presentation, analysis and interpretation of data. Biostatistics is defined as the application of statistical methods to medical, biological and public health related problems.

4. Statistics: Descriptive statistics covers collecting, organizing, summarizing and presenting data. Inferential statistics covers making inferences, hypothesis testing, determining relationships and making predictions. The chi square test belongs to inferential statistics.

5. Introduction: Data are a collection of facts from which conclusions can be drawn. Observations made on the subjects one after the other are called raw data. Data become useful when they are arranged and organized so that we can extract information from them and communicate it to others.

6. Definitions: A variable is any characteristic, number, or quantity that can be measured or counted. Independent variable: is not changed by the other variables, e.g. age. Dependent variable: depends on other factors, e.g. a test score depends on the time studied. Parameter: any numerical quantity that characterizes a given population or some aspect of it, e.g. the mean.

7. Data types: Quantitative data are discrete or continuous, and include interval data and ratio data. Qualitative data are nominal or ordinal.

8. Qualitative data: Qualitative variables, e.g. gender (male, female), are summarized as the frequency in each category and measured on a nominal or ordinal scale. Examples: "Do you have a disease?" is nominal; "What is the socio-economic status?" is ordinal.

9. Milestone in statistics: "Karl Pearson's famous chi-square paper appeared in the spring of 1900, an auspicious beginning to a wonderful century for the field of statistics." (The paper was published in the Philosophical Magazine.)

10. Chi square test: The simplest and most widely used non-parametric test in statistical work.

11.
Logic of the chi-square: The total number of observations in each column and the total number of observations in each row are considered to be given or fixed. If we assume that columns and rows are independent, we can calculate the expected frequencies.

12. Logic of chi square: The test compares the observed frequency in each cell with the expected frequency. If no relationship exists between the column and row variables, the observed frequencies will be very close to the expected frequencies; they will differ only by small amounts, and the value of the chi-square statistic will be small. If a relationship (or dependency) does occur, the observed frequencies will vary from the expected frequencies, and the value of the chi-square statistic will be large.

13. Steps for chi square test: (1) Define the null and alternative hypotheses. (2) State alpha. (3) Calculate the degrees of freedom. (4) State the decision rule. (5) Calculate the test statistic. (6) State and interpret the results.

14. Hypothesis testing: Tests a claim about a parameter using evidence (data in a sample). A significant result indicates an association; it does not by itself establish a causal relationship. Steps: 1. Formulate a hypothesis about the population. 2. Draw a random sample. 3. Summarize the information (descriptive statistics). 4. Ask whether the information given by the sample supports the hypothesis, and whether we are making any error (inferential statistics). Decision rule: convert the research question to null and alternative hypotheses.

15. Null hypothesis: H0 = no difference between observed and expected observations. H1 = a difference is present between observed and expected observations.

16. What is statistical significance? A statistical concept indicating that the result is very unlikely to be due to chance and, therefore, likely represents a true relationship between the variables. Statistical significance is usually indicated by the p value (or probability value), which should be smaller than the chosen significance level (alpha).

17.
State alpha value: Alpha is the probability of a type I error, that is, rejecting a true null hypothesis. For the majority of studies alpha is 0.05. Meaning: the investigator has set 5% as the maximum chance of incorrectly rejecting the null hypothesis.

18. Degrees of freedom: A positive whole number that indicates the lack of restrictions in calculations. The degrees of freedom are the number of values in a calculation that can vary. Calculation: for goodness of fit, df = number of levels (outcomes) - 1. For independence / homogeneity of proportions, df = (number of columns - 1)(number of rows - 1).

19. The chi-square distribution: No negative values. The mean is equal to the degrees of freedom. The standard deviation increases as the degrees of freedom increase, so the chi-square curve spreads out more as the degrees of freedom increase. As the degrees of freedom become very large, the shape becomes more like the normal distribution.

20. The chi-square distribution: The chi-square distribution is different for each value of the degrees of freedom, so different critical values correspond to different degrees of freedom. We find the critical value that separates the area defined by $\alpha$ from that defined by $1 - \alpha$.

21. Finding critical value: Q. What is the critical $\chi^2$ value if df = 2 and $\alpha$ = 0.05? If $n_i = E(n_i)$ in every cell, $\chi^2 = 0$. Values of $\chi^2$ below the critical value fall in the "do not reject H0" region; values above it fall in the "reject H0" region. $\chi^2$ table (portion), columns giving the upper tail area:

DF    0.995    ...    0.95     ...    0.05
1     ...             0.004           3.841
2     0.010           0.103           5.991

For df = 2 and $\alpha$ = 0.05 the critical value is 5.991.

22. State decision rule: If the calculated value is greater than the critical value of chi square, the null hypothesis will be rejected.

23. Calculate test statistic: For goodness of fit and for independence / homogeneity of proportions alike, the statistic is calculated using the formula $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O = observed frequencies and E = expected frequencies. Question: how to find the expected value? For goodness of fit, the expected values come from a theory, a previous study, or a standard. For comparison groups (independence and homogeneity of proportions), Expected value = Row total * Column total / Table total.

24.
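The critical values quoted in the table portion on slide 21 can be checked without a printed table. The following is a minimal sketch, not part of the original deck: it relies on the closed-form upper-tail probability of the chi-square distribution for whole-number degrees of freedom, and the helper name `chi2_upper_tail` is an assumption made for this illustration.

```python
import math

def chi2_upper_tail(x, df):
    """P(X >= x) for a chi-square variable with integer df >= 1 (hypothetical helper)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i - 1/2) / Gamma(i + 1/2)
    total = 0.0
    for i in range(1, (df - 1) // 2 + 1):
        total += half ** (i - 0.5) / math.gamma(i + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

# Tail areas beyond the tabulated critical values should be close to alpha = 0.05.
p_df1 = chi2_upper_tail(3.841, 1)
p_df2 = chi2_upper_tail(5.991, 2)
print(round(p_df1, 3), round(p_df2, 3))
```

If the upper-tail area beyond a calculated statistic is smaller than alpha, the statistic lies beyond the critical value, which is the same decision rule as on slide 22 expressed in terms of a probability.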
State and interpret results: See whether the value of chi square is more than or less than the critical value. If the value of chi square is less than the critical value, we do not reject the null hypothesis. If the value of chi square is more than the critical value, the null hypothesis can be rejected.

25. Chi square test: Goodness of fit. For homogeneity of proportions. For 2 independent groups: cohort study, case control study, matched case control study. For > 2 independent groups.

26. Goodness of fit: Q. How "close" are the observed values to those which would be expected in a study? The expected frequency can be based on theory, previous experience, or comparison groups. Q. Does a variable have a frequency distribution comparable to the one expected? Both questions are answered by the chi-square goodness of fit test.

27. Goodness of fit: A goodness-of-fit test is an inferential procedure used to determine whether a frequency distribution follows a claimed distribution. It is a test of the agreement or conformity between the observed frequencies (Oi) and the expected frequencies (Ei) for several classes or categories (i).

28. Example: Is Sudden Infant Death Syndrome (SIDS) seasonal? Null hypothesis: the proportion of deaths due to SIDS in winter, summer, autumn and spring is equal, i.e. 25% each. Alternative: not all probabilities stated in the null hypothesis are correct.

SIDS cases    Observed    Expected (= 322 * 1/4)
Summer        78          80.5
Spring        71          80.5
Autumn        87          80.5
Winter        86          80.5
Total         322         322

Degrees of freedom = k - 1 = 4 - 1 = 3. For $\alpha$ = 0.05 and df = 3 the critical value is $\chi^2$ = 7.81. $\chi^2 = \frac{(78 - 80.5)^2}{80.5} + \frac{(71 - 80.5)^2}{80.5} + \frac{(87 - 80.5)^2}{80.5} + \frac{(86 - 80.5)^2}{80.5} \approx 2.10$. Conclusion: as the calculated $\chi^2$ value is less than the critical value, we do not reject the null hypothesis, and state that deaths due to SIDS across seasons are not statistically different from what is expected by chance (i.e. all seasons being equal).

29. Chi square test: Goodness of fit. For homogeneity of proportions. For 2 independent groups: cohort study, case control study, matched case control study. For > 2 independent groups.

30.
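The goodness-of-fit calculation for the SIDS example on slide 28 can be reproduced in a few lines. This is a sketch for illustration only; the counts and the critical value are taken from the slide, while the variable names are assumptions.

```python
# Observed SIDS cases per season, taken from the table on slide 28.
observed = {"Summer": 78, "Spring": 71, "Autumn": 87, "Winter": 86}

total = sum(observed.values())
# Under H0 every season has probability 1/4, so each expected count is total / 4.
expected = {season: total / len(observed) for season in observed}

# Sum of (O - E)^2 / E over the four seasons.
chi_square = sum(
    (observed[s] - expected[s]) ** 2 / expected[s] for s in observed
)
df = len(observed) - 1
critical_value = 7.81  # alpha = 0.05, df = 3, as quoted on the slide

reject_h0 = chi_square > critical_value
print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_h0}")
```

The statistic stays well below 7.81, so the sketch reaches the same decision as the slide: the null hypothesis of equal seasonal proportions is not rejected.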
Homogeneity of proportions: In a chi-square test for homogeneity of proportions, we test the claim that different populations have the same proportion of individuals with some characteristic. EXAMPLE: Is there evidence to indicate that the perception of the effects of vaccination is the same in 2013 as it was in 2000? Q. What is the effect of vaccination on health? Answers: Good, No effect, Bad. Null hypothesis H0 = no difference between the two populations. H1 = there is a difference between the two populations.

31. State alpha = 0.05; find df = (3 - 1)(2 - 1) = 2; from the chi square distribution the critical value is $\chi^2$ = 5.991.

32. Observed frequencies:

Year            Good effect    No effect    Bad effect    Row total
2000            656            283          50            989
2013            726            222          50            998
Column total    1382           505          100           1987

Expected frequencies (row total * column total / table total):

Year    Good effect                   No effect                    Bad effect
2000    (989)(1382)/1987 = 687.87     (989)(505)/1987 = 251.36     (989)(100)/1987 = 49.77
2013    (998)(1382)/1987 = 694.13     (998)(505)/1987 = 253.64     (998)(100)/1987 = 50.23

33. Homogeneity of proportions: $\chi^2 = \sum \frac{(O - E)^2}{E}$. Calculated $\chi^2 \approx 10.87$. Results: as 10.87 > 5.991, we reject the null hypothesis at the 0.05 significance level. There is a statistically significant difference in the perception of vaccination between 2000 and 2013.

34. Chi square test: Goodness of fit. For homogeneity of proportions. For 2 independent groups: cohort study, case control study, matched case control study. For > 2 independent groups.

35. Chi square independence test: It is used to find out whether there is an association between a row variable and a column variable in a contingency table constructed from sample data.

36. Assumptions: The observations should be independent. All expected frequencies are greater than or equal to 1 (i.e., E ≥ 1). No more than 20% of the expected frequencies are less than 5. The statistic is calculated as $\chi^2 = \sum \frac{(O - E)^2}{E}$.

37.
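The expected frequencies and the statistic for the vaccination example on slides 32 and 33 can be recomputed from the observed counts alone. The sketch below is an illustration under stated assumptions: the function name `chi_square_table` is hypothetical, and the critical value is the one quoted on slide 31.

```python
# Observed counts from slide 32: rows are survey years,
# columns are the answers Good / No effect / Bad.
observed = [
    [656, 283, 50],  # 2000
    [726, 222, 50],  # 2013
]

def chi_square_table(table):
    """Return (statistic, df, expected) for an r x c table of counts (hypothetical helper)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected value = row total * column total / table total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected

statistic, df, expected = chi_square_table(observed)
critical_value = 5.991  # alpha = 0.05, df = 2
print(f"chi-square = {statistic:.2f}, df = {df}, reject H0: {statistic > critical_value}")
```

The same helper works for any number of rows and columns, since the degrees of freedom are derived from the table shape rather than fixed at 2.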
Expected count: Consider a 2 x 2 table of exposure against disease with total sample size tt:

Exposure    Disease present    Disease negative    Total
Present     a                  b                   a+b
Negative    c                  d                   c+d
Total       a+c                b+d                 tt

Marginal probability of exposure present = $\frac{a+b}{tt}$. Marginal probability of disease present = $\frac{a+c}{tt}$. Joint probability under independence = $\frac{a+b}{tt} \cdot \frac{a+c}{tt}$. Expected count for cell a = sample size (tt) $\cdot \frac{a+b}{tt} \cdot \frac{a+c}{tt} = \frac{(a+b)(a+c)}{tt}$.

38. Short cut of Chi Square

39. Short cut of Chi Square: observed values and expected values.

40. Full formula: $\chi^2 = \frac{(37 - 22.5)^2}{22.5} + \frac{(13 - 27.5)^2}{27.5} + \frac{(17 - 31.5)^2}{31.5} + \frac{(53 - 38.5)^2}{38.5} = 29.1$. Short cut: $\chi^2 = \frac{120\,[(37)(53) - (13)(17)]^2}{(54)(66)(50)(70)} = 29.1$.

41. Application in various studies: Cohort study. Case control study. Matched case control study.

42. Cohort study: Assumptions: The two samples are independent. Let a+b = number of people exposed to the risk factor. Let c+d = number of people not exposed to the risk factor. Assess whether there is an association between exposure and disease by calculating the relative risk (RR).

43. Example: To test the association between smoking and lung CA in a cohort study. Null hypothesis H0 = no association between smoking and lung CA (RR = 1). H1 = association present between smoking and lung CA. We can define the relative risk of disease from p1 = incidence of disease when exposure is present and p2 = incidence of disease when exposure is absent: RR = p1/p2. Hence for these studies RR = $\frac{a/(a+b)}{c/(c+d)}$.

Smoking    Lung CA present    Lung CA absent    Total
YES        84                 2914              3000
NO         87                 4913              5000
TOTAL      171                7827              8000

RR = (84/3000)/(87/5000) = 1.61. We can test the hypothesis that RR = 1 by calculating the chi-square test statistic, with alpha value = 0.05 and df = 1. CONCLUSION: As $\chi^2$ > 3.84, we reject the null hypothesis of RR = 1 at the 0.05 significance level.

44. Case control study: Assumptions: The samples are independent. Cases = diseased individuals = a+c. Controls = non-diseased individuals = b+d. Assess whether there is an association between exposure and disease by calculating the odds ratio (OR).

45.
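The agreement between the full formula and the short cut on slide 40 can be checked numerically. The sketch below is an illustration, not the deck's own material; it assumes the four observed values sit in the cells a = 37, b = 13, c = 17, d = 53, an arrangement that reproduces the margins 50, 70, 54 and 66 and the expected values shown on the slide.

```python
# 2 x 2 table from slide 40: cells a, b (first row) and c, d (second row).
# The cell arrangement is an assumption consistent with the slide's margins.
a, b, c, d = 37, 13, 17, 53
n = a + b + c + d

# Full formula: sum of (O - E)^2 / E over the four cells,
# with E = row total * column total / n.
row_totals = (a + b, c + d)
col_totals = (a + c, b + d)
observed = ((a, b), (c, d))
full = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2)
    for j in range(2)
)

# Short cut: n (ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)].
shortcut = n * (a * d - b * c) ** 2 / (
    (a + b) * (c + d) * (a + c) * (b + d)
)

print(f"full = {full:.2f}, shortcut = {shortcut:.2f}")
```

Both routes give the same value, so for a 2 x 2 table the short cut avoids computing the four expected counts separately.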
Example: To test the association in a case control study between CHD and smoking. Null hypothesis H0: no association between CHD and smoking (OR = 1). H1: association exists between CHD and smoking (OR > 1 or < 1) [...] 3.84

46. Matched case control study: Case-control pairs are matched on characteristics such as age, race, sex. Assumptions: Samples are not independent. The discordant pairs are case-control pairs with different exposure histories. Pairs in which cases are exposed but controls are not = bb. Pairs in which controls are exposed but cases are not = cc. The matched odds ratio is estimated by bb/cc. Assess whether there is an association between exposure and disease by calculating the matched odds ratio (OR).

47. To test the association of smoking exposure and CHD in a matched case control study: Null hypothesis: no association of smoking exposure and CHD (OR = 1). Alternative hypothesis: association exists between smoking exposure and CHD (OR > 1 or < 1). Test whether OR = 1 by calculating McNemar's statistic.

                                        CHD absent:                 CHD absent:
                                        smoking history present     smoking history absent
CHD present: smoking history present    20                          40 (bb)
CHD present: smoking history absent     10 (cc)                     30

Alpha value = 0.05 and df = 1. OR = 40/10 = 4. $\chi^2 = \frac{(|40 - 10| - 1)^2}{40 + 10} = \frac{841}{50} = 16.82$. Conclusion: We reject the null hypothesis that OR = 1, as the calculated $\chi^2$ > 3.84.

48. Chi square for > 2 independent variables: The chi-square test is used regardless of whether the research question is in terms of proportions or frequencies. Contingency tables can have any number of rows and columns. The sample size needs to increase as the number of categories increases, to keep the expected values of an acceptable size.

49. Limitation of Chi square test: Conditions under which the chi square approximation is adequate: No expected frequency should be