Issues in case-control studies
Internal Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine
Kwang Hyuck Lee

Presenters Name Date
Issues in case-control studies
Eliseo Guallar, MD, DrPH
Juhee Cho, M.A., Ph.D.

Case-control study: historical synonyms
Retrospective study; trohoc study; case comparison study; case compeer study; case history study; case referent study.

Case-control study
                 Disease: Yes (cases)   Disease: No (controls)
Exposed: Yes     A1                     B1
Exposed: No      A0                     B0


Living donor liver transplantation (LDLT): ... 50% ... LDLT ...

January 2006 to December 2008: duct-to-duct (hepaticojejunostomy ...) ...

Analysis groups: LDLT recipients with liver function test (LFT) elevation (AST>80, ALT>80, ALP>250, or bilirubin>2.2).
Group A: ERCP ... vs. ERCP ...
Group B: ... vs. ...
Group C: CT ... ERCP vs. ...

[Flowchart] LDLT patients during 3 years: n=213. Patients with LFT elevation: n=120.
Need ERCP: n=46 (subgroups 23, 7, 5, 3, 3, 5; causes include stricture, leakage, stone).
Do not need ERCP: n=74 (subgroups 58, 13, 3; causes include rejection, infection, HCC, viral reactivation, vessel stenosis, etc.).
CT (−) and need ERCP: 32; CT (−) and do not need ERCP: 40.
Analysis groups A, B, and C.

Case-Control Study or not?


Brock MV, et al. N Engl J Med 2008;358:900-9

Conducting case-control studies
Case and control selection; exposure measurement; odds ratio.

Research: new question → which method?
Clinical study, translational study, or laboratory study.
Clinical studies: observational studies (case-control study vs. cohort study) or randomized controlled trials.

Why case-control studies?
A new question of interest arises, but no cohort study with the appropriate outcome or exposure ascertainment exists, so a new study must be initiated. Do you have the time and/or resources to establish and follow a new cohort?

Case-control study?
Hypothesis: high cholesterol → myocardial infarction (MI). Cases: MI (+); controls: MI (−). Exposure measured: cholesterol level. Result: negative or positive.

Impetus for case-control studies: EFFICIENCY
There may not be sufficient time to observe the development of diseases with long latency periods.
There may not be a sufficiently large cohort to observe outcomes of low incidence.
NOTE: Rare outcomes are not necessary for a case-control study, but they are often the driving reason for one.

Efficiency of the case-control study
Do maternal exposures to estrogens around the time of conception cause an increase in congenital heart defects? Assume RR = 2, two-sided α = 0.05, and 90% power.
Cohort study: if I0 = 8/1,000 and I1 = 16/1,000, 3,889 exposed and 3,889 unexposed mothers would be needed.
Case-control study: if ~30% of women are exposed to estrogens around the time of conception, 188 cases and 188 controls would be needed.
(Schlesselman, p. 17)
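Both sample sizes can be approximately reproduced with the usual normal-approximation formula for comparing two proportions (no continuity correction). This is a sketch under that assumption, not necessarily the exact calculation in Schlesselman; `n_per_group` is a hypothetical helper. For the case-control design, the exposure prevalence among cases is derived from the 30% prevalence among controls and an odds ratio of 2, which stands in for RR = 2 because the outcome is rare (8 to 16 per 1,000).

```python
import math
from statistics import NormalDist

def n_per_group(p0: float, p1: float, alpha: float = 0.05, power: float = 0.90) -> float:
    """Subjects per group to distinguish proportions p0 and p1 (two-sided test,
    normal approximation, no continuity correction)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p0 + p1) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return num / (p1 - p0) ** 2

# Cohort: incidence of heart defects in unexposed (I0) vs exposed (I1) mothers
cohort_n = n_per_group(8 / 1000, 16 / 1000)

# Case-control: exposure prevalence in controls vs cases.
# Prevalence among cases implied by OR = 2 when 30% of controls are exposed.
p_controls = 0.30
odds_ratio = 2.0
p_cases = odds_ratio * p_controls / (1 + p_controls * (odds_ratio - 1))
cc_n = n_per_group(p_controls, p_cases)

print(f"cohort: {math.ceil(cohort_n)} per group; case-control: {math.ceil(cc_n)} per group")
```

The case-control figure rounds up to 188 per group; the cohort figure lands within a few subjects of 3,889, a gap that can come from rounding of the z-values.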

Strengths of the case-control study
Typically efficient: shorter period of time; fewer individuals needed.
Cases are selected, so the design is particularly good for rare diseases.
Informative: may assess multiple exposures and thus hypothesized causal mechanisms.

Learning objectives
Exposure; selection of cases and controls; bias (selection, recall, interviewer, information); odds ratios; matching; nested studies; conducting a case-control study (DCR Chapter 8).

Exposure ascertainment examples
Active methods: questionnaire (self- or interviewer-administered); biomarkers.
Passive methods: medical records; insurance records; employment records; school records.

Exposure ascertainment issues
Establish the biologically relevant period.
Measurement occurs once, at the current time, although exposure may be repeated or may have occurred previously.
Exposure is measured after the outcome has developed: possibility of information bias; possibility of reverse causation (the outcome influences the measure of exposure).

Is this possible in a case-control study?
Biologically relevant period: smoking and radiation exposure measured "yesterday" vs. cancer risk.

Information bias: recall bias
Mothers of babies born with congenital malformations are more likely to recall (accurately or by over-recalling) events during pregnancy such as illnesses, diet, etc.

Possibility of reverse causation
Hypothesis: high cholesterol → myocardial infarction. Cases: MI (+); controls: MI (−); exposure measured: cholesterol level.
Result? MI → decrease in cholesterol level: if cholesterol is measured after the MI, the outcome may have changed the exposure measurement.

Case selection: basic tenets
Eligibility criteria: characteristics of the target and source population.
Diagnostic criteria: definition of a case; misclassification.
Feasibility.

Source population samples
Health providers (clinics, hospitals, insurers); occupations (workplace, unions); surveillance/screening programs; laboratories and pathology records; birth records; existing cohorts; special interest groups (disease foundations or organizations).

Incident versus prevalent cases
Incident cases: all new cases of the disease (that become diagnosed) in a certain period.
Prevalent cases: all current cases, regardless of when the case was diagnosed.

Incident vs. prevalent cases
Do the cases represent all incident cases in the target population?
Exposure → disease association vs. exposure → survival association.

Prevalent cases
Disease with factor A only (A is the causal factor): 1-month survival.
A + B (B is a protective factor): 1-year survival.
A + C (C is a protective factor): 10-year survival.
Patient A: A → 1 month; patient B: A + B → 1 year; patient C: A + C → 10 years.
Prevalent cases (A, B, C): B or C (or an intervention acting through them) may appear to be a cause, when it actually affects survival.
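Under a simple steady-state assumption, prevalence is roughly incidence multiplied by mean disease duration, which makes the survival effect concrete. The sketch assumes, hypothetically, that the three patient types occur with equal incidence; only the survival times come from the slide.

```python
# Hypothetical steady state: prevalent count of each type ~ incidence x mean duration
incidence_per_year = 100          # assumed equal for every patient type
duration_years = {
    "A only (1-month survival)": 1 / 12,
    "A + B (1-year survival)": 1.0,
    "A + C (10-year survival)": 10.0,
}

prevalent = {k: incidence_per_year * d for k, d in duration_years.items()}
total = sum(prevalent.values())
share = {k: n / total for k, n in prevalent.items()}

for k, s in share.items():
    print(f"{k}: {s:.1%} of prevalent cases vs 33.3% of incident cases")
```

Roughly 90% of the prevalent cases carry C, so a comparison of prevalent cases with controls would make C look strongly associated with the disease even though it only prolongs survival.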

Disease severity
Which stage is chosen for cases?
Early stage only: not every case progresses.
Late stage only: influence of severity.
Remedy: increase the sample size to allow stratification by stage.

Early stage only
Case selection was done among prevalent cases of thyroid cancer. Cases: small thyroid cancers; controls: normal population; the study determined the differences between them.
What is the clinical meaning of this study if there is no difference in survival between them?

Late stage only (early diagnosis is difficult)
Pancreatic cancer vs. weight. Cases: late-stage pancreatic cancer, with low weight due to cancer progression.
Conclusion drawn: low weight → pancreatic cancer (reverse causation).
Remedy: increase the sample size to allow stratification.

Selection bias
Selection of cases must be independent of exposure status. Selection may be related to severity, or to hospitalization or visiting.

Example of selection bias (1)
Hypothesis: common cold → asthma. Setting: patients in hospital.
Truth: the common cold is an aggravating factor, not a causal factor; the incidence of asthma does not differ according to common cold.
Common cold (+) → aggravation → hospital visit; common cold (−) → no symptoms → no visit.

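Example of selection bias (2)
              Total      Common cold   Patients in   Common cold
                         in society    hospital      in hospital
Asthma        1,000      10            50            10
General       200,000    2,000         1,000         20 (10 + alpha)

In society, 1% of asthma patients and 1% of the general population have a cold, so there is no association.

                 Cause positive   Cause negative
Case (asthma)    10               40
Control          1                49

Odds ratio = (10 × 49)/(40 × 1) = (1 × 49)/(4 × 1) = 12.25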
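The two odds ratios in this example can be checked with a small helper; `odds_ratio` is a hypothetical name, and its arguments follow the slide's A1/B1/A0/B0 notation (A = cases, B = controls, 1 = exposed, 0 = unexposed). Exposure here is having a common cold.

```python
def odds_ratio(a1, b1, a0, b0):
    """Odds ratio (a1 * b0) / (b1 * a0) for a 2 x 2 table:
    a1, a0 = exposed / unexposed cases; b1, b0 = exposed / unexposed controls."""
    return (a1 * b0) / (b1 * a0)

# Whole society: 10 of 1,000 asthma patients and 2,000 of 200,000 others have a cold
or_society = odds_ratio(a1=10, b1=2000, a0=1000 - 10, b0=200_000 - 2000)

# Hospital sample: 10 of 50 asthma cases had a cold, 1 of 50 hospital controls had a cold
or_hospital = odds_ratio(a1=10, b1=1, a0=40, b0=49)

print(or_society, or_hospital)  # prints: 1.0 12.25
```

The whole difference comes from who reaches the hospital: the cold sends asthma patients to the hospital, so hospital-based cases are enriched for the exposure.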

Case and control selection
Same distribution of risk factors?

Guallar E, et al. N Engl J Med 2002;347:1747-54

Selection of controls: basic tenets
Same target population as the cases; confirmation of lack of the outcome/disease; selection needs to be independent of exposure.

Controls in case-control studies
Controls should have the same proportion of exposed to non-exposed persons as the underlying cohort (source population).
Controls should answer yes to: if they had developed the disease of interest during the study period, would they have been included as a case?

Selecting controls from the same source as the cases
Characteristics: (1) convenient; (2) most likely the same target population; (3) ruling out the outcome avoids misclassification; (4) similar factors leading to inclusion in the source population; (5) sometimes impractical.
Examples: breast cancer screening program (confirmed breast cancer cases; controls without breast cancer); same hospital as the case series (similar referral pattern; examine by illness type); pediatric clinics; geographic population; other special populations (e.g., occupational setting).

Sources for controls
Geographic population: roster needed; probability sampling.
Neighborhood controls: random sample of the neighborhood.
Friends and family members.
Hospital-based controls.

Selection of controls: friends or family members
Ask each case for a list of possible friends who meet the eligibility criteria, then randomly select from the list. This is a type of matching (addressed later).
Concerns: may inadvertently select on exposure status, because friends tend to engage in similar activities or have similar characteristics, culture, and tastes (over-matching).

Am J Epidemiol 2004;159:915-21

Selection of controls: hospital- or clinic-based
Strengths: ease and accessibility; avoids recall bias.
Concerns: selection bias (the exposure may be related to the hospitalization); a mixture of diagnoses is the most defensible control group; referral pattern: same or not?

Diet pattern and colon cancer (GI referral center)
Cases: (+) ... Controls: (−) ... Controls: (+) ...

Guallar E, et al. N Engl J Med 2002;347:1747-54

Weaknesses of case-control studies
Time period from which the cases arose: survival factor; reverse causation.
Biologically relevant period.
Only one outcome measured.
Susceptibility to bias: separate sampling of the cases and controls; retrospective measurement of the predictor variables.


Case and control selection
Same distribution of risk factors?

Selection of cases: case selection in hospitals
Alcohol → hip fractures: all cases visit hospitals.
IUD → abortion (first abortion): some women visit but others do not, and women with an IUD in the general population visit clinics more frequently.
[Figure: 2 × 2 tables (disease/no disease by exposed/non-exposed) for the target population (cells A, B, C, D) and the study sample (cells a, b, c, d)]

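First abortion: 3% risk, unrelated to IUD use; IUD users visit clinics more frequently.
General population: IUD (+), 1,000 women: 970 without / 30 with abortion; IUD (−), 9,000 women: 8,730 / 270.
Hospital population: IUD (+), 90% visit: 873 / 27; IUD (−), 45% visit: 3,928.5 / 121.5 (4,050 women in total).
IUD users per 100 controls: controls from the general population: yes 10, no 90; controls from the hospital population: yes 18, no 82.
Controls from the general population: the difference from cases is due to frequent visits by IUD users.
Controls from the hospital population: theoretically the same exposure distribution as the cases, unless this control group has higher abortion rates due to other problems.
Mixed control group: both considerations apply.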
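The IUD example can be worked through directly. The sketch uses the slide's figures (3% risk regardless of IUD use; 90% vs 45% probability of visiting) and applies the 45% visiting rate to both the 8,730 and the 270 women, which gives 3,928.5 and 121.5. Variable names are hypothetical; in `odds_ratio`, a = cases, b = controls, 1 = IUD users, 0 = non-users.

```python
def odds_ratio(a1, b1, a0, b0):
    """OR = (a1 * b0) / (b1 * a0); a = cases, b = controls, 1 = exposed, 0 = unexposed."""
    return (a1 * b0) / (b1 * a0)

risk = 0.03                        # first-abortion risk, identical with or without an IUD
n_iud, n_no_iud = 1000, 9000       # general population
p_visit_iud, p_visit_no_iud = 0.90, 0.45

# Cases: first abortions seen at the hospital
a1 = n_iud * risk * p_visit_iud            # about 27
a0 = n_no_iud * risk * p_visit_no_iud      # about 121.5
# Controls drawn from the general population (women without abortion)
b1_gen = n_iud * (1 - risk)                # about 970
b0_gen = n_no_iud * (1 - risk)             # about 8,730
# Controls drawn from the hospital population (same visiting pattern as cases)
b1_hosp = b1_gen * p_visit_iud             # about 873
b0_hosp = b0_gen * p_visit_no_iud          # about 3,928.5

or_general = odds_ratio(a1, b1_gen, a0, b0_gen)
or_hospital = odds_ratio(a1, b1_hosp, a0, b0_hosp)
print(f"general-population controls: OR = {or_general:.2f}")
print(f"hospital controls:           OR = {or_hospital:.2f}")
```

With general-population controls the spurious OR equals the ratio of visiting probabilities (0.90/0.45 = 2); hospital controls that share the same visiting pattern cancel it out.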

Actual situation
Limited cases; selection bias from control selection.

Nomura A, et al. N Engl J Med 1991;325:1132-6

Selection bias in a nested case-control study
Controls were excluded if they had had a gastrectomy or a history of peptic ulcer disease.
Controls with cardiovascular disease or cancer at baseline or during follow-up were excluded.
[Figure: 2 × 2 tables (disease/no disease by exposed/non-exposed) for the target population (cells A, B, C, D) and the study sample (cells a, b, c, d)]

MacMahon B, et al. N Engl J Med 1981;304:630-3

Selection bias in a case-control study
Controls were largely patients with diseases of the gastrointestinal tract.
Control patients may have reduced their coffee intake as a consequence of GI symptoms.
[Figure: 2 × 2 tables (disease/no disease by exposed/non-exposed) for the target population (cells A, B, C, D) and the study sample (cells a, b, c, d)]

Antunes CMF, et al. N Engl J Med 1979;300:9-13

Antunes CMF, et al. N Engl J Med 1979;300:9-13
Estimate with non-gynecologic (non-GY) controls: 6.0; with gynecologic (GY) controls: 2.1.

Criticisms of prior case-control studies: diagnostic surveillance bias
Women on estrogens are evaluated more intensively, so they are more likely to be diagnosed, and to be diagnosed at earlier stages.
Women with asymptomatic cancer who receive estrogens are more likely to bleed, and therefore to be diagnosed.
(Antunes CMF, et al. N Engl J Med 1979;300:9-13)

To avoid selection bias in case-control studies
Selection of cases: types of cases selected (non-fatal, symptomatic, advanced); response rates among cases; relation of selection to exposure: are exposed cases more (or less) likely to be included in the study?
Selection of controls: type of controls (general population, hospital, friends and relatives); for hospital controls, the diseases selected as control conditions; response rate among controls; relation of selection to exposure: are exposed controls more (or less) likely to be included in the study?
Similar response rates in cases and controls do NOT rule out selection bias.

Recall issues: recall bias (information bias)
All information in case-control studies is historical, so if it relies on reporting by participants, its accuracy depends on recall.
Concerns: do cases recall prior events differently from controls? The mindset of someone with the disease: "Is there something that I did that may have caused the disease?"

Recall bias example
Mothers of babies born with congenital malformations are more likely to recall (accurately or by over-recalling) events during pregnancy such as illnesses, diet, etc.

Folic acid and neural tube defects
Figure 1: Features of neural tube development and neural tube defects (28th day after fertilization). Botto et al. Neural tube defects. NEJM 1999.

Background and aim
A reduced recurrence risk of neural tube defects (NTDs) has been reported among women receiving multivitamin supplements containing folic acid. However, most NTDs are de novo; less than 10% of NTDs are recurrent.
Aim: first occurrences of NTDs only, in relation to periconceptional folate supplements.

Study population
Cases: NTDs. Controls: other major malformations (chosen because of concern about recall bias).
Subjects with oral clefts were excluded because vitamin supplementation has been hypothesized to reduce their risk (selection bias).
[Figure: target population (pregnant women) → source population → study population]

Overall data
Folate (+): OR = 0.6 (0.4–0.8)

Recall bias: previous knowledge

Recall bias quantification (per 1,000; control recall rate 75% in every row)
                     Case recall   Case   Control   OR      In this study
Real                 -             500    800       0.625   -
All                  80%           400    600       0.667   0.6
Previously known     90%           450    600       0.750   0.8
Previously unknown   75%           375    600       0.625   0.4

Recall bias assessment / avoidance
Check against recorded information, if possible.
Use objective markers or surrogates for exposure (be careful of markers that are affected by the disease).
Ask participants to identify which factor(s) they think are important for the disease.
Build in a false risk factor to test for over-reporting.
Use controls with another disease.


Selection bias
            Case   Control
Exposure +  A      B
Exposure −  C      D
Exposure: lack of folate intake. Vitamin intake is hypothesized to reduce the risk of oral clefts.
If oral clefts were included in the control group, the number of exposed controls (lack of vitamin supplements or folate intake) would increase. As B increases, the odds ratio AD/BC moves toward 1, and the probability of rejecting the null hypothesis decreases.
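A numeric sketch of the same argument, using the slide's A/B/C/D layout (OR = AD/BC). All counts are hypothetical and chosen only to show the direction of the bias: adding cleft controls, who are assumed to be more often exposed, inflates B and pulls the odds ratio toward the null.

```python
def odds_ratio(A, B, C, D):
    """OR = (A * D) / (B * C); A, B = exposed cases / controls, C, D = unexposed cases / controls."""
    return (A * D) / (B * C)

# Hypothetical counts; exposure = lack of folate/vitamin intake
A, C = 60, 40      # NTD cases: exposed, unexposed
B, D = 40, 60      # controls with other malformations (oral clefts excluded)
or_without_clefts = odds_ratio(A, B, C, D)

# Hypothetically add 50 cleft controls, 30 of them exposed
or_with_clefts = odds_ratio(A, B + 30, C, D + 20)

print(f"OR without cleft controls: {or_without_clefts:.2f}")
print(f"OR with cleft controls:    {or_with_clefts:.2f}")
```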

Methods
Periconceptional folic acid exposure was determined by interview with study nurses, covering: demographics; health behavior factors; reproductive history; family history of birth defects; occupation; illnesses (chronic and during pregnancy); use of alcohol, cigarettes, and medications; vitamin use from 6 months before the last menstrual period (LMP) through the end of pregnancy; a semi-quantitative food frequency questionnaire; and knowledge of vitamins and birth defects.

Confounding
Exposure: folate intake → outcome: NTDs. Confounder: alcohol.

Interviewer bias (information bias)
Differential interviewing of cases and controls, i.e., interviewers may probe or interpret responses differently.

Interviewer bias avoidance / assessment
Self-administered instruments (prone to more non-response); standardized instruments; computerized instruments (CADI, ACASI); avoid open-ended questions, and instead use questions with each possible response elicited; training; masking interviewers to the research question; masking interviewers to case/control status; the same interviewers for cases and controls.

Odds ratio
                 Disease: Yes (cases)   Disease: No (controls)
Exposed: Yes     A1                     B1
Exposed: No      A0                     B0
OR = (A1/A0) / (B1/B0) = (A1 × B0) / (A0 × B1)

Example: CHD and diabetes
                 CHD: Yes   CHD: No
Diabetes: Yes    18         365
Diabetes: No     57         5735
The odds ratio has no units!

Some properties of odds ratios
Null value: OR = 1.
OR ≥ 0 (it cannot be negative).
Multiplicative scale (be careful with plots; use a log scale).
Use logistic regression to estimate multivariable-adjusted odds ratios in case-control studies.
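Why logistic regression yields odds ratios in a case-control study: when case status is regressed on a single binary exposure, the fitted coefficient equals the log of the 2 × 2 odds ratio. The sketch below fits that model by Newton-Raphson on grouped counts using only the standard library; `logistic_or` is a hypothetical helper, and a real analysis with covariates would use a statistics package instead.

```python
import math

def logistic_or(a1, b1, a0, b0, iters=25):
    """Fit logit P(case) = beta0 + beta1 * exposed by Newton-Raphson on grouped
    2 x 2 counts (a = cases, b = controls, 1/0 = exposed/unexposed); return exp(beta1)."""
    groups = [(1, a1, a1 + b1), (0, a0, a0 + b0)]   # (exposure x, cases y, total n)
    beta0 = beta1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, n in groups:
            p = 1 / (1 + math.exp(-(beta0 + beta1 * x)))
            r = y - n * p            # score contribution
            w = n * p * (1 - p)      # information weight
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} * score
        beta0 += (h11 * g0 - h01 * g1) / det
        beta1 += (-h01 * g0 + h00 * g1) / det
    return math.exp(beta1)

# Hospital counts from the common-cold example: (10 x 49) / (1 x 40) = 12.25
print(round(logistic_or(a1=10, b1=1, a0=40, b0=49), 4))
```

Because the coefficient lives on the log scale, confidence intervals are computed for log(OR) and then exponentiated, which is why plots of odds ratios are best drawn on a log axis.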

Odds ratios and the rare disease assumption
With incidence density sampling (controls represent the underlying cohort at the time of each case) and sampling of cases and controls independent of exposure: OR ≈ incidence rate ratio.
With outcomes of very low incidence in the underlying cohort and sampling of cases and controls independent of exposure: OR ≈ risk ratio (RR).
Higher incidence increases the bias away from the null.
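The last point can be seen by holding the risk ratio fixed at 2 and letting the baseline risk grow. The risks below are hypothetical; `risk_odds_ratio` is an illustrative helper comparing the odds of disease in exposed vs unexposed people.

```python
def risk_odds_ratio(p1, p0):
    """Odds ratio of disease for risks p1 (exposed) and p0 (unexposed)."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

for p0 in (0.001, 0.01, 0.1, 0.3):
    p1 = 2 * p0  # risk ratio held at 2
    print(f"risk in unexposed {p0:.3f}: RR = 2.00, OR = {risk_odds_ratio(p1, p0):.2f}")
```

At a 0.1% baseline risk the OR is essentially 2, while at 30% it reaches 3.5: the OR always lies further from 1 than the RR, and the gap widens as the outcome becomes common.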

Matching: individual matching; frequency matching; stratified matching.
Nested studies: case-control study; case-cohort study.

Matching in a cohort study: example (Siegel DS, et al. Blood 1999;93:51-4)

Matching in case-control studies: individual matching
Pairing or grouping controls with each case by known risk factors in the design phase, i.e., when selecting controls.
In the protocol, define the matching characteristics and their boundaries. Dichotomous or categorical: self-explanatory (e.g., sex, race, blood type, disease stage). Continuous: can be exact, or typically a window (e.g., age ± 5 years, CD4 cell count ± 50 cells).
For each recruited case, search the control source population for the person(s) who meet the matching criteria, and select 1 or more of them at random.
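The procedure above can be sketched as follows, assuming exact matching on sex and a ± 5-year age window. The `Person` record, `match_controls`, and the example people are all hypothetical, and each control is used for at most one case.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:          # hypothetical record type for illustration
    pid: str
    sex: str
    age: int

def match_controls(cases, pool, k=1, age_window=5, seed=0):
    """Pick up to k controls per case: same sex, age within +/- age_window years.
    Controls are drawn at random from the eligible pool, without replacement across cases."""
    rng = random.Random(seed)
    available = list(pool)
    matches = {}
    for case in cases:
        eligible = [p for p in available
                    if p.sex == case.sex and abs(p.age - case.age) <= age_window]
        chosen = rng.sample(eligible, min(k, len(eligible)))
        for person in chosen:
            available.remove(person)   # each control is used for one case only
        matches[case.pid] = chosen
    return matches

cases = [Person("c1", "F", 50), Person("c2", "M", 62)]
pool = [Person("p1", "F", 47), Person("p2", "F", 58), Person("p3", "M", 60),
        Person("p4", "M", 70), Person("p5", "F", 53)]
for case_id, controls in match_controls(cases, pool, k=2).items():
    print(case_id, sorted(p.pid for p in controls))
```

Because cases are processed in order, an early case can use up a control that a later case also needed; with a sparse control pool, some cases may end up with fewer than k matches.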

Odds ratio: matched pairs
Case   Control   # pairs
A1     B1        n11
A1     B0        n10
A0     B1        n01
A0     B0        n00
N = total number of pairs; N pairs = N cases and N controls = 2N people.
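In a matched-pair analysis only the discordant pairs carry information: the matched (conditional maximum-likelihood) odds ratio is n10/n01, and McNemar's test compares n10 with n01. The helper name and the pair counts below are hypothetical.

```python
def matched_pair_analysis(pairs):
    """pairs: list of (case_exposed, control_exposed) booleans, one tuple per matched pair.
    Returns the matched-pair odds ratio n10 / n01 and McNemar's chi-square (no continuity correction)."""
    n10 = sum(1 for case, ctrl in pairs if case and not ctrl)   # case exposed, control not
    n01 = sum(1 for case, ctrl in pairs if ctrl and not case)   # control exposed, case not
    chi2 = (n10 - n01) ** 2 / (n10 + n01)
    return n10 / n01, chi2

# Hypothetical 100 pairs: n11 = 20, n10 = 30, n01 = 10, n00 = 40
pairs = ([(True, True)] * 20 + [(True, False)] * 30 +
         [(False, True)] * 10 + [(False, False)] * 40)
or_matched, chi2 = matched_pair_analysis(pairs)
print(or_matched, chi2)  # prints: 3.0 10.0
```

The 60 concordant pairs (n11 and n00) do not enter either statistic, which is why matching on a factor that makes cases and controls share the same exposure costs efficiency.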

Antunes CMF, et al. N Engl J Med 1979;300:9-13

Frequency matching
Select cases; examine the distribution of the potential confounder (matching variable); select controls so that they have the same distribution of the potential confounder; conduct stratified analyses or regression to control for the induced selection bias.
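Frequency matching reduces to a recruitment quota per stratum of the matching variable. The sketch below assumes hypothetical age groups for 60 enrolled cases and two controls per case; `frequency_matching_quotas` is an illustrative name.

```python
from collections import Counter

def frequency_matching_quotas(case_strata, controls_per_case=1):
    """Controls to recruit per stratum so that controls mirror the cases'
    distribution of the matching variable."""
    return {stratum: n * controls_per_case
            for stratum, n in Counter(case_strata).items()}

# Hypothetical age groups of 60 enrolled cases
case_age_groups = ["40-49"] * 12 + ["50-59"] * 30 + ["60-69"] * 18
quotas = frequency_matching_quotas(case_age_groups, controls_per_case=2)
print(quotas)  # {'40-49': 24, '50-59': 60, '60-69': 36}
```

The analysis must still stratify on (or adjust for) age group, because the quotas force controls to resemble cases on age and thereby introduce the selection bias mentioned on the slide.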

Stratified sampling: an alternative to matching
Decide up front what distribution of cases and controls according to the confounder is desired, and select cases and controls so that these expectations are met.
Selection of controls does not depend on cases being selected first.
Note that the distribution of the confounder is not the distribution one may see among all cases in the population.

Stratified sampling example
Want 50% females among 100 cases and 100 controls: 50 female and 50 male cases; 50 female and 50 male controls.
In the study period, 175 incident male cases and 75 incident female cases occur.
As cases occur, enroll them until 50 are recruited in each stratum.
Throughout the study period, enroll 50 male and 50 female controls.
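The enrollment rule in this example can be sketched as a quota filter over incident cases in arrival order. The arrival order is a hypothetical random shuffle; only the counts (175 male, 75 female, quota 50) come from the slide, and the same quotas would apply to controls.

```python
import random

def enroll_stratified(arrivals, quota):
    """Enroll incident cases in arrival order until each stratum reaches its quota."""
    enrolled = {}
    for stratum in arrivals:
        if enrolled.get(stratum, 0) < quota:
            enrolled[stratum] = enrolled.get(stratum, 0) + 1
    return enrolled

# Slide numbers: 175 male and 75 female incident cases over the study period,
# arriving in a (hypothetical) random order
arrivals = ["male"] * 175 + ["female"] * 75
random.Random(42).shuffle(arrivals)

cases = enroll_stratified(arrivals, quota=50)
print(cases)  # 50 per stratum, although 70% of incident cases were male
```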

Matching limitations
Cannot examine the independent effect of the matched variable on the outcome, because cases and controls are balanced for the matched factor.
May be costly to perform.
May inadvertently match on the exposure itself or its surrogate, on a factor in the causal pathway, or on a factor that is affected by the outcome.
Matching on an exposure-related factor that is not a disease determinant may reduce statistical efficiency (matched cases and controls with the same exposure are not used in the matched analysis).
Logistical complexity of matching.

Matching strengths
The cost of finding a matched control may be lower than the cost of performing tests to assess confounding, or of recruiting additional controls to yield enough persons across the entire range of the confounding variable.
Particularly useful when the distribution of confounders is very different in cases and controls.
Increases the amount of information per subject.
Matching yields the same ratio of cases to controls across the distribution of the matched variable.

Nested studies
In an existing cohort study, new questions arise. An efficient method is needed to use the existing information, without applying new measurements to the entire cohort, because of limited resources.
Nest a study without sacrificing validity or too much precision.
Some nesting options: case-cohort; sub-cohort; case-control.

Nested case-control and case-cohort studies
Case-comparison studies: use all cases, or a representative subset, as of the date of analysis. Comparison group: cohort members, for all nested designs.
Study design   Comparison group
Case-control   Event-free members at the time of the case's event (incidence density sampling)
Case-cohort    Members of a subcohort, selected at random from the cohort at enrollment, who are at risk at the time of the case's event (the subcohort risk set)

Full cohort (subjects S1–S8; time axis 10, 20, 30, 35)
Event   Cases (A)   Who      At risk (N)   Risk set
1st     1           S1       8             S1, S2, S3, S4, S5, S6, S7, S8
2nd     1           S6       6             S3, S4, S5, S6, S7, S8
3rd     2           S3, S8   4             S3, S4, S7, S8

Case-cohort study
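A sketch of how case-cohort comparison sets are formed. The exit times are hypothetical, chosen to be consistent with the full-cohort risk sets (events at 10, 20, and 30), and the subcohort is a fixed stand-in for one drawn at random at enrollment. Each case is compared with subcohort members still at risk at its event time; a subcohort member who later becomes a case stays in the risk set while at risk.

```python
# Hypothetical follow-up: subject -> (exit time, had event)
cohort = {
    "S1": (10, True), "S2": (15, False), "S3": (30, True), "S4": (35, False),
    "S5": (25, False), "S6": (20, True), "S7": (35, False), "S8": (30, True),
}
# Hypothetical subcohort drawn at random at enrollment (fixed here for reproducibility)
subcohort = {"S2", "S4", "S5", "S7"}

def case_cohort_sets(cohort, subcohort):
    """For each event time: (cases, subcohort members still at risk)."""
    out = {}
    for t in sorted({t for t, event in cohort.values() if event}):
        cases = sorted(s for s, (exit_t, event) in cohort.items() if event and exit_t == t)
        comparison = sorted(s for s in subcohort if cohort[s][0] >= t)
        out[t] = (cases, comparison)
    return out

for t, (cases, comparison) in case_cohort_sets(cohort, subcohort).items():
    print(t, cases, "vs", comparison)
```

Unlike the nested case-control design, the comparison pool is fixed once at enrollment, so the same subcohort can serve several different outcomes.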

Nested case-control study (subjects S1–S8; time axis 10, 20, 30, 35)
Event   Cases (A)   Who      At risk (N)   Potential controls
1st     1           S1       8             S2, S3, S4, S5, S6, S7, S8
2nd     1           S6       6             S3, S4, S5, S7, S8
3rd     2           S3, S8   4             S4, S7
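Incidence density sampling can be sketched as follows. The exit times are hypothetical, chosen so that the risk sets reproduce the potential controls in the table; `risk_sets` and `sample_controls` are illustrative names. Note that S6, a later case, is still a potential control for S1, and one subject may be sampled as a control for more than one case.

```python
import random

# Hypothetical follow-up: subject -> (exit time, had event)
cohort = {
    "S1": (10, True), "S2": (15, False), "S3": (30, True), "S4": (35, False),
    "S5": (25, False), "S6": (20, True), "S7": (35, False), "S8": (30, True),
}

def risk_sets(cohort):
    """For each event time: (cases, potential controls = at risk and event-free at that time)."""
    out = {}
    for t in sorted({t for t, event in cohort.values() if event}):
        at_risk = {s for s, (exit_t, _) in cohort.items() if exit_t >= t}
        cases = {s for s, (exit_t, event) in cohort.items() if event and exit_t == t}
        out[t] = (sorted(cases), sorted(at_risk - cases))
    return out

def sample_controls(sets, m=1, seed=0):
    """Draw m controls at random for each case from its own risk set."""
    rng = random.Random(seed)
    return {t: {case: rng.sample(pool, min(m, len(pool))) for case in cases}
            for t, (cases, pool) in sets.items()}

sets = risk_sets(cohort)
for t, (cases, pool) in sets.items():
    print(t, cases, "potential controls:", pool)
print(sample_controls(sets, m=1))
```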

A cohort study
Three events (cases) occur among 8 people, of whom 5 are ever exposed.
[Figure: follow-up of each person over time; exposed persons are solid lines, unexposed are dashed, and dots are events]

A nested case-control study
Compare the 3 cases with 3 non-cases (at event time) among cohort members: incidence density sampling.
[Figure: persons over time]

A case-control study
Compare the 3 cases with 3 non-cases (at event time) among cohort members, but what is the cohort? Cases and controls arise from some underlying cohort! Incidence density sampling.
[Figure: persons over time]

Designing a case-control study: Overview I
What is the research question? In what target population?
What source(s) will be used? How long will recruitment take?
What is the definition of the cases? What confirmation is needed? Is screening/additional testing necessary?
Will prevalent cases be used? Does exposure influence the disease prognosis?
What is the underlying cohort? How many cases are seen per year in the source?

Designing a case-control study: Overview II
What are the eligibility criteria for controls? What source(s) will be used to identify controls? Do they represent the same underlying cohort as the cases?
What confirmation is needed? Is screening/additional testing necessary?
Sampling methods? Will the controls be selected throughout the study period? Can they be selected as cases if they later develop disease? Do additional sources need to be used?
For both cases and controls, does exposure status affect inclusion in the source populations or participation?

Designing a case-control study: Overview III
Are there known confounders? Should matching be used?
What methods will be used to recruit cases and controls?
What methods will be used to obtain information about exposures and potential confounders? Active or passive?
Are the methods of data collection objective and independent of case/control status?
If interviewing is involved, what methods are in place to avert and monitor differential recall by case/control status?
If the study involves personnel-administered data collection, are the personnel masked to case/control status?

