SAMS EBM Online Course: Observational Study Designs

Observational Study Designs Ahmad Al-Moujahed, M.D.

Ph.D. Student at Boston University School of Medicine and Mass. Eye and Ear Infirmary, Harvard Medical School

2015 Evidence Based Medicine Course

May 2, 2015

Outline Introduction about some terminology.

Cohort studies: design, population and control selection,

potential biases, and examples.

Case-control studies: design, case and control selection,

potential biases, issues with controls, and examples.

Cohort-nested studies.

Cross-sectional studies.

The Natural History of Disease in a Patient

Gordis, Epidemiology, 2013

Types of Clinical Questions

Supporting Clinical Care: An Institute in Evidence-Based Practice for Medical Librarians Dartmouth College

Prevalence Vs. Incidence


Relationship between incidence and prevalence.
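The relationship between incidence and prevalence can be made concrete with a small calculation. The sketch below assumes a steady-state population and a rare disease, in which case prevalence is approximately the incidence rate multiplied by the mean duration of disease. The function name and the input numbers are hypothetical and are not taken from the slides.

```python
def approximate_prevalence(incidence_rate: float, mean_duration_years: float) -> float:
    """Approximate prevalence as incidence rate x mean disease duration.

    Assumes a steady state and a low prevalence (rare disease).
    """
    return incidence_rate * mean_duration_years


# Hypothetical inputs: 5 new cases per 1,000 person-years, mean duration 4 years.
prevalence = approximate_prevalence(5 / 1000, 4)
print(f"Approximate prevalence: {prevalence:.1%}")
```

With these inputs the approximation gives a prevalence of about 2%. A longer disease duration raises prevalence even if incidence is unchanged.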


Bias and Confounding

Bias: systematic errors in any type of epidemiological or clinical study that result in an incorrect estimate of the association between exposures and outcomes.

Confounding: a distortion (inaccuracy) in the estimated measure of association that occurs

when the primary exposure of interest is mixed up with some other factor that is associated

with the outcome.

Characteristics of Confounding

There are 3 conditions that must be present for

confounding to occur:

1. The confounding factor must be associated with both

the risk factor of interest and the outcome.

2. The confounding factor must be distributed unequally

among the groups being compared.

3. A confounder cannot be an intermediary step in the

causal pathway from the exposure of interest to the

outcome of interest.

http://sphweb.bumc.bu.edu/
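The three conditions for confounding can be illustrated with a stratified calculation. The sketch below uses an entirely hypothetical cohort in which age group is the confounder: older people are more often exposed and have a higher risk of the outcome, while the exposure itself has no effect within either age group. All counts and names are assumptions made for illustration.

```python
# Hypothetical cohort stratified by a potential confounder (age group).
# Each entry is (cases, total) for the exposed and unexposed groups.
strata = {
    "older": {"exposed": (80, 800), "unexposed": (20, 200)},
    "younger": {"exposed": (4, 200), "unexposed": (16, 800)},
}


def risk_ratio(exposed, unexposed):
    """Risk in the exposed divided by risk in the unexposed."""
    exposed_cases, exposed_total = exposed
    unexposed_cases, unexposed_total = unexposed
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)


def pooled(group):
    """Collapse the strata into one crude (cases, total) pair for a group."""
    cases = sum(stratum[group][0] for stratum in strata.values())
    total = sum(stratum[group][1] for stratum in strata.values())
    return cases, total


crude_rr = risk_ratio(pooled("exposed"), pooled("unexposed"))
stratum_rr = {
    name: risk_ratio(stratum["exposed"], stratum["unexposed"])
    for name, stratum in strata.items()
}

print(f"Crude RR: {crude_rr:.2f}")
for name, rr in stratum_rr.items():
    print(f"RR among {name}: {rr:.2f}")
```

In this made-up example the crude risk ratio is about 2.3, yet the risk ratio within each age group is 1.0, so the crude association is produced entirely by the unequal age distribution of the exposed and unexposed groups.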


Is the Association Causal?

If we observe an association between an exposure and a disease or another outcome,

the question is: Is the association causal?


Overview of clinical and epidemiological study designs

Descriptive

Populations

Individuals

• Case report

• Case series

Analytic studies

Observational

• Cross-sectional studies

• Case control

• Cohort

- Retrospective

- Prospective

Interventional/Experimental

• Randomized controlled trial

• Clinical trial

• Field trials

Cohort Studies

Cohort: a group of similar people followed through time together.

A cohort study follows participants through time to calculate the rate at which new (incident)

disease occurs and to identify risk factors for the disease.

Closed (Fixed) vs. Open Cohorts

A closed cohort is one with fixed membership. Example: Japanese atomic bomb survivors

An open cohort is dynamic; members can leave or be added over time. Example: state

cancer registry.

Most cohort studies are conducted in closed (or fixed) cohorts.

Design of a Cohort Study




Risk Ratio (RR): tells you how many times higher or lower the disease risk is among the exposed as compared to the unexposed. It is commonly used in etiologic research.

Risk Difference (RD): tells you the absolute effect of exposure on disease occurrence.
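A short calculation shows how the two measures differ. This is a minimal sketch with hypothetical counts (30 of 1,000 exposed and 10 of 1,000 unexposed develop the disease); the function names are illustrative only.

```python
def risk(cases: int, total: int) -> float:
    """Cumulative incidence (risk) in one group."""
    return cases / total


def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """RR: how many times higher (or lower) the risk is among the exposed."""
    return risk(exposed_cases, exposed_total) / risk(unexposed_cases, unexposed_total)


def risk_difference(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """RD: absolute excess risk among the exposed."""
    return risk(exposed_cases, exposed_total) - risk(unexposed_cases, unexposed_total)


# Hypothetical cohort: 30 of 1,000 exposed and 10 of 1,000 unexposed develop the disease.
rr = risk_ratio(30, 1000, 10, 1000)
rd = risk_difference(30, 1000, 10, 1000)
print(f"RR = {rr:.2f}, RD = {rd:.3f} ({rd * 1000:.0f} extra cases per 1,000 exposed)")
```

Here the exposed have three times the risk of the unexposed (RR), which corresponds to 20 additional cases per 1,000 exposed people (RD).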

Selection of a Cohort Study Population

General population cohort (defined population).

Special exposure cohort (exposed vs. non-exposed).


General population cohort (defined population)


Select a defined population before any of its members

become exposed or before their exposures are

identified.

For common risk factors (e.g., smoking, obesity).

Examples:

• The general population (e.g., the Framingham Heart

Study).

• A particular subset of the general population (e.g., Nurses’ Health Study and British Doctors Health Study).


Special exposure cohort (exposed vs. non-exposed)

Select groups for inclusion in the study on the basis of whether or not they were exposed (e.g., occupationally exposed cohorts).

For uncommon risks (e.g., occupational risks).

Examples:

• Soldiers exposed to dioxin (Agent Orange) in Vietnam.

• Survivors of the bombing of Hiroshima and Nagasaki.



By geographical region

• Framingham Heart Study

By occupational group

• Nurses’ Health Study

• British Doctors Health Study

By disease

• Multicenter AIDS Cohort Study (MACS)

By risk groups

• IV Drug Users cohort (ALIVE Study in Baltimore - AIDS Linked to the Intravenous

Experience)

By exposure event

• Japanese Atomic Bomb Survivors


Selection of a Cohort Study Population

Converting a cross-sectional survey into a cohort design

Pai M, Gokhale K, Joshi R, et al. JAMA. 2005;293(22):2746-2755.

American Journal of Respiratory and Critical Care Medicine. 2006;174(3):349–355.

The Comparison (Control) Group in Cohort Studies

Two essential things in selecting the comparison group in a cohort study:

The unexposed (or less exposed) comparison group should be as similar as possible with

respect to other factors that could influence the outcome being studied (possible confounding

factors).

Information collection should be as accurate and as comparable as possible in all groups in order to avoid biasing the association.

General Types of Comparison Groups for Cohort Studies

1. An internal comparison group: generally the best.

2. An external comparison cohort: to study occupational exposures.

3. The general population

Types of Cohort Studies

Prospective cohort study.

Retrospective cohort study.

Ambidirectional study.

Grimes et al. Lancet 2002;359:341-45

Prospective Cohort Study (concurrent cohort or longitudinal study)


Retrospective Cohort Study (historical cohort study or nonconcurrent prospective study)


Prospective vs. Retrospective Cohort Study


Potential Biases in Cohort Studies

Bias in assessment of the outcome.

Information bias.

Biases from nonresponse and losses to follow-up.

Analytic bias.

Advantages of Cohort Studies

1. More clearly indicate the temporal sequence between exposure and outcome.

2. Allow calculation of the incidence of disease in each group, so we can calculate:

• Absolute risk (incidence)

• Relative risk (risk ratio or rate ratio)

• Risk difference

• Attributable proportion (attributable risk %)

3. Particularly useful for evaluating the effects of rare or unusual exposures.

4. Enable examination of multiple outcomes of a single risk factor.

5. Cohort studies, especially prospective cohort studies, reduce the possibility that the results will be biased by the way subjects are selected, because the outcome is not yet known when exposure status is established.
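Of the measures listed above, the attributable proportion is the one least often computed by hand. The sketch below uses the standard formula (incidence in the exposed minus incidence in the unexposed, divided by incidence in the exposed); the incidences are hypothetical and the function name is illustrative.

```python
def attributable_proportion(incidence_exposed: float, incidence_unexposed: float) -> float:
    """Percentage of the incidence among the exposed that is attributable to the exposure."""
    return (incidence_exposed - incidence_unexposed) / incidence_exposed * 100


# Hypothetical incidences: 30 per 1,000 in the exposed and 10 per 1,000 in the unexposed.
ap = attributable_proportion(30 / 1000, 10 / 1000)
print(f"Attributable proportion among the exposed: {ap:.1f}%")
```

With these numbers about two thirds of the disease among exposed people would be attributable to the exposure, assuming the association is causal and unconfounded.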


Disadvantages of Cohort Studies

Disadvantages of Prospective Cohort Studies

1. May have to follow large numbers of subjects for a long time.

2. Can be very expensive and time consuming.

3. Not good for diseases with a long latency.

Disadvantages of Retrospective Cohort

Studies

1. If one uses records that were not designed

for the study, the available data may be of

poor quality.

2. There is frequently an absence of data on

potential confounding factors if the data was

recorded in the past.

3. It may be difficult to identify an appropriate

exposed cohort and an appropriate

comparison group.

4. Not good for rare diseases.

5. Differential loss to follow up can introduce bias.


The Framingham Study

Began in 1948.

Residents were considered eligible if they were

between 30 and 62 years of age.

The cohort consisted of 5,127 men and women who were free of cardiovascular disease at the time of study entry.

Many “exposures” were defined, including

smoking, obesity, elevated blood pressure,

elevated cholesterol levels, low levels of

physical activity, and other factors.

www.framinghamheartstudy.org

Gordis, Epidemiology, 2013

The incidence of CHD increases with age. It occurs earlier and more frequently in males.

Persons with hypertension develop CHD at a greater rate than those who are normotensive.

Elevated blood cholesterol level is associated with an increased risk of CHD.

Tobacco smoking and habitual use of alcohol are associated with an increased incidence of

CHD.

Increased physical activity is associated with a decrease in the development of CHD.

An increase in body weight predisposes a person to the development of CHD.

An increased rate of development of CHD occurs in patients with diabetes mellitus.

www.framinghamheartstudy.org

The Framingham Study: the Tested Hypotheses


1960 Cigarette smoking found to increase the risk of heart disease

1961 Cholesterol level, blood pressure, and electrocardiogram abnormalities found to increase the risk of heart disease

1967 Physical activity found to reduce the risk of heart disease and obesity to increase the risk of heart disease

1970 High blood pressure found to increase the risk of stroke

1970 Atrial fibrillation increases stroke risk 5-fold

1976 Menopause found to increase the risk of heart disease

1978 Psychosocial factors found to affect heart disease

1988 High levels of HDL cholesterol found to reduce risk of death

1994 Enlarged left ventricle (one of two lower chambers of the heart) shown to increase the risk of stroke

1996 Progression from hypertension to heart failure described

1998 Framingham Heart Study researchers identify that atrial fibrillation is associated with an increased risk of all-cause mortality.

1998 Development of simple coronary disease prediction algorithm involving risk factor categories to allow physicians to predict multivariate coronary heart disease risk in patients without overt CHD

1999 Lifetime risk at age 40 years of developing coronary heart disease is one in two for men and one in three for women

The Framingham Study: Research Milestones




The Framingham Study: Publications


Nurses’ Health Study

Began in 1976.

Cohort: married registered nurses who were

aged 30 to 55 in 1976, who lived in the 11 most

populous states.

Approximately 122,000 nurses out of the

170,000 mailed responded.

www.channing.harvard.edu/nhs/

Original goal was to evaluate risks of oral

contraceptives.

Has become one of the principal sources of observational

data on diet and chronic diseases.

Questionnaires are periodically mailed out to thousands of

nurses.

BMJ 2008;337:a1440


Incidence of Breast Cancer and Progesterone Deficiency

Research Question:

Is the relationship between late age at first pregnancy and increased risk of breast cancer related to the finding that early first pregnancy protects against breast cancer (and

therefore such protection is missing in women who have a later pregnancy or no

pregnancy), or are both a delayed first pregnancy and an increased risk of breast cancer

the result of some third factor, such as an underlying hormonal abnormality?

Am J Epidemiol 114:209–217, 1981

Design of Cowan's retrospective cohort study of breast cancer. (Data from Cowan LD, Gordis L, Tonascia JA, et  al: Breast cancer incidence in women with progesterone deficiency. Am J Epidemiol 114:209–217, 1981.)



Cohort Studies for Investigating Childhood Health and Disease

Examples

Follow-up studies of fetuses exposed to radiation from atomic bombs in Hiroshima and Nagasaki during World War II.

The Collaborative Perinatal Study, begun in the United States in the 1950s, was a multicenter cohort study that followed more than 58,000 children from birth to age 7 years.

Challenging questions:

1. At what point should the individuals in the cohort first be identified?

2. Should the cohort be drawn from one center or from a few centers, or should it be a national sample drawn in an attempt to make the cohort representative of a national population? Will the findings of studies based on the cohort be broadly generalizable only if the cohort is drawn from a national sample?

3. For how long should a cohort be followed?

4. What hypotheses and how many hypotheses should be tested in the cohort that will be established?

Points to Look For While Reading Cohort Studies

1. Who is at risk? (Selection)

2. Who is exposed? (Selection)

3. Who is an appropriate control? (Control)

4. Have outcomes been assessed equally? (Outcome)

Grimes et al. Lancet 2002;359:341-45

Hypothetical Scenario

Note the following aspects:

1. The disease is rare.

2. There is a fairly large number of

exposed individuals in the state, but

most of these are not diseased.


RR = Relative Risk (Risk Ratio) = (700/1,000,000) / (600/5,000,000) = 5.83

"The purpose of the control group is to determine the relative size of the exposed and unexposed components of the source population."

OR = Odds Ratio = (700/1,000) / (600/5,000) = 5.83
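The two calculations above can be reproduced side by side. In this sketch the case counts and population denominators come from the scenario; the 1,000 and 5,000 in the OR formula are read as exposed and unexposed controls sampled in the same 1:5 ratio as the source population, which is an interpretation of the slide. Variable names are illustrative.

```python
# Source population in the hypothetical scenario.
exposed_cases, exposed_population = 700, 1_000_000
unexposed_cases, unexposed_population = 600, 5_000_000

# Cohort-style calculation: requires the full denominators.
risk_ratio = (exposed_cases / exposed_population) / (unexposed_cases / unexposed_population)

# Case-control calculation: controls only estimate the relative size of the
# exposed and unexposed parts of the source population (here 1,000 : 5,000).
exposed_controls, unexposed_controls = 1_000, 5_000
odds_ratio = (exposed_cases / exposed_controls) / (unexposed_cases / unexposed_controls)

print(f"RR = {risk_ratio:.2f}, OR = {odds_ratio:.2f}")
```

Both values come out at 5.83: because the controls preserve the 1:5 exposure ratio of the source population, the odds ratio from 6,000 controls matches the risk ratio that would otherwise require data on 6,000,000 people.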



Clinical Scenarios

Suppose you are a clinician and you have seen a few patients with a certain type of

cancer, almost all of whom report that they have been exposed to a particular

chemical. You hypothesize that the exposure is related to the risk of developing this

type of cancer. How would you go about confirming or refuting your hypothesis?

In the 1940s, Sir Norman Gregg, an Australian ophthalmologist, observed a number of

infants and young children in his ophthalmology practice who presented with an unusual

form of cataract. Gregg noted that these children had been in utero during the time of

a rubella outbreak. He suggested that there was an association between prenatal rubella

exposure and the development of the unusual cataracts.

In the early 1940s, Alton Ochsner, a surgeon in New Orleans, observed that virtually all of the

patients on whom he was operating for lung cancer gave a history of cigarette smoking.

He hypothesized that cigarette smoking was linked to lung cancer.


Case-Control Studies

Individual participants in a case-control study are selected for inclusion in the study based on

their disease status.

• Cases = participants with the disease of interest.

• Controls = participants without the disease.

Both cases and controls are asked the same set of questions about past exposures.

A case definition should specify exactly what characteristics must be present or absent for a

person to be deemed a case.

Design of a Case-control Study





Doll R, Hill AB: A study of the aetiology of carcinoma of the lung. BMJ 2:1271–1286, 1952



Selection of Cases in Case-Control Studies

A key initial step is identifying an appropriate and accessible source of individuals with the disease of interest:

Hospitals.

Specialty clinics.

Public health agencies.

Disease registries.

Death certificates.

Cross-sectional surveys.

Disease support groups.

Incident or Prevalent Cases?

Prevalent cases: more practical. However, identified risk factors using prevalent cases may be related more to survival with the disease than to the development of the disease (incidence).

Incident cases: preferable in case-control studies of disease etiology.

Let’s Think About This!

Does tuberculosis protect against cancer?

Pearl concluded that tuberculosis had an antagonistic or protective effect against cancer.

How could Pearl have overcome this problem in his study?

A fundamental conceptual issue: should the controls be similar to the cases in all respects other than having the disease in question, or should they be representative of all persons without the disease in the population from which the cases are selected?


Selection of Controls in Case-control Studies

1. The comparison group ("controls") should be

representative of the source population that produced

the cases.

2. The "controls" must be sampled in a way that is

independent of the exposure, meaning that their selection

should not be more (or less) likely if they have the

exposure of interest.

3. Controls must be reasonably similar to cases except for

their disease status

4. The inclusion and exclusion criteria for cases that do not

specifically relate to the disease should also apply to

controls. - For example, if cases must be males between 25 and 39

years of age, controls must also be men in this age group.

Gordis, Epidemiology, 2013 http://sphweb.bumc.bu.edu/

OR = (700/1,500) / (600/4,500) = 3.50

Selection Bias in Case-control Studies

OR = Odds Ratio = (700/1,000) / (600/5,000) = 5.83
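The contrast between the two odds ratios above illustrates selection bias in control sampling. The sketch below recomputes both from the counts shown; interpreting 1,500 and 4,500 as a control group in which exposed people were over-selected is an assumption based on the slide title, and the function name is illustrative.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Exposure odds ratio from the four cells of a case-control table."""
    return (exposed_cases / exposed_controls) / (unexposed_cases / unexposed_controls)


# Controls that mirror the source population (1,000 exposed : 5,000 unexposed).
representative = odds_ratio(700, 600, 1_000, 5_000)

# Controls whose selection favored exposed people (1,500 exposed : 4,500 unexposed).
biased = odds_ratio(700, 600, 1_500, 4_500)

print(f"Representative controls: OR = {representative:.2f}")
print(f"Exposure-dependent control selection: OR = {biased:.2f}")
```

The cases are identical in both calculations; only the exposure distribution of the controls changes, and that alone moves the estimate from 5.83 to 3.50.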


Nonhospitalized persons as controls:

• Probability sample of the total population

• School lists

• Insurance company lists

• Selective service lists

• Neighborhood controls

• Best friend control

Sources of Controls in Case-control Studies


Hospitalized Patients as controls:

• Easier to identify

• More likely to participate than general population controls.

• Minimize selection bias because they generally come from the same source population

(provided referral patterns are similar).

• Recall bias would be minimized, because they are sick, but with a different diagnosis.

• More economical.


If cases are obtained from a medical facility, the comparison groups should be obtained from

the same facility, provided they meet two criteria:

1. They have diseases that are unrelated to the exposure being studied.

2. They have diseases with referral patterns similar to those of the cases, in order to minimize selection bias.

Considerations:

Hospital patients differ from people in the community.

A disease group is unlikely to be representative of the general reference population.

Should we use a sample of all other patients admitted to the hospital (other than those with the cases' diagnosis), or should we select a specific “other diagnosis”?

Hospitalized Patients as Controls

Example: case-control study of lung cancer and smoking.

• Do we exclude from our control group those persons who have other smoking-related diagnoses,

such as coronary heart disease, bladder cancer, pancreatic cancer, and emphysema?

• One alternative may be “subgroup analysis”

Problems In Control Selection

N Engl J Med 304:630–633, 1981.





Did patients with cancer of the pancreas drink more coffee than did people without cancer of the pancreas in the same population?



Selection Bias

Lancet 2002: 359: 431–34

Use of Multiple Controls in Case-control Studies

Multiple controls of the same type: to increase the power of the study.

Multiple controls of different types: in case we are concerned that the exposure of the

hospital controls used in our study may not represent the rate of exposure that is “expected” in

a population of nondiseased persons.

Multiple Controls of Different Types

Am. J. Epidemiol. (1979) 109 (3): 309-319.

Study groups in Gold's study of brain tumors in children.



Did mothers of children with brain tumors have more prenatal radiation exposure than control mothers?

• The carcinogenic effect of prenatal radiation is NOT site specific.

• Recall bias?

• The carcinogenic effect of prenatal radiation is specific for the brain.

• Recall bias is unlikely to be the explanation.



Matching in Case-Control Studies

Three basic options for matching cases and controls:

No matching.

Group (frequency) matching: the proportion of controls with a certain characteristic is

identical to the proportion of cases with the same characteristic.

Individual (matched-pairs) matching: for each case selected for the study, a control is selected who is similar to the case in terms of the specific variable or variables of concern.

Individual matching is often used in case-control studies that use hospital controls and in genetic studies.

Matching: the process of selecting the controls so that they are similar to the cases in

certain characteristics, such as age, race, sex, socioeconomic status, and occupation.
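Group (frequency) matching can be sketched as stratified sampling from a pool of potential controls. The example below is one possible implementation under simple assumptions: one control per case, a single matching variable, and a fixed random seed for reproducibility. The data, the `frequency_match` function, and its parameters are hypothetical.

```python
import random
from collections import Counter


def frequency_match(cases, control_pool, key, seed=0):
    """Sample controls so that each stratum has as many controls as cases.

    `key` maps a person to the matching stratum (for example, an age group).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    cases_per_stratum = Counter(key(person) for person in cases)
    controls = []
    for stratum, n_cases in cases_per_stratum.items():
        candidates = [person for person in control_pool if key(person) == stratum]
        if len(candidates) < n_cases:
            raise ValueError(f"Not enough potential controls in stratum {stratum!r}")
        controls.extend(rng.sample(candidates, n_cases))
    return controls


# Hypothetical data: each person is an (id, age_group) tuple.
cases = [(1, "40-49"), (2, "40-49"), (3, "40-49"), (4, "50-59")]
control_pool = [(100 + i, "40-49") for i in range(10)] + [
    (200 + i, "50-59") for i in range(10)
]

controls = frequency_match(cases, control_pool, key=lambda person: person[1])
print(Counter(person[1] for person in controls))
```

The selected controls have the same age-group proportions as the cases, which is the defining property of frequency matching; individual matching would instead pair each case with a specific control.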

Problems with Matching

Practical Problems.

Conceptual Problems: once we have matched controls to cases according to a given

characteristic, we cannot study that characteristic.

We do not want to match on any variable that we may wish to explore in our study.

Overmatching: matching on variables other than the variables that are risk factors for

the disease (which we are not interested in investigating in the current study)

Problems with Recall

Limitations in Recall: if limited recall affects all subjects in a study to the same extent, regardless of whether they are cases or controls, a misclassification of exposure status may result; this generally leads to an underestimate of the true risk of the disease associated with the exposure.

Recall Bias: occurs when cases and controls systematically have different memories of the past.
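The effect of limited recall that is equal in cases and controls can be shown with expected counts. The sketch below assumes hypothetical true exposure counts and hypothetical recall accuracy (80% of truly exposed people report exposure, 90% of truly unexposed people report no exposure), applied identically to both groups. The function names are illustrative.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Exposure odds ratio from the four cells of a case-control table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def reported_counts(exposed, unexposed, sensitivity, specificity):
    """Expected numbers reporting exposure and no exposure under imperfect recall."""
    reported_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    reported_unexposed = (exposed + unexposed) - reported_exposed
    return reported_exposed, reported_unexposed


# Hypothetical true exposure status: 200/300 among cases, 100/400 among controls.
true_or = odds_ratio(200, 300, 100, 400)

# Same recall accuracy in cases and controls (nondifferential misclassification).
sensitivity, specificity = 0.8, 0.9
case_exposed, case_unexposed = reported_counts(200, 300, sensitivity, specificity)
control_exposed, control_unexposed = reported_counts(100, 400, sensitivity, specificity)
observed_or = odds_ratio(case_exposed, case_unexposed, control_exposed, control_unexposed)

print(f"True OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

In this example the observed odds ratio is pulled toward 1 (roughly 1.9 instead of 2.7), which is the underestimate described above. Recall bias differs in that recall accuracy is not the same for cases and controls, so the distortion can go in either direction.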

When is a Case-Control Study Desirable?

When the disease or outcome being studied is rare.

When the disease or outcome has a long induction and latent period.

When exposure data is difficult or expensive to obtain.

When the study population is dynamic.

When little is known about the risk factors for the disease.

Less time-consuming and much less costly than prospective cohort studies.


Advantages and Disadvantages of Case-Control Studies

Advantages:

Efficient for rare diseases or diseases with

a long latency period.

Less costly and less time-consuming.

Advantageous when exposure data is

expensive or hard to obtain.

Advantageous when studying dynamic

populations in which follow-up is difficult.

Disadvantages:

Subject to selection bias.

Inefficient for rare exposures.

Information on exposure is subject to

observation bias.

They generally do not allow calculation

of incidence (absolute risk).


Case-Control Studies Based in a Defined Cohort

Design of a case-control study initiated within a cohort.


Nested Case-Control Study.

Case-Cohort Study.

Case-Crossover Design.

Nested Case-Control Studies




Controls are a sample of individuals who

are at risk for the disease at the time each

case of the disease develops.

Cases and controls are matched on

calendar time and length of follow-up.
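Selecting controls from those still at risk when each case occurs is often called risk-set sampling. The sketch below is one simple way to express it, assuming each cohort member has a single exit time (disease onset or end of follow-up) on a common time scale. The cohort data, the `sample_risk_sets` function, and its parameters are hypothetical.

```python
import random


def sample_risk_sets(cohort, controls_per_case=1, seed=0):
    """For each case, sample controls from people still at risk at the case's event time.

    `cohort` is a list of dicts with keys "id", "exit_time" (time of disease onset
    or end of follow-up), and "is_case".
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    cases = sorted((p for p in cohort if p["is_case"]), key=lambda p: p["exit_time"])
    matched_sets = {}
    for case in cases:
        # At risk: still under follow-up and disease-free when this case occurs.
        # Someone who becomes a case later is still eligible as a control now.
        risk_set = [p for p in cohort if p["exit_time"] > case["exit_time"]]
        k = min(controls_per_case, len(risk_set))
        matched_sets[case["id"]] = [p["id"] for p in rng.sample(risk_set, k)]
    return matched_sets


# Hypothetical cohort of six people.
cohort = [
    {"id": 1, "exit_time": 5, "is_case": True},
    {"id": 2, "exit_time": 8, "is_case": False},
    {"id": 3, "exit_time": 3, "is_case": False},  # left follow-up before any case
    {"id": 4, "exit_time": 10, "is_case": True},
    {"id": 5, "exit_time": 12, "is_case": False},
    {"id": 6, "exit_time": 12, "is_case": False},
]

matched = sample_risk_sets(cohort, controls_per_case=2)
print(matched)
```

Because controls are drawn at each case's event time, every control has been followed at least as long as the case it is matched to. Person 3 is never selected, and person 4 can serve as a control for case 1 before becoming a case later.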



Design of a hypothetical case-cohort study

Cases develop at the same times that were

seen in the nested case-control design, but

the controls are randomly chosen from

the defined cohort with which the study

began (subcohort).

Cases and controls are not matched on

calendar time and length of follow-up.

Possible to study different diseases (different sets of cases) in the same case-cohort study using the same cohort for controls.

Case-Cohort Studies


Advantages of Embedding a Case-Control Study in a Defined Cohort

1. No recall bias.

2. Can establish a temporal relationship.

3. More economical to conduct.

4. Greater comparability between cases and controls.

Case-Crossover Design

Primarily used for studying the etiology of acute outcomes such as myocardial infarctions.


At-risk periods: red brackets. Control periods: blue brackets.

Each person who is a case serves as his

own control

More economical to conduct

Recall bias?



Cross-sectional studies (prevalence studies)

Both exposure and disease outcome are determined simultaneously for each subject


Remember: cohort studies

Remember: case-control studies



Limitations of Cross-Sectional Studies

Identify prevalent cases rather than incident (new) cases;

the association may be with survival after the disease rather

than with the risk of developing the disease.

Often not possible to establish a temporal relationship between the exposure and the onset of disease.

Ecological studies

Example: Is the rate of asthma higher in cities with higher levels of air pollution?

Explore correlations between aggregate (group level) exposure and outcomes.

Unit of analysis: not individuals, but clusters (e.g., countries, schools).

Correlation between dietary fat intake and breast cancer by country. (From Prentice RL, Kakar F, Hursting S, et  al: Aspects of the rationale for the Women's Health Trial. J Natl Cancer Inst 80:802–814, 1988.)

The authors themselves wrote: “The observed association is between pregnancy during an influenza epidemic and subsequent leukemia in the offspring of that pregnancy. It is not known if the mothers of any of these children actually had influenza during their pregnancy.”

We are missing individual-level data on exposure.


EBM Levels of Evidence

http://researchguides.dml.georgetown.edu/ebmclinicalquestions


Types of Clinical Questions and Types of Studies to Answer them

Supporting Clinical Care: An Institute in Evidence-Based Practice for Medical Librarians Dartmouth College

Question to Guide Selection of Study Type

cipha.ca

Grimes and Schulz, Lancet 2002; 359: 57–61

Be Familiar with the Terminology!

Grimes and Schulz, Lancet 2002; 359: 57–61
