Large-scale phenome-wide scan in twins using electronic health records June 29 th 2015 Scott Hebbring Marshfield Clinic Research Foundation University of Wisconsin Madison


Large-scale phenome-wide scan in twins using electronic health records

June 29th 2015

Scott Hebbring

Marshfield Clinic Research Foundation

University of Wisconsin Madison


Human Genetics

Association studies

GWAS: Thousands of variants associated with a few hundred phenotypes
a. Relatively easy to recruit unrelated individuals
b. Multiple testing challenges

a. Weak effects
b. Difficult to interpret biology
c. Clinical utility?
d. Disease limited

PheWAS: Dramatically increases the number of diseases that can be studied
a. Can start with biologically/clinically relevant variants
b. May be limited by the same challenges as GWAS

Family studies

Linkage, Segregation Analysis, Heritability…
a. Thousands of mutations in thousands of genes causing human diseases
b. Often easier to interpret biology
c. Large effect sizes
d. Clinically relevant
e. Difficult to recruit families
f. One disease at a time


Classical Twin Studies

1. Gold standard for heritability studies
   -Unique family/genetic relationships (monozygotic twins)
   -Strong shared environmental exposures starting in utero

2. Rare (~20/1,000 births)

3. Difficult to recruit
   -Largest twin registries include the Swedish and Danish twin registries (~200,000 twins)
   -Others: UK Adult, Australian, Sri Lankan, and Chinese National; Minnesota, Univ-Wash, MI-State, and Mid-Atlantic twin registries
   -Sample ascertainment bias

4. Phenotypic data is often acquired by surveys and questionnaires and limited to only a few measurables.

5. Updating data is costly and labor intensive.


[Figure: map of Wisconsin showing Marshfield, Madison, and Milwaukee]

Marshfield Clinic Personalized Medicine Research Project

Personalized Medicine Research Project Study Area (19 Zip Codes in Central WI)

Marshfield Clinic Primary Service Area

2.6 Million patients

Twin population

-same last name

-same date of birth

-same billing account

-same home address

-key word “twin”

Marshfield Clinic Twin Cohort (~16,000 patients)
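The matching criteria listed above could be combined along the following lines. This is a minimal sketch, not the published algorithm: the record fields (`last_name`, `dob`, `billing_account`, `address`, `twin_keyword`), the made-up example records, and the rule for combining criteria (same last name and date of birth, plus at least one further shared signal) are all assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def candidate_twin_pairs(patients):
    """Return ID pairs of patients who look like twins (hypothetical rule)."""
    # Group records by last name and date of birth, the two core criteria.
    groups = defaultdict(list)
    for p in patients:
        groups[(p["last_name"].strip().lower(), p["dob"])].append(p)

    pairs = []
    for members in groups.values():
        for a, b in combinations(members, 2):
            # Assumed rule: require one more shared signal to reduce
            # chance matches between unrelated patients.
            same_account = a["billing_account"] == b["billing_account"]
            same_address = a["address"] == b["address"]
            both_flagged = a.get("twin_keyword", False) and b.get("twin_keyword", False)
            if same_account or same_address or both_flagged:
                pairs.append((a["id"], b["id"]))
    return pairs


# Made-up records, for illustration only.
patients = [
    {"id": 1, "last_name": "Olson", "dob": "1990-04-02",
     "billing_account": "A17", "address": "12 Elm St"},
    {"id": 2, "last_name": "Olson", "dob": "1990-04-02",
     "billing_account": "A17", "address": "12 Elm St"},
    {"id": 3, "last_name": "Olson", "dob": "1990-04-02",
     "billing_account": "B90", "address": "7 Oak Ave"},
]
print(candidate_twin_pairs(patients))  # [(1, 2)]
```

Patient 3 shares a name and birth date with the others but no account, address, or keyword, so the sketch leaves that record unpaired. A rule like this trades sensitivity for specificity, which is one plausible reason an EHR-derived cohort would have imperfect accuracy.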


Genet Epidemiol. 2014 Dec;38(8):692-8.


A. MCTC is one of the first cross-sectional twin populations

~80% accuracy

B. Methods are easily translatable

~12,000 twins have been identified in Mayo’s EHR.

C. Little to no zygosity data

D. All patients are uniquely linked to Marshfield Clinic’s EHR.

Phenotypic data is collected in real time

Not disease limited

Amenable to phenome-wide strategies?

Genet Epidemiol. 2014 Dec;38(8):692-8.


Hypothesis: EHR-linked twin cohorts can be used for phenome-wide studies to identify diseases with genetic etiologies.

Methods

Population: MCTC and Mayo twin cohort (28,888 twins)

Phenotypes were defined by collapsing ICD9 coding, e.g., ICD9 100.01 → 100.0* → 100.*
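The collapsing step can be sketched as follows, so that a patient coded with 100.01 also counts toward the broader phenotypes 100.0* and 100.*. This is an illustrative assumption about how the roll-up works: the function names are hypothetical, and the wildcard groups are represented here simply by the truncated codes `100.0` and `100`.

```python
def collapse_icd9(code):
    """Return the code followed by its progressively broader parent codes."""
    code = code.strip()
    levels = [code]
    if "." in code:
        category, decimals = code.split(".", 1)
        # Drop one decimal digit at a time: 100.01 -> 100.0 -> 100
        for end in range(len(decimals) - 1, 0, -1):
            levels.append(f"{category}.{decimals[:end]}")
        levels.append(category)
    return levels


def patient_phenotypes(codes):
    """Union of every collapsed level across all of a patient's codes."""
    phenotypes = set()
    for code in codes:
        phenotypes.update(collapse_icd9(code))
    return phenotypes


print(collapse_icd9("100.01"))  # ['100.01', '100.0', '100']
```

Collapsing this way means each patient can contribute to several phenotypes at different levels of specificity, which would explain why both three-digit and five-digit codes appear in the result tables.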

For every phenotype/ICD9 code, a p-value was estimated to determine whether the disease co-occurred in twins more frequently than expected by chance.
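The slides do not name the statistical test. One simple way to frame "more frequently than by chance" is a binomial tail: if a disease affects a fraction p of the cohort, a random pair is concordant with probability p², and the p-value is the chance of seeing at least the observed number of concordant pairs. The sketch below implements that assumption. The cohort size of 16,482 twins is a hypothetical figure (roughly the size of MCTC), not a number given in the source. With these inputs the result comes out close to the 2.7E-04 reported later in the deck for phenotypes with 3 affected twins and 1 concordant pair, but it does not match every row (for example, long QT syndrome in Mayo-TC), so treat it as an approximation rather than the author's method.

```python
import math


def concordance_p_value(affected, concordant_pairs, n_twins):
    """P(at least `concordant_pairs` concordant pairs) under independence."""
    if not 0 < affected < n_twins:
        raise ValueError("affected must be between 1 and n_twins - 1")
    n_pairs = n_twins // 2
    # Chance that both members of a random pair are affected.
    q = (affected / n_twins) ** 2
    log_q, log_not_q = math.log(q), math.log1p(-q)

    # Sum the binomial upper tail in log space so that very small
    # probabilities are not lost to rounding.
    total = 0.0
    for k in range(concordant_pairs, n_pairs + 1):
        log_pmf = (
            math.lgamma(n_pairs + 1)
            - math.lgamma(k + 1)
            - math.lgamma(n_pairs - k + 1)
            + k * log_q
            + (n_pairs - k) * log_not_q
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


# Hypothetical cohort size; 3 affected twins, 1 concordant pair.
print(f"{concordance_p_value(3, 1, 16_482):.1e}")
```

Summing the tail directly, instead of computing one minus the lower tail, keeps precision for the extremely small p-values that common diseases produce.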

For every phenotype/ICD9 code, a relative risk was estimated: the risk of disease when the other twin is affected, relative to the population risk in the twin cohorts.


9,906 and 5,987 unique phenotypes/ICD9 codes in MCTC and Mayo-TC, respectively

5,598 shared phenotypes/ICD9 codes

Diseases in MCTC were more common than in Mayo-TC



Phenome-wide Scan

A. 1,222 phenotypes/ICD9 codes were statistically enriched for concordance in MCTC (p<8.9E-6)

929 (76%) were replicated in Mayo-TC (p<0.05)

B. 928 phenotypes/ICD9 codes were statistically enriched for concordance in Mayo-TC

739 (80%) were replicated in MCTC

C. 1,406 phenotypes were statistically enriched for concordance by combined meta-analysis
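The arithmetic behind these figures can be checked quickly. The significance threshold p<8.9E-6 is consistent with a Bonferroni correction of 0.05 across the 5,598 shared phenotypes; the slides do not state this explicitly, so it is an inference. The helper name `percent` is hypothetical.

```python
shared_phenotypes = 5598
alpha = 0.05

# Bonferroni: divide the family-wise error rate by the number of tests.
bonferroni_threshold = alpha / shared_phenotypes
print(f"{bonferroni_threshold:.1e}")  # 8.9e-06


def percent(part, whole):
    """Share of `whole` represented by `part`, as a rounded percentage."""
    return round(100 * part / whole)


print(percent(929, 1222))   # 76, MCTC hits replicated in Mayo-TC
print(percent(739, 928))    # 80, Mayo-TC hits replicated in MCTC
print(percent(1406, 5598))  # 25, phenotypes enriched in the combined analysis
```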

[Figures: Phenome-wide Scan]


Top phenotypes (excluding V-codes and perinatal codes)

| ICD9 | Disease | MCTC Affected | MCTC P-value | MCTC RR | Mayo-TC Affected | Mayo-TC P-value | Mayo-TC RR | Combined P-value |
|---|---|---|---|---|---|---|---|---|
| 382.9 | Unspecified otitis media | 4,318 | 5.0E-203 | 1.8 | 1,130 | 4.4E-252 | 7.1 | 2.3E-451 |
| 382.0 | Suppurative and unspecified otitis media | 4,514 | 3.4E-202 | 1.7 | 1,275 | 4.8E-231 | 5.6 | 1.6E-429 |
| 465.9 | Acute upper respiratory infections of unspecified site | 5,272 | 1.5E-138 | 1.4 | 1,223 | 8.2E-258 | 6.5 | 1.1E-392 |
| 465 | Acute upper respiratory infections of multiple or unspecified sites | 5,297 | 1.2E-137 | 1.4 | 1,250 | 2.0E-253 | 6.2 | 2.1E-387 |
| 462 | Acute pharyngitis | 5,202 | 4.9E-123 | 1.3 | 950 | 1.2E-224 | 8.0 | 4.8E-344 |
| 520.6 | Disturbances in tooth eruption | 1,350 | 8.3E-122 | 3.6 | 230 | 1.1E-90 | 27.7 | 4.4E-209 |
| 783.4 | Lack of expected normal physiological development in childhood | 726 | 6.5E-134 | 8.2 | 416 | 1.6E-73 | 9.0 | 4.9E-204 |
| 520 | Disorders of tooth development and eruption | 1,556 | 7.3E-117 | 3.0 | 311 | 5.0E-87 | 16.2 | 1.7E-200 |
| 786.2 | Cough | 4,245 | 4.1E-80 | 1.2 | 720 | 1.8E-122 | 6.6 | 3.4E-199 |
| 466.1 | Acute bronchiolitis | 575 | 3.9E-146 | 12.4 | 212 | 1.0E-45 | 15.9 | 1.7E-188 |
| 315 | Specific delays in development | 891 | 1.8E-134 | 6.3 | 501 | 2.9E-57 | 5.8 | 2.2E-188 |
| 367 | Disorders of refraction and accommodation | 3,645 | 8.1E-116 | 1.5 | 718 | 1.7E-74 | 4.5 | 6.1E-187 |
| 780.6 | Fever and other physiologic disturbances of temperature regulation | 2,875 | 1.6E-90 | 1.6 | 664 | 1.6E-81 | 5.3 | 1.0E-168 |
| 315.3 | Developmental speech or language disorder | 451 | 5.9E-113 | 14.0 | 284 | 2.7E-54 | 12.0 | 6.1E-164 |
| 367.1 | Myopia | 2,144 | 9.5E-101 | 2.1 | 215 | 6.0E-61 | 20.5 | 2.1E-158 |
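Combined p-values such as 2.3E-451 are far below the smallest positive double-precision number (about 5E-324), so they have to be handled in log space. The combined column above appears consistent with Fisher's method applied to the two cohort p-values; that is an inference from the reported numbers, not something the slides state. The sketch below uses the closed form of the chi-square survival function with 4 degrees of freedom. Function names are hypothetical.

```python
import math

LN10 = math.log(10)


def fisher_combined_log10(log10_p1, log10_p2):
    """log10 of Fisher's combined p-value for two independent tests."""
    # Fisher's statistic X = -2 * ln(p1 * p2) is chi-square with 4 df,
    # whose survival function is exp(-X/2) * (1 + X/2).
    ln_product = (log10_p1 + log10_p2) * LN10
    ln_combined = ln_product + math.log1p(-ln_product)
    return ln_combined / LN10


def format_log10(log10_p):
    """Render a log10 p-value in scientific notation without underflow."""
    exponent = math.floor(log10_p)
    mantissa = 10 ** (log10_p - exponent)
    if round(mantissa, 1) >= 10:  # avoid printing "10.0E-5"
        mantissa /= 10
        exponent += 1
    return f"{mantissa:.1f}E{exponent}"


# ICD9 382.9: MCTC p = 5.0E-203, Mayo-TC p = 4.4E-252
combined = fisher_combined_log10(math.log10(5.0e-203), math.log10(4.4e-252))
print(format_log10(combined))  # 2.3E-451
```

Working with log10 values throughout avoids ever forming the product of the two p-values, which would underflow to zero in ordinary floating point.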



Relative Risks

RR = relative risk; ADF = average disease frequency


1,455 phenotypes/ICD9 codes had at least one concordant pair in both cohorts

498 and 139 phenotypes had RRs >10 and >100 in both cohorts, respectively


Genetic diseases with large estimated RRs

| ICD9 | Disease | MCTC Affected | MCTC Concordant | MCTC P-value | MCTC RR | Mayo-TC Affected | Mayo-TC Concordant | Mayo-TC P-value | Mayo-TC RR | Combined P-value |
|---|---|---|---|---|---|---|---|---|---|---|
| 282.6 | Sickle-cell disease | 3 | 1 | 2.7E-04 | 2,747 | 2 | 1 | 1.6E-04 | 6,096 | 8.0E-07 |
| 282 | Hereditary spherocytosis | 3 | 1 | 2.7E-04 | 2,747 | 3 | 1 | 3.7E-04 | 2,032 | 1.7E-06 |
| 356.1 | Peroneal muscular atrophy | 3 | 1 | 2.7E-04 | 2,747 | 3 | 1 | 3.7E-04 | 2,032 | 1.7E-06 |
| 282.49 | Other thalassemia | 3 | 1 | 2.7E-04 | 2,747 | 9 | 3 | 6.1E-09 | 677 | 4.7E-11 |
| 334.3 | Other cerebellar ataxia | 3 | 1 | 2.7E-04 | 2,747 | 6 | 1 | 1.5E-03 | 406 | 6.3E-06 |
| 426.82 | Long QT syndrome | 4 | 1 | 4.9E-04 | 1,374 | 18 | 8 | 1.4E-17 | 542 | 3.3E-19 |
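The RR values in the table above can be approximately reproduced by dividing pairwise concordance (concordant pairs over pairs with at least one affected twin) by disease prevalence in the cohort. This is a sketch under assumptions, not a confirmed description of the author's calculation: the formula is inferred from the reported numbers, and the cohort sizes of 16,482 (MCTC) and 12,192 (Mayo-TC) twins are hypothetical values chosen because they make the sketch agree with the table; the source does not state them.

```python
# Hypothetical cohort sizes (individual twins), not given in the source.
MCTC_SIZE = 16_482
MAYO_SIZE = 12_192


def relative_risk(affected, concordant_pairs, n_twins):
    """Pairwise concordance divided by prevalence (assumed definition)."""
    # Each concordant pair accounts for two affected individuals, so the
    # number of pairs with at least one affected twin is:
    affected_pairs = affected - concordant_pairs
    concordance = concordant_pairs / affected_pairs
    prevalence = affected / n_twins
    return concordance / prevalence


# Long QT syndrome (426.82): table reports 1,374 and 542.
print(round(relative_risk(4, 1, MCTC_SIZE), 1))
print(round(relative_risk(18, 8, MAYO_SIZE), 1))
```

Because prevalence sits in the denominator, very rare diseases with even a single concordant pair yield RRs in the thousands, which is why the low-count rows in this table should be read alongside their p-values.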

[Figures: same-sex vs. opposite-sex twin pairs]


Potential limitations

1. Limited by the inherent challenges of ICD9 coding.

2. Parental/familial biases

3. Lack of zygosity data still limits this approach
   -NLP or blood types may help enrich for specific twin types.

Conclusions

1. Most diseases are not random events in twins.
   a. 1,406/5,598 (25%) of phenotypes are statistically enriched in pairs of twins
   b. ~1% of phenotypes have RRs < 1.0

2. Genetics is an important component of the disease process for thousands of diseases.

3. Family data may be efficiently captured in the EHR and may be used to predict, prevent, and treat human disease for the advancement of “precision medicine.”

Future of genomic research

Precision Medicine

Populations, Genome, Phenome, Families


Acknowledgements

Marshfield Clinic:
Murray Brilliant
Peggy Peissig
Steven Schrodi
Zhan (Harold) Ye
John Mayer
many more…

Mayo Clinic:
Jyotishman Pathak
Yijing Cheng

Funding:
NHGRI 1U01HG006389
NLM K22LM011938
NCATS 9U54TR000021
NCRR 1UL1RR025011
Marshfield Clinic Research Foundation
Marshfield Clinic donors