
Phenotype-Genotype Integrator (PheGenI) Updates


2013 Summit on Translational Bioinformatics


NATIONAL HUMAN GENOME RESEARCH INSTITUTE

Division of Intramural Research

Phenotype-Genotype Integrator (PheGenI) updates: synthesizing genome-wide association study (GWAS) data with existing genomic resources

Lucia A. Hindorff, PhD, MPH¹, Douglas J. Hoffman, MS², Heather A. Junkins, MS¹, Masato Kimura, PhD², Donna Maglott, PhD², Lon Phan, PhD², Stephen Sherry, PhD², Michael Feolo, MS², Erin M. Ramos, PhD, MPH¹

¹ NHGRI, NIH, Bethesda, MD; ² NCBI, NIH, Bethesda, MD

Abstract

Rapidly accumulating data from genome-wide association studies (GWAS) are most useful when synthesized with existing databases. We developed and updated the Phenotype-Genotype Integrator (PheGenI), integrating NCBI genomic databases with association data from the NHGRI GWAS Catalog. Integrating over 66,000* association records with extensive SNP, gene, and eQTL data, PheGenI enables deeper investigation of SNPs associated with a wide range of traits, facilitating the examination of the relationships between SNPs and human disease.

* Updated as of 3/7/2013

URL: http://www.ncbi.nlm.nih.gov/gap/PheGenI

Background / Rationale

• GWAS have identified >8,700 genetic variants associated with a range of human traits and diseases at p < 10⁻⁵ (3,716 at p < 5 × 10⁻⁸).

• Replication, fine mapping, and follow-up studies are crucial next steps toward understanding the functional consequences of these variants.

• Integration of GWAS data with existing complementary databases can inform prioritization of variants for follow-up, study design considerations, and generation of biological hypotheses.

Data Sources

• PheGenI merges data from the NHGRI GWAS Catalog with several resources housed at NCBI (Figure 1).

http://www.genome.gov/gwastudies/

http://www.ncbi.nlm.nih.gov/projects/gapplusprev/sgap_plus.htm

http://www.ncbi.nlm.nih.gov/gene/

http://www.ncbi.nlm.nih.gov/projects/SNP/

http://www.ncbi.nlm.nih.gov/gap

http://www.ncbi.nlm.nih.gov/gtex/GTEX2/gtex.cgi

• The association records span a broad range of human phenotypes and diseases (Figure 2).

• Counts are current as of 3/7/2013.
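The merging of sources described above can be illustrated with a minimal sketch, assuming association records are joined to SNP, gene, and eQTL annotations by rs number. Every record, field name, and the `integrate` helper below are hypothetical placeholders chosen for illustration; they are not PheGenI's actual schema or pipeline.

```python
# Hypothetical, made-up records; field names are illustrative only.
associations = [
    {"rsid": "rs0000001", "trait": "Trait A", "p_value": 3e-9,
     "source": "NHGRI GWAS Catalog"},
    {"rsid": "rs0000002", "trait": "Trait A", "p_value": 4e-6,
     "source": "dbGaP"},
]

# Assumed stand-ins for dbSNP/Gene annotations, keyed by rs number.
snp_annotations = {
    "rs0000001": {"chrom": "1", "pos": 1000, "gene": "GENE1",
                  "functional_class": "intron"},
    "rs0000002": {"chrom": "2", "pos": 2000, "gene": "GENE2",
                  "functional_class": "missense"},
}

# Assumed stand-in for eQTL data: rs number -> genes whose expression it affects.
eqtl_by_rsid = {"rs0000001": ["GENE1"]}


def integrate(associations, snp_annotations, eqtl_by_rsid):
    """Attach SNP/gene annotation and eQTL genes to each association record."""
    merged = []
    for rec in associations:
        row = dict(rec)  # copy so the input records are not modified
        row.update(snp_annotations.get(rec["rsid"], {}))
        row["eqtl_genes"] = eqtl_by_rsid.get(rec["rsid"], [])
        merged.append(row)
    return merged


merged = integrate(associations, snp_annotations, eqtl_by_rsid)
for row in merged:
    print(row["rsid"], row["trait"], row["gene"], row["eqtl_genes"])
```

Keying every source on the rs number keeps the join simple; a record with no annotation or eQTL entry is kept with empty fields rather than dropped.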

Conclusions and Future Improvements

• By providing a user-friendly web interface that integrates various NCBI genomic databases with association data from the NHGRI GWAS Catalog, PheGenI enables deeper investigation and interrogation of SNPs associated with a wide range of traits, facilitating the examination of the relationships between SNPs and human disease.

• Future improvements to PheGenI include:

  • Annotation of supporting results

  • Enhanced mapping of phenotype terms

  • Incorporation of additional data, such as functional and regulatory elements

  • Ability to programmatically retrieve data using the NCBI Entrez Programming Utilities
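Programmatic retrieval is listed above as a future improvement, so no PheGenI-specific endpoint is assumed here. As a rough sketch of what an Entrez Programming Utilities request looks like, the example below only builds an ESearch URL against dbSNP, one of the underlying databases. The helper name `build_esearch_url` and the placeholder rs number are illustrative assumptions, and no network request is made.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI Entrez Programming Utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, retmax=20):
    """Build an ESearch URL for an Entrez database (hypothetical helper)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# "snp" is the Entrez name for dbSNP; the rs number is a placeholder.
url = build_esearch_url("snp", "rs0000001")
print(url)
```

The resulting URL could be fetched with any HTTP client; NCBI's usage guidelines on request rates and API keys would apply.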

Figure 6. Gene Table

Figure 7. SNP Table

User customizability and support

• Downloadable results tables and high-resolution ideogram

• Ability to dynamically sort tables

• Most data points hyperlinked to underlying databases

• Interactive genome browser with customizable tracks

• Links to related GWAS datasets available for request

• YouTube tutorial (http://bit.ly/WeqNKd)

• Information buttons for documentation

Figure 3. PheGenI Search Interface & Results Summary

Users may search by phenotype (broad categories or one or more traits), gene, SNP, or chromosomal range. Additional filters for p-value or SNP functional class are available (Fig. 3).

A PheGenI search of the example trait ‘celiac disease’ returned several records (summarized in Fig. 3 and shown in detail in Fig. 4-8).
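The search options described above (trait, gene, or SNP, narrowed by p-value or functional class) amount to combining optional criteria over association records. The following is a minimal sketch under that reading; the `search` function, its parameters, and the sample records are hypothetical and do not reflect how the PheGenI web interface is implemented.

```python
# Hypothetical association records; values are made up for illustration.
records = [
    {"rsid": "rs0000001", "trait": "Trait A", "gene": "GENE1",
     "p_value": 3e-9, "functional_class": "intron"},
    {"rsid": "rs0000002", "trait": "Trait A", "gene": "GENE2",
     "p_value": 4e-6, "functional_class": "missense"},
    {"rsid": "rs0000003", "trait": "Trait B", "gene": "GENE1",
     "p_value": 2e-8, "functional_class": "missense"},
]


def search(records, trait=None, gene=None, rsid=None,
           max_p=None, functional_class=None):
    """Return records matching every criterion that is not None."""
    def keep(r):
        if trait is not None and r["trait"].lower() != trait.lower():
            return False
        if gene is not None and r["gene"] != gene:
            return False
        if rsid is not None and r["rsid"] != rsid:
            return False
        if max_p is not None and r["p_value"] >= max_p:
            return False
        if (functional_class is not None
                and r["functional_class"] != functional_class):
            return False
        return True

    return [r for r in records if keep(r)]


# Trait search restricted to genome-wide significance (p < 5e-8).
significant = search(records, trait="trait a", max_p=5e-8)
# Gene search restricted to missense SNPs.
missense_in_gene1 = search(records, gene="GENE1", functional_class="missense")
print([r["rsid"] for r in significant])
print([r["rsid"] for r in missense_in_gene1])
```

Treating each unset criterion as "match anything" lets one function cover searches by trait, gene, or SNP with or without the extra filters.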

Figure 4. Association Results Table

Results are shown in the context of the GWAS association (Fig. 4) and linked to additional information based on gene (Fig. 6) and SNP (Fig. 7).

Figure 8. eQTL Data and GWAS datasets

Where available, links are shown to eQTL data & dbGaP datasets for which GWAS data are available for request (Fig. 8).

Genetic Association Results: NHGRI GWAS Catalog, N = 11,781; dbGaP, N = 54,282

Genomic Variation: dbSNP, N = 63,222,716

Genes: Gene, N = 23,550

mRNA Expression: eQTL, N = 60,657

Figure 1. Databases and Record Counts
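As a quick arithmetic check, the two genetic association sources in Figure 1 add up to the "over 66,000" association records cited in the abstract. The dictionary layout below is only an illustrative way of holding the counts shown in the figure.

```python
# Record counts as given in Figure 1 (current as of 3/7/2013).
record_counts = {
    ("Genetic Association Results", "NHGRI GWAS Catalog"): 11_781,
    ("Genetic Association Results", "dbGaP"): 54_282,
    ("Genomic Variation", "dbSNP"): 63_222_716,
    ("Genes", "Gene"): 23_550,
    ("mRNA Expression", "eQTL"): 60_657,
}

# Sum only the genetic association sources.
association_total = sum(
    n for (category, _db), n in record_counts.items()
    if category == "Genetic Association Results"
)
print(f"{association_total:,}")  # 66,063
```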

Figure 2. Phenotype Distribution of PheGenI SNPs

The Genome View maps SNPs and genes to a genomic region and allows for interactive browsing and pinning of individual results (Fig. 5).

Figure 5. Genome View and Browser
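Mapping SNPs and genes to a genomic region, as the Genome View does, can be thought of as an interval-overlap query. The sketch below assumes 1-based inclusive coordinates and uses made-up features; `features_in_region` is a hypothetical helper, not part of the PheGenI browser.

```python
# Hypothetical features with 1-based, inclusive coordinates.
# A SNP is modeled as a single-base interval (start == end).
features = [
    {"name": "rs0000001", "type": "snp", "chrom": "1", "start": 1500, "end": 1500},
    {"name": "rs0000002", "type": "snp", "chrom": "1", "start": 9000, "end": 9000},
    {"name": "GENE1", "type": "gene", "chrom": "1", "start": 1000, "end": 2500},
    {"name": "GENE2", "type": "gene", "chrom": "2", "start": 1000, "end": 2500},
]


def features_in_region(features, chrom, start, end):
    """Return features on `chrom` that overlap the interval [start, end]."""
    return [
        f for f in features
        if f["chrom"] == chrom and f["start"] <= end and f["end"] >= start
    ]


# Everything overlapping chr1:1200-3000, sorted by start position.
hits = sorted(features_in_region(features, "1", 1200, 3000),
              key=lambda f: f["start"])
print([f["name"] for f in hits])
```

A gene that only partly overlaps the window is still returned, which matches how a browser would display a feature extending past the visible region.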