
P-STAR: PGRN Statistical Analysis Resource

Vanderbilt University, Nashville, TN

ABSTRACT

Pharmacogenomics is the study of the relationship between individual genetic variation and drug response. One of the major goals of the field is the use of an individual's genomic information in conjunction with other demographic and environmental covariates to personalize a previously uniform treatment regimen. Realizing this ambition requires nothing less than the ability to derive a genotype-to-phenotype map for a trait of interest. In the specific case of pharmacogenomics this trait is often a drug dosage, efficacy, toxicity, or a variable indicating response/non-response or adverse-event/no-adverse-event, and the genotype is frequently a vector of SNP measurements. As such, progress in this area is intimately tied to progress in the more general search for the genetic determinants of complex traits.

As with any complex trait, the molecular, epidemiological, and analytical techniques used in pharmacogenomics are constantly evolving. In parallel with human statistical genetics, the most common study design has moved from linkage analysis to candidate gene association studies, and the mainstream design is now the genome-wide association study (GWAS). That said, over the course of the next 18-36 months, this trend is likely to shift to next-generation sequence data, structural variation, rare variants, and gene-gene-drug interactions. To maximize our ability to dissect complex patterns from these complex datasets, it is important to continue developing novel analytic approaches. In response to RFA-GM-10-001, ‘Pharmacogenomics Research Network (PGRN)’, we have created a network resource, the PGRN STatistical Analysis Resource (P-STAR), for coordination of statistical analysis and methods development in the PGRN.

Aim 1. To provide a PGRN consultation and statistical review service.

Aim 3. To develop novel methods and analytic approaches for pharmacogenomics data.

Currently, genome-wide association studies (GWAS) are an area of focus for many PGRN sites. Over the course of the next five years, PGRN sites will likely generate data via exomic or whole-genome sequencing as well as epigenomics and other high-throughput data generation platforms. Methods developed by this resource will stay current with the needs of the PGRN. We currently propose three areas of methodological development. These are intended as examples of methods developed in the context of pharmacogenomic studies; development of novel methods for the analysis of pharmacogenomic phenotypes will continue throughout the 5-year cycle.

SCAN: SNP and Copy Number Variant Annotation Network (University of Chicago): The PAAR group at the University of Chicago has developed a database and web site to facilitate annotation of SNPs and CNVs (http://scandb.org). It provides physical, functional, and linkage disequilibrium (LD) annotations, and also serves results of eQTL studies that we have conducted in the HapMap CEU and YRI lymphoblastoid cell lines (LCLs) (refs. 2-5) as an adjunct to PAAR studies on the cytotoxicity of chemotherapeutic agents (refs. 2-5). Thus, using the SCAN database, it is possible to enter a list of SNPs (the top signals from a genome-wide association study, for example) and obtain physical, functional, and LD annotations for each SNP, as well as an indication of the distant and local transcript levels for which that SNP is a significant predictor.

CGANs for integration of genomic data (St. Jude Children’s Research Hospital): The group at St. Jude Children’s Research Hospital will develop a methodology to integrate various types of genomic data (SNP genotypes, CNVs, mRNA expression, etc.) and multiple related phenotypes into coherent genomic association networks (CGANs). These networks reveal inter-relationships among different types of genomic features and their associations with (effects on) phenotypes of interest in pharmacogenomics studies. The molecular mechanism underlying multiple inter-related phenotypes can be a combination of effects from different genomic components at different levels. Integrated analyses that combine various genomic features can therefore elucidate the underlying biology more fully than single-feature screening methods.

The Biofilter for Using Prior Knowledge in the Analysis Pipeline (Vanderbilt University): Recently, the Vanderbilt University group proposed a strategy that steps beyond the annotation and grouping of independent SNP effects, but does not attempt to jointly model large numbers of SNPs simultaneously. The Biofilter is a tool for knowledge-driven multi-SNP analysis of large-scale SNP data using information from public databases (The Gene Ontology, The Database of Interacting Proteins, The Protein Families Database, The Kyoto Encyclopedia of Genes and Genomes, Reactome, and Biopath). As part of P-STAR, we will continue to develop the Biofilter resource. Its current implementation was developed primarily for common disease GWAS data. Several knowledge sources are particularly important to include for pharmacogenomics studies, including PharmGKB VIP genes/pathways, DMET pathways, and eQTL data. We will integrate these and other relevant sources of information into the Biofilter.
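The idea of knowledge-driven multi-SNP analysis can be illustrated with a small sketch. This is not the Biofilter's implementation; it only assumes that each knowledge source groups genes (into pathways, families, interactions, and so on), that gene pairs supported by several independent sources are prioritized, and that each prioritized gene pair is expanded into SNP-SNP models for testing. All function names, source names, gene names, and SNP identifiers below are hypothetical placeholders.

```python
from collections import defaultdict
from itertools import combinations, product


def knowledge_supported_gene_pairs(sources, min_sources=2):
    """Return gene pairs that share a group in at least `min_sources` sources.

    `sources` maps a source name to {group name: list of genes}.
    The result maps each (gene_a, gene_b) pair to the set of supporting sources.
    """
    support = defaultdict(set)
    for source_name, groups in sources.items():
        for genes in groups.values():
            # Sort so that each unordered pair has a single canonical key.
            for gene_a, gene_b in combinations(sorted(set(genes)), 2):
                support[(gene_a, gene_b)].add(source_name)
    return {pair: names for pair, names in support.items()
            if len(names) >= min_sources}


def snp_pair_models(gene_pairs, gene_to_snps):
    """Expand prioritized gene pairs into the SNP-SNP models to be tested."""
    models = []
    for gene_a, gene_b in sorted(gene_pairs):
        snps_a = gene_to_snps.get(gene_a, [])
        snps_b = gene_to_snps.get(gene_b, [])
        models.extend(product(snps_a, snps_b))
    return models


# Hypothetical toy inputs (placeholders, not real annotations).
toy_sources = {
    "source_1": {"pathway_x": ["GENE_A", "GENE_B", "GENE_C"]},
    "source_2": {"family_y": ["GENE_A", "GENE_B"],
                 "family_z": ["GENE_C", "GENE_D"]},
}
toy_gene_to_snps = {
    "GENE_A": ["snp_a1", "snp_a2"],
    "GENE_B": ["snp_b1"],
    "GENE_C": ["snp_c1"],
}

toy_pairs = knowledge_supported_gene_pairs(toy_sources, min_sources=2)
toy_models = snp_pair_models(toy_pairs, toy_gene_to_snps)
print(toy_models)
```

In this toy input only GENE_A and GENE_B are grouped together by both sources, so only SNP pairs spanning those two genes are kept. The point of such a filter is that the number of multi-SNP models tested is driven by prior knowledge rather than by exhaustive enumeration of all SNP combinations.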

Aim 2. To disseminate information regarding the state-of-the-art analytical methods and approaches for pharmacogenomics data.

The Team

P-STAR Analysis Workshops

The PGRN analysis workshop has had enormous success over the past six years. Participation has ranged from 50 to 100 individuals from 10-13 PGRN sites and several affiliate members. In four of the workshops, we announced a call for ‘research-in-progress’ abstracts. This resulted in presentations and posters describing new methodology developments, simulation studies, real-world issues, and real data analysis. In another year, we held a GAW-like workshop (GAW = Genetic Analysis Workshop). The goal of each workshop is to submit a collaborative manuscript summarizing the important topics of the meeting. P-STAR will organize the collection of information and the writing of this yearly manuscript.

Marylyn Ritchie, PI, Vanderbilt University

All network sites of PGRN-III are expected to have a well-established statistical analysis team; however, there are occasions when additional analytical advice and expertise may be needed. Our goal is to provide consultation on study design, analysis plans, and power calculations, both for cross-network projects that develop post-award and for site-specific projects that lack the specific expertise needed for a new direction. In addition, we will serve as a review board for PGRN network projects when needed.

Potential Review Services

RIKEN proposal, statistical review
Next-gen sequencing proposal, statistical review
Cancer Working group proposal, statistical review
PGRN Annual meeting abstract review
P-STAR workshop abstract review

Potential Consultation Services

Study design
Analysis plans
Power calculations
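As an illustration of the kind of power calculation such a consultation might cover, the sketch below approximates the power of a two-sided z-test comparing a binary outcome (for example, adverse-event rates) between two groups, such as carriers and non-carriers of a variant. It is a minimal normal-approximation sketch, not a P-STAR tool; the function name, parameters, and example inputs are hypothetical, and a real study would also need to account for the genetic model, covariates, and multiple testing.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of a two-sided z-test for a difference in proportions.

    p1, p2: assumed true event rates in the two groups (hypothetical inputs).
    n1, n2: group sample sizes.
    alpha:  two-sided significance level.
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)

    # Standard error under the null (pooled rate) and under the alternative.
    p_pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    se_null = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    se_alt = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

    # Probability that the test statistic falls in either rejection region.
    diff = abs(p1 - p2)
    upper = z.cdf((diff - z_crit * se_null) / se_alt)
    lower = z.cdf((-diff - z_crit * se_null) / se_alt)
    return upper + lower


# Hypothetical scenario: 10% vs 20% event rate, 300 subjects per group,
# evaluated at a candidate-gene threshold and at a genome-wide threshold.
print(power_two_proportions(0.10, 0.20, 300, 300, alpha=0.05))
print(power_two_proportions(0.10, 0.20, 300, 300, alpha=5e-8))
```

Comparing the two calls shows how much a genome-wide significance threshold reduces power relative to a single pre-specified test at the same sample size, which is one reason study design and analysis plans are best settled together.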

*This is not a fee-for-service resource. No monetary payment is required for standard consultation. P-STAR aims to promote collaboration and co-authorship rather than fee-for-service. We did not, however, budget to perform statistical analysis for new projects. If needed, P-STAR will work with sites to apply for supplemental funding as described in RFA-GM-10-001.*

Proposed P-STAR Workshop Schedule

Year 1: Works in Progress, October 20, 2010, Vanderbilt University, Nashville, TN
Year 2: GAW-like workshop, Fall/Winter 2011
Year 3: Works in Progress, Fall/Winter 2012
Year 4: Works in Progress, Fall/Winter 2013
Year 5: GAW-like workshop, Fall/Winter 2014

P-STAR Educational Workshops

P-STAR has also proposed to host one educational workshop on Pharmacogenomics Statistical Analysis Issues each year in conjunction with another annual scientific meeting (ASHG, ASCPT, AACR, etc.). Topics may include, but are not limited to, GWAS, follow-up after GWAS, next-generation sequencing, population substructure, and gene-gene-drug interactions. The goal is to educate the pharmacogenomics community in state-of-the-art techniques and study design for the field. P-STAR will submit workshop proposals and send up to three speakers per year.

P-STAR Methods/Software Website

The final approach for dissemination of pharmacogenomics analysis information includes providing software and methods developed by the PGRN sites on a central website. Many PGRN sites conduct methods and software development and make their software available via their institutional websites. While this approach can work, we propose to develop a centralized website to distribute this software to a larger audience, and to facilitate more rapid recognition of novel approaches.

Nancy Cox, University of Chicago
Cheng Cheng, St. Jude Children’s Hospital
Christopher Amos, MD Anderson Cancer Center
David Conti, USC
Eric Jorgenson, Gallo Research Center, UCSF
Christoph Lange, Harvard University
Gary Rosner, Johns Hopkins
Michael Province, Washington University
Daniel Schaid, Mayo Clinic
Lang Li, Indiana University
John Witte, UCSF
Maria Ritchie, P-STAR Coordinator

[Team chart group labels: Expert Consultants; P-STAR Leadership Team]
