EGAN: Exploratory Gene Association Networks by Jesse Paquette Biostatistics and Computational Biology Core Helen Diller Family Comprehensive Cancer Center University of California, San Francisco (AKA BCBC HDFCCC UCSF)


EGAN: Exploratory Gene Association Networks

by Jesse Paquette
Biostatistics and Computational Biology Core
Helen Diller Family Comprehensive Cancer Center
University of California, San Francisco

(AKA BCBC HDFCCC UCSF)


EGAN http://akt.ucsf.edu/EGAN/

• Features
  – Downloadable Java application
    • Could be recomposed as components for a web service architecture
  – Graphics provided by Cytoscape; graph layout algorithms imported from open source
  – Data pre-loaded for analysis. Each data set must include an assay ID, a measure (e.g., correlation coefficient, expression level) and a significance value (e.g., p-value)
  – Currently for human and rat genomes, with other model species in August (including Arabidopsis)
• Key focus: interactive analysis of sets of genes
  – User identifies the sets interactively
  – Enrichment: uses Fisher's exact test to see whether genes in a pathway are “overrepresented” relative to chance selection. Based on the hypergeometric distribution, an n-choose-k sampling distribution
  – Gene sets graphed based on relationships
    • Counts (simply connect each gene to others in the set; can graph multiple sets)
    • Protein-protein interaction
    • Co-occurrence in literature
  – Access to PubMed literature and external links
• For demos, slides, presentations: http://akt.ucsf.edu/EGAN/documentation.php
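
The enrichment test named above can be illustrated with a minimal sketch. It is not EGAN's own code: the function name and parameters are hypothetical, and it assumes the usual one-sided setup, in which a gene list of size n is drawn from a universe of N genes, K of which belong to the pathway, and k of the listed genes fall in the pathway. The upper-tail hypergeometric probability computed here is the same quantity as the one-sided Fisher's exact test on the corresponding 2×2 table.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, set_size, list_size, overlap):
    """Return P(X >= overlap) for a hypergeometric draw.

    universe_size: total number of genes considered (N)
    set_size:      genes in the pathway / gene set (K)
    list_size:     genes in the user's list (n)
    overlap:       listed genes that are also in the gene set (k)
    """
    if not 0 <= set_size <= universe_size:
        raise ValueError("set_size must be between 0 and universe_size")
    if not 0 <= list_size <= universe_size:
        raise ValueError("list_size must be between 0 and universe_size")

    first = max(overlap, 0)
    last = min(set_size, list_size)  # largest overlap that is possible
    total = comb(universe_size, list_size)  # all ways to choose the list
    # Sum the n-choose-k terms for every overlap at least as large as observed.
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, list_size - k)
        for k in range(first, last + 1)
    )
    return tail / total


# Toy numbers, chosen only so the arithmetic is easy to check by hand.
p = hypergeom_enrichment_p(universe_size=10, set_size=4, list_size=3, overlap=2)
```

With these toy numbers the tail is (6·6 + 4·1) / 120 = 1/3, so an overlap of 2 would not count as enriched; a small p-value indicates that the pathway is overrepresented in the list relative to chance selection.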


Producing insight from clusters and gene lists

• Summarize: find enriched pathways (and other gene sets)
  – Hypergeometric over-representation
    • DAVID
  – Global trends
    • GSEA
• Visualize: gene relationships in a graph
  – Protein-protein interactions
    • Cytoscape
  – Network module discovery
    • Ingenuity IPA
  – Literature co-occurrence
    • PubGene
• Contextualize: pertinent literature
  • PubMed
  • Google
  • iHOP


High-throughput experiments

• EGAN applies to
  – Expression microarrays
  – aCGH
  – SNP/CNV arrays
  – MS/MS Proteomics
  – DNA methylation
  – ChIP-Seq
  – RNA-Seq
  – In-silico experiments
• If parts of the output can be mapped to gene IDs
  – You can use EGAN


Gene sets

• EGAN contains a database of gene sets
  – You can also add your own
  – Download from MSigDB (Broad)
• A gene set defines a semantically-meaningful subset of genes
  – Signaling or metabolic pathway
  – Gene Ontology (GO) term
  – Previously-reported gene list (“signature”)
  – Cytoband
  – Transcription factor targets
  – miRNA targets
  – Conserved domain
  – Drug targets
  – &c.
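
As a rough illustration of what "a gene set" means as data, the sketch below reads gene sets from the tab-separated GMT layout that MSigDB distributes (set name, description, then one gene per column) and reports which genes of a query list fall in each set. The function names and the toy identifiers (SET_ONE, G1, ...) are made up for the example; how EGAN stores gene sets internally is not described in these slides.

```python
def parse_gmt(lines):
    """Parse GMT-style lines into {set name: frozenset of gene IDs}."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and rows without any genes
        name, _description, *genes = fields
        gene_sets[name] = frozenset(g for g in genes if g)
    return gene_sets


def overlaps(gene_list, gene_sets):
    """For each gene set, list the query genes that belong to it."""
    query = set(gene_list)
    return {name: sorted(query & members) for name, members in gene_sets.items()}


# Toy input with placeholder identifiers, not real annotations.
toy_gmt = [
    "SET_ONE\tna\tG1\tG2\tG3\n",
    "SET_TWO\tna\tG3\tG4\n",
]
toy_sets = parse_gmt(toy_gmt)
toy_hits = overlaps(["G3", "G4", "G9"], toy_sets)
```

The overlap sizes produced this way are the counts that an over-representation test would take as input.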


Gene-gene relationships

• EGAN contains
  – Protein-protein interactions (PPI)
  – Literature co-occurrence
  – Chromosomal adjacency
  – Kinase-target relationships
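
One way to picture how such relationships become a graph: treat each relationship as a typed edge between two gene IDs, and draw only the edges whose endpoints are both among the genes currently shown. The sketch below assumes that simple model; the tuple layout, the function name and the toy gene IDs are hypothetical and do not reflect EGAN's internal representation.

```python
def induced_edges(relationships, visible_genes):
    """Keep only the edges whose two endpoints are both visible.

    relationships: iterable of (gene_a, gene_b, kind) tuples, where kind is
                   a label such as "PPI" or "literature".
    visible_genes: iterable of gene IDs currently shown on the graph.
    """
    visible = set(visible_genes)
    return sorted(
        (a, b, kind)
        for a, b, kind in relationships
        if a in visible and b in visible
    )


# Toy relationships with placeholder gene IDs.
toy_relationships = [
    ("G1", "G2", "PPI"),
    ("G2", "G3", "literature"),
    ("G1", "G4", "PPI"),
]
toy_edges = induced_edges(toy_relationships, ["G1", "G2", "G3"])
```

Here the G1–G4 edge is dropped because G4 is not shown; adding G4 to the visible genes would bring it back without touching the underlying relationship data.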


The article will be shown in your default web browser.


Finding Counts


EGAN Summary: Exploratory Gene Association Networks

• Methods: state-of-the-art analysis of clusters and gene lists
  – Hypergeometric enrichment of gene sets
  – Global trends of gene sets
  – Graph visualization
  – Literature identification
  – Network module discovery
• User Interface: responds quickly to new queries from the biologist
  – Fluid adjustment of p-value cutoffs
  – Point-and-click interface
  – All data in-memory for immediate access
  – Links to external websites
• Modular: integrates as a flexible plug-and-play cog
  – All data is customizable
  – Proprietary data can be restricted to the client location
  – Java runs on almost every OS (PC, Mac, Linux)
  – Can be configured and launched from a different application (e.g. GenePattern)
  – Analyses can be scripted for automation
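
The combination of in-memory data and adjustable p-value cutoffs can be pictured as re-filtering a list of assay results each time the cutoff changes. The sketch below is only an illustration under that assumption: the record fields follow the three items each data set must include (assay ID, measure, significance value), while the class name, function name and toy values are hypothetical.

```python
from typing import NamedTuple


class AssayResult(NamedTuple):
    assay_id: str   # identifier of the probe, gene or other assayed feature
    measure: float  # e.g. correlation coefficient or expression level
    p_value: float  # significance value attached to the measure


def significant_ids(results, cutoff):
    """Return the assay IDs whose p-value is at or below the cutoff."""
    return [r.assay_id for r in results if r.p_value <= cutoff]


# Toy results held in memory; the values are invented for the example.
toy_results = [
    AssayResult("A1", 2.1, 0.001),
    AssayResult("A2", -0.3, 0.2),
    AssayResult("A3", 1.4, 0.04),
]
loose = significant_ids(toy_results, cutoff=0.05)
strict = significant_ids(toy_results, cutoff=0.01)
```

Because nothing is reloaded from disk, tightening the cutoff from 0.05 to 0.01 is just a second pass over the same list.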


Keys to getting the most out of EGAN

• Don’t panic!
• Load as much data as possible
  – Assay results for every gene
  – Multiple experiments
  – Pathways and gene sets
    • MSigDB
  – Previously-published gene lists and clusters
    • Supplementary data
    • Oncomine
• Think about the context of the experiment
  – Show appropriate genes on graph
• Think about the semantic meaning of the enriched gene sets
  – Show appropriate gene sets on graph
• Follow links to literature
• Use appropriate Google/PubMed search queries
• Create high-quality reports
  – Save your custom gene sets
  – Export graph screenshots to PDF
  – Export tables with enrichment scores to Excel
  – Record details in your lab notebook


Where to find EGAN

• Website
  – http://akt.ucsf.edu/EGAN/
• 2010 paper in Bioinformatics
  – http://www.ncbi.nlm.nih.gov/pubmed/19933825