ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

Note: several slides use animation, so for best display please download and view in PowerPoint.


  • 1. The Gene Wiki: Crowdsourcing human gene annotation. Andrew Su, Ph.D., The Scripps Research Institute. ISMB Special Session: Harnessing community intelligence for bioinformatics. #ISMB #SS7. July 17, 2012

  • 2. The Long Tail is a prolific source of content. [Figure: content produced vs. contributors (sorted), with "Short Head" and "Long Tail" regions.] Short head vs. long tail examples: News: newspapers vs. blogs. Video: TV/Hollywood vs. YouTube. Product reviews: Consumer Reports vs. Amazon reviews. Food reviews: food critics vs. Yelp. Talent judging: Olympics vs. American Idol. Gene annotation: manual curation vs. Gene Wiki.

  • 3. We can harness the Long Tail of scientists to directly participate in the gene annotation process.

  • 4. Wikipedia is reasonably accurate

  • 5. Wikipedia has breadth and depth. [Figure: articles and words (millions), Wikipedia vs. Britannica Online, July 2008.]

  • 6. Filtering, extracting, and summarizing PubMed. [Figure labels: Documents, Concepts.]

  • 7. Wiki success depends on a positive feedback. [Figure: gene wiki page utility, number of users, number of contributors; values shown: 1, 100, 2, 200.]

  • 8. 10,000 gene stubs within Wikipedia. [Diagram: Utility, Users, Contributors.] Stub contents: protein structure; gene summary; symbols and identifiers; Gene Ontology annotations; protein interactions; tissue expression pattern; linked references; links to structured databases. (Huss, PLoS Biol, 2008)

  • 9. Gene Wiki has a critical mass of readers. [Diagram: Utility, Users, Contributors.] Total: ~4.3 million views / month. (Huss, PLoS Biol, 2008; Good, NAR, 2011)

  • 10. Gene Wiki has a critical mass of editors. [Diagram: Utility, Users, Contributors.] ~10,000 words added / month. Total 1.42 million words. 230 full-length articles. 4.3 million views / month. 1000 edits / month. [Figure: cumulative edits, split into productive edits and vandalism.] (Good, NAR, 2011)

  • 11. A review article for every gene is powerful. Reelin: 98 editors, 703 edits since July 2002. Heparin: 358 editors, 654 edits since June 2003. AMPK: 109 editors, 203 edits since March 2004. RNAi: 394 editors, 994 edits since October 2002. Callouts: hyperlinks to related concepts; references to the literature.

  • 12. Making the Gene Wiki more computable. [Figure labels: free text, structured annotations.]

  • 13. Filling the gaps in gene annotation. (Good, BMC Genomics 2011, 12:603) [Figure labels: NCBI Entrez Gene: 3362; Gene Wiki mapping; wikilink; candidate assertion; GO:0004993; GO exact synonym; annotator.]

  • 14. Filling the gaps in gene annotation. (Good, BMC Genomics 2011, 12:603) [Figure labels: NCBI Entrez Gene: 334; Gene Wiki mapping; wikilink; candidate assertion; GO:0006897; GO exact match; annotator.]

  • 15. Novel GO annotations: so what? (Good, BMC Genomics 2011, 12:603) 11,022 annotations mined from Gene Wiki. ~100,000 annotations from GO consortium. 4703 (43%) match known annotations. 6319 novel annotations @ 48-64% specificity.

  • 16. Gene Wiki content improves enrichment analysis. GO term: axon guidance (GO:0007411); 811 articles; 264 genes. [Pipeline labels: PubMed abstracts, concept recognition, gene list, enrichment analysis.] [Contingency table: GO:0007411 (Yes/No) vs. linked genes through PubMed (Yes/No); surviving cell values: 132, 251, 12033.] P = 1.55E-20

  • 17. Gene Wiki content improves enrichment analysis. GO term: muscle contraction (GO:0006936); 251 articles; 87 genes; + Gene Wiki: 87 articles. [Pipeline labels: PubMed abstracts, concept recognition, gene list, enrichment analysis.] Linked genes through PubMed: P = 1.0. Linked genes through PubMed + Gene Wiki: P = 1.22E-09.

  • 18. Gene Wiki content improves enrichment analysis. [Scatter plot: p-value (PubMed only) vs. p-value (PubMed + GW); regions labeled "more significant with PubMed only" and "more significant with PubMed + GW"; muscle contraction highlighted.]

  • 19. Gene Wiki+ for integrative queries. mwsync

  • 20. Dynamic queries across genes, diseases, SNPs

  • 21. (no text on slide)

  • 22. TOP 100 GENES

  • 23. Gene Wiki+ for integrative queries. mwsync; OMIM; PharmGKB. Query (truncated in the transcript): {{#ask: [[Category:Human_proteins]] [[is_associated_with:: [[Category:Breast_cancer] ]]] [[HasSNP:: [[is_associated_with::

  • 24. Gene Wiki+ for integrative queries. mwsync; OMIM; PharmGKB

  • 25. The Long Tail of scientists is a valuable source of information on gene function

  • 26. Crowdsourcing a gene annotation portal

  • 27. Collaborators: Doug Howe, ZFIN; John Hogenesch, U Penn; Jon Huss, GNF; Luca de Alfaro, UCSC; Angel Pizzaro, U Penn; Faramarz Valafar, SDSU; Pierre Lindenbaum, Fondation Jean Dausset; Michael Martone, Rush; Konrad Koehler, Karo Bio; Warren Kibbe, Simon Lim, Northwestern; many Wikipedia editors; WP:MCB Project. Group members: Erik Clarke; Ian Macleod; Ben Good; Max Nanis; Salvatore Loguercio; Chunlei Wu. ISMB travel support. Contact Su. Funding and Support (BioGPS: GM83924, Gene Wiki: GM089820)
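
Slides 16-18 compare enrichment P values for a GO term when genes are linked to it through PubMed alone versus PubMed plus Gene Wiki text. The slides do not say which statistical test produced those P values. The sketch below assumes a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), which is a common choice for this kind of analysis. The function name `enrichment_p_value` and all counts in the usage lines are hypothetical and are not taken from the slides.

```python
from math import comb


def enrichment_p_value(total_genes, term_genes, list_genes, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    total_genes: size of the gene universe
    term_genes:  genes linked to the GO term (e.g. via text mining)
    list_genes:  size of the gene list being tested
    overlap:     genes in the list that are also linked to the term
    """
    if term_genes > total_genes or list_genes > total_genes:
        raise ValueError("term_genes and list_genes cannot exceed total_genes")
    largest_overlap = min(term_genes, list_genes)
    if not 0 <= overlap <= largest_overlap:
        raise ValueError("overlap is inconsistent with the other counts")
    # Count the ways to draw list_genes genes with at least `overlap` term genes.
    favourable = sum(
        comb(term_genes, k) * comb(total_genes - term_genes, list_genes - k)
        for k in range(overlap, largest_overlap + 1)
    )
    return favourable / comb(total_genes, list_genes)


# Hypothetical counts (assumption): the same 100-gene list tested against
# two versions of the term's linked-gene set.
p_pubmed_only = enrichment_p_value(20000, 40, 100, 1)
p_pubmed_plus_wiki = enrichment_p_value(20000, 90, 100, 8)
print(f"PubMed only:        P = {p_pubmed_only:.3g}")
print(f"PubMed + Gene Wiki: P = {p_pubmed_plus_wiki:.3g}")
```

Exact integer arithmetic via `math.comb` keeps the sketch free of third-party dependencies. For large gene universes or many terms, a statistics library's hypergeometric routine would likely be faster.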