ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

  • Published on
    10-May-2015


DESCRIPTION

some animations don't adapt well to static slides -- download the ppt file to view...

Transcript

  • 1. The Gene Wiki: Crowdsourcing human gene annotation
    Andrew Su, Ph.D.
    Department of Molecular and Experimental Medicine, The Scripps Research Institute
    Biocuration 2012, April 2, 2012

  • 2. The Long Tail is a prolific source of content
    [Figure: content produced vs. contributors (sorted), with a short head and a long tail]
    Examples (short head vs. long tail):
    News: newspapers vs. blogs
    Video: TV/Hollywood vs. YouTube
    Product reviews: Consumer Reports vs. Amazon reviews
    Food reviews: food critics vs. Yelp
    Talent judging: Olympics vs. American Idol
    Gene annotation: manual curation vs. Gene Wiki

  • 3. We can harness the Long Tail of scientists to directly participate in the gene annotation process.

  • 4. Wikipedia is reasonably accurate

  • 5. Wikipedia has breadth and depth
    [Chart: articles and words (millions), Wikipedia vs. Britannica Online]
    Source: http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008

  • 6. Filtering, extracting, and summarizing PubMed
    [Diagram labels: documents, concepts]

  • 7. Wiki success depends on positive feedback
    [Diagram labels: Gene Wiki page utility, number of users, number of contributors; numbers shown: 1, 100, 2, 200]

  • 8. 10,000 gene stubs within Wikipedia
    [Feedback diagram: utility, users, contributors]
    Page elements: protein structure; gene summary; symbols and identifiers; Gene Ontology annotations; protein interactions; tissue expression pattern; linked references; links to structured databases
    Huss, PLoS Biol, 2008

  • 9. Gene Wiki has a critical mass of readers
    [Feedback diagram: utility, users, contributors]
    Total: ~4.3 million views / month
    Huss, PLoS Biol, 2008; Good, NAR, 2011

  • 10. Gene Wiki has a critical mass of editors
    [Feedback diagram: utility, users, contributors]
    ~10,000 words added / month
    Total 1.42 million words
    230 full-length articles
    4.3 million views / month
    1000 edits / month
    [Chart labels: cumulative edits, productive edits, vandalism]
    Good, NAR, 2011

  • 11. A review article for every gene is powerful
    Reelin: 68 editors, 543 edits since July 2002
    Heparin: 175 editors, 320 edits since June 2003
    AMPK: 44 editors, 84 edits since March 2004
    RNAi: 232 editors, 708 edits since October 2002
    References to the literature
    Hyperlinks to related concepts

  • 12. Making the Gene Wiki more computable
    [Diagram labels: free text, structured annotations]

  • 13. Filling the gaps in gene annotation
    NCBI Entrez Gene: 3362
    [Diagram labels: Gene Wiki mapping, wikilink, candidate assertion, GO exact synonym]
    GO:0004993
  • 14. Filling the gaps in gene annotation
    NCBI Entrez Gene: 334
    [Diagram labels: Gene Wiki mapping, wikilink, candidate assertion, GO exact match]
    GO:0006897

  • 15. Disease associations mined from the Gene Wiki
    Good, BMC Genomics 2011, 12:603
    Pipeline: Gene Wiki articles (10,271); filter out seeded text; NCBO Annotator; matched Disease Ontology terms (2983); compare to DO database; 2147 candidate annotations
    Comparison results: 23% exact match; 5% match parent; 2% match child; 70% have no match

  • 16. Disease associations mined from the Gene Wiki
    Good, BMC Genomics 2011, 12:603
    Expert curation: Correct: 86%; Incorrect: 10%; Maybe: 4%
    Overall specificity: 90-93%

  • 17. GO associations mined from the Gene Wiki
    Good, BMC Genomics 2011, 12:603
    Pipeline: Gene Wiki articles (10,271); filter out seeded text; NCBO Annotator; matched Gene Ontology terms (11,022); compare to GO database; 6319 candidate annotations
    Comparison results: 17% exact match; 26% match parent; 2% match child; 55% have no match

  • 18. GO associations mined from the Gene Wiki
    Good, BMC Genomics 2011, 12:603
    [Pie chart, expert curation; labels as extracted: Correct, 14%, Maybe, 60%, 26%, Incorrect]
    Overall specificity: 48-64%

  • 19. Common sources of error in GO associations
    Good, BMC Genomics 2011, 12:603
    1) Incorrect concept recognition
    OR2F1: "Olfactory receptors are responsible for the recognition and G protein-mediated transduction of odorant signals."
    Signal transduction (GO:0007165): The cellular process in which a signal is conveyed to trigger a change in the activity or state of a cell. Signal transduction begins with reception of a signal, e.g. a ligand binding to a receptor or receptor activation by a stimulus such as light, and ends with regulation of a downstream cellular process.
    Transduction (GO:0009293): The transfer of genetic information to a bacterium from a bacteriophage or between bacterial or yeast cells mediated by a phage vector.
  • 20. Common sources of error in GO associations
    Good, BMC Genomics 2011, 12:603
    2) Incorrect sentence context
    MEF2C: "Several post-translational modifications have been identified including phosphorylation on serine-59"
    [Diagram: MEF2C surrounded by GO terms: dephosphorylation, excretion, phosphorylation, gene expression, glycosylation, localization, neurogenesis, methylation, proteolysis, secretion, transport, myelination, transcription, translation]

  • 21. Novel GO annotations: so what?
    11,022 annotations mined from Gene Wiki
    4703 (43%) match known annotations
    6319 novel annotations @ 48-64% specificity
    ~100,000 annotations from GO consortium

  • 22. Gene Wiki content improves enrichment analysis
    GO term: axon guidance (GO:0007411)
    Workflow: PubMed abstracts; concept recognition; gene list; enrichment analysis
    811 articles, 264 genes
    [2x2 table: linked genes through PubMed (Yes/No) vs. GO:0007411 (Yes/No); cell values as extracted: 132, 251, 12033]
    P = 1.55E-20

  • 23. Gene Wiki content improves enrichment analysis
    GO term: muscle contraction (GO:0006936)
    Workflow: PubMed abstracts; concept recognition; gene list; enrichment analysis
    PubMed: 251 articles, 87 genes
    + Gene Wiki: 87 articles
    GO:0006936, linked genes through PubMed: P = 1.0
    GO:0006936, linked genes through PubMed + Gene Wiki: P = 1.22E-09

  • 24. Gene Wiki content improves enrichment analysis
    [Scatter plot: p-value (PubMed + GW) vs. p-value (PubMed only); regions labeled "more significant PubMed + GW" and "more significant PubMed only"; highlighted point: muscle contraction]

  • 25. Challenges and future directions
    How to complement and integrate with traditional biocuration workflows?
    How to disseminate and utilize crowdsourced annotations?

  • 26. The Long Tail of scientists is a valuable source of information on gene function
  • 27. Collaborators
    Doug Howe, ZFIN
    John Hogenesch, U Penn
    Jon Huss, GNF
    Luca de Alfaro, UCSC
    Angel Pizzaro, U Penn
    Faramarz Valafar, SDSU
    Pierre Lindenbaum, Fondation Jean Dausset
    Michael Martone, Rush
    Konrad Koehler, Karo Bio
    Warren Kibbe, Simon Lim, Northwestern
    Many Wikipedia editors
    WP:MCB Project

    Group members
    Erik Clarke
    Ian Macleod
    Ben Good (*)
    Chunlei Wu
    Salvatore Loguercio

    See poster # 30 for more on the Gene Wiki and crowdsourcing in biology!

    Contact
    http://sulab.org
    asu@scripps.edu
    @andrewsu
    +Andrew Su

    Funding and Support (BioGPS: GM83924, Gene Wiki: GM089820)

  • 28. Making the Gene Wiki more reliable
    Example text: "Novartis is a multinational pharmaceutical company based in Basel, Switzerland that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren),"
    Example text: "The company name is derived from old Greek, and means "destroyer of birds"."

  • 29. Making the Gene Wiki more reliable
    Same example text as slide 28, with an author-trust overlay.
    [Overlay labels as extracted: 36211 total edits, 36 total edits; high-trust author, low-trust author]
    http://www.wikitrust.net/
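
Note on slides 15, 17, and 21: the mining pipeline compares each ontology term found in a Gene Wiki article with the annotations the gene already has, and bins it as exact match, match parent, match child, or no match. The Python sketch below illustrates that binning on a toy ontology. The term IDs, the `PARENT` map, and the function names are hypothetical. Reading "match parent" as "the mined term is an ancestor of a known annotation" is an assumption, not something the slides state; it was chosen because it agrees with the slide counts (11,022 matched terms minus 4703 known annotations leaves 6319 candidates).

```python
# Hypothetical toy ontology: child term -> parent term (not real GO or DO structure).
PARENT = {
    "T:leaf": "T:mid",
    "T:mid": "T:root",
    "T:other": "T:root",
}


def ancestors(term):
    """Return every ancestor of a term by walking the child -> parent map."""
    found = set()
    while term in PARENT:
        term = PARENT[term]
        found.add(term)
    return found


def classify(mined, known):
    """Bin one mined term against the set of terms already annotated to the gene."""
    if mined in known:
        return "exact match"
    # Assumed meaning: the mined term is more general than a known annotation.
    if any(mined in ancestors(k) for k in known):
        return "match parent"
    # Assumed meaning: the mined term is more specific than a known annotation.
    if ancestors(mined) & known:
        return "match child"
    return "no match"


known = {"T:mid"}
labels = {t: classify(t, known) for t in ["T:mid", "T:root", "T:leaf", "T:other"]}
print(labels)

# Counts taken from the slides: matched terms minus terms that match known annotations.
go_candidates = 11022 - 4703
do_already_known = 2983 - 2147
print(go_candidates)                          # 6319
print(round(100 * do_already_known / 2983))   # 28, i.e. 23% exact + 5% parent
```

Under this reading, only the "match child" and "no match" bins count as novel candidate annotations, which is why the candidate totals on slides 15 and 17 are smaller than the matched-term totals.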
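
Note on slides 22 to 24: the slides report enrichment P values but do not name the statistical test. The following is a minimal sketch, assuming a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test on the 2x2 table). The function name `enrichment_p` and every count used below are hypothetical and are not taken from the slides.

```python
from math import comb


def enrichment_p(k, n_list, n_term, n_total):
    """Probability of seeing at least k term-annotated genes in a gene list.

    k       -- genes in the list that carry the GO term
    n_list  -- size of the gene list (e.g. genes linked through the literature)
    n_term  -- genes in the background that carry the GO term
    n_total -- size of the background gene set
    """
    upper = min(n_list, n_term)
    denom = comb(n_total, n_list)
    tail = sum(
        comb(n_term, i) * comb(n_total - n_term, n_list - i)
        for i in range(k, upper + 1)
    )
    return tail / denom


# Hypothetical counts: 40 linked genes, 10 of them annotated with the term,
# 50 annotated genes in a background of 1000.
p = enrichment_p(10, 40, 50, 1000)
print(f"P = {p:.3g}")
```

Adding genes recovered from a second text source changes `k` and `n_list`, which is one way the same GO term can go from non-significant to significant, as in the muscle contraction example on slide 23.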