Transcript
Page 1: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

The Gene Wiki: Crowdsourcing human gene annotation

Andrew Su, Ph.D.

The Scripps Research Institute

ISMB Special Session: Harnessing community intelligence for bioinformatics

#ISMB #SS7

July 17, 2012

Page 2: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

The Long Tail is a prolific source of content

[Figure: content produced vs. contributors (sorted), divided into a Short Head and a Long Tail]

Domain             Short Head          Long Tail
News               Newspapers          Blogs
Video              TV/Hollywood        YouTube
Product reviews    Consumer reports    Amazon reviews
Food reviews       Food critics        Yelp
Talent judging     Olympics            American Idol
Gene annotation    Manual curation     Gene Wiki

Page 3: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

We can harness the Long Tail of scientists to directly participate in the gene annotation process.

Page 4: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

Wikipedia is reasonably accurate

Page 5: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

Wikipedia has breadth and depth

[Figure: articles and words (millions), Wikipedia vs. Britannica Online]

Source: http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008

Page 6: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

Filtering, extracting, and summarizing PubMed

Documents

Concepts

Page 7: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

Wiki success depends on a positive feedback loop

Gene wiki page utility

Number of users

Number of contributors

1001

2002

Page 8: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

10,000 gene “stubs” within Wikipedia

Protein structure

Symbols and identifiers

Tissue expression pattern

Gene Ontology annotations

Links to structured databases

Gene summary

Protein interactions

Linked references

Huss, PLoS Biol, 2008

Utility

Users

Contributors

Page 9: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

Gene Wiki has a critical mass of readers

Total: ~4.3 million views / month

Huss, PLoS Biol, 2008; Good, NAR, 2011

Utility

Users

Contributors

Page 10: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

Gene Wiki has a critical mass of editors

Good, NAR, 2011

Utility

Users

Contributors

Cumulative edits

Productive edits

Vandalism

~10,000 words added / month

4.3 million views / month

1000 edits / month

Total 1.42 million words ≈ 230 full-length articles

Page 11: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

A review article for every gene is powerful

References to the literature

Hyperlinks to related concepts

Reelin: 98 editors, 703 edits since July 2002

Heparin: 358 editors, 654 edits since June 2003

AMPK: 109 editors, 203 edits since March 2004

RNAi: 394 editors, 994 edits since October 2002

Page 12: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

Making the Gene Wiki more computable

Structured annotations

Free text

Page 13: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

Annotator

Filling the gaps in gene annotation

Wikilink

GO exact synonym

Gene Wiki mapping

NCBI Entrez Gene: 3362

GO:0004993

Candidate assertion

Good, BMC Genomics 2011, 12:603

Page 14: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

Annotator

Filling the gaps in gene annotation

Wikilink

GO exact match

Gene Wiki mapping

NCBI Entrez Gene: 334

GO:0006897

Candidate assertion

Good, BMC Genomics 2011, 12:603
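
The two "Filling the gaps" slides suggest a pipeline in which a wikilink on a gene's article is matched to a GO term (by exact name or exact synonym), and the article's Gene Wiki mapping to an NCBI Entrez Gene ID turns that match into a candidate assertion. A minimal Python sketch of that idea follows. It is not the published implementation: the function names, the `CandidateAssertion` type, and all page titles and term labels are hypothetical placeholders. Only the identifiers 3362, 334, GO:0004993, and GO:0006897 come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateAssertion:
    entrez_gene_id: int
    go_id: str
    evidence: str  # how the wikilink matched the GO term


def normalize(label):
    """Lower-case, turn underscores into spaces, collapse whitespace."""
    return " ".join(label.replace("_", " ").lower().split())


def build_go_index(go_terms):
    """Map normalized label -> (go_id, match type). Term names win over synonyms."""
    index = {}
    for go_id, (name, synonyms) in go_terms.items():
        index[normalize(name)] = (go_id, "GO exact match")
    for go_id, (name, synonyms) in go_terms.items():
        for synonym in synonyms:
            index.setdefault(normalize(synonym), (go_id, "GO exact synonym"))
    return index


def mine_candidates(gene_pages, wikilinks, go_index):
    """Emit one candidate per wikilink on a gene page that matches a GO label."""
    candidates = []
    for title, gene_id in gene_pages.items():
        for target in wikilinks.get(title, []):
            hit = go_index.get(normalize(target))
            if hit is not None:
                candidates.append(CandidateAssertion(gene_id, hit[0], hit[1]))
    return candidates


# Placeholder data: labels and titles are invented for illustration only.
go_terms = {
    "GO:0004993": ("term alpha", ["alpha synonym"]),
    "GO:0006897": ("term beta", []),
}
gene_pages = {"Gene page A": 3362, "Gene page B": 334}  # Gene Wiki mapping
wikilinks = {
    "Gene page A": ["Alpha_synonym", "Unrelated article"],
    "Gene page B": ["Term beta"],
}

candidates = mine_candidates(gene_pages, wikilinks, build_go_index(go_terms))
for c in candidates:
    print(c.entrez_gene_id, c.go_id, c.evidence)
```

In a real run the label index would be built from the Gene Ontology itself and the links parsed from article wikitext. Each candidate would still need review before being treated as an annotation.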

Page 15: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

Novel GO annotations – so what?

11,022 annotations mined from Gene Wiki

4,703 (43%) match known annotations

~100,000 annotations from GO consortium

6,319 “novel” annotations @ 48-64% specificity

Good, BMC Genomics 2011, 12:603
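
The counts on this slide can be checked with a few lines of arithmetic. The sketch below recomputes the novel count and the matched share. As a labeled assumption, it also reads "48-64% specificity" as the fraction of novel annotations expected to be correct, which gives a rough range of usable new annotations. That reading is an interpretation, not something the slide states.

```python
mined = 11_022          # annotations mined from Gene Wiki (from the slide)
known = 4_703           # of those, already present in known annotations

novel = mined - known                 # annotations not yet in the reference set
share_known = known / mined           # fraction matching known annotations

# Assumption: treat the 48-64% figure as the share of novel annotations
# that would hold up on review.
low, high = 0.48, 0.64
expected_correct = (round(novel * low), round(novel * high))

print(novel, f"{share_known:.0%}", expected_correct)
```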

Page 16: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

Gene Wiki content improves enrichment analysis

GO term

Gene list

Concept recognition

PubMed abstracts

Enrichment analysis

GO:0007411

axon guidance

(GO:0007411)

264 genes

Linked genes through PubMed

P = 1.55 E-20

811 articles

Yes No

Yes 13 2

No 251 12033
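
The 2x2 table above can be turned into an enrichment p-value with a one-sided Fisher's exact test (the upper tail of a hypergeometric distribution). The sketch below uses only the Python standard library. It assumes that the "Yes" column counts genes annotated with GO:0007411 (13 + 251 = 264, matching the "264 genes" on the slide) and that the "Yes" row counts genes linked through PubMed. The slide does not say which test was used, so this is one plausible reconstruction and the function name is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n_drawn, n_success, n_total):
    """P(X >= k) when drawing n_drawn items without replacement from
    n_total items, of which n_success are 'successes'."""
    denominator = comb(n_total, n_drawn)
    upper = min(n_drawn, n_success)
    numerator = sum(
        comb(n_success, i) * comb(n_total - n_success, n_drawn - i)
        for i in range(k, upper + 1)
    )
    return numerator / denominator


# Cell counts from the slide (row/column meaning is an assumption, see above).
linked_and_annotated = 13
linked_not_annotated = 2
unlinked_and_annotated = 251
unlinked_not_annotated = 12033

n_linked = linked_and_annotated + linked_not_annotated          # 15
n_annotated = linked_and_annotated + unlinked_and_annotated     # 264
n_total = n_linked + unlinked_and_annotated + unlinked_not_annotated

p_value = hypergeom_upper_tail(linked_and_annotated, n_linked, n_annotated, n_total)
print(f"P = {p_value:.2e}")
```

Under these assumptions the result is on the order of 1E-20, in line with the P value shown on the slide.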

Page 17: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

Gene Wiki content improves enrichment analysis

GO term

Gene list

Concept recognition

PubMed abstracts

Gene Wiki

+

Enrichment analysis

GO:0006936 GO:0006936

muscle contraction

(GO:0006936)

87 genes

Linked genes through PubMed: P = 1.0

Linked genes through PubMed + Gene Wiki: P = 1.22E-09

251 articles

87 articles

Page 18: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

Gene Wiki content improves enrichment analysis

p-value (PubMed only)

p-value (PubMed + GW)

Muscle contraction

More significant with PubMed + GW

More significant with PubMed only

Page 19: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

Gene Wiki+ for integrative queries

http://genewikiplus.org

mwsync

Page 20: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

Dynamic queries across genes, diseases, SNPs

Page 21: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation


Page 22: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation


TOP 100 GENES

Page 23: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

Gene Wiki+ for integrative queries

http://genewikiplus.org

mwsync

{{#ask: [[Category:Human_proteins]] [[is_associated_with:: <q>[[Category:Breast_cancer]]</q>]] [[HasSNP:: <q>[[is_associated_with:: <q>[[Category:Breast_cancer]]</q>]] </q>]]}}
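
The `{{#ask: ...}}` query above is a Semantic MediaWiki inline query. Read literally, it selects pages in Category:Human_proteins that are associated with a page in Category:Breast_cancer and that have a SNP which is itself associated with a page in that category. The sketch below shows how the same condition string could be assembled in Python and wrapped in a request URL for Semantic MediaWiki's `ask` API module. The helper names are hypothetical, and the `api.php` endpoint path on genewikiplus.org is an assumption. No request is sent.

```python
from urllib.parse import urlencode


def category(name):
    """Condition: page belongs to the given category."""
    return f"[[Category:{name}]]"


def subquery(condition):
    """Wrap a condition so it applies to the property's value page."""
    return f"<q>{condition}</q>"


def prop(name, value):
    """Condition: page has property `name` with the given value or subquery."""
    return f"[[{name}::{value}]]"


def build_query(disease_category):
    """Human proteins linked to a disease both directly and through a SNP."""
    associated = prop("is_associated_with", subquery(category(disease_category)))
    return (
        category("Human_proteins")
        + associated
        + prop("HasSNP", subquery(associated))
    )


def ask_url(api_endpoint, query):
    """Build a URL for Semantic MediaWiki's `action=ask` API module."""
    params = {"action": "ask", "query": query, "format": "json"}
    return api_endpoint + "?" + urlencode(params)


query = build_query("Breast_cancer")
# Hypothetical endpoint: the actual API path on the site is not given in the slides.
url = ask_url("http://genewikiplus.org/api.php", query)
print(query)
print(url)
```

Swapping the argument to `build_query` for another disease category reuses the same pattern, which fits the "dynamic queries across genes, diseases, SNPs" idea from the earlier slide.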

OMIM

PharmGKB

Page 24: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

OMIM

PharmGKB

Gene Wiki+ for integrative queries

http://genewikiplus.org

mwsync

Page 25: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

The Long Tail of scientists is a valuable source of information on gene function

Page 26: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

Crowdsourcing a gene annotation portal

Page 27: ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

Collaborators

Doug Howe, ZFIN
John Hogenesch, U Penn
Jon Huss, GNF
Luca de Alfaro, UCSC
Angel Pizzaro, U Penn
Faramarz Valafar, SDSU
Pierre Lindenbaum, Fondation Jean Dausset
Michael Martone, Rush
Konrad Koehler, Karo Bio
Warren Kibbe, Simon Lim, Northwestern
Many Wikipedia editors
WP:MCB Project

Group members

Erik Clarke
Ben Good
Salvatore Loguercio
Ian Macleod
Max Nanis
Chunlei Wu

Funding and Support

(BioGPS: GM83924, Gene Wiki: GM089820)
ISMB travel support

Contact

http://sulab.org
[email protected]
@andrewsu
+Andrew Su

