
GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)


DESCRIPTION

Talk given at the Genome Informatics conference 2012 at Robinson College, Cambridge University.


Page 1: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

The Gene Wiki: Crowdsourcing human gene annotation

Andrew Su, Ph.D.

@andrewsu

[email protected]

http://sulab.org

GeneGames.org

Genome Informatics

September 6, 2012


Page 2: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

The Gene Wiki crib sheet

• Bulk creation of ~10k Wikipedia articles (http://dx.doi.org/10.1371/journal.pbio.0060175)

• Monthly stats: > 4 million views, > 1000 edits (http://dx.doi.org/10.1093/nar/gkr925)

• Text mining reveals novel Gene Ontology and Disease Ontology annotations (http://dx.doi.org/doi:10.1186/1471-2164-12-603)

• Mash-up with SNPedia for crowdsourced gene-disease database (http://www.jbiomedsem.com/content/3/S1/S6)

• Merging Wikipedia with the Semantic Web (http://dx.doi.org/10.1093/database/bar060)


http://www.slideshare.net/andrewsu

Page 3: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)


http://www.flickr.com/photos/archana3k1/4124330493/

Seven million human hours

Page 4: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)


Twenty million human hours

http://www.flickr.com/photos/ableman/2171326385/

Page 5: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

150 billion human hours per year

http://www.flickr.com/photos/rvp-cw/6243289302/

Page 6: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

Using games to fold proteins

Fold.it players have successfully:

• Outperformed state-of-the-art protein folding algorithms (Cooper, Nature, 2010)

• Solved a previously-intractable crystal structure (Khatib, Nat Struct Mol Biol, 2011)

• Designed an improved protein folding algorithm (Khatib, PNAS, 2011)

• Improved enzyme activity of de novo designed enzyme (Eiben, Nat Biotechnol, 2011)

http://fold.it

Page 7: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

Using games to fold RNAs

http://eterna.cmu.edu/

Page 8: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

Using games to align sequences

http://phylo.cs.mcgill.ca

Page 9: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

Using games to annotate genes?

http://genegames.org

Page 10: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

No good gene-disease annotation database

Query: Apolipoprotein E

Alzheimer's disease (AD); Lipoprotein glomerulopathy; Sea-blue histiocyte disease

Page 11: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

No good gene-disease annotation database

Query: Apolipoprotein E

Alzheimer's disease (AD); Lipoprotein glomerulopathy; Sea-blue histiocyte disease; Hyperlipoproteinemia, type III; Macular degeneration, age-related; Myocardial infarction susceptibility

Page 12: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

No good gene-disease annotation database

Query: Apolipoprotein E

Alzheimer's disease (AD); Lipoprotein glomerulopathy; Sea-blue histiocyte disease; Hyperlipoproteinemia, type III; Macular degeneration, age-related; Myocardial infarction susceptibility; HIV; Psoriasis; Vascular Diseases

?

?

?

?

?

Page 13: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

No good gene-disease annotation database

Query: Apolipoprotein E

Alzheimer's disease (AD); Neuropsychological Tests; Cognition Disorders; Dementia; Cognition; Disease Progression; Cardiovascular Diseases; Coronary Disease; Diabetes Mellitus, Type 2; Memory Disorders; Memory; Coronary Artery Disease; Hypertension; Mental Status Schedule; Psychiatric Status Rating Scales; Hyperlipidemias; Atrophy; Dementia, Vascular; Parkinson Disease; Brain Injuries; Myocardial Infarction; …

477 diseases!

Page 14: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

Play Dizeez to annotate gene-disease links

1. Read the clue (gene)

2. Click the related disease (only one is “right”)

3. If it’s ‘right’, you get points

4. Then on to the next question…

5. Hurry!

6. Play to win!

Page 15: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

Dizeez players seem pretty smart…

In total (since Dec 2011):

• 207 unique gamers

• 1045 games played

• 8525 guesses

# Occurrences   Gene    Disease
7               GAST    gastrinoma
7               RBP3    retinoblastoma
7               SSX1    synovial sarcoma
6               TG      Graves' disease
6               CRYGC   Cataract
6               SOX8    mental retardation
6               WRN     Werner syndrome
6               ABL1    leukemia
6               MLL3    leukemia
6               SNAI2   breast carcinoma

Pubmed OMIM PharmGKB Gene Wiki

Page 16: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

Dizeez players seem pretty smart…

# Occurrences   Gene    Disease
5               MECOM   sarcoma
4               ATF7    cancer
3               ABCB5   acute myeloid leukemia
3               SART1   glioblastoma
3               NCK1    leukemia
3               NEK1    cancer

Pubmed OMIM PharmGKB Gene Wiki

In total (since Dec 2011):

• 207 unique gamers

• 1045 games played

• 8525 guesses
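
The occurrence tables above can be read as a tally over player guesses. The following is a minimal sketch of such a tally, assuming each guess is logged as a (gene, disease) pair and that a set of links already present in reference databases is available. The function name `tally_pairs`, the `min_count` cutoff, and the toy data are all hypothetical; the slides do not describe how Dizeez stores or aggregates guesses.

```python
from collections import Counter

def tally_pairs(guesses, known_pairs, min_count=2):
    """Count how often each (gene, disease) pair was guessed, then split the
    repeated pairs into links already known and candidate new links."""
    counts = Counter(guesses)
    known, candidates = [], []
    for pair, n in counts.most_common():
        if n < min_count:
            continue  # ignore pairs guessed too rarely to be informative
        if pair in known_pairs:
            known.append((pair, n))
        else:
            candidates.append((pair, n))
    return known, candidates

# Toy guess log with placeholder names, invented for illustration only
# (not real Dizeez data).
guesses = [
    ("GENE_A", "disease X"), ("GENE_A", "disease X"), ("GENE_A", "disease X"),
    ("GENE_B", "disease Y"), ("GENE_B", "disease Y"),
    ("GENE_C", "disease Z"),
]
# Placeholder for links already found in a reference database.
known_pairs = {("GENE_A", "disease X"), ("GENE_C", "disease Z")}

known, candidates = tally_pairs(guesses, known_pairs)
print("already known:", known)
print("candidates:", candidates)
```

Pairs that many players agree on but that no reference source lists are the ones worth a closer look; the cutoff would need tuning against real play data.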

Page 17: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

Using games to predict phenotype from genotype?

http://genegames.org

The Cure

Page 18: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

Classification problems in genome biology

[Figure: data matrix of 100s of samples × 100,000s of features, samples labeled cancer or normal]

Find patterns: SVM, Neural networks, Naïve Bayes, KNN, …

Classify new samples: cancer or normal

Page 19: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

Random forests

Sample subset of cases and features

Train decision tree

[Figure: data matrix of 100s of samples × 100,000s of features, samples labeled cancer or normal]
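
The two steps on this slide (sample a subset of cases and features, then train a decision tree) can be sketched in a few lines. This is a simplified illustration, not a production random forest: each tree is a depth-one stump, the toy expression values are invented, and every toy feature is informative so the output is easy to follow. All function names are hypothetical.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature_ids):
    """Find the (feature, threshold) split with the fewest training errors.
    Each side of the split predicts the majority label on that side."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]  # (feature, threshold, left label, right label)

def train_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    """Train n_trees stumps, each on a bootstrap sample of the cases and a
    random subset of the features."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]        # sample cases
        feats = rng.sample(range(len(rows[0])), n_features)   # sample features
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Classify a new sample by majority vote over all trees."""
    votes = Counter(l if row[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]

# Toy data invented for illustration: 6 samples x 3 features.
rows = [[1.0, 2.0, 0.5], [1.2, 1.8, 0.7], [0.9, 2.1, 0.6],
        [3.0, 4.0, 2.5], [3.2, 4.2, 2.2], [2.9, 3.9, 2.7]]
labels = ["normal", "normal", "normal", "cancer", "cancer", "cancer"]

forest = train_forest(rows, labels)
print(predict(forest, [0.8, 1.7, 0.4]))
print(predict(forest, [3.1, 4.1, 2.4]))
```

Real data would have far more features than samples, which is why the choice of which features each tree sees matters so much in the slides that follow.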

Page 20: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

Random forests

[Figure: data matrix of 100s of samples × 100,000s of features, samples labeled cancer or normal]

Page 21: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

Random forests

[Figure: data matrix of 100s of samples × 100,000s of features, samples labeled cancer or normal]

Classify new samples: cancer or normal

How to interject biological knowledge?

Page 22: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

Network-guided forests

Dutkowski & Ideker (2011). PLoS Computational Biology

Page 23: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

Network-guided forests

Sample features by PPI network

Train decision tree

[Figure: data matrix of 100s of samples × 100,000s of features, samples labeled cancer or normal]
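
One way to picture "sample features by PPI network" is to replace the uniform feature draw with a draw of genes that are connected in the interaction network. The sketch below grows a connected gene set from a random seed gene. It is an illustrative simplification and not the published Dutkowski & Ideker procedure; the function name, the toy network, and the gene names are hypothetical, and the network is assumed to be an undirected adjacency dictionary.

```python
import random

def sample_connected_features(network, size, rng):
    """Pick a random seed gene, then repeatedly add a random neighbor of the
    genes chosen so far, so the sampled features form a connected module."""
    seed_gene = rng.choice(sorted(network))
    chosen = [seed_gene]
    frontier = set(network[seed_gene]) - {seed_gene}
    while len(chosen) < size and frontier:
        nxt = rng.choice(sorted(frontier))
        chosen.append(nxt)
        # Extend the frontier with the new gene's neighbors, minus genes already used.
        frontier = (frontier | set(network.get(nxt, ()))) - set(chosen)
    return chosen

# Toy undirected network with placeholder gene names (illustration only).
toy_network = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"],
    "E": ["F"], "F": ["E"],
}

module = sample_connected_features(toy_network, size=3, rng=random.Random(0))
print(module)
```

Each tree would then be trained only on the genes in its module, so a tree's splits come from genes that interact with one another rather than from an arbitrary subset.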

Page 24: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

Human-guided forests

Sample features by human intelligence

Train decision tree

[Figure: data matrix of 100s of samples × 100,000s of features, samples labeled cancer or normal]
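
The same idea can be sketched for human-guided sampling, assuming player choices are summarized as a per-gene pick count. The slides do not say how player input is recorded or weighted, so the function name, the add-one smoothing, and the toy counts below are hypothetical.

```python
import random

def sample_by_votes(votes, size, rng):
    """Draw `size` distinct genes, each draw weighted by how often players
    picked the gene. Add-one smoothing keeps unpicked genes drawable."""
    pool = dict(votes)
    chosen = []
    while pool and len(chosen) < size:
        genes = sorted(pool)
        weights = [pool[g] + 1 for g in genes]
        pick = rng.choices(genes, weights=weights, k=1)[0]
        chosen.append(pick)
        del pool[pick]  # sample without replacement
    return chosen

# Toy pick counts with placeholder gene names (illustration only).
toy_votes = {"GENE_A": 12, "GENE_B": 7, "GENE_C": 0, "GENE_D": 3, "GENE_E": 1}

features = sample_by_votes(toy_votes, size=3, rng=random.Random(0))
print(features)
```

Genes that players favor appear in more trees, while the smoothing leaves room for genes nobody has picked yet.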

Page 25: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

The Cure: Genomic predictors for disease

Page 26: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

The Cure: Genomic predictors for disease

Page 27: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

The Cure: Genomic predictors for disease

Page 28: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

The Cure: Genomic predictors for disease

Page 29: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

The Cure: Genomic predictors for disease

Page 30: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

The Cure: Genomic predictors for disease

Page 31: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

Human-guided forests

Classify new samples: cancer or normal

Page 32: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

“Critical Assessment”-style challenge

Will this work? Check our blog after October 15.

Coming soon to genegames.org

Page 33: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

Collaborators

• Doug Howe, ZFIN

• John Hogenesch, U Penn

• Jon Huss, GNF

• Luca de Alfaro, UCSC

• Angel Pizzaro, U Penn

• Faramarz Valafar, SDSU

• Pierre Lindenbaum, Fondation Jean Dausset

• Michael Martone, Rush

• Konrad Koehler, Karo Bio

• Warren Kibbe, Simon Lim, Northwestern

• Many Wikipedia editors

• WP:MCB Project

Group members

• Ben Good

• Salvatore Loguercio

• Ian Macleod

• Max Nanis

• Chunlei Wu

Funding and Support

(BioGPS: GM83924, Gene Wiki: GM089820)

Contact

http://sulab.org

[email protected]

@andrewsu

+Andrew Su

Recruiting graduate students in quantitative biology! See http://education.scripps.edu/

@genegame