
Biocuration - Crowdsourcing Gene Annotation


Crowdsourcing Gene Annotation

Anurag Priyam


Sequencing cost

• Sequencing genomes is now inexpensive.

• Many genomes are now being sequenced.


Gene prediction

• ab initio: search the genome for signs of genes

• Sequence similarity based: search the genome for known sequences


Gene prediction is challenging

Some examples:

• missing exon

• truncated or overextended exon

• gene split into several gene predictions (e.g. if introns are very large)

• merged genes (two adjacent gene models are predicted to be a single “megagene”)
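The exon-level errors listed above can be illustrated with a small comparison between a predicted gene model and a trusted (curated) one. This is only a sketch: the `Exon` struct, the `exon_problems` helper, and all coordinates are hypothetical, and a real comparison would also need to handle strand, split genes, and merged genes.

```ruby
# Hypothetical exon record: start/stop coordinates on one strand.
Exon = Struct.new(:start, :stop) do
  def overlaps?(other)
    start <= other.stop && other.start <= stop
  end
end

# Compare a predicted gene model against a curated one and report
# missing, truncated, and overextended exons.
# An exon that is short at one end is reported as truncated, even if
# it is also too long at the other end.
def exon_problems(predicted, curated)
  problems = []
  curated.each do |ref|
    match = predicted.find { |exon| exon.overlaps?(ref) }
    if match.nil?
      problems << [:missing_exon, ref.to_a]
    elsif match.start > ref.start || match.stop < ref.stop
      problems << [:truncated_exon, ref.to_a]
    elsif match.start < ref.start || match.stop > ref.stop
      problems << [:overextended_exon, ref.to_a]
    end
  end
  problems
end

# Made-up coordinates, for illustration only.
curated   = [Exon.new(100, 200), Exon.new(300, 400), Exon.new(500, 600)]
predicted = [Exon.new(100, 200), Exon.new(320, 400)]

p exon_problems(predicted, curated)
# => [[:truncated_exon, [300, 400]], [:missing_exon, [500, 600]]]
```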


Incorrect gene prediction is problematic for:

• studying gene family evolution

• RNAseq analyses

• molecular evolution analyses


Manual curation

• yields the best gene models

• is time consuming

• feasible for large communities (e.g. Human, C. elegans)

• but what if a small lab sequenced their favorite bug’s genome?


Crowdsourcing


GalaxyZoo

Foldit

Foldit players contribute to real science!

• Christopher B. Eiben et al. (2012) Increased Diels-Alderase activity through backbone remodeling guided by Foldit players. Nature Biotechnology.

• Firas Khatib et al. (2011) Crystal structure of a monomeric retroviral protease solved by protein folding game players. Nature Structural and Molecular Biology.

• Firas Khatib et al. (2011) Algorithm discovery by protein folding game players. Proceedings of the National Academy of Sciences.


Crowdsourcing works

• GalaxyZoo volunteers have discovered real galaxies.

• Foldit players have helped answer real scientific questions.


Can we crowdsource gene model curation?


Challenges

• recruiting contributors

• retaining contributors

• ensuring quality gene models


Lower the entry barrier to contribution

• contributors refine one gene model at a time

• present gene models based on the user’s experience: beginners see easy-to-curate models

• tutorial or learning tasks

• assistive UI
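One way to hand out a single gene model at a time, matched to the contributor's experience, might look like the sketch below. The `Task` struct, the 1 to 3 difficulty scale, and the experience thresholds are all assumptions made for illustration; the slides do not specify how difficulty or experience are measured.

```ruby
# Hypothetical task record: difficulty runs from 1 (easy) to 3 (hard).
Task = Struct.new(:gene_id, :difficulty)

# Assumed mapping from the number of completed curations to the hardest
# difficulty a contributor is shown. Thresholds are illustrative only.
def max_difficulty(completed_curations)
  if completed_curations < 10
    1
  elsif completed_curations < 100
    2
  else
    3
  end
end

# Return one task: the hardest one the contributor qualifies for, so
# that easy tasks stay available for beginners. Returns nil if no task
# is suitable.
def next_task(tasks, completed_curations)
  limit = max_difficulty(completed_curations)
  tasks.select { |task| task.difficulty <= limit }.max_by(&:difficulty)
end

tasks = [Task.new("gene_a", 1), Task.new("gene_b", 2), Task.new("gene_c", 3)]
next_task(tasks, 0).gene_id    # => "gene_a" (a beginner sees the easy model)
next_task(tasks, 250).gene_id  # => "gene_c"
```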


Social network

• Passive recruitment: post curation activities of contributors to social network.

    • Cathy contributed to cancer research by refining three gene models. Can you help too?

    • Mike helped researchers understand how ant societies are organised by refining two gene models.

    • Amos earned the “expert gene curator badge” by curating 1000 gene models.

• Active recruitment of friends on social network.


Challenges

• recruiting contributors

• retaining contributors

• ensuring quality gene models


Retaining contributors

Learning experience

Helping science

Prestige & pride:

• points and badges.

• being featured on our leaderboard.

• acknowledgement or coauthorship in publications

• responsibility: “senior” contributors are

    • asked to arbitrate between conflicting submissions of junior contributors.

    • asked to curate a specific set of genes (developing expertise)
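Badges and a leaderboard could be derived from nothing more than a count of curated gene models per contributor. The sketch below assumes exactly that. Only the 1000-model "expert gene curator" badge and the example contributors come from the slides; the other badge names, the thresholds, and the helper names are hypothetical.

```ruby
# Badge thresholds, in number of curated gene models. Only the
# 1000-model "expert gene curator" badge comes from the slides;
# the other two entries are made up for illustration.
BADGES = {
  "first gene model"    => 1,
  "regular curator"     => 100,
  "expert gene curator" => 1000
}.freeze

# Names of all badges earned at a given curation count.
def badges_for(curated_count)
  BADGES.select { |_name, threshold| curated_count >= threshold }.keys
end

# Contributor names sorted by curated models, highest first.
def leaderboard(counts)
  counts.sort_by { |_name, count| -count }.map(&:first)
end

counts = { "Cathy" => 3, "Mike" => 2, "Amos" => 1000 }
leaderboard(counts)         # => ["Amos", "Cathy", "Mike"]
badges_for(counts["Amos"])  # => ["first gene model", "regular curator", "expert gene curator"]
```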


Challenges

• recruiting contributors

• retaining contributors

• ensuring quality gene models


Ensure quality gene models

• make tasks small & simple

• beginners are trained

• redundant curation

• review of conflicts by experienced users.
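Redundant curation with conflict review can be sketched as a simple voting rule: a gene model is accepted once enough independent submissions agree, and otherwise goes to an experienced user. Everything concrete here is an assumption: submissions are represented as plain lists of exon coordinates, and `MIN_AGREEMENT` and the `resolve` helper are hypothetical names and values, not part of the project described in the slides.

```ruby
# Assumed number of identical submissions needed to accept a model.
MIN_AGREEMENT = 2

# Each submission is a list of [start, stop] exon coordinates for the
# same gene. Returns a hash describing what should happen next.
def resolve(submissions)
  return { status: :needs_more_curation } if submissions.size < MIN_AGREEMENT

  # Count how many contributors submitted each distinct model.
  counts = submissions.each_with_object(Hash.new(0)) { |s, h| h[s] += 1 }
  model, votes = counts.max_by { |_model, count| count }

  # Accept only a clear winner; ties and disagreement go to a reviewer.
  if votes >= MIN_AGREEMENT && counts.values.count(votes) == 1
    { status: :accepted, model: model }
  else
    { status: :needs_review }
  end
end

# Made-up coordinates, for illustration only.
agreeing  = [[100, 200], [300, 400]]
differing = [[100, 200], [320, 400]]

resolve([agreeing, agreeing, differing])  # => {:status=>:accepted, :model=>[[100, 200], [300, 400]]}
resolve([agreeing, differing])[:status]   # => :needs_review
```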


Work in progress

• gene prediction: MAKER2

• gene visualization & editing: JBrowse (WebApollo)

• http://afra.sbcs.qmul.ac.uk

• Our code: Ruby, Sinatra, DataMapper, jQuery


Summary

• many emerging model organisms are being studied

• gene prediction hasn’t caught up yet

• manual curation requires a huge amount of time

• crowdsourcing exists

• crowdsourcing works – even in science

• there are many challenges

• work in progress


Thanks

http://afra.sbcs.qmul.ac.uk

Dr. Yannick Wurm

Dr. Mitchell E. Skinner

Dr. Mark Yandell


Task


Recruiting alternatives

• Require it of students: make it part of the curriculum (learning / practicals)

• Pay people


Summing up

• 180+ eukaryotic genomes and more coming

• gene prediction hasn’t caught up

• best gene models are manually curated

• manual curation can take hours to days

• curating a full genome can take years
