Crowdsourcing Gene Annotation
Anurag Priyam
Sequencing cost
• Sequencing genomes is now inexpensive.
• Many genomes are now being sequenced.
Gene prediction
• ab initio: search genome for signs
• Sequence similarity based: search genome for known sequences
Gene prediction is challenging
Some examples:
• missing exon
• truncated or overextended exon
• gene split into several gene predictions (e.g. if introns are very large)
• merged genes (two adjacent gene models are predicted to be a single “megagene”)
Incorrect Gene Prediction is Problematic
• studying gene family evolution
• RNAseq analyses
• molecular evolution analyses
Manual curation
• yields the best gene models
• is time consuming
• plausible for large communities (e.g. Human, C. elegans)
• but what if a small lab sequenced their favorite bug’s genome?
Crowdsourcing
GalaxyZoo
Foldit
Foldit players contribute to real science!
• Christopher B. Eiben et al. (2012) Increased Diels-Alderase activity through backbone remodeling guided by Foldit players. Nature Biotechnology.
• Firas Khatib et al. (2011) Crystal structure of a monomeric retroviral protease solved by protein folding game players. Nature Structural & Molecular Biology.
• Firas Khatib et al. (2011) Algorithm discovery by protein folding game players. Proceedings of the National Academy of Sciences.
Crowdsourcing works
• GalaxyZoo volunteers have discovered real galaxies.
• Foldit players have guided real scientific questions.
Can we crowdsource gene model curation?
Challenges
• recruiting contributors
• retaining contributors
• ensuring quality gene models
Lower the entry barrier to contribution
• contributors refine one gene model at a time
• present gene models based on the user’s experience – beginners see easy-to-curate models
• tutorial or learning tasks
• assisting UI
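A minimal sketch of how gene models could be matched to a contributor's experience. The `Task` record, the 1 to 5 difficulty scale, the experience thresholds, and the choice to offer the hardest task within the contributor's level are all assumptions made for illustration; none of them come from the project's code.

```ruby
# Hypothetical task record: a gene model id plus a difficulty score
# (1 = easy to curate, 5 = hard). The scale is an assumption.
Task = Struct.new(:id, :difficulty)

# Assumed mapping from the number of gene models a contributor has
# already curated to the hardest difficulty they are shown.
def max_difficulty(curated_count)
  case curated_count
  when 0...10   then 1  # beginners see only easy models
  when 10...100 then 3
  else               5
  end
end

# Pick the next task: the hardest one that is still within the
# contributor's level. Returns nil when nothing suitable is left.
def next_task(tasks, curated_count)
  limit = max_difficulty(curated_count)
  tasks.select { |t| t.difficulty <= limit }.max_by(&:difficulty)
end
```

With tasks of difficulty 1, 3 and 5, a contributor with no curations is offered the difficulty-1 model, and one with 1000 curations is offered the difficulty-5 model.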
Social network
• Passive recruitment: post curation activities of contributors to social network.
  • Cathy contributed to cancer research by refining three gene models. Can you help too?
  • Mike helped researchers understand how ant societies are organised by refining two gene models.
  • Amos earned the “expert gene curator badge” by curating 1000 gene models.
• Active recruitment of friends on social network.
Challenges
• recruiting contributors
• retaining contributors
• ensuring quality gene models
Retaining contributors
Learning experience
Helping science
Prestige & pride:
• points and badges.
• being featured on our leaderboard.
• acknowledgement or coauthorship in publication
• responsibility: “senior” contributors are
  • asked to arbitrate between conflicting submissions of junior contributors.
  • asked to curate a specific set of genes (developing expertise)
Challenges
• recruiting contributors
• retaining contributors
• ensuring quality gene models
Ensure quality gene models
• make tasks small & simple
• beginners are trained
• redundant curation
• review of conflicts by experienced users.
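Redundant curation with conflict review could be sketched as follows. The `Submission` record, the representation of a gene model as a list of exon coordinate pairs, and the agreement threshold of three are assumptions for illustration, not the project's actual design.

```ruby
# Hypothetical submission: who sent it and the exon coordinates they
# chose, as [start, end] pairs.
Submission = Struct.new(:contributor, :exons)

# Accept a gene model when at least `min_agree` independent submissions
# give identical exon coordinates; otherwise flag it so that an
# experienced user can review the conflict.
def resolve(submissions, min_agree: 3)
  # Count how many contributors submitted each distinct set of exons.
  counts = submissions.group_by(&:exons).transform_values(&:size)
  exons, votes = counts.max_by { |_, n| n }
  if votes && votes >= min_agree
    { status: :accepted, exons: exons }
  else
    { status: :needs_review }
  end
end
```

Requiring exact agreement is deliberately strict. A real system might tolerate small coordinate differences or weight submissions by contributor experience.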
Work in progress
• gene prediction: MAKER2
• gene visualization & editing: JBrowse (WebApollo)
• http://afra.sbcs.qmul.ac.uk
• Our code: Ruby, Sinatra, DataMapper, jQuery
Summary
• many emerging model organisms are being studied
• gene prediction hasn’t caught up yet
• manual curation requires a huge amount of time
• crowdsourcing exists
• crowdsourcing works – even in science
• there are many challenges
• work in progress
Thanks
http://afra.sbcs.qmul.ac.uk
Dr. Yannick Wurm
Dr. Mitchell E. Skinner
Dr. Mark Yandell
Task
Recruiting alternatives
• Require it of students – as part of the curriculum (learning / practicals)
• Pay people
Summing up
• 180+ eukaryotic genomes and more coming
• gene prediction hasn’t caught up
• best gene models are manually curated
• manual curation can take hours to days
• curating a full genome can take years