Introduction to Apollo Collaborative genome annotation editing A webinar for the i5K Research Community – Calanoida (copepod) Monica Munoz-Torres | @monimunozto Berkeley Bioinformatics Open-Source Projects (BBOP) Environmental Genomics & Systems Biology Division, Lawrence Berkeley National Laboratory i5k Pilot Project Species Calls | 17 October, 2016 http://GenomeArchitect.org

Introduction to Apollo - i5k Research Community – Calanoida (copepod)


Page 1: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Introduction to Apollo

Collaborative genome annotation editing

A webinar for the i5K Research Community – Calanoida (copepod)

Monica Munoz-Torres | @monimunozto

Berkeley Bioinformatics Open-Source Projects (BBOP)

Environmental Genomics & Systems Biology Division, Lawrence Berkeley National Laboratory

i5k Pilot Project Species Calls | 17 October, 2016

http://GenomeArchitect.org

Page 2: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Outline

• Today you will discover effective ways to extract valuable information about a genome through curation efforts.

Page 3: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

After this talk you will...

• Better understand ‘curation’ in the context of genome annotation:

assembled genome → automated annotation → manual annotation

• Become familiar with Apollo’s environment and functionality.

• Learn to identify homologs of known genes of interest in your newly sequenced genome.

• Learn how to corroborate and modify automatically annotated gene models using all available evidence in Apollo.

Page 4: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Experimental design, sampling.

Comparative analyses

Official / Merged Gene Set

Manual Annotation

Automated Annotation

Sequencing

Assembly

Synthesis & dissemination.

This is our focus.

Page 5: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

We must care about curation

Marbach et al. 2011. Nature Methods | Shutterstock.com | Alexander Wild

The gene set of an organism informs a variety of studies:
• Characterization: Gene number, GC%, TEs, repeats.
• Functional assignments.
• Molecular evolution, sequence conservation.
• Gene families.
• Metabolic pathways.
• What makes an organism what it is?

What makes a bee a “bee”?

Page 6: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Genome Curation

Identifies elements that best represent the underlying biology and eliminates elements that reflect systemic errors of automated analyses.

Assigns function through comparative analysis of similar genome elements from closely related species using literature, databases, and experimental data.

Apollo

Gene Ontology Resources

Page 7: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

A few things to remember when conducting manual annotation

BIO-REFRESHER

• KEEP A GLOSSARY HANDY: from contig to splice site

• WHAT IS A GENE? defining your goal

• TRANSCRIPTION: mRNA in detail

• TRANSLATION: reading frames, etc.

• GENOME CURATION: steps involved

Page 8: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

The gene: a “moving target”

“The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.”

Gerstein et al., 2007. Genome Res

Page 9: Introduction to Apollo - i5k Research Community – Calanoida (copepod)


"Gene structure" by Daycd- Wikimedia Commons

BIO-REFRESHER

mRNA

• Although of brief existence, understanding mRNAs is crucial, as they will become the center of your work.

Page 10: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

BIO-REFRESHER

Reading frames

• In eukaryotes, only one reading frame per section of DNA is biologically relevant at a time: it has the potential to be transcribed into RNA and translated into protein. This is called the OPEN READING FRAME (ORF).
– ORF = Start signal + coding sequence (divisible by 3) + Stop signal
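The ORF definition above can be sketched as a small check. This is a minimal illustration, not Apollo's implementation: the function name `is_complete_orf` is hypothetical, and it assumes the standard genetic code (ATG start; TAA, TAG, TGA stops).

```python
# Assumption: standard genetic code; names here are illustrative only.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_orf(cds: str) -> bool:
    """Return True if cds is Start + whole codons + one terminal Stop."""
    cds = cds.upper()
    # The coding sequence must hold at least a start and a stop codon
    # and its length must be divisible by 3.
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    starts_with_atg = codons[0] == "ATG"
    ends_with_stop = codons[-1] in STOP_CODONS
    # A stop codon before the last position would truncate the product.
    no_internal_stop = not any(c in STOP_CODONS for c in codons[1:-1])
    return starts_with_atg and ends_with_stop and no_internal_stop


if __name__ == "__main__":
    print(is_complete_orf("ATGAAATGA"))     # start, one codon, stop
    print(is_complete_orf("ATGTAAAAATGA"))  # contains an internal stop
```

A check like this is useful after every manual edit, because moving an exon boundary by one or two bases breaks the divisible-by-3 rule.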

Page 11: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

BIO-REFRESHER

Splice sites

• The spliceosome catalyzes the removal of introns and the ligation of flanking exons.

• Splicing signals (from the point of view of an intron):
– One splice signal (site) on the 5’ end: usually GT (less common: GC)
– And a 3’ end splice site: usually AG
– Canonical splice sites look like this: …]5’-GT/AG-3’[…
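As a rough sketch of the signals listed above, the two intron ends can be read straight from the sequence. The function names and the 0-based, end-exclusive coordinates are assumptions made for this example; only forward-strand introns are handled here.

```python
def intron_splice_signals(genome: str, intron_start: int, intron_end: int) -> tuple:
    """Return (donor, acceptor) dinucleotides of a forward-strand intron.

    Coordinates are 0-based and end-exclusive (an assumption of this sketch).
    """
    intron = genome[intron_start:intron_end].upper()
    return intron[:2], intron[-2:]


def classify_intron(donor: str, acceptor: str) -> str:
    """Label an intron by its 5' (donor) and 3' (acceptor) signals."""
    if acceptor == "AG" and donor == "GT":
        return "canonical"
    if acceptor == "AG" and donor == "GC":
        return "GC donor (less common)"
    return "non-canonical"


if __name__ == "__main__":
    # Toy sequence: exon, intron (GT...AG), exon.
    genome = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
    donor, acceptor = intron_splice_signals(genome, 6, 18)
    print(donor, acceptor, classify_intron(donor, acceptor))
```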

Page 12: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

BIO-REFRESHER

Exons and Introns

• Introns can interrupt the reading frame of a gene by inserting a sequence between two consecutive codons

• Between the first and second nucleotide of a codon

• Or between the second and third nucleotide of a codon
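The three cases above are commonly called intron phases 0, 1 and 2. A minimal sketch, assuming we only know the coding length of each exon (the function name `intron_phases` is hypothetical):

```python
def intron_phases(coding_exon_lengths: list) -> list:
    """Return the phase of each intron, given coding lengths of the exons.

    Phase 0: intron falls between two consecutive codons.
    Phase 1: intron falls after the first nucleotide of a codon.
    Phase 2: intron falls after the second nucleotide of a codon.
    """
    phases = []
    coding_bases_so_far = 0
    # There is one intron after every exon except the last.
    for length in coding_exon_lengths[:-1]:
        coding_bases_so_far += length
        phases.append(coding_bases_so_far % 3)
    return phases


if __name__ == "__main__":
    print(intron_phases([9, 4, 5, 6]))
    print(intron_phases([10, 4, 7]))
```

Knowing the phase helps when adjusting an exon boundary: the edit must preserve the phase, or the downstream reading frame shifts.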

"Exon and Intron classes”. Licensed under Fair use via Wikipedia

Page 13: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Prediction & Annotation

Page 14: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

GENE PREDICTION & ANNOTATION

PREDICTION & ANNOTATION

• Identification and annotation of genome features:
– primarily focuses on protein-coding genes.
– also identifies RNAs (tRNA, rRNA, long and small non-coding RNAs (ncRNA)), regulatory motifs, repetitive elements, etc.
– happens in 2 phases:
1. Computation phase
2. Annotation phase

Page 15: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

GENE PREDICTION & ANNOTATION

COMPUTATION PHASE

a. Experimental data are aligned to the genome: expressed sequence tags, RNA-sequencing reads, proteins (also from other species).

b. Gene predictions are generated:
– ab initio: based on nucleotide sequence and composition, e.g. Augustus, GENSCAN, geneid, fgenesh, etc.
– evidence-driven: identifying also domains and motifs, e.g. SGP2, JAMg, fgenesh++, etc.

Result: the single most likely coding sequence, no UTRs, no isoforms.

Yandell & Ence. Nature Rev 2012 doi:10.1038/nrg3174

Page 16: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

GENE PREDICTION & ANNOTATION

ANNOTATION PHASE

Experimental data (evidence) and predictions are synthesized into gene annotations.

Result: gene models that generally include UTRs, isoforms, evidence trails.

Yandell & Ence. Nature Rev 2012 doi:10.1038/nrg3174

5’UTR 3’UTR

Page 17: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

CONSENSUS GENE SETS

Gene models may be organized into sets using:
• combiners for automatic integration of predicted sets, e.g. GLEAN, EvidenceModeler
or
• tools packaged into pipelines, e.g. MAKER, PASA, Gnomon, Ensembl, etc.

In some cases algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation.

GENE PREDICTION & ANNOTATION

Page 18: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

ANNOTATION
needs some refinement

No one is perfect, least of all automated annotation.

New technologies bring new challenges:
• Assembly errors can cause fragmented annotations
• Limited coverage makes precise identification a difficult task

Page 19: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

MANUAL ANNOTATION
improving predictions

Precise elucidation of biological features encoded in the genome requires careful examination and review.

Schiex et al. Nucleic Acids 2003 (31) 13: 3738-3741

Automated Predictions

Experimental Evidence

Manual Annotation – to the rescue.

cDNAs, HMM domain searches, RNAseq, genes from other species.

Page 20: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

GENOME CURATION
an inherently collaborative task

GENE PREDICTION & ANNOTATION

So many sequences, not enough hands.

Apis mellifera | Alexander Wild | www.alexanderwild.com

Page 21: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

We have provided continuous training and support for hundreds of geographically dispersed scientists to conduct manual annotation efforts in order to recover coding sequences in agreement with all available biological evidence.

Collaboration is key!

APOLLO

• Collaborative work distills invaluable knowledge.

• A little training goes a long way! Wet lab scientists can easily learn to maximize the generation of accurate, biologically supported gene models.

Page 22: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Apollo

Page 23: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

APOLLO: versatile genome annotation editing

• Apollo is a web-based genome annotation editor, integrated with JBrowse

• Supports real time collaboration & generates analysis-ready data

USER-CREATED ANNOTATIONS

EVIDENCE TRACKS

ANNOTATOR PANEL

Page 24: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

BECOMING ACQUAINTED WITH APOLLO

General process of curation

1. Select or find a region of interest, e.g. scaffold.

2. Select appropriate evidence tracks to review the gene model.

3. Determine whether a feature in an existing evidence track will provide a reasonable gene model to start working.

4. If necessary, adjust the gene model.

5. Check your edited gene model for integrity and accuracy by comparing it with available homologs.

6. Comment and finish.

Page 25: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Apollo - version at i5K Workspace@NAL

4. Becoming Acquainted with Web Apollo.

The Sequence Selection Window

Page 26: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Sort

Apollo - version at i5K Workspace@NAL

“Old Track Select Page”

4. Becoming Acquainted with Web Apollo.

Page 27: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

APOLLO
annotation editing environment

BECOMING ACQUAINTED WITH APOLLO

Color by CDS frame, toggle strands, set color scheme and highlights.

– Upload evidence files (GFF3, BAM, BigWig)
– combination track
– sequence search track

Query the genome using BLAT.

Navigation and zoom.

Search for a gene model or a scaffold.

Get coordinates and “rubber band” selection for zooming.

Login

User-created annotations.

New annotator panel.

Evidence Tracks

Stage and cell-type specific transcription data.

http://genomearchitect.org/web_apollo_user_guide

Page 28: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

BECOMING ACQUAINTED WITH APOLLO

USER NAVIGATION

Annotator panel.

• Choose appropriate evidence from list of “Tracks” on annotator panel.

• Select & drag elements from evidence track into the ‘User-created Annotations’ area.

• Hovering over annotation in progress brings up an information pop-up.

• Creating a new annotation

Page 29: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Adding a gene model

Page 30: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Adding a gene model

Page 31: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Adding a gene model

Page 32: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Editing functionality

Page 33: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Editing functionality
Example: Adding an exon supported by experimental data

• RNAseq reads show evidence in support of a transcribed product that was not predicted.
• Add exon by dragging up one of the RNAseq reads.

Page 34: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Editing functionality
Example: Adjusting exon boundaries supported by experimental data

Page 35: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Curating with Apollo

Page 36: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

USER NAVIGATION

BECOMING ACQUAINTED WITH APOLLO

• ‘Zoom to base level’ reveals the DNA Track.

Page 37: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

USER NAVIGATION

BECOMING ACQUAINTED WITH APOLLO

• Color exons by CDS from the ‘View’ menu.

Page 38: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Zoom in/out with keyboard: shift + arrow keys up/down

USER NAVIGATION

BECOMING ACQUAINTED WITH APOLLO

• Toggle reference DNA sequence and translation frames in forward strand. Toggle models in either direction.

Page 39: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

annotating simple cases

Page 40: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

“Simple case”:
– the predicted gene model is correct or nearly correct, and
– this model is supported by evidence that completely or mostly agrees with the prediction.
– evidence that extends beyond the predicted model is assumed to be non-coding sequence.

The following are simple modifications.

ANNOTATING SIMPLE CASES

BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES

Page 41: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

• A confirmation box will warn you if the receiving transcript is not on the same strand as the feature where the new exon originated.

• Check ‘Start’ and ‘Stop’ signals after each edit.

ADDING EXONS

BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES

Page 42: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

If transcript alignment data are available & extend beyond your original annotation, you may extend or add UTRs.

1. Right click at the exon edge and ‘Zoom to base level’.

2. Place the cursor over the edge of the exon until it becomes a black arrow, then click and drag the edge of the exon to the new coordinate position that includes the UTR.

ADDING UTRs

To add a new spliced UTR to an existing annotation, also follow the procedure for adding an exon.

BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES

Page 43: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

To modify an exon boundary and match data in the evidence tracks: select both the [offending] exon and the feature with the expected boundary, then right click on the annotation to select ‘Set 3’ end’ or ‘Set 5’ end’ as appropriate.

In some cases all the data may disagree with the annotation, in other cases some data support the annotation and some of the data support one or more alternative transcripts. Try to annotate as many alternative transcripts as are well supported by the data.

MATCHING EXON BOUNDARY TO EVIDENCE

BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES

Page 44: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

1. Two exons from different tracks sharing the same start/end coordinates display a red bar to indicate matching edges.

2. Selecting the whole annotation or one exon at a time, use this edge-matching function and scroll along the length of the annotation, verifying exon boundaries against available data. Use square [ ] brackets to scroll from exon to exon. Use curly { } brackets to scroll from annotation to annotation.

3. Check if cDNA / RNAseq reads lack one or more of the annotated exons or include additional exons.
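The idea behind edge-matching can be sketched outside the browser. This is an illustration only and says nothing about how Apollo implements it: exons are assumed to be (start, end) tuples, and `matching_edges` is a hypothetical helper name.

```python
def matching_edges(annotation_exons: list, evidence_exons: list) -> list:
    """For each annotated exon, report which of its edges match any evidence exon.

    Returns tuples of (start, end, start_matches, end_matches).
    """
    evidence_starts = {start for start, _ in evidence_exons}
    evidence_ends = {end for _, end in evidence_exons}
    return [
        (start, end, start in evidence_starts, end in evidence_ends)
        for start, end in annotation_exons
    ]


if __name__ == "__main__":
    annotation = [(100, 200), (300, 400)]
    rnaseq = [(100, 200), (310, 400)]
    for start, end, start_ok, end_ok in matching_edges(annotation, rnaseq):
        print(start, end, "start matches" if start_ok else "start differs",
              "end matches" if end_ok else "end differs")
```

In this toy case the second exon's start disagrees with the evidence, which is the kind of boundary an annotator would then inspect at base level.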

CHECKING EXON INTEGRITY

BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES

Page 45: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Non-canonical splice sites flags.

Double click: selection of feature and sub-features

Evidence Tracks Area

‘User-created Annotations’ Track

Edge-matching

Apollo’s editing logic (brain):
– selects longest ORF as CDS
– flags non-canonical splice sites

ORFs AND SPLICE SITES

BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES

Page 46: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Non-canonical splices are indicated by an orange circle with a white exclamation point inside, placed over the edge of the offending exon.

Canonical splice sites:

forward strand:
5’-…exon]GT/AG[exon…-3’

reverse strand, not reverse-complemented:
3’-…exon]GA/TG[exon…-5’

SPLICE SITES

Zoom to review non-canonical splice site warnings. Although these may not always have to be corrected (e.g. GC donor), they should be flagged with a comment.
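For genes on the reverse strand, the signals only read as GT/AG after reverse-complementing the intron. The sketch below illustrates that step; the function names and the 0-based, end-exclusive forward-strand coordinates are assumptions of this example, not Apollo's code.

```python
# Complement table for unambiguous bases plus N (assumption: no other IUPAC codes).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def splice_signals(genome: str, intron_start: int, intron_end: int, strand: str) -> tuple:
    """Return (donor, acceptor) as read 5'->3' on the strand of the gene."""
    intron = genome[intron_start:intron_end].upper()
    if strand == "-":
        intron = reverse_complement(intron)
    return intron[:2], intron[-2:]


def needs_comment(donor: str, acceptor: str) -> bool:
    """Anything other than GT/AG should at least be flagged with a comment."""
    return (donor, acceptor) != ("GT", "AG")


if __name__ == "__main__":
    # On the forward strand this reverse-strand intron reads CT...AC.
    genome = "AAA" + "CTGAAAAC" + "TTT"
    print(splice_signals(genome, 3, 11, "-"))
    print(splice_signals(genome, 3, 11, "+"))
```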

Exon/intron splice site error warning

Curated model

BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES

Page 47: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Apollo calculates the longest possible open reading frame (ORF) that includes canonical ‘Start’ and ‘Stop’ signals within the predicted exons.

If ‘Start’ appears to be incorrect, modify it by selecting an in-frame ‘Start’ codon further up or downstream, depending on evidence (proteins, RNAseq).

It may be present outside the predicted gene model, within a region supported by another evidence track.

In very rare cases, the actual ‘Start’ codon may be non-canonical (non-ATG).
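To make the longest-ORF idea concrete, here is a minimal sketch over a spliced transcript sequence. It is not Apollo's algorithm: it assumes ATG starts and standard stop codons only, and the names `orfs`, `longest_orf` and `in_frame_starts` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orfs(transcript: str):
    """Yield (start, end) for every ATG...Stop ORF; 0-based, end-exclusive."""
    seq = transcript.upper()
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon until the first in-frame stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                yield (start, pos + 3)
                break


def longest_orf(transcript: str):
    """Return the longest ORF as (start, end), or None if there is none."""
    return max(orfs(transcript), key=lambda o: o[1] - o[0], default=None)


def in_frame_starts(transcript: str, orf: tuple) -> list:
    """List ATG positions inside the ORF that share its reading frame."""
    start, end = orf
    seq = transcript.upper()
    return [p for p in range(start, end - 3, 3) if seq[p:p + 3] == "ATG"]


if __name__ == "__main__":
    transcript = "CCATGAAAATGCCCTAAGG"
    best = longest_orf(transcript)
    print(best, in_frame_starts(transcript, best))
```

The list of in-frame starts corresponds to the candidates an annotator would weigh against protein or RNAseq evidence when the default ‘Start’ looks wrong.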

‘Start’ AND ‘Stop’ SITES

BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES

Page 48: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

annotating complex cases

Page 49: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Evidence may support joining two or more different gene models. Warning: protein alignments may have incorrect splice sites and lack non-conserved regions!

1. In ‘User-created Annotations’ area shift-click to select an intron from each gene model and right click to select the ‘Merge’ option from the menu.

2. Drag supporting evidence tracks over the candidate models to corroborate overlap, or review edge matching and coverage across models.

3. Check the resulting translation by querying a protein database e.g. UniProt, NCBI nr. Add comments to record that this annotation is the result of a merge.

Red lines around exons: ‘edge-matching’ allows annotators to confirm whether the evidence is in agreement without examining each exon at the base level.

COMPLEX CASES
merge two gene predictions on the same scaffold

BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES

Page 50: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

One or more splits may be recommended when:
– different segments of the predicted protein align to two or more different gene families
– predicted protein doesn’t align to known proteins over its entire length
– Transcript data may support a split, but first verify whether they are alternative transcripts.

COMPLEX CASES
split a gene prediction

BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES

Page 51: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

DNA Track

‘User-created Annotations’ Track

COMPLEX CASES
annotate frameshifts and correct single-base errors

Always remember: when annotating gene models using Apollo, you are looking at a ‘frozen’ version of the genome assembly and you will not be able to modify the assembly itself.

BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES

Page 52: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

COMPLEX CASES
correcting selenocysteine containing proteins

BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES

Page 53: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

COMPLEX CASES
correcting selenocysteine containing proteins

BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES

Page 54: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

1. Apollo allows annotators to make single base modifications or frameshifts that are reflected in the sequence and structure of any transcripts overlapping the modification. These manipulations do NOT change the underlying genomic sequence.

2. If you determine that you need to make one of these changes, zoom in to the nucleotide level and right click over a single nucleotide on the genomic sequence to access a menu that provides options for creating insertions, deletions or substitutions.

3. The ‘Create Genomic Insertion’ feature will require you to enter the necessary string of nucleotide residues that will be inserted to the right of the cursor’s current location. The ‘Create Genomic Deletion’ option will require you to enter the length of the deletion, starting with the nucleotide where the cursor is positioned. The ‘Create Genomic Substitution’ feature asks for the string of nucleotide residues that will replace the ones on the DNA track.

4. Once you have entered the modifications, Apollo will recalculate the corrected transcript and protein sequences, which will appear when you use the right-click menu ‘Get Sequence’ option. Since the underlying genomic sequence is reflected in all annotations that include the modified region, you should alert the curators of your organism’s database using the ‘Comments’ section to report the CDS edits.

5. In special cases such as selenocysteine-containing proteins (read-throughs), right-click over the offending/premature ‘Stop’ signal and choose the ‘Set readthrough stop codon’ option from the menu.
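The three alteration types described above can be mimicked on a copy of the sequence, leaving the "frozen" genome untouched. This is a sketch under stated assumptions: `apply_alteration` is a hypothetical helper, positions are 0-based, and "to the right of the cursor" is interpreted as "after the base at `pos`".

```python
def apply_alteration(seq: str, kind: str, pos: int, value) -> str:
    """Return a modified copy of seq; the input string itself is never changed.

    insertion:    value is the string inserted after the base at pos
    deletion:     value is the number of bases removed, starting at pos
    substitution: value is the string that replaces bases starting at pos
    """
    if kind == "insertion":
        return seq[:pos + 1] + value + seq[pos + 1:]
    if kind == "deletion":
        return seq[:pos] + seq[pos + value:]
    if kind == "substitution":
        return seq[:pos] + value + seq[pos + len(value):]
    raise ValueError(f"unknown alteration type: {kind}")


if __name__ == "__main__":
    genome = "ATGAAATGA"
    print(apply_alteration(genome, "insertion", 2, "C"))
    print(apply_alteration(genome, "deletion", 3, 1))
    print(apply_alteration(genome, "substitution", 3, "C"))
    print(genome)  # the reference sequence is unchanged
```

Insertions and deletions whose length is not a multiple of 3 shift the reading frame, which is why they are used to annotate frameshifts.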

COMPLEX CASES
annotating frameshifts and correcting single-base errors & selenocysteines

BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES

Page 55: Introduction to Apollo - i5k Research Community – Calanoida (copepod)


USER NAVIGATION

BECOMING ACQUAINTED WITH APOLLO

• Information Editor

Page 56: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

The Annotation Information Editor

USER NAVIGATION

BECOMING ACQUAINTED WITH APOLLO

Page 57: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

The Annotation Information Editor

• Add PubMed IDs
• Include GO terms as appropriate from any of the three ontologies
• Write comments stating how you have validated each model.

USER NAVIGATION

BECOMING ACQUAINTED WITH APOLLO

Page 58: Introduction to Apollo - i5k Research Community – Calanoida (copepod)


USER NAVIGATION

BECOMING ACQUAINTED WITH APOLLO

• Keeping track of each edit

Page 59: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Annotations, annotation edits, and History: stored in a centralized database.

USER NAVIGATION

BECOMING ACQUAINTED WITH APOLLO

Page 60: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Follow the checklist until you are happy with the annotation!

And remember to…
– comment to validate your annotation, even if you made no changes to an existing model. Think of comments as your vote of confidence.
– or add a comment to inform the community of unresolved issues you think this model may have.

Always Remember: Apollo curation is a community effort so please use comments to communicate the reasons for your annotation. Your comments will be visible to everyone.

COMPLETING THE ANNOTATION

BECOMING ACQUAINTED WITH APOLLO

Page 61: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Checklist

Page 62: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

• Check ‘Start’ and ‘Stop’ sites.

• Check splice sites: most splice sites display these residues …]5’-GT/AG-3’[…

• Check if you can annotate UTRs, for example using RNA-Seq data:
– align it against relevant genes/gene family
– blastp against NCBI’s RefSeq or nr

• Check for gaps in the genome.

• Additional functionality may be necessary:
– merging 2 gene predictions - same scaffold
– ‘merging’ 2 gene predictions - different scaffolds
– splitting a gene prediction
– annotating frameshifts
– annotating selenocysteines, correcting single-base and other assembly errors, etc.

• Add:
– Important project information in the form of comments
– IDs from public databases e.g. GenBank (via DBXRef), gene symbol(s), common name(s), synonyms, top BLAST hits, orthologs with species names, and everything else you can think of, because you are the expert.
– Comments about the kinds of changes you made to the gene model of interest, if any.
– Any appropriate functional assignments, e.g. via BLAST, RNA-Seq data, literature searches, etc.

CHECKLIST
for accuracy and integrity

MANUAL ANNOTATION CHECKLIST

Page 63: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Genome curation with i5k

Page 64: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

i5K Workspace@NAL

The collaborative curation process at i5k

1. A computationally predicted consensus gene set has been generated using multiple lines of evidence; e.g. HVIT_v0.5.3-Models

2. i5K Projects will integrate consensus computational predictions with manual annotations to produce an updated Official Gene Set (OGS):

Warning!
• If it’s not on either track, it won’t make the OGS!
• If it’s there and it shouldn’t, it will still make the OGS!

Page 65: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

The ‘Replace Models’ rules

BECOMING ACQUAINTED WITH APOLLO http://tinyurl.com/apollo-i5k-replace

Page 66: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

i5K Workspace@NAL

3. In some cases algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation. Use your judgment, try choosing a different model to begin the annotation.

4. Isoforms: drag original and alternatively spliced form to ‘User-created Annotations’ area.

5. If an annotation needs to be removed from the consensus set, drag it to the ‘User-created Annotations’ area and label as ‘Delete’ on the Information Editor.

6. Overlapping interests? Collaborate to reach agreement.

7. Follow guidelines for i5K Pilot Species Projects, at http://goo.gl/LRu1VY

The collaborative curation process at i5k

Page 67: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Example

Page 68: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

What’s new?... finding inspiration in PubMed.

Example 68

“Molecular analysis of bed bug populations from across the USA and Europe found that >80% and >95% of the respective populations contained V419L and/or L925I mutations in the voltage-gated sodium channel gene, indicating widespread distribution of target-site-based pyrethroid resistance.”

Homalodisca vitripennis | Alexander Wild | www.alexanderwild.com

Halyomorpha halys | Fondazione Edmund Mach - Italy

Now for our species of interest. . .

Page 69: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Example

Example 69

Curation example using the Hyalella azteca genome (amphipod crustacean).

Page 70: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

What do we know about this genome?

• Currently publicly available data at NCBI:
– >37,000 nucleotide seqs → scaffolds, mitochondrial genes
– 344 amino acid seqs → mitochondrion
– 47 ESTs
– 0 conserved domains identified
– 0 “gene” entries submitted

• Data at i5K Workspace@NAL (annotation hosted at USDA)
– 10,832 scaffolds: 23,288 transcripts: 12,906 proteins

Example 70

Page 71: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

PubMed Search: what’s new?

Example 71

Page 72: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

PubMed Search: what’s new?

Example 72

“Ten populations differed by at least 550-fold in sensitivity to pyrethroids.”

“Sequencing the primary pyrethroid target site, the voltage-gated sodium channel (vgsc), shows that point mutations and their spread in natural populations were responsible for differences in pyrethroid sensitivity.”

“The finding that a non-target aquatic species has acquired resistance to pesticides used only on terrestrial pests is troubling evidence of the impact of chronic pesticide transport from land-based applications into aquatic systems.”

Page 73: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

How many sequences are there, publicly available, for our gene of interest?

Example 73

• Para (voltage-gated sodium channel alpha subunit; Nasonia vitripennis).

• NaCP60E (Sodium channel protein 60E; D. melanogaster).
– MF: voltage-gated cation channel activity (IDA, GO:0022843).
– BP: olfactory behavior (IMP, GO:0042048), sodium ion transmembrane transport (ISS, GO:0035725).
– CC: voltage-gated sodium channel complex (IEA, GO:0001518).

And what do we know about them?

Page 74: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Retrieving sequences for a sequence similarity search.

Example 74

>vgsc-Segment3-DomainII
RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDGQMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
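Before pasting a query into BLAT or BLAST it can help to confirm that it is well-formed FASTA. The sketch below is a minimal parser written for this example (`parse_fasta` is a hypothetical name); it assumes the header of the query above ends at "DomainII" and that the rest is protein sequence.

```python
# Assumption: the 20 standard amino acids only.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {header: sequence}; sequence lines are joined."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {header: "".join(parts) for header, parts in records.items()}


QUERY = """>vgsc-Segment3-DomainII
RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDGQMPRW
NFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
"""

if __name__ == "__main__":
    for header, seq in parse_fasta(QUERY).items():
        is_protein = set(seq) <= AMINO_ACIDS
        print(header, len(seq), "protein" if is_protein else "check residues")
```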

Page 75: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

BLAT search input

Example 75

>vgsc-Segment3-DomainII
RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDGQMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR

Page 76: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

BLAT search results

Example 76

• High-scoring segment pairs (hsp) are listed in tabulated format.

• Clicking on one line of results sends you to those coordinates.

Page 77: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

BLAST at i5K https://i5k.nal.usda.gov/blast

Example 77

>vgsc-Segment3-DomainII
RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDGQMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR

Page 78: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

BLAST at i5K https://i5k.nal.usda.gov/blast

Example 78

Page 79: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

BLAST at i5K: hsps in “BLAST+ Results” track

Example 79

Page 80: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Creating a new gene model: drag and drop

Example 80

• Apollo automatically calculates longest ORF.

• In this case, ORF includes the high-scoring segment pairs (hsp), marked here in blue.

• Note that gene is transcribed from reverse strand.

Page 81: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Available Tracks

Example 81

Page 82: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Get Sequence

Example 82

http://blast.ncbi.nlm.nih.gov/Blast.cgi

Page 83: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Also, flanking sequences (other gene models) vs. NCBI nr

Example 83

In this case, two gene models upstream, at 5’ end.

BLAST hsps

Page 84: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Review alignments

Example 84

HaztTmpM006234

HaztTmpM006233

HaztTmpM006232

Page 85: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Hypothesis for vgsc gene model

Example 85

Page 86: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Editing: merge the three models

Example 86

Merge by dropping an exon or gene model onto another.

or…

Merge by selecting two exons (holding down “Shift”) and using the right click menu.

Page 87: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Result of merging the gene models:

Example 87

Page 88: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Editing: correct offending splice site

Example 88

Modify exon/intron boundary:
– Drag the end of the exon to the nearest canonical splice site.
or
– Use right-click menu.

Page 89: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Editing: set translation start

Example 89

Page 90: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Editing: delete exon not supported by evidence

Example 90

Delete first exon from HaztTmpM006233

Page 91: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Editing: add an exon supported by RNAseq

Example 91

• RNAseq reads show evidence in support of transcribed product, which was not predicted.
• Add exon at coordinates 97946-98012 by dragging up one of the RNAseq reads.

Page 92: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Editing: adjust offending splice site using evidence

Example 92

Page 93: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Editing: adjust other boundaries supported by evidence

Example 93

Page 94: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Finished model

Example 94

Corroborate integrity and accuracy of the model:
– Start and Stop
– Exon structure and splice sites …]5’-GT/AG-3’[…
– Check the predicted protein product vs. NCBI nr, UniProt, etc.

Page 95: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Information Editor

• DBXRefs: e.g. NP_001128389.1, N. vitripennis, RefSeq

• PubMed identifier: PMID:24065824

• Gene Ontology IDs: GO:0022843, GO:0042048, GO:0035725, GO:0001518.

• Comments

• Name, Symbol

• Approve / Delete radio button

Example 95

Comments (if applicable)

Page 96: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Go play!

Page 97: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

PUBLIC DEMO

APOLLO ON THE WEB
instructions

At i5K

1. Register for access to Apollo at the i5K Workspace@NAL at https://i5k.nal.usda.gov/web-apollo-registration

2. Contact the coordinator for each species community to receive more information about how to contribute. Contact info is available on each organism’s page.

Page 98: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

PUBLIC DEMO

APOLLO ON THE WEB
instructions

Public Honeybee demo available at:

http://GenomeArchitect.org/WebApolloDemo

Username: [email protected]

Password: demo

Page 99: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

APOLLO
demonstration

PUBLIC DEMO

Demonstration video is available at https://youtu.be/VgPtAP_fvxY

Page 100: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

OUTLINE

• BIO-REFRESHER: biological concepts for curation

• ANNOTATION: automatic predictions

• MANUAL ANNOTATION: necessary, collaborative

• APOLLO: advancing collaborative curation

• EXAMPLE: demos

Page 101: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Apollo Development

Nathan Dunn, Technical Lead

Eric Yao

Christine Elsik’s Lab, University of Missouri

Suzi Lewis, Principal Investigator

BBOP

Moni Munoz-Torres, Project Manager

Deepak Unni

JBrowse. Ian Holmes’ Lab University of California, Berkeley

Page 102: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

• Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Apollo and Gene Ontology teams. Suzanna E. Lewis (PI).

• § Christine G. Elsik (PI). University of Missouri.

• * Ian Holmes (PI). University of California Berkeley.

• Arthropod genomics community & i5K Steering Committee.

• Stephen Ficklin, GenSAS, Washington State University

• Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI. Also supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231

• For your attention, thank you!

Apollo

Nathan Dunn

Deepak Unni §

Gene Ontology

Chris Mungall

Seth Carbon

Heiko Dietze

BBOP

Learn more about Apollo at http://GenomeArchitect.org

Thank you!

NAL at USDA

Monica Poelchau

Mei-Ju Chen

Christopher Childers

Gary Moore

HGSC at BCM

fringy Richards

Kim Worley

JBrowse Eric Yao *

Page 104: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Interface Updates

Annotator Panel

Page 105: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Interface Updates

gene

mRNA

Page 106: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Update: Transforming coordinates

Bringing exons closer together to facilitate annotation of gene models with long introns.

1,275bp

Concept for Apollo v2.1 – Northern Spring 2016
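One way to picture "bringing exons closer together" is a view in which every long intron is drawn at a fixed maximum width. The sketch below illustrates that idea only; it is not how Apollo implements the feature, and `fold_introns` and `max_intron_width` are hypothetical names.

```python
def fold_introns(exons: list, max_intron_width: int = 50) -> list:
    """Map sorted, non-overlapping exons (start, end) into a compressed view.

    Exon lengths are preserved; each intron is displayed with at most
    max_intron_width positions. The view starts at 0 with the first exon.
    """
    folded = []
    cursor = 0
    previous_end = None
    for start, end in exons:
        if previous_end is not None:
            # Shrink the intron to the display width if it is longer.
            cursor += min(start - previous_end, max_intron_width)
        folded.append((cursor, cursor + (end - start)))
        cursor += end - start
        previous_end = end
    return folded


if __name__ == "__main__":
    exons = [(100, 200), (5000, 5100), (5120, 5200)]
    print(fold_introns(exons))
```

Short introns keep their true length, so only the long gaps that push exons off screen are compressed.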

Page 107: Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Transforming coordinates

Assembly artifacts may cause gene models to be split across two or more scaffolds. To facilitate annotation, Apollo allows the generation of an artificial space where the annotation can be completed.

Scaffold 2

Scaffold 1

Genome Assembly

. . . . . .

Scaffold n
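The "artificial space" can be thought of as scaffolds laid end to end, with each scaffold given an offset. A minimal sketch under that assumption follows; the function names, the scaffold names and the optional padding are illustrative and not taken from Apollo.

```python
def scaffold_offsets(scaffold_lengths: dict, order: list, padding: int = 0) -> dict:
    """Return the start offset of each scaffold in the concatenated space."""
    offsets = {}
    cursor = 0
    for name in order:
        offsets[name] = cursor
        cursor += scaffold_lengths[name] + padding
    return offsets


def to_artificial(offsets: dict, scaffold: str, position: int) -> int:
    """Convert a position on one scaffold into the artificial coordinate space."""
    return offsets[scaffold] + position


if __name__ == "__main__":
    lengths = {"Scaffold1": 1000, "Scaffold2": 500}
    offsets = scaffold_offsets(lengths, ["Scaffold1", "Scaffold2"])
    # An exon starting at base 10 of Scaffold2 lands after all of Scaffold1.
    print(to_artificial(offsets, "Scaffold2", 10))
```

With exons from both scaffolds expressed in one coordinate system, a single gene model can span the scaffold boundary.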