International Tomato Finishing Workshop
Wellcome Trust Sanger Institute, April 2007
Wellcome Trust Medical Photographic Library
Overview
WTSI Finishing Pipeline
WTSI Finishing Strategy
Tomato Genome Finishing Standards
Document on SGN
Contiguous Finished Sequence to HTGS Phase 3
All bases above phred 30
Base Error rate <1:10,000
Discussion on Day 2
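As a rough illustration of what the quality standards above mean numerically, the sketch below converts a phred score to an estimated per-base error probability using the standard phred definition, and lists bases that fall below a threshold. The function names are hypothetical, and treating a score of exactly 30 as a pass is an assumption, not something the standards document states here.

```python
def phred_to_error_probability(q: float) -> float:
    """Standard phred definition: P(error) = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def bases_below_threshold(qualities, threshold=30):
    """Return 0-based positions whose phred score is below the threshold.

    Assumption: a base scoring exactly `threshold` is accepted.
    """
    return [i for i, q in enumerate(qualities) if q < threshold]


if __name__ == "__main__":
    # Phred 30 is about 1 error in 1,000 bases; phred 40 is about 1 in 10,000.
    for q in (30, 40):
        print(q, phred_to_error_probability(q))
    print(bases_below_threshold([45, 29, 30, 12]))
```

A list of low-scoring positions like this would only be a starting point for choosing finishing reactions; the consensus qualities themselves come from the assembly software.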
WTSI Finishing Pipeline
Shotgun Sequencing
Auto-Prefinishing
Manual Finishing
QC
Final EMBL Submission (HTGS 3)
Established Clone Pipeline
Finishing Software
Main software tools used in the WTSI finishing process:
Sequence data viewed in Gap4 databases (Staden) (assemblies created using phrap)
Read pair viewer – Orchid (Flowers)
Restriction Digest Viewer - Confirm (Attwood - WTSI)
Sequence plot viewer – Dotter (Sonnhammer)
(URLs on Handout)
Used throughout the finishing process and for final confirmation of assembly (and QC)
WTSI Finishing Strategy
BAC Confirmation
Identify Region to be Finished
Contig Order and Orientation
Assessment of Gap Sizes and Type
Improvement of Low Quality Sequence
Confirmation of Contiguous Assembly
Selection of Finishing Reactions
Finishing Strategy – Getting Started
BAC confirmation
Checking for overlapping BACs
Identifying Region to be Finished
Confirming BAC Clone Ends
Prevents overlaps being finished twice
Confirmation of clone placement on map
Resources available for BAC confirmation
Identifying region to be finished in your BAC: depends on whether overlapping BACs are available
Overlapping BAC available / No overlaps available
Confirm BAC Ends (BES)
Confirm status of overlap (shotgun, in finishing, finished)
Confirm region to be finished (who is finishing overlap)
Finish 2Kb into overlap if already finished or being finished by someone else
Confirm ends of your BAC and overlapping BAC (BES or sequence if available)
Finish whole BAC insert if overlap is not being finished by someone else
Resources available for BAC confirmation
SOL Genomics Network (SGN)
• BES
• Marker verification
• BLAST – repeats, unigenes, ESTs, markers, overlaps
Aligning BES to BAC sequence data
BACs are clipped to the cloning vector cutsite, dependent on the library used:
SL_MboI Library = GATC
LE_HBa Library = AAGCTT
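A minimal sketch of the two steps above, assuming plain exact string matching rather than a real aligner: clip a raw read at the first occurrence of the library's cutsite, then place a BAC end sequence (BES) on the BAC consensus on either strand. The cutsites are the ones listed above; the function names and the exact-match approach are illustrative assumptions.

```python
# Cutsites as listed on the slide for the two tomato BAC libraries.
CUTSITES = {"SL_MboI": "GATC", "LE_HBa": "AAGCTT"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def clip_to_cutsite(read: str, library: str):
    """Return the read from the first cutsite onward, or None if no cutsite is found."""
    pos = read.upper().find(CUTSITES[library])
    return None if pos < 0 else read[pos:]


def locate_bes(bac_seq: str, bes: str):
    """Exact-match placement of a BES on the BAC: (0-based start, strand) or None.

    Assumption: the BES matches without mismatches; real data would need
    an alignment tool that tolerates sequencing errors.
    """
    bac, query = bac_seq.upper(), bes.upper()
    pos = bac.find(query)
    if pos >= 0:
        return pos, "+"
    pos = bac.find(reverse_complement(query))
    if pos >= 0:
        return pos, "-"
    return None


if __name__ == "__main__":
    bac = "AAGCTTCCGGATTACAGGT"
    print(clip_to_cutsite("TTTTAAGCTTCCGG", "LE_HBa"))
    print(locate_bes(bac, "CCGGATT"))
```

A BES that places at one end of the consensus on the expected strand supports the clone end; a hit in the middle of a contig would be worth investigating.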
Sequence Resources: Searching for Finished Overlaps
Match to self (bTH198L24)
Match to left overlap (bTH119A16)
Match to right overlap (bTH27G19)
BLAST
Finishing 2Kb into existing finished overlapping BACs
Overlapping sequence ends at 48435.
Finished region will begin at 46435 to give a 2Kb overlap.
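The coordinate arithmetic in this example can be written down directly. The helper below is a hypothetical sketch that assumes the overlap runs up to a known end coordinate in your own BAC and that a fixed 2000 base overlap is wanted.

```python
def finished_region_start(overlap_end: int, overlap_length: int = 2000) -> int:
    """Coordinate at which finishing should begin to keep `overlap_length` bases of overlap."""
    start = overlap_end - overlap_length
    if start < 1:
        # The overlap is shorter than the requested length.
        raise ValueError("overlap is shorter than the requested overlap length")
    return start


if __name__ == "__main__":
    # Values from the slide: overlap ends at 48435.
    print(finished_region_start(48435))
```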
Making use of available overlaps
Finished consensus spans 2 gaps reducing number of contigs to finish
Sequence Verification: Searching for Expected Markers
Marker size would be 1284bp
Matches predicted product size for S. lycopersicum on SGN
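One way to check a marker against the assembled sequence is a simple in-silico PCR: find the forward primer and the reverse complement of the reverse primer, then measure the distance between them. The sketch below assumes exact primer matches and uses made-up toy sequences; the primers for the 1284bp marker in the example are not given in these slides.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def predicted_product_size(template: str, forward: str, reverse: str):
    """Length of the product between two exactly matching primers, or None.

    Only the first forward-primer site and the first downstream
    reverse-primer site on the given strand are considered.
    """
    t = template.upper()
    fwd = forward.upper()
    rev_site = reverse_complement(reverse.upper())
    start = t.find(fwd)
    if start < 0:
        return None
    end = t.find(rev_site, start + len(fwd))
    if end < 0:
        return None
    return end + len(rev_site) - start


if __name__ == "__main__":
    # Toy template: 6-base primer, 10-base spacer, 6-base primer site.
    template = "TTTACGTAC" + "G" * 10 + "GAATCC" + "AAA"
    print(predicted_product_size(template, "ACGTAC", "GGATTC"))
```

The returned size can then be compared with the product size predicted for the marker on SGN.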
Finishing Strategy
BAC Confirmation
Identify Region to be Finished
Contig Order and Orientation
Use read pair information to order and orientate contigs
Plasmid inserts typically 4-6Kb
Both strands sequenced to give read pairs
Contig Order and Orientation
[Figure: assembled sequence contigs between the left and right clone ends, separated by a sequence gap with a good read pair link across it. Forward and reverse strand reads of the inserted sequence (BAC) in the double stranded sequencing vector (pUC) form read pairs 4-6Kb apart in the assembly.]
Look at read pair information across gaps to order contigs
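The idea of using read pairs to link contigs, and to spot pairs that sit at the wrong distance, can be sketched as below. The data layout (a dictionary of read name to contig and start position) and the function names are assumptions for illustration; Gap4 and Orchid hold this information in their own formats, and this sketch does not model read orientation.

```python
from collections import Counter


def contig_links(placements, pairs):
    """Count read pairs whose two reads were assembled into different contigs."""
    links = Counter()
    for fwd, rev in pairs:
        if fwd in placements and rev in placements:
            contig_a, contig_b = placements[fwd][0], placements[rev][0]
            if contig_a != contig_b:
                links[tuple(sorted((contig_a, contig_b)))] += 1
    return links


def suspicious_pairs(placements, pairs, min_insert=4000, max_insert=6000):
    """Pairs within one contig whose separation is outside the expected insert size.

    Separation is approximated by the distance between read start positions.
    """
    flagged = []
    for fwd, rev in pairs:
        if fwd in placements and rev in placements:
            (contig_a, pos_a), (contig_b, pos_b) = placements[fwd], placements[rev]
            distance = abs(pos_b - pos_a)
            if contig_a == contig_b and not (min_insert <= distance <= max_insert):
                flagged.append((fwd, rev, distance))
    return flagged


if __name__ == "__main__":
    placements = {
        "a.f": ("contig1", 100), "a.r": ("contig2", 200),
        "b.f": ("contig1", 1000), "b.r": ("contig1", 6000),
        "c.f": ("contig1", 500), "c.r": ("contig1", 20000),
    }
    pairs = [("a.f", "a.r"), ("b.f", "b.r"), ("c.f", "c.r")]
    print(contig_links(placements, pairs))
    print(suspicious_pairs(placements, pairs))
```

Several independent links between the same two contigs are stronger evidence of adjacency than a single pair, and pairs far outside the 4-6Kb range may point to a mis-assembly.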
Shotgun Sequencing
Contig Order and Orientation
[Figure: assembled sequence contigs and sequence gaps placed between the left and right clone ends.]
Using Read Pair information to find Assembly Problems
Finishing Strategy
BAC Confirmation
Identify Region to be Finished
Contig Order and Orientation
Assessment of Gap Sizes and Type
Use all available information
Type of gap and size determines finishing approach
Assess Sequence Restriction Digest Data
Restriction Digest Data
• Used in confirmation of finished contiguous assembly
• Also used throughout finishing process
• Sizing of gaps within BACs
– use appropriate finishing strategy
• Identifying assembly problems
– caused by repeats
• Sizing of repeats
– confirming size of assembly of tandem repeats
– sizing force joins made in repeats for tagging purposes
Restriction Digests
• Minimum of three restriction enzymes used to confirm the assembly
• Selection depends on organism and the nature of the sequence
• S. lycopersicum BACs are digested with
– BamHI
– EcoRI
– HindIII
• Comparison of real and virtual digest of entire BAC sequence
Gap4 - Restriction Digest Viewer
Compare fragment lengths from the virtual digest in Gap4 to the actual fragment sizes on the gel produced in the lab
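The comparison of a virtual digest with gel fragment sizes can be sketched as follows. This is not how the Gap4 viewer or Confirm work internally; it is a minimal illustration that treats the sequence as linear, ignores the vector, and uses a hypothetical 5% size tolerance. The recognition sites and cut offsets for BamHI, EcoRI and HindIII are the standard ones.

```python
# Recognition site and cut offset within the site (all three cut after the first base).
ENZYMES = {
    "BamHI": ("GGATCC", 1),
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
}


def virtual_digest(seq: str, enzyme: str):
    """Fragment lengths, largest first, from cutting a linear sequence."""
    site, offset = ENZYMES[enzyme]
    s = seq.upper()
    cuts = []
    pos = s.find(site)
    while pos >= 0:
        cuts.append(pos + offset)
        pos = s.find(site, pos + 1)
    bounds = [0] + cuts + [len(s)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


def compare_digests(virtual, observed, tolerance=0.05):
    """Greedy matching of fragment sizes.

    Returns (virtual fragments with no gel band, gel bands with no virtual fragment).
    """
    remaining = list(observed)
    missing = []
    for size in virtual:
        match = next((o for o in remaining if abs(o - size) <= tolerance * size), None)
        if match is None:
            missing.append(size)
        else:
            remaining.remove(match)
    return missing, remaining


if __name__ == "__main__":
    seq = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 5
    fragments = virtual_digest(seq, "EcoRI")
    print(fragments)
    print(compare_digests(fragments, [26, 10]))
```

Virtual fragments with no matching band, or bands with no matching fragment, are the discrepancies a finisher would go on to investigate.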
Using Restriction Digest Data to check for Assembly Problems
• Identifying assembly problems from digests
– mis-assemblies caused by repeats
– direct repeats
– inverted repeats
• All digests showing similar amount of missing data or extra data at a particular position
– Possible repeat with incorrect copy number represented
• Certain digests show too much data, others have missing cutsites or data missing
– Possible inverted repeat in wrong orientation
– Possible E. coli transposon insertion
Assessment of Sequence - Dotter
Sequence plot of BAC used throughout finishing process
– Check for repeat sequences at gaps
– Highlight any potential areas of mis-assembly
Also used to confirm sequence overlaps
– Confirm unique sequence
– Not false repeat matches
Used as final assembly check
– Repeats
– Cross reference sizes with restriction digests
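To show what a sequence plot picks up, here is a minimal word-match dot plot. Dotter itself computes a scored, greyscale plot; this sketch only reports exact k-mer matches on the forward strand, and the word size is an arbitrary assumption. Off-diagonal matches in a self-comparison indicate repeated sequence.

```python
from collections import defaultdict


def kmer_matches(seq_a: str, seq_b: str, k: int = 8):
    """Return (i, j) pairs where the k-mer at seq_a[i] equals the k-mer at seq_b[j]."""
    b = seq_b.upper()
    index = defaultdict(list)
    for j in range(len(b) - k + 1):
        index[b[j:j + k]].append(j)
    a = seq_a.upper()
    return [(i, j) for i in range(len(a) - k + 1) for j in index.get(a[i:i + k], [])]


def off_diagonal(matches):
    """Keep only matches away from the main diagonal of a self-comparison."""
    return [(i, j) for i, j in matches if i != j]


if __name__ == "__main__":
    unit = "ACGTTGCA"
    seq = unit + "TTTCC" + unit  # the same 8-base unit appears twice
    print(off_diagonal(kmer_matches(seq, seq, k=8)))
```

The distance between a matched pair of positions gives a repeat spacing that can be cross-referenced with the restriction digest sizes.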
WTSI Finishing Strategy
BAC Confirmation
Identify Region to be Finished
Contig Order and Orientation
Assessment of Gap Sizes and Type
Improvement of Low Quality Sequence
Selection of Finishing Reactions
Options for Gap Closure and Improving Sequence Quality
Primer walking on subclones across region or gap
Resequencing of subclones across region if appropriate read length, using alternative chemistries if possible
Sequence any unpaired reads which may fall in low quality region or in gap
Direct clone walks
Depending on length of region or gap and associated sequence (repeat, structural problems)
Manual Editing
PCR SIL or TIL
Comment Tag for EMBL submission
Gap Closure in BACs – Gap Types
Spanned Gap: Re-sequencing (read pairs); Oligo walks; Small Insert Libraries, Transposon Libraries
Un-spanned Gap: Restriction Fragment Library; Direct clone walks; PCR
Repeats: Alternative Library Sizes
Primer Walking into Spanned Gaps
[Figure: Primer 1 and Primer 2 sit at the contig ends on either side of a sequence gap that has good read pair links across it. Primer extended reads on the original shotgun templates extend the assembled sequence contigs until the gap is closed.]
Small Insert Library (SIL)
Spanning shotgun template: 4-6Kb insert
SIL templates: average 300-500bp insert
Assembled SequenceContigs
Spanning subclone is shattered into smaller fragments to create a SIL. Smaller insert sizes can break up structural problems.
Transposon Insertion Library (TIL)
Double stranded sequencing vector (pUC) carrying inserted sequence (BAC)
Normal sequencing from either end of insert: read pairs ~4-6Kb apart
Transposon randomly inserts across entire plasmid
Sequence outwards from transposon insertion site
TIL read pairs overlap by 9bp duplication site
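Because the two reads leaving a transposon insertion site share the 9bp duplicated target site, they can in principle be joined across it. The sketch below assumes both reads have already been oriented onto the same strand and that the duplication is read without errors; the function name is hypothetical, and in practice the assembler performs this join.

```python
def merge_til_pair(left: str, right: str, duplication: int = 9) -> str:
    """Join two same-strand reads that overlap by the target site duplication.

    `left` must end with the same `duplication` bases that `right` starts with.
    """
    if len(left) < duplication or len(right) < duplication:
        raise ValueError("reads are shorter than the duplication length")
    if left[-duplication:].upper() != right[:duplication].upper():
        raise ValueError("reads do not share the expected duplicated site")
    return left + right[duplication:]


if __name__ == "__main__":
    site = "ACGTACGTA"  # toy 9bp duplicated site
    print(merge_til_pair("TTTT" + site, site + "GGGG"))
```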
Unspanned Gaps and gaps unresolved by walking on spanning subclones
Resequence any unpaired reads that face into gap
Partner may fall in gap, reducing gap size, or may fall within other contig and span the gap.
Assembled Sequence Contigs
Primer sequence needs to be unique
Unspanned Gaps and gaps unresolved by walking on spanning subclones
No unpaired reads.
Design oligo primers from each contig end to read into gap.
Use for walking directly on BAC (clone/stock) DNA and PCR.
Try to find unique sequence within BAC for oligo selection.
Primer 1 Primer 2
Sequence search facility in Gap4
Assembled Sequence Contigs
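Checking that a candidate primer is unique within the BAC can be sketched as a count of exact matches on both strands. This is only an illustration of the idea behind a sequence search: it is not the Gap4 search itself, the function names are hypothetical, and it ignores near matches that could still prime in a real reaction.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_occurrences(sequence: str, primer: str) -> int:
    """Count exact primer sites on both strands, including overlapping sites."""
    s = sequence.upper()
    p = primer.upper()
    total = 0
    # A set avoids double counting when the primer is its own reverse complement.
    for query in {p, reverse_complement(p)}:
        pos = s.find(query)
        while pos >= 0:
            total += 1
            pos = s.find(query, pos + 1)
    return total


def is_unique(sequence: str, primer: str) -> bool:
    """True when the primer has exactly one exact site in the sequence."""
    return count_occurrences(sequence, primer) == 1


if __name__ == "__main__":
    bac = "ACGTTGCA" + "TTTCC" + "ACGTTGCA"
    print(is_unique(bac, "ACGTTGCA"), is_unique(bac, "TTTCC"))
```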
Direct Clone Walks
Depending on gap size (from restriction digest data), the direct clone walks may close the gap.
Alternatively they may extend into the gap, allowing further primers to be designed on the newly recovered sequence.
Primer 1 Primer 2
Primer 3 Primer 4
Assembled Sequence Contigs
PCR
The same principle applies to PCR.
Design unique primers from each contig end to obtain a product that can be sequenced and extended with further primer walking.
If confirmed to span the gap, a PCR product may be shattered into a SIL, but it may skip repetitive sequence.
Primer 1 Primer 2
Assembled Sequence Contigs
SIL from Restriction Fragment
Alternatively a restriction fragment known to contain the missing data can be isolated from the digest gel and be made into a SIL. The fragment of interest must be distinct from other fragments on the gel and be a suitable size.
Assembled Sequence Contigs
Cutsite1
Cutsite2
Shatter this fragment of digested BAC DNA
Sequence gap known to be within this fragment
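Choosing a fragment for this approach involves two checks: which fragment contains the gap, and whether that fragment's size is distinct from the other fragments on the gel. The sketch below assumes cut positions on an estimated coordinate system for the BAC (the true length of the gap is not known) and a hypothetical 10% size separation; what counts as a suitable size is left to the finisher.

```python
def fragments_from_cuts(cuts, length):
    """Turn cut positions on a linear sequence into (start, end) fragments."""
    bounds = [0] + sorted(cuts) + [length]
    return list(zip(bounds, bounds[1:]))


def fragment_containing(cuts, length, position):
    """Return the (start, end) fragment that contains `position`, or None."""
    for start, end in fragments_from_cuts(cuts, length):
        if start <= position < end:
            return start, end
    return None


def is_distinct(fragment, cuts, length, min_separation=0.1):
    """True if no other fragment is within `min_separation` (a fraction) of this size."""
    size = fragment[1] - fragment[0]
    others = [e - s for s, e in fragments_from_cuts(cuts, length) if (s, e) != fragment]
    return all(abs(other - size) > min_separation * size for other in others)


if __name__ == "__main__":
    cuts, length = [1000, 4000, 4900], 10000
    fragment = fragment_containing(cuts, length, 2500)
    print(fragment, is_distinct(fragment, cuts, length))
```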
Gaps and Assembly Problems Caused by Repeats
Varying complexity of repeats depending on:
Repeat Unit
Size of Unit
Copy Number
Direct or Inverted Copies
How Conserved?
Importance of visualising repeat sequence to assess repeat type
Lower complexity, e.g. di-nucleotide runs
Higher complexity, e.g. LTRs
Alter phrap parameters for more stringent assembly
Alternative library sizes if necessary
Discussion point for Tuesday
Improving Sequence Quality - Summary
Primer walking on subclones across region
Resequencing of subclones across region if appropriate read length, using alternative chemistries if possible
Sequence any unpaired reads which may fall in region
Direct clone walks
Depending on length of poor quality region and associated sequence (repeat, structural problems)
Manual Editing
PCR SIL or TIL
Comment Tag for EMBL Submission
WTSI Finishing Strategy
BAC Confirmation
Identify Region to be Finished
Contig Order and Orientation
Assessment of Gap Sizes and Type
Improvement of Low Quality Sequence
Confirmation of Contiguous Assembly
Selection of Finishing Reactions
Confirmation of contiguous sequence
Contiguous Sequence Generated
No Quality Issues Remain
All Assembly checks completed
Identify any regions to be tagged
Restriction Digests
Read pair coverage
Dotplot
QC check
Final Submission of Finished Sequence to EMBL as HTGS Phase 3
Tomato Genome Finishing Strategy
General overview of WTSI finishing approach
Real time finishing examples on Tuesday afternoon if anyone wants to look at something specific
Main Discussion point for Tuesday:
Updated Finishing Standards Document on SGN
http://www.sgn.cornell.edu/solanaceae-project/sol-bioinformatics/
Printed copies available
Acknowledgements
Wellcome Trust Sanger Institute:
Jane Rogers
Mapping Core Group: Sean Humphray, Christine Nicholson
Subcloning: Matt Jones and Team 53
Shotgun Sequencing: Karen Oliver and Team 41, Sarah Sims
Auto-Prefinishing, Finishing and QC: Karen McLaren and Team 46, Stuart McLaren and Team 58, Christine Lloyd and Team 57
Cornell University: Lukas Mueller, Jim Giovannoni, Steve Tanksley, Yimin Xu
MIPS/IBI Institute for Bioinformatics: Klaus Mayer, Remy Bruggmann
Imperial College London: Gerard Bishop, Daniel Buchan, James Abbott, Sarah Butcher
University of Nottingham: Graham Seymour
Scottish Crop Research Institute: Glenn Bryan
FUNDING