Accurate Assembly of Maize BACs Patrick S. Schnable Srinivas Aluru Iowa State University

Accurate Assembly of Maize BACs

Patrick S. Schnable, Srinivas Aluru

Iowa State University

Motivation

• Maize genome is more complex than previously sequenced genomes

– Many high-copy, long, highly conserved repeats

– Genome contains many NIPs (Nearly Identical Paralogs: low-copy genes that are expressed and >98% identical; Emrich et al., 2007) (= CNPs and CNV)

• Hence, assembling this genome presents new challenges

• Are existing assembly programs up to the task?

Evidence of Assembly Errors

• Wash U noticed examples of collapse of repeats

• ISU identified examples of NIP collapse

SNP: single nucleotide polymorphism between alleles of a single gene

Paramorphism (PM): a single nucleotide substitution between paralogs

Nearly Identical Paralogs (NIPs): paralogous sequences with >99% identity

Paramorphisms Provide Evidence of NIPs

Frequency of NIPs

• Conservatively ~1% of maize genes have NIPs (Emrich et al., 2007)

• Inspection of assembled BACs reveals NIP clusters

• In addition, we also detect examples of “NIP collapse”

• CNPs/CNV associated with adaptive evolution in humans (Perry et al., Nat. Genetics, 2007)

BAC Assembly, Example 1

• MAGI3.1 ID: MAGI_18749 (Emrich et al., 2007)

• BAC ID: CH201-140C17

Paramorphic Sites: C/T (1,175), C/T (1,293), C/T (1,359)

CH201-140C17: gi|146322123|gb|AC203431.1 (152,054 bp)

GenBank

56,572 to 55,984 (589 bp)

BAC Assembly Example 1 - Site #1

BAC ID: CH201-140C17

GI: 146322123

GB: AC203431.1

152,054 bp

MAGI_18749

Paramorphic Site #1: C/T (1,175)

2 C vs 2 T

“Consensus Base”

Paramorphic Site #1

2/7 assembled BACs known to contain NIPs exhibit evidence of NIP collapse (conservative)

Traditional Assembly

• Sequence alignments between reads are identified

• Construct contigs

– Start at a good alignment

– Extend ends of contig one sequence at a time

• Clone pair information is used to scaffold contigs after contig construction.
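
The traditional procedure above can be sketched roughly as follows. This is a toy illustration under strong assumptions, not the code of any particular assembler: reads are error-free strings on a single strand, the helper names `overlap_len` and `greedy_contig` are hypothetical, and the minimum overlap of 3 bases is arbitrary.

```python
def overlap_len(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_contig(reads, min_len=3):
    """Seed with the best pairwise overlap, then extend the contig one read at a time."""
    reads = list(reads)
    # Start at a good alignment: the pair with the longest suffix/prefix overlap.
    best = max(
        ((overlap_len(a, b, min_len), i, j)
         for i, a in enumerate(reads) for j, b in enumerate(reads) if i != j),
        default=(0, None, None),
    )
    if best[0] == 0:
        return reads[0] if reads else ""
    n, i, j = best
    contig = reads[i] + reads[j][n:]
    unused = [r for k, r in enumerate(reads) if k not in (i, j)]
    # Extend either end of the contig with the best-overlapping unused read.
    while unused:
        right = max(unused, key=lambda r: overlap_len(contig, r, min_len))
        left = max(unused, key=lambda r: overlap_len(r, contig, min_len))
        nr = overlap_len(contig, right, min_len)
        nl = overlap_len(left, contig, min_len)
        if nr == 0 and nl == 0:
            break  # no remaining read overlaps either end
        if nr >= nl:
            contig += right[nr:]
            unused.remove(right)
        else:
            contig = left[:len(left) - nl] + contig
            unused.remove(left)
    return contig
```

For example, `greedy_contig(["ACGTAC", "TACGGA", "GGATTC"])` rebuilds `ACGTACGGATTC`. Because each extension looks only at sequence overlap, two reads from different copies of a repeat or NIP look like a perfectly good overlap to this procedure, which is the weakness the clone-pair-informed approach targets.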

Our Approach

• Integrate clone pair data into contig assembly process

• Model sequence alignments & clone pairs as a graph.

First, construct an alignment graph

Sequence reads are nodes

A black edge is drawn between a pair of nodes if there is a valid sequence alignment
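
A minimal sketch of such an alignment graph, assuming reads are given as a name-to-sequence dictionary. The slides do not define what makes an alignment "valid", so the test used here (an exact suffix/prefix overlap of at least `min_len` bases) and the function names are illustrative assumptions only.

```python
from itertools import combinations


def has_valid_alignment(a, b, min_len=4):
    """Assumed validity test: an exact suffix/prefix overlap of at least min_len bases."""
    longest = min(len(a), len(b))
    return any(
        a.endswith(b[:n]) or b.endswith(a[:n])
        for n in range(min_len, longest + 1)
    )


def build_alignment_graph(reads, min_len=4):
    """Reads (keyed by name) are nodes; a 'black' edge joins each validly aligned pair."""
    graph = {name: set() for name in reads}
    for u, v in combinations(reads, 2):
        if has_valid_alignment(reads[u], reads[v], min_len):
            graph[u].add(v)
            graph[v].add(u)
    return graph
```

The graph is stored as an adjacency dictionary of sets, which keeps the later edge types (clone pair and path edges) easy to layer on as separate dictionaries or sets.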

Clone Pair Informed Assembly

Second, introduce two additional types of edges into the graph

Clone pair edges (red)

Path edges (green)

A path edge exists between two nodes if:

• they are close together in the graph

• AND their clone pairs are also close together

Identifies assembly-relevant sequence alignments
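
The path-edge rule can be sketched as below. The slides do not say how "close together" is measured, so this sketch assumes it means "within `limit` black edges" by breadth-first search. The `mate` dictionary (read name to the name of its clone-pair partner) and both function names are hypothetical.

```python
from collections import deque


def within_distance(graph, source, limit):
    """Return the nodes reachable from source in at most `limit` black edges (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == limit:
            continue  # do not expand beyond the distance limit
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return set(dist)


def path_edges(graph, mate, limit=2):
    """Green edges: pairs close in the graph whose clone-pair mates are also close."""
    near = {n: within_distance(graph, n, limit) for n in graph}
    green = set()
    for u in mate:
        for v in near[u]:
            if v != u and v in mate and mate[v] in near[mate[u]]:
                green.add(frozenset((u, v)))
    return green
```

In this sketch an alignment between a read and a repeat copy elsewhere in the BAC earns no green edge, because the two reads' mates land in unrelated parts of the graph.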

Repeat Example

Our Approach

• Series of graph transformations to ensure black edges (sequence alignments) represent correct genomic overlaps, and resolve entries into and exits out of repeats.

– Use clone pairs to validate alignments in repeat regions if the corresponding mate pairs are anchored to unique regions and exhibit alignment.

– Use paramorphisms to break spurious alignments due to NIPs.

– Use clone pairs to match entries into and exits out of repeats.

– Use clone pairs and validated alignments to guide contigs.

– Use graph min-cuts to find correct assignment of reads to the complementary strands.

– Use graph reductions and visualization for further analysis.

Example: Use Paramorphisms to Break Spurious Alignments

GTCT A CAG
GTCT A CAG
GTCT A CAG

GTCT C CAG
GTCT C CAG
GTCT C CAG
GTCT C CAG
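
One way to picture this step in code, using the A/C reads from the example. The sketch assumes the reads are already placed in a common coordinate system as equal-length strings, and it treats a column as paramorphic when at least two different bases are each supported by `min_support` or more reads. That threshold and both function names are assumptions, not the authors' stated criteria.

```python
from collections import Counter


def paramorphic_columns(pileup, min_support=2):
    """Columns where at least two different bases are each seen in >= min_support reads."""
    length = len(next(iter(pileup.values())))  # assumes a non-empty pileup
    columns = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in pileup.values())
        if sum(1 for c in counts.values() if c >= min_support) >= 2:
            columns.append(pos)
    return columns


def break_spurious_edges(edges, pileup, min_support=2):
    """Drop alignment edges joining reads that disagree at a paramorphic column."""
    sites = paramorphic_columns(pileup, min_support)
    return {
        (u, v) for (u, v) in edges
        if all(pileup[u][p] == pileup[v][p] for p in sites)
    }
```

Requiring support from several reads for each base is what separates a candidate paramorphism from an isolated sequencing error: a single read with an odd base does not create a site, so its alignments are left alone.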

Three Random “Stage 3” BACs

• Shotgun sequences extracted from Genbank and trimmed

Name     Reads   Post Trim   Corrupt Quality Info
273D22   1402    1352        5
306N19   1396    1310        1
396H10   1391    1337        33

273D22

• Annotate paths via walking through the graph.

• Make use of three levels of pointers:

– Black edges: show what steps are available

– Green edges: indicate the best path

– Red edges: indicate our final destination
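
A rough sketch of how the three kinds of pointers might drive a walk. It is a greedy walk with no backtracking, written only to illustrate the idea. The slides list path-navigation algorithms as still-needed research, so nothing here should be read as the authors' procedure. The `walk` function, its arguments, and the alphabetical tie-break are all assumptions; `target` stands for the read that a red (clone pair) edge points to.

```python
def walk(black, green, start, target, max_steps=100):
    """Follow black edges from start, preferring green ones, until target is reached."""
    path, seen = [start], {start}
    node = start
    for _ in range(max_steps):
        if node == target:
            return path
        # Black edges: the steps that are available from here.
        options = sorted(n for n in black[node] if n not in seen)
        if not options:
            return None  # dead end: no unvisited alignment to follow
        # Green edges: the preferred step, when one exists.
        preferred = [n for n in options if frozenset((node, n)) in green]
        node = (preferred or options)[0]
        path.append(node)
        seen.add(node)
    return path if node == target else None
```

With `black = {"s": {"k", "m"}, "k": {"s"}, "m": {"s", "t"}, "t": {"m"}}` and green edges on s-m and m-t, the walk from `"s"` to `"t"` follows `s, m, t`; with no green edges it steps into the dead-end branch `k` and gives up.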

273D22: Incorrect Contiging

Contig 0

Contig 1 is a small contig in the finished BAC that contains sequences that should be attached to the end of Contig 0.

273D22: Missing Scaffold

306N19: Mis-assembly

Contig 3

Contig 5

Contig 0

306N19: Complex Repeat

D396H10: Missed Scaffolding

Contig 8

Contig 5

D396H10: Missed Scaffolding

Contig 7

Contig 2

Contig 3

Identifying Assembly Errors???

273D22: Weak Link not Corroborated by Clone Pairs

Contig 3

Conclusions & Future Directions

• Discovered misassembled regions in all three randomly chosen BACs

– Conclusions supported by multiple lines of evidence (clone pair + overlap)

– Mis-assemblies (e.g., repeat-induced “knots”; collapsed repeats & NIPs) and missed scaffolding

• Benefits of our approach

– Can provide better assemblies

  • Can navigate through repeats

  • Can correctly assemble NIPs

– With development could output contigs and perform scaffolding in one step

– Could provide refined finishing advice

– Could include a community-accessible visualization of assembled BAC contigs and supporting data (confidence levels)

• Longer term

– Our assembly approach could be applied to whole genome assembly of maize and other complex genomes

– Could incorporate paired next generation sequencing data (e.g. 454, Solexa, SOLiD)

• Needed research

– Random collection of finished BACs (“truth”)

– Develop algorithms for navigating paths through the graph

– Accurately construct final contigs that contain multiple copies of repeats

– Create BAC re-assembly pipeline (inform finishing efforts in future sequencing projects)

– Scale approach to whole genome level
