
Bioinformatics tools for viral quasispecies reconstruction from next-generation sequencing data and vaccine optimization

PD: Ion Măndoiu, UConn
Co-PDs: Mazhar Khan, UConn; Rachel O’Neill, UConn; Alex Zelikovsky, GSU

Outline
• Background & aims of the project
• Bioinformatics tools for quasispecies spectrum reconstruction from NGS reads
• Experimental validation on IBV data
• Summary and ongoing work

Infectious Bronchitis Virus (IBV)
• Group 3 coronavirus
• Biggest single cause of economic loss in US poultry farms
– Young chickens: coughing, tracheal rales, dyspnea
– Broiler chickens: reduced growth rate
– Layers: egg production drops 5-50%; eggs are thin-shelled with watery albumen
• Worldwide distribution, with dozens of serotypes in circulation
– Co-infection with multiple serotypes is not uncommon, creating conditions for recombination

[Figures: IBV-infected embryo vs. normal embryo; IBV-infected egg defects]

IBV Vaccination
• Broadly used, most commonly with attenuated live vaccine
• Short-lived protection
• Layers need to be re-vaccinated multiple times during their lifespan
• Vaccines might undergo selection in vivo and regain virulence [Hilt, Jackwood, and McKinley 2008]

RNA Virus Replication

High mutation rate (~10^-4)

Lauring & Andino, PLoS Pathogens 2011

Quasispecies identified by cloning and Sanger sequencing in both IBV-infected poultry and commercial vaccines [Jackwood, Hilt, and Callison 2003; Hilt, Jackwood, and McKinley 2008]

Evolution of IBV

How Are Quasispecies Contributing to Virus Persistence and Evolution?

• Variants differ in
– Virulence
– Ability to escape immune response
– Resistance to antiviral therapies
– Tissue tropism

Lauring & Andino, PLoS Pathogens 2011

Project Aims

• Develop bioinformatics tools for accurate reconstruction of quasispecies sequences and their frequencies from next-generation reads

• Study quasispecies persistence and evolution of IBV in commercial layer flocks following vaccination

• Use results of this study to optimize vaccine development and vaccination protocols


Next Generation Sequencing


http://www.economist.com/node/16349358

Roche/454 FLX Titanium: 400-600 million bases/run, read length up to 1,000 bp

Illumina HiSeq 2000: up to 6 billion PE reads/run, 35-100 bp read length

SOLiD 4/5500: 1.4-2.4 billion PE reads/run, 35-50 bp read length

Ion Torrent PGM: 1-10M reads/run, read length up to 400 bp

Shotgun vs. Amplicon Reads
• Shotgun reads: starting positions distributed ~uniformly
• Amplicon reads: reads have predefined start/end positions covering fixed overlapping windows
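The difference between the two read types can be illustrated with a small simulation. This is only a sketch: the region length, read length, amplicon length, and overlap below are hypothetical values chosen for illustration, and real reads vary in length.

```python
import random

def shotgun_reads(genome_len, read_len, n_reads, rng):
    """Sample (start, end) intervals with starts drawn uniformly along the region."""
    starts = (rng.randrange(genome_len - read_len + 1) for _ in range(n_reads))
    return [(s, s + read_len) for s in starts]

def amplicon_windows(genome_len, amp_len, overlap):
    """Tile the region with fixed windows of length amp_len overlapping by `overlap` bases."""
    step = amp_len - overlap
    windows = []
    start = 0
    while start + amp_len < genome_len:
        windows.append((start, start + amp_len))
        start += step
    windows.append((genome_len - amp_len, genome_len))  # last window ends at the region end
    return windows

def amplicon_reads(windows, n_reads, rng):
    """Each amplicon read spans exactly one of the predefined windows."""
    return [rng.choice(windows) for _ in range(n_reads)]

rng = random.Random(0)
REGION_LEN = 1600  # hypothetical length of the sequenced region
shotgun = shotgun_reads(REGION_LEN, 400, 1000, rng)
windows = amplicon_windows(REGION_LEN, 400, 100)
amplicon = amplicon_reads(windows, 1000, rng)

print(len({s for s, _ in shotgun}), "distinct shotgun start positions")
print(len({s for s, _ in amplicon}), "distinct amplicon start positions")
```

Shotgun starts spread over many positions, while amplicon reads can only start at one of the few window boundaries.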

Reconstruction from Shotgun Reads: ViSpA

Input: shotgun reads

Pipeline: Read Error Correction → Read Alignment → Preprocessing of Aligned Reads → Read Graph Construction → Contig Assembly → Frequency Estimation

Output: quasispecies sequences w/ frequencies

User Specified Parameters: (A) Number of mismatches (B) Mutation rate
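The final frequency estimation step can be sketched as an expectation-maximization (EM) loop over candidate sequences. The slide does not spell out the estimator, so everything here is an assumption made for illustration: the function names, the per-base error model, and the toy data are hypothetical and are not taken from the ViSpA code.

```python
def mismatches(read, start, candidate):
    """Count mismatches between an aligned read and the candidate segment it covers."""
    segment = candidate[start:start + len(read)]
    return sum(a != b for a, b in zip(read, segment))

def em_frequencies(candidates, reads, error_rate=0.01, iterations=100):
    """Estimate candidate frequencies from aligned reads given as (start, sequence) pairs."""
    k = len(candidates)
    # weight[r][q]: probability of observing read r if candidate q emitted it
    # (assumed model: independent per-base errors at rate `error_rate`)
    weight = []
    for start, seq in reads:
        row = []
        for cand in candidates:
            j = mismatches(seq, start, cand)
            row.append(error_rate ** j * (1 - error_rate) ** (len(seq) - j))
        weight.append(row)
    freq = [1.0 / k] * k
    for _ in range(iterations):
        counts = [0.0] * k
        for row in weight:
            total = sum(f * w for f, w in zip(freq, row))
            for q in range(k):
                counts[q] += freq[q] * row[q] / total  # E-step: split the read among candidates
        freq = [c / len(reads) for c in counts]        # M-step: renormalize expected counts
    return freq

# Toy data: two candidates differing at position 4; the last two reads are ambiguous.
toy_candidates = ["ACGTACGT", "ACGTTCGT"]
toy_reads = [(0, "ACGTA")] * 3 + [(0, "ACGTT")] + [(0, "ACG")] * 2
toy_freq = em_frequencies(toy_candidates, toy_reads)
print([round(f, 3) for f in toy_freq])
```

Reads that cover the distinguishing position pull the estimate toward roughly 3:1, and the ambiguous reads are divided in the same proportion.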

Reconstruction from Amplicon Reads: VirA

Inputs: reference in FASTA format; error-corrected read data in SAM/BAM format

Pipeline: Estimate Amplicons → Amplicon Read Graph → Max-Bandwidth Paths → Frequency Estimation

Output: viral population variants with frequencies
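A max-bandwidth path is a source-to-sink path whose smallest edge weight is as large as possible. The sketch below shows the generic computation on a directed acyclic graph. How VirA weights its amplicon read graph is not stated on the slide, so the node names and weights are hypothetical.

```python
def max_bandwidth_path(edges, source, sink, topo_order):
    """Return (bandwidth, path) maximizing the minimum edge weight from source to sink.

    edges: dict mapping node -> list of (neighbor, weight)
    topo_order: all nodes listed in topological order
    """
    best = {source: float("inf")}  # best bottleneck value found so far for each node
    parent = {}
    for u in topo_order:
        if u not in best:
            continue  # node not reachable from the source
        for v, w in edges.get(u, []):
            bandwidth = min(best[u], w)
            if bandwidth > best.get(v, float("-inf")):
                best[v] = bandwidth
                parent[v] = u
    if sink not in best:
        return 0, []
    path = [sink]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return best[sink], path[::-1]

# Hypothetical graph: a* and b* stand for distinct reads in two consecutive amplicons.
toy_edges = {
    "s": [("a1", 60), ("a2", 40)],
    "a1": [("b1", 50), ("b2", 10)],
    "a2": [("b2", 40)],
    "b1": [("t", 50)],
    "b2": [("t", 50)],
}
toy_order = ["s", "a1", "a2", "b1", "b2", "t"]
toy_result = max_bandwidth_path(toy_edges, "s", "t", toy_order)
print(toy_result)
```

Because the graph is acyclic, one pass in topological order is enough; a general graph would need a Dijkstra-style variant instead.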

Amplicon Sequencing Challenges

• Multiple reads from consecutive amplicons may match over their overlap

• Distinct quasispecies may be indistinguishable in an amplicon interval
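Both challenges can be seen on a toy example. The three variants and two windows below are invented for illustration only; the sketch groups variants that look identical inside each window and lists which pairs agree on the overlap between consecutive windows.

```python
from collections import defaultdict

def indistinguishable_groups(variants, windows):
    """For each amplicon window, group variant names whose sequence is identical inside it."""
    result = {}
    for start, end in windows:
        groups = defaultdict(list)
        for name, seq in variants.items():
            groups[seq[start:end]].append(name)
        result[(start, end)] = sorted(groups.values())
    return result

def overlap_compatible(variants, left, right):
    """List (a, b) pairs where a read of variant a in `left` agrees with
    a read of variant b in `right` over the shared overlap interval."""
    lo, hi = right[0], left[1]  # overlap shared by the two consecutive windows
    return [(a, b)
            for a, seq_a in variants.items()
            for b, seq_b in variants.items()
            if seq_a[lo:hi] == seq_b[lo:hi]]

# Hypothetical variants: V2 differs from V1 at position 2, V3 differs at position 10.
toy_variants = {
    "V1": "AAAAAAAAAAAA",
    "V2": "AACAAAAAAAAA",
    "V3": "AAAAAAAAAACA",
}
toy_windows = [(0, 8), (4, 12)]

toy_groups = indistinguishable_groups(toy_variants, toy_windows)
toy_pairs = overlap_compatible(toy_variants, toy_windows[0], toy_windows[1])
print(toy_groups)
print(len(toy_pairs), "compatible read pairings across the overlap")
```

Here V1 and V3 are identical in the first window, V1 and V2 are identical in the second, and every read from the first window is compatible with every read from the second, so the overlap alone cannot decide how to link them.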


IBV Genome

Rev. Bras. Cienc. Avic. vol.12 no.2 Campinas Apr./June 2010

RT-PCR of S1 using redesigned primers

Experiment 1

• 10-clone pool (C1 20%, C2 20%, C3 15%, C4 15%, C5 10%, C6 10%, C7 4%, C8 4%, C9 1%, C10 1%) → 454 reads → assembled quasispecies PV1, PV2, PV3, …, PVk
• M42 sample → 454 reads → assembled quasispecies V1, V2, V3, …, Vn
• M42 sample → 53 plasmid clones

Evaluated Reconstruction Flows

Reads Statistics & Coverage

Number of reads
Sample           Uncorrected   SAET corrected   ShoRAH corrected   KEC corrected
M42 isolate      53062         53062            50858              48945
M42 clone pool   21040         21040            19439              17122
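The read counts above are easier to compare as the fraction of reads each correction method retains. The short calculation below uses only the numbers from the table; the dictionary layout and function name are illustrative.

```python
read_counts = {
    "M42 isolate": {"Uncorrected": 53062, "SAET": 53062, "ShoRAH": 50858, "KEC": 48945},
    "M42 clone pool": {"Uncorrected": 21040, "SAET": 21040, "ShoRAH": 19439, "KEC": 17122},
}

def retained_fraction(counts):
    """Fraction of the uncorrected reads that remain after each correction method."""
    raw = counts["Uncorrected"]
    return {method: n / raw for method, n in counts.items() if method != "Uncorrected"}

retained = {sample: retained_fraction(counts) for sample, counts in read_counts.items()}
for sample, fractions in retained.items():
    for method, frac in fractions.items():
        print(f"{sample}: {method} keeps {frac:.1%} of reads")
```

SAET leaves the read count unchanged in both samples, while KEC discards the most reads, keeping roughly 92% of the isolate reads and 81% of the clone pool reads.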

Reads Validation

[Plots: how well we predicted the Sanger clones; how good our predictions are; average prediction error]
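The slide does not define these three measures, so the sketch below is one plausible reading rather than the authors' definitions. It assumes that a clone counts as recovered when some predicted sequence lies within k mismatches of it, that a prediction counts as confirmed when some clone lies within k mismatches of it, and that prediction error is the distance from each prediction to its nearest clone. All names and sequences are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length aligned sequences."""
    return sum(x != y for x, y in zip(a, b))

def validation_metrics(clones, predicted, k=0):
    """Compare predicted quasispecies with Sanger clones, matching within k mismatches."""
    nearest_pred = [min(hamming(c, p) for p in predicted) for c in clones]
    nearest_clone = [min(hamming(p, c) for c in clones) for p in predicted]
    return {
        # assumed reading of "how well we predicted the Sanger clones"
        "clones_recovered": sum(d <= k for d in nearest_pred) / len(clones),
        # assumed reading of "how good our predictions are"
        "predictions_confirmed": sum(d <= k for d in nearest_clone) / len(predicted),
        # assumed reading of "average prediction error"
        "avg_prediction_error": sum(nearest_clone) / len(predicted),
    }

toy_clones = ["ACGTACGT", "ACGTTCGT", "ACCTACGT"]
toy_predicted = ["ACGTACGT", "ACGTTCGA"]
exact = validation_metrics(toy_clones, toy_predicted, k=0)
relaxed = validation_metrics(toy_clones, toy_predicted, k=1)
print(exact)
print(relaxed)
```

Allowing one mismatch raises both match rates on the toy data, which is why the tolerance k should be reported alongside any such figures.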

Neighbor-Joining Tree for M42 Sanger Clones & ViSpA Quasispecies

Experiment 2

Reads Statistics & Coverage

Number of reads
Sample        Uncorrected   SAET corrected   ShoRAH corrected   KEC corrected
M41 Vaccine   92113         92113            87883              85311
Field #1      38502         38502            33685              32521
Field #2      132513        132513           123370             111686
Field #3      76906         76906            71408              64507
Field #4      44467         44467            41653              37295

Neighbor-Joining Tree for Sanger clones and ViSpA Reconstructed Sequences


Summary

• Developed software tools for quasispecies reconstruction from both shotgun and amplicon next-generation reads
– Code and executables freely available at http://alla.cs.gsu.edu/~software/VISPA/vispa.html and http://alan.cs.gsu.edu/vira/
– ViSpA plugin developed for users of Ion Torrent, available on the Ion community

• Experimental results on both simulated and real data show improved accuracy tradeoffs compared to previous methods

• Tools are applicable to quasispecies studies of other viruses

Ongoing Work

• Deployment of ViSpA and VirA on Galaxy servers maintained at UConn and GSU

• Tool validation on Ion Torrent reads

• Comparison of shotgun and amplicon based reconstruction methods

• Combining long and short read technologies

• Quasispecies persistence studies using longitudinal sampling

Tool Validation for Ion Torrent reads

• Shotgun IBV reads generated using an Ion 316 chip
– 2,384,007 reads (1,177,740 after SAET correction)
– mean length 203.58 bp
• ViSpA results
– 23 quasispecies with estimated frequency > 0.5%, 2,200 total

Longitudinal Sampling

Amplicon / shotgun sequencing

Contributors

University of Connecticut: Rachel O’Neill, Ph.D.; Mazhar Khan, Ph.D.; Hongjun Wang, Ph.D.; Craig Obergfell; Andrew Bligh

Georgia State University: Bassam Tork; Ekaterina Nenastyeva; Alex Artyomenko; Serghei Mangul; Nicholas Mancuso; Alexander Zelikovsky

University of Maryland: Irina Astrovskaya, Ph.D.