Bioinformatics tools for viral quasispecies reconstruction from next-generation sequencing data and vaccine optimization

PD: Ion Măndoiu, UConn
Co-PDs: Mazhar Khan, UConn; Rachel O’Neill, UConn; Alex Zelikovsky, GSU

Outline

• Background & aims of the project
• Bioinformatics tools for quasispecies spectrum reconstruction from NGS reads
• Experimental validation on IBV data
• Summary and ongoing work

Infectious Bronchitis Virus (IBV)

• Group 3 coronavirus
• Biggest single cause of economic loss in US poultry farms
  - Young chickens: coughing, tracheal rales, dyspnea
  - Broiler chickens: reduced growth rate
  - Layers: egg production drops 5-50%; thin-shelled eggs, watery albumen
• Worldwide distribution, with dozens of serotypes in circulation
  - Co-infection with multiple serotypes is not uncommon, creating conditions for recombination

[Figures: IBV-infected embryo vs. normal embryo; IBV-infected egg defects]

IBV Vaccination

• Broadly used, most commonly with attenuated live vaccine
• Short-lived protection
• Layers need to be re-vaccinated multiple times during their lifespan
• Vaccines might undergo selection in vivo and regain virulence [Hilt, Jackwood, and McKinley 2008]

RNA Virus Replication

• High mutation rate (~10^-4)

[Figure from Lauring & Andino, PLoS Pathogens 2011]
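To give a feel for what a mutation rate of about 10^-4 means in practice, the following back-of-the-envelope sketch computes the expected number of mutations per genome copy. It reads the rate as "per nucleotide per replication" and uses a genome length of 27,600 nt; both are assumptions for illustration (the slide states neither), and sites are treated as independent.

```python
# Back-of-the-envelope sketch; MUTATION_RATE is read as per-nucleotide,
# per-replication, and ASSUMED_GENOME_LENGTH is an illustrative value
# (neither detail is stated on the slide).
MUTATION_RATE = 1e-4
ASSUMED_GENOME_LENGTH = 27_600


def expected_mutations(rate_per_site: float, genome_length: int) -> float:
    """Expected number of mutations introduced in one genome copy."""
    return rate_per_site * genome_length


def prob_error_free_copy(rate_per_site: float, genome_length: int) -> float:
    """Probability that a copy carries no mutation, assuming independent sites."""
    return (1.0 - rate_per_site) ** genome_length


if __name__ == "__main__":
    print(expected_mutations(MUTATION_RATE, ASSUMED_GENOME_LENGTH))
    print(prob_error_free_copy(MUTATION_RATE, ASSUMED_GENOME_LENGTH))
```

Under these assumptions each copy carries roughly 2.8 mutations on average and only about 6% of copies are error-free, which is consistent with a viral population existing as a cloud of closely related variants.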

Evolution of IBV

• Quasispecies identified by cloning and Sanger sequencing in both IBV-infected poultry and commercial vaccines [Jackwood, Hilt, and Callison 2003; Hilt, Jackwood, and McKinley 2008]

How Are Quasispecies Contributing to Virus Persistence and Evolution?

• Variants differ in
  - Virulence
  - Ability to escape immune response
  - Resistance to antiviral therapies
  - Tissue tropism

[Figure from Lauring & Andino, PLoS Pathogens 2011]

Project Aims

• Develop bioinformatics tools for accurate reconstruction of quasispecies sequences and their frequencies from next-generation reads
• Study quasispecies persistence and evolution of IBV in commercial layer flocks following vaccination
• Use results of this study to optimize vaccine development and vaccination protocols

Next Generation Sequencing

• Roche/454 FLX Titanium: 400-600 million bases/run, read length up to 1,000 bp
• Illumina HiSeq 2000: up to 6 billion PE reads/run, 35-100 bp read length
• SOLiD 4/5500: 1.4-2.4 billion PE reads/run, 35-50 bp read length
• Ion Torrent PGM: 1-10M reads/run, read length up to 400 bp

[Figure source: http://www.economist.com/node/16349358]

Shotgun vs. Amplicon Reads

• Shotgun reads: starting positions distributed ~uniformly
• Amplicon reads: reads have predefined start/end positions covering fixed overlapping windows
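The difference between the two read types can be illustrated with a small simulation. This is only a sketch: the function names, window length, and overlap are hypothetical choices, and real amplicon designs depend on primer placement rather than a fixed step.

```python
import random


def shotgun_reads(genome_len, read_len, n_reads, rng):
    """Shotgun: start positions drawn uniformly over all valid offsets."""
    starts = [rng.randrange(genome_len - read_len + 1) for _ in range(n_reads)]
    return [(s, s + read_len) for s in starts]


def amplicon_windows(genome_len, amplicon_len, overlap):
    """Amplicon: fixed windows tiling the region, consecutive windows overlapping."""
    step = amplicon_len - overlap
    windows = []
    start = 0
    while start + amplicon_len < genome_len:
        windows.append((start, start + amplicon_len))
        start += step
    # Last window is placed flush with the end of the region.
    windows.append((genome_len - amplicon_len, genome_len))
    return windows


def amplicon_reads(windows, n_reads, rng):
    """Each amplicon read spans exactly one of the predefined windows."""
    return [rng.choice(windows) for _ in range(n_reads)]


if __name__ == "__main__":
    rng = random.Random(0)
    windows = amplicon_windows(1000, 300, 100)
    print(windows)
    print(len(set(shotgun_reads(1000, 300, 50, rng))), "distinct shotgun intervals")
    print(len(set(amplicon_reads(windows, 50, rng))), "distinct amplicon intervals")
```

Shotgun reads produce many distinct intervals, while amplicon reads can only ever take one of the few predefined (start, end) pairs, which is what the reconstruction method has to exploit.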

Reconstruction from Shotgun Reads: ViSpA

Pipeline: Shotgun reads → Read Error Correction → Read Alignment → Preprocessing of Aligned Reads → Read Graph Construction → Contig Assembly → Frequency Estimation → Quasispecies sequences w/ frequencies

User-specified parameters:
(A) Number of mismatches
(B) Mutation rate
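The read graph step can be pictured with the following minimal sketch. It is not ViSpA's actual implementation: it assumes reads are already aligned to a reference without indels, represents each read as a hypothetical (start, sequence) pair, and connects read u to read v when v extends u to the right and the two agree on their overlap up to `max_mismatches` (loosely corresponding to parameter A).

```python
from typing import Dict, List, Optional, Tuple

Read = Tuple[int, str]  # (alignment start on the reference, read sequence)


def overlap_mismatches(u: Read, v: Read) -> Optional[int]:
    """Mismatches where u's suffix overlaps v's prefix; None if v does not extend u."""
    (su, seq_u), (sv, seq_v) = u, v
    eu, ev = su + len(seq_u), sv + len(seq_v)
    if not (su < sv < eu < ev):
        return None
    overlap_len = eu - sv
    return sum(a != b for a, b in zip(seq_u[sv - su:], seq_v[:overlap_len]))


def build_read_graph(reads: List[Read], max_mismatches: int) -> Dict[int, List[int]]:
    """Directed graph on read indices: edge i -> j if read j consistently extends read i."""
    graph: Dict[int, List[int]] = {i: [] for i in range(len(reads))}
    for i, u in enumerate(reads):
        for j, v in enumerate(reads):
            if i == j:
                continue
            m = overlap_mismatches(u, v)
            if m is not None and m <= max_mismatches:
                graph[i].append(j)
    return graph


if __name__ == "__main__":
    reads = [(0, "ACGTAC"), (3, "TACGTAC"), (3, "TAGGTAC")]
    print(build_read_graph(reads, 0))  # {0: [1], 1: [], 2: []}
    print(build_read_graph(reads, 1))  # {0: [1, 2], 1: [], 2: []}
```

With zero tolerated mismatches only the read from the same variant extends read 0; allowing one mismatch also links the read from the second variant, which shows why the mismatch threshold trades robustness to sequencing errors against the risk of merging distinct variants. Paths through such a graph are the candidate sequences passed on to assembly and frequency estimation.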

Reconstruction from Amplicon Reads: VirA

Inputs: reference in FASTA format; error-corrected read data in SAM/BAM format

Pipeline: Estimate Amplicons → Amplicon Read Graph → Max-Bandwidth Paths → Frequency Estimation → Viral population variants with frequencies
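The slide names max-bandwidth paths without describing how VirA computes or uses them. As a generic illustration of the concept only, the sketch below finds a path whose smallest edge capacity is as large as possible, using a Dijkstra-style search. The graph layout (a source, distinct reads per amplicon, a sink) and the capacities (imagined read counts) are hypothetical.

```python
import heapq


def max_bandwidth_path(graph, source, sink):
    """Return (bottleneck, path) maximizing the minimum edge capacity.

    graph: {node: {neighbor: capacity}} with positive capacities.
    Returns (0, []) if the sink is unreachable.
    """
    best = {source: float("inf")}  # best known bottleneck to each node
    parent = {}
    heap = [(-best[source], source)]
    while heap:
        neg_bw, node = heapq.heappop(heap)
        bw = -neg_bw
        if bw < best.get(node, 0):
            continue  # stale heap entry
        if node == sink:
            break
        for nxt, cap in graph.get(node, {}).items():
            cand = min(bw, cap)
            if cand > best.get(nxt, 0):
                best[nxt] = cand
                parent[nxt] = node
                heapq.heappush(heap, (-cand, nxt))
    if sink not in best:
        return 0, []
    path = [sink]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return best[sink], path[::-1]


if __name__ == "__main__":
    # Hypothetical amplicon read graph: a1/a2 are distinct reads in amplicon 1,
    # b1/b2 in amplicon 2; capacities stand in for supporting read counts.
    g = {
        "s": {"a1": 60, "a2": 40},
        "a1": {"b1": 50, "b2": 10},
        "a2": {"b2": 35},
        "b1": {"t": 55},
        "b2": {"t": 45},
    }
    print(max_bandwidth_path(g, "s", "t"))  # (50, ['s', 'a1', 'b1', 't'])
```

The path through a1 and b1 wins because its weakest link (50) is stronger than the weakest link of any alternative, which is the intuition behind preferring well-supported chains of amplicon reads as candidate variants.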

Amplicon Sequencing Challenges

• Multiple reads from consecutive amplicons may match over their overlap

• Distinct quasispecies may be indistinguishable in an amplicon interval

IBV Genome

[Figure: IBV genome. Source: Rev. Bras. Cienc. Avic. vol.12 no.2 Campinas Apr./June 2010]

• RT-PCR of S1 using redesigned primers

Experiment 1

• 10-clone pool: C1 20%, C2 20%, C3 15%, C4 15%, C5 10%, C6 10%, C7 4%, C8 4%, C9 1%, C10 1%
  - 454 reads
  - Assembled quasispecies: PV1, PV2, PV3, …, PVk
• M42 sample
  - 454 reads
  - 53 plasmid clones
  - Assembled quasispecies: V1, V2, V3, …, Vn

Evaluated Reconstruction Flows

Reads Statistics & Coverage

Number of reads per sample:

Sample          Uncorrected  SAET corrected  ShoRAH corrected  KEC corrected
M42 isolate     53062        53062           50858             48945
M42 clone pool  21040        21040           19439             17122
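A quick way to compare the correction methods is the fraction of reads each one leaves in the data set. The sketch below does this arithmetic on the counts from the table; the dictionary layout and function name are illustrative, and the reading that lower counts reflect reads dropped during correction is an assumption.

```python
# Read counts copied from the table above; the data layout is illustrative.
READ_COUNTS = {
    "M42 isolate": {"Uncorrected": 53062, "SAET": 53062, "ShoRAH": 50858, "KEC": 48945},
    "M42 clone pool": {"Uncorrected": 21040, "SAET": 21040, "ShoRAH": 19439, "KEC": 17122},
}


def retained_fraction(counts):
    """Fraction of the uncorrected reads remaining after each correction method."""
    base = counts["Uncorrected"]
    return {method: n / base for method, n in counts.items() if method != "Uncorrected"}


if __name__ == "__main__":
    for sample, counts in READ_COUNTS.items():
        for method, frac in retained_fraction(counts).items():
            print(f"{sample:15s} {method:7s} {frac:.1%}")
```

On these numbers SAET keeps every read, ShoRAH keeps about 95.8% (isolate) and 92.4% (clone pool), and KEC keeps about 92.2% and 81.4%.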

Reads Validation

[Figure panels: how well we predicted Sanger clones; how well our prediction is]

Average Prediction Error

Neighbor-Joining Tree for M42 Sanger Clones & ViSpA Quasispecies

Experiment 2

Reads Statistics & Coverage

Number of reads per sample:

Sample       Uncorrected  SAET corrected  ShoRAH corrected  KEC corrected
M41 Vaccine  92113        92113           87883             85311
Field #1     38502        38502           33685             32521
Field #2     132513       132513          123370            111686
Field #3     76906        76906           71408             64507
Field #4     44467        44467           41653             37295

Neighbor-Joining Tree for Sanger clones and ViSpA Reconstructed Sequences

Summary

• Developed software tools for quasispecies reconstruction from both shotgun and amplicon next-generation reads
  - Code and executables freely available at http://alla.cs.gsu.edu/~software/VISPA/vispa.html and http://alan.cs.gsu.edu/vira/
  - ViSpA plugin developed for users of Ion Torrent, available on the Ion Community
• Experimental results on both simulated and real data show improved accuracy tradeoffs compared to previous methods
• Tools are applicable to quasispecies studies of other viruses

Ongoing Work

• Deployment of ViSpA and VirA on Galaxy servers maintained at UConn and GSU

• Tool validation on Ion Torrent reads

• Comparison of shotgun and amplicon based reconstruction methods

• Combining long and short read technologies

• Quasispecies persistence studies using longitudinal sampling

Tool Validation for Ion Torrent Reads

• Shotgun IBV reads generated using an Ion 316 chip
  - 2,384,007 reads (1,177,740 after SAET correction)
  - Mean length 203.58 bp
• ViSpA results
  - 23 quasispecies with estimated frequency > 0.5%, 2,200 total

Longitudinal Sampling

[Figure: amplicon / shotgun sequencing]

Contributors

University of Connecticut:
Rachel O’Neill, Ph.D.
Mazhar Khan, Ph.D.
Hongjun Wang, Ph.D.
Craig Obergfell
Andrew Bligh

Bassam Tork
Ekaterina Nenastyeva
Alex Artyomenko
Serghei Mangul
Nicholas Mancuso
Alexander Zelikovsky

University of Maryland:
Irina Astrovskaya, Ph.D.