New era for molecular breeding with cost effective SNP genotyping solutions Dr. Bhaswar Maity Imperial Life Sciences 18/2/2015 ICRISAT


DNA Sequencing to Genomic Selection

Plant Genomes up to 2013

• Incomplete:

– Average completeness: 85%

• Fragmented:

– Average contiguity: 20 kb

Michael & Jackson (2013) The Plant Genome 6: 1-7.

Hyderabad was known originally as ?????

Hyderabad was known originally as Bhagyanagar, a city Sultan Muhammad Quli of the Qutub Shahi dynasty had founded and named after his beloved Bhagmati or Bhagyamati in 1590. Once she entered the royal household and embraced Islam, she was rechristened Hydermahal and as a natural consequence, the city got its second name, Hyderabad.

Why does everybody want longer reads?

P6-C4: Read Length Performance

P6-C4, 4 hr movie, 20 kb BluePippin™ size-selected E. coli (1 SMRT Cell)

• N50 read length: >15 kb
• 95th percentile: >20 kb
• Maximum read length: >40 kb
• Throughput per SMRT Cell: >800 Mb
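The N50 read length quoted on this slide can be made concrete with a short calculation. The sketch below is illustrative only: the function name `read_length_n50` and the toy read lengths are assumptions, not data from the P6-C4 run. It uses the common definition of N50 as the length L such that reads of length L or longer contain at least half of all sequenced bases.

```python
def read_length_n50(lengths):
    """Return the N50 of a collection of read lengths (in bases)."""
    if not lengths:
        raise ValueError("need at least one read length")
    ordered = sorted(lengths, reverse=True)   # longest reads first
    half_total = sum(ordered) / 2             # half of all sequenced bases
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical read lengths, chosen only to illustrate the calculation.
toy_reads = [40_000, 22_000, 18_000, 15_000, 9_000, 6_000, 2_000]
print(read_length_n50(toy_reads))
```

Because the longest reads are counted first, a handful of very long reads can raise N50 considerably even when most reads are short.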

200,000+ SNPs Were Missed In Short-Read Assemblies

In collaboration with Joe Ecker at Salk Institute for Biological Studies

[Figure: Venn diagrams. Mapping of Illumina® PE or PacBio® assemblies to TAIR 10]

• Ler0: 27,106 Illumina PE only; 509,836 shared (95%/68%); 238,637 PacBio assembly only
• Cvi: 55,947 Illumina PE only; 685,104 shared (92%/72%); 271,335 PacBio assembly only

“Not only did PacBio discover pretty much everything that Illumina paired-end reads was able to find, in this case 95% and 92%, it identified another 250,000 of these variants”

Chongyuan Luo, Ph.D., The Salk Institute for Biological Studies

Resolving the Complexity of Genomic and Epigenetic Variations in Arabidopsis, PAG 2014 Workshop Recording @ blog.pacificsciences.com
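The percentages on the Venn diagrams for this slide can be reproduced from the variant counts. The sketch below is a worked check, not part of the original analysis: the assignment of each count to a Venn region is an assumption (it is the assignment that reproduces the printed 95%/68% and 92%/72%), and the dictionary keys and function name are illustrative.

```python
# Counts read from the slide. Which number belongs to which Venn region
# is an assumption, checked against the percentages printed on the slide.
venn = {
    "Ler0": {"illumina_only": 27_106, "shared": 509_836, "pacbio_only": 238_637},
    "Cvi": {"illumina_only": 55_947, "shared": 685_104, "pacbio_only": 271_335},
}


def overlap_percentages(counts):
    """Shared variants as a rounded percentage of each method's total."""
    shared = counts["shared"]
    of_illumina = 100 * shared / (shared + counts["illumina_only"])
    of_pacbio = 100 * shared / (shared + counts["pacbio_only"])
    return round(of_illumina), round(of_pacbio)


for accession, counts in venn.items():
    pct_illumina, pct_pacbio = overlap_percentages(counts)
    print(accession, pct_illumina, pct_pacbio, counts["pacbio_only"])
```

Under this reading, the PacBio-only regions (238,637 and 271,335 variants) are the "200,000+" variants missed by the short-read mapping.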

Resolve Gene Duplications in Difficult BACs

Aluminum tolerance in maize is important for drought resistance and protecting against nutrient deficiencies

• Segregating population localized a QTL on a BAC, but unable to genotype with short-read sequencing because of high repeat content and GC skew

• BAC assembly with PacBio® long reads revealed a triplication of the ZmMATE1 membrane transporter

Maron, LG et al. (2012) A rare gene copy-number variant that contributes to maize aluminum tolerance and adaptation to acid soils. PNAS

Genomic organization of the MATE1 locus

Resolve Difficult Genomic Regions

Novel patterns of higher-order repeat structures in Switchgrass centromeres:

Melters et al. (2013) Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution. Genome Biology, 14:R10

SampleNet: Iso-Seq Method with Clontech® cDNA Synthesis Kit

PacBio’s Iso-Seq™ Method for High-quality, Full-length Transcripts

[Figure 1: Iso-Seq workflow]

Experimental pipeline
• PolyA mRNA
• cDNA synthesis with adapters
• Size partitioning & PCR amplification
• SMRTbell™ ligation
• PacBio® RS II sequencing

Informatics pipeline
• PacBio raw sequence reads
• Remove adapters
• Remove artifacts
• Clean sequence reads
• Reads clustering
• Isoform clusters
• Consensus calling
• Nonredundant transcript isoforms
• Quality filtering
• Final isoforms
• Map to reference genome
• Evidence-based gene models

[Figure labels: SMRT® adapter, 5’ primer, coding sequence, polyA tail, 3’ primer, (AAA)n, (TTT)n, reads of insert]

DevNet: Iso-Seq wiki page

Axiom Genotyping Arrays

Best for your breeding program –

Now and tomorrow


Largest portfolio of catalog arrays in Agri-genomics

22 Catalog Designs


Axiom Genotyping Publications

Routine Use – 384-format

Robust, medium-density, very high-throughput, cost-effective assay for routine use across all animals

• Axiom 384-format – high throughput, ideal for genotyping applications in breeding

– Improves imputation accuracy

– Low cost

Breeder arrays

Key Features:

~50,000 markers per array 3,000+ samples/week throughput

Demystifying Expert Array Designs

Genotyping microarrays for 1,500 – 600,000 SNPs and indels

• 384-array plate: 1,500-50,000 markers
• 96-array plate: 50,000-600,000 markers

Pre-designed arrays

◦ Catalog (off the shelf)

◦ Expert Designs (custom arrays available to anyone)

myDesign custom arrays

◦ Any species

Custom arrays are the largest revenue drivers of the Ag genotyping business!

MyDesign Arrays Span All Application Needs

[Figure axis: multiplexing, low to high]

• Discovery: 675K-200K markers
• Genotype-trait association: 200K-90K markers
• Selection: 90K-50K markers
• Screening: 50K-1.5K markers

• 480 sample minimum volume
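The density tiers on this slide amount to a lookup from marker count to application. The sketch below is one way to express that mapping. The function name is hypothetical, and because adjacent tiers share a boundary value on the slide, the rule that a boundary count falls into the lower tier is an assumption.

```python
# (application, lowest marker count, highest marker count), as listed on the slide.
TIERS = [
    ("Screening", 1_500, 50_000),
    ("Selection", 50_000, 90_000),
    ("Genotype-trait association", 90_000, 200_000),
    ("Discovery", 200_000, 675_000),
]


def application_tier(n_markers):
    """Return the application tier for a custom array of n_markers.

    Assumed convention: a count that sits exactly on a shared boundary
    (for example 50,000) is assigned to the lower tier.
    """
    for name, low, high in TIERS:   # ordered from lowest to highest density
        if low <= n_markers <= high:
            return name
    raise ValueError(f"{n_markers} markers is outside the 1.5K-675K range")


print(application_tier(44_000))
print(application_tier(600_000))
```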

[Figure: Price/Sample versus SNP Array Density, with tiers at 1.5K, 50K, 90K, 200K and 675K]

More SNP Content for

No Additional Cost

• Same prices within SNP tiers

• More content at no additional cost


Axiom catalog and custom designs

Catalog designs

Plants

• Wheat

• Maize

• Rice

• Soybean

• Lettuce

• Pepper

• Apple

• Strawberry

• Rose

• Cotton

Animal

• Bovine

• Buffalo

• Chicken

• Equine

• Goat

• Mouse

• Ovine

• Porcine

• Rat

• Salmon

• Trout

• Turkey

Custom designs

Animals

•Bovine

•Buffalo

•Carp

•Catfish

•Chicken

•A. aegypti

•Eel

•Great tit

•Herring

•Mouse

•Pig

•Rat

•Salmon

•Sea lice

•Trout

•Yellow tail

Plants

•Alstroemeria Lily

•Barley

•Apple

•Chrysanthemum

•Japanese Cedar

•Lettuce

•Maize

•Potato

•Rice

•Rose

•Rye

•Sorghum

•Soybean

•Strawberry

•Sunflower

•Tobacco

•Tomato

•Wheat

•Brassica

•Rapeseed

•Pine

•Spruce

•Cabbage

• Crops, vegetables, fruits & trees: 26+ species
• Farm animals: 8+ species
• Aquaculture: 8+ species

Axiom Expert Design arrays

Test drive the experience: available to all!

>200k

Bovine

Chicken

Equine

Porcine

Q1 2015

Maize

Wheat

Apple

Q2 2015

90k-200k

Salmon Soybean

70k-90k

Buffalo Strawberry

Rose

50k

Trout

Goat

Q1 2015

Ovine

Q1 2015

Cotton

Rice 44k

Q1 2015

Axiom 384

Wheat

Maize 50k

Trout


Screening Arrays: Validate sequencing discoveries

• Sub-select markers
• Add new markers
• Select markers from multiple breeds
• Validate discovery across multiple samples
• Selection of markers using in silico design

Unique Advantage: Multi-Species format

[Figure: multi-species array format with Rice, Maize, Chickpea and Pigeon Pea]

Greater flexibility: choose a cost-effective, fast solution


Fast-track genomic selection using arrays

GBS experiments take longer to run; data analysis requires bioinformatics staff and can take weeks or months to complete.

GBS – Missing data

• 40%-60% missing data
• 30-50% errors in calling hets
• < 1000 markers common between samples

Relationship between amount of markers available using genotyping by sequencing techniques, proportion of missing data and cost per sample as observed with wheat. (doi: 10.3835/plantgenome)
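The two GBS metrics on this slide, the proportion of missing data and the number of markers common between samples, can be computed directly from a genotype matrix. The sketch below is a minimal illustration with made-up calls: the sample names, genotypes and function names are all hypothetical, and a missing call is represented as `None`.

```python
def missing_fraction(genotypes):
    """Fraction of all sample-by-marker calls that are missing (None)."""
    calls = [call for sample in genotypes.values() for call in sample]
    return sum(call is None for call in calls) / len(calls)


def markers_called_in_all(genotypes):
    """Number of markers with a genotype call in every sample."""
    columns = zip(*genotypes.values())   # one tuple of calls per marker
    return sum(all(call is not None for call in column) for column in columns)


# Hypothetical calls for 3 samples at 5 markers; None marks a missing call.
toy = {
    "sample_1": ["AA", None, "AB", "BB", None],
    "sample_2": ["AA", "AB", None, "BB", None],
    "sample_3": [None, "AB", "AB", "BB", "AA"],
}

print(round(missing_fraction(toy), 2))
print(markers_called_in_all(toy))
```

In this toy matrix a third of the calls are missing, yet only one of the five markers is usable across all samples, which is why scattered missing data shrinks the shared marker set so quickly.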


No beads! No missing SNPs

100% custom content on Axiom, 0% SNP dropout

Affymetrix array manufacture process ensures no batch effects

Bead-based arrays experience a ~5-20% difference in SNP content between different batches

For example: see Eeles et al., Nat Genet 40:316-321 (2008).

Initial Manufacture Event Subsequent Manufacture events

Customer content is present on the array, EVERY time

Bead-based arrays

[Figure: four grids of SNPs numbered 1-100]

• First bead pool: ~5% bead loss from pool
• Bead pool 2 and Bead pool 3: each missing a different 5 of the 100 SNPs
• Common SNPs across 3 bead pools: 85 of 100 (5-20% drop)
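The bead-pool figure on this slide can be checked with a set intersection. In the sketch below, the SNP numbers are those absent from the three 100-SNP bead-pool grids of the original slide. The order in which the pools are listed and the variable names are assumptions.

```python
ALL_SNPS = set(range(1, 101))   # the 100 SNPs of the original design

# SNP numbers absent from each of the three bead-pool grids on the slide
# (pools are listed in arbitrary order).
LOST_PER_POOL = [
    {13, 36, 56, 74, 79},
    {3, 30, 39, 64, 81},
    {27, 31, 61, 77, 100},
]

pools = [ALL_SNPS - lost for lost in LOST_PER_POOL]
common = set.intersection(*pools)   # SNPs present in every pool
drop_percent = 100 * (len(ALL_SNPS) - len(common)) / len(ALL_SNPS)

print(len(common), drop_percent)
```

Each pool loses only 5% of its content, but because different SNPs drop out each time, the content shared by all three pools falls by 15%, inside the 5-20% range quoted on the slide.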

Polyploidy & informatics

What is it?

More than 2 sets of chromosomes (humans are diploid = 2 sets)

Why do we care?

Non-human species can have complex genomes and varying ploidy. Polyploidy is especially common in plants. Wheat is hexaploid; strawberry is octoploid!

What are Affy’s capabilities?

Genotyping polyploids is complex because polyploidy causes cluster compression.

*Axiom is capable of genotyping allopolyploids and SOME autopolyploids

Diploid (Bovine) Polyploid (wheat)

Axiom is the only high density genotyping platform that can automatically call genotypes from polyploid species!
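Cluster compression can be illustrated with a simple dosage calculation. The sketch below is a simplified model, not a description of the Axiom calling algorithm: it assumes a SNP that segregates in only one subgenome of an allopolyploid, that the other subgenomes are fixed for allele A, and that the probe binds all copies equally. The function name is hypothetical.

```python
from fractions import Fraction


def expected_b_allele_fractions(n_subgenomes):
    """Expected B-allele signal fraction for the AA, AB and BB genotypes.

    Simplified assumption: the SNP varies in one subgenome only, every
    other subgenome contributes two copies of allele A, and all copies
    hybridise equally well.
    """
    total_copies = 2 * n_subgenomes
    return [Fraction(b_copies, total_copies) for b_copies in (0, 1, 2)]


for label, n in [("diploid (bovine)", 1), ("allohexaploid (wheat)", 3)]:
    print(label, [str(f) for f in expected_b_allele_fractions(n)])
```

Under these assumptions the three diploid clusters sit at 0, 1/2 and 1, while the hexaploid clusters are squeezed into 0, 1/6 and 1/3, so they lie much closer together and are harder to separate.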

Automated SNP calls for Polyploid Crops

Axiom genotyping categories

Setting the gold standard for displaying results

Features compared: Affymetrix GeneTitan Axiom Technology vs. Technology “X”

1. Array production consistency
• Axiom: Arrays are synthesized using highly sensitive photolithographic manufacturing technology. Flexibility to select SNPs from the Affymetrix database, user-defined SNPs, or SNPs from the initial array, and to redesign the array at a very high conversion rate, minimizing batch-to-batch variability.
• Technology “X”: 15-20% batch-to-batch SNP loss from the bead pool.

2. Array supply
• Axiom: 100% identical SNP content at any time and for as long as research necessitates. Continuous supply of plates, which can be ordered at any time with 100% reproducibility among batches.
• Technology “X”: Inconsistent supply of arrays. Bead pools expire in 12 months.

3. Support for polyploid genomes
• Axiom: Automated genotype-calling algorithm that analyzes diploid and polyploid genomes without the need for manual data editing.
• Technology “X”: Manual genotype checking and correction is required. This is labor intensive and time consuming, as it can require a whole day for every 1,000 SNPs that require checking. This is a considerable analysis burden, even for a low-density SNP panel.

4. Marker selection freedom
• Axiom: The Axiom Assay tolerates a single base-pair mismatch outside of a 10-base window from the SNP interrogation site.
• Technology “X”: The Infinium Assay does not support interfering SNPs within 60 bp of the SNP of interest.

5. Minimum commitment for customization of future arrays/versions
• Axiom: Only 480 arrays.
• Technology “X”: Minimum 1,152 arrays.
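The marker-selection row of this comparison can be turned into a simple design filter. The sketch below counts which candidate SNPs survive a 10-base exclusion window versus a 60 bp one. The positions are invented, the function name is hypothetical, and treating a neighbour at exactly the window distance as acceptable is an assumption about how the window is defined.

```python
def usable_candidates(candidates, known_snps, window_bp):
    """Candidates with no other known SNP closer than window_bp bases.

    Assumption: a neighbouring SNP at exactly window_bp is tolerated.
    """
    usable = []
    for cand in candidates:
        interfering = [
            pos for pos in known_snps
            if pos != cand and abs(pos - cand) < window_bp
        ]
        if not interfering:
            usable.append(cand)
    return usable


# Hypothetical positions (bp) on one chromosome.
known = [100, 125, 400, 405, 900]
candidates = [100, 400, 900]

print(usable_candidates(candidates, known, 10))   # 10-base window
print(usable_candidates(candidates, known, 60))   # 60 bp window
```

With these toy positions the candidate at 100 bp has a neighbour 25 bp away, so it passes the 10-base rule but fails the 60 bp rule.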

Axiom® Features

No SNP dropouts

• Semi-conductor based photo-lithographic technology

• All designed markers are accessible

More Marker Selection Flexibility

• Compatible with interfering SNPs 10 bp away from the candidate SNP

• Proven INDEL calling

• Multi-Species Design

Automated Genotyping Analysis

• Automated analysis of diploid and polyploid organisms

• No manual editing required

Candidate SNP Neighboring SNP

Other Platform

Missing valuable data

Inability to target specific markers

Tedious Manual Analysis


MassARRAY® Complements High-throughput Sequencing & Genomic Selection

de novo Discovery

Novel SNPs

Somatic Mutations

RNA-Seq

Meth-Seq

CNVs

Validation & Translation

Custom Assays

Multiplexing

QC/Tracking

High-throughput

Low cost

The MassARRAY® System is for Research Use Only. Not for use in diagnostic procedures.

Agena MassARRAY


Miniaturized SpectroCHIP

Data Analysis Packages

Mass Spectrometry

Biochemistry Processes

• Genotyping
• Methylation Analysis
• Quantitative Gene Analysis
• Comparative Sequence Analysis

All Sequenom Products and Assays Are For Research Use Only. Not For Use in Diagnostic Procedures.

Distinct advantages of MassARRAY for Nucleic Acid Analysis

1. We don’t use fluorescence

• Mass of the actual bioanalyte is detected - 4 decimal place accuracy

• No non-specific background issues – background is a different mass

• Incredible sensitivity – push PCR amplification to the max – single molecule detection possible

2. The system is quantitative

• Many biological phenomena need to be accurately quantified

• Allele ratios, gene copy number, gene expression, methylation

3. We have an ability to multiplex due to wide mass window and high resolution detector

• Provides high throughput

• Simple and flexible assay design with little optimization required

• Cost savings

4. The system is very flexible

• Small, medium and large scale studies

• Numbers of samples and markers are easily scaled

• Simple assay design and ordering of reagents

• Comprehensive Genetic Analysis >> Genetic – SNP, Transcriptome – Gene Expression, Epigenetic – methylation

• Adopted by leading centers engaged in basic, translational, clinical, and agricultural research

• 300+ systems worldwide

• 2000+ peer-reviewed publications to date

• 800k+ samples and ~1.2B genotypes analyzed on the MassARRAY system in the past year


iPLEX® Gold Genotyping: Rapid and Easy Workflow


iPLEX™ Biochemistry – SNP Genotyping & Mutation Detection

• iPLEX Gold for general research genotyping

• Up to 40-plex reactions as standard

• Low cost per genotype

• User assay design or Assays by Agena

• High assay design yield (+90%) with high genotyping call rates (+98%)

• High accuracy – published 99.7%

• Assay requires low cost plain oligos

• Can handle insertions/deletions (complex mutations)

• Small/medium to high throughput capability
– 24-, 96- or 384-well plate format
– 100s to 1000s of samples per day

• Very flexible
– Incorporating new assays & re-plexing
– Flexible study design in 384- or 96-well formats
– Rapidly design new assays and order oligos
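The plex level and plate formats listed above translate into a quick planning calculation. The sketch below estimates how many wells and plates a study would need. It is a rough illustration under stated assumptions: every assay is taken to multiplex at the full 40-plex, control wells are ignored, and the function name and example study size are hypothetical.

```python
import math


def iplex_plan(n_snps, n_samples, plex=40, wells_per_plate=384):
    """Return (wells per sample, total wells, plates) for a genotyping study.

    Assumes all SNPs can be packed into full multiplex reactions and that
    no wells are reserved for controls.
    """
    wells_per_sample = math.ceil(n_snps / plex)
    total_wells = wells_per_sample * n_samples
    plates = math.ceil(total_wells / wells_per_plate)
    return wells_per_sample, total_wells, plates


# Hypothetical study: 100 SNPs typed on 960 samples in 384-well plates.
print(iplex_plan(100, 960))
```

For this example the 100 SNPs need 3 wells per sample, giving 2,880 wells, or 8 plates of 384 wells.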


Thank You


The majority of wishes by devotees are visa related, thus Chilkur Balaji is also referred to as 'Visa' Balaji.