New era for molecular
breeding with cost effective
SNP genotyping solutions
Dr. Bhaswar Maity
Imperial Life Sciences 18/2/2015
ICRISAT
Plant Genomes up to 2013
• Incomplete:
– Average completeness: 85%
• Fragmented:
– Average contiguity: 20 kb
Michael & Jackson (2013) The Plant Genome 6: 1-7.
Hyderabad was known originally as ?????
Hyderabad was known originally as Bhagyanagar, a city Sultan Muhammad
Quli of the Qutub Shahi dynasty had founded and named after his beloved
Bhagmati or Bhagyamati in 1590. Once she entered the royal household and
embraced Islam, she was rechristened Hydermahal and as a natural
consequence, the city got its second name, Hyderabad.
P6-C4: Read Length Performance
P6-C4, 4 hr movie, 20 kb BluePippin™ size-selected E. coli (1 SMRT Cell)
• N50 read length: >15 kb
• 95th percentile: >20 kb
• Maximum read length: >40 kb
• Throughput per SMRT Cell: >800 Mb
200,000+ SNPs Were Missed In Short-Read Assemblies
In collaboration with Joe Ecker at Salk Institute for Biological Studies
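(Figure: Venn diagrams of SNP counts, Illumina paired-end (ILMN PE) vs PacBio assembly)
• Ler0: ILMN PE only 27,106; shared 509,836 (95%/68%); PacBio Ler0 assembly only 238,637
• Cvi: ILMN PE only 55,947; shared 685,104 (92%/72%); PacBio Cvi assembly only 271,335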
Mapping of Illumina® PE or PacBio®
Assemblies to TAIR 10
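The percentage pairs in the Venn diagrams can be checked from the SNP counts themselves. The sketch below is only an arithmetic check; it assumes 238,637 is the PacBio-only count for Ler0 and 271,335 the PacBio-only count for Cvi, which is the assignment that reproduces the 95%/68% and 92%/72% labels. The names `venn` and `shared_fractions` are illustrative.

```python
# SNP counts read off the Venn diagrams. Assigning each PacBio-only count
# to an accession is an assumption, chosen to match the percentage labels.
venn = {
    "Ler0": {"illumina_only": 27_106, "shared": 509_836, "pacbio_only": 238_637},
    "Cvi": {"illumina_only": 55_947, "shared": 685_104, "pacbio_only": 271_335},
}


def shared_fractions(counts):
    """Return (shared / all Illumina SNPs, shared / all PacBio SNPs)."""
    shared = counts["shared"]
    of_illumina = shared / (shared + counts["illumina_only"])
    of_pacbio = shared / (shared + counts["pacbio_only"])
    return of_illumina, of_pacbio


for accession, counts in venn.items():
    of_illumina, of_pacbio = shared_fractions(counts)
    print(f"{accession}: {of_illumina:.0%} of Illumina SNPs shared, "
          f"{of_pacbio:.0%} of PacBio SNPs shared")
```

Under this reading, the first percentage in each pair is the share of Illumina-called SNPs also found by PacBio, and the second is the share of PacBio-called SNPs also found by Illumina.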
“Not only did PacBio discover pretty much everything that Illumina paired-end reads was able to find, in this case 95% and 92%, it identified another 250,000 of these variants”
Chongyuan Luo, Ph.D The Salk Institute for Biological Studies
Resolving the Complexity of Genomic and
Epigenetic Variations in Arabidopsis PAG 2014 Workshop Recording
@ blog.pacificsciences.com
Resolve Gene Duplications in Difficult BACs
Aluminum tolerance in maize is important for drought resistance and protecting against nutrient deficiencies
• A segregating population localized a QTL to a BAC, but the region could not be genotyped with short-read sequencing because of high repeat content and GC skew
• BAC assembly with PacBio® long reads revealed a triplication of the ZmMATE1 membrane transporter
Maron, LG et al. (2012) A rare gene copy-number variant that contributes to maize aluminum tolerance and adaptation to acid soils. PNAS
Genomic organization of the MATE1 locus
Resolve Difficult Genomic Regions
Novel patterns of higher-order repeat structures in Switchgrass centromeres:
Melters et al. (2013) Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution. Genome Biology, 14:R10
SampleNet: Iso-Seq Method with Clontech® cDNA Synthesis Kit
PacBio’s Iso-Seq™ Method for High-quality, Full-length Transcripts
Figure 1. Iso-Seq workflow
• Experimental pipeline:
– polyA mRNA
– cDNA synthesis with adapters
– Size partitioning & PCR amplification
– SMRTbell™ ligation
– PacBio® RS II sequencing
• Informatics pipeline:
– PacBio raw sequence reads
– Remove adapters, remove artifacts: clean sequence reads
– Reads clustering: isoform clusters
– Consensus calling: nonredundant transcript isoforms
– Quality filtering: final isoforms
– Map to reference genome: evidence-based gene models
• Read diagram labels: SMRT® adapter, 5’ primer, coding sequence, polyA tail, 3’ primer, Reads of Insert
DevNet: Iso-Seq wiki page
Routine Use – 384-format
Robust, medium-density, very high-throughput, cost-effective assay for routine use across all animals
• Axiom 384-format – High throughput ideal for
genotyping applications in breeding
– Improves imputation accuracy
– Low cost
Breeder arrays
Key Features:
• ~50,000 markers per array
• 3,000+ samples/week throughput
Demystifying Expert Array Designs
Genotyping microarrays for 1,500 – 600,000 SNPs and indels
• 384-array plate: 1,500-50,000 markers
• 96-array plate: 50,000-600,000 markers
Pre-designed arrays
◦ Catalog (off the shelf)
◦ Expert Designs (custom arrays available to anyone)
myDesign custom arrays
◦ Any species
Custom arrays are the largest revenue drivers of the Ag genotyping business!
MyDesign Arrays Span All Application Needs
(Figure axis: multiplexing, low to high)
• Discovery: 675K-200K markers
• Genotype-trait association: 200K-90K markers
• Selection: 90K-50K markers
• Screening: 50K-1.5K markers
• 480 sample minimum volume
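As a rough illustration, the application tiers on this slide can be read as a lookup from marker count to application. The sketch below assumes that a count falling exactly on a shared boundary (50K, 90K, 200K) belongs to the lower tier; the slide does not say which side the boundaries fall on, and `TIERS` and `application_tier` are hypothetical names.

```python
# (upper bound in markers, application) pairs, lowest tier first.
# Boundary handling is an assumption: a boundary value goes to the lower tier.
TIERS = [
    (50_000, "Screening"),
    (90_000, "Selection"),
    (200_000, "Genotype-trait association"),
    (675_000, "Discovery"),
]
MIN_MARKERS = 1_500  # smallest array size named on the slide


def application_tier(n_markers):
    """Map a marker count to the application tier listed on the slide."""
    if n_markers < MIN_MARKERS:
        raise ValueError(f"below the {MIN_MARKERS}-marker minimum")
    for upper_bound, application in TIERS:
        if n_markers <= upper_bound:
            return application
    raise ValueError("above the largest tier (675K markers)")


for n in (1_500, 60_000, 150_000, 675_000):
    print(n, "->", application_tier(n))
```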
SNP Array Density vs Price/Sample
(Figure: price per sample plotted against SNP array density at 1.5K, 50K, 90K, 200K and 675K)
More SNP Content for
No Additional Cost
• Same prices within SNP tiers
• More content at no additional cost
Axiom catalog and custom designs
Catalog designs
Plants
• Wheat
• Maize
• Rice
• Soybean
• Lettuce
• Pepper
• Apple
• Strawberry
• Rose
• Cotton
Animals
• Bovine
• Buffalo
• Chicken
• Equine
• Goat
• Mouse
• Ovine
• Porcine
• Rat
• Salmon
• Trout
• Turkey
Custom designs
Animals
•Bovine
•Buffalo
•Carp
•Catfish
•Chicken
•A. aegypti
•Eel
•Great tit
•Herring
•Mouse
•Pig
•Rat
•Salmon
•Sea lice
•Trout
•Yellowtail
Plants
•Alstroemeria Lily
•Barley
•Apple
•Chrysanthemum
•Japanese Cedar
•Lettuce
•Maize
•Potato
•Rice
•Rose
•Rye
•Sorghum
•Soybean
•Strawberry
•Sunflower
•Tobacco
•Tomato
•Wheat
•Brassica
•Rapeseed
•Pine
•Spruce
•Cabbage
Crops, vegetables, fruits & trees
26+ species
Farm animals
8+ species
Aquaculture
8+ species
Axiom Expert Design arrays
Test drive the experience: available to all!
>200k
Bovine
Chicken
Equine
Porcine
Q1 2015
Maize
Wheat
Apple
Q2 2015
90k-200k
Salmon Soybean
70k-90k
Buffalo Strawberry
Rose
50k
Trout
Goat
Q1 2015
Ovine
Q1 2015
Cotton
Rice 44k
Q1 2015 Axiom
384
Wheat
Maize 50k
Trout
Screening Arrays: Validate sequencing discoveries
• Sub-select markers
• Add new markers
• Select markers from multiple breeds
• Validate discovery across multiple samples
• Selection of markers using in silico design
Unique Advantage: Multi-Species format
• Rice
• Maize
• Chickpea
• Pigeon pea
Greater flexibility: choose a cost-effective, fast solution
Fast track genomic selection using arrays
GBS experiments take longer to run; data analysis requires bioinformatics staff and can take weeks or months to complete.
GBS – Missing data
• 40%-60% missing data
• 30-50% errors in calling hets
• < 1000 markers common between samples
Relationship between amount of markers
available using genotyping by sequencing
techniques, proportion of missing data
and cost per sample as observed with
wheat. (doi: 10.3835/plantgenome)
No beads! No missing SNPs
100% custom content on Axiom, 0% SNP dropout
Affymetrix array manufacture process ensures no batch effects
Bead-based array experiences ~5-20% difference in SNP content between different batches
For example: see Eeles et al., Nat Genet 40:316-321 (2008).
Initial Manufacture Event Subsequent Manufacture events
Customer content is present on the array, EVERY time
(Figure: grids of 100 numbered SNPs for three bead pools and for the SNPs common to all three)
Bead-based arrays
• First bead pool: ~5% bead loss from pool
• Bead pool 2 and bead pool 3: 5 of 100 SNPs missing from each, a different set each time
• Common SNPs across 3 bead pools: 85 of 100
• 5-20% drop
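The bead-pool illustration amounts to a set intersection: each pool loses a different ~5% of SNPs, so the content common to every batch shrinks with each new pool. The sketch below uses the SNP numbers absent from the three pool grids in the original figure; which grid corresponds to which pool is an assumption, and the variable names are illustrative.

```python
ALL_SNPS = set(range(1, 101))  # the figure numbers SNPs 1..100

# SNPs absent from each of the three bead-pool grids in the figure.
# Which grid is "first", "pool 2" or "pool 3" is an assumption.
missing_per_pool = [
    {13, 36, 56, 74, 79},
    {3, 30, 39, 64, 81},
    {27, 31, 61, 77, 100},
]

pools = [ALL_SNPS - missing for missing in missing_per_pool]
common = set.intersection(*pools)

for number, pool in enumerate(pools, start=1):
    print(f"bead pool {number}: {len(pool)} of {len(ALL_SNPS)} SNPs present")
print(f"common to all pools: {len(common)} of {len(ALL_SNPS)} SNPs")
```

Because the losses do not overlap in this illustration, three pools with 5 missing SNPs each leave 85 SNPs in common, which sits inside the 5-20% drop quoted on the slide.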
Polyploidy & informatics
What is it?
More than 2 sets of chromosomes (humans are diploid = 2 sets)
Why do we care?
Non-human species can have complex genomes and varying ploidy. Polyploidy is especially common in plants: wheat is hexaploid, strawberry is octoploid!
What are Affy’s capabilities?
Genotyping polyploids is complex as it causes cluster compression.
(Figure: genotype cluster plots, Diploid (Bovine) vs Polyploid (wheat))
Axiom is the only high density genotyping platform that can automatically call genotypes from polyploid species!
*Axiom is capable of genotyping allopolyploids and SOME autopolyploids
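One way to picture cluster compression is through expected allele dosage. The sketch below is a simplified model, not the Axiom calling algorithm: it assumes signal is proportional to allele copy number and that, in an allohexaploid such as wheat, the SNP segregates in one subgenome while the other two are fixed for allele A. The function name `b_allele_fractions` is hypothetical.

```python
from fractions import Fraction


def b_allele_fractions(ploidy, variable_copies=2):
    """Expected B-allele fraction for each genotype class.

    Assumes `variable_copies` chromosome copies carry the segregating SNP
    and the remaining (ploidy - variable_copies) copies are fixed for A.
    """
    return [Fraction(b_count, ploidy) for b_count in range(variable_copies + 1)]


diploid = b_allele_fractions(2)    # genotype classes AA, AB, BB
hexaploid = b_allele_fractions(6)  # same three classes plus four fixed A copies

print("diploid:  ", [str(f) for f in diploid])
print("hexaploid:", [str(f) for f in hexaploid])
print("spread between outer clusters:", diploid[-1] - diploid[0],
      "vs", hexaploid[-1] - hexaploid[0])
```

Under these assumptions the three clusters span the full range in a diploid but only a third of it in a hexaploid, so they sit much closer together and are harder to separate.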
Feature comparison: Affymetrix GeneTitan Axiom Technology vs Technology “X”
1. Arrays production consistency
– Axiom: Arrays are synthesized using highly sensitive photolithographic manufacturing technology. Flexibility to select SNPs from the Affymetrix database, user-defined SNPs, or SNPs of the initial array, and to redesign the array at a very high conversion rate, minimizing batch-to-batch variability.
– Technology “X”: 15-20% batch-to-batch SNP loss from bead pool.
2. Arrays supply
– Axiom: 100% identical SNP content at any time and for as long as research necessitates. Continuous supply of plates, which can be ordered at any time with 100% reproducibility among batches.
– Technology “X”: Inconsistent supply of arrays. Bead pools expire in 12 months.
3. Support for polyploid genomes
– Axiom: Automated genotype-calling algorithm which performs analysis of diploid and polyploid genomes without the need for manual data editing.
– Technology “X”: Manual genotype checking and correction is required. This is labor intensive and time consuming, as it can require a whole day for every 1,000 SNPs that require checking. This is a considerable analysis burden, even for a low-density SNP panel.
4. Marker selection freedom
– Axiom: Axiom Assay tolerates a single base-pair mismatch outside of a 10 bp window from the SNP interrogation site.
– Technology “X”: Infinium Assay does not support interfering SNPs within 60 bp of the SNP of interest.
5. Minimum commitment for customization of future arrays/versions
– Axiom: Only 480 arrays.
– Technology “X”: Minimum 1,152 arrays.
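The marker-selection comparison comes down to how close a neighboring SNP may sit to a candidate SNP. The sketch below is a simplified screen, not either vendor's design software: it lists known SNPs within a given window of each candidate, using the 10 bp and 60 bp windows quoted above, and treats "within N bp" as a distance of N or less (an assumption). The positions are made up for illustration and the function name is hypothetical.

```python
def interfering_neighbors(candidate_pos, known_snp_positions, window_bp):
    """Return known SNP positions within window_bp of the candidate SNP."""
    return sorted(
        pos for pos in known_snp_positions
        if pos != candidate_pos and abs(pos - candidate_pos) <= window_bp
    )


# Hypothetical positions on one chromosome, for illustration only.
known_snps = [985, 1004, 1045, 1100, 2030]
candidates = [1000, 2000]

for candidate in candidates:
    near_10 = interfering_neighbors(candidate, known_snps, window_bp=10)
    near_60 = interfering_neighbors(candidate, known_snps, window_bp=60)
    print(f"candidate {candidate}: within 10 bp {near_10}, within 60 bp {near_60}")
```

In this toy data the candidate at 2000 has a neighbor 30 bp away, so it is clear under a 10 bp window but flagged under a 60 bp window.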
Axiom® Features
No SNP dropouts
• Semi-conductor based photo-lithographic technology
• All designed markers are accessible
More Marker Selection Flexibility
• Compatible with interfering SNPs 10 bp away from candidate SNP
• Proven INDEL calling
• Multi-Species Design
(Figure labels: Candidate SNP, Neighboring SNP)
Automated Genotyping Analysis
• Automated analysis of diploid and polyploid organisms
• No manual editing required
Other Platform
• Missing valuable data
• Inability to target specific markers
• Tedious manual analysis
MassARRAY® Complements High-throughput Sequencing & Genomic Selection
de novo Discovery
Novel SNPs
Somatic Mutations
RNA-Seq
Meth-Seq
CNVs
Validation & Translation
Custom Assays
Multiplexing
QC/Tracking
High-throughput
Low cost
The MassARRAY® System is for Research Use Only. Not for use in diagnostic procedures. 35 | Improving healthcare through revolutionary genetic analysis solutions
Agena MassARRAY
Miniaturized SpectroCHIP
Data Analysis Packages
Mass Spectrometry
Biochemistry Processes
• Genotyping
• Methylation Analysis
• Quantitative Gene Analysis
• Comparative Sequence Analysis
All Sequenom Products and Assays Are For Research Use Only. Not For Use in Diagnostic Procedures.
Distinct advantages of MassARRAY for Nucleic Acid Analysis
1. We don’t use fluorescence
• Mass of the actual bioanalyte is detected - 4 decimal place accuracy
• No non-specific background issues – background is a different mass
• Incredible sensitivity – push PCR amplification to the max – single molecule detection possible
2. The system is quantitative
• Many biological phenomena need to be accurately quantified
• Allele ratios, gene copy number, gene expression, methylation
3. We have an ability to multiplex due to wide mass window and high resolution detector
• Provides high throughput
• Simple and flexible assay design with little optimization required
• Cost savings
4. The system is very flexible
• Small, medium and large scale studies
• Numbers of samples and markers are easily scaled
• Simple assay design and ordering of reagents
• Comprehensive Genetic Analysis >> Genetic – SNP, Transcriptome – Gene Expression, Epigenetic – methylation
• Adopted by leading centers engaged in basic, translational, clinical, and agricultural research
• 300+ systems worldwide
• 2000+ peer-reviewed publications to date
• 800k+ samples and ~1.2B genotypes analyzed on the MassARRAY system in the past year
iPLEX ® Gold Genotyping: Rapid and Easy Workflow
iPLEX™ Biochemistry – SNP Genotyping & Mutation Detection
• iPLEX Gold for general research genotyping
• Up to 40-plex reactions as standard
• Low cost per genotype
• User assay design or Assays by Agena
• High assay design yield (+90%) with high genotyping call rates (+98%)
• High accuracy – published 99.7%
• Assay requires low cost plain oligos
• Can handle insertions/deletions (complex mutations)
• Small, medium, to high-throughput capability
– 24-, 96- or 384-well plate format
– 100s to 1000s of samples per day
• Very flexible: incorporating new assays & re-plexing
– Flexible study design in 384- or 96-well formats
– Rapidly design new assays and order oligos
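The plex level and plate format above translate into a simple planning calculation. The sketch below is back-of-the-envelope arithmetic only: it assumes every SNP can be packed into full 40-plex reactions, that each sample needs one well per multiplex, and it ignores control wells and failed assays. The function name `iplex_plan` and its parameters are hypothetical.

```python
import math


def iplex_plan(n_snps, n_samples, max_plex=40, wells_per_plate=384):
    """Estimate multiplexes, wells and plates for an iPLEX genotyping run.

    Assumes perfect packing of SNPs into max_plex-sized multiplexes and
    one well per sample per multiplex (no controls, no repeats).
    """
    n_multiplexes = math.ceil(n_snps / max_plex)
    n_wells = n_multiplexes * n_samples
    n_plates = math.ceil(n_wells / wells_per_plate)
    return {
        "multiplexes": n_multiplexes,
        "wells": n_wells,
        "plates": n_plates,
        "genotypes": n_snps * n_samples,
    }


plan = iplex_plan(n_snps=100, n_samples=1000)
print(plan)
```

In practice assay design rarely fills every multiplex completely, so the real number of reactions is likely to be somewhat higher than this estimate.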
The majority of wishes by devotees are visa related, thus Chilkur Balaji is also referred to as 'Visa' Balaji.