2016 BGI research
Potato Genome Analysis
Xin Liu, Deputy Director, BGI Research
2016.1.21, WCRTC 2016 @ Nanning
Reference genome construction
[Figure: sequencing turns the unknown genome ("????...") into short, overlapping fragments; assembly reconstructs the original sequence, illustrated with the phrase "HELLO FRIENDS WELCOME TO BGISHENZHEN".]
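The puzzle analogy on this slide can be tried out with a toy greedy overlap assembler. This is only a sketch of the general idea, not the assembler used for the potato genome: the four fragments, the function names and the minimum overlap of 3 characters are all assumptions chosen for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                k = overlap(a, b, min_len)
                if k > best_k:
                    best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: the assembly stays fragmented
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs


# Hypothetical fragments of the slide's phrase (spaces removed).
reads = ["HELLOFRIEND", "FRIENDSWELCO", "WELCOMETOBGI", "TOBGISHENZHEN"]
print(greedy_assemble(reads))  # ['HELLOFRIENDSWELCOMETOBGISHENZHEN']
```

Real genomes contain repeats and sequencing errors, so a greedy strategy like this breaks down quickly; it is only meant to show why overlapping fragments are needed.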
Second generation sequencing for assembly
Construct libraries with hierarchical insert sizes:
250 bp, 500 bp, 800 bp, 2 kb, 5 kb, 10 kb, 20 kb, 40 kb
Sequence the libraries to 60X genome coverage
De novo assembly
Annotation and evolutionary analysis
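As a rough planning aid, the amount of sequence needed for a target coverage is simply genome size times coverage. The sketch below uses the 850 Mb genome size quoted elsewhere in this deck; the 100 bp paired-end read length is an assumption, not a figure from the slides.

```python
def required_bases(genome_size_bp, coverage):
    """Total bases to sequence for the requested fold coverage."""
    return genome_size_bp * coverage


def read_pairs_needed(total_bases, read_length_bp):
    """Number of paired-end read pairs (two reads per pair), rounded up."""
    bases_per_pair = 2 * read_length_bp
    return -(-total_bases // bases_per_pair)  # ceiling division


genome_size = 850_000_000          # 850 Mb, as quoted in this deck
total = required_bases(genome_size, 60)
pairs = read_pairs_needed(total, 100)  # 100 bp reads: assumed

print(f"{total / 1e9:.1f} Gb")     # 51.0 Gb
print(f"{pairs:,} read pairs")     # 255,000,000 read pairs
```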
Genome survey
1. 30X data
2. K-mer analysis
3. Preliminary assembly
4. Heterozygosity simulation analysis
5. GC depth distribution analysis
1. Genome size
2. Heterozygosity rate
3. GC content
4. Repeat sequence proportion
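Two of the survey steps listed above, K-mer analysis and GC content, can be sketched in a few lines. This is a minimal illustration with assumed function names and toy reads; production K-mer counters also merge reverse-complement K-mers and handle sequencing errors, which is omitted here.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every K-mer of length k across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def depth_histogram(counts):
    """Map K-mer depth -> number of distinct K-mers seen at that depth."""
    return Counter(counts.values())


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


toy_reads = ["ACGTACGT", "CGTACGTA"]   # toy data for illustration only
counts = kmer_counts(toy_reads, 4)
print(dict(depth_histogram(counts)))   # {3: 2, 2: 2}
print(gc_content("ACGTACGT"))          # 0.5
```

The main peak of the depth histogram is what a genome size estimate is usually based on, and a secondary peak at about half that depth is a common sign of heterozygosity.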
Information of potato genome
• Autotetraploid (2n=4x=48)
• Highly heterozygous
• Heterozygous diploid available
• Doubled monoploid available
• Different datasets available
• Genome size: 850 Mb
Sample selection
DM1-3 516 R44 (DM) resulted from chromosome doubling of a monoploid (1n=1x=12) derived by anther culture of a heterozygous diploid (2n=2x=24) S. tuberosum group Phureja clone (PI 225669).
Heterozygosity affecting genome assembly
High heterozygosity breaks the assembly into shorter fragments.
Rei Kajitani, Kouta Toshimoto, Hideki Noguchi, et al.
Assess the genome
33,761,617,031 bases
Peak at 40
Genome size estimated to be: 844 Mb
S. tuberosum group Phureja DM1-3 516 R44
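The estimate on this slide follows from dividing the total count by the peak depth of the K-mer distribution. The sketch below reproduces that arithmetic; it assumes the 33,761,617,031 figure is the quantity being divided (the slide labels it as bases), and the function name is illustrative.

```python
def estimate_genome_size(total_count, peak_depth):
    """Genome size ~ total K-mer (or base) count divided by the peak depth."""
    if peak_depth <= 0:
        raise ValueError("peak_depth must be positive")
    return total_count / peak_depth


size_bp = estimate_genome_size(33_761_617_031, 40)
print(f"{size_bp / 1e6:.0f} Mb")  # 844 Mb
```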
The potato genome assembly
a: Chromosome karyotype
b: Gene density
c: Repeats coverage
d: Transcription state
e: GC content
f: Subtelomeric repeats distribution
727 Mb (86% of the genome), 6.1% Ns/gaps; N90 349 kb, 443 super scaffolds
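For readers unfamiliar with the N90 statistic quoted here: Nx is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover x% of the assembly. A minimal sketch, with an assumed function name and toy scaffold lengths rather than the real potato scaffolds:

```python
def nx(lengths, x):
    """Return (Nx length, number of scaffolds needed to reach x% of the total)."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length, count
    raise ValueError("lengths must not be empty")


toy_scaffolds = [100, 80, 60, 40, 20]   # toy lengths, total 300
print(nx(toy_scaffolds, 50))            # (80, 2)
print(nx(toy_scaffolds, 90))            # (40, 4)
```

Read this way, the slide's figures say that 443 super scaffolds, each at least 349 kb long, account for 90% of the assembly.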
Comparing to Sanger sequenced BACs
97.1% of 181,558 available Sanger-sequenced S. tuberosum ESTs are represented in the assembly
Anchoring to the chromosomes
Anchored 623Mb (86%) to chromosomes
With 90.3% of the genes on chromosomes
Gene annotation
[Flowchart: gene annotation pipeline]
Protein sequences: rough alignment, precise alignment, homology-based genes
cDNA/EST sequences: alignment, cDNA/EST genes
Genomic sequence: ab initio prediction, ab initio genes
Gene sets combination: combined gene set
RNA-seq reads: genome mapping, gene set modification
Post-filtering of TE proteins, synteny info.: final gene set
31.5 Gb of RNA-Seq data from 32 DM and 16 RH samples/tissues
90.2% of 824,621,408 DM reads and 88.6% of 140,375,647 RH reads mapped
39,031 protein-coding genes
9,875 genes (25.3%) had alternative splicing
12.1% derived solely from ab initio gene predictions
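The percentages on this slide can be checked against the counts with a few lines of arithmetic. The absolute number of ab initio-only genes is not given on the slide; the value computed below is derived from the rounded 12.1% and is therefore only approximate.

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


total_genes = 39_031
alt_spliced = 9_875

print(percent(alt_spliced, total_genes))   # 25.3

# Approximate count implied by "12.1% derived solely from ab initio predictions".
ab_initio_only = round(0.121 * total_genes)
print(ab_initio_only)                      # 4723
```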
Genome evolution – gene families
Monocots: Oryza sativa, Brachypodium distachyon, Sorghum bicolor, Zea mays
Eudicots: Arabidopsis thaliana, Carica papaya, Populus trichocarpa, Vitis vinifera, Glycine max
Algae, moss: Chlamydomonas reinhardtii, Physcomitrella patens
• 4,479 potato genes clustered in 3,181 families
• 34,051 potato genes clustered with at least one genome
• 2,642 genes are asterid-specific
• 3,372 genes are potato lineage-specific
Comparing RH and DM
• 1,644 RH BAC clones
• 178 Mb of non-redundant sequences (~10%)
• 99 Mb of RH sequence (55%) aligned to the DM genome
• The aligned regions have 97.5% identity
• One SNP every 40 bp and one indel (12.8 bp on average) every 394 bp between RH and DM
• 6.6 Mb of sequence could be aligned between the two haplotypes with 96.5% identity, one SNP per 29 bp and one indel per 253 bp (average length 10.4 bp)
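The SNP densities and identity values on this slide are roughly consistent with each other, which a quick calculation shows. The sketch counts SNPs only and ignores indels, so it is an approximation; the function name is illustrative.

```python
def identity_from_snp_spacing(bp_per_snp):
    """Approximate percent identity if mismatches occur once every `bp_per_snp` bases."""
    return 100 * (1 - 1 / bp_per_snp)


rh_vs_dm = identity_from_snp_spacing(40)     # one SNP every 40 bp
within_rh = identity_from_snp_spacing(29)    # one SNP per 29 bp

print(f"{rh_vs_dm:.1f}%")   # 97.5%
print(f"{within_rh:.2f}%")  # 96.55%
```

Both values sit close to the 97.5% and 96.5% identities quoted on the slide.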
Comparing at the whole genome level
1,118 million NGS reads (84X) from RH
457.3 million reads aligned to 659.1 Mb (90.6%) of DM genome
Premature stop, frame shift, presence/absence variants
3.67 million SNPs
Inbreeding depression
• 3,018 premature stop codons (606 homozygous and 2,412 heterozygous, 1,760 of which are specific)
• 80 frameshift mutations (49 homozygous and 31 heterozygous)
• 275 PAV genes (246 RH specific and 29 DM specific)
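The breakdowns on this slide add up to the stated totals, which is easy to verify. The dictionary layout below is just one assumed way to hold the slide's numbers; the heterozygous share is derived here and is not stated on the slide.

```python
# Counts copied from the slide: (total, {category: count}).
variants = {
    "premature stop codons": (3_018, {"homozygous": 606, "heterozygous": 2_412}),
    "frameshift mutations": (80, {"homozygous": 49, "heterozygous": 31}),
    "PAV genes": (275, {"RH specific": 246, "DM specific": 29}),
}

for name, (total, parts) in variants.items():
    # Each category breakdown should sum to the stated total.
    assert sum(parts.values()) == total, name

stop_total, stop_parts = variants["premature stop codons"]
het_share = round(100 * stop_parts["heterozygous"] / stop_total, 1)
print(het_share)  # 79.9
```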
Inbreeding depression
• One instance of copy number variation • Five genes with premature stop codons • Seven RH-specific genes
Tuber biology
15,235 genes expressed in the transition from stolons to tubers
1,217 transcripts with >5-fold expression in stolons versus five RH tuber tissues
333 transcripts upregulated during the transition from stolons to tubers, in particular proteinase inhibitors such as KTI (Kunitz protease inhibitor)
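A fold-change filter like the ">5-fold" criterion above can be sketched as follows. The gene names, expression values, pseudocount and function name are all hypothetical; the slides do not describe how the actual comparison was computed.

```python
def upregulated(expr_a, expr_b, min_fold=5.0, pseudocount=1.0):
    """Genes whose expression in condition A exceeds condition B by more than `min_fold`.

    A pseudocount is added to both values so that genes with zero
    expression in B do not cause a division by zero.
    """
    hits = []
    for gene, a in expr_a.items():
        b = expr_b.get(gene, 0.0)
        fold = (a + pseudocount) / (b + pseudocount)
        if fold > min_fold:
            hits.append(gene)
    return sorted(hits)


# Hypothetical expression values (arbitrary units).
stolon = {"geneA": 59.0, "geneB": 10.0, "geneC": 4.0}
tuber = {"geneA": 9.0, "geneB": 9.0, "geneC": 0.0}

print(upregulated(stolon, tuber))  # ['geneA']
```

In this toy data geneC reaches exactly 5-fold and is excluded, because the threshold is strict.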
Disease resistance
Many NBS-LRR genes are pseudogenes owing to indels, frameshift mutations, or premature stop codons (39.4%), including R1 and R3a among others. This might be driven by the rapid evolution of effector genes in the potato late blight pathogen, Phytophthora infestans.