Adventures in improving the chicken genome & transcriptome

C. Titus Brown, Associate Professor, School of Veterinary Medicine, UC Davis

Jan 2015

Current state of chicken genome

● galGal2 (2004)

o Sanger sequencing (6.6X)

o Physical and genetic linkage maps

● galGal3 (2006)

o 198K additional reads (contig ends, regions of poor quality)

o SNP mapping

o chrZ and chrW

● galGal4 (2011)

o 454 (12X)

o -10 Mb artifactual duplications

o +15 Mb mapped to chromosomes

o increases in N50 contig size
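For readers unfamiliar with the N50 contig size mentioned above, here is a minimal sketch of how it is usually computed: sort contigs from longest to shortest and report the length at which the running total first reaches half of the assembly. The function name `n50` and the toy lengths are illustrative assumptions, not values from galGal4.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length
    raise ValueError("need at least one contig length")


# Hypothetical toy assembly (not chicken data): total length is 300.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

A larger N50 means that half of the assembled bases sit in longer contigs, which is why it is reported as an improvement between assembly releases.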

2. Microchromosomes...

● 10 macrochromosomes

● 28 microchromosomes

o GC rich

o high recombination rate

o high gene density

o low intron size

● not sequencing friendly!

Moleculo vs PacBio

Moleculo

● Cheaper

o High throughput

● Low error rate

o ~0%

● Same problems as Illumina…

PacBio

● No 3' bias

● No PCR

● High error rate

o ~15%

● Lower throughput

● "$$-plated genome"

Moleculo library preparation

Kuleshov et al. (2014), Nature Biotechnology 32, 261–266

Exploring Moleculo

● 1,578,022 reads

● Covers 88% of galGal4

● 326 reads unmapped to galGal4 (0.02%)

o Searched 5 random in ENA (exonerate)

o 3 matched Sediminibacterium sp...

Luiz Irber

Long reads, indeed!

Luiz Irber

Moleculo: fraction of reference covered

Luiz Irber
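A minimal sketch of how a "fraction of reference covered" number can be derived from read alignments: merge overlapping alignment intervals, sum the merged lengths, and divide by the reference length. This is an illustration under stated assumptions, not the analysis behind the slide; the function name `covered_fraction`, the half-open coordinate convention, and the toy intervals are all hypothetical.

```python
def covered_fraction(intervals, reference_length):
    """Fraction of a reference covered by half-open [start, end) intervals."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close the previous merged block.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching: extend the current merged block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / reference_length


# Hypothetical alignments on a 100 bp toy reference.
toy_alignments = [(0, 40), (30, 60), (80, 100)]
print(covered_fraction(toy_alignments, 100))
```

In practice the intervals would come from an aligner's output, one set per chromosome, and the per-chromosome fractions would be reported separately.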

But Moleculo does not contain missing genes… ;(

Search for de novo-assembled UniProt orthologs from chicken in (a) galGal4 genome, and (b) Moleculo data.

Luiz Irber

The missing exons are not in Moleculo data. Might be in PacBio.

So, now working with PacBio.

● Dealing with PacBio data

o Most tools break horribly

(It's getting better)

● Assembling PacBio data

o High error rate (~15%)

o Most assemblers target short reads

o PacBio-recommended assemblers interact poorly with MSU HPCC

Would like to produce a step-by-step protocol to do genome improvement or assembly with PacBio…

Luiz Irber

2) Evaluating effects of gene models on pathway prediction

Likit Preeyanon

Vertically integrated comparison.

GIMME: Software for Merging Gene Models

[Figure: pipeline diagram. Labels: Assembly-based; Local Assembly; Reference-guided; GIMME; Merged Models; In-house software; ENSEMBL. Note on the slide: Cufflinks can incorporate ENSEMBL.]

Exon Graph approach (“Gimme”)

[Figure: exon graph. Labels: exon1, intron1, exon2, intron2, exon3, with alternative forms Exon3.a and Exon3.b.]

Likit Preeyanon

https://github.com/ged-lab/gimme.git
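To make the exon graph idea concrete, here is a minimal sketch of one way such a graph can merge gene models: exons are nodes, consecutive exons in a transcript are joined by an edge, and every path from a first exon to a last exon is a candidate isoform. This is not Gimme's actual implementation (see the repository for that); the function names and all coordinates are hypothetical.

```python
from collections import defaultdict


def build_exon_graph(transcripts):
    """Union of exon-to-exon edges over transcripts from any source.

    Each transcript is a list of (start, end) exon tuples.
    """
    nodes = set()
    edges = defaultdict(set)
    for exons in transcripts:
        ordered = sorted(exons)
        nodes.update(ordered)
        for upstream, downstream in zip(ordered, ordered[1:]):
            edges[upstream].add(downstream)
    return nodes, edges


def enumerate_paths(nodes, edges):
    """List every path from an exon with no incoming edge to a terminal exon."""
    has_incoming = {d for targets in edges.values() for d in targets}
    starts = sorted(n for n in nodes if n not in has_incoming)
    paths = []

    def walk(node, path):
        path = path + [node]
        successors = sorted(edges.get(node, ()))
        if not successors:
            paths.append(path)
            return
        for nxt in successors:
            walk(nxt, path)

    for start in starts:
        walk(start, [])
    return paths


# Hypothetical coordinates: two models sharing exon1 and exon2 but
# ending in different versions of exon3 (3.a and 3.b).
exon1, exon2 = (100, 200), (300, 400)
exon3a, exon3b = (500, 600), (500, 650)
assembly_model = [exon1, exon2, exon3a]
reference_guided_model = [exon1, exon2, exon3b]

nodes, edges = build_exon_graph([assembly_model, reference_guided_model])
for path in enumerate_paths(nodes, edges):
    print(path)
```

Because edges always run from an upstream exon to a downstream one, the graph has no cycles and path enumeration terminates; a real tool would also need to handle strand, overlapping exons, and the number of paths in complex loci.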

Ensembl Enriched KEGG Pathway

Term                                            Count   Benjamini
Cytokine-cytokine receptor interaction          36      6.2E-02
Lysosome                                        25      1.2E-01
Apoptosis                                       19      3.5E-01
Arginine and proline metabolism                 12      3.1E-01
Starch and sucrose metabolism                   9       3.4E-01
Toll-like receptor signaling pathway            19      3.7E-01
Natural killer cell mediated cytotoxicity       17      3.4E-01
Cytosolic DNA-sensing pathway                   9       4.2E-01
Valine, leucine and isoleucine degradation      11      4.1E-01
Glutathione metabolism                          10      4.3E-01
NOD-like receptor signaling pathway             11      4.6E-01
Intestinal immune network for IgA production    9       5.6E-01
VEGF signaling pathway                          14      5.6E-01
PPAR signaling pathway                          13      6E-01

Gimme Enriched KEGG Pathway

Term                                            Count   Benjamini
Cytokine-cytokine receptor interaction          34      3.7E-02
Toll-like receptor signaling pathway            22      2.7E-02
Jak-STAT signaling pathway                      28      3.4E-02
Arginine and proline metabolism                 13      4.5E-02
Lysosome                                        22      1.3E-01
Natural killer cell mediated cytotoxicity       17      1.6E-01
Alanine, aspartate and glutamate metabolism     9       1.8E-01
Amino sugar and nucleotide sugar metabolism     10      3.6E-01
Cysteine and methionine metabolism              9       4E-01
ECM-receptor interaction                        16      3.7E-01
Apoptosis                                       16      3.7E-01
Glycolysis / Gluconeogenesis                    11      4E-01
DNA replication                                 8       3.8E-01
Cell adhesion molecules (CAMs)                  19      4.6E-01
PPAR signaling pathway                          12      6E-01
Intestinal immune network for IgA production    8       6.1E-01

Compared Enriched KEGG Pathway

Common:
  Cytokine-cytokine receptor interaction
  Toll-like receptor signaling pathway
  Lysosome
  Apoptosis
  Arginine and proline metabolism
  Natural killer cells
  Intestinal immune network for IgA production
  PPAR signaling pathway

Ensembl:
  Starch and sucrose
  Valine, leucine and isoleucine degradation
  Glutathione metabolism
  NOD-like receptor signaling pathway
  VEGF signaling pathway

Gimme:
  Jak-STAT signaling pathway
  Alanine, aspartate and glutamate metabolism
  Amino sugar and nucleotide sugar metabolism
  ECM-receptor interaction
  Cell adhesion molecules (CAMs)
  DNA replication
  Glycolysis / Gluconeogenesis
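The common / Ensembl / Gimme split can be reproduced with plain set operations on the term columns of the two enrichment tables above. This is a small sketch, not the authors' pipeline, and the variable names are illustrative. Note that set arithmetic on the full tables also places "Cytosolic DNA-sensing pathway" in the Ensembl-only group and "Cysteine and methionine metabolism" in the Gimme-only group; both appear in the enrichment tables but are not listed on the comparison slide.

```python
# Terms copied from the Ensembl enrichment table.
ensembl_terms = {
    "Cytokine-cytokine receptor interaction",
    "Lysosome",
    "Apoptosis",
    "Arginine and proline metabolism",
    "Starch and sucrose metabolism",
    "Toll-like receptor signaling pathway",
    "Natural killer cell mediated cytotoxicity",
    "Cytosolic DNA-sensing pathway",
    "Valine, leucine and isoleucine degradation",
    "Glutathione metabolism",
    "NOD-like receptor signaling pathway",
    "Intestinal immune network for IgA production",
    "VEGF signaling pathway",
    "PPAR signaling pathway",
}

# Terms copied from the Gimme enrichment table.
gimme_terms = {
    "Cytokine-cytokine receptor interaction",
    "Toll-like receptor signaling pathway",
    "Jak-STAT signaling pathway",
    "Arginine and proline metabolism",
    "Lysosome",
    "Natural killer cell mediated cytotoxicity",
    "Alanine, aspartate and glutamate metabolism",
    "Amino sugar and nucleotide sugar metabolism",
    "Cysteine and methionine metabolism",
    "ECM-receptor interaction",
    "Apoptosis",
    "Glycolysis / Gluconeogenesis",
    "DNA replication",
    "Cell adhesion molecules (CAMs)",
    "PPAR signaling pathway",
    "Intestinal immune network for IgA production",
}

common = ensembl_terms & gimme_terms
ensembl_only = ensembl_terms - gimme_terms
gimme_only = gimme_terms - ensembl_terms

for label, terms in (("Common", common),
                     ("Ensembl", ensembl_only),
                     ("Gimme", gimme_only)):
    print(f"{label} ({len(terms)}):")
    for term in sorted(terms):
        print(f"  {term}")
```

Exact string matching is fragile: a single spelling difference between annotation sources would split one pathway into two groups, so comparing stable pathway identifiers is safer when they are available.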

[Figure: comparison with panels labeled Ensembl, Common, and Gimme.]

INFB – we annotate UTR not present in other gene models.

INFB – 3’ bias + missing UTR => insensitive

Predicted Enriched Pathways

[Figure: panels labeled Ensembl, Common, and Gimme. Labels, in order of appearance: GOseq FDR 0.05; 20 pathways; 17 pathways; GOseq FDR 0.05; Chicken + Human KEGG Pathway; 40 pathways.]

RNAseq: your models matter

Our methods for generating hypotheses from mRNAseq data are sensitive to references & technical details of the approaches.

(This is expected but Bad.)

More RNAseq data coming every day.

…but we are not regularly updating gene models…

… and the genome that we have is Not Great.

Follow on Smith & Burt (2014) to continually regenerate gene models for differential expression use.

A general model for vet/ag animals?

Thanks!

Please contact me at [email protected]!