Differential gene expression in Polistes dominula Daniel S. Standage, Brendel Group Meeting, 21 Nov 2013



Context

Basic differential expression analysis

Isoform-level analysis unreliable

Refocused on locus-level DE analysis

Interval loci (iLoci)

Partition genome into segments that contain

0 protein-coding genes

1 protein-coding gene

2 or more overlapping protein-coding genes

P. dominula genome contains 18,675 iLoci

8,531 with 0 genes

9,197 with 1 gene

947 with 2-5 genes
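The partitioning described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: genes are given as 1-based inclusive (start, end) pairs on a single sequence, overlapping genes are merged into one segment, and the stretches between them become gene-free segments. The function name `partition_into_iloci` and the example coordinates are hypothetical, and the procedure actually used to compute the iLoci may differ in its details.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive


def partition_into_iloci(seq_length: int,
                         genes: List[Interval]) -> List[Tuple[int, int, int]]:
    """Split positions 1..seq_length into (start, end, gene_count) segments."""
    iloci = []
    cursor = 1                      # first position not yet assigned to a segment
    cur_start = cur_end = None      # bounds of the gene cluster being built
    count = 0                       # number of genes in the current cluster
    for start, end in sorted(genes):
        if cur_start is not None and start <= cur_end:
            # Gene overlaps the current cluster: extend it.
            cur_end = max(cur_end, end)
            count += 1
            continue
        if cur_start is not None:
            # Close the previous cluster before starting a new one.
            iloci.append((cur_start, cur_end, count))
            cursor = cur_end + 1
        if start > cursor:
            # Gene-free stretch before this gene.
            iloci.append((cursor, start - 1, 0))
        cur_start, cur_end, count = start, end, 1
    if cur_start is not None:
        iloci.append((cur_start, cur_end, count))
        cursor = cur_end + 1
    if cursor <= seq_length:
        # Trailing gene-free stretch.
        iloci.append((cursor, seq_length, 0))
    return iloci


# Hypothetical toy sequence with two overlapping genes and one isolated gene.
example = partition_into_iloci(1000, [(100, 200), (150, 300), (500, 600)])
```

On the toy input, the two overlapping genes end up in one segment with a gene count of 2, the isolated gene gets its own segment, and the remaining stretches are reported with a count of 0.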

Out-of-the-box analysis

RSEM: estimate expression levels for each sample independently (uses Bowtie to align reads)

Combine expression data into a single matrix

EBSeq: normalize expression levels and identify differentially expressed genes
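The middle step, combining per-sample estimates into one matrix, might look like the sketch below. It assumes each sample has an RSEM gene-level results table (tab-delimited, with `gene_id` and `expected_count` columns) and that iLoci were used as the "genes". The sample names, iLocus IDs, and values are made up for illustration, and the helper names are hypothetical. The EBSeq step itself runs in R and is not reproduced here.

```python
import csv
import io
from typing import Dict, List, Tuple


def read_expected_counts(handle) -> Dict[str, float]:
    """Map each gene_id to its expected_count from one RSEM results table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["gene_id"]: float(row["expected_count"]) for row in reader}


def build_matrix(samples: Dict[str, Dict[str, float]]
                 ) -> Tuple[List[str], List[str], List[List[float]]]:
    """Return (sample names, feature IDs, rows) with one row per feature."""
    names = sorted(samples)
    ids = sorted(set().union(*(s.keys() for s in samples.values())))
    # Features absent from a sample's table are treated as zero counts.
    rows = [[samples[n].get(i, 0.0) for n in names] for i in ids]
    return names, ids, rows


# Made-up tables standing in for two per-sample result files.
HEADER = "gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM\n"
q1 = io.StringIO(HEADER + "iLocus1\tt1\t1000\t900\t12.00\t5.0\t4.0\n"
                          "iLocus2\tt2\t2000\t1900\t0.00\t0.0\t0.0\n")
w1 = io.StringIO(HEADER + "iLocus1\tt1\t1000\t900\t30.00\t9.0\t8.0\n"
                          "iLocus2\tt2\t2000\t1900\t7.00\t2.0\t1.5\n")

names, ids, matrix = build_matrix({"Q1": read_expected_counts(q1),
                                   "W1": read_expected_counts(w1)})
```

In practice the `io.StringIO` objects would be replaced by open file handles, one per sample.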

Results and observations

295 differentially expressed iLoci

Grouping of samples is troubling

Concerns similar to those in the previous analysis

Some iLoci with very many reads mapped

Some iLoci with very few reads mapped

Concerns about normalizing over such a large dynamic range

Results and observations

294 differentially expressed iLoci

Grouping of samples is troubling

Concerns similar to those in the previous analysis

Some iLoci with very many reads mapped

Some iLoci with very few reads mapped

Concerns about normalizing over such a large dynamic range

iLocus filtering

Filtered the iLoci based on

Number of reads mapped

Number of samples with reads mapped

Distribution of mapped reads across samples

10,043 / 18,675 iLoci (54%) passed filtering criteria

Re-ran RSEM/EBSeq procedure from scratch
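A filter along the three criteria listed above could be sketched as follows. The slides do not give the actual thresholds, so `min_total`, `min_samples`, and `max_share` are hypothetical placeholders, and "distribution across samples" is interpreted here, as an assumption, as "no single sample contributes almost all of the reads".

```python
from typing import List


def passes_filter(counts: List[float],
                  min_total: float = 10.0,   # hypothetical threshold
                  min_samples: int = 3,      # hypothetical threshold
                  max_share: float = 0.9     # hypothetical threshold
                  ) -> bool:
    """Decide whether one iLocus (read counts per sample) is kept."""
    total = sum(counts)
    if total <= 0 or total < min_total:
        return False                          # too few reads mapped overall
    expressed = sum(1 for c in counts if c > 0)
    if expressed < min_samples:
        return False                          # reads in too few samples
    # Reject iLoci whose reads come almost entirely from one sample.
    return max(counts) / total <= max_share


# Made-up count vectors for three iLoci across four samples.
kept = [passes_filter(c) for c in ([5, 4, 6, 3], [0, 0, 1, 0], [100, 1, 1, 0])]
```

With these placeholder thresholds, the first vector passes, the second fails on total reads, and the third fails because one sample dominates.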

New results

123 differentially expressed iLoci

1 sample (queen 4) still inconsistently grouped

Analysis sans Q4

Removed the Q4 sample and re-ran EBSeq step

Verified normalization is working as we expected

Found very clean result
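One way to check a normalization by hand is to recompute per-sample size factors and compare them with what the tool reports. The slides do not say which normalization was applied; the sketch below assumes a median-of-ratios scheme (the approach popularized by DESeq, which EBSeq's documentation describes for its `MedianNorm` function). It is a plain-Python illustration, not EBSeq's code, and the matrix is made up.

```python
import math
from statistics import median
from typing import List


def size_factors(matrix: List[List[float]]) -> List[float]:
    """Median-of-ratios size factors; rows are iLoci, columns are samples."""
    n_samples = len(matrix[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in matrix:
        if any(c <= 0 for c in row):
            continue  # geometric mean is undefined when a count is zero
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # median() raises StatisticsError if no row had all-positive counts.
    return [math.exp(median(r)) for r in log_ratios]


# Made-up counts: the second sample was sequenced twice as deeply.
counts = [[10, 20], [100, 200], [5, 10]]
factors = size_factors(counts)
normalized = [[c / f for c, f in zip(row, factors)] for row in counts]
```

After dividing by the size factors, the two columns of this toy matrix agree, which is the behavior one would expect when depth is the only difference between samples.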

Analysis sans Q4

Identified 314 differentially expressed iLoci

219 (70%) over-expressed in workers

95 contain 0 genes

197 contain 1 gene

22 contain 2 or more genes

Biological interpretation

Manual analysis of DE iLoci

xGDBvm

yrGATE

Two protein families occurred very frequently

Cytochrome P450s

NADH dehydrogenases

5 questions

How many CYP genes are in the wasp genome?

What percentage of these CYP genes are DE?

Do CYPs and NADH dehydrogenases belong to the same pathways?

Can the CYP genes in the genome be categorized?

Can reads discarded during genome assembly provide insight into mitochondrial contamination?

CYPs in Polistes dominula

Identified with a basic BLASTP search

Query: translations of Maker annotations

Database: Hymenopteran CYPs from NCBI

154 iLoci potentially contain CYP genes

Not all matched queries represent CYPs

Stricter criteria required for high-confidence count
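Stricter criteria could be applied directly to the BLASTP hits. The sketch below assumes the search was saved in BLAST's 12-column tabular format (`-outfmt 6`) and keeps only queries with at least one hit below an E-value cutoff and above identity and alignment-length cutoffs. The cutoffs, query IDs, and subject IDs are hypothetical; the slides do not state what criteria would be used.

```python
import csv
import io
from typing import Set

# Default column order of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def confident_queries(handle,
                      max_evalue: float = 1e-20,   # hypothetical cutoff
                      min_pident: float = 40.0,    # hypothetical cutoff
                      min_length: int = 200        # hypothetical cutoff
                      ) -> Set[str]:
    """Return query IDs with at least one hit meeting all three cutoffs."""
    keep = set()
    for row in csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t"):
        if (float(row["evalue"]) <= max_evalue
                and float(row["pident"]) >= min_pident
                and int(row["length"]) >= min_length):
            keep.add(row["qseqid"])
    return keep


# Made-up hits: one strong match, one short weak match.
hits = io.StringIO(
    "mrna1\tcypA\t62.5\t480\t180\t0\t1\t480\t5\t484\t1e-150\t520\n"
    "mrna2\tcypB\t31.0\t90\t62\t0\t10\t99\t200\t289\t2e-05\t48.1\n"
)
strong = confident_queries(hits)
```

Raising the cutoffs trades sensitivity for confidence, so the resulting count should be read as a lower bound on CYP-containing iLoci rather than a final answer.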

Differentially expressed CYP genes

Took intersection of 2 lists

mRNAs from DE iLoci

mRNAs potentially encoding CYPs

Identified 12 putative DE CYPs

11 verified manually

9 / 11 over-expressed in queens
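The intersection step is simple enough to show in full. This is a minimal sketch assuming both lists use the same mRNA identifiers; the IDs below are hypothetical, and the function name is not from the original analysis.

```python
from typing import Iterable, List


def de_cyp_candidates(de_mrnas: Iterable[str],
                      cyp_mrnas: Iterable[str]) -> List[str]:
    """mRNAs that are both in DE iLoci and potential CYPs, sorted by ID."""
    return sorted(set(de_mrnas) & set(cyp_mrnas))


# Made-up identifiers for illustration only.
de_list = ["mrna3", "mrna7", "mrna9"]
cyp_list = ["mrna1", "mrna7", "mrna9", "mrna12"]
candidates = de_cyp_candidates(de_list, cyp_list)
```

Because the CYP list comes from a permissive BLASTP search, the intersection yields putative candidates that still need the manual verification described above.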

DE NADH dehydrogenase genes

BLASTP search found 38 potential NADHdh genes

12-15 DE NADHdh genes

16 putative DE NADHdh genes

1 thrown out by manual examination

3 borderline

14 / 15 are over-expressed in workers