A Quality Control pipeline for ChIP-exo and ChIP-nexus data.

Rene Welch & Sündüz Keleş

Department of Statistics
University of Wisconsin - Madison

June 24, 2016

https://drive.google.com/open?id=0B8I3K0KB8Sh1U3lvbGxhejh6T2M

A Quality Control pipeline for ChIP-exo and ChIP-nexus data. UW-STAT

What are ChIP-exo and ChIP-nexus?

[Figure: schematic of a TF bound to DNA as profiled by SE ChIP-seq, PE ChIP-seq, ChIP-exo (exonuclease enzyme) and ChIP-nexus (random barcode).]

Overview of the pipeline.

1. Partition the genome and generate ChIP-exo/nexus islands.

[Figure: number of reads by position, showing forward strand coverage, backward strand coverage and cumulative coverage.]
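Step 1 can be sketched in Python as follows. This is only an illustration under a stated assumption: an island is taken to be a maximal run of overlapping aligned reads on one chromosome, with both strands pooled. The slides do not spell out the exact partition rule, and the `Read` record and the `build_islands` function are hypothetical names, not part of the authors' software.

```python
from collections import namedtuple

# Hypothetical read record: half-open interval [start, end) plus strand.
Read = namedtuple("Read", ["chrom", "start", "end", "strand"])


def build_islands(reads):
    """Group reads into islands: maximal sets of overlapping reads per chromosome.

    Assumption (not from the slides): two reads belong to the same island
    when their aligned intervals overlap, regardless of strand.
    """
    islands = []
    current = []
    current_end = None
    for read in sorted(reads, key=lambda r: (r.chrom, r.start, r.end)):
        same_chrom = current and current[0].chrom == read.chrom
        if same_chrom and read.start < current_end:
            # The read overlaps the running island, so extend it.
            current.append(read)
            current_end = max(current_end, read.end)
        else:
            # Close the previous island (if any) and start a new one.
            if current:
                islands.append(current)
            current = [read]
            current_end = read.end
    if current:
        islands.append(current)
    return islands


reads = [
    Read("chr1", 100, 136, "+"),
    Read("chr1", 120, 156, "-"),
    Read("chr1", 400, 436, "+"),
    Read("chr2", 100, 136, "-"),
]
islands = build_islands(reads)
```

In this toy input the first two reads overlap and form one island, while the remaining two reads each form an island of their own.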

2. Calculate a vector of summary statistics for each island.

[Figure: number of reads by position, with forward strand, backward strand and cumulative coverage; the islands R1, R2, R3, ..., RK are mapped to summary statistics T(R1), T(R2), T(R3), ..., T(RK).]
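A minimal sketch of the summary vector T(R) for a single island. The definitions used here are assumptions, since the slides name the statistics but do not give formulas: ARC is taken as reads per base (depth / width), URC as distinct 5' end positions per read, and FSR as the fraction of forward-strand reads. The function name `island_stats` is illustrative.

```python
def island_stats(reads):
    """Summary vector T(R) for one island.

    reads: list of (start, end, strand) with half-open coordinates.
    Assumed definitions (not stated on the slides):
      ARC = depth / width
      URC = number of distinct (5' end, strand) positions / depth
      FSR = forward-strand reads / depth
    """
    depth = len(reads)
    width = max(end for _, end, _ in reads) - min(start for start, _, _ in reads)
    # 5' end: start for forward reads, last covered base for reverse reads.
    five_prime = {
        (start if strand == "+" else end - 1, strand)
        for start, end, strand in reads
    }
    n_fwd = sum(1 for _, _, strand in reads if strand == "+")
    return {
        "depth": depth,
        "width": width,
        "ARC": depth / width,
        "URC": len(five_prime) / depth,
        "FSR": n_fwd / depth,
    }


island = [(100, 136, "+"), (100, 136, "+"), (120, 156, "-"), (110, 146, "+")]
stats = island_stats(island)
```

Under these assumed definitions, an island made of many reads stacked on a few positions has a low URC, while an island with a single read has URC equal to 1 and FSR equal to 0 or 1.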

3. Visualize all islands together and calculate quality scores.

[Figure, three panels. A: unique read coverage rate (URCR, 0 to 1) vs. average read coverage (ARC, 0 to 4), colored by count (10^0 to 10^6). B: forward strand ratio (FSR, 0 to 1) vs. minimum number of reads (0 to 300), with curves for the 0.1, 0.25, 0.5, 0.75 and 0.9 quantiles. C: proportion of islands (0 to 1) vs. minimum number of reads (0 to 50), for islands with reads on both strands, forward only, or backward only.]

3A - ARC vs. URC - This plot gives a global view of the balance between library complexity and enrichment. There are two arms: one with low ARC, which corresponds to regions formed by few aligned positions, and another where the URC decreases as the ARC increases.

3B - Min depth vs. FSR - This plot shows how quickly the quantiles of the FSR distribution approach the median as the minimum depth increases. In a high-quality sample, the median is around 0.5 and the other quantiles reach that value quickly.

3C - Min depth vs. Proportion of Islands - This plot provides a more detailed view of the FSR. Islands with low depth tend to have reads from only one strand. Hence, the plot compares the proportion of islands that contain at least one read from each strand with the proportion of islands whose reads all come from a single strand.

[Figure: Unique Read Coefficient (URC) vs. Average Read Coefficient (ARC).]
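The quantities behind panels B and C can be sketched as below. The sketch assumes each island is summarized by its forward and backward read counts, that "min depth" means keeping islands with at least that many reads, and that quantiles are computed by linear interpolation; none of these details are given on the slides, and the function names are illustrative.

```python
def quantile(sorted_values, p):
    """Linear-interpolation quantile of an already sorted, non-empty list."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def fsr_quantiles(islands, min_depth, probs=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Panel B: FSR quantiles over islands with at least min_depth reads.

    islands: list of (n_fwd, n_bwd) read counts per island.
    """
    fsr = sorted(f / (f + b) for f, b in islands if f + b >= max(min_depth, 1))
    return {p: quantile(fsr, p) for p in probs} if fsr else {}


def strand_composition(islands, min_depth):
    """Panel C: share of islands with both strands, forward only, or backward only."""
    kept = [(f, b) for f, b in islands if f + b >= max(min_depth, 1)]
    if not kept:
        return {}
    n = len(kept)
    return {
        "both": sum(1 for f, b in kept if f > 0 and b > 0) / n,
        "fwd": sum(1 for f, b in kept if b == 0) / n,
        "bwd": sum(1 for f, b in kept if f == 0) / n,
    }


# Toy data: shallow islands are single-stranded, deeper ones are balanced.
islands = [(1, 0), (0, 1), (2, 0), (3, 3), (10, 12), (25, 25), (0, 4), (40, 38)]
q_all = fsr_quantiles(islands, min_depth=1)
q_deep = fsr_quantiles(islands, min_depth=5)
comp_all = strand_composition(islands, min_depth=1)
comp_deep = strand_composition(islands, min_depth=5)
```

In the toy data, raising the minimum depth removes the single-stranded islands, so the outer FSR quantiles move from 0 and 1 toward the median and the "both" share rises, which is the behavior the two panels are meant to expose.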

ARC vs. URCR exploration

[Figure sequence: the ARC vs. URCR plot (x-axis: average read coverage (ARC), 0 to 4; y-axis: unique read coverage rate (URCR), 0 to 1; color: count, 10^0 to 10^6), shown alternately with strand-specific (F and R) ChIP read counts by genomic position for seven example islands. Approximate axis ranges of the examples:
  genomic position 44630800 to 44631200, read counts -6 to 6
  genomic position 12140800 to 12141400, read counts -10 to 10
  genomic position 115248900 to 115249200, read counts -20 to 20
  genomic position 98819000 to 98819300, read counts -20 to 20
  genomic position 68688300 to 68688700, read counts -200 to 200
  genomic position 3036520 to 3036580, read counts -30 to 30
  genomic position 3026650 to 3026800, read counts -20 to 20]

Case study: FoxA1 in mouse liver. Enrichment and library complexity

[Figure: Unique Read Coefficient (URC, 0 to 1) vs. Average Read Coefficient (ARC, 0 to 4) for Rep-1, Rep-2 and Rep-3, with a color scale from 10^0 to 10^6.]

• Rep-1 shows the highest quality.
• Rep-2 is not sufficiently enriched.
• Rep-3 exhibits low library complexity.

Comparison of samples with FIMO analysis.

We searched for the FoxA1 motif using FIMO.

[Figure: FIMO score (11 to 16) by replicate (Rep-1, Rep-2, Rep-3), in panels labeled 50, 100, 250, 500, 1000 and 2000.]

[Figure: A, C, G and T by motif position (1 to 11) for Rep-1, Rep-2 and Rep-3.]

[Figure: average counts (0 to 15) by position around the motif start (-20 to 20), by strand (- and +) and replicate (Rep-1, Rep-2, Rep-3).]
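The last comparison, average strand-specific counts around the motif start, can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes motif start coordinates have already been obtained from the motif scan, works on a single chromosome, counts read 5' ends, and ignores the orientation of the motif match. The function name `average_profile` and the toy coordinates are hypothetical.

```python
from collections import Counter


def average_profile(motif_starts, five_prime_ends, flank=20):
    """Average number of read 5' ends at each offset from the motif start.

    motif_starts: motif start coordinates on one chromosome (assumed given).
    five_prime_ends: iterable of (position, strand) pairs for aligned reads.
    Returns a dict keyed by (offset, strand) for offsets in [-flank, flank].
    """
    ends = Counter(five_prime_ends)
    profile = {}
    for strand in ("+", "-"):
        for offset in range(-flank, flank + 1):
            # Counter returns 0 for positions without reads.
            total = sum(ends[(start + offset, strand)] for start in motif_starts)
            profile[(offset, strand)] = total / len(motif_starts)
    return profile


motif_starts = [1000, 2000]
five_prime_ends = [
    (995, "+"), (995, "+"), (1995, "+"),   # forward 5' ends upstream of the motif
    (1012, "-"), (2012, "-"),              # reverse 5' ends downstream of the motif
    (5000, "+"),                           # read far from any motif, ignored
]
profile = average_profile(motif_starts, five_prime_ends)
```

Computing one such profile per replicate and plotting it by offset and strand gives a curve of the kind shown on the slide.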

Thank you very much!