Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
A Quality Control pipeline forChIP-exo and ChIP-nexus data.
Rene Welch & Sunduz Keles
Department of StatisticsUniversity of Wisconsin - Madison
June 24, 2016
https://drive.google.com/open?id=0B8I3K0KB8Sh1U3lvbGxhejh6T2M
A Quality Control pipeline for ChIP-exo and ChIP-nexus data. UW-STAT
What is ChIP-exo and ChIP-nexus.
TF DNA
SE ChIP-seq
PE ChIP-seq
ChIP-exo
Exonuclease Enzyme
ChIP-nexusRandom Barcode
A Quality Control pipeline for ChIP-exo and ChIP-nexus data. UW-STAT
Overview of the pipeline.
1. Partition the genome and generate ChIP-exo/nexus islands.N
umbe
r of r
eads
by
posi
tion Bwd. strand coverage
Fwd. strand coverageCumulative coverage
A Quality Control pipeline for ChIP-exo and ChIP-nexus data. UW-STAT
Overview of the pipeline.
2. Calculate a vector of summary statistics for each island.N
umbe
r of r
eads
by
posi
tion
R1 R2 R3 RKT(R1) T(R2) T(R3) T(RK)
Bwd. strand coverageFwd. strand coverageCumulative coverage
A Quality Control pipeline for ChIP-exo and ChIP-nexus data. UW-STAT
Overview of the pipeline.3. Visualize all islands together and calculate quality scores.
0.00
0.25
0.50
0.75
1.00
0 1 2 3 4
Uni
que
read
cov
erag
e ra
te(U
RC
R)
100 102 104 106count
Average read coverage (ARC)
0.00
0.25
0.50
0.75
1.00
0 100 200 300min number of reads
0.1 0.25 0.5 0.75 0.9
Fwd s
trand r
ati
o (
FSR
)
0.00
0.25
0.50
0.75
1.00
0 10 20 30 40 50min number of reads
both fwd bwd
Pro
porti
on o
f isl
ands
A B
C
0.00
0.25
0.50
0.75
1.00
0 1 2 3 4
100 102 104 106count
0.00
0.25
0.50
0.75
1.00
0.1 0.25 0.5 0.75 0.9
0.00
0.25
0.50
0.75
1.00
0 10 20 30 40 50min number of reads
both fwd bwd
Pro
porti
on o
f isl
ands
A B
C
3A - ARC vs. URC - This plot presents a global view of the balance between library complexity and enrichment. There are two arms, one with low ARC, which corresponds to regions formed by few aligned positions , and the other where the URC decreases as the ARC increases.
3B - Min depth vs. FSR - This plot depicts how quickly the distribution of the FSR approximates the median. In a high quality dataset sample, the median is around 0.5, and the other quantiles reach that value quickly.
3C - Min depth vs. Proportion of Islands - This plot provides a more detailled view of the FSR. Islands with low depth then to have reads only from one strand. Hence, the plot compares the percentage of islands that contain at least one readof each strand vs. the regions that consist of only reads with only one strand.
Uni
que
Rea
d C
oeffi
cien
t (U
RC
)
Average Read Coefficient (ARC)
A Quality Control pipeline for ChIP-exo and ChIP-nexus data. UW-STAT
ARC vs. URCR exploration
●
0.00
0.25
0.50
0.75
1.00
0 1 2 3 4Average read coverage (ARC)
Uni
que
read
cov
erag
e ra
te (
UR
CR
)
100 102 104 106count
QC and analysis of ChIP-exo UW-STAT
ARC vs. URCR exploration
●
0.00
0.25
0.50
0.75
1.00
0 1 2 3 4Average read coverage (ARC)
Uni
que
read
cov
erag
e ra
te (
UR
CR
)
100 102 104 106count
−6
−3
0
3
6
44630800 44631000 44631200Genomic position
ChI
P r
ead
coun
ts
strand F R
QC and analysis of ChIP-exo UW-STAT
ARC vs. URCR exploration
●
0.00
0.25
0.50
0.75
1.00
0 1 2 3 4Average read coverage (ARC)
Uni
que
read
cov
erag
e ra
te (
UR
CR
)
100 102 104 106count
QC and analysis of ChIP-exo UW-STAT
ARC vs. URCR exploration
●
0.00
0.25
0.50
0.75
1.00
0 1 2 3 4Average read coverage (ARC)
Uni
que
read
cov
erag
e ra
te (
UR
CR
)
100 102 104 106count
−10
−5
0
5
10
12140800 12141000 12141200 12141400Genomic position
ChI
P r
ead
coun
ts
strand F R
QC and analysis of ChIP-exo UW-STAT
ARC vs. URCR exploration
●
0.00
0.25
0.50
0.75
1.00
0 1 2 3 4Average read coverage (ARC)
Uni
que
read
cov
erag
e ra
te (
UR
CR
)
100 102 104 106count
QC and analysis of ChIP-exo UW-STAT
ARC vs. URCR exploration
●
0.00
0.25
0.50
0.75
1.00
0 1 2 3 4Average read coverage (ARC)
Uni
que
read
cov
erag
e ra
te (
UR
CR
)
100 102 104 106count
−20
−10
0
10
20
115248900 115249000 115249100 115249200Genomic position
ChI
P r
ead
coun
ts
strand F R
QC and analysis of ChIP-exo UW-STAT
ARC vs. URCR exploration
●
0.00
0.25
0.50
0.75
1.00
0 1 2 3 4Average read coverage (ARC)
Uni
que
read
cov
erag
e ra
te (
UR
CR
)
100 102 104 106count
QC and analysis of ChIP-exo UW-STAT
ARC vs. URCR exploration
●
0.00
0.25
0.50
0.75
1.00
0 1 2 3 4Average read coverage (ARC)
Uni
que
read
cov
erag
e ra
te (
UR
CR
)
100 102 104 106count
−20
0
20
98819000 98819100 98819200 98819300Genomic position
ChI
P r
ead
coun
ts
strand F R
QC and analysis of ChIP-exo UW-STAT
ARC vs. URCR exploration
●
0.00
0.25
0.50
0.75
1.00
0 1 2 3 4Average read coverage (ARC)
Uni
que
read
cov
erag
e ra
te (
UR
CR
)
100 102 104 106count
QC and analysis of ChIP-exo UW-STAT
ARC vs. URCR exploration
●
0.00
0.25
0.50
0.75
1.00
0 1 2 3 4Average read coverage (ARC)
Uni
que
read
cov
erag
e ra
te (
UR
CR
)
100 102 104 106count
−200
−100
0
100
200
68688300 68688400 68688500 68688600 68688700Genomic position
ChI
P r
ead
coun
ts
strand F R
QC and analysis of ChIP-exo UW-STAT
ARC vs. URCR exploration
●
0.00
0.25
0.50
0.75
1.00
0 1 2 3 4Average read coverage (ARC)
Uni
que
read
cov
erag
e ra
te (
UR
CR
)
100 102 104 106count
QC and analysis of ChIP-exo UW-STAT
ARC vs. URCR exploration
●
0.00
0.25
0.50
0.75
1.00
0 1 2 3 4Average read coverage (ARC)
Uni
que
read
cov
erag
e ra
te (
UR
CR
)
100 102 104 106count
−30
−20
−10
0
10
20
30
3036520 3036540 3036560 3036580Genomic position
ChI
P r
ead
coun
ts
strand F R
QC and analysis of ChIP-exo UW-STAT
ARC vs. URCR exploration
●
0.00
0.25
0.50
0.75
1.00
0 1 2 3 4Average read coverage (ARC)
Uni
que
read
cov
erag
e ra
te (
UR
CR
)
100 102 104 106count
QC and analysis of ChIP-exo UW-STAT
ARC vs. URCR exploration
●
0.00
0.25
0.50
0.75
1.00
0 1 2 3 4Average read coverage (ARC)
Uni
que
read
cov
erag
e ra
te (
UR
CR
)
100 102 104 106count
−20
0
20
3026650 3026700 3026750 3026800Genomic position
ChI
P r
ead
coun
ts
strand F R
QC and analysis of ChIP-exo UW-STAT
Case of Study: FoxA1 on mouse liver.Enrichment and library complexity
Rep−1 Rep−2 Rep−3
0.00
0.25
0.50
0.75
1.00
0 1 2 3 4 0 1 2 3 4 0 1 2 3 4Average Read Coefficient (ARC)
Uni
que
Rea
d C
oeffi
cien
t (U
RC
)
100 102 104 106value
I Rep-1 shows the highest quality.I Rep-2 is not sufficiently enriched.I Rep-3 exhibits low library complexity.
A Quality Control pipeline for ChIP-exo and ChIP-nexus data. UW-STAT
Comparison of samples with FIMO analysis.
We searched for the FoxA1 motif using FIMO.
50 100 250 500 1000 2000
11
12
13
14
15
16
Rep
−1
Rep
−2
Rep
−3
Rep
−1
Rep
−2
Rep
−3
Rep
−1
Rep
−2
Rep
−3
Rep
−1
Rep
−2
Rep
−3
Rep
−1
Rep
−2
Rep
−3
Rep
−1
Rep
−2
Rep
−3
Replicate
FIM
O s
core
Comparison of samples with FIMO analysis.
We searched for the FoxA1 motif using FIMO.
Rep−1
1 2 3 4 5 6 7 8 9 10 11position
A C G T
Rep−2
1 2 3 4 5 6 7 8 9 10 11position
A C G T
Rep−3
1 2 3 4 5 6 7 8 9 10 11position
A C G T
Comparison of samples with FIMO analysis.
We searched for the FoxA1 motif using FIMO.
0
5
10
15
−20 −10 0 10 20Position around motif start
Ave
rage
cou
nts
strand − +
rep Rep−1 Rep−2 Rep−3