Principles and Methods in Regulatory Genomicsbioinfo.ipmb.uni-heidelberg.de/crg/bioinfo1/_downloads/a... · 2018-01-11 · Principles and Methods in Regulatory Genomics Bioinfo1 —

CarlHerrmann UniversitätHeidelberg&DKFZ WS2017/2018

Principles and Methods in Regulatory Genomics

Bioinfo1 — MoBi Bachelor 5. FS

WS 2017/2018


What is it all about …

• What are the molecular actors of transcriptional regulation ?๏ Role of transcription factors ?๏ Role of epigenetic modifications ?๏ Role of 3D chromatin structure ?

• What kind of experimental data can we use to understand transcriptional regulation ?

• How can we use bioinformatics tools to make predictions ?


Bigger genome = more evolved ?

EukaryotesArchaeBacteria

[interactive Tree of Life]


Bigger genome = more evolved ?

Genome size

EukaryotesArchaeBacteria

[interactive Tree of Life]


More genes = more complex ?

Size of genome (Mb)

Num

ber o

f gen

esGlycine- 1.1 Gb- 46000 genes

BacteriumPlantAnimal



Size of genome (Mb)

Num

ber o

f gen



Human- 3.3 Gb- 23000 genes

[https://gf.neocities.org/gs/genes.html]

https://gf.neocities.org/gs/genes.html



Size of genome (Mb)

Num

ber o

f gen



Human- 3.3 Gb- 23000 genes

C. elegans- 100 Mb- 20000 genes

[https://gf.neocities.org/gs/genes.html]

https://gf.neocities.org/gs/genes.html


Bigger non-coding genome = higher complexity





Proportion of non-coding DNA correlates withorganismal complexity


Junk DNA ?


Exploring the genome's activity

• Large scale consortia (ENCODE, Roadmap, …) have systematically explored the activity of the genome using experimental assays

https://www.encodeproject.org/




"The vast majority (80.4%) of the human genome participates in at least one biochemical RNA- and/or chromatin-associated event in at least one cell type. 99% is within 1.7 kb of at least one of the biochemical events measured by ENCODE."





"The vast majority (80.4%) of the human genome participates in at least one biochemical RNA- and/or chromatin-associated event in at least one cell type. 99% is within 1.7 kb of at least one of the biochemical events measured by ENCODE."



Transcriptional regulation in metazoans

[Elodie Darbo]



1. Binding of site specifictranscription factors

[Elodie Darbo]




2. chromatin structure,epigenetic

[Elodie Darbo]





3. three-dimensionalDNA looping

[Elodie Darbo]



4. Readout:gene expression



3. three-dimensionalDNA looping

[Elodie Darbo]


1. data types


Experimental methods

• Genetic component:

• Chromatin structure and epigenetic

• Three-dimensional DNA looping

• Readout: gene expression


Experimental methods

• Genetic component:→ ChIP-seq: transcription factor binding sites

• Chromatin structure and epigenetic→ ChIP-seq : post-translational histone modifications→ whole genome bisulfite sequencing, arrays : DNA methylation→ ATAC-seq, DNAse-seq, FAIRE-seq : open chromatin region

• Three-dimensional DNA looping→ 3C/4C/Hi-C : interacting chromatin regions

• Readout: gene expression→ RNA-seq : expression of transcribed elements


Chromatin Immunoprecipitations

• Chromatin immunoprecipitation (ChIP) yields DNA fragments, that are ๏ bound by the protein of interest๏ marked by a specific chemical

modification (acetylation, methylation,.)

• Identification of the fragments :๏ sequencing (ChIP-seq)

→ genome-wide๏ PCR/qPCR

→ targeted experiment

• Important aspect๏ Quality/Specificity of the antibody ?๏ DNA fragment (~200-300bp)

→ binding site (~10 bp) ?

[Park, Nat.Rev. 2009]


ChIP-sequencing

[Wilbanks & Faccioti PLoS One (2010)]]

Sharp signal Broad signal


ChIP-seq for histone modifications

• histones are subject to post-translational modifications at their N-terminal tail๏ Lysine methylation๏ Lysine/arginine acetylation๏ Serine phosphorylation๏ ubiquitylation

• they modify the physical properties of the DNA-nucleosome interactions

nomenclature: H3K27ac = acetylation of lysine 27 on histone 3


Histone modifications

histone modifications are a good proxy of gene expression and presence of regulatory elements

active marks→ open chromatinH3K4me1; H3K4me3;H3K27ac

repressive marks→ closed chromatinH3K27me3; H3K9me3


Histone modifications

• Histone marks have distinct signal profiles๏ sharp signal : narrow peaks of enrichment at specific loci (H3K4me3 =

promoters, H3K27ac = enhancers,…)๏ broad signal : wide regions of enrichment (H3K36me3 = transcribed

genes; H3K27me3 = repressed regions)

H3K4me1

H3K4me3

H3K9ac

H3K9me3

H3K27ac

H3K27me3

H3K36me3


Example of ChIP-seq signal for transcription factors / DNA-binding proteins

MYCN

H3K4me3

DNMT1

H3K27ac

distal MYCN peak

(away from gene promoters)

MYCN peakand H3K4me3

enrichment at promoter


Measuring DNA methylation

• DNA methylation occurs mainly on cytosines in CpG dinucleotides in the human genome

• DNA methylation is revealed byusing bisulfite conversion:๏ unmethylated cytosines are converted

C → U → T๏ methylated cytosines are protected

mC → mC

• unmethylated CpG are identified by the presence of a mismatch TpG

• 2 approaches: ๏ array based๏ sequencing



• Array based methods

• CpG containing probes on array๏ 27K probes๏ 450K probes ๏ 800K (EPIC)

• all probes contain a methylated (C) and unmethylated (T) version

• Cheap but sparse

• Whole genome bisulfite sequencing

• unmethylated C → T

• methylated C → C

• Shearing, conversion and sequencing (Illumina X-10)

• Information about the 28 million CpGs



• Array based methods

• CpG containing probes on array๏ 27K probes๏ 450K probes ๏ 800K (EPIC)

• all probes contain a methylated (C) and unmethylated (T) version

• Cheap but sparse

• Whole genome bisulfite sequencing

• unmethylated C → T

• methylated C → C

• Shearing, conversion and sequencing (Illumina X-10)

• Information about the 28 million CpGs


Example DNA methylation

• Whole genome bisulfite sequencing provide information about all CpGs in the genome

• Vertical bars = CpG positions; red = high methylation (100%); blue = no methylation (~10%)

sam

ples


Example DNA methylation

• Genome appears highly methylated at a global scale except at gene promoters or CpG islands (blue stripes)

• Some samples have differential methylation (grey box)

• Impact on gene regulation:→ DNAme is a repressivemark which inhibits bindingof transcription factors andexpression of genes

• in cancer:๏ hypomethylation : aberrant

expression of oncogenes๏ hypermethylation : aberant

repression of tumor suppressors


Mapping chromatin interactions

• DNA looping allows interactions between distal DNA loci

• Identification of interacting regions through “chromatin conformation capture” methods (3C / 4C / Hi-C)

[Liebermann-Aiden, 2009]


Hi-C and topological domains

The signal at this pointindicates the strength ofthe signal between these 2 loci

Topologicalassociateddomains

[Dixon (2012,2015)][http://promoter.bx.psu.edu/hi-c/view.php]

http://promoter.bx.psu.edu/hi-c/view.php


Closer look : 4C

• 4C allows to interrogate a specific locus (“viewpoint”) for all its interaction partners

• used to identify all promoter-enhancer interactions for a given gene

Viewpoint sequenceInteracting sequence

3C = one-vs-one 4C = one-vs-all


Closer look: 4C

View point(here : enhancerregion)

Interacting regions(here: promoters)

[Dreidax, Gartlgruber (DKFZ)]


Chromatin interactions shape the cell identity

[Spurell et al., Cell 2016]


A complex picture

• Long range interactions show numerous enhancer-promoter interactions but also many promoter-promoter interactions

• Hypothesis : Promoters can act in trans and be used as enhancers for distal genes

"Intriguingly, the multigene complexes illustrated in this study are, in principle, akin to the bacterial operon as a mechanism for coordinated transcriptional regulation of related genes, suggesting the possibility of a chromatin-based operon mechanism (chro-operon or chroperon) for spatiotemporal regulation of gene transcription in eukaryotic nuclei."


2. Binding of transcription factors

• protein-DNA interactions

• experimental approaches to determine binding sites (BS)

• representing binding specificity of TFs

• databases

• comparing binding profiles


DNA binding domains

• Transcription factors contain a DNA binding domain (DBD) and a transcriptional activator (TA)

• Homologous TFs share similar DBDs (here: forkhead)

[Luscombe 2010]


Protein DNA interactions

• majority of protein-DNA interactions for TF occur through a alpha-helix fitting into the major groove (=DNA binding domain)

• hydrogen bonds with specific bases

• stabilization of the protein-DNA complex is ensured by additional structures (helix, beta-sheet) via van der Walls interactions

[Luscombe et al., NAR (2001)][Cheng et al., JMB (2003)]


Protein DNA interactions

• majority of protein-DNA interactions for TF occur through a alpha-helix fitting into the major groove (=DNA binding domain)

• hydrogen bonds with specific bases

• stabilization of the protein-DNA complex is ensured by additional structures (helix, beta-sheet) via van der Walls interactions

[Luscombe et al., NAR (2001)][Cheng et al., JMB (2003)]


Structural family: Zinc coordinating

cysteine

histidine

Egr1/Zif268

Finger 1

Finger 2

Finger 3

Cys2His2 Fold (“Zinc finger”)→ one of the most common familyof transcription factors in mammalians



cysteine

histidine

Egr1/Zif268

Finger 1

Finger 2

Finger 3

Cys2His2 Fold (“Zinc finger”)→ one of the most common familyof transcription factors in mammalians

Finger 1 Finger 2 Finger 3



- 6 cysteines- 2 Zinc atoms= Cys6Zn2zinc-fingers

Gal4 S. cerevisae → binds as homodimer

[Campbell (2008)]



- 6 cysteines- 2 Zinc atoms= Cys6Zn2zinc-fingers

Gal4 S. cerevisae → binds as homodimer

TF binding siteis a dyad (short repeatedmotif with spacer)[Campbell (2008)]


Structural families of TFs

[Weirauch, Hughes (2010)]



[Weirauch, Hughes (2010)]



• in human, 3 classes account for > 80% of known transcription factors๏ Zinc-finger C2H2

(→ Zinc finger nucleases)๏ homeodomain

(e.g. Hox TFs)๏ helix-loop-helix factors

(e.g. c-Myc, N-Myc,...)

[Vaquerizas et al. Nat.Rev.Gen (2009)]


From protein to DNA motif

• Transcription factors belonging to the same structural family have similar binding profiles

• Highly conserved accross species

• conserved core motif

• variable flanking regions→ specificity

[Wei et al. EMBO Journal 2010]


From protein to DNA motif

• Transcription factors belonging to the same structural family have similar binding profiles

• Highly conserved accross species

• conserved core motif

• variable flanking regions→ specificity

[Wei et al. EMBO Journal 2010]


2. Binding of transcription factors

• protein-DNA interactions

• experimental approaches to determine binding sites (BS)

• representing binding specificity of TFs

• databases

• comparing binding profiles



Experimental identification of binding sites

• DNAse1 footprint:protein-bound DNA is protected from digestion by the DNAse1 enzyme

• gel electrophoresis allows identification of the digested / undigested fragments

• if whole sequence is known, binding sites can be identified

• in vitro / low-throuput

[ A.Stenlund The EMBO Journal (2003) ]



Identification of open chromatin regions → potential binding sites for proteins



• ATAC-seq: using Tn5 transposase prepared with sequencing primers

• requires a small number of input material (~10,000 cells)

• identification of open chromatin regions (peaks)

[Greenleaf (2013)]



• From open regions to transcription factor binding sites→ footprinting

• Zooming into the peaks (open regions) : valleys of undigested / un-transposed DNA→ TF binding sites (TFBS)

• binding sequence can be identified with base-pair resolution

[Greenleaf (2013)]






[Greenleaf (2013)]






[Greenleaf (2013)]



• SELEX (= systematic evolution of ligands by exponential enrichment)

๏ random library of k-mers generated๏ incubated with TF of interest๏ elution to separate bound from unbound

targets๏ PCR (=enrichment)๏ next cycle

• in-vitro / medium throughput

• increasing number of cycles leads to over-select high affinity binding sites

[ A.Stenlund The EMBO Journal (2003) ]


DNA transcription factor interactions

Sequence Si

Ki

=kon

koff

=[TF · S

i

]

[TF ][Si

]

TF + S TF∙Skon

koff

P (s = 1|Si) =[TF · Si]

[TF · Si] + [Si]=

1

1 + eEi�µ

Equilibrium constant

Binding probability

Ei = � lnKi

µ = ln[TF ]

free energy

chemical potential



• How many parameters are needed to describe the DNA-protein interaction ?

Naive approach

• if TF binds regions of size w, then if one Ki per possible sequence

• 4w parameters

• w = 6 : 4096 parameters


[Berg, von Hippel (1987)][Stormo, Bioinformatics (2000)]



• How many parameters are needed to describe the DNA-protein interaction ?

Naive approach

• if TF binds regions of size w, then if one Ki per possible sequence

• 4w parameters



Model :

• Assume independent contribution of each position of Si to the binding energy

• contribution proportional to log(freq at position j)→ 3w parameters



[Berg, von Hippel (1987)][Stormo, Bioinformatics (2000)]


Characterizing binding affinities

• Example :๏ Evi-1 mouse transcription factor๏ 14 binding sites obtained using

SELEX

• Binding sites of transcription factors are not fixed strings !

GGACAAGATAAAGACAAGATAGAGACAAGATAGGGACAAGATAGTGACAAGATCACGACAAGACAAATACAAGACAATGATAAGATAAAGATAAGATAATGATAAGATAAAGATAAGATAAAGATAAGATAAAGATAAGATAAAGATAAGACAA




SELEX



Some positions are very degenerate




SELEX



Some positions are very degenerate

Some positions are well conserved



• Solution 1:represent binding sites as consensus sequence

• IUPAC conventions ๏ W = A or T ; R = A or G (puRines)๏ K = G or T ; S = C or G๏ Y = C or T (pYrimidine) ; M = A or C๏ B = C, G or T ; D = A,G or T๏ H = A,C or T ; V = A,C or G๏ N = any nucleotide

• Transfac conventions ๏ for a given position, order nucleotides by

frequency : f1>f2>f3>f4๏ single nucleotide if f1 > 50% AND f1>2*f2๏ double if f1 + f2 > 75% AND f1<50 %;

f2<50 %๏ triple if one nucleotide never appears and

previous rules don't apply๏ N in all other cases


How can we represent the binding sites ?

AGAyAAGATAA



• Solution 1:represent binding sites as consensus sequence

• Weakness 1:

๏ column 1 has 8/14 A’s๏ column 10 has 14/14 A’s๏ both are represented by A

• Weakness 2: ๏ other nucleotides in column 1 are

not taken into account



AGAyAAGATAA



• Solution 2 :

• count frequencies of nucleotides at each position

• obtain a count matrix

• represent the matrix as a logo

G G A C A A G A T A A A G A C A A G A T A GA G A C A A G A T A GG G A C A A G A T A GT G A C A A G A T C AC G A C A A G A C A AA T A C A A G A C A AT G A T A A G A T A AA G A T A A G A T A AT G A T A A G A T A AA G A T A A G A T A AA G A T A A G A T A AA G A T A A G A T A AA G A T A A G A C A A

a| 8 0 14 0 14 14 0 14 0 13 11c| 1 0 0 7 0 0 0 0 3 1 0g| 2 13 0 0 0 0 14 0 0 0 3t| 3 1 0 7 0 0 0 0 11 0 0




• Solution 2 :





a| 8 0 14 0 14 14 0 14 0 13 11c| 1 0 0 7 0 0 0 0 3 1 0g| 2 13 0 0 0 0 14 0 0 0 3t| 3 1 0 7 0 0 0 0 11 0 0




• Solution 2 :





a| 8 0 14 0 14 14 0 14 0 13 11c| 1 0 0 7 0 0 0 0 3 1 0g| 2 13 0 0 0 0 14 0 0 0 3t| 3 1 0 7 0 0 0 0 11 0 0

Difference beween position 1 and 10 is now obvious !




• Solution 2 :


• normalize to obtain position frequency matrix (PFM)


a| 0.57 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.93 0.79c| 0.07 0.00 0.00 0.50 0.00 0.00 0.00 0.00 0.21 0.07 0.00g| 0.14 0.93 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.21t| 0.21 0.07 0.00 0.50 0.00 0.00 0.00 0.00 0.79 0.00 0.00


fi,j



• Beware of sampling effects !

• Two different sets of BS for the same TF will give different PFM

• Example : Evi-1๏ set 1 : 14 BS (selex)๏ set 2 : 47 BS (selex)

a| 0.57 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.93 0.79c| 0.07 0.00 0.00 0.50 0.00 0.00 0.00 0.00 0.21 0.07 0.00g| 0.14 0.93 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.21t| 0.21 0.07 0.00 0.50 0.00 0.00 0.00 0.00 0.79 0.00 0.00

a| 0.55 0.74 0.00 1.00 0.02 1.00 0.98 0.00 1.00 0.00 0.87 0.85 0.23 0.64c| 0.09 0.04 0.02 0.00 0.43 0.00 0.00 0.00 0.00 0.13 0.04 0.00 0.23 0.21g| 0.19 0.09 0.94 0.00 0.00 0.00 0.02 1.00 0.00 0.00 0.00 0.15 0.30 0.06t| 0.17 0.13 0.04 0.00 0.55 0.00 0.00 0.00 0.00 0.87 0.09 0.00 0.23 0.09

Increasing the sample of binding sites makes some frequencies become > 0 What if we would further increase the dataset ?



• Sampling effects are attenuated by adding pseudo-counts

• at each position, a pseudo-count (usually 1) is distributed among the nucleotides๏ equal amount (0.25)๏ OR : fraction proportional to nucleotide prior frequency

a| 8 0 14 0 14 14 0 14 0 13 11c| 1 0 0 7 0 0 0 0 3 1 0g| 2 13 0 0 0 0 14 0 0 0 3t| 3 1 0 7 0 0 0 0 11 0 0

a| 8.2 0.2 14.2 0.2 14.2 14.2 0.2 14.2 0.2 13.2 11.2c| 1.3 0.3 0.3 7.3 0.3 0.3 0.3 0.3 3.3 1.3 0.3g| 2.3 13.3 0.3 0.3 0.3 0.3 14.3 0.3 0.3 0.3 3.3t| 3.2 1.2 0.2 7.2 0.2 0.2 0.2 0.2 11.2 0.2 0.2

a| 0.55 0.02 0.94 0.02 0.95 0.95 0.02 0.95 0.02 0.88 0.75c| 0.08 0.02 0.02 0.48 0.02 0.02 0.02 0.02 0.22 0.08 0.02g| 0.15 0.88 0.02 0.02 0.02 0.02 0.95 0.02 0.02 0.02 0.22t| 0.22 0.08 0.02 0.48 0.02 0.02 0.02 0.02 0.75 0.02 0.02

add 1 using priors a:t = 0.2 ; c:g = 0.3

normalize to 1 by dividing by 14+1

ni,j

fi,j

f 0i,j


Polls and pseudo-counts

Polls


Polls and pseudo-counts

Polls

Final results

They should have added pseudo counts !



• why is it so important to get rid of 0 frequencies ?

• compute the probability of a sequence to represent a binding site for Evi-1

a| 0.57 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.93 0.79c| 0.07 0.00 0.00 0.50 0.00 0.00 0.00 0.00 0.21 0.07 0.00g| 0.14 0.93 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.21t| 0.21 0.07 0.00 0.50 0.00 0.00 0.00 0.00 0.79 0.00 0.00

C C A C A A G A C A AP = 0.07 * 0 * 1 * 0.5 * 1 * 1 * 1 * 1 * 0.21 * 0.93 * 0.79 = 0

a| 0.55 0.02 0.94 0.02 0.95 0.95 0.02 0.95 0.02 0.88 0.75c| 0.08 0.02 0.02 0.48 0.02 0.02 0.02 0.02 0.22 0.08 0.02g| 0.15 0.88 0.02 0.02 0.02 0.02 0.95 0.02 0.02 0.02 0.22t| 0.22 0.08 0.02 0.48 0.02 0.02 0.02 0.02 0.75 0.02 0.02

C C A C A A G A C A A

P = 0.08* 0.02* 0.94*0.48* 0.95* 0.95* 0.95*0.95*0.22 *0.88*0.75 = 8.6e-5

This sequence can NEVER be a BS → this sequence has non-0 probability of being a BS


Generating random sequences

• We often need to bioinformatically generate sequences according a a specific model (“rolling a dice”)

• this model gives the emission probabilities, i.e. the probability P(X,j) at each position j of the sequence to obtain one of the four bases A,C,G,T

• possible models๏ equiprobable model: P(A)=P(C)=P(G)=P(T)=0.25

๏ more realistic model: P(A)=P(T)=pA,T ; P(C)=P(G)=pC,G

๏ position dependent model: P(A,j) = fA,j

A C

A C

A C A C A C


Information content of a matrix

Would I get the same PFM if I would consider 14 random sequences?

• Generate 14 sequences of length L=11 according to the emission probabilities of the PFM; probability of obtaining exactly the same count matrix:

• Generate 14 random sequences on length L=11; probability of obtaining exactly the same count matrix:


a| 8 0 14 0 14 14 0 14 0 13 11c| 1 0 0 7 0 0 0 0 3 1 0g| 2 13 0 0 0 0 14 0 0 0 3t| 3 1 0 7 0 0 0 0 11 0 0


Information content of a matrix

• Log-likelihood ratio

• Information content (bits)

• Properties๏ if f’ij → pi : IC → 0 : zero information content if the frequencies in the

alignment are equal to the background frequencies ("random matrix”)๏ IC is maximal if the least abundant nucleotide (smallest pi) is the most

represented in the matrix (largest f'ij)๏ information content is related to the Shannon entropy, mesuring the

uncertainty : high information content → low Shannon entropy

[Hertz, Stormo 1999]


Matrix logos

• Information content of the matrix:

• Information content of a column:

• Conventions:

• height of column represents IC

• relative sizes proportional to frequencies

same frequency (100%)but different IC

Mouse backgroundp(C)=p(G)=0.21 ; p(A) = p(T) = 0.29


Matrix logos

a| 8 0 14 0 14 14 0 14 0 13 11c| 1 0 0 7 0 0 0 0 3 1 0g| 2 13 0 0 0 0 14 0 0 0 3t| 3 1 0 7 0 0 0 0 11 0 0

P. falciparum backgroundp(C)=p(G)=0.1 ; p(A) = p(T) = 0.4

Anaeromyxobacter backgroundp(C)=p(G)=0.37 ; p(A) = p(T) = 0.13


From frequencies (PFM) to weights (PWM)

• position frequency matrices do not contain information about the background distribution

• define position-weight matrices (PWM) as(or Position Specific Scoring Matrix PSSM)

a| 8 0 14 0 14 14 0 14 0 13 11c| 1 0 0 7 0 0 0 0 3 1 0g| 2 13 0 0 0 0 14 0 0 0 3t| 3 1 0 7 0 0 0 0 11 0 0

a| 0.55 0.02 0.94 0.02 0.95 0.95 0.02 0.95 0.02 0.88 0.75c| 0.08 0.02 0.02 0.48 0.02 0.02 0.02 0.02 0.22 0.08 0.02g| 0.15 0.88 0.02 0.02 0.02 0.02 0.95 0.02 0.02 0.02 0.22t| 0.22 0.08 0.02 0.48 0.02 0.02 0.02 0.02 0.75 0.02 0.02

a| 0.75 -2.71 1.30 -2.71 1.30 1.30 -2.71 1.30 -2.71 1.22 1.06c| -1.04 -2.71 -2.71 0.73 -2.71 -2.71 -2.71 -2.71 -0.07 -1.04 -2.71g| -0.46 1.31 -2.71 -2.71 -2.71 -2.71 1.39 -2.71 -2.71 -2.71 -0.10t| -0.22 -1.16 -2.71 0.58 -2.71 -2.71 -2.71 -2.71 1.02 -2.71 -2.71

Count matrix

PFM

PWM


Motif databases

• JASPAR is an open database of binding profiles

• contains 1564 profiles (2018 version)

http://jaspar.genereg.net/


Motif databases

• JASPAR is an open database of binding profiles

• contains 1564 profiles (2018 version)


Different sources

ChIP-seq: real bindingsite is hidden in muchlonger sequence→ lower resolution

Documents

Principles and Methods in Regulatory Genomicsbioinfo.ipmb.uni-heidelberg.de/crg/bioinfo1/_downloads/a... · 2018-01-11 · Principles and Methods in Regulatory Genomics Bioinfo1 —