104540 VO/2 Bioinformatik SS2020 · From Frequency to Sequence Logo. 12 ... Mfold (Zuker et al.)...

Hubert HacklBiocenter, Institute of Bioinformatics

Medical University of Innsbruck Innrain 80, 6020 Innsbruck, Austria

Tel: +43-512-9003-71403Email: hubert.hackl@i-med.ac.at

URL: http://icbi.at

104540 VO/2 Bioinformatik SS2020

Transcription factor binding sites

Experimental methods

Computational methods

Matrix based methods

Motif discovery

MicroRNA target prediction

V Regulatory sequences

Transcription factor binding sites

Regulation of transcription

Experimental methods

Reporter gene assays (luciferase)

Electro mobility shift assays (EMSA)

DNase I and Exonulease Footprinting

Protein binding microarrays (PBMs)

Chromatin immuno precipitation (ChIP)

Luciferase reporter assays

Identify functional regulatory region within a sequence and delineate specific TFBS through mutagenesis

Evidence that TF binding has an effect on transcription (not only binding to DNA)

Electromobility/Gel Shift Assays

TF + DNA

DNA (free probe)

Supershift

Detection of labeled probesElectrophoretic gel separation

TF specific antibody

DNase I and Exonuclease footprinting

Neph et al., Nature, 2012

Most position weight matrices (PWMs) in the databasesare derived by SELEX

Systematic evolution of ligands by exponential enrichment

several cycles

Mukherjee et al., Nature Genetics, 2004

Protein binding microarrays (PBMs)

Farnham, Nature Rev Genetics, 2009

ChIP procedure

ChIP-seq analysis

Hawkins et al., Nature Rev Genetics, 2010

ChIP-seq (Peak calling)

Pepke, Nature Methods, 2009

• CisGenome• ERANGE• FindPeaks• F-Seq• GLITR• MACS• PeakSeq• QuEST• SICER• SiSSRs• Spp• Useq

Tools:

Chromatin state and TF localization

H3K4me3

H3K4me2

H3K4me1

H3K27ac

PPARγ

H3K36me3

H3K27me3

Mikkelsen et al., Cell, 2010

Challenges:

Sequences are short (e.g. 6-8bp)

Sometimes degenerated (not always the same)

Different distances from transcription start site

TF binding might have no effect on gene expression

Combination of TF and TF modules

Many false positive predictions

Matrix based methods (knowledge about TF) Position weight matrix (PWM), HMM

Motif discoveryWord counting, EM

Experimental verified binding sites

Position frequency matrix

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

A 10 8 4 3 11 0 1 1 2 19 15 17 2 0 0 0 16

C 3 4 11 5 1 1 2 6 15 0 1 4 1 1 2 17 2

G 3 2 4 2 7 20 19 6 1 1 2 1 17 15 1 4 1

T 6 8 3 12 3 1 0 7 4 2 4 0 2 6 19 1 3

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

A 0.86 0.54 -0.46 -0.87 1.00 -1.32 -2.46 -2.32 -1.46 1.79 1.45 1.63 -1.46 -1.32 -1.32 -1.32 1.54

C -0.87 -0.46 1.00 -0.14 -2.46 -2.46 -1.46 0.26 1.45 -1.32 -2.46 -0.46 -2.46 -2.46 -1.46 1.63 -1.46

G -0.87 -1.46 -0.46 -1.46 0.35 1.86 1.79 0.26 -2.46 -2.46 -1.46 -2.46 1.63 1.45 -2.46 -0.46 -2.46

T 0.13 0.54 -0.87 1.13 -0.87 -2.46 -1.32 0.49 -0.46 -1.46 -0.46 -1.32 -1.46 0.13 1.79 -2.46 -0.87

• Position frequency matrix

• Position weight matrix (PWM),position specific scoring matrix (PSSM)

Position weight matrix (PWM)

Probability of base b at position i

p(b,i) =fb,i + s(b)

N + ∑ s(b’)b’ є {A,C;G,T}

Wb,i=log2

p(b,i)

N … number of sitess(b) … pseudo countsFb,i … frequency of base b

in position i

p(b)… background probabilityof base b

1 2 3 4 5 6

A 1.00 -1.32 -2.46 -2.32 -1.46 1.79

C -2.46 -2.46 -1.46 0.26 1.45 -1.32

G 0.35 1.86 1.79 0.26 -2.46 -2.46

T -0.87 -2.46 -1.32 0.49 -0.46 -1.46

…ACGTAGGTCATAGAGTA.. S=1+1.86+1.79+0.49+1.45+1.79=8.38

Evaluation of sequences

S= ∑ Wb,ii=1

w w … width of PWMb … nucleotide in position iS … PWM score of a sequence

…ACGTAGGTCATAGAGTA.. S=-0.87-2.46-2.46+0.49-1.46-2.46=-9.22

Threshold for similarities

Find optimized similarity score to minimize false predictions

Difficult to determine because of lack of experimental data

One approach could be to define the optimized threshold of a weight matrix as the matrix similarity that allows e.g 3 matches in 10000bp of none regulatory test sequences

From Frequency to Sequence Logo

Information content in position i

Di = 2+ ∑p(b,i)log2p(b,i) -e(n)b

e(n) … correction factor if only few samples nDi … information content at position ib … base A,C,G, or, T

Di=2+4*0.25*log20.25=0 bits

Di=2+1*log21+3*0.001*log20.001=1.97 bits

from pseudocounts (log20 is not defined!!)

All bases with equal probabilities at position i

Only one base is present at position i

Using a set of background sequences

Profile hidden markov models (HMM)

Levkovitz et al. PLoS One. 2010

Phylogenetic footprinting

• Functional regulatory sites are conserved between species

• E.g. using multiz alignment of UCSC genome browser

Start with just sequences Identify strongly enriched motifs de novo Algorithmically on of the most challenging analysis tasks Use it when you suspect important unknown motifs in your

Word counting and enrichment Statistical algorithms

Expectation-maximum algorithm (EM)Gibbs Sampling

Motif discovery

Find common subsequence

Motif discovery by word counting

Motif discovery

• Problem: Don’t know what the motif looks like or where the starting positions are

• EM is a family of algorithms for learning probabilisticmodels in problems that involve hidden state

• In our problem, the hidden state is where the motif starts in each training sequence

Use expectation maximum (EM)

The element Zij of the matrix Z represents the probability that the motif starts in position j in sequence i.

A motif is represented by a matrix of probabilities: Pck

represents the probability of character c in column k

Basic EM-approach

• The basic EM approach has been enhanced by MEME

given: length parameter W, training set of sequences

set initial values for pdo

re-estimate Z from p (E –step)re-estimate p from Z (M-step)

until change in p < εreturn: p, Z

Basic EM-approach

microRNA biogenesis

microRNA/mRNA pairing

1. Sequence complementarity2. Conservation3. Thermodynamics4. Site accessibility5. UTR Context6. Correlation of expression profiles

Principles of microRNA target prediction

Sequence complementarity

Bartel, Cell,2009

Lewis BP et al., Cell, 2003

Conservation

1. Minimum free energy

3. Extreme value distribution of MFE

2. Account for different sequence length

eMfold (Zuker et al.)RNAfold (Hofacker et al.)

Thermodynamics

mfe: -25.3 kcal/molp-value: 0.010068

Target 5' A UC A 3' CACAG UUG UCUGCAGGGGUGUU AGC AGAUGUCCC

miRNA 3' UA CA 5'

Rehmsmeier M et al. RNA (2004)

Site accessibility

Leitner A, 2009

Expression profiles

TargetScan

DIANA MicroT

miRBase Targets

MiRanda

MiRGen

miRNA (EMBL)

PicTar

RNAhybrid

GenMir++

microRNA target prediction tools

Sensitivity = TP / (TP+FN)

Specificity = TN / (FP+TN)

Validation

Reduced protein levels

3T3-L1 adipocyte differentiation timeseries

Hackl, Burkard et al. , Genome Biol, 2005

Microarrayanalysis

Correspondence between co-expression and function

cell cycle

extracellular matrixfat metabolism

cholesterol synthesis

cell cycle

signaling

Hierarchichal clustering and k-means clustering of 780 selected ESTs during fat cell differentiation

Correspondence between co-expression and co-regulation

Functional transcription factor binding site: SREBP

Binding sites for key regulators (PPARg, C/EBP) was not significantly over represented

In a follow up study Pparg binding sites were enriched

Hakim-Weber et al. BMC Research Notes, 2011

ChIPseq, ChIP-chip, and genomic organization (UCSC browser)

Pparg::Rxra DR1 motif search

Pparg binding sites

Luciferase assays (Apmap promoter)

Bogner-Strauss et al. , Cell Mol Life Sci, 2010

104540 VO/2 Bioinformatik SS2020 · From Frequency to Sequence Logo. 12 ... Mfold (Zuker et al.)...

Documents

Isolation of temperature-sensitive ... - dev.biologists.org · sequences from plasmid pDPS48 (a kind gift from Dean P. Smith and Charles Zuker) and the Kpnl-Xbal DT-A sequences (fully

Bioinformatica Predizione della struttura secondaria dell’RNA – MFOLD

New CARTA DE PRESENTACIÓN · 2019. 6. 7. · Ronald Zuker, Canadá. página 7 CATEGORÍAS DE INSCRIPCIÓN. Senior Residente* Argentina Residente* Extranjero Especialista Modulo Inscripción

Julie Gillespie, Ottawa Area ISD Rich Zuker, Holland Public Schools

Feria de Ciencias Zoe Zuker y Julieta Arzt

Date: Time: Online Check Register · 01/14/2016 104530 La Grange Napa 568.94 01/14/2016 104563 LANNY ROYCE THIBODEAUX 135.60 01/14/2016 104540 LCRA 196.88 01/14/2016 104531 Lone Star

Role of the mod(mdg4) common region in homolog segregation ... · over TM6, Tb e and maintained by C. Zuker (KOUNDAKJIAN et al. 2004). Z3-3401 and the mnm and snm lines used in this

Coarse-grained modeling of large RNA molecules with ...rhiju/RNA-2009-Jonikas-189-99.pdf · secondary structure is available through tools such as the thermodynamic method, Mfold

General Practice Compliance Costs 1 of 100 General Practice Compliance Costs Feedback and Context for the Assessment of Commonwealth Forms Research Conducted by Max Yann, Mark Zuker,

Bioinformatica Predizione della struttura secondaria dellRNA – MFOLD Dr. Giuseppe Pigola – pigola@dmi.unict.it

104540 VO/2 Bioinformatik SS2020 · 2020. 6. 5. · Data integration methods VI Data Integration. 2 ... • centralized platform to visualize domain architecture, post-translational

Bar Mitzvá 18 - 11 - 2016... · 2016. 11. 17. · KIDUSHIM Este Viernes 18 de Noviembre La familia Zuker – Silberberg ofrecerá un kidush por iortzait a la bendita memoria de:

Using Expressed Sequence Tags to Improve Gene Structure ...cgm.sjtu.edu.cn/index/pub/courses/2018/pab/week-8... · Zuker, M (1991) JMB 221:403-420 M(x,y) = Forward(x,y) + Backward(x,y)

Comparative Bioinformatics and Experimental Analysis of the … · 2017. 4. 13. · 1 M NaCl) with Mfold v.4.6 (Zuker,2003). Media and Growth Conditions. All cloning steps were performed

MINISTERIO DE LA PRESIDENCIA - BOE.es · BOLETÍN OFICIAL DEL ESTADO Núm. 265 Jueves 5 de noviembre de 2015 Sec. I. Pág. 104540 I. DISPOSICIONES GENERALES MINISTERIO DE LA PRESIDENCIA

50 Jahre AURA-Hotel Kur- und Begegnungszentrum SaulgrubDie Initiative ZuKER entwickelte ein für alle Häuser übergeord-netes Logo, das 2002 der Öffentlichkeit präsentiert wurde:

創人物2016.1月 - 黃聖皓 - 租客Zuker創辦人

c. Zuker and Glenn P. Jenkins, in collaboration · of conclus10ns or reco!!lmendat10ns by etther the ..... hatr man or Council members. Etabli en 1963 par une Loi du Parlement, le

Internal loops in RNA secondary structure prediction Lyngsø, Zuker, and Pedersen (1999) Andrew Hendriks CMPT 889 Selected Topics in Bioinformatics

Forum - Etai · Michele Ben Debbie Lifschitz Shaee Zuker Ex-Officio Sheila Been Elite Olshtain Raphael Gefen Jack Schuldenfrei Valerie Jakar Judy Steiner Brenda Liptz Ephraim Weintraub