Method of MobyDick Chao Wang May 4, 2005. Aim Discover multiple motifs from a large collection of...

Method of MobyDick

Chao Wang

May 4, 2005

• Discover multiple motifs from a large collection of sequences

• It is based on a statistical mechanics model that segments the string probabilistically into ‘‘words’’ and concurrently builds a ‘‘dictionary’’ of these words

Method

• Build a dictionary from a text, one starts from the frequency of individual letters, finds over-represented pairs, adds them to the dictionary, determines their probabilities, and continues to build larger fragments in this way.

• Loop

fitting step: compute the optimal assignment of given the entries w in the dictionary,

prediction step: add new words and terminate• Decompose text into words

Fitting Step

• Dictionary “ D ”, word w with frequency• the optimal is found by maximizing Z(S, ),

the probability of obtaining the sequence S for a given set of normalized dictionary probabilities

• wP

For a given set of dictionary words w, we fit the to the sequence S by maximizing the likelihood function in Eq. 1 with the constraint that and . This condition is equivalent to solving for from the equation

where is the average number of words w in the ensemble defined by Z

Prediction step• Do statistical tests on longer words based on their

predicted frequencies from the current dictionary• To check the completeness of the dictionary,

consider all pairs of dictionary words w, and ask whether the average number of occurrences of the composite word w, created by juxtaposition exceeds by a statistically significant amount the number predicted by the model, (or equivalently ). If so, the composite word is added to the dictionary

Decompose text into words

serves as a quality factor

the number of matches to the word w anywhere in the sequence

The average number of times the string w is delimited as a word among all segmentations of the data

Method of MobyDick Chao Wang May 4, 2005. Aim Discover multiple motifs from a large collection of...

Documents

pascom - mobydick und UCS

Motifs Ch1

Star image Motifs

(Mano Chao)

MobyDick ONE Catalogue - Keston › MobyDick › ONE_Katalog v12_EN.pdf2. Low Spray Pattern 10 3. Galvanised Washing Area 11 4. The Washing Area Consists of Two Elements 11 5. Angle

MobyDick Installation

Previous Lecture: Motifs

Universal Motifs

Sequence Motifs. Motifs Motifs represent a short common sequence –Regulatory motifs (TF binding sites) –Functional site in proteins (DNA binding motif)

Modernist Motifs

Symbols and Motifs How to identify symbols and motifs in literature

MobyDick y Hoy - Memoria Chilena: Portal · 2008. 1. 7. · CJ98G EL MERCURIO - Jueves 15 de Diciembre de 1994 MobyDick y EE.UU./ Hoy Los Estados Unidos de América encabe- zan 13

Vision Motifs

Mutiple Motifs Charles Yan Spring 2006. 2 Mutiple Motifs

Motifs Pastel

Modeling Regulatory Motifs

Motifs de Broderie

MobyDick ConLine KIT Plus rugged, functional, efficient ... - FB …€¦ · . MobyDick ConLine Wheel Washing System KIT Plus MobyDick ConLine | KIT Plus is a wide product range of

Coleccion Chao

Persian Motifs