Finding Subtle Motifs by Branching from Sample Strings Xuan Qi Computer Science Dept. Utah State Univ.


Page 1: Finding Subtle Motifs by Branching from Sample Strings

Finding Subtle Motifs by Branching from Sample Strings

Xuan Qi, Computer Science Dept.

Utah State Univ.

Page 2: Finding Subtle Motifs by Branching from Sample Strings

Preface
This presentation is based on the paper “Finding subtle motifs by branching from sample strings” by Alkes Price, Sriram Ramabhadran and Pavel A. Pevzner.

Page 3: Finding Subtle Motifs by Branching from Sample Strings

Outline
1. Motif finding problem.
2. Methods that have been proposed to address this problem.
3. The contribution of the method presented in this paper.
4. The algorithms proposed in this paper.
5. Experiment results.
6. Discussion of the advantages and disadvantages of the method proposed in this paper.
7. Future research direction.

Page 4: Finding Subtle Motifs by Branching from Sample Strings

Motif Finding Problem
Given a set of DNA sequences, find a set of l-mers, one from each sequence, that maximizes the consensus score.

Input: A t × n matrix of DNA (t sequences of length n), and l, the length of the pattern to find.
Output: An array of t starting positions s = (s1, s2, …, st) maximizing Score(s, DNA).

Subtle motif: a motif whose best alignment has a low score, so the pattern does not stand out among the sequences and is harder to identify.
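A minimal sketch of how Score(s, DNA) could be computed. The slides do not define the consensus score, so this assumes the common definition: align the chosen l-mers and sum, over the l columns, the count of the most frequent nucleotide. The function name `consensus_score` and the 0-based starting positions are illustrative choices.

```python
from collections import Counter

def consensus_score(dna, starts, l):
    """Sum over alignment columns of the count of the most common nucleotide.

    dna    -- list of t sequences (strings)
    starts -- list of t 0-based starting positions, one per sequence
    l      -- motif length
    """
    # Cut one l-mer out of each sequence at its chosen starting position.
    lmers = [seq[s:s + l] for seq, s in zip(dna, starts)]
    # zip(*lmers) iterates over the columns of the alignment.
    return sum(max(Counter(column).values()) for column in zip(*lmers))

dna = ["ACGTACGT", "TTACGTAA", "GGACGAAC"]
# The chosen l-mers are ACGT, ACGT, ACGA: three unanimous columns and one 2-of-3 column.
print(consensus_score(dna, [0, 2, 2], 4))  # 11
```

The maximum possible score is l × t, reached when all chosen l-mers are identical; a subtle motif scores well below that.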

Page 5: Finding Subtle Motifs by Branching from Sample Strings

Methods Proposed

Category 1: Searching possible starting positions of the motif
Methods: CONSENSUS, Gibbs sampling
Disadvantages: The search space is very large, and these methods do not always find the optimal motif.

Category 2: Searching possible samples of the motif
Methods: Vanet et al. 2000, Marsan and Sagot 2000, Pavesi et al. 2001, Apostolico et al. 2002, Eskin and Pevzner 2002
Advantages: Reduces the search space.
Disadvantages: Still computationally expensive, especially for long motifs. The selected sample may converge only to a local optimum rather than the global optimum.

An alternative: the extended sample-driven approach, which exhaustively searches the neighbors of all samples.

Page 6: Finding Subtle Motifs by Branching from Sample Strings

Contribution of This Paper
Basic idea: branching from the sample strings.
Contribution: Much more efficient than previous algorithms, and very effective at finding subtle motifs.

Page 7: Finding Subtle Motifs by Branching from Sample Strings

Comparison between the Methods

Page 8: Finding Subtle Motifs by Branching from Sample Strings

The Algorithms Proposed

Two ways to model a motif: 1. as a pattern

2. as a profile: 4*l matrix

Two algorithms proposed: 1. Pattern-Branching algorithm

2. Profile-Branching algorithm

Page 9: Finding Subtle Motifs by Branching from Sample Strings

Pattern-Branching Algorithm

Distance between a motif $M$ and a sample $A_0$: $d(M, A_0) = k$, the number of positions at which they differ.
$D_{=k}(A_0)$: the set of patterns at distance exactly $k$ from $A_0$.
Neighbor: a pattern in $D_{=1}(A)$, obtained by changing a single nucleotide of $A$.
E.g., ATTGCCAG has the neighbors ATTGCCTG and GTTGCCAG.

Score of a pattern: its total distance from the sequences.
1. For each sequence $s_i$, $d(A, s_i) = \min\{d(A, P) \mid P \in s_i\}$, where $P$ ranges over the $l$-mers (patterns of length $l$) in $s_i$.
2. The total distance of $A$ from $S$ is $d(A, S) = \sum_{s_i \in S} d(A, s_i)$.

BestNeighbor(A): the pattern $B \in D_{=1}(A)$ with the lowest total distance $d(B, S)$.
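The definitions above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: distance is taken to be Hamming distance, and the function names and the toy sequences are made up for the example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def dist_to_sequence(pattern, seq):
    """d(A, s_i): distance from the pattern to its closest l-mer in seq."""
    l = len(pattern)
    return min(hamming(pattern, seq[i:i + l]) for i in range(len(seq) - l + 1))

def total_distance(pattern, seqs):
    """d(A, S): sum of the per-sequence distances."""
    return sum(dist_to_sequence(pattern, s) for s in seqs)

def neighbors(pattern, alphabet="ACGT"):
    """All patterns at distance exactly 1: change one nucleotide."""
    for i, old in enumerate(pattern):
        for new in alphabet:
            if new != old:
                yield pattern[:i] + new + pattern[i + 1:]

def best_neighbor(pattern, seqs):
    """The neighbor with the lowest total distance to the sequences."""
    return min(neighbors(pattern), key=lambda b: total_distance(b, seqs))

# Toy data (hypothetical): the three sequences contain ATTGCCAG,
# ATTGCCTG and GTTGCCAG respectively.
seqs = ["CCATTGCCAGTT", "GGATTGCCTGAA", "TGTTGCCAGCCC"]
print(total_distance("ATTGCCAG", seqs))  # 2
print(best_neighbor("ATTGCCAG", seqs))
```

A pattern of length l over {A, C, G, T} has 3l neighbors, so each BestNeighbor call scores 3l candidates against every sequence.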

Page 10: Finding Subtle Motifs by Branching from Sample Strings

Pattern-Branching Algorithm
Input: A set of sequences S, the length of the motif l, and the number of mutations k.
Output: The best-scoring motif pattern of length l found within k mutations of a sample string.
The sample strings are the l-mers occurring in the sequences of S.
Algorithm:

PatternBranching(S, l, k)
    M ← arbitrary motif pattern
    for each l-mer A0 in S
        for j ← 0 to k
        {
            if d(Aj, S) < d(M, S)
                M ← Aj
            Aj+1 ← BestNeighbor(Aj)
        }
    Output M
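A runnable sketch of the Pattern-Branching loop, assuming Hamming distance and the simple BestNeighbor that considers every single-nucleotide change. The "arbitrary motif pattern" is represented here by `None` with an infinite score, and the toy sequences are hypothetical; neither detail comes from the slides.

```python
def total_distance(pattern, seqs):
    """d(A, S): for each sequence, the Hamming distance to the closest l-mer."""
    l = len(pattern)
    return sum(
        min(sum(a != b for a, b in zip(pattern, s[i:i + l]))
            for i in range(len(s) - l + 1))
        for s in seqs
    )

def best_neighbor(pattern, seqs):
    """Single-nucleotide variant of pattern with the lowest total distance."""
    candidates = (
        pattern[:i] + c + pattern[i + 1:]
        for i in range(len(pattern))
        for c in "ACGT"
        if c != pattern[i]
    )
    return min(candidates, key=lambda b: total_distance(b, seqs))

def pattern_branching(seqs, l, k):
    motif, motif_score = None, float("inf")  # stands in for an arbitrary start
    for s in seqs:
        for start in range(len(s) - l + 1):
            a = s[start:start + l]           # sample string A0
            for j in range(k + 1):           # A0, A1, ..., Ak
                score = total_distance(a, seqs)
                if score < motif_score:
                    motif, motif_score = a, score
                if j < k:                    # no need to branch past Ak
                    a = best_neighbor(a, seqs)
    return motif, motif_score

seqs = ["CCATTGCCAGTT", "GGATTGCCTGAA", "TGTTGCCAGCCC"]
print(pattern_branching(seqs, l=8, k=1))
```

Each sample string triggers only k BestNeighbor calls, which is what keeps the search far smaller than enumerating every pattern within distance k of every sample.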

Page 11: Finding Subtle Motifs by Branching from Sample Strings

Profile-Branching Algorithm

Similar to Pattern-Branching, with some changes:
1. convert each sample string to a profile X(A0)

2. generalize the scoring method to score profiles

3. modify the branching method to apply to profiles

4. use the top-scoring profile we find as a seed to the EM algorithm

Page 12: Finding Subtle Motifs by Branching from Sample Strings

Profile-Branching Algorithm

Convert a sample string to a profile X(A0):

      A    T    G    C    C    A    T
A    1/2  1/6  1/6  1/6  1/6  1/2  1/6
T    1/6  1/2  1/6  1/6  1/6  1/6  1/2
G    1/6  1/6  1/2  1/6  1/6  1/6  1/6
C    1/6  1/6  1/6  1/2  1/2  1/6  1/6
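The conversion shown in the table can be sketched as follows: the nucleotide present in the sample string gets probability 1/2 in its column and each of the other three gets 1/6. The function name `sample_to_profile` and the dict-of-rows layout are illustrative assumptions.

```python
from fractions import Fraction

def sample_to_profile(sample, alphabet="ATGC"):
    """Build a 4 x l profile: 1/2 for the sample's nucleotide, 1/6 otherwise."""
    hit, miss = Fraction(1, 2), Fraction(1, 6)
    # One row per nucleotide v, one entry per position w of the sample.
    return {v: [hit if c == v else miss for c in sample] for v in alphabet}

profile = sample_to_profile("ATGCCAT")
for v, row in profile.items():
    print(v, " ".join(str(x) for x in row))
```

Exact fractions are used only so the printed rows match the table; floats work equally well for scoring.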

Page 13: Finding Subtle Motifs by Branching from Sample Strings

Profile-Branching Algorithm Use entropy to score profiles:

Given a profile $X = (x_{vw})$ and a pattern $P = p_1 \dots p_l$, let $e(X, P)$ be the log probability of sampling $P$ from $X$, i.e. $e(X, P) = \sum_{w=1}^{l} \log x_{p_w w}$.

Profile X:

      A    T    G    C    C    A    T
A    1/2  1/6  1/6  1/6  1/6  1/2  1/6
T    1/6  1/2  1/6  1/6  1/6  1/6  1/2
G    1/6  1/6  1/2  1/6  1/6  1/6  1/6
C    1/6  1/6  1/6  1/2  1/2  1/6  1/6

Pattern P and the probability of each of its nucleotides under X:

      G    T    G    A    C    A    T
     1/6  1/2  1/2  1/6  1/2  1/2  1/2
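A small sketch that evaluates e(X, P) for the example on this slide. It assumes natural logarithms (the slides only say "log"), and `log_prob` is an illustrative name.

```python
import math

def log_prob(profile, pattern):
    """e(X, P): sum over positions w of log x[p_w][w]."""
    return sum(math.log(profile[p][w]) for w, p in enumerate(pattern))

# Profile of the sample string ATGCCAT: 1/2 on the sample's nucleotide, 1/6 elsewhere.
profile = {v: [0.5 if c == v else 1 / 6 for c in "ATGCCAT"] for v in "ATGC"}

# GTGACAT differs from ATGCCAT at two positions, so two factors of 1/6 and five of 1/2.
print(round(log_prob(profile, "GTGACAT"), 3))  # -7.049
```

Every mismatch against the sample string replaces a log(1/2) term with log(1/6), so e(X, P) decreases with each position where P departs from the profile's preferred nucleotide.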

Page 14: Finding Subtle Motifs by Branching from Sample Strings

Profile-Branching Algorithm
For each sequence $S_i$ in the sample $S = \{S_1, \dots, S_n\}$, let $e(X, S_i) = \max\{e(X, P) \mid P \in S_i\}$. Then the entropy score of $X$ is $e(X, S) = \sum_{S_i \in S} e(X, S_i)$.
Intuitively, $e(X, S)$ describes how well $X$ matches its best occurrence in each sequence of the sample.
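A sketch of the entropy score over a set of sequences, again assuming natural logarithms. The helper names and the two toy sequences are hypothetical.

```python
import math

def log_prob(profile, pattern):
    """e(X, P): log probability of sampling pattern from the profile."""
    return sum(math.log(profile[p][w]) for w, p in enumerate(pattern))

def best_occurrence(profile, seq):
    """e(X, S_i): the best-scoring l-mer of one sequence."""
    l = len(next(iter(profile.values())))  # number of columns in the profile
    return max(log_prob(profile, seq[i:i + l]) for i in range(len(seq) - l + 1))

def entropy_score(profile, seqs):
    """e(X, S): sum of the best occurrence scores over all sequences."""
    return sum(best_occurrence(profile, s) for s in seqs)

profile = {v: [0.5 if c == v else 1 / 6 for c in "ATGCCAT"] for v in "ATGC"}
seqs = ["TTATGCCATGG", "ATGCCAT"]  # both contain ATGCCAT exactly
print(entropy_score(profile, seqs))  # equals 14 * log(1/2)
```

Unlike the total distance used by Pattern-Branching, where lower is better, a higher entropy score means a better match.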

Page 15: Finding Subtle Motifs by Branching from Sample Strings

Profile-Branching Algorithm
Branching from the sample string:
1. Amplify only one column of the profile (which corresponds to one position in the sample pattern), and amplify a nucleotide $v$ only if $x_{vw} < 0.5$.
2. Choose the amplified column $x'_w$ so that the relative entropy $\sum_v x_{vw} \log(x'_{vw} / x_{vw}) = \rho$. We use $\rho = -0.3$.

Before amplification:

      A    T    G    C    C    A    T
A    1/2  1/6  1/6  1/6  1/6  1/2  1/6
T    1/6  1/2  1/6  1/6  1/6  1/6  1/2
G    1/6  1/6  1/2  1/6  1/6  1/6  1/6
C    1/6  1/6  1/6  1/2  1/2  1/6  1/6

After amplifying T in the first column:

      A     T    G    C    C    A    T
A    0.27  1/6  1/6  1/6  1/6  1/2  1/6
T    0.55  1/2  1/6  1/6  1/6  1/6  1/2
G    0.09  1/6  1/2  1/6  1/6  1/6  1/6
C    0.09  1/6  1/6  1/2  1/2  1/6  1/6
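One way the amplification step could be realized, as a sketch. The slides give the constraint (relative entropy of -0.3) but not the update rule, so this assumes that the other three entries of the column are scaled by a common factor c and the amplified nucleotide takes the remainder; that assumption is inferred from the example numbers, where 1/2 → 0.27 and 1/6 → 0.09 share the same ratio. Natural logarithms are assumed, and the factor is found by bisection. The names `amplify` and `rho` are illustrative.

```python
import math

def relative_entropy(old, new):
    """sum_v old[v] * log(new[v] / old[v]); zero when unchanged, negative otherwise."""
    return sum(old[v] * math.log(new[v] / old[v]) for v in old)

def amplify(column, v, rho=-0.3, iters=60):
    """Raise the probability of nucleotide v in one profile column.

    Assumption: the other entries shrink by a common factor c in (0, 1).
    The relative entropy increases monotonically in c on (0, 1], so
    bisection finds the c at which it equals rho.
    """
    if column[v] >= 0.5:
        raise ValueError("only nucleotides with probability < 0.5 are amplified")

    def rescaled(c):
        new = {u: c * x for u, x in column.items() if u != v}
        new[v] = 1.0 - sum(new.values())
        return new

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if relative_entropy(column, rescaled(mid)) < rho:
            lo = mid   # shrunk too much, move c up
        else:
            hi = mid
    return rescaled(hi)

first_column = {"A": 0.5, "T": 1 / 6, "G": 1 / 6, "C": 1 / 6}
new_column = amplify(first_column, "T")
print({v: round(x, 2) for v, x in new_column.items()})
print(round(relative_entropy(first_column, new_column), 3))
```

Under these assumptions the result lands within about 0.01 of the slide's amplified column (0.27, 0.55, 0.09, 0.09); the small difference is consistent with rounding in the slide.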

Page 16: Finding Subtle Motifs by Branching from Sample Strings

Profile-Branching Algorithm
Algorithm:

ProfileBranching(S, l, k)
    Motif ← arbitrary motif profile
    for each l-mer A0 in S
    {
        X0 ← X(A0)
        for j ← 0 to k
        {
            if e(Xj, S) > e(Motif, S)
                Motif ← Xj
            Xj+1 ← BestNeighbor(Xj)
        }
    }
    Run EM algorithm with Motif as seed

Page 17: Finding Subtle Motifs by Branching from Sample Strings

Results on Implanted Motifs
Pattern-Branching algorithm vs. previous pattern-based motif finding algorithms

WINNOWER, SP-STAR: unable to find subtle motifs

PROJECTION, MITRA, MULTIPROFILER

Page 18: Finding Subtle Motifs by Branching from Sample Strings

Results on Implanted Motifs
Profile-Branching algorithm vs. previous profile-based motif finding algorithms

Performance coefficient: Let K be the set of positions of the n implanted motif occurrences, and let P be the set of predicted motif positions. The performance coefficient is defined to be |K ∩ P| / |K ∪ P|.
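A short sketch of the performance coefficient. It assumes that K and P are sets of individual nucleotide positions, written here as (sequence index, position) pairs covered by each length-l occurrence; the helper names and the example starting positions are hypothetical.

```python
def covered_positions(starts, l):
    """Set of (sequence index, position) pairs covered by one l-mer per sequence."""
    return {(i, s + offset) for i, s in enumerate(starts) for offset in range(l)}

def performance_coefficient(known, predicted):
    """|K ∩ P| / |K ∪ P|; 1.0 means the prediction is exactly right."""
    known, predicted = set(known), set(predicted)
    union = known | predicted
    return len(known & predicted) / len(union) if union else 0.0

# Two sequences, motif length 4: the first prediction is exact,
# the second is shifted by two positions.
K = covered_positions([3, 10], 4)
P = covered_positions([3, 12], 4)
print(performance_coefficient(K, P))  # 6 shared positions out of 10 -> 0.6
```

Because the measure counts positions rather than whole occurrences, a prediction that is slightly shifted still earns partial credit.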

Page 19: Finding Subtle Motifs by Branching from Sample Strings

Results on Biological Samples Pattern-Branching Algorithm:

Profile-Branching Algorithm: The pattern returned by Profile-Branching matches the reference motif.

Page 20: Finding Subtle Motifs by Branching from Sample Strings

Discussion

Advantages: Much more efficient than previous algorithms. Very effective at finding subtle motifs.

Disadvantages:
1. Pattern-Branching has difficulty finding motifs with many degenerate positions, although Profile-Branching handles them well.
2. Profile-Branching is very effective at finding subtle motifs but is comparatively slow.

Page 21: Finding Subtle Motifs by Branching from Sample Strings

Future Work
Apply the Pattern-Branching and Profile-Branching algorithms to more challenging biological samples:
1. Larger samples
2. Corrupted samples

Extend the algorithms to motif finding over an alphabet that includes not only A, T, G, and C but also purine (R), pyrimidine (Y), weak bond (W), and strong bond (S).