Position-dependent motif characterization using Non-negative matrix Factorization (NMF)

Joel H Graber, Lucie N. Hutchins, Erik McCarthy, Sean Murphy, Priyam Singh
The Jackson Laboratory

In collaboration with: Thomas Blumenthal, University of Colorado; David Kulp, University of Massachusetts

Funding Sources
Current: NIH GM 072706, NIH HD037102
Previous: NIH RR 16463 (INBRE-Maine), NSF 2010 Project DBI 0331497


Page 1: Position-dependent motif characterization using Non-negative matrix Factorization (NMF)

Position-dependent motif characterization using Non-negative matrix Factorization (NMF)

Joel H Graber

Lucie N. Hutchins, Erik McCarthy, Sean Murphy, Priyam Singh

The Jackson Laboratory

In collaboration with: Thomas Blumenthal, University of Colorado; David Kulp, University of Massachusetts

Funding Sources Current: NIH GM 072706, NIH HD037102

Previous: NIH RR 16463 (INBRE-Maine), NSF 2010 Project DBI 0331497

Page 2

Motifs are often constrained in positioning

AUGCACAUAGAGGCAAUUGUGUAUCAAUAUUAAAAAUAAAGUAAAACUUA
AAGCAUGUGUAGACCGUGUGAUGAAUCCUUGUAUAAGCAACUGCCAAUGAAAUCGGGCUCGCUGUGGUCA
UCCGUGAGUGCUUAUCAUUCUGGUAAUACCGUGGUCUAUUUAUACAAAUAUUAAAAGUGCUGUUUAUAGA
GCCUGUGUCAUGUGGCAACUUCCUGUGUCAUGACCUCAGGAAAUAAAUUUCCUUGACUUUAUAAAAGCCA
AAACGUUUGCCCUCUUCCUUGGAAUUUGAAAUUACUCCAAUUUAAAAUAAAUUACUGGACUGUGGAAAUA
ACAUGUAGAAUUGCAGUUUUACACUGUAACAGUUGCUUCUGCCUACCUUAUAAAUAAAGAAUCACUAAGA
AAAAGAGUUCUCAGGUCUCCCUGAGCUCAGACUGAGGGGAAACGGAGGCAAAUAAAGCUGAGUUUUGAGA
ACUCGGUGGCCUGUGUUCCUAGCCUGUACUCACCCCUUCCCUUAAUAAUAAUAAAACAACAACUUUGUGA
AUUUGAGUUUUCCUUAGAGCUCAACAGAUCAUAUUCAGUGUCUUGAAUAAAUUGCUCUAUUUUGAUAUUA
GAGAACAUAGUGACUGUGUUUGGUACGAUUAUUUUUUUUAACUAAAAUGAGAUAAAAUUCUAUAUUCUUA
UGUGUGUGUGGUUUUUGAUGGGUGAAACUGUCUCAAUUUGAAUAAAUAUUUUUAUUGCAAUUCUGAACCA
AUUUUAAAAGAAAAGAUACAAAUGUCCUUCCAAAUAGAGCCUUUUUAUUAAUAAAGGGCCUUGUACUUCA
CUUGGAACAAAGGACGUUUCAUUUCAUUGUGUUAAAUGUAUACUUGUAAAUAAAAUAGCUGCAAACCUUA
AGCCUUUGAGCUACUUGGUGUAUCUCACUCGGUAUUACGUGCUCUGCAAUAGAAGUUGGUGUGAACAUUC
CCAGGUGACAUGCAGUGUUACCACCACCCCUCCAUCAGUAAGCCACUAAUAAAGUGCAUCUAUGCAGCCA
CAGGUCUGUCUGCCUCUUUUGGCUGGGCACCUUAAAAGAGAAGUCAAUAAACUGGGCUACACAGUACUUA
AAACGCUGAACUGGCUAAGAUGUGUAUUUAUGAAUAUUAAUGAAUAAAAACUGCUUGGAUGGUUUACCUA
ACUACUGCAUGAGGUUUUUUUCCUUUCUUUUCUCUCCACUCAAUAAAUACUUUAAAGCACAUUUGGAAUA
AAGGAAGAGACUUUUAAGUGGUGCUUAAUGAUAAGGUUUUGACUUGUUAAAUUAAACCAUUUGGAAUAUA
UUGUGUGUUUGUAGUAGUCAGUGCCUUUGUUUGUAAACCAAAAAGUAAUAAAUGAAUCCCUAUAUUUCUA
UUAUAGCAUCUAUUGUAUUUAAUAUAGUAUUUUAUUUAAGAAAAUAAACUUUGCAGUUUUUGCAUUGUGA
AUUCUCUCUCUUCCCGCCCACUGCCAUGAAAAAUGUUGUUUAUGGAAUAAAAAAAAUGUAACUGCCUUUA
AAUUUCCUGGUGGCUGUGUU

[Figure: sequences aligned on a functional site are counted into the PWC matrix, with M sequence words as rows and N position counts as columns.]
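The counting step on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_pwc_matrix`, the word length `k=4`, and the three toy sequences are all assumptions, since the slides do not state the word length or how the counts were implemented.

```python
from itertools import product

import numpy as np


def build_pwc_matrix(sequences, k=4, alphabet="ACGU"):
    """Count k-letter words by start position across aligned sequences.

    Returns (V, words). V has one row per word (M = len(alphabet) ** k)
    and one column per word start position (N = L - k + 1).
    The word length k is an assumption made for this sketch.
    """
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to a common length")
    n_positions = lengths.pop() - k + 1
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    row_of = {w: i for i, w in enumerate(words)}
    V = np.zeros((len(words), n_positions), dtype=float)
    for seq in sequences:
        for j in range(n_positions):
            row = row_of.get(seq[j:j + k])
            if row is not None:  # skip words containing unexpected symbols
                V[row, j] += 1
    return V, words


# Hypothetical toy input: three short aligned sequences.
toy = ["AAUAAAGU", "CAUAAAGC", "GAAUAAAC"]
V, words = build_pwc_matrix(toy, k=4)
print(V.shape)                     # (M, N)
print(V[words.index("AUAA")])      # positional counts for one word
```

Each row of `V` is the positional profile of one word, so a word that clusters at a fixed offset from the functional site shows up as a peak in its row.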

Page 3

NMF decomposes the PWC matrix into characteristic patterns (motifs)

$V = W \cdot H$

V: Counts (M x N); W: Bases (M x r); H: Weights (r x N)

$W_{ik}$ = weight of the i-th word in the k-th motif (content)

$H_{kj}$ = abundance of the k-th motif at the j-th position (positioning)

r = number of basis functions (patterns)
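A minimal sketch of the decomposition, assuming the standard multiplicative update rules for the squared-error objective. The slides do not say which NMF algorithm was used, so the update rule, the function name `nmf`, the iteration count, and the toy data are assumptions for illustration only.

```python
import numpy as np


def nmf(V, r, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative (M x N) matrix V into W (M x r) and H (r x N).

    Uses multiplicative updates for the squared-error objective; this
    choice of algorithm is an assumption, not taken from the slides.
    """
    rng = np.random.default_rng(seed)
    M, N = V.shape
    W = rng.random((M, r)) + eps
    H = rng.random((r, N)) + eps
    for _ in range(n_iter):
        # Both updates multiply by non-negative ratios, so W and H
        # stay non-negative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical toy data: two motifs, each confined to its own positions.
rng = np.random.default_rng(1)
W_true = rng.random((20, 2))       # word content of each motif
H_true = np.zeros((2, 30))         # positioning of each motif
H_true[0, 5:10] = 1.0
H_true[1, 20:25] = 1.0
V = W_true @ H_true

W, H = nmf(V, r=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(rel_err)
```

Columns of `W` can be read as motif content (which words carry weight) and rows of `H` as motif positioning (where along the sequence each motif is abundant).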

Page 4

Synthetic data verifies NMF performance

Page 5

RSS provides a robust estimate for the optimal number of vectors (r)

[Figure: residue plotted against basis vector count (r = 3 to 13). One panel shows Residue (Test Matrixes) for Test matrix 1 and Test matrix 2; the other shows Residue (Test Sequences) for artificial sequences and Residue (Human PolyA Sites) for human polyA sequences.]

$$\mathrm{RSS} = \sum_{i,j} \left( V_{ij} - (WH)_{ij} \right)^2$$
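The residue scan can be sketched by computing the RSS for a range of r values and looking for where the curve levels off. This is an illustration under stated assumptions: the multiplicative-update `nmf` helper, the synthetic four-motif matrix, and the range of r are all hypothetical, and the slides do not specify how the optimal r is read from the curve.

```python
import numpy as np


def nmf(V, r, n_iter=300, eps=1e-9, seed=0):
    """Multiplicative-update NMF (an assumed algorithm, see lead-in)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], r)) + eps
    H = rng.random((r, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def rss(V, W, H):
    """Residual sum of squares between V and its reconstruction W @ H."""
    return float(np.sum((V - W @ H) ** 2))


# Hypothetical synthetic matrix built from four position-specific motifs.
rng = np.random.default_rng(2)
W_true = rng.random((20, 4))
H_true = np.zeros((4, 40))
for k in range(4):
    H_true[k, 10 * k:10 * (k + 1)] = 1.0
V = W_true @ H_true

# Scan the number of basis vectors and record the residue for each.
scores = {r: rss(V, *nmf(V, r)) for r in range(2, 8)}
for r, s in scores.items():
    print(r, round(s, 4))
```

On data like this the residue is expected to fall as r approaches the number of planted motifs and change little beyond it, which is the kind of behavior the plots on this slide use to choose r.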

Page 6

NMF can characterize complex control sequences

[Figure panels: Mouse 3’-processing sequences; Human transcription start sites]