pattern matching 2003lec31

  • 8/9/2019 pattern matching 2003lec31

    1/20

    Pattern Matching

    Rhys Price Jones

    Anne R. Haake

    What is pattern matching?

    Pattern matching is the procedure of scanning a nucleic acid or protein sequence for matches to short sequence patterns (Staden).
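As a minimal sketch of what "scanning a sequence for matches to a short pattern" can look like in practice, the function below reports every exact occurrence of a pattern. The function name and the example sequence are hypothetical and chosen only for illustration.

```python
def find_pattern(sequence: str, pattern: str) -> list[int]:
    """Return the 0-based start of every exact match, overlapping ones included."""
    positions = []
    start = sequence.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume one base to the right so overlapping matches are not skipped.
        start = sequence.find(pattern, start + 1)
    return positions


# Hypothetical sequence and pattern, chosen only for illustration.
print(find_pattern("ATGATATATGC", "ATA"))  # [3, 5]
```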

    Why search for patterns?

    Usually the sequences of interest (the query sequences) are known to be indicators of some important biological function

    Search for patterns in nucleotide sequence

    - DNA or RNA

    Search for patterns in amino acid sequence

    Motif

    Multiple uses of the word

    Def: a pattern; typically used to refer to a short (up to ten bases or residues) repeated or conserved pattern in nucleic acids or proteins

    Def: a short conserved sequence in a protein, usually associated with function

    - In a broader sense, motif is used for all localized regions of homology, regardless of size

    Some examples of patterns in DNA sequence:

    Restriction sites: recognition sites for the restriction endonucleases

    Intron splice sites

    Codons specifying ORFs

    Promoters

    DNA binding sites for regulatory proteins

    Restriction Sites

    Why identify them?

    Exact or inexact matches?

    Examples:

    Restriction sites
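One way to explore the "exact or inexact" question is sketched below. The two enzymes are assumptions brought in for illustration and do not come from the slides: EcoRI (GAATTC) has a fully specified site, while HincII (GTYRAC) uses IUPAC ambiguity codes (Y = C or T, R = A or G), so several concrete sequences match it. The helper names and the test sequence are hypothetical.

```python
import re

# Subset of the IUPAC nucleotide codes, enough for the two sites below.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def site_to_regex(site: str) -> re.Pattern:
    """Translate a recognition site into a regex; the lookahead allows overlaps."""
    body = "".join(IUPAC[base] for base in site)
    return re.compile("(?=(" + body + "))")


def find_sites(sequence: str, site: str) -> list[int]:
    """Return 0-based start positions of every match to the recognition site."""
    return [m.start() for m in site_to_regex(site).finditer(sequence)]


dna = "TTGAATTCAAGTCGACGG"          # hypothetical sequence
print(find_sites(dna, "GAATTC"))    # EcoRI, fully specified site
print(find_sites(dna, "GTYRAC"))    # HincII, degenerate site
```

Only the forward strand is scanned here; both example sites are palindromic, so that is sufficient for them, but a non-palindromic site would also need a scan of the reverse complement.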

    Splice Sites

    Splice donor and splice acceptor are consensus sequences

    - A statistical determination of the pattern; approximates the pattern

    (C or A)AG/GT(A or G)AGT "donor" splice site

    (T or C)nN(C or T)AG/G "acceptor" splice site

    Splice site example
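Because a consensus only approximates the real sites, a scan that tolerates disagreement can be more useful than an exact match. The sketch below reads the donor consensus as (C or A)AG/GT(A or G)AGT and counts, for each 9-base window, how many positions agree with it. The scoring scheme, the threshold, and all names are illustrative assumptions, not a method given in the slides.

```python
# Allowed bases at each of the 9 donor positions: (C or A)AG / GT(A or G)AGT.
DONOR = ["CA", "A", "G", "G", "T", "AG", "A", "G", "T"]
EXON_SIDE = 3  # number of consensus positions that lie before the "/" boundary


def consensus_score(window: str, consensus: list[str]) -> int:
    """Count the positions where the window's base is allowed by the consensus."""
    return sum(base in allowed for base, allowed in zip(window, consensus))


def donor_candidates(sequence: str, min_score: int = 8) -> list[tuple[int, int]]:
    """Return (boundary_index, score) for windows scoring at least min_score."""
    hits = []
    for i in range(len(sequence) - len(DONOR) + 1):
        score = consensus_score(sequence[i:i + len(DONOR)], DONOR)
        if score >= min_score:
            hits.append((i + EXON_SIDE, score))
    return hits


print(donor_candidates("TTCAGGTAAGTCC"))  # perfect match to the consensus
print(donor_candidates("TTCAGGTAAGCCC"))  # one position disagrees
```

Lowering `min_score` admits more divergent candidates at the cost of more false positives.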

    Splice Sites

    Remember that they are consensus sequences

    Why are splice sites of interest?

    - Gene finding

    - Mutations in the consensus sequence at the splice junctions are common in many inherited disorders

    Ex: thalassemias, muscular dystrophy, Tay-Sachs, neurofibromatosis, Darier's disease...

    One of the thalassemias: mutation at splice acceptor

    YYYNCAG| normal

    YYYNCGG| mutant
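The normal and mutant patterns above differ at a single position. A small sketch of how that could be detected: compare a sequence against YYYNCAG position by position, taking Y as a pyrimidine (C or T) and N as any base. The two concrete sequences are hypothetical instances of the patterns, and the function name is an assumption.

```python
# IUPAC codes used by the acceptor pattern: Y = pyrimidine, N = any base.
ALLOWED = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "N": "ACGT"}


def pattern_mismatches(sequence: str, pattern: str) -> list[int]:
    """Return the 0-based positions where the sequence violates the pattern."""
    if len(sequence) != len(pattern):
        raise ValueError("sequence and pattern must have the same length")
    return [
        i for i, (base, code) in enumerate(zip(sequence, pattern))
        if base not in ALLOWED[code]
    ]


ACCEPTOR = "YYYNCAG"
print(pattern_mismatches("TCTACAG", ACCEPTOR))  # hypothetical normal site
print(pattern_mismatches("TCTACGG", ACCEPTOR))  # hypothetical mutant, A -> G
```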

    Codons Specifying ORFs

    ORFs (open reading frames)

    Start codon ...
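The slide text is cut off here, so the following is only a sketch under standard assumptions: ATG as the start codon and TAA, TAG, TGA as stop codons, scanning the forward strand only. The function name and the example sequence are hypothetical.

```python
import re

# ATG, then as few whole codons as possible, then the first in-frame stop codon.
# The outer lookahead lets the scan restart at every ATG, including nested ones.
ORF_PATTERN = re.compile(r"(?=(ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)))")


def find_orfs(sequence: str) -> list[tuple[int, str]]:
    """Return (start_index, orf_sequence) for each ATG with an in-frame stop."""
    return [(m.start(), m.group(1)) for m in ORF_PATTERN.finditer(sequence)]


print(find_orfs("CCATGAAATAGGG"))  # [(2, 'ATGAAATAG')]
```

A fuller ORF finder would also scan the reverse complement and usually apply a minimum length.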

    Promoters

    Prokaryotic promoters: consensus sequences

    TTGACA ---- 17±1 ---- TATAAT
     -35                   -10

    Eukaryotic promoters - TATA box at -25 relative to transcriptional start site; consensus is ...
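A sketch of how the prokaryotic consensus could be searched inexactly: look for the -35 hexamer TTGACA and the -10 hexamer TATAAT separated by a spacer, and allow a few mismatches overall. The spacer range of 16 to 18 bases is an assumption (reading the spacing above as 17 plus or minus 1), and the mismatch budget and names are illustrative.

```python
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def promoter_candidates(sequence, max_mismatches=2, spacers=(16, 17, 18)):
    """Return (pos_35, pos_10, mismatches) for each candidate hexamer pair."""
    hits = []
    for i in range(len(sequence) - len(MINUS_35) + 1):
        miss_35 = hamming(sequence[i:i + 6], MINUS_35)
        for gap in spacers:
            j = i + 6 + gap                      # start of the -10 hexamer
            if j + 6 > len(sequence):
                continue
            total = miss_35 + hamming(sequence[j:j + 6], MINUS_10)
            if total <= max_mismatches:
                hits.append((i, j, total))
    return hits


# Hypothetical sequence: perfect hexamers with a 17-base spacer of A's.
seq = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC"
print(promoter_candidates(seq, max_mismatches=0))  # [(2, 25, 0)]
```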

    Transcription Factor Binding Sites

    Regulatory transcription factors are

    sequence

    Some examples of patterns in protein sequences (motifs):

    Prediction of secondary and tertiary structure

    - e.g. transcription factors

    helix

    Exact vs Inexact (Approximate) Pattern Matching

    Exact Pattern Matching

    - Limited use in bioinformatics

    - Well

    Other uses of exact pattern matching?

    Check PCR primers?

    Annotation? (text matching)
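Checking PCR primers is a natural fit for exact matching. In the sketch below the forward primer is searched for directly on the template, and the reverse primer is searched for via its reverse complement, since it anneals to the opposite strand. The template, primers, and function names are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Complement each base, then reverse the result."""
    return sequence.translate(COMPLEMENT)[::-1]


def amplicon(template: str, forward: str, reverse: str):
    """Return the amplified region if both primers match exactly, else None."""
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer's binding site appears on this strand as its reverse complement.
    site = template.find(reverse_complement(reverse), start + len(forward))
    if site == -1:
        return None
    return template[start:site + len(reverse)]


template = "AAGGCTTACGATCCGTA"                  # hypothetical template
print(amplicon(template, "GGCTT", "CGGAT"))     # GGCTTACGATCCG
```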

    Why search for patterns?

    Pattern matching in sequences is also the basis of searching through a sequence database

    - Sequence alignment

    Pairwise Sequence Alignment

    An alignment between 2 sequences is a pairwise match between sequences.

    Pairwise sequence comparison is the primary means of linking biological function to the genome and of propagating known information from one genome to another (Gibas & Jambeck).
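To make "a pairwise match between sequences" concrete, here is a sketch of a global alignment score computed with a Needleman-Wunsch style dynamic program. The slides do not name an algorithm or a scoring scheme; the technique and the values used here (match +1, mismatch -1, gap -1) are assumptions for illustration.

```python
def global_alignment_score(a: str, b: str, match=1, mismatch=-1, gap=-1) -> int:
    """Best global alignment score of a and b under a simple linear gap penalty."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # 7, identical sequences
print(global_alignment_score("ACGT", "AGT"))         # 2, three matches and one gap
```

Only the score is returned; recovering the alignment itself would require keeping the full table and tracing back through it.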

    Why are inexact pattern matches relevant in sequence alignments?

    Sequencing errors

    Mutation - 2 primary types

    point mutations (affect a single nucleotide)

    segmental mutations (affect a few to hundreds of adjoining nucleotides)

    - substitutions (transitions, transversions)

    - insertions, deletions
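The substitution types listed above can be told apart mechanically: a transition swaps a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T), and a transversion crosses the two classes. The sketch below classifies the differences between two equal-length, already aligned sequences; the function name and the example sequences are hypothetical, and insertions and deletions are out of scope.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitutions(reference: str, variant: str):
    """Return (position, ref_base, var_base, kind) for every differing position."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    for pos, (ref, var) in enumerate(zip(reference, variant)):
        if ref == var:
            continue
        # Both bases in the same chemical class means a transition.
        same_class = {ref, var} <= PURINES or {ref, var} <= PYRIMIDINES
        changes.append((pos, ref, var, "transition" if same_class else "transversion"))
    return changes


print(classify_substitutions("ACGT", "GCTT"))
```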

    Mutations

    Point mutations usually occur from a nucleotide mismatch that becomes "fixed" during the process of replication

    - Escapes the DNA repair mechanism

    Significant when they occur within a coding region and also cause a change in functionality

    - Non

    Evolutionary Considerations

    Through time, mutations tend to be preserved if they are not deleterious

    Functionally important sequences tend to be conserved

    Non

    Evolutionary Considerations

    The tendency of functionally important sequences to remain relatively unchanged over time is the basis for sequence analysis

    - Allows us to draw evolutionary connections among genes that are related in sequence