Part I ¢â‚¬â€œ Sequence analysis (DNA)bio- Part I ¢â‚¬â€œ Sequence analysis (DNA) : Bioinformatics Software

  • View
    1

  • Download
    0

Embed Size (px)

Text of Part I ¢â‚¬â€œ Sequence analysis (DNA)bio- Part I...

  • 1

    Part I – Sequence analysis (DNA) : Bioinformatics Software

    Chen Xin National University of Singapore

    Bioinformatics software

    Its role in research:

    n Hypothesis- driven research cycle in biology (From Kitano H. Systems biology: a brief overview. Science 2002, 295:1662-4)

  • 2

    Bioinformatics software

    n Cyclical refinement of predictive computer models used to define further biological experiments, including the optimization step.

    n From Brusic et al. 2001, Efficient discovery of immune response targets by cyclical refinement of QSAR models of peptide binding. J. Mol. Graph. Model. 19:405-11, 467

    Bioinformatics software

    n By combining computational methods with experimental biology, major discoveries can be made faster and more efficiently.

    n Today, every large molecular or systems biology project has a bioinformatics component.

    n Use of biological software allows biologists to extend their set of skills for more efficient and more effective analysis of their data, and for planning of experiments.

  • 3

    Genetic information

    n Genetic information carrier n DNA or RNA

    n Genetic information carried n Sequence

    n Hence:

    Life = Life = f f (Sequence)(Sequence)

    New drug discovery

    A drug = Target identification -> Lead discovery ->Lead optimization -> animal trial -> clinical trail

    Target: Key to disease development Specific to disease development Sequence, Sample protein, 3D structure …

  • 4

    DNA sequence analysis Types of analysis: n GC content n Pattern analysis n Translation (Open Reading Frame detection) n Gene finding n Mutation n Primer design n Restriction map n ……

    When you have a sequence

    n Is it likely to be a gene? n What is the possible expression level? n What is the possible protein product? n Can we get the protein product? n Can we figure out the key residue in the

    protein product? n ……

  • 5

    GC content

    n Stability n GC: 3 hydrogen bonds n AT: 2 hydrogen bonds

    n Codon preference n GC rich fragment

    à Gene

    GC Content

    n CpG island n Resistance to methylation n Associated with genes which are frequently

    switched on n Estimate: ½ mammalian gene have CpG

    island n Most mammalian housekeeping genes

    have CpG island at 5’ end

  • 6

    GC content

    n GC Content: n Emboss -> CompSeq n Emboss -> GEECEE n Bioedit

    n CpG Island: n http://l25.itba.mi.cnr.it/genebin/wwwcpg.pl (Italy) n Emboss -> CpGReport

    Pattern analysis

    n Patterns in the sequence n Associated with certain biological

    function n Transcription factor binding n Transcription starting n Transcription ending n Splicing n ……

  • 7

    Gene finding

    n A kind of pattern search n Gene structure

    n Promoter, Exon, Intron n Promoter: TATA box (TATAAT) n Exon: Open Reading Frame (ORF) n Intron: Only eukaryotes, have splicing signal n Other motifs

    Gene

    Picture from the LSM2104 Practical, V.B. LIT

  • 8

    Gene finding

    n Most of the programs focused on Open reading frame n Emboss -> GetORF n Emboss -> ShowORF

    n Other important elements: n Matrix binding site: Emboss -> MarScan n Promoter region: PromoterInspector n Splicing sites: GeneSplicer

    Gene finding

    n Prokaryotes n No intron n Long open reading frame n High density n Easy to detect

    n Eukaryotes n Have intron n Combination of short open reading frames n Low density n Hard to detect

  • 9

    Problem 1:

    n Is it a gene? n Not sure, but have some confidence

    n What is the expression level if it is a gene? n Determined by the promoter and other

    upper stream elements

    Translation

    n Six reading frames n Open reading frame (ORF)

    n Start codon n Stop codon n Certain length

    n Tools: ShowORF

  • 10

    Conceptual translation

    5’ AATGGCAATCCGCGTAGACTAGGCA 3’

    3’ TTACCGTTAGGCGCATCTGTATCGT 5’

    AATAATGGCGGCAATAATCCGCCGCGTCGTAGAAGACTACTAGGCGGCAA AAATGATGGCAGCAATCATCCGCCGCGTAGTAGACGACTAGTAGGCAGCA AAAATGGTGGCAACAATCCTCCGCGGCGTAGTAGACTACTAGGAGGCACA

    TTTACTACCGTCGTTAGTAGGCGGCGCATCATCTGCTGTATTATCGTCGT

    TTTTACCACCGTTGTTAGGAGGCGCCGCATCATCTGTTGTATCATCGTGT TTATTACCGCCGTTATTAGGCGGCGCAGCATCTTCTGTAGTATCGTCGTT

    +1

    -2 -3

    -1

    +2 +3

    Six reading frames

    AATAATGGCGGCAATAATCCGCCGCGTCGTAGAAGACTACTAGGCGGCAA N G N P R R L G N G N P R R L G

    AAAATGGTGGCAACAATCCTCCGCGGCGTAGTAGACTACTAGGAGGCACA W Q S A * T RW Q S A * T R

    TTTACTACCGTCGTTAGTAGGCGGCGCATCATCTGCTGTATTATCGTCGT

    TTTTACCACCGTTGTTAGGAGGCGCCGCATCATCTGTTGTATCATCGTGT TTATTACCGCCGTTATTAGGCGGCGCAGCATCTTCTGTAGTATCGTCGTT

    +1

    -2 -3

    -1

    +3

    AAATGATGGCAGCAATCATCCGCCGCGTAGTAGACGACTAGTAGGCAGCA M A I R V D * AM A I R V D * A

    +2

  • 11

    Problem 2

    n What is the possible product of this gene? n It is likely to be …. n This conceptual translation is in open

    reading frame …… n Can we get the gene product?

    n If expression level high: Directly separate n If expression level low: Clone it

    R ecom

    binant D N

    A

  • 12

    Primer design

    n Design primers only from accurate sequence data

    n Restrict your search to regions that best reflect your goals

    n Locate candidate primers n Verification of your choice

    Primer design

    n (primer 1) CTAGTACGAT n ATGCCGTAGATC……TCCGATCATGCTA n TACGGCATCTAG……AGGCTAGTACGAT n ATGCCGTAG (primer 2)

  • 13

    Primer design

    n Mispriming areas n Primer length: 18-30 (Usually) n Annealing Temperature (55 - 75 C) n GC content: 35% - 65% (usually) n Avoid regions of secondary structure n 100% complimentarity is not necessary n Avoid self-complimentarity

    Primer Design

    Online tools: n http://www.hgmp.mrc.ac.uk/GenomeWeb/nuc-

    primer.html n http://www-genome.wi.mit.edu/cgi-

    bin/primer/primer3_www.cgi n http://www.cybergene.se/primer.html

    Software tools n Omiga n Vecter NTI

  • 14

    Restriction map

    n Restriction enzyme n Recognize a pattern n Recognition site V.S. Cutting site

    n Select restriction enzyme to get a fragment of sequence

    n Rebuild the sequence to create or invalidate a restriction site

    n Tools: Omiga, remap, bioedit

  • 15

    Mutation

    n Can be generated by PCR n Primers that not perfectly match

    n Frame shift mutation n Insertion n Deletion

    n Substitution n Normal n Silent

  • 16

    Mutation

    n Test the importance n Mutate suspected important place

    n Create a pattern n Often silent mutation

    n Invalidate a pattern n Often silent mutation

    n Keep a reading frame

    Problem 3

    n Can we get the protein product? n Clone it and use a bacteria to express it

    n Can we figure out the key residue in the protein product? n Guess the important residue n Mutate the residue to see whether the

    activity loses

  • 17

    Summary

    n Life is determined by nucleotide sequences n Sequence analysis reveals patterns have

    biological significance n Sequence analysis helps the design of wet-lab

    experiments

    n Next part will be on protein sequence analysis