1- 1 Chapter 1 Introduction


Chapter 1

Introduction


Introduction – Gene( 基因 ) History

1865 Mendel: The basic unit of inheritance is a gene. Mendel's work was forgotten until about 1900.

1944: The gene was shown to be made of DNA (deoxyribonucleic acid).

1953 James Watson and Francis Crick: the double-helical structure of DNA (雙股螺旋).


Introduction – Gene History (Cont.)

1990: The Human Genome Project (人類基因體計畫) started.

1995: The first free-living organism to be sequenced: Haemophilus influenzae (流行性感冒嗜血桿菌).

1998: Celera joined the gene research.

2000: The draft of the human DNA sequence was completed (published in 2001).


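Bioinformatics - Related Domestic Programs

2000: National Science Council (國科會) interdisciplinary research on "Bioinformatics"

2001: National Science Council national research programs: the National Research Program for Genomic Medicine

2001: National Science Council interdisciplinary project research

- Engineering division: information technology

- Life sciences division: bioinformatics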


Animal cell (nucleus, cytoplasm, cell membrane)

DNA is located inside the cell nucleus.


DNA Double Helix ( 雙股螺旋)


DNA Double Helix ( 雙股螺旋)


Bonds between Nucleotides in DNA


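Nucleotide (核甘酸)

A nucleotide is the building block of nucleic acid molecules. A nucleotide contains:

- A five-carbon sugar (deoxyribose in DNA)

- A phosphate group

- One of the nitrogenous bases (A, G, C, T, U)

Figure label: cytosine (C)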


The Four Nitrogenous Bases of DNA


DNA Double Helix ( 雙股螺旋)


DNA Sequence


DNA and RNA

Nucleotides (核甘酸):

- Adenine (A)

- Guanine (G)

- Cytosine (C)

- Thymine (T)

- Uracil (U)

DNA (deoxyribonucleic acid): {A, G, C, T} (base pairs: G≡C, A=T)

RNA (ribonucleic acid): {A, G, C, U} (base pairs: G≡C, A=U, G-U)


DNA Length

The total length of human DNA is about 3 × 10^9 (3 billion) base pairs.

About 1% to 1.5% of the DNA sequence is useful.

Number of human genes: 30,000 to 40,000

- This is a conclusion from the Human Genome Project.

- The number originally expected was 100,000.


DNA Sequencing (定序)

Given DNA sequence:

TGCACTTGAACGCATGCT

Cut the sequence at a random A; each fragment runs from that A to the end of the sequence:

ATGCT length=5

ACGCATGCT length=9

AACGCATGCT length=10

ACTTGAACGCATGCT length=15
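The fragment lengths above reveal where each A sits in the sequence. A minimal Python sketch of that idea follows, assuming idealized fragments that each start at an A and run to the end, and using the 18-base sequence implied by the listed fragments (TGCACTTGAACGCATGCT). The function names are hypothetical, chosen only for illustration.

```python
def fragments_from_base(sequence, base="A"):
    """Return every suffix of `sequence` that starts at `base`, shortest first."""
    frags = [sequence[i:] for i, ch in enumerate(sequence) if ch == base]
    return sorted(frags, key=len)


def positions_from_lengths(total_length, lengths):
    """Recover the 1-based positions of the base from the fragment lengths."""
    return sorted(total_length - n + 1 for n in lengths)


seq = "TGCACTTGAACGCATGCT"
frags = fragments_from_base(seq)
lengths = [len(f) for f in frags]
positions = positions_from_lengths(len(seq), lengths)
print(lengths, positions)
```

Repeating the same step for G, C, and T would give the positions of every base, which is how the ordered lengths read off a gel determine the sequence.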


DNA Sequencing: Electrophoresis (電泳法)


DNA Sequencing


Amino Acids (胺基酸)

Amino acids are the basic units of proteins; there are 20 kinds.


General Structure of an Amino Acid

Figure: an amino acid with its carboxyl group (COO−), amino group (H3N+), and R group (CH2-CH2-CH2-CH2-NH3+) labeled.

An amino acid has 3 groups:

- Amino group (胺基)

- Carboxyl group (羧基)

- R group (R 基團)


Amino Acid (胺基酸) Molecules


Amino Acid (胺基酸) Molecules


Protein (蛋白質) Molecules


Amino Acids and RNA

Every three nucleotides (a codon, 基因密碼) correspond to one amino acid.

Rows give the first position of the codon, the four middle columns give the second position (U, C, A, G), and the last column gives the third position.

First  U              C              A              G              Third
U      UUU Phe [F]    UCU Ser [S]    UAU Tyr [Y]    UGU Cys [C]    U
       UUC Phe [F]    UCC Ser [S]    UAC Tyr [Y]    UGC Cys [C]    C
       UUA Leu [L]    UCA Ser [S]    UAA Ter [end]  UGA Ter [end]  A
       UUG Leu [L]    UCG Ser [S]    UAG Ter [end]  UGG Trp [W]    G
C      CUU Leu [L]    CCU Pro [P]    CAU His [H]    CGU Arg [R]    U
       CUC Leu [L]    CCC Pro [P]    CAC His [H]    CGC Arg [R]    C
       CUA Leu [L]    CCA Pro [P]    CAA Gln [Q]    CGA Arg [R]    A
       CUG Leu [L]    CCG Pro [P]    CAG Gln [Q]    CGG Arg [R]    G
A      AUU Ile [I]    ACU Thr [T]    AAU Asn [N]    AGU Ser [S]    U
       AUC Ile [I]    ACC Thr [T]    AAC Asn [N]    AGC Ser [S]    C
       AUA Ile [I]    ACA Thr [T]    AAA Lys [K]    AGA Arg [R]    A
       AUG Met [M]    ACG Thr [T]    AAG Lys [K]    AGG Arg [R]    G
G      GUU Val [V]    GCU Ala [A]    GAU Asp [D]    GGU Gly [G]    U
       GUC Val [V]    GCC Ala [A]    GAC Asp [D]    GGC Gly [G]    C
       GUA Val [V]    GCA Ala [A]    GAA Glu [E]    GGA Gly [G]    A
       GUG Val [V]    GCG Ala [A]    GAG Glu [E]    GGG Gly [G]    G

AUG is also the "start" codon.
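The codon table can be used directly to translate an RNA string into amino acids. The sketch below is one way to do it in Python, assuming translation begins at the first AUG and stops at the first Ter codon; the names `CODON_TABLE` and `translate` are illustrative, not part of the slides.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes in table order: first, second, and third codon
# positions each cycle through U, C, A, G; "*" marks the three Ter codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(rna):
    """Translate `rna` codon by codon, from the first AUG to the first Ter codon."""
    start = rna.find("AUG")
    if start < 0:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(rna) - 2, 3):
        amino = CODON_TABLE[rna[i:i + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


print(translate("GGAUGUUUGGCUAAAA"))
```

In the example call, the codons read after the start position are AUG, UUU, GGC, and UAA, so translation yields Met, Phe, Gly and then stops.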


From DNA via RNA to Protein


DNA, Genes and Proteins

DNA: program for cell processes

Proteins: execute cell processes

Figure: a DNA sequence in which a gene region gives rise to a protein (labels: DNA, Gene, Protein).


Promoter( 啟動子 ) and Gene

Figure labels: promoter (TATA, TTG), upstream, downstream, transcriptional start site, ATG, TAG, transcriptional termination site, exon, intron.


Regulation (調控) of Genes

Figure labels: DNA, gene, regulatory element, transcription factor (protein), RNA polymerase (protein). Figure by Blanchette.


Regulation of Genes

Figure labels: DNA, gene, regulatory element, transcription factor (protein), RNA polymerase. Figure by Blanchette.


Regulation of Genes

Figure labels: DNA, gene, regulatory element, transcription factor, RNA polymerase, new protein. Figure by Blanchette.


From DNA via RNA to Protein


From RNA to Protein


From RNA to Protein


Primary Structure ( 一級結構 ) of Protein

Amino acid sequence of bovine insulin (a protein)


Secondary Structure ( 二級結構 ) of Protein


Tertiary Structure ( 三級結構 ) of Protein

Tertiary structure of the hemoglobin molecule


Quaternary Structure ( 四級結構 ) of Protein

Quaternary structure of the hemoglobin molecule


Problems on Different Levels


Some Problems in Bioinformatics

Sequence comparison

- Longest common subsequence

- Edit distance

- Similarity

- Multiple sequence alignment

Fragment assembly of DNA sequences

- Shortest common superstring

Physical mapping

- Double digest problem

- Consecutive ones problem

Evolutionary trees

Molecular structure prediction

- Protein folding


Sequence Comparison

Goals:

Database search: Given a sequence S and a set of sequences G, find all the sequences in G that are similar to S.

Similarity: Find which parts of the sequences are alike and which parts differ.

- Sequence alignment (global alignment)

- Local alignment


Sequence Alignment

Global alignment

Local alignment


Longest Common Subsequence(1)

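To find a longest common subsequence between two strings.

string1: TAGTCACG

string2: AGACTGTC

LCS: AGACG

Dynamic programming, with c_{i,0} = c_{0,j} = 0:

$$
c_{i,j} = \max
\begin{cases}
c_{i-1,j-1} + 1 & \text{if } a_i = b_j \\
c_{i-1,j} + 0 & \text{if } a_i \neq b_j \\
c_{i,j-1} + 0 & \text{if } a_i \neq b_j
\end{cases}
$$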


Longest Common Subsequence(2)

S1 = TAGTCACG (rows), S2 = AGACTGTC (columns), LCS: AGACG

      -  A  G  A  C  T  G  T  C
   -  0  0  0  0  0  0  0  0  0
   T  0  0  0  0  0  1  1  1  1
   A  0  1  1  1  1  1  1  1  1
   G  0  1  2  2  2  2  2  2  2
   T  0  1  2  2  2  3  3  3  3
   C  0  1  2  2  3  3  3  3  4
   A  0  1  2  3  3  3  3  3  4
   C  0  1  2  3  4  4  4  4  4
   G  0  1  2  3  4  4  5  5  5
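The table for the longest common subsequence can be filled in with a short dynamic program. The following Python sketch is one possible rendering, with S1 on the rows and S2 on the columns; the function names are illustrative. The traceback returns one longest common subsequence, which may differ from AGACG when several of equal length exist.

```python
def lcs_table(a, b):
    """Fill c[i][j] = length of an LCS of a[:i] and b[:j]."""
    m, n = len(a), len(b)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c


def lcs(a, b):
    """Trace back through the table to recover one longest common subsequence."""
    c = lcs_table(a, b)
    i, j, out = len(a), len(b), []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


table = lcs_table("TAGTCACG", "AGACTGTC")
print(table[-1][-1], lcs("TAGTCACG", "AGACTGTC"))
```

Both loops visit each table cell once, so the running time is proportional to the product of the two string lengths.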


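Edit Distance(1)

To find a smallest edit process between two strings.

S1: TAGTCAC-G--

S2: -AG--ACTGTC

Operation: DMMDDMMIMII (D: delete, M: match, I: insert)

$$
c_{i,j} = \min
\begin{cases}
c_{i-1,j-1} + 0 & \text{if } a_i = b_j \quad \text{(Match)} \\
c_{i-1,j} + \mathrm{dist}(a_i, -) & \text{(Delete)} \\
c_{i,j-1} + \mathrm{dist}(-, b_j) & \text{(Insert)}
\end{cases}
$$

Suppose dist(a_i, -) = dist(-, b_j) = 1.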


Edit Distance(2)

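S1: TAGTCAC-G--

S2: -AG--ACTGTC

    DMMDDMMIMII

S1 = TAGTCACG on the rows, S2 = AGACTGTC on the columns:

      -  A  G  A  C  T  G  T  C
   -  0  1  2  3  4  5  6  7  8
   T  1  2  3  4  5  4  5  6  7
   A  2  1  2  3  4  5  6  7  8
   G  3  2  1  2  3  4  5  6  7
   T  4  3  2  3  4  3  4  5  6
   C  5  4  3  4  3  4  5  6  5
   A  6  5  4  3  4  5  6  7  6
   C  7  6  5  4  3  4  5  6  7
   G  8  7  6  5  4  5  4  5  6

Each entry c_{i,j} depends on three neighbouring entries:

c_{i-1,j-1}   c_{i-1,j}
c_{i,j-1}     c_{i,j}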
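The edit distance table can be reproduced with the sketch below. It assumes, as on the slides, that only match (cost 0), delete (cost 1), and insert (cost 1) are allowed, with no substitution operation; `edit_table` is an illustrative name.

```python
def edit_table(a, b, indel_cost=1):
    """Fill c[i][j] = minimum cost to turn a[:i] into b[:j] using
    matches (cost 0), deletions, and insertions (cost `indel_cost` each)."""
    m, n = len(a), len(b)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        c[i][0] = c[i - 1][0] + indel_cost      # delete every character of a[:i]
    for j in range(1, n + 1):
        c[0][j] = c[0][j - 1] + indel_cost      # insert every character of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best = min(c[i - 1][j], c[i][j - 1]) + indel_cost
            if a[i - 1] == b[j - 1]:
                best = min(best, c[i - 1][j - 1])  # match costs nothing
            c[i][j] = best
    return c


table = edit_table("TAGTCACG", "AGACTGTC")
print(table[-1][-1])
```

With unit costs and no substitution, the distance equals the two lengths added together minus twice the LCS length, here 8 + 8 - 2 × 5 = 6, matching the three deletions and three insertions in DMMDDMMIMII.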


Similarity

Two sequences s1 and s2.

p is the match value if ai = bj, else it is the mismatch value.

g is the gap penalty.

$$
c_{i,j} = \max
\begin{cases}
c_{i-1,j-1} + p \\
c_{i-1,j} - g \\
c_{i,j-1} - g
\end{cases}
$$


Sequence Alignment

a = TAGTCACG

b = AGACTGTC

Alignment 1:

----TAGTCACG
AGACT-GTC---

Alignment 2:

TAGTCAC-G--
-AG--ACTGTC

Which one is better?


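Sequence Alignment Formula

$$
c_{0,0} = 0, \qquad c_{i,0} = -i, \qquad c_{0,j} = -j
$$

$$
c_{i,j} = \max
\begin{cases}
c_{i-1,j-1} + 2 & \text{if } a_i = b_j \\
c_{i-1,j-1} - 1 & \text{if } a_i \neq b_j \\
c_{i-1,j} - 1 \\
c_{i,j-1} - 1
\end{cases}
$$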


Sequence Alignment Example

TAGTCAC-G--
-AG--ACTGTC

a = TAGTCACG on the rows, b = AGACTGTC on the columns:

       -   A   G   A   C   T   G   T   C
   -   0  -1  -2  -3  -4  -5  -6  -7  -8
   T  -1  -1  -2  -3  -4  -2  -3  -4  -5
   A  -2   1   0   0  -1  -2  -3  -4  -5
   G  -3   0   3   2   1   0   0  -1  -2
   T  -4  -1   2   2   1   3   2   2   1
   C  -5  -2   1   1   4   3   2   1   4
   A  -6  -3   0   3   3   3   2   1   3
   C  -7  -4  -1   2   5   4   3   2   3
   G  -8  -5  -2   1   4   4   6   5   4
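The alignment table can be checked with the following Python sketch, which assumes the scoring used in this example: +2 for a match, -1 for a mismatch, and -1 for a gap. The second helper scores an alignment that is already written out, so the two candidate alignments of TAGTCACG and AGACTGTC can be compared directly. Function names are illustrative.

```python
def alignment_table(a, b, match=2, mismatch=-1, gap=-1):
    """Fill c[i][j] = best global alignment score of a[:i] and b[:j]."""
    m, n = len(a), len(b)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        c[i][0] = i * gap
    for j in range(1, n + 1):
        c[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = c[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            c[i][j] = max(diag, c[i - 1][j] + gap, c[i][j - 1] + gap)
    return c


def score_alignment(row_a, row_b, match=2, mismatch=-1, gap=-1):
    """Score two equal-length alignment rows column by column ('-' is a gap)."""
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


table = alignment_table("TAGTCACG", "AGACTGTC")
first = score_alignment("----TAGTCACG", "AGACT-GTC---")
second = score_alignment("TAGTCAC-G--", "-AG--ACTGTC")
print(table[-1][-1], first, second)
```

Under this scoring the first alignment has 4 matches and 8 gaps, while the second has 5 matches and 6 gaps, so the second one reaches the optimal value in the bottom-right cell of the table.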


Multiple Sequence Alignment

s1 = ATTCGAT

s2 = TTGAG

s3 = ATGCT

Alignment:

s1 = ATTCGAT

s2 = -TT-GAG

s3 = AT--GCT

If the number of sequences is k, and k is large, how can the problem be solved? It is an NP-complete problem.


Multiple Sequence Alignment - SP

Sum-of-pairs

$$
\text{score} = \sum_{i < j} \text{scoring}(S_i, S_j)
$$


Example of Sum-of-pairs Score

s1 = ATTCGAT

s2 = -TT-GAG

s3 = AT--GCT

For the alignment, the pairwise alignment scores are:

score(s1,s2) = 5

score(s2,s3) = 0

score(s1,s3) = 5

SP score = 5 + 0 + 5 = 10
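The pairwise scores in this example can be reproduced with a small Python sketch. The scoring values are assumptions that happen to match the numbers above: +2 for a match, -1 for a mismatch, -1 for a gap against a base, and 0 for a column where both rows have a gap. The function names are illustrative.

```python
from itertools import combinations


def pair_score(row_a, row_b, match=2, mismatch=-1, gap=-1):
    """Score two rows of a multiple alignment column by column."""
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            continue              # two gaps in one column contribute nothing
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


def sp_score(rows):
    """Sum-of-pairs score: add the pairwise score of every pair of rows."""
    return sum(pair_score(a, b) for a, b in combinations(rows, 2))


s1, s2, s3 = "ATTCGAT", "-TT-GAG", "AT--GCT"
print(pair_score(s1, s2), pair_score(s2, s3), pair_score(s1, s3), sp_score([s1, s2, s3]))
```

For k rows there are k(k-1)/2 pairs, so evaluating the SP score of a given alignment is cheap; the hard part is finding the alignment that maximizes it.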


Multiple Sequence Alignment - Star

Star alignment scoring is an approximation of the sum-of-pairs (SP) scoring system.

$$
\text{Star alignment score} = \sum_{i \neq T} \text{scoring}(S_i, S_T)
$$

where S_T is the center sequence of the star.


Example of Star Score

s1 = ATTCGAT

s2 = -TT-GAG

s3 = AT--GCT

For the alignment, the pairwise alignment scores are:

score(s1,s2) = 5

score(s2,s3) = 0

score(s1,s3) = 5

Star score = max{score(s1,s2)+score(s1,s3), score(s2,s1)+score(s2,s3), score(s3,s1)+score(s3,s2)} = max{5+5, 5+0, 5+0} = 10
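The choice of center in this example can be sketched in a few lines of Python. The pairwise scores are taken as given from the slide; the dictionary layout and the name `best_star` are illustrative assumptions.

```python
def best_star(names, score):
    """Return (center, total) maximizing the sum of scores between the
    center and every other sequence. `score` maps frozenset pairs to values."""
    totals = {
        center: sum(score[frozenset((center, other))] for other in names if other != center)
        for center in names
    }
    center = max(names, key=lambda name: totals[name])
    return center, totals[center]


pairwise = {
    frozenset(("s1", "s2")): 5,
    frozenset(("s2", "s3")): 0,
    frozenset(("s1", "s3")): 5,
}
center, star_score = best_star(["s1", "s2", "s3"], pairwise)
print(center, star_score)
```

Each candidate center uses only k-1 of the pairwise scores, which is why the star score is cheaper to optimize than the full sum of pairs.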


Multiple Sequence Alignment - Tree

$$
\text{Tree score} = \sum_{i,j} \text{scoring}(S_i, S_j) + \sum_{k,l} \text{scoring}(S_k, S_l)
$$

where S_i and S_j are adjacent, and S_k and S_l are adjacent.


Fragment Assembly

Depending on experimental factors:

Fragment length can be as low as 200 or as high as 700.

Typical problems involve target sequences 30,000 to 100,000 base pairs long, and the total number of fragments is in the range of 500 to 2000.


Shortest Common Superstring

Given a set of k strings P = {s1, s2, …, sk}, find a shortest superstring s containing every string in P as a substring. That is, |s| is minimal.

ACCGT     --ACCGT--
CGTGC     ----CGTGC
TTAC      TTAC-----
TACCGT    -TACCGT--
-------------------
          TTACCGTGC

NP-complete problem
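For a handful of strings, the shortest common superstring can be found by exhaustive search. The Python sketch below tries every ordering and merges neighbours with their largest overlap. It is an illustration only, since the number of orderings grows factorially, and the function names are hypothetical.

```python
from itertools import permutations


def overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def shortest_superstring(strings):
    """Exhaustively search all orderings; practical only for very small inputs."""
    unique = set(strings)
    # A string contained in another one never changes the answer, so drop it.
    core = [s for s in unique if not any(s != t and s in t for t in unique)]
    if not core:
        return ""
    best = None
    for order in permutations(core):
        merged = order[0]
        for s in order[1:]:
            merged += s[overlap(merged, s):]
        if best is None or len(merged) < len(best):
            best = merged
    return best


result = shortest_superstring(["ACCGT", "CGTGC", "TTAC", "TACCGT"])
print(result, len(result))
```

Here ACCGT is dropped first because it already occurs inside TACCGT, and the best ordering overlaps TTAC with TACCGT and then with CGTGC.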


Physical Mapping

Given a (0,1) matrix of probes versus clones, reconstruct the relative positions of the clones or probes.

NP-complete problem


Consecutive Ones Problem(1)


Consecutive Ones Problem(2)

Consider a (0,1) matrix M, with rows indexed by clones and columns by probes, where position (i, j) is 1 if clone i contains probe j.

The problem is to permute the columns so that the ones in each row are consecutive.

A (0,1) matrix has the k-consecutive ones property (k-C1P) if there exists a column order such that in each row the ones appear in at most k consecutive blocks.

The k-consecutive ones problem: Does a given (0,1) matrix have the k-consecutive ones property? NP-complete for k ≥ 2.
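The definition of the k-consecutive ones property can be tested directly on tiny matrices by trying every column order. The Python sketch below does exactly that; it is a brute-force illustration of the definition, not an efficient algorithm, and the function names are hypothetical.

```python
from itertools import permutations


def blocks_of_ones(row):
    """Count the maximal runs of consecutive 1's in a row."""
    return sum(1 for i, v in enumerate(row) if v == 1 and (i == 0 or row[i - 1] == 0))


def find_column_order(matrix, k=1):
    """Return a column order in which every row has at most k blocks of 1's,
    or None if no such order exists."""
    n_cols = len(matrix[0]) if matrix else 0
    for order in permutations(range(n_cols)):
        if all(blocks_of_ones([row[c] for c in order]) <= k for row in matrix):
            return order
    return None


easy = [[1, 0, 1],
        [0, 1, 1]]
hard = [[1, 1, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 0, 1]]
print(find_column_order(easy), find_column_order(hard), find_column_order(hard, k=2))
```

In the second matrix, column 1 would have to sit next to three different columns at once, so no order makes every row consecutive, while allowing two blocks per row succeeds.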


Double Digest Problem(1)

enzyme a = |A| = {1, 3, 3, 12}

enzyme b = |B| = {1, 2, 3, 3, 4, 6}

c = |A ∧ B| = |C| = {1, 1, 1, 1, 2, 2, 2, 3, 6}


Double Digest Problem(2)

Given the lengths of fragments, |X_i − X_j|, 1 ≤ i ≤ j ≤ n, obtained by applying either one of the two restriction enzymes A and B, or both, determine the order of these fragments.

a = |A| = {a_i: 1 ≤ i ≤ n} from the first digest.

b = |B| = {b_i: 1 ≤ i ≤ m} from the second digest.

c = |A ∧ B| = |C| = {c_i: 1 ≤ i ≤ l} from the first and second digests together.

$$
\sum_{1 \le i \le n} a_i = \sum_{1 \le i \le m} b_i = \sum_{1 \le i \le l} c_i
$$
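For the small example with a = {1, 3, 3, 12}, b = {1, 2, 3, 3, 4, 6}, and c = {1, 1, 1, 1, 2, 2, 2, 3, 6}, an ordering can be found by brute force. The Python sketch below tries every order of the two single-digest fragment sets and checks whether the combined cut sites reproduce the double-digest lengths. It is an illustration under that exhaustive-search assumption, with hypothetical function names.

```python
from itertools import permutations


def double_digest_fragments(order_a, order_b):
    """Sorted fragment lengths obtained when both sets of cuts are applied."""
    def cut_sites(order):
        sites, pos = set(), 0
        for length in order:
            pos += length
            sites.add(pos)          # includes the right end of the molecule
        return sites

    sites = sorted(cut_sites(order_a) | cut_sites(order_b))
    return sorted(right - left for left, right in zip([0] + sites, sites))


def solve_double_digest(a, b, c):
    """Return one pair of fragment orders consistent with c, or None."""
    target = sorted(c)
    for order_a in set(permutations(a)):
        for order_b in set(permutations(b)):
            if double_digest_fragments(order_a, order_b) == target:
                return order_a, order_b
    return None


a = [1, 3, 3, 12]
b = [1, 2, 3, 3, 4, 6]
c = [1, 1, 1, 1, 2, 2, 2, 3, 6]
print(double_digest_fragments((1, 3, 3, 12), (2, 3, 4, 6, 3, 1)))
print(solve_double_digest(a, b, c) is not None)
```

One consistent answer orders the first digest as 1, 3, 3, 12 and the second as 2, 3, 4, 6, 3, 1; solutions are generally not unique, since for instance reversing both orders gives another one.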


Evolutionary Trees(1)

siamang (合趾猴)

gibbon (長臂猿)

orangutan (猩猩)

human (人類)

gorilla (大猩猩)

chimpanzee (黑猩猩)


Evolutionary Trees(2)

Genome sequences: Given the genomes of several organisms, build an evolutionary tree in which the number of mutations (changes) is minimal.

Character matrix: Given a (0,1) character state matrix of several organisms, build a perfect evolutionary tree.

Distance matrix: Given a distance matrix of several organisms, build a tree satisfying the distances between all organisms.


Perfect Phylogeny(1)


Protein Structure


Protein Folding

Given the primary structure of a protein, compute or evaluate its 3-dimensional structure.

Primary structure (sequence):


Protein Folding Problem

The characteristic of each amino acid:

- H (hydrophobic, non-polar): water-hating (疏水性)

- P (hydrophilic, polar): water-loving (親水性)

The amino acid sequence of a protein can be viewed as a binary sequence of H's (1's) and P's (0's).


Example of H-P Model

Input sequence: 011001001110010

Figure: two foldings of the input sequence on a 2D lattice, one with Score = 5 and one with Score = 3.


Protein folding on H-P Model

Protein folding on the H-P model: Given a sequence of 1's (H's) and 0's (P's), find a self-avoiding path embedded in either a 2D or 3D lattice such that the number of pairs of adjacent 1's is maximized.

NP-complete even for the 2D lattice.
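Scoring a given fold is easy even though finding the best fold is hard. The Python sketch below scores one fold on a square or cubic lattice. It assumes the common convention that only pairs of 1's which touch on the lattice but are not neighbours along the chain are counted; the slides do not say whether chain neighbours count, so treat that as an assumption. The name `hp_score` is illustrative.

```python
def hp_score(sequence, path):
    """Count pairs of 1's on adjacent lattice points that are not chain neighbours.

    `sequence` is a string of '0'/'1'; `path` is a list of coordinate tuples,
    one per residue, in chain order (2D or 3D).
    """
    if len(path) != len(sequence) or len(set(path)) != len(path):
        raise ValueError("path must be self-avoiding and match the sequence length")
    for p, q in zip(path, path[1:]):
        if sum(abs(x - y) for x, y in zip(p, q)) != 1:
            raise ValueError("consecutive residues must occupy adjacent lattice points")

    h_sites = {pos: i for i, (ch, pos) in enumerate(zip(sequence, path)) if ch == "1"}
    score = 0
    for pos, i in h_sites.items():
        for axis in range(len(pos)):
            # Look only in the positive direction so each pair is counted once.
            neighbour = pos[:axis] + (pos[axis] + 1,) + pos[axis + 1:]
            j = h_sites.get(neighbour)
            if j is not None and abs(i - j) > 1:
                score += 1
    return score


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
line = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(hp_score("1001", square), hp_score("1111", line))
```

Folding 1001 into a square brings its two ends together for one contact, whereas a straight chain of 1111 has none under this convention.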


RNA Secondary Structure Prediction (1)

RNA: {A, G, C, U}

Base pairs:

- G≡C (Watson-Crick base pair)

- A=U (Watson-Crick base pair)

- G-U (wobble base pair)

ρ(a,b) is defined as 1 if a and b can form a base pair; otherwise it is 0.


RNA Secondary Structure Prediction (2)


RNA Secondary Structure Prediction (3)

X(i,j) is the maximum number of base pairs in the sequence a_i a_{i+1} … a_j, i ≤ j.

Dynamic programming:

X(i,j) = 0 if |j − i| ≤ 1.

$$
X(i, j+1) = \max\Big\{ X(i,j),\ \max_{i \le k \le j-1} \big[ X(i,k-1) + 1 + X(k+1,j) \big]\, \rho(a_k, a_{j+1}) \Big\}
$$
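The dynamic program for the maximum number of base pairs can be sketched in Python as follows. The sketch assumes the recurrence in which the last base of a segment is either left unpaired or paired with some earlier base a_k, that adjacent bases cannot pair, and that the allowed pairs are G-C, A-U, and G-U. The names `can_pair` and `max_base_pairs` are illustrative.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """Return True if bases a and b can form a base pair."""
    return (a, b) in PAIRS


def max_base_pairs(seq):
    """Maximum number of non-crossing base pairs in `seq`."""
    n = len(seq)
    # X[i][j] uses 1-based positions; entries with j - i <= 1 stay 0.
    X = [[0] * (n + 2) for _ in range(n + 2)]
    for length in range(3, n + 1):
        for i in range(1, n - length + 2):
            last = i + length - 1          # position of the base being added
            j = last - 1
            best = X[i][j]                 # the added base stays unpaired
            for k in range(i, j):          # i <= k <= j - 1
                if can_pair(seq[k - 1], seq[last - 1]):
                    best = max(best, X[i][k - 1] + 1 + X[k + 1][j])
            X[i][last] = best
    return X[1][n]


print(max_base_pairs("GGAACC"), max_base_pairs("GAU"), max_base_pairs("GC"))
```

The three nested loops give a running time cubic in the sequence length. In GGAACC the two G's pair with the two C's, nested one inside the other.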


Reference - Books

Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Dan Gusfield, Cambridge University Press, 1997.

Introduction to Computational Molecular Biology, Joao Carlos Setubal and Joao Meidanis, PWS Pub., 1997.

Introduction to Computational Biology: Maps, Sequences and Genomes, Michael S. Waterman, Chapman & Hall, 1995.

Manuscript of Prof. R. C. T. Lee: http://www.csie.ncnu.edu.tw/~rctlee/biology.html


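Reference – Books (Biology)

生物學 (Biology), by C. Starr & R. Taggart; translated and edited by 丁澤民, 王偉, 張世衿, and 連慧瑞.

現代分子生物學 (Modern Molecular Biology), by 朱玉賢 and 李毅; published by 藝軒出版社.

分子生物學入門 (Introduction to Molecular Biology), by 駒野徹 and 酒井裕; translated by 何士慶; published by 科技圖書.

DNA 圖解小百科 (An Illustrated Mini-Encyclopedia of DNA; glossary of terms), by 威惹利, 培瑞, and 李哈; translated by 潘震澤; published by 新新聞文化公司.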


Reference – Journals (1)

- Bioinformatics (SCI)

- Bulletin of Mathematical Biology (SCI)

- Computer Applications in the Biosciences

- Journal of Computational Biology (SCI Expanded)

- Journal of Mathematical Biology (SCI)

- Journal of Molecular Biology (SCI)

- Nucleic Acids Research (SCI)

- Gene (SCI)

- Science (SCI)


Reference – Journals (2)

- Genome Research (SCI)

- PROTEINS: Structure, Function, and Bioinformatics (SCI)

- Gene (SCI)

- Current Opinion in Structural Biology (SCI)

- Protein-Structure Function and Bioinformatics (SCI)

- BMC Bioinformatics (SCI Expanded)

- Computational Biology and Chemistry (SCI)

- BioSystems (SCI)


Reference – Web Sites

- C. B. Yang: http://par.cse.nsysu.edu.tw

- Bioweb link of C. B. Yang

- BioWeb: http://bioweb.uwlax.edu/

- MIT Biology Hypertextbook: http://esg-www.mit.edu:8001/esgbio/

- Bioinformatics Related Journals: http://www.iscb.org/journals.html

- NCBI (National Center for Biotechnology Information): http://www.ncbi.nlm.nih.gov/


Conclusion (1)

Bioinformatics and Computer Science

- Algorithm: all computing problems.

- Image processing: 3D images of RNA folds or proteins.

- Database: massive databases and retrieval.

- Distributed systems and parallel processing: massive storage and accelerated computation.


Conclusion (2)

Biology easily has 500 years of exciting problems to work on.

-- Donald E. Knuth