Chap. 5 Physical mapping of DNA Introduction to Computational Molecular Biology


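Chromosome structure and location notation

Chromosome structure
  - Hierarchical structure
      - Chromosome number
      - Arm: long arm (q), short arm (p)
      - Region
      - Band

Chromosome location notation
  - Gives the position of a specific gene on the chromosome
  - Follows the hierarchical structure
  - Example: band 1q23 = chromosome 1, long arm, region 2, band 3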
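The hierarchical notation can be unpacked mechanically. The sketch below is only an illustration: `parse_band` and `BAND_RE` are hypothetical names, and it assumes a single-digit region and a single-digit band, so sub-band suffixes such as `.1` are not handled.

```python
import re

# Hypothetical helper: chromosome, arm letter, one region digit, one band digit.
BAND_RE = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d)(\d)$")

def parse_band(location):
    """Split a location such as '1q23' into its hierarchical parts."""
    match = BAND_RE.match(location)
    if match is None:
        raise ValueError(f"unrecognized location: {location!r}")
    chromosome, arm, region, band = match.groups()
    return {
        "chromosome": chromosome,
        "arm": "long" if arm == "q" else "short",
        "region": int(region),
        "band": int(band),
    }

print(parse_band("1q23"))
```

For `1q23` this yields chromosome 1, long arm, region 2, band 3, matching the example on the slide.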

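Gene maps

Gene map
  - Shows which gene lies at which position on a chromosome
  - Tells us on which chromosome, and where on it, genes A, B, and C (which determine traits such as hair, eye, or skin color) are located

Physical map
  - Probes corresponding to genes A, B, and C are hybridized to the DNA fragments obtained by cutting the chromosome with restriction enzymes
  - This locates genes A, B, and C on the chromosome far more precisely and in more detail than a gene map
  - A probe is a tool for testing for something; here it means a single strand of DNA whose base sequence is already known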

Types of maps
  - Genetic maps: 1-10 Mb
  - FISH maps: 1 Mb
  - Radiation hybrid maps: 100-500 kb
  - Optical restriction maps: 500 kb
  - STS clone maps: 100 kb
  - Restriction maps: 1 kb

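Glossary

Marker
  - A stretch of DNA that marks an identifiable physical location on a chromosome

Enzyme
  - A specific protein that catalyzes a particular biochemical reaction in a living organism

Restriction endonuclease/enzyme
  - An enzyme that cuts DNA at a specific base sequence

Restriction site
  - The specific base sequence in DNA that is cut by a particular restriction enzyme

Probe
  - A single-stranded DNA or RNA molecule used to find a specific sequence within a base sequence

Clone
  - A set of genetically identical copies of a gene, cell, or organism

Cloning
  - Producing a specific gene, and only that gene, in large quantities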

Biological background

Difference in scale
  - Human chromosome: about 10^8 base pairs
  - Fragment assembly: about 10^4 base pairs, an extremely small part of a chromosome
  - The two scales call for different techniques

Create maps
  - Map of an entire chromosome
  - Map of significant fractions of chromosomes

Classes of maps
  - Gene map
  - Physical map
  - Etc.

Biological background: physical map

Physical map
  - Fragment assembly on a large scale
  - Goal: obtain the locations of some markers along the original DNA molecule

Marker
  - A small but precisely defined sequence

How do we create a map?
  - Obtain several copies of the target DNA
  - Break each copy into several fragments using enzymes
  - Map the fragments using their fingerprints
      - Restriction site analysis
      - Hybridization

Fingerprint

What is a fingerprint?
  - Information that characterizes a fragment in some unique way

Fingerprint element in each technique
  - Restriction site mapping: the lengths of the fragments
  - Hybridization mapping: the markers revealed by probe binding

Restriction site mapping

Goal: locate the restriction sites on the target DNA

Double digest
  - Digest with two enzymes, each alone and then both together
  - Compare the results (search over permutations of the fragments)

Partial digest
  - Use one enzyme
  - Give the enzyme more or less time to act
  - Iterative method

Problems
  - Uncertainty in length measurement (gel electrophoresis: error of up to 5%)
  - Fragments that are too small cannot be measured at all
  - Fragments may be lost in the digestion process

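[Figure: double digest example, fragment lengths in order along the target DNA]

  Enzyme A:       3  8  6  10
  Enzyme B:       4  5  11  7
  Enzymes A + B:  3  1  5  2  6  3  7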
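The double digest numbers above can be checked with a short sketch. It assumes exact, error-free fragment lengths, which real gel measurements do not provide. The function names are illustrative only, and the brute-force search is shown just to make "comparing results using permutations" concrete; it is not an efficient method.

```python
from itertools import accumulate, permutations

def cut_sites(fragments):
    """Cut positions implied by an ordered sequence of fragment lengths."""
    return list(accumulate(fragments))[:-1]

def double_digest(order_a, order_b):
    """Fragment lengths obtained when both enzymes cut the same molecule."""
    total = sum(order_a)
    if total != sum(order_b):
        raise ValueError("both digests must cover the same total length")
    sites = sorted(set(cut_sites(order_a)) | set(cut_sites(order_b)))
    bounds = [0] + sites + [total]
    return [right - left for left, right in zip(bounds, bounds[1:])]

def solve_double_digest(frags_a, frags_b, frags_ab):
    """Brute force: every pair of orderings consistent with the A+B lengths."""
    target = sorted(frags_ab)
    return [
        (pa, pb)
        for pa in sorted(set(permutations(frags_a)))
        for pb in sorted(set(permutations(frags_b)))
        if sorted(double_digest(pa, pb)) == target
    ]

a, b, ab = [3, 8, 6, 10], [4, 5, 11, 7], [3, 1, 5, 2, 6, 3, 7]
print(double_digest(a, b) == ab)
print(len(solve_double_digest(a, b, ab)))
```

Because the search returns more than one consistent pair of orderings (the mirror image is always among them), the fragment lengths alone do not pin down a unique map.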

Hybridization mapping

Clones
  - Fragments of the target DNA, replicated using a technique called cloning

Mapping process
  - Test whether each probe binds to each clone
  - Shared probes (markers) give overlap information between fragments

Problems
  - Probes that are not unique
  - Probes in repeats
  - Lack of data
  - False negatives, false positives, deletions, chimeric clones

[Figure: clone A and clone B aligned against probe positions 1 to 7 and 3' to 10']
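To make the "probe binds clone" test concrete, here is a minimal sketch that builds a clones x probes matrix. The coordinates in `CLONES` and `PROBES` are invented for illustration; in a real experiment they are unknown and only the 0/1 matrix is observed.

```python
# Hypothetical data: clone intervals (start, end) and probe positions.
CLONES = {"A": (0, 50), "B": (30, 90)}
PROBES = {"p1": 10, "p2": 40, "p3": 70}

def hybridization_matrix(clones, probes):
    """One row per clone, one column per probe; 1 means the probe binds."""
    return {
        name: [1 if start <= pos <= end else 0 for pos in probes.values()]
        for name, (start, end) in clones.items()
    }

def shared_probes(matrix, probes, first, second):
    """Probes that bind both clones, which is evidence that they overlap."""
    return [
        probe
        for probe, x, y in zip(probes, matrix[first], matrix[second])
        if x == 1 and y == 1
    ]

matrix = hybridization_matrix(CLONES, PROBES)
print(matrix)
print(shared_probes(matrix, PROBES, "A", "B"))
```

Mapping works in the opposite direction: it starts from the matrix and tries to recover an ordering consistent with it.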

Computational complexity

Restriction site models
  - Double digest
      - Search over permutations of the fragments of each digest
      - One-to-one correspondence
      - NP-complete problem
      - The number of solutions can be exponential
  - Partial digest
      - NP-completeness has not been proved

Hybridization mapping
  - Modeled with interval graphs
  - NP-hard problems arise

[Figure: interval graph example with vertices a, b, c, d, e, shown in two drawings]

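Consecutive ones property (C1P)

Assumptions
  - Probes are unique
  - There are no errors
  - All "clones x probes" hybridization experiments have been done

Error-free case
  - Look for a permutation of the columns of the m x n matrix that makes the 1s in every row consecutive
  - Solvable in polynomial time: O(mn), O(m + n + r), where r is the number of 1s in the matrix
  - The resulting permutation is a valid solution, but not necessarily the true ordering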
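The property itself is easy to test for a given column order. The sketch below checks it and, for tiny matrices only, enumerates every column order that satisfies it. The brute-force search is exponential and is not the polynomial-time algorithm referred to above; the matrix `M` is a made-up example.

```python
from itertools import permutations

def has_consecutive_ones(row):
    """True if all 1s in the row form a single block."""
    ones = [k for k, value in enumerate(row) if value == 1]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)

def c1p_orders(matrix):
    """All column orders under which every row has consecutive 1s."""
    n = len(matrix[0])
    return [
        order
        for order in permutations(range(n))
        if all(has_consecutive_ones([row[k] for k in order]) for row in matrix)
    ]

# Hypothetical 3 x 4 clones x probes matrix.
M = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
]
print(c1p_orders(M))
```

Even this small instance has two valid orders, one the reverse of the other, which illustrates why a C1P permutation is a solution but not necessarily the true ordering.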

Algorithmic implications

What we are really trying to solve
  - A real-life problem, not an abstract mathematical problem
  - The goal is the true ordering of the clones

Problem features
  - NP-hard problem
  - Time constraints
  - Optimal solutions may be out of reach
  - No guarantees

Desirable features
  - Iterative process ("better and better" or "closer to the truth")
  - Ability to distinguish the "good" parts of a solution

C1P Problem

Goal
  - Find a permutation of the columns such that in each row all 1s are consecutive

Definition
  - For each row $i$ of $M$, let $S_i$ be the set of columns $k$ where $M_{i,k} = 1$

Three situations for a pair of rows $i$ and $j$
  - $S_i \cap S_j = \emptyset$
  - $S_i \subseteq S_j$ or $S_j \subseteq S_i$
  - $S_i \cap S_j \neq \emptyset$, and neither set is a subset of the other

[Figure: example matrix with rows Ka, Kb, Kc over columns a, b, d, c, e]

        a  b  d  c  e
  Ka    1  1  1  0  0
  Kb    0  1  1  1  0
  Kc    0  0  0  1  1

  Ka = {a, b, d}, Kb = {b, d, c}, Kc = {c, e}
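The three situations for a pair of rows can be sketched as a small classifier and applied to the rows Ka, Kb, Kc. The labels "disjoint", "nested", and "overlapping" are names chosen here for the three cases; they are not taken from the slides.

```python
COLUMNS = ["a", "b", "d", "c", "e"]
MATRIX = {
    "Ka": [1, 1, 1, 0, 0],
    "Kb": [0, 1, 1, 1, 0],
    "Kc": [0, 0, 0, 1, 1],
}

def column_set(row):
    """S_i: the set of columns in which the row has a 1."""
    return {c for c, value in zip(COLUMNS, row) if value == 1}

def relation(s_i, s_j):
    """Classify a pair of rows into one of the three situations."""
    if not (s_i & s_j):
        return "disjoint"       # empty intersection
    if s_i <= s_j or s_j <= s_i:
        return "nested"         # one set contains the other
    return "overlapping"        # they intersect, neither contains the other

sets = {name: column_set(row) for name, row in MATRIX.items()}
for first, second in [("Ka", "Kb"), ("Ka", "Kc"), ("Kb", "Kc")]:
    print(first, second, relation(sets[first], sets[second]))
```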

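Example: placing the rows one at a time

        C1 C2 C3 C4 C5 C6 C7 C8
  L1     0  1  0  0  0  0  1  1
  L2     0  1  0  0  1  0  1  0
  L3     1  0  0  1  0  0  1  1

After placing L1, the column sets are {2,7,8} {2,7,8} {2,7,8}
  L1 -> … 0 1 1 1 0 …

After placing L2: {5} {2,7} {2,7} {8}
  L1 -> … 0 0 1 1 1 0 …
  L2 -> … 0 1 1 1 0 0 …

After placing L3: {5} {2} {7} {8} {1,4} {1,4}
  L1 -> … 0 0 1 1 1 0 0 0 …
  L2 -> … 0 1 1 1 0 0 0 0 …
  L3 -> … 0 0 0 1 1 1 1 0 …

Direction rule for a third row, where $l_a \cdot l_b$ is the number of columns in which rows $l_a$ and $l_b$ both have a 1:
  - if $l_1 \cdot l_3 < \min(l_1 \cdot l_2,\ l_2 \cdot l_3)$, place $l_3$ in the same direction as $l_2$ was placed relative to $l_1$
  - if $l_1 \cdot l_3 \geq \min(l_1 \cdot l_2,\ l_2 \cdot l_3)$, place $l_3$ in the opposite direction

Time complexity: O(mn)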
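The placement example can be replayed in a few lines. This sketch assumes the direction rule compares inner products of rows, with a strict "less than" meaning "same direction" and "greater than or equal" meaning "opposite direction"; that reading is reconstructed from the example rather than stated explicitly on the slide. The all-zero columns 3 and 6 are appended at the end of `ORDER` as an arbitrary choice.

```python
ROWS = {
    "L1": [0, 1, 0, 0, 0, 0, 1, 1],
    "L2": [0, 1, 0, 0, 1, 0, 1, 0],
    "L3": [1, 0, 0, 1, 0, 0, 1, 1],
}

def ones(row):
    """Column numbers (1-based) in which the row has a 1."""
    return {k + 1 for k, value in enumerate(row) if value == 1}

def inner(row_a, row_b):
    """Inner product of two 0/1 rows: the number of shared columns."""
    return len(ones(row_a) & ones(row_b))

def direction(l1, l2, l3):
    """Where l3 goes, relative to the direction l2 took with respect to l1."""
    if inner(l1, l3) < min(inner(l1, l2), inner(l2, l3)):
        return "same"
    return "opposite"

def consecutive(row, order):
    """True if the row's 1s are adjacent under the given column order."""
    positions = sorted(order.index(c) for c in ones(row))
    return positions[-1] - positions[0] + 1 == len(positions)

ORDER = [5, 2, 7, 8, 1, 4, 3, 6]  # final grouping, then the empty columns
print(direction(ROWS["L1"], ROWS["L2"], ROWS["L3"]))
print(all(consecutive(row, ORDER) for row in ROWS.values()))
```

Here L2 was placed to the left of L1 and the rule sends L3 the opposite way, to the right, which agrees with the final layout.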

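C1P Problem: components

Relations between rows $i$ and $j$
  - $S_i \cap S_j = \emptyset$
  - $S_i \subseteq S_j$ or $S_j \subseteq S_i$
  - $S_i \cap S_j \neq \emptyset$, and neither set is a subset of the other

[Figure: graph on rows L1 to L8 built from these relations]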

[Figure: components a, b, c, d]

Steps
  - Make components and treat each component separately
  - Make the linkage needed for joining components
  - Join the components

Time complexity: O(mn)
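The "make components" step can be sketched as a graph traversal. The code assumes that two rows are linked exactly when their column sets intersect and neither contains the other, and that a component is a connected group of linked rows. The input `SETS` is hypothetical, and the joining of components is not shown.

```python
def linked(s_i, s_j):
    """Assumed linking rule: sets intersect and neither contains the other."""
    return bool(s_i & s_j) and not (s_i <= s_j or s_j <= s_i)

def components(sets):
    """Group row indices into connected components of the linking graph."""
    seen, result = set(), []
    for start in range(len(sets)):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(len(sets)):
                if j not in seen and linked(sets[i], sets[j]):
                    seen.add(j)
                    stack.append(j)
        result.append(sorted(component))
    return result

# Hypothetical column sets for five rows.
SETS = [{2, 7, 8}, {2, 5, 7}, {1, 4, 7, 8}, {3, 6}, {7}]
print(components(SETS))
```

Row 4, whose set `{7}` is contained in other rows, ends up in its own component under this rule, so nested rows are handled when components are joined rather than inside a component.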

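Processing
  - Find a permutation within each of the components a, b, c, d
  - Combine the components in order
  - The order is not fully determined, so there may be multiple solutions

[Figure: components a, c, d, b; the case where an element of a contains an element of b]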

Hybridization mapping with errors

Effect of errors on the clones x probes matrix M
  - A row that corresponds to a chimeric clone shows two blocks of 1s separated by some number of 0s (a "gap")
  - A false negative puts in a 0 that may separate two blocks of 1s
  - A false positive puts in a 1 that may separate two blocks of 0s
  - There is a close correspondence between errors and gaps in the matrix
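The link between errors and gaps can be illustrated by counting gaps directly. In this sketch a gap is a run of 0s lying between two blocks of 1s in a row, and the example rows are invented.

```python
def count_gaps(row):
    """Number of runs of 0s that lie between two blocks of 1s."""
    ones = [k for k, value in enumerate(row) if value == 1]
    return sum(1 for left, right in zip(ones, ones[1:]) if right - left > 1)

def total_gaps(matrix, order):
    """Total gaps over all rows once the columns are arranged in `order`."""
    return sum(count_gaps([row[k] for k in order]) for row in matrix)

clean = [0, 1, 1, 1, 0, 0]           # error-free row: one block of 1s
chimeric = [1, 1, 0, 0, 1, 1]        # two blocks joined in one clone
false_negative = [0, 1, 0, 1, 1, 0]  # a missed hybridization splits a block
false_positive = [1, 0, 0, 1, 1, 0]  # a spurious 1 creates an extra block

for row in (clean, chimeric, false_negative, false_positive):
    print(row, count_gaps(row))
```

With an error-free matrix and the true column order, `total_gaps` is 0; each kind of error above raises the count, which is why minimizing gaps is a natural objective.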

Using a graph model
  - Gap minimization = traveling salesman problem (TSP)
  - TSP is posed on an undirected weighted graph G
  - Graph G
      - Vertices = probes
      - Edge weight = number of rows in which the two probe columns differ (Hamming distance)
  - Solution = minimum-weight path

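        P1 P2 P3 P4 P5 P6
  C1     1  1  1  0  0  0
  C2     0  1  1  1  0  0
  C3     1  0  0  1  1  0
  C4     1  1  1  1  0  0

[Figure: complete graph on probes P1 to P6; each edge is labeled with the Hamming distance between the two columns, with weights ranging from 0 to 4]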
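The edge weights of the graph can be recomputed from the matrix, and for six probes a minimum-weight path can be found by exhaustive search. This is a sketch for small inputs only: the brute-force search stands in for a real TSP heuristic, and it looks for a path through all probes, as on the slide, rather than a closed tour.

```python
from itertools import combinations, permutations

M = [
    [1, 1, 1, 0, 0, 0],  # C1
    [0, 1, 1, 1, 0, 0],  # C2
    [1, 0, 0, 1, 1, 0],  # C3
    [1, 1, 1, 1, 0, 0],  # C4
]

def column(matrix, k):
    return [row[k] for row in matrix]

def hamming(u, v):
    """Number of positions in which two 0/1 vectors differ."""
    return sum(a != b for a, b in zip(u, v))

def edge_weights(matrix):
    """Hamming distance between every pair of probe columns (0-based)."""
    n = len(matrix[0])
    return {
        (i, j): hamming(column(matrix, i), column(matrix, j))
        for i, j in combinations(range(n), 2)
    }

def min_weight_path(matrix):
    """Exhaustive search for a cheapest path visiting every probe once."""
    weights = edge_weights(matrix)

    def cost(order):
        return sum(weights[tuple(sorted(pair))] for pair in zip(order, order[1:]))

    best = min(permutations(range(len(matrix[0]))), key=cost)
    return best, cost(best)

order, weight = min_weight_path(M)
print([f"P{k + 1}" for k in order], weight)
```

The recomputed weights (one 0, one 1, seven 2s, four 3s, two 4s) match the edge labels left over from the figure. Several paths tie for the minimum weight, so the order printed is only one of them.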

Guarantee

Formal proof
  - The TSP approach gives the true permutation with high probability
  - Notation: i, j, r, s are clones; h = Hamming distance; t = true distance
  - Key relation: $h_{ij} < h_{rs}$ whenever $t_{ij} < t_{rs}$

TSP is a good approximation to the true permutation
  - The probability that both permutations are the same tends to 1 as the number of probes increases
  - On larger and larger instances of the problem, the approach gives the true permutation with higher and higher probability

Computational practice

Actual results of computational tests with the algorithms

Problems when using the hybridization graph
  - Unconnected graph
  - Redundant probes

Evaluation
  - How "close" the solution found by a particular mapping algorithm is to the true probe order
  - Measured using the "strong adjacency cost" of a given permutation

[Figure: probes P1 to P5 and clones C1 to C4]

Heuristics for hybridization mapping

Screening chimeric clones
  - Addresses the chimeric clone problem
  - The key is the concept of "relatedness" between probes
  - Probes connected -> clone not chimeric
  - Probes not connected -> clone chimeric

Obtaining a good probe ordering
  - Goal: select one probe ordering
  - Method
      - Split the other probes into two separate components
      - For every probe p, estimate the number of probes to its left and to its right
      - Sort the probes to obtain one "good" permutation

[Figure: probes P1 to P6 with labels a to f; one example is marked "Not chimeric" and the other "Chimeric"]
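One plausible reading of the screening heuristic is sketched below. It assumes that two probes are "related" when some other clone hybridizes to both, and that a clone is suspect when its probes do not form a single connected group under that relation. Both the definition and the data in `MATRIX` are assumptions made for illustration.

```python
# Hypothetical clones x probes matrix; the last clone joins two distant regions.
MATRIX = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
]

def probe_sets(matrix):
    return [{k for k, value in enumerate(row) if value == 1} for row in matrix]

def is_suspect(index, sets):
    """True if the clone's probes split into unrelated groups."""
    probes = sorted(sets[index])
    if len(probes) < 2:
        return False
    others = [s for i, s in enumerate(sets) if i != index]

    def related(p, q):
        # Assumed relatedness: another clone contains both probes.
        return any(p in s and q in s for s in others)

    reached, stack = {probes[0]}, [probes[0]]
    while stack:
        p = stack.pop()
        for q in probes:
            if q not in reached and related(p, q):
                reached.add(q)
                stack.append(q)
    return len(reached) < len(probes)

sets = probe_sets(MATRIX)
print([i for i in range(len(sets)) if is_suspect(i, sets)])
```

With sparse data a genuine clone near the end of the map can also be flagged, so a screen like this suggests candidates rather than proving that a clone is chimeric.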
