Chap. 5 Physical mapping of DNA
Introduction to Computational Molecular Biology
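Chromosome structure and location notation
Chromosome structure
  Hierarchical structure: chromosome number; arm, long (q) or short (p); region; band
Chromosome location notation
  Indicates the position of a particular gene on a chromosome
  Follows the hierarchical structure
  Example: band 1q23 = chromosome 1, long arm, region 2, band 3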
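The hierarchical notation can be unpacked mechanically. The sketch below is only an illustration: the function name `parse_band` is hypothetical, and it assumes the simple form used in the example (chromosome, arm letter, one region digit, then the band digits).

```python
import re

def parse_band(location):
    """Split a location such as '1q23' into chromosome, arm, region, band.

    Assumes a single-digit region followed by the band number, as in the
    slide's example; more elaborate notations are not handled.
    """
    match = re.fullmatch(r"(\d+|[XY])([pq])(\d)(\d+)", location)
    if match is None:
        raise ValueError(f"unrecognised location: {location!r}")
    chromosome, arm, region, band = match.groups()
    return {
        "chromosome": chromosome,
        "arm": "long" if arm == "q" else "short",
        "region": int(region),
        "band": int(band),
    }

print(parse_band("1q23"))
```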
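Gene maps
Gene map
  Shows which genes lie at which positions on a chromosome
  For example, it tells us on which chromosome, and where on it, the genes A, B, and C that determine hair, eye, or skin color are located
Physical map
  Probes corresponding to genes A, B, and C are bound to DNA fragments obtained by cutting the chromosome with restriction enzymes
  This locates genes A, B, and C on the chromosome much more accurately and in more detail than a gene map
  A probe is a tool for detecting something; here it means a single strand of DNA whose base sequence is already known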
Types of maps
Genetic maps: 1-10 Mb
FISH maps: 1 Mb
Radiation hybrid maps: 100-500 kb
Optical restriction maps: 500 kb
STS clone maps: 100 kb
Restriction maps: 1 kb
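Terminology
Marker: a DNA segment that marks an identifiable physical location on a chromosome
Enzyme: a specific protein that catalyzes a particular biochemical reaction in a living organism
Restriction endonuclease/enzyme: an enzyme that cuts DNA at a specific base sequence
Restriction site: the specific base sequence in DNA that is cleaved by a particular restriction enzyme
Probe: a single-stranded DNA or RNA molecule used to find a specific sequence within a base sequence
Clone: a set of genetically identical copies of a gene, cell, or organism
Cloning: producing a specific gene, in pure form and in large quantity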
Biological background
Difference in size
  Human chromosome: about $10^8$ base pairs
  Fragment assembly: about $10^4$ base pairs, an extremely small part of a chromosome
  Different scales call for different techniques
Create maps
  Map of an entire chromosome
  Map of significant fractions of chromosomes
Classes of maps
  Gene map
  Physical map
  Etc.
Biological background
Physical map
  Fragment assembly on a large scale
  Gives the location of some markers along the original DNA molecule
Marker
  A small but precisely defined sequence
How do we create a map?
  Obtain several copies of the target DNA
  Break each copy into several fragments, using an enzyme
Mapping using fingerprints
  Restriction site analysis
  Hybridization
Fingerprint
What is a fingerprint?
  Information that characterizes a fragment in some unique way
Fingerprint used by each technique
  Restriction site mapping: lengths of fragments
  Hybridization mapping: markers identified by the binding of probes
Restriction site mapping
Goal: locate the restriction sites on the target DNA
Double digest
  Use two enzymes, each alone and both together
  Compare the results (by permuting the fragments)
Partial digest
  Use one enzyme
  Give the enzyme more or less time
  Iterative method
Problems
  Uncertainty in length measurement (gel electrophoresis: error up to 5%)
  Fragments that are too small are not measured at all
  Fragments can be lost in the digestion process
Double digest example (fragment lengths, in order along the molecule)
Enzyme A:       3  8  6  10
Enzyme B:       4  5  11  7
Enzymes A + B:  3  1  5  2  6  3  7
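To see why these three rows are consistent, one can merge the cut sites implied by the two single-enzyme orderings and read off the fragments between consecutive cuts. This is a minimal sketch; the helper names `cut_sites` and `double_digest` are illustrative and not from the slides.

```python
from itertools import accumulate

def cut_sites(fragments):
    """Cut positions implied by fragment lengths listed left to right."""
    return list(accumulate(fragments))[:-1]  # drop the end of the molecule

def double_digest(order_a, order_b):
    """Fragments produced when both enzymes cut the same molecule."""
    total = sum(order_a)
    if total != sum(order_b):
        raise ValueError("both digests must cover the same total length")
    sites = sorted(set(cut_sites(order_a)) | set(cut_sites(order_b)))
    bounds = [0] + sites + [total]
    return [right - left for left, right in zip(bounds, bounds[1:])]

a = [3, 8, 6, 10]    # enzyme A, in order along the DNA
b = [4, 5, 11, 7]    # enzyme B, in order along the DNA
print(double_digest(a, b))  # [3, 1, 5, 2, 6, 3, 7]
```

In a real experiment only the unordered fragment lengths are observed, so a solver would have to search over permutations of `a` and `b` and compare the sorted result with the measured A + B lengths.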
Hybridization mapping
Clones
  Fragments replicated using a technique called cloning
Mapping process
  Test whether a probe binds to each clone
  Shared probes (markers) give overlap information between fragments
Problems
  Probes are not unique / probes fall in repeats / lack of data
  False negatives / false positives / deletions / chimeric clones
[Figure: probe positions 1 to 7 along clone A and 3' to 10' along clone B]
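The overlap information mentioned above can be read directly from the hybridization results: two clones that bind a common probe are inferred to overlap. The data below are hypothetical and only illustrate the bookkeeping, assuming unique probes and no errors.

```python
from itertools import combinations

# Hypothetical hybridization results: clone -> set of probes that bind to it.
hybridizes = {
    "A": {1, 2, 3},
    "B": {3, 4},
    "C": {5},
}

def shared_probes(hybridizes):
    """Map each pair of clones to the probes they share (if any)."""
    overlaps = {}
    for c1, c2 in combinations(sorted(hybridizes), 2):
        common = hybridizes[c1] & hybridizes[c2]
        if common:
            overlaps[(c1, c2)] = common
    return overlaps

print(shared_probes(hybridizes))
```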
Computational complexity
Restriction site models
  Double digest
    Permutations of each set of fragments
    One-to-one correspondence
    NP-complete problem
    The number of solutions can be exponential
  Partial digest
    NP-completeness has not been proved
Hybridization mapping
  Modeled with interval graphs
  The resulting problems are NP-hard
[Figure: intervals a, b, c, d, e and the corresponding interval graph]
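An interval graph has one vertex per interval and an edge whenever two intervals intersect. The coordinates below are hypothetical (the figure's actual positions are not recoverable), so this is only a sketch of the construction.

```python
from itertools import combinations

def interval_graph(intervals):
    """Return the set of edges between intervals that intersect."""
    edges = set()
    for (u, (lo_u, hi_u)), (v, (lo_v, hi_v)) in combinations(sorted(intervals.items()), 2):
        if lo_u <= hi_v and lo_v <= hi_u:  # closed intervals overlap
            edges.add((u, v))
    return edges

# Hypothetical clone positions along the target DNA.
intervals = {"a": (0, 4), "b": (2, 6), "c": (7, 10), "d": (5, 8), "e": (9, 12)}
print(sorted(interval_graph(intervals)))
```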
Consecutive ones property (C1P)
Assumptions
  Probes are unique
  There are no errors
  All "clones x probes" hybridization experiments have been done
With error-free data
  Permute the rows or columns of the m x n matrix
  Solvable in polynomial time: O(mn), O(m+n+r), where r is the number of 1s in the matrix
The resulting permutation is a solution, but not necessarily the true ordering.
Algorithmic implications
What we are really trying to solve
  A real-life problem, not an abstract mathematical problem
  The true ordering of the clones
Problem features
  NP-hard problems
  Time constraints
  Optimal solutions may be out of reach
  No guarantees
Desirable features
  Iterative process ("better and better" or "closer to the truth")
  Ability to distinguish the "good" parts of a solution
C1P Problem
Goal
  Find a permutation of the columns such that in each row all 1s are consecutive
Definition
  For each row i of M, let $S_i$ be the set of columns k where $M_{i,k} = 1$
Three possible relations between two rows i and j
  $S_i \cap S_j = \emptyset$
  $S_i \subseteq S_j$ or $S_j \subseteq S_i$
  $S_i \cap S_j \neq \emptyset$, and neither of them is a subset of the other
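The three cases can be told apart with plain set operations. In this sketch the labels "disjoint", "nested", and "overlapping" are my own shorthand for the three relations, not terms from the slides.

```python
def relation(s_i, s_j):
    """Classify two rows by their column sets S_i and S_j."""
    if not (s_i & s_j):
        return "disjoint"       # S_i and S_j share no column
    if s_i <= s_j or s_j <= s_i:
        return "nested"         # one set contains the other
    return "overlapping"        # they intersect, neither contains the other

print(relation({1, 2}, {3}))
print(relation({1, 2, 3}, {2}))
print(relation({2, 7, 8}, {2, 5, 7}))
```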
a b d c e
Ka 1 1 1 0 0
Kb 0 1 1 1 0
Kc 0 0 0 1 1
[Figure: clones Ka, Kb, Kc drawn as intervals over probes a, b, d, c, e]
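The example matrix can be checked directly: with the columns in the order a, b, d, c, e every row has its 1s in one block, while the alphabetical order does not. The exhaustive search at the end is only a sanity check for tiny inputs; it is exponential and is not the polynomial-time method the slides refer to.

```python
from itertools import permutations

# Example matrix from the slide: rows are clones, keys are probes.
matrix = {
    "Ka": {"a": 1, "b": 1, "d": 1, "c": 0, "e": 0},
    "Kb": {"a": 0, "b": 1, "d": 1, "c": 1, "e": 0},
    "Kc": {"a": 0, "b": 0, "d": 0, "c": 1, "e": 1},
}

def has_consecutive_ones(matrix, column_order):
    """True if, under column_order, every row's 1s form a single block."""
    for row in matrix.values():
        positions = [k for k, col in enumerate(column_order) if row[col] == 1]
        if positions and positions[-1] - positions[0] + 1 != len(positions):
            return False
    return True

def all_valid_orders(matrix, columns):
    """Brute-force search over all column orders (tiny examples only)."""
    return [order for order in permutations(columns)
            if has_consecutive_ones(matrix, order)]

print(has_consecutive_ones(matrix, ["a", "b", "d", "c", "e"]))
print(has_consecutive_ones(matrix, ["a", "b", "c", "d", "e"]))
print(len(all_valid_orders(matrix, "abcde")), "valid column orders")
```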
{2,7,8} {2,7,8} {2,7,8}
L1 -> … 0 1 1 1 0 …
{5}{2,7}{2,7}{8}
L1 -> … 0 0 1 1 1 0 …
L2 -> … 0 1 1 1 0 0 …
{5}{2}{7}{8}{1,4}{1,4}
L1 -> … 0 0 1 1 1 0 0 0 …
L2 -> … 0 1 1 1 0 0 0 0 …
L3 -> … 0 0 0 1 1 1 1 0 …
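Placing $l_3$ after $l_1$ and $l_2$, where $l_a \cdot l_b$ is the number of columns in which both rows have a 1:
  if $l_1 \cdot l_3 < \min(l_1 \cdot l_2,\ l_2 \cdot l_3)$, same direction
  if $l_1 \cdot l_3 \ge \min(l_1 \cdot l_2,\ l_2 \cdot l_3)$, opposite direction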
Time complexity O(mn)
C1 C2 C3 C4 C5 C6 C7 C8
L1 0 1 0 0 0 0 1 1
L2 0 1 0 0 1 0 1 0
L3 1 0 0 1 0 0 1 1
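The walkthrough for this matrix can be replayed in a few lines. The sketch assumes the direction test compares $l_1 \cdot l_3$ with $\min(l_1 \cdot l_2,\ l_2 \cdot l_3)$, where the dot counts columns in which both rows have a 1, and it takes the column order 5, 2, 7, 8, 1, 4 from the final partition, with the all-zero columns 3 and 6 appended arbitrarily.

```python
rows = {
    "L1": [0, 1, 0, 0, 0, 0, 1, 1],
    "L2": [0, 1, 0, 0, 1, 0, 1, 0],
    "L3": [1, 0, 0, 1, 0, 0, 1, 1],
}

def column_set(bits):
    """1-based indices of the columns that hold a 1."""
    return {k + 1 for k, bit in enumerate(bits) if bit}

def dot(u, v):
    """Number of columns in which both rows have a 1."""
    return sum(a & b for a, b in zip(u, v))

def is_one_block(bits):
    """True if all 1s in the row are adjacent."""
    ones = [k for k, bit in enumerate(bits) if bit]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)

order = [5, 2, 7, 8, 1, 4, 3, 6]  # columns 3 and 6 are all zero
for name, bits in rows.items():
    permuted = [bits[c - 1] for c in order]
    print(name, sorted(column_set(bits)), permuted, is_one_block(permuted))

l1, l2, l3 = rows["L1"], rows["L2"], rows["L3"]
same_direction = dot(l1, l3) < min(dot(l1, l2), dot(l2, l3))
print("same direction" if same_direction else "opposite direction")
```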
C1P Problem
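Relations between two rows i and j
  $S_i \cap S_j = \emptyset$
  $S_i \subseteq S_j$ or $S_j \subseteq S_i$
  $S_i \cap S_j \neq \emptyset$, and neither of them is a subset of the other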
[Figure: rows L1 to L8 grouped into components a, b, c, d]
Make components
  Separate the rows into components
Joining components
  Make linkages
  Join the components
Time complexity: O(mn)
Processing
  Find a permutation within each of a, b, c, d
  Combine them in order
  The order is not fully determined, so there may be multiple solutions
[Figure: components a, c, d, b; the case where the elements of a contain the elements of b]
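One way to form the components is to link two rows whenever their column sets intersect without one containing the other, and then take connected components. This is a sketch under that assumption; the row sets are hypothetical and the slides do not spell out the exact construction.

```python
def overlapping(s, t):
    """Rows intersect and neither column set contains the other."""
    return bool(s & t) and not (s <= t or t <= s)

def components(row_sets):
    """Connected components of the graph linking overlapping rows."""
    names = sorted(row_sets)
    seen, result = set(), []
    for start in names:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first search from `start`
            u = stack.pop()
            component.append(u)
            for v in names:
                if v not in seen and overlapping(row_sets[u], row_sets[v]):
                    seen.add(v)
                    stack.append(v)
        result.append(sorted(component))
    return result

# Hypothetical rows, given as sets of columns holding a 1.
row_sets = {"L1": {1, 2, 3}, "L2": {3, 4}, "L3": {2}, "L4": {6, 7}, "L5": {7, 8}}
print(components(row_sets))
```

Here L3 ends up alone because its set is contained in that of L1, which is the nested case rather than the overlapping one.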
Hybridization mapping with errors
Effect of errors (in the clones x probes matrix M)
  A row for a chimeric clone may contain two blocks of 1s separated by some number of 0s (a "gap")
  A false negative: the corresponding 0 may separate two blocks of 1s
  A false positive: the corresponding 1 may separate two blocks of 0s
  Close correspondence between errors and gaps in the matrix
Using a graph model
  Gap minimization = traveling salesman problem (TSP)
  TSP on an undirected weighted graph G
  Graph G
    vertices = probes
    edge weight = number of rows in which the two probe columns differ (Hamming distance)
  Solution = minimum-weight path
P1 P2 P3 P4 P5 P6
C1 1 1 1 0 0 0
C2 0 1 1 1 0 0
C3 1 0 0 1 1 0
C4 1 1 1 1 0 0
[Figure: complete graph on probes P1 to P6, with edges weighted by the Hamming distance between columns (weights from 0 to 4)]
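For the 4 x 6 example matrix, the edge weights and a minimum-weight path can be computed by brute force. This sketch enumerates all 720 probe orders, which is only feasible because the example is tiny; it is not a practical TSP solver.

```python
from itertools import permutations

# Clones x probes matrix from the slide.
M = {
    "C1": [1, 1, 1, 0, 0, 0],
    "C2": [0, 1, 1, 1, 0, 0],
    "C3": [1, 0, 0, 1, 1, 0],
    "C4": [1, 1, 1, 1, 0, 0],
}
probes = ["P1", "P2", "P3", "P4", "P5", "P6"]
columns = {p: [row[k] for row in M.values()] for k, p in enumerate(probes)}

def hamming(u, v):
    """Number of clones (rows) in which probe columns u and v differ."""
    return sum(a != b for a, b in zip(columns[u], columns[v]))

def path_weight(order):
    """Total Hamming distance between consecutive probes in the order."""
    return sum(hamming(u, v) for u, v in zip(order, order[1:]))

best = min(permutations(probes), key=path_weight)
print(best, path_weight(best))
```

Several orders tie for the minimum, so the particular path printed depends on enumeration order.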
Guarantee
Formal proof
  The TSP approach gives the true permutation with high probability
  (i, j, r, s = clones; h = Hamming distance; t = true distance)
The TSP solution is a good approximation to the true permutation
  The probability that both permutations are the same tends to 1 as the number of probes increases
  In larger and larger instances of the problem, the approach finds the true permutation with higher and higher probability
With high probability: $h_{ij} < h_{rs} \Rightarrow t_{ij} < t_{rs}$
Computational practice
Actual results of computational tests with the algorithms
If we use the hybridization graph
  Problems: unconnected graph, redundant probes
Evaluation
  How "close" the solution found by a particular mapping algorithm is to the true probe order
  Measured using the "strong adjacency cost" of a given permutation
[Figure: hybridization graph on probes P1 to P5 and clones C1 to C4]
Heuristics for hybridization mapping
Screening chimeric clones
  Addresses the chimeric clone problem
  The key is the concept of "relatedness" between probes
  Connected -> not chimeric
  Not connected -> chimeric
Obtaining a good probe ordering
  Select one probe ordering
  Method
    Split the other probes into two separate components
    Estimate, for every probe p, the number of probes to its left and right
    Sort the probes to obtain one "good" permutation
[Figure: two examples of probe relatedness, one over probes P1 to P3 labeled "Not chimeric" and one over probes P4 to P6 labeled "Chimeric"]
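The screening rule can be sketched as a connectivity test. The definition of relatedness used here is an assumption: two probes of the clone under test are taken to be related if some other clone hybridizes to both. The clone names and data are hypothetical.

```python
def is_chimeric(clone, hybridizes):
    """Flag a clone whose probes do not form one connected group.

    Assumption: two probes are "related" if another clone binds both.
    """
    probes = sorted(hybridizes[clone])
    if len(probes) < 2:
        return False

    def related(p, q):
        return any(p in ps and q in ps
                   for other, ps in hybridizes.items() if other != clone)

    seen, stack = {probes[0]}, [probes[0]]
    while stack:  # explore the relatedness graph from the first probe
        p = stack.pop()
        for q in probes:
            if q not in seen and related(p, q):
                seen.add(q)
                stack.append(q)
    return len(seen) < len(probes)

# Hypothetical data: clone -> probes that hybridize to it.
hybridizes = {
    "X": {"P1", "P2", "P3"},
    "Y": {"P4", "P5", "P6"},
    "a": {"P1", "P2"},
    "b": {"P2", "P3"},
    "d": {"P4", "P5"},
    "e": {"P6"},
}
print(is_chimeric("X", hybridizes))  # probes linked through a and b
print(is_chimeric("Y", hybridizes))  # P6 is not linked to P4 or P5
```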