conrad-fleming
some simple basics
1 Introduction
2 Theoretical background Biochemistry/molecular biology
3 Theoretical background computer science
4 History of the field
5 Splicing systems
6 P systems
7 Hairpins
8 Detection techniques
9 Micro technology introduction
10 Microchips and fluidics
11 Self assembly
12 Regulatory networks
13 Molecular motors
14 DNA nanowires
15 Protein computers
16 DNA computing - summary
17 Presentation of essay and discussion
Course outline
Some basics
Book
The contents of the following presentation are based on work discussed in Chapter 2 of
DNA Computing by G. Paun, G. Rozenberg, and A. Salomaa
Book
We have seen how DNA can be used to solve
various optimization problems.
Leonard Adleman was able to use encoded DNA
to solve the Hamiltonian Path problem for a
7-node graph with a single solution.
The drawbacks to using DNA as a viable
computational device mainly deal with the
amount of time required to actually analyze
and determine the solution from a test tube
of DNA.
Adleman’s experiment
Further considerations
For his experiment, Adleman used words of 20
nucleotides (20-mer oligonucleotides) to
encode the vertices and edges of the graph.
Due to the nature of DNA’s 4-base language,
this allowed for $4^{20}$ different
combinations.
However, longer oligonucleotides
would be required for larger graphs.
Given the nature of DNA, we can easily
determine a set of rules to operate on DNA.
Defining a Rule Set allows for programming
the DNA much like programming a computer.
The rule set assumes the following:
DNA exists in a test tube
The DNA is in single stranded form
Defining a rule set
Merge simply merges two test tubes of
DNA to form a single test tube.
Given test tubes N1 and N2 we can merge
the two to form a single test tube, N,
such that N consists of N1 ∪ N2.
Formal Definition: merge(N1, N2) = N
Merge
Amplify takes a test tube of DNA and
duplicates it.
Given test tube N1 we duplicate it to
form test tube N, which is identical
to N1.
Formal Definition: N = duplicate(N1)
Amplify
Detect looks at a test tube of DNA and
returns true if it has at least a single
strand of DNA in it, false otherwise.
Given test tube N, return TRUE if it
contains at least a single strand of DNA,
else return FALSE.
Formal Definition: detect(N)
Detect
Separate simply separates the contents of a test
tube of DNA based on some subsequence of bases.
Given a test tube N and a word w over the
alphabet {A, C, G, T}, produce two tubes +(N, w)
and –(N, w), where +(N, w) contains all strands
in N that contain the word w and –(N, w)
contains all strands in N that do not contain
the word w.
Formal Definition: N ← +(N, w)
N ← –(N, w)
Separate / Extract
Length-Separate takes a test tube and
separates it based on the length of the
sequences
Given a test tube N and an integer n we
produce a test tube that contains all DNA
strands with length less than or equal to n.
Formal Definition: N ← (N, ≤ n)
Length - separate
Position-Separate takes a test tube and
separates the contents of a test tube of DNA
based on some beginning or ending sequence.
Given a test tube N1 and a word w, produce the
tube N consisting of all strands in N1 that
begin (B) or end (E) with the word w.
Formal Definition: N ← B(N1, w)
N ← E(N1, w)
Position - Separate
From the given rules, we can now manipulate our
strands of DNA to get a desired result.
Here is an example DNA Program that looks for
DNA strands that contain the subsequence AG and
the subsequence CT:
1 input(N)
2 N ← +(N, AG)
3 N ← +(N, CT)
4 detect(N)
A simple example
1. input(N)
Input a test tube N containing single stranded sequences
of DNA
2. N ← +(N, AG)
Extract all strands that contain the AG subsequence.
3. N ← +(N, CT)
From those, extract all strands that also contain the CT
subsequence. Because this step operates on the test tube
that already holds only AG-containing strands, the final
result is a test tube which contains all strands with both
the subsequence AG and the subsequence CT.
4. detect(N)
Returns TRUE if the test tube has at least one strand of
DNA in it, else returns FALSE.
Explanation
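The rule set can be imitated in software to make the semantics concrete. The following is only a minimal sketch, assuming a test tube is modelled as a Python list of strings over A, C, G, T; the function names are illustrative and not part of the original rule set. It follows the stated goal of the example, keeping strands that contain both AG and CT.

```python
from typing import List, Tuple

Tube = List[str]  # assumption: a test tube is modelled as a list of strands


def merge(n1: Tube, n2: Tube) -> Tube:
    """merge(N1, N2): pour both tubes into one."""
    return n1 + n2


def detect(n: Tube) -> bool:
    """detect(N): True if the tube holds at least one strand."""
    return len(n) > 0


def separate(n: Tube, w: str) -> Tuple[Tube, Tube]:
    """Return (+(N, w), -(N, w)): strands with and without the word w."""
    plus = [s for s in n if w in s]
    minus = [s for s in n if w not in s]
    return plus, minus


def length_separate(n: Tube, max_len: int) -> Tube:
    """(N, <= n): strands of length at most max_len."""
    return [s for s in n if len(s) <= max_len]


def begins_with(n: Tube, w: str) -> Tube:
    """B(N, w): strands that begin with w."""
    return [s for s in n if s.startswith(w)]


def ends_with(n: Tube, w: str) -> Tube:
    """E(N, w): strands that end with w."""
    return [s for s in n if s.endswith(w)]


def contains_ag_and_ct(n: Tube) -> bool:
    """The four-step example: keep AG strands, then CT strands, then detect."""
    n, _ = separate(n, "AG")
    n, _ = separate(n, "CT")
    return detect(n)
```

For instance, `contains_ag_and_ct(["AGCT", "AAAA"])` keeps only the first strand and reports that the tube is non-empty.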
Now that we have some simple rules at our
disposal we can easily create a simple
program to solve the Hamiltonian Path
problem for a simple 7-node graph as
outlined by Adleman.
Back to Adleman’s experiment
1 input(N)
2 N ← B(N, s0)
3 N ← E(N, s6)
4 N ← (N, ≤ 140)
5 for i = 1 to 5 do
  begin
    N ← +(N, si)
  end
6 detect(N)
The program
1 Input(N)
Input a test tube N that contains all of the
valid vertices and edges encoded in the graph.
2 N ← B(N, s0)
Separate all sequences that begin with the
starting node.
3 N ← E(N, s6)
Further separate all sequences that end with
the ending node.
Explanation
4 N ← (N, ≤ 140)
Further isolate all strands that have a length of
140 nucleotides or less (as there are 7 nodes and
each node is encoded by a 20-nucleotide word).
5 for i = 1 to 5 do begin N ← +(N, si) end
Now we separate all sequences that have the
required nodes, thus giving us our solution(s),
if any.
6 detect(N)
See if we actually have a solution within our test
tube.
Explanation
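The program can be tried out on a simulated tube. This is a hedged sketch rather than a model of the laboratory procedure: the input tube is generated by enumerating all walks of the graph (standing in for random ligation), and `word` is a hypothetical 20-letter vertex encoding, chosen so that a vertex word can only match at a word boundary. The edge list used in the usage note is an invented example, not Adleman's graph.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def word(i: int) -> str:
    """Hypothetical 20-letter encoding: 17 G's plus a 3-letter code over A, C, T."""
    code = ""
    for _ in range(3):
        code += "ACT"[i % 3]
        i //= 3
    return "G" * 17 + code


def all_walks(edges: List[Edge], n_vertices: int, max_vertices: int) -> List[List[int]]:
    """All walks with 1..max_vertices vertices (stand-in for random ligation)."""
    walks = [[v] for v in range(n_vertices)]
    result = list(walks)
    for _ in range(max_vertices - 1):
        walks = [w + [b] for w in walks for (a, b) in edges if a == w[-1]]
        result.extend(walks)
    return result


def hamiltonian_path_exists(edges: List[Edge], n: int) -> bool:
    s = [word(i) for i in range(n)]
    # 1 input(N): one strand per walk through the graph
    tube = ["".join(s[v] for v in walk) for walk in all_walks(edges, n, n + 1)]
    tube = [x for x in tube if x.startswith(s[0])]       # 2 N <- B(N, s0)
    tube = [x for x in tube if x.endswith(s[n - 1])]     # 3 N <- E(N, s_last)
    tube = [x for x in tube if len(x) <= 20 * n]         # 4 N <- (N, <= 20n)
    for i in range(1, n - 1):                            # 5 every inner vertex
        tube = [x for x in tube if s[i] in x]
    return len(tube) > 0                                 # 6 detect(N)


# Invented 7-node example graph containing the path 0 -> 1 -> ... -> 6.
EXAMPLE_EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
                 (0, 3), (3, 1), (2, 5), (4, 1)]
```

With n = 7 the length bound is 140, as in the program above; `hamiltonian_path_exists(EXAMPLE_EDGES, 7)` then plays the role of step 6.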
In most computational models, we define a
memory, which allows us to store information
for quick retrieval.
DNA can be encoded to serve as memory
through the use of its complementary
properties.
We can directly correlate DNA memory to
conventional bit memory in computers through
the use of the so called sticker Model.
Adding Memory
We can define a single strand of DNA as
being a memory strand.
This memory strand serves as the template
into which we can encode bits.
We then use complementary stickers to
attach to this template memory strand and
encode our bits.
The sticker model
How it works
Consider the following strand of DNA:
CCCC GGGG AAAA TTTT
This strand is divided into 4 distinct
sub-strands.
Each of these sub-strands has exactly
one complementary sub-strand as follows:
GGGG CCCC TTTT AAAA
How it works
As a double Helix, the DNA forms the
following complex:
CCCC GGGG AAAA TTTT
GGGG CCCC TTTT AAAA
If we were to take each sub-strand as
a bit position, we could then encode
binary bits into our memory strand.
Each time a sub-sequence sticker has
attached to a sub-sequence on the memory
template, we say that that bit slot is on.
If there is no sub-sequence sticker attached
to a sub-sequence on the memory template,
then we say that the bit slot is off.
How it works
Some memory examples
For example, if we wanted to encode the
bit sequence 1001, we would have:
CCCC GGGG AAAA TTTT
GGGG           AAAA
As we can see, this is a direct coding
of 1001 into the memory template.
This is a rather good encoding. However, as we
increase the size of our memory, we have to ensure
that our sub-strands have distinct complements in
order to be able to set and clear specific bits in
our memory.
We also have to ensure that the boundaries between
sub-strands are distinct, to prevent
complementary stickers from annealing across
borders.
The biological implications of this are rather
difficult, as annealing long strands of
sub-sequences to a DNA template is very error-prone.
Disadvantages
The clear advantage is that we have a distinct
memory block that encodes bits.
The differentiation between subsequences
denoting individual bits allows a natural
border between encoding sub-strands.
Using one template strand as a memory block
also allows us to use its complement as another
memory block, thus effectively doubling our
capacity to store information.
Advantages
Now that we have a memory structure, we
can begin to migrate our rules to work
on our memory strands.
We can add new rules that allow us to
program more into our system.
So now what?
Separate now deals with memory strands. It takes a
test tube of DNA memory strands and separates it
based on what is turned on or off.
Given a test tube, N, and an integer i, we separate
the tube into +(N, i), which consists of all memory
strands for which the ith sub-strand is turned on
(i.e. a sticker is attached to the ith position on
the memory strand), and –(N, i), which contains all
memory strands for which the ith sub-strand is turned
off.
Formal Definition: Separate +(N, i) and –(N, i)
Separate
Set turns on a position (if it is off) on a
memory strand of DNA.
Given a test tube, N, and an integer i,
where 1 ≤ i ≤ k (k is the number of positions on the DNA
memory strand), we set the ith position to
on.
Formal Definition: set(N, i)
Set
Clear turns off a position (if it is on) on a
memory strand of DNA.
Given a test tube, N, and an integer i,
where 1 ≤ i ≤ k (k is the number of positions on the
DNA memory strand), we clear the ith
position to off.
Formal Definition: clear(N, i)
Clear
Read reads a test tube, which has an
isolated memory strand and determines
what the encoding of that strand is.
Read also reports when there is no
memory strand in the test tube.
Formal Definition: read(N)
Read
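The four sticker operations can be sketched in software as follows. This is an illustrative model only, assuming a memory complex is represented as a tuple of booleans (True meaning a sticker is attached) and a tube as a list of such tuples; the names `encode`, `set_bit` and `clear_bit` are hypothetical helpers, not notation from the slides.

```python
from typing import List, Optional, Tuple

Complex = Tuple[bool, ...]   # assumption: one boolean per bit position
Tube = List[Complex]


def encode(bits: str) -> Complex:
    """Build a memory complex from a bit string such as '1001'."""
    return tuple(b == "1" for b in bits)


def separate(tube: Tube, i: int) -> Tuple[Tube, Tube]:
    """Return (+(N, i), -(N, i)); positions are 1-based as in the slides."""
    plus = [m for m in tube if m[i - 1]]
    minus = [m for m in tube if not m[i - 1]]
    return plus, minus


def set_bit(tube: Tube, i: int) -> Tube:
    """set(N, i): attach a sticker at position i on every strand."""
    return [m[:i - 1] + (True,) + m[i:] for m in tube]


def clear_bit(tube: Tube, i: int) -> Tube:
    """clear(N, i): remove the sticker at position i on every strand."""
    return [m[:i - 1] + (False,) + m[i:] for m in tube]


def read(tube: Tube) -> Optional[str]:
    """read(N): report the encoding of one strand, or None for an empty tube."""
    if not tube:
        return None
    return "".join("1" if b else "0" for b in tube[0])
```

Starting from the 1001 example, `read(set_bit([encode("1001")], 2))` turns on the second position of the memory strand.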
To effectively use the Sticker Model, we define a library for input purposes.
The library consists of a set of strands of DNA.
Each strand of DNA in this library is divided into two sections, an initial data input section, and a storage/output section.
Defining a library
The formal notation of a library is as follows:
(k, l) library (where k and l are integers, l ≤
k )
k refers to the size of the memory strand
l refers to length of the positions allowed for
input data.
The initial setup of the memory strand is such
that the first l positions are set with input
data and the last k – l positions are clear.
Library setup
Consider the following encoding for a library:
(3, 2) library.
From this encoding, we see that we have a
memory strand that is of size 3, and has 2
positions allowed for input data.
Thus the first 2 positions are used for input
data, and the final position is used for
storage/output.
A simple example
A quick visualisation
Here is a visualisation of this library:
CCCC GGGG AAAA    Encoding: 000

CCCC GGGG AAAA
GGGG CCCC         Encoding: 110

CCCC GGGG AAAA
     CCCC         Encoding: 010

CCCC GGGG AAAA
GGGG              Encoding: 100
From this visualisation we see that we can
achieve an encoding of $2^l$ different kinds
of memory complexes.
We can formally define a memory complex as
follows: $w0^{k-l}$, where $w$ is an arbitrary
binary sequence of length $l$, and $0^{k-l}$
represents the off state of the following
$k-l$ positions on the DNA memory strand.
Memory considerations
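The count of memory complexes can be checked by enumerating a (k, l) library directly. A small sketch, assuming each complex is written as a bit string; the function name `library` is illustrative.

```python
from itertools import product
from typing import List


def library(k: int, l: int) -> List[str]:
    """All complexes w0^(k-l): every l-bit input w followed by k-l cleared positions."""
    if not 0 <= l <= k:
        raise ValueError("a (k, l) library needs 0 <= l <= k")
    return ["".join(w) + "0" * (k - l) for w in product("01", repeat=l)]
```

For the (3, 2) library this yields 000, 010, 100 and 110, the four encodings shown in the visualisation.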
Consider the following NP-complete problem:
Minimal Set Cover
Given a finite set S = {1, 2, …, p} and a
finite collection of subsets {C1, …, Cq}
of S, we wish to find the smallest subset I
of {1, 2, …, q} such that all of the
points in S are covered by the subsets Ci with i in I.
We can solve this problem by using the brute
force method of going through every single
combination of the subsets {C1, …, Cq}.
We will use our rules to implement the same
strategy using our DNA system.
An interesting example
We will use a library with the following
attributes: (p+q, q) library.
This means that our memory strand has p+q
positions to model the p points we want to cover
and the q subsets that we have in the problem.
The first q positions are our data input section,
one for each of the q subsets in the problem,
and the last p positions are
our storage area.
Using DNA
The algorithm is rather simple. We encode every
selection of the subsets in our problem into the
first q positions of our DNA strands. Each strand
represents a potential solution to our problem.
Each of the q positions represents a
single subset in our problem.
A position that is turned on represents
inclusion of that set in the solution.
We simply go through each of the possibilities
for the q subsets in our problem.
Using DNA
The p positions represent the points that we have to
cover, one position for each point.
The algorithm simply takes each selected set among the q subsets
and checks which of the p points it covers.
Then it sets the position of each such point to on.
Once all of the p point positions are turned on, we know that
we have a selection of subsets that covers all
points.
Then all we have to do is look at all solutions and
determine which one contains the smallest number of
subsets.
Using DNA
So far we’ve mapped each subset cover to a
position and each point to a position.
However, each subset cover has a set of
points, which it covers.
How do we encode this into our algorithm?
We do this by introducing a program specific
rule, known as cardinality.
But how is it done?
The cardinality of a set, X, returns the
number of elements in a set.
Formally, we define cardinality as: card(X)
From this we can determine what elements are
in a particular subset cover in terms of its
position relative to the points in p.
Therefore, the elements in a subset Ci, where
1 ≤ i ≤ q, are denoted by cij, where 1 ≤ j ≤
card(Ci).
Cardinality
Now that we can easily determine the elements
within each subset cover, we can now proceed
with the algorithm.
We check each position in q and if it is turned
on, we simply see what points this subset
covers.
For each point that it covers, we set the
corresponding position in p to on.
Once all positions in p have been turned on,
then we have a solution to the problem.
Checking each point
for i = 1 to q
    Separate +(N0, i) and –(N0, i)
    for j = 1 to card(Ci)
        Set(+(N0, i), q + cij)
    N0 ← merge(+(N0, i), –(N0, i))
for i = q + 1 to q + p
    N0 ← +(N0, i)
The program
Loop through all of the positions from 1 to q.
for i = 1 to q
Now, separate all of the on and off positions.
Separate +(N0, i) and –(N0, i)
Loop through all of the elements that the subset
covers.
for j = 1 to card(Ci)
Set the position in p that corresponds to that element.
Set(+(N0, i), q + cij)
Now, merge both tubes back together.
N0 ← merge(+(N0, i), –(N0, i))
Unraveling it all
Now we simply loop through all of the positions
in p.
for i = q + 1 to q + p
Separate all strands that have position i on.
N0 ← +(N0, i)
This last section of the code ensures that we
isolate all of the possible solutions by
selecting all of the strands where all positions
in p are turned on (i.e. covered by the selected
subsets).
Unraveling it all
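The marking and filtering phase can be simulated to see which strands survive. The sketch below is an assumption-laden model, not the laboratory protocol: each memory complex is a list of 0/1 values of length q + p, the (p+q, q) library is generated by brute force, and `mark_and_filter` is a hypothetical name.

```python
from itertools import product
from typing import List, Set


def mark_and_filter(p: int, subsets: List[Set[int]]) -> List[List[int]]:
    """Return the complexes whose selected subsets cover all points 1..p."""
    q = len(subsets)
    # (p+q, q) library: every selection in the first q positions, last p cleared.
    tube = [list(w) + [0] * p for w in product([0, 1], repeat=q)]
    for i in range(1, q + 1):
        plus = [m for m in tube if m[i - 1] == 1]    # +(N0, i)
        minus = [m for m in tube if m[i - 1] == 0]   # -(N0, i)
        for point in subsets[i - 1]:                 # j = 1 .. card(Ci)
            for m in plus:
                m[q + point - 1] = 1                 # Set(+(N0, i), q + cij)
        tube = plus + minus                          # merge back into N0
    for i in range(q + 1, q + p + 1):
        tube = [m for m in tube if m[i - 1] == 1]    # N0 <- +(N0, i)
    return tube
```

With S = {1, 2, 3, 4} and the invented collection C1 = {1, 2}, C2 = {3, 4}, C3 = {2, 3}, only selections that include both C1 and C2 remain in the tube.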
So now that we have all of the potential solutions
in one test tube, we still have to determine the
final solution.
Note that the Minimal Set Cover problem finds the
smallest number of subsets that covers the entire
set.
In our test tube, we have all of the solutions that
cover the set, and one of these will have the
smallest number of subsets.
We therefore have to write a program to determine
this.
Output of the solution
for i = 0 to q – 1
    for j = i down to 0
        separate +(Nj, i + 1) and –(Nj, i + 1)
        Nj+1 ← merge(+(Nj, i + 1), Nj+1)
        Nj ← –(Nj, i + 1)
read(N1);
else if it was empty, read(N2);
else if it was empty, read(N3);
…
Finding the solution
The program takes each test tube and separates
them based on number of positions in q turned
on.
Thus for example, all memory strands with 1
position in q turned on are separated into one
test tube, all memory strands with 2 positions
in q turned on are separated into one test
tube, etc.
Once this is done, we simply read each tube
starting with the smallest number of subsets
turned on to find a solution to our problem (of
which there may be many).
Finding the solution
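The sorting loop can be traced in software to confirm that tube Nj ends up holding exactly the strands with j subset positions turned on. This is a sketch under the same modelling assumption as before (a complex is a list of 0/1 values whose first q entries are the subset positions); `sort_by_count` and `read_smallest` are illustrative names.

```python
from typing import List, Optional

Complex = List[int]


def sort_by_count(tube: List[Complex], q: int) -> List[List[Complex]]:
    """Distribute strands into tubes N0..Nq by the number of set positions in 1..q."""
    tubes: List[List[Complex]] = [list(tube)] + [[] for _ in range(q)]
    for i in range(q):                   # i = 0 .. q-1, testing position i+1
        for j in range(i, -1, -1):       # j = i down to 0
            plus = [m for m in tubes[j] if m[i] == 1]    # +(Nj, i+1)
            minus = [m for m in tubes[j] if m[i] == 0]   # -(Nj, i+1)
            tubes[j + 1] = plus + tubes[j + 1]           # N(j+1) <- merge(...)
            tubes[j] = minus                             # Nj <- -(Nj, i+1)
    return tubes


def read_smallest(tubes: List[List[Complex]]) -> Optional[Complex]:
    """read(N1), else read(N2), ...: first strand from the first non-empty tube."""
    for n in tubes[1:]:
        if n:
            return n[0]
    return None
```

Running j downwards matters: a strand moved from Nj to Nj+1 while testing position i + 1 is not examined again for that same position.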
The operations outlined above can be used to program
solutions to other, more practical problems.
One such area is cryptography, where it is
postulated that a DNA system such as the one
outlined is capable of breaking the common DES (Data
Encryption Standard) used in many cryptosystems.
Using a (579, 56) library, with 20-nucleotide
sub-strands for each position and an overall memory strand
of 11,580 nucleotides, it is estimated that one
could break DES with about 4 months of
laboratory work.
Final considerations
More serious
There is a solid theoretical foundation for splicing
as an operation on formal languages.
In biochemical terms, procedures based on splicing
may have some advantages, since the DNA is used
mostly in its double stranded form, and thus many
problems of unintentional annealing may be avoided.
The basic model is a single tube, containing an
initial population of dsDNA, several restriction
enzymes, and a ligase. Mathematically this is
represented as a set of strings (the initial
language), a set of cutting operations, and a set of
pasting operations.
It has been proved to be equivalent to a universal Turing machine.
Tom Head – splicing systems
These are the techniques that are common in the microbiologist's lab and can be used to program a molecular computer. The operations on DNA are:
synthesize: desired strands can be created
separate: strands can be sorted and separated by length
merge: by pouring two test tubes of DNA into one to perform union
extract: extract those strands containing a given pattern
melt/anneal: breaking/bonding two ssDNA molecules with complementary sequences
amplify: use of PCR to make copies of DNA strands
cut: cut DNA with restriction enzymes
rejoin: rejoin DNA strands with 'sticky ends'
detect: confirm presence or absence of DNA
Tom Head – splicing systems
The initial set (finite or infinite) consists of
double-stranded DNA molecules.
Specific classes of enzymatic activities are
considered: those of restriction enzymes.
Recombinant behavior is modeled, and the associated sets
are analyzed, by a new formalism called Splicing Systems.
Attention is focused on the effect of sets of restriction
enzymes and a ligase that allow DNA molecules to be
cleaved and re-associated to produce further
molecules.
Tom Head – splicing systems
Circular DNA and Splicing Systems
DNA molecules exist not only in
linear forms but also in circular
forms.
Splicing systems
Splicing: linear and circular
Splicing systems
Linear splicing
Initial molecules: …ATTGACCC… and …CAATCAGG…
Restriction sites: G|A and AT|C
After cutting: …ATTG + ACCC… and …CAAT + CAGG…
After ligase: …ATTGCAGG… and …CAATACCC…
Splicing in nature
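$V$: alphabet
Splicing rule: $r = u_1 \mid u_2 \,\$\, u_3 \mid u_4$, with $u_1, u_2, u_3, u_4 \in V^*$
For $x, y, z, w \in V^*$: $(x, y) \vdash_r (z, w)$ if and only if
$x = x_1 u_1 u_2 x_2$
$y = y_1 u_3 u_4 y_2$
$z = x_1 u_1 u_4 y_2$
$w = y_1 u_3 u_2 x_2$
for some $x_1, x_2, y_1, y_2 \in V^*$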
Splicing in DNA computing
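The splicing operation on strings is easy to state as code. The sketch below assumes a rule is given as a 4-tuple (u1, u2, u3, u4) and returns every pair (z, w) obtainable from one application of the rule to (x, y); the function names are illustrative.

```python
from typing import List, Set, Tuple

Rule = Tuple[str, str, str, str]  # (u1, u2, u3, u4)


def occurrences(s: str, sub: str) -> List[int]:
    """Start indices of all (possibly overlapping) occurrences of sub in s."""
    return [i for i in range(len(s) - len(sub) + 1) if s.startswith(sub, i)]


def splice(x: str, y: str, rule: Rule) -> Set[Tuple[str, str]]:
    """All (z, w) with x = x1 u1 u2 x2, y = y1 u3 u4 y2, z = x1 u1 u4 y2, w = y1 u3 u2 x2."""
    u1, u2, u3, u4 = rule
    results = set()
    for i in occurrences(x, u1 + u2):
        x1, x2 = x[:i], x[i + len(u1) + len(u2):]
        for j in occurrences(y, u3 + u4):
            y1, y2 = y[:j], y[j + len(u3) + len(u4):]
            results.add((x1 + u1 + u4 + y2, y1 + u3 + u2 + x2))
    return results
```

Applied to the fragments ATTGACCC and CAATCAGG with the rule (G, A, AT, C), this reproduces the recombined strands ATTGCAGG and CAATACCC from the linear splicing example.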
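$\gamma = (V, T, A, R)$
$V$: alphabet
$T \subseteq V$: terminal alphabet
$A \subseteq V^*$: set of strings
$R$: splicing rules
$L(\gamma) = \sigma^*(A) \cap T^*$
If $A, R \in \mathrm{FIN}$ then $L(\gamma) \in \mathrm{REG}$
… with permitting context: rules $(r; C_1, C_2) \in R$, where $r = u_1 \mid u_2 \,\$\, u_3 \mid u_4$ and $C_1, C_2 \subseteq V^*$
If $A, R \in \mathrm{FIN}$ then $L(\gamma) \in \mathrm{RE}$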
Extended H-system
[Figure: rotation carried out by splicing in four numbered steps, with permitting contexts {s1}, {h1, s1}, {hA, s1} and {s1, tA}.]
Rotation
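$\sigma$: $(x u_1 u_2 y,\; w u_3 u_4 z) \vdash_r (x u_1 u_4 z,\; w u_3 u_2 y)$, where $r = u_1 \mid u_2 \,\$\, u_3 \mid u_4$ is the rule and $u_1 u_2$, $u_3 u_4$ are the sites
[Figure: the three stages of the operation: pattern recognition of the sites $u_1 u_2$ and $u_3 u_4$ in the strands $x u_1 u_2 y$ and $w u_3 u_4 z$, cut between $u_1$ and $u_2$ and between $u_3$ and $u_4$, and paste into $x u_1 u_4 z$ and $w u_3 u_2 y$.]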
Păun’s linear splicing operation (1996)
Circular splicing
restriction enzyme 1 restriction enzyme 2
ligase enzymes
Circular splicing
Conjugacy relation on $A^*$:
for $w, w' \in A^*$, $w \sim w' \iff w = xy,\ w' = yx$
Example: abaa, baaa, aaab, aaba are conjugates
$A^{\circ} = A^*/\!\sim$ = set of all circular words
${}^{\circ}w = [w]_{\sim}$, $w \in A^*$
Circular languages
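Conjugacy is simply rotation, which makes circular words straightforward to handle in code. A minimal sketch follows; choosing the lexicographically smallest rotation as the representative of a circular word is an assumption made here for convenience, not something the slides prescribe.

```python
from typing import Set


def conjugates(w: str) -> Set[str]:
    """All words yx with w = xy, i.e. all rotations of w."""
    return {w[i:] + w[:i] for i in range(max(len(w), 1))}


def are_conjugate(w: str, v: str) -> bool:
    """w ~ v holds exactly when v is a rotation of w."""
    return len(w) == len(v) and v in w + w


def circular_word(w: str) -> str:
    """Canonical representative of the class [w]: the smallest rotation."""
    return min(conjugates(w))
```

Two linear words then denote the same circular word exactly when `circular_word` returns the same string for both.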
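Circular language: $C \subseteq A^{\circ}$, a set of equivalence classes
$A^* \to A^{\circ}$: $L \mapsto \mathrm{Cir}(L) = \{{}^{\circ}w \mid w \in L\}$ (circularization of $L$)
$A^{\circ} \to A^*$: $C \mapsto \mathrm{Lin}(C) = \{w \in A^* \mid {}^{\circ}w \in C\}$ (full linearization of $C$)
$L$ is a linearization of $C$ if $\mathrm{Cir}(L) = C$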
Circular languages
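Definition
$FA^{\circ} = \{C \subseteq A^{\circ} \mid \exists L \subseteq A^*,\ \mathrm{Cir}(L) = C,\ L \in FA\}$, where $FA$ is a family of languages in the Chomsky hierarchy
Theorem [Head, Păun, Pixton]
$C \in \mathrm{Reg}^{\circ} \iff \mathrm{Lin}(C) \in \mathrm{Reg}$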
Circular languages
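Păun’s definition
($A$ = finite alphabet, $I \subseteq A^{\circ}$ initial language)
$SC_{PA} = (A, I, R)$, with rules $R \subseteq A^* \mid A^* \,\$\, A^* \mid A^*$
Given ${}^{\circ}(h u_1 u_2),\ {}^{\circ}(k u_3 u_4) \in A^{\circ}$ and $r = u_1 \mid u_2 \,\$\, u_3 \mid u_4 \in R$:
${}^{\circ}(h u_1 u_2),\ {}^{\circ}(k u_3 u_4) \vdash_r {}^{\circ}(u_2 h u_1 u_4 k u_3)$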
Circular splicing systems
Definition
A circular splicing language C(SCPA)
(i.e. a circular language generated
by a splicing system SCPA) is the
smallest circular language
containing I and closed under the
application of the rules in R.
Circular splicing systems
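Head’s definition
$SC_H = (A, I, T)$, with triples $T \subseteq A^* \times A^* \times A^*$
Given ${}^{\circ}(hpxq),\ {}^{\circ}(kuxv) \in A^{\circ}$ and $(p, x, q), (u, x, v) \in T$:
${}^{\circ}(hpxq),\ {}^{\circ}(kuxv) \vdash {}^{\circ}(hpxvkuxq)$
Pixton’s definition
$SC_{PI} = (A, I, R)$, with rules $R \subseteq A^* \times A^* \times A^*$
Given ${}^{\circ}(h\alpha),\ {}^{\circ}(h'\alpha') \in A^{\circ}$ and $(\alpha, \alpha'; \beta), (\alpha', \alpha; \beta') \in R$:
${}^{\circ}(h\alpha),\ {}^{\circ}(h'\alpha') \vdash {}^{\circ}(h\beta h'\beta')$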
Other splicing systems
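($A$ = finite alphabet, $I \subseteq A^{\circ}$ initial language)
Characterize $FA^{\circ} \cap C(\mathrm{Fin}, \mathrm{Fin}) \subseteq C(\mathrm{Reg}, \mathrm{Fin})$,
where $C(\mathrm{Fin}, \mathrm{Fin})$ is the class of circular languages $C = C(SC_{PA})$
generated by $SC_{PA}$ with
$I$ and $R$ both finite sets.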
Problem
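Theorem [Păun96]
Let $F \in \{\mathrm{Reg}^{\circ}, \mathrm{CF}^{\circ}, \mathrm{RE}^{\circ}\}$ and let $R$ satisfy additional hypotheses (symmetry, reflexivity, self-splicing). Then $C(F, \mathrm{Fin}) \subseteq F$.
Theorem [Pixton95-96]
Let $R \in \mathrm{Fin}$ satisfy additional hypotheses (symmetry, reflexivity). Then $C(\mathrm{Reg}^{\circ}, \mathrm{Fin}) \subseteq \mathrm{Reg}^{\circ}$.
For $F \in \{\mathrm{Reg}^{\circ}, \mathrm{CF}^{\circ}, \mathrm{RE}^{\circ}\}$: $C(F, \mathrm{Reg}) \subseteq F$.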
Problem
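[Figure: nested classes $\mathrm{Reg}^{\circ}$, $\mathrm{CF}^{\circ}$, $\mathrm{CS}^{\circ}$, locating the example languages ${}^{\circ}((aa)^*b)$, ${}^{\circ}(aa)^*$ and ${}^{\circ}(a^n b^n)$.]
$I = {}^{\circ}aa \cup {}^{\circ}1$, $R = \{aa \mid 1 \,\$\, 1 \mid aa\}$
$I = {}^{\circ}ab \cup {}^{\circ}1$, $R = \{a \mid b \,\$\, b \mid a\}$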
Circular finite splicing languages
Circular automata
J. Kari and L. Kari,
Context-free Recombinations, in: Words, Sequences,
Languages: Where Computer Science, Biology and
Linguistics Meet, C. Martin-Vide, V. Mitrana
(Eds.), Kluwer, the Netherlands.
Finite automata for circular languages
Definition: given a finite automaton $A$, the circular language K-accepted
by $A$, $L(A)^{\circ}_K$, is the set of all words ${}^{\circ}w$ such that $A$ has a
cycle labeled by $w$.
K-acceptance: the circular/linear language accepted by
a finite automaton $A$ is defined as $L(A) \cup L(A)^{\circ}_K$, where
$L(A)$ is the linear language accepted by the automaton $A$,
defined in the usual way.
Definition
A circular/linear language $L \subseteq \Sigma^* \cup \Sigma^{\circ}$ is regular
if there is a finite automaton $A$ that accepts the
circular and linear parts of $L$, i.e. that accepts
$L \cap \Sigma^*$ and $L \cap \Sigma^{\circ}$.
Finite automata for circular languages
The following definition is equivalent to a
definition given by Pixton:
the circular language accepted by a finite automaton
is a set of all words that label a loop containing
at least one initial and one final state.
Definition
Given a finite automaton $A$, the circular language
accepted by $A$, $L(A)^{\circ}_P$, is the set of all words ${}^{\circ}w$ such
that $A$ has a cycle labeled by $w$ that contains at
least one final state.
P-acceptance
The circular languages accepted by finite automata
under the following definition coincide with the
regular circular languages introduced by Head.
Given a finite automaton $A$, the circular language
accepted by $A$, $L(A)^{\circ}_H$, is the set of all words ${}^{\circ}w$
such that $w = uv$ and $vu \in L(A)$.
Pixton has shown that if in addition we assume that
the family of languages is closed under repetition
(i.e., $w^n$ is in the language whenever $w$ is),
H-acceptance and P-acceptance are equivalent.
H-acceptance
Advantages of K-acceptance
The same automaton accepts both the
linear and circular components of
the language
K-acceptance
Counting problem
T. Head, Circular Suggestions for DNA Computing, in: Pattern Formation in Biology, Vision and Dynamics, Eds. A. Carbone, M. Gromov and P. Prusinkiewicz, World Scientific, Singapore, 2000, pp. 325-335.
J. Kari, A Cryptosystem Based on Propositional Logic, in: Machines, Languages and Complexity, 5th International Meeting of Young Computer Scientists, Czechoslovakia, Nov. 14-18, 1988, Eds. J. Dassow and J. Kelemen, LNCS 381, Springer, 1989, pp. 210-219.
Rani Siromoney, Bireswar Das, DNA Algorithm for Breaking a Propositional Logic Based Cryptosystem, Bulletin of the EATCS, Number 79, February 2003, pp. 170-176.
Sources
Introducing CUT-DELETE-EXPAND-LIGATE (C-D-E-L) model
Combine features in Divide-Delete-Drop (D-D-D) (Leiden) and
CUT-EXPAND-LIGATE (C-E-L) (Binghamton) to form CUT-DELETE-
EXPAND-LIGATE (C-D-E-L) model
This enables us to get an aqueous solution to #3SAT, which is
a counting problem and known to be in IP.
#3SAT is defined as follows:
Instance: $F$, a propositional formula of the form $F = C_1 \wedge C_2 \wedge \dots \wedge C_m$,
where $C_i$, $i = 1, 2, \dots, m$, are clauses.
Each $C_i$ is of the form $(l_{i1} \vee l_{i2} \vee l_{i3})$, where $l_{ij}$, $j = 1, 2,
3$, are literals over the set of variables $\{x_1, x_2, \dots, x_n\}$.
Question: What is the number of truth assignments that
satisfy $F$?
C-D-E-L model
Standard double stranded DNA cloning plasmids are
commercially available.
A plasmid is a circular molecule of approximately 3 kb. It
contains a sub-segment, the MCS (multiple cloning site), of
approximately 175 base pairs that can be removed using
a pair of restriction enzyme sites that flank the
segment.
The MCS contains pair-wise disjoint sites at which
restriction enzymes act such that each produces a 5’
overhang.
Data register molecule
In C-D-E-L, a segment of the plasmid used is of the form
…c1s1c1…c2s2c2…cnsncn…
where the ci are called sites, such that no other subsequence
of the plasmid matches this sequence, and the si are called
stations, i = 1, …, n.
In D-D-D, the lengths of the stations are required to be the same.
In C-D-E-L, however, the lengths of the stations are all different,
which is fundamental in solving #3SAT.
The bio-molecular operations used in C-D-E-L are similar to
the operations in C-E-L.
C-D-E-L model
$x_1, \dots, x_n$: the variables in $F$
$\bar{x}_1, \dots, \bar{x}_n$: their negations
$s_i$: station associated with $x_i$
$\bar{s}_i$: station associated with $\bar{x}_i$
$c_i$: site associated with station $s_i$
$\bar{c}_i$: site associated with station $\bar{s}_i$
$v_i$: length of the station associated with $x_i$, $i = 1, \dots, n$
$v_{n+j}$: length of the station associated with the literal $\bar{x}_j$, $j = 1, \dots, n$
Choose stations in such a way that the sequence $[v_1, \dots, v_{2n}]$
satisfies the property
$\sum_{i=1}^{k} v_i < v_{k+1}$, $k = 1, \dots, 2n-1$
i.e. a super-increasing (easy) knapsack sequence.
From a sum, the sub-sequence that produced it can be recovered efficiently.
Design
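Why a super-increasing sequence lets a total length be decoded can be shown with a short greedy routine. This is a sketch with made-up station lengths; `is_superincreasing` and `recover` are illustrative names.

```python
from typing import List, Optional


def is_superincreasing(v: List[int]) -> bool:
    """Check that each element exceeds the sum of all earlier elements."""
    total = 0
    for x in v:
        if x <= total:
            return False
        total += x
    return True


def recover(v: List[int], total: int) -> Optional[List[int]]:
    """Greedy decoding: indices of the unique sub-sequence summing to total, if any."""
    chosen = []
    for i in range(len(v) - 1, -1, -1):   # largest element first
        if v[i] <= total:
            chosen.append(i)
            total -= v[i]
    return sorted(chosen) if total == 0 else None
```

Because every element is larger than the sum of all smaller ones, the largest element not exceeding the remaining total must belong to the sub-sequence, so the greedy pass never has to backtrack.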
The solution in Cn is analyzed by gel separation.
If more than one solution is present, they will be of
different lengths and thus will form separate bands.
By counting the number of bands we count the number of
satisfying assignments.
Furthermore, from the length of a satisfying assignment, the exact
assignment is read.
This can be done since the stations have lengths from an easy
knapsack sequence: any subsequence of an easy knapsack
sequence has a different sum from the sums of other
subsequences.
Solution
C-D-E-L model
Thus the solution to #3SAT, viz. finding
the number of satisfying assignments,
is effectively done.
Moreover, reading the truth
assignments is a great advantage in
breaking the cryptosystem based on
propositional logic.
Solution
Advantage over previous method of attack
In the cryptanalytic attack proposed earlier,
modifying D-D-D, it was required to execute the
DNA algorithm for each bit in the crypto-text.
But in the present method, using C-D-E-L
(combining features of D-D-D and C-E-L), we apply
#3SAT on P and read any satisfying assignment from
the final solution.
This gives an equivalent public key, which amounts
to breaking the cryptosystem.
Advantage
H-system
Lipton [94-95a-95b]: formalization and generalization of Adleman’s approach to other NP-complete problems
Ex H-system
Circular H-system
Sticker system
P-system
Splicing systems so far
For computational strength: Turing equivalence, expansion, finiteness and regularity, more operators, formalization
To confirm homogeneity: HPP solving and AGL
Splicing systems so far
Molecular application
Separating and fusing DNA strands
Lengthening of DNA
Shortening DNA
Cutting DNA
Multiplying DNA
Operations of DNA molecules
Denaturation: separating the single strands without breaking them
hydrogen bonds are weaker than phosphodiester bonds
heat the DNA (85° - 90° C)
Renaturation: slowly cooling down
annealing of matching, separated strands
Separating and fusing DNA strands
Machinery for nucleotide manipulation
Enzymes are proteins that catalyze chemical reactions.
Enzymes are very specific.
Enzymes speed up chemical reactions extremely efficiently (speedup: $10^{12}$).
Nature has created a multitude of enzymes that are useful in processing DNA.
Enzymes
DNA polymerase enzymes add nucleotides to a DNA molecule.
Requirements:
single-stranded template
primer, bonded to the template
3´-hydroxyl end available for extension
Note: Terminal transferase needs no primer.
Lengthening DNA
DNA nucleases are enzymes that
degrade DNA.
DNA exonucleases cleave (remove) nucleotides one
at a time from the ends of the strands.
Example: Exonuclease III, a 3´-nuclease degrading in the 3´-5´ direction
Shortening DNA
DNA nucleases are enzymes that
degrade DNA.
DNA exonucleases cleave (remove) nucleotides one at
a time from the ends of the strands.
Example: Bal31 removes nucleotides from both strands
Shortening DNA
DNA nucleases are enzymes that degrade DNA.
DNA endonucleases destroy internal phosphodiester bonds.
Example: S1 cuts only single strands or within single strand sections
Restriction endonucleases are much more specific: they cut only double strands, at a specific set of sites (e.g. EcoRI)
Cutting DNA
Amplification of a "small" amount of a
specific DNA fragment, lost in a huge
amount of other pieces ("needle in a haystack")
Solution: PCR = Polymerase Chain Reaction
devised by Kary Mullis in 1985
Nobel Prize
a very efficient molecular copy machine
Multiplying DNA
Start with a solution
containing the following
ingredients:
the target DNA molecule
primers (synthetic oligonucleotides),
complementary to the terminal sections
polymerase, heat resistant
nucleotides
PCR - initialisation
Solution heated close to boiling
temperature.
Hydrogen bonds between the two strands
break, separating the double strands into
single-stranded molecules.
PCR – denaturation
The solution is cooled down (to about
55° C).
Primers anneal to their complementary
borders.
PCR - priming
The solution is heated again
(to about 72° C).
Polymerase will extend the
primers, using nucleotides
available in the solution.
Two complete strands of the
target DNA molecule are
produced.
PCR - extension
$2^n$ copies after $n$ steps
PCR – copying
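The doubling per cycle can be made concrete with a few lines of arithmetic. This is an idealized sketch that assumes every cycle exactly doubles the number of target molecules, which real reactions only approximate; the function names are illustrative.

```python
def pcr_copies(initial_molecules: int, cycles: int) -> int:
    """Idealized copy count after the given number of PCR cycles."""
    copies = initial_molecules
    for _ in range(cycles):
        copies *= 2   # denaturation, priming, extension: each duplex yields two
    return copies


def cycles_needed(initial_molecules: int, target: int) -> int:
    """Smallest number of idealized cycles that reaches at least `target` copies."""
    if initial_molecules <= 0:
        raise ValueError("need at least one starting molecule")
    cycles, copies = 0, initial_molecules
    while copies < target:
        copies *= 2
        cycles += 1
    return cycles
```

Under this idealization a single molecule passes one million copies after 20 cycles.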
Measuring the Length of DNA Molecules
DNA molecules are negatively charged.
Placed in an electric field, they will move towards
the positive electrode.
The negative charge is proportional to the length
of the DNA molecule.
The force needed to move the molecule is
proportional to its length.
A gel makes the molecules move at different speeds.
DNA molecules are invisible, and must be marked
(e.g. stained with ethidium bromide or radioactively labelled)
Gel electrophoresis
Gel electrophoresis
Reading the exact sequence of nucleotides
comprising a given DNA molecule
based on the polymerase action of extending a primed single stranded template
nucleotide analogues, chemically modified, e.g. replace the 3´-hydroxyl group (3´-OH) by a 3´-hydrogen atom (3´-H)
dideoxynucleotides: ddA, ddT, ddC, ddG
Sanger method, dideoxy enzymatic method
Sequencing
Objective: We want to sequence a single stranded molecule α.
Preparation: We extend α at the 3´ end by a short (20 bp)
sequence γ, which will act as the W-C complement
for the primer compl(γ).
Usually, the primer is labelled (radioactively, or marked fluorescently).
This results in a molecule β´ = 3´-γα.
Sequencing
5' ATTAGACGTCCGTGCAATGC 3'
3'ACGTTACG 5'
Sequencing
4 tubes are prepared: Tube A, Tube T, Tube C, Tube G
Each of them contains:
β´ molecules
primers, compl(γ)
polymerase
nucleotides A, T, C, and G
Tube A contains a limited amount of ddA.
Tube T contains a limited amount of ddT.
Tube C contains a limited amount of ddC.
Tube G contains a limited amount of ddG.
Sequencing
Structure of ddTTP
Termination with ddTTP
Reaction in Tube A
The polymerase enzyme extends the primer of β´, using the nucleotides present in Tube A: ddA, A, T, C, G.
Using only A, T, C, G: β´ is extended to the full duplex.
Using ddA rather than A: complementing will end at the position of the ddA nucleotide.
Sequencing
Sequencing
Sequencing - stopping
Tube C: template GCCTGCAGATTA; fragments C, CGGAC, CGGACGTC
Tube G: template GCCTGCAGATTA; fragments CG, CGG, CGGACG
Sequencing -results
Tube A: template GCCTGCAGATTA; fragments CGGA, CGGACGTCTA, CGGACGTCTAA
Tube T: template GCCTGCAGATTA; fragments CGGACGT, CGGACGTCT, CGGACGTCTAAT
Sequencing – reading the results
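How the four tubes yield the sequence can be reproduced with a small simulation. The sketch below makes simplifying assumptions: the primer region is left out, the template is read in the direction in which the polymerase copies it, and every possible termination product is assumed to occur; the function names are illustrative.

```python
from typing import Dict, List

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def tube_fragments(template: str, base: str) -> List[str]:
    """Extension products in the tube holding dd<base>: the new strand cut after each <base>."""
    new_strand = "".join(COMPLEMENT[b] for b in template)
    return [new_strand[:i + 1] for i, b in enumerate(new_strand) if b == base]


def read_sequence(tubes: Dict[str, List[str]]) -> str:
    """Order all fragments by length (as on a gel) and read off their terminating bases."""
    by_length = {len(f): base for base, frags in tubes.items() for f in frags}
    return "".join(by_length[n] for n in sorted(by_length))


TEMPLATE = "GCCTGCAGATTA"
TUBES = {base: tube_fragments(TEMPLATE, base) for base in "ACGT"}
```

For the template GCCTGCAGATTA the simulated tubes contain the same fragments as listed in the results, and reading the fragment lengths in increasing order gives the synthesized strand CGGACGTCTAAT, the complement of the template.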