some simple basics

Some simple basics. 1Introduction 2Theoretical background Biochemistry/molecular biology 3Theoretical background computer science 4History of the field


1 Introduction

2 Theoretical background Biochemistry/molecular biology

3 Theoretical background computer science

4 History of the field

5 Splicing systems

6 P systems

7 Hairpins

8 Detection techniques

9 Micro technology introduction

10 Microchips and fluidics

11 Self assembly

12 Regulatory networks

13 Molecular motors

14 DNA nanowires

15 Protein computers

16 DNA computing - summary

17 Presentation of essay and discussion

Course outline


Some basics

The contents of the following presentation are based on work discussed in Chapter 2 of

DNA Computing by G. Păun, G. Rozenberg, and A. Salomaa

Book

We have seen how DNA can be used to solve

various optimization problems.

Leonard Adleman was able to use encoded DNA

to solve the Hamiltonian Path problem for a

7-node graph with a single solution.

The drawbacks to using DNA as a viable

computational device mainly deal with the

amount of time required to actually analyze

and determine the solution from a test tube

of DNA.

Adleman’s experiment

Further considerations

For his experiment, Adleman used words of

20 nucleotides each (20-mer oligonucleotides) to

encode the vertices and edges of the graph.

Because DNA has a 4-base alphabet,

this allows for \(4^{20}\) different

words.

However, longer oligonucleotides

would be required for larger graphs.
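As a quick check of the arithmetic on this slide, the number of distinct words of a given length over the four-letter DNA alphabet can be computed directly. This is only a sketch; the names `ALPHABET`, `WORD_LENGTH` and `word_count` are illustrative and not part of the original material.

```python
# Number of distinct words of a given length over the DNA alphabet {A, C, G, T}.
ALPHABET = "ACGT"
WORD_LENGTH = 20  # length of each encoding word, as stated on the slide


def word_count(length, alphabet=ALPHABET):
    """Return how many different words of `length` letters the alphabet allows."""
    return len(alphabet) ** length


print(word_count(WORD_LENGTH))  # 1099511627776, i.e. 4**20 = 2**40
```

Each extra base multiplies the number of available words by four, which is why longer words leave room for larger graphs.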


Given the nature of DNA, we can easily

determine a set of rules to operate on DNA.

Defining a Rule Set allows for programming

the DNA much like programming a computer.

The rule set assumes the following: the DNA exists in a test tube, and the DNA is in single-stranded form.

Defining a rule set


Merge simply merges two test tubes of

DNA to form a single test tube.

Given test tubes N1 and N2 we can merge

the two to form a single test tube, N,

such that N consists of N1 ∪ N2.

Formal Definition: merge(N1, N2) = N

Merge


Amplify takes a test tube of DNA and

duplicates it.

Given test tube N1 we duplicate it to

form test tube N, which is identical

to N1.

Formal Definition: N = duplicate(N1)

Amplify


Detect looks at a test tube of DNA and

returns true if it has at least a single

strand of DNA in it, false otherwise.

Given test tube N, return TRUE if it

contains at least a single strand of DNA,

else return FALSE.

Formal Definition: detect(N)

Detect


Separate simply separates the contents of a test

tube of DNA based on some subsequence of bases.

Given a test tube N and a word w over the

alphabet {A, C, G, T}, produce two tubes +(N, w)

and –(N, w), where +(N, w) contains all strands

in N that contain the word w and –(N, w)

contains all strands in N that do not contain

the word w.

Formal Definition: N ← +(N, w) or N ← –(N, w)

Separate / Extract


Length-Separate takes a test tube and

separates it based on the length of the

sequences

Given a test tube N and an integer n we

produce a test tube that contains all DNA

strands with length less than or equal to n.

Formal Definition: N ← (N, ≤ n)

Length - separate


Position-Separate takes a test tube and

separates the contents of a test tube of DNA

based on some beginning or ending sequence.

Given a test tube N1 and a word w, produce the

tube N consisting of all strands in N1 that

begin (B) or end (E) with the word w.

Formal Definition: N ← B(N1, w)

N ← E(N1, w)

Position - Separate
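The operations defined so far can be mimicked in software. The following is a minimal sketch, assuming a test tube is modelled as a multiset of strings over {A, C, G, T}; the Python function names are illustrative choices, and real laboratory operations are far less exact than these idealised versions.

```python
from collections import Counter

# A test tube is modelled as a multiset (Counter) of single-stranded DNA strings.


def merge(n1, n2):
    """merge(N1, N2): pour two tubes together (multiset union)."""
    return n1 + n2


def amplify(n1):
    """Amplify: return a tube with the same contents as n1 (idealised copy)."""
    return Counter(n1)


def detect(n):
    """detect(N): True if the tube holds at least one strand."""
    return sum(n.values()) > 0


def separate(n, w):
    """Return the pair (+(N, w), -(N, w)): strands with and without the word w."""
    plus = Counter({s: c for s, c in n.items() if w in s})
    minus = Counter({s: c for s, c in n.items() if w not in s})
    return plus, minus


def length_separate(n, max_len):
    """(N, <= n): keep strands of length at most max_len."""
    return Counter({s: c for s, c in n.items() if len(s) <= max_len})


def begins_with(n, w):
    """B(N, w): keep strands that begin with w."""
    return Counter({s: c for s, c in n.items() if s.startswith(w)})


def ends_with(n, w):
    """E(N, w): keep strands that end with w."""
    return Counter({s: c for s, c in n.items() if s.endswith(w)})


tube = Counter(["ACGT", "AGCT", "TTAG"])
plus, minus = separate(tube, "AG")
print(sorted(plus), sorted(minus))  # ['AGCT', 'TTAG'] ['ACGT']
```

Using a multiset rather than a set keeps track of how many copies of each strand are present, which is closer to what a real tube contains.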


From the given rules, we can now manipulate our

strands of DNA to get a desired result.

Here is an example DNA Program that looks for

DNA strands that contain the subsequence AG and

the subsequence CT:

1 input(N)

2 N ← +(N, AG)

3 N ← +(N, CT)

4 detect(N)

A simple example


1. input(N)

Input a test tube N containing single stranded sequences

of DNA

2. N ← +(N, AG)

Extract all strands that contain the AG subsequence.

3. N ← +(N, CT)

Extract all strands that contain the CT subsequence. Note

that this is applied to the test tube that already holds only

strands containing AG, so the final result is a

test tube which contains all strands with both the

subsequences AG and CT.

4. detect(N)

Returns TRUE if the test tube has at least one strand of

DNA in it, else returns FALSE.

Explanation
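The four steps explained above can be sketched in a few lines. This is an illustration only: a tube is modelled as a plain list of strings, the helper names are hypothetical, and both extraction steps keep the strands that contain the word, which is the behaviour the explanation describes.

```python
def separate(tube, w):
    """Return (+(N, w), -(N, w)): strands that contain w and strands that do not."""
    plus = [s for s in tube if w in s]
    minus = [s for s in tube if w not in s]
    return plus, minus


def detect(tube):
    """True if at least one strand is left in the tube."""
    return len(tube) > 0


def contains_ag_and_ct(tube):
    """Run the four-step program: input, keep AG, keep CT, detect."""
    tube, _ = separate(tube, "AG")  # step 2: keep strands containing AG
    tube, _ = separate(tube, "CT")  # step 3: of those, keep strands containing CT
    return detect(tube)             # step 4


print(contains_ag_and_ct(["AAGCT", "AGAAA", "CCTTT"]))  # True
print(contains_ag_and_ct(["AGAAA", "CCTTT"]))           # False
```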


Now that we have some simple rules at our

disposal we can easily create a simple

program to solve the Hamiltonian Path

problem for a simple 7-node graph as

outlined by Adleman.

Back to Adleman’s experiment

1 input(N)

2 N ← B(N, s0)

3 N ← E(N, s6)

4 N ← (N, ≤ 140)

5 for i = 1 to 5 do

begin

N ← +(N, si)

end

6 detect(N)

The program


1 Input(N)

Input a test tube N that contains all of the

valid vertices and edges encoded in the graph.

2 N ← B(N, s0)

Separate all sequences that begin with the

starting node.

3 N ← E(N, s6)

Further separate all sequences that end with

the ending node.

Explanation

4 N ← (N, ≤ 140)

Further isolate all strands that have a length of

140 nucleotides or less (as there are 7 nodes, each

encoded by a 20-nucleotide word).

5 for i = 1 to 5 do begin N ← +(N, si) end

Now we keep only the sequences that contain every one of

the remaining nodes s1 to s5, thus giving us our solution(s),

if any.

6 detect(N)

See if we actually have a solution within our test

tube.

Explanation
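The filtering program can be simulated on an abstract level. In this sketch each strand is a tuple of vertex numbers instead of a DNA sequence, so the length limit of 140 nucleotides becomes a limit of 7 vertices. The graph `EDGES` is a made-up example, not Adleman's actual graph, and `all_walks` merely stands in for the random ligation that fills the input tube.

```python
def all_walks(edges, max_vertices):
    """Enumerate every walk with at most max_vertices vertices (stand-in for ligation)."""
    walks = [(v,) for v in sorted({u for e in edges for u in e})]
    tube = list(walks)
    for _ in range(max_vertices - 1):
        walks = [w + (b,) for w in walks for (a, b) in edges if a == w[-1]]
        tube.extend(walks)
    return tube


def hamiltonian_paths(edges, n):
    """Filter a tube of walks down to Hamiltonian paths from vertex 0 to vertex n-1."""
    tube = all_walks(edges, n + 1)               # input(N), includes over-long walks
    tube = [w for w in tube if w[0] == 0]        # N <- B(N, s0)
    tube = [w for w in tube if w[-1] == n - 1]   # N <- E(N, s6)
    tube = [w for w in tube if len(w) <= n]      # N <- (N, <= 140): 7 vertices x 20 nt
    for i in range(1, n - 1):                    # for i = 1 to 5
        tube = [w for w in tube if i in w]       # N <- +(N, si)
    return tube


# Hypothetical directed 7-vertex graph with exactly one Hamiltonian path from 0 to 6.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (0, 3), (3, 1), (2, 5)]
paths = hamiltonian_paths(EDGES, 7)
print(bool(paths), paths)  # True [(0, 1, 2, 3, 4, 5, 6)]
```

A walk of at most 7 vertices that starts at 0, ends at 6 and contains each of 1 to 5 must visit every vertex exactly once, which is why the final detect step answers the Hamiltonian Path question.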


In most computational models, we define a

memory, which allows us to store information

for quick retrieval.

DNA can be encoded to serve as memory

through the use of its complementary

properties.

We can directly relate DNA memory to

conventional bit memory in computers through

the use of the so-called sticker model.

Adding Memory


We can define a single strand of DNA as

being a memory strand.

This memory strand serves as the template

into which we can encode bits.

We then use complementary stickers to

attach to this template memory strand and

encode our bits.

The sticker model


How it works

Consider the following strand of DNA:

CCCC GGGG AAAA TTTT

This strand is divided into 4 distinct

sub-strands.

Each of these sub-strands has exactly

one complementary sub-strand as follows:

GGGG CCCC TTTT AAAA


How it works

As a double helix, the DNA forms the

following complex:

CCCC GGGG AAAA TTTT

GGGG CCCC TTTT AAAA

If we were to take each sub-strand as

a bit position, we could then encode

binary bits into our memory strand.


Each time a sub-sequence sticker has

attached to a sub-sequence on the memory

template, we say that that bit slot is on.

If there is no sub-sequence sticker attached

to a sub-sequence on the memory template,

then we say that the bit slot is off.

How it works


Some memory examples

For example, if we wanted to encode the

bit sequence 1001, we would have:

CCCC GGGG AAAA TTTT

GGGG           AAAA

As we can see, this is a direct coding

of 1001 into the memory template.
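The encoding of 1001 can be reproduced with a short sketch. It assumes the four-region memory strand from the slides and, like the slides, ignores strand orientation by taking the base-wise complement of each region. The names `MEMORY`, `sticker_for`, `encode` and `decode` are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# The four-region memory strand used on the slides.
MEMORY = ["CCCC", "GGGG", "AAAA", "TTTT"]


def sticker_for(region):
    """Base-wise complement of one region (orientation ignored, as on the slides)."""
    return "".join(COMPLEMENT[base] for base in region)


def encode(memory_regions, bits):
    """Attach a sticker to every region whose bit is '1'; None means no sticker."""
    return [sticker_for(region) if bit == "1" else None
            for region, bit in zip(memory_regions, bits)]


def decode(stickers):
    """Read the bit string back: a sticker is 1, a free region is 0."""
    return "".join("1" if sticker else "0" for sticker in stickers)


complex_1001 = encode(MEMORY, "1001")
print(complex_1001)          # ['GGGG', None, None, 'AAAA']
print(decode(complex_1001))  # 1001
```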


This is a rather good encoding. However, as we

increase the size of our memory, we have to ensure

that our sub-strands have distinct complements in

order to be able to set and clear specific bits in

our memory.

We have to ensure that the boundaries between

subsequences are also distinct to prevent

complementary stickers from annealing across

borders.

The biological implementation of this is rather

difficult, as annealing long runs of sub-sequences

to a DNA template is very error-prone.

Disadvantages


The clear advantage is that we have a distinct

memory block that encodes bits.

The differentiation between subsequences

denoting individual bits allows a natural

border between encoding sub-strands.

Using one template strand as a memory block

also allows us to use its complement as another

memory block, thus effectively doubling our

capacity to store information.

Advantages

Now that we have a memory structure, we

can begin to migrate our rules to work

on our memory strands.

We can add new rules that allow us to

program more into our system.

So now what?


Separate now deals with memory strands. It takes a

test tube of DNA memory strands and separates it

based on what is turned on or off.

Given a test tube, N, and an integer i, we separate

the tubes into +(N, i) which consists of all memory

strands for which the ith sub-strand is turned on

(i.e. a sticker is attached to the ith position on

the memory strand). The –(N, i) tube contains all

memory strands for which the ith sub-strand is turned

off.

Formal Definition: Separate +(N, i) and –(N, i)

Separate

Set sets a bit position

(i.e. turns it on if it is off) on a

memory strand of DNA.

Given a test tube, N, and an integer i,

where 1 ≤ i ≤ k (k is the length of the DNA

memory strand), we set the ith position to

on.

Formal Definition: set(N, i)

Set

Clear clears a bit position

(i.e. turns it off if it is

on) on a memory strand of DNA.

Given a test tube, N, and an integer i,

where 1 ≤ i ≤ k (k is the length of the

DNA memory strand), we clear the ith

position to off.

Formal Definition: clear(N, i)

Clear


Read reads a test tube, which has an

isolated memory strand and determines

what the encoding of that strand is.

Read also reports when there is no

memory strand in the test tube.

Formal Definition: read(N)

Read
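The four sticker-model operations (separate, set, clear, read) can be sketched as follows. The model is an assumption made for illustration: a memory complex is a tuple of 0/1 bits, a tube is a list of such tuples, and positions are numbered from 1 as on the slides.

```python
def separate(tube, i):
    """Return (+(N, i), -(N, i)) according to the state of bit position i (1-based)."""
    on = [m for m in tube if m[i - 1] == 1]
    off = [m for m in tube if m[i - 1] == 0]
    return on, off


def set_bit(tube, i):
    """set(N, i): turn position i on in every complex of the tube."""
    return [m[:i - 1] + (1,) + m[i:] for m in tube]


def clear_bit(tube, i):
    """clear(N, i): turn position i off in every complex of the tube."""
    return [m[:i - 1] + (0,) + m[i:] for m in tube]


def read(tube):
    """read(N): report the encoding of one complex, or None if the tube is empty."""
    if not tube:
        return None
    return "".join(str(bit) for bit in tube[0])


tube = [(1, 0, 0, 1), (0, 1, 0, 0)]
on, off = separate(tube, 1)
print(read(set_bit(on, 2)))     # 1101
print(read(clear_bit(off, 2)))  # 0000
print(read([]))                 # None
```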


To effectively use the Sticker Model, we define a library for input purposes.

The library consists of a set of strands of DNA.

Each strand of DNA in this library is divided into two sections, an initial data input section and a storage/output section.

Defining a library


The formal notation of a library is as follows:

(k, l) library (where k and l are integers, l ≤ k)

k refers to the size of the memory strand

l refers to the number of positions allowed for

input data.

The initial setup of the memory strand is such

that the first l positions are set with input

data and the last k – l positions are clear.

Library setup


Consider the following encoding for a library:

(3, 2) library.

From this encoding, we see that we have a

memory strand that is of size 3, and has 2

positions allowed for input data.

Thus the first 2 positions are used for input

data, and the final position is used for

storage/output.

A simple example


A quick visualisation

Here is a visualisation of this library (memory strand on top, attached stickers beneath):

CCCC GGGG AAAA

(no stickers) Encoding: 000

CCCC GGGG AAAA

GGGG CCCC Encoding: 110

CCCC GGGG AAAA

     CCCC Encoding: 010

CCCC GGGG AAAA

GGGG Encoding: 100

From this visualisation we see that we can

achieve an encoding of \(2^l\) different kinds

of memory complexes.

We can formally define a memory complex as

follows: \(w0^{k-l}\), where w is an arbitrary

binary sequence of length l, and \(0^{k-l}\)

represents the off state of the following

k-l positions on the DNA memory strand.

Memory considerations
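A (k, l) library can be enumerated directly, which makes the count of memory complexes concrete. This is a sketch under the assumption that a complex is represented as a tuple of bits; the function name `library` is illustrative.

```python
from itertools import product


def library(k, l):
    """Return all memory complexes w 0^(k-l): l free input bits, then k-l cleared bits."""
    if not 0 <= l <= k:
        raise ValueError("a (k, l) library requires 0 <= l <= k")
    return [bits + (0,) * (k - l) for bits in product((0, 1), repeat=l)]


for memory_complex in library(3, 2):
    print(memory_complex)
# (0, 0, 0)
# (0, 1, 0)
# (1, 0, 0)
# (1, 1, 0)
print(len(library(3, 2)))  # 4, i.e. 2**2
```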


Consider the following NP-complete problem:

Minimal Set Cover

Given a finite set S = {1, 2, …, p} and a

finite collection of subsets {C1, …, Cq}

of S, we wish to find the smallest subset I

of {1, 2, …, q} such that every point

in S lies in at least one Ci with i in I.

We can solve this problem by using the brute

force method of going through every single

combination of the subsets {C1, …, Cq}.

We will use our rules to implement the same

strategy using our DNA system.

An interesting example

We will use a library with the following

attributes: (p+q, q) library.

This means that our memory strand has p+q

positions to model the p points we want to cover

and the q subsets that we have in the problem.

The first q positions are our data input section,

one position for each of the q subsets in the problem.

The last p positions are

our storage area.

Using DNA

The algorithm is rather simple. We encode every selection of

the subsets that we have in our problem into the

first q positions of a DNA memory strand. Each such strand

represents a potential solution to our problem.

Each of the q positions represents a

single subset that is in our problem.

A position that is turned on represents

inclusion of that set in the solution.

We simply go through each of the possibilities

for the q subsets in our problem.

Using DNA

The p positions represent the points that we have to

cover, one position for each point.

The algorithm simply takes each subset selected in q and checks which

points in p it covers.

Then it sets the position of each such point in p to on.

Once all of the positions in p are turned on, we know that

we have a selection of subset covers that covers all

points.

Then all we have to do is look at all solutions and

determine which one contains the smallest number of subset

covers.

Using DNA

So far we’ve mapped each subset cover to a

position and each point to a position.

However, each subset cover has a set of

points which it covers.

How do we encode this into our algorithm?

We do this by introducing a program specific

rule, known as cardinality.

But how is it done?

The cardinality of a set X is the

number of elements in X.

Formally, we write cardinality as: card(X)

From this we can determine what elements are

in a particular subset cover in terms of its

position relative to the points in p.

Therefore, the elements in a subset Ci, where

1 ≤ i ≤ q, are denoted by cij, where 1 ≤ j ≤

card(Ci).

Cardinality


Now that we can easily determine the elements

within each subset cover, we can now proceed

with the algorithm.

We check each position in q and if it is turned

on, we simply see what points this subset

covers.

For each point that it covers, we set the

corresponding position in p to on.

Once all positions in p have been turned on,

then we have a solution to the problem.

Checking each point

for i = 1 to q

  Separate +(N0, i) and –(N0, i)

  for j = 1 to card(Ci)

    Set(+(N0, i), q + cij)

  N0 ← merge(+(N0, i), –(N0, i))

for i = q + 1 to q + p

  N0 ← +(N0, i)

The program

Loop through all of the positions from 1 to q.

for i = 1 to q

Now, separate the strands in which position i is on from those in which it is off.

Separate +(N0, i) and –(N0, i)

Loop through all of the elements that the subset

covers.

for j = 1 to card(Ci)

Set the position in p that corresponds to that element.

Set(+(N0, i), q + cij)

Now, merge both of the tubes back together.

N0 ← merge(+(N0, i), –(N0, i))

Unraveling it all

Now we simply loop through all of the positions

in p.

for i = q + 1 to q + p

Keep only the strands that have position i on.

N0 ← +(N0, i)

This last section of the code ensures that we

isolate all of the possible solutions by

selecting all of the strands where all positions

in p are turned on (i.e. covered by the selected

subsets).

Unraveling it all
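The covering stage walked through above can be simulated in software. The sketch below assumes a memory complex is a list of p+q bits, with the first q bits selecting subsets and the last p bits recording covered points; the instance `SUBSETS` is a made-up example and the function name is illustrative.

```python
from itertools import product


def covering_complexes(p, subsets):
    """Return every complex of a (p+q, q) library whose selected subsets cover 1..p."""
    q = len(subsets)
    # Initial tube N0: all 2**q selections, with the p storage positions cleared.
    tube = [list(bits) + [0] * p for bits in product((0, 1), repeat=q)]
    for i in range(1, q + 1):
        on = [m for m in tube if m[i - 1] == 1]    # +(N0, i)
        off = [m for m in tube if m[i - 1] == 0]   # -(N0, i)
        for point in sorted(subsets[i - 1]):       # for j = 1 to card(Ci)
            for m in on:
                m[q + point - 1] = 1               # Set(+(N0, i), q + cij)
        tube = on + off                            # N0 <- merge(+(N0, i), -(N0, i))
    for i in range(q + 1, q + p + 1):
        tube = [m for m in tube if m[i - 1] == 1]  # N0 <- +(N0, i)
    return [tuple(m) for m in tube]


# Hypothetical instance: S = {1, 2, 3, 4} with four candidate subsets.
SUBSETS = [{1, 2}, {3, 4}, {2, 3}, {1, 2, 3}]
covers = covering_complexes(4, SUBSETS)
print(len(covers))  # 6
```

In this instance only the second subset contains point 4, so every surviving complex has its second position turned on.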


So now that we have all of the potential solutions

in one test tube, we still have to determine the

final solution.

Note that the Minimal Set Cover problem finds the

smallest number of subsets that covers the entire

set.

In our test tube, we have all of the solutions that

cover the set, and one of these will have the

smallest number of subsets.

We therefore have to write a program to determine

this.

Output of the solution

for i = 0 to q – 1

  for j = i down to 0

    separate +(N_j, i + 1) and –(N_j, i + 1)

    N_{j+1} ← merge(+(N_j, i + 1), N_{j+1})

    N_j ← –(N_j, i + 1)

read(N_1);

else if it was empty, read(N_2);

else if it was empty, read(N_3);

Finding the solution

The program separates the memory strands into test tubes

based on the number of positions in q that are turned

on.

Thus for example, all memory strands with 1

position in q turned on are separated into one

test tube, all memory strands with 2 positions

in q turned on are separated into one test

tube, etc.

Once this is done, we simply read each tube

starting with the smallest number of subsets

turned on to find a solution to our problem (of

which there may be many).

Finding the solution
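The sorting step can be sketched as below. It assumes memory complexes are tuples of bits whose first q entries select subsets, keeps the tubes N_0 to N_q in a Python list, and reads the tubes in increasing order starting from N_1. The function names are illustrative.

```python
def sort_by_selected(tube, q):
    """Distribute complexes into tubes N_0..N_q by how many of the first q bits are on."""
    tubes = [list(tube)] + [[] for _ in range(q)]
    for i in range(q):                  # for i = 0 to q - 1
        for j in range(i, -1, -1):      # for j = i down to 0
            on = [m for m in tubes[j] if m[i] == 1]   # +(N_j, i + 1)
            off = [m for m in tubes[j] if m[i] == 0]  # -(N_j, i + 1)
            tubes[j + 1] = on + tubes[j + 1]          # N_{j+1} <- merge(...)
            tubes[j] = off                            # N_j <- -(N_j, i + 1)
    return tubes


def read_smallest(tubes):
    """Read N_1, then N_2, and so on; return (subset count, complex) or None."""
    for count in range(1, len(tubes)):
        if tubes[count]:
            return count, tubes[count][0]
    return None


tubes = sort_by_selected([(1, 1, 0), (0, 1, 0), (1, 1, 1)], 3)
print(tubes)                 # [[], [(0, 1, 0)], [(1, 1, 0)], [(1, 1, 1)]]
print(read_smallest(tubes))  # (1, (0, 1, 0))
```

Processing j from i down to 0 matters: a strand moved into N_{j+1} during round i is not examined again in the same round, so each strand advances by at most one tube per bit position.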

The operations outlined above can be used to program

solutions to more practical problems.

One such area is cryptography, where it is

postulated that a DNA system such as the one

outlined is capable of breaking DES (the Data

Encryption Standard), which is used in many cryptosystems.

Using a (579, 56) library, with each position encoded by a

20-nucleotide sub-strand, and an overall memory strand

of 11,580 nucleotides, it is estimated that one

could break DES with about 4 months of

laboratory work.

Final considerations
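The figures quoted for the DES example are consistent with each other, as a quick calculation shows. The constant names below are illustrative; the interpretation that the 56 input positions correspond to the 56-bit DES key is an assumption based on the key length of DES.

```python
BIT_POSITIONS = 579   # k in the (579, 56) library
KEY_BITS = 56         # l: input positions, assumed to hold the 56-bit DES key
REGION_LENGTH = 20    # nucleotides per bit position

strand_length = BIT_POSITIONS * REGION_LENGTH
key_count = 2 ** KEY_BITS

print(strand_length)  # 11580
print(key_count)      # 72057594037927936
```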


More serious


There is a solid theoretical foundation for splicing

as an operation on formal languages.

In biochemical terms, procedures based on splicing

may have some advantages, since the DNA is used

mostly in its double stranded form, and thus many

problems of unintentional annealing may be avoided.

The basic model is a single tube, containing an

initial population of dsDNA, several restriction

enzymes, and a ligase. Mathematically this is

represented as a set of strings (the initial

language), a set of cutting operations, and a set of

pasting operations.

It has been proved that such systems can have the computational power of a universal Turing machine.

Tom Head – splicing systems

These are the techniques that are common in the microbiologist's lab and can be used to program a molecular computer. DNA can be:

synthesize: desired strands can be created

separate: strands can be sorted and separated by length

merge: pour two test tubes of DNA into one to perform union

extract: extract those strands containing a given pattern

melt/anneal: break or bond two ssDNA molecules with complementary sequences

amplify: use PCR to make copies of DNA strands

cut: cut DNA with restriction enzymes

rejoin: rejoin DNA strands with 'sticky ends'

detect: confirm presence or absence of DNA

Tom Head – splicing systems

The initial set (finite or infinite) consists of

double-stranded DNA molecules.

Specific classes of enzymatic activities are

considered: those of restriction enzymes.

Recombinant behavior is modeled, and the associated sets are

analyzed, by a new formalism called splicing systems.

Attention is focused on the effect of sets of restriction

enzymes and a ligase that allow DNA molecules to be

cleaved and reassociated to produce further

molecules.

Tom Head – splicing systems


Circular DNA and Splicing Systems

DNA molecules exist not only in

linear forms but also in circular

forms.

 

Splicing systems

[Diagram: splicing, branching into linear and circular]

Splicing systems


Linear splicing


Two molecules, with cut sites G|A and AT|C:

…ATTG|ACCC…

…CAAT|CAGG…

After cutting, the fragments …ATTG, ACCC…, …CAAT and CAGG… are recombined by a ligase:

…ATTGCAGG…

…CAATACCC…

Splicing in nature

\(V\): alphabet

Splicing rule: \(r = u_1 \mid u_2 \,\$\, u_3 \mid u_4\), with \(u_1, u_2, u_3, u_4 \in V^*\)

For \(x, y, z, w \in V^*\):

\[ (x, y) \vdash_r (z, w) \]

if and only if

\[ x = x_1 u_1 u_2 x_2, \quad y = y_1 u_3 u_4 y_2, \quad x_1, x_2, y_1, y_2 \in V^* \]

\[ z = x_1 u_1 u_4 y_2, \quad w = y_1 u_3 u_2 x_2 \]

Splicing in DNA computing
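The splicing operation defined above can be sketched on ordinary strings. A rule is passed as a tuple (u1, u2, u3, u4); the function names are illustrative. The usage example reuses the two molecules and the cut sites G|A and AT|C from the earlier "Splicing in nature" slide.

```python
def occurrences(s, site):
    """Return every index at which `site` occurs in `s` (overlaps included)."""
    found, start = [], 0
    while True:
        i = s.find(site, start)
        if i == -1:
            return found
        found.append(i)
        start = i + 1


def splice(x, y, rule):
    """Apply rule (u1, u2, u3, u4) to (x, y) at every pair of sites; return all (z, w)."""
    u1, u2, u3, u4 = rule
    results = set()
    for i in occurrences(x, u1 + u2):
        for j in occurrences(y, u3 + u4):
            x1, x2 = x[:i], x[i + len(u1) + len(u2):]
            y1, y2 = y[:j], y[j + len(u3) + len(u4):]
            z = x1 + u1 + u4 + y2   # z = x1 u1 u4 y2
            w = y1 + u3 + u2 + x2   # w = y1 u3 u2 x2
            results.add((z, w))
    return results


print(splice("ATTGACCC", "CAATCAGG", ("G", "A", "AT", "C")))
# {('ATTGCAGG', 'CAATACCC')}
```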

\(\gamma = (V, T, A, R)\)

\(V\): alphabet

\(T \subseteq V\): terminal alphabet

\(A \subseteq V^*\): set of strings

\(R\): splicing rules

\[ L(\gamma) = \sigma^*(A) \cap T^* \]

If \(A, R \in \mathrm{FIN}\) then \(L(\gamma) \in \mathrm{REG}\).

… with permitting context: each rule \(u_1 \mid u_2 \,\$\, u_3 \mid u_4\) in \(R\) carries contexts \(C_1, C_2 \subseteq V^*\)

If \(A, R \in \mathrm{FIN}\) then \(L(\gamma) \in \mathrm{RE}\).

Extended H-system

Rotation

[Figure: rotation carried out in four splicing steps (1 to 4), with the permitting contexts {s1}, {h1, s1}, {hA, s1} and {s1, tA}]

Rule: \(r = u_1 \mid u_2 \,\$\, u_3 \mid u_4\)

\[ (x u_1 u_2 y,\; w u_3 u_4 z) \vdash_r (x u_1 u_4 z,\; w u_3 u_2 y) \]

The operation proceeds in three stages: pattern recognition of the sites \(u_1 u_2\) and \(u_3 u_4\), a cut between \(u_1\) and \(u_2\) and between \(u_3\) and \(u_4\), and a paste that forms \(x u_1 u_4 z\) and \(w u_3 u_2 y\).

Păun’s linear splicing operation (1996)


Circular splicing


restriction enzyme 1 restriction enzyme 2

ligase enzymes

Circular splicing

Conjugacy relation on \(A^*\):

\[ w, w' \in A^*, \quad w \sim w' \iff w = xy,\; w' = yx \]

Example: abaa, baaa, aaab, aaba are conjugates

\(A^{\circ} = A^* / \sim\) = set of all circular words

\({}^{\circ}w = [w]_{\sim}, \quad w \in A^*\)

Circular languages
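Conjugacy is easy to explore computationally: the conjugates of a word are exactly its rotations. The sketch below represents a circular word as the frozen set of its rotations; this representation and the function names are assumptions made for illustration.

```python
def conjugates(w):
    """Return all rotations of w, i.e. the conjugacy class of w."""
    return {w[i:] + w[:i] for i in range(len(w))} or {w}


def are_conjugate(u, v):
    """u ~ v if and only if they have equal length and v occurs in uu."""
    return len(u) == len(v) and v in u + u


def circular_word(w):
    """Represent the circular word of w as its (hashable) conjugacy class."""
    return frozenset(conjugates(w))


print(sorted(conjugates("abaa")))     # ['aaab', 'aaba', 'abaa', 'baaa']
print(are_conjugate("abaa", "aaba"))  # True
print(are_conjugate("abaa", "abab"))  # False
```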

Circular language: \(C \subseteq A^{\circ}\), a set of equivalence classes

\[ L \subseteq A^* \;\mapsto\; \mathrm{Cir}(L) = \{ {}^{\circ}w \mid w \in L \} \quad \text{(circularization of } L\text{)} \]

\[ C \subseteq A^{\circ} \;\mapsto\; \mathrm{Lin}(C) = \{ w \in A^* \mid {}^{\circ}w \in C \} \quad \text{(full linearization of } C\text{)} \]

Any \(L \subseteq A^*\) with \(\mathrm{Cir}(L) = C\) is a linearization of \(C\).

Circular languages

Definition

\[ FA^{\circ} = \{ C \subseteq A^{\circ} \mid \exists L \subseteq A^*,\ \mathrm{Cir}(L) = C,\ L \in FA \}, \quad FA \text{ a family in the Chomsky hierarchy} \]

Theorem [Head, Păun, Pixton]

\[ C \in \mathrm{Reg}^{\circ} \iff \mathrm{Lin}(C) \in \mathrm{Reg} \]

Circular languages

Păun’s definition

(\(A\) = finite alphabet, \(I \subseteq A^{\circ}\) initial language)

\(SC_{PA} = (A, I, R)\), with rules \(R \subseteq A^* \mid A^* \,\$\, A^* \mid A^*\)

\[ {}^{\circ}hu_1u_2,\ {}^{\circ}ku_3u_4 \in A^{\circ}, \quad r = u_1 \mid u_2 \,\$\, u_3 \mid u_4 \in R \]

The two circular words are cut into the linear words \(u_2hu_1\) and \(u_4ku_3\), which are pasted into

\[ {}^{\circ}u_2hu_1u_4ku_3 \]

Circular splicing systems
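Păun's circular splicing operation can be sketched on strings by trying every rotation of the two circular words. In this illustration a circular word is represented by its lexicographically smallest rotation, a rule is a tuple (u1, u2, u3, u4), and the empty word stands for the symbol 1; these conventions and the function names are assumptions, not part of the original definition.

```python
def rotations(w):
    """All rotations of w (the linear words that represent the circular word)."""
    return {w[i:] + w[:i] for i in range(len(w))} or {w}


def canonical(w):
    """Pick one fixed representative of the circular word: the smallest rotation."""
    return min(rotations(w))


def circular_splice(x, y, rule):
    """Splice circular words x and y with rule (u1, u2, u3, u4); return the results."""
    u1, u2, u3, u4 = rule
    site_x, site_y = u1 + u2, u3 + u4
    results = set()
    for rx in rotations(x):
        if not rx.endswith(site_x):
            continue
        h = rx[:len(rx) - len(site_x)]      # rx = h u1 u2
        for ry in rotations(y):
            if not ry.endswith(site_y):
                continue
            k = ry[:len(ry) - len(site_y)]  # ry = k u3 u4
            results.add(canonical(u2 + h + u1 + u4 + k + u3))
    return results


print(circular_splice("aa", "aa", ("aa", "", "", "aa")))  # {'aaaa'}
print(circular_splice("ab", "ab", ("a", "b", "b", "a")))  # {'aabb'}
```

Choosing the smallest rotation as representative makes two results comparable with `==` even when they were produced from different rotations.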


Definition

A circular splicing language C(SCPA)

(i.e. a circular language generated

by a splicing system SCPA) is the

smallest circular language

containing I and closed under the

application of the rules in R.

Circular splicing systems

Head’s definition

\(SC_H = (A, I, T)\), with triples \(T \subseteq A^* \times A^* \times A^*\)

\[ {}^{\circ}hpxq,\ {}^{\circ}kuxv \in A^{\circ}, \quad (p, x, q), (u, x, v) \in T \;\vdash\; {}^{\circ}hpxvkuxq \]

SCPI = (A, I, R)

Ao (, ; ), (, ; ) R

oh h

oh , o h

h

Pixton’s definition

R A* A* A* rules

h

Other splicing systems

(A= finite alphabet, I Ao initial language)

Characterize

$FA^{\circ} \cap C(Fin, Fin)$

$C(Reg, Fin)$

where $C(Fin, Fin)$ is the class of circular languages $C = C(SC_{PA})$ generated by $SC_{PA}$ with $I$ and $R$ both finite sets.

Problem

Theorem [Păun96]

$F \in \{Reg^{\circ}, CF^{\circ}, RE^{\circ}\}$, $R$ + add. hyp. (symmetry, reflexivity, self-splicing) $\Rightarrow$ $C(F, Fin) \subseteq F$

Theorem [Pixton95-96]

$R$ $Fin$ + add. hyp. (symmetry, reflexivity) $\Rightarrow$ $C(Reg^{\circ}, Fin) \subseteq Reg^{\circ}$

$F \in \{Reg^{\circ}, CF^{\circ}, RE^{\circ}\}$ $\Rightarrow$ $C(F, Reg) \subseteq F$

Problem

[Diagram: nested classes $CS^{\circ} \supset CF^{\circ} \supset Reg^{\circ}$ with example languages ${}^{\circ}((aa)^*b)$, ${}^{\circ}(aa)^*$ and ${}^{\circ}(a^nb^n)$]

${}^{\circ}(aa)^*$: $I = {}^{\circ}aa \cup {}^{\circ}1$, $R = \{aa \mid 1 \,\$\, 1 \mid aa\}$

${}^{\circ}(a^nb^n)$: $I = {}^{\circ}ab \cup {}^{\circ}1$, $R = \{a \mid b \,\$\, b \mid a\}$

Circular finite splicing languages


Circular automata

J. Kari and L. Kari, Context-free Recombinations, in: Words, Sequences, Languages: Where Computer Science, Biology and Linguistics Meet, C. Martin-Vide, V. Mitrana (Eds.), Kluwer, the Netherlands.

Finite automata for circular languages

Definition

Given a finite automaton $A$, the circular language K-accepted by $A$, $L(A)^{\circ}_K$, is the set of all words ${}^{\circ}w$ such that $A$ has a cycle labeled by $w$.

K-acceptance

The circular/linear language accepted by a finite automaton $A$ is defined as $L(A) \cup L(A)^{\circ}_K$, where $L(A)$ is the linear language accepted by $A$, defined in the usual way.

Definition

A circular/linear language $L \subseteq \Sigma^* \cup \Sigma^{\circ}$ is regular if there is a finite automaton $A$ that accepts the circular and linear parts of $L$, i.e. that accepts $L \cap \Sigma^*$ and $L \cap \Sigma^{\circ}$.

Finite automata for circular languages

The following definition is equivalent to a definition given by Pixton: the circular language accepted by a finite automaton is the set of all words that label a loop containing at least one initial and one final state.

Definition

Given a finite automaton $A$, the circular language accepted by $A$, $L(A)^{\circ}_P$, is the set of all words ${}^{\circ}w$ such that $A$ has a cycle labeled by $w$ that contains at least one final state.

P-acceptance

The circular languages accepted by finite automata under the following definition coincide with the regular circular languages introduced by Head.

Given a finite automaton $A$, the circular language accepted by $A$, $L(A)^{\circ}_H$, is the set of all words ${}^{\circ}w$ such that $w = uv$ and $vu \in L(A)$.

Pixton has shown that if, in addition, we assume that the family of languages is closed under repetition (i.e., $w^n$ is in the language whenever $w$ is), then H-acceptance and P-acceptance are equivalent.

H-acceptance
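The three acceptance modes (K, P and H) can be compared on a small automaton. The sketch below is only an illustration under assumptions: the automaton is nondeterministic and given as a dictionary `delta` mapping `(state, symbol)` to a set of states, words are assumed to be nonempty, and the function names are hypothetical.

```python
def run(delta, start, word):
    """Set of states reachable from `start` by reading `word`."""
    current = {start}
    for symbol in word:
        current = {t for s in current for t in delta.get((s, symbol), ())}
    return current


def rotations(word):
    return [word[i:] + word[:i] for i in range(max(len(word), 1))]


def k_accepts(delta, states, word):
    """K-acceptance: some cycle of the automaton is labeled by the word."""
    return any(q in run(delta, q, word) for q in states)


def p_accepts(delta, finals, word):
    """P-acceptance: some cycle labeled by the word passes through a final state."""
    # A cycle through a final state f is a cycle from f to f for some rotation.
    return any(f in run(delta, f, r) for f in finals for r in rotations(word))


def h_accepts(delta, initial, finals, word):
    """H-acceptance: some rotation vu of w = uv is accepted in the usual linear way."""
    return any(run(delta, initial, r) & finals for r in rotations(word))


# Hypothetical example automaton: 0 -a-> 1 -a-> 0, plus a b-loop on state 2.
delta = {(0, "a"): {1}, (1, "a"): {0}, (2, "b"): {2}}
states, initial, finals = {0, 1, 2}, 0, {0}

for w in ["aa", "a", "b"]:
    print(w, k_accepts(delta, states, w), p_accepts(delta, finals, w),
          h_accepts(delta, initial, finals, w))
```

The b-loop on the non-final state 2 is there to show a word that is K-accepted but neither P-accepted nor H-accepted.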

Advantages of K-acceptance

The same automaton accepts both the linear and circular components of the language

K-acceptance


Counting problem

T. Head, Circular Suggestions for DNA Computing, in: Pattern Formation in Biology, Vision and Dynamics, Eds. A. Carbone, M. Gromov and P. Prusinkiewicz, World Scientific, Singapore, 2000, pp. 325-335.

J. Kari, A Cryptosystem Based on Propositional Logic, in: Machines, Languages and Complexity, 5th International Meeting of Young Computer Scientists, Czechoslovakia, Nov. 14-18, 1988, Eds. J. Dassow and J. Kelemen, LNCS 381, Springer, 1989, pp. 210-219.

Rani Siromoney, Bireswar Das, DNA Algorithm for Breaking a Propositional Logic Based Cryptosystem, Bulletin of the EATCS, Number 79, February 2003, pp. 170-176.

Sources

Introducing the CUT-DELETE-EXPAND-LIGATE (C-D-E-L) model

It combines features of Divide-Delete-Drop (D-D-D) (Leiden) and CUT-EXPAND-LIGATE (C-E-L) (Binghamton) to form the CUT-DELETE-EXPAND-LIGATE (C-D-E-L) model.

This enables us to get an aqueous solution to #3SAT, which is the counting version of 3SAT and is known to be in IP.

#3SAT is defined as follows:

Instance: a propositional formula $F = C_1 \wedge C_2 \wedge \dots \wedge C_m$, where the $C_i$, $i = 1, 2, \dots, m$, are clauses. Each $C_i$ is of the form $(l_{i1} \vee l_{i2} \vee l_{i3})$, where the $l_{ij}$, $j = 1, 2, 3$, are literals over the set of variables $\{x_1, x_2, \dots, x_n\}$.

Question: What is the number of truth assignments that satisfy $F$?

C-D-E-L model
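To make the counting question concrete, the number of satisfying assignments can be computed in software by exhaustive enumeration. This sketch is not the C-D-E-L procedure; it is a brute-force reference under an assumed encoding in which a clause is a tuple of signed integers (`3` for x3, `-3` for its negation), and `count_satisfying` is a hypothetical name.

```python
from itertools import product


def count_satisfying(clauses, n):
    """Count truth assignments of x1..xn that satisfy every clause.

    Assumed encoding: literal k > 0 means x_k, literal -k means NOT x_k.
    """
    count = 0
    for bits in product([False, True], repeat=n):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            count += 1
    return count


# F = (x1 v x2 v x3) AND (NOT x1 v NOT x2 v NOT x3)
print(count_satisfying([(1, 2, 3), (-1, -2, -3)], 3))
```

The enumeration visits all 2^n assignments, so it is only usable for small n; it is meant as a check on what the molecular procedure is supposed to count.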

Standard double-stranded DNA cloning plasmids are commercially available.

A plasmid is a circular molecule of approximately 3 kb. It contains a sub-segment, the MCS (multiple cloning site), of approximately 175 base pairs that can be removed using a pair of restriction enzyme sites that flank the segment.

The MCS contains pairwise disjoint sites at which restriction enzymes act such that each produces a 5’ overhang.

Data register molecule

In C-D-E-L, a segment of the plasmid used is of the form

$\dots c_1 s_1 c_1 \dots c_2 s_2 c_2 \dots c_n s_n c_n \dots$

where the $c_i$ are called sites, such that no other subsequence of the plasmid matches this sequence, and the $s_i$ are called stations, $i = 1, \dots, n$.

In D-D-D, the lengths of the stations are required to be the same.

In C-D-E-L, however, the lengths of the stations are all different, which is fundamental in solving #3SAT.

The bio-molecular operations used in C-D-E-L are similar to the operations in C-E-L.

C-D-E-L model

$x_1, \dots, x_n$: the variables in $F$

$\bar{x}_1, \dots, \bar{x}_n$: their negations

$s_i$: station associated with $x_i$

$\bar{s}_i$: station associated with $\bar{x}_i$

$c_i$: site associated with station $s_i$

$\bar{c}_i$: site associated with station $\bar{s}_i$

$v_i$: length of the station associated with $x_i$, $i = 1, \dots, n$

$v_{n+j}$: length of the station associated with the literal $\bar{x}_j$, $j = 1, \dots, n$

Choose the stations in such a way that the sequence $[v_1, \dots, v_{2n}]$ satisfies the property

$\sum_{i=1}^{k} v_i < v_{k+1}, \quad k = 1, \dots, 2n-1$

i.e. it is a super-increasing (easy) knapsack sequence.

From a sum, the sub-sequence that produced it can be recovered efficiently.

Design
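The recovery step mentioned above is the standard greedy decoding of a super-increasing sequence. The sketch below illustrates it with made-up station lengths; the function names and the example values are assumptions, not lengths used in any actual experiment.

```python
def is_superincreasing(v):
    """Check that every element exceeds the sum of all elements before it."""
    total = 0
    for x in v:
        if x <= total:
            return False
        total += x
    return True


def recover(v, total):
    """Return the indices whose lengths add up to `total` (greedy, largest first)."""
    chosen = []
    for i in range(len(v) - 1, -1, -1):
        if v[i] <= total:
            chosen.append(i)
            total -= v[i]
    if total != 0:
        raise ValueError("total is not a subset sum of the sequence")
    return sorted(chosen)


lengths = [1, 2, 4, 8, 16, 32]      # hypothetical station lengths
print(is_superincreasing(lengths))
print(recover(lengths, 41))          # which stations give total length 41?
```

Because each element is larger than the sum of all smaller ones, the largest element not exceeding the remaining total must belong to the subset, which is why a single pass suffices and why distinct subsets always have distinct sums.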

The solution in Cn is analyzed by gel separation.

If more than one solution is present, they will be of different lengths and thus will form separate bands.

By counting the number of bands we count the number of satisfying assignments.

Furthermore, from the length of a satisfying assignment, the exact assignment can be read.

This can be done because the stations have lengths from an easy knapsack sequence: any subsequence of an easy knapsack sequence has a sum different from the sums of all other subsequences.

Solution


C-D-E-L model

Thus the solution to #3SAT, viz. finding the number of satisfying assignments, is effectively obtained.

Moreover, being able to read the truth assignments is a great advantage for breaking the cryptosystem based on propositional logic.

Solution

Advantage over the previous method of attack

In the cryptanalytic attack proposed earlier, which modified D-D-D, it was required to execute the DNA algorithm for each bit in the crypto-text.

In the present method, using C-D-E-L (combining features of D-D-D and C-E-L), we apply #3SAT to P and read any satisfying assignment from the final solution.

This gives an equivalent public key, which amounts to breaking the cryptosystem.

Advantage

H-system

Lipton [94-95a-95b]: formalization and generalization of Adleman’s approach to other NP-complete problems.

Ex H-system

Circular H-system

Sticker system

P-system

Splicing systems so far


For computational strength Turing Equivalence Expansion Finiteness & Regularity More Operator Formalization

To confirm homogeneity HPP solving & AGL

Splicing systems so far


Molecular application

Separating and fusing DNA strands

Lengthening of DNA

Shortening DNA

Cutting DNA

Multiplying DNA

Operations of DNA molecules

Denaturation

separating the single strands without breaking them

hydrogen bonding is weaker than phosphodiester bonding

heat the DNA (85° - 90° C)

Renaturation

slowly cooling down

annealing of matching, separated strands

Separating and fusing DNA strands

Machinery for Nucleotide Manipulation

Enzymes are proteins that catalyze chemical reactions.

Enzymes are very specific.

Enzymes speed up chemical reactions extremely efficiently (speedup: $10^{12}$).

Nature has created a multitude of enzymes that are useful in processing DNA.

Enzymes

DNA polymerase enzymes add nucleotides to a DNA molecule

Requirements

single-stranded template

primer, bonded to the template

3´-hydroxyl end available for extension

Note: Terminal transferase needs no primer.

Lengthening DNA

DNA nucleases are enzymes that degrade DNA.

DNA exonucleases cleave (remove) nucleotides one at a time from the ends of the strands

Example: Exonuclease III, a 3´-nuclease degrading in 3´-5´ direction

Shortening DNA

DNA nucleases are enzymes that degrade DNA.

DNA exonucleases cleave (remove) nucleotides one at a time from the ends of the strands

Example: Bal31 removes nucleotides from both strands

Shortening DNA

DNA nucleases are enzymes that degrade DNA.

DNA endonucleases destroy internal phosphodiester bonds

Example: S1 cuts only single strands or within single strand sections

Restriction endonucleases

much more specific

cut only double strands

at a specific set of sites (EcoRI)

Cutting DNA
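The action of a restriction endonuclease such as EcoRI can be imitated on strings. The sketch below is a simplification under stated assumptions: it models only the top strand of a linear molecule, uses the EcoRI recognition site GAATTC with the cut placed after the first G, and the function name `ecori_digest` is hypothetical.

```python
def ecori_digest(seq):
    """Cut a linear top-strand sequence at every GAATTC site, between G and A."""
    site, offset = "GAATTC", 1          # cut position inside the site (G^AATTC)
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + offset])
        start = pos + offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])       # remainder after the last cut
    return fragments


print(ecori_digest("AAGAATTCTT"))
print(ecori_digest("ACGT"))             # no site: the molecule stays whole
```

On real double-stranded DNA the staggered cut leaves single-stranded overhangs on both fragments; the string model ignores the bottom strand and therefore does not show them.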

Amplification of a "small" amount of a specific DNA fragment, lost in a huge amount of other pieces.

"Needle in a haystack"

Solution: PCR = Polymerase Chain Reaction

devised by Kary Mullis in 1985

Nobel Prize

a very efficient molecular copy machine

Multiplying DNA

Start with a solution containing the following ingredients:

the target DNA molecule

primers (synthetic oligonucleotides), complementary to the terminal sections

polymerase, heat resistant

nucleotides

PCR - initialisation

The solution is heated close to boiling temperature.

The hydrogen bonds between the two strands break, and the double strands separate into single-stranded molecules.

PCR – denaturation

The solution is cooled down (to about 55° C).

Primers anneal to their complementary borders.

PCR - priming

The solution is heated again (to about 72° C).

Polymerase will extend the primers, using nucleotides available in the solution.

Two complete strands of the target DNA molecule are produced.

PCR - extension

$2^n$ copies after $n$ steps

PCR – copying
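The doubling per cycle can be written as a one-line growth formula. In this sketch the `efficiency` parameter is an added assumption (the fraction of molecules copied in each cycle); with the default value of 1.0 it reduces to the ideal doubling stated on the slide.

```python
def pcr_copies(cycles, initial=1, efficiency=1.0):
    """Expected copy number after `cycles` rounds of PCR.

    Ideal PCR doubles every molecule per cycle (efficiency = 1.0);
    a lower efficiency is a hypothetical refinement, not from the slides.
    """
    return initial * (1 + efficiency) ** cycles


for n in (1, 10, 20, 30):
    print(n, int(pcr_copies(n)))
```

Thirty ideal cycles already give about a billion copies from a single molecule, which is why the technique is described as finding a needle in a haystack.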

Measuring the Length of DNA Molecules

DNA molecules are negatively charged.

Placed in an electric field, they will move towards the positive electrode.

The negative charge is proportional to the length of the DNA molecule.

The force needed to move the molecule is proportional to its length.

A gel makes the molecules move at different speeds.

DNA molecules are invisible, and must be marked (ethidium bromide, radioactive)

Gel electrophoresis


Gel electrophoresis

reading the exact sequence of nucleotides comprising a given DNA molecule

based on the polymerase action of extending a primed single stranded template

nucleotide analogues

chemically modified, e.g., replace the 3´-hydroxyl group (3´-OH) by a 3´-hydrogen atom (3´-H)

dideoxynucleotides: ddA, ddT, ddC, ddG

Sanger method, dideoxy enzymatic method

Sequencing

Objective

We want to sequence a single stranded molecule a.

Preparation

We extend a at the 3´ end by a short (20 bp) sequence g, which will act as the W-C complement for the primer compl(g).

Usually, the primer is labelled (radioactively, or marked fluorescently).

This results in a molecule b´ = 3´-ga.

Sequencing

5' ATTAGACGTCCGTGCAATGC 3'

3'ACGTTACG 5'
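The pairing of the primer with the 3´ end of the template shown above can be verified with a Watson-Crick complement function. This is a small sketch; the helper names are hypothetical, and the split of the 20-base example into a 12-base molecule and an 8-base primer site is read off the example strings rather than stated on the slide.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-by-base Watson-Crick complement, keeping the written order."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Complementary strand written in its own 5'->3' direction."""
    return complement(seq)[::-1]


template_5to3 = "ATTAGACGTCCGTGCAATGC"   # example template from the slide
primer_3to5 = "ACGTTACG"                 # example primer, written 3'->5'

site = template_5to3[-len(primer_3to5):]  # 3' end of the template
print(site, complement(site) == primer_3to5)
print(reverse_complement(site))           # the same primer written 5'->3'
```

Writing the primer 3´ to 5´ under a template written 5´ to 3´ lines up each base with its partner, which is why a plain base-by-base complement is enough for the comparison.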


Sequencing

4 tubes are prepared: Tube A, Tube T, Tube C, Tube G

Each of them contains

b molecules

primers, compl(g)

polymerase

nucleotides A, T, C, and G.

Tube A contains a limited amount of ddA.

Tube T contains a limited amount of ddT.

Tube C contains a limited amount of ddC.

Tube G contains a limited amount of ddG.

Sequencing


Structure of ddTTP


Termination with ddTTP


Reaction in Tube A

The polymerase enzyme extends the primer of b´, using the nucleotides present in Tube A: ddA, A, T, C, G.

using only A, T, C, G: b´ is extended to the full duplex.

using ddA rather than A: complementing will end at the position of the ddA nucleotide.

Sequencing


Sequencing


Sequencing - stopping

Template (3´ to 5´): GCCTGCAGATTA

Tube A: CGGA, CGGACGTCTA, CGGACGTCTAA

Tube T: CGGACGT, CGGACGTCT, CGGACGTCTAAT

Tube C: C, CGGAC, CGGACGTC

Tube G: CG, CGG, CGGACG

Sequencing - results


Sequencing – reading the results
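Reading the result amounts to sorting all terminated fragments by length and noting which tube each one came from. The sketch below reproduces this on the example template GCCTGCAGATTA; it is an idealization that assumes exactly one fragment per possible termination position and leaves the primer out of the fragment strings, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def sanger_fragments(template_3to5):
    """Idealized tube contents: one fragment for every possible termination point."""
    synthesized = template_3to5.translate(COMPLEMENT)   # new strand, 5'->3'
    tubes = {base: [] for base in "ATCG"}
    for length, base in enumerate(synthesized, start=1):
        # In tube X, synthesis can stop wherever ddX replaced X.
        tubes[base].append(synthesized[:length])
    return tubes


def read_sequence(tubes):
    """Read the gel: shortest band first, each band named by its tube."""
    bands = sorted((len(frag), base)
                   for base, frags in tubes.items() for frag in frags)
    return "".join(base for _, base in bands)


tubes = sanger_fragments("GCCTGCAGATTA")
for base in "ATCG":
    print("Tube", base, tubes[base])
print(read_sequence(tubes))
```

The string read from the bands is the synthesized strand; the template itself is its base-by-base complement.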